Abstract

In today's world, breast cancer is one of the major health problems faced by women. Identifying cancer at its earliest stage is still challenging, and the diagnosis and treatment of breast cancer have become urgent tasks. Breast cancer is a widely seen tumor among Indian women. Early treatment of breast cancer is extremely important: it not only helps to cure the cancer but also helps to prevent its recurrence. Today, various methods, data mining techniques and processes such as knowledge discovery have been developed for predicting breast cancer. In this study, we compare several classification and clustering algorithms. The results indicate that the classification algorithms are better predictors than the clustering algorithms.

Introduction

Nowadays, breast cancer is common in women, and predicting it is as important as treating it. Breast cancer is one of the leading causes of cancer death among women. If breast cancer is predicted at an early stage, better treatment can be provided, which improves the patient's chance of survival. The diagnosis and treatment of breast cancer have therefore become urgent tasks. Different data mining methods are used to retrieve valuable information from large databases in order to support decisions and provide better health services.

Breast cancer begins with the abnormal growth of some breast cells. These cells divide more rapidly than healthy cells and continue to accumulate, forming a lump or mass. They may spread through the breast to the lymph nodes or to other parts of the body. Breast cancer also varies by age group: it is less common at a young age (i.e., in women in their thirties), but younger women tend to have more aggressive breast cancers than older women. In this paper, we compare different classification and clustering algorithms for predicting breast cancer.

A number of attributes are used in the comparison, and the algorithms are evaluated on these attributes to find the best classification algorithm.

Literature survey

In paper 1, three data mining classification methods are used to predict breast cancer, each considering different parameters. To identify the superior predictor, the study focuses on accuracy and computing time. After filtering all algorithms by lowest computing time and highest accuracy, it concludes that Naïve Bayes is superior to decision tree and k-nearest neighbor, because it takes the least time (0.02 seconds) while also providing the highest accuracy.

In paper 2, the WPBC (Wisconsin Prognostic Breast Cancer) dataset is used to find an efficient algorithm for predicting whether the disease is recurrent or non-recurrent. This helps oncologists distinguish a good prognosis (non-recurrent) from a bad one (recurrent) and treat patients more effectively. Eight popular data mining methods are used: four clustering algorithms (K-means, EM, PAM and fuzzy c-means) and four classification algorithms (SVM, C5.0, KNN and Naïve Bayes), and the paper reports the results of each. The classification algorithms C5.0 and SVM achieve 81% accuracy in classifying the recurrence of the disease, the best result among all eight methods. Among the clustering algorithms, EM is the most promising, with an accuracy of 68%. The study therefore shows that the classification algorithms are better predictors than the clustering algorithms. The impact of the various parameters responsible for predicting the recurrence or non-recurrence of the disease can be verified clinically.
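
Paper 2 scores clustering algorithms such as K-means with the same accuracy measure it applies to the classifiers, which requires turning unlabelled clusters into N/R predictions. The sketch below is a minimal illustration under stated assumptions, not paper 2's procedure: it runs two-cluster k-means on a single hypothetical feature, then labels each cluster with the majority true class of its members (one common convention; the mapping used in paper 2 is not stated). The data values and the function names `kmeans_two_clusters`, `clusters_to_labels` and `accuracy` are all hypothetical.

```python
from collections import Counter


def kmeans_two_clusters(values, max_iter=100):
    """Lloyd's k-means with k = 2 on one numeric feature (hypothetical helper).

    Centroids start at the smallest and largest value; returns a cluster id (0 or 1) per point.
    """
    centroids = [min(values), max(values)]
    assignment = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (ties go to cluster 0).
        assignment = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1 for v in values]
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = []
        for c in (0, 1):
            members = [v for v, a in zip(values, assignment) if a == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return assignment


def clusters_to_labels(assignment, labels):
    """Assumed mapping: name each cluster after the majority true label of its members."""
    majority = {
        c: Counter(lab for a, lab in zip(assignment, labels) if a == c).most_common(1)[0][0]
        for c in set(assignment)
    }
    return [majority[a] for a in assignment]


def accuracy(predicted, actual):
    """Share of cases whose predicted label equals the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: one feature per patient, N = non-recurrent, R = recurrent.
values = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 1.3, 3.1, 0.9]
labels = ["N", "N", "N", "N", "R", "R", "N", "R", "R", "N"]

clusters = kmeans_two_clusters(values)
predicted = clusters_to_labels(clusters, labels)
print(predicted)
print(accuracy(predicted, labels))  # 8 of 10 cases match -> 0.8
```

The true labels enter only after clustering, to name and score the clusters; the k-means step itself never sees them, whereas a classifier such as SVM or C5.0 is trained on them.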

The authors of paper 2 also suggest verifying the identified critical parameters on a larger medical dataset in order to predict the recurrence of the disease in the future.

Paper 3 builds a diagnostic model for breast cancer that searches for the relationship between breast cancer and its symptoms. A feature selection method, INTERACT, is applied to select relevant and important features in order to improve the accuracy of the diagnostic model, and SVM is used to build the classification model. To demonstrate the significance of feature selection, two diagnostic models are built, one with and one without feature selection. In the experiments, the model with feature selection is clearly more accurate than the model without it. Nine features are selected as the relevant factors for building the diagnostic model. The findings of this study can serve as supplementary information to help practitioners diagnose breast cancer better.

Paper 4 focuses on the importance of feature selection in breast cancer prognosis. With a proper attribute selection technique, any classification algorithm can be improved significantly, because attributes that contribute little to the dataset often mislead the classifier and result in poor prediction. The authors find that Support Vector Machine gives much better output both before and after attribute selection. Analysis of the area under the ROC curve supports these results, with Naïve Bayes and Decision Tree improving considerably after feature selection. The main task in that work is only whether breast cancer is recurrent or not; as an extension, the authors also attempt to predict the time of recurrence for cancers classified as recurrent.

Paper 5 presents a survey of classification simulations that can be used for breast cancer detection with the WEKA tool.
The survey discusses a variety of existing classification techniques and lists their performance accuracy, which helps decide which algorithm in the WEKA tool is best for breast cancer detection. Comparing the algorithms, it finds that SVM performs best, with the highest accuracy, and that expectation maximization has the lowest accuracy.

Paper 6 likewise presents a survey of existing classification techniques that can be used for breast cancer detection with the WEKA tool, to help decide which algorithm is best for this task.

Table: Comparison of clustering and classification algorithms (results from paper 2; N = non-recurrent, R = recurrent)

Classification algorithms

Algorithm          Confusion matrix     Accuracy
                          N      R
C5.0               N     47      0      0.8103
                   R     11      0
KNN                N     47      0      0.7068
                   R     11      0
Naïve Bayes        N     47      0      0.5344
                   R     11      0
SVM                N     47      0      0.8103
                   R     11      0

Clustering algorithms

Algorithm          Confusion matrix     Accuracy
                          N      R
K-Means            N    100     48      0.6340
                   R     23     23
EM                 N    117     31      0.6804
                   R     31     15
PAM                N     64     84      0.4175
                   R     29     17
Fuzzy c-Means      N     50     98      0.3711
                   R     24     22

Accuracy = (TP + TN) / (TP + TN + FP + FN)
where TP = true positives, TN = true negatives, FP = false positives and FN = false negatives.

Conclusion

From the above comparison, we conclude that the classification algorithms work better than the clustering algorithms in predicting breast cancer, and that among the classification algorithms, SVM and C5.0 perform best. The best algorithm for predicting breast cancer is chosen here purely on the basis of accuracy.

References

1. Chintan Shah; Anjali G. Jivani, "Comparison of data mining classification algorithms for breast cancer prediction."
2. Uma Ojha; Savita Goel, "A study on prediction of breast cancer recurrence using data mining techniques," 2017 7th International Conference on Cloud Computing, Data Science & Engineering – Confluence.
3. Runjie Shen; Yuanyuan Yang; Fengfeng Shao, "Intelligent Breast Cancer Prediction Model Using Data Mining Techniques."
4. Ahmed Iqbal Pritom; Md. Ahadur Rahman Munshi; Shahed Anzarus Sabab; Shihabuzzaman Shihab, "Predicting breast cancer recurrence using effective classification and feature selection technique."
5. S. Padmapriya; M. Devika; V. Meena; S. B. Dheebikaa; Vinodhini, "Survey on Breast Cancer Detection Using Weka Tool."
6. Jahanvi Joshi; Rinal Doshi; Jigar Patel, Ph.D., "Diagnosis of Breast Cancer using Clustering Data Mining Approach."
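
The accuracy formula used in the comparison table can be checked against the printed confusion matrices. The sketch below assumes each matrix is laid out exactly as printed, [[N/N, N/R], [R/N, R/R]]; because accuracy sums only the diagonal (TP + TN) and divides by the total, the result does not depend on which class is treated as positive. The KNN and Naïve Bayes rows are left out: their printed matrices are identical to the C5.0 matrix, which gives 47/58 ≈ 0.8103 rather than the listed 0.7068 and 0.5344. The function name `accuracy_from_matrix` is hypothetical.

```python
def accuracy_from_matrix(matrix):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN) for a 2x2 confusion matrix.

    Assumed layout (as printed in the table): [[N/N, N/R], [R/N, R/R]].
    The diagonal cells are the correctly placed cases (TP and TN); all four cells form the total.
    """
    (nn, nr), (rn, rr) = matrix
    return (nn + rr) / (nn + nr + rn + rr)


# Matrices and accuracies copied from the comparison table. KNN and Naive Bayes are omitted
# because their printed matrices do not reproduce their listed accuracies.
reported = {
    "C5.0": ([[47, 0], [11, 0]], 0.8103),
    "SVM": ([[47, 0], [11, 0]], 0.8103),
    "K-Means": ([[100, 48], [23, 23]], 0.6340),
    "EM": ([[117, 31], [31, 15]], 0.6804),
    "PAM": ([[64, 84], [29, 17]], 0.4175),
    "Fuzzy c-Means": ([[50, 98], [24, 22]], 0.3711),
}

for name, (matrix, table_value) in reported.items():
    computed = accuracy_from_matrix(matrix)
    # Prints e.g. "C5.0: computed 0.8103, table 0.8103"
    print(f"{name}: computed {computed:.4f}, table {table_value:.4f}")
```

In the C5.0 and SVM matrices the R column is empty, so their 0.8103 is simply 47/58, the share of the 58 cases that fall in the N row.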
