THE EFFECT OF FEATURE SELECTION METHODS ON MACHINE LEARNING MODEL PERFORMANCE: A COMPARATIVE STUDY FOR BREAST CANCER PREDICTION
Keywords:
Breast Cancer, Machine Learning, Feature Selection, Breast Cancer Diagnostic Dataset
Abstract
Developing countries often face a high incidence of breast cancer, making early detection vital for effective treatment. The risk of developing breast cancer can be evaluated using machine learning methods and routine diagnostic data. Cancer datasets contain a wealth of patient information, but not all of it is useful for predicting cancer, which is why feature selection methods matter for identifying the relevant attributes. Many studies in this field have attempted to predict the different types of breast tumours, because accurate diagnosis of breast cancer is essential. This paper compares how different feature selection methods affect the accuracy of several existing machine learning algorithms. The study focuses on seven algorithms: K-Nearest Neighbors (KNN), Naive Bayes (NB), Decision Trees (DT), Support Vector Machines (SVM), Logistic Regression (LR), Neural Network (NN), and Random Forest (RF). The feature selection techniques examined are the F-test, Mutual Information (MI), and the Spearman Correlation Coefficient. The experiments use the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, which is publicly available from the UCI Repository. The findings show that, when feature selection is applied, the LR and NN algorithms achieve higher accuracy than the other models and also perform well on the other evaluation metrics.
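The comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and SciPy, uses the copy of the WDBC data bundled with scikit-learn, and evaluates only Logistic Regression for brevity. The number of retained features (k=10), the 80/20 stratified split, the standardization step, and the helper names `spearman_score`, `mi_score`, and `compare_selectors` are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def spearman_score(X, y):
    """Score each feature by its absolute Spearman correlation with the label."""
    return np.array([abs(spearmanr(X[:, j], y)[0]) for j in range(X.shape[1])])


def mi_score(X, y):
    """Mutual information with a fixed seed so repeated runs agree."""
    return mutual_info_classif(X, y, random_state=0)


def compare_selectors(k=10, seed=0):
    """Return test accuracy of Logistic Regression under each selection method.

    k and seed are assumed values chosen for illustration only.
    """
    X, y = load_breast_cancer(return_X_y=True)  # WDBC: 569 samples, 30 features
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    selectors = {
        "none": "passthrough",  # baseline: keep all 30 features
        "f_test": SelectKBest(f_classif, k=k),
        "mutual_info": SelectKBest(mi_score, k=k),
        "spearman": SelectKBest(spearman_score, k=k),
    }
    results = {}
    for name, selector in selectors.items():
        # The selector sits inside the pipeline, so feature scores are
        # computed from the training split only (no test-set leakage).
        model = Pipeline([
            ("scale", StandardScaler()),
            ("select", selector),
            ("clf", LogisticRegression(max_iter=1000)),
        ])
        model.fit(X_tr, y_tr)
        results[name] = model.score(X_te, y_te)
    return results


if __name__ == "__main__":
    for method, accuracy in compare_selectors().items():
        print(f"{method:12s} accuracy = {accuracy:.3f}")
```

Swapping `LogisticRegression` for another scikit-learn classifier (for example `KNeighborsClassifier` or `RandomForestClassifier`) in the `clf` step would extend the same loop to the other algorithms named above; a single train/test split is used here only to keep the sketch short, and cross-validation would give more stable estimates.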
License
Copyright (c) 2025 Diman Siddiq Hassan

This work is licensed under a Creative Commons Attribution 4.0 International License.