HYBRID TECHNIQUE FOR SOFTWARE DEFECT PREDICTION USING MACHINE LEARNING TECHNIQUES

Authors

  • Muhammad Jumare Haruna, Department of Computer Science, Federal University of Education, Zaria, Nigeria
  • Darius T Chinyio, Department of Computer Science, Nigerian Defence Academy, Kaduna, Nigeria
  • Martin E Irhebhude, Department of Computer Science, Nigerian Defence Academy, Kaduna, Nigeria

DOI:

https://doi.org/10.25271/sjuoz.2025.13.4.1532

Keywords:

Software Defect Prediction, Convolutional Neural Networks (CNN), Climate Forecasting, Deep Learning, CNN-LSTM Hybrid, Attention Mechanism, Multi-Task Learning, Financial Fraud Detection, Credit Card Fraud, Anomaly Detection, Imbalanced Data, Machine Learning, CNN for Tabular Data, Fraud Analytics, Classification, Hybrid Technique, XGBoost

Abstract

Human error during software development introduces many defects, which makes early detection and mitigation important. Existing approaches, however, often fail to deliver accurate, scalable, and generalizable predictions because of class imbalance, limited feature extraction, and computational inefficiency. This study proposes a hybrid method for software defect prediction that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) for feature extraction, applies Adaptive Synthetic Sampling (ADASYN) to address class imbalance, and then trains an Extreme Gradient Boosting (XGBoost) classifier. The approach was evaluated on five publicly available datasets (CM1, MC1, KC1, PC1, and PC4) and compared with state-of-the-art (SOTA) models. In the experiments, the hybrid model significantly outperformed traditional XGBoost-based models in recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). On the MC1 dataset, for example, the model attained an accuracy of 0.9980, a precision of 0.9971, a recall of 0.9988, an F1-score of 0.9980, and an AUC-ROC of 0.9999. On the KC1 dataset, it achieved an accuracy of 0.9344, a precision of 0.9265, a recall of 0.9375, an F1-score of 0.9320, and an AUC-ROC of 0.9839. The model also performed better than traditional machine learning methods and standalone deep learning models, particularly in recall and AUC-ROC. The hybrid approach thus addresses class imbalance while maintaining high predictive accuracy for tasks in the software development process, and the results offer insight into the trade-offs between machine learning and deep learning methods.
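The evaluation metrics named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' evaluation code: the helper names (confusion_counts, f1_from, roc_auc) and the toy labels and scores are assumptions made purely for illustration. The sketch computes precision, recall, and F1-score from confusion-matrix counts, and AUC-ROC as the fraction of (defective, clean) pairs that the classifier ranks correctly.

```python
# Illustrative sketch only: helper names and toy data are hypothetical,
# not taken from the paper. Label 1 means "defective", 0 means "clean".

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_from(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


def roc_auc(y_true, scores):
    """AUC-ROC as the share of (defective, clean) pairs in which the
    defective module gets the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up data): three defective and five clean modules.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(accuracy, precision, recall, f1, roc_auc(y_true, scores))

# Recompute F1 from the precision and recall values reported in the abstract.
print(f1_from(0.9971, 0.9988))  # MC1
print(f1_from(0.9265, 0.9375))  # KC1
```

Recomputing the F1-score from the reported precision and recall in this way gives values that agree with the reported F1-scores for MC1 and KC1 to within rounding of the four-decimal figures.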

References

Ahmed, S. F., Alam, M. S. B., Hassan, M., Rozbu, M. R., Ishtiak, T., Rafa, N., & Gandomi, A. H. (2023). Deep Learning Modelling Techniques: Current Progress, Applications, Advantages, and Challenges. Artificial Intelligence Review, 56(11), 13521–13617. https://doi.org/10.1007/s10462-023-10409-2

AL-Hadidi, T. N., & Hasoon, S. O. (2024). Software Defect Prediction Using Extreme Gradient Boosting (XGBoost) with Optimization Hyperparameter. Al-Rafidain Journal of Computer Sciences and Mathematics (RJCM), 18(1), 22–29. https://doi.org/10.33899/csmj.2023.142739.1081

Ali, M., Mazhar, T., Arif, Y., Al-Otaibi, S., Ghadi, Y. Y., Shahzad, T., Khan, M. A., & Hamam, H. (2024). Software Defect Prediction Using an Intelligent Ensemble-Based Model. IEEE Access. https://doi.org/10.1109/ACCESS.2024.3358201

Alkaberi, W., & Assiri, F. (2024). Predicting the Number of Software Faults using Deep Learning. Engineering, Technology & Applied Science Research, 14(2), 13222–13231. https://doi.org/10.48084/etasr.6798

Alzeyani, E. M. M., & Szabó, C. (2024). Comparative Evaluation of Model Accuracy for Predicting Selected Attributes in Agile Project Management. Mathematics, 12(16), 2529. https://doi.org/10.3390/math12162529

Charles, J. (2024). Revolutionizing Software Project Development: A CNN-LSTM Hybrid Model for Effective Defect Prediction. International Journal of Advanced Computer Science & Applications, 15(1). https://doi.org/10.14569/ijacsa.2024.0150158

Elentukh, A. (2023). People Make Mistakes–A Survey of Common Causes of Software Defects. International Conference on Computer Science and Education in Computer Science, 117–133. https://doi.org/10.1007/978-3-031-44668-9_9

Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning Hierarchical Features for Scene Labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1915–1929. https://doi.org/10.1109/TPAMI.2012.231

Farid, A. B., Fathy, E. M., Eldin, A. S., & Abd-Elmegid, L. A. (2021). Software Defect Prediction Using Hybrid Model (CBIL) of Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (Bi-LSTM). PeerJ Computer Science, 7, e739. https://doi.org/10.7717/peerj-cs.739

Feyzi, F., & Daneshdoost, A. (2023). Studying the effectiveness of deep active learning in software defect prediction. International Journal of Computers and Applications, 45(7–8), 534–552. https://doi.org/10.1080/1206212X.2023.2252117

Ghotra, B., McIntosh, S., & Hassan, A. E. (2017). A large-scale study of the impact of feature selection techniques on defect classification models. 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), 146–157. https://doi.org/10.1109/MSR.2017.18

Giray, G., Bennin, K. E., Köksal, Ö., Babur, Ö., & Tekinerdogan, B. (2023). On the use of deep learning in software defect prediction. Journal of Systems and Software, 195, 111537. https://doi.org/10.1016/j.jss.2022.111537

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep feedforward networks. Deep Learning, 1, 161–217.

Goyal, S. (2022). Handling class-imbalance with KNN (neighbourhood) under-sampling for software defect prediction. Artificial Intelligence Review, 55(3), 2023–2064. https://doi.org/10.1007/s10462-021-10044-w

Hussein, A. S., Li, T., Abd Ali, D. M., Bashir, K., & Yohannese, C. W. (2020). A modified adaptive synthetic sampling method for learning imbalanced datasets. Developments of Artificial Intelligence Technologies in Computation and Robotics: Proceedings of the 14th International FLINS Conference (FLINS 2020), 76–83. https://doi.org/10.1142/9789811223334_0010

Jin, C. (2021). Cross-project software defect prediction based on domain adaptation learning and optimization. Expert Systems with Applications, 171. https://doi.org/10.1016/j.eswa.2021.114637

Jude, A., & Uddin, J. (2024). Explainable Software Defects Classification Using SMOTE and Machine Learning. Annals of Emerging Technologies in Computing (AETiC), 8(1). https://doi.org/10.33166/AETiC.2024.01.00

Khalid, A., Badshah, G., Ayub, N., Shiraz, M., & Ghouse, M. (2023). Software Defect Prediction Analysis Using Machine Learning Techniques. Sustainability, 15(6). https://doi.org/10.3390/su15065517

Khan, M. F. I., & Masum, A. K. M. (2024). Predictive Analytics and Machine Learning for Real-Time Detection of Software Defects and Agile Test Management. Educational Administration: Theory and Practice, 30(4), 1051–1057.

Khleel, N. A. A., & Nehéz, K. (2022). A New Approach to Software Defect Prediction Based on Convolutional Neural Network and Bidirectional Long Short-Term Memory. Production Systems and Information Engineering, 10(3), 1–18. https://doi.org/10.32968/psaie.2022.3.1

Krasner, H. (2021). The cost of poor software quality in the US: A 2020 report. Proc. Consortium Inf. Softw. Quality (CISQ), 2.

Kukkar, A., Mohana, R., Nayyar, A., Kim, J., Kang, B. G., & Chilamkurti, N. (2019). A Novel Deep-Learning-Based Bug Severity Classification Technique Using Convolutional Neural Networks and Random Forest with Boosting. Sensors, 19(13), 2964. https://doi.org/10.3390/s19132964

Kumar, L., Singh, V., Neti, L. B. M., Misra, S., & Krishna, A. (2023). An Empirical Framework for Software Aging-Related Bug Prediction using Weighted Extreme Learning Machine. FedCSIS (Communication Papers), 181–188. https://doi.org/10.15439/2023F9248

Maddipati, S., & Srinivas, M. (2021). A Hybrid Approach for Cost Effective Prediction of Software Defects. International Journal of Advanced Computer Science and Applications, 12. https://doi.org/10.14569/IJACSA.2021.0120219

Mahmoud, A. N., Abdelaziz, A., Santos, V., & Freire, M. M. (2024). A proposed model for detecting defects in software projects. Indonesian Journal of Electrical Engineering and Computer Science, 33(1), 290–302. https://doi.org/10.11591/ijeecs.v33.i1

Mehmood, I., Shahid, S., Hussain, H., Khan, I., Ahmad, S., Rahman, S., Ullah, N., & Huda, S. (2023). A Novel Approach to Improve Software Defect Prediction Accuracy Using Machine Learning. IEEE Access. https://doi.org/10.1109/ACCESS.2023.3287326

Menzies, T., Krishna, R., & Pryor, D. (2015). The Promise Repository of Empirical Software Engineering Data. http://openscience.us/repo

Nevendra, M., & Singh, P. (2022). A survey of software defect prediction based on deep learning. Archives of Computational Methods in Engineering, 29(7), 5723–5748. https://doi.org/10.1007/s11831-022-09787-8

Olaleye, T. O., Arogundade, O. T., Misra, S., Abayomi-Alli, A., & Kose, U. (2023). Predictive analytics and software defect severity: A systematic review and future directions. Scientific Programming, 2023(1), 6221388. https://doi.org/10.1155/2023/6221388

Olorunshola, O. E., Irhebhude, M. E., Evwiekpaefe, A. E., & Ogwueleka, F. N. (2020). Evaluation of machine learning classification techniques in predicting software defects. Trans. Mach. Learn. Artif. Intel, 8, 1–15. http://dx.doi.org/10.14738/tmlai.85.8733

Pachouly, J., Ahirrao, S., Kotecha, K., Selvachandran, G., & Abraham, A. (2022). A systematic literature review on software defect prediction using artificial intelligence: Datasets, Data Validation Methods, Approaches, and Tools. Engineering Applications of Artificial Intelligence, 111, 104773. https://doi.org/10.1016/j.engappai.2022.104773

Patil, D., Rane, N. L., Desai, P., & Rane, J. (2024). Machine learning and deep learning: Methods, techniques, applications, challenges, and future research opportunities. Trustworthy Artificial Intelligence in Industry and Society, 28–81. https://doi.org/10.70593/978-81-981367-4-9

Ponnala, R., & Reddy, C. (2023). Ensemble Model for Software Defect Prediction Using Method Level Features of Spring Framework Open Source Java Project for E-Commerce. Shu Ju Cai Ji Yu Chu Li/Journal of Data Acquisition and Processing, 38, 1645–1650. https://doi.org/10.5281/zenodo.7749985

Saidani, I., Ouni, A., & Mkaouer, M. W. (2022). Improving the prediction of continuous integration build failures using deep learning. Automated Software Engineering, 29(1), 21. https://doi.org/10.1007/s10515-021-00319-5

Shafiq, M., Alghamedy, F. H., Jamal, N., Kamal, T., Daradkeh, Y. I., & Shabaz, M. (2023). Retracted: Scientific programming using optimized machine learning techniques for software fault prediction to improve software quality. IET Software, 17(4), 694–704.

Shen, Z., & Chen, S. (2020). A survey of automatic software vulnerability detection, program repair, and defect prediction techniques. Security and Communication Networks, 2020(1), 8858010. https://doi.org/10.1155/2020/8858010

Shepperd, M., Song, Q., Sun, Z., & Mair, C. (2018). NASA MDP Software Defects Data Sets. figshare. https://doi.org/10.6084/m9.figshare.c.4054940.v1

Tameswar, K., Suddul, G., & Dookhitram, K. (2022). A hybrid deep learning approach with genetic and coral reefs metaheuristics for enhanced defect detection in software. International Journal of Information Management Data Insights, 2(2), 100105. https://doi.org/10.1016/j.jjimei.2022.100105

Thomas, N. S., & Kaliraj, S. (2024). An Improved and Optimized Random Forest Based Approach to Predict the Software Faults. SN Computer Science, 5(5), 530. https://doi.org/10.1007/s42979-024-02764-x

Vogel-Heuser, B., Fay, A., Schaefer, I., & Tichy, M. (2015). Evolution of software in automated production systems: Challenges and research directions. Journal of Systems and Software, 110, 54–84. https://doi.org/10.1016/j.jss.2015.08.026

Wan, X., Zheng, Z., Qin, F., & Lu, X. (2024). Data complexity: A new perspective for analyzing the difficulty of defect prediction tasks. ACM Transactions on Software Engineering and Methodology. https://doi.org/10.1145/3649596

Wang, D., Zhang, B., & Zhu, M. (2022). A survey on convolutional neural network with its applications. Comput. Math. Appl., 83(3), 186–206. https://doi.org/10.1016/j.camwa.2021.11.025

Wang, H., Zhuang, W., & Zhang, X. (2021). Software defect prediction based on gated hierarchical LSTMs. IEEE Transactions on Reliability, 70(2), 711–727. https://doi.org/10.1109/TR.2020.3047396

Wang, S., Huang, L., Gao, A., Ge, J., Zhang, T., Feng, H., Satyarth, I., Li, M., Zhang, H., & Ng, V. (2022). Machine/deep learning for software engineering: A systematic literature review. IEEE Transactions on Software Engineering, 49(3), 1188–1231. https://doi.org/10.1109/TSE.2022.3173346

Published

2025-10-01

How to Cite

Haruna, M. J., Chinyio, D. T., & Irhebhude, M. E. (2025). HYBRID TECHNIQUE FOR SOFTWARE DEFECT PREDICTION USING MACHINE LEARNING TECHNIQUES. Science Journal of University of Zakho, 13(4), 448–462. https://doi.org/10.25271/sjuoz.2025.13.4.1532

Section

Science Journal of University of Zakho