HYBRID TECHNIQUE FOR SOFTWARE DEFECT PREDICTION USING MACHINE LEARNING TECHNIQUES
DOI: https://doi.org/10.25271/sjuoz.2025.13.4.1532
Keywords:
Software Defect Prediction, Convolutional Neural Networks (CNN), CNN-LSTM Hybrid, Deep Learning, Machine Learning, Hybrid Technique, Imbalanced Data, Classification, XGBoost
Abstract
Human errors during software development introduce many defects, which makes early detection and minimization important. However, existing approaches often fall short of accurate, scalable, and generalizable predictions because of class imbalance, limited feature extraction, and computational inefficiency. This study proposes a hybrid method for predicting software defects: a Convolutional Neural Network (CNN) combined with a Long Short-Term Memory (LSTM) network extracts features, Adaptive Synthetic Sampling (ADASYN) addresses class imbalance, and Extreme Gradient Boosting (XGBoost) is then trained on the result. The proposed approach was evaluated on five publicly available datasets (CM1, MC1, KC1, PC1, and PC4) and compared with state-of-the-art (SOTA) models. Experimental results show that the hybrid model outperforms traditional XGBoost-based models in recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC), with strong results across all five datasets. For example, on the MC1 dataset the model attained an accuracy of 0.9980, a precision of 0.9971, a recall of 0.9988, an F1-score of 0.9980, and an AUC-ROC of 0.9999. On the KC1 dataset, it achieved an accuracy of 0.9344, a precision of 0.9265, a recall of 0.9375, an F1-score of 0.9320, and an AUC-ROC of 0.9839. The model also performs better than traditional machine learning methods and standalone deep learning models, especially in recall and AUC-ROC. This research presents a robust hybrid solution that addresses class imbalance while maintaining high predictive accuracy for software defect prediction, and it offers insights into the trade-offs between machine learning and deep learning methods.
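The abstract names ADASYN as the class-imbalance step, applied before XGBoost training. As an illustration only, the following is a minimal pure-Python sketch of the general ADASYN idea, not the authors' implementation: each minority sample receives a share of the synthetic samples proportional to how many majority-class points lie among its k nearest neighbours, and each new point is interpolated between that sample and a randomly chosen nearby minority sample. The helper name `adasyn`, its parameters (`k`, `beta`, `seed`), the use of Euclidean distance, and the even-spread fallback are assumptions of this sketch.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """ADASYN-style oversampling sketch (hypothetical helper, not the authors' code).

    X: list of feature vectors (lists of floats); y: list of class labels.
    Returns new lists with synthetic minority samples appended after the originals.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_min = len(min_idx)
    n_maj = len(y) - n_min
    # G: total number of synthetic samples; beta=1.0 aims for full balance.
    G = round((n_maj - n_min) * beta)
    X_new = [list(row) for row in X]
    y_new = list(y)
    if G <= 0 or n_min < 2:
        return X_new, y_new  # already balanced, or too few minority points to interpolate

    # Step 1: difficulty ratio r_i = fraction of majority-class points among
    # the k nearest neighbours (in the whole data set) of each minority point.
    ratios = []
    for i in min_idx:
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: _sq_dist(X[i], X[j]))[:k]
        ratios.append(sum(y[j] != minority for j in nearest) / len(nearest))

    # Step 2: normalise the ratios into a distribution over minority points.
    total = sum(ratios)
    if total == 0:
        # Assumption of this sketch: if no minority point has majority
        # neighbours, spread the synthetic samples evenly.
        weights = [1.0 / n_min] * n_min
    else:
        weights = [r / total for r in ratios]

    # Step 3: harder points (more majority neighbours) receive more samples,
    # each interpolated towards one of their k nearest minority neighbours.
    for i, w in zip(min_idx, weights):
        min_nearest = sorted(
            (j for j in min_idx if j != i), key=lambda j: _sq_dist(X[i], X[j])
        )[:k]
        for _ in range(round(w * G)):
            j = rng.choice(min_nearest)
            gap = rng.random()  # interpolation factor in [0, 1)
            X_new.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            y_new.append(minority)
    return X_new, y_new


if __name__ == "__main__":
    # Toy data: ten majority points (label 0) and three minority points (label 1).
    maj_pts = [[float(i), 0.0] for i in range(10)]
    min_pts = [[0.5, 0.2], [1.5, 0.3], [2.5, 0.1]]
    X_bal, y_bal = adasyn(maj_pts + min_pts, [0] * 10 + [1] * 3, k=3, seed=42)
    print("class counts after resampling:", y_bal.count(0), y_bal.count(1))
```

Because the per-sample counts are rounded, the result is only approximately balanced. In a pipeline like the one the abstract describes, such resampling would presumably be applied to the extracted feature vectors of the training split only, so that synthetic points do not leak into evaluation; a maintained implementation such as the `ADASYN` class in imbalanced-learn (`imblearn.over_sampling`) would normally be preferred over a hand-written one.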
License
Copyright (c) 2025 Muhammad Jumare Haruna, Darius T. Chinyio, Martin E. Irhebhude

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License [CC BY-NC-SA 4.0] that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work, with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online.