TY - UNPB
T1 - TML: A Transformer-Based Meta-Learning Framework for Cross-Project Software Defect Prediction
AU - Bandhu, Himanshu
AU - Ali, Aftab
AU - McClean, Sally
AU - Ullah, Hanif
AU - Abu-Tair, Mamun
AU - Ziolkowski, Adam
AU - Noppen, Joost
PY - 2024/11/15
Y1 - 2024/11/15
N2 - Identifying software defects early is crucial for enhancing software quality and reducing costs. Traditional Within-Project Defect Prediction (WPDP) methods rely on historical project-specific data, limiting their effectiveness when such data is unavailable. Cross-Project Defect Prediction (CPDP) offers a solution by leveraging defect data from different projects, but challenges arise due to the diverse nature of data distributions across projects. This paper presents a novel framework, TML (Transformer-based Meta-Learning), designed to improve CPDP performance by addressing these challenges. TML integrates transformer-based encoder networks for feature extraction, adversarial domain adaptation to align data distributions, and meta-learning to enhance generalization across projects. Additionally, it incorporates ensemble learning and Bayesian optimization to improve model robustness and predictive accuracy. The framework is evaluated on 16 datasets from four major software repositories (AEEM, NASA, Promise, JIRA). Experimental results demonstrate that TML significantly outperforms existing CPDP methods such as ENTL, EGW, and EMKCA in key performance metrics including Precision, Recall, F1-score, G-Mean, and AUC. The results consistently demonstrate the robustness of the TML framework, establishing it as a promising approach for early defect detection in diverse software development environments.
U2 - 10.21203/rs.3.rs-5382592/v1
DO - 10.21203/rs.3.rs-5382592/v1
M3 - Preprint
BT - TML: A Transformer-Based Meta-Learning Framework for Cross-Project Software Defect Prediction
ER -