Abstract
Heterogeneous cross-project defect prediction (HCPDP) aims to predict defects in a new software project using defect data from previous software projects, where the source and target projects differ in some of their metrics. Most existing methods capture only linear relationships among the software defect features and datasets. In addition, these methods use multiple defect datasets from different projects as source datasets. In this paper, we propose a novel method called heterogeneous cross-project defect prediction using encoder and transfer learning (ETL). ETL uses encoders to extract the important features from the source and target datasets. To minimize negative transfer during transfer learning, we use an augmented dataset that combines pseudo-labels with the source dataset. We also train the model on very limited data. To evaluate the performance of the ETL approach, we used 16 datasets from four publicly available software defect projects. We compared the proposed method with four HCPDP methods, namely EGW, HDP_KS, CTKCCA and EMKCA, and with one within-project defect prediction (WPDP) method from the existing literature. On average, the proposed method outperforms the baseline methods in terms of PD, PF, F1-score, G-mean and AUC.
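As a rough illustration of the evaluation measures named in the abstract, the sketch below computes PD, PF, F1-score and G-mean from confusion-matrix counts. It assumes the definitions commonly used in defect prediction (PD as recall, PF as the false-alarm rate, G-mean as the geometric mean of PD and 1 − PF); the abstract does not state the exact formulas, and the function name and example counts are hypothetical. AUC is left out because it needs ranked prediction scores rather than counts.

```python
import math


def defect_metrics(tp, fp, tn, fn):
    """Return PD, PF, F1 and G-mean from confusion-matrix counts.

    Assumed definitions (not taken from the paper):
      PD     = TP / (TP + FN)          probability of detection (recall)
      PF     = FP / (FP + TN)          probability of false alarm
      F1     = harmonic mean of precision and PD
      G-mean = sqrt(PD * (1 - PF))
    """
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = 2 * precision * pd / (precision + pd) if (precision + pd) else 0.0
    g_mean = math.sqrt(pd * (1.0 - pf))
    return {"PD": pd, "PF": pf, "F1": f1, "G-mean": g_mean}


# Hypothetical counts for one target project, for illustration only.
metrics = defect_metrics(tp=30, fp=10, tn=50, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions a higher PD, F1-score and G-mean indicate a better predictor, while a lower PF is better.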
Original language | English |
---|---|
Pages (from-to) | 409-419 |
Number of pages | 12 |
Journal | IEEE Access |
Volume | 12 |
Issue number | Early Access |
Early online date | 14 Dec 2023 |
DOIs | |
Publication status | Published online - 14 Dec 2023 |
Bibliographical note
Publisher Copyright: Authors
Keywords
- Software defect prediction
- Software engineering
- Transfer learning
- Measurement
- Adaptation models
- Software defect
- Predictive models
- Feature extraction
- Data models
- Software