TY - JOUR
T1 - Discriminating features-based cost-sensitive approach for software defect prediction
AU - Ali, Aftab
AU - Khan, Naveed
AU - Abu-Tair, Mamun
AU - Noppen, Joost
AU - McClean, Sally I
AU - McChesney, Ian
N1 - Funding Information:
This research is supported by the BTIIC (BT Ireland Innovation Centre) project, funded by BT and Invest Northern Ireland.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/7/12
Y1 - 2021/7/12
AB - Correlated quality metrics extracted from a source code repository can be used to build a model that automatically predicts defects in a software system. The extracted metrics are expected to be highly imbalanced, since the number of defective instances in a good-quality software system should be far smaller than the number of normal instances. Selecting the best discriminating features also significantly improves the robustness and accuracy of a prediction model. The contribution of this paper is therefore twofold. First, it selects the best discriminating features, which help to accurately predict a defect in a software component. Second, cost-sensitive logistic regression and decision tree ensemble-based prediction models are applied to the best discriminating features to precisely predict a defect in a software component. The proposed models are compared with the most recent schemes in the literature in terms of accuracy, area under the curve (AUC), and recall. The models are evaluated on 11 datasets, and the results and analysis show that the proposed prediction models outperform the schemes in the literature.
KW - AUC
KW - Cost-sensitivity
KW - Discriminating features
KW - Machine learning models
KW - Recall
KW - Software bugs/defects
UR - http://www.scopus.com/inward/record.url?scp=85110462591&partnerID=8YFLogxK
U2 - 10.1007/s10515-021-00289-8
DO - 10.1007/s10515-021-00289-8
M3 - Article
SN - 0928-8910
VL - 28
JO - Automated Software Engineering
JF - Automated Software Engineering
M1 - 11
ER -