Discriminating features-based cost-sensitive approach for software defect prediction

Research output: Contribution to journal › Article › peer-review

18 Citations (Scopus)
87 Downloads (Pure)

Abstract

Correlated quality metrics extracted from a source code repository can be used to build a model that automatically predicts defects in a software system. The extracted metrics typically yield highly imbalanced data, since the number of defective instances in a good-quality software system should be far lower than the number of normal instances. Selecting the best discriminating features also significantly improves the robustness and accuracy of a prediction model. The contribution of this paper is therefore twofold. First, it selects the best discriminating features that help in accurately predicting a defect in a software component. Second, cost-sensitive logistic regression and decision tree ensemble-based prediction models are applied to the best discriminating features to precisely predict a defect in a software component. The proposed models are compared with the most recent schemes in the literature in terms of accuracy, area under the curve (AUC), and recall. The models are evaluated on 11 datasets, and the results and analysis show that the proposed prediction models outperform the schemes in the literature.
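
The two steps named in the abstract (feature selection, then cost-sensitive classification) can be illustrated with a minimal, dependency-free sketch. Everything concrete here is an assumption rather than the paper's method: the data are synthetic, a Fisher-style score stands in for the unspecified feature-selection step, inverse class-frequency weights stand in for the cost matrix, and the helper names (`fisher_score`, `class_weights`, `fit_logistic`) are hypothetical. Metrics are computed on the training data purely for illustration.

```python
import math
import random


def fisher_score(column, y):
    """Squared gap between class means over the summed class variances."""
    groups = {0: [], 1: []}
    for value, label in zip(column, y):
        groups[label].append(value)
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    variances = {k: sum((x - means[k]) ** 2 for x in v) / len(v)
                 for k, v in groups.items()}
    return (means[1] - means[0]) ** 2 / (variances[0] + variances[1] + 1e-12)


def class_weights(y):
    """Inverse-frequency costs: errors on the rare class are penalised more."""
    n, pos = len(y), sum(y)
    return {0: n / (2 * (n - pos)), 1: n / (2 * pos)}


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def fit_logistic(X, y, costs, lr=0.1, epochs=300):
    """Batch gradient descent on a class-weighted log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    total = sum(costs[label] for label in y)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)
            err = costs[label] * (p - label)
            grad_w = [g + err * xj for g, xj in zip(grad_w, row)]
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) for row in X]


def recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def auc(y_true, scores):
    """Chance that a defective instance outscores a clean one (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data (an assumption): 450 clean and 50 defective modules,
# one informative metric (column 0) and one pure-noise metric (column 1).
rng = random.Random(0)
y = [0] * 450 + [1] * 50
X = [[rng.gauss(1.5 * label, 1.0), rng.gauss(0.0, 1.0)] for label in y]

# Rank the metrics by discriminating power and keep the best one.
scores = [fisher_score([row[j] for row in X], y) for j in range(2)]
best = max(range(2), key=lambda j: scores[j])
X_sel = [[row[best]] for row in X]

# Fit once with equal costs and once with class-weighted costs.
results = {}
for name, costs in [("plain", {0: 1.0, 1: 1.0}), ("cost-sensitive", class_weights(y))]:
    w, b = fit_logistic(X_sel, y, costs)
    proba = predict_proba(X_sel, w, b)
    pred = [1 if p >= 0.5 else 0 for p in proba]
    results[name] = (accuracy(y, pred), auc(y, proba), recall(y, pred))
    print(name, "accuracy=%.3f auc=%.3f recall=%.3f" % results[name])
```

On imbalanced data like this, weighting the rare class would be expected to raise recall on defective instances, usually at some cost in plain accuracy, which is one reason the abstract reports recall and AUC alongside accuracy.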
Original language: English
Article number: 11
Number of pages: 18
Journal: Automated Software Engineering
Volume: 28
Early online date: 12 Jul 2021
Publication status: Published online - 12 Jul 2021

Bibliographical note

Funding Information:
This research is supported by the BTIIC (BT Ireland Innovation Centre) project, funded by BT and Invest Northern Ireland.

Publisher Copyright:
© 2021, The Author(s).

Keywords

  • AUC
  • Cost-sensitivity
  • Discriminating features
  • Machine learning models
  • Recall
  • Software bugs/defects
