Cluster-based Data relabelling for classification

Huan Wan, Hui Wang, Bryan Scotney, Jun Liu, Xin Wei

Research output: Contribution to journal, Article, peer-review

Abstract

Linear classifiers are generally simpler and more explainable than their nonlinear variants. They can achieve satisfactory classification performance on linearly separable data, but not on nonlinear data. Linear classifiers are therefore usually extended by modifying their algorithms, which yields their nonlinear variants. In this paper we present a general method, cluster-based data relabelling (CBDR), that allows linear classifiers to work effectively on nonlinear data by changing the labels of the data rather than the classifier algorithm. CBDR partitions the data set into several non-overlapping, class-specific clusters and relabels each instance with the label of its cluster. A linear classifier is then applied to the relabelled data to learn cluster-based linear decision boundaries instead of class-based ones; because each cluster belongs to a single class, a predicted cluster label maps directly back to a class label. Extensive experiments show that CBDR can significantly improve the classification performance of linear classifiers, and can even enable them to outperform their nonlinear variants. Further experiments show that CBDR can also improve the classification performance of nonlinear classifiers. In both cases, the largest gains were observed on imbalanced data.
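The abstract describes the CBDR pipeline only at a high level (cluster within each class, relabel, train on cluster labels, map predictions back to classes). The Python sketch below illustrates that idea under stated assumptions; it is not the paper's implementation. The clustering step (plain k-means with farthest-point initialisation and a fixed number `k` of clusters per class) and the linear classifier (a nearest-centroid rule, whose multiclass decision function is an argmax over scores that are linear in the input) are stand-ins chosen for illustration. The helper names `kmeans`, `cbdr_relabel`, `NearestCentroid` and `cbdr_fit_predict` are hypothetical.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (an ASSUMED stand-in for the paper's clustering step).

    Farthest-point initialisation keeps the sketch deterministic.
    Returns one cluster index per point.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = mean(members)
    return assign


def cbdr_relabel(X, y, k):
    """Cluster each class separately and relabel every point with its cluster id.

    Clusters are class-specific, so each cluster id maps to exactly one class.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    new_y = [None] * len(y)
    cluster_to_class = {}
    next_id = 0
    for label in sorted(by_class):
        idx = by_class[label]
        kk = min(k, len(idx))
        assign = kmeans([X[i] for i in idx], kk)
        for i, a in zip(idx, assign):
            new_y[i] = next_id + a
            cluster_to_class[next_id + a] = label
        next_id += kk  # keep cluster ids of different classes disjoint
    return new_y, cluster_to_class


class NearestCentroid:
    """Hypothetical linear classifier: predict argmax_c (2 * mu_c . x - ||mu_c||^2).

    Dropping the shared -||x||^2 term from -||x - mu_c||^2 leaves a score that
    is linear in x, so pairwise decision boundaries are hyperplanes.
    """

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.centroids = {c: mean(pts) for c, pts in groups.items()}
        return self

    def predict(self, X):
        def score(x, mu):
            return 2 * sum(m * v for m, v in zip(mu, x)) - sum(m * m for m in mu)
        return [max(self.centroids, key=lambda c: score(x, self.centroids[c])) for x in X]


def cbdr_fit_predict(X_train, y_train, X_test, k=2, make_clf=NearestCentroid):
    """Train any fit/predict classifier on cluster labels, then map clusters back to classes."""
    new_y, cluster_to_class = cbdr_relabel(X_train, y_train, k)
    clf = make_clf().fit(X_train, new_y)
    return [cluster_to_class[c] for c in clf.predict(X_test)]


if __name__ == "__main__":
    # XOR-style toy data: class 0 on one diagonal, class 1 on the other.
    X = [(0.0, 0.0), (0.25, 0.25), (1.0, 1.0), (0.75, 0.75),
         (0.0, 1.0), (0.25, 0.75), (1.0, 0.0), (0.75, 0.25)]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(NearestCentroid().fit(X, y).predict(X))  # [0, 0, 0, 0, 0, 0, 0, 0]
    print(cbdr_fit_predict(X, y, X))               # [0, 0, 0, 0, 1, 1, 1, 1]
```

On this XOR-style toy data both classes have the same centroid, so the plain linear rule assigns every point to one class. After relabelling, each diagonal end becomes its own class-specific cluster, the linear rule separates the four clusters, and the predicted cluster ids map back to the correct classes. The `make_clf` parameter reflects the abstract's point that the relabelling is independent of the classifier: any object with `fit` and `predict` methods could, in principle, be substituted.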
Original language: English
Article number: 119485
Pages (from-to): 1-15
Number of pages: 15
Journal: Information Sciences
Volume: 648
Early online date: 11 Aug 2023
Publication status: Published (in print/issue), 1 Nov 2023

Bibliographical note

Funding Information:
The work is supported by the National Natural Science Foundation of China under Grant Nos. 62106090 and 62106093, and by the Jiangxi Urgent Need for Overseas Talents project under Grant Nos. 20223BCJ25026 and 20223BCJ25040.

Publisher Copyright:
© 2023 Elsevier Inc.

Keywords

  • Classification
  • Classifier
  • Cluster-based data relabelling
  • Linear discriminant analysis classifier
  • Support vector machine
  • Multilayer perceptron
  • Naive Bayes classifier
  • Decision tree
  • Machine learning
  • Pattern recognition
