Abstract
Linear classifiers are generally simpler and more explainable than their nonlinear variants. They can achieve satisfactory classification performance on linearly separable data, but not on data that are not linearly separable (nonlinear data). Linear classifiers are therefore usually extended by modifying their algorithms, which yields their nonlinear variants. In this paper we present a general method, cluster-based data relabelling (CBDR), that allows linear classifiers to work effectively on nonlinear data. CBDR partitions the data set into several non-overlapping class-specific clusters and relabels the data by cluster. A linear classifier can then be applied to the relabelled data to seek cluster-based linear decision boundaries instead of class-based decision boundaries. Extensive experimentation has demonstrated that CBDR can significantly enhance the classification performance of linear classifiers, and can even make them outperform their nonlinear variants. Further experimentation has demonstrated that CBDR can also improve the classification performance of nonlinear classifiers. In both cases, the largest improvements were observed on imbalanced data.
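The relabelling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the clustering step (a small k-means with farthest-first initialisation), the linear classifier (a nearest-centroid rule, whose pairwise decision boundaries are hyperplanes), the XOR-style toy data, and all function names are assumptions chosen to keep the example dependency-free.

```python
from math import dist  # Python 3.8+

def mean(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))

def kmeans(points, k, iters=10):
    """Tiny k-means; farthest-first initialisation keeps it deterministic."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def relabel(X, y, k):
    """Assumed CBDR step: cluster each class separately and give every
    sample the new label (original class, cluster index)."""
    z = [None] * len(X)
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        cents = kmeans([X[i] for i in idx], k)
        for i in idx:
            z[i] = (cls, nearest(X[i], cents))
    return z

def fit_centroids(X, labels):
    """Nearest-centroid classifier (stand-in linear model): one mean per label."""
    return {lab: mean([x for x, l in zip(X, labels) if l == lab])
            for lab in sorted(set(labels))}

def predict(model, x):
    """Label whose centroid is closest to x."""
    return min(model, key=lambda lab: dist(x, model[lab]))

# XOR-style toy data: each class consists of two well-separated blobs.
offsets = [(-0.1, -0.1), (-0.1, 0.1), (0.1, -0.1), (0.1, 0.1), (0.0, 0.0)]
blobs = {(0, 0): 0, (2, 2): 0, (0, 2): 1, (2, 0): 1}
X = [(cx + dx, cy + dy) for (cx, cy) in blobs for dx, dy in offsets]
y = [cls for cls in blobs.values() for _ in offsets]

# Baseline: the linear model trained on the original class labels.
base_model = fit_centroids(X, y)
baseline_acc = sum(predict(base_model, x) == t for x, t in zip(X, y)) / len(X)

# CBDR-style: train on cluster labels, then map predictions back to classes.
z = relabel(X, y, k=2)
cbdr_model = fit_centroids(X, z)
cbdr_acc = sum(predict(cbdr_model, x)[0] == t for x, t in zip(X, y)) / len(X)

print("baseline accuracy:", baseline_acc)
print("relabelled accuracy:", cbdr_acc)
```

On this toy set no single linear boundary can separate the two classes, whereas the four cluster labels are pairwise linearly separable, so the relabelled model can recover the original classes by dropping the cluster index. The number of clusters per class (`k=2` here) is fixed by hand in this sketch; how it should be chosen in practice is not stated in the abstract.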
Original language | English |
---|---|
Article number | 119485 |
Pages (from-to) | 1-15 |
Number of pages | 15 |
Journal | Information Sciences |
Volume | 648 |
Early online date | 11 Aug 2023 |
DOIs | |
Publication status | Published (in print/issue) - 1 Nov 2023 |
Bibliographical note
Funding Information: The work is supported by the National Natural Science Foundation of China under Grant No. 62106090 and 62106093, and the Jiangxi Urgent Need for Overseas Talents project under Grant No. 20223BCJ25026 and 20223BCJ25040.
Publisher Copyright:
© 2023 Elsevier Inc.
Keywords
- Classification
- Classifier
- Cluster-based data relabelling
- Linear discriminant analysis classifier
- Support vector machine
- Multilayer perceptron
- Naive Bayes classifier
- Decision tree
- Machine learning
- Pattern recognition