Cluster-Based Supervised Classification

  • Huan Wan

Student thesis: Doctoral Thesis

Abstract

Supervised classification is one of our fundamental approaches to understanding the world and is studied in many research areas. Feature extraction and classification learning are its two key processes, and both significantly influence classification performance. Although the development of feature extraction methods and classifiers has brought impressive progress, unsolved problems remain, such as the class imbalance problem and the few-shot classification problem. In this thesis, we focus on the complex boundary problem: it is hard to obtain high classification accuracy when decision boundaries are complex because classes contain subclass structures. We propose a cluster-based approach to supervised classification and develop cluster-based feature extraction methods and cluster-based classification learning methods.
For feature extraction, the first study assesses how important it is to consider within-class multimodality: we examine within-class multimodal data distributions and classification under such distributions. This study is guided by five questions about within-class multimodal data. Systematic experiments on a variety of artificial and real data are conducted to answer the five questions and yield several useful findings. In the second study, a new feature extraction method is proposed, called global subclass discriminant analysis (GSDA). To extract discriminative features, GSDA first obtains clusters in a global way by clustering the whole data set and derives class-specific clusters from these global clusters. It then seeks to maximise interclass distance and minimise intraclass distance based on these class-specific clusters. GSDA is extensively evaluated on a wide range of data through comparison with closely related and state-of-the-art feature extraction methods. Experimental results demonstrate GSDA’s superiority in terms of accuracy and run time.
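As an illustration only, the cluster-derivation step described for GSDA can be sketched as follows. The abstract does not specify the clustering algorithm or how class-specific clusters are derived, so this sketch assumes a plain k-means over the whole data set and takes each class-specific cluster to be the set of points sharing a class label and a global cluster. The helper names (`kmeans`, `class_specific_clusters`) and the toy data are hypothetical, and the subsequent discriminant step is not shown.

```python
from collections import defaultdict

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation (an assumption)."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist2(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centres[j] = mean(members)
    return [min(range(k), key=lambda j: dist2(p, centres[j])) for p in points]

def class_specific_clusters(X, y, k):
    """Cluster all points while ignoring labels, then split each global cluster by class."""
    global_ids = kmeans(X, k)
    groups = defaultdict(list)
    for x, label, g in zip(X, y, global_ids):
        groups[(label, g)].append(x)
    return dict(groups)

# Toy data: class 0 has two modes, class 1 has one.
offsets = [(-0.1, 0.0), (0.1, 0.0), (0.0, -0.1), (0.0, 0.1)]
modes = [((0.0, 0.0), 0), ((4.0, 0.0), 0), ((2.0, 3.0), 1)]
X = [(cx + dx, cy + dy) for (cx, cy), _ in modes for dx, dy in offsets]
y = [label for _, label in modes for _ in offsets]

clusters = class_specific_clusters(X, y, k=3)
for key in sorted(clusters):
    print(key, len(clusters[key]))
```

Each key is a (class label, global cluster id) pair; a discriminant criterion of the kind the abstract describes would then be computed over these groups instead of over whole classes.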
For classification learning, in the third study, we propose a cluster-based data relabelling (CBDR) method for improving the classification performance of existing classifiers on nonlinear data. CBDR aims to impel classifiers to find cluster-based decision boundaries rather than class-based decision boundaries. Extensive experiments demonstrate that CBDR dramatically boosts the classification performance of classifiers on nonlinear data, especially for linear classifiers. In the final study, a novel Gaussian mixture model (GMM) classifier is proposed, called the separability-criterion-based GMM (SC-GMM) classifier. In SC-GMM, the separability criterion is employed to find the optimal number of Gaussian components for the GMM. Experiments have been carried out on various classification tasks, and the results demonstrate the superiority of the SC-GMM classifier.
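The idea behind cluster-based relabelling can be illustrated with a small sketch. This is not the thesis's CBDR procedure, whose details the abstract does not give; it assumes that each class is clustered separately with k-means, that every point is relabelled with a (class, cluster) pair, and that predictions are mapped back to the original class. A nearest-centroid rule stands in for a linear classifier, since its two-class boundary is a hyperplane. All function names and the XOR-style toy data are hypothetical.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation (an assumption)."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist2(p, c) for c in centres)))
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist2(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centres[j] = mean(members)
    return [min(range(k), key=lambda j: dist2(p, centres[j])) for p in points]

def relabel_by_cluster(X, y, k):
    """Replace each class label with a (class, within-class cluster) pair."""
    new_labels = [None] * len(X)
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        for i, a in zip(idx, kmeans([X[i] for i in idx], k)):
            new_labels[i] = (c, a)
    return new_labels

def fit_centroids(X, labels):
    """Nearest-centroid 'training': one mean vector per label."""
    return {lab: mean([x for x, l in zip(X, labels) if l == lab]) for lab in set(labels)}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda lab: dist2(x, centroids[lab]))

def accuracy(predictions, truth):
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

# XOR-style toy data: each class consists of two well-separated blobs.
rng = random.Random(0)
blobs = {0: [(0, 0), (1, 1)], 1: [(0, 1), (1, 0)]}
X, y = [], []
for label, centres in blobs.items():
    for cx, cy in centres:
        for _ in range(25):
            X.append((cx + rng.gauss(0, 0.05), cy + rng.gauss(0, 0.05)))
            y.append(label)

# Baseline: one centroid per class.
class_model = fit_centroids(X, y)
acc_baseline = accuracy([predict(class_model, x) for x in X], y)

# Relabelled: one centroid per (class, cluster); map predictions back to the class.
sub_labels = relabel_by_cluster(X, y, k=2)
sub_model = fit_centroids(X, sub_labels)
acc_relabelled = accuracy([predict(sub_model, x)[0] for x in X], y)

print("class-based boundary:  ", acc_baseline)
print("cluster-based boundary:", acc_relabelled)
```

On this toy data the class-level baseline stays near chance because both class means fall close to (0.5, 0.5), whereas the relabelled version separates the four blobs. The number of clusters per class is fixed at two here purely for the example.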
Date of Award: Nov 2020
Original language: English
Supervisor: Jun Liu (Supervisor), Bryan Scotney (Supervisor) & Hui Wang (Supervisor)

Keywords

  • Clustering
  • Classification
  • LDA
  • Feature extraction
  • Subclass
