Nearest clusters based partial least squares discriminant analysis for the classification of spectral data

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis. It extracts latent variables and uses them to predict responses, which makes it particularly effective for high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., a within-class multimodal distribution of the data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA), which explicitly addresses multimodality and nonlinearity and improves the performance of PLS-DA on spectral data classification. The method applies hierarchical clustering to divide the samples into clusters and calculates the centre of each cluster. For a given query point, only the clusters whose centres are nearest to that query point are used to build the PLS-DA model. This provides a simple and effective way of separating multimodal and nonlinear classes into clusters that are locally linear and unimodal. Experimental results on 17 datasets (12 UCI and 5 spectral datasets) show that NCPLS-DA can outperform four baseline methods, namely PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy in most cases.
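The abstract describes NCPLS-DA only at a high level, so the following is a minimal sketch of one possible reading of it, not the authors' implementation. It assumes details the record does not state: each class is clustered separately with naive centroid-linkage agglomerative clustering into a fixed number of clusters (`clusters_per_class`), Euclidean distance to the cluster centres selects the `n_nearest` clusters, and the PLS-DA step is a standard PLS2 (NIPALS) regression on one-hot class indicators followed by an argmax decision rule. All names (`agglomerate`, `pls2_nipals`, `NCPLSDA` and their parameters) are hypothetical.

```python
import numpy as np


def agglomerate(X, n_clusters):
    """Centroid-linkage agglomerative clustering (naive O(n^3) loop).

    Merges the two clusters with the closest centres until n_clusters
    remain; returns one index array per cluster.
    """
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        centres = np.array([X[c].mean(axis=0) for c in clusters])
        d = ((centres[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d, np.inf)
        i, j = sorted(int(v) for v in np.unravel_index(np.argmin(d), d.shape))
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return [np.array(c) for c in clusters]


def pls2_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """PLS2 via NIPALS; returns (x_mean, y_mean, B) with Y_hat = (X - x_mean) @ B + y_mean."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        if np.linalg.norm(E) < tol or np.linalg.norm(F) < tol:
            break  # nothing left to model
        u = F[:, np.argmax((F ** 2).sum(axis=0))]
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)       # X weights
            t = E @ w                    # X scores
            q = F.T @ t / (t @ t)        # Y loadings
            u_new = F @ q / (q @ q)      # Y scores
            done = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if done:
                break
        p = E.T @ t / (t @ t)            # X loadings
        E = E - np.outer(t, p)           # deflate X block
        F = F - np.outer(t, q)           # deflate Y block
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.pinv(P.T @ W) @ Q.T
    return x_mean, y_mean, B


class NCPLSDA:
    """Hypothetical nearest-clusters PLS-DA sketch.

    Assumptions: per-class clustering, Euclidean distance to cluster
    centres, one-hot PLS2 responses and an argmax decision rule.
    """

    def __init__(self, clusters_per_class=2, n_nearest=2, n_components=2):
        self.clusters_per_class = clusters_per_class
        self.n_nearest = n_nearest
        self.n_components = n_components

    def fit(self, X, y):
        self.X, self.y = np.asarray(X, dtype=float), np.asarray(y)
        members, centres = [], []
        for label in np.unique(self.y):
            idx = np.flatnonzero(self.y == label)
            k = min(self.clusters_per_class, len(idx))
            for group in agglomerate(self.X[idx], k):
                members.append(idx[group])
                centres.append(self.X[idx[group]].mean(axis=0))
        self.members, self.centres = members, np.array(centres)
        return self

    def _predict_one(self, x):
        # Keep only the clusters whose centres are nearest to the query point.
        d = ((self.centres - x) ** 2).sum(axis=1)
        nearest = np.argsort(d)[: self.n_nearest]
        idx = np.concatenate([self.members[i] for i in nearest])
        labels = self.y[idx]
        local_classes = np.unique(labels)
        if len(local_classes) == 1:
            return local_classes[0]  # all selected clusters share one class
        # PLS-DA on the local samples: regress one-hot indicators, take argmax.
        Y = (labels[:, None] == local_classes[None, :]).astype(float)
        a = min(self.n_components, len(idx) - 1, self.X.shape[1])
        x_mean, y_mean, B = pls2_nipals(self.X[idx], Y, a)
        scores = (x - x_mean) @ B + y_mean
        return local_classes[np.argmax(scores)]

    def predict(self, X):
        return np.array([self._predict_one(x) for x in np.asarray(X, dtype=float)])
```

On an XOR-like layout where each class consists of two well-separated blobs, no single linear boundary separates the classes, but each local model built from the nearest clusters only has to separate one blob of each class, which is the locally linear, unimodal setting the abstract describes.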
Language: English
Pages: 27-38
Journal: Analytica Chimica Acta
Volume: 1009
Early online date: 6 Feb 2018
DOI: 10.1016/j.aca.2018.01.023
Publication status: Published - 7 Jun 2018

Keywords

  • Partial Least Squares
  • Clustering
  • Nonlinearity
  • Multimodality
  • Spectral pattern recognition

Cite this

@article{3670ea0a895d4b4a853f49f125bcf22e,
title = "Nearest clusters based partial least squares discriminant analysis for the classification of spectral data",
abstract = "Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time.",
keywords = "Partial Least Squares, Clustering, Nonlinearity, Multimodality, Spectral pattern recognition",
author = "Weiran Song and Hui Wang and Paul Maguire and Omar Nibouche",
year = "2018",
month = "6",
day = "7",
doi = "10.1016/j.aca.2018.01.023",
language = "English",
volume = "1009",
pages = "27--38",
journal = "Analytica Chimica Acta",
issn = "0003-2670",
publisher = "Elsevier",

}

TY  - JOUR
T1  - Nearest clusters based partial least squares discriminant analysis for the classification of spectral data
AU  - Song, Weiran
AU  - Wang, Hui
AU  - Maguire, Paul
AU  - Nibouche, Omar
PY  - 2018/6/7
Y1  - 2018/6/7
AB  - Partial Least Squares Discriminant Analysis (PLS-DA) is one of the most effective multivariate analysis methods for spectral data analysis, which extracts latent variables and uses them to predict responses. In particular, it is an effective method for handling high-dimensional and collinear spectral data. However, PLS-DA does not explicitly address data multimodality, i.e., within-class multimodal distribution of data. In this paper, we present a novel method termed nearest clusters based PLS-DA (NCPLS-DA) for addressing the multimodality and nonlinearity issues explicitly and improving the performance of PLS-DA on spectral data classification. The new method applies hierarchical clustering to divide samples into clusters and calculates the corresponding centre of every cluster. For a given query point, only clusters whose centres are nearest to such a query point are used for PLS-DA. Such a method can provide a simple and effective tool for separating multimodal and nonlinear classes into clusters which are locally linear and unimodal. Experimental results on 17 datasets, including 12 UCI and 5 spectral datasets, show that NCPLS-DA can outperform 4 baseline methods, namely, PLS-DA, kernel PLS-DA, local PLS-DA and k-NN, achieving the highest classification accuracy most of the time.
KW  - Partial Least Squares
KW  - Clustering
KW  - Nonlinearity
KW  - Multimodality
KW  - Spectral pattern recognition
UR  - https://www.sciencedirect.com/science/article/pii/S0003267018300886?via%3Dihub
U2  - 10.1016/j.aca.2018.01.023
DO  - 10.1016/j.aca.2018.01.023
M3  - Article
VL  - 1009
SP  - 27
EP  - 38
JO  - Analytica Chimica Acta
T2  - Analytica Chimica Acta
JF  - Analytica Chimica Acta
SN  - 0003-2670
ER  - 