Local Partial Least Square classifier in high dimensionality classification

Weiran Song, Hui Wang, Paul Maguire, Omar Nibouche

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

A central idea in distance-based machine learning algorithms such as k-nearest neighbors and manifold learning is to choose a set of references, or a neighborhood, based on a distance function to represent the local structure around a query point, and to use that local structure as the basis for constructing models. Local Partial Least Squares (local PLS), which results from applying this neighborhood-based idea to Partial Least Squares (PLS), has been shown to perform very well on regression with small-sample and multicollinear data, but is seldom used in high-dimensionality classification. Furthermore, the difference between PLS and local PLS with respect to their optimal intrinsic dimensions is unclear. In this paper we combine local PLS with non-Euclidean distances in order to find out which measures are better suited to high-dimensionality classification. Experimental results obtained on 8 UCI and spectroscopy datasets show that the Euclidean distance is not a good distance function for use in local PLS classification, especially in high-dimensionality cases; instead, Manhattan distance and fractional distance are preferred. Experimental results further show that the optimal intrinsic dimension of local PLS is smaller than that of standard PLS.
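As a rough illustration of the neighborhood-selection idea described in the abstract, the sketch below picks the nearest reference samples to a query point under different distance functions. It assumes the common Minkowski form, with p = 2 for Euclidean, p = 1 for Manhattan and 0 < p < 1 for a fractional distance; this record does not state the exact definitions used in the paper. The function names and the toy data are hypothetical, and the step of fitting a PLS model on the selected neighborhood is omitted.

```python
def minkowski_distance(a, b, p):
    """Minkowski-style dissimilarity between two equal-length vectors.

    p = 2 gives Euclidean, p = 1 gives Manhattan, and 0 < p < 1 gives a
    fractional distance (assumed form, not taken from the paper).
    """
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def nearest_neighbors(query, samples, k, p):
    """Return the indices of the k samples closest to `query` for the given p."""
    order = sorted(
        range(len(samples)),
        key=lambda i: minkowski_distance(query, samples[i], p),
    )
    return order[:k]


# Toy data, made up for illustration only.
samples = [
    [1.0, 1.0, 1.0, 1.0],  # small differences spread over every feature
    [3.0, 0.0, 0.0, 0.0],  # one large difference in a single feature
]
query = [0.0, 0.0, 0.0, 0.0]

euclidean_choice = nearest_neighbors(query, samples, k=1, p=2)
manhattan_choice = nearest_neighbors(query, samples, k=1, p=1)
fractional_choice = nearest_neighbors(query, samples, k=1, p=0.5)
```

On this toy data the Euclidean distance selects the first sample, while the Manhattan and fractional distances select the second, so the choice of distance function changes which samples a local model would be built on. In a local PLS classifier, a PLS model would then be fitted on the selected neighborhood only.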
Language: English
Journal: Neurocomputing
Volume: 0
Early online date: 28 Dec 2016
DOI: 10.1016/j.neucom.2016.12.053
Publication status: E-pub ahead of print, 28 Dec 2016

Fingerprint

Least-Squares Analysis
Classifiers
Learning algorithms
Learning systems
Spectroscopy
Spectrum Analysis
Learning

Keywords

  • High dimensionality classification
  • Distance function
  • Fractional distance
  • Local Partial Least Squares

Cite this

@article{2b042164b0b842f3936cabba590e6ea3,
title = "Local Partial Least Square classifier in high dimensionality classification",
abstract = "A central idea in distance-based machine learning algorithms such as k-nearest neighbors and manifold learning is to choose a set of references, or a neighborhood, based on a distance function to represent the local structure around a query point, and to use that local structure as the basis for constructing models. Local Partial Least Squares (local PLS), which results from applying this neighborhood-based idea to Partial Least Squares (PLS), has been shown to perform very well on regression with small-sample and multicollinear data, but is seldom used in high-dimensionality classification. Furthermore, the difference between PLS and local PLS with respect to their optimal intrinsic dimensions is unclear. In this paper we combine local PLS with non-Euclidean distances in order to find out which measures are better suited to high-dimensionality classification. Experimental results obtained on 8 UCI and spectroscopy datasets show that the Euclidean distance is not a good distance function for use in local PLS classification, especially in high-dimensionality cases; instead, Manhattan distance and fractional distance are preferred. Experimental results further show that the optimal intrinsic dimension of local PLS is smaller than that of standard PLS.",
keywords = "High dimensionality classification, Distance function, Fractional distance, Local Partial Least Squares",
author = "Weiran Song and Hui Wang and Paul Maguire and Omar Nibouche",
note = "Compliant in UIR; evidence uploaded to 'Other files'",
year = "2016",
month = "12",
day = "28",
doi = "10.1016/j.neucom.2016.12.053",
language = "English",
volume = "0",
journal = "Neurocomputing",
}

Local Partial Least Square classifier in high dimensionality classification. / Song, Weiran; Wang, Hui; Maguire, Paul; Nibouche, Omar.
In: Neurocomputing, Vol. 0, 28.12.2016.

TY - JOUR

T1 - Local Partial Least Square classifier in high dimensionality classification

AU - Song, Weiran

AU - Wang, Hui

AU - Maguire, Paul

AU - Nibouche, Omar

N1 - Compliant in UIR; evidence uploaded to 'Other files'

PY - 2016/12/28

Y1 - 2016/12/28

AB - A central idea in distance-based machine learning algorithms such as k-nearest neighbors and manifold learning is to choose a set of references, or a neighborhood, based on a distance function to represent the local structure around a query point, and to use that local structure as the basis for constructing models. Local Partial Least Squares (local PLS), which results from applying this neighborhood-based idea to Partial Least Squares (PLS), has been shown to perform very well on regression with small-sample and multicollinear data, but is seldom used in high-dimensionality classification. Furthermore, the difference between PLS and local PLS with respect to their optimal intrinsic dimensions is unclear. In this paper we combine local PLS with non-Euclidean distances in order to find out which measures are better suited to high-dimensionality classification. Experimental results obtained on 8 UCI and spectroscopy datasets show that the Euclidean distance is not a good distance function for use in local PLS classification, especially in high-dimensionality cases; instead, Manhattan distance and fractional distance are preferred. Experimental results further show that the optimal intrinsic dimension of local PLS is smaller than that of standard PLS.

KW - High dimensionality classification

KW - Distance function

KW - Fractional distance

KW - Local Partial Least Squares

U2 - 10.1016/j.neucom.2016.12.053

DO - 10.1016/j.neucom.2016.12.053

M3 - Article

VL - 0

ER -