An optimization of ReliefF for classification in large datasets

Yue Huang, Paul McCullagh, Norman Black

Research output: Contribution to journal › Article

22 Citations (Scopus)

Abstract

ReliefF has proved to be a successful feature selector, but it is computationally expensive when handling large datasets. We present an optimization that uses Supervised Model Construction to improve starter selection. Effectiveness was evaluated on 12 UCI datasets and a clinical diabetes database. Experiments indicate that, compared with ReliefF, the proposed method improved computational efficiency whilst maintaining classification accuracy. On the clinical dataset (20,000 records with 47 features), feature selection via Supervised Model Construction (FSSMC) reduced processing time by 80% compared with ReliefF and maintained accuracy for the Naive Bayes, IB1 and C4.5 classifiers.
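For readers unfamiliar with where the cost of ReliefF comes from, the following is a minimal NumPy sketch of the standard ReliefF weight update (Kononenko's formulation), assuming numeric features only and Manhattan distance on range-normalised features. It does not implement Supervised Model Construction. The `starters` argument (a hypothetical parameter, not from the paper) only marks where a starter-selection step such as FSSMC's would replace ReliefF's random sampling of the m instances. Each starter requires one pass over all n records and a features, so the dominant cost is roughly m × n × a difference evaluations; this is why processing fewer, well-chosen starters can reduce runtime.

```python
import numpy as np


def relieff(X, y, k=5, starters=None, m=None, rng=None):
    """ReliefF feature weights for numeric features (illustrative sketch).

    X: (n, a) numeric array; y: (n,) class labels.
    starters: hypothetical option; indices of the instances R to process.
        If None, m indices (default: all n) are drawn at random, as in ReliefF.
    Returns an (a,) array; larger weights mean more relevant features.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, a = X.shape
    if starters is None:
        gen = np.random.default_rng(rng)
        starters = gen.choice(n, size=n if m is None else m, replace=False)
    starters = np.asarray(starters)
    m = starters.size

    # diff(A, I1, I2) = |I1[A] - I2[A]| / (max(A) - min(A)); constant
    # features get a span of 1 so their diff is always 0.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xn = X / span

    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))

    W = np.zeros(a)
    for i in starters:
        d = np.abs(Xn - Xn[i])   # per-feature diffs from every record to R
        dist = d.sum(axis=1)     # Manhattan distance on normalised features
        for c in classes:
            idx = np.flatnonzero(y == c)
            idx = idx[idx != i]  # R is never its own neighbour
            if idx.size == 0:
                continue
            nearest = idx[np.argsort(dist[idx])[:k]]
            # Average diff over the (up to) k nearest neighbours of class c.
            contrib = d[nearest].sum(axis=0) / (m * nearest.size)
            if c == y[i]:
                W -= contrib     # nearest hits: within-class differences
            else:
                # nearest misses, weighted by P(c) / (1 - P(class(R)))
                W += prior[c] / (1.0 - prior[y[i]]) * contrib
    return W


if __name__ == "__main__":
    gen = np.random.default_rng(0)
    X = gen.random((200, 3))
    y = (X[:, 0] > 0.5).astype(int)  # only feature 0 determines the class
    print(relieff(X, y, k=5, rng=1))                          # 200 random starters
    print(relieff(X, y, k=5, starters=np.arange(0, 200, 5)))  # 40 fixed starters
```

In the usage example, the evenly spaced `starters` are only a stand-in that shows the interface. A method such as FSSMC would instead choose the starters from the data, and this sketch makes no claim about how it does so.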

Original language: English
Pages (from-to): 1348-1356
Journal: Data & Knowledge Engineering
Volume: 68
Issue number: 11
Early online date: 15 Jul 2009
Publication status: Published - 1 Nov 2009

Keywords

  • Relief
  • Feature selection
  • Classification
  • Efficiency
