Predicting feature imputability in the absence of ground truth

Niamh McCombe, Xuemei Ding, Girijesh Prasad, David Finn, Stephen Todd, Paula McClean, KongFatt Wong-Lin

Research output: Contribution to conference › Paper › peer-review

Abstract

Data imputation is the most popular method of dealing with missing values, but in most real-life applications large amounts of data can be missing, and it is difficult or impossible to evaluate whether data have been imputed accurately (lack of ground truth). This paper addresses these issues by proposing a simple and effective principal-component-based method for determining whether individual data features can be accurately imputed, a property termed feature imputability. In particular, we establish a strong linear relationship between principal component loadings and feature imputability, even in the presence of extreme missingness and lack of ground truth. This work will have important implications for practical data imputation strategies.
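As a rough illustration of the idea in the abstract, the following minimal sketch computes first-principal-component loadings from an incomplete data matrix and reads them as a candidate per-feature imputability score. It is not the authors' implementation: the function names `first_pc_loadings` and `make_demo`, the synthetic data, the 30% missingness rate, and the use of mean-filling before a singular value decomposition (rather than NIPALS, which is listed in the keywords) are all assumptions made for this example.

```python
import numpy as np


def first_pc_loadings(X):
    """Absolute first-PC loadings of X (rows = samples), with NaNs mean-filled.

    Assumption: mean-filling followed by SVD is a simplification chosen for
    this sketch; it is not the NIPALS procedure named in the keywords.
    """
    col_means = np.nanmean(X, axis=0)
    filled = np.where(np.isnan(X), col_means, X)
    std = filled.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    Z = (filled - filled.mean(axis=0)) / std
    # Rows of vt are the principal axes; vt[0] holds the first-PC loadings.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return np.abs(vt[0])


def make_demo(n=500, seed=0):
    """Hypothetical synthetic data: three features share a latent factor, two do not."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, 1))
    weights = np.array([[1.0, 0.9, 0.8, 0.1, 0.0]])
    X = latent @ weights + 0.5 * rng.normal(size=(n, 5))
    X_missing = X.copy()
    X_missing[rng.random(X.shape) < 0.3] = np.nan  # 30% missing at random
    return X, X_missing


X_full, X_missing = make_demo()
loadings = first_pc_loadings(X_missing)
# Features ordered from highest to lowest loading on the first component.
ranking = np.argsort(-loadings)
```

In this synthetic setting, the features driven by the shared latent factor receive larger loadings than the pure-noise feature. Under the relationship described in the abstract, such features would be the better candidates for imputation, and the score is obtained without access to the true values of the missing entries.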
Original language: English
Number of pages: 5
Publication status: Accepted/In press - 2 Jul 2020
Event: 37th International Conference on Machine Learning (ICML): The Art of Learning with Missing Values (ARTEMISS) Workshop - Vienna, Austria
Duration: 17 Jul 2020 → 17 Jul 2020
https://artemiss-workshop.github.io/

Conference

Conference: 37th International Conference on Machine Learning (ICML): The Art of Learning with Missing Values (ARTEMISS) Workshop
Abbreviated title: ICML 2020: ARTEMISS 2020
Country/Territory: Austria
City: Vienna
Period: 17/07/20 → 17/07/20

Keywords

  • Missing data
  • data imputation
  • principal component analysis (PCA)
  • NIPALS
  • dementia
  • Alzheimer's disease
