Predicting feature imputability in the absence of ground truth

Research output: Contribution to conference (Paper)

Abstract

Data imputation is the most popular method of dealing with missing values, but in most real-life applications large amounts of data can be missing, and it is difficult or impossible to evaluate whether data have been imputed accurately because ground truth is lacking. This paper addresses these issues by proposing a simple and effective principal-component-based method for determining whether individual data features can be accurately imputed, a property termed feature imputability. In particular, we establish a strong linear relationship between principal component loadings and feature imputability, even in the presence of extreme missingness and lack of ground truth. This work will have important implications for practical data imputation strategies.
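The relationship described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a basic NIPALS iteration that skips missing entries, a rank-1 reconstruction as the imputation step, and synthetic single-factor data in which ground truth is available for checking. The function name `nipals_pc1`, the factor strengths, and the 30% missingness rate are all hypothetical choices made for illustration.

```python
import numpy as np

def nipals_pc1(X, n_iter=500, tol=1e-9):
    """First principal component of X via NIPALS, ignoring NaN entries.

    Returns scores t, unit-norm loadings p, and the column means and
    standard deviations estimated from the observed values only.
    """
    W = (~np.isnan(X)).astype(float)        # 1 where observed, 0 where missing
    mean, std = np.nanmean(X, axis=0), np.nanstd(X, axis=0)
    Z = np.nan_to_num((X - mean) / std)     # missing entries become 0 and drop out of sums
    eps = 1e-12                             # guards against rows with nothing observed
    t = Z[:, 0].copy()                      # initial scores: first column
    for _ in range(n_iter):
        # Loadings: regress each feature on the scores over its observed rows.
        p = (Z.T @ t) / np.maximum(W.T @ t**2, eps)
        p /= np.linalg.norm(p)
        # Scores: regress each sample on the loadings over its observed features.
        t_new = (Z @ p) / np.maximum(W @ p**2, eps)
        converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
        t = t_new
        if converged:
            break
    return t, p, mean, std

# Synthetic data (assumption): one latent factor drives 8 features
# with increasing strength, so some features are easier to impute.
rng = np.random.default_rng(0)
n = 500
strengths = np.linspace(0.2, 0.95, 8)
latent = rng.standard_normal((n, 1))
noise = rng.standard_normal((n, strengths.size))
X_full = latent * strengths + noise * np.sqrt(1 - strengths**2)

# Remove 30% of entries completely at random.
mask = rng.random(X_full.shape) < 0.3
X = np.where(mask, np.nan, X_full)

t, p, mean, std = nipals_pc1(X)

# Rank-1 reconstruction used as a simple stand-in imputation.
Z_hat = np.outer(t, p)
Z_true = (X_full - mean) / std

# Per-feature RMSE on the removed entries (only possible in simulation).
err = np.where(mask, Z_hat - Z_true, np.nan)
rmse = np.sqrt(np.nanmean(err**2, axis=0))

# Correlation between absolute loading and imputation error per feature.
corr = np.corrcoef(np.abs(p), rmse)[0, 1]
print("abs loadings:", np.round(np.abs(p), 3))
print("RMSE on removed entries:", np.round(rmse, 3))
print("correlation:", round(corr, 3))
```

In this toy setting, features with larger absolute loadings on the first component tend to be reconstructed with lower error, so the loadings, which can be computed without ground truth, act as a proxy for how well each feature can be imputed.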
Original language: English
Number of pages: 5
Publication status: Accepted/In press - 2 Jul 2020
Event: 37th International Conference on Machine Learning (ICML): The Art of Learning with Missing Values (ARTEMISS) Workshop - Vienna, Austria
Duration: 17 Jul 2020 to 17 Jul 2020
https://artemiss-workshop.github.io/

Conference

Conference: 37th International Conference on Machine Learning (ICML): The Art of Learning with Missing Values (ARTEMISS) Workshop
Abbreviated title: ICML 2020: ARTEMISS 2020
Country: Austria
City: Vienna
Period: 17/07/20 to 17/07/20
Internet address: https://artemiss-workshop.github.io/

Keywords

  • Missing data
  • data imputation
  • principal component analysis (PCA)
  • NIPALS
  • dementia
  • Alzheimer's disease

Cite this

McCombe, N., Ding, X., Prasad, G., Finn, D., Todd, S., McClean, P., & Wong-Lin, K. (Accepted/In press). Predicting feature imputability in the absence of ground truth. Paper presented at 37th International Conference on Machine Learning (ICML): The Art of Learning with Missing Values (ARTEMISS) Workshop, Vienna, Austria.