Imputation of missing data can improve the classification of dementia severity

Research output: Contribution to conference › Poster › peer-review


Background: Accurate diagnosis is crucial to the treatment and management of Alzheimer’s disease (AD). However, clinical data can be incomplete or inconsistent, and the resulting missing data can affect computational algorithms that seek to identify disease severity objectively. In this work, we applied several computational methods to impute missing data and tested whether the imputed data lead to improved classification of cognitive impairment level.

Material & Methods: We used data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), focusing on cognitive and functional assessments because they are often used in clinical decision making. We performed independent simulations in which portions of the values were removed at random according to several systematic patterns, reflecting possible underlying causes of missingness. Several imputation methods were then applied: mean/median/mode substitution, multiple imputation (MI), k-nearest neighbours (k-NN), and random forest (RF). The effect of the imputed values on predictive accuracy was evaluated with a support vector machine (SVM) classifier of Clinical Dementia Rating Sum of Boxes (CDRSB).

Results: In general, the RF algorithm was the best-performing imputation method across the missing-data conditions. The performance of every method decreased as the proportion of missing data increased. With 20% of data missing, RF performed best, with R² of 0.796±0.016 and RMSE of 5.576; in the 10% condition this improved to R² of 0.854±0.009 and RMSE of 2.090. For classification of CDRSB, the accuracy of the linear SVM model was 0.77 (95% CI 0.746-0.794) on the unmodified dataset, 0.669 (95% CI 0.577-0.753) with 20% missing data, and 0.732 (95% CI 0.705-0.757) with RF-imputed data.

Conclusions: Overall, computational methods for missing-data imputation can add value to existing imperfect AD data by improving the classification accuracy of cognitive impairment level. Further work will investigate missing data in actual clinical datasets more comprehensively.
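The simulate-then-impute procedure described in the methods can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses synthetic scores in place of ADNI data, removes values completely at random only, compares just mean substitution with a simple k-NN imputer, and omits the MI, RF, and SVM steps. All function names, the synthetic data generator, and the choice of k = 5 are assumptions made for illustration.

```python
import math
import random


def make_synthetic_scores(n_rows, n_cols, seed=0):
    """Hypothetical stand-in for assessment scores: every column is
    driven by one latent severity value plus Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        severity = rng.uniform(0.0, 10.0)
        rows.append([severity * (j + 1) + rng.gauss(0.0, 0.5)
                     for j in range(n_cols)])
    return rows


def remove_at_random(rows, fraction, seed=0):
    """Copy the data and set `fraction` of the cells to None,
    chosen completely at random. Returns (masked, hidden_cells)."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    hidden = rng.sample(cells, int(round(fraction * len(cells))))
    masked = [row[:] for row in rows]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def column_means(masked):
    """Mean of the observed values in each column (0.0 if none observed)."""
    means = []
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def impute_mean(masked):
    """Replace each missing cell with its column mean."""
    means = column_means(masked)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def impute_knn(masked, k=5):
    """Replace each missing cell with the average of that column over the
    k nearest rows, using distance on the columns both rows have observed.
    Falls back to the column mean when no comparable row exists."""
    means = column_means(masked)
    filled = [row[:] for row in masked]
    for i, row in enumerate(masked):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(masked):
                if other_idx == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [val for _, val in candidates[:k]]
            filled[i][j] = sum(nearest) / len(nearest) if nearest else means[j]
    return filled


def rmse(truth, imputed, hidden):
    """Root-mean-square error over the cells that were hidden."""
    total = sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden)
    return math.sqrt(total / len(hidden))


# Simulate the 20% missing-data condition on synthetic data.
scores = make_synthetic_scores(n_rows=200, n_cols=4)
masked, hidden = remove_at_random(scores, fraction=0.20)
rmse_mean = rmse(scores, impute_mean(masked), hidden)
rmse_knn = rmse(scores, impute_knn(masked, k=5), hidden)
print(f"mean substitution RMSE: {rmse_mean:.3f}")
print(f"k-NN imputation RMSE:   {rmse_knn:.3f}")
```

Because the true values are known before removal, the imputation error can be measured directly on the hidden cells, which is what makes a simulation design of this kind useful for comparing methods.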
Original language: English
Number of pages: 1
Publication status: Published (in print/issue) - 12 Sept 2018
Event: TMED 9 Conference - City Hotel, Derry, United Kingdom
Duration: 12 Sept 2018 → …




  • Missing data
  • Data imputation
  • Dementia
  • Alzheimer's disease (AD)

