Abstract
Mild cognitive impairment (MCI) represents a precursor to dementia for many individuals; however, some forms of MCI tend to remain stable over time and do not progress to dementia. In fact, conversion rates vary substantially depending on the diagnostic criteria used and the nature of the analytic sample and clinical setting. To identify personalized strategies to prevent or slow the progression of dementia and to support the clinical development of novel treatments, we need to develop new approaches for modelling disease progression that can differentiate between progressive and non-progressive MCI subjects. The aim of this study was to develop a novel prognostic machine learning (ML) framework utilising longitudinal information encoded in efficient, cost-effective, and non-invasive markers to identify MCI subjects that are at risk for developing dementia. Our approach was developed using the dataset from the National Alzheimer's Coordinating Center. We built two prognostic models based on patient data from three assessment visits (Model 1, n = 768) and four assessment visits (Model 2, n = 409). A novel hybrid prognostic approach, using cognitive trajectory classes generated through unsupervised learning (Stage 1) as input to supervised ML models (Stage 2), was developed and systematically tested. Our unsupervised learning approach (Stage 1) involved: (i) implementation of a longitudinal data partitioning method that clusters trajectories based on their shapes; (ii) validation of the optimal number of clusters using three different Clustering Validity Indices (CVIs); and (iii) application of fusion-based methods for combining the CVIs into fused normalized CVI scores, averaged for each cluster partition, to determine the final number of trajectory classes for each type of clinical score. In Stage 2, we built four types of prognostic models based on random forest (RF), Support Vector Machines (SVM), logistic regression (LR), and kNN ensemble approaches.
Classification models incorporating both clinical scores and cognitive trajectory classes as input showed up to 6.5 % higher accuracy than models based only on clinical scores (p < 0.05 in all cases). Given the patient data from three time points (Model 1), the highest prediction accuracy was achieved by the ensemble and RF models, i.e., 85.0 % (standard deviation: 3.1 %) and 84.6 % (4.1 %) respectively. Using the patient data from four time points (Model 2), the highest accuracy was reported for the RF and ensemble models, i.e., 87.5 % (6.1 %) and 86.8 % (3.7 %) respectively. We showed that incorporating the output of unsupervised learning significantly improved the performance of supervised ML models. Our prognostic framework can be applied to improve recruitment in clinical trials and to select early interventions for individuals at high risk of developing dementia.
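The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every specific in it is an assumption made for illustration: the data are synthetic, k-means on mean-centred trajectories stands in for the longitudinal partitioning method, the three CVIs are taken to be the silhouette, Calinski-Harabasz, and Davies-Bouldin indices, the fusion is a plain average of min-max normalised scores, and the helper names `minmax` and `fused_cvi_partition` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.model_selection import cross_val_score


def minmax(values):
    """Rescale a 1-D sequence to [0, 1]; a constant sequence maps to zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    return (values - values.min()) / span if span > 0 else np.zeros_like(values)


def fused_cvi_partition(trajectories, candidate_ks=(2, 3, 4, 5), seed=0):
    """Stage 1 (illustrative): cluster trajectories, pick k by the mean of normalised CVIs."""
    labelings = {}
    raw = {"silhouette": [], "calinski_harabasz": [], "davies_bouldin": []}
    for k in candidate_ks:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        cluster_labels = km.fit_predict(trajectories)
        labelings[k] = cluster_labels
        raw["silhouette"].append(silhouette_score(trajectories, cluster_labels))
        raw["calinski_harabasz"].append(
            calinski_harabasz_score(trajectories, cluster_labels)
        )
        # Davies-Bouldin is "lower is better", so negate it before normalising.
        raw["davies_bouldin"].append(-davies_bouldin_score(trajectories, cluster_labels))
    fused_scores = np.mean([minmax(v) for v in raw.values()], axis=0)
    chosen_k = candidate_ks[int(np.argmax(fused_scores))]
    return chosen_k, labelings[chosen_k], fused_scores


# Synthetic stand-in for one clinical score measured at three visits.
rng = np.random.default_rng(42)
n_subjects = 300
slopes = rng.choice([0.0, -1.5, -4.0], size=n_subjects)  # hypothetical change per visit
baseline = rng.normal(26.0, 2.0, size=n_subjects)
visits = np.arange(3)
scores = (
    baseline[:, None]
    + slopes[:, None] * visits
    + rng.normal(0.0, 0.5, size=(n_subjects, 3))
)
# Hypothetical outcome: decliners progress to dementia more often.
progressed = (rng.random(n_subjects) < np.where(slopes < -1.0, 0.8, 0.1)).astype(int)

# Crude proxy for trajectory shape: remove each subject's own mean level.
shapes = scores - scores.mean(axis=1, keepdims=True)
best_k, labels, fused = fused_cvi_partition(shapes)

# Stage 2 (illustrative): the same classifier with and without the trajectory class.
# For brevity the clustering above used all subjects; a careful evaluation
# would refit it inside each training fold to avoid information leakage.
X_clinical = scores
X_hybrid = np.column_stack([scores, labels])  # integer class code used directly
rf = RandomForestClassifier(n_estimators=100, random_state=0)
acc_clinical = cross_val_score(rf, X_clinical, progressed, cv=5, scoring="accuracy").mean()
acc_hybrid = cross_val_score(rf, X_hybrid, progressed, cv=5, scoring="accuracy").mean()
print(f"chosen k = {best_k}")
print(f"clinical scores only: {acc_clinical:.3f}")
print(f"scores + trajectory class: {acc_hybrid:.3f}")
```

On synthetic data of this kind the two accuracies need not differ, because the raw visit scores already encode the slope; the sketch only shows how a trajectory class from Stage 1 can be passed to a Stage 2 classifier as an extra feature.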
Original language | English |
---|---|
Article number | 119541 |
Pages (from-to) | 1-9 |
Number of pages | 9 |
Journal | Expert Systems with Applications, online available |
Volume | 217 |
Early online date | 12 Jan 2023 |
DOIs | |
Publication status | Published (in print/issue) - 1 May 2023 |
Bibliographical note
Funding Information: The NACC database is funded by NIA/NIH Grant U01 AG016976. NACC data are contributed by the NIA-funded ADCs: P30 AG019610 (PI Eric Reiman, MD), P30 AG013846 (PI Neil Kowall, MD), P30 AG062428-01 (PI James Leverenz, MD), P50 AG008702 (PI Scott Small, MD), P50 AG025688 (PI Allan Levey, MD, PhD), P50 AG047266 (PI Todd Golde, MD, PhD), P30 AG010133 (PI Andrew Saykin, PsyD), P50 AG005146 (PI Marilyn Albert, PhD), P30 AG062421-01 (PI Bradley Hyman, MD, PhD), P30 AG062422-01 (PI Ronald Petersen, MD, PhD), P50 AG005138 (PI Mary Sano, PhD), P30 AG008051 (PI Thomas Wisniewski, MD), P30 AG013854 (PI Robert Vassar, PhD), P30 AG008017 (PI Jeffrey Kaye, MD), P30 AG010161 (PI David Bennett, MD), P50 AG047366 (PI Victor Henderson, MD, MS), P30 AG010129 (PI Charles DeCarli, MD), P50 AG016573 (PI Frank LaFerla, PhD), P30 AG062429-01 (PI James Brewer, MD, PhD), P50 AG023501 (PI Bruce Miller, MD), P30 AG035982 (PI Russell Swerdlow, MD), P30 AG028383 (PI Linda Van Eldik, PhD), P30 AG053760 (PI Henry Paulson, MD, PhD), P30 AG010124 (PI John Trojanowski, MD, PhD), P50 AG005133 (PI Oscar Lopez, MD), P50 AG005142 (PI Helena Chui, MD), P30 AG012300 (PI Roger Rosenberg, MD), P30 AG049638 (PI Suzanne Craft, PhD), P50 AG005136 (PI Thomas Grabowski, MD), P30 AG062715-01 (PI Sanjay Asthana, MD, FRCP), P50 AG005681 (PI John Morris, MD), P50 AG047270 (PI Stephen Strittmatter, MD, PhD). This work was supported by the Dr George Moore Endowment for Data Science at Ulster University and Alzheimer's Research UK. The National Alzheimer's Coordinating Center Uniform Data Set (NACC-UDS), supported by the National Institute on Aging (NIA) (grant U01AG016976), was approved by the University of Washington Institutional Review Board. Written informed consent was obtained from all study participants at the Alzheimer's Disease Research Center where they completed their study visits.
Magda Bucholc: Conceptualization, Formal analysis, Investigation, Data curation, Writing – original draft, Funding acquisition. Sofya Titarenko: Conceptualization, Formal analysis, Investigation, Writing – review & editing. Callum Canavan: Validation, Writing – review & editing. Tianhua Chen: Writing – review & editing.
The data sets generated and analysed during the current study are available through the public National Alzheimer's Coordinating Center UDS database. The current set includes data from the June 2019 NACC data freeze (proposal no. 1026).
Publisher Copyright:
© 2023
Keywords
- dementia
- mild cognitive impairment
- machine learning
- supervised learning
- unsupervised learning
- hybrid model
- prognostic model
- longitudinal modelling