Abstract
Background
A growing number of older people suffer from Alzheimer’s disease (AD), and accurate, early diagnosis is important for treatment and for improving quality of life. Computer-aided systems are widely used to classify dementia. However, clinical data usually contain missing values, which may adversely affect classification accuracy. To obtain more reliable classifiers, this work handled missing data using four computational algorithms and evaluated whether classification accuracy improves with imputed data.
Materials & Methods
The initial dataset was collected from a local hospital and contained 185 healthy control samples and 187 AD samples. We randomly replaced 20%, 40% and 60% of the complete data in each feature with missing values. Mode substitution, K-nearest neighbours (KNN), multiple imputation (MI) and random forest (RF) algorithms were used to impute the missing data. The RF classification approach and the J48 algorithm were applied to the original complete data and to the data completed with imputed values.
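The masking and imputation steps described above can be illustrated with a minimal sketch. It covers only the simplest of the four methods (mode substitution) and uses a made-up toy feature; the helper names `mask_column` and `mode_impute` are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def mask_column(values, fraction, rng):
    """Return a copy of `values` with `fraction` of the entries set to None."""
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def mode_impute(column):
    """Fill None entries with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    mode_value = Counter(observed).most_common(1)[0][0]
    return [mode_value if v is None else v for v in column]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    # Hypothetical toy feature, NOT data from the study.
    feature = [1, 1, 1, 2, 2, 1, 3, 1, 2, 1]
    for fraction in (0.2, 0.4, 0.6):
        masked = mask_column(feature, fraction, rng)
        imputed = mode_impute(masked)
        # Share of entries that match the original after imputation.
        agreement = sum(a == b for a, b in zip(imputed, feature)) / len(feature)
        print(fraction, masked.count(None), agreement)
```

KNN, MI and RF imputation would replace `mode_impute` with a model that predicts each missing entry from the other features; those are left out here to keep the sketch dependency-free.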
Results
The RF algorithm performed best for imputing missing values. R squared declines as the number of missing values increases: when the proportion of missing values in each feature rises from 20% to 40%, R squared falls from 0.897 to 0.892. Additionally, the overall classification accuracy improved with imputed data. The RF classification method, with a classification accuracy of 89.52%, an AUC of 0.954 and a Kappa value of 0.790, outperformed J48 (88.63%, 0.944 and 0.718, respectively) for AD classification.
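For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of classification accuracy, Cohen's kappa and R squared. It is an illustration only: the abstract does not state how the authors computed these quantities, the function names are hypothetical, and the labels are invented.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    p_expected = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels
    )
    return (p_observed - p_expected) / (1 - p_expected)


def r_squared(actual, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when `actual` has zero variance.
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented labels (1 = AD, 0 = healthy control), NOT results from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

Kappa is lower than raw accuracy whenever some agreement would be expected by chance, which is why the two classifiers differ more in Kappa than in accuracy.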
Conclusion
Properly imputing missing data makes more samples available for refining AD classifiers, which in turn improves classification accuracy. Future work will extend to more imputation methods and classifiers.
Original language | English |
---|---|
Publication status | Published (in print/issue) - 3 Sept 2019 |
Event | TMED 10 - Londonderry, Londonderry, United Kingdom Duration: 11 Sept 2019 → … |
Conference
Conference | TMED 10 |
---|---|
Country/Territory | United Kingdom |
City | Londonderry |
Period | 11/09/19 → … |