Abstract
Background: Treating comorbidities remains costly and is a major priority in Evidence-Based Medicine (EBM). Genetically determining the molecular subclasses of proinflammatory comorbid conditions is important for stratifying patients who may respond more effectively to specific treatment interventions. The objective of this study is to develop a Machine Learning (ML) based classifier to stratify patients with Type-2 Diabetes and different comorbidities.
Methods: Samples from a preliminary dataset of 254 people with Type-2 Diabetes recruited at NICSM were genotyped with an Affymetrix UKBioBank Axiom Array. SNP results for 80 patient samples of class DCM1 (i.e. Type-2 Diabetes associated with comorbidities of the circulatory system) and 90 patient samples of class DCM2 (i.e. Type-2 Diabetes associated with comorbidities of the digestive system) were filtered through feature selection using ANOVA, Chi-square and Fast Correlation Based Filter. The top 10 SNPs, along with information from Electronic Care Records (ECR), were selected for building 5 ML binary classifiers using Support Vector Machine, Random Forest, Artificial Neural Network, Decision Tree and Naive Bayes algorithms, and their performance was tested with 10-fold cross-validation.
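The filter-then-classify workflow described in the Methods can be illustrated with a minimal, dependency-free Python sketch. It is not the study's pipeline: it uses only chi-square ranking and a categorical Naive Bayes classifier, omits ANOVA, Fast Correlation Based Filter, the ECR features and the other four algorithms, and runs on randomly generated genotypes. The function names, the 0/1/2 genotype coding, the synthetic data generator and the choice to repeat feature selection inside each training fold are all assumptions made for illustration.

```python
import random
from collections import Counter
from math import log

def chi2_score(column, labels):
    """Chi-square statistic of one genotype column against the class labels."""
    n = len(labels)
    geno_counts, class_counts = Counter(column), Counter(labels)
    observed = Counter(zip(column, labels))
    score = 0.0
    for g, n_g in geno_counts.items():
        for c, n_c in class_counts.items():
            expected = n_g * n_c / n
            score += (observed[(g, c)] - expected) ** 2 / expected
    return score

def select_top_k(X, y, k):
    """Indices of the k columns with the highest chi-square score."""
    scores = [chi2_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def fit_naive_bayes(X, y, features, levels=(0, 1, 2)):
    """Categorical Naive Bayes with add-one smoothing; stores log-probabilities."""
    n, model = len(y), {}
    for c, n_c in Counter(y).items():
        rows = [row for row, label in zip(X, y) if label == c]
        cond = {}
        for j in features:
            counts = Counter(row[j] for row in rows)
            cond[j] = {v: log((counts[v] + 1) / (n_c + len(levels))) for v in levels}
        model[c] = (log(n_c / n), cond)
    return model

def predict(model, row):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(c):
        prior, cond = model[c]
        return prior + sum(table[row[j]] for j, table in cond.items())
    return max(model, key=log_posterior)

def stratified_folds(y, n_folds, rng):
    """Split sample indices into folds that preserve the class proportions."""
    folds = [[] for _ in range(n_folds)]
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def cross_validate(X, y, k_features=10, n_folds=10, seed=0):
    """Accuracy over stratified folds, selecting features on training data only."""
    folds = stratified_folds(y, n_folds, random.Random(seed))
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        X_train, y_train = [X[i] for i in train], [y[i] for i in train]
        features = select_top_k(X_train, y_train, k_features)
        model = fit_naive_bayes(X_train, y_train, features)
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(y)

def make_synthetic(n_dcm1=80, n_dcm2=90, n_snps=200, n_informative=5, seed=1):
    """Random genotypes (hypothetical data); a few SNPs differ weakly by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in (("DCM1", n_dcm1), ("DCM2", n_dcm2)):
        for _ in range(n):
            row = []
            for j in range(n_snps):
                shifted = j < n_informative and label == "DCM2"
                weights = (0.25, 0.45, 0.30) if shifted else (0.45, 0.40, 0.15)
                row.append(rng.choices((0, 1, 2), weights=weights)[0])
            X.append(row)
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic()
    print(f"10-fold CV accuracy: {cross_validate(X, y):.3f}")
```

Ranking SNPs inside each training fold keeps the held-out samples from influencing which features are chosen; whether the study selected features before or within cross-validation is not stated in the abstract.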
Results: Of the 5 classifiers, the Naive Bayes classifier outperformed all others, with an Area Under the Curve (AUC) score of 0.681, an overall Classification Accuracy of 65.68% and a Matthews Correlation Coefficient of 0.316.
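For readers less familiar with the Matthews Correlation Coefficient, the short Python sketch below shows how it is computed from a binary confusion matrix. The counts in the usage line are made up for illustration and are not the study's confusion matrix; the function name and the convention of returning 0 when a row or column of the matrix is empty are assumptions.

```python
from math import sqrt

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; ranges from -1 to +1, 0 is chance level."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when the coefficient is undefined.
    return numerator / denominator if denominator else 0.0

# Hypothetical counts: 80 of 100 samples classified correctly.
print(matthews_corrcoef(tp=40, tn=40, fp=10, fn=10))  # 0.6
```

Unlike plain accuracy, the coefficient uses all four cells of the confusion matrix, which makes it informative when the two classes differ in size, as DCM1 (80 samples) and DCM2 (90 samples) do here.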
Conclusion: Work to further improve the performance of our ML classifier is in progress. By including further data from ECR, as well as data from public repositories, we hope to build a better classifier.
Original language | English |
---|---|
Title of host publication | 21st Meeting of the Irish Society of Human Genetics |
Publisher | Ulster Medical Journal |
Pages | 70 |
Volume | 88(1) |
Publication status | Published (in print/issue) - 22 Jan 2019 |
Event | 21st Meeting of the Irish Society of Human Genetics, Dublin, Ireland. Duration: 21 Sept 2018 → 21 Sept 2018. Conference number: 21 |
Conference
Conference | 21st Meeting of the Irish Society of Human Genetics |
---|---|
Country/Territory | Ireland |
City | Dublin |
Period | 21/09/18 → 21/09/18 |
Keywords
- Machine Learning
- Microarrays
- Personalised Medicine
- Type 2 diabetes
- Comorbidities
- Multimorbidity
- Genomics
- Bioinformatics