Abstract
Recent advances in high-throughput sequencing technologies have accelerated microbiome studies by profiling the 16S rRNA genes present in microbial species. Identifying, analyzing, and targeting such microbial composition is important for providing an enriched analysis of microbial samples. In this paper, we propose a novel phylogeny and abundance aware machine learning modelling approach (PAAM-ML) for classifying microbial samples into their respective functional phenotypes. The approach integrates the abundance counts of microbial species with the relationships between them, which are encoded in their phylogenetic tree of life. It incorporates the underlying structural tree information into the abundance of microbial species (features) to create a phylogeny and abundance aware matrix structure (PAAM). The matrix is then used as input to machine learning (ML) models for microbiome classification. We compared the classification performance of PAAM-ML with two state-of-the-art approaches, Phylogenetic Isometric Log-Ratio Transform (PhILR) and MetaPhyl, on three use cases. PAAM-ML significantly improved classification performance: it outperformed PhILR with p < 0.01 on Human Microbiome data across four body sites. We also performed a comprehensive analysis of the proposed approach by applying feature engineering. Our experimental results indicate strong classification performance; for example, the highest accuracy of 0.977 and Matthews Correlation Coefficient of 0.961 were achieved when applying Random Forest and feature engineering to the PAAM associated with Human Microbiome.
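The abstract does not spell out how the PAAM is constructed, so the following is only a minimal sketch of one common way to make an abundance table phylogeny-aware: each OTU count is added to every ancestor in the tree, so internal nodes carry clade totals alongside the leaf counts. The toy tree, the sample counts, and the names `PARENT`, `tree_aware_row`, and `build_matrix` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy phylogeny stored as child -> parent (the root has no entry).
PARENT: Dict[str, str] = {
    "otu_a": "genus_1",
    "otu_b": "genus_1",
    "otu_c": "genus_2",
    "genus_1": "root",
    "genus_2": "root",
}


def tree_aware_row(counts: Dict[str, int], parent: Dict[str, str]) -> Dict[str, int]:
    """Add each leaf count to the leaf itself and to all of its ancestors."""
    row: Dict[str, int] = {}
    for leaf, count in counts.items():
        node = leaf
        while True:
            row[node] = row.get(node, 0) + count
            if node not in parent:  # reached the root
                break
            node = parent[node]
    return row


def build_matrix(
    samples: List[Dict[str, int]], parent: Dict[str, str]
) -> Tuple[List[str], List[List[int]]]:
    """Return (column names, matrix) with one row per sample, one column per tree node."""
    columns = sorted(set(parent) | set(parent.values()))
    matrix = []
    for counts in samples:
        row = tree_aware_row(counts, parent)
        matrix.append([row.get(col, 0) for col in columns])
    return columns, matrix


# Hypothetical OTU counts for two samples (assumed data, for illustration only).
samples = [
    {"otu_a": 5, "otu_b": 3, "otu_c": 0},
    {"otu_a": 0, "otu_b": 1, "otu_c": 7},
]
columns, matrix = build_matrix(samples, PARENT)
print(columns)
print(matrix)
```

A matrix of this shape could then be passed, together with phenotype labels, to a classifier such as the Random Forest mentioned above; the feature engineering step described in the abstract is not represented here.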
Original language | English |
---|---|
Title of host publication | 2018 IEEE International Conference on Bioinformatics and Biomedicine |
Publisher | IEEE |
Pages | 44-49 |
Number of pages | 6 |
ISBN (Electronic) | 978-1-5386-5488-0, 978-1-5386-5487-3 |
ISBN (Print) | 978-1-5386-5489-7 |
Publication status | Published (in print/issue) - 3 Dec 2018 |
Event | 2018 IEEE International Conference on Bioinformatics and Biomedicine - Madrid, Spain. Duration: 3 Dec 2018 → 6 Dec 2018. http://orienta.ugr.es/bibm2018/ |
Conference
Conference | 2018 IEEE International Conference on Bioinformatics and Biomedicine |
---|---|
Abbreviated title | BIBM2018 |
Country/Territory | Spain |
City | Madrid |
Period | 3/12/18 → 6/12/18 |
Keywords
- Classification
- Machine learning
- Metagenomics
- Operational taxonomic units (OTUs)
- Phylogeny