TY - JOUR
T1 - A Comprehensive Study on Predicting Functional Role of Metagenomes Using Machine Learning Methods
AU - Wassan, Jyotsna Talreja
AU - Wang, Haiying
AU - Browne, Fiona
AU - Zheng, Huiru
PY - 2018
Y1 - 2018
AB - "Metagenomics" is the study of genomic sequences obtained directly from environmental microbial communities, with the aim of linking their structures to their functional roles. The field has advanced at an unprecedented pace, aided by high-throughput omics sequencing, which produces biologically rich data sets. In metagenomic data, microbial species outnumber microbial samples, leading to the "curse of dimensionality". Hence, the focus of metagenomics studies has moved towards developing efficient computational models using Machine Learning (ML) that reduce the computational cost. In this paper, we comprehensively assessed various ML approaches for effectively classifying high-dimensional human microbiota into their functional phenotypes. We propose applying embedded feature selection methods, namely Extreme Gradient Boosting and Penalized Logistic Regression, to determine important species. The resulting feature set enhanced the performance of Random Forest (RF), one of the most popular state-of-the-art methods in metagenomic studies. Experimental results indicate that the proposed method achieved the best results in terms of accuracy and area under the Receiver Operating Characteristic curve (ROC-AUC), along with a major improvement in processing time. It outperformed filter and wrapper feature selection methods applied with RF, as well as classifiers such as Support Vector Machine (SVM), Extreme Learning Machine (ELM), and k-Nearest Neighbors (k-NN).
KW - Metagenomics
KW - Microbiota
KW - Embedded Feature Selection
KW - Operational Taxonomic Units (OTUs)
KW - Classification
UR - https://pure.ulster.ac.uk/en/publications/a-comprehensive-study-on-predicting-functional-role-of-metagenome
M3 - Article
SN - 1557-9964
VL - 16
SP - 751
EP - 763
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 3
ER -