A Comprehensive Study on Predicting Functional Role of Metagenomes Using Machine Learning Methods

Jyotsna Talreja Wassan, Haiying / HY Wang, Fiona Browne, Huiru Zheng

Research output: Contribution to journal › Article › peer-review



"Metagenomics" is the study of genomic sequences obtained directly from environmental microbial communities, with the aim of linking their structures to functional roles. High-throughput omics sequencing has driven unprecedented advances in the field and yields biologically rich data sets. Because the microbial species in metagenomic data outnumber the microbial samples, the data are subject to the "curse of dimensionality". Hence the focus in metagenomics studies has moved towards developing efficient computational models using Machine Learning (ML) that reduce computational cost. In this paper, we comprehensively assess various ML approaches for classifying high-dimensional human microbiota into their functional phenotypes. We propose applying embedded feature selection methods, namely Extreme Gradient Boosting and Penalized Logistic Regression, to determine important species. The resulting feature set enhanced the performance of Random Forest (RF), one of the most popular state-of-the-art methods in metagenomic studies. Experimental results indicate that the proposed method achieved the best results in terms of accuracy and area under the Receiver Operating Characteristic curve (ROC-AUC), along with a major improvement in processing time. It outperformed filter and wrapper feature selection methods applied with RF, as well as classifiers such as Support Vector Machine (SVM), Extreme Learning Machine (ELM), and k-Nearest Neighbors (k-NN).
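The embedded-selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn is available, uses a synthetic count matrix as a stand-in for an OTU abundance table, and picks an L1 penalty with default strength as one possible form of penalized logistic regression. All variable names and parameter values are illustrative. A gradient-boosting model could be placed inside the same selector in place of the logistic regression, which is not shown here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for an OTU table (assumption, not the paper's data):
# far more features (OTUs) than samples, with only a few informative OTUs.
rng = np.random.default_rng(0)
n_samples, n_otus, n_informative = 60, 500, 5
y = np.array([0, 1] * (n_samples // 2))            # balanced binary phenotype
X = rng.poisson(lam=5.0, size=(n_samples, n_otus)).astype(float)
X[:, :n_informative] += 10.0 * y[:, None]          # shift informative OTUs in class 1
X = np.log1p(X)                                    # simple variance-stabilising transform

# Embedded feature selection: the L1 penalty drives most coefficients to
# zero, and SelectFromModel keeps the features with non-zero weights.
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0, random_state=0)
)

# The Random Forest is trained only on the selected features. Wrapping both
# steps in one pipeline keeps the selection inside each cross-validation fold.
pipeline = make_pipeline(
    selector,
    RandomForestClassifier(n_estimators=200, random_state=0),
)
scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")

# Fit the selector on all data only to report how many features survive.
n_selected = int(selector.fit(X, y).get_support().sum())
print(f"selected {n_selected} of {n_otus} features")
print(f"mean ROC-AUC over 5 folds: {scores.mean():.3f}")
```

Keeping the selector inside the pipeline matters in the many-features, few-samples setting: selecting features on the full data before cross-validation would leak label information into the held-out folds and inflate the reported ROC-AUC.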
Original language: English
Pages (from-to): 751-763
Number of pages: 14
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Issue number: 3
Early online date: 23 Jul 2018
Publication status: Published (in print/issue) - 2018


  • Metagenomics
  • Microbiota
  • Embedded Feature Selection
  • Operational Taxonomic Units (OTUs)
  • Classification

