Integrative Data Analysis for the Prediction of Metagenomic Functions

  • Jyotsna Talreja Wassan

Student thesis: Doctoral Thesis


The emergence of high-throughput sequencing (HTS) techniques has revolutionised the field of metagenomics, which studies the genomic structure and function of uncultured microbial communities in an ecosystem. The field helps in understanding the composition, diversity and functioning of complex microbial communities. Sequencing produces large, complex, heterogeneous, sparse and biologically rich metagenomic datasets. These unprecedented advances in sequencing have necessitated the development of computational methods for analysing such data that reduce computational cost and increase predictive performance. This thesis applies Machine Learning (ML) techniques to the task of computationally inferring functions associated with the genes present in microbial communities (in humans, cattle and soil). The aim of the research is twofold: to investigate, develop and evaluate ML classification approaches for (i) abundance-driven analyses and (ii) phylogeny-driven analyses of microbial genomes in an integrative way. The thesis uses embedded ML techniques to classify microbiome data into functions while handling its high-dimensional and sparse nature, informing the development of a new abundance-driven framework (Chapter 4). The integrative approaches take advantage of biological evolutionary characteristics (i.e. phylogeny): phylogenetically similar microbial species could share similar characteristics and hence similar functional traits. Novel integrative approaches that model phylogeny and abundance profiles together are proposed to predict metagenomic functions effectively; the key idea is to integrate phylogeny either at the data pre-processing level, as a precursor to the ML model (Chapter 5), or within the ML model itself (Chapter 6).
An additional case study on predicting the functions of cattle microbial genes is also presented (Chapter 7), linked to the European Commission project MetaPlat1. The thesis includes a summary of its key contributions, novel findings, limitations in the current context and future work.
Date of Award: Mar 2020
Original language: English
Sponsors: European Union (EU): H2020-MSCA-RISE Programme
Supervisor: Fiona Browne (Supervisor), Haiying Wang (Supervisor) & Huiru (Jane) Zheng (Supervisor)


  • Metagenomics
  • Machine Learning
  • Phylogeny
  • Classification
