Analyzing Large Microbiome Datasets Using Machine Learning and Big Data

Thomas Krause, Jyotsna Talreja Wassan, Paul Mc Kevitt, Haiying Wang, Huiru Zheng, Matthias Hemmje

Research output: Contribution to journal › Article › peer-review



Metagenomics promises to provide valuable new insights into the role of microbiomes in eukaryotic hosts such as humans. Because sequencing costs are decreasing, public and private repositories for human metagenomic datasets are growing fast. Metagenomic datasets can contain terabytes of raw data, which is a challenge for data processing but also an opportunity for advanced machine learning methods such as deep learning that require large datasets. However, in contrast to classical machine learning algorithms, the use of deep learning in metagenomics is still an exception. Regardless of the algorithms used, they are usually not applied to raw data but require several preprocessing steps. Performing this preprocessing and the actual analysis in an automated, reproducible, and scalable way is another challenge. This and other challenges can be addressed by adjusting known big data methods and architectures to the needs of microbiome analysis and DNA sequence processing. A conceptual architecture for the use of machine learning and big data on metagenomic datasets was recently presented and initially validated to analyze the rumen microbiome. The same architecture can be used for clinical purposes, as discussed in this paper.
Original language: English
Pages (from-to): 138-165
Number of pages: 28
Issue number: 3
Early online date: 8 Nov 2021
Publication status: Published online - 8 Nov 2021


  • machine learning
  • deep learning
  • big data
  • metagenomics

