Machine learning approaches for cyanobacteria bloom prediction using metagenomic sequence data, a case study

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)
124 Downloads (Pure)


Cyanobacteria blooms are a serious public health threat and a global challenge. The literature on bloom prediction and forecasting has been accumulating, with the emphasis largely on the relation between blooms and environmental factors, but the complexity of the bloom mechanism makes it difficult for such models to produce adequate output. The rapid development of next-generation sequencing techniques allows the microbial community, and the bloom community structure in particular, to be examined comprehensively and quickly. This makes it possible to predict and forecast bloom occurrence from sequence data alone, combined with machine learning techniques, yet reports on this theme are rare in the literature. In this case study, machine learning approaches were applied with metagenomic data as the only input (rather than with environmental data) to predict Cyanobacteria blooms. k-NN classification, SVM classification, and k-means clustering were applied, and their efficiencies were evaluated using relevant indices. Feature selection was performed, and the resulting sub-datasets were processed one after another. In the prediction experiment with the k-NN approach, the final year's data from the 8-year OTU time series were used as target data, and various combinations of the preceding years' data were used as predictor data. Among the experimental results, the best values were obtained with the 7 preceding years as predictor input: an F1 score of 1.00, and 100% for sensitivity, specificity, precision, and accuracy. This case study demonstrated the feasibility of using machine learning approaches for Cyanobacteria bloom prediction with only metagenomic sequence data, and the importance of feature selection in obtaining better output from those approaches.
Machine learning approaches based on metagenomic data are efficient, economical, and fast, and have the potential to be adopted as a promising means of bloom prediction in practice.
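The experimental design described in the abstract (earlier years as predictor data, the final year as target data, scored with sensitivity, specificity, precision, accuracy, and F1) can be illustrated with a minimal sketch. This is not the authors' code or data: the two-feature OTU abundance vectors, the bloom labels, the choice of k = 3, Euclidean distance, and the helper names `knn_predict`, `evaluation_indices`, and `evaluate_window` are all assumptions made purely for illustration, and no feature selection step is included.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training samples (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluation_indices(y_true, y_pred):
    """Compute the indices named in the abstract from binary labels (1 = bloom)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "accuracy": _ratio(tp + tn, len(pairs)),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical toy data: (year, OTU relative-abundance vector, bloom label).
# The two classes are well separated by construction, so perfect scores here
# say nothing about real metagenomic data.
samples = []
for year in range(1, 9):
    drift = 0.01 * year
    samples.append((year, (0.05 + drift, 0.60 - drift), 0))  # non-bloom sample
    samples.append((year, (0.70 - drift, 0.10 + drift), 1))  # bloom sample


def evaluate_window(n_preceding_years, final_year=8, k=3):
    """Train on the n years just before final_year and score on final_year."""
    first_year = final_year - n_preceding_years
    train = [s for s in samples if first_year <= s[0] < final_year]
    test = [s for s in samples if s[0] == final_year]
    train_X = [features for _, features, _ in train]
    train_y = [label for _, _, label in train]
    y_true = [label for _, _, label in test]
    y_pred = [knn_predict(train_X, train_y, features, k) for _, features, _ in test]
    return evaluation_indices(y_true, y_pred)


if __name__ == "__main__":
    for n in (1, 3, 7):
        print(n, "preceding year(s):", evaluate_window(n))
```

Varying `n_preceding_years` mirrors the idea of trying different combinations of preceding years as predictor input. A contiguous window is only one way to form such combinations; the abstract does not specify how the combinations were chosen.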
Original language: English
Title of host publication: Unknown Host Publication
Number of pages: 8
ISBN (Print): 978-1-5090-1612-9
Publication status: Accepted/In press - 10 Oct 2017
Event: 2017 IEEE International Conference in Bioinformatics and Biomedicine - Kansas City, MO, USA
Duration: 10 Oct 2017 → …


Conference: 2017 IEEE International Conference in Bioinformatics and Biomedicine
Period: 10/10/17 → …


  • Machine Learning
  • Cyanobacteria blooms
  • OTU (Operational Taxonomic Unit)

