Machine learning approaches for cyanobacteria bloom prediction using metagenomic sequence data, a case study

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Cyanobacteria blooms are a serious public health threat and a global challenge. The literature on bloom prediction and forecasting has been growing, with the emphasis apparently on the relation between blooms and environmental factors, but the complexity of the bloom mechanism makes it difficult for such models to produce adequate output. The rapid development of next-generation sequencing techniques allows comprehensive and quick examination of the microbial community, especially the bloom community structure. This makes it possible to predict and forecast bloom occurrence using sequence data alone together with machine learning techniques, yet reports on this theme are rare in the literature. In this case study, machine learning approaches were applied with metagenomic data as the only input (rather than with environmental data) to predict Cyanobacteria blooms. k-NN classification, SVM classification and k-means clustering were applied, and their efficiencies were evaluated using relevant indices. Feature selection was performed, and the resulting sub-datasets were processed in turn. In the prediction experiment with the k-NN approach, the final year's data in the 8-year OTU time series were used as target data and various combinations of the preceding years' data were used as predictor data; among the experimental results, the best output, obtained with the 7 preceding years as predictor input, reached 1.00 for the F1 score and 100% for sensitivity, specificity, precision, and accuracy. This case study demonstrated the feasibility of using machine learning approaches for Cyanobacteria bloom prediction with only metagenomic sequence data, and the importance of feature selection in obtaining better output from the machine learning approaches. Metagenomic-data-based machine learning approaches are efficient, economical, and fast, and have the potential to be adopted as a promising means of bloom prediction in practice.
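The experimental setup described in the abstract (earlier years as predictor data, the final year as target data, k-NN classification scored with F1, sensitivity, specificity, precision and accuracy) can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the abundance vectors, the bloom labels, the choice of k = 3, the Euclidean distance and the function names `knn_predict` and `evaluation_indices` are all assumptions made for the example, and no feature selection step is shown.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training samples nearest to query."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def evaluation_indices(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, precision, accuracy and F1 from labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Guard against empty classes so the sketch never divides by zero.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Invented toy data: relative abundances of three hypothetical OTUs per sample.
# Label 1 = bloom, 0 = no bloom. These are NOT values from the study.
preceding_years_X = [
    (0.70, 0.20, 0.10), (0.65, 0.25, 0.10), (0.80, 0.10, 0.10),
    (0.10, 0.50, 0.40), (0.15, 0.45, 0.40), (0.05, 0.55, 0.40),
]
preceding_years_y = [1, 1, 1, 0, 0, 0]

# Samples standing in for the final (target) year.
final_year_X = [(0.75, 0.15, 0.10), (0.12, 0.48, 0.40),
                (0.60, 0.30, 0.10), (0.20, 0.40, 0.40)]
final_year_y = [1, 0, 1, 0]

predictions = [knn_predict(preceding_years_X, preceding_years_y, x)
               for x in final_year_X]
scores = evaluation_indices(final_year_y, predictions)
print(predictions)
print(scores)
```

The toy samples are separable by construction, so the scores printed here say nothing about performance on real OTU data; the sketch only shows how the five indices named in the abstract relate to the confusion-matrix counts.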
Language: English
Title of host publication: Unknown Host Publication
Pages: 2054-2061
Number of pages: 8
DOIs: https://doi.org/10.1109/BIBM.2017.8217977
Publication status: Accepted/In press - 10 Oct 2017
Event: 2017 IEEE International Conference in Bioinformatics and Biomedicine - Kansas City, MO, USA
Duration: 10 Oct 2017 → …

Conference

Conference: 2017 IEEE International Conference in Bioinformatics and Biomedicine
Period: 10/10/17 → …

Fingerprint

Learning systems
Feature extraction
Public health
Time series
Experiments
Cyanobacteria
Economics
Processing

Keywords

  • Machine Learning
  • Cyanobacteria blooms
  • OTU (Operational Taxonomic Unit)

Cite this

@inproceedings{004416379a774ad2aab2dc7a965ef11d,
title = "Machine learning approaches for cyanobacteria bloom prediction using metagenomic sequence data, a case study",
keywords = "Machine Learning, Cyanobacteria blooms, OTU (Operational Taxonomic Unit)",
author = "JianDong Huang and Huiru Zheng and Wang, {Haiying / HY} and Xingpeng Jiang",
year = "2017",
month = "10",
day = "10",
doi = "10.1109/BIBM.2017.8217977",
language = "English",
isbn = "978-1-5090-1612-9",
pages = "2054--2061",
booktitle = "Unknown Host Publication",

}

Huang, J, Zheng, H, Wang, HHY & Jiang, X 2017, Machine learning approaches for cyanobacteria bloom prediction using metagenomic sequence data, a case study. in Unknown Host Publication. pp. 2054-2061, 2017 IEEE International Conference in Bioinformatics and Biomedicine, 10/10/17. https://doi.org/10.1109/BIBM.2017.8217977

Machine learning approaches for cyanobacteria bloom prediction using metagenomic sequence data, a case study. / Huang, JianDong; Zheng, Huiru; Wang, Haiying / HY; Jiang, Xingpeng.

Unknown Host Publication. 2017. p. 2054-2061.

TY - GEN

T1 - Machine learning approaches for cyanobacteria bloom prediction using metagenomic sequence data, a case study

AU - Huang, JianDong

AU - Zheng, Huiru

AU - Wang, Haiying / HY

AU - Jiang, Xingpeng

PY - 2017/10/10

Y1 - 2017/10/10

KW - Machine Learning

KW - Cyanobacteria blooms

KW - OTU (Operational Taxonomic Unit)

UR - http://ieeexplore.ieee.org/document/8217977/

U2 - 10.1109/BIBM.2017.8217977

DO - 10.1109/BIBM.2017.8217977

M3 - Conference contribution

SN - 978-1-5090-1612-9

SP - 2054

EP - 2061

BT - Unknown Host Publication

ER -