Abstract
Cyanobacterial blooms are a serious public health threat and a global challenge. The literature on bloom prediction and forecasting has been accumulating, with the emphasis apparently on the relation between blooms and environmental factors, but the complexity of the bloom mechanism makes it difficult for such models to produce adequate output. The rapid development of next-generation sequencing techniques makes it possible to examine the microbial community, and the bloom community structure in particular, comprehensively and quickly. This makes it feasible to use sequence data alone, together with machine learning techniques, to predict and forecast bloom occurrence, yet reports on this theme are rare in the literature. In this case study, machine learning approaches were applied with metagenomic data as the only input (rather than environmental data) to predict Cyanobacteria blooms. k-NN classification, SVM classification, and k-means clustering were applied, and their efficiencies were evaluated using relevant indices. Feature selection was performed, and the resulting sub-datasets were processed in turn. In the prediction experiment with the k-NN approach, the final year's data from the 8-year OTU time series were used as target data, and various combinations of the preceding years' data were used as predictor data; the best result across the experiments, obtained with all 7 preceding years as predictor input, was an F1 score of 1.00 and 100% for sensitivity, specificity, precision, and accuracy. This case study demonstrated the feasibility of using machine learning approaches for Cyanobacteria bloom prediction with only metagenomic sequence data, and the importance of feature selection in obtaining better output from the machine learning approaches. 
Machine learning approaches based on metagenomic data are efficient, economical, and fast, giving them the potential to be adopted as a promising means of bloom prediction in practice.
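The evaluation scheme described in the abstract (earlier years as predictor data, the final year as target data, scored by sensitivity, specificity, precision, accuracy, and F1) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `knn_predict` and `evaluation_indices`, the choice of Euclidean distance and k = 3, and the tiny two-feature dataset, which is synthetic and not taken from the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training sample.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def evaluation_indices(y_true, y_pred, positive=1):
    """Compute sensitivity, specificity, precision, accuracy, and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Synthetic stand-in for relative abundances of two OTUs (hypothetical values).
# Label 1 = bloom sample, label 0 = non-bloom sample.
earlier_years_X = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
                   (0.1, 0.9), (0.2, 0.8), (0.15, 0.7)]
earlier_years_y = [1, 1, 1, 0, 0, 0]

# The "final year" samples play the role of the target data.
final_year_X = [(0.95, 0.05), (0.7, 0.3), (0.1, 0.8), (0.3, 0.9)]
final_year_y = [1, 1, 0, 0]

predicted = [
    knn_predict(earlier_years_X, earlier_years_y, sample, k=3)
    for sample in final_year_X
]
indices = evaluation_indices(final_year_y, predicted)
print(predicted, indices)
```

Splitting by year rather than at random keeps the target year out of the predictor data, which matches the forecasting setup the abstract describes. The toy data are cleanly separable, so the scores they produce say nothing about the study's actual results.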
Original language | English |
---|---|
Title of host publication | Unknown Host Publication |
Publisher | IEEE |
Pages | 2054-2061 |
Number of pages | 8 |
ISBN (Print) | 978-1-5090-1612-9 |
Publication status | Accepted/In press - 10 Oct 2017 |
Event | 2017 IEEE International Conference on Bioinformatics and Biomedicine - Kansas City, MO, USA; Duration: 10 Oct 2017 → … |
Conference
Conference | 2017 IEEE International Conference on Bioinformatics and Biomedicine |
---|---|
Period | 10/10/17 → … |
Keywords
- Machine Learning
- Cyanobacteria blooms
- OTU (Operational Taxonomic Unit)