Abstract
Inflammatory Bowel Disease (IBD) is an umbrella term for a group of inflammatory diseases of the gastrointestinal tract, including Crohn’s Disease and ulcerative colitis. Changes to the intestinal microbiome, the community of micro-organisms that resides in the human gut, have been shown to contribute to the pathogenesis of IBD. IBD diagnosis is often delayed due to its non-specific symptoms and because an invasive colonoscopy is required for confirmation; this delay leads to poor growth in children and worse treatment outcomes. Feature selection algorithms are often applied to microbial communities to identify bacterial groups that drive disease. It has been shown that aggregating Ensemble Feature Selection (EFS) can improve the robustness of feature selection algorithms, where robustness is measured by the variation in feature selector output caused by small changes to the dataset. In this work we apply a two-step filter and an EFS process to generate robust feature subsets that can non-invasively predict IBD subtypes from high-resolution microbiome data. The predictive power of the robust feature subsets is the highest reported in the literature to date. Furthermore, we identify five biologically plausible bacterial species that have not previously been implicated in IBD aetiology.
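The idea of aggregating feature selectors over perturbed copies of a dataset, and of measuring robustness as the agreement between the resulting subsets, can be illustrated with a minimal Python sketch. It is not the method used in the paper: the mean-difference filter score, the mean-rank aggregation, the Jaccard-based stability measure, the synthetic data, and all function names are assumptions chosen only for illustration.

```python
import random
from itertools import combinations


def make_toy_data(rng, n_samples=40, n_features=10):
    """Synthetic stand-in for abundance data: features 0 and 1 carry signal."""
    rows, labels = [], []
    for i in range(n_samples):
        y = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 5.0 * y
        row[1] += 5.0 * y
        rows.append(row)
        labels.append(y)
    return rows, labels


def score_features(rows, labels):
    """Hypothetical filter score: absolute difference of the two class means."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:  # a bootstrap sample may miss a class
            scores.append(0.0)
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def bootstrap(rows, labels, rng):
    """Resample the dataset with replacement (a 'small change' to the data)."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]


def single_select(rows, labels, k):
    """Top-k features from one run of the filter."""
    scores = score_features(rows, labels)
    return set(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])


def ensemble_select(rows, labels, k, n_rounds, rng):
    """Aggregate rankings from n_rounds bootstrap samples by summed rank."""
    n_features = len(rows[0])
    rank_sum = [0] * n_features
    for _ in range(n_rounds):
        r, y = bootstrap(rows, labels, rng)
        scores = score_features(r, y)
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    return set(sorted(range(n_features), key=lambda j: rank_sum[j])[:k])


def mean_jaccard(subsets):
    """Mean pairwise Jaccard similarity; needs at least two non-empty subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    rng = random.Random(0)
    rows, labels = make_toy_data(rng)
    # Stability of a single selector vs. the ensemble under resampling.
    single = [single_select(*bootstrap(rows, labels, rng), k=3) for _ in range(10)]
    ensemble = [ensemble_select(*bootstrap(rows, labels, rng), k=3, n_rounds=20, rng=rng)
                for _ in range(10)]
    print("single selector stability:", mean_jaccard(single))
    print("ensemble stability:", mean_jaccard(ensemble))
```

A mean Jaccard similarity closer to 1 indicates that the selected subset changes little when the data are perturbed. Whether the ensemble actually scores higher than the single selector depends on the data and the base selector.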
Original language | English |
---|---|
Pages (from-to) | 2078-2088 |
Number of pages | 11 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Volume | 16 |
Issue number | 6 |
Early online date | 30 Apr 2018 |
Publication status | Published (in print/issue) - 5 Dec 2019 |
Keywords
- Robustness
- Feature extraction
- Microorganisms
- Diseases
- DNA
- Knowledge discovery