Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease

Francisco Azuaje, H Zheng, Anyela Camargo, HY Wang

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstrating both their classification power and their reproducibility across independent datasets is essential for assessing their potential clinical relevance. Small datasets and the multiplicity of putative biomarker sets may explain the lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have focused mainly on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently generated datasets from the cardiovascular disease domain. The different methods identified a diversity of biosignatures. However, we found strong biological process concordance between them, especially for methods based on gene set analysis. With a few exceptions, we observed a lack of classification reproducibility across independent datasets. Our putative biomarker sets partially overlap with those reported in the primary studies. Despite these limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically relevant underlying molecular mechanisms.
Language: English
Pages: 637–647
Journal: Journal of Biomedical Informatics
Volume: 44
Issue number: 4
DOI: 10.1016/j.jbi.2011.02.003
Publication status: Published - 2011

Keywords

  • Biomarker discovery
  • Pathway analysis
  • Gene set analysis
  • Cardiovascular diseases
  • Human heart failure
  • Disease networks
  • Translational bioinformatics

Cite this

@article{2b5c8acb538248fe91f8c57bb19347b5,
title = "Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease",
keywords = "Biomarker discovery, Pathway analysis, Gene set analysis, Cardiovascular diseases, Human heart failure, Disease networks, Translational bioinformatics",
author = "Francisco Azuaje and H Zheng and Anyela Camargo and HY Wang",
year = "2011",
doi = "10.1016/j.jbi.2011.02.003",
language = "English",
volume = "44",
pages = "637--647",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Elsevier",
number = "4",

}

Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease. / Azuaje, Francisco; Zheng, H; Camargo, Anyela; Wang, HY.

In: Journal of Biomedical Informatics, Vol. 44, No. 4, 2011, p. 637-647.

TY - JOUR

T1 - Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease

AU - Azuaje, Francisco

AU - Zheng, H

AU - Camargo, Anyela

AU - Wang, HY

PY - 2011

Y1 - 2011

KW - Biomarker discovery

KW - Pathway analysis

KW - Gene set analysis

KW - Cardiovascular diseases

KW - Human heart failure

KW - Disease networks

KW - Translational bioinformatics

U2 - 10.1016/j.jbi.2011.02.003

DO - 10.1016/j.jbi.2011.02.003

M3 - Article

VL - 44

SP - 637

EP - 647

JO - Journal of Biomedical Informatics

T2 - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

IS - 4

ER -