Knowledge-driven data integration for the prediction of protein-protein interaction networks

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Information about the networks of protein interactions within a cell can greatly increase our understanding of protein function and cellular processes. The advent of high-throughput experimental techniques, as well as large-scale computational prediction models, has greatly increased the volume of genomic data aimed at discovering the functionality of genes and proteins. Individually, these data should be viewed with caution as they are often inaccurate and incomplete. Recent years have seen a growing trend towards the adoption of diverse data integration techniques to support large-scale analysis of protein-protein interaction (PPI) networks. However, current research is mainly focused on data-driven approaches. This paper proposes a knowledge-driven computational framework to support systems-level data integration for the prediction of PPI networks. Based on the incorporation of prior knowledge of the relationship between different “omic” datasets, different likelihood-ratio-based Bayesian models (LR-NB) have been developed to combine the evidence from diverse sources, ranging from co-expression to essentiality, to formulate PPI predictions. We demonstrate improvements in the PPI prediction performance. Results are evaluated against Gold Standards, which were derived from the MIPS Complex Catalogue (Saccharomyces cerevisiae) and the Human Protein Reference Database. We also implement a novel analysis of local regions of a Receiver Operating Characteristic curve as a less biased and more exact approach to assessing the quality of prediction models. This investigation also provides the basis for new PPI network inference and analysis applications in other model organisms and in specific diseases.
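The two techniques named in the abstract, likelihood-ratio naive Bayes integration and analysis of a local region of the ROC curve, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each evidence source has already been discretised into bins and that likelihood ratios are estimated from gold-standard positive and negative protein pairs. All function names, the `pseudocount` smoothing and the `max_fpr` cut-off are hypothetical choices made for illustration only.

```python
from collections import Counter
from math import prod


def likelihood_ratios(pos_bins, neg_bins, pseudocount=1.0):
    """Estimate LR(bin) = P(bin | interacting) / P(bin | non-interacting).

    pos_bins / neg_bins: bin labels observed for gold-standard positive and
    negative pairs in ONE evidence source. The pseudocount is an assumed
    smoothing choice so that empty bins do not give zero or infinite ratios.
    """
    bins = set(pos_bins) | set(neg_bins)
    pos, neg = Counter(pos_bins), Counter(neg_bins)
    n_pos = len(pos_bins) + pseudocount * len(bins)
    n_neg = len(neg_bins) + pseudocount * len(bins)
    return {
        b: ((pos[b] + pseudocount) / n_pos) / ((neg[b] + pseudocount) / n_neg)
        for b in bins
    }


def combined_lr(evidence, lr_tables):
    """Naive Bayes combination: multiply the per-source likelihood ratios.

    evidence: {source_name: bin_label} for one protein pair.
    A source or bin without an estimate contributes a neutral ratio of 1.
    """
    return prod(
        lr_tables.get(src, {}).get(b, 1.0) for src, b in evidence.items()
    )


def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), grouping tied scores into one step."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(points, max_fpr):
    """Trapezoidal area under the ROC curve restricted to FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the segment linearly so it ends at the cut-off.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Under these assumptions, a pair would be predicted to interact when the prior odds multiplied by `combined_lr(...)` exceed 1, and competing models could be compared by `partial_auc` over a low false-positive-rate region rather than by the full area under the curve.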
Language: English
Title of host publication: Unknown Host Publication
Number of pages: 1
Publication status: Published - Jul 2008
Event: 16th Annual International Conference Intelligent Systems for Molecular Biology
Duration: 1 Jul 2008 → …

Fingerprint

Data integration
Proteins
Yeast

Cite this

@inproceedings{873e4929d269436b8bbfcf9292d5d3c2,
title = "Knowledge-driven data integration for the prediction of protein-protein interaction networks",
abstract = "Information about the networks of protein interactions within a cell can greatly increase our understanding of protein function and cellular processes. The advent of high-throughput experimental techniques, as well as large-scale computational prediction models, has greatly increased the volume of genomic data aimed at discovering the functionality of genes and proteins. Individually, these data should be viewed with caution as they are often inaccurate and incomplete. Recent years have seen a growing trend towards the adoption of diverse data integration techniques to support large-scale analysis of protein-protein interaction (PPI) networks. However, current research is mainly focused on data-driven approaches. This paper proposes a knowledge-driven computational framework to support systems-level data integration for the prediction of PPI networks. Based on the incorporation of prior knowledge of the relationship between different “omic” datasets, different likelihood-ratio-based Bayesian models (LR-NB) have been developed to combine the evidence from diverse sources, ranging from co-expression to essentiality, to formulate PPI predictions. We demonstrate improvements in the PPI prediction performance. Results are evaluated against Gold Standards, which were derived from the MIPS Complex Catalogue (Saccharomyces cerevisiae) and the Human Protein Reference Database. We also implement a novel analysis of local regions of a Receiver Operating Characteristic curve as a less biased and more exact approach to assessing the quality of prediction models. This investigation also provides the basis for new PPI network inference and analysis applications in other model organisms and in specific diseases.",
author = "Huiru Zheng and Fiona Browne and Haiying Wang and Francisco Azuaje",
year = "2008",
month = "7",
language = "English",
booktitle = "Unknown Host Publication",

}

Zheng, H, Browne, F, Wang, H & Azuaje, F 2008, Knowledge-driven data integration for the prediction of protein-protein interaction networks. in Unknown Host Publication. 16th Annual International Conference Intelligent Systems for Molecular Biology, 1/07/08.

TY - GEN

T1 - Knowledge-driven data integration for the prediction of protein-protein interaction networks

AU - Zheng, Huiru

AU - Browne, Fiona

AU - Wang, Haiying

AU - Azuaje, Francisco

PY - 2008/7

Y1 - 2008/7

AB - Information about the networks of protein interactions within a cell can greatly increase our understanding of protein function and cellular processes. The advent of high-throughput experimental techniques, as well as large-scale computational prediction models, has greatly increased the volume of genomic data aimed at discovering the functionality of genes and proteins. Individually, these data should be viewed with caution as they are often inaccurate and incomplete. Recent years have seen a growing trend towards the adoption of diverse data integration techniques to support large-scale analysis of protein-protein interaction (PPI) networks. However, current research is mainly focused on data-driven approaches. This paper proposes a knowledge-driven computational framework to support systems-level data integration for the prediction of PPI networks. Based on the incorporation of prior knowledge of the relationship between different “omic” datasets, different likelihood-ratio-based Bayesian models (LR-NB) have been developed to combine the evidence from diverse sources, ranging from co-expression to essentiality, to formulate PPI predictions. We demonstrate improvements in the PPI prediction performance. Results are evaluated against Gold Standards, which were derived from the MIPS Complex Catalogue (Saccharomyces cerevisiae) and the Human Protein Reference Database. We also implement a novel analysis of local regions of a Receiver Operating Characteristic curve as a less biased and more exact approach to assessing the quality of prediction models. This investigation also provides the basis for new PPI network inference and analysis applications in other model organisms and in specific diseases.

M3 - Conference contribution

BT - Unknown Host Publication

ER -