Incorporating semantic similarity into clustering process for identifying protein complexes from affinity purification/mass spectrometry data

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    1 Citation (Scopus)

    Abstract

    This paper presents a framework for incorporating semantic similarities into the detection of protein complexes from Affinity Purification/Mass Spectrometry (AP-MS) data. AP-MS data is modeled as a bipartite network in which one set of nodes consists of bait proteins and the other of prey proteins. Pairwise similarities between bait proteins are computed by combining topology-based similarities with functional semantic similarities. A hierarchical clustering algorithm is then applied to obtain 'seed clusters' consisting of bait proteins. Starting from these seed clusters, an expansion process recruits prey proteins that are significantly associated with the bait proteins, producing the final set of identified protein complexes. On real AP-MS datasets, the biological significance of the predicted complexes is validated against curated protein complexes using six statistical metrics. The results show that integrating semantic similarities into the clustering process greatly improves the accuracy of complex identification, and that the proposed framework yields better clustering results than several existing clustering methods.
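
    The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the similarity measures, the linkage criterion, or the significance test, so the Jaccard index (for both the topological and the semantic part), the weight `alpha`, the average-linkage cut-off `threshold`, and the `min_fraction` recruitment rule are all assumptions chosen for brevity, and the toy proteins and annotation terms are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(b1, b2, preys, annotations, alpha=0.5):
    """Weighted mix of topological and semantic similarity (assumed form)."""
    topo = jaccard(preys[b1], preys[b2])  # shared preys in the bipartite network
    sem = jaccard(annotations.get(b1, set()), annotations.get(b2, set()))  # stand-in for a GO-based measure
    return alpha * topo + (1 - alpha) * sem


def seed_clusters(preys, annotations, threshold=0.4, alpha=0.5):
    """Average-linkage agglomerative clustering of baits, stopped at `threshold`."""
    clusters = [frozenset([b]) for b in sorted(preys)]

    def linkage(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(combined_similarity(x, y, preys, annotations, alpha) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(c1, c2) < threshold:
            break  # no remaining pair is similar enough to merge
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


def expand(seed, preys, min_fraction=0.5):
    """Recruit preys pulled down by at least `min_fraction` of the seed's baits.

    This simple fraction rule replaces the significance test mentioned in the
    abstract, whose details are not given there.
    """
    candidates = set().union(*(preys[b] for b in seed))
    recruited = {
        p for p in candidates
        if sum(p in preys[b] for b in seed) / len(seed) >= min_fraction
    }
    return set(seed) | recruited


# Hypothetical toy data: bait -> preys, and bait -> annotation terms.
toy_preys = {"B1": {"P1", "P2", "P3"}, "B2": {"P1", "P2"}, "B3": {"P7", "P8"}}
toy_annotations = {"B1": {"term_a", "term_b"}, "B2": {"term_a", "term_b"}, "B3": {"term_c"}}

complexes = [expand(seed, toy_preys) for seed in seed_clusters(toy_preys, toy_annotations)]
```

    In this toy setting, baits that share both preys and annotation terms merge into one seed cluster, and each seed is then extended with the preys that most of its baits pull down. A real application would substitute a proper GO semantic similarity measure and a statistical association test for the two Jaccard-based placeholders.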

    Language: English
    Title of host publication: Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2012
    Pages: 437-440
    Number of pages: 4
    DOIs: https://doi.org/10.1109/BIBM.2012.6392718
    Publication status: Published - 1 Dec 2012
    Event: 2012 IEEE International Conference on Bioinformatics and Biomedicine, BIBM2012 - Philadelphia, PA, United States
    Duration: 4 Oct 2012 - 7 Oct 2012

    Conference

    Conference: 2012 IEEE International Conference on Bioinformatics and Biomedicine, BIBM2012
    Country: United States
    City: Philadelphia, PA
    Period: 4/10/12 - 7/10/12

    Fingerprint

    Semantics
    Purification
    Mass spectrometry
    Cluster Analysis
    Proteins
    Seed
    Clustering algorithms

    Keywords

    • Affinity purification/mass spectrometry (AP-MS)
    • Gene Ontology
    • Protein complexes
    • Protein-protein interactions
    • Semantic Similarity

    Cite this

    Cai, Bingjing ; Wang, Haiying ; Zheng, Huiru ; Wang, Hui. / Incorporating semantic similarity into clustering process for identifying protein complexes from affinity purification/mass spectrometry data. Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2012. 2012. pp. 437-440
    @inproceedings{2618a636f26649cbbeb5cb5c5a98a372,
    title = "Incorporating semantic similarity into clustering process for identifying protein complexes from affinity purification/mass spectrometry data",
    abstract = "This paper presents a framework for incorporating semantic similarities into the detection of protein complexes from Affinity Purification/Mass Spectrometry (AP-MS) data. AP-MS data is modeled as a bipartite network in which one set of nodes consists of bait proteins and the other of prey proteins. Pairwise similarities between bait proteins are computed by combining topology-based similarities with functional semantic similarities. A hierarchical clustering algorithm is then applied to obtain 'seed clusters' consisting of bait proteins. Starting from these seed clusters, an expansion process recruits prey proteins that are significantly associated with the bait proteins, producing the final set of identified protein complexes. On real AP-MS datasets, the biological significance of the predicted complexes is validated against curated protein complexes using six statistical metrics. The results show that integrating semantic similarities into the clustering process greatly improves the accuracy of complex identification, and that the proposed framework yields better clustering results than several existing clustering methods.",
    keywords = "Affinity purification/mass spectrometry (AP-MS), Gene Ontology, Protein complexes, Protein-protein interactions, Semantic Similarity",
    author = "Bingjing Cai and Haiying Wang and Huiru Zheng and Hui Wang",
    year = "2012",
    month = "12",
    day = "1",
    doi = "10.1109/BIBM.2012.6392718",
    language = "English",
    isbn = "9781467325585",
    pages = "437--440",
    booktitle = "Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2012",

    }

    Cai, B, Wang, H, Zheng, H & Wang, H 2012, Incorporating semantic similarity into clustering process for identifying protein complexes from affinity purification/mass spectrometry data. in Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2012, 6392718, pp. 437-440, 2012 IEEE International Conference on Bioinformatics and Biomedicine, BIBM2012, Philadelphia, PA, United States, 4/10/12. https://doi.org/10.1109/BIBM.2012.6392718

    TY - GEN

    T1 - Incorporating semantic similarity into clustering process for identifying protein complexes from affinity purification/mass spectrometry data

    AU - Cai, Bingjing

    AU - Wang, Haiying

    AU - Zheng, Huiru

    AU - Wang, Hui

    PY - 2012/12/1

    Y1 - 2012/12/1

    AB - This paper presents a framework for incorporating semantic similarities into the detection of protein complexes from Affinity Purification/Mass Spectrometry (AP-MS) data. AP-MS data is modeled as a bipartite network in which one set of nodes consists of bait proteins and the other of prey proteins. Pairwise similarities between bait proteins are computed by combining topology-based similarities with functional semantic similarities. A hierarchical clustering algorithm is then applied to obtain 'seed clusters' consisting of bait proteins. Starting from these seed clusters, an expansion process recruits prey proteins that are significantly associated with the bait proteins, producing the final set of identified protein complexes. On real AP-MS datasets, the biological significance of the predicted complexes is validated against curated protein complexes using six statistical metrics. The results show that integrating semantic similarities into the clustering process greatly improves the accuracy of complex identification, and that the proposed framework yields better clustering results than several existing clustering methods.

    KW - Affinity purification/mass spectrometry (AP-MS)

    KW - Gene Ontology

    KW - Protein complexes

    KW - Protein-protein interactions

    KW - Semantic Similarity

    UR - http://www.scopus.com/inward/record.url?scp=84872568304&partnerID=8YFLogxK

    U2 - 10.1109/BIBM.2012.6392718

    DO - 10.1109/BIBM.2012.6392718

    M3 - Conference contribution

    SN - 9781467325585

    SP - 437

    EP - 440

    BT - Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2012

    ER -