P-found: Grid-enabling distributed repositories of protein folding and unfolding simulations for data mining

Martin Swain, Cândida Silva, Nuno Loureiro-Ferreira, Vitaliy Ostropytskyy, João Brito, Olivier Riche, Frederick Stahl, Werner Dubitzky, Rui M M Brito

    Research output: Contribution to journal (Article)

    5 Citations (Scopus)

    Abstract

    The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories -- this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.
    Language: English
    Pages: 424-433
    Journal: Future Generation Computer Systems
    Volume: 26
    Issue number: 3
    DOI: https://doi.org/10.1016/j.future.2009.08.008
    Publication status: Published - Mar 2010

    Fingerprint

    Protein folding
    Data warehouses
    Data mining
    Information management

    Cite this

    Swain, M., Silva, C., Loureiro-Ferreira, N., Ostropytskyy, V., Brito, J., Riche, O., ... Brito, R. M. M. (2010). P-found: Grid-enabling distributed repositories of protein folding and unfolding simulations for data mining. Future Generation Computer Systems, 26(3), 424-433. https://doi.org/10.1016/j.future.2009.08.008
    Swain, Martin ; Silva, Cândida ; Loureiro-Ferreira, Nuno ; Ostropytskyy, Vitaliy ; Brito, João ; Riche, Olivier ; Stahl, Frederick ; Dubitzky, Werner ; Brito, Rui M M. / P-found: Grid-enabling distributed repositories of protein folding and unfolding simulations for data mining. In: Future Generation Computer Systems. 2010 ; Vol. 26, No. 3. pp. 424-433.
    @article{bcbb6ea20a5c466789e2c91a94184bf8,
    title = "P-found: Grid-enabling distributed repositories of protein folding and unfolding simulations for data mining",
    abstract = "The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories -- this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.",
    author = "Martin Swain and C{\^a}ndida Silva and Nuno Loureiro-Ferreira and Vitaliy Ostropytskyy and Jo{\~a}o Brito and Olivier Riche and Frederick Stahl and Werner Dubitzky and Brito, {Rui M M}",
    year = "2010",
    month = "3",
    doi = "10.1016/j.future.2009.08.008",
    language = "English",
    volume = "26",
    pages = "424--433",
    journal = "Future Generation Computer Systems",
    issn = "0167-739X",
    publisher = "Elsevier",
    number = "3",

    }

    Swain, M, Silva, C, Loureiro-Ferreira, N, Ostropytskyy, V, Brito, J, Riche, O, Stahl, F, Dubitzky, W & Brito, RMM 2010, 'P-found: Grid-enabling distributed repositories of protein folding and unfolding simulations for data mining', Future Generation Computer Systems, vol. 26, no. 3, pp. 424-433. https://doi.org/10.1016/j.future.2009.08.008

    P-found: Grid-enabling distributed repositories of protein folding and unfolding simulations for data mining. / Swain, Martin; Silva, Cândida; Loureiro-Ferreira, Nuno; Ostropytskyy, Vitaliy; Brito, João; Riche, Olivier; Stahl, Frederick; Dubitzky, Werner; Brito, Rui M M.

    In: Future Generation Computer Systems, Vol. 26, No. 3, 03.2010, p. 424-433.

    TY - JOUR

    T1 - P-found: Grid-enabling distributed repositories of protein folding and unfolding simulations for data mining

    AU - Swain, Martin

    AU - Silva, Cândida

    AU - Loureiro-Ferreira, Nuno

    AU - Ostropytskyy, Vitaliy

    AU - Brito, João

    AU - Riche, Olivier

    AU - Stahl, Frederick

    AU - Dubitzky, Werner

    AU - Brito, Rui M M

    PY - 2010/3

    Y1 - 2010/3

    N2 - The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories -- this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.

    AB - The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories -- this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.

    U2 - 10.1016/j.future.2009.08.008

    DO - 10.1016/j.future.2009.08.008

    M3 - Article

    VL - 26

    SP - 424

    EP - 433

    JO - Future Generation Computer Systems

    T2 - Future Generation Computer Systems

    JF - Future Generation Computer Systems

    SN - 0167-739X

    IS - 3

    ER -