Knowledge discovery in biology and biotechnology texts: a review of techniques, evaluation strategies, and applications.

Daniel Berrar, Catherine Hack, Werner Dubitzky

    Research output: Contribution to journal > Article

    22 Citations (Scopus)

    Abstract

    Arguably, the richest source of knowledge (as opposed to fact and data collections) about biology and biotechnology is captured in natural-language documents such as technical reports, conference proceedings and research articles. The automatic exploitation of this rich knowledge base for decision making, hypothesis management (generation and testing) and knowledge discovery constitutes a formidable challenge. Recently, a set of technologies collectively referred to as knowledge discovery in text (KDT) has been advocated as a promising approach to tackle this challenge. KDT comprises three main tasks: information retrieval, information extraction and text mining. These tasks are the focus of much recent scientific research and many algorithms have been developed and applied to documents and text in biology and biotechnology. This article introduces the basic concepts of KDT, provides an overview of some of these efforts in the field of bioscience and biotechnology, and presents a framework of commonly used techniques for evaluating KDT methods, tools and systems.
    Language: English
    Pages: 31-52
    Journal: Critical Reviews in Biotechnology
    Volume: 25
    Issue number: 1-2
    Publication status: Published - 2005

    Fingerprint

    Biotechnology
    Information Storage and Retrieval
    Data Mining
    Knowledge Bases
    Research
    Decision Making
    Language
    Technology

    Cite this

    @article{4c29835dbb2248fab9dd89105dbed03c,
    title = "Knowledge discovery in biology and biotechnology texts: a review of techniques, evaluation strategies, and applications.",
    abstract = "Arguably, the richest source of knowledge (as opposed to fact and data collections) about biology and biotechnology is captured in natural-language documents such as technical reports, conference proceedings and research articles. The automatic exploitation of this rich knowledge base for decision making, hypothesis management (generation and testing) and knowledge discovery constitutes a formidable challenge. Recently, a set of technologies collectively referred to as knowledge discovery in text (KDT) has been advocated as a promising approach to tackle this challenge. KDT comprises three main tasks: information retrieval, information extraction and text mining. These tasks are the focus of much recent scientific research and many algorithms have been developed and applied to documents and text in biology and biotechnology. This article introduces the basic concepts of KDT, provides an overview of some of these efforts in the field of bioscience and biotechnology, and presents a framework of commonly used techniques for evaluating KDT methods, tools and systems.",
    author = "Daniel Berrar and Catherine Hack and Werner Dubitzky",
    year = "2005",
    language = "English",
    volume = "25",
    pages = "31--52",
    journal = "Critical Reviews in Biotechnology",
    issn = "0738-8551",
    number = "1-2",

    }

    Knowledge discovery in biology and biotechnology texts: a review of techniques, evaluation strategies, and applications. / Berrar, Daniel; Hack, Catherine; Dubitzky, Werner.

    In: Critical Reviews in Biotechnology, Vol. 25, No. 1-2, 2005, p. 31-52.

    TY - JOUR

    T1 - Knowledge discovery in biology and biotechnology texts: a review of techniques, evaluation strategies, and applications.

    AU - Berrar, Daniel

    AU - Hack, Catherine

    AU - Dubitzky, Werner

    PY - 2005

    Y1 - 2005

    N2 - Arguably, the richest source of knowledge (as opposed to fact and data collections) about biology and biotechnology is captured in natural-language documents such as technical reports, conference proceedings and research articles. The automatic exploitation of this rich knowledge base for decision making, hypothesis management (generation and testing) and knowledge discovery constitutes a formidable challenge. Recently, a set of technologies collectively referred to as knowledge discovery in text (KDT) has been advocated as a promising approach to tackle this challenge. KDT comprises three main tasks: information retrieval, information extraction and text mining. These tasks are the focus of much recent scientific research and many algorithms have been developed and applied to documents and text in biology and biotechnology. This article introduces the basic concepts of KDT, provides an overview of some of these efforts in the field of bioscience and biotechnology, and presents a framework of commonly used techniques for evaluating KDT methods, tools and systems.

    M3 - Article

    VL - 25

    SP - 31

    EP - 52

    JO - Critical Reviews in Biotechnology

    T2 - Critical Reviews in Biotechnology

    JF - Critical Reviews in Biotechnology

    SN - 0738-8551

    IS - 1-2

    ER -