An approach for measuring semantic similarity between words using multiple information sources

Yuhua Li, Zuhair Bandar, David McLean

    Research output: Contribution to journal (Article)

    797 Citations (Scopus)

    Abstract

    Semantic similarity between words is becoming a generic problem for many applications of computational linguistics and artificial intelligence. This paper explores the determination of semantic similarity by a number of information sources, which consist of structural semantic information from a lexical taxonomy and information content from a corpus. To investigate how information sources could be used effectively, a variety of strategies for using various possible information sources are implemented. A new measure is then proposed which combines information sources nonlinearly. Experimental evaluation against a benchmark set of human similarity ratings demonstrates that the proposed measure significantly outperforms traditional similarity measures.
    Language: English
    Pages: 871-882
    Journal: IEEE Transactions on Knowledge and Data Engineering
    Volume: 15
    Issue number: 4
    Publication status: Published - Jul 2003

    Fingerprint

    Semantics
    Computational linguistics
    Taxonomies
    Artificial intelligence

    Cite this

    @article{74ec2f2034f146d6b6229a1bf29b61af,
    title = "An approach for measuring semantic similarity between words using multiple information sources",
    abstract = "Semantic similarity between words is becoming a generic problem for many applications of computational linguistics and artificial intelligence. This paper explores the determination of semantic similarity by a number of information sources, which consist of structural semantic information from a lexical taxonomy and information content from a corpus. To investigate how information sources could be used effectively, a variety of strategies for using various possible information sources are implemented. A new measure is then proposed which combines information sources nonlinearly. Experimental evaluation against a benchmark set of human similarity ratings demonstrates that the proposed measure significantly outperforms traditional similarity measures.",
    author = "Yuhua Li and Zuhair Bandar and David McLean",
    note = "This paper rigorously investigates the contributions of different information sources to similarity between words. It presents word similarity measures that nonlinearly combine structural semantic information from a lexical taxonomy and information content from a corpus. Our approach outperforms previously published measures: the best published correlation against the Rubenstein-Goodenough benchmark set of human similarity ratings for word pairs was 0.8484, whilst ours is 0.8914. As of Jan 2010, the paper had been cited over 70 times (SCI) and 300 times (Google Scholar), selected as advanced reading material in CIS526 Machine Learning, Temple University, Philadelphia, and adopted by other researchers in the development of real systems, e.g., Bibster - a semantics-based bibliographic peer-to-peer system.",
    year = "2003",
    month = "7",
    language = "English",
    volume = "15",
    pages = "871--882",
    journal = "IEEE Transactions on Knowledge and Data Engineering",
    issn = "1041-4347",
    number = "4",

    }

    An approach for measuring semantic similarity between words using multiple information sources. / Li, Yuhua; Bandar, Zuhair; McLean, David.

    In: IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 4, 07.2003, p. 871-882.

    TY - JOUR

    T1 - An approach for measuring semantic similarity between words using multiple information sources

    AU - Li, Yuhua

    AU - Bandar, Zuhair

    AU - McLean, David

    N1 - This paper rigorously investigates the contributions of different information sources to similarity between words. It presents word similarity measures that nonlinearly combine structural semantic information from a lexical taxonomy and information content from a corpus. Our approach outperforms previously published measures: the best published correlation against the Rubenstein-Goodenough benchmark set of human similarity ratings for word pairs was 0.8484, whilst ours is 0.8914. As of Jan 2010, the paper had been cited over 70 times (SCI) and 300 times (Google Scholar), selected as advanced reading material in CIS526 Machine Learning, Temple University, Philadelphia, and adopted by other researchers in the development of real systems, e.g., Bibster - a semantics-based bibliographic peer-to-peer system.

    PY - 2003/7

    Y1 - 2003/7

    AB - Semantic similarity between words is becoming a generic problem for many applications of computational linguistics and artificial intelligence. This paper explores the determination of semantic similarity by a number of information sources, which consist of structural semantic information from a lexical taxonomy and information content from a corpus. To investigate how information sources could be used effectively, a variety of strategies for using various possible information sources are implemented. A new measure is then proposed which combines information sources nonlinearly. Experimental evaluation against a benchmark set of human similarity ratings demonstrates that the proposed measure significantly outperforms traditional similarity measures.

    UR - http://computer.org/tkde/

    M3 - Article

    VL - 15

    SP - 871

    EP - 882

    JO - IEEE Transactions on Knowledge and Data Engineering

    T2 - IEEE Transactions on Knowledge and Data Engineering

    JF - IEEE Transactions on Knowledge and Data Engineering

    SN - 1041-4347

    IS - 4

    ER -