Abstract
Semantic similarity between words is becoming a generic problem for many applications of computational linguistics and artificial intelligence. This paper explores how semantic similarity can be determined from several information sources: structural semantic information from a lexical taxonomy and information content from a corpus. To investigate how these sources can be used effectively, a variety of strategies for using them are implemented. A new measure is then proposed that combines the information sources nonlinearly. Experimental evaluation against a benchmark set of human similarity ratings demonstrates that the proposed measure significantly outperforms traditional similarity measures.
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 871-882 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Volume | 15 |
| Issue number | 4 |
| Publication status | Published (in print/issue) - Jul 2003 |