Semantic similarity between words is becoming a generic problem for many applications of computational linguistics and artificial intelligence. This paper explores the determination of semantic similarity by a number of information sources, which consist of structural semantic information from a lexical taxonomy and information content from a corpus. To investigate how information sources could be used effectively, a variety of strategies for using various possible information sources are implemented. A new measure is then proposed which combines information sources nonlinearly. Experimental evaluation against a benchmark set of human similarity ratings demonstrates that the proposed measure significantly outperforms traditional similarity measures.
|Journal||IEEE Transactions on Knowledge and Data Engineering|
|Publication status||Published - Jul 2003|
Li, Y., Bandar, Z., & McLean, D. (2003). An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering, 15(4), 871-882. http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/trans/tk/&toc=comp/trans/tk/2003/04/k4toc.xml&DOI=10.1109/TKDE.2003.1209005