Abstract
Vector Space Models (VSM) and neural word embeddings are core components in recent Machine Learning (ML) and Natural Language Processing (NLP) pipelines. By encoding words, sentences and documents as high-dimensional vectors via distributional semantics, they enable Information Retrieval (IR) systems to capture semantic relatedness between queries and answers. This paper compares different semantic representation strategies for query-statement matching, evaluating paraphrase identification within an IR framework using partial and syntactically varied queries of different lengths. Motivated by the Word Mover’s Distance (WMD) model, similarity is evaluated using the distance between individual words of queries and statements, as opposed to the common similarity measure of centroids of neural word embeddings. Results from ranked query and response statements demonstrate significant gains in accuracy using the combined approach of similarity ranking through WMD with the word embedding techniques. Our top-performing WMD + GloVe system consistently outperformed Doc2Vec and an LSA baseline across three return-rate thresholds, achieving 100% correct matches within the top-3 ranked results and 89.83% top-1 accuracy. Beyond the substantial gains from WMD-based similarity ranking, our results indicate that large, pre-trained word embeddings, trained on vast amounts of data, result in portable, domain-agnostic language processing solutions suitable for diverse business use cases.
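The contrast drawn above, between comparing embedding centroids and comparing individual words, can be illustrated with a minimal sketch. It is not the paper's implementation. The two-dimensional vectors are invented toy values rather than GloVe embeddings, and all function names are hypothetical. The word-level score is the relaxed form of WMD, in which each word moves all of its weight to its nearest counterpart; this gives a lower bound on the full WMD, which would require an optimal-transport solver.

```python
import math

# Toy 2-D embeddings invented for illustration only (assumption: not GloVe).
TOY_EMBEDDINGS = {
    "cancel": (1.0, 0.0),
    "stop": (0.9, 0.1),
    "order": (0.0, 1.0),
    "purchase": (0.1, 0.9),
    "check": (-1.0, 0.0),
    "forecast": (0.0, -1.0),
}


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(words):
    """Mean of the word vectors: the common centroid representation."""
    vectors = [TOY_EMBEDDINGS[w] for w in words]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def one_sided_cost(source, target):
    """Average distance from each source word to its nearest target word."""
    return sum(
        min(euclidean(TOY_EMBEDDINGS[s], TOY_EMBEDDINGS[t]) for t in target)
        for s in source
    ) / len(source)


def relaxed_wmd(query, statement):
    """Relaxed WMD with uniform word weights: the larger of the two
    one-sided costs, which is a lower bound on the full WMD."""
    return max(one_sided_cost(query, statement),
               one_sided_cost(statement, query))


def rank_statements(query, statements):
    """Order candidate statements from smallest to largest distance."""
    return sorted(statements, key=lambda s: relaxed_wmd(query, s))


query = ["cancel", "order"]
statements = [["check", "forecast"], ["stop", "purchase"]]
ranked = rank_statements(query, statements)

for statement in ranked:
    print(statement,
          "relaxed WMD:", round(relaxed_wmd(query, statement), 4),
          "centroid cosine:", round(cosine(centroid(query), centroid(statement)), 4))
```

In this toy setup the paraphrase shares no words with the query yet ranks first, because each query word has a close neighbour in the statement. With real pre-trained embeddings the same ranking loop would apply, with the distance computed by a full WMD solver instead of the relaxation.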
| Original language | English |
|---|---|
| Pages (from-to) | 51-66 |
| Number of pages | 15 |
| Journal | Digital Technologies Research and Applications |
| Volume | 4 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published (in print/issue) - 11 Oct 2025 |
Data Availability Statement
No new data were generated or analyzed in this study. All data used are from publicly available sources cited within the manuscript.
Funding
This work received no external funding.
Keywords
- Semantic Information Retrieval
- Word embeddings
- document similarity measure
- Query-statement Matching
- GloVe
- WMD
Fingerprint
Dive into the research topics of 'Evaluating Semantic Representation Strategies for Robust Information Retrieval Matching'. Together they form a unique fingerprint.