Query Representation through Lexical Association for Information Retrieval

Pawan Goyal, Laxmidhar Behera, Martin McGinnity

    Research output: Contribution to journal (Article)

    5 Citations (Scopus)

    Abstract

    A user query for information retrieval (IR) applications may not contain the most appropriate terms (words) as actually intended by the user. This is usually referred to as the term mismatch problem and is a crucial research issue in IR. Using the notion of relevance, we provide a comprehensive theoretical analysis of a parametric query vector, which is assumed to represent the information needs of the user. A lexical association function has been derived analytically using the system relevance criteria. The derivation is further justified using empirical evidence from the user relevance criteria. The analytical derivation presented in this paper provides a proper mathematical framework for query expansion techniques, which have largely been heuristic in the existing literature. By using the generalized retrieval framework, the proposed query representation model is equally applicable to the vector space model (VSM), Okapi best matching 25 (Okapi BM25), and the Language Model (LM). Experiments over various datasets from TREC show that the proposed query representation gives statistically significant improvements over the baseline Okapi BM25 and LM, as well as over other well-known global query expansion techniques. The empirical results, along with the theoretical foundations of the query representation, confirm that the proposed model extends the state of the art in global query expansion.
    Original language: English
    Pages (from-to): 2260-2273
    Journal: IEEE Transactions on Knowledge and Data Engineering
    Volume: 24
    Issue number: 12
    Publication status: Published - Dec 2012

    Keywords

    • Information Retrieval
    • Lexical Association
    • Query Expansion
    • Language Model
