A Novel Neighborhood Based Document Smoothing Model for Information Retrieval

Pawan Goyal, Laxmidhar Behera, TM McGinnity

    Research output: Contribution to journal, Article

    5 Citations (Scopus)

    Abstract

    This paper proposes a novel neighborhood-based document smoothing model for information retrieval. Lexical association between terms is used to assign a context-sensitive indexing weight to each document term; that is, term weights are redistributed according to their lexical association with the context words. The paper also presents a generalized retrieval framework and shows that the vector space model (VSM), divergence from randomness (DFR), Okapi Best Matching 25 (BM25) and language model (LM) retrieval frameworks are special cases of it. Because the smoothing model is formulated within this generalized framework, it applies to all indexing models that use the term-document frequency scheme. At runtime, the proposed smoothing model is as efficient as the baseline retrieval frameworks. Experiments on the TREC datasets show that it consistently improves the retrieval performance of VSM, DFR, BM25 and LM, and that the improvements are statistically significant.
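    The abstract does not give the smoothing formula, so the following is only a minimal sketch of the general idea of redistributing term weights by lexical association, not the authors' model. The function name `redistribute_weights`, the mixing parameter `lam`, and the dictionary-based association table are all assumptions made for illustration: each term keeps a share `1 - lam` of its weight and passes the rest to the other terms of the same document in proportion to their association scores.

```python
def redistribute_weights(tf, assoc, lam=0.3):
    """Redistribute term weights within one document (illustrative only).

    tf    -- dict mapping term -> raw weight (e.g. term frequency)
    assoc -- dict mapping (source_term, target_term) -> association score >= 0
    lam   -- hypothetical fraction of each term's weight that is shared
    """
    # Every term first keeps (1 - lam) of its own weight.
    smoothed = {t: (1.0 - lam) * w for t, w in tf.items()}
    for u, w in tf.items():
        # Association of u with every other term that occurs in the document.
        neighbours = {t: assoc.get((u, t), 0.0) for t in tf if t != u}
        total = sum(neighbours.values())
        if total == 0.0:
            # No associated context words: the term keeps its full weight.
            smoothed[u] += lam * w
            continue
        for t, a in neighbours.items():
            # Pass lam * w to the neighbours, proportionally to association.
            smoothed[t] += lam * w * a / total
    return smoothed
```

    Under these assumptions the total weight of the document is preserved, so the smoothed weights could in principle be fed to any scoring function that consumes term-document weights.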
    Language: English
    Pages: 391-425
    Journal: Information Retrieval
    Volume: 16
    Issue number: 3
    DOI: 10.1007/s10791-012-9202-3
    Publication status: Published - 1 Jun 2013

    Fingerprint

    Information retrieval
    Vector spaces
    indexing
    divergence
    language
    experiment

    Cite this

    Goyal, Pawan; Behera, Laxmidhar; McGinnity, TM. / A Novel Neighborhood Based Document Smoothing Model for Information Retrieval. In: Information Retrieval. 2013; Vol. 16, No. 3. pp. 391-425.
    @article{179305e3e83d409e8142b0a93824228f,
    title = "A Novel Neighborhood Based Document Smoothing Model for Information Retrieval",
    abstract = "In this paper, a novel neighborhood based document smoothing model for information retrieval has been proposed. Lexical association between terms is used to provide a context sensitive indexing weight to the document terms, i.e. the term weights are redistributed based on the lexical association with the context words. A generalized retrieval framework has been presented and it has been shown that the vector space model (VSM), divergence from randomness (DFR), Okapi Best Matching 25 (BM25) and the language model (LM) based retrieval frameworks are special cases of this generalized framework. Being proposed in the generalized retrieval framework, the neighborhood based document smoothing model is applicable to all the indexing models that use the term-document frequency scheme. The proposed smoothing model is as efficient as the baseline retrieval frameworks at runtime. Experiments over the TREC datasets show that the neighborhood based document smoothing model consistently improves the retrieval performance of VSM, DFR, BM25 and LM and the improvements are statistically significant.",
    author = "Pawan Goyal and Laxmidhar Behera and TM McGinnity",
    year = "2013",
    month = "6",
    day = "1",
    doi = "10.1007/s10791-012-9202-3",
    language = "English",
    volume = "16",
    pages = "391--425",
    journal = "Information Retrieval",
    issn = "1386-4564",
    number = "3",
    }



    TY  - JOUR
    T1  - A Novel Neighborhood Based Document Smoothing Model for Information Retrieval
    AU  - Goyal, Pawan
    AU  - Behera, Laxmidhar
    AU  - McGinnity, TM
    PY  - 2013/6/1
    Y1  - 2013/6/1
    AB  - In this paper, a novel neighborhood based document smoothing model for information retrieval has been proposed. Lexical association between terms is used to provide a context sensitive indexing weight to the document terms, i.e. the term weights are redistributed based on the lexical association with the context words. A generalized retrieval framework has been presented and it has been shown that the vector space model (VSM), divergence from randomness (DFR), Okapi Best Matching 25 (BM25) and the language model (LM) based retrieval frameworks are special cases of this generalized framework. Being proposed in the generalized retrieval framework, the neighborhood based document smoothing model is applicable to all the indexing models that use the term-document frequency scheme. The proposed smoothing model is as efficient as the baseline retrieval frameworks at runtime. Experiments over the TREC datasets show that the neighborhood based document smoothing model consistently improves the retrieval performance of VSM, DFR, BM25 and LM and the improvements are statistically significant.
    DO  - 10.1007/s10791-012-9202-3
    M3  - Article
    VL  - 16
    SP  - 391
    EP  - 425
    JO  - Information Retrieval
    T2  - Information Retrieval
    JF  - Information Retrieval
    SN  - 1386-4564
    IS  - 3
    ER  -