A Context-based Word Indexing Model for Document Summarization

Pawan Goyal, Laxmidhar Behera, TM McGinnity

    Research output: Contribution to journal › Article

    31 Citations (Scopus)

    Abstract

    Existing models for document summarization mostly use the similarity between sentences in the document to extract the most salient sentences. The documents, as well as the sentences, are indexed using traditional term indexing measures, which do not take the context into consideration. Therefore, the sentence similarity values remain independent of the context. In this paper, we propose a context-sensitive document indexing model based on the Bernoulli model of randomness. The Bernoulli model of randomness has been used to find the probability of the co-occurrences of two terms in a large corpus. A new approach using the lexical association between terms to give a context-sensitive weight to the document terms has been proposed. The resulting indexing weights are used to compute the sentence similarity matrix. The proposed sentence similarity measure has been used with the baseline graph-based ranking models for sentence extraction. Experiments have been conducted over the benchmark DUC data sets, and the results show that the proposed Bernoulli-based sentence similarity model provides consistent improvements over the baseline IntraLink and UniformLink methods.
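    The pipeline the abstract describes — lexical association between terms, context-sensitive term weights, a sentence similarity matrix, and graph-based ranking — can be sketched as follows. This is a minimal illustration, not the paper's method: the PMI-style association used here is a simplified stand-in for the Bernoulli model of randomness, degree centrality stands in for the IntraLink/UniformLink ranking models, and all function names are hypothetical.

    ```python
    import math
    from collections import Counter
    from itertools import combinations

    def association(sentences):
        """PMI-style lexical association between term pairs: observed
        co-occurrence vs. what independence would predict (a simplified
        stand-in for the paper's Bernoulli model of randomness)."""
        n = len(sentences)
        term_df, pair_df = Counter(), Counter()
        for s in sentences:
            terms = set(s.lower().split())
            term_df.update(terms)
            pair_df.update(frozenset(p) for p in combinations(sorted(terms), 2))
        assoc = {}
        for pair, observed in pair_df.items():
            t, u = tuple(pair)
            expected = term_df[t] * term_df[u] / n  # co-occurrences expected under independence
            assoc[pair] = math.log(1 + observed / expected)
        return assoc

    def context_weights(sentence_terms, doc_terms, assoc):
        """Context-sensitive weight: a term weighs more when it is strongly
        associated with the other terms of the document."""
        return {t: 1.0 + sum(assoc.get(frozenset((t, u)), 0.0)
                             for u in doc_terms if u != t)
                for t in sentence_terms}

    def cosine(a, b):
        """Cosine similarity between two sparse term-weight vectors."""
        num = sum(a[t] * b[t] for t in set(a) & set(b))
        den = (math.sqrt(sum(v * v for v in a.values()))
               * math.sqrt(sum(v * v for v in b.values())))
        return num / den if den else 0.0

    def summarize(sentences, k=2):
        """Score sentences by how similar they are to the rest of the
        document (degree centrality over the similarity matrix, a simple
        proxy for graph-based ranking) and extract the top k."""
        assoc = association(sentences)
        doc_terms = {t for s in sentences for t in s.lower().split()}
        vectors = [context_weights(set(s.lower().split()), doc_terms, assoc)
                   for s in sentences]
        scores = [sum(cosine(v, u) for j, u in enumerate(vectors) if j != i)
                  for i, v in enumerate(vectors)]
        ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
        return [sentences[i] for i in sorted(ranked[:k])]
    ```

    On a toy document, the two mutually similar sentences outrank an off-topic one, which is the intuition behind centrality-based extraction.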
    Language: English
    Pages: 1693-1705
    Journal: IEEE Transactions on Knowledge and Data Engineering
    Volume: 25
    Issue number: 8
    DOI: 10.1109/TKDE.2012.114
    Publication status: Published - Aug 2013

    Cite this

    @article{9b76980604ce409498b414ab747cd5cc,
    title = "A Context-based Word Indexing Model for Document Summarization",
    abstract = "Existing models for document summarization mostly use the similarity between sentences in the document to extract the most salient sentences. The documents as well as the sentences are indexed using traditional term indexing measures, which do not take the context into consideration. Therefore, the sentence similarity values remain independent of the context. In this paper, we propose a context sensitive document indexing model based on the Bernoulli model of randomness. The Bernoulli model of randomness has been used to find the probability of the cooccurrences of two terms in a large corpus. A new approach using the lexical association between terms to give a context sensitive weight to the document terms has been proposed. The resulting indexing weights are used to compute the sentence similarity matrix. The proposed sentence similarity measure has been used with the baseline graph-based ranking models for sentence extraction. Experiments have been conducted over the benchmark DUC data sets and it has been shown that the proposed Bernoulli-based sentence similarity model provides consistent improvements over the baseline IntraLink and UniformLink methods.",
    author = "Pawan Goyal and Laxmidhar Behera and TM McGinnity",
    year = "2013",
    month = "8",
    doi = "10.1109/TKDE.2012.114",
    language = "English",
    volume = "25",
    pages = "1693--1705",
    journal = "IEEE Transactions on Knowledge and Data Engineering",
    issn = "1041-4347",
    number = "8",

    }

    A Context-based Word Indexing Model for Document Summarization. / Goyal, Pawan; Behera, Laxmidhar; McGinnity, TM.

    In: IEEE Transactions on Knowledge and Data Engineering, Vol. 25, No. 8, 08.2013, p. 1693-1705.

