Query Representation through Lexical Association for Information Retrieval

Pawan Goyal, Laxmidhar Behera, Martin McGinnity

    Research output: Contribution to journal › Article

    5 Citations (Scopus)

    Abstract

    A user query for information retrieval (IR) applications may not contain the most appropriate terms (words) for the information the user actually intends. This is usually referred to as the term mismatch problem and is a crucial research issue in IR. Using the notion of relevance, we provide a comprehensive theoretical analysis of a parametric query vector, which is assumed to represent the information needs of the user. A lexical association function is derived analytically from system relevance criteria, and the derivation is further justified with empirical evidence from user relevance criteria. Such an analytical derivation provides a proper mathematical framework for query expansion techniques, which have largely been heuristic in the existing literature. Because it uses a generalized retrieval framework, the proposed query representation model is equally applicable to the vector space model (VSM), Okapi best matching 25 (Okapi BM25), and the language model (LM). Experiments over various TREC datasets show that the proposed query representation gives statistically significant improvements over the baseline Okapi BM25 and LM, as well as over other well-known global query expansion techniques. These empirical results, together with the theoretical foundations of the query representation, confirm that the proposed model extends the state of the art in global query expansion.
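    The abstract describes expanding a parametric query vector with lexically associated terms, but it does not state the derived association function. The following Python is therefore only a minimal sketch of generic global query expansion, under these assumptions: association is a cosine over the sets of documents in which two terms co-occur; `k` (number of expansion terms) and `beta` (expansion weight) are hypothetical parameters; and scoring uses Okapi BM25 with the non-negative `log(1 + ...)` IDF variant. All helper names are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def term_doc_sets(docs):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, doc in enumerate(docs):
        for term in set(doc):
            index.setdefault(term, set()).add(doc_id)
    return index


def association(index, a, b):
    """Cosine association between two terms' document sets.

    Hypothetical stand-in: NOT the lexical association function derived
    in the paper, only a common co-occurrence heuristic.
    """
    docs_a, docs_b = index.get(a, set()), index.get(b, set())
    if not docs_a or not docs_b:
        return 0.0
    return len(docs_a & docs_b) / math.sqrt(len(docs_a) * len(docs_b))


def expand_query(query_terms, index, k=2, beta=0.5):
    """Build a weighted query vector from the original terms plus expansions.

    Original terms get weight 1.0; the k non-query terms with the largest
    summed association to the query get weight beta * (summed association).
    """
    weights = {term: 1.0 for term in query_terms}
    candidates = Counter()
    for term in index:
        if term in weights:
            continue
        score = sum(association(index, q, term) for q in query_terms)
        if score > 0:
            candidates[term] = score
    for term, score in candidates.most_common(k):
        weights[term] = beta * score
    return weights


def bm25_scores(query_weights, docs, k1=1.2, b=0.75):
    """Okapi BM25 score per document; each term's contribution is
    multiplied by its query weight. Uses the log(1 + ...) IDF variant."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term, weight in query_weights.items():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            tf_part = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += weight * idf * tf_part
        scores.append(score)
    return scores


# Toy corpus: document 2 is about cars but never uses the word "car".
docs = [tokenize(s) for s in [
    "car automobile",
    "car automobile engine",
    "automobile engine repair",
    "pasta recipes",
]]
index = term_doc_sets(docs)
expanded = expand_query(["car"], index, k=1)
print(expanded)                         # "automobile" is added with weight < 1
print(bm25_scores({"car": 1.0}, docs))  # document 2 scores 0
print(bm25_scores(expanded, docs))      # document 2 now scores > 0
```

    In this toy run the unexpanded query cannot match document 2 at all, which is the term mismatch problem in miniature; the expanded vector reaches it through "automobile". Summing association over all query terms is one simple way to favour candidates related to the query as a whole; the paper's analytically derived function should be consulted for the actual weighting.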
    Language: English
    Pages: 2260-2273
    Journal: IEEE Transactions on Knowledge and Data Engineering
    Volume: 24
    Issue number: 12
    DOI: 10.1109/TKDE.2011.171
    Publication status: Published - Dec 2012

    Keywords

    • Information Retrieval
    • Lexical Association
    • Query Expansion
    • Language Model

    Cite this

    @article{85cb96f4549a4605920931fc476c1969,
    title = "Query Representation through Lexical Association for Information Retrieval",
    keywords = "Information Retrieval, Lexical Association, Query Expansion, Language Model",
    author = "Pawan Goyal and Laxmidhar Behera and Martin McGinnity",
    year = "2012",
    month = "12",
    doi = "10.1109/TKDE.2011.171",
    language = "English",
    volume = "24",
    pages = "2260--2273",
    journal = "IEEE Transactions on Knowledge and Data Engineering",
    issn = "1041-4347",
    number = "12",

    }

    Query Representation through Lexical Association for Information Retrieval. / Goyal, Pawan; Behera, Laxmidhar; McGinnity, Martin.

    In: IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 12, 12.2012, p. 2260-2273.

    TY - JOUR

    T1 - Query Representation through Lexical Association for Information Retrieval

    AU - Goyal, Pawan

    AU - Behera, Laxmidhar

    AU - McGinnity, Martin

    PY - 2012/12

    Y1 - 2012/12

    KW - Information Retrieval

    KW - Lexical Association

    KW - Query Expansion

    KW - Language Model

    U2 - 10.1109/TKDE.2011.171

    DO - 10.1109/TKDE.2011.171

    M3 - Article

    VL - 24

    SP - 2260

    EP - 2273

    JO - IEEE Transactions on Knowledge and Data Engineering

    T2 - IEEE Transactions on Knowledge and Data Engineering

    JF - IEEE Transactions on Knowledge and Data Engineering

    SN - 1041-4347

    IS - 12

    ER -