A geometric framework for data fusion in information retrieval

Shengli Wu, Fabio Crestani

Research output: Contribution to journal › Article › peer-review

25 Citations (Scopus)

Abstract

Data fusion in information retrieval has been investigated by many researchers, and a number of data fusion methods have been proposed. However, questions such as why data fusion can increase effectiveness, and under which conditions data fusion methods work well, are poorly resolved at best. In this paper, we formally describe data fusion under a geometric framework, in which each component result returned from an information retrieval system for a given query is represented as a point in a multi-dimensional space. The Euclidean distance is the measure by which the effectiveness and similarity of search results are judged. This allows us to explain all component results and fused results using geometrical principles. In such a framework, score-based data fusion becomes a deterministic problem. Several interesting features of the centroid-based data fusion method and the linear combination method are discussed. In retrieval evaluation, however, ranking-based measures are the most popular. Therefore, this paper investigates the relation and correlation between the Euclidean distance and several typical ranking-based measures. We find that a very strong correlation exists between them. This means that the theorems and observations obtained using the Euclidean distance remain valid when ranking-based measures are used. The proposed framework enables a better understanding of score-based data fusion and a more precise and effective use of score-based data fusion methods.
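
The geometric view described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each component result is a vector of normalised scores in [0, 1] over the same documents, and that the ideal result is the vector of binary relevance judgements. The function names, scores, and weights are hypothetical. The sketch relies only on a general property of the Euclidean norm (its convexity): the centroid of a set of points is never farther from a target point than the average distance of those points from the target.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Centroid-based fusion: the per-document mean of the component scores."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def linear_combination(points, weights):
    """Linear combination: a per-document weighted sum of the component scores."""
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*points)]

# Hypothetical data: four documents, three component retrieval systems.
# The ideal point is assumed to be the binary relevance vector.
ideal = [1.0, 0.0, 1.0, 0.0]
results = [
    [0.9, 0.4, 0.5, 0.1],
    [0.6, 0.1, 0.8, 0.5],
    [0.7, 0.6, 0.9, 0.2],
]

fused = centroid(results)
component_distances = [euclidean(r, ideal) for r in results]
fused_distance = euclidean(fused, ideal)
mean_component_distance = sum(component_distances) / len(component_distances)

# By convexity of the norm, the fused distance cannot exceed the mean distance.
print(fused_distance <= mean_component_distance)
```

With equal weights of 1/n, `linear_combination` yields the same point as `centroid`, so the centroid can be read as a special case of the linear combination in this sketch.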
Original language: English
Pages (from-to): 20-35
Journal: Information Systems
Volume: 50
Early online date: 12 Jan 2015
Publication status: Published (in print/issue) - 1 Jun 2015

Keywords

  • database searching
  • geometric modeling
  • information retrieval
  • data fusion
