Cost-effective data fusion in information retrieval

Research output: Contribution to journal › Article › peer-review

Abstract

Data fusion has demonstrated its effectiveness in enhancing information retrieval across various studies. However, advanced fusion methods typically require a dataset with extensive relevance judgments to train optimal model weights, necessitating labor-intensive and costly manual efforts. This study explores efficient methods for generating training data to facilitate affordable relevance judgments and improve fusion model quality. Experiments conducted on six datasets from TREC’s Precision Medicine and Deep Learning tracks reveal that with careful sampling design, near-optimal fusion weights can be achieved using only 5% of the documents compared to the full TREC judgments. This translates to a dataset comprising 20 queries and 500 relevance-judged documents in total. The findings highlight the potential for sophisticated fusion techniques to become more accessible to researchers and practitioners, delivering substantial performance improvements with minimal judgment effort and cost.
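As a rough illustration of the linear combination fusion named in the keywords, the sketch below fuses two hypothetical retrieval runs by a weighted sum of min-max normalized scores. The document IDs, scores, weights, and function names are all assumptions made for the example. The abstract does not specify how weights are trained from the sampled judgments, so the weights here are simply fixed by hand.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's raw scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: no ranking signal, so map everything to 0.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_combination(runs, weights):
    """Fuse several {doc_id: score} runs by a weighted sum of normalized scores."""
    fused = defaultdict(float)
    for run, w in zip(runs, weights):
        for doc, s in min_max_normalize(run).items():
            fused[doc] += w * s
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical toy runs for a single query (not data from the study).
run_a = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
run_b = {"d2": 0.9, "d3": 0.8, "d4": 0.1}

# Hand-picked weights; in practice these would be fitted on judged documents.
ranking = linear_combination([run_a, run_b], weights=[0.6, 0.4])
print(ranking)
```

In a setting like the one the abstract describes, the weights would instead be fitted on the relevance-judged sample (for instance 20 queries with about 25 judged documents each, giving the 500 documents mentioned), which is where the judgment cost arises.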
Original language: English
Pages (from-to): 1-25
Number of pages: 25
Journal: Knowledge and Information Systems
Early online date: 5 Jun 2025
Publication status: Published online - 5 Jun 2025

Bibliographical note

Publisher Copyright:
© The Author(s) 2025.

Data Access Statement

No datasets were generated or analyzed during the current study.

Keywords

  • Data fusion
  • Linear combination
  • Optimal model weights
  • Affordable relevance judgment
