Abstract
Data fusion has been shown in many studies to improve the effectiveness of information retrieval. However, advanced fusion methods typically require a dataset with extensive relevance judgments to train optimal model weights, and producing these judgments takes labor-intensive and costly manual effort. This study explores efficient ways of generating training data so that relevance judgments become affordable while fusion model quality improves. Experiments on six datasets from TREC's Precision Medicine and Deep Learning tracks show that, with a careful sampling design, near-optimal fusion weights can be obtained by judging only 5% of the documents in the full TREC judgment sets. This amounts to a training set of 20 queries and 500 relevance-judged documents in total. The findings suggest that sophisticated fusion techniques can become more accessible to researchers and practitioners, delivering substantial performance gains with minimal judgment effort and cost.
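The abstract names linear combination as the fusion model but does not give its details here. The following is a minimal sketch, assuming min-max score normalization, mean average precision (MAP) as the training objective, and an exhaustive grid search over weights that sum to 1. All function names are hypothetical, and this is not the authors' implementation. The `qrels` argument stands in for the small judged training set (for example, the 20 queries described above).

```python
from itertools import product

def min_max(scores):
    """Scale one run's {doc_id: score} map into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all scores equal: treat every document as top-scored
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def fuse(runs, weights):
    """Linear combination: weighted sum of normalized scores.
    A document missing from a run contributes 0 for that run."""
    fused = {}
    for run, w in zip(runs, weights):
        for d, s in min_max(run).items():
            fused[d] = fused.get(d, 0.0) + w * s
    # Highest fused score first; ties broken by doc_id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))

def average_precision(ranking, relevant):
    """Average precision of a ranked list against a set of relevant doc_ids."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def weight_grid(k, n):
    """All k-tuples of weights in steps of 1/n that sum to 1."""
    for combo in product(range(n + 1), repeat=k):
        if sum(combo) == n:
            yield tuple(c / n for c in combo)

def train_weights(queries, qrels, n=10):
    """Pick the weight vector maximizing MAP over the judged queries.
    queries: {qid: [run_1, run_2, ...]}, qrels: {qid: set of relevant doc_ids}."""
    k = len(next(iter(queries.values())))
    best, best_map = None, -1.0
    for w in weight_grid(k, n):
        m = sum(average_precision(fuse(queries[q], w), qrels[q])
                for q in queries) / len(queries)
        if m > best_map:  # keep the first weight vector reaching the best MAP
            best, best_map = w, m
    return best, best_map
```

For example, with two toy runs for a single query where only `d3` is relevant, `train_weights({"q1": [{"d1": 3.0, "d2": 2.0, "d3": 1.0}, {"d3": 0.9, "d1": 0.5, "d4": 0.1}]}, {"q1": {"d3"}})` favors the second run, which ranks `d3` first. The grid has roughly (n + 1)^k candidates, so exhaustive search is only practical for a handful of runs; the point of the sketch is that the judged set only needs to be large enough to rank these candidates reliably.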
Original language | English |
---|---|
Pages (from-to) | 1-25 |
Number of pages | 25 |
Journal | Knowledge and Information Systems |
Early online date | 5 Jun 2025 |
Publication status | Published online - 5 Jun 2025 |
Bibliographical note
Publisher Copyright: © The Author(s) 2025.
Data Access Statement
No datasets were generated or analyzed during the current study.
Keywords
- Data fusion
- Linear combination
- Optimal model weights
- Affordable relevance judgement
- Affordable relevance judgment