Subset selection based fusion for biomedical information retrieval tasks

Jiahui Sun, Shengli Wu, Xiangjun Shen, CD Nugent, Hu Lu

Research output: Contribution to journal › Article › peer-review

Abstract

This study aims to improve the effectiveness and efficiency of biomedical information retrieval by selecting an optimal subset of retrieval systems for data fusion. We propose three ranking-based subset selection methods: SFS (Sequential Forward Search), D&P (Diversity & Performance), and P&D (Performance & Diversity). These methods were applied in combination with the Reciprocal Rank Fusion (RRF) technique. Experiments were conducted on four medical datasets from TREC, using between 62 and 125 candidate retrieval systems and selecting up to 15 for fusion. The proposed subset selection methods significantly improved retrieval performance: fusing the selected systems with RRF yielded improvements ranging from 10% to over 60% compared with the best individual retrieval system across the datasets. The methods also outperformed the state-of-the-art technology by a large margin. In summary, our subset selection approach offers a practical and cost-efficient solution for biomedical information retrieval, achieving substantial performance gains while reducing computational overhead.
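
As a rough illustration of the two ideas the abstract combines, the sketch below fuses ranked lists with Reciprocal Rank Fusion and wraps it in a generic greedy forward selection loop. It is not the authors' implementation: the abstract does not specify how SFS, D&P, or P&D score candidates, so the function names (`rrf_fuse`, `precision_at`, `forward_select`), the constant `k=60` (a commonly used RRF default), the precision-based metric, and the stopping rule are all assumptions made for this example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists (best document first) with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that contain it.
    k=60 is a conventional default, assumed here rather than taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_at(ranking, relevant, n=3):
    """Hypothetical evaluation metric: fraction of the top n that is relevant."""
    return sum(1 for d in ranking[:n] if d in relevant) / n


def forward_select(systems, relevant, max_size=15, metric=precision_at):
    """Generic greedy forward selection over retrieval systems.

    systems maps a system name to its ranked list. At each step the system
    whose addition most improves the fused result is added; the loop stops
    when no addition improves the score or max_size is reached.
    """
    selected, best_score = [], float("-inf")
    remaining = set(systems)
    while remaining and len(selected) < max_size:
        candidate, candidate_score = None, best_score
        for name in sorted(remaining):
            fused = rrf_fuse([systems[s] for s in selected + [name]])
            score = metric(fused, relevant)
            if score > candidate_score:
                candidate, candidate_score = name, score
        if candidate is None:  # no remaining system improves the fused score
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score
```

In this sketch `max_size=15` mirrors the upper limit on selected systems mentioned in the abstract; a real evaluation would average a standard retrieval metric over many queries rather than score a single query with top-3 precision.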
Original language: English
Article number: 11
Pages (from-to): 1-17
Number of pages: 17
Journal: BMC Bioinformatics
Volume: 27
Early online date: 9 Dec 2025
Publication status: Published (in print/issue) - 13 Jan 2026

Bibliographical note

© 2025. The Author(s).

Data Access Statement

No datasets were generated or analysed during the current study.

Funding

No external funding was received for this study.

Keywords

  • Data fusion
  • Subset selection
  • Information retrieval
  • Biomedical information retrieval
  • Ranking-based method
  • Algorithms
  • Humans
  • Information Storage and Retrieval/methods
  • Computational Biology/methods
  • Databases, Factual
