Abstract
To improve the effectiveness and efficiency of biomedical information retrieval, we propose three ranking-based methods for selecting an optimal subset of retrieval systems for data fusion: SFS (Sequential Forward Search), D&P (Diversity & Performance), and P&D (Performance & Diversity). These methods were applied in combination with the Reciprocal Rank Fusion (RRF) technique. Experiments were conducted on four medical datasets from TREC, using between 62 and 125 candidate retrieval systems and selecting up to 15 for fusion. The proposed subset selection methods significantly improved retrieval performance: fusing the selected systems with RRF yielded improvements ranging from 10% to over 60% over the best individual retrieval system across the datasets. They also outperformed the state-of-the-art technology by a large margin. In summary, our subset selection approach offers a practical and cost-efficient solution for biomedical information retrieval, achieving substantial performance gains while reducing computational overhead.
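For readers unfamiliar with Reciprocal Rank Fusion, the fusion step can be illustrated with a minimal sketch. It assumes each retrieval system returns a ranked list of document identifiers and uses the commonly cited smoothing constant k = 60; the abstract does not state the value used in the study, and the function name, variable names, and toy data below are hypothetical. The sketch covers only the fusion of already selected systems, not the SFS, D&P, or P&D selection methods themselves.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    rankings: list of ranked lists, best document first.
    k: smoothing constant (60 is a common default, assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; ties are broken by document id
    # so that the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy example: three hypothetical retrieval systems ranking four documents.
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fusion(runs)
print(fused)
```

In this toy example, d2 is ranked first by two of the three systems and therefore receives the highest fused score, while d4, which appears once at rank 3, comes last.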
| Original language | English |
|---|---|
| Article number | 11 |
| Pages (from-to) | 1-17 |
| Number of pages | 17 |
| Journal | BMC Bioinformatics |
| Volume | 27 |
| Early online date | 9 Dec 2025 |
| DOIs | |
| Publication status | Published (in print/issue) - 13 Jan 2026 |
Bibliographical note
© 2025. The Author(s).
Data Access Statement
No datasets were generated or analysed during the current study.
Funding
No external funding was received for this study.
Keywords
- Data fusion
- Subset selection
- Information retrieval
- Biomedical information retrieval
- Ranking-based method
- Algorithms
- Humans
- Information Storage and Retrieval/methods
- Computational Biology/methods
- Databases, Factual