Clustering-based fusion for medical information retrieval

Qiuyu Xu, Yidong Huang, Shengli Wu, Chris Nugent

Research output: Contribution to journal › Article › peer-review

Abstract

Medicine is a fast-moving field, and the number of medical publications has increased rapidly over recent years. How to find relevant information from this vast pool of research effectively and efficiently has therefore become highly challenging. Previous studies have demonstrated that data fusion can improve search performance if properly utilized. However, in most cases effectiveness is the only concern and efficiency is not considered. A fusion-based system is by nature more complicated and computationally expensive than other retrieval models such as BM25, because many component retrieval systems and an extra layer of fusion are required. The number of component retrieval systems involved is an important indicator of the complexity of the fusion-based system. We aim to select the optimal k-subset of component retrieval systems for any given number k, so as to both optimize fusion performance and reduce the cost of data fusion. A clustering-based approach is proposed. First, all the candidates are divided into clusters by the Chameleon clustering algorithm; then representatives from every cluster are chosen by Sequential Forward Selection for fusion. Evaluated with two datasets from TREC, the proposed method is significantly more effective than the baseline methods, including the state-of-the-art subset selection method. When either of the two typical fusion methods is used, an improvement rate of over 10% is observed for both Mean Average Precision and Recall-level Precision, and an improvement rate of over 5% is observed for both Precision at the 10-document level and Mean Reciprocal Rank. [Abstract copyright: Copyright © 2022. Published by Elsevier Inc.]
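
The two-stage idea in the abstract (cluster the candidate systems, then pick one representative per cluster by Sequential Forward Selection) can be illustrated with a minimal sketch. This is not the authors' implementation: the clusters are assumed to be given already (the Chameleon step is not reproduced), CombSUM with min-max normalization is assumed as the fusion method since the abstract does not name the two methods used, average precision on a single query is assumed as the selection criterion, and all function names are hypothetical.

```python
from typing import Dict, List, Set

Run = Dict[str, float]  # document id -> retrieval score from one system


def normalize(run: Run) -> Run:
    """Min-max normalize scores into [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_sum(runs: List[Run]) -> List[str]:
    """Fuse runs by summing normalized scores (CombSUM, an assumption here)."""
    fused: Dict[str, float] = {}
    for run in runs:
        for doc, s in normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant docs."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def select_representatives(
    systems: Dict[str, Run],
    clusters: List[List[str]],
    relevant: Set[str],
) -> List[str]:
    """Greedy forward selection of one system per precomputed cluster.

    At each step, every system in a cluster that has no representative yet
    is tried; the one whose addition gives the best fused average precision
    is kept, and its cluster is closed.
    """
    chosen: List[str] = []
    remaining = [list(c) for c in clusters]
    while remaining:
        best = None  # (score, system name, index into remaining)
        for idx, cluster in enumerate(remaining):
            for name in cluster:
                ranking = comb_sum([systems[n] for n in chosen + [name]])
                score = average_precision(ranking, relevant)
                if best is None or score > best[0]:
                    best = (score, name, idx)
        chosen.append(best[1])
        remaining.pop(best[2])
    return chosen
```

With k clusters this returns exactly k systems, so the size of the fused ensemble is controlled by the number of clusters rather than by the number of candidate systems.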
Original language: English
Article number: 104213
Journal: Journal of Biomedical Informatics
Volume: 135
Early online date: 30 Sep 2022
DOIs
Publication status: Published (in print/issue) - 30 Nov 2022

Bibliographical note

Publisher Copyright:
© 2022

Keywords

  • Data fusion
  • Subset selection
  • Medical information retrieval
  • Clustering
  • Efficiency and effectiveness
