Robustness of Question Answering Systems in the Biomedical Domain: a study of the BioASQ dataset

  • Andrew Reeves
  • Hang Dong

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Robustness is a critical consideration when integrating artificial intelligence (AI) systems into decision-making processes. This concern is particularly relevant for generative AI systems, which are designed to consistently produce convincing outputs but can be prone to hallucinations, risking overconfidence in their outputs. This paper investigates the performance of fine-tuning and retrieval-augmented generation (RAG) under external data quality perturbations, including typographical errors and factual inaccuracies. Results from experiments using the BioASQ Task 12b question answering dataset and PubMed articles showed nuanced trade-offs, with either RAG or fine-tuning performing better in different scenarios. Furthermore, an analysis of LLMs’ self-reported confidence scores indicated a tendency toward overconfidence, particularly in the presence of inconsistent or erroneous context data. A novel mitigation strategy, leveraging an LLM for data quality error correction, was evaluated, but the results demonstrated limited effectiveness, highlighting the need for more advanced correction techniques.
Original language: English
Title of host publication: 2025 12th International Conference on Information Technology (ICIT)
Publisher: IEEE
Pages: 247-252
Number of pages: 6
ISBN (Electronic): 979-8-3315-0894-4
ISBN (Print): 979-8-3315-0895-1
Publication status: Published online - 1 Jul 2025
Event: 2025 12th International Conference on Information Technology (ICIT) - Amman, Jordan
Duration: 27 May 2025 to 30 May 2025
https://icit.zuj.edu.jo/Home/

Publication series

Name: 2025 12th International Conference on Information Technology (ICIT)
Publisher: IEEE Control Society
ISSN (Print): 2831-3380
ISSN (Electronic): 2831-3399

Conference

Conference: 2025 12th International Conference on Information Technology (ICIT)
Country/Territory: Jordan
City: Amman
Period: 27/05/25 to 30/05/25

Keywords

  • Large language models
  • Retrieval augmented generation
  • Robustness
  • Question answering
