Abstract
Robustness is a critical consideration when integrating artificial intelligence (AI) systems into decision-making processes. This concern is particularly relevant for generative AI systems, which are designed to consistently produce convincing outputs but can be prone to hallucinations, risking overconfidence in their outputs. This paper investigates the performance of fine-tuning and retrieval augmented generation (RAG) for large language models (LLMs) under external data quality perturbations, including typographical errors and factual inaccuracies. Results from experiments using the BioASQ Task 12b question answering dataset and PubMed articles revealed nuanced trade-offs: neither approach was uniformly better, with RAG performing better in some scenarios and fine-tuning in others. Furthermore, an analysis of the LLMs' self-reported confidence scores indicated a tendency toward overconfidence, particularly when the context data were inconsistent or erroneous. A novel mitigation strategy, leveraging an LLM for data quality error correction, was evaluated, but the results demonstrated limited effectiveness, highlighting the need for more advanced correction techniques.
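To make the idea of a typographical data quality perturbation concrete, the following is a minimal Python sketch of character-level typo injection into a context passage. It is an illustrative assumption only, not the procedure used in the paper: the function name `inject_typos`, the `rate` and `seed` parameters, and the substitute/delete/swap error model are all hypothetical.

```python
import random
import string
from typing import Optional


def inject_typos(text: str, rate: float = 0.05, seed: Optional[int] = None) -> str:
    """Return a copy of `text` with simulated typographical errors.

    Hypothetical error model: each alphabetic character is selected with
    probability `rate` and then substituted, deleted, or swapped with the
    character that follows it. Non-letters are never substituted or deleted.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # local RNG so a fixed seed is reproducible
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(("substitute", "delete", "swap"))
            if op == "substitute":
                # Random lowercase letter; may occasionally equal the original.
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "swap" and i + 1 < len(text):
                # Transpose with the next character and skip over it.
                out.append(text[i + 1])
                out.append(ch)
                i += 1
            elif op == "swap":
                out.append(ch)  # last character: nothing to swap with
            # op == "delete": append nothing, dropping the letter
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    passage = "Cystic fibrosis is caused by mutations in the CFTR gene."
    print(inject_typos(passage, rate=0.1, seed=0))
```

Fixing the seed makes a perturbed passage reproducible, so the same corrupted context could be fed to both a RAG pipeline and a fine-tuned model when comparing them; raising `rate` gives a simple knob for the severity of the perturbation.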
| Original language | English |
|---|---|
| Title of host publication | 2025 12th International Conference on Information Technology (ICIT) |
| Publisher | IEEE |
| Pages | 247-252 |
| Number of pages | 6 |
| ISBN (Electronic) | 979-8-3315-0894-4 |
| ISBN (Print) | 979-8-3315-0895-1 |
| DOIs | |
| Publication status | Published online - 1 Jul 2025 |
| Event | 2025 12th International Conference on Information Technology (ICIT), Amman, Jordan; duration: 27 May 2025 → 30 May 2025; https://icit.zuj.edu.jo/Home/ |
Publication series
| Name | 2025 12th International Conference on Information Technology (ICIT) |
|---|---|
| Publisher | IEEE Control Society |
| ISSN (Print) | 2831-3380 |
| ISSN (Electronic) | 2831-3399 |
Conference
| Conference | 2025 12th International Conference on Information Technology (ICIT) |
|---|---|
| Country/Territory | Jordan |
| City | Amman |
| Period | 27/05/25 → 30/05/25 |
| Internet address | https://icit.zuj.edu.jo/Home/ |
Keywords
- Large language models
- Retrieval augmented generation
- Robustness
- Question answering
Fingerprint
Dive into the research topics of 'Robustness of Question Answering Systems in the Biomedical Domain: a study of the BioASQ dataset'. Together they form a unique fingerprint.