TY - UNPB
T1 - EHRs Enable Robust Lung Cancer Risk Stratification with Transformer-based Models: A Retrospective Multi-center Validation Study
AU - Alonso, Eduardo
AU - Mendez, Naroa
AU - Garcia-Navarro, Teresa
AU - Arana-Arri, Eunate
AU - Idoyaga-Uribarrena, Jon Eneko
AU - Giraldez-Álvarez, Miguel
AU - Moreno-Conde, Alberto
AU - Moreno-Conde, Jesús
AU - Núñez-Benjumea, Francisco J.
AU - Vicente-Baz, David
AU - Guiot, Julien
AU - Paulus, Astrid
AU - Gangolf, Marjorie
AU - Henket, Monique
AU - Ernst, Benoit
AU - Gogulancea, Valentina
AU - Rankin, Debbie
AU - Black, Michaela
AU - Gurrutxaga, Ibai
AU - Beristain, Andoni
AU - Garin-Muga, Alba
AU - Macía, Ivan
AU - Calle, Xabier
PY - 2025/12/2
Y1 - 2025/12/2
AB - Early detection of lung cancer is challenging, and current screening eligibility relies on costly, difficult-to-scale questionnaires. We developed and validated risk stratification models using routinely collected longitudinal structured Electronic Health Records (EHRs) to support population-level screening and evaluation. In this retrospective, multicentre study, we trained four AI models, comparing non-temporal approaches (Count-Based Logistic Regression and time-agnostic Transformer) with temporal sequence modeling approaches (LSTM network and time-aware Transformer). External validation was performed on two independent cohorts from Osakidetza (26,348 individuals from Spain) and the University Hospital of Liège (33,576 individuals from Belgium), evaluating external validity and screening efficiency. The time-aware transformer model (STraTS_t) was the top performer (AUROC 0.809) in the Andalusian Health Service training cohort (202,830 individuals from Spain). Its performance was robustly preserved during sequential external validation (Osakidetza AUROC 0.794; Liège AUROC 0.743). STraTS_t also showed superior screening efficiency, requiring only 26.54% of the population to be screened to detect 70% of lung cancer cases, compared to 41.01% for the baseline CB model. Our findings demonstrate that structured routine EHRs and time-aware transformers deliver accurate, robust lung-cancer risk stratification sustained across distinct European health systems. This capability makes a strong case for screening approaches that are cost- and time-efficient, suitable for population-level deployment without requiring new data collection.
U2 - 10.21203/rs.3.rs-8200847/v1
DO - 10.21203/rs.3.rs-8200847/v1
M3 - Preprint
SP - 1
EP - 30
BT - EHRs Enable Robust Lung Cancer Risk Stratification with Transformer-based Models: A Retrospective Multi-center Validation Study
ER -