Abstract
Background
Conventional validation of wearable ECGs for atrial fibrillation (AF) detection excludes inconclusive results and assumes single-attempt testing, which inflates reported diagnostic performance. As a result, current reporting frameworks do not reflect real-world clinical use.
Objective
To introduce and evaluate an intention-to-diagnose framework that incorporates inconclusive outputs and repeat testing, enabling a realistic assessment of wearable ECG diagnostic performance for AF detection.
Methods
A prospective observational study compared three reporting frameworks: naive (inconclusive outputs excluded), pragmatic (inconclusive outputs counted as incorrect), and intention-to-diagnose (up to three attempts permitted). All three frameworks were applied to the same Apple Watch ECG recordings, interpreted by both the native Apple Watch algorithm and an AI-enabled neural network. The study was conducted at a teaching hospital in Ireland and included 296 participants after exclusions.
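The three frameworks differ only in how an inconclusive output is scored. The minimal Python sketch below assumes each participant has a reference AF label and an ordered list of device outputs. The labels `AF`, `NOT_AF` and `INCONCLUSIVE`, the function names, and the rule that intention-to-diagnose takes the first conclusive result within three attempts (scoring three inconclusive attempts as incorrect) are illustrative assumptions, not the study's published analysis code.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical output labels; the study's actual coding is not given here.
AF, NOT_AF, INCONCLUSIVE = "AF", "NOT_AF", "INCONCLUSIVE"


def scored_call(attempts: Sequence[str], framework: str, max_attempts: int = 3) -> Optional[str]:
    """Return the output used for scoring, or None if the case is excluded."""
    first = attempts[0]
    if framework == "naive":
        # Single attempt; inconclusive cases drop out of numerator and denominator.
        return None if first == INCONCLUSIVE else first
    if framework == "pragmatic":
        # Single attempt; an inconclusive output is kept and scored as incorrect.
        return first
    if framework == "intention_to_diagnose":
        # Assumption: use the first conclusive result within max_attempts;
        # if every attempt is inconclusive, keep the case and score it as incorrect.
        for result in attempts[:max_attempts]:
            if result != INCONCLUSIVE:
                return result
        return INCONCLUSIVE
    raise ValueError(f"unknown framework: {framework}")


def sensitivity_specificity(cases: List[Tuple[bool, List[str]]], framework: str) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) for one reporting framework."""
    tp = fn = tn = fp = 0
    for truth_is_af, attempts in cases:
        call = scored_call(attempts, framework)
        if call is None:
            continue
        if truth_is_af:
            if call == AF:
                tp += 1
            else:
                fn += 1  # NOT_AF and INCONCLUSIVE both count as a missed AF case
        else:
            if call == NOT_AF:
                tn += 1
            else:
                fp += 1  # AF and INCONCLUSIVE both count as a false alarm
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy cohort: (reference AF?, device outputs in attempt order).
CASES = [
    (True, [AF]),
    (True, [INCONCLUSIVE, AF]),
    (True, [INCONCLUSIVE, INCONCLUSIVE, INCONCLUSIVE]),
    (True, [NOT_AF]),
    (False, [NOT_AF]),
    (False, [INCONCLUSIVE, NOT_AF]),
    (False, [AF]),
    (False, [NOT_AF]),
]

if __name__ == "__main__":
    for fw in ("naive", "pragmatic", "intention_to_diagnose"):
        sens, spec = sensitivity_specificity(CASES, fw)
        print(f"{fw}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

On this invented cohort the pattern mirrors the abstract's argument: the naive framework shrinks the denominator, the pragmatic framework penalises every inconclusive first attempt, and intention-to-diagnose lands in between. The sketch does not guard against empty AF or non-AF groups.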
Results
For the native Apple Watch algorithm, sensitivity/specificity were 96.1%/97.9% under naive reporting, 78.1%/81.0% under pragmatic reporting, and 92.2%/91.0% under intention-to-diagnose reporting. The AI algorithm achieved 98.4%/96.1% (pragmatic) and 98.4%/96.6% (intention-to-diagnose), with a 92% reduction in inconclusive outputs relative to the Apple Watch algorithm. Repeatability was substantial for the Apple Watch algorithm (kappa 0.77) and near-perfect for the AI algorithm (kappa 0.96).
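The abstract reports repeatability as kappa without naming the statistic. As an illustration only, the sketch below computes Cohen's kappa between two repeat recordings per participant, assuming each recording yields one of the hypothetical labels `AF`, `NOT_AF` or `INCONCLUSIVE`; the pairing scheme and the choice of Cohen's kappa are assumptions. On the commonly used Landis and Koch scale, 0.61 to 0.80 is "substantial" and 0.81 to 1.00 "almost perfect".

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(first: Sequence[str], second: Sequence[str]) -> float:
    """Cohen's kappa for two paired sequences of categorical outputs."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty sequences")
    n = len(first)
    # Observed agreement: share of participants whose two recordings match.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of marginal frequencies, summed over categories.
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both recordings use a single identical category;
        # returning 1.0 here is a convention, not part of the definition.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example: outputs from a first and a repeat recording for five participants.
rec1 = ["AF", "AF", "NOT_AF", "NOT_AF", "INCONCLUSIVE"]
rec2 = ["AF", "NOT_AF", "NOT_AF", "NOT_AF", "INCONCLUSIVE"]
print(round(cohens_kappa(rec1, rec2), 4))
```

Here observed agreement is 4/5 and chance agreement 9/25, so kappa is (0.8 - 0.36) / (1 - 0.36).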
Conclusion
Wearable ECG reporting should adopt intention-to-diagnose frameworks that explicitly handle inconclusive outputs and repeat testing. Naive reporting inflates performance, whereas pragmatic reporting deflates it. AI-enhanced interpretation materially reduces inconclusive results and improves intention-to-diagnose accuracy, offering a more usable and reliable pathway for real-world AF detection.
| Original language | English |
|---|---|
| Journal | Heart Rhythm |
| Early online date | 7 Nov 2025 |
| Publication status | Published online - 7 Nov 2025 |
Bibliographical note
© 2025 Heart Rhythm Society. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Funding
Royal Commission for the Exhibition of 1851
Keywords
- Atrial fibrillation
- Artificial intelligence
- Diagnostic accuracy
- Inconclusive results
- Apple Watch
- Single-lead ECG
- Arrhythmia detection