Automation Bias in Medicine: The Influence of Automated Diagnoses on Interpreter Accuracy and Uncertainty when Reading Electrocardiograms

RR Bond, Tomas Novotny, Irena Andrsova, Lumir Koc, Martina Sisakova, D Finlay, D Guldenring, James McLaughlin, Aaron Peace, V. E. McGilligan, Stephen Leslie, H. Wang, Marek Malik

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

Introduction: Interpretation of the 12-lead electrocardiogram (ECG) is normally assisted by an automated diagnosis (AD), which can induce an ‘automation bias’ in which interpreters become anchored to the AD. In this paper, we studied 1) the effect of an incorrect AD on interpretation accuracy and interpreter confidence (a proxy for uncertainty), and 2) whether confidence and other interpreter features can be used to predict interpretation accuracy with machine learning.

Methods: This study analysed 9000 ECG interpretations from cardiology and non-cardiology fellows (CFs and non-CFs). One third of the ECGs were presented with no ADs, one third with ADs (half of them incorrect) and one third with multiple ADs. Each interpretation was scored, and the interpreter’s confidence in it was recorded and subsequently standardised using sigma scaling. Spearman coefficients were used for correlation analysis, and C5.0 decision trees were used to predict interpretation accuracy from basic interpreter features such as confidence, age, experience and designation.

Results: Interpretation accuracies achieved by CFs and non-CFs dropped by 43.20% and 58.95% respectively when an incorrect AD was presented (p<0.001). The overall correlation between scaled confidence and interpretation accuracy was higher amongst CFs. However, the correlation between confidence and interpretation accuracy decreased for both groups when an incorrect AD was presented. We found that an incorrect AD undermines the reliability of interpreter confidence as a predictor of accuracy. An incorrect AD has a greater effect on the confidence of non-CFs, although this difference falls just short of statistical significance (p=0.065). The best C5.0 decision tree achieved an accuracy of 64.67% (p<0.001); however, this is only 6.56% greater than the no-information rate.

Conclusion: Incorrect ADs reduce the interpreter’s diagnostic accuracy, indicating an automation bias. Non-CFs tend to agree with the ADs more than CFs do; hence less expert physicians are more affected by automation bias. Incorrect ADs reduce the interpreter’s confidence and also reduce the predictive power of confidence for accuracy (even more so for non-CFs). Whilst a statistically significant model was developed, it is difficult to predict interpretation accuracy using machine learning on basic features such as interpreter confidence, age, reader experience and designation.
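The analysis steps named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: "sigma scaling" is taken to mean standardising one interpreter's confidence ratings by their own mean and standard deviation, and the no-information rate is taken to be the accuracy of always predicting the most frequent class. The function names and the sample ratings are hypothetical and are not taken from the study. If the 6.56% margin is read as percentage points, the reported figures would imply a no-information rate of roughly 58.11%.

```python
from statistics import mean, pstdev


def sigma_scale(values):
    """Standardise values to zero mean and unit standard deviation.

    Assumption: this z-score form is what the abstract calls sigma scaling.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A rater who always gives the same rating carries no spread.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def no_information_rate(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(labels.count(c) for c in set(labels)) / len(labels)


if __name__ == "__main__":
    # Hypothetical data for one interpreter: raw confidence ratings and
    # whether each interpretation was scored correct (1) or incorrect (0).
    confidence = [3, 7, 5, 9, 1, 8]
    correct = [0, 1, 0, 1, 0, 1]
    scaled = sigma_scale(confidence)
    print("scaled confidence:", [round(s, 2) for s in scaled])
    print("Spearman rho:", round(spearman(scaled, correct), 3))
    print("no-information rate:", no_information_rate(correct))
```

Because Spearman's coefficient depends only on ranks, scaling a single interpreter's ratings leaves that interpreter's coefficient unchanged. Under the assumption above, the scaling matters when ratings from many interpreters are pooled, since it puts habitually cautious and habitually confident raters on a common scale before ranking.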
Language: English
Pages: 1-12
Journal: Journal of Electrocardiology
Early online date: 10 Aug 2018
DOI: 10.1016/j.jelectrocard.2018.08.007
Publication status: E-pub ahead of print - 10 Aug 2018

Fingerprint

Automation
Uncertainty
Reading
Electrocardiography
Medicine
Decision Trees
Proxy
Cardiology
Physicians

Keywords

  • ECG
  • Automation bias
  • AI
  • clinical decision making
  • Cardiology

Cite this

Bond, RR ; Novotny, Tomas ; Andrsova, Irena ; Koc, Lumir ; Sisakova, Martina ; Finlay, D ; Guldenring, D ; McLaughlin, James ; Peace, Aaron ; McGilligan, V. E. ; Leslie, Stephen ; Wang, H. ; Malik, Marek. / Automation Bias in Medicine: The Influence of Automated Diagnoses on Interpreter Accuracy and Uncertainty when Reading Electrocardiograms. In: Journal of Electrocardiology. 2018 ; pp. 1-12.
@article{d3ef84aa1ced4d4087348b3681eb16f4,
title = "Automation Bias in Medicine: The Influence of Automated Diagnoses on Interpreter Accuracy and Uncertainty when Reading Electrocardiograms",
abstract = "Introduction: Interpretation of the 12-lead electrocardiogram (ECG) is normally assisted by an automated diagnosis (AD), which can induce an ‘automation bias’ in which interpreters become anchored to the AD. In this paper, we studied 1) the effect of an incorrect AD on interpretation accuracy and interpreter confidence (a proxy for uncertainty), and 2) whether confidence and other interpreter features can be used to predict interpretation accuracy with machine learning. Methods: This study analysed 9000 ECG interpretations from cardiology and non-cardiology fellows (CFs and non-CFs). One third of the ECGs were presented with no ADs, one third with ADs (half of them incorrect) and one third with multiple ADs. Each interpretation was scored, and the interpreter’s confidence in it was recorded and subsequently standardised using sigma scaling. Spearman coefficients were used for correlation analysis, and C5.0 decision trees were used to predict interpretation accuracy from basic interpreter features such as confidence, age, experience and designation. Results: Interpretation accuracies achieved by CFs and non-CFs dropped by 43.20{\%} and 58.95{\%} respectively when an incorrect AD was presented (p<0.001). The overall correlation between scaled confidence and interpretation accuracy was higher amongst CFs. However, the correlation between confidence and interpretation accuracy decreased for both groups when an incorrect AD was presented. We found that an incorrect AD undermines the reliability of interpreter confidence as a predictor of accuracy. An incorrect AD has a greater effect on the confidence of non-CFs, although this difference falls just short of statistical significance (p=0.065). The best C5.0 decision tree achieved an accuracy of 64.67{\%} (p<0.001); however, this is only 6.56{\%} greater than the no-information rate. Conclusion: Incorrect ADs reduce the interpreter’s diagnostic accuracy, indicating an automation bias. Non-CFs tend to agree with the ADs more than CFs do; hence less expert physicians are more affected by automation bias. Incorrect ADs reduce the interpreter’s confidence and also reduce the predictive power of confidence for accuracy (even more so for non-CFs). Whilst a statistically significant model was developed, it is difficult to predict interpretation accuracy using machine learning on basic features such as interpreter confidence, age, reader experience and designation.",
keywords = "ECG, Automation bias, AI, clinical decision making, Cardiology",
author = "RR Bond and Tomas Novotny and Irena Andrsova and Lumir Koc and Martina Sisakova and D Finlay and D Guldenring and James McLaughlin and Aaron Peace and McGilligan, {V. E.} and Stephen Leslie and H. Wang and Marek Malik",
year = "2018",
month = "8",
day = "10",
doi = "10.1016/j.jelectrocard.2018.08.007",
language = "English",
pages = "1--12",
journal = "Journal of Electrocardiology",
issn = "0022-0736",
publisher = "Elsevier",

}
