Abstract
Aim
To complete a scoping review of the literature investigating the ability of artificial intelligence (AI) systems currently in development to detect fractures on plain radiographic images.
Methods
A systematic approach, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, was adopted to identify papers for inclusion in this scoping review. Following application of the inclusion and exclusion criteria, sixteen studies were included in the final review.
Results
All but one of the reviewed studies report that AI models demonstrated an ability to perform fracture identification tasks on plain skeletal radiographs. The metrics used to report performance vary across the reviewed studies and include area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, precision, recall, F1 score and accuracy. Reported AUC values ranged from 0.78 for the weakest performing system to 0.99 for the best performing system.
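For reference, all of the count-based metrics listed above derive from the same four confusion-matrix entries (true/false positives and negatives); only AUC additionally requires the model's ranked output scores. A minimal illustrative sketch in Python (the function name and example counts are hypothetical, not drawn from any reviewed study):

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive the count-based metrics reported across the reviewed
    studies from the four confusion-matrix entries."""
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # positive predictive value, also called precision
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "sensitivity/recall": sensitivity,
        "specificity": specificity,
        "PPV/precision": ppv,
        "NPV": npv,
        "F1": f1,
        "accuracy": accuracy,
    }

# Hypothetical counts for a fracture/no-fracture test set of 200 radiographs:
print(classification_metrics(tp=90, fp=15, fn=10, tn=85))
```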
Conclusion
The review found great variation in the AI model architectures, the training and testing methodology, and the metrics used to report the performance of the networks. Standardisation of the reporting metrics and methods would permit comparison of proposed models and training methods, which may accelerate the testing of AI systems in the clinical setting. Prevalence-agnostic metrics should be used to reflect the true performance of such systems. Many studies lacked any explainability of the algorithmic decision making of the AI models, and there was a lack of interrogation into the potential reasons for misclassification errors. This type of ‘failure analysis’ would have provided insight into the biases and the aetiology of AI misclassifications.
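To illustrate why prevalence-agnostic metrics matter: sensitivity and specificity are properties of the model alone, whereas positive predictive value shifts with the fracture prevalence of the test set. A minimal sketch, assuming a hypothetical model with 95% sensitivity and 85% specificity (figures chosen for illustration only):

```python
def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV of a fixed model when evaluated on a test set with the
    given fracture prevalence (all values are fractions)."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)

# The same hypothetical model at two different prevalences:
print(ppv_at_prevalence(0.95, 0.85, 0.50))  # balanced research test set  -> ~0.86
print(ppv_at_prevalence(0.95, 0.85, 0.05))  # low-prevalence clinical mix -> 0.25
```

Sensitivity and specificity are unchanged between the two settings, which is why reporting them alongside, or instead of, prevalence-dependent metrics gives a truer picture of system performance.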
Original language | English |
---|---|
Article number | 100033 |
Number of pages | 18 |
Journal | Intelligence-Based Medicine |
Volume | 5 |
Early online date | 21 Apr 2021 |
DOIs | |
Publication status | Published online - 21 Apr 2021 |
Keywords
- Artificial intelligence
- Fracture identification
- Radiology
- X-ray
- Radiographic image interpretation
- Plain radiography