Abstract
Background: Digital health apps allow for proactive rather than reactive healthcare and have the potential to take pressure off healthcare providers. With over 350,000 digital health apps available on app stores today, these apps need to be of sufficient quality to be safe to use. Discovering a typology of digital health apps with regard to professional/clinical assurance, user experience, data privacy and user ratings may help identify the areas where digital health apps can improve.
Objectives: This study had two objectives: 1) Discover the types (clusters) of digital health apps with regard to their quality (scores) across three domains (professional/clinical assurance, user experience and data privacy) and user ratings. 2) Determine whether the National Institute for Health and Care Excellence (NICE) Evidence Standards Framework (ESF) tier, the target users of the digital health apps, their categories or their features have any association with this typology.
Methods: This study was conducted using data from 1402 digital health app assessments. Each app was assessed using the ORCHA baseline review (OBR), covering the app’s professional/clinical assurance (PCA), user experience (UX) and data privacy (DP). K-medoids clustering was applied to these data to discover a typology of digital health apps. The number of clusters was determined using the elbow method and by trying different numbers of clusters and observing the differences. The Shapiro-Wilk test was used to check whether the user ratings and the OBR scores were normally distributed. Following the results of the Shapiro-Wilk tests, unpaired two-sample Wilcoxon tests were used to compare corresponding user ratings and OBR scores among clusters. Post hoc analysis was conducted by counting the prevalence of each target user group, category and feature in each cluster. Fisher’s exact test (P<.05, adjusted with a Bonferroni-corrected alpha value) was used to determine whether the differences in proportions among clusters were statistically significant, and effect sizes were determined using Cohen’s W.
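The K-medoids step described above can be sketched as follows. This is a minimal illustrative implementation (PAM-style alternation) on synthetic two-dimensional scores, not the study’s actual pipeline; the `k_medoids` helper, the synthetic data and all parameter values are assumptions for illustration only, and the real analysis used the ORCHA assessment data.

```python
import numpy as np

def k_medoids(X, k, n_iter=100, seed=0):
    """Minimal k-medoids: alternate assignment and medoid update until stable."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise Euclidean distance matrix between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign each point to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            # New medoid: the member minimising total distance within the cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break  # converged
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Synthetic example: two well-separated groups of scores.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
medoids, labels = k_medoids(X, k=2)
```

Unlike k-means, each cluster centre here is an actual data point (a medoid), which makes the resulting cluster representatives directly interpretable as real apps.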
Results: For the first objective, four clusters were discovered and labelled: 1) Apps (n=220, 15.7%) with poor user ratings, 2) Apps (n=252, 18.0%) with poor PCA/DP scores, 3) Apps (n=415, 29.6%) with poor PCA scores and 4) Higher quality apps (n=515, 36.7%) with higher user ratings. For the second objective, the prevalence of NICE ESF tiers, target users, categories and features in the clusters was compared using Fisher’s exact test, with P values adjusted for multiple hypothesis testing using a Bonferroni-corrected alpha value. Only the clusters with the largest and smallest percentage prevalence were compared in each test. For example, for tier A apps, the clusters labelled ‘Apps with poor user ratings’ (1.82%, 4/220) and ‘Apps with poor PCA/DP scores’ (0%, 0/252) were compared. This was done to see whether a variable (in this example, NICE ESF tier A) has any association with the four clusters discovered. Using this approach, the numbers of statistically significant results were: NICE ESF tiers (2/3), target users (0/14), categories (4/33) and features (6/19). However, the effect size measured with Cohen’s W was <.3 (small) for all. The effect size was highest for the feature Service Signposting (Cohen’s W = .241) and NICE ESF tier B (Cohen’s W = .193), and lowest for the categories Healthy Living (Cohen’s W = .128) and Respiratory (Cohen’s W = .158).
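The effect-size calculation referred to above can be illustrated with the tier A counts from the abstract (4/220 vs 0/252). A common definition of Cohen’s W is the square root of the chi-square statistic divided by the total sample size; the abstract does not state the exact computation used, so the `cohens_w` helper below is an assumption for illustration.

```python
import numpy as np

def cohens_w(table):
    """Cohen's W effect size for a contingency table: sqrt(chi-square / N)."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence of rows and columns.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return np.sqrt(chi2 / n)

# Tier A prevalence: 4/220 in 'poor user ratings' vs 0/252 in 'poor PCA/DP'.
w = cohens_w([[4, 216],
              [0, 252]])
# By Cohen's conventions, w < .3 counts as a small effect.
```

This kind of check makes clear why a comparison can be statistically significant yet practically small, as reported for all significant results in the study.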
Conclusion: The principal findings of the K-medoids analysis were: 1) The most frequent digital health apps were those with high user ratings and high OBR quality scores (36.7%). 2) Many digital health apps (29.6%) lack professional/clinical assurance but excel in user ratings, user experience and data privacy. 3) User ratings are not indicative of OBR quality assessment scores; digital health apps can receive high user ratings and low OBR scores and vice versa. Digital health apps can be classified into a four-cluster typology: 1) Apps (n=220, 15.7%) with poor user ratings, 2) Apps (n=252, 18.0%) with poor PCA/DP scores, 3) Apps (n=415, 29.6%) with poor PCA scores and 4) Higher quality apps (n=515, 36.7%) with higher user ratings. Knowledge of the quality shortcomings in digital health apps, and of how prevalent they are as shown by the four clusters and their sizes, can inform the direction of future research. This study showed that the examined NICE ESF tiers, target users, categories and features of digital health apps are not strongly associated with the four-cluster typology; further study is required to determine why.
Original language | English |
---|---|
Number of pages | 14 |
Journal | JMIR mHealth and uHealth |
Publication status | Accepted/In press - 26 May 2025 |
Fingerprint

Research topics: 'Grouping Digital Health Apps Based on Their Quality and User Ratings Using K-Medoids Clustering: Cross-sectional Study'.

Student theses

- Using data science techniques to analyse quality assessment data of digital health apps. Zych, M. M. (Author), Martinez Carracedo, J. (Supervisor), Mulvenna, M. (Supervisor) & Bond, R. (Supervisor), May 2025. Student thesis: Doctoral Thesis.