Failure of a numerical quality assessment scale to identify potential risk of bias in a systematic review: A comparison study

S.R. O'Connor, M.A. Tully, B. Ryan, J.M. Bradley, G.D. Baxter, S.M. McDonough

Research output: Contribution to journal › Article › peer-review



Assessing the methodological quality of primary studies is an essential component of systematic reviews. Following a systematic review that used a domain-based system [United States Preventive Services Task Force (USPSTF)] to assess methodological quality, a commonly used numerical rating scale (Downs and Black) was also applied to the included studies, and the quality ratings assigned by the two methods were compared. Both tools were used to assess the 20 randomized and quasi-randomized controlled trials of an exercise intervention for chronic musculoskeletal pain that were included in the review. Inter-rater reliability and levels of agreement were determined using intraclass correlation coefficients (ICC). The influence of quality on pooled effect size was examined by calculating the between-group standardized mean difference (SMD).
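As an illustration of the effect-size measure named above, the following is a minimal sketch of a between-group SMD for a single trial. It assumes the pooled-standard-deviation form (Cohen's d) with a common large-sample standard error; the abstract does not state which SMD variant or software the authors used. The function name and the pain-score values are hypothetical.

```python
import math


def standardized_mean_difference(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Between-group SMD (Cohen's d) with an approximate standard error.

    Assumption: pooled-SD form; the source does not specify the variant.
    """
    # Pool the two group variances, weighting each by its degrees of freedom.
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    pooled_sd = math.sqrt(pooled_var)
    d = (mean_t - mean_c) / pooled_sd
    # Common large-sample approximation to the standard error of d.
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return d, se


# Hypothetical trial: pain score 3.0 (SD 2.0, n = 30) after exercise
# versus 4.0 (SD 2.0, n = 30) in the control group.
d, se = standardized_mean_difference(3.0, 2.0, 30, 4.0, 2.0, 30)
lower, upper = d - 1.96 * se, d + 1.96 * se
print(f"SMD {d:.2f} (95% CI {lower:.2f}, {upper:.2f})")
```

In this sketch a negative SMD means a lower (better) pain score in the exercise group, which matches the direction of the pooled estimates reported in the abstract.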

Inter-rater reliability indicated at least substantial levels of agreement for the USPSTF system (ICC 0.85; 95% CI 0.66, 0.94) and the Downs and Black scale (ICC 0.94; 95% CI 0.84, 0.97). The overall level of agreement between tools (ICC 0.80; 95% CI 0.57, 0.92) was also good. However, the USPSTF system rated three of the 20 studies as “poor” due to potential risks of bias. Analysis revealed substantially greater pooled effect sizes in these studies (SMD −2.51; 95% CI −4.21, −0.82) compared to those rated as “fair” (SMD −0.45; 95% CI −0.65, −0.25) or “good” (SMD −0.38; 95% CI −0.69, −0.08).
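To make the difference in precision between the three subgroups concrete, the sketch below back-calculates an approximate standard error from each reported 95% confidence interval. It assumes the intervals are symmetric, normal-based intervals (estimate ± 1.96 × SE), which the abstract does not state explicitly. The variable and function names are illustrative.

```python
Z_95 = 1.959964  # two-sided 95% critical value of the standard normal

# Pooled SMDs and 95% CIs as reported in the abstract: (estimate, lower, upper).
reported = {
    "poor": (-2.51, -4.21, -0.82),
    "fair": (-0.45, -0.65, -0.25),
    "good": (-0.38, -0.69, -0.08),
}


def se_from_ci(lower, upper, z=Z_95):
    """Approximate SE, assuming a symmetric interval of estimate +/- z * SE."""
    return (upper - lower) / (2 * z)


standard_errors = {
    rating: se_from_ci(lower, upper)
    for rating, (_, lower, upper) in reported.items()
}

for rating, se in standard_errors.items():
    estimate = reported[rating][0]
    print(f"{rating}: SMD {estimate:+.2f}, approximate SE {se:.3f}")
```

Under this assumption the pooled estimate for the “poor” studies has a much larger standard error than the other two, as would be expected for an estimate drawn from only three trials.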

In this example, use of a numerical rating scale failed to identify studies at increased risk of bias and could have led to imprecise estimates of treatment effect. Although these findings are based on a small number of included studies within an existing systematic review, the domain-based system provided a more structured framework for making qualitative decisions about overall quality and was useful for detecting potential sources of bias in the available evidence.
Original language: English
Article number: 224 (2015)
Number of pages: 7
Journal: BMC Research Notes
Issue number: 1
Publication status: Published (in print/issue) - 6 Jun 2015


  • Quality assessment
  • Risk of bias
  • Systematic review methods

