How many words make a sample? Determining the minimum number of word tokens needed in connected speech samples for child speech assessment

Yvonne Wren, J Titterington, Paul White

Research output: Contribution to journal › Article › peer-review


Connected speech (CS) is an important component of child speech assessment in both clinical practice and research. There is debate in the literature about how large a CS sample must be to yield reliable measures of speech output. The aim of this study was to identify the minimum number of word tokens required to obtain a reliable measure of CS across a range of measures. Participants were 776 5-year-olds from a longitudinal community population cohort study (Avon Longitudinal Study of Parents and Children, ALSPAC). Children’s narratives from a story retell task were audio-recorded and phonetically transcribed. The transcribed speech samples were then analysed using an automated transcription and analysis system. The measures of speech performance extracted included: a range of profiles of percentage consonants correct; frequency of substitutions, omissions, distortions and additions (SODA); percentage of syllable and stress pattern matches; and a measure of whole-word complexity (Phonological Mean Length of Utterance, pMLU). Statistical analyses compared these measures at incrementally increasing CS sample sizes using averages and weighted moving averages, and investigated how the measures performed across CS samples grouped by size: samples of at least 50, 75 and 100 word tokens, and samples restricted to 50–74, 75–99 and 100–125 word tokens. Samples of 75 word tokens and above showed minimal differences in most measures of speech output, suggesting that the minimum requirement for a CS sample is a word count of 75. The exceptions are pMLU and the measures of substitutions and distortions, for which a word count of 100 is recommended.
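
The comparison of a measure across increasing sample sizes can be illustrated with a minimal sketch. It is not the authors' analysis system: the token representation, the function names, the 25-token step and the linear weighting scheme are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of one word token:
# (consonants produced correctly, consonants in the target word).
Token = Tuple[int, int]


def pcc(tokens: List[Token]) -> float:
    """Percentage of consonants correct over a list of word tokens."""
    total = sum(target for _, target in tokens)
    if total == 0:
        raise ValueError("sample contains no target consonants")
    correct = sum(c for c, _ in tokens)
    return 100.0 * correct / total


def pcc_by_sample_size(tokens: List[Token], step: int = 25) -> Dict[int, float]:
    """PCC computed on the first n tokens, for n = step, 2*step, ..."""
    return {n: pcc(tokens[:n]) for n in range(step, len(tokens) + 1, step)}


def weighted_moving_average(values: List[float], window: int = 3) -> List[float]:
    """Linearly weighted moving average (assumed scheme): within each
    window the most recent value receives the largest weight."""
    weights = list(range(1, window + 1))
    smoothed = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        smoothed.append(sum(w * v for w, v in zip(weights, chunk)) / sum(weights))
    return smoothed
```

Calling `pcc_by_sample_size` on one child's sample gives the measure at each cumulative sample size. A measure could be treated as stable from the point at which successive values, or their smoothed counterparts, stop changing appreciably; the threshold for "appreciably" would have to be chosen by the analyst.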
Original language: English
Pages (from-to): 761-778
Number of pages: 18
Journal: Clinical Linguistics & Phonetics
Issue number: 8
Early online date: 6 Oct 2020
Publication status: Published (in print/issue) - 3 Aug 2021

Bibliographical note

Funding Information:
The UK Medical Research Council and Wellcome (Grant ref: 102215/2/13/2) and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors who will serve as guarantors for the contents of this paper. A comprehensive list of grants funding is available on the ALSPAC website. Yvonne Wren was funded by a National Institute for Health Research (NIHR) Postdoctoral Fellowship for this research project. Additional funding was provided by North Bristol NHS Trust Research Capability Funding. The views expressed are those of the authors and not necessarily those of the National Health Service (NHS), the NIHR, or the Department of Health and Social Care.

We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. We are also grateful to Joy Newbold who carried out the transcriptions of the samples.

Publisher Copyright:
© 2020 The Author(s). Published with license by Taylor & Francis Group, LLC.


  • speech
  • transcription
  • speech sound disorder
  • connected speech
  • sample size
  • ALSPAC

