Project Details
Description
This pilot project will develop a longitudinal, annotated corpus of spontaneous Irish child-directed speech to
support qualitative and quantitative analysis of transcription accuracy and speech variability. Longitudinal
recordings will be collected from parents of children exposed to the Munster, Connacht, and Ulster dialects of Irish,
capturing both dialectal and developmental variation in a real-world home environment.
The audio will be transcribed and linguistically annotated to enable qualitative interpretation and quantitative
benchmarking. We will evaluate Whisper, an open-source ASR model, using both standard quantitative
metrics (e.g., Word Error Rate) and qualitative categorization of transcription errors, such as recognition errors tied
to initial mutation, dialect-specific verb forms, and bilingual influence from English. The annotated corpus,
evaluation tools, and linguistic diagnostics will be made publicly available, addressing a key resource gap for
Irish language technology. The project lays the groundwork for future research in computational linguistics,
child language acquisition, and inclusive speech processing for minority languages.
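
As a rough illustration of the quantitative metric named above, Word Error Rate can be sketched as the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. This is a minimal sketch and not the project's evaluation tooling: the function name `word_error_rate`, the whitespace tokenization, and the absence of text normalization are all assumptions, and the example sentences are invented for illustration rather than drawn from the corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are separated by whitespace and compared exactly,
    with no case folding or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: the hypothesis drops lenition on two of three words.
reference = "an bhean mhór"
hypothesis = "an bean mór"
wer = word_error_rate(reference, hypothesis)
```

In this invented example two of the three reference words are substituted, so the score is 2/3. A single number like this does not say that both errors involve initial mutation, which is why the description pairs the metric with qualitative error categorization.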
| Status | Active |
|---|---|
| Effective start/end date | 1/10/25 → 30/09/27 |
Collaborative partners
- Queen's University Belfast (lead)
Funding
- The British Academy: £9,967.40