Project Details

Description

This pilot project will develop a longitudinal, annotated corpus of spontaneous Irish child-directed speech to support qualitative and quantitative analysis of transcription accuracy and speech variability. Longitudinal recordings will be collected from parents speaking to their children in the Munster, Connacht, and Ulster dialects of Irish, capturing both dialectal and developmental variation in a real-world home environment. The audio will be transcribed and linguistically annotated to enable qualitative interpretation and quantitative benchmarking. We will evaluate Whisper, an open-source automatic speech recognition (ASR) model, using both standard quantitative metrics (e.g., Word Error Rate) and a qualitative categorization of transcription errors, such as recognition errors tied to initial mutation, dialect-specific verb forms, and bilingual influence from English. The annotated corpus, evaluation tools, and linguistic diagnostics will be made publicly available, addressing a key resource gap for Irish language technology. The project lays the groundwork for future research in computational linguistics, child language acquisition, and inclusive speech processing for minority languages.
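
As an illustration of the Word Error Rate metric named above, the following is a minimal sketch and not the project's evaluation tooling. It assumes plain whitespace tokenization and no text normalization, and the function name `word_error_rate` is hypothetical. The Irish phrases in the usage note are invented to mimic an initial-mutation error; they are not project data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, deletions, insertions)
    divided by the number of words in the reference transcript."""
    ref = reference.split()  # assumption: whitespace tokenization only
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the invented reference "an bhean mhór" against a hypothesis that drops both lenition marks, "an bean mór", counts two substitutions over three reference words. A qualitative error taxonomy of the kind described above would then label such substitutions by cause, which the raw score alone does not reveal.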
Status: Active
Effective start/end date: 1/10/25 to 30/09/27

Collaborative partners

  • Queen's University Belfast (lead)

Funding

  • The British Academy: £9,967.40
