Broadcast Language Identification & Subtitling System (BLISS)

Jinling Wang, Karla Munoz Esquivel, J Connolly, Kevin Curran, PM McKevitt

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review


Abstract

Accessibility is an important area of Human Computer Interaction (HCI), and regulations in many countries mandate that broadcast media content be accessible to all. Currently, most subtitles for offline and live broadcasts are produced by people. However, subtitling methods employing re-speaking with Automatic Speech Recognition (ASR) technology are increasingly replacing manual methods. We discuss here the subtitling component of BLISS (Broadcast Language Identification & Subtitling System), an ASR system for automated subtitling and broadcast monitoring built using the Kaldi ASR Toolkit. The BLISS Gaussian Mixture Model (GMM)/Hidden Markov Model (HMM) acoustic model has been trained with ~960 hours of read speech, and the language model with ~900k words combined with a pronunciation dictionary of 200k words from the LibriSpeech corpus. In tests with ~5 hours of unseen clean speech test data, with little background noise and seen accents, BLISS gives recognition accuracy of 91.87% based on the WER (Word Error Rate) metric. For ~5 hours of unseen challenge speech test data, with higher-WER speakers, BLISS’s accuracy reduces to 75.91%. A BLISS Deep Learning Neural Network (DNN) acoustic model has also been trained with ~100 hours of read speech data. Its accuracy for ~5 hours of unseen clean and unseen challenge speech test data is 92.88% and 77.27% respectively, based on WER. Future work includes training the DNN model with ~960 hours of read speech data using CUDA GPUs and incorporating algorithms for background noise reduction. The BLISS core engine is also intended as a Language Identification system for broadcast monitoring (BLIS). This paper focuses on its Subtitling application (BLSS).
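The abstract reports accuracy "based on the WER metric" without spelling out the calculation. The sketch below shows one common way such a figure could be derived: WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and accuracy as 100% minus WER. Both the accuracy definition and the function names `word_error_rate` and `accuracy_percent` are assumptions made for illustration; this is not the BLISS or Kaldi scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name), not the scoring used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def accuracy_percent(wer: float) -> float:
    """Assumed definition: accuracy = 100% - WER (expressed as a percentage)."""
    return 100.0 * (1.0 - wer)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER = {wer:.4f}, accuracy = {accuracy_percent(wer):.2f}%")
```

Under this assumed definition, the reported 91.87% accuracy on clean speech would correspond to a WER of 8.13%. Note that WER can exceed 1.0 when the hypothesis contains many insertions, in which case this accuracy figure would be negative.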
Original language: English
Title of host publication: Proceedings of the 32nd International BCS Human Computer Interaction Conference (HCI)
Pages: 1-6
Number of pages: 6
Publication status: Published (in print/issue) - 31 Jul 2018

Publication series

Name: Proceedings of British HCI 2018
Publisher: BCS Learning and Development Ltd

Keywords

  • Automatic Speech Recognition (ASR)
  • Accent
  • Automated Subtitling
  • Background Noise
  • BLISS
  • Human-Computer Interaction
  • Kaldi
  • LibriSpeech
