Audio-Driven Facial Animation with Deep Learning: A Survey

Diqiong Jiang, Jian Chang, Lihua You, Shaojun Bian, Robert Kosk, Greg Maguire

Research output: Contribution to journal › Review article › peer-review

Abstract

Audio-driven facial animation is a rapidly evolving field that aims to generate realistic facial expressions and lip movements synchronized with a given audio input. This survey provides a comprehensive review of deep learning techniques applied to audio-driven facial animation, with a focus on both audio-driven facial image animation and audio-driven facial mesh animation. These approaches employ deep learning to map audio inputs directly onto 3D facial meshes or 2D images, enabling the creation of highly realistic and synchronized animations. This survey also explores evaluation metrics, available datasets, and the challenges that remain, such as disentangling lip synchronization and emotions, generalization across speakers, and dataset limitations. Lastly, we discuss future directions, including multi-modal integration, personalized models, and facial attribute modification in animations, all of which are critical for the continued development and application of this technology.
Original language: English
Article number: 675
Pages (from-to): 1-24
Number of pages: 24
Journal: Information
Volume: 15
Issue number: 11
Early online date: 28 Oct 2024
DOIs
Publication status: Published online - 28 Oct 2024

Bibliographical note

Publisher Copyright:
© 2024 by the authors.

Data Access Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Keywords

  • Deep learning
  • audio processing
  • talking head
  • face generation
