AdaMesh

Updates

!!! We render the predicted meshes into photo-realistic avatars with an image2image translation network to demonstrate one application of our work.

More comparison results on the VOCASET dataset are available in the supplementary file.

Comparison with state-of-the-art methods:

01:29-01:49 Speech-lip synchronization
01:50-02:06 Expression richness
02:07-02:30 Head pose naturalness
02:52-03:08 Compare with different topologies and singing
03:08-03:11 Learn more/arbitrary personalized talking styles
03:12-03:33 Render meshes into photo-realistic avatars

Abstract

Speech-driven 3D facial animation aims to generate facial movements that are synchronized with the driving speech, and it has been widely explored in recent years. Most existing works neglect the person-specific talking style, including facial expression and head pose styles. Several works attempt to capture these personal characteristics by fine-tuning modules, but the limited training data leads to a lack of vividness. In this work, we propose AdaMesh, a novel adaptive speech-driven facial animation approach that learns the personalized talking style from a reference video of about 10 seconds and generates vivid facial expressions and head poses. Specifically, we propose mixture-of-low-rank adaptation (MoLoRA) to fine-tune the expression adapter, which efficiently captures the facial expression style. For the personalized pose style, we propose a pose adapter that builds a discrete pose prior and retrieves the appropriate style embedding with a semantic-aware pose style matrix, without fine-tuning. Extensive experimental results show that our approach outperforms state-of-the-art methods, preserves the talking style in the reference video, and generates vivid facial animation.

Approach

Expression Adapter

To achieve efficient adaptation of facial expressions, we pre-train the expression adapter to learn general, person-agnostic information that ensures lip synchronization, and then optimize the MoLoRA parameters to equip the expression adapter with a specific expression style.
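To make the idea of adding low-rank adaptation on top of a pre-trained adapter concrete, here is a minimal NumPy sketch. It assumes (our assumption, not a description of the released code) that MoLoRA can be pictured as several LoRA branches of different ranks attached to a frozen linear layer and blended with softmax mixing weights. The class name `MoLoRALinear`, the `ranks` and `alpha` parameters, and the softmax mixing are all hypothetical choices for illustration.

```python
import numpy as np


class MoLoRALinear:
    """Hypothetical sketch: a frozen linear layer plus a mixture of LoRA
    branches with different ranks. Only the LoRA factors (A, B) and the
    mixing logits would be optimized during style adaptation."""

    def __init__(self, d_in, d_out, ranks=(1, 4, 8), alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen "pre-trained" weight; random here, standing in for the
        # person-agnostic expression adapter learned in pre-training.
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        self.b = np.zeros(d_out)
        # One (A, B) pair per rank. B starts at zero, so before adaptation
        # the layer behaves exactly like the frozen pre-trained layer.
        self.A = [rng.standard_normal((r, d_in)) * 0.01 for r in ranks]
        self.B = [np.zeros((d_out, r)) for r in ranks]
        self.scales = [alpha / r for r in ranks]
        # Learnable logits that decide how much each rank contributes.
        self.logits = np.zeros(len(ranks))

    def mixing_weights(self):
        # Numerically stable softmax over the branches.
        e = np.exp(self.logits - self.logits.max())
        return e / e.sum()

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out)
        out = x @ self.W.T + self.b
        for w, A, B, s in zip(self.mixing_weights(), self.A, self.B, self.scales):
            # Low-rank update: x -> A (rank r) -> B, scaled and weighted.
            out = out + w * s * (x @ A.T @ B.T)
        return out

    def trainable_parameter_count(self):
        # Parameters touched during adaptation (the frozen W and b excluded).
        return sum(A.size + B.size for A, B in zip(self.A, self.B)) + self.logits.size


layer = MoLoRALinear(d_in=16, d_out=8)
x = np.ones((2, 16))
y = layer(x)
print(y.shape, layer.trainable_parameter_count())
```

With ranks (1, 4, 8) on a 16-to-8 layer, only 315 values would be trained instead of the full 136-value weight-and-bias block per layer in a larger model; the saving grows with layer size, which is the usual motivation for low-rank adaptation when only a short reference clip is available.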

Pose Adapter

For modeling the pose style, we propose a pose adapter that formulates the adaptation as a simple but efficient retrieval task instead of fine-tuning modules.
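The retrieval formulation can be sketched as follows. This is a hypothetical illustration, not the method's actual definition: we assume the reference clip has already been quantized into discrete pose codes (the discrete pose prior) and that each frame carries a speech-semantic class label, and we picture the "semantic-aware pose style matrix" as a row-normalized co-occurrence matrix between the two. The style embedding is then retrieved from a bank by cosine similarity. The function names, the co-occurrence definition, and the bank layout are all assumptions.

```python
import numpy as np


def pose_style_matrix(pose_codes, semantic_labels, num_codes, num_semantics):
    """Hypothetical style matrix: how often each discrete pose code occurs
    under each speech-semantic class, normalized per semantic class."""
    M = np.zeros((num_semantics, num_codes))
    for s, c in zip(semantic_labels, pose_codes):
        M[s, c] += 1.0
    row_sums = M.sum(axis=1, keepdims=True)
    # Rows for semantic classes absent from the clip stay all-zero.
    return np.divide(M, row_sums, out=np.zeros_like(M), where=row_sums > 0)


def retrieve_style_embedding(ref_matrix, bank_matrices, bank_embeddings):
    """Return (index, embedding) of the bank entry whose style matrix is most
    cosine-similar to the reference; no parameters are fine-tuned."""
    q = ref_matrix.ravel()
    keys = bank_matrices.reshape(len(bank_matrices), -1)
    sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-8)
    idx = int(np.argmax(sims))
    return idx, bank_embeddings[idx]


# Reference clip: semantic class 0 pairs with pose code 0, class 1 with code 3.
ref = pose_style_matrix([0, 0, 3, 3], [0, 0, 1, 1], num_codes=4, num_semantics=2)

# Toy bank of three stored styles and their (made-up) embeddings.
bank = np.array([
    [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
    [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
])
embeddings = np.arange(6, dtype=float).reshape(3, 2)

idx, emb = retrieve_style_embedding(ref, bank, embeddings)
print(idx, emb)
```

Because adaptation here reduces to a lookup, a new speaker's pose style can be picked up from a short clip without any gradient updates, which matches the motivation of replacing fine-tuning with retrieval.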