Generating speech-driven 3D talking heads presents numerous challenges; among those is dealing with varying mesh topologies where no point-wise correspondence exists across the meshes the model can animate. While previous literature works assume fixed mesh structures, in thisworkwe present the first framework capable of animating 3Dfaces in arbitrary topologies, including real scanned data. Our approach leverages heat diffusion to predict features that are robust to the mesh topology. We explore two training settings: a registered one, in which meshes in a training sequences share a fixed topology but any mesh can be animated at test time, and an fully unregistered one, which allows effective training with varying mesh structures. Additionally, we highlight the limitations of current evaluation metrics and propose new metrics for better lip-syncing evaluation. An extensive evaluation shows our approach performs favorably compared to fixed topology techniques, setting a new benchmark by offering a versatile and high-fidelity solution for 3D talking heads where the topology constraint is dropped.
Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads / Federico Nocentini, Thomas Besnier, Claudio Ferrari, Sylvain Arguillere, Mohamed Daoudi, Stefano Berretti. - In: INTERNATIONAL JOURNAL OF COMPUTER VISION. - ISSN 1573-1405. - STAMPA. - 134:(2026), pp. 105.1-105.18. [10.1007/s11263-025-02726-7]
Beyond Fixed Topologies: Unregistered Training and Comprehensive Evaluation Metrics for 3D Talking Heads
Federico Nocentini;Claudio Ferrari;Stefano Berretti
2026
Abstract
Generating speech-driven 3D talking heads presents numerous challenges; among those is dealing with varying mesh topologies where no point-wise correspondence exists across the meshes the model can animate. While previous literature works assume fixed mesh structures, in thisworkwe present the first framework capable of animating 3Dfaces in arbitrary topologies, including real scanned data. Our approach leverages heat diffusion to predict features that are robust to the mesh topology. We explore two training settings: a registered one, in which meshes in a training sequences share a fixed topology but any mesh can be animated at test time, and an fully unregistered one, which allows effective training with varying mesh structures. Additionally, we highlight the limitations of current evaluation metrics and propose new metrics for better lip-syncing evaluation. An extensive evaluation shows our approach performs favorably compared to fixed topology techniques, setting a new benchmark by offering a versatile and high-fidelity solution for 3D talking heads where the topology constraint is dropped.| File | Dimensione | Formato | |
|---|---|---|---|
|
s11263-025-02726-7 (1).pdf
accesso aperto
Tipologia:
Versione finale referata (Postprint, Accepted manuscript)
Licenza:
Creative commons
Dimensione
3 MB
Formato
Adobe PDF
|
3 MB | Adobe PDF |
I documenti in FLORE sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.



