Mandelli, Lorenzo; Berretti, Stefano. "Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models." In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2025), USA, 2025, pp. 1279-1288. DOI: 10.1109/wacv61041.2025.00132

Generation of Complex 3D Human Motion by Temporal and Spatial Composition of Diffusion Models

Mandelli, Lorenzo; Berretti, Stefano
2025

Abstract

In this paper, we address the challenge of generating realistic 3D human motions for action classes that were never seen during the training phase. Our approach involves decomposing complex actions into simpler movements, specifically those observed during training, by leveraging the knowledge of human motion contained in GPT models. These simpler movements are then combined into a single, realistic animation using the properties of diffusion models. Our claim is that this decomposition and subsequent recombination of simple movements can synthesize an animation that accurately represents the complex input action. This method operates during the inference phase and can be integrated with any pre-trained diffusion model, enabling the synthesis of motion classes not present in the training data. We evaluate our method by dividing two benchmark human motion datasets into basic and complex actions, and then compare its performance against the state of the art. Our code and models are publicly available on our GitHub page.
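To make the two-stage idea in the abstract concrete, the sketch below shows one possible inference-time pipeline in PyTorch. Everything in it is illustrative and not the authors' code: DummyMotionDiffusion, decompose_action, and compose_and_sample are hypothetical stand-ins, the sub-action list is hard-coded where a GPT-style model would actually be queried, and the uniform averaging of noise predictions is a deliberately simple form of score composition rather than the paper's temporal and spatial composition.

```python
"""Minimal sketch (assumed interfaces, not the paper's implementation):
decompose an unseen complex action into simple trained actions, then
combine per-action diffusion predictions into one animation."""

import torch
import torch.nn as nn


class DummyMotionDiffusion(nn.Module):
    """Toy stand-in for a pre-trained text-conditioned motion diffusion model."""

    def __init__(self, pose_dim: int = 66):
        super().__init__()
        self.pose_dim = pose_dim
        self.net = nn.Linear(pose_dim + 1, pose_dim)  # toy noise predictor

    def predict_noise(self, x, t, action: str):
        # A real model would condition on the action text; this toy one ignores it.
        t_feat = torch.full((*x.shape[:-1], 1), float(t))
        return self.net(torch.cat([x, t_feat], dim=-1))


def decompose_action(complex_action: str) -> list[str]:
    """Step 1: an LLM would split the unseen complex action into simple
    actions seen during training. Hard-coded here for illustration."""
    return ["walk forward", "raise right arm"]  # e.g., for "walk while waving"


@torch.no_grad()
def compose_and_sample(model, sub_actions, num_frames=120, num_steps=50):
    """Step 2: mix the per-sub-action noise predictions at every denoising
    step (uniform averaging; a crude proxy for the paper's composition)."""
    x = torch.randn(1, num_frames, model.pose_dim)  # start from pure noise
    for t in reversed(range(1, num_steps + 1)):
        # One noise prediction per simple action, then a uniform mix.
        eps = torch.stack([model.predict_noise(x, t, a) for a in sub_actions])
        x = x - eps.mean(dim=0) / num_steps  # toy update standing in for a DDPM step
    return x  # a motion sequence intended to depict the complex action


motion = compose_and_sample(DummyMotionDiffusion(), decompose_action("walk while waving"))
print(motion.shape)  # torch.Size([1, 120, 66])
```

Even in this toy form, the property the abstract emphasizes is visible: no retraining is involved, since the composition of simple motions happens entirely at sampling time on top of a fixed pre-trained model.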
Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025
2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, USA, 2025
Goal 9: Industry, Innovation, and Infrastructure
Files in this record:
There are no files associated with this record.

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1436384
Citations
  • PMC: n/a
  • Scopus: 0
  • Web of Science (ISI): 0