U-Shape Mamba: State Space Model for Faster Diffusion / Ergasti, Alex; Botti, Filippo; Fontanini, Tomaso; Ferrari, Claudio; Bertozzi, Massimo; Prati, Andrea. - PRINT. - (2025), pp. 3242-3249. (2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025, USA, 2025) [10.1109/cvprw67362.2025.00307].

U-Shape Mamba: State Space Model for Faster Diffusion

Ferrari, Claudio
2025

Abstract

Diffusion models have become the most popular approach for high-quality image generation, but their high computational cost remains a significant challenge. To address this problem, we propose U-Shape Mamba (USM), a novel diffusion model that uses Mamba-based layers within a U-Net-like hierarchical structure. By progressively reducing the sequence length in the encoder and restoring it in the decoder through Mamba blocks, USM significantly lowers computational overhead while maintaining strong generative capabilities. Experimental results against Zigma, currently the most efficient Mamba-based diffusion model, show that USM requires one-third of the GFLOPs, uses less memory and runs faster, while outperforming Zigma in image quality. Fréchet Inception Distance (FID) improves by 15.3, 0.84 and 2.7 points on the AFHQ, CelebAHQ and COCO datasets, respectively. These findings highlight USM as a highly efficient and scalable solution for diffusion-based generative models, making high-quality image synthesis more accessible to the research community while reducing computational costs.
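
To make the U-shaped idea concrete, the sketch below tabulates how many tokens each block would process if the sequence is shortened at every encoder stage and lengthened again in the decoder. It is only an illustration: the abstract does not state the number of stages, the reduction factor, or how the reduction is performed, so `num_stages`, `factor`, and both function names are hypothetical. The cost model assumes that a block's cost grows linearly with sequence length, which is the usual scaling of Mamba's selective scan, and it is not meant to reproduce the GFLOPs reported above.

```python
def u_shape_lengths(seq_len, num_stages, factor):
    """Token count seen by each block of a hypothetical U-shaped stack.

    Assumption (not from the paper): every encoder stage divides the
    sequence length by `factor`, and the decoder mirrors the encoder.
    """
    if seq_len % factor ** (num_stages - 1) != 0:
        raise ValueError("seq_len must be divisible by factor**(num_stages - 1)")
    encoder = [seq_len // factor ** i for i in range(num_stages)]
    decoder = encoder[-2::-1]  # mirror the encoder, counting the bottleneck once
    return encoder + decoder


def relative_cost(lengths):
    """Cost relative to a flat stack of equal depth at full length.

    Assumes per-block cost is proportional to the number of tokens.
    """
    return sum(lengths) / (len(lengths) * lengths[0])


lengths = u_shape_lengths(seq_len=1024, num_stages=3, factor=4)
print(lengths)                 # [1024, 256, 64, 256, 1024]
print(relative_cost(lengths))  # 0.5125
```

Under these assumed settings the U-shaped stack processes about half as many tokens as a flat stack of the same depth; deeper hierarchies or larger reduction factors shrink the ratio further.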
2025
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2025, USA, 2025
Ergasti, Alex; Botti, Filippo; Fontanini, Tomaso; Ferrari, Claudio; Bertozzi, Massimo; Prati, Andrea

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1453039