Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing / Baldrati, Alberto; Morelli, Davide; Cartella, Giuseppe; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita. - (2023), pp. 23336-23345. (Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)) [10.1109/ICCV51070.2023.02138].
Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
Baldrati, Alberto; Bertini, Marco; Cucchiara, Rita
2023
Abstract
Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can thus be used to improve the fashion design process. Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs. Source code and collected multimodal annotations are publicly available at

| File | Type | License | Size | Format |
|---|---|---|---|---|
| 2304.02051v2.pdf | Publisher's PDF (Version of record) | Open Access | 22.92 MB | Adobe PDF |
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.