Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing / Baldrati, Alberto; Morelli, Davide; Cartella, Giuseppe; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita. - (2023), pp. 23336-23345. (Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)) [10.1109/ICCV51070.2023.02138].
Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
Baldrati, Alberto; Bertini, Marco; Cucchiara, Rita
2023
Abstract
Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can thus be used to improve the fashion design process. Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches. We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain. Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets, namely Dress Code and VITON-HD, with multimodal annotations collected in a semi-automatic manner. Experimental results on these new datasets demonstrate the effectiveness of our proposal, both in terms of realism and coherence with the given multimodal inputs. Source code and collected multimodal annotations are publicly available at

| File | Type | License | Size | Format |
|---|---|---|---|---|
| 2304.02051v2.pdf | Publisher's PDF (Version of record) | Open Access | 22.92 MB | Adobe PDF |
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.