Conditioned Image Retrieval for Fashion using Contrastive Learning and CLIP-based Features / Baldrati A.; Bertini M.; Uricchio T.; Del Bimbo A. - ELECTRONIC. - (2021), pp. 1-5. (Paper presented at the 3rd ACM International Conference on Multimedia in Asia, MMAsia 2021, held in aus in 2021) [10.1145/3469877.3493593].

Conditioned Image Retrieval for Fashion using Contrastive Learning and CLIP-based Features

Baldrati A.; Bertini M.; Uricchio T.; Del Bimbo A.
2021

Abstract

Building on recent advances in multimodal zero-shot representation learning, in this paper we explore the use of features obtained from the recent CLIP model to perform conditioned image retrieval. Starting from a reference image and an additive textual description of what the user wants with respect to the reference image, we learn a Combiner network that is able to understand the image content, integrate the textual description, and provide a combined feature that is used to perform the conditioned image retrieval. Starting from the bare CLIP features and a simple baseline, we show that a carefully crafted Combiner network, based on such multimodal features, is extremely effective and outperforms more complex state-of-the-art approaches on the popular FashionIQ dataset.
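The retrieval pipeline described in the abstract (reference image feature plus text feature, fused into one query, then matched against candidate images) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy 3-dimensional vectors stand in for real CLIP features, the names `combine_by_sum` and `rank_candidates` are hypothetical, and a plain normalized sum is swapped in for the learned Combiner network, whose architecture is not given on this page.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def combine_by_sum(image_feat, text_feat):
    """Placeholder fusion (NOT the paper's Combiner): sum of unit-norm features."""
    if len(image_feat) != len(text_feat):
        raise ValueError("image and text features must have the same length")
    img = l2_normalize(image_feat)
    txt = l2_normalize(text_feat)
    return l2_normalize([a + b for a, b in zip(img, txt)])


def rank_candidates(query, candidates):
    """Return candidate names sorted by cosine similarity to the query, best first."""
    scores = {
        name: sum(q * c for q, c in zip(query, l2_normalize(feat)))
        for name, feat in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors standing in for CLIP image and text features (assumed values).
reference_image = [1.0, 0.0, 0.0]
modification_text = [0.0, 1.0, 0.0]
gallery = {
    "same_as_reference": [1.0, 0.0, 0.0],
    "matches_text_only": [0.0, 1.0, 0.0],
    "reference_plus_change": [1.0, 1.0, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}

query = combine_by_sum(reference_image, modification_text)
ranking = rank_candidates(query, gallery)
print(ranking)
```

In this toy setup the candidate that reflects both the reference image and the requested change ranks first, and the unrelated candidate ranks last. A learned fusion module would replace `combine_by_sum` while the ranking step stays the same.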
Publication year: 2021
Series: ACM International Conference Proceeding Series
Conference: 3rd ACM International Conference on Multimedia in Asia, MMAsia 2021
Conference location: aus
Conference year: 2021
Authors: Baldrati A.; Bertini M.; Uricchio T.; Del Bimbo A.
Files in this item:
File: 3469877.3493593.pdf
Access: Closed access
Type: Publisher's PDF (Version of record)
License: All rights reserved
Size: 552.97 kB
Format: Adobe PDF


Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1297139