Conditioned and composed image retrieval combining and partially fine-tuning CLIP-based features / Baldrati, Alberto; Bertini, Marco; Uricchio, Tiberio; Del Bimbo, Alberto. - Electronic. - (2022), pp. 4955-4964. (IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops) [10.1109/cvprw56347.2022.00543].
Conditioned and composed image retrieval combining and partially fine-tuning CLIP-based features
Baldrati, Alberto; Bertini, Marco; Uricchio, Tiberio; Del Bimbo, Alberto
2022
Abstract
In this paper, we present an approach for conditioned and composed image retrieval based on CLIP features. In this extension of content-based image retrieval (CBIR), a query image is combined with text that conveys the user's intent, a setting relevant to application domains such as e-commerce. The proposed method has two training stages. In the first, a simple combination of visual and textual features is used to fine-tune the CLIP text encoder. In the second, we learn a more complex combiner network that merges visual and textual features. Contrastive learning is used in both stages. The proposed approach obtains state-of-the-art performance for conditioned CBIR on the FashionIQ dataset and for composed CBIR on the more recent CIRR dataset.

| File | Size | Format |
|---|---|---|
| Conditioned_and_composed_image_retrieval_combining_and_partially_fine-tuning_CLIP-based_features.pdf (open access; type: Publisher's PDF (Version of record); license: Open Access) | 1.99 MB | Adobe PDF |
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.
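
The abstract describes combining visual and textual features and training with contrastive learning, but it does not give the exact form of either. The sketch below is only an illustration under stated assumptions: the "simple combination" is taken to be a normalized element-wise sum, the loss is an InfoNCE-style batch loss, and the temperature of 0.07, the function names, and the toy vectors are all hypothetical rather than taken from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def combine_sum(image_feat, text_feat):
    """Assumed 'simple combination': normalized element-wise sum."""
    return l2_normalize([a + b for a, b in zip(image_feat, text_feat)])


def contrastive_loss(queries, targets, temperature=0.07):
    """InfoNCE-style loss: query i should score highest against target i."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, t) / temperature for t in targets]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


# Toy stand-ins for image and text features (hypothetical values).
ref_images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
texts = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]

queries = [combine_sum(i, t) for i, t in zip(ref_images, texts)]
targets_aligned = [list(q) for q in queries]       # each target matches its query
targets_swapped = list(reversed(targets_aligned))  # targets deliberately mismatched

loss_aligned = contrastive_loss(queries, targets_aligned)
loss_swapped = contrastive_loss(queries, targets_swapped)
print(loss_aligned < loss_swapped)
```

In this toy setup the loss is small when each combined query lines up with its own target and large when the targets are swapped, which is the behavior a contrastive objective is meant to reward.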



