Effective Triplet Mining Improves Training of Multi-scale Pooled CNN for Image Retrieval / Federico Vaccaro, Marco Bertini, Tiberio Uricchio, Alberto Del Bimbo. - In: MACHINE VISION AND APPLICATIONS. - ISSN 0932-8092. - Electronic. - 33:(2022), pp. 0-0. [10.1007/s00138-021-01260-z]

Effective Triplet Mining Improves Training of Multi-scale Pooled CNN for Image Retrieval

Federico Vaccaro; Marco Bertini; Tiberio Uricchio; Alberto Del Bimbo
2022

Abstract

In this paper, we address the problem of content-based image retrieval (CBIR) by learning image representations based on the activations of a Convolutional Neural Network (CNN). We propose an end-to-end trainable network architecture that exploits a novel multi-scale local pooling based on the trainable aggregation layer NetVLAD and bags of local features obtained by splitting the activations, which reduces the dimensionality of the descriptor and improves retrieval performance. Training is performed using an improved triplet mining procedure that selects samples based on their difficulty to obtain an effective image representation, reducing the risk of overfitting and loss of generalization. Extensive experiments show that our approach, which can be effectively used with different CNN architectures, obtains state-of-the-art results on standard and challenging CBIR datasets.
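
As a rough illustration of what selecting training samples by difficulty can look like, the sketch below implements the standard hinge triplet loss together with a generic semi-hard negative selection rule. It is not the mining procedure proposed in the paper, whose details are not given in this record; the function names, the Euclidean distance, the margin value, and the toy 2-D descriptors are all assumptions made for the example.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge triplet loss: zero once the negative is farther than the positive by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def mine_negative(anchor, positive, negatives, margin=0.1):
    """Pick a negative by difficulty (generic semi-hard rule, assumed for illustration).

    Prefer the closest negative that is farther than the positive but still
    inside the margin; if there is none, fall back to the closest negative.
    """
    d_pos = euclidean(anchor, positive)
    semi_hard = [n for n in negatives if d_pos < euclidean(anchor, n) < d_pos + margin]
    pool = semi_hard if semi_hard else negatives
    return min(pool, key=lambda n: euclidean(anchor, n))


# Toy 2-D descriptors (hypothetical values, not taken from the paper).
anchor = [0.0, 0.0]
positive = [1.0, 0.0]
negatives = [[0.5, 0.0], [1.05, 0.0], [3.0, 0.0]]  # hard, semi-hard, easy

chosen = mine_negative(anchor, positive, negatives)
print(chosen, triplet_loss(anchor, positive, chosen))
```

In this toy setting the easy negative yields zero loss and therefore no training signal, while the hardest negative already lies closer to the anchor than the positive. Choosing samples of intermediate difficulty is one common way to keep training informative without focusing only on the most extreme cases.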
Files in this record:
File: Vaccaro2022_Article_EffectiveTripletMiningImproves.pdf (open access)
Type: Publisher's PDF (Version of record)
License: All rights reserved
Size: 842.61 kB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1247635