A data-driven approach for tag refinement and localization in web videos / Ballan, Lamberto; Bertini, Marco; Serra, Giuseppe; Del Bimbo, Alberto. - In: COMPUTER VISION AND IMAGE UNDERSTANDING. - ISSN 1077-3142. - Print. - 140:(2015), pp. 59-67. [10.1016/j.cviu.2015.05.009]

A data-driven approach for tag refinement and localization in web videos

Ballan, Lamberto; Bertini, Marco; Del Bimbo, Alberto
2015

Abstract

Tagging of visual content is becoming increasingly widespread as web-based services and social networks have popularized tagging functionalities among their users. These user-generated tags are used to ease browsing and exploration of media collections, e.g. using tag clouds, or to retrieve multimedia content. However, not all media are equally tagged by users. With current systems it is easy to tag a single photo, and even tagging part of a photo, such as a face, has become common on sites like Flickr and Facebook. On the other hand, tagging a video sequence is more complicated and time consuming, so users typically tag only the overall content of a video. In this paper we present a method for automatic video annotation that increases the number of tags originally provided by users, and localizes them temporally, associating tags to keyframes. Our approach exploits the collective knowledge embedded in user-generated tags and web sources, and the visual similarity of keyframes and images uploaded to social sites like YouTube and Flickr, as well as web sources like Google and Bing. Given a keyframe, our method selects "on the fly" from these visual sources the training exemplars that should be most relevant for this test sample, and proceeds to transfer labels across similar images. Compared to existing video tagging approaches that require training classifiers for each tag, our system has few parameters, is easy to implement and can deal with an open-vocabulary scenario. We demonstrate the approach on tag refinement and localization on DUT-WEBV, a large dataset of web videos, and show state-of-the-art results.
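The abstract describes a nearest-neighbor tag-transfer scheme: given a keyframe, retrieve visually similar images from web sources and propagate their tags, weighted by visual similarity. The sketch below is a minimal illustration of that general idea, not the authors' implementation; the feature representation, the Gaussian weighting, and all function and parameter names (transfer_tags, k, sigma) are assumptions.

```python
import numpy as np

def transfer_tags(query_feat, exemplar_feats, exemplar_tags, k=20, sigma=0.5):
    """Hypothetical sketch of data-driven tag transfer: rank retrieved
    web images by visual similarity to the query keyframe and accumulate
    similarity-weighted votes for their user-generated tags."""
    # Distance from the query keyframe to every retrieved web image
    # (exemplar_feats has shape (N, D), query_feat has shape (D,)).
    dists = np.linalg.norm(exemplar_feats - query_feat, axis=1)
    # Keep only the k visually closest exemplars, selected "on the fly"
    # as the training set for this particular test sample.
    nn = np.argsort(dists)[:k]
    scores = {}
    for i in nn:
        # Gaussian kernel turns distance into a similarity weight (assumed).
        w = np.exp(-dists[i] ** 2 / (2 * sigma ** 2))
        for tag in exemplar_tags[i]:
            scores[tag] = scores.get(tag, 0.0) + w
    # Tags ranked by accumulated, similarity-weighted votes.
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

In practice the exemplar set would be assembled per query from image and video search on sources like Flickr, Google and Bing, and the ranked tag list thresholded to produce the final keyframe annotations; those retrieval and thresholding steps are omitted here.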
Year: 2015
Volume: 140
Pages: 59-67
Authors: Ballan, Lamberto; Bertini, Marco; Serra, Giuseppe; Del Bimbo, Alberto
Files in this record:

1-s2.0-S1077314215001204-main.pdf
Description: Main article
Type: Publisher's PDF (Version of record)
Access: Closed access (request a copy)
License: All rights reserved
Size: 2.26 MB
Format: Adobe PDF

1407.0623.pdf
Type: Preprint (Submitted version)
Access: Open access
License: Open Access
Size: 9.09 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1019388
Citations
  • PMC: ND
  • Scopus: 15
  • Web of Science: 12