Video Event Classification Using Bag of Words and String Kernels / Lamberto Ballan; Marco Bertini; Alberto Del Bimbo; Giuseppe Serra. - PRINT. - 5716:(2009), pp. 170-178. (Paper presented at the 15th International Conference on Image Analysis and Processing (ICIAP), held in Vietri sul Mare, Salerno, September 8-11) [10.1007/978-3-642-04146-4_20].

Video Event Classification Using Bag of Words and String Kernels

BALLAN, LAMBERTO;BERTINI, MARCO;DEL BIMBO, ALBERTO;SERRA, GIUSEPPE
2009

Abstract

The recognition of events in videos is a relevant and challenging task in automatic semantic video analysis. At present, one of the most successful frameworks for object recognition is the bag-of-words (BoW) approach. However, this approach does not model the temporal information of the video stream. In this paper we present a method to introduce temporal information into the BoW approach. Events are modeled as sequences of histograms of visual features, computed from each frame using the traditional BoW model. The sequences are treated as strings in which each histogram is considered a character. Because the length of a sequence depends on the length of the video clip, the sequences have variable size; they are classified using SVM classifiers with a string kernel based on the Needleman-Wunsch edit distance. Experimental results on two datasets, soccer video and TRECVID 2005, demonstrate the validity of the proposed approach.
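The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say how two histograms ("characters") are compared, what the gap cost is, or what form the kernel takes, so everything below is an assumption made for illustration: the chi-square histogram distance, the `match_threshold` and `gap_cost` parameters, the exponential kernel `exp(-gamma * d)`, and all function names are hypothetical and are not taken from the paper.

```python
import math


def histogram_distance(h1, h2):
    """Chi-square distance between two BoW histograms (assumed metric)."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def nw_edit_distance(seq_a, seq_b, gap_cost=1.0, match_threshold=0.1):
    """Global-alignment (Needleman-Wunsch style) edit distance.

    Each sequence is a list of per-frame histograms. Two frames count as
    the same "character" when their histogram distance is at most
    match_threshold (an assumed rule, not specified in the abstract).
    """
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dp[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = histogram_distance(seq_a[i - 1], seq_b[j - 1]) <= match_threshold
            substitution = 0.0 if same else 1.0
            dp[i][j] = min(
                dp[i - 1][j - 1] + substitution,  # match or substitute
                dp[i - 1][j] + gap_cost,          # gap in seq_b
                dp[i][j - 1] + gap_cost,          # gap in seq_a
            )
    return dp[n][m]


def string_kernel(seq_a, seq_b, gamma=1.0):
    """Similarity derived from the edit distance (assumed exponential form)."""
    return math.exp(-gamma * nw_edit_distance(seq_a, seq_b))
```

Because the alignment handles sequences of different lengths, clips with different numbers of frames can be compared directly. A Gram matrix built from `string_kernel` could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Note that a kernel of this form is not guaranteed to be positive semi-definite, so this is a sketch of the idea rather than a validated implementation.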
Year: 2009
Published in: Proc. of International Conference on Image Analysis and Processing (ICIAP)
Conference: 15th International Conference on Image Analysis and Processing (ICIAP)
Location: Vietri sul Mare, Salerno
Dates: September 8-11
Authors: Lamberto Ballan; Marco Bertini; Alberto Del Bimbo; Giuseppe Serra

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/363593