In this paper, we propose an adaptive structured pooling strategy to solve the action recognition problem in videos. Our method aims at individuating several spatio-temporal pooling regions each corresponding to a consistent spatial and temporal subset of the video. Each subset of the video gives a pooling weights map and is represented as a Fisher vector computed from the soft weighted contributions of all dense trajectories evolving in it. We further represent each video through a graph structure, defined over multiple granularities of spatio-temporal subsets. The graph structures extracted from all videos are finally compared with an efficient graph matching kernel. Our approach does not rely on a fixed partitioning of the video. Moreover, the graph structure depicts both spatial and temporal relationships between the spatio-temporal subsets. Experiments on the UCF Sports and the HighFive datasets show performance above the state-of-the-art.

Adaptive structured pooling for action recognition / S. Karaman;L. Seidenari;S. Ma;A. Del Bimbo;S. Sclaroff. - ELETTRONICO. - (2014), pp. 1-12. (Intervento presentato al convegno BMVC nel 2014).

Adaptive structured pooling for action recognition

KARAMAN, SVEBOR;SEIDENARI, LORENZO;DEL BIMBO, ALBERTO;
2014

Abstract

In this paper, we propose an adaptive structured pooling strategy to solve the action recognition problem in videos. Our method aims at individuating several spatio-temporal pooling regions each corresponding to a consistent spatial and temporal subset of the video. Each subset of the video gives a pooling weights map and is represented as a Fisher vector computed from the soft weighted contributions of all dense trajectories evolving in it. We further represent each video through a graph structure, defined over multiple granularities of spatio-temporal subsets. The graph structures extracted from all videos are finally compared with an efficient graph matching kernel. Our approach does not rely on a fixed partitioning of the video. Moreover, the graph structure depicts both spatial and temporal relationships between the spatio-temporal subsets. Experiments on the UCF Sports and the HighFive datasets show performance above the state-of-the-art.
2014
British Machine Vision Conference
BMVC
2014
S. Karaman;L. Seidenari;S. Ma;A. Del Bimbo;S. Sclaroff
File in questo prodotto:
File Dimensione Formato  
paper109.pdf

accesso aperto

Tipologia: Versione finale referata (Postprint, Accepted manuscript)
Licenza: Open Access
Dimensione 6.63 MB
Formato Adobe PDF
6.63 MB Adobe PDF

I documenti in FLORE sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificatore per citare o creare un link a questa risorsa: https://hdl.handle.net/2158/949394
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 5
  • ???jsp.display-item.citation.isi??? ND
social impact