In this paper, we propose an adaptive structured pooling strategy to solve the action recognition problem in videos. Our method aims at individuating several spatio-temporal pooling regions each corresponding to a consistent spatial and temporal subset of the video. Each subset of the video gives a pooling weights map and is represented as a Fisher vector computed from the soft weighted contributions of all dense trajectories evolving in it. We further represent each video through a graph structure, defined over multiple granularities of spatio-temporal subsets. The graph structures extracted from all videos are finally compared with an efficient graph matching kernel. Our approach does not rely on a fixed partitioning of the video. Moreover, the graph structure depicts both spatial and temporal relationships between the spatio-temporal subsets. Experiments on the UCF Sports and the HighFive datasets show performance above the state-of-the-art.
Adaptive structured pooling for action recognition / S. Karaman;L. Seidenari;S. Ma;A. Del Bimbo;S. Sclaroff. - ELETTRONICO. - (2014), pp. 1-12. (Intervento presentato al convegno BMVC nel 2014).
Adaptive structured pooling for action recognition
KARAMAN, SVEBOR;SEIDENARI, LORENZO;DEL BIMBO, ALBERTO;
2014
Abstract
In this paper, we propose an adaptive structured pooling strategy to solve the action recognition problem in videos. Our method aims at individuating several spatio-temporal pooling regions each corresponding to a consistent spatial and temporal subset of the video. Each subset of the video gives a pooling weights map and is represented as a Fisher vector computed from the soft weighted contributions of all dense trajectories evolving in it. We further represent each video through a graph structure, defined over multiple granularities of spatio-temporal subsets. The graph structures extracted from all videos are finally compared with an efficient graph matching kernel. Our approach does not rely on a fixed partitioning of the video. Moreover, the graph structure depicts both spatial and temporal relationships between the spatio-temporal subsets. Experiments on the UCF Sports and the HighFive datasets show performance above the state-of-the-art.File | Dimensione | Formato | |
---|---|---|---|
paper109.pdf
accesso aperto
Tipologia:
Versione finale referata (Postprint, Accepted manuscript)
Licenza:
Open Access
Dimensione
6.63 MB
Formato
Adobe PDF
|
6.63 MB | Adobe PDF |
I documenti in FLORE sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.