Do textual descriptions help action recognition? / Bruni, Matteo; Uricchio, Tiberio; Seidenari, Lorenzo; Del Bimbo, Alberto. - Electronic. - (2016), pp. 645-649. (Paper presented at the ACM Multimedia conference held in gbr in 2016) [10.1145/2964284.2967301].
Do textual descriptions help action recognition?
BRUNI, MATTEO;URICCHIO, TIBERIO;SEIDENARI, LORENZO;DEL BIMBO, ALBERTO
2016
Abstract
In this paper we present a novel method to improve action recognition by leveraging a set of captioned videos. By learning linear projections that map videos and text onto a common space, our approach obtains improved results on unseen videos. We also propose a novel structure-preserving loss that further improves the quality of the projections. We tested our method on the challenging, realistic Hollywood2 action recognition dataset, where it obtains a considerable gain in performance. We show that the gain is proportional to the number of training samples used to learn the projections.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
p645-bruni.pdf | Closed access | Publisher's PDF (Version of record) | All rights reserved | 798.62 kB | Adobe PDF
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.