Towards temporal saliency detection: better video understanding for richer TV experiences / Dumoulin, Joël; Mugellini, Elena; Abou Khaled, Omar; Bertini, Marco; Del Bimbo, Alberto. - ELECTRONIC. - (2014), pp. 199-202. (Paper presented at ICDS 2014, The Eighth International Conference on Digital Society).
Towards temporal saliency detection: better video understanding for richer TV experiences
Dumoulin, Joël; Bertini, Marco; Del Bimbo, Alberto
2014
Abstract
Increasingly popular, Smart TVs and set-top boxes open new ways to deliver richer experiences in our living rooms. To offer such richer and novel functionalities, however, a better understanding of the multimedia content is crucial. While many works aim to automatically annotate videos at the object level, or to classify them, we believe that investigating emotions will greatly improve the TV experience. In this work, we propose a temporal saliency detection approach that identifies the most exciting parts of a video, i.e. those likely to be of most interest to users. To detect the most interesting events without classifying them (so as to remain independent of the video domain), we compute a time series of arousal (the excitement level of the content) based on audio-visual features. Our goal is to merge this preliminary work with the analysis of user emotions, creating a multi-modal system that bridges the gap between users’ needs and multimedia content.
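The abstract does not detail how the arousal time series is computed; purely as an illustration, the sketch below shows one common way to derive such a curve from per-frame audio energy and visual motion magnitude and to flag high-arousal segments. The function names, the equal-weight fusion, the smoothing window, and the thresholding step are assumptions for this sketch, not the authors' method.

```python
import numpy as np

def arousal_curve(audio_energy, motion_magnitude, smooth_win=25):
    """Illustrative arousal proxy: fuse normalized per-frame audio energy
    and motion magnitude, then smooth to obtain a time series."""
    a = (audio_energy - audio_energy.min()) / (np.ptp(audio_energy) + 1e-8)
    m = (motion_magnitude - motion_magnitude.min()) / (np.ptp(motion_magnitude) + 1e-8)
    raw = 0.5 * a + 0.5 * m                    # equal-weight fusion (assumption)
    kernel = np.ones(smooth_win) / smooth_win  # moving-average smoothing
    return np.convolve(raw, kernel, mode="same")

def salient_frames(arousal, threshold=0.7):
    """Frames whose arousal exceeds a threshold are candidate
    'most exciting' parts of the video."""
    return np.flatnonzero(arousal > threshold)

# Usage example on synthetic features (one value per frame):
audio = np.abs(np.random.randn(1000))
motion = np.abs(np.random.randn(1000))
curve = arousal_curve(audio, motion)
print(salient_frames(curve)[:10])
```

Because the detection relies only on a domain-agnostic arousal signal rather than on event classifiers, the same pipeline can in principle be applied to sports, movies, or news footage without retraining.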