IMEmo: An Interpersonal Relation Multi-Emotion Dataset / Guerdelli H.; Ferrari C.; Berretti S.; Del Bimbo A. - Electronic. - 12:(2024), pp. 1-10. (Paper presented at the IEEE 18th International Conference on Automatic Face and Gesture Recognition, held in Istanbul, Türkiye, 27-31 May 2024) [10.1109/FG59268.2024.10581895].
IMEmo: An Interpersonal Relation Multi-Emotion Dataset
Guerdelli H.; Berretti S.; Del Bimbo A.
2024
Abstract
While engaged in a face-to-face conversation, being capable of understanding the attitude, emotion, and intention of another person allows one to guide his/her behavior, establishing a comfortable communication, both verbal and non-verbal (i.e., body and face language). This paper introduces the IMEmo Interpersonal Multi-Emotion video dataset, a new in-the-wild dataset of face-to-face interactions, built from movies of the romance and drama genres. We manually collected over 100 clips from different movies in different languages. The dataset consists of 79.3 minutes of scenes, with the duration of each clip ranging between 0.20 and 2.13 minutes. Each clip contains two people communicating with each other both verbally and non-verbally, through facial expressions, body pose, and gestures. Currently, it includes age, gender, emotion, social relationship, action, and valence/arousal annotations for both individuals. Emotion recognition results using a baseline CNN approach are also reported to provide an estimate of the difficulty of the data, also in comparison with existing benchmarks.
File | Size | Format |
---|---|---|
IMEmo_An_Interpersonal_Relation_Multi-Emotion_Dataset.pdf | 1.07 MB | Adobe PDF |

Closed access
Description: final file
Type: Final refereed version (Postprint, Accepted manuscript)
License: All rights reserved
Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.