
IMEmo: An Interpersonal Relation Multi-Emotion Dataset / Guerdelli H.; Ferrari C.; Berretti S.; Del Bimbo A. - ELECTRONIC. - 12:(2024), pp. 1-10. (Paper presented at the IEEE 18th International Conference on Automatic Face and Gesture Recognition, held in Istanbul, Türkiye, 27-31 May 2024) [10.1109/FG59268.2024.10581895].

IMEmo: An Interpersonal Relation Multi-Emotion Dataset

Guerdelli H.; Berretti S.; Del Bimbo A.
2024

Abstract

While engaged in a face-to-face conversation, the capability of understanding the attitude, emotion, and intention of the other person allows one to guide his/her own behavior and establish a comfortable communication, both verbal and non-verbal (i.e., body and face language). This paper introduces the IMEmo Interpersonal Multi-Emotion video dataset, a new in-the-wild dataset of face-to-face interactions, built from movies of the romance and drama genres. We manually collected over 100 clips from different movies in different languages. The dataset consists of 79.3 minutes of scenes, with the duration of each clip ranging between 0.20 and 2.13 minutes. Each clip contains two people communicating with each other verbally as well as through facial expressions, body pose, and gestures. Currently, it includes age, gender, emotion, social relationship, action, and valence/arousal annotations for both individuals. Emotion recognition results using a baseline CNN approach are also reported to provide an estimate of the difficulty of the data, also in comparison with existing benchmarks.
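As a rough illustration of the per-clip annotations listed in the abstract, the minimal sketch below shows one possible way such a record could be organized; the field names, label sets, and value ranges are assumptions for illustration only and do not reflect the actual IMEmo annotation format or distribution files.

    # Hypothetical sketch of a per-clip annotation record (not the official IMEmo format).
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class PersonAnnotation:
        age_group: str                        # e.g., "adult" (illustrative label set)
        gender: str                           # e.g., "female"
        emotions: List[str]                   # e.g., ["sadness", "anger"]
        actions: List[str]                    # e.g., ["hugging"]
        valence_arousal: Tuple[float, float]  # continuous affect, e.g., (-0.4, 0.6)

    @dataclass
    class ClipAnnotation:
        clip_id: str           # identifier of the movie clip
        duration_min: float    # between 0.20 and 2.13 minutes, per the abstract
        relationship: str      # social relationship between the two people, e.g., "couple"
        person_a: PersonAnnotation
        person_b: PersonAnnotation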
2024
IEEE 18th International Conference on Automatic Face and Gesture Recognition
Istanbul, Türkiye
27-31 May, 2024
Goal 9: Industry, Innovation, and Infrastructure
Guerdelli H.; Ferrari C.; Berretti S.; Del Bimbo A.
Files in this product:
File: IMEmo_An_Interpersonal_Relation_Multi-Emotion_Dataset.pdf
Access: Closed access (request a copy)
Description: final file
Type: Refereed final version (Postprint, Accepted manuscript)
License: All rights reserved
Size: 1.07 MB
Format: Adobe PDF

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1399814
Citations
  • PMC: ND
  • Scopus: 0
  • Web of Science: 0