Learned 3D Shape Representations Using Fused Geometrically Augmented Images: Application to Facial Expression and Action Unit Detection / B. Taha, M. Hayat, S. Berretti, D. Hatzinakos, N. Werghi. - In: IEEE Transactions on Circuits and Systems for Video Technology. - ISSN 1051-8215. - Print. - Vol. 30 (2020), pp. 2900-2916. [DOI: 10.1109/TCSVT.2020.2984241]

Learned 3D Shape Representations Using Fused Geometrically Augmented Images: Application to Facial Expression and Action Unit Detection

S. Berretti
2020

Abstract

In this paper, we propose an approach to learn generic multi-modal mesh surface representations using a novel scheme for fusing texture and geometric data. Our approach defines an inverse mapping between different geometric descriptors computed on the mesh surface or its down-sampled version, and the corresponding 2D texture image of the mesh, allowing the construction of fused geometrically augmented images (FGAI). This new fused modality enables us to learn feature representations from 3D data in a highly efficient manner by simply employing standard CNNs in a transfer-learning mode. The proposed approach is both computationally and memory efficient, preserves intrinsic geometric information and learns highly discriminative feature representations by effectively fusing shape and texture information at data level. The efficacy of our approach is demonstrated for the tasks of facial action unit detection and expression classification. The extensive experiments conducted on the Bosphorus and BU-4DFE datasets show that our method produces a significant boost in the performance when compared to state-of-the-art solutions.
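
Only the abstract is available in this record, but the fused-image construction it describes can be loosely illustrated. Below is a minimal NumPy sketch, not the authors' implementation: it assumes per-vertex UV coordinates and a simple scalar stand-in descriptor (vertex depth, in place of the surface descriptors the paper fuses), and splats that descriptor into the texture image plane as an extra channel. All names (fgai_from_mesh, mean_height) are hypothetical; the resulting multi-channel image is the kind of input that would then be fed to a standard pretrained CNN.

```python
import numpy as np

def fgai_from_mesh(vertices, uv, texture, descriptor_fn):
    """Hypothetical sketch of a fused geometrically augmented image (FGAI):
    splat a per-vertex geometric descriptor into the texture image plane
    via the mesh's UV coordinates, then stack it with the texture as an
    extra channel. The paper defines a proper inverse mapping; this uses
    a simple nearest-pixel splat for illustration only."""
    h, w, _ = texture.shape
    desc = descriptor_fn(vertices)                     # (n_vertices,) scalar values
    # Normalize the descriptor to [0, 1] so it behaves like an image channel.
    desc = (desc - desc.min()) / (desc.max() - desc.min() + 1e-8)
    geom = np.zeros((h, w), dtype=np.float32)
    cols = np.clip((uv[:, 0] * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(((1.0 - uv[:, 1]) * (h - 1)).astype(int), 0, h - 1)
    geom[rows, cols] = desc                            # nearest-pixel splat, no interpolation
    # Fuse at the data level: RGB texture + geometric channel in one tensor.
    return np.dstack([texture.astype(np.float32) / 255.0, geom])

def mean_height(vertices):
    # Stand-in "descriptor": vertex depth. The paper uses richer surface
    # descriptors computed on the mesh or its down-sampled version.
    return vertices[:, 2]

# Toy example: random vertices with UVs and an 8x8 RGB texture.
rng = np.random.default_rng(0)
verts = rng.normal(size=(100, 3))
uvs = rng.uniform(size=(100, 2))
tex = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
fgai = fgai_from_mesh(verts, uvs, tex, mean_height)
print(fgai.shape)  # (8, 8, 4): RGB texture plus one geometric channel
```

Several such descriptor channels could be stacked the same way; the fused image can then be passed to an off-the-shelf CNN in transfer-learning mode, as the abstract describes.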
Year: 2020
Volume: 30
Pages: 2900-2916
Goal 9: Industry, Innovation, and Infrastructure
B. Taha, M. Hayat, S. Berretti, D. Hatzinakos, N. Werghi
Files in this record:

File: tcsvt2020.pdf (Adobe PDF, 5.27 MB)
Description: main article
Type: Final refereed version (Postprint, Accepted manuscript)
Access: Closed access
License: All rights reserved

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1191364
Citations
  • PMC: n/a
  • Scopus: 26
  • Web of Science: 20