Coarse-to-Fine 3D Face Reconstruction / C. Ferrari, L. Galteri, G. Lisanti, S. Berretti, A. Del Bimbo. - Print. - (2019), pp. 25-31. (Paper presented at the IEEE Conference on Computer Vision Workshops, held in Long Beach, California, 16-20 June 2019).

Coarse-to-Fine 3D Face Reconstruction

C. Ferrari; L. Galteri; G. Lisanti; S. Berretti; A. Del Bimbo
2019

Abstract

Reconstructing accurate 3D shapes of human faces from a single 2D image is a highly challenging Computer Vision problem that has been studied for decades. Statistical modeling techniques, such as the 3D Morphable Model (3DMM), have been widely employed because they can reconstruct a plausible model grounded in prior knowledge of the facial shape. However, most of them derive a coarse and smooth approximation of the real shape, without accounting for the surface details. In this work, we propose an approach based on a Conditional Generative Adversarial Network (CGAN) for refining the reconstruction provided by a 3DMM. The 3DMM reconstruction is represented as a three-channel image whose pixel intensities encode, respectively, the depth and the azimuth and elevation angles of the surface normals. The network architecture is an encoder-decoder that is trained progressively, starting from the lower-resolution layers; this technique makes training more stable and yields high-quality outputs even when high-resolution images are fed during training. Experimental results show that our method produces detailed, realistic reconstructions and obtains lower errors than the 3DMM. Finally, a comparison with a state-of-the-art solution shows competitive performance and a clear improvement in the quality of the generated models.
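
The three-channel encoding mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `depth_to_three_channel`, the finite-difference estimate of the normals, and the angle conventions (azimuth measured in the image plane, elevation measured from the image plane towards the viewer) are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def depth_to_three_channel(depth):
    """Encode a depth map as (depth, normal azimuth, normal elevation).

    Hypothetical helper: the angle conventions below are assumptions,
    not taken from the paper.
    """
    depth = np.asarray(depth, dtype=np.float64)
    # Finite-difference gradients: axis 0 is rows (y), axis 1 is columns (x).
    dz_dy, dz_dx = np.gradient(depth)
    # For a surface z = f(x, y), an unnormalized normal is (-dz/dx, -dz/dy, 1).
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    # Azimuth: direction of the normal projected onto the image plane.
    # It is ill-defined where the surface is flat (nx = ny = 0).
    azimuth = np.arctan2(ny, nx)
    # Elevation: angle between the normal and the image plane.
    elevation = np.arcsin(np.clip(nz, -1.0, 1.0))
    return np.stack([depth, azimuth, elevation], axis=-1)

# Usage: a planar ramp whose depth decreases by one unit per column.
ramp = -np.tile(np.arange(4.0), (3, 1))
encoded = depth_to_three_channel(ramp)
print(encoded.shape)  # (3, 4, 3)
```

Storing the normals as two angles instead of three Cartesian components keeps the image at three channels while still carrying both depth and orientation, which is presumably why the abstract describes this layout; rescaling each channel to a fixed intensity range before feeding a network is left out of the sketch.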
IEEE Conference on Computer Vision Workshops
Long Beach, California
16-20 June, 2019
Files in this record:
cvprw19.pdf
Access: closed
Description: main article
Type: final refereed version (postprint, accepted manuscript)
License: all rights reserved
Size: 806.37 kB
Format: Adobe PDF


Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/1175152