Modèles actifs d’apparences adaptés - Adapted active appearance models


Renaud Séguier, Sylvain Le Gallou, Gaspard Breton, Christophe Garcia

SUPÉLEC/IETR, Avenue de la Boulaie, 35511 Cesson-Sévigné

Orange Labs – TECH/IRIS, 4 rue du clos courtel, 35512 Cesson-Sévigné

Pages: 367-380 | Received: 21 December 2007 | Published: 31 October 2008

OPEN ACCESS

Abstract: 


Active Appearance Models (AAM) align known faces efficiently when face pose and illumination are controlled. The AAM exploit a set of face examples in order to extract a statistical model. There is no difficulty in aligning a face of the same type (same morphology, illumination and pose) as the faces which constitute the example data set. Unfortunately, AAM performance degrades as soon as the illumination, pose or face type changes. AAM robustness is linked to the variability introduced in the learning base: the more variability the AAM contains, the better it can adapt itself to variable faces, with the following drawback: the data represented in the reduced parameter space then form distinct classes, leaving holes, i.e. regions without any data (see Fig. 1). It is therefore very difficult to make the AAM converge in this scattered space.

In this paper we propose robust Active Appearance Models allowing a real-time implementation. To increase AAM robustness to illumination changes, we propose the Oriented Map AAM (OM-AAM). Adapted AAM are then presented to increase AAM robustness to any other type of variability (identity, pose, expression, etc.).

OM-AAM

We propose a specific transformation of the active model texture into an orientation map, which changes the normalization process of the AAM. First, we systematically apply the adaptive histogram equalization CLAHE [Zuiderveld 1994] to the images. We then evaluate the horizontal and vertical gradients and simply use the gradient angle at each pixel instead of its gray level. This angle is quantized on Na values; in practice we quantize it on eight bits, so Na = 255. The new texture is then an image representing the orientation of each pixel, which we call an oriented map. To overcome the discontinuity between 0 and 2π associated with similar edge orientations, we apply a mapping (Eq. 11) from [0..2π] to [0..π]. In order to reduce the noise in uniform regions, as illustrated in the background of Figure 3c, we emphasize the signal in regions with high gradient information, as proposed by [Cootes 2001], using a non-linear function f (see Eq. 12). During modeling, the oriented texture of image I4 (Eq. 13) replaces the texture usually used by the AAM.
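The pipeline above (gradient angle per pixel, folding to remove the 0/2π discontinuity, non-linear emphasis of strong-gradient regions) can be sketched as follows. This is a minimal illustration, not the authors' code: the CLAHE pre-processing is assumed to have been applied upstream, and the exact non-linear function f of Eq. 12 is not given here, so a simple saturating weight is used in its place.

```python
import numpy as np

def oriented_map(gray):
    """Hypothetical sketch of an orientation-map texture.
    `gray` is a 2-D float array (CLAHE assumed already applied)."""
    gy, gx = np.gradient(gray.astype(float))   # vertical / horizontal gradients
    angle = np.arctan2(gy, gx) % (2 * np.pi)   # edge orientation in [0, 2*pi)
    angle = angle % np.pi                      # fold to [0, pi): similar edges merge
    mag = np.hypot(gx, gy)                     # gradient magnitude
    # Assumed stand-in for the non-linear function f (Eq. 12):
    # near 0 in uniform regions, near 1 on strong edges.
    weight = mag / (mag + mag.mean() + 1e-12)
    return angle * weight                      # weighted orientation texture
```

The folded, weighted angle image then plays the role of the texture vector in the AAM, in place of gray levels.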

We compare the OM-AAM performance to that of the DM-AAM (Distance Map AAM [Giri et al 2006]) and of the classical AAM on two public databases. The first is dedicated to illumination problems (CMU-PIE: 1386 images of 66 faces under 21 different illuminations [SIM 2002]); the other is composed of different faces with several expressions, taken against different backgrounds (BIOID: 1521 images [Res01]) under variable lighting (see Fig. 5). These comparisons are made in a generalization context: the faces used to build the model (18 persons from the M2VTS database [Pigeon 1996]) do not belong to the testing databases.

We normalize the error measured on four relevant points (the gravity centers of the eyes, the nose and the mouth) by the distance between the eyes (see Eq. 15). Figure 6 plots the percentage of images aligned with an error of at most e. On the CMU-PIE database, the OM-AAM align 94% of the faces with a precision less than or equal to 15%, where the DM-AAM and the classical AAM are less efficient: their performances are 88% and 79% respectively. When the faces are acquired in real situations, our proposal outperforms the other methods: on the BIOID database, the OM-AAM align 52% of the faces with a precision less than or equal to 15%, a gain of 27% and 42% over the classical AAM and the DM-AAM respectively.
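The evaluation metric above is straightforward to express in code. The sketch below assumes the normalized error is the mean point-to-point distance divided by the interocular distance (the precise form of Eq. 15 is in the paper); function and variable names are illustrative.

```python
import numpy as np

def normalized_alignment_error(found, truth):
    """Mean point-to-point error normalized by interocular distance.
    `found` and `truth` are (4, 2) arrays ordered:
    left eye, right eye, nose, mouth gravity centers."""
    interocular = np.linalg.norm(truth[0] - truth[1])  # distance between the eyes
    errors = np.linalg.norm(found - truth, axis=1)     # per-point Euclidean error
    return errors.mean() / interocular

def percent_aligned(errors, threshold=0.15):
    """Percentage of test images aligned with error <= threshold
    (15% is the reference error used in the comparisons)."""
    errors = np.asarray(errors, dtype=float)
    return 100.0 * np.mean(errors <= threshold)
```

Sweeping `threshold` over a range of values produces the cumulative curves of Figure 6.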

Adapted AAM

As said above, AAM robustness is related to the face variability in the learning base. Instead of using a very generic model containing a lot of variability, we propose to use an initial model M0 containing variability in identity only, and then a specific model Madapt containing variability in pose and expression.

Let a general database contain three types of variability: expression, identity and pose (see Fig. 7). It is made of several different faces holding four distinct expressions, and each face presents each expression in five poses. The initial model M0 is built from a database BDD0 containing the different frontal faces in neutral expression (see Fig. 8). This initial model is used to perform a rough alignment on the unknown face. Let C0 be the appearance vector after the alignment of M0 on the unknown analyzed face. In the reduced space of the model parameters, we search for the k parameter vectors of the initial learning database BDD0 nearest to C0. These k nearest neighbors correspond to the k faces closest to the analyzed one. For example, in Figure 9 the vector Cp identifies face number p as the most similar to the analyzed one. From this set of k nearest identities, we generate an adapted database BDDadapt containing the corresponding faces in different expressions and poses. BDDadapt is a subset of the general database (Fig. 7), from which we generate the adapted model Madapt. When k = 1, 2 or 3, the adapted models can be evaluated beforehand. To align an unknown face in a static image, we then simply align the face with the initial model M0 and apply the pre-computed model corresponding to its k nearest faces.
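The model-selection step described above amounts to a nearest-neighbor search in the reduced parameter space. A minimal sketch follows; the Euclidean metric and all names are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def nearest_identities(c0, identity_params, k=2):
    """Find the k identities of the initial learning base whose appearance
    vectors are closest to c0, the vector obtained after aligning the
    initial model M0 on the unknown face.
    `identity_params` is an (n_identities, n_params) array of appearance
    vectors; returns the indices of the k nearest identities."""
    dists = np.linalg.norm(identity_params - c0, axis=1)  # distance to each identity
    return np.argsort(dists)[:k]                          # k nearest faces
```

The returned indices select the faces whose expression and pose images form BDDadapt, from which Madapt (pre-computed when k is small) is built.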

We test the adapted AAM on the static images of the general database (Fig. 7). A test sequence consists of one unknown person showing four expressions under five different poses, the learning base associated with this testing base consisting of all the other persons. A leave-one-out cross-validation is used: every face is tested separately, using all the others for the learning base. All the faces of the database have been tested, representing a total of 580 images.
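The leave-one-out protocol above iterates over identities, holding each one out in turn. A small generic sketch (names are illustrative):

```python
def leave_one_out(identities):
    """Yield (held-out identity, learning-base identities) pairs:
    each identity is tested once, with all the others as the learning base."""
    for i, test_id in enumerate(identities):
        train_ids = identities[:i] + identities[i + 1:]
        yield test_id, train_ids
```

With 29 identities, each showing 4 expressions in 5 poses, this yields 29 test sequences covering the 580 images.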

We compare the performance of our system with k = 2 (“Adapted AAM”) to two other AAM. The first (“AAM 28”) has identity as its only variability and is built from 28 faces (the twenty-ninth being tested) in frontal view and neutral expression. The second (“AAM 560”) contains the full variability, since it is based on 560 images representing 28 faces showing four expressions under five different poses. Figure 11 shows the superiority of the “Adapted AAM” over the two other models. At the reference error of 15%, our proposal is ten times faster and 16% more effective than a model built from a richer database (“AAM 560”). Compared to the “AAM 28”, which has the same computational complexity, the “Adapted AAM” is far more effective, succeeding in 94% of the cases versus 49%.

In conclusion, our model is faster and more effective than the other models because it focuses on a database relevant to the tested face.

Résumé

Active Appearance Models (AAM) are effective at aligning (detecting the contours of the eyes, nose and mouth of) known faces in constrained settings (controlled illumination and pose). We propose Adapted Active Appearance Models to align unknown faces under arbitrary poses and illuminations. Our proposal relies, on the one hand, on a transformation of the active model textures into orientation maps, which affects the normalization step of the AAM; and, on the other hand, on a search, within a bank of pre-computed models, for the AAM best adapted to the unknown face. Tests on public and private databases (BioId, CMU-PIE) show the interest of our approach: it becomes possible to align unknown faces in real time in situations where light and pose are uncontrolled.

Keywords: 

Active Appearance Model, Human Machine Interface, Face analysis


1. Introduction
2. State of the Art
3. Active Appearance Models
4. Illumination-Robust AAM
5. Pose- and Identity-Robust AAM
6. Conclusion and Perspectives
  References

[Aidarous 2007] AIDAROUS Y., LE GALLOU S., SATTAR A. AND SÉGUIER R., «Face Alignment using active appearance model optimized by simplex», International Conference on Computer Vision Theory and Applications (VISAPP), Barcelona, 2007.

[Blanz 1999] BLANZ V., VETTER T., «A morphable model for the synthesis of 3D faces», SIGGRAPH’99, Computer Graphics Proceedings, pages 187-194, Addison Wesley Longman, 1999.

[Belaroussi et al. 2005] BELAROUSSI R., PREVOST L. AND MILGRAM M. «Classifier combination for face localization in color images», International Conference on Image Analysis and Processing (ICIAP), 2005.

[Res01] HUMANSCAN AG BIOID TECHNOLOGY RESEARCH, «The bioid face database», http://www.bioid.com/downloads/facedb/, 2001.

[Canzler 2004] CANZLER U. AND B.WEGENER, «Person-adaptive Facial Feature Analysis», International Conference on Electrical Engineering, 2004.

[Christoudias 2004] CHRISTOUDIAS C.M., MORENCY L.P. AND DARRELL T., « Light field appearance manifolds ». European Conference on Computer Vision, pages 481-493, 2004.

[Christoudias 2005] CHRISTOUDIAS C.M. AND DARRELL T., «On modelling nonlinear shape-and-texture appearance manifolds», CVPR’05, Conference on Computer Vision and Pattern Recognition, volume 2, pages 1067-1074, 2005.

[Chang 2004] CHANG Y., HU C. AND TURK M., «Probabilistic expression analysis on manifolds », CVPR’04, Computer Vision and Pattern Recognition, 2004.

[Cootes 1998] COOTES T.F., EDWARDS G.J. AND TAYLOR C.J., «Active Appearance Models», ECCV’98, European Conference on Computer Vision, 1998.

[Cootes 1999] COOTES T.F. AND TAYLOR C.J., «A mixture model for representing shape variation », Image and Vision Computing, 1999.

[Cootes 2000] COOTES T.F., WALKER K.N. AND TAYLOR C.J., «View-based active appearance models», FGR’00, International Conference on Automatic Face and Gesture Recognition, pages 227-232, 2000.

[Cootes 2000b] COOTES T.F., WHEELER G.V., WALKER K.N. AND TAYLOR C.J., «Coupled-view active appearance models», BMVC’00, British Machine Vision Conference, volume 1, pages 52-61, 2000.

[Cootes 2001] COOTES T.F. AND TAYLOR C.J., «On representing edge structure for model matching », IEEE Computer Vision and Pattern Recognition (CVPR), 2001.

[Cristinacce 2003] CRISTINACCE D. AND COOTES T., «A Comparison of two Real-Time Face Detection Methods », International Workshop on Performance Evaluation of Tracking and Surveillance, 2003.

[Cristinacce 2006] CRISTINACCE D. AND COOTES T., « Feature Detection and Tracking with Constrained Local Model », Proc. British Machine Vision Conference, 2006.

[Du 2005] DU B., SHAN S., QING L. AND GAO W., «Empirical Comparisons of Several Preprocessing Methods for Illumination Insensitive Face Recognition», ICASSP’05, International Conference on Acoustics, Speech, and Signal Processing, 2005.

[Feng 2006] FENG X., LV B. AND LI Z., «Automatic facial expression recognition using both local and global information», Chinese Control Conference, pages 1878-1881, 2006.

[Froba 2002] FROBA B. AND KULLBECK C., «Robust face detection at video frame rate on edge orientation features », International Conference on Automatic Face and Gesture Recognition, 2002.

[Garcia 2004] GARCIA C. AND DELAKIS M., «Convolutional Face Finder: A Neural Architecture for Fast and Robust Face Detection», IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11), November 2004, p. 1408-1423.

[Georghiades 2000] GEORGHIADES A., BELHUMEUR P.N. AND KRIEGMAN D.J. , «From Few to Many: Generative Models for Recognition Under Variable Pose and Illumination », Automatic Face and Gesture Recognition, 2000.

[Giri et al 2006] GIRI D., ROSENWALD M., VILLENEUVE B., LE GALLOU S. AND SÉGUIER R., «Scale Normalization for the Distance Maps AAM », International Conference on Control, Automation, Robotics and Vision (ICARCV) Singapore, 2006.

[Gross 2002] GROSS R., MATTHEWS I. AND BAKER S., «Fisher LightFields for Face Recognition Across Pose and Illumination », German Symposium on Pattern Recognition, 2002.

[Gross 2005] GROSS R., MATTHEWS I. AND BAKER S., «Generic vs. person specific active appearance models», Image and Vision Computing, 2005.

[Hu 2003a] HU C., FERIS R. AND TURK M., «Active wavelet networks for face alignment», BMVC’03, British Machine Vision Conference, 2003.

[Hu 2003b] HU C., FERIS R. AND TURK M., «Real-time view-based face alignment using active wavelet networks», AMFG’03, International Workshop on Analysis and Modeling of Faces and Gestures, 2003.

[Hu 2004] HU C., CHANG Y., FERIS R. AND TURK M., «Manifold based analysis of facial expression». CVPRW’04, Conference on Computer Vision and Pattern Recognition Workshop, volume 5, pages 81-87, 2004.

[Huang 2004] HUANG Y., LIN S., LI S.Z., LU H. AND SHUM H.Y., «Face Alignment Under Variable Illumination », FGR’04, International Conference on Automatic Face and Gesture Recognition, 2004.

[Kittipanya-ngam 2006] KITTIPANYA-NGAM P. AND COOTES T.F., «The effect of texture representations on AAM performance», ICPR’06, International Conference on Pattern Recognition, 2006.

[Langs 2005] LANGS G., PELOSCHEK P., DONNER R. AND BISCHOF H., «A clique of active appearance models by minimum description length » BMVC’05, British Machine Vision Conference, pages 859-868, 2005.

[Lee 2005] LEE K.C., HO J. AND KRIEGMAN D.J., «Acquiring Linear Subspaces for Face Recognition under Variable Lighting», IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005.

[Li 2001] LI Y., GONG S. AND LIDDELL H., «Modelling faces dynamically across views and over time » International Conference on Computer Vision, pages 554-559, 2001.

[Li 2002] LI Y., SHUICHENG Y., ZHANG H.J. AND CHENG Q.S., «Multi-view face alignment using direct appearance models», FGR’02, International Conference on Automatic Face and Gesture Recognition, pages 324-329, 2002.

[Pigeon 1996] PIGEON, «M2VTS Project», www.tele.ucl.ac.be/PROJECTS/M2VTS/m2fdb.html, M2VTS, 1996.

[Romdhani 1999] ROMDHANI S., GONG S. AND PSARROU A., «A multi-view nonlinear active shape model using kernel pca ». BMVC’99, British Machine Vision Conference, pages 483-492, 1999.

[Sattar 2007] SATTAR A., AIDAROUS Y., LE GALLOU S. AND SÉGUIER R., «Face Alignment by 2.5D Active Appearance Model Optimized by Simplex», ICVS’07, International Conference on Computer Vision Systems, 2007.

[Scott 2003] SCOTT I.M., COOTES T.F. AND TAYLOR C.J., «Improving Appearance Model Matching Using Local Image Structure » IPMI’03, Information Processing in Medical Imaging, 2003.

[SIM 2002] SIM T., BAKER S. AND BSAT M., «The cmu pose, illumination, and expression (pie) database ». FG’02, IEEE International Conference on Automatic Face and Gesture Recognition, 2002.

[Stegmann 2000] STEGMANN M.B., « Active Appearance Models : Theory, Extensions and Cases », Informatics and Mathematical Modelling, Technical University of Denmark, DTU, 2000.

[Stegmann 2002] STEGMANN M.B. AND LARSEN R., «Multi-band modelling of appearance», GMBV’02, Workshop on Generative Model-Based Vision, 2002.

[Tong 2006] TONG Y., WANG Y., ZHU Z. AND JI Q., «Facial Feature Tracking using a Multi-State Hierarchical Shape Model under Varying Face Pose and Facial Expression», ICPR’06, International Conference on Pattern Recognition, 2006.

[Viola 2004] VIOLA P. AND JONES M.J., «Robust real-time face detection», International Journal of Computer Vision, 2004.

[Xiao 2004] XIAO J., BAKER S., MATTHEWS I. AND KANADE T., «Real-Time Combined 2D+3D Active Appearance Models», CVPR’04, Conference on Computer Vision and Pattern Recognition, 2004.

[Xu 2005] XU Z., CHEN H. AND HU S.C., «A high resolution grammatical model for face representation and sketching ». CVPR’05, Computer Vision and Pattern Recognition, volume 2, pages 470-477, 2005.

[Zalewski 2005] ZALEWSKI L. AND GONG S., «2d statistical models of facial expressions for realistic 3d avatar animation», CVPR’05, Conference on Computer Vision and Pattern Recognition, volume 2, pages 217-222, 2005.

[Zhao 2000] ZHAO W.Y AND CHELLAPPA R., «Illumination-Insensitive Face Recognition Using Symmetric Shape-from-Shading », CVPR’00, Conference on Computer Vision and Pattern Recognition, 2000.

[Zhu 2003] ZHU J., LIU B. AND SCHWARTZ S.C., «General illumination correction and its application to face normalization», International Conference on Acoustics, Speech, and Signal Processing, 2003.

[Zuiderveld 1994] ZUIDERVELD K., «Contrast Limited Adaptive Histogram Equalization», Graphics Gems IV, 1994.