Théorie de l’Évidence pour Suivi de Visage

Francis Faux, Franck Luthon

Université de Pau et des Pays de l’Adour, Laboratoire d’Informatique EA3000 IUT de Bayonne-Pays Basque 2, allée du parc Montaury, F-64600 Anglet

Pages: 515-545 | DOI: https://doi.org/10.3166/TS.28.515-545

OPEN ACCESS

Abstract: 

This paper deals with real-time face detection and tracking by a video camera. The method is based on a simple and fast initializing stage for learning. The transferable belief model is used to deal with the prior model incompleteness due to the lack of exhaustiveness of the learning stage. The algorithm works in two steps. The detection phase synthesizes an evidential face model by merging basic beliefs elaborated from the Viola and Jones face detector and from colour mass functions. These functions are computed from information sources in a logarithmic colour space. To deal with the colour information dependence in the fusion process, we propose a compromise operator close to the Denoeux cautious rule. As regards the tracking phase, the pignistic probabilities from the face model guarantee the compatibility between the belief formalism and the probability formalism. They are the inputs of a particle filter which ensures face tracking at video rate. The optimal parameter tuning of the evidential model is discussed.

Extended abstract

This paper presents an original method both for face detection based on an evidential face model and for face tracking with a classical bootstrap particle filter technique. The aim of the application is to automatically track the face of a person located in the field of view of a motorized pan-tilt-zoom camera. The proposed method takes control of the servo-camera to keep the face dynamically centred in the image plane throughout the video sequence. The application context is limited to indoor environments, typically a laboratory or an office. Face detection by computer vision is made difficult by the variability of appearance of this deformable moving object, due not only to lighting variations on the face zone (shadows, highlights) and to background clutter that disturbs detection, but also to individual morphological differences (nose shape, eye colour, skin colour, beard), to changes of facial expression, and to visual artifacts such as glasses or occlusions. In this paper, to address this complexity, a supervised method is proposed, in which the user manually selects a zone of the face in the first image of the video sequence. This fast initializing step constitutes the learning stage which yields the prior model. The proposed method thus handles simple contextual knowledge representative of the application background, gathered during a quick initializing stage, unlike current techniques that rely on huge learning databases and complex algorithms to obtain generic face models (e.g. active appearance models). The transferable belief model is used to counteract the incompleteness of the prior model due to the lack of exhaustiveness of the learning stage and to the subjectivity of the face appearance.

The algorithm works in two steps. The detection phase is a pre-processing step consisting in the fusion of information to get an evidential face model. It merges complementary basic beliefs (or mass functions) elaborated from the Viola and Jones (VJ) face detector and from a skin colour detector. The VJ detector generates a target container (or bounding box) that is very reliable when the face is in front view or slightly in profile. However, it fails in the case of large rotations or occlusions, or when it falsely recognizes a face-like artifact in the background. In order to model the VJ attribute by a belief function, we assign to each pixel a simple mass according to its position with respect to the bounding box and proportionally to a reliability parameter that may be tuned online from the available data. The colour mass functions are computed from information sources in a logarithmic colour space (Logarithmic hUe eXtension, LUX). A classification approach using the Appriou model is formalized to build simple mass functions integrating the colour information. Colour sources are obviously not independent since they are computed from the same raw data (RGB colour pixels), whereas independence is granted only if two pieces of evidence have been obtained by different means. To deal with this colour information dependence in the fusion process, we propose a compromise operator close to the Denoeux cautious rule. In order to limit the risk of artifact detection, the algorithm dynamically discounts the reliability of the VJ mass functions by estimating the conflict K inside the bounding box. Indeed, when the VJ detector falsely recognizes a face-like artifact in the background with a high reliability degree, the skin colour and VJ mass functions disagree, so that an important conflict is generated inside the VJ face container.
Finally, in order to synthesize a discriminatory face model, the colour mass sets and the VJ mass sets are fused via the Florea rule.
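The fusion machinery described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the frame, the numeric masses and the reliability value are invented, and the conflict-adaptive weights in `florea` are one member of the Florea et al. rule family, which may differ from the exact variant used in the paper.

```python
# Sketch (not the authors' code) of the evidential pipeline on the frame
# Omega = {face, background}: simple mass functions, reliability
# discounting of the VJ source, the conflict K from the conjunctive
# rule, and a conflict-adaptive Florea-style mixture of the conjunctive
# and disjunctive rules. All numeric values are illustrative.

F, B = "face", "background"
OMEGA = frozenset({F, B})

def simple_mass(hyp, s):
    """Simple BBA: mass s on one hypothesis, the rest on Omega."""
    return {frozenset({hyp}): s, OMEGA: 1.0 - s}

def discount(m, alpha):
    """Shafer discounting by reliability alpha: scale the focal masses
    and transfer the remainder to Omega."""
    md = {A: alpha * v for A, v in m.items() if A != OMEGA}
    md[OMEGA] = 1.0 - alpha + alpha * m.get(OMEGA, 0.0)
    return md

def conjunctive(m1, m2):
    """Unnormalised conjunctive rule; the mass assigned to the empty
    set is the conflict K between the two sources."""
    out = {}
    for A, v1 in m1.items():
        for C, v2 in m2.items():
            inter = A & C
            out[inter] = out.get(inter, 0.0) + v1 * v2
    return out

def disjunctive(m1, m2):
    """Disjunctive rule: masses flow to unions of focal sets."""
    out = {}
    for A, v1 in m1.items():
        for C, v2 in m2.items():
            union = A | C
            out[union] = out.get(union, 0.0) + v1 * v2
    return out

def florea(m1, m2):
    """Conflict-adaptive mixture in the spirit of the Florea et al.
    robust rule family: the higher the conflict K, the more weight
    goes to the cautious, disjunctive part."""
    mc = conjunctive(m1, m2)
    K = mc.pop(frozenset(), 0.0)
    md = disjunctive(m1, m2)
    wa = K / (1.0 - K + K * K)           # weight of the disjunctive part
    wb = (1.0 - K) / (1.0 - K + K * K)   # weight of the conjunctive part
    return {A: wa * md.get(A, 0.0) + wb * mc.get(A, 0.0)
            for A in set(mc) | set(md)}

# A pixel inside the VJ box (VJ leans "face", reliability alpha = 0.8)
# whose colour source leans "background": the sources disagree, so the
# conjunctive rule exposes a large conflict K.
m_vj = discount(simple_mass(F, 0.9), alpha=0.8)
m_colour = simple_mass(B, 0.6)
K = conjunctive(m_vj, m_colour).get(frozenset(), 0.0)
m_face = florea(m_vj, m_colour)
```

Note that these particular weight functions make the mixture sum to one for any K (wa + wb(1-K) = 1), so the fused result is again a normalized mass function with no mass on the empty set.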

As regards the tracking phase, the evidential face model is the input to a tracking filter. Indeed, the pignistic probabilities from the face model serve as inputs for computing the weights of a particle filter which ensures face tracking at video rate. The pignistic probabilities guarantee the compatibility between the belief formalism and the probability formalism. Probabilistic tracking by a particle filter is well suited here since the face, positioned relatively close to the camera, has unpredictable egomotion and frequent direction changes. The goal is to estimate the parameters of a state vector which represents the kinematic characteristics of the target object, i.e. the face. The outer contour of the face is approximated by an ellipse whose parameters are stored in the state vector. The tracking algorithm begins classically with an initialization step: the zone of the face selected manually by the user during the learning stage is used to initialize the parameters of the state vector. Then the algorithm is organized in three successive stages: (i) first, the coordinates of the centre of the state vector are estimated by computing the quadratic sum of the pignistic probabilities contained inside each ellipse; (ii) then, the ellipse size and orientation are estimated using an elliptic measure based on a least-squares fitting method; (iii) finally, if necessary, a resampling operation is performed when the informative content associated with the particles estimating the state vector is lower than a predefined threshold.
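The belief-to-probability bridge just described can be sketched as follows. This is a minimal Python illustration with invented pixel data: `particle_weight` and `in_ellipse` are hypothetical helpers showing the quadratic sum of pignistic probabilities over a particle's ellipse, not the authors' implementation.

```python
# Sketch of the bridge from beliefs to the particle filter: the
# pignistic transform BetP spreads each focal mass uniformly over its
# singletons, and a particle's weight is the quadratic sum of
# BetP(face) over the pixels inside the particle's ellipse.
import math

def pignistic(m):
    """BetP(x) = sum over focal sets A containing x of m(A)/|A|,
    after conditioning out any mass on the empty set."""
    k = m.get(frozenset(), 0.0)
    betp = {}
    for A, v in m.items():
        if not A:
            continue
        for x in A:
            betp[x] = betp.get(x, 0.0) + v / (len(A) * (1.0 - k))
    return betp

def in_ellipse(x, y, cx, cy, a, b, theta):
    """Point test against an ellipse with centre (cx, cy), semi-axes
    a, b and orientation theta (the state-vector parameters)."""
    ct, st = math.cos(theta), math.sin(theta)
    u = ct * (x - cx) + st * (y - cy)
    v = -st * (x - cx) + ct * (y - cy)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

def particle_weight(betp_face, cx, cy, a, b, theta):
    """Quadratic sum of per-pixel BetP(face) inside the ellipse;
    betp_face maps (x, y) pixel coordinates to BetP(face)."""
    return sum(p * p for (x, y), p in betp_face.items()
               if in_ellipse(x, y, cx, cy, a, b, theta))

# Toy example: the mass set of one pixel and its pignistic probabilities.
m = {frozenset({"face"}): 0.3, frozenset({"background"}): 0.2,
     frozenset({"face", "background"}): 0.5}
bp = pignistic(m)   # BetP(face) = 0.55, BetP(background) = 0.45
```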

We illustrate the algorithm behaviour on various sequences recorded in our laboratory in the presence of total or partial occlusion or pose variation, and we also make a performance comparison on a benchmark sequence often used in the literature. In order to quantify the tracking performance, ROC curves (Receiver Operating Characteristics) are drawn for various values of the influence parameters, namely the VJ face detector reliability parameter and the colour compromise parameter. The optimal and adaptive parameter tuning of the evidential model is discussed. By jointly setting the adaptive parameter values of the evidential model and the particle filter, a noticeable improvement of the tracking behaviour is achieved.
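As a reminder of how one operating point of such an ROC curve is obtained, the sketch below thresholds a per-pixel face probability map against a ground-truth mask; the data are purely illustrative, not drawn from the paper's experiments.

```python
# Sketch (illustrative data only): a single ROC operating point from
# thresholding the per-pixel pignistic face probability against a
# ground-truth face mask.
def roc_point(betp_face, truth, thr):
    """Return (true-positive rate, false-positive rate) at threshold thr."""
    tp = fp = fn = tn = 0
    for p, t in zip(betp_face, truth):
        if p >= thr:
            tp, fp = tp + (1 if t else 0), fp + (0 if t else 1)
        else:
            fn, tn = fn + (1 if t else 0), tn + (0 if t else 1)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# Sweeping thr from 1 down to 0 traces the ROC curve for one parameter
# setting; the curve closest to the top-left corner wins.
print(roc_point([0.9, 0.8, 0.4, 0.2], [1, 1, 0, 0], 0.5))  # (1.0, 0.0)
```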

RÉSUMÉ

Face tracking with a video camera is addressed here from the angle of evidential fusion. The proposed method relies on summary learning based on a supervised initialization. The transferable belief model formalism is used to compensate for the incompleteness of the prior face model due to the lack of exhaustiveness of the learning base. The algorithm consists of two steps. The face detection phase synthesizes an evidential model in which the attributes of the Viola and Jones detector are converted into belief functions and fused with colour mass functions modelling a skin-hue detector operating in an original chromatic space obtained by logarithmic transformation. To fuse the dependent colour sources, we propose a compromise operator inspired by the Denoeux cautious rule. For the tracking phase, the pignistic probabilities derived from the face model guarantee compatibility between the credal and probabilistic frameworks. They feed a classical particle filter which enables real-time face tracking. We analyse the influence of the parameters of the evidential model on the tracking quality.

Keywords: 

face detection, tracking, LUX colour space, particle filter, evidence theory, Dempster-Shafer, transferable belief model, cautious rule, computer vision, pattern recognition.

MOTS-CLÉS

face detection, LUX colour space, particle filtering, Dempster-Shafer, transferable belief model, Denoeux cautious rule, pattern recognition.

1. Introduction
2. Computer-Based Face Detection: State of the Art
3. Evidence Theory
4. Evidential Face Modelling
5. Face Tracking
6. Analysis of Video Tracking Results
7. Discussion
  References

Appriou A. (1999, November). Multisensor signal processing in the framework of the theory of evidence. In Application of mathematical signal processing techniques to mission systems, Research and Technology Organisation (lecture series 216), p. 5.1-5.31.

Arulampalam M., Maskell S., Gordon N., Clapp T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, vol. 50, no 2, p. 174-188.

Babenko B., Yang M., Belongie S. (2011). Robust object tracking with online multiple instance learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no 8, p. 1619-1632.

Castrillón M., Déniz O., Hernández D., Lorenzo J. (2011). A comparison of face and facial feature detectors based on the Viola-Jones general object detection framework. Machine Vision and Applications, vol. 22, p. 481-494.

Comaniciu D., Ramesh V., Meer P. (2003). Kernel-based object tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no 5, p. 564-575.

Cootes T. F., Edwards G. J., Taylor C. J. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no 6, p. 681-685.

Cootes T. F., Taylor C. J. (1992). Active shape models -’smart snakes’. In Proceedings of British machine vision conference, p. 266-275.

Crétual A., Chaumette F., Bouthemy P. (1998). Complex object tracking by visual servoing based on 2D image motion. In International conference on pattern recognition, vol. 2, p. 1251-1254. Brisbane, Australia.

Dempster A. (1967). Upper and lower probabilities induced by a multivalued mapping. Annals of Mathematical Statistics, vol. 38, p. 325-339.

Denoeux T. (1995). A k-nearest neighbour classification rule based on Dempster-Shafer theory. IEEE Transactions on Systems, Man and Cybernetics, vol. 25, no 5, p. 804-813.

Denoeux T. (1997). Analysis of evidence-theoretic decision rules for pattern classification. Pattern Recognition, vol. 30, no 7, p. 1095-1107.

Denoeux T. (2008). Conjunctive and disjunctive combination of belief functions induced by non distinct bodies of evidence. Artificial Intelligence, vol. 172, p. 234-264.

Denoeux T., Smets P. (2006). Classification using belief functions: the relationship between the case-based and model-based approaches. IEEE Transactions on Systems, Man and Cybernetics B, vol. 36, no 6, p. 1395-1406.

Faux F. (2009). Détection et suivi de visage par la théorie de l’évidence. Unpublished doctoral thesis, Université de Pau et des Pays de l’Adour, Anglet, France.

Faux F., Luthon F. (2006). Robust face tracking using colour Dempster-Shafer fusion and particle filter. In The 9th international conference on information fusion (FUSION’06), p. 1-7. Firenze, Italy.

Fitzgibbon A., Pilu M., Fisher R. (1999, May). Direct least squares fitting of ellipses. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no 5, p. 476-480.

Florea M. C., Jousselme A.-L., Bossé E., Grenier D. (2009). Robust combination rules for evidence theory. Information Fusion, vol. 10, p. 183-197.

Hammal Z., Couvreur L., Caplier A., Rombaut M. (2007, December). Facial expression classification: An approach based on the fusion of facial deformations using the transferable belief model. International Journal of Approximate Reasoning, vol. 46, no 3, p. 542-567.

Hjelmås E., Low B. (2001, September). Face detection: A survey. Computer Vision and Image Understanding, vol. 83, p. 236-274.

Huang L.-L., Shimizu A., Kobatake H. (2005). Robust face detection using Gabor filter features. Pattern Recognition Letters, vol. 26, no 11, p. 1641-1649.

Isard M., MacCormick J. (2001). BraMBLe: A Bayesian multiple-blob tracker. In IEEE international conference on computer vision (ICCV), p. 34-41.

Kallel A., Le Hégarat-Mascle S. (2009). Combination of partially non-distinct beliefs: The cautious-adaptive rule. International Journal of Approximate Reasoning, vol. 50, no 7, p. 1000-1021.

Klein J., Lecomte C., Miché P. (2010). Hierarchical and conditional combination of belief functions induced by visual tracking. International Journal of Approximate Reasoning, vol. 51, no 4, p. 410-428.

Knothe R., Amberg B., Romdhani S., Blanz V., Vetter T. (2011). Handbook of face recognition, Chapter Morphable models of faces. Edited by Stan Li and Anil Jain, Springer-Verlag.

Liévin M., Luthon F. (2004). Nonlinear color space and spatiotemporal MRF for hierarchical segmentation of face features in video. IEEE Transactions on Image Processing, vol. 13, no 1, p. 63-71.

Luthon F., Beaumesnil B. (2004). Color and R.O.I with JPEG2000 for wireless videosurveillance. In International conference on image processing (ICIP’ 04), p. 3205-3208.

Luthon F., Beaumesnil B., Dubois N. (2010). LUX color transform for mosaic image rendering. In Proceedings of the 17th IEEE international conference on automation, quality and testing, robotics (AQTR 2010), vol. 3, p. 93-98. Cluj-Napoca, Romania.

Martin A., Osswald C. (2007). Towards a combination rule to deal with partial conflict and specificity in belief functions theory. In International conference on information fusion (FUSION’07), p. 9-12. Québec, Canada.

Muñoz-Salinas R., Medina-Carnicer R., Madrid-Cuevas F. J., Carmona-Poyato A. (2009). Multi-camera people tracking using evidential filters. International Journal of Approximate Reasoning, vol. 50, no 5, p. 732-749.

Phung S., Bouzerdoum A., Chai D. (2005). Skin segmentation using color pixel classification: Analysis and comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no 1, p. 148-154.

Pichon F., Denoeux T. (2009). Interpretation and computation of alpha-junctions for combining belief functions. In 6th international symposium on imprecise probability: Theories and applications (ISIPTA ‘09), p. 1-10. Durham, United Kingdom.

Pérez P., Vermaak J., Blake A. (2004). Data fusion for visual tracking with particles. Proceedings of the IEEE, vol. 92, no 3, p. 495-513.

Ramasso E., Panagiotakis C., Rombaut M., Pellerin D. (2010). Belief scheduler based on model failure detection in the TBM framework. Application to human activity recognition. International Journal of Approximate Reasoning, vol. 51, no 7, p. 846-865.

Rathi Y., Vaswani N., Tannenbaum A., Yezzi A. (2007). Tracking deforming objects using particle filtering for geometric active contours. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no 8, p. 1470-1475.

Sakai T., Nagao M., Fujibayashi S. (1969). Line extraction and pattern recognition in a photograph. Pattern Recognition, vol. 1, p. 233-248.

Shafer G. (1976). A mathematical theory of evidence. NJ, Princeton University Press.

Sigal L., Sclaroff S., Athitsos V. (2004). Skin color-based video segmentation under time-varying illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no 7, p. 862-877.

Smarandache F., Dezert J. (2009). Advances and applications of DSmT for information fusion, collected works (vol. 3). American Research Press.

Smets P. (1986). Bayes’ theorem generalized for belief functions. In European Conference on Artificial Intelligence (ECAI’86), vol. 2, p. 169-171. Brighton, UK.

Smets P. (1990). The combination of evidence in the Transferable Belief Model. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no 5, p. 447-458.

Smets P. (1993). Belief functions: the disjunctive rule of combination and the Generalized Bayesian Theorem. International Journal of Approximate Reasoning, vol. 9, p. 1-35.

Smets P. (1995). The canonical decomposition of a weighted belief. In International joint conference on artificial intelligence, p. 1896-1901. San Mateo, CA, USA, Morgan Kaufmann.

Smets P., Kennes R. (1994). The Transferable Belief Model. Artificial Intelligence, vol. 66, no 2, p. 191-234.

Soriano M., Martinkauppi B., Huovinen S., Laaksonen M. (2003). Adaptive skin color modeling using the skin locus for selecting training pixels. Pattern Recognition, vol. 36, no 3, p. 681-690.

Vezhnevets V., Sazonov V., Andreeva A. (2003). A survey on pixel based skin color detection techniques. In Proceedings of Graphicon, p. 85-92. Moscow, Russia.

Viola P., Jones M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE conference on computer vision and pattern recognition, vol. 1, p. 511-518.

Viola P., Jones M. (2003). Fast multi-view face detection. Technical report. Mitsubishi Electric Research Laboratories.

Yaghlane B. B., Smets P., Mellouli K. (2000). Independence concepts for belief functions. In 8th International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems (IPMU), vol. 1, p. 357-364. Madrid, Spain.

Yang M.-H., Kriegman D., Ahuja N. (2002). Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no 1, p. 35-58.

Yilmaz A., Javed O., Shah M. (2006). Object tracking: A survey. ACM Computing Surveys, vol. 38, no 4, p. 1-45.

Zheng W., Bhandarkar S. M. (2009). Face detection and tracking using a boosted adaptive particle filter. Journal of Visual Communication and Image Representation, vol. 20, p. 9-27.

Zouhal L., Denoeux T. (1998). An evidence-theoretic k-NN rule with parameter optimization. IEEE Transactions on Systems, Man and Cybernetics - Part C, vol. 28, no 2, p. 263-271.