Segmentation semi-automatique de signes à partir de corpus vidéo en langue des signes


Matilde Gonzalez, Christophe Collet

Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse cedex 9, France

Corresponding Author Email: collet@irit.fr

Pages: 333-358

DOI: https://doi.org/10.3166/TS.29.333-358

OPEN ACCESS

Abstract: 

Much research focuses on automatic sign language recognition, and many approaches require large amounts of data to train the recognition systems. Our work addresses the annotation of sign language video corpora in order to collect such training data. We propose a robust tracking algorithm for the hands and head, a method to segment the hands during occlusions, and an approach to segment gestures using motion and hand-shape features. To show the advantages and limitations of the proposed approaches, we evaluated each of them on international corpora. The full sign segmentation approach shows promising results.
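The abstract outlines a pipeline whose final step segments gestures from motion and hand-shape features. As a rough, hypothetical sketch of the motion part of that idea (not the authors' implementation), the code below marks candidate sign boundaries where the tracked hand's speed reaches a local minimum well below its typical value; the `positions` input, the `candidate_boundaries` function, and the `smooth` and `rel_threshold` parameters are all illustrative assumptions.

```python
# Illustrative sketch only: segmenting a gesture stream into candidate
# signs from hand motion, on the common assumption that hand speed
# drops at sign boundaries. Not the authors' implementation.
import numpy as np

def candidate_boundaries(positions, smooth=5, rel_threshold=0.2):
    """Return frame indices where hand speed hits a local minimum
    below `rel_threshold` times the median speed.

    positions: hypothetical (T, 2) array of tracked hand centroids.
    """
    # Frame-to-frame speed of the hand centroid.
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    # Moving-average smoothing to suppress tracking jitter.
    kernel = np.ones(smooth) / smooth
    speed = np.convolve(speed, kernel, mode="same")
    threshold = rel_threshold * np.median(speed)
    boundaries = []
    for t in range(1, len(speed) - 1):
        # Local minimum of the smoothed speed, well below typical motion.
        if speed[t] < threshold and speed[t] <= speed[t - 1] and speed[t] <= speed[t + 1]:
            boundaries.append(t)
    return boundaries

if __name__ == "__main__":
    # Toy example: a hand that moves, holds still, then moves again.
    rng = np.random.default_rng(0)
    moving = rng.normal(0.0, 5.0, (40, 2)).cumsum(axis=0)
    pause = np.repeat(moving[-1:], 20, axis=0)   # motionless hold
    moving2 = pause[-1] + rng.normal(0.0, 5.0, (40, 2)).cumsum(axis=0)
    track = np.vstack([moving, pause, moving2])
    print(candidate_boundaries(track))           # indices fall inside the hold
```

In a full system such minima would only be candidates: hands also slow down inside signs (holds, direction changes), which is presumably where the hand-shape features mentioned in the abstract come in.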

RÉSUMÉ

Many studies are under way to develop methods for the automatic processing of sign languages. Several approaches require large amounts of annotated data to train recognition systems. Our work concerns the semi-automatic annotation of these video corpora. We propose a method for tracking body parts, for segmenting the hand during occlusions, and for segmenting gestures using motion and hand-shape features. To show the advantages and limitations of our contributions, we evaluated each of the proposed methods on international corpora. The sign segmentation system shows promising results.

Keywords: 

sign language, annotation, corpora

MOTS-CLÉS

sign language, annotation, corpora

Extended Abstract
1. Introduction
2. State of the Art
3. Tracking of Body Parts
4. Hand Segmentation in Front of the Face
5. Automatic Sign Segmentation
6. Evaluations and Results
7. Conclusion and Perspectives
  References
