TPT-Dance&Actions: a multimodal corpus of human activities

Aymeric Masurelle, Ahmed Rida Sekkat, Slim Essid, Gaël Richard

LTCI, CNRS, Télécom ParisTech, Université Paris-Saclay, 75013, Paris, France

Corresponding author email: {aymeric.masurelle,ahmed.sekkat,slim.essid,gael.richard}@telecom-paristech.fr
Pages: 443-475
DOI: https://doi.org/10.3166/TS.32.443-475
Received: 27 February 2015
Accepted: 9 December 2015

OPEN ACCESS

Abstract: 

We present TPT - Dance & Actions, a new multimodal database of human activities for research in multimodal scene analysis and understanding. The corpus focuses on dance scenes (lindy hop, salsa and classical dance) and fitness routines performed to music, and also includes recordings of more "classical" isolated actions. In total, 20 dancers were recorded performing 14 dance choreographies, and 16 participants performing 13 sequences of other human activities. These multimodal scenes were captured with a variety of sensors: video cameras, depth sensors, microphones, piezoelectric transducers and wearable inertial units combining accelerometers, gyroscopes and magnetometers. The data will be made freely available online for research purposes.
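
As an illustration of how such multimodal recordings might be consumed, the short Python sketch below interpolates one inertial channel at video frame timestamps so that the two modalities share a common time axis. The frame rate, sampling rate, function name and synthetic signal are assumptions made for this example only; the corpus's actual file organisation and synchronisation procedure are documented in the data-preparation section of the paper.

# A minimal sketch: put an inertial channel on the video clock by linear
# interpolation. Rates, names and the synthetic signal are illustrative
# assumptions, not the corpus's actual format.
import numpy as np

VIDEO_FPS = 25.0   # assumed camera frame rate
IMU_RATE = 100.0   # assumed inertial sampling rate (Hz)

def imu_at_frame_times(imu_t, imu_channel, n_frames, fps=VIDEO_FPS):
    """Linearly interpolate one IMU channel at video frame timestamps."""
    frame_t = np.arange(n_frames) / fps
    return frame_t, np.interp(frame_t, imu_t, imu_channel)

# Synthetic stand-in for one accelerometer axis over a 10-second take.
imu_t = np.arange(0.0, 10.0, 1.0 / IMU_RATE)
imu_x = np.sin(2.0 * np.pi * 1.5 * imu_t)  # e.g. a 1.5 Hz periodic arm swing
frame_t, x_at_frames = imu_at_frame_times(imu_t, imu_x, n_frames=250)
print(x_at_frames.shape)  # (250,): one accelerometer value per video frame

Resampling every stream onto the video clock is only one possible convention; streams could equally be aligned on the audio clock or on annotated beat times.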

Keywords: 

dance, lindy hop, salsa, classical dance, fitness, isolated actions, multimodal data, audio, video, depth maps, inertial data, synchronisation, multimodal activity analysis, gesture and action recognition

1. Introduction
2. Related Work
3. Corpus Overview
4. Recording Protocol
5. Music, Gesture Sequences and Instructions
6. Recording Equipment
7. Gesture Sequence Annotations
8. Data Preparation and Release
9. Fields of Application
10. Conclusions
References
