Multi-scale analysis of critical point trajectories for human action recognition
Cyrille Beaudry, Renaud Péteri, Laurent Mascarilla

Laboratoire MIA, Univ. La Rochelle, Avenue Michel Crépeau, F-17042 La Rochelle CEDEX

Corresponding author email: {cyrille.beaudry,renaud.peteri,laurent.mascarilla}@univ-larochelle.fr
DOI: https://doi.org/10.3166/TS.32.265-286
Pages: 265-286
Received: 17 December 2014
Accepted: 10 June 2015

OPEN ACCESS

Abstract: 

This paper addresses human action recognition in video sequences. A method based on optical flow estimation is presented, in which the critical points of the flow field are extracted. Multi-scale trajectories are generated from these points and characterized in the frequency domain. Finally, a sequence is described by fusing this frequency information with motion-orientation and shape information. Experiments on video datasets show that the method achieves recognition rates among the highest in the state of the art. Unlike recent dense-sampling strategies, the proposed method only requires the critical points of the motion flow field, permitting a lower computational cost and a better sequence description. Results, comparisons and perspectives on complex action recognition are then discussed.
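The critical points mentioned above are the zeros of the optical flow field. As an illustrative sketch only — not the authors' implementation, which is detailed in Section 2 — the following Python code locates the zeros of a synthetic 2-D flow field and classifies each one by the sign of its Jacobian determinant (negative for a saddle, non-negative for a node or focus); the function name, magnitude threshold, and classification rule are assumptions made for this example.

```python
import numpy as np

def find_critical_points(u, v, mag_thresh=0.05):
    """Locate critical points of a 2-D flow field (u, v): pixels where
    the flow vanishes, detected as local minima of the flow magnitude
    below a threshold, then classified by the Jacobian determinant."""
    mag = np.hypot(u, v)
    # spatial derivatives of the flow (entries of the Jacobian)
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    points = []
    h, w = mag.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = mag[y - 1:y + 2, x - 1:x + 2]
            if mag[y, x] < mag_thresh and mag[y, x] == patch.min():
                det = du_dx[y, x] * dv_dy[y, x] - du_dy[y, x] * dv_dx[y, x]
                kind = "saddle" if det < 0 else "node/focus"
                points.append((x, y, kind))
    return points

# synthetic saddle flow centred at (32, 32): u = x - 32, v = -(y - 32)
ys, xs = np.mgrid[0:64, 0:64].astype(float)
u, v = xs - 32.0, -(ys - 32.0)
pts = find_critical_points(u, v, mag_thresh=1.0)
```

On this synthetic flow, the only detected point is the saddle at the centre of the field, since the flow magnitude vanishes there and its Jacobian determinant is negative.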

Keywords: action recognition, critical points, frequency-based characterization of trajectories

1. Introduction
2. Critical Points and Trajectories
3. Descriptors Computed from Critical Points and Their Trajectories
4. Evaluation of the Method for Action Recognition
5. Conclusion
References
