Video Trajectory-based Event Recognition using Hidden Markov Models

Reconnaissance D’Événements Vidéos par L’Analyse de Trajectoires À L’Aide de Modèles de Markov

Alexandre Hervieu, Patrick Bouthemy, Jean-Pierre Le Cadre

INRIA, Centre Rennes - Bretagne Atlantique, Campus de Beaulieu, 35042 Rennes Cedex, France

IRISA/CNRS, Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France

18 February 2008
30 June 2009



We address the problem of dynamic event recognition in videos. This is motivated by the increasing need for content-based exploitation of video footage in numerous applications, e.g., retrieving video sequences in large TV archives, automatically summarizing sports TV programs, or detecting specific actions or activities in video surveillance. It requires bridging the well-known semantic gap between computed low-level features and high-level concepts. Considering 2D trajectories is attractive, since they are computable image features that capture rich spatio-temporal information about the observed actions. Methods for tracking moving objects in an image sequence can now provide sufficiently reliable 2D trajectories in various situations. These trajectories are given as sequences of consecutive positions (x, y) in the image plane over time. If they are embedded in an appropriate modeling framework, high-level information about the dynamic scene becomes accessible.

We aim to design a general trajectory classification method that does not exploit strong a priori information on the scene structure, the camera set-up, or the 3D object motions, while taking into account both the trajectory shape (geometrical information related to the type of motion and to variations in the motion direction) and the speed changes of the moving object along its trajectory (dynamics-related information). Appropriate local differential features combining curvature and motion magnitude are defined and robustly computed on the motion trajectories.
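The exact formula of the feature is not given in this summary. As a minimal sketch, assume the feature is the product of the curvature κ and the speed ‖v‖ of the trajectory. For a planar curve (x(t), y(t)) this product simplifies to γ = κ‖v‖ = (x'y'' - y'x'') / (x'^2 + y'^2), the angular velocity of the motion direction, which is unchanged by translation, rotation and uniform scaling of the image coordinates. The function name `curvature_speed_feature` and the use of plain finite differences (`np.gradient`) are illustrative assumptions, not the authors' robust estimator.

```python
import numpy as np


def curvature_speed_feature(x, y, dt=1.0, eps=1e-12):
    """Hypothetical local feature gamma_t = kappa_t * ||v_t||.

    For a planar curve, curvature times speed reduces to
    (x' y'' - y' x'') / (x'^2 + y'^2), i.e. the turning rate of the
    motion direction. Derivatives are estimated with central finite
    differences (np.gradient), which are sensitive to tracking noise.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = np.gradient(x, dt), np.gradient(y, dt)        # velocity
    ddx, ddy = np.gradient(dx, dt), np.gradient(dy, dt)    # acceleration
    # Guard against division by zero when the object is (almost) still.
    speed_sq = np.maximum(dx ** 2 + dy ** 2, eps)
    return (dx * ddy - dy * ddx) / speed_sq


if __name__ == "__main__":
    t = np.arange(0.0, 6.0, 0.05)
    x, y = 40.0 * np.cos(t), 40.0 * np.sin(t)   # counter-clockwise circle, 1 rad per time unit
    g = curvature_speed_feature(x, y, dt=0.05)
    print(g[2:-2].mean())                        # close to the turning rate 1.0

    # Rotate by 30 degrees, scale by 2.5 and translate: the feature is unchanged.
    c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
    x2 = 2.5 * (c * x - s * y) + 100.0
    y2 = 2.5 * (s * x + c * y) - 50.0
    print(np.allclose(g, curvature_speed_feature(x2, y2, dt=0.05)))
```

Multiplying curvature by speed cancels the speed dependence of κ alone, so a uniform change of scale (a closer or farther camera) scales numerator and denominator by the same factor.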

Moreover, these features are not affected by the location of the trajectory in the image plane (invariance to translation), by its orientation in the image plane (invariance to rotation), or by the distance of the viewed action to the camera (invariance to scale), which may allow trajectories from different cameras to be compared. Since local differential features computed on extracted trajectories are prone to noise, a sufficiently robust non-parametric feature extraction framework is also proposed.
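The non-parametric extraction scheme is not detailed here. As an illustration only, one classical non-parametric option is Nadaraya-Watson kernel regression: each image coordinate is replaced by a Gaussian-weighted average of its temporal neighbours before derivatives are taken. The function names and the `bandwidth` parameter (expressed in frames) are hypothetical choices, not the authors' settings.

```python
import numpy as np


def kernel_smooth(z, bandwidth=3.0):
    """Nadaraya-Watson regression of a sampled signal z[0..T-1] on the frame
    index, with a Gaussian kernel of the given bandwidth (in frames).
    Returns the smoothed signal evaluated at every frame."""
    z = np.asarray(z, dtype=float)
    t = np.arange(len(z), dtype=float)
    # w[i, j] = K((t_i - t_j) / h): weight of sample j in the estimate at frame i.
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bandwidth) ** 2)
    return (w @ z) / w.sum(axis=1)


def smooth_trajectory(x, y, bandwidth=3.0):
    """Smooth both image coordinates independently before differentiation."""
    return kernel_smooth(x, bandwidth), kernel_smooth(y, bandwidth)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(200)
    clean = 10.0 * np.sin(2 * np.pi * t / 200)
    noisy = clean + rng.normal(scale=1.0, size=t.size)   # simulated tracking noise, 1 pixel std
    smoothed = kernel_smooth(noisy, bandwidth=3.0)

    def rms(e):
        return np.sqrt(np.mean(e ** 2))

    print(rms(noisy - clean), rms(smoothed - clean))     # the second error is smaller
```

The bandwidth trades noise reduction against bias: larger values suppress more jitter but flatten genuine changes of direction, which directly affects curvature-based features. Near the ends of a trajectory the estimate is one-sided and therefore more biased.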

To exploit this invariant trajectory characterization efficiently, probabilistic models, more specifically hidden Markov models (HMMs), are used, since their inherent properties help account for the temporal evolution of the spatio-temporal information contained in the trajectories. Classical HMMs relying on Gaussian mixture models (GMMs) require sufficiently large amounts of data, so they may fail on short trajectories with only a few dozen observations. An original HMM modeling, based on a uniform quantization of the observation space and dealing efficiently with short trajectories, is proposed, together with an efficient method for selecting the number of HMM states. To compare trajectories, a similarity measure based on the Rabiner distance between HMMs is defined and used for the video event retrieval task.
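The precise HMM structure, training procedure and state-number selection are not given in this summary. The sketch below only illustrates the generic machinery such a method relies on, under stated assumptions: (i) real-valued feature values are mapped to one of `n_bins` equal-width intervals, giving a discrete observation sequence; (ii) each trajectory is represented by a discrete HMM λ = (π, A, B) whose parameters are taken as already estimated (training, e.g. by Baum-Welch, is omitted); (iii) log P(O | λ) is computed with the scaled forward algorithm; (iv) the similarity follows the symmetrised distance of Rabiner's 1989 HMM tutorial, D(λ1, λ2) = (1/T2) [log P(O2 | λ2) - log P(O2 | λ1)] and Ds = [D(λ1, λ2) + D(λ2, λ1)] / 2, where each trajectory's own observation sequence O_i stands in for a sequence generated by its model, and the sign is flipped relative to Rabiner's definition so that values are typically non-negative. All function names and numeric values are hypothetical.

```python
import numpy as np


def uniform_quantize(values, vmin, vmax, n_bins):
    """Map real feature values to symbols 0..n_bins-1 using equal-width
    intervals of [vmin, vmax]; out-of-range values fall in the end bins."""
    interior_edges = np.linspace(vmin, vmax, n_bins + 1)[1:-1]
    return np.digitize(np.asarray(values, dtype=float), interior_edges)


def log_likelihood(obs, pi, A, B):
    """log P(obs | lambda) for a discrete HMM (pi: (N,), A: (N, N), B: (N, M)),
    computed with the scaled forward algorithm to avoid numerical underflow."""
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for o in obs[1:]:
        c = alpha.sum()                       # scaling factor of the previous step
        log_p += np.log(c)
        alpha = ((alpha / c) @ A) * B[:, o]   # forward recursion on the rescaled alpha
    return log_p + np.log(alpha.sum())


def hmm_distance(model_1, obs_1, model_2, obs_2):
    """Symmetrised Rabiner-style distance, sign flipped relative to Rabiner (1989),
    using each trajectory's own sequence instead of a sequence sampled from its model."""
    d_12 = (log_likelihood(obs_2, *model_2) - log_likelihood(obs_2, *model_1)) / len(obs_2)
    d_21 = (log_likelihood(obs_1, *model_1) - log_likelihood(obs_1, *model_2)) / len(obs_1)
    return 0.5 * (d_12 + d_21)


if __name__ == "__main__":
    # Two hypothetical 2-state models over 3 quantization bins.
    model_a = (np.array([0.6, 0.4]),
               np.array([[0.9, 0.1], [0.2, 0.8]]),
               np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]))
    model_b = (np.array([0.5, 0.5]),
               np.array([[0.5, 0.5], [0.5, 0.5]]),
               np.array([[0.1, 0.8, 0.1], [0.3, 0.4, 0.3]]))
    obs_a = uniform_quantize([-0.8, -0.6, -0.7, 0.9, 0.8], -1.0, 1.0, 3)
    obs_b = uniform_quantize([0.0, 0.1, -0.1, 0.05, 0.2], -1.0, 1.0, 3)
    print(obs_a, obs_b)
    print(hmm_distance(model_a, obs_a, model_b, obs_b))
```

Normalizing each log-likelihood difference by the sequence length keeps distances between trajectories of different durations comparable; a retrieval query can then simply rank the stored trajectories by increasing distance.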

Because the features have these invariances and capture both trajectory shape and speed evolution, each trajectory is treated as a dynamic pattern, whereas other methods treat trajectories as tied to the camera viewpoint. We compared our approach with other methods to highlight the properties of the developed method (spatio-temporal modeling and efficient processing of short trajectories). Methods relying on feature histogram comparison, on HMM/GMM modeling, and on support vector machines (SVMs) were considered. A set of comparative experiments on real videos (especially Formula One and skiing TV programs) with classification ground truth showed that the proposed method yields accurate results and outperforms the other methods considered.
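The histogram-based baseline is not specified further here. The sketch below assumes it compares normalized histograms of feature values with the Bhattacharyya distance (an assumption; other histogram distances are equally plausible). It makes concrete why such a baseline discards the temporal evolution that the HMM retains: any reordering of the feature values leaves the histogram, and hence the distance, unchanged. The function names are hypothetical.

```python
import numpy as np


def feature_histogram(values, vmin, vmax, n_bins):
    """Normalized histogram of feature values over [vmin, vmax]
    (values outside the range are ignored by np.histogram)."""
    counts, _ = np.histogram(values, bins=n_bins, range=(vmin, vmax))
    return counts / max(counts.sum(), 1)


def bhattacharyya_distance(p, q, eps=1e-12):
    """-log of the Bhattacharyya coefficient between two discrete distributions,
    clipped at 0 to absorb floating-point round-off."""
    coefficient = np.sum(np.sqrt(p * q))
    return max(0.0, -np.log(max(coefficient, eps)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gamma = rng.normal(size=60)      # stand-in feature sequence
    reversed_in_time = gamma[::-1]   # same values, opposite temporal order
    h1 = feature_histogram(gamma, -3.0, 3.0, 12)
    h2 = feature_histogram(reversed_in_time, -3.0, 3.0, 12)
    print(bhattacharyya_distance(h1, h2))   # zero: temporal order is invisible to the histogram
```

A trajectory that first turns left and then right and one that does the opposite may thus produce identical histograms, whereas their HMM likelihoods generally differ.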


We present an original method for classifying trajectories in video sequences for dynamic event recognition. Hidden Markov models (HMMs) are used to represent each trajectory and to evaluate their similarities. We validated our method by comparing it with several other methods, such as histogram comparison, a method using support vector machines (SVMs), and an HMM method using Gaussian mixture modeling. Appropriate descriptors, invariant to translation, rotation and scale, are computed on the trajectories and then exploited in an HMM representation. A statistical method is also proposed for choosing the number of states of the selected HMM modeling. We tested our method on two sets of trajectories extracted from videos (Formula One and skiing, respectively) by a tracking method applied to sports videos.


Keywords

Computer vision, Image sequence analysis, Event recognition, Hidden Markov models.

1. Introduction
2. Trajectory Representation
3. Distance between Trajectories and Classification
4. Other Classification Methods
5. Experiments
6. Conclusion
