Autoencodeurs discriminants pour la détection et la reconnaissance de véhicules en imagerie aérienne

Sébastien Razakarivony, Frédéric Jurie

72 Rue de la Tour Billy 95100 Argenteuil, France

Université de Caen, U.F.R. Sciences Boulevard du Maréchal Juin 14032 Caen Cedex, France

Corresponding Author Email: srazakarivony@gmail.com
Page: 245-264
DOI: https://doi.org/10.3166/TS.32.245-264
Received: 7 December 2014
Accepted: 2 June 2015
Published: 31 August 2015

OPEN ACCESS

Abstract: 

Autoencoders make it possible to model data by means of manifolds. In an object detection task, they can model the appearance of the object classes to detect. The distance between a vector to classify and the manifold can then be used as a measure of the probability that the vector belongs to the class. However, even if the manifold is learnt so that all vectors of the class lie on it, nothing guarantees that vectors of other classes do not lie on it as well. We propose to lift this limitation with a new kind of autoencoder, the discriminative autoencoder, which builds manifolds that keep vectors outside the target class away from the manifold. An experimental validation in the context of vehicle detection and recognition in aerial imagery makes it possible to assess the relevance of the proposed method.
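
The limitation described in the abstract can be illustrated with a minimal sketch. It is not the discriminative autoencoder of the paper: as a stand-in for a learnt manifold, it assumes a linear subspace fitted by PCA (via SVD), and all data, function names and dimensions are hypothetical. The reconstruction error plays the role of the distance to the manifold, and a vector from another class that happens to lie on the subspace obtains a distance as small as that of a positive example.

```python
import numpy as np

def fit_linear_manifold(X, k):
    """Fit a k-dimensional affine subspace to the rows of X (PCA via SVD)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def distance_to_manifold(x, mean, basis):
    """Reconstruction error: norm of the part of x the subspace cannot explain."""
    centered = x - mean
    code = basis @ centered            # "encode": project onto the subspace
    reconstruction = basis.T @ code    # "decode": map back to input space
    return float(np.linalg.norm(centered - reconstruction))

rng = np.random.default_rng(0)

# Hypothetical positives: 3-D points lying near the line spanned by (1, 1, 0).
t = rng.normal(size=(200, 1))
positives = t * np.array([1.0, 1.0, 0.0]) + 0.01 * rng.normal(size=(200, 3))
mean, basis = fit_linear_manifold(positives, k=1)

# Two hypothetical vectors from another class.
on_manifold_negative = np.array([5.0, 5.0, 0.0])   # lies on the learnt line
off_manifold_negative = np.array([0.0, 0.0, 3.0])  # lies far from the line

d_pos = distance_to_manifold(positives[0], mean, basis)
d_on = distance_to_manifold(on_manifold_negative, mean, basis)
d_off = distance_to_manifold(off_manifold_negative, mean, basis)

print(f"positive: {d_pos:.3f}, negative on manifold: {d_on:.3f}, "
      f"negative off manifold: {d_off:.3f}")
```

In this toy setting the first negative is indistinguishable from a positive when only the distance to the manifold is used, which is the situation a discriminative training criterion is meant to avoid.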

RÉSUMÉ

Les autoencodeurs, qui permettent de modéliser des données au moyen de variétés, peuvent être utilisés dans un contexte de détection d’objets pour modéliser l’apparence des classes d’objets à détecter. La distance entre un vecteur à classer et la variété peut alors être utilisée comme une mesure de probabilité d’appartenance du vecteur à la classe. Cependant, en construisant la variété de manière à ce que les vecteurs de la classe appartiennent à la variété, rien ne garantit que des vecteurs d’autres classes ne lui appartiennent pas également. Nous cherchons à lever cette limitation en proposant un nouveau type d’autoencodeurs, les autoencodeurs discriminants, qui ont la propriété de construire des variétés éloignant les formes n’appartenant pas à la classe d’objets à détecter de la variété. Une validation expérimentale dans un contexte de détection et reconnaissance de véhicules en imagerie aérienne permet de conclure sur la pertinence de la méthode proposée.

Keywords: 

computer vision, object detection, manifold, machine learning

MOTS-CLÉS

vision par ordinateur, détection d’objets, variétés, apprentissage statistique

1. Introduction
2. State of the Art
3. Discriminative Autoencoders
4. Experiments
5. Results
6. Conclusions and Perspectives