Large-Scale Image Classification with SVMs


Thanh-Nghi Doan
Thanh-Nghi Do 
François Poulet 

IRISA, Campus de Beaulieu, 263 Avenue du Général Leclerc 35042 Rennes Cedex, France

College of Information Technology, Can Tho University 1 Ly Tu Trong street, Can Tho city, Vietnam

Université de Rennes I, Campus de Beaulieu 263 avenue du Général Leclerc 35042 Rennes Cedex, France

Pages: 39-56
DOI: https://doi.org/10.3166/TS.31.39-56
Received: 30 September 2013
Accepted: 12 May 2014

OPEN ACCESS

Abstract: 

We present improvements of Power Mean SVM (PmSVM) for large-scale image classification, with a large number of images and classes. Standard SVM algorithms handle only two classes; for more than two classes, one can use a one-versus-one or one-versus-all approach. With large datasets, one-versus-all is the most widely used because of its computational cost, but it yields very imbalanced binary problems. To deal with these imbalanced data, we present a parallel balanced bagging of SVM algorithms that obtains results within a reasonable time. For the classification of the 1,000 largest classes of the ImageNet dataset (900,000 images, 12.5 GB), our algorithm is 286 times faster than the original PmSVM and 1,434 times faster than state-of-the-art algorithms such as LIBLINEAR, with a relative accuracy improvement of more than 20% over the latter.

Extended Abstract  

We present improvements of the Power Mean SVM algorithm (PmSVM) for large-scale image classification, with a large number of images and classes. Standard SVM algorithms handle only two classes; for k > 2 classes, one can use the one-versus-one approach (train k(k-1)/2 binary classifiers) or the one-versus-all approach (train k classifiers). With large datasets, one-versus-all is the most widely used because of its computational cost, but with many classes it produces very imbalanced binary problems. With 1,000 classes, the one-versus-one approach has to train k(k-1)/2 = 499,500 classifiers, while in the one-versus-all approach the minority class represents only 0.1% of the whole dataset.

To deal with these very imbalanced data, we present a balanced bagging of SVM algorithms. It under-samples the majority class when separating the i-th class from the rest, which also speeds up training; class prediction is then performed with a majority vote over the bagged classifiers. To obtain results within a reasonable time, the algorithms are parallelized over several machines/cores, so that the k classifiers are trained in parallel. The parallelization relies on two classical libraries: OpenMP (Open Multi-Processing) and MPI (Message Passing Interface).

To assess the efficiency of our approach, we used the following datasets: ImageNet 10 (the 10 largest classes of the ImageNet dataset, which contains 14 million images in 21,000 classes), ImageNet 100 (the 100 largest classes) and ILSVRC 2010 (the 1,000 largest classes). For the classification of the 1,000 largest classes of ImageNet (900,000 images, 12.5 GB), our algorithm is 286 times faster than the original PmSVM and 1,434 times faster than state-of-the-art algorithms such as LIBLINEAR, with a relative accuracy improvement of more than 20% over the latter. In practice, the classification runs in a few minutes (about two), while the original PmSVM takes some hours (about ten) and LIBLINEAR some days (about two).
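The balanced one-versus-all scheme described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the helper names (`balanced_sample`, `train_one_vs_all`) are hypothetical, a trivial positive-centroid rule stands in for the PmSVM binary classifier, and a thread pool stands in for the OpenMP/MPI parallelization.

```python
# Sketch (not the paper's code) of balanced one-versus-all bagging:
# for each class, keep all its examples as positives and under-sample
# the remaining classes to the same size, then train the k binary
# classifiers in parallel.

import random
from concurrent.futures import ThreadPoolExecutor

def balanced_sample(labels, positive_class, rng):
    """Indices of all positives plus an equal-size random draw of negatives."""
    pos = [i for i, y in enumerate(labels) if y == positive_class]
    neg = [i for i, y in enumerate(labels) if y != positive_class]
    neg = rng.sample(neg, min(len(pos), len(neg)))
    return pos + neg

def train_one_vs_all(features, labels, classes, seed=0):
    """Train one balanced binary model per class, all classes in parallel."""
    def train(c):
        rng = random.Random(seed + c)          # per-class seed for bagging
        idx = balanced_sample(labels, c, rng)
        # Stand-in "classifier": centroid of the positive examples
        # (the paper trains a PmSVM on the balanced subset instead).
        pos = [features[i] for i in idx if labels[i] == c]
        dim = len(pos[0])
        return [sum(v[d] for v in pos) / len(pos) for d in range(dim)]
    with ThreadPoolExecutor() as pool:
        return dict(zip(classes, pool.map(train, classes)))
```

With 1,000 classes, each binary problem shrinks from a 999:1 imbalance to a 1:1 balanced subset, which is also why the sampling itself speeds up training.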

RÉSUMÉ

We present improvements of the Power Mean SVM algorithm (PmSVM) for the classification of image collections such as ImageNet (14 million images and 21,000 classes), whose scale makes training far more demanding in time and memory. Since SVMs handle only two classes, the most widely used approaches are one-versus-one and one-versus-all. With large datasets, one-versus-all is preferred for cost reasons, but it raises class-imbalance problems. To address the imbalance, we propose a balanced bagging of SVM algorithms, parallelized to obtain results within a reasonable time. On the 1,000 largest classes of ImageNet (ILSVRC 2010), it is 286 times faster than the original PmSVM and 1,434 times faster than LIBLINEAR, with a relative accuracy improvement of more than 20% over the latter.

Keywords: 

 large scale visual classification, high performance computing, sampling strategy, parallel support vector machines

MOTS-CLÉS

image categorization, scalability, parallel SVM algorithms, sampling, HPC

1. Introduction
2. Related Work
3. Power Mean SVM
4. Improvements to PmSVM
5. Experimental Results
6. Conclusion and Future Work
Acknowledgements
References

Bay H., Tuytelaars T. et Van Gool L. (2006). SURF: Speeded Up Robust Features. Proc. of 9th European Conference on Computer Vision, p. 404-417.

Benabdeslem K. et Bennani Y. (2006). Dendogram based SVM for multi-class classification. Journal of Computing and Information Technology, 14(4):283-289.

Berg A., Deng J., Li F.-F. (2010). Large scale visual recognition challenge 2010. Technical Report.

Breiman L. (1996). Bagging predictors. Machine Learning, 24(2):123-140.

Chawla N. V., Lazarevic A., Hall L. O., et Bowyer K. W. (2003). SMOTEBoost: improving prediction of the minority class in boosting. Principles of Knowledge Discovery in Databases, p. 107-119.

Cristianini N. et Shawe-Taylor J. (2000). An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.

Csurka G., Dance C. R., Fan L., Willamowski J., et Bray C. (2004). Visual categorization with bags of keypoints. Workshop on Statistical Learning in Computer Vision, European Conference on Computer Vision, p. 1-22. 

Dalal N. et Triggs B. (2005). Histograms of oriented gradients for human detection. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 886-893. IEEE Computer Society.

Deng J., Berg A. C., Li K., et Li F.-F. (2010). What does classifying more than 10,000 image categories tell us? European Conference on Computer Vision, p. 71-84.

Deng J., Dong W., Socher R., Li L.-J., Li K., et Li F.-F. (2009). ImageNet: A large-scale hierarchical image database. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 248-255.

Do T-N, Nguyen V-H., Poulet F. (2008). Speed up SVM Algorithms for Massive Classification Tasks. Proc. of Advanced Data Mining and Applications (ADMA'08), Chengdu, China, C. Tang, C.X. Ling, X. Zhou, N. Cercone, X. Li (Eds), Lecture Notes in Computer Science, vol. 5139, Springer-Verlag, Oct. 2008, p. 147-157.

Everingham M., Van Gool L., Williams C. K. I., Winn J., et Zisserman A. (2010). The Pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2):303-338. 

Fergus R., Weiss Y., et Torralba A. (2009). Semi- supervised learning in gigantic image collections. Advances in Neural Information Processing Systems, p. 522-530. 

Griffin G., Holub A., et Perona P. (2007). Caltech-256 Object Category Dataset. Technical Report CNS-TR-2007-001, California Institute of Technology. 

Griffin G. et Perona P. (2008). Learning and using taxonomies for fast visual categorization. IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

Guermeur Y. (2007). SVM multiclasses, théorie et applications. HDR, Université Nancy 1.

Hido S. et Kashima H. (2008). Roughly balanced bagging for imbalanced data. Proc. of SIAM International Conference on Data Mining, p. 143-152. 

Hsieh C.-J., Chang K.-W., Lin C.-J., Keerthi S. S., et Sundararajan S. (2008). A dual coordinate descent method for large-scale linear SVM. International Conference on Machine Learning, p. 408-415.

Japkowicz N., editor (2000). AAAI Workshop on Learning from Imbalanced Data Sets, number WS-00-05 in AAAI Tech Report.

Jégou H., Douze M., et Schmid C. (2011). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117-128.

Kreßel U. (1999). Pairwise classification and support vector machines. Advances in Kernel Methods: Support Vector Learning, p. 255-268.

Lazebnik S., Schmid C., et Ponce J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 2169-2178. 

Lenca P., Lallich S., Do T. N., et Pham N. K. (2008). A comparison of different off-centered entropies to deal with class imbalance for decision trees. The Pacific Asia Conference on Knowledge Discovery and Data Mining, LNAI 5012, p. 634-643. Springer-Verlag. 

Li F.-F., Fergus R., et Perona P. (2007). Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 106(1):59-70. 

Li Y., Crandall D. J., et Huttenlocher D. P. (2009). Landmark classification in large-scale image collections. IEEE 12th International Conference on Computer Vision, p. 1957-1964. IEEE.

Lin Y., Lv F., Zhu S., Yang M., Cour T., Yu K., Cao L., et Huang T. S. (2011). Large-scale image classification: Fast feature extraction and SVM training. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 1689-1696.

Liu X.-Y., Wu J., et Zhou Z.-H. (2009). Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 39(2):539-550. 

Lowe D. G. (1999). Object recognition from local scale-invariant features. Proc. of International Conference on Computer Vision, vol. 2, p.1150-1157. 

Maji S., Berg A. C., et Malik J. (2008). Classification using intersection kernel support vector machines is efficient. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 

Maji S., Berg A. C., et Malik J. (2013). Efficient classification for additive kernel SVMs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):66-77. 

Mikolajczyk K. et Schmid C. (2005). A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615-1630.

MPI Forum. MPI: A Message-Passing Interface Standard.

OpenMP Architecture Review Board (2008). OpenMP application program interface version 3.0. 

Perronnin F., Sánchez J., et Liu Y. (2010). Large-scale image categorization with explicit data embedding. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 2297-2304.

Pham N. K., Do T. N., Lenca P., et Lallich S. (2008). Using local node information in decision trees: coupling a local decision rule with an off-centered entropy. International Conference on Data Mining, p. 117-123, Las Vegas, Nevada, USA. CSREA Press. 

Platt J., Cristianini N., et Shawe-Taylor J. (2000). Large margin dags for multiclass classification. Advances in Neural Information Processing Systems, 12:547-553. 

Poulet F., Pham N-K. (2010). High Dimensional Image Categorization. 6th International Conference of ADMA’10, Advanced Data Mining and Applications, Chongqing, China, Springer LNCS 6440, p. 147-157. 

Ricamato M. T., Marrocco C., et Tortorella F. (2008). Mcs-based balancing techniques for skewed classes: An empirical comparison. International Conference on Pattern Recognition, p. 1-4. 

Sánchez J. et Perronnin F. (2011). High-dimensional signature compression for large-scale image classification. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 1665-1672. 

Vapnik V. (1995). The Nature of Statistical Learning Theory. Springer-Verlag. 

Vedaldi A., Gulshan V., Varma M., et Zisserman A. (2009). Multiple kernels for object detection. IEEE 12th International Conference on Computer Vision, p. 606-613. 

Vedaldi A. et Zisserman A. (2012). Efficient additive kernels via explicit feature maps. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3):480-492. 

Visa S. et Ralescu A. (2005). Issues in mining imbalanced data sets - A review paper. Midwest Artificial Intelligence and Cognitive Science Conf., p. 67-73, Dayton, USA. 

Vural V. et Dy J. (2004). A hierarchical method for multi-class support vector machines. Proceedings of the twenty-first international conference on Machine learning, p. 831-838. 

Wang C., Yan S., et Zhang H.-J. (2009). Large scale natural image classification by sparsity exploration. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, p. 3709-3712. IEEE. 

Weiss G. M. et Provost F. (2003). Learning when training data are costly: The effect of class distribution on tree induction. Journal of Artificial Intelligence Research, 19:315-354.

Weston J. et Watkins C. (1999). Support vector machines for multi-class pattern recognition. Proceedings of the Seventh European Symposium on Artificial Neural Networks, p. 219-224.

Wu J. (2010). A fast dual method for HIK SVM learning. In Daniilidis, K., Maragos, P., and Paragios, N., editors, European Conference on Computer Vision, vol. 6312 of Lecture Notes in Computer Science, p. 552-565. Springer.

Wu J. (2012). Power Mean SVM for large scale visual classification. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, p. 2344-2351.

Wu J., Tan W.-C., et Rehg J. M. (2011). Efficient and effective visual codebook generation using additive kernels. Journal of Machine Learning Research, 12:3097-3118. 

Yu K., Zhang T., et Gong Y. (2009). Nonlinear learning using local coordinate coding. Advances in Neural Information Processing Systems, p. 2223-2231. 

Yuan G.-X., Ho C.-H., et Lin C.-J. (2012). Recent advances of large-scale linear classification. Proceedings of the IEEE, 100(9):2584-2603. 

Zhou X., Yu K., Zhang T., et Huang T. S. (2010). Image classification using super-vector coding of local image descriptors. European Conference on Computer Vision, p. 141-154.