Simulation de point de vue pour la mise en correspondance et la localisation

Pierre Rolin, Marie-Odile Berger, Frédéric Sur

LORIA, UMR CNRS 7503, Université de Lorraine, INRIA Nancy Grand Est

Corresponding author email: pierre.rolin@loria.fr
Pages: 169-194
DOI: https://doi.org/10.3166/TS.32.169-194
Received: 8 December 2014
Accepted: 2 June 2015

OPEN ACCESS

Abstract: 

We consider the problem of camera pose estimation from a scene model obtained beforehand by a structure-from-motion (SfM) algorithm. The model is made of 3D points, each represented by its coordinates and a set of photometric descriptors, such as SIFT, extracted from some of the input images of the SfM stage. Pose estimation is based on matching interest points from a test view with model points, using the descriptors. Since descriptors have only limited invariance to viewpoint changes, such an approach is likely to fail when the test view is far away from the images used to build the model. Viewpoint simulation techniques, such as ASIFT, have proved effective for wide-baseline image matching. This paper explores how these techniques can enrich a scene model by adding descriptors from simulated views, and evaluates the respective benefits of affine and homographic simulations. In particular, we show that viewpoint simulation increases the proportion of correct correspondences and permits pose estimation in situations where the approach based on SIFT descriptors alone simply fails.
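The affine viewpoint simulation the abstract refers to (in the spirit of ASIFT) amounts to warping each model image by a family of affine maps before extracting descriptors. A minimal NumPy sketch of how one such map can be built is given below; the function name and parametrization are illustrative, not taken from the paper:

```python
import numpy as np

def affine_simulation_matrix(tilt, phi):
    """2x2 affine map simulating a change of camera viewpoint (ASIFT-style).

    A view seen under latitude angle theta is simulated by an in-plane
    rotation of angle phi followed by a directional subsampling of factor
    tilt = 1 / cos(theta) along one image axis.
    """
    c, s = np.cos(phi), np.sin(phi)
    rotation = np.array([[c, -s], [s, c]])
    tilt_map = np.array([[1.0 / tilt, 0.0], [0.0, 1.0]])
    return tilt_map @ rotation

# Frontal view (tilt = 1, no rotation) leaves image coordinates unchanged.
identity = affine_simulation_matrix(1.0, 0.0)

# A tilt of 2 (theta = 60 degrees) halves the sampling along the x axis.
tilted = affine_simulation_matrix(2.0, 0.0)
```

In ASIFT the tilts are sampled geometrically and phi over a half-turn; descriptors extracted from each warped image would then be attached to the corresponding 3D model points, which is the model-enrichment idea evaluated in the paper.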

RÉSUMÉ

We consider the problem of localizing a camera from an unstructured model obtained by a structure-from-motion algorithm. In this model, a point is represented by its coordinates and a set of photometric descriptors extracted from the images in which it is observed. Localization relies on matching interest points of the current view with model points, on the basis of the descriptors. However, the limited invariance of the descriptors to viewpoint changes makes matching difficult as soon as the current view is far from the images used to build the model. Viewpoint simulation techniques, such as ASIFT, have recently proved useful for image-to-image matching. This article explores the contribution of these techniques for enriching the initial model with simulated descriptors, and evaluates the respective benefits of affine and homographic simulations. We show in particular that simulation increases the proportion of correct matches and the accuracy of pose computation, and makes pose estimation possible where the approach based solely on SIFT descriptors fails.
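The matching step described above, pairing descriptors of the current view with model descriptors, is classically done by nearest-neighbor search with Lowe's ratio test. The following NumPy sketch illustrates the idea on raw descriptor arrays; the function name and threshold are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def ratio_test_matches(query, model, ratio=0.8):
    """Match each query descriptor to its nearest model descriptor.

    A match is kept only if the nearest neighbor is sufficiently closer
    than the second nearest one (Lowe's ratio test), which filters out
    ambiguous correspondences before pose estimation.
    """
    matches = []
    for i, d in enumerate(query):
        dists = np.linalg.norm(model - d, axis=1)
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, nearest))
    return matches

model_desc = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]])
query_desc = np.array([[0.1, 0.0], [4.9, 5.1]])
pairs = ratio_test_matches(query_desc, model_desc)
```

The resulting 2D-3D correspondences would then feed a PnP solver inside a RANSAC loop to obtain the camera pose, as is standard in this pipeline.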

Keywords: 

pose estimation, viewpoint simulation, 3D model matching

MOTS-CLÉS

pose estimation, viewpoint simulation, matching to a 3D model

1. Introduction
2. Viewpoint Simulation in a Locally Planar World
3. Implementation
4. Experimental Study
5. Conclusion
  References

Aanæs H., Dahl A., Pedersen K. (2012). Interesting interest points. International Journal of Computer Vision, vol. 97, no 1, p. 18–35.

Bhat S., Berger M.-O., Sur F. (2011). Visual words for 3D reconstruction and pose computation. In Proc. 3DimPVT, p. 326-333.

Boiman O., Shechtman E., Irani M. (2008). In defense of Nearest-Neighbor based image classification. In Proc. Conference on Computer Vision and Pattern Recognition.

Collet A., Berenson D., Srinivasa S., Ferguson D. (2009). Object recognition and full pose registration from a single image for robotic manipulation. In Proc. International Conference on Robotics and Automation, p. 48-55.

Comaniciu D., Meer P. (2002). Mean shift: a robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, p. 603–619.

DeMenthon D., Davis L. (1995). Model-based object pose in 25 lines of code. International Journal of Computer Vision, vol. 15, no 1-2, p. 123–141.

Fischler M., Bolles R. (1981). Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM, vol. 24, no 6, p. 381–395.

Furukawa Y., Ponce J. (2010). Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no 8, p. 1362–1376.

Gordon I., Lowe D. (2006). What and where: 3D object recognition with accurate pose. In J. Ponce, M. Hebert, C. Schmid, A. Zisserman (Eds.), Toward category-level object recognition, vol. 4170, p. 67-82. Springer.

Hartley R. I., Zisserman A. (2004). Multiple view geometry in computer vision (Second ed.). Cambridge University Press.

Hesch J., Roumeliotis S. (2011). A direct least-squares (DLS) method for PnP. In Proc. International Conference on Computer Vision, p. 383–390. Barcelona, Spain.

Hoppe H., DeRose T., Duchamp T., McDonald J., Stuetzle W. (1992). Surface reconstruction from unorganized points. In Computer graphics (SIGGRAPH ’92 proc.), vol. 26, p. 71–78.

Hsiao E., Collet A., Hebert M. (2010). Making specific features less discriminative to improve point-based 3D object recognition. In Proc. Conference on Computer Vision and Pattern Recognition, p. 2653–2660.

Irschara A., Zach C., Frahm J.-M., Bischof H. (2009). From structure-from-motion point clouds to fast location recognition. In Proc. Conference on Computer Vision and Pattern Recognition, p. 2599-2606.

Kushnir M., Shimshoni I. (2012). Epipolar geometry estimation for urban scenes with repetitive structures. In Proc. Asian Conference on Computer Vision, p. 163-176.

Lepetit V., Fua P. (2005). Monocular model-based 3D tracking of rigid objects: A survey. Foundations and Trends in Computer Graphics and Vision, vol. 1, no 1, p. 1–89.

Lepetit V., Moreno-Noguer F., Fua P. (2009). EPnP: An Accurate O(n) Solution to the PnP Problem. International Journal of Computer Vision, vol. 81, no 2, p. 155–166.

Liu Z., Monasse P., Marlet R. (2014). Match selection and refinement for highly accurate two-view structure from motion. In Proc. European Conference on Computer Vision, p. 818–833.

Lowe D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, vol. 60, no 2, p. 91–110.

Moreels P., Perona P. (2007). Evaluation of features detectors and descriptors based on 3D objects. International Journal of Computer Vision, vol. 73, no 3, p. 263–284.

Morel J.-M., Yu G. (2009). ASIFT : A new framework for fully affine invariant image comparison. SIAM Journal on Imaging Sciences, vol. 2, no 2, p. 438-469.

Morel J.-M., Yu G. (2011). Is SIFT scale invariant? AIMS Inverse Problems and Imaging, vol. 5, no 1, p. 115–136.

Mount D., Arya S. (2010). ANN: A library for approximate nearest neighbor searching. http://www.cs.umd.edu/~mount/ANN/.

Noury N., Sur F., Berger M.-O. (2010). How to overcome perceptual aliasing in ASIFT? In Proc. International Symposium on Visual Computing, part 1, p. 231–242.

Ozuysal M., Calonder M., Lepetit V., Fua P. (2010). Fast keypoint recognition using random ferns. IEEE Transaction on Pattern Analysis and Machine Intelligence, vol. 32, no 3, p. 448–461.

Roberts R., Sinha S., Szeliski R., Steedly D. (2011). Structure from motion for scenes with large duplicate structures. In Proc. Conference on Computer Vision and Pattern Recognition, p. 3137–3144.

Rothganger F., Lazebnik S., Schmid C., Ponce J. (2006). 3D object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints. International Journal of Computer Vision, vol. 66, no 3, p. 231–259.

Schindler G., Brown M., Szeliski R. (2007). City-scale location recognition. In Proc. Conference on Computer Vision and Pattern Recognition.

Sur F., Noury N., Berger M.-O. (2013). An a contrario model for matching interest points under geometric and photometric constraints. SIAM Journal on Imaging Sciences, vol. 6, no 4, p. 1956–1978.

Williams B., Klein G., Reid I. (2007). Real-time SLAM relocalisation. In Proc. International Conference on Computer Vision.

Wu C. (2011). VisualSFM : A visual structure from motion system. http://homes.cs.washington.edu/~ccwu/vsfm/.

Wu C., Agarwal S., Curless B., Seitz S. (2011). Multicore bundle adjustment. In Proc. Conference on Computer Vision and Pattern Recognition, p. 3057-3064.

Wu C., Clipp B., Li X., Frahm J.-M., Pollefeys M. (2008). 3D model matching with viewpoint-invariant patches (VIP). In Proc. Conference on Computer Vision and Pattern Recognition.

Yu G., Morel J.-M. (2011). ASIFT: An algorithm for fully affine invariant comparison. Image Processing On Line, vol. 2011.