ABSTRACT
In the field of scene analysis for computer vision, a trade-off must be found between the expected quality of the results and the amount of computing resources allocated to each task. An adaptive vision system provides a more flexible solution, as its analysis strategy can be changed according to the information available concerning the execution context. We describe how to create and evaluate a visual attention system tailored for interacting with a computer vision system so that the latter adapts its processing according to the interest (or salience) of each element of the scene. We propose a new set of constraints, named PAIRED, to evaluate the adequacy of a model with respect to its different applications. We justify why dynamical systems provide good properties for simulating the dynamic competition between different kinds of information. Finally, we present evaluation results demonstrating that our model is fast, highly configurable, and plausible.
Extended Abstract
While machine vision systems are becoming increasingly powerful, in most regards they are still far inferior to their biological counterparts. In humans, evolution has produced a visual attention system that selects the most important information in order to reduce both cognitive load and scene-understanding ambiguity. Studying biological systems and applying the findings to the construction of computational vision models and artificial vision systems is therefore a promising way of advancing the field of machine vision.
In the field of scene analysis for computer vision, a trade-off must be found between the expected quality of the results and the amount of computing resources allocated to each task. This trade-off is usually fixed at design time, through the choice of predefined algorithms and parameters, which limits the generality of the system. An adaptive vision system provides a more flexible solution, as its analysis strategy can be changed according to the information available concerning the execution context. As a consequence, such a system requires some kind of guiding mechanism to explore the scene faster and more efficiently.
In this article, we propose a first step toward building a bridge between computer vision algorithms and visual attention. In particular, we describe how to create and evaluate a visual attention system tailored for interacting with a computer vision system so that the latter adapts its processing according to the interest (or salience) of each element of the scene. Positioned between hierarchical saliency-based and competitive distributed models, we propose a hierarchical yet competitive model. Our original approach allows us to generate the evolution of attentional focus points without the need for either a saliency map or an explicit inhibition-of-return mechanism. This new real-time computational model is based on a dynamical system. The use of such a complex system is justified by an adjustable trade-off between nondeterministic attentional behavior and properties of stability, reproducibility and reactiveness.
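To make the mechanism concrete, the focus generation can be summarized with the following generic notation. This is a minimal sketch: the symbols $\mathbf{x}_t$, $I_t$, $F$, $\boldsymbol{\eta}_t$ and the arg max read-out are our illustrative choices, not the exact equations of the model.

% x_t: internal state of the dynamical system (competing activity maps)
% I_t: current input frame; eta_t: small noise term; p_t: focus point
\[
\mathbf{x}_{t+1} = F(\mathbf{x}_t, I_t) + \boldsymbol{\eta}_t,
\qquad
p_t = \operatorname*{arg\,max}_{(i,j)} x_t(i,j)
\]

Because the memory of past focalizations is carried by the state $\mathbf{x}_t$ itself, the focus point $p_t$ can be read directly from the state maximum: no separate saliency map and no explicit inhibition-of-return list are required, while the noise term $\boldsymbol{\eta}_t$ accounts for the nondeterministic part of the behavior.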
In the first two sections, we give a brief overview of the main theories and concepts of human visual attention, and we outline the strengths and weaknesses of state-of-the-art attention models. This analysis is based on their potential for integration into an adaptable computer vision system. We propose a new set of constraints, called ‘PAIRED’, to evaluate the adequacy of a model with respect to its different applications.
In the third section, we provide an in-depth description of our model and its implementation. We justify why dynamical systems are a good choice for visual attention simulation, and we show that prey/predator models provide good properties for simulating the dynamic competition between different kinds of information. This dynamical system is also used to generate a focus point at each time step of the simulation. To show that our model can be integrated into an adaptable computer vision system, we demonstrate that this architecture is fast and allows flexible, real-time visual attention simulation. In particular, we present a feedback mechanism used to change the scene exploration behavior of the model. This mechanism can be used either to maximize scene coverage (explore each and every part) or to maximize focalization on a particular salient area (tracking).
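For reference, the classical two-species Lotka-Volterra equations on which prey/predator models are built (Murray, 2003; Idema, 2005) are recalled below. Our system extends this kind of competition to spatial maps and several feature channels, so the textbook form given here is an illustration rather than the exact system implemented:

% x: prey population; y: predator population
% alpha, beta, gamma, delta: growth, predation, death and reproduction rates
\[
\frac{dx}{dt} = \alpha x - \beta x y,
\qquad
\frac{dy}{dt} = \delta x y - \gamma y
\]

In the attention setting, one possible reading is that prey stands for bottom-up conspicuity (the resource) and predators for interest (the consumer), whose maximum gives the focus point. Tuning the rates is also a natural place to plug in the top-down feedback mentioned above: favoring prey growth pushes the system toward broad exploration, while favoring predation concentrates activity and yields tracking-like focalization.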
In the last section, we present the evaluation results of our model. Since the model is highly configurable, its evaluation does not cover its plausibility compared to human eye fixations (already studied in (Perreira Da Silva et al., 2011)), but rather the influence of each parameter on a set of properties:
– stability: do the values of the dynamical system stay within their nominal range when the different parameters of the model are changed?
– reproducibility: as discrete dynamical systems can exhibit chaotic behavior, what is the influence of the various parameters of the model (in particular, noise) on the variability of the focus paths generated during different simulations on the same data? (A measurement sketch is given after this list.)
– scene exploration: which parameters influence the scene exploration strategy of our model?
– system dynamics: how can we influence the reactivity of the system? In particular, how can we control the mean fixation time?
For all of these properties we have also studied the influence of top-down feedback.
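As an illustration of how the reproducibility and scene exploration properties can be quantified, the Python sketch below computes two simple indicators over recorded focus paths: the mean pairwise distance between runs (variability) and the fraction of the image grid visited (coverage). The metrics, the grid size and the toy data are our own illustrative assumptions, not the exact measures used in the evaluation.

import numpy as np

def path_variability(paths):
    # Mean distance, at each time step, between the focus points of
    # different runs; low values indicate reproducible focus paths.
    paths = np.asarray(paths, dtype=float)        # shape: (runs, steps, 2)
    n = paths.shape[0]
    pair_means = [np.linalg.norm(paths[i] - paths[j], axis=1).mean()
                  for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(pair_means))

def scene_coverage(path, shape, cell=32):
    # Fraction of the image grid cells visited by one focus path;
    # values close to 1 mean the whole scene has been explored.
    h, w = shape
    visited = np.zeros((h // cell + 1, w // cell + 1), dtype=bool)
    for y, x in path:
        visited[int(y) // cell, int(x) // cell] = True
    return float(visited.mean())

# Toy usage with random focus paths standing in for real simulations.
rng = np.random.default_rng(seed=0)
runs = rng.uniform(0, 480, size=(5, 200, 2))      # 5 runs of 200 fixations
print("variability:", path_variability(runs))
print("coverage:", scene_coverage(runs[0], shape=(480, 480)))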
KEYWORDS
attention model, dynamical model, adaptive vision, implementation, evaluation.
References
Ahmad S. (1992). VISIT: An efficient computational model of human visual attention. PhD thesis, University of Illinois, Champaign, IL. http://ftp.icsi.berkeley.edu/ftp/pub/techreports/1991/tr-91-049.pdf
Allport D. A. (1987). Selection for action: Some behavioral and neurophysiological considerations of attention and action. In H. Heuer, A. F. Sanders (Eds.), Perspectives on perception and action, p. 395–419. Hillsdale, NJ, Lawrence Erlbaum Associates.
Avraham T., Lindenbaum M. (2010). Esaliency (extended saliency): meaningful attention using stochastic image modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no 4, p. 693–708. http://www.ncbi.nlm.nih.gov/pubmed/20224124
Aziz M., Mertsching B. (2009). Towards Standardization of Evaluation Metrics and Methods for Visual Attention Models. In Attention in cognitive systems, p. 227–241. Springer. http://www.springerlink.com/index/v713433834617727.pdf
Baldi P., Itti L. (2005). Attention: Bits versus Wows. In 2005 international conference on neural networks and brain, p. 56–61. IEEE. http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=1614548
Belardinelli A., Pirri F., Carbone A. (2009). Motion Saliency Maps from Spatiotemporal Filtering. In Lecture notes in artificial intelligence, p. 112–123. Springer. http://www.springerlink.com/index/425j618q84762l43.pdf
Berthoz A. (2009). La simplexité. Paris, Odile Jacob.
Bruce B., Jernigan E. (2003). Evolutionary design of context-free attentional operators. In Proc. ICIP'03, p. 0–3. Citeseer. http://www.cse.yorku.ca/~neil/ICIPnbruce.pdf
Bruce N. D. B., Tsotsos J. K. (2009). Saliency, attention, and visual search: An information theoretic approach. Journal of Vision, vol. 9, no 3, p. 5. http://www.journalofvision.org/content/9/3/5.full.pdf
Deco G. (2004). A Neurodynamical cortical model of visual attention and invariant object recognition. Vision Research, vol. 44, no 6, p. 621–642. http://linkinghub.elsevier.com/retrieve/pii/S0042698903006928
Desimone R., Duncan J. (1995). Neural mechanisms of selective visual attention. Annual review of neuroscience, vol. 18, p. 193–222. http://www.ncbi.nlm.nih.gov/pubmed/7605061
Dorr M., Gegenfurtner K. R., Barth E. (2010). Variability of eye movements when viewing dynamic natural scenes. Journal of Vision, vol. 10, p. 1–17. http://www.journalofvision.org/content/10/10/28.full.pdf
Eliasmith C. (1995). Mind as a dynamical system. Master's thesis, University of Waterloo. http://www.arts.uwaterloo.ca/~celiasmi/Papers/eliasmith.1995.dynamic%20mind.masters.pdf
Fox M. D., Snyder A. Z., Vincent J. L., Raichle M. E. (2007). Intrinsic Fluctuations within Cortical Systems Account for Intertrial Variability in Human Behavior. Neuron, vol. 56, no 1, p. 171–184. http://linkinghub.elsevier.com/retrieve/pii/S0896627307006666
Frintrop S. (2005). VOCUS: A Visual Attention System for Object Detection and Goal-Directed Search. PhD thesis, University of Bonn. http://www.iai.uni-bonn.de/~frintrop/paper/frintrop_phd06.pdf
Frintrop S., Backer G., Rome E. (2005). Selecting what is important: Training visual attention. In 28th Annual German Conference on AI (KI), p. 351–366. Koblenz, Germany, Springer Verlag. http://www.iai.uni-bonn.de/~frintrop/paper/frintrop_etal_ki05.pdf
Frintrop S., Klodt M., Rome E. (2007). A real-time visual attention system using integral images. In 5th International Conference on Computer Vision Systems (ICVS). Bielefeld, Germany, Applied Computer Science Group. http://biecoll.ub.uni-bielefeld.de/volltexte/2007/36/pdf/ICVS2007-66.pdf
Gilles S. (1996). Description and experimentation of image matching using mutual information. Technical report. Oxford University, Robotics Research Group, Department of Engineering Science. http://www.robots.ox.ac.uk/~cvrg/trinity2002/seb/mutual_info.ps.gz
Hamker F. (2005). The emergence of attention by population-based inference and its role in distributed processing and cognitive control of vision. Computer Vision and Image Understanding, vol. 100, no 1-2, p. 64–106. http://linkinghub.elsevier.com/retrieve/pii/S1077314205000767
Heijden A. H. C. van der, Bem S. (1997). Successive approximations to an adequate model of attention. Consciousness and cognition, vol. 6, no 2-3, p. 413–28. http://www.ncbi.nlm.nih.gov/pubmed/9262419
Idema T. (2005). The behaviour and attractiveness of the Lotka-Volterra equations. PhD thesis, Universiteit Leiden. http://www.ilorentz.org/~idema/publications/maththesis.pdf
Itti L., Koch C. (2001). Feature combination strategies for saliency-based visual attention systems. Journal of Electronic Imaging, vol. 10, p. 161–169. http://papers.klab.caltech.edu/84/
Itti L., Koch C., Niebur E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no 11, p. 1254–1259. http://ilab.usc.edu/publications/doc/Itti_etal98pami.pdf
Kadir T., Brady M. (2001). Saliency, scale and image description. International Journal of Computer Vision, vol. 45, no 2, p. 83–105. http://www.springerlink.com/index/T45N2G8543574026.pdf
Koch C., Ullman S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, vol. 4, no 4, p. 219–227. http://papers.klab.caltech.edu/104/1/200.pdf
Le Meur O. (2005). Attention sélective en visualisation d'images fixes et animées affichées sur écran : modèles et évaluation de performances - applications. PhD thesis, École polytechnique de l'Université de Nantes. http://www.irisa.fr/temics/staff/lemeur/publi/LeMeur_These.pdf
Le Meur O., Le Callet P. (2009). What we see is most likely to be what matters: Visual attention and applications. In International conference on image processing. Cairo, Egypt. http://www.irisa.fr/temics/staff/lemeur/publi/LeMeur_ICIP09.pdf
Le Meur O., Le Callet P., Barba D., Thoreau D. (2006). A coherent computational approach to model bottom-up visual attention. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no 5, p. 802–817. http://www.irccyn.ec-nantes.fr/~lecallet/paper/LeMeur-IEEEPAMI06.pdf
Lesser M., Dinah M. (1998). Mind as a dynamical system: Implications for autism. In Psychobiology of autism: current research & practice.
Lienhart R., Maydt J. (2002). An extended set of Haar-like features for rapid object detection. In IEEE ICIP, vol. 1, p. 900–903. Citeseer. http://mmc36.informatik.uni-augsburg.de/mediawiki/images/c/c3/Icip2002.pdf
Lopez M., Fernandez-Caballero A., Fernandez M., Mira J., Delgado A. (2006). Motion features to enhance scene segmentation in active visual attention. Pattern Recognition Letters, vol. 27, no 5, p. 469–478. http://linkinghub.elsevier.com/retrieve/pii/S0167865505002631
Mancas M. (2007). Computational Attention: Towards attentive computers. PhD thesis, Faculté Polytechnique de Mons. http://theses.eurasip.org/media/theses/documents/mancas-matei-computational-attention-towards-attentive-computers.pdf
Mozer M. C., Sitton M. (1998). Computational modeling of spatial attention. Attention, p. 341–393. http://www.nbu.bg/cogs/events/2002/materials/Mozer/mozer1998.pdf
Murray J. (2003). Mathematical biology: An introduction. Berlin, Heidelberg, Springer Verlag.
Navalpakkam V., Arbib M., Itti L. (2005). Attention and scene understanding. In L. Itti, G. Rees, J. Tsotsos (Eds.), Neurobiology of attention, p. 197–203. Academic Press. http://ilab.usc.edu/publications/doc/Navalpakkam_etal05noa.pdf
Navalpakkam V., Itti L. (2006). Top-down attention selection is fine grained. Journal of Vision, vol. 6, no 11, p. 4. http://www.journalofvision.org/content/6/11/4.full.pdf
Orabona F., Metta G., Sandini G. (2008). A Proto-object based visual attention model. In L. Paletta (Ed.), Attention in cognitive systems. Theories and systems from an interdisciplinary viewpoint (WAPCV), p. 198–215. Berlin, Heidelberg, Springer. http://www.springerlink.com/index/71U3T3262424M763.pdf
Park S., An K., Lee M. (2002). Saliency map model with adaptive masking based on independent component analysis. Neurocomputing, vol. 49, no 1, p. 417–422. http://www.ingentaconnect.com/content/els/09252312/2002/00000049/00000001/art00637
Perreira Da Silva M., Courboulay V., Estraillier P. (2011). Objective validation of a dynamical and plausible computational model of visual attention. In 3rd european workshop on visual information processing (euvip). http://hal.archives-ouvertes.fr/docs/00/61/77/30/PDF/euvip_perreira.pdf
Peters R., Iyer A., Itti L., Koch C. (2005). Components of bottom-up gaze allocation in natural images. Vision Research, vol. 45, p. 2397–2416. http://linkinghub.elsevier.com/retrieve/pii/S0042698905001975
Rensink R. A. (2000). The dynamic representation of scenes. Visual Cognition, vol. 7, p. 17–42. http://homepages.rpi.edu/~grayw/courses/cogs6962/papers/REN00_VisCog.pdf
Rissanen J. (1978). Modeling by shortest data description. Automatica, vol. 14, p. 465–471.
Spratling M. W., Johnson M. H. (2004). A feedback model of visual attention. Journal of cognitive neuroscience, vol. 16, no 2, p. 219–37. http://www.ncbi.nlm.nih.gov/pubmed/15068593
Sun Y., Fisher R., Wang F., Gomes H. (2008). A computer vision model for visual-object-based attention and eye movements. Computer Vision and Image Understanding, vol. 112, no 2, p. 126–142. http://linkinghub.elsevier.com/retrieve/pii/S1077314208000167
Tatler B. W. (2007). The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision, vol. 7, p. 1–17. http://www.journalofvision.org/content/7/14/4.full.pdf
Tatler B. W., Baddeley R. J., Gilchrist I. D. (2005). Visual correlates of fixation selection: effects of scale and time. Vision Research, vol. 45, no 5, p. 643–659. http://www.ncbi.nlm.nih.gov/pubmed/15621181
Torralba A., Oliva A., Castelhano M. S., Henderson J. M. (2006). Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychological review, vol. 113, no 4, p. 766–86. http://www.ncbi.nlm.nih.gov/pubmed/17014302
Treisman A. (1969). Strategies and models of selective attention. Psychological Review, vol. 76, p. 282–299.
Treisman A., Gelade G. (1980). A Feature-Integration Theory of Attention. Cognitive Psychology, vol. 12, no 1, p. 97–136. http://www.yorku.ca/mfallah/bandb/treisman_gelade.pdf
Tsotsos J., Liu Y., Martineztrujillo J., Pomplun M., Simine E., Zhou K. (2005). Attending to visual motion. Computer Vision and Image Understanding, vol. 100, no 1-2, p. 3–40. http://linkinghub.elsevier.com/retrieve/pii/S1077314205000779
Tsotsos J. K. (1990). Analysing vision at the complexity level. Behavioral and Brain Sciences, vol. 13, p. 423–469. http://www.cse.yorku.ca/~tsotsos/Homepage%20of%20John%20K_files/bbs-90.pdf
Tsotsos J. K. (2007). A selective History of Visual Attention. ECCV 2008 Tutorial. http://www.cse.yorku.ca/~albertlr/attention_tutorial_eccv2008.htm
Van Rullen R., Koch C. (2005). Visual Attention and Visual Awareness. In G. Celesia (Ed.), Disorders of visual processing, vol. 5, p. 65–83. Elsevier. http://papers.klab.caltech.edu/277/1/442.pdf
Viola P., Jones M. (2002). Robust real-time object detection. International Journal of Computer Vision, vol. 57, no 2, p. 137–154. http://research.microsoft.com/en-us/um/people/viola/Pubs/Detect/violaJones_IJCV.pdf
Vitay J., Rougier N., Alexandre F. (2005). A distributed model of spatial visual attention. In Biomimetic neural learning for intelligent robots, p. 54–72. Springer. http://www.springerlink.com/index/2qwwddx022jy6naq.pdf
Walther D., Koch C. (2006). Modeling attention to salient proto-objects. Neural networks : the official journal of the International Neural Network Society, vol. 19, no 9, p. 1395–407. http://www.ncbi.nlm.nih.gov/pubmed/17098563