Understanding vision: theory, models, and data.pdf

Published: 2022-06-10 | Uploaded by: admin | Category: Manuals | File size: 7.29M | Format: pdf

Page images: pages 1 to 8 of 191.

Text preview

Understanding vision: theory, models, and data (provisional title)

© Li Zhaoping, University College London, UK

This book grew out of lecture notes used to teach students computational and theoretical vision. Some readers may find that the paper Zhaoping (2006), "Theoretical understanding of the early visual processes by data compression and data selection", Network: Computation in Neural Systems 17(4):301-334, is an abbreviated version of parts of chapters 2-4 of this book. The book is still a very rough draft; I hope to update it continuously and make it available to students. Feedback is welcome. If you would like better explanations or more detail in any part of this manuscript, if you find parts of the text unclear or confusing, or if you have any other comments, please do not hesitate to contact me at z.li@ucl.ac.uk.

This document was produced on September 29, 2011.

Contents

1 Introduction and scope
  1.1 The approach
    1.1.1 Theory, models, and data
  1.2 The problem of vision
    1.2.1 Vision seen through visual encoding, selection, and decoding
    1.2.2 Retina and V1 seen through visual encoding and bottom-up selection
    1.2.3 Visual decoding and higher visual cortical areas
  1.3 What is known about vision experimentally
    1.3.1 Neurons, neural circuits, cortical areas, and the brain
    1.3.2 Visual processing stages along the visual pathway
    1.3.3 Retina
      Receptive fields of the retinal ganglion cells
      Contrast sensitivity to sinusoidal gratings
      Color processing in the retina
      Spatial sampling on the retina
    1.3.4 The primary visual cortex (V1)
      The retinotopic map
      The receptive fields in the primary visual cortex: the feature detectors
      The influences on a V1 neuron's response from contextual stimuli outside the receptive field
    1.3.5 The higher visual areas
    1.3.6 Behavioral studies on vision
    1.3.7 Etc

2 Information encoding in early vision: the efficient coding principle
  2.1 A brief introduction on information theory (skip if not needed)
      Measuring information amount
      Information transmission, information channels, and mutual information
      Information redundancy and error correction
  2.2 Formulation of the efficient coding principle
  2.3 Efficient neural sampling in the retina
    2.3.1 Contrast sampling in a fly's compound eye
    2.3.2 Spatial sampling by receptor distribution on the retina
    2.3.3 Color sampling by wavelength sensitivities of the cones
  2.4 Efficient coding by early visual receptive fields
    2.4.1 Obtaining the efficient code, and the related sparse code, in low noise limit by numerical simulations
    2.4.2 The general analytical solution to efficient codings of gaussian signals
  2.5 Illustration: stereo coding in V1
    2.5.1 Principal component analysis
    2.5.2 Gain control
    2.5.3 Contrast enhancement, decorrelation, and whitening in the high S/N region
    2.5.4 Many equivalent solutions of optimal encoding
    2.5.5 A special, most local, class of optimal coding
    2.5.6 Smoothing and output correlation in the low S/N region
    2.5.7 Adaptation of the optimal code to the statistics of the input environment
      Binocular cells, monocular cells, and ocular dominance columns
      Coupling between stereo coding and spatial scale coding
      Adaptation of stereo coding to light levels
      Strabismus
      Adaptation of stereo coding with animal species
      Coupling between stereo coding and the preferred orientation of V1 neurons
      Monocular deprivation
  2.6 Applying efficient coding to understand coding in space, color, time, and scale in retina and V1
    2.6.1 Efficient spatial coding for retina
    2.6.2 Efficient coding in time
    2.6.3 Efficient coding in color
    2.6.4 Coupling space and color coding in retina
    2.6.5 Efficient spatial coding in V1
    2.6.6 Coupling the spatial and color coding in V1
    2.6.7 Coupling spatial coding with stereo coding
    2.6.8 Coupling spatial space with temporal, chromatic, and stereo coding in V1
  2.7 How to get the efficient codes by developmental rules or unsupervised learning?

3 V1 and information coding
  3.1 Pursuit of efficient coding in V1 by reducing higher order redundancy
    3.1.1 Higher order statistics contains much of the meaningful information about visual objects
    3.1.2 Characterizing higher order statistics
    3.1.3 Efforts to understand V1 by removal of higher order redundancy
  3.2 Problems in understanding V1 by the goal of efficient coding
  3.3 Meanings versus amount of information, and information selection

4 Information selection in early vision: the V1 hypothesis (creating a bottom up saliency map for pre-attentive selection and segmentation)
  4.1 The problems and frameworks
    4.1.1 The problem of visual segmentation
    4.1.2 Visual selection, attention, and saliency
      Visual saliency, and a brief overview of its behavioral manifestation
      How can one probe bottom-up saliency through reaction times when behavior is controlled by both top-down and bottom-up factors?
      Saliency regardless of input features
      Detailed formulation of the V1 saliency hypothesis
  4.2 Testing the V1 saliency map in a V1 model
    4.2.1 The V1 model: its neural elements, connections, and desired behavior
    4.2.2 Calibration of the V1 model to the biological reality
    4.2.3 Computational requirements on the dynamic behavior of the model
    4.2.4 Applying the V1 model to visual segmentation and visual search
      Quantitative assessments of saliency from the V1 responses
      Feature search and conjunction search by the V1 model
      A trivial case of visual search asymmetry through the presence or the absence of a feature in the target
      The influence of background variability on the ease of visual search
      Influence of the density of input items on saliencies by feature contrast
      How does a hole in a texture attract attention?
      Segmenting two identical abutting textures from each other
      More subtle examples of visual search asymmetry
  4.3 Neural circuit and nonlinear dynamics in the primary visual cortex for saliency computation
    4.3.1 A minimal model of the primary visual cortex
      A less-than-minimal recurrent model of V1
      A minimal recurrent model with hidden units
    4.3.2 Dynamic analysis
      A single pair of neurons
      Two interacting pairs of neurons with non-overlapping receptive fields
      A one dimensional array of identical bars
      Two dimensional textures and texture boundaries
      Translation invariance and pop-out
      Filling-in and leaking-out
      Hallucination prevention, and neural oscillations
    4.3.3 Discussions
  4.4 Psychophysical test of the V1 theory of bottom up saliency
    4.4.1 Testing the feature-blind "auction" framework of the V1 saliency map
      Further discussions and explorations on the interference by task irrelevant features
      Contrasting with the feature-map-to-master-map framework of the previous views
    4.4.2 Fingerprints of V1 in the bottom-up saliency
      Fingerprint of V1's conjunctive cells
      Fingerprint of V1's monocular cells
      Fingerprint of V1's colinear facilitation
  4.5 The respective roles of V1 and other cortical areas for attentional guidance

5 Visual recognition and discrimination

6 Summary

Chapter 1 Introduction and scope

1.1 The approach

Vision is the most intensively studied aspect of the brain, physiologically, anatomically, and behaviorally [162]. The saying that our eyes are the windows to our brain is not unreasonable since, at least in primates, brain areas devoted to visual functions occupy a large portion of the cerebral cortex, about 50% in monkeys (see Fig. (1.5)). Understanding visual functions can hopefully reveal much about how the brain works. Vision researchers come from many specialist fields, including physiology, psychology, anatomy, medicine, engineering, mathematics, and physics, each with its distinct approach and value. A common language is essential for effective communication and collaboration between visual scientists. One way to achieve this is to frame and define everything clearly before communicating the details. This is what I will try my best to do in this book, with a clear definition of the problems and terms used whenever the need arises. These definitions also include scoping, that is, the division of problems or domains into sub-problems or sub-domains in order to study them better. For example, vision may be divided into low-level, mid-level, and high-level vision according to a rough temporal progression of the computation involved, and visual attentional selection may be divided into selection by top-down factors and selection by bottom-up factors. Many of these divisions and scopings are likely to appear sub-optimal, and can be improved, once research has yielded more knowledge. However, refusing to divide or scope the problems and domains now, for fear of imperfections in the process, often slows research progress.

1.1.1 Theory, models, and data

This book aims to understand vision through the interplay between theory, models, and data, each playing its own role, as illustrated in Fig. (1.1).
Theoretical studies of vision suggest computational principles or hypotheses to understand why physiology and anatomy are as they are from visual behavior, and vice versa. They should provide non-trivial insights into the multitude of experimental observations, link seemingly unrelated data to each other, and motivate experimental investigations. Often, appropriate mathematical formulations of the theories are necessary to make the theories sufficiently precise and powerful. Experimental data of all kinds (physiological, behavioral, and anatomical) provide inspiration to, and ultimate tests of, the theories. For example, this book presents detailed material on two theories of early vision: one is the efficient coding theory of the early visual receptive fields (details in chapter 2), and the other is the V1 saliency hypothesis on a functional role of the primary visual cortex (in chapter 4). The experimental data inspiring the theories include the receptive fields of the neurons in the retina and cortex, their dependence on the animal species and their adaptation to the environment, human sensitivities to various visual stimuli, the intra-cortical circuits in V1, and the visual behavior in visual search and segmentation tasks.

Models, including phenomenological, biophysical, and neural circuit models of neural mechanisms, are very useful tools for linking the theory and the data, particularly when their complexity is designed to suit the questions asked. They can, for example, be used to illustrate or demonstrate the theoretical hypotheses, or to test the feasibility of the hypotheses by specific neural mechanisms.

[Figure 1.1: The roles of theory, models, and data in understanding vision. The diagram links three boxes by arrows labelled demonstrate, implement, inspire, test, predict, explain, and fit. Theory: hypotheses, principles (e.g., early visual processing has a goal to find an efficient representation of visual input information). Data: physiological and psychological observations (e.g., neurons' visual receptive fields, visual behavioral sensitivities, and their adaptations to the environment). Models: characterizing the mechanisms or phenomena (e.g., a difference of two gaussian spatial functions for a receptive field of a retinal ganglion neuron).]

Note that while the models are very useful, they are just tools intended to illustrate, to demonstrate, and to link the theory and the data. They often involve simplifications and approximations which make them quantitatively incorrect; this is acceptable as long as their purpose in a specific application does not require quantitative precision. Hence, their quantitative imprecision should not be the basis for dismissing a theory, especially when simplified toy models are used to illustrate a theoretical concept. For example, if Newton's Laws could not predict the trajectory of a rocket precisely because the knowledge about the Earth's atmosphere was insufficient, the Laws should not be thrown out with the bath water. Similarly, the theoretical proposal that early visual processing has a goal to recode the raw visual input by an efficient representation (details in chapter 2) could still be correct even if the visual receptive fields of the retinal ganglion cells are modelled simply as differences of gaussians to illustrate the efficient coding transform.
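As a concrete illustration of the kind of simplified model mentioned above, the following minimal Python sketch builds a difference-of-gaussians receptive field and computes a linear response to two toy images. It is only a sketch under stated assumptions: the function names, the widths `sigma_center` and `sigma_surround`, the weights `w_center` and `w_surround`, and the toy images are hypothetical choices made for illustration and are not taken from the book.

```python
import math


def gaussian_2d(x, y, sigma):
    """Unit-volume isotropic 2D gaussian evaluated at (x, y)."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def dog_receptive_field(x, y, sigma_center=1.0, sigma_surround=3.0,
                        w_center=1.0, w_surround=0.9):
    """Difference-of-gaussians weight at (x, y): a narrow excitatory center
    minus a broad inhibitory surround. All parameter values are assumed."""
    return (w_center * gaussian_2d(x, y, sigma_center)
            - w_surround * gaussian_2d(x, y, sigma_surround))


def response(image, cx, cy, radius=9, **rf_params):
    """Linear response of a model cell centered at (cx, cy): the sum of pixel
    values weighted by the receptive field, over a square window."""
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = cy + dy, cx + dx
            if 0 <= yy < len(image) and 0 <= xx < len(image[0]):
                total += dog_receptive_field(dx, dy, **rf_params) * image[yy][xx]
    return total


# Toy stimuli: a small bright spot on a dark background, and uniform light.
size = 41
spot = [[1.0 if (x - 20) ** 2 + (y - 20) ** 2 <= 1 else 0.0 for x in range(size)]
        for y in range(size)]
uniform = [[1.0] * size for _ in range(size)]

r_spot = response(spot, 20, 20)
r_uniform = response(uniform, 20, 20)
print("response to spot:", r_spot)
print("response to uniform light:", r_uniform)
```

With these assumed parameters the small spot drives the model cell more strongly than uniform illumination, because uniform light also stimulates the antagonistic surround. The numbers themselves carry no quantitative meaning, in keeping with the point that such toy models are illustrative tools.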
Focusing on the why of the physiology, this book de-emphasizes purely descriptive models concerning what and how, e.g., models of the center-surround receptive fields of the retinal ganglion cells, or mechanistic models of how orientation tuning in V1 develops, except when using them for illustrative or other purposes.

1.2 The problem of vision

Vision could be defined as the inverse problem of imaging or computer graphics. Imaging is the operation of transforming the three-dimensional visual world, which contains objects reflecting light, into two-dimensional images formed by this light hitting the imaging plane; see Fig. (1.2). Any visual world gives rise to a unique image for a given viewing direction, simply by projecting the 3D scene in that direction onto a 2D image. Hence, the imaging problem is well understood, as manifested in the success of computer graphics in movie making. The inverse problem of imaging or graphics, meanwhile, is to obtain the three-dimensional scene information from the two-dimensional images. Human vision is poorly understood, partly because, if we see vision as the inverse problem of imaging, there is typically no unique solution for the three-dimensional visual world given the two-dimensional images. This can be illustrated explicitly in a simplified
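The many-to-one nature of imaging described in Section 1.2 can be sketched numerically. The Python example below is not the book's own illustration; it assumes an idealized pinhole camera at the origin looking along the Z axis, and the names `project` and `scale_scene`, the focal length, and the two toy scenes are hypothetical. It shows two different 3D scenes, a small nearby square and a square three times larger and three times farther away, that project to the same 2D image.

```python
def project(point, focal_length=1.0):
    """Pinhole (perspective) projection of a 3D point (X, Y, Z) with Z > 0
    onto the image plane; the pinhole camera model is an assumption."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


def scale_scene(points, k):
    """Move every point k times farther along its line of sight, which also
    makes the scene k times larger."""
    return [(k * X, k * Y, k * Z) for (X, Y, Z) in points]


# Scene A: corners of a small square close to the camera.
near_small = [(-0.5, -0.5, 2.0), (0.5, -0.5, 2.0), (0.5, 0.5, 2.0), (-0.5, 0.5, 2.0)]
# Scene B: a square three times larger, placed three times farther away.
far_large = scale_scene(near_small, 3.0)

image_a = [project(p) for p in near_small]
image_b = [project(p) for p in far_large]

same_image = all(
    abs(ua - ub) < 1e-12 and abs(va - vb) < 1e-12
    for (ua, va), (ub, vb) in zip(image_a, image_b)
)
print("scenes differ:", near_small != far_large)
print("images coincide:", same_image)
```

Forward projection is a simple, deterministic function, whereas recovering the scene from `image_a` alone cannot distinguish scene A from scene B without additional assumptions or cues.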
