1.1.4 The human vision system
The human vision system (HVS) is an important part of the central nervous system, which enables us to observe and perceive our surroundings (Hyvärinen et al 2009). After its long-term adaptation to natural scenes, the HVS processes such scenes highly efficiently through multi-layer perceptive operations. Here, natural scenes refer to the daily-life inputs to the HVS. Visual perception begins with the pupils, which admit light; the information carried by light photons is then processed step by step and finally analyzed for perception in the brain, as depicted in figure 1.2. This pathway consists of neurons. Typically, a neuron consists of a cell body (soma), dendrites that weight and integrate inputs, and an axon that outputs a signal, also referred to as an action potential. In the following, we briefly introduce the multi-layer structure of the HVS.
Figure 1.2. A schematic of the HVS pathway.
The first stage involves light photons reaching the retina. The retina is the innermost light-sensitive layer of tissue in the eye. It is covered by more than a hundred million photoreceptors, which translate light into electrical neural impulses. Depending on their function, the photoreceptors can be divided into two types: cone cells and rod cells. Rod cells are mainly distributed in the peripheral area around the fovea; they are highly sensitive to light and can respond even to a single photon. These cells are mainly responsible for vision in low-light environments, with neither high acuity nor color sensing. In contrast to rod cells, cone cells are concentrated in the fovea region and are responsible for the perception of details and colors in a bright environment, but they are much less sensitive to light.
In the second stage, the electrical signals are transmitted and processed through neural layers. One of the most important cell types, the ganglion cells, gathers the information from the other cells and sends the signal out of the eye along long axons. The visual signals are initially processed in this stage. Neurobiologists have found that the receptive field of a ganglion cell is usually center-surround and circularly symmetric, with the center either excited or inhibited by light. Such light responses can be simulated by the Laplacian of Gaussian (LOG) or zero-phase component analysis (ZCA) operator. We illustrate two kinds of LOG operator in figure 1.3 from three perspectives: a 3D visualization, a 2D plane figure, and the center profile.
Figure 1.3. Visualization of the LOG operator.
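As a concrete illustration (not taken from the book), the LOG operator can be written down explicitly: it is the second derivative of a Gaussian, giving an excitatory center and an inhibitory surround. A minimal numpy sketch, with illustrative kernel size and width:

```python
import numpy as np

def log_kernel(size=21, sigma=2.0):
    """Laplacian-of-Gaussian kernel (up to a constant factor): a
    center-surround profile resembling a ganglion-cell receptive field."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    kernel = (r2 - 2.0 * sigma**2) / sigma**4 * np.exp(-r2 / (2.0 * sigma**2))
    return kernel - kernel.mean()        # zero mean, so the filter is band-pass

k = log_kernel()
center_profile = k[k.shape[0] // 2]      # the 1D center profile shown in figure 1.3
```

Varying sigma changes the width of the center and surround, which corresponds to the two kinds of LOG operator visualized in figure 1.3.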
In the HVS, the receptive field of a visual neuron is defined as the specific light pattern over the photoreceptors of the retina that yields the maximum response of the neuron. We illustrate this operation with a vivid example depicted in figure 1.4, using two different operators.
Figure 1.4. Responses of the LOG and Gabor filters, which can be modeled as convolutions with an underlying image. Lena image © Playboy Enterprises, Inc.
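Figure 1.4 can be reproduced in spirit with a few lines of code: filtering an image with a LOG kernel models the center-surround receptive field, while filtering with a Gabor kernel models an orientation-selective one. The sketch below uses a random array in place of the copyrighted test image, and the filter parameters are illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace
from skimage.filters import gabor

image = np.random.rand(256, 256)                 # stand-in for a natural image

# Center-surround (retinal ganglion cell-like) response: LOG filtering
log_response = gaussian_laplace(image, sigma=2.0)

# Orientation-selective (V1 simple cell-like) response: Gabor filtering
gabor_real, gabor_imag = gabor(image, frequency=0.2, theta=np.pi / 4)
```

Both operations are convolutions, which is exactly how the receptive-field responses in figure 1.4 are modeled.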
Next, the signal is transmitted to the lateral geniculate nucleus (LGN) of the thalamus, which is the main sensory processing area in the brain. The receptive fields of LGN cells are also center-surround and circularly symmetric. After processing by the LGN, the signal is transmitted to the visual cortex at the back of the brain for subsequent processing. It is worth mentioning that, unlike the retina, the number of ganglion or LGN cells is not large, only just over a million. That is to say, these cells work with compressed features from the retina, after the redundancy in the original data has been reduced.
The first place in the cortex where most of the signals arrive is the primary visual cortex, or V1 for short. The type of cell in V1 that we understand best is the simple cell, whose receptive fields are well characterized (Ringach 2002). The responses of simple cells depend on the direction and spatial frequency of the stimulus. These responses can be modeled by a Gabor function or a Gaussian derivative. Hence, the receptive fields of simple cells are interpreted as Gabor-like or directional band-pass filters. The Gabor function can be regarded as a Gaussian envelope multiplied by a sinusoid, with several parameters controlling its shape. As with the LOG visualization, we describe the Gabor function in figure 1.5 under different parameter settings. Observe how the parameters affect the Gabor function.
Figure 1.5. Visualization of the Gabor function.
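To make the role of the parameters concrete, the following sketch (with illustrative parameter names and default values) builds a Gabor kernel as a Gaussian envelope times a cosine carrier; changing the orientation, wavelength, phase, or envelope width reproduces the kinds of variations shown in figure 1.5:

```python
import numpy as np

def gabor_kernel(size=31, sigma=4.0, theta=0.0, wavelength=10.0,
                 phase=0.0, gamma=0.5):
    """Gabor function: Gaussian envelope x sinusoidal carrier.
    theta: orientation, wavelength: preferred spatial period,
    phase: even/odd symmetry, gamma: aspect ratio of the envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)       # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
    return envelope * carrier

# Vary orientation and spatial period, as in figure 1.5
kernels = [gabor_kernel(theta=t, wavelength=w)
           for t in (0.0, np.pi / 4, np.pi / 2)
           for w in (6.0, 12.0)]
```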
With such selective characteristics, hundreds of millions of simple cells work together in V1. Neurobiologists have found that only a few cells are activated for any given input, which means that simple cells implement a sparse coding scheme. After being processed in V1, the signal is transmitted to multiple destinations for further processing in the cortex. The destinations can be categorized into the ‘where’ and ‘what’ pathways. The ‘where’ pathway, also known as the dorsal pathway, goes from V1/V2 through V3 to V5. It distinguishes moving objects and helps the brain to recognize where objects are in space. The ‘what’ pathway, namely the ventral pathway, goes from V1/V2 to V4 and the inferior temporal cortex (IT), where the HVS performs content discrimination and pattern recognition (Cadieu et al 2007). Given the emphasis of this book on medical imaging, we focus on the ‘what’ pathway, which is modeled as multi-layer perceptive operations from simple to complex as the visual field becomes increasingly larger, as illustrated in figure 1.6.
Figure 1.6. Multi-layer structure of the HVS, perceiving the world in multiple stages from primitive to semantic.
In addition to the simple cells, there are other kinds of visual neurons in the HVS. Another kind of visual cell that has been studied extensively is the complex cell, which is mainly distributed in V1, V2, and V3. Complex cells integrate the outputs of nearby simple cells; they respond to specific stimuli located anywhere within their receptive fields. In addition, there are hypercomplex cells, also called end-stopped cells, which are located in V1, V2, and V3 and respond maximally to stimuli of a given size in the receptive field. This kind of cell is thought to perceive corners, curves, and moving structures.
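A standard way of modeling how a complex cell integrates simple-cell outputs is the so-called energy model: the responses of two Gabor filters in quadrature (phases offset by 90 degrees) are squared and summed, giving a response that keeps the orientation and frequency selectivity of the simple cells but is insensitive to the exact stimulus position. The following is only a schematic sketch of that idea, with a random array standing in for a natural image:

```python
import numpy as np
from skimage.filters import gabor

image = np.random.rand(128, 128)       # stand-in for a natural image

# Even- and odd-phase Gabor responses (a quadrature pair) model two
# simple cells sharing the same orientation and frequency preferences
even, odd = gabor(image, frequency=0.15, theta=np.pi / 4)

# Energy model of a complex cell: pooling the squared quadrature pair
# yields orientation selectivity with tolerance to small spatial shifts
complex_response = even**2 + odd**2
```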
To date, our understanding of the brain remains far from complete. We only have partial knowledge of these areas, in particular of the deeper layers such as V4 and the posterior regions. Generally speaking, visual cells in V1 and V2 detect primary visual features with selectivity for directions, frequencies, and phases. Some specific cells in V2 also provide stereopsis based on binocular disparity, which helps recover the surface information of an object. In V4, the visual cells perceive the simple geometric shapes of objects in receptive fields larger than those of V2. This shape-oriented analysis capability is due to the selectivity of V4 cells for complex stimuli and is invariant with respect to spatial translation. In the posterior regions of the visual pathway, such as the IT, image semantic structures are recognized, relying on receptive fields much larger than those of V4. In general, billions of diverse visual neurons construct a hierarchically sophisticated visual system that analyzes and synthesizes visual features for observing and perceiving the outside world. Figure 1.6 illustrates the hierarchy of the HVS.
Fred Attneave and Horace Barlow realized that the HVS perceives its surroundings through an ‘economical description’ or ‘economical thought’ that compresses the redundancy in visual stimuli. This point of view suggests an opportunity to extract prior information from the HVS perspective. Specifically, based on neurophysiological studies, Barlow proposed the efficient coding hypothesis in 1961 as a theoretical model of sensory coding in the human brain. In the brain, neurons communicate with one another by sending electrical impulses or spikes (action potentials), which represent and process information about the outside world. Since among the hundreds of millions of neurons in the visual cortex only a few are activated in response to a specific input, Barlow hypothesized that the neural code formed by these spikes represents visual information efficiently; that is, the HVS has a sparse representation ability. The HVS tends to minimize the number of spikes needed to transmit a given signal, which can be modeled as an optimization problem. In this hypothesis, the brain uses an efficient coding system suitable for expressing the visual information of different scenes. Barlow’s model treats the sensory pathway as a communication channel, in which neuronal spikes are the sensory signals, and the goal is to maximize the channel capacity by reducing the redundancy in the representation. In this view, the goal of the HVS is to use a collection of independent events to explain natural images. To form an efficient representation of natural images, the HVS uses pre-processing operations to remove first- and second-order redundancy. In natural image statistics, the first-order statistic gives the direct current (DC) component, i.e. the average luminance, while the second-order statistics describe variance and covariance, i.e. the contrast of the image. The heuristic is that image recognition should not be affected by the average luminance or the contrast scale. Mathematically, this pre-processing can be modeled as zero-phase component analysis (ZCA). Interestingly, it was found that the responses of ganglion and LGN cells are similar to the features obtained with natural image statistics techniques such as ZCA.
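As a rough sketch of this pre-processing (with random data standing in for natural-image patches, and illustrative patch and regularization sizes), ZCA whitening removes the mean luminance and normalizes the covariance while keeping the result close to the original pixel space:

```python
import numpy as np

def zca_whiten(patches, eps=1e-5):
    """Remove first-order statistics (mean luminance) and second-order
    statistics (variance/covariance) from flattened image patches."""
    X = patches - patches.mean(axis=1, keepdims=True)   # subtract per-patch DC
    X = X - X.mean(axis=0)                               # center each pixel
    cov = X.T @ X / X.shape[0]                           # pixel covariance
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T        # ZCA whitening matrix
    return X @ W, W

# Illustrative use: 5000 flattened 8 x 8 patches (random stand-in data)
patches = np.random.rand(5000, 64)
white, W = zca_whiten(patches)
```

On real natural-image patches, the rows of the whitening matrix W take on center-surround shapes, which is the similarity to ganglion and LGN responses mentioned above.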
Inspired by the mechanism of the HVS, researchers have worked to mimic the HVS by reducing the redundancy of images so as to represent them efficiently. In this context, machine learning techniques were used to obtain features similar to those observed in the HVS. In figure 1.7, we illustrate the relationship between an artificial neural network (to be explained in chapter 3) and the HVS. Furthermore, in HVS feature extraction and representation, higher-order redundancy is also reduced. Specifically, the receptive field properties are accounted for with a strategy that sparsifies the output activity in response to natural images. The ‘sparse coding’ concept was introduced to describe this phenomenon. Olshausen and Field, based on neurobiological observations, used a network to code image patches in an over-complete basis under sparsity constraints, capturing image structures. They found that the learned features have local, oriented receptive fields, essentially the same as V1 receptive fields. That is to say, the HVS and natural image statistics are closely related, and both are very relevant to prior information extraction.
Figure 1.7. The relationship between an artificial neural network and the HVS.
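The Olshausen and Field experiment can be approximated with an off-the-shelf dictionary-learning routine; scikit-learn's MiniBatchDictionaryLearning is one convenient stand-in (not the authors' original algorithm), and the random vectors below stand in for whitened natural-image patches:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Random vectors standing in for whitened 8 x 8 natural-image patches
patches = np.random.randn(5000, 64)

# Learn an over-complete basis (128 atoms for 64-dimensional patches)
# with a sparsity penalty on the coefficients
learner = MiniBatchDictionaryLearning(n_components=128, alpha=1.0)
codes = learner.fit_transform(patches)   # sparse coefficients for each patch
atoms = learner.components_              # learned basis functions

# On real natural-image patches the atoms become localized, oriented,
# band-pass features resembling V1 simple-cell receptive fields
print(atoms.shape, (codes != 0).mean())  # dictionary size, fraction of active codes
```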
In the following sub-sections, we will introduce several HVS models, and describe how to learn features from natural images in the light of visual neurophysiological findings.