Machine Learning for Tomographic Imaging - Professor Ge Wang
3.1.4 Discrete convolution and weights
It is well known that convolution is a linear operation of great importance in mathematics. A discrete convolution is a weighted summation of the components of a vector, matrix, or tensor. In signal processing, convolution is used to recognize local patterns in an image by extracting local features and integrating them properly. Images often contain local correlations, and convolution is a way to capture local linear correlations. As will become clear below, a multi-layer convolutional network performs a powerful multi-resolution analysis, consistent with the inner workings of the HVS. The three most common types of convolution in signal processing are full convolution, same convolution, and valid convolution. Without loss of generality, consider the 1D case and assume that the input signal $x \in \mathbb{R}^n$ and the filter $w \in \mathbb{R}^m$ are one-dimensional vectors. The convolution can then be categorized as follows:
1. Full convolution
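$$y = \operatorname{conv}(x, w, \text{"full"}) = \big(y(1), \ldots, y(t), \ldots, y(n+m-1)\big) \in \mathbb{R}^{n+m-1},$$
$$y(t) = \sum_{i=1}^{m} x(t-i+1)\cdot w(i), \quad t = 1, 2, \ldots, n+m-1, \tag{3.8}$$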
where zero padding is applied as needed; that is, $x(j) = 0$ whenever $j < 1$ or $j > n$.
2. Same convolution
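$$y = \operatorname{conv}(x, w, \text{"same"}) = \operatorname{center}\big(\operatorname{conv}(x, w, \text{"full"}),\, n\big) \in \mathbb{R}^{n}. \tag{3.9}$$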
The result of the same convolution is the central part of the full convolution and has the same size as the input vector $x$.
3. Valid convolution
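$$y = \operatorname{conv}(x, w, \text{"valid"}) = \big(y(1), \ldots, y(t), \ldots, y(n-m+1)\big) \in \mathbb{R}^{n-m+1},$$
$$y(t) = \sum_{i=1}^{m} x(t+m-i)\cdot w(i), \quad t = 1, 2, \ldots, n-m+1, \tag{3.10}$$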
where $n \ge m$. Unlike the full and same convolutions, the valid convolution involves no zero padding.
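To make Eqs. (3.8) to (3.10) concrete, here is a minimal pure-Python sketch of the three 1D modes. It uses 0-based indices instead of the 1-based indices in the text, and the function names `conv_full`, `conv_same`, and `conv_valid` are hypothetical. Two assumptions are built in. First, the "center" in Eq. (3.9) starts at offset ⌊(m − 1)/2⌋ of the full result. For even m the choice is ambiguous, and libraries may differ by one sample. Second, the valid mode keeps the filter flipped as in Eq. (3.8), so it returns exactly the entries of the full convolution that need no zero padding.

```python
def conv_full(x, w):
    """Full convolution, Eq. (3.8), with 0-based indices:
    y[t] = sum_i x[t - i] * w[i], where x is zero outside 0..n-1.
    Output length is n + m - 1.
    """
    n, m = len(x), len(w)
    y = []
    for t in range(n + m - 1):
        acc = 0
        for i in range(m):
            j = t - i
            if 0 <= j < n:  # implicit zero padding outside the signal
                acc += x[j] * w[i]
        y.append(acc)
    return y


def conv_same(x, w):
    """Same convolution, Eq. (3.9): the central n entries of the full result.
    Assumption: the slice starts at offset (m - 1) // 2.
    """
    n, m = len(x), len(w)
    start = (m - 1) // 2
    return conv_full(x, w)[start:start + n]


def conv_valid(x, w):
    """Valid convolution: only outputs whose window lies inside x.
    0-based: y[t] = sum_i x[t + m - 1 - i] * w[i], length n - m + 1.
    """
    n, m = len(x), len(w)
    if n < m:
        raise ValueError("valid convolution requires n >= m")
    return [sum(x[t + m - 1 - i] * w[i] for i in range(m))
            for t in range(n - m + 1)]


x = [1, 2, 3, 4]
w = [1, 0, -1]
print(conv_full(x, w))   # [1, 2, 2, 2, -3, -4]   (length n + m - 1 = 6)
print(conv_same(x, w))   # [2, 2, 2, -3]          (length n = 4)
print(conv_valid(x, w))  # [2, 2]                 (length n - m + 1 = 2)
```

In this example the valid result equals entries m through n of the full result (1-based). Comparing the two is a quick way to confirm that no padded zeros entered the sum.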
The ideas behind one-dimensional convolution extend to the 2D case. Assume that the two-dimensional input image is $X \in \mathbb{R}^{n\times m}$ and the two-dimensional filter is $W \in \mathbb{R}^{s\times k}$. Then the discrete two-dimensional convolution can be written as follows:
$$Y(p,t) = (X * W)(p,t) = \sum_{i}\sum_{j} X(i,j)\cdot W(p-i,\, t-j), \tag{3.11}$$
where $*$ denotes convolution and $\cdot$ denotes multiplication. Likewise, the full, same, and valid convolutions can be defined in higher dimensions.
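The following minimal sketch evaluates Eq. (3.11) literally on nested lists. It assumes 0-based indices and treats W as zero outside its s × k support, which yields the full-size output of (n + s − 1) × (m + k − 1). The function name `conv2d_full` is hypothetical, and the direct quadruple loop is meant for illustration only, not for speed.

```python
def conv2d_full(X, W):
    """2D full convolution per Eq. (3.11), 0-based indices:
    Y[p][t] = sum_{i,j} X[i][j] * W[p - i][t - j], with W zero outside its support.
    Output size is (n + s - 1) x (m + k - 1).
    """
    n, m = len(X), len(X[0])
    s, k = len(W), len(W[0])
    Y = [[0] * (m + k - 1) for _ in range(n + s - 1)]
    for p in range(n + s - 1):
        for t in range(m + k - 1):
            acc = 0
            for i in range(n):
                for j in range(m):
                    a, b = p - i, t - j
                    if 0 <= a < s and 0 <= b < k:  # skip terms outside W
                        acc += X[i][j] * W[a][b]
            Y[p][t] = acc
    return Y


X = [[1, 2],
     [3, 4]]
W = [[0, 1],
     [1, 0]]
print(conv2d_full(X, W))  # [[0, 1, 2], [1, 5, 4], [3, 4, 0]]
```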
Cross-correlation is defined in nearly the same way as the convolution above:
$$Y(p,t) = (X * W)(p,t) = \sum_{i}\sum_{j} X(p+i,\, t+j)\cdot W(i,j). \tag{3.12}$$
The only difference between cross-correlation and convolution is whether the filter $W$ is flipped. In machine learning, convolution in the strict sense is rarely used. Images are usually processed with cross-correlation instead, that is, without flipping the filter $W$. Even without flipping, this operation is still called convolution (strictly speaking, it is cross-correlation).
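To see the effect of flipping, the sketch below implements valid-mode versions of Eq. (3.11) and Eq. (3.12) directly. It then checks that convolution equals cross-correlation with the filter rotated by 180 degrees. The names `conv2d_valid`, `xcorr2d_valid`, and `flip2d` are hypothetical, and indices are 0-based.

```python
def conv2d_valid(X, W):
    """Valid 2D convolution from Eq. (3.11): the filter is flipped.
    0-based: Y[p][t] = sum_{a,b} X[p+s-1-a][t+k-1-b] * W[a][b].
    """
    n, m, s, k = len(X), len(X[0]), len(W), len(W[0])
    return [[sum(X[p + s - 1 - a][t + k - 1 - b] * W[a][b]
                 for a in range(s) for b in range(k))
             for t in range(m - k + 1)]
            for p in range(n - s + 1)]


def xcorr2d_valid(X, W):
    """Valid 2D cross-correlation from Eq. (3.12): no flipping.
    0-based: Y[p][t] = sum_{i,j} X[p+i][t+j] * W[i][j].
    """
    n, m, s, k = len(X), len(X[0]), len(W), len(W[0])
    return [[sum(X[p + i][t + j] * W[i][j]
                 for i in range(s) for j in range(k))
             for t in range(m - k + 1)]
            for p in range(n - s + 1)]


def flip2d(W):
    """Rotate a filter by 180 degrees, i.e. flip it along both axes."""
    return [row[::-1] for row in W[::-1]]


X = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
W = [[1, 0],
     [0, -1]]  # asymmetric filter, so flipping changes the result

print(conv2d_valid(X, W))           # [[4, 4], [4, 4]]
print(xcorr2d_valid(X, W))          # [[-4, -4], [-4, -4]]
print(xcorr2d_valid(X, flip2d(W)))  # [[4, 4], [4, 4]]
```

A learned filter can take either orientation. As a result, a network trained with cross-correlation simply learns the flipped version of the filter it would learn with convolution, which is why the distinction rarely matters in practice.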
Figure 3.10 illustrates an example of a convolution operation (without flipping) on a 2D image.
Figure 3.10. Example of a 2D convolution operation (weight without flipping) on a 2D input image.
In a neural network, a convolution operation is specified by two additional parameters: stride and zero padding. Stride is the step by which the filter window moves from its current position to the next one. For example, in figure 3.10 the window starts at the first pixel and moves to the second pixel, so stride = 2 − 1 = 1. Zero padding is the number of zeros appended to each end of the original data along a dimension. In general, when a valid convolution is combined with stride and zero padding, the output size is calculated as follows (without loss of generality, in the 2D case):
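$$Y = X * W \in \mathbb{R}^{u\times v},$$
$$u = \left\lfloor \frac{n - s + 2\cdot\text{padding}}{\text{stride}} \right\rfloor + 1, \qquad v = \left\lfloor \frac{m - k + 2\cdot\text{padding}}{\text{stride}} \right\rfloor + 1, \tag{3.13}$$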
where $\lfloor\cdot\rfloor$ denotes rounding down (the floor function).
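As a sketch of Eq. (3.13), the code below computes the output size and also runs an unflipped (cross-correlation) window with stride and zero padding, so the predicted size can be checked against an actual output. The names `output_size`, `pad2d`, and `xcorr2d` are hypothetical. The code assumes the same padding on every side and n + 2·padding ≥ s (and likewise for the columns).

```python
def output_size(n, s, padding, stride):
    """Eq. (3.13) for one dimension: floor((n - s + 2*padding) / stride) + 1.
    Assumes n + 2*padding >= s; Python's // performs floor division.
    """
    return (n - s + 2 * padding) // stride + 1


def pad2d(X, padding):
    """Surround X with `padding` rows and columns of zeros on every side."""
    width = len(X[0]) + 2 * padding
    top = [[0] * width for _ in range(padding)]
    middle = [[0] * padding + list(row) + [0] * padding for row in X]
    bottom = [[0] * width for _ in range(padding)]
    return top + middle + bottom


def xcorr2d(X, W, stride=1, padding=0):
    """Cross-correlation (no filter flip) with stride and zero padding."""
    s, k = len(W), len(W[0])
    u = output_size(len(X), s, padding, stride)
    v = output_size(len(X[0]), k, padding, stride)
    Xp = pad2d(X, padding)
    # Output (p, t) reads the s x k window whose top-left corner
    # sits at (p * stride, t * stride) in the padded input.
    return [[sum(Xp[p * stride + i][t * stride + j] * W[i][j]
                 for i in range(s) for j in range(k))
             for t in range(v)]
            for p in range(u)]


ones5 = [[1] * 5 for _ in range(5)]
ones3 = [[1] * 3 for _ in range(3)]
print(output_size(5, 3, padding=1, stride=2))  # 3
print(output_size(6, 3, padding=0, stride=2))  # 2, since floor(3 / 2) = 1
print(xcorr2d(ones5, ones3, stride=2, padding=1))  # [[4, 6, 4], [6, 9, 6], [4, 6, 4]]
```

The border entries are smaller than the center entry because their windows overlap the padded zeros.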
In early neural networks, adjacent layers were fully connected. Each neuron was connected to all neurons in the previous layer, which required a large number of parameters. Convolutional neural networks improve on fully connected networks by relying on convolutions, which greatly reduces the number of parameters. The core of the convolution operation is that it removes unnecessary weighted links, keeps only local connections, and shares weights across the field of view. Because the convolution operation is shift-equivariant (shifting the input shifts the output by the same amount), the learned features tend to be robust and less prone to overfitting.
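A few lines of Python can check both the parameter savings and the shift behavior described above. The sketch is illustrative only. The function names are hypothetical, biases are ignored, and the 28 × 28 image size and 3 × 3 filter are arbitrary example values. It also assumes circular (wrap-around) indexing, so that shifting the input shifts the output exactly, with no boundary effects.

```python
def fc_param_count(in_h, in_w, out_h, out_w):
    """Weights of a fully connected layer: every output unit sees every input pixel."""
    return (in_h * in_w) * (out_h * out_w)


def conv_param_count(s, k):
    """Weights of one s x k filter, shared across all positions of the image."""
    return s * k


def circular_xcorr1d(x, w):
    """1D cross-correlation with wrap-around indexing (an assumed boundary rule)."""
    n = len(x)
    return [sum(x[(t + i) % n] * w[i] for i in range(len(w))) for t in range(n)]


def roll(x, shift):
    """Shift a list circularly to the right by `shift` positions."""
    shift %= len(x)
    return x[-shift:] + x[:-shift]


print(fc_param_count(28, 28, 28, 28))  # 614656
print(conv_param_count(3, 3))          # 9

# Shifting the input by 2 shifts the output by 2, because the same
# weights are applied at every position.
x = [0, 1, 3, 0, 0, 0]
w = [1, -1]
print(circular_xcorr1d(x, w))           # [-1, -2, 3, 0, 0, 0]
print(circular_xcorr1d(roll(x, 2), w))  # [0, 0, -1, -2, 3, 0]
```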
In effect, convolution performs feature extraction with a given set of weights, such as the redundancy-removed ZCA and ICA features presented in the previous chapters. Beyond the low-level feature space, higher-level features can be obtained in the same way to represent image information semantically.
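As a toy illustration of feature extraction with specific weights, the sketch below applies a hand-picked horizontal-difference filter to a small image that contains a vertical edge. The filter is a hypothetical stand-in for learned filters such as ZCA or ICA features. It responds only where the intensity changes, so its output acts as an edge feature map. The helper `xcorr2d_valid` is a hypothetical name for an unflipped, valid-mode operation.

```python
def xcorr2d_valid(X, W):
    """Valid 2D cross-correlation (no flipping), 0-based indices."""
    n, m, s, k = len(X), len(X[0]), len(W), len(W[0])
    return [[sum(X[p + i][t + j] * W[i][j]
                 for i in range(s) for j in range(k))
             for t in range(m - k + 1)]
            for p in range(n - s + 1)]


# Hypothetical fixed filter: a pixel minus its left neighbor.
edge_filter = [[-1, 1]]

# 3 x 4 image: dark left half, bright right half (a vertical edge).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

print(xcorr2d_valid(image, edge_filter))  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```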