3.6 Convolutional Neural Network (CoNN)
Notations: In the following, we will use x ∈ ℝ^D to represent a column vector with D elements and a capital letter to denote a matrix, e.g., X ∈ ℝ^{H×W} with H rows and W columns. The vector x can also be viewed as a matrix with D rows and one column. These concepts can be generalized to higher-order arrays, that is, tensors. For example, x ∈ ℝ^{H×W×D} is an order-3 (or third-order) tensor. It contains HWD elements, each of which can be indexed by an index triplet (i, j, d), with 0 ≤ i < H, 0 ≤ j < W, and 0 ≤ d < D. Another way to view an order-3 tensor is to treat it as containing D channels, each of which is an H × W matrix.
For example, a color image is an order-3 tensor. An image with H rows and W columns, stored in the RGB format, is a tensor of size H × W × 3: it has three channels (for R, G, and B, respectively), and each channel is an H × W matrix (an order-2 tensor) that contains the R (or G, or B) values of all pixels.
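As a quick illustration, the following sketch builds such an order-3 tensor in NumPy; the pixel values are random placeholders rather than a real photograph, and the channel layout H × W × 3 is the one assumed in the text:

import numpy as np

# A hypothetical 4x5 RGB image: an order-3 tensor of shape H x W x 3.
H, W = 4, 5
image = np.random.randint(0, 256, size=(H, W, 3), dtype=np.uint8)

# Each channel is an H x W matrix (an order-2 tensor).
red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]
print(image.shape)  # (4, 5, 3)
print(red.shape)    # (4, 5)

# A single element is indexed by a triplet (i, j, d),
# with 0 <= i < H, 0 <= j < W, 0 <= d < 3.
print(image[2, 3, 1])  # G value of the pixel at row 2, column 3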
It is beneficial to represent images (or other types of raw data) as tensors. In early computer vision and pattern recognition, a color image (an order-3 tensor) was often converted to its grayscale version (a matrix), because matrices were much easier to handle than tensors. The color information is lost during this conversion, yet color is very important in many image- and video-based learning and recognition problems, so we do want to process color information in a principled way, for example, as in CoNNs.
Tensors are essential in CoNNs. The input, the intermediate representations, and the parameters of a CoNN are all tensors. Tensors of order higher than 3 are also widely used in a CoNN. For example, we will soon see that the convolution kernels in a convolution layer of a CoNN form an order-4 tensor.
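For instance, here is a minimal NumPy sketch of such an order-4 kernel tensor; the sizes and the H × W × D × D′ layout convention are illustrative assumptions (deep-learning frameworks differ in how they order these axes):

import numpy as np

# A convolution layer with D input channels and D' output channels,
# using H x W spatial kernels, stores its weights as an order-4 tensor.
H, W, D, D_out = 3, 3, 8, 16          # illustrative sizes, not from the text
kernels = np.zeros((H, W, D, D_out))  # one H x W x D kernel per output channel
print(kernels.ndim)   # 4
print(kernels.shape)  # (3, 3, 8, 16)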
Given a tensor, we can arrange all the numbers inside it into a long vector, following a prespecified order. For example, in MATLAB, the (:) operator converts a matrix into a column vector in the column‐first order as
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \qquad A(:) = \begin{bmatrix} 1 \\ 3 \\ 2 \\ 4 \end{bmatrix} \tag{3.77}$$
We use the notation "vec" to represent this vectorization operator; that is, vec(A) = (1, 3, 2, 4)^T in the example. To vectorize an order-3 tensor, we can vectorize its first channel (which is a matrix, and we already know how to vectorize it), then the second channel, and so on, until all channels are vectorized. The vectorization of the order-3 tensor is then the concatenation of the vectorizations of all the channels in this order. The vectorization of an order-3 tensor is thus a recursive process that utilizes the vectorization of order-2 tensors, and the same recursion can be applied to vectorize an order-4 (or even higher-order) tensor.
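The following NumPy sketch mimics MATLAB's column-first (:) operator via order='F' and then applies the channel-by-channel recipe above to an order-3 tensor; the tensor values are illustrative:

import numpy as np

# Column-first (column-major) vectorization, matching MATLAB's A(:).
A = np.array([[1, 2],
              [3, 4]])
vec_A = A.reshape(-1, order='F')    # Fortran order = column-first
print(vec_A)                        # [1 3 2 4], i.e., vec(A) = (1, 3, 2, 4)^T

# Vectorizing an order-3 tensor channel by channel, then concatenating:
T = np.stack([A, 10 * A], axis=2)   # shape (2, 2, 2): two channels
vec_T = np.concatenate([T[:, :, d].reshape(-1, order='F')
                        for d in range(T.shape[2])])
print(vec_T)                        # [ 1  3  2  4 10 30 20 40]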
Vector calculus and the chain rule: The CoNN learning process depends on vector calculus and the chain rule. Suppose z is a scalar (i.e., z ∈ ℝ) and y ∈ ℝ^H is a vector. If z is a function of y, then the partial derivative of z with respect to y is defined as [∂z/∂y]_i = ∂z/∂y_i. In other words, ∂z/∂y is a vector having the same size as y, and its i-th element is ∂z/∂y_i. Also, note that ∂z/∂y^T = (∂z/∂y)^T.
Suppose now that x ∈ ℝ^W is another vector, and y is a function of x. Then, the partial derivative of y with respect to x is defined as [∂y/∂x^T]_{ij} = ∂y_i/∂x_j. This partial derivative is an H × W matrix (the Jacobian) whose entry at the intersection of the i-th row and j-th column is ∂y_i/∂x_j.
It is easy to see that z is a function of x through a chain-like composition: one function maps x to y, and another function maps y to z. The chain rule can be used to compute ∂z/∂x^T as
$$\frac{\partial z}{\partial x^{T}} = \frac{\partial z}{\partial y^{T}}\,\frac{\partial y}{\partial x^{T}} \tag{3.78}$$
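A small numerical sketch of (3.78), assuming the simple linear maps y = Ax and z = w^T y (chosen here so that both Jacobians are known in closed form), together with a finite-difference check of the result:

import numpy as np

# Chain rule (3.78): dz/dx^T = (dz/dy^T)(dy/dx^T), illustrated with the
# assumed linear maps y = A x and z = w^T y.
H, W = 3, 4
rng = np.random.default_rng(0)
A = rng.standard_normal((H, W))    # maps x in R^W to y in R^H
w = rng.standard_normal(H)         # maps y in R^H to the scalar z

dz_dyT = w                         # 1 x H row vector: dz/dy^T for z = w^T y
dy_dxT = A                         # H x W Jacobian of y = A x
dz_dxT = dz_dyT @ dy_dxT           # 1 x W, by the chain rule (3.78)

# Finite-difference check of the analytic gradient.
def z_of_x(x):
    return w @ (A @ x)

x0 = rng.standard_normal(W)
eps = 1e-6
grad_fd = np.array([(z_of_x(x0 + eps * e) - z_of_x(x0 - eps * e)) / (2 * eps)
                    for e in np.eye(W)])
print(np.allclose(dz_dxT, grad_fd))  # True (up to finite-difference error)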