A CNN example: LeNet-5
Let us use the famous LeNet-5 network as an example to showcase the convolutional neural network. Yann LeCun et al proposed the LeNet-5 model in 1998, as shown in figure 3.16 (LeCun et al 1998). This network is one of the earliest convolutional neural networks and a classic result in the field. It is deep and very successful for handwritten character recognition, and it was widely used by banks in the United States to identify handwritten digits on checks.
Figure 3.16. LeNet-5 network for digit recognition. Adapted from LeCun et al (1998). Reproduced with permission. Copyright IEEE 1998.
LeNet-5 has seven layers in total, each of which contains trainable parameters. Each layer produces multiple feature maps, and each feature map extracts features through convolution. The input data are a handwritten digit dataset (MNIST), divided into a training set of 60 000 images in ten classes and a testing set of 10 000 images in the same ten classes. In the final layer, the network outputs probabilities corresponding to the ten classes, allowing it to predict a digit image's class using the softmax function. More specifics on LeNet-5 are as follows.
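As a minimal illustration of this last step, the sketch below (a standalone NumPy example, not code from LeNet-5 itself; the score values are made up) converts ten raw class scores into probabilities with the softmax function and reports the most likely digit.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical scores produced by the last layer for one input image.
scores = np.array([1.2, 0.3, -0.5, 2.8, 0.0, -1.1, 0.7, 0.2, -0.4, 1.5])
probs = softmax(scores)
print("class probabilities:", np.round(probs, 3))
print("predicted digit:", int(np.argmax(probs)))   # here: 3
```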
1 Input layer: The input image is uniformly normalized to be 32 × 32 in size.
2 C1 layer: The first convolution layer operates upon the input image with six convolution filters 5 × 5 in size, producing six feature maps 28 × 28 in size.
3 S2 layer: Pooling with six 2 × 2 filters for down-sampling. The pooling layer sums the pixel values in each non-overlapping 2 × 2 window of the C1 feature maps. The S2 layer produces six feature maps of 14 × 14.
4 C3 layer: The C3 layer performs convolutions after the S2 pooling layer. The filter size is 5 × 5. In total, 16 feature maps of 10 × 10 are obtained by the C3 layer. Each feature map is a different combination of the feature maps from the S2 layer.
5 S4 layer: Similar to the S2 layer, S4 is a pooling layer, with a 2 × 2 pooling window to obtain 16 feature maps of 5 × 5.
6 C5 layer: The C5 layer is another convolutional layer. The filter size is 5 × 5. In total, 120 feature maps of 1 × 1 are produced by this layer.
7 F6 layer: The F6 layer is a fully connected layer consisting of 84 nodes, arranged as a 7 × 12 bitmap that represents a stylized image of the corresponding character class.
8 Output layer: The output layer is also a fully connected layer, with ten nodes representing digits 0 to 9, respectively. The node with the minimum output value indicates the positive identification result, i.e. if the value of node i is the smallest among all the output neurons, the recognition result for the digit of interest is i. In any case, only one digit class will be assigned to the current image (the full architecture is sketched in code after this list).
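To make the layer-by-layer description concrete, here is a minimal PyTorch sketch of a LeNet-5-style network. It assumes PyTorch is available and is an illustrative reimplementation rather than the original 1998 code, which used sub-sampling with trainable coefficients, a partially connected C3, and RBF output units; average pooling, a fully connected C3, and a softmax over the ten output nodes are used here as modern simplifications.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Illustrative LeNet-5-style network (modern simplifications noted in comments)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),     # C1: 1x32x32 -> 6x28x28
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),        # S2: 6x28x28 -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),    # C3: 6x14x14 -> 16x10x10 (fully connected across maps here)
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2),        # S4: 16x10x10 -> 16x5x5
            nn.Conv2d(16, 120, kernel_size=5),  # C5: 16x5x5 -> 120x1x1
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(120, 84),                 # F6: 84 nodes
            nn.Tanh(),
            nn.Linear(84, num_classes),         # output layer: ten digit classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Quick shape check on a batch containing one 32 x 32 grayscale image.
net = LeNet5()
logits = net(torch.randn(1, 1, 32, 32))
probs = torch.softmax(logits, dim=1)            # probabilities over the ten digits
print(probs.shape)                              # torch.Size([1, 10])
```

Passing a random 32 × 32 input through the network confirms the feature-map sizes listed above.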
All three of the aforementioned CNN features can be found in the LeNet-5 network: local connectivity, shared weights, and multiple feature maps. Since a convolutional neural network resembles the real biological neural system in terms of its information processing workflow, a CNN analyzes the structural information of digit images well.
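To make the weight-sharing point concrete, the short calculation below (our own illustration based on the layer sizes listed above) compares the number of trainable parameters in the C1 convolutional layer with those of a hypothetical fully connected layer producing the same 6 × 28 × 28 outputs from the 32 × 32 input.

```python
# Weight sharing in C1: the same six 5 x 5 filters are reused at every spatial position.
conv_params = 6 * (5 * 5 + 1)              # 6 filters, each with 25 weights + 1 bias = 156
conv_outputs = 6 * 28 * 28                 # 4704 units across the six 28 x 28 feature maps

# A hypothetical fully connected layer mapping the 32 x 32 input to the same 4704 outputs.
fc_params = conv_outputs * (32 * 32 + 1)   # 4 821 600 weights and biases

print(conv_params, conv_outputs, fc_params)    # 156 4704 4821600
```

The difference of more than four orders of magnitude is exactly what local connectivity and weight sharing buy.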