Computational Analysis and Deep Learning for Medical Care - Group of authors
1.2.3 ZFNet
The architecture of ZFNet, introduced by Zeiler [3], is the same as that of AlexNet, except that the first convolutional layer uses a smaller 7 × 7 kernel with stride 2 (AlexNet's CONV1 uses an 11 × 11 kernel with stride 4; see Table 1.3). The smaller kernel and stride let the network obtain better hyper-parameters with less computation and help it retain more features. The number of filters in the third, fourth, and fifth convolutional layers is increased to 512, 1024, and 512, respectively. A new visualization technique, deconvolution (which maps features back to pixels), is used to analyze the feature maps of the first and second layers.
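To make the effect of the kernel and stride change concrete, the following minimal sketch computes the spatial size of the first-layer output using the standard floor formula, (input + 2 × padding − kernel) / stride + 1. The function name `conv_output_size` is chosen for illustration, and the padding values for ZFNet are assumptions, since the text does not state them.

```python
def conv_output_size(input_size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial size after a convolution or pooling window (floor convention)."""
    return (input_size + 2 * padding - kernel) // stride + 1


# AlexNet CONV1 as listed in Table 1.3: 227-pixel input, 11 x 11 kernel, stride 4.
alexnet_conv1 = conv_output_size(227, kernel=11, stride=4)

# ZFNet first layer: 7 x 7 kernel, stride 2, on a 224-pixel input.
# Padding is NOT given in the text; both settings below are assumptions.
zfnet_conv1_no_pad = conv_output_size(224, kernel=7, stride=2, padding=0)
zfnet_conv1_pad1 = conv_output_size(224, kernel=7, stride=2, padding=1)

print("AlexNet CONV1 output side:", alexnet_conv1)
print("ZFNet CONV1 output side (padding 0):", zfnet_conv1_no_pad)
print("ZFNet CONV1 output side (padding 1):", zfnet_conv1_pad1)
```

Under either padding assumption, the first ZFNet feature map is roughly twice as wide per side as the 55 × 55 map of AlexNet, which is one way to read the claim that the smaller kernel and stride retain more features.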
Table 1.3 AlexNet layer details.
Sl. no. | Layer | Kernel size | Stride | Activation shape | Weights | Bias | # Parameters | Activation | # Connections |
1 | Input Layer | - | - | (227,227,3) | 0 | 0 | - | relu | - |
2 | CONV1 | 11 × 11 | 4 | (55,55,96) | 34,848 | 96 | 34,944 | relu | 105,415,200 |
3 | POOL1 | 3 × 3 | 2 | (27,27,96) | 0 | 0 | 0 | relu | - |
4 | CONV2 | 5 × 5 | 1 | (27,27,256) | 614,400 | 256 | 614,656 | relu | 111,974,400 |
5 | POOL2 | 3 × 3 | 2 | (13,13,256) | 0 | 0 | 0 | relu | - |
6 | CONV3 | 3 × 3 | 1 | (13,13,384) | 884,736 | 384 | 885,120 | relu | 149,520,384 |
7 | CONV4 | 3 × 3 | 1 | (13,13,384) | 1,327,104 | 384 | 1,327,488 | relu | 112,140,288 |
8 | CONV5 | 3 × 3 | 1 | (13,13,256) | 884,736 | 256 | 884,992 | relu | 74,760,192 |
9 | POOL3 | 3 × 3 | 2 | (6,6,256) | 0 | 0 | 0 | relu | - |
10 | FC | - | - | 9,216 | 37,748,736 | 4,096 | 37,752,832 | relu | 37,748,736 |
11 | FC | - | - | 4,096 | 16,777,216 | 4,096 | 16,781,312 | relu | 16,777,216 |
12 | FC | - | - | 4,096 | 4,096,000 | 1,000 | 4,097,000 | relu | 4,096,000 |
OUTPUT | FC | - | - | 1,000 | - | - | 0 | softmax | - |
- | - | - | - | - | - | - | 62,378,344 (Total) | - | - |
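The Weights, Bias, and # Parameters columns of Table 1.3 can be reproduced with a few lines of arithmetic. The sketch below assumes plain (ungrouped) convolutions, which is what the weight counts in the table correspond to; the helper names and the FC1 to FC3 labels are illustrative only.

```python
# (name, kernel size, input channels, output channels), taken from Table 1.3
CONV_LAYERS = [
    ("CONV1", 11, 3, 96),
    ("CONV2", 5, 96, 256),
    ("CONV3", 3, 256, 384),
    ("CONV4", 3, 384, 384),
    ("CONV5", 3, 384, 256),
]

# (name, input units, output units), taken from Table 1.3
FC_LAYERS = [
    ("FC1", 9216, 4096),
    ("FC2", 4096, 4096),
    ("FC3", 4096, 1000),
]


def conv_params(kernel: int, c_in: int, c_out: int) -> tuple[int, int]:
    """Return (weights, biases) of a convolutional layer without grouping."""
    return kernel * kernel * c_in * c_out, c_out


def fc_params(n_in: int, n_out: int) -> tuple[int, int]:
    """Return (weights, biases) of a fully connected layer."""
    return n_in * n_out, n_out


def total_parameters() -> int:
    """Sum weights and biases over all layers listed above."""
    total = 0
    for name, kernel, c_in, c_out in CONV_LAYERS:
        weights, biases = conv_params(kernel, c_in, c_out)
        print(f"{name}: weights={weights:,} bias={biases:,} params={weights + biases:,}")
        total += weights + biases
    for name, n_in, n_out in FC_LAYERS:
        weights, biases = fc_params(n_in, n_out)
        print(f"{name}: weights={weights:,} bias={biases:,} params={weights + biases:,}")
        total += weights + biases
    return total


total = total_parameters()
print(f"Total: {total:,}")
```

The sum agrees with the total of 62,378,344 parameters given in the last row of the table, and it shows that the three fully connected layers account for most of them.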
Figure 1.3 Architecture of ZFNet.
ZFNet uses the cross-entropy loss function, the ReLU activation function, and mini-batch stochastic gradient descent. Training is done on 1.3 million images using a GTX 580 GPU and takes 12 days. The ZFNet architecture consists of five convolutional layers, three max-pooling layers, three fully connected layers, and a softmax layer, as shown in Figure 1.3. Table 1.4 shows how a 224 × 224 × 3 input image is processed at each layer, listing the filter size, window size, stride, and padding values for each layer. The ImageNet top-5 error improved from 16.4% to 11.7%.
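As a rough illustration of the quantities named above, the sketch below implements ReLU, softmax with cross-entropy loss, and the top-k error metric in plain Python. It is not the training code of ZFNet; the function names and the toy six-class scores are assumptions made only for this example.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def softmax(logits: list[float]) -> list[float]:
    """Convert raw class scores to probabilities (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target])


def top_k_error(batch_logits: list[list[float]], targets: list[int], k: int = 5) -> float:
    """Fraction of samples whose true class is not among the k highest scores."""
    misses = 0
    for logits, target in zip(batch_logits, targets):
        ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        if target not in ranked[:k]:
            misses += 1
    return misses / len(targets)


# Toy scores for two images over six hypothetical classes.
scores = [
    [0.1, 2.0, 0.3, 1.5, 0.7, -1.0],  # true class 5 has the lowest score: a top-5 miss
    [3.0, 0.2, 0.1, 0.5, 0.4, 0.6],   # true class 0 has the highest score: a hit
]
labels = [5, 0]

print("loss of first sample:", cross_entropy(scores[0], labels[0]))
print("top-5 error:", top_k_error(scores, labels, k=5))
```

A top-5 error of 11.7% therefore means that, for 11.7% of the evaluation images, the correct label was not among the five highest-scoring classes.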