1.2.6 ResNet
Usually, an input feature map is passed through a series of convolutional layers, a non-linear activation function (ReLU), and a pooling layer to produce the input for the next layer, and the network is trained with the back-propagation algorithm. Accuracy can be improved by increasing depth, but only up to a point: accuracy saturates and then, as more layers are added, degrades rapidly, resulting in higher training error. To address this degradation and the vanishing/exploding gradient problem, ResNet [6] introduced a residual learning framework in which the added layers fit a residual mapping rather than the desired underlying mapping. If the identity mapping is close to optimal, it is easier to push the residual toward zero than to fit the identity with a stack of non-linear layers. The principles of ResNet are thus residual learning, identity mapping, and skip connections: the input of a block is carried forward over the stacked convolutional layers and added to their output, after which the non-linear activation (ReLU) is applied.
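The residual block is easy to express in code. Below is a minimal sketch in PyTorch (an assumed framework; the chapter gives no code, and the `ResidualBlock` class, channel size, and use of batch normalization are illustrative choices, not the authors' implementation):

```python
# Minimal sketch of a residual block: y = ReLU(F(x) + x).
# The stacked layers learn only the residual F(x) = H(x) - x.
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                          # skip connection (identity mapping)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                  # add the input back: fit the residual
        return self.relu(out)                 # non-linearity after the addition


x = torch.randn(1, 64, 56, 56)                # illustrative feature map
print(ResidualBlock(64)(x).shape)             # torch.Size([1, 64, 56, 56])
```

Because the shortcut adds the input unchanged, the stacked layers can simply drive their weights toward zero when the identity mapping is already near-optimal, which is what makes very deep networks trainable.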
Table 1.6 Various parameters of GoogLeNet.
| Layer name | Input size | Filter size | Window size | # Filters | Stride | Depth | # 1 × 1 | # 3 × 3 reduce | # 3 × 3 | # 5 × 5 reduce | # 5 × 5 | Pool proj | Padding | Output size | Params | Ops |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Convolution | 224 × 224 | 7 × 7 | - | 64 | 2 | 1 | - | - | - | - | - | - | 2 | 112 × 112 × 64 | 2.7M | 34M |
| Max pool | 112 × 112 | - | 3 × 3 | - | 2 | 0 | - | - | - | - | - | - | 0 | 56 × 56 × 64 | - | - |
| Convolution | 56 × 56 | 3 × 3 | - | 192 | 1 | 2 | - | 64 | 192 | - | - | - | 1 | 56 × 56 × 192 | 112K | 360M |
| Max pool | 56 × 56 | - | 3 × 3 | 192 | 2 | 0 | - | - | - | - | - | - | 0 | 28 × 28 × 192 | - | - |
| Inception (3a) | 28 × 28 | - | - | - | - | 2 | 64 | 96 | 128 | 16 | 32 | 32 | - | 28 × 28 × 256 | 159K | 128M |
| Inception (3b) | 28 × 28 | - | - | - | - | 2 | 128 | 128 | 192 | 32 | 96 | 64 | - | 28 × 28 × 480 | 380K | 304M |
| Max pool | 28 × 28 | - | 3 × 3 | 480 | 2 | 0 | - | - | - | - | - | - | 0 | 14 × 14 × 480 | - | - |
| Inception (4a) | 14 × 14 | - | - | - | - | 2 | 192 | 96 | 208 | 16 | 48 | 64 | - | 14 × 14 × 512 | 364K | 73M |
| Inception (4b) | 14 × 14 | - | - | - | - | 2 | 160 | 112 | 224 | 24 | 64 | 64 | - | 14 × 14 × 512 | 437K | 88M |
| Inception (4c) | 14 × 14 | - | - | - | - | 2 | 128 | 128 | 256 | 24 | 64 | 64 | - | 14 × 14 × 512 | 463K | 100M |
| Inception (4d) | 14 × 14 | - | - | - | - | 2 | 112 | 144 | 288 | 32 | 64 | 64 | - | 14 × 14 × 528 | 580K | 119M |
| Inception (4e) | 14 × 14 | - | - | - | - | 2 | 256 | 160 | 320 | 32 | 128 | 128 | - | 14 × 14 × 832 | 840K | 170M |
| Max pool | 14 × 14 | - | 3 × 3 | - | 2 | 0 | - | - | - | - | - | - | 0 | 7 × 7 × 832 | - | - |
| Inception (5a) | 7 × 7 | - | - | - | - | 2 | 256 | 160 | 320 | 32 | 128 | 128 | - | 7 × 7 × 832 | 1,072K | 54M |
| Inception (5b) | 7 × 7 | - | - | - | - | 2 | 384 | 192 | 384 | 48 | 128 | 128 | - | 7 × 7 × 1,024 | 1,388K | 71M |
| Avg pool | 7 × 7 | - | 7 × 7 | - | - | 0 | - | - | - | - | - | - | 0 | 1 × 1 × 1,024 | - | - |
| Dropout (40%) | - | - | - | 1,024 | - | 0 | - | - | - | - | - | - | - | 1 × 1 × 1,024 | - | - |
| Linear | - | - | - | 1,000 | - | 1 | - | - | - | - | - | - | - | 1 × 1 × 1,000 | 1,000K | 1M |
| Softmax | - | - | - | 1,000 | - | 0 | - | - | - | - | - | - | - | 1 × 1 × 1,000 | - | - |
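To make the table's branch columns concrete, here is a hedged sketch of one Inception module in PyTorch (again an assumed framework; the `Inception` class name and wiring are illustrative, not the original implementation). Instantiated with the row for Inception (3a), the four parallel branches produce 64 + 128 + 32 + 32 = 256 channels, matching the 28 × 28 × 256 output size in the table:

```python
# Sketch of an Inception module: four parallel branches concatenated
# along the channel axis. Column names follow Table 1.6.
import torch
import torch.nn as nn


class Inception(nn.Module):
    def __init__(self, in_ch, n1x1, n3x3red, n3x3, n5x5red, n5x5, pool_proj):
        super().__init__()
        self.b1 = nn.Sequential(                       # 1 x 1 branch
            nn.Conv2d(in_ch, n1x1, 1), nn.ReLU(inplace=True))
        self.b2 = nn.Sequential(                       # 1 x 1 "reduce" then 3 x 3
            nn.Conv2d(in_ch, n3x3red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(n3x3red, n3x3, 3, padding=1), nn.ReLU(inplace=True))
        self.b3 = nn.Sequential(                       # 1 x 1 "reduce" then 5 x 5
            nn.Conv2d(in_ch, n5x5red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(n5x5red, n5x5, 5, padding=2), nn.ReLU(inplace=True))
        self.b4 = nn.Sequential(                       # 3 x 3 max pool then 1 x 1 projection
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # Concatenate the four branch outputs along the channel dimension.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)


x = torch.randn(1, 192, 28, 28)                        # input to Inception (3a)
print(Inception(192, 64, 96, 128, 16, 32, 32)(x).shape)  # torch.Size([1, 256, 28, 28])
```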
The plain baseline architecture is inspired by VGGNet (it consists mainly of 3 × 3 filters), and shortcut connections are inserted into it to form a residual network, as shown in Figure 1.7. Figure 1.7(b) shows the 34-layer plain network converted into a residual network; this 34-layer residual network has lower training error than the 18-layer residual network. As in GoogLeNet, the network ends with a global average pooling layer followed by the classification layer. ResNets have been trained with a maximum depth of 152 layers. ResNet is more accurate than GoogLeNet and VGGNet and is computationally more efficient than VGGNet; ResNet-152 achieves a top-5 accuracy of 95.51%. Figure 1.7(a) shows a residual block, Figure 1.7(b) shows the architecture of ResNet, and Table 1.7 shows the parameters of ResNet.
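As a usage sketch, off-the-shelf ResNet variants are available in torchvision (an assumption about tooling, not something the chapter prescribes); each variant ends in the global average pooling and classification layers described above:

```python
# Hedged usage sketch: torchvision provides ResNet variants such as
# resnet18, resnet34, and resnet152 (the 152-layer model cited above).
import torch
from torchvision import models

model = models.resnet34(weights=None)   # 34-layer residual network, untrained
x = torch.randn(1, 3, 224, 224)         # one ImageNet-sized RGB image
logits = model(x)                       # global average pooling + 1,000-way linear layer
print(logits.shape)                     # torch.Size([1, 1000])
```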