1.2.2 AlexNet

Alex Krizhevsky et al. [2] presented a new architecture, "AlexNet", trained on the ImageNet dataset, which consists of 1.2 million high-resolution images belonging to 1,000 different classes. In the original implementation, the layers are split into two pipelines trained on two GTX 580 3GB GPUs, and training takes around 5 to 6 days. The network contains five convolutional layers, some of which are followed by max-pooling layers, then three fully connected layers and a final 1,000-way softmax classifier. The network uses the ReLU activation function, data augmentation, dropout, local response normalization, and overlapping pooling. AlexNet has about 60M parameters. Figure 1.2 shows the architecture of AlexNet and Table 1.3 lists its various parameters.
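As a rough check on the 60M figure, the short Python sketch below (not from the book; the helpers conv_params and fc_params are illustrative) counts weights and biases per layer using the filter sizes given in the layer-by-layer description that follows. The original two-GPU model uses grouped convolutions in a few layers, which brings its total down to the roughly 60M quoted above; the single-tower layout counted here comes to about 62M.

def conv_params(k, c_in, c_out):
    # kernel weights plus one bias per output feature map
    return k * k * c_in * c_out + c_out

def fc_params(n_in, n_out):
    # weights plus one bias per output neuron
    return n_in * n_out + n_out

counts = {
    "conv1 (11x11, 3 -> 96)":  conv_params(11, 3, 96),
    "conv2 (5x5, 96 -> 256)":  conv_params(5, 96, 256),
    "conv3 (3x3, 256 -> 384)": conv_params(3, 256, 384),
    "conv4 (3x3, 384 -> 384)": conv_params(3, 384, 384),
    "conv5 (3x3, 384 -> 256)": conv_params(3, 384, 256),
    "fc (9216 -> 4096)":       fc_params(9216, 4096),
    "fc (4096 -> 4096)":       fc_params(4096, 4096),
    "fc (4096 -> 1000)":       fc_params(4096, 1000),
}

for name, n in counts.items():
    print(f"{name:26s} {n:>12,}")
print(f"{'total':26s} {sum(counts.values()):>12,}")   # about 62.4M for the single-tower layout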

Table 1.2 Each column indicates which feature maps in S2 are combined by the units in a particular feature map of C3 [1].

S2 \ C3 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
0 | X |   |   |   | X | X | X |   |   | X | X | X | X |   | X | X
1 | X | X |   |   |   | X | X | X |   |   | X | X | X | X |   | X
2 | X | X | X |   |   |   | X | X | X |   |   | X |   | X | X | X
3 |   | X | X | X |   |   | X | X | X | X |   |   | X |   | X | X
4 |   |   | X | X | X |   |   | X | X | X | X |   | X | X |   | X
5 |   |   |   | X | X | X |   |   | X | X | X | X |   | X | X | X

Figure 1.2 Architecture of AlexNet.

First Layer: AlexNet accepts a 227 × 227 × 3 RGB image as input, which is fed to the first convolutional layer with 96 kernels (filters) of size 11 × 11 × 3 and a stride of 4, producing an output of 96 feature maps of size 55 × 55. The next layer is a max-pooling (sub-sampling) layer with a window size of 3 × 3 and a stride of 2, which produces an output of size 27 × 27 × 96.
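The 55 × 55 and 27 × 27 sizes follow from the standard output-size formula for convolution and pooling. The small Python sketch below reproduces them; the helper out_size is illustrative and not from the book.

def out_size(in_size, kernel, stride, padding=0):
    # standard formula: floor((in - kernel + 2*padding) / stride) + 1
    return (in_size - kernel + 2 * padding) // stride + 1

conv1 = out_size(227, kernel=11, stride=4)    # (227 - 11) / 4 + 1 = 55
pool1 = out_size(conv1, kernel=3, stride=2)   # (55 - 3) / 2 + 1 = 27
print(conv1, pool1)                           # 55 27 -> 55 x 55 x 96, then 27 x 27 x 96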

Second Layer: The second convolutional layer filters the 27 × 27 × 96 input with 256 kernels of size 5 × 5 and a stride of 1 pixel. It is followed by a max-pooling layer with a filter size of 3 × 3 and a stride of 2, which reduces the output to 256 feature maps of size 13 × 13.

Third, Fourth, and Fifth Layers: The third, fourth, and fifth convolutional layers use a filter size of 3 × 3 and a stride of 1. The third and fourth convolutional layers have 384 feature maps each, and the fifth layer uses 256 filters. These layers are followed by a max-pooling layer with a filter size of 3 × 3 and a stride of 2, giving 256 feature maps of size 6 × 6.
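The same formula traces the sizes quoted for the second through fifth layers. The padding values used below (2 for the second convolution, 1 for the third to fifth) are not stated in the text; they are assumptions taken from the commonly used AlexNet configuration and are what make the quoted 27, 13, and 6 outputs work out.

def out_size(in_size, kernel, stride, padding=0):
    # same formula as in the sketch above
    return (in_size - kernel + 2 * padding) // stride + 1

conv2 = out_size(27, kernel=5, stride=1, padding=2)      # 27 (assumed padding of 2)
pool2 = out_size(conv2, kernel=3, stride=2)              # 13
conv3 = out_size(pool2, kernel=3, stride=1, padding=1)   # 13 (assumed padding of 1)
conv4 = out_size(conv3, kernel=3, stride=1, padding=1)   # 13
conv5 = out_size(conv4, kernel=3, stride=1, padding=1)   # 13
pool5 = out_size(conv5, kernel=3, stride=2)              # 6 -> 6 x 6 x 256
print(conv2, pool2, conv3, conv4, conv5, pool5)          # 27 13 13 13 13 6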

Sixth Layer: The 6 × 6 × 256 output is flattened into a fully connected layer with 9,216 neurons (feature maps of size 1 × 1).

Seventh and Eighth Layers: The seventh and eighth layers are fully connected layers with 4,096 neurons each.

Output Layer: The output layer uses the softmax activation and produces scores for 1,000 classes.
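Putting the eight layers together, the following PyTorch sketch is one way to express the architecture described above. It is a simplified single-GPU approximation: the two-GPU split, grouped convolutions, and local response normalization of the original are omitted, and the padding values are the assumptions noted earlier, so its parameter count comes to about 62M rather than the roughly 60M of the original model.

import torch
import torch.nn as nn

# Single-tower sketch of the eight layers described above (padding values assumed).
alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),              # layer 1: 227x227x3 -> 55x55x96
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                    #          55x55x96 -> 27x27x96
    nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),   # layer 2: 27x27x256
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                    #          13x13x256
    nn.Conv2d(256, 384, kernel_size=3, padding=1),            # layer 3: 13x13x384
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),            # layer 4: 13x13x384
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),            # layer 5: 13x13x256
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                    #          6x6x256
    nn.Flatten(),                                             # layer 6: 6*6*256 = 9,216 values
    nn.Dropout(0.5),
    nn.Linear(9216, 4096),                                    # layer 7: 4,096 neurons
    nn.ReLU(inplace=True),
    nn.Dropout(0.5),
    nn.Linear(4096, 4096),                                    # layer 8: 4,096 neurons
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),                                    # output: 1,000-way scores (softmax applied in the loss)
)

x = torch.randn(1, 3, 227, 227)                               # one dummy 227 x 227 RGB image
print(alexnet(x).shape)                                       # torch.Size([1, 1000])
print(sum(p.numel() for p in alexnet.parameters()))           # about 62.4M parameters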
