Multi-Processor System-on-Chip 1 - Liliana Andrade
1.3.1.1. Neural network processing
For each layer in a neural network, the input data must be transformed into output data. An often used transformation is the convolution, which convolves, or, more precisely, correlates, the input data with a set of trained weights. This transformation is used in convolutional neural networks (CNNs), which are often applied in image or video recognition.
Figure 1.3 shows a 2D convolution, which performs a dot-product operation using the weights of a 2D weight kernel and a selected 2D region of the input data with the same width and height as the weight kernel. The dot product yields a value (M23) in the output map. In this example, no padding is applied on the borders of the input data, hence the coordinate (2, 3) for the output value. For computing the full output map, the weight kernel is “moved” over the input map and dot-product operations are performed for the selected 2D regions, producing an output value with each dot product. For example, M24 can be calculated by moving one step to the right and performing a dot product for the region with input samples A24–A26, A34–A36 and A44–A46.
Figure 1.3. 2D convolution applying a weight kernel to input data to calculate a value in the output map
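The sliding dot product described above can be sketched in a few lines of Python. This is a minimal illustration, not an optimized implementation: the function name `conv2d_valid`, the zero-based indexing and the sample values are assumptions made for this example. As in the M24 example, an output value is stored at the coordinate of the top-left sample of the region it was computed from.

```python
def conv2d_valid(inp, kernel):
    """Correlate a 2D input map with a 2D weight kernel, without padding."""
    in_h, in_w = len(inp), len(inp[0])
    k_h, k_w = len(kernel), len(kernel[0])
    # Without padding, the kernel only fits at these positions.
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):          # move the kernel down
        for c in range(out_w):      # move the kernel to the right
            acc = 0
            for i in range(k_h):
                for j in range(k_w):
                    # Dot product of the kernel with the selected region.
                    acc += inp[r + i][c + j] * kernel[i][j]
            out[r][c] = acc
    return out


# Hypothetical 4x5 input map and 3x3 weight kernel.
inp = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
]
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
out = conv2d_valid(inp, kernel)
print(out)  # [[-6, -6, -6], [-6, -6, -6]]
```

The output map is smaller than the input map (2 x 3 instead of 4 x 5) because no padding is applied at the borders.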
Input and output maps are often three-dimensional. That is, they have a width, a height and a depth, with different planes in the depth dimension typically referred to as channels. For input maps with a depth > 1, an output value can be calculated using a dot-product operation on input data from multiple input channels. For output maps with a depth > 1, a convolution must be performed for each output channel, using different weight kernels for different output channels. Depthwise convolution is a special convolution layer type for which the number of input and output channels is the same, with each output channel being calculated from the one input channel with the same depth value as the output channel. Yet another layer type is the fully connected layer, which performs a dot-product operation for each output value using the same number of weights as the number of samples in the input map.
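The three layer types can be contrasted with a small Python sketch. The data layouts (`inp[channel][row][column]`, `weights[out_channel][in_channel][row][column]`), the function names and the sample values are assumptions chosen for this illustration; padding, strides and biases are left out.

```python
def dot_region(plane, kernel, r, c):
    """Dot product of a 2D kernel with the same-sized region of plane at (r, c)."""
    return sum(
        plane[r + i][c + j] * kernel[i][j]
        for i in range(len(kernel))
        for j in range(len(kernel[0]))
    )


def out_size(plane, kernel):
    """Output height and width when no padding is applied."""
    return len(plane) - len(kernel) + 1, len(plane[0]) - len(kernel[0]) + 1


def conv_layer(inp, weights):
    """Standard convolution: each output channel sums over all input channels."""
    out_h, out_w = out_size(inp[0], weights[0][0])
    return [
        [[sum(dot_region(inp[ic], w_oc[ic], r, c) for ic in range(len(inp)))
          for c in range(out_w)]
         for r in range(out_h)]
        for w_oc in weights  # one set of kernels per output channel
    ]


def depthwise_layer(inp, weights):
    """Depthwise convolution: output channel ch uses only input channel ch."""
    out_h, out_w = out_size(inp[0], weights[0])
    return [
        [[dot_region(inp[ch], weights[ch], r, c) for c in range(out_w)]
         for r in range(out_h)]
        for ch in range(len(inp))
    ]


def fully_connected(inp_flat, weights):
    """Each output value is a dot product over all samples of the input map."""
    return [sum(x * w for x, w in zip(inp_flat, w_row)) for w_row in weights]


# Hypothetical input map: 2 channels of 2x2 samples.
inp = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

conv_w = [
    [[[1, 0], [0, 1]], [[0, 1], [1, 0]]],  # kernels for output channel 0
    [[[1, 1], [1, 1]], [[0, 0], [0, 0]]],  # kernels for output channel 1
]
dw_w = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]  # one kernel per channel
fc_w = [[1] * 8, [1, 0, 0, 0, 0, 0, 0, -1]]  # 8 weights per output value

conv_out = conv_layer(inp, conv_w)
dw_out = depthwise_layer(inp, dw_w)
flat = [v for plane in inp for row in plane for v in row]
fc_out = fully_connected(flat, fc_w)
print(conv_out)  # [[[18]], [[10]]]
print(dw_out)    # [[[5]], [[26]]]
print(fc_out)    # [36, -7]
```

Note how the standard convolution needs one kernel per (output channel, input channel) pair, whereas the depthwise layer needs only one kernel per channel.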
The key operation in the layer types described above is the dot-product operation on input samples and weights. It is therefore a requirement for a processor to implement such dot-product operations efficiently. This involves efficient computation, for example, using MAC instructions, as well as efficient access to input data, weight kernels and output data.
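To make the role of the MAC operation concrete, the following Python sketch writes a dot product as a chain of multiply-accumulate steps and counts the MACs needed for a standard convolution layer. The function names and the layer dimensions are hypothetical, and the count assumes one MAC per weight per output value, with no padding or stride effects.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b


def dot_product(samples, weights):
    """Dot product expressed as a chain of MAC steps."""
    acc = 0
    for a, w in zip(samples, weights):
        acc = mac(acc, a, w)
    return acc


def conv_layer_macs(out_w, out_h, out_c, k_w, k_h, in_c):
    """MAC count: one dot product of length k_w * k_h * in_c per output value."""
    return out_w * out_h * out_c * k_w * k_h * in_c


print(dot_product([1, 2, 3], [4, 5, 6]))    # 32
print(conv_layer_macs(4, 4, 8, 3, 3, 2))    # 2304
```

Even this small hypothetical layer requires a few thousand MACs, and each MAC needs an input sample and a weight as operands, which is why data access matters as much as the arithmetic itself.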
CNNs are feed-forward neural networks. When a layer processes an input map, it maintains no state that impacts the processing of the next input map. Recurrent neural networks (RNNs) are a different kind of neural network: they maintain state while processing sequences of inputs. As a result, RNNs can also recognize patterns across time, and are often applied in text and speech recognition applications.
There are many different types of RNN cells from which a network can be built. In its basic form, an RNN cell calculates an output as shown in equation [1.1]:
$$h_t = f(W_x x_t + W_h h_{t-1} + b) \qquad [1.1]$$
where $x_t$ is frame $t$ in the input sequence, $h_t$ is the output for $x_t$, $h_{t-1}$ is the output for the previous frame, $W_x$ and $W_h$ are weight sets, $b$ is a bias, and $f()$ is an output activation function. Thus, the calculation of an output involves a dot product of one set of weights with new input data and another dot product of another set of weights with the previous output data. Therefore, also for RNNs, the dot product is a key operation that must be implemented efficiently. The long short-term memory (LSTM) cell is another well-known RNN cell. The LSTM cell has a more complicated structure than the basic RNN cell that we discussed above, but the dot product is again a dominant operation.
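The basic RNN cell can be sketched as follows, assuming it has the form h_t = f(W_x x_t + W_h h_(t-1) + b) implied by the definitions of equation [1.1]. The helper names, the weight values and the choice of tanh as the default activation are assumptions made for this illustration.

```python
import math


def matvec(W, v):
    """One dot product per row of the weight set W."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_cell(x_t, h_prev, W_x, W_h, b, f=math.tanh):
    """Basic RNN cell: h_t = f(W_x x_t + W_h h_prev + b)."""
    from_input = matvec(W_x, x_t)      # dot products with the new input frame
    from_state = matvec(W_h, h_prev)   # dot products with the previous output
    return [f(a + r + bias) for a, r, bias in zip(from_input, from_state, b)]


# Hypothetical weights for 2 inputs and 2 outputs.
W_x = [[0.5, -0.5], [1.0, 0.0]]
W_h = [[0.0, 1.0], [1.0, 0.0]]
b = [0.0, 0.0]

h = [0.0, 0.0]  # initial state
for x_t in [[1.0, 1.0], [0.0, 2.0]]:
    # The output for one frame becomes the state for the next frame.
    h = rnn_cell(x_t, h, W_x, W_h, b)
print(h)
```

The loop shows where the state lives: the output for frame t is fed back as `h_prev` when frame t + 1 is processed, which a feed-forward layer never does.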
Activation functions are used in neural networks to transform data by performing some nonlinear mapping. Examples are rectified linear units (ReLU), sigmoid and hyperbolic tangent (TanH). The activation functions operate on a single data value and produce a single result. Hence, for an activation layer, the size of the output map is equal to the size of the input map.
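A minimal Python sketch of an activation layer is given below. The function names and the `inp[channel][row][column]` layout are assumptions for this example; the three functions follow their standard mathematical definitions.

```python
import math


def relu(v):
    """Rectified linear unit: negative values are clamped to zero."""
    return max(0.0, v)


def sigmoid(v):
    """Logistic sigmoid, written so that exp() never overflows."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def tanh(v):
    """Hyperbolic tangent."""
    return math.tanh(v)


def activation_layer(inp, fn):
    """Apply fn to every value; the output map has the size of the input map."""
    return [[[fn(v) for v in row] for row in plane] for plane in inp]


# Hypothetical input map: 1 channel of 2x2 samples.
inp = [[[-1.0, 0.0], [2.0, -3.0]]]
relu_out = activation_layer(inp, relu)
print(relu_out)      # [[[0.0, 0.0], [2.0, 0.0]]]
print(sigmoid(0.0))  # 0.5
```

Because each output value depends on a single input value, an activation layer involves no dot products.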
Figure 1.4. Example pooling operations: max pooling and average pooling
Neural networks may also have pooling layers that transform an input map into a smaller output map by calculating single output values for (small) regions of the input data. Figure 1.4 shows two examples: max pooling and average pooling. Effectively, the pooling layers downsample the data in the width and height dimensions. The depth of the output map is the same as the depth of the input map.
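Both pooling operations can be sketched with one helper that reduces each region to a single value. The sketch assumes non-overlapping square regions (the step equals the region size), which may differ from the configuration shown in Figure 1.4; the function names and sample values are hypothetical.

```python
def pool2d(plane, size, reduce_fn):
    """Reduce each non-overlapping size x size region of a 2D plane to one value."""
    out_h, out_w = len(plane) // size, len(plane[0]) // size
    return [
        [
            reduce_fn([
                plane[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            ])
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(inp, size=2):
    """Max pooling, applied to each channel independently."""
    return [pool2d(plane, size, max) for plane in inp]


def avg_pool(inp, size=2):
    """Average pooling, applied to each channel independently."""
    return [pool2d(plane, size, lambda vals: sum(vals) / len(vals)) for plane in inp]


# Hypothetical input map: 1 channel of 4x4 samples.
inp = [[
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [9, 11, 10, 12],
    [13, 15, 14, 16],
]]
max_out = max_pool(inp)
avg_out = avg_pool(inp)
print(max_out)  # [[[7, 8], [15, 16]]]
print(avg_out)  # [[[4.0, 5.0], [12.0, 13.0]]]
```

The 4 x 4 plane is reduced to 2 x 2 while the number of channels is unchanged, matching the downsampling behavior described above.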