1.3.1.2. Implementation requirements

To obtain sufficiently accurate results when implementing machine learning inference, appropriate data types must be used. During the training phase, data and weights are typically represented as 32-bit floating-point values. The common practice for deploying models for inference in embedded systems is to work with integer or fixed-point representations of quantized data and weights (Jacob et al. 2017). The potential impact of quantization errors can be taken into account during the training phase to avoid a notable effect on model performance. Elements of input maps, output maps and weight kernels can typically be represented using 16-bit or smaller data types. For example, in a voice application, data samples are typically represented by 16-bit data types, while in image or video applications a 12-bit or smaller data type is often sufficient for representing the input samples. Precision requirements can differ per layer of the neural network. Special attention should be paid to the data types of the accumulators used to sum up partial results when performing dot-product operations: these accumulators must be wide enough to avoid overflow of the (intermediate) results of the dot-product operations performed on the (quantized) weights and input samples.
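As an illustration, consider the following minimal C sketch of such a dot product on quantized 16-bit data. The function name and the Q15 interpretation are assumptions made for this example; the point is the accumulator width. Each 16x16-bit product requires up to 32 bits, and summing n such products requires roughly 32 + log2(n) bits, so a 64-bit accumulator safely rules out overflow for any practical vector length.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical example: dot product of quantized 16-bit inputs and
 * weights with a wide accumulator. Each 16x16-bit product fits in
 * 32 bits; the 64-bit accumulator avoids overflow of the
 * (intermediate) partial sums. */
int64_t dot_product_q15(const int16_t *x, const int16_t *w, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * (int32_t)w[i]; /* 16x16 -> 32-bit product */
    }
    return acc;
}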

Memory requirements for low/mid-end machine learning inference are typically modest, thanks to limited input data rates and the use of neural networks of limited complexity. Input and output maps are often of limited size, i.e. a few tens of kB or less, and the number and size of the weight kernels are also relatively small. Using the smallest possible data types for input maps, output maps and weight kernels helps to reduce memory requirements further.
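For a sense of scale, the following hypothetical calculation shows how 8-bit quantization keeps the buffers of a small convolutional layer in this range; all layer dimensions are assumptions chosen for illustration, not taken from the text.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical example: memory footprint of one small convolutional
 * layer. All dimensions below are illustrative assumptions. */
int main(void)
{
    const unsigned in_w = 32, in_h = 32, in_c = 8;     /* input map   */
    const unsigned out_w = 32, out_h = 32, out_c = 16; /* output map  */
    const unsigned k = 3;                              /* 3x3 kernels */

    /* With 8-bit quantized data, all buffers stay well within the
     * tens-of-kB range mentioned above. */
    unsigned in_bytes  = in_w * in_h * in_c;       /*  8192 B =  8 kB */
    unsigned out_bytes = out_w * out_h * out_c;    /* 16384 B = 16 kB */
    unsigned w_bytes   = k * k * in_c * out_c;     /*  1152 B ~ 1 kB  */

    printf("input %u B, output %u B, weights %u B\n",
           in_bytes, out_bytes, w_bytes);
    return 0;
}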

In summary, low/mid-end machine learning inference applications require the following types of processing:

 – various types of pre-processing and feature extraction, often with DSP-intensive computations;

 – neural network processing, with the dot-product operation as a dominant computation and regular access patterns on multidimensional data. Additional requirements come from the use of scalar activation functions and pooling operations working on 2D data (a sketch of these two operations follows this list);

 – decision-making, which is performed after the neural network processing and is more control-oriented.
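The following minimal C sketch illustrates the two auxiliary operations named in the second item: a scalar activation function applied element-wise, and 2x2 max pooling on a 2D map. The function names, the choice of ReLU as the activation, and the row-major data layout are assumptions for illustration.

#include <stdint.h>
#include <stddef.h>

/* Scalar activation (ReLU assumed here), applied element-wise. */
static inline int16_t relu_q15(int16_t v)
{
    return v > 0 ? v : 0;
}

/* 2x2 max pooling on a row-major 2D map, halving both dimensions. */
void max_pool_2x2(const int16_t *in, int16_t *out,
                  size_t in_w, size_t in_h)
{
    for (size_t y = 0; y + 1 < in_h; y += 2) {
        for (size_t x = 0; x + 1 < in_w; x += 2) {
            int16_t m = in[y * in_w + x];
            if (in[y * in_w + x + 1] > m)       m = in[y * in_w + x + 1];
            if (in[(y + 1) * in_w + x] > m)     m = in[(y + 1) * in_w + x];
            if (in[(y + 1) * in_w + x + 1] > m) m = in[(y + 1) * in_w + x + 1];
            out[(y / 2) * (in_w / 2) + x / 2] = m;
        }
    }
}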

The different types of processing may be implemented on a heterogeneous multi-processor architecture, with different types of processors satisfying the different processing requirements. However, for low/mid-end machine learning inference, the total compute requirements are often limited and can be handled by a single processor running at a reasonable frequency, provided it has the right capabilities. As discussed above, the use of a single processor eliminates the area and communication overhead associated with multi-processor architectures. It also simplifies software development, as a single tool chain can be used for the complete application. However, it requires a processor that performs DSP, neural network processing and control processing with excellent cycle efficiency. In the next section, we discuss in more detail the capabilities of a programmable processor that enable such cycle efficiency.

For many IoT edge devices, low cost is a key requirement. Therefore, making IoT edge devices smarter by adding machine learning inference must be cost-effective. The main contributor to cost is silicon area, in particular for high-volume products, so it is important that the processor implementing the machine learning inference minimizes logic area and uses small memories. In addition, a small code size is key to limiting the area of the instruction memory.

Many IoT edge devices are battery-operated and have a tight power budget. This demands a processor with excellent power efficiency, measured in µW/MHz, as well as excellent cycle efficiency, so that the processor can be run at a low frequency. Low power consumption is particularly important for IoT edge devices that perform always-on functions, such as:

 – smart speakers, smartphones, etc. with always-on voice command functions that are “always listening”;

 – camera-based devices, performing, for example, face detection or gesture recognition that are “always watching”;

 – health and fitness monitoring devices that are “always sensing”.

Such devices typically apply smart techniques to reduce power consumption. For example, an “always listening” device may sample the microphone signal and use simple voice detection techniques to check whether anyone is speaking at all. It then applies the more compute-intensive machine learning inference for recognizing voice commands only when voice activity is detected. A processor must limit power consumption in each of these different states, i.e. voice detection and voice command recognition. For this purpose, it must offer various power management features, including effective sleep modes and power-down modes.
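A minimal C sketch of such a two-state control loop is shown below. The text does not prescribe an API, so read_mic_frame, voice_detected, run_command_inference and enter_sleep_until_next_frame are hypothetical placeholders for platform-specific functions; the point is the structure: a cheap voice detector gates the compute-intensive inference, and the processor enters a sleep mode between frames.

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical platform functions (placeholders, not a real API). */
extern int  read_mic_frame(int16_t *frame, int n);
extern bool voice_detected(const int16_t *frame, int n);      /* cheap VAD   */
extern void run_command_inference(const int16_t *frame, int n); /* costly NN */
extern void enter_sleep_until_next_frame(void);

enum state { LISTENING, RECOGNIZING };

void always_on_loop(void)
{
    enum state s = LISTENING;
    int16_t frame[256];

    for (;;) {
        int n = read_mic_frame(frame, 256);
        if (s == LISTENING) {
            /* Low-power state: only the simple voice detector runs. */
            if (voice_detected(frame, n))
                s = RECOGNIZING;
        } else {
            /* High-activity state: run the command recognizer. */
            run_command_inference(frame, n);
            if (!voice_detected(frame, n))
                s = LISTENING;   /* voice activity ended: back to low power */
        }
        enter_sleep_until_next_frame(); /* sleep mode between frames */
    }
}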
