Efficient Processing of Deep Neural Networks - Vivienne Sze

Contents

Preface

Acknowledgments

PART I Understanding Deep Neural Networks

1 Introduction

1.1 Background on Deep Neural Networks

1.1.1 Artificial Intelligence and Deep Neural Networks

1.1.2 Neural Networks and Deep Neural Networks

1.2 Training versus Inference

1.3 Development History

1.4 Applications of DNNs

1.5 Embedded versus Cloud

2 Overview of Deep Neural Networks

2.1 Attributes of Connections Within a Layer

2.2 Attributes of Connections Between Layers

2.3 Popular Types of Layers in DNNs

2.3.1 CONV Layer (Convolutional)

2.3.2 FC Layer (Fully Connected)

2.3.3 Nonlinearity

2.3.4 Pooling and Unpooling

2.3.5 Normalization

2.3.6 Compound Layers

2.4 Convolutional Neural Networks (CNNs)

2.4.1 Popular CNN Models

2.5 Other DNNs

2.6 DNN Development Resources

2.6.1 Frameworks

2.6.2 Models

2.6.3 Popular Datasets for Classification

2.6.4 Datasets for Other Tasks

2.6.5 Summary

PART II Design of Hardware for Processing DNNs

3 Key Metrics and Design Objectives

3.1 Accuracy

3.2 Throughput and Latency

3.3 Energy Efficiency and Power Consumption

3.4 Hardware Cost

3.5 Flexibility

3.6 Scalability

3.7 Interplay Between Different Metrics

4 Kernel Computation

4.1 Matrix Multiplication with Toeplitz

4.2 Tiling for Optimizing Performance

4.3 Computation Transform Optimizations

4.3.1 Gauss’ Complex Multiplication Transform

4.3.2 Strassen’s Matrix Multiplication Transform

4.3.3 Winograd Transform

4.3.4 Fast Fourier Transform

4.3.5 Selecting a Transform

4.4 Summary

5 Designing DNN Accelerators

5.1 Evaluation Metrics and Design Objectives

5.2 Key Properties of DNN to Leverage

5.3 DNN Hardware Design Considerations

5.4 Architectural Techniques for Exploiting Data Reuse

5.4.1 Temporal Reuse

5.4.2 Spatial Reuse

5.5 Techniques to Reduce Reuse Distance

5.6 Dataflows and Loop Nests

5.7 Dataflow Taxonomy

5.7.1 Weight Stationary (WS)

5.7.2 Output Stationary (OS)

5.7.3 Input Stationary (IS)

5.7.4 Row Stationary (RS)

5.7.5 Other Dataflows

5.7.6 Dataflows for Cross-Layer Processing

5.8 DNN Accelerator Buffer Management Strategies

5.8.1 Implicit versus Explicit Orchestration

5.8.2 Coupled versus Decoupled Orchestration

5.8.3 Explicit Decoupled Data Orchestration (EDDO)

5.9 Flexible NoC Design for DNN Accelerators

5.9.1 Flexible Hierarchical Mesh Network

5.10 Summary

6 Operation Mapping on Specialized Hardware

6.1 Mapping and Loop Nests

6.2 Mappers and Compilers

6.3 Mapper Organization

6.3.1 Map Spaces and Iteration Spaces

6.3.2 Mapper Search

6.3.3 Mapper Models and Configuration Generation

6.4 Analysis Framework for Energy Efficiency

6.4.1 Input Data Access Energy Cost

6.4.2 Partial Sum Accumulation Energy Cost

6.4.3 Obtaining the Reuse Parameters

6.5 Eyexam: Framework for Evaluating Performance

6.5.1 Simple 1-D Convolution Example

6.5.2 Apply Performance Analysis Framework to 1-D Example

6.6 Tools for Map Space Exploration

PART III Co-Design of DNN Hardware and Algorithms

7 Reducing Precision

7.1 Benefits of Reduced Precision

7.2 Determining the Bit Width

7.2.1 Quantization

7.2.2 Standard Components of the Bit Width

7.3 Mixed Precision: Different Precision for Different Data Types

7.4 Varying Precision: Change Precision for Different Parts of the DNN

7.5 Binary Nets

7.6 Interplay Between Precision and Other Design Choices

7.7 Summary of Design Considerations for Reducing Precision

8 Exploiting Sparsity

8.1 Sources of Sparsity

8.1.1 Activation Sparsity

8.1.2 Weight Sparsity

8.2 Compression

8.2.1 Tensor Terminology

8.2.2 Classification of Tensor Representations

8.2.3 Representation of Payloads

8.2.4 Representation Optimizations

8.2.5 Tensor Representation Notation

8.3 Sparse Dataflow

8.3.1 Exploiting Sparse Weights

8.3.2 Exploiting Sparse Activations

8.3.3 Exploiting Sparse Weights and Activations

8.3.4 Exploiting Sparsity in FC Layers

8.3.5 Summary of Sparse Dataflows

8.4 Summary

9 Designing Efficient DNN Models

9.1 Manual Network Design

9.1.1 Improving Efficiency of CONV Layers

9.1.2 Improving Efficiency of FC Layers

9.1.3 Improving Efficiency of Network Architecture After Training

9.2 Neural Architecture Search

9.2.1 Shrinking the Search Space

9.2.2 Improving the Optimization Algorithm

9.2.3 Accelerating the Performance Evaluation

9.2.4 Example of Neural Architecture Search

9.3 Knowledge Distillation

9.4 Design Considerations for Efficient DNN Models

10 Advanced Technologies

10.1 Processing Near Memory

10.1.1 Embedded High-Density Memories

10.1.2 Stacked Memory (3-D Memory)

10.2 Processing in Memory

10.2.1 Non-Volatile Memories (NVM)

10.2.2 Static Random Access Memories (SRAM)

10.2.3 Dynamic Random Access Memories (DRAM)

10.2.4 Design Challenges

10.3 Processing in Sensor

10.4 Processing in the Optical Domain

11 Conclusion

Bibliography

Authors’ Biographies
