Читать книгу Multi-Processor System-on-Chip 1 - Liliana Andrade - Страница 34
2.3.2. Compute cluster
ОглавлениеThe compute unit of the MPPA3 processor, called the compute cluster, is structured around a local interconnect (Figure 2.9); it comprises a secure zone and a non-secure zone. The secure zone contains a security and safety management core (RM), a 256 KB dedicated memory bank and a cryptographic accelerator. The RM core of each compute cluster is also connected to the processor CSM and the internal boot ROM
through a private bus. The purpose of the secure zone is to host a trusted execution environment (TEE), and a run-time system that performs on-demand code decryption for the applications that require it (Table 2.1). The non-secure zone contains 16 application cores (PE) that share a 16-banked local memory (SMEM), a DMA engine, a debug support unit (DSU), a cryptographic accelerator and cluster-local peripheral control registers. The SMEM can be configured to split its 4 MB capacity between scratch-pad memory (SPM) and level-2 cache (L2$). The non-secure zone of the MPPA3 compute cluster supports two types of execution environments:
– A symmetric multi-processing (SMP) environment, exposed through the standard POSIX multi-threading (supporting the OpenMP run-time of C/C++ compilers) and file system APIs. In this environment targeting high-performance computing under soft real-time constraints, all the core L1 data caches are kept coherent, and the local memory is interleaved across the banks at 64-byte granularity for the SPM and at 256-byte granularity for the L2$.
– An asymmetric multi-processing (AMP) environment, seen as a collection of 16 cores where each executes under an RTOS and is associated through linker maps with one particular bank (256KB) of the local memory. In this environment targeting high-integrity computing under hard real-time constraints, L1 cache coherence is disabled, and the local memory is configured as scratch-pad memory only.
By default, the L2 caches of the compute clusters are not kept coherent, although this can be enabled by running cache controller firmware in the RM cores and maintaining a distributed directory in the cluster SPMs. The third level of the memory hierarchy is composed of the external DDR memory (2x DDR4/LPDDR4 64-bit channels), and of the SPM of other compute clusters. The DDR memory channels can be interleaved or separated in machine address space, in the latter case operating independently. The standard C11 atomic operations are available in all memory spaces.
Figure 2.9. Local interconnects of the MPPA3 processor