Multi-Processor System-on-Chip 2 - Liliana Andrade - Page 28

1.5. Implementation


Implementing algorithms on wide vector processors, such as the vDSP that we used, raises a series of considerations. The design solution space grows further when an algorithm contains multiple loops: we must choose, for example, which loop to vectorize and which loop order to use, and both choices notably affect the kernel's overall performance and requirements. These considerations add yet another layer of complexity and deserve a chapter of their own. Here, we cover only the most important elements, in abridged form, to arrive at the next set of HW requirements: How does the 6G candidate waveform kernel under corner workloads map onto SotA vDSPs? How much of the vDSP core cycle budget does it require? Is it practical to run the kernel on the vDSP, provided that the current vDSP load is sufficiently low?
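The effect of choosing which loop to vectorize can be illustrated with a small lane-utilization calculation. This is only a sketch: the vector width of 64 lanes and the function name `lane_utilization` are assumptions made for illustration and are not taken from the text, and the two loop extents simply reuse K = 128 and M = 7 from the low-end case in Table 1.2.

```python
import math


def lane_utilization(trip_count: int, lanes: int) -> float:
    """Fraction of vector lanes doing useful work when a loop with
    `trip_count` iterations is vectorized on a `lanes`-wide unit."""
    # The loop needs this many full-width vector iterations; the last one
    # may be only partially filled.
    vector_iterations = math.ceil(trip_count / lanes)
    return trip_count / (vector_iterations * lanes)


LANES = 64  # hypothetical vector width, NOT a figure from the text

# Two candidate loops of a two-loop kernel (extents from the low-end case).
for name, trip_count in (("loop over K", 128), ("loop over M", 7)):
    print(f"{name}: {lane_utilization(trip_count, LANES):.3f}")
```

Under these assumptions, vectorizing the long loop fills every lane, while vectorizing the short loop leaves most lanes idle. A real mapping also has to weigh memory access patterns and data reuse, which this sketch ignores.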

A simplified block diagram of the vDSP is shown in Figure 1.13 (Damjancevic et al. 2019). Before we dive deeper into the implementation, let us use Table 1.1 to identify the set of GFDM kernel parameters for which we need to profile the vDSP. As mentioned in section 1.4, the subcarriers (RE/(Symbol · BW · layer · TTI)) are placed in a slightly larger IDFT bin, i.e. one rounded up to the nearest power of 2, which in turn sets the value of K, as in K = bin size. We choose the number of symbols per slot, M, to be half of the TTI. Other values of M and K are possible as long as they overlap and fit in the TTI in both dimensions. An overview is presented in Table 1.2. Note that for the high-end use cases, the kernel needs to be executed several times to meet the required data rate. Furthermore, the deadline for all cases is halved, since M = TTI/2 [OFDM symbols].


Figure 1.13. vDSP simplified HW block diagram

Table 1.2. Kernel parameters for corner use cases

| Use case | Throughput | TTI [μs] | K [#] | M [#] | Kernels req. [#] | Deadline [μs] |
|---|---|---|---|---|---|---|
| low-end LTE legacy | 72 | 1000 | 128 | 7 | 1 | 500 |
| high-end FR2 4×CA, µ = 3, 400 MHz | 3,168 | 125 | 4,096 | 7 | 4 | 62.5 |
| MIMO high-end FR2 8×8, 4×CA, µ = 3, 400 MHz | 3,168 | 125 | 4,096 | 7 | 32 | 62.5 |
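The K, M and Deadline columns of Table 1.2 follow from the rules stated above, and can be reproduced with a minimal sketch. The function names are hypothetical, the 14 OFDM symbols per TTI are inferred from M = 7 being half of the TTI, and the 1000 MHz clock is a placeholder rather than a property of the vDSP. The number of required kernel runs is taken directly from the table, not derived.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def kernel_parameters(subcarriers: int, tti_us: float,
                      symbols_per_tti: int = 14) -> dict:
    """Derive K, M and the deadline from the rules in the text.

    symbols_per_tti = 14 is an inferred assumption (M = 7 is half the TTI).
    """
    return {
        "K": next_power_of_two(subcarriers),  # IDFT bin size
        "M": symbols_per_tti // 2,            # half of the TTI, in symbols
        "deadline_us": tti_us / 2,            # deadline is halved as well
    }


def cycles_per_kernel(deadline_us: float, kernels_required: int,
                      clock_mhz: int = 1000) -> float:
    """Cycle budget per kernel run for a hypothetical core clock in MHz."""
    return clock_mhz * deadline_us / kernels_required


# (subcarriers, TTI in microseconds, kernel runs required) from Table 1.2
cases = {
    "low-end": (72, 1000, 1),
    "high-end": (3168, 125, 4),
    "MIMO high-end": (3168, 125, 32),
}
for name, (subcarriers, tti_us, kernels) in cases.items():
    params = kernel_parameters(subcarriers, tti_us)
    budget = cycles_per_kernel(params["deadline_us"], kernels)
    print(name, params, f"cycles per kernel: {budget}")
```

With the placeholder clock, the per-kernel cycle budget shrinks from the low-end case to the MIMO high-end case by more than two orders of magnitude, which is why the corner cases are profiled separately.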
