# **Optimizing Efficiency in Extended SIMD RISC-V Based Architectures through Minimization of Idle Computational Cores**

○Masaru Nishimura, Yuxi Tan, Yoshiki Yamaguchi

Graduate School of Science and Technology, University of Tsukuba

2023 RISC-V Days Tokyo @UTokyo, 6/20



# 1. Motivation & Target

- Introduce DIMD (Dual Instruction Multiple-Data), a torus/mesh interconnect, and HBM (High Bandwidth Memory) with a memory hierarchy.
- SIMD processors often suffer from idle time caused by data transfers and by the limited number of instructions that can be issued at the same time. How can we deal with this?

# 2. Architecture Overview

#### **DIMD** (Dual Instruction Multiple-Data)

Two instructions are issued, together with a mask code, to all PEs in the same cycle. Each PE then determines which of the two instructions it should execute.



#### Format of mask code

The mask code consists of two **serial numbers** (one per instruction) and a **numbering scheme**.

| Bits  | [18:11]              | [10:3]               | [2:0]            |
|-------|----------------------|----------------------|------------------|
| Field | inst-1 serial number | inst-0 serial number | numbering scheme |

Numbering schemes:

- 000: row-based
- 001: column-based
- 010: center
- 011: diagonal

[Figure: PE numbers on the array under each scheme. The one fully legible row reads 2 2 2 2 2 2 2 2 (row-based), 0 1 2 3 4 5 6 7 (column-based), 0 1 2 2 2 2 1 0 (center), and 2 1 0 1 2 3 4 5 (diagonal).]
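The four numbering schemes can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the poster's definition: it assumes an 8 x 8 PE array and picks formulas that reproduce the one legible figure row (row 2: `2 2 2 2 2 2 2 2`, `0 1 2 3 4 5 6 7`, `0 1 2 2 2 2 1 0`, `2 1 0 1 2 3 4 5`). The names `pe_number` and `row_numbers` are illustrative only.

```python
N = 8  # assumed array width, inferred from the 0..7 range in the figure


def pe_number(scheme: int, row: int, col: int, n: int = N) -> int:
    """Return the number assigned to PE (row, col) under a numbering scheme.

    The 'center' and 'diagonal' formulas are assumptions chosen to match
    the surviving figure row; the poster does not state them explicitly.
    """
    if scheme == 0b000:  # row-based: every PE in a row shares the row index
        return row
    if scheme == 0b001:  # column-based: every PE in a column shares the column index
        return col
    if scheme == 0b010:  # center: assumed to be the distance to the nearest edge
        return min(row, col, n - 1 - row, n - 1 - col)
    if scheme == 0b011:  # diagonal: assumed to be the distance from the main diagonal
        return abs(col - row)
    raise ValueError(f"numbering scheme {scheme:03b} is not covered by this sketch")


def row_numbers(scheme: int, row: int, n: int = N) -> list[int]:
    """Numbers of all PEs in one row of the array."""
    return [pe_number(scheme, row, col, n) for col in range(n)]


if __name__ == "__main__":
    for scheme in range(4):
        print(f"{scheme:03b}", row_numbers(scheme, row=2))
```

Under these assumptions, PEs that share a number form a row, a column, a ring around the center, or a diagonal band, which is the kind of grouping a per-group instruction choice would need.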

#### **Benefits**

DIMD reduces the idle time caused by the limited number of instructions that SIMD can issue at once. It is effective for stencil computation, FFT, etc.

# 3. Architecture of the Processing Element





#### **Example of Operation**

01100000 00011111 100
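As a minimal sketch, the 19-bit example above can be split into its three fields as follows. It assumes the bit string is written most significant bit first, in the field order inst-1 serial number, inst-0 serial number, numbering scheme. The function name `decode_mask_code` is hypothetical, and how a PE interprets the serial-number fields is left out because the poster text does not spell it out.

```python
def decode_mask_code(code: int) -> tuple[int, int, int]:
    """Split a 19-bit mask code into (inst1_serial, inst0_serial, scheme).

    Assumed layout: bits [18:11] inst-1 serial number,
    bits [10:3] inst-0 serial number, bits [2:0] numbering scheme.
    """
    if not 0 <= code < (1 << 19):
        raise ValueError("mask code must fit in 19 bits")
    scheme = code & 0b111               # bits [2:0]
    inst0_serial = (code >> 3) & 0xFF   # bits [10:3]
    inst1_serial = (code >> 11) & 0xFF  # bits [18:11]
    return inst1_serial, inst0_serial, scheme


# The example bit string, with the spaces between fields removed.
example = int("01100000 00011111 100".replace(" ", ""), 2)

if __name__ == "__main__":
    inst1, inst0, scheme = decode_mask_code(example)
    print(f"inst-1 serial: {inst1:08b}, inst-0 serial: {inst0:08b}, scheme: {scheme:03b}")
```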



#### **Torus/Mesh Interconnect**



#### **HBM and Memory Hierarchy**



**RISC-V** is adopted as the PE's ISA. We have fully designed a processor dedicated to PE operations.









#### "Butterfly" in FFT on SIMD

#### "Butterfly" in FFT on DIMD



[Figure: add (+) and subtract (-) operations assigned to the PEs in a butterfly stage.]
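The difference between the two butterfly figures can be illustrated with a toy utilization model. This is a sketch under simplifying assumptions, not a model of the actual hardware: each PE needs exactly one operation (`add` or `sub`) per butterfly stage, SIMD issues one instruction per cycle, DIMD issues up to two, and data movement and twiddle-factor multiplications are ignored. All function names are hypothetical.

```python
def simd_schedule(ops: list[str]) -> list[list[bool]]:
    """One instruction per cycle: each distinct opcode needs its own cycle.

    Returns, per cycle, a list of flags saying whether each PE is busy.
    """
    return [[op == opcode for op in ops] for opcode in sorted(set(ops))]


def dimd_schedule(ops: list[str]) -> list[list[bool]]:
    """Up to two instructions per cycle: distinct opcodes are issued in pairs."""
    distinct = sorted(set(ops))
    cycles = []
    for i in range(0, len(distinct), 2):
        pair = distinct[i:i + 2]
        cycles.append([op in pair for op in ops])
    return cycles


def utilization(cycles: list[list[bool]]) -> float:
    """Fraction of PE-cycles in which a PE does useful work."""
    busy = sum(sum(cycle) for cycle in cycles)
    return busy / (len(cycles) * len(cycles[0]))


# One butterfly stage on 8 PEs: half compute a + b, the other half a - b.
butterfly_ops = ["add", "sub"] * 4

if __name__ == "__main__":
    simd = simd_schedule(butterfly_ops)
    dimd = dimd_schedule(butterfly_ops)
    print("SIMD:", len(simd), "cycles, utilization", utilization(simd))
    print("DIMD:", len(dimd), "cycles, utilization", utilization(dimd))
```

In this model the SIMD schedule needs two cycles with half of the PEs idle in each, while the DIMD schedule finishes the stage in one cycle with every PE busy.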