

### RISC-V Is Everywhere: Why, and How Did It Get There?


Florian Wohlrab, Head of Sales EMEA, Russia & Japan, Andes Technology, 2021/Apr/22

florian@andestech.com

## What Age Are We In

#### **Agile Hardware Development**

Accelerate chip development and commercial adoption


A New Golden Age for **Computer Architecture** 

#### **Improved Security**

Prevent side channel attack for





John Hennessy and David Patterson received the 2017 ACM A.M. Turing Award.

**Open ISA** Industry-standard open ISA

Source: https://www.acm.org/hennessy-patterson-turing-lecture




#### **Domain-Specific Architectures**

Tailored to a specific problem domain and offer significant performance/efficiency gains

#### **Taking RISC-V® Mainstream**

# Why Is RISC-V Everywhere, and How?

- New CPU architectures need time to be adopted, taped out, and reviewed
  - RISC-V has been around for many years and is proven in commercial designs (already mature)
  - Taped out and reviewed by many companies
- Customers have different use cases
  - From low power to very high power
  - From a single core to many cores (from 1 to 1000)
- RISC-V allows for greater flexibility
  - Custom instructions enable true innovation
- An ecosystem is needed (and is now ready)
  - It takes time for an ecosystem to start and for all the different projects to become available
    - The RISC-V ecosystem is now large and well established

# Why: Examples of AndesCore<sup>™</sup> in SoC

### Renesas: ASSP MCU with configurable V5 cores

- Scalable/configurable performance
- Selectable safety features
- Customization options
- Feature-rich AndeSight IDE



IAR

### Telink: IoT and Wireless Audio with D25F embedded

- Strong integer/DSP performance
- Efficient small data processing
- Good development tools

### Picocom: 5G Open RAN small cells



### AI Accelerators for Servers with >10 NX27V Cores

- RV-Vector with 512-bit VLEN/SIMD
- Custom instructions
- LLVM compiler





## **Why: Solutions for Data Path Acceleration**

- RVV extension (Vector)
  - Scalable vector registers
  - For high data rate computations
- Andes Custom Extension<sup>™</sup> (ACE)

- RVP extension (DSP/SIMD)
  - Integer/fixed-point on already existing GPR
  - For audio/voice, small image, slow video
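To make "integer/fixed-point on already existing GPR" concrete, here is a minimal sketch of packed SIMD arithmetic inside one 32-bit general-purpose register. It is written in plain Python, not with any RVP instruction; the function names `packed_add8` and `lanes` are hypothetical and only illustrate the lane-wise idea.

```python
MASK32 = 0xFFFFFFFF


def packed_add8(a, b):
    """Add four unsigned 8-bit lanes packed in two 32-bit words, wrapping per lane."""
    # Add the low 7 bits of every lane; a carry can reach bit 7 of its own
    # lane but can never spill into the neighbouring lane.
    low = (a & 0x7F7F7F7F) + (b & 0x7F7F7F7F)
    # Fold in the top bit of each lane with XOR, which adds without a carry-out.
    return (low ^ ((a ^ b) & 0x80808080)) & MASK32


def lanes(word):
    """Split a 32-bit word into its four 8-bit lanes, least significant first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


if __name__ == "__main__":
    a, b = 0x01FF7F80, 0x01010180
    print([hex(x) for x in lanes(packed_add8(a, b))])
```

One register-wide operation handles four samples at once, which is why this style suits audio, voice, and small-image workloads with narrow data types.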



#### ISS:

- AndeSim near-cycle accurate simulator
- Imperas fast simulator





# **Andes Technology Corporation**

#### Who We Are





## **AndesCore™ RISC-V Processor Lineup**



# A27L2/AX27L2 Overview

- A27/AX27 + L2\$ controller
- AndeStar<sup>™</sup> V5 base for "A" cores
  - RV\*GCN + P
  - MMU support
  - Andes V5 extensions
- 5-stage single-issue cores
- Programmable PMP/PMA
- MemBoost for L1 caches
  - Skip unnecessary writes to dcache
  - Multiple outstanding data accesses
  - I/D cache prefetch





# A27L2/AX27L2: L2\$ Controller

### Features:

- Size up to 2MB with 64B lines
- 16-way, pseudo-random replacement
- 2 tag&data banks with bank interleaving
  - Programmable SRAM latencies (setup & delay)
- Prefetching based on access types (I or D)
- 128-bit AXI master/slave ports through BIU
- Optional ECC error protection
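The size, line, and associativity figures above determine how many sets the cache has and how an address divides into tag, index, and offset. The sketch below works that out for a generic set-associative cache; it assumes a conventional index-after-offset layout and ignores bank selection, neither of which the slide specifies. The function names are hypothetical.

```python
def l2_geometry(size_bytes, line_bytes=64, ways=16):
    """Return the set count and address field widths of a set-associative cache."""
    sets = size_bytes // (line_bytes * ways)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2 of the line size
        "index_bits": sets.bit_length() - 1,         # log2 of the set count
    }


def split_address(addr, size_bytes, line_bytes=64, ways=16):
    """Split a physical address into (tag, set index, byte offset)."""
    g = l2_geometry(size_bytes, line_bytes, ways)
    offset = addr & (line_bytes - 1)
    index = (addr >> g["offset_bits"]) & (g["sets"] - 1)
    tag = addr >> (g["offset_bits"] + g["index_bits"])
    return tag, index, offset


if __name__ == "__main__":
    two_mb = 2 * 1024 * 1024
    print(l2_geometry(two_mb))
    print([hex(f) for f in split_address(0x12345678, two_mb)])
```

With the maximum configuration (2MB, 64B lines, 16 ways) this gives 2048 sets, so 6 offset bits and 11 index bits.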

### Performance with 512KB L2 cache:

Comparing AX27L2 with AX27:

- Memory bandwidth: 2.1x
- Memory latency: 30%
- SPECint2000: 1.9x





## **45-Series: Features**

- AndeStar<sup>™</sup> V5 architecture:
  - Base: RV\*GCN + Andes V5 extensions
  - N45/NX45: base
  - D45: base + P
  - A45/AX45: base + P + MMU
  - **A45MP/AX45MP**: base + P + MMU
- 8-stage in-order dual-issue
  - Independent pairs with 1 or 2 ALU insns
  - Most dependent pairs with 2 ALU insns
  - Late ALU for 0-cycle load-use penalty
- Unaligned data accesses
- Low power dynamic branch prediction
- MemBoost memory subsystem





## **45-Series: Features**

### Virtual memory support:

- MMU and S-mode
- All page sizes and virtual memory mappings (Sv32/Sv39/Sv48)
- Shared TLB: 32-512 entries
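The three mappings differ in how many page-table levels they use and how wide each virtual page number (VPN) field is. As a rough illustration, the sketch below splits a virtual address into its VPN fields and 4KB page offset using the field widths from the RISC-V privileged specification; `MODES`, `va_bits`, and `split_vaddr` are hypothetical names.

```python
# (page-table levels, bits per VPN field) as defined by the RISC-V privileged spec
MODES = {"Sv32": (2, 10), "Sv39": (3, 9), "Sv48": (4, 9)}
PAGE_OFFSET_BITS = 12  # 4KB base pages


def va_bits(mode):
    """Total number of translated virtual address bits for a mode."""
    levels, bits = MODES[mode]
    return levels * bits + PAGE_OFFSET_BITS


def split_vaddr(vaddr, mode):
    """Return ([VPN[0], VPN[1], ...], page offset) for a virtual address."""
    levels, bits = MODES[mode]
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = [
        (vaddr >> (PAGE_OFFSET_BITS + bits * i)) & ((1 << bits) - 1)
        for i in range(levels)
    ]
    return vpn, offset


if __name__ == "__main__":
    va = (5 << 30) | (7 << 21) | (9 << 12) | 0x123
    print(split_vaddr(va, "Sv39"))
```

A TLB hit skips this walk entirely, which is why the shared TLB size matters for workloads with large memory footprints.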

### Physical memory support:

- Up to 16-entry PMP and PMA
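Each PMP entry describes one protected region. A common way to program an entry is the NAPOT (naturally aligned power-of-two) encoding from the RISC-V privileged specification, where the number of trailing one bits in `pmpaddr` selects the region size. The sketch below encodes and decodes that format; the helper names are hypothetical and the implementation-specific granularity setting is ignored.

```python
def encode_napot(base, size):
    """Build a pmpaddr value for a naturally aligned power-of-two region."""
    assert size >= 8 and size & (size - 1) == 0, "size must be a power of two >= 8"
    assert base % size == 0, "base must be aligned to size"
    # pmpaddr holds address bits [..:2]; the low bits become 0 followed by ones.
    return (base >> 2) | ((size >> 3) - 1)


def decode_napot(pmpaddr):
    """Return (base, size) in bytes for a NAPOT-encoded pmpaddr value."""
    assert pmpaddr >= 0
    trailing_ones = 0
    while (pmpaddr >> trailing_ones) & 1:
        trailing_ones += 1
    size = 1 << (trailing_ones + 3)
    # Drop the trailing ones and the terminating zero, then restore byte units.
    base = (pmpaddr >> (trailing_ones + 1)) << (trailing_ones + 3)
    return base, size


if __name__ == "__main__":
    value = encode_napot(0x80000000, 4096)
    print(hex(value), [hex(x) for x in decode_napot(value)])
```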

### L1 I/D Caches:

- Size up to 64KB, 64B lines, up to 4-way
- Cache lock support
- Optional Parity or ECC error protection
- I/D Local Memory (ILM/DLM)
  - 4KB up to 16MB
  - Optional ECC error protection





## A(X)45MP: Cache-Coherent Multicore



### Cache coherence scheme

- Directory-based for scalability
- MESI coherence protocol
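For readers unfamiliar with MESI, the following is a minimal textbook model of how one cache line moves between the Modified, Exclusive, Shared, and Invalid states across several private caches. It is a generic sketch, not the Andes coherence manager: the class `CoherentLine` is hypothetical, and directory lookups, write-backs, and bus traffic are reduced to comments.

```python
class CoherentLine:
    """One cache line tracked across several private caches (textbook MESI)."""

    def __init__(self, n_cores):
        self.state = ["I"] * n_cores  # every cache starts Invalid

    def read(self, core):
        if self.state[core] != "I":
            return  # read hit: M, E and S all satisfy a load locally
        holders = [c for c, s in enumerate(self.state) if s != "I"]
        for c in holders:
            # A Modified owner would write its data back here;
            # every existing holder ends up Shared.
            self.state[c] = "S"
        self.state[core] = "S" if holders else "E"

    def write(self, core):
        for c in range(len(self.state)):
            if c != core:
                self.state[c] = "I"  # invalidate every other copy
        self.state[core] = "M"


if __name__ == "__main__":
    line = CoherentLine(2)
    for op, core in [("read", 0), ("read", 1), ("write", 1), ("read", 0)]:
        getattr(line, op)(core)
        print(op, core, line.state)
```

A directory keeps the equivalent of the `state` list in one place, so a request consults the directory instead of broadcasting to every core.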

### 45MP Coherence Manager

- Supports 1 to 4 A45/AX45 cores
- I/O coherence for cacheless masters
- L2\$ controller (optional)
  - Similar to that of the A\*27L2

### Bus Interfaces

- Memory and MMIO ports
- LM slave ports (one per core)
- Coherence slave port
- PLIC for global interrupt handling
- Debug/trace support
- Linux SMP ready



## **45-Series: Performance**

### Total compute performance (at 28nm):

| CoreMark®/MHz | 45-series<br>(1.2 GHz) | 27-series<br>(1.1 GHz) | Speedup<br>(per MHz) | Speedup<br>(total perf.) |
|---------------|------------------------|------------------------|----------------------|--------------------------|
| RV32          | 5.66                   | 3.58                   | 1.58                 | 1.72                     |
| RV64          | 5.50                   | 3.53                   | 1.56                 | 1.70                     |

- Total performance is about 70% higher than the 27-series
- With less than a 50% increase in logic area and power
- Memory bandwidth (C copy): the 45-series is 35% higher than the 27-series
- Runs at up to 2.4 GHz at 12nm
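The two speedup columns follow directly from the scores and clock frequencies in the table. The short check below reproduces them, assuming the scores are CoreMark per MHz (the per-MHz speedup column only makes sense under that reading); the variable and function names are hypothetical.

```python
# (45-series score, 27-series score), assumed to be CoreMark/MHz
SCORES = {"RV32": (5.66, 3.58), "RV64": (5.50, 3.53)}
FREQ_45_GHZ = 1.2
FREQ_27_GHZ = 1.1


def speedups(score_45, score_27):
    """Return (per-MHz speedup, total speedup) rounded to two decimals."""
    per_mhz = score_45 / score_27
    total = per_mhz * FREQ_45_GHZ / FREQ_27_GHZ  # scale by the clock ratio
    return round(per_mhz, 2), round(total, 2)


if __name__ == "__main__":
    for isa, (s45, s27) in SCORES.items():
        print(isa, speedups(s45, s27))
```

The total speedup of roughly 1.7x is the source of the "70% higher" figure.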



# **NX27V: Overview**

- AndeStar V5 architecture:
  - RV64GCN + Andes V5 extensions
  - Vector ext. (RVV) 1.0: latest spec
- An efficient 5-stage scalar unit
  - Optional branch prediction
  - FP16 instructions
- I/D caches
  - Caches: 8KB to 64KB
  - HW unaligned load/store accesses
  - Optional parity or ECC protection
  - I\$/D\$ prefetch
  - Multiple outstanding data accesses
    - Cached and uncached






# **NX27V: Overview**

- RVV data formats:
  - Standard: int8~int64, fp16~fp64
  - Andes-extended: bfloat16 and int4
- A powerful Vector Processing Unit (VPU):
  - RVV instructions start execution after being retired
  - Multiple functional units
    - Operating in parallel and out of order
    - Chainable, and most are fully pipelined
  - VLEN and SIMD width: 128, 256, or 512 bits
- Independent memory access paths:
  - RVV load/store through dcache and system bus
  - ACE load/store through Streaming Port
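The VLEN setting and the element width together decide how many elements one vector instruction can process. The sketch below computes that limit (VLMAX = LMUL × VLEN / SEW, per the RVV specification) and models the usual strip-mining loop in plain Python. The function names are hypothetical, and the loop assumes the common case where each pass takes `min(remaining, VLMAX)` elements.

```python
def vlmax(vlen_bits, sew_bits, lmul=1):
    """Maximum elements per vector operation: LMUL * VLEN / SEW."""
    return (vlen_bits * lmul) // sew_bits


def strip_mine(n_elements, vlen_bits, sew_bits, lmul=1):
    """Return the vector length used on each pass over n_elements."""
    limit = vlmax(vlen_bits, sew_bits, lmul)
    passes = []
    done = 0
    while done < n_elements:
        vl = min(n_elements - done, limit)  # models the vl chosen per iteration
        passes.append(vl)
        done += vl
    return passes


if __name__ == "__main__":
    for vlen in (128, 256, 512):
        print(vlen, vlmax(vlen, 32), strip_mine(100, vlen, 32))
```

Because the same loop adapts to whatever `vlmax` returns, one binary can run unchanged on 128-, 256-, and 512-bit configurations.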





# **NX27V: ACE Streaming Port**



```
insn svload {
    operand= {out vr data, io addrCtl addr, imm2 mode};
    csr_op= {vl};
    streaming_port= load;
    csim= . . .
};
```

### A usage example

- HW engine: application-specific DMA and structured computations (e.g. CNN)
- ACE instructions: control HW engine, and load/store data to/from VRF

### Advantages:

- HW engine is tightly-coupled
- Data accesses are more efficient (such as address auto-increment and wrap-around)
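To illustrate what "address auto-increment and wrap-around" can mean in practice, here is a small software model of an address generator that steps through a circular buffer. It is an assumption-laden sketch, not the ACE streaming-port behavior: the function `stream_addresses` and all of its parameters are hypothetical.

```python
def stream_addresses(base, length, start, stride, count):
    """Yield `count` addresses inside [base, base + length).

    Each address advances by `stride` bytes and wraps back to `base`
    when it passes the end of the buffer.
    """
    offset = (start - base) % length
    for _ in range(count):
        yield base + offset
        offset = (offset + stride) % length  # auto-increment with wrap-around


if __name__ == "__main__":
    addrs = stream_addresses(base=0x1000, length=0x40, start=0x1030,
                             stride=0x10, count=5)
    print([hex(a) for a in addrs])
```

When a hardware engine performs this bookkeeping, the processor does not spend scalar instructions on pointer updates and bounds checks between vector loads.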



# **Scalable Acceleration Architecture**



- Multi-cluster architecture
- Separate control from acceleration to optimize them independently
- Programming support: OpenCL
  - Popular for heterogeneous multicore architectures with host and devices
  - Supports RVV intrinsic programming in addition to auto-vectorization



# **AndeSentry™ Security Framework**

- An open framework for a wide spectrum of threat mitigations
  - From cyber attacks to physical attacks
  - Flexible, scalable, and trustworthy
  - Solutions from Andes and partners
- Scope:
  - TEE, crypto acceleration, protection against cyber attacks, countermeasures for physical attacks
  - Hardware and software





# **Andes Wrap up**



### Lineup of Andes V5 RISC-V Processors

- From small, power-saving MCUs up to powerful vector units
- From a single core to a multiprocessor with L1 cache/IO coherence and L2 cache

### Andes V5 Product Introduction

- A27L2 and AX27L2
- A45MP and AX45MP
- NX27V vector processor, released with RVV version 1.0
- AndeSentry<sup>™</sup> Security Framework

### Andes Technology: Your Trusted Partner!

- Over 7 billion cores shipped
- 16 years old, public company




# Thank you





Thank you very much

florian@andestech.com