

# AI: scale from Edge to Server with RISC-V and Linux

#### JUNE 8TH 2020, 6PM ISRAEL TIME

# **3rd RISC-V Israel Virtual Meetup**

### Florian Wohlrab, RISC-V Ambassador & Sales Manager

June 8, 2020

Driving Innovations<sup>™</sup>

### Agenda

- Who is Andes Technology?
- Where is RISC-V used, sample applications
- V-Series: Vector for everyone
- A closer look at our RISC-V cores







ANDES

### At A Glance



**Taking RISC-V® Mainstream** 




### **Active Roles in RISC-V International**



# RISC-V Market adoption and usage examples




### **RISC-V** Adoption: Applications

[Figure: RISC-V adoption by application, spanning general-purpose, DSP, and performance-driven designs with big vector datapaths]

### **Example of Edge Computing – Vision Processing**




### Andes RISC-V in an Audio Product

- D25F on LE Audio (BLE 5.2) for True Wireless Earbuds and Hearing Aids
  - Customer has already taped out!

This product will be one of the first SoCs supporting both LE Audio and Classic Audio.

Bluetooth Baseband and Audio Subsystem

- LC3 codec: high quality, low power
- LE Audio/BLE 5.2: Multi-Stream, Broadcast Audio




Application CPU

- Power efficient: excellent PPA
- Mature ecosystem: supports various RTOSs
- RISC-V DSP extension: outstanding performance, DSP library

DSP/AI Accelerator

- DSP algorithms for noise and echo cancellation
- Machine learning for keyword spotting







### **Example of Cloud Computing - Datacenter**






# RISC-V Added Value and contributed extensions




### Andes Added Value in RISC-V

### Andes extensions to RISC-V

- Baseline ISA extension: speeds up memory accesses and branches
- CoDense: reduces code size (12% better, measured with GCC)
- PowerBrake: saves power by stalling the pipeline
- StackSafe: hardware stack protection
- vPLIC: vectored dispatch and preemption (reduces latency by 57%)

- Powerful features to differentiate your products
- Create competitive edge for your systems





### **RISC-V DSP Extension (Packed SIMD/DSP)**

- Andes contributed its market-proven DSP (SIMD) instructions as the P-Extension
- Designed to accelerate slow-video, audio/voice, and low-data-rate DSP workloads



Increase the power efficiency of your DSP applications
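To illustrate what "packed SIMD" means in practice, here is a minimal Python sketch that emulates adding four 8-bit lanes held in one 32-bit word, the kind of work a packed-SIMD instruction performs in a single step. The function name `packed_add8` is hypothetical and chosen for illustration; it models the general technique, not a specific P-Extension instruction.

```python
LANE_BITS = 8
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1  # 0xFF


def packed_add8(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes of two 32-bit words, wrapping per lane."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        lane_a = (a >> shift) & LANE_MASK
        lane_b = (b >> shift) & LANE_MASK
        # Wrap inside the lane so a carry never leaks into the neighboring lane.
        result |= ((lane_a + lane_b) & LANE_MASK) << shift
    return result


# Lanes (high to low): 0x01+0x10, 0x02+0x20, 0x03+0x30, 0xFF+0x01
print(hex(packed_add8(0x010203FF, 0x10203001)))  # 0x11223300
```

A scalar core would need four separate byte additions plus masking to get the same result, which is where the efficiency gain for audio and voice workloads comes from.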



### **Andes Custom Extension**




- ACE unlocks RISC-V's potential for domain-specific architectures (DSA)
  - Define ACE instructions to handle time-critical code
  - An alternative approach to co-processors or accelerators

- All-in-one **COPILOT** development environment
  - Automation tool and ease of use
  - Extensions are easy to re-use, can be used as a library







### AndesCore<sup>™</sup> RISC-V Families



# V-Series: Cray-style, scalable vector processor




### Why Andes Vector Processor?

#### Open

Open ISA and ecosystem create the collaborative RISC-V community



#### Extensibility

Andes Vector supports bfloat16 and INT4 data types for AI training and inference, as well as the Andes Custom Extension
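For readers unfamiliar with bfloat16: it keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an IEEE-754 float32. The sketch below converts by simple truncation, which is an assumption made for brevity (hardware commonly rounds to nearest instead). The function names are hypothetical.

```python
import struct


def float_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x by truncating a float32."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


bits = float_to_bfloat16_bits(1.0)
print(hex(bits))                     # 0x3f80
print(bfloat16_bits_to_float(bits))  # 1.0
```

Because the exponent field is the same width as in float32, bfloat16 keeps the full dynamic range while halving storage, which is why it suits AI workloads.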

#### Scalability

Scalable vector registers support implementations from MCU to supercomputer



#### 57x faster performance in parallel computing and realtime processing

#### Efficient

Vector processing reduces instruction issue bandwidth and starts dependent instructions sooner
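A rough way to see the reduction in issue bandwidth is to count how many add instructions are issued for the same array in a scalar loop and in a vector loop. The model below is a simplified sketch: it ignores loads, stores, loop overhead, and register grouping, and the 32-bit element width is an assumption. The vector lengths used are the 128-bit to 512-bit range quoted in this deck.

```python
import math


def scalar_add_instructions(n_elements: int) -> int:
    """One add instruction is issued per element."""
    return n_elements


def vector_add_instructions(n_elements: int, vlen_bits: int, sew_bits: int) -> int:
    """One vector add is issued per strip of VLEN/SEW elements."""
    elements_per_register = vlen_bits // sew_bits
    return math.ceil(n_elements / elements_per_register)


n = 1000
for vlen in (128, 256, 512):
    issued = vector_add_instructions(n, vlen_bits=vlen, sew_bits=32)
    print(f"VLEN={vlen}: {issued} vector adds vs {scalar_add_instructions(n)} scalar adds")
```

With 1000 elements this prints 250, 125, and 63 vector adds against 1000 scalar adds, so the issue logic handles far fewer instructions for the same work.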





#### Visualization

CPU pipeline visualization tool for performance optimization and stall bubble analysis




### NX27V: One Vector for All Implementations

Configurable compute data width (VLEN)




### First RISC-V Vector Engine Shipped in Industry!



- RVV v0.8 support\*
- AI optimized with BFloat16 & INT4
- 1 GHz, 0.3 mm<sup>2</sup> in TSMC 7nm FF+
- Configurable & scalable
  - Vector length: 128 bits to 512 bits
  - Licensee-configurable ALUs
- Low power, simple to use
  - Multi-level clock gating
  - In-order, 1R/W SRAM, cell-based
- More than 50 VPUs in a sub-10 W Open Compute card

# Core Overview: some details

From 2-stage through 5-stage to 8-stage pipelines



### AndesCore<sup>™</sup> 25 Series

- 32-bit and 64-bit cores
- AndeStar V5 architecture:
  - RV-IMAC + Andes V5 Extensions
  - Optional: F/D and S-mode/MMU
- 5-stage pipeline, single-issue
- Configurable multiplier
- Optional branch prediction
- I/D caches and Local Memory
  - Optional parity or ECC protection
  - Hit-under-miss caches
  - HW unaligned load/store accesses
- Bus interface
  - A master port (AHB, AXI, AXIx2)
  - An optional slave port (AHB)



#### 1~4 A25/AX25 CPUs

- RV-IMACFD ISA + V5 extensions
- P-extension draft
- Supporting SMP Linux

#### **Bus Interfaces**

- LM slave port
- Coherence slave port
- AXI bus master interface
  - N:1 synchronous clock ratio

PLIC for interrupt handling

Debug/trace support

#### **Andes Coherence Unit**

- MESI cache coherence protocol
- Duplicate L1 dcache tags
- IO coherence for cacheless masters
- L2 Controller
  - Size: 128KB to 2MB
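As background on the MESI protocol named above, the following sketch models the textbook state transitions of a single cache line as seen by one cache. It is a simplified illustration, not a description of how the Andes Coherence Unit is implemented; the event names and the `next_state` function are hypothetical, and duplicate tags and the L2 controller are not modeled.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state: State, event: str, others_have_copy: bool = False) -> State:
    """Return the next MESI state of one cache line in one cache.

    event is one of: "local_read", "local_write", "remote_read", "remote_write".
    others_have_copy only matters for a local read that misses (state INVALID).
    """
    if event == "local_read":
        if state is State.INVALID:
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state  # M, E, and S all hit without a state change
    if event == "local_write":
        return State.MODIFIED  # other copies, if any, must be invalidated
    if event == "remote_read":
        # M supplies or writes back its data; M and E both drop to S.
        return State.INVALID if state is State.INVALID else State.SHARED
    if event == "remote_write":
        return State.INVALID
    raise ValueError(f"unknown event: {event}")


line = State.INVALID
for ev in ("local_read", "local_write", "remote_read", "remote_write"):
    line = next_state(line, ev)
    print(ev, "->", line.value)
```

Starting from Invalid, the sequence walks through E, M, S, and back to I, touching each of the four states once.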








### AndesCore<sup>™</sup> 27 Series

#### A27 and AX27

- RV\*GC-N-P
- 5-stage single-issue
- Programmable PMA table
- MMU for Linux
- Leveraging the mature 25-series; same performance on Local Memory

#### MemBoost

- Higher memory throughput for Vector
- Performance over 25-series:
  - 200% higher bandwidth
  - 50% lower latency





### AndesCore<sup>™</sup> 45 Series

- 8-stage in-order dual-issue
- AndeStar<sup>™</sup> V5 ISA:
  - RV\*GCN (S/D FPU)
  - RV\*P-ext (DSP/SIMD)
- MMU: for Linux applications
- All have Andes extensions
- Dual-issue most instruction pairs
  - Except for 2 MUL/FPU/DSP/LD_ST and some special dependent ALU instruction pairs
- Late ALUs enable 0-cycle load-use
- MemBoost for memory subsystem
- Low power dynamic branch prediction
- Unaligned data accesses
- Fast or small multiplier

[Figure: 45-Series Pipeline. Stages: F1, F2, ID, II, EX, MM, LX, WB. Units shown: I$/ILM, DEC, ISS, AG, D$/DLM, ALU<sub>0</sub> and ALU<sub>1</sub> (EX), ALU<sub>2</sub> and ALU<sub>3</sub> (LX), Multiplier, DSP, Floating Point Unit. Note: $/LM use the 2nd cycle for alignment.]
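The dual-issue restriction can be sketched as a simple pairing check. The slide does not spell out the exact rule, so the model below rests on one plausible reading, stated here as an assumption: two instructions of the same class among MUL, FPU, DSP, and LD_ST cannot issue together, and every other pair can. The "special dependent ALU pairs" are not modeled, and all names are hypothetical.

```python
# Assumption: each of these classes has a single execution resource, so two
# instructions of the same class cannot issue in the same cycle.
SINGLE_RESOURCE_CLASSES = {"MUL", "FPU", "DSP", "LD_ST"}


def can_dual_issue(first: str, second: str) -> bool:
    """Return True if two adjacent instructions may issue together."""
    return not (first == second and first in SINGLE_RESOURCE_CLASSES)


def issue_cycles(stream: list) -> int:
    """Count issue cycles for an in-order stream with greedy pairing."""
    cycles = 0
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and can_dual_issue(stream[i], stream[i + 1]):
            i += 2  # both instructions issue this cycle
        else:
            i += 1  # only the older instruction issues
        cycles += 1
    return cycles


stream = ["ALU", "ALU", "MUL", "MUL", "LD_ST", "ALU"]
print(issue_cycles(stream))  # 4
```

In this example the two back-to-back MUL instructions cannot pair, so six instructions take four issue cycles instead of the ideal three.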

### **Time-to-Market**

### Get the whole set: IDE, Debug Probes, BSPs and Core IP



### **Complete Development Environment**



- AndeSight<sup>™</sup> Feature-Rich IDE
  - Free evaluation on SID and ICE target
- AndeSoft<sup>™</sup> Software Stack
  - Bare metal demo projects
  - FreeRTOS ver10
  - Linux
- AndeShape<sup>™</sup> Development Boards
  - Full-featured ADP-XC7K
  - Corvette-F1: Amazon FreeRTOS-qualified device
- Debugging Hardware
  - AICE-MINI+, AICE-MICRO





### AndesCore<sup>™</sup> Ecosystem - Tools

- IAR Embedded Workbench<sup>®</sup>
  - Supports RISC-V
  - Supports P-Extension (DSP/Packed-SIMD)
  - Excellent optimization technology
  - Static code analysis
  - Extensive debugging via I-jet probe







[Screenshot: IAR Embedded Workbench options dialog for the TimerInterrupt project. The device list includes Andes A25 AE350 Orca, Andes D25F AE350 Orca, Andes N22 AE250 Corvette-F1, Andes N22 RV32E AE250 Corvette-F1, Andes N25 AE250 Orca, Andes N25 AE250 Corvette-F1, Andes N25 AE350 Orca, and Andes N25F AE250 Orca.]





### Summary

### RISC-V is growing fast → The Future of SoC

- Efficient and extensible architecture for all computing devices
- From number-crunching vector processors to Linux-ready cores
- More flexibility in RISC-V
- Andes is committed to serving emerging RISC-V demands
  - Most mature offerings of RISC-V processor IP
  - Strong development tools and software from Andes and partners
  - V5 cores already in AIoT, FPGA, MCU, Security, Storage, Wireless

## **AndesCores For Your Next SoC Projects!**







www.andestech.com

# **THANK YOU VERY MUCH**






florian@andestech.com
