

## Optimizing Data Transport Architectures in RISC-V SoCs for AI/ML Applications

Michał Siwiński, EVP and CMO



February 27, 2025, RISC-V Day Tokyo 2025



### **Arteris** – Connecting Innovation

- Arteris specializes in semiconductor System IP
  - On-chip communications
  - SoC integration technologies

- From architecture to IP assembly
  - to sub-system and chip integration
  - all with physical awareness



### **Arteris** – Connecting SoCs

### Arteris technology is **pervasive in SoCs**

- 5-20 NoCs (Network-on-Chip) on chip or chiplet
- NoCs represent 10-13% of silicon
- Registers represent 3-20% of silicon

## With **90+ patents**, Arteris addresses critical SoC design challenges

- Integrating AI
- Functional safety
- Power efficiency
- Performance
- Die area optimization



### **Arteris** – Connecting Ecosystem

### Arteris technology – vendor-agnostic

- Connecting across the entire semiconductor ecosystem
- Simplifying & derisking SoC designs























































### **Arteris** – Connecting Technology

- Arteris technology proliferation
  - Over 850 SoC designs
  - Shipped in >3.7 billion SoCs globally
  - Touching every industry













### **Arteris** – Enabling Increasing Numbers of AI/ML Chiplets and SoCs



- Training
- Inference
- Generative AI
- Vehicle endpoints
- Robotics
- Datacenter
- Infrastructure



### The Opportunities and Challenges

### Interconnect is the enabler for high performance & complex SoCs



Hennessy & Patterson predicted a few years ago, 'A new golden age for computer architecture' driven by:

- Open Instruction Sets
- Domain-Specific Architecture
- Agile Chip Development

Over the past 20 years:

Peak compute increased: 60,000x

DRAM bandwidth increased: 100x
Interconnect bandwidth increased: 30x
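To make the gap between these growth rates concrete, the short sketch below divides the slide's three factors by one another. The dictionary keys and the helper name `gap` are illustrative choices, not terms from the presentation.

```python
# Growth factors quoted on the slide (over roughly the past 20 years).
growth = {
    "peak_compute": 60_000,
    "dram_bandwidth": 100,
    "interconnect_bandwidth": 30,
}


def gap(faster: str, slower: str) -> float:
    """Return how many times more one quantity grew than another."""
    return growth[faster] / growth[slower]


compute_vs_dram = gap("peak_compute", "dram_bandwidth")
compute_vs_interconnect = gap("peak_compute", "interconnect_bandwidth")

print(f"Compute outgrew DRAM bandwidth by {compute_vs_dram:.0f}x")
print(f"Compute outgrew interconnect bandwidth by {compute_vs_interconnect:.0f}x")
```

Under these figures, compute grew 600 times more than DRAM bandwidth and 2,000 times more than interconnect bandwidth, which is the imbalance the rest of the deck addresses.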


## Expanding AI Compute Accelerating the Rise in Chip Complexity

More logic and design constraints further complicating on-chip connectivity ("the glue")

#### AI Era of Compute: 7x Acceleration



#### **Driving Exponential Chip Complexity**



Modern SoCs are connected by billions of wires, ever-expanding, with growing complexity

- → More & bigger networks-on-chip (NoCs)
- → Requiring more complex architecture and topology

### Challenges of RISC-V SoCs – Interconnect Is the Problem to Address



## Arteris System IP and Network-on-Chip (NoC) for RISC-V AI SoCs

Networking techniques for improved on-chip communication & data flow



### Mixed Architecture AI SoCs are Commonplace

### DreamChip example of ADAS Level 2+ SoC with 2x CNN accelerator sections

- Custom AI Accelerator 6-24 TOPS
- 2x AI accelerators 10 TOPS
  - **RISC-V** CPU in NPU with 768 Processing Elements (PE)
  - Cadence® Tensilica® AI Max coupled with Vision DSP V6
- Dual-core Arm® Cortex®-A65AE processor cluster
- Arm Cortex-R52 functional safety processor
- 2x DreamChip Image Signal Processors (ISPs)
- ISO 26262 safety certified up to ASIL-D (TÜV SÜD)



Multiple Arteris NoC IP Instances



NoCs: ~15% of SoC

## Why Do Designs Use AMBA CHI and ACE in the Same System?

### Adapt to RISC-V dynamic ecosystem



- RISC-V is a diverse and evolving ecosystem
- Mixed ACE/CHI can ease integration of new and legacy processors
  - Coherent Interfaces: CHI-B, CHI-E or ACE, interoperable
  - Mix the latest high-performance RISC-V clusters using CHI with older RISC-V CPUs using ACE
  - Mix of RISC-V and Arm
  - Leverage investment in ACE IP
- Proxy caches ease integration of noncoherent accelerators into the coherent domain
- Provide PCIe connectivity for storage and data center applications

## **Generative AI and LLM** Data Transport Architecture

Ultra-wide Data to Support the Rapidly Growing Number of Parameters

- A tremendous amount of data has to be moved in Gen AI chips
- Maximizing Throughput to Handle Trillion+ Parameter AI
- Reduce data movement by bringing compute and memory closer together
- Performance Scaling for Multi-TFLOP/GHz
- Support of Arm, RISC-V, and mixed architectures
- Support of Multi-Die / Chiplet architectures



### Efficient and Performant AI/ML Data Transport Architecture

### Optimal solutions combine coherent and non-coherent NoCs

- Coherent NoCs required for data shared with cached CPUs
  - Coherent systems work on 64B coherency granules (512b cache line)
- Extreme bandwidths in AI/ML devices
  - Local memories may reduce traffic to external memory
  - Separate shared and non-shared memory traffic
- Provide a fast and wide path to memory for non-shared traffic
- Combine coherent and I/O-coherent NoCs for optimal performance
  - Coherent hub close to the cached CPUs with narrower buses
  - Wide NoC connects the rest of the SoC, including the AI core array
  - Mesh topology can be appropriate for AI applications

**Ncore 3** coherent interconnect provides the coherent hub

FlexGen / FlexNoC 5 connect the AI core accelerator units
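The relationship between the 64-byte coherency granule and bus width can be illustrated with a little arithmetic. The sketch below is only a back-of-the-envelope model: the 128-bit width is a hypothetical example of a "narrower" coherent-hub bus, while 2048 bits is the widest connection mentioned elsewhere in this deck. The function names are illustrative.

```python
import math

CACHE_LINE_BYTES = 64                   # coherency granule from the slide
CACHE_LINE_BITS = CACHE_LINE_BYTES * 8  # 512 bits


def beats_per_line(bus_width_bits: int) -> int:
    """Bus transfers (beats) needed to move one cache line."""
    return math.ceil(CACHE_LINE_BITS / bus_width_bits)


def lines_per_beat(bus_width_bits: int) -> int:
    """Whole cache lines that fit in a single beat (at least one is counted)."""
    return max(1, bus_width_bits // CACHE_LINE_BITS)


for width in (128, 512, 2048):  # 128 is an assumed narrow-bus width
    print(width, beats_per_line(width), lines_per_beat(width))
```

In this simplified model a narrow bus needs several beats per cache line, whereas a very wide bus could carry several lines' worth of data per beat, which is one way to see why non-shared bulk traffic benefits from a separate wide path.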



AI/ML Accelerator SoC

### High Memory Bandwidth from Interleaving Channels

- Interleaving across up to 8 or 16 channels
- Read-reorder buffers
- Traffic aggregation and data width conversions
- Connections up to 2048 bits wide



INIU: Initiator Network Interface Unit; TNIU: Target Network Interface Unit
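Channel interleaving can be sketched as a simple address-to-channel mapping. The code below is a generic illustration, not the Arteris implementation: the 256-byte interleave granule, the 64-byte access step, and the function names are all assumptions made for the example. Only the channel count of 8 comes from the slide.

```python
from collections import Counter


def channel_for_address(addr: int, num_channels: int = 8,
                        granule_bytes: int = 256) -> int:
    """Map a byte address to a memory channel by rotating every granule."""
    if num_channels <= 0 or granule_bytes <= 0:
        raise ValueError("num_channels and granule_bytes must be positive")
    return (addr // granule_bytes) % num_channels


def channel_histogram(start: int, length: int, step: int = 64,
                      num_channels: int = 8,
                      granule_bytes: int = 256) -> Counter:
    """Count how many accesses of a linear stream land on each channel."""
    return Counter(
        channel_for_address(a, num_channels, granule_bytes)
        for a in range(start, start + length, step)
    )


# A linear 8 KiB stream read in 64-byte accesses (assumed access size).
hist = channel_histogram(0, 8 * 256 * 4)
print(sorted(hist.items()))
```

With this mapping a sequential stream spreads evenly over all channels, so their bandwidths add up. Responses from different channels may return out of order, which is where read-reorder buffers come in.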

### **Intelligent Multicast Write**

Efficient multicast – bandwidth saving

- Broadcast station optimizes use of NoC bandwidth
  - Broadcasts performed as close as possible to the destination
  - Any number of broadcast stations in a FlexNoC
  - Writing to a broadcast station causes it to send posted writes to multiple destinations
- Used in AI for Deep Neural Network (DNN) weight and image map updates
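The bandwidth saving from replicating a write close to its destinations can be estimated by counting link traversals. The model below is a deliberately simple assumption: every destination is reached through the same number of shared hops, followed by a fixed number of private hops after the broadcast station. The hop counts and function names are hypothetical, not figures from the presentation.

```python
def unicast_link_transfers(shared_hops: int, num_destinations: int,
                           last_hops: int = 1) -> int:
    """Each destination gets its own copy over the full path."""
    return num_destinations * (shared_hops + last_hops)


def multicast_link_transfers(shared_hops: int, num_destinations: int,
                             last_hops: int = 1) -> int:
    """One copy crosses the shared path; the station fans it out at the end."""
    return shared_hops + num_destinations * last_hops


# Hypothetical example: 16 destinations behind 6 shared hops.
unicast = unicast_link_transfers(shared_hops=6, num_destinations=16)
multicast = multicast_link_transfers(shared_hops=6, num_destinations=16)
saving = 1 - multicast / unicast

print(unicast, multicast, f"{saving:.0%}")
```

In this toy model the saving grows with both the length of the shared path and the number of destinations, which matches the slide's point that broadcasts should be performed as close as possible to the destination.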



## AI Tiling: >10x Scalable Performance with Mesh Connected Tiles

Meeting AI's massive demand for faster and more powerful computing

- NoC tiling allows AI chips to boost their processing power by more than 10x without changing the basic design
- The effort to implement the NIUs, the most logically intense elements in the NoC, is drastically reduced
- NIUs can be implemented once, then tiled using external tie-offs for IDs

AI Workloads: Vision, ML, DL, NLP including LLMs and Generative AI



Arteris estimations based on NoC tiling customer use cases for AI/ML

### Large Compute Support with CPU Tiling and Mesh Topology

Cache-coherent Ncore IP with flexible and highly scalable support up to 512 CPUs in clusters





### NoC Tiling with Mesh Topology for NPUs, GPUs, TPUs

Flexible, highly scalable tiling supported by mesh, up to 1024 tiles, in FlexNoC IP
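A rough sketch of how tied-off tile IDs could map onto mesh positions follows. It assumes a square 32x32 arrangement of the 1024 tiles, row-major ID assignment, and dimension-ordered (XY) routing. None of these choices are stated in the slides; they are used here only because they are a common textbook baseline.

```python
WIDTH = 32                  # assumed mesh width (32 x 32 = 1024 tiles)
NUM_TILES = WIDTH * WIDTH


def tile_coords(tile_id: int, width: int = WIDTH) -> tuple[int, int]:
    """Convert a tied-off tile ID into (x, y) mesh coordinates, row-major."""
    return tile_id % width, tile_id // width


def xy_hops(src_id: int, dst_id: int, width: int = WIDTH) -> int:
    """Hop count under XY routing: travel along X first, then along Y."""
    sx, sy = tile_coords(src_id, width)
    dx, dy = tile_coords(dst_id, width)
    return abs(sx - dx) + abs(sy - dy)


worst_case = xy_hops(0, NUM_TILES - 1)  # corner to opposite corner
neighbour = xy_hops(0, 1)

print(tile_coords(NUM_TILES - 1), worst_case, neighbour)
```

Because the coordinates are derived purely from the ID, the same tile logic can be reused at every position, which is the idea behind implementing a tile once and differentiating instances through tie-offs.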





## Improving TAT and PPA for Complex AI Chips and Chiplets

Sample FlexGen Smart NoC IP Results for Automotive AI SoC (ADAS)

| Metric                 | FlexNoC (Manual)      | FlexGen (Automated)   | Change |
|------------------------|-----------------------|-----------------------|--------|
| Total Wire Length      | 138,709 mm            | 102,587 mm            | -26%   |
| Length of Longest Wire | 904 mm                | 650 mm                | -28%   |
| Number of Switches     | 258                   | 282                   |        |
| Number of Links        | 313                   | 420                   |        |
| No. of Clock Adapters  | 152                   | 141                   |        |
| No. of Packet Adapters | 157                   | 210                   |        |
| Latency                | 65.18 ns              | 62.08 ns              | -5%    |
| Maximum Latency        | 1005.67 ns            | 491.67 ns             | -51%   |
| Main NoC area          | 3.64 mm<sup>2</sup>   | 3.51 mm<sup>2</sup>   | -3%    |

FlexGen (Automated) compared with FlexNoC (Manual): 10x productivity

TAT: Turn-Around Time; PPA: Power, Performance, Area
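The improvement percentages on this slide can be checked against the raw FlexNoC (manual) and FlexGen (automated) values. The sketch below recomputes them. The dictionary keys and the helper name `percent_change` are illustrative, and the numbers are copied from the slide.

```python
# Values as given on the slide for the manual and automated flows.
manual = {
    "total_wire_length_mm": 138_709,
    "longest_wire_mm": 904,
    "latency_ns": 65.18,
    "max_latency_ns": 1005.67,
    "area_mm2": 3.64,
}
automated = {
    "total_wire_length_mm": 102_587,
    "longest_wire_mm": 650,
    "latency_ns": 62.08,
    "max_latency_ns": 491.67,
    "area_mm2": 3.51,
}


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


changes = {
    key: round(percent_change(manual[key], automated[key]), 1)
    for key in manual
}

for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")
```

The recomputed reductions land close to the slide's rounded figures for wire length, longest wire, latency, and maximum latency. The area reduction works out to roughly three and a half percent, which the slide reports as -3%.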



### **Automotive Domains and Their Complexity**

### Cache coherency is required in safety-critical systems



Chiplets increasingly important in future











## Challenge of Safety-certification for Coherent Systems

Automotive ADAS/autonomous driving is a key application of AI/ML



- The complexity of coherent systems makes safety certification especially challenging
- Ncore 3 safety/resilience capabilities:
  - External ECC or parity
  - Interface ECC or parity
  - Interface duplication
  - Cache/SF ECC or parity
  - Transport link ECC or parity
  - Directory duplication
  - Fault controller/signaling
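As a reminder of what the simplest of these protection schemes does, the sketch below implements a single even-parity bit over a data word. It is a generic textbook illustration rather than a description of how Ncore protects its interfaces or links, and the function names are hypothetical.

```python
def even_parity_bit(word: int) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return bin(word).count("1") % 2


def protect(word: int) -> tuple[int, int]:
    """Attach a parity bit to a data word before it is sent."""
    return word, even_parity_bit(word)


def check(word: int, parity: int) -> bool:
    """Return True when the received word matches its parity bit."""
    return even_parity_bit(word) == parity


data, parity = protect(0b1011_0010)

ok = check(data, parity)                              # clean transfer
single_flip_detected = not check(data ^ 0b0000_0100, parity)
double_flip_detected = not check(data ^ 0b0000_0110, parity)

print(ok, single_flip_detected, double_flip_detected)
```

Parity detects any single-bit flip but misses an even number of flips and cannot correct anything, which is why ECC is listed alongside parity as the alternative where correction or stronger detection is wanted.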



Ncore 3.4 is ISO 26262 ASIL D certified



## RISC-V Ecosystem Collaboration and Interoperability



### Tenstorrent – RISC-V AI High Performance Computing





"We are happy to share that we are partnering with Arteris to use Ncore and FlexNoC IP in our next-generation product. The combination of performance and features made it a great choice for both our AI chips and our high-performance RISC-V CPUs. The Arteris team and IP solved our on-chip network problems so we can focus on building our next-generation AI and RISC-V CPU products."



Jim Keller, CEO of Tenstorrent



We continue to leverage Arteris' network-on-chip IP products in our designs as we drive the next wave of advancements in AI computing. Arteris is a proven technology partner — their FlexNoC IP provides superior performance for our next-generation AI compute.



David Bennett, CCO of Tenstorrent

## Summary: RISC-V for AI/ML Is Here, and Growing in Adoption

- The proliferation of AI/ML hardware is increasing, including with RISC-V compute
  - Strong traction for Edge AI Inference, ADAS, Accelerators, and increasingly for training compute
  - Growing deployment of AI chiplets, driven by modularity, system scaling, and cost reduction
- Rising AI compute expands bandwidth needs, which call for an optimized SoC dataflow
  - Wide buses for massive AI bandwidths
  - Mesh topology for large regular structures that align with physical layout
  - Broadcast writes for bandwidth savings in deep neural networks
- Flexibility, configurability, and smart automation are key to scaling
- Mission-critical applications require certifications such as ISO 26262 up to ASIL D
- Protocol and ecosystem interoperability is key to pragmatic RISC-V use for AI, ...
  - Interoperable support of mix of standards: CHI-E, CHI-B, ACE, ACE-Lite, AXI, UCIe, ...
  - Interoperability testing and silicon proof points













# Thank you

Arteris, Inc. All rights reserved worldwide. Arteris, Arteris IP, the Arteris IP logo, and the other Arteris marks found at https://www.arteris.com/trademarks are trademarks or registered trademarks of Arteris, Inc. or its subsidiaries. All other trademarks are the property of their respective owners.