

# Agenda

- Open source
- RISC-V CPU
- RISC-V AI
- Software
- Hardware
- Opportunities










# More people are designing RISC-V than any other architecture

- New designs: RISC-V (20+) > ARM (5) > Intel (2)
- You can change it
- You can do what you want
- This is how innovation happens







# Ecosystem diversity in RISC-V





## Fundamental Technology - RISC-V

#### Open

**PARTNER** owns it: open source, no centralized control or roadblocks, no limiting IP issues

#### Flexible

**PARTNER** can change it: Add/Remove functionality or change architecture to better suit your target

#### Accelerated

**PARTNER** can adapt quickly: Modern architecture with strong industry support + OPEN and FLEXIBLE

#### Future Proof

AI will generate future software and hardware – need an architecture that can change to adapt



Tenstorrent's mission is to bring high-performance RISC-V to general-purpose computing and AI



tenstorrent Confidential

# Ascalon O-o-O Superscalar Processor

# Disruptive high-performance RISC-V processor for AI and server



#### RVA23

- Advanced branch prediction
- Up to 8-wide decode
- 3 LD/ST with large load/store queues
- 6 ALU/2 BR
- 2 256-bit vector units
- 2 FPU units





# Ascalon-D4 Core Configuration



- Compact and power-efficient
- RV64ACDHFMV

| Block    | Feature                   | Size/Width    |
|----------|---------------------------|---------------|
| Frontend | Direction Predictor       | 45 KB         |
|          | Indirect Target Predictor | 26 KB         |
|          | I-Cache                   | 32 KB (8-way) |
|          | Decode Width              | 4             |
| Backend  | Integer/Branch pipes      | 2             |
|          | Integer pipes             | 2             |
|          | LS pipes                  | 2             |
|          | FP pipes                  | 2             |
|          | Vector pipes (256-bit)    | 1             |
|          | ROB                       | 160           |
| LSU      | LDQ Entries               | 24            |
|          | STQ Entries               | 36            |
|          | DTLB                      | 256 (4-way)   |
|          | D-Cache                   | 64 KB (4-way) |
|          | L2 TLB                    | 1K entries    |
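
The ISA string listed for this configuration (RV64ACDHFMV) can be read letter by letter. The sketch below is a minimal illustration, not a Tenstorrent tool: the function name `decode_isa` and the lookup table are assumptions introduced here, and the table covers only the standard single-letter extensions. It does not handle multi-letter extensions (such as `Zicsr`) or version suffixes.

```python
# Standard meanings of the single-letter RISC-V extensions.
EXTENSION_NAMES = {
    "I": "Base integer instructions",
    "M": "Integer multiplication and division",
    "A": "Atomic instructions",
    "F": "Single-precision floating point",
    "D": "Double-precision floating point",
    "C": "Compressed (16-bit) instructions",
    "H": "Hypervisor extension",
    "V": "Vector extension",
}


def decode_isa(isa: str):
    """Split a string such as 'RV64ACDHFMV' into (xlen, [(letter, name), ...]).

    Hypothetical helper for illustration only.
    """
    text = isa.strip().upper()
    if not text.startswith("RV"):
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")
    pos = 2
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    xlen = int(text[2:pos])  # raises ValueError if the register width is missing
    letters = text[pos:]
    return xlen, [(ch, EXTENSION_NAMES.get(ch, "unknown")) for ch in letters]


xlen, extensions = decode_isa("RV64ACDHFMV")
print(f"XLEN = {xlen}")
for letter, name in extensions:
    print(f"  {letter}: {name}")
```

The string is decoded in the order the slide gives it; the helper does not try to reorder the letters into the canonical sequence.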



# Ascalon Clusters



### Cluster Architecture

- Up to 8 cores per cluster
- 230 GB/s CHI coherency bus
- 230 GB/s AXI message passing bus
- 12 MB shared cluster cache
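
As a rough back-of-the-envelope reading of the cluster figures above, the sketch below divides the bus bandwidth and the shared cache evenly across a fully populated cluster. The even split is an assumption made here for illustration: the slide does not say whether the 230 GB/s figure is a per-cluster total, and real sharing depends on the workload.

```python
# Figures taken from the cluster slide.
CORES_PER_CLUSTER = 8          # "up to 8 cores per cluster"
CHI_BANDWIDTH_GB_S = 230.0     # CHI coherency bus
CLUSTER_CACHE_MB = 12.0        # shared cluster cache


def per_core_share(total: float, cores: int = CORES_PER_CLUSTER) -> float:
    """Even split of a cluster-level resource (illustrative assumption)."""
    return total / cores


chi_per_core = per_core_share(CHI_BANDWIDTH_GB_S)
cache_per_core = per_core_share(CLUSTER_CACHE_MB)

print(f"CHI bandwidth per core: {chi_per_core} GB/s")
print(f"Shared cache per core:  {cache_per_core} MB")
```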







### RISC-V Methodology

- Whisper/Spike: Instruction set simulator
- GenUVM: Infrastructure to automatically generate testbench environments
- RIESCUE: Stimulus and workload development framework to create ISA and microarchitecture tests.
- Parameterization: flows in place to seamlessly reconfigure and regenerate RTL and DV code for different views
- RISC-V compliance suite
- Memory consistency model + stimulus suite
- Configurable core/cluster-level testbench in place to check architectural results, compatible with simulation and emulation
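
One common way to check architectural results is to compare what the design retires against an instruction set simulator acting as the reference model. The sketch below shows that idea in its simplest form. It is an illustration under stated assumptions: the `Retire` record and `first_mismatch` function are hypothetical names, and the trace format is invented here. It does not reflect the actual output of Whisper or Spike, nor Tenstorrent's testbench.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Retire:
    """One retired instruction (hypothetical record format)."""
    pc: int
    rd: Optional[int]     # destination register index, None if no writeback
    value: Optional[int]  # value written to rd, None if no writeback


def first_mismatch(reference: Iterable[Retire], dut: Iterable[Retire]) -> Optional[int]:
    """Return the index of the first differing retirement, or None if traces match."""
    ref_list, dut_list = list(reference), list(dut)
    for index, (ref, got) in enumerate(zip(ref_list, dut_list)):
        if ref != got:
            return index
    if len(ref_list) != len(dut_list):
        # One trace ended early: report the first missing entry.
        return min(len(ref_list), len(dut_list))
    return None


ref_trace = [Retire(0x1000, 5, 1), Retire(0x1004, 6, 2), Retire(0x1008, None, None)]
dut_trace = [Retire(0x1000, 5, 1), Retire(0x1004, 6, 3), Retire(0x1008, None, None)]

mismatch = first_mismatch(ref_trace, dut_trace)
print("first mismatch at retirement index:", mismatch)
```

Comparing at retirement keeps the check independent of microarchitectural timing, which is why the same comparison can in principle be reused across simulation and emulation.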





### RISC-V Software Progress

**Upstream projects adding RISC-V support:** 

- 2017: GCC 7.0 includes RISC-V targets
- 2017: Linux 4.15 includes RISC-V architecture
- 2018: U-Boot v2019.03 adds RISC-V support
- 2019: LLVM 9.0 supports RISC-V targets
- 2020: Rust and Go add RISC-V targets
- 2021: KVM support in upstream repo (Type 2 hypervisor)
- 2021: Node.js v17 supports RISC-V
- 2022: Chromium V8 (JavaScript) support merged
- 2023: Tianocore EDK2 adds RISC-V support for qemu-virt
- 2023: Android official port started by Google
- 2023: Xen platform support actively developed







### General purpose CPU and AI work closely together



# RISC-V AI





# Scalable Tensix Element



Grayskull: 120 Tensix cores

- 1 Receive
- 3 Compute
- Licensable IP elements for scalable AI



### Tenstorrent Software - Two Distinct Approaches



## BUDA – Top-down: high-level program to graph

- Fully automated path from all popular ML frameworks to optimized implementation
- High quality results with no manual effort
- Same compiler targets one chip or many thousands of chips





### Map the Graph

- AI computations map to processors
- Optimize data flow and limit use of memory
- Local CPUs manage and participate in the graph
- Opens up new software algorithms











### Wormhole Products (2<sup>nd</sup> Gen device for AI at scale)



#### Galaxy Card

- Modular device with 1.6TB onboard Ethernet
- Natively scalable to an arbitrary number of devices
- High performance at low cost



#### Galaxy Server

- High-density AI servers in 4U enclosures for rack systems
- Comprises 32 devices
- Includes backplane interconnect, active cooling units and SDK
- 12 PFLOPS at 6 kW
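
The Galaxy Server figures above (12 PFLOP, 6 kW, 32 devices) imply an efficiency and a per-device budget that are easy to work out. The sketch below does that arithmetic. It assumes, purely for illustration, that both the compute and the power divide evenly across the 32 devices; in practice the 6 kW also covers cooling and interconnect, so the per-device power is an upper bound rather than a device specification.

```python
# Figures taken from the Galaxy Server slide.
PEAK_FLOPS = 12e15    # 12 PFLOP(S)
POWER_WATTS = 6e3     # 6 kW for the whole server
DEVICES = 32

# System-level efficiency.
flops_per_watt = PEAK_FLOPS / POWER_WATTS

# Even split across devices (illustrative assumption, not a device spec).
flops_per_device = PEAK_FLOPS / DEVICES
watts_per_device = POWER_WATTS / DEVICES

print(f"Efficiency:        {flops_per_watt / 1e12:.1f} TFLOPS/W")
print(f"Compute per device: {flops_per_device / 1e12:.0f} TFLOPS")
print(f"Power per device:   {watts_per_device:.1f} W (upper bound)")
```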




#### BERT-large on Galaxy

- PyBuda in pipeline mode: suitable for LLM models
- PyBuda places 1 encoder per chip on Galaxy; other placements are possible
- Outputs from chip N/layer N flow directly over Ethernet to chip N+1/layer N+1

Figure labels: single BERT-large encoder running on 1 Wormhole chip; 24 BERT-large encoders running on 24 Wormhole chips; 32-chip Galaxy.

Netlist excerpt shown on the slide (truncated in the source): `fwd_0_temporal_epoch_0: matmul, grid_loc: [0, 0], grid_size: [3, 1], inputs: [buffer_0_hidden_state_add_37,` and `t: 1, mblock: [2, 8], ublock: [2, 4], buf_size_mb: 2, math_fidelity: HiFi3, attributes: {bias: true, m_k: 4,`
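
The placement described on this slide, one encoder per chip with each chip forwarding its output over Ethernet to the next, can be sketched in a few lines of plain Python. This is not the PyBuda API: `place_pipeline` and `ethernet_hops` are hypothetical helpers introduced here only to make the layer-to-chip mapping concrete, and the chip numbering is an assumption.

```python
GALAXY_CHIPS = 32          # 32-chip Galaxy
BERT_LARGE_ENCODERS = 24   # 24 BERT-large encoders, one per chip


def place_pipeline(num_layers: int, num_chips: int) -> dict:
    """Map layer N to chip N (the one-encoder-per-chip placement)."""
    if num_layers > num_chips:
        raise ValueError("not enough chips for a one-layer-per-chip placement")
    return {layer: layer for layer in range(num_layers)}


def ethernet_hops(placement: dict) -> list:
    """(source_chip, destination_chip) for each layer-to-next-layer transfer."""
    layers = sorted(placement)
    return [(placement[a], placement[b]) for a, b in zip(layers, layers[1:])]


placement = place_pipeline(BERT_LARGE_ENCODERS, GALAXY_CHIPS)
hops = ethernet_hops(placement)
unassigned_chips = GALAXY_CHIPS - len(set(placement.values()))

print("first hops:", hops[:3])
print("chips without an encoder in this sketch:", unassigned_chips)
```

The slide notes that other placements are possible; swapping the body of `place_pipeline` for a different mapping leaves `ethernet_hops` unchanged.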

# Next Gen Chiplet Solutions

- Organic substrate
- Range of package sizes
- 0.8 mm BGA pitch
- UCIe and BoW D2D PHYs make the organic package work
- Building block approach
- AI, CPU, Memory and IO Chiplets in design





# Same building blocks - edge RISC-V Server






### Next gen chiplet plans – RISC-V CPU and RISC-V AI







RISC-V IP or Chiplet





**Ecosystem Partnerships** 



Market specific solutions



### Tenstorrent Example Vertical: Automotive



Tenstorrent AI and RISC-V IP deliver the compute power that ADAS and IVI require



Chiplet approach reduces cost while accelerating design and production schedules.



Automotive companies can own their own silicon working with Tenstorrent



Power Consumption is critical: Tenstorrent technology scales from MW to mW




## Tenstorrent: Open Business Model

- Tenstorrent works with partners to design, create, modify, and optimize heterogeneous designs
- Key technology provider for a wide spectrum of products for our strategic partners
  - AI
  - CPU







IP





#### Edge Devices






# Summary







Scalable AI



Powered by RISC-V

