

# GreenRio: A Modern RISC-V Microprocessor Completely Designed with An Open-source EDA Flow

Yifei Zhu, Guohua Yin, Xinze Wang, Qiaowen Yang,

Zhengxuan Luan, Yihai Zhang, Mingzi Wang, Peichen Guo,

Xinlai Wan, Shenwei Hu, Dongyu Zhang, Yucheng Wang,

Wei Chen, Lei Ren, Zhangxi Tan

**RISC-V International open-source Laboratory, Tsinghua-Berkeley Shenzhen Institute (RIOS Lab, TBSI)** 





# Outline

- Background
- GreenRio: Front-end Design
- GreenRio: Back-end Design
- Comparison between Proprietary and Open EDA Tools
- Outlook on Future Open-source RISC-V Design

# Background



#### Increasing complexity of IC design in advanced processing technologies



cost, difficulties, risks, policies, ...

### **Expectations**

1. Bridging the knowledge gap between classroom and industry

2. Promote the RISC-V and OpenEDA ecosystems

Threshold for Open CPU design

→ Agile development in fully open source mode

Open-source EDA tools are evolving, but the community lacks reference designs.

**RISC-V chip development** and **OpenEDA iteration**

# Background

#### • Comparison of some typical cores hardened by Open EDA

| Design            | GreenRio                  | EH1      | biriscv  | picorv32a    | ibex         |
|-------------------|---------------------------|----------|----------|--------------|--------------|
| ISA               | RV64                      | RV32     | RV32     | RV32         | RV32         |
| pipeline stage    | 7                         | 9        | 6 or 7   | 6            | 2 or 3       |
| issue width       | dual                      | dual     | dual     | single       | single       |
| execution feature | <mark>out-of-order</mark> | in-order | in-order | in-order     | in-order     |
| gate count (K)    | <mark>53-120*</mark>      | 100      | 67       | 17           | 20           |
| Efabless tape-out | <mark>√</mark>            | ×        | ×        | √            | √            |

#### Motivations

Can the open toolchains be used to develop a modern RISC-V core?

#### Limitations

1. Gate counts, area

2. Features of modern processors

#### Taped out in the Chipignite and OpenMPW programs

*Figure: Efabless submission pages for the Cpu_Camp and hehecore projects (Sky130A and Sky130B), with MPW precheck and tapeout marked complete, and the shuttle project list in which hehecore appears as a digital design that passed precheck and tapeout.*



# Overview of Our work

whole signoff flow





# Frontend Flow Specification, Simulation, and Verification

### Choose a Development Language: System Verilog versus Verilog

• The translation pass rate of open-source tools for some **SV-based** cores

| Test cores*      | Surelog*-UHDM* (preprocessor) | sv2v* (preprocessor) | Yosys (open synthesis tool) | Verilator (open verification tool) |
|------------------|-------------------------------|----------------------|-----------------------------|------------------------------------|
| ariane           | 0/1                           | 0/1                  | 0/1                         | 1/1                                |
| black-parrot     | 0/7                           | 0/7                  | 0/7                         | 4/7                                |
| earlgrey         | 0/1                           | 0/1                  | 0/1                         | 0/1                                |
| fx68k            | 0/1                           | 1/1                  | 0/1                         | 1/1                                |
| ibex             | 1/1                           | 1/1                  | 0/1                         | 1/1                                |
| rsd              | 0/1                           | 1/1                  | 0/1                         | 1/1                                |
| scr1             | 0/1                           | 1/1                  | 0/1                         | 0/1                                |
| swerv            | 0/1                           | 1/1                  | 0/1                         | 1/1                                |
| TNoC             | 0/1                           | 0/1                  | 0/1                         | 0/1                                |
| picorio1.0       | 0/1                           | 0/1                  | 0/1                         | 0/1                                |
| total tests pass | 1/16                          | 5/16                 | 0/16                        | 9/16                               |

- Formal verification tools like "Conformal"
  - open ecological chain of chip design.

From https://chipsalliance.github.io/sv-tests-results/

- \*test cores: some open cores which are written in SystemVerilog
- \*Surelog: a frontend that can convert SystemVerilog to UHDM files
- \*UHDM (Universal Hardware Data Model): a tool that can convert UHDM files to Verilog
- \*sv2v: a tool that can convert SystemVerilog to Verilog
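The totals row can be re-derived from the per-core rows. The short tally below is only a sketch of that arithmetic: the pass counts are transcribed from the table above, and the dictionary layout and function name are illustrative choices, not part of the sv-tests tooling.

```python
# Pass counts transcribed from the table above.
# core -> (number of tests, Surelog-UHDM, sv2v, Yosys, Verilator)
RESULTS = {
    "ariane":       (1, 0, 0, 0, 1),
    "black-parrot": (7, 0, 0, 0, 4),
    "earlgrey":     (1, 0, 0, 0, 0),
    "fx68k":        (1, 0, 1, 0, 1),
    "ibex":         (1, 1, 1, 0, 1),
    "rsd":          (1, 0, 1, 0, 1),
    "scr1":         (1, 0, 1, 0, 0),
    "swerv":        (1, 0, 1, 0, 1),
    "TNoC":         (1, 0, 0, 0, 0),
    "picorio1.0":   (1, 0, 0, 0, 0),
}
TOOLS = ("Surelog-UHDM", "sv2v", "Yosys", "Verilator")


def totals(results):
    """Return (total number of tests, {tool: number of passing tests})."""
    total_tests = sum(row[0] for row in results.values())
    passed = {
        tool: sum(row[i + 1] for row in results.values())
        for i, tool in enumerate(TOOLS)
    }
    return total_tests, passed


total_tests, passed = totals(RESULTS)
for tool in TOOLS:
    rate = passed[tool] / total_tests
    print(f"{tool:13s} {passed[tool]}/{total_tests} ({rate:.0%})")
```

The tally makes the gap visible at a glance: the open synthesis tool accepts none of the SystemVerilog cores directly, while the open simulator handles more than half of them.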


# GreenRio1.0



feature list

1. Dynamic branch prediction
2. Out-of-Order Execution
3. RISC-V I Extension, M mode
4. Register Renaming
5. Dual Issue
6. Nonblocking Dcache



# SOC of GreenRio



# Design Implementation and Verification

- Syntax checking and RTL simulation: Verilator & Lint
- Verification: open-source benchmarks for RISC-V architectures
- Co-simulation: RISC-V Sail model





#### https://gitlab.com/picorio/software/talon-opensource


# Co-simulation with RISC-V Sail Model

### What & Why Sail?

- Accurate transaction level abstraction
- Domain-Specific Language designed for expressing the ISA semantics
- RISC-V Sail: golden reference for RISC-V architecture
  - Reduces the possibility of human error
- Friendly to ISA extension
- Output clear information and registers' status
- <u>Drawback:</u> Slow C model compilation, but acceptable





# Co-simulation with RISC-V Sail Model

Verify the frontend and backend separately by comparing the results generated by the RTL with those generated by Sail

• Frontend co-sim

• Backend co-sim
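The comparison step can be sketched as a lock-step check of two commit traces. This is a minimal illustration, not the project's co-simulation harness: the `Commit` record layout and the `first_mismatch` helper are hypothetical, and the traces are toy data rather than real RTL or Sail output.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Commit:
    """One retired instruction (hypothetical record layout)."""
    pc: int     # program counter of the retired instruction
    rd: int     # destination register index, 0 if none is written
    value: int  # value written to rd


def first_mismatch(
    rtl: Iterable[Commit], ref: Iterable[Commit]
) -> Optional[Tuple[int, Optional[Commit], Optional[Commit]]]:
    """Return (index, rtl_commit, ref_commit) at the first divergence, else None.

    zip_longest pads the shorter trace with None, so a trace that ends
    early is also reported as a mismatch.
    """
    for index, (got, want) in enumerate(zip_longest(rtl, ref)):
        if got != want:
            return index, got, want
    return None


# Toy traces: the RTL writes a wrong value at the second instruction.
ref_trace = [
    Commit(0x80000000, 1, 5),
    Commit(0x80000004, 2, 7),
    Commit(0x80000008, 3, 12),
]
rtl_trace = [
    Commit(0x80000000, 1, 5),
    Commit(0x80000004, 2, 8),
    Commit(0x80000008, 3, 12),
]

mismatch = first_mismatch(rtl_trace, ref_trace)
if mismatch is not None:
    index, got, want = mismatch
    print(f"diverged at commit {index}: RTL {got} vs reference {want}")
```

Stopping at the first divergence keeps the report close to the root cause, since every later commit may differ only as a consequence of the first wrong one.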





# Backend Flow Experience, Analysis and Feedback

Technology dependent




# **Open Silicon Implementation Flow**

- Open-source backend EDA tools
  - OpenLane: an automated flow performing full ASIC implementation
- Open-source PDK
  - Skywater 130nm (Sky130A & Sky130B)
- Open-source external IP
  - SRAM blocks compiled by OpenRAM
- Open-source silicon production
  - Open Multi Project Wafer (OpenMPW)
  - Efabless Chipignite

### **Experience: A Journey of Discovery**

- First tape-out project
- OpenLane and open PDK
- A series of trials and tribulations
- 5+ solutions tried within 10 days





# 1<sup>st</sup> Attempt: Flatten Everything

- Let OpenLane do everything!
  - Fixed area (2920 × 3520 µm)
  - 8 SRAM blocks, GreenRio core, and SoC
  - Automatic floorplan, placement, and routing ....

# Result

- Placement step failed
  - Detailed placement failed

- Floorplan
  - 8 SRAM macros from OpenRAM
- Needs manual participation



# 2<sup>nd</sup> Attempt: Manual Macro Placement

- Manual Floorplan
  - Specify the positions of SRAMs
  - EDA handles everything else
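Specifying SRAM positions by hand can be scripted. The sketch below writes one line per macro in the `instance x y orientation` form used by OpenLane macro placement files. The instance names, the 480 x 400 macro size, the margin, the gap, and the 4-column grid are hypothetical placeholders, not values from the GreenRio floorplan, and all coordinates are assumed to be in microns.

```python
def grid_placements(names, macro_w, macro_h, die_w, die_h, cols, margin, gap):
    """Place macros row by row on a regular grid and return placement lines.

    Raises ValueError if any macro would cross the die boundary (including
    the margin), so an infeasible grid is caught before the flow is run.
    """
    lines = []
    for i, name in enumerate(names):
        row, col = divmod(i, cols)
        x = margin + col * (macro_w + gap)
        y = margin + row * (macro_h + gap)
        if x + macro_w + margin > die_w or y + macro_h + margin > die_h:
            raise ValueError(f"{name} does not fit inside the die area")
        # "N" keeps the macro in its default (north) orientation.
        lines.append(f"{name} {x} {y} N")
    return lines


# Hypothetical instance names for the 8 SRAM blocks.
sram_names = [f"u_sram_{i}" for i in range(8)]

placement = grid_placements(
    sram_names,
    macro_w=480, macro_h=400,   # placeholder SRAM footprint
    die_w=2920, die_h=3520,     # fixed user area, assumed to be in microns
    cols=4, margin=100, gap=150,
)
print("\n".join(placement))
```

Generating the file from a few parameters makes it cheap to try several floorplans, which matters when each routing run takes hours.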

# Result

- Routing time is too long:
  - More than 10 hours
- SIGINT signal sent



- Routing algorithm efficiency?
  - convergence is very time-consuming
- Auto placement optimizations?
  - Difficulties may be caused by inappropriate placement



# 3<sup>rd</sup> Attempt: Learning from Riscduino

- Harden hierarchical design
  - Take riscduino as a reference
    - SoC based on a single 32-bit RISC-V core
  - Divide into 13 submodules
  - Harden submodules
  - Harden the full chip with macros

### Result

- Flow finished
  - Completed within 5 hours
- Millions of DRC errors
  - Errors found by Magic & KLayout





- SRAM DRC errors
  - Can be waived (per official guidance)
- Routing space too small?
  - Only ~3 mm<sup>2</sup> blank space
  - More than 2000 wires between macros



# 4<sup>th</sup> Attempt: Improvement of the 3<sup>rd</sup> One

- Adjustments based on Attempt 3
  - Reduce the number of submodules: integrate the controllers with SRAMs
  - Waive SRAMs for DRC check

#### Result

- DRC errors
  - Errors found by Magic & KLayout
- Re-floorplan and iterative testing
  - Number of DRC errors changed
  - Fewer errors (<100)

### Discussion

- routing space still limited?
  - Still many macros: 11
  - Still many macro-macro wires
    - More than 1500 wires

adjust the number of submodules, find a successful strategy



# Final Attempt: Learning from Failures

- Flatten and hierarchy
  - Flatten all modules except SRAMs
  - Integration: flattened modules and SRAMs
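The flatten-plus-macros strategy can be pictured as a single flow configuration in which only the SRAMs are treated as pre-hardened blocks. The sketch below builds such a configuration as a Python dictionary and prints it as JSON. The key names follow OpenLane configuration variables as commonly used in its JSON configs, but they should be checked against the OpenLane version in use. The design name, file paths, and macro name are hypothetical, and the die area assumes the fixed user area is given in microns.

```python
import json


def make_config(design_name, sram_macro, die_w_um, die_h_um):
    """Build an OpenLane-style config: flat logic plus pre-hardened SRAM macros."""
    return {
        "DESIGN_NAME": design_name,
        # All RTL except the SRAMs is synthesized and placed as one flat block.
        "VERILOG_FILES": "dir::src/*.v",
        # The SRAM is only declared as a black box; its layout comes from OpenRAM.
        "VERILOG_FILES_BLACKBOX": f"dir::macros/{sram_macro}.v",
        "EXTRA_LEFS": f"dir::macros/{sram_macro}.lef",
        "EXTRA_GDS_FILES": f"dir::macros/{sram_macro}.gds",
        # Manually chosen SRAM positions, one macro per line.
        "MACRO_PLACEMENT_CFG": "dir::macro.cfg",
        # Fixed die size, written as "x0 y0 x1 y1".
        "FP_SIZING": "absolute",
        "DIE_AREA": f"0 0 {die_w_um} {die_h_um}",
    }


# Hypothetical names; 2920 x 3520 is the fixed area from the first attempt.
config = make_config("greenrio_soc", "sram_macro", 2920, 3520)
print(json.dumps(config, indent=4))
```

Keeping the SRAMs as the only macros leaves the tool free to place and route all remaining logic together, which avoids the macro-to-macro wiring congestion seen in the hierarchical attempts.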



# Result

- Flow finished in 4.3 hours
- No DRC or LVS errors found
- Passed precheck & tape-out check

- More effective routing algorithm
- GreenRio2: backend optimization exploration



# Analysis: Feedback on OpenEDA Tools

- What needs to be done manually
  - Proper macro partition
  - Reasonable Floorplan
- Expectations and reflections on OpenLane
  - Automatic floorplan for complex designs
  - Smarter placement and routing algorithms
  - Trial and error is annoying
- OpenLane is based on several components
  - OpenROAD, Yosys, Magic, Netgen, CVC, SPEF-Extractor, KLayout





# Comparison between Proprietary and Open EDA Tools

# Comparison: Open vs Proprietary

**Tutorial and document** 


User Experience

|                                        | OpenLane          | Proprietary EDA   | Gap*<sup>1</sup>  |
|----------------------------------------|-------------------|-------------------|-------------------|
| synthesis run time                     | 6m12s             | 4m4s              | 1.5X              |
| gate count after synthesis             | 53 K              | 33 K              | 1.6X              |
| placement & routing time*<sup>2</sup>  | 1h58m             | 43m               | 2.7X              |
| die area (mm²)                         | <mark>2.02</mark> | <mark>1.24</mark> | <mark>1.6X</mark> |
| placement density                      | 32%               | 45%               | 1.4X              |
| dynamic power                          | 48.6mW            | 23.1mW            | 2.1X              |
| best clock frequency*<sup>3</sup>      | 80MHz             | 110MHz            | 1.4X              |

- Limitations of open-source EDA tools
| 1. System Verilog syntax support                         |                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                | <b>新学生</b>         |
| 2. PnR with high density                                 |                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                |                    |
| Functional Enhancement 3. Logic equivalence check (LEC)  |                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                |                    |
| 4. Antenna violation fixing                              |                   | A REAL PROPERTY OF A REAL PROPERTY OF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
                                                                                                                                | 周初日本中              |
| 5. PPA optimization                                      |                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                | Martin and Antonio |
| 6. Multi-thread acceleration                             |                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                |                    |
| 7. Early check mechanism                                 |                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                |                    |
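
As a quick check on the 1.4X entry in the clock row of the comparison table, assuming it is simply the ratio of the two listed frequencies (the table labels the row as a clock period, but the values are given in MHz):

$$
\frac{110\ \text{MHz}}{80\ \text{MHz}} = 1.375 \approx 1.4,
\qquad
T_{80} = \frac{1}{80\ \text{MHz}} = 12.5\ \text{ns},
\qquad
T_{110} = \frac{1}{110\ \text{MHz}} \approx 9.09\ \text{ns}
$$

Expressed as periods, the same gap is roughly 12.5 ns versus 9.1 ns.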

(a) Layout generated by OpenLane

(b) Layout generated by Commercial EDA



# Comparison: Open vs Proprietary

#### • What is missing?

- Complete support for SystemVerilog (SV)
  - Extensive IP usage
- Available LEC
  - The current LEC function behaves abnormally
- Efficient PnR algorithms
  - Higher density
- Parameter searching
  - Early termination and endless processing
  - Parameter searching and indications
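
The parameter-searching item can be pictured with a small sweep that gives every run an iteration budget, so a run that would otherwise process endlessly is cut off and reported instead. This is only a minimal sketch: `mock_flow`, its toy cost model, and the parameter names `clock_period_ns` and `target_density` are hypothetical stand-ins, not OpenLane calls or variables.

```python
from itertools import product


def mock_flow(clock_period_ns, target_density, max_iterations=50):
    """Stand-in for one place-and-route run (NOT a real EDA tool call).

    Toy cost model (an assumption made so the example runs): a denser
    placement and a shorter clock period both need more iterations.
    """
    needed = int(1000 * target_density / clock_period_ns)
    if needed > max_iterations:
        # Bound the run instead of letting it iterate without end.
        return "terminated_early", max_iterations
    return "ok", needed


def sweep(clock_periods_ns, densities, max_iterations=50):
    """Try every parameter combination and record how each run ended."""
    results = []
    for period, density in product(clock_periods_ns, densities):
        status, used = mock_flow(period, density, max_iterations)
        results.append(
            {"period_ns": period, "density": density,
             "status": status, "iterations": used}
        )
    return results


def best_passing(results):
    """Pick the passing run with the shortest period, then highest density."""
    passing = [r for r in results if r["status"] == "ok"]
    if not passing:
        return None
    return min(passing, key=lambda r: (r["period_ns"], -r["density"]))


if __name__ == "__main__":
    runs = sweep([9.0, 12.5], [0.4, 0.6])
    for run in runs:
        print(run)
    print("best:", best_passing(runs))
```

Recording the status of every run, including the ones cut off by the budget, is what provides the "indications" mentioned above: the designer sees which corner of the parameter space failed rather than only the winning point.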

### Accelerate Silicon Research

- Speed up with multi-threading
- PPA optimization



# Conclusion: Outlook on Future Open-source RISC-V Design



Through this RISC-V development experience

- Great usability of open-source EDA tools
   High-speed simulation, RISC-V Sail in verification, automation of OpenLane, ...
- Several improvements to facilitate iteration
   Test generation, design exploration, PPA optimization, ...

# Conclusion: Outlook on Future Open-source RISC-V Design

#### • The RISC-V ISA has sparked a boom in the open-source hardware field



# "Open Chip"

- (1) Instruction set
- (2) Micro-architecture implementation
- (3) Design flow with open tools (OpenEDA)

#### The development of RISC-V cores is closely tied to OpenEDA tools



# Conclusion: Outlook on Future Open-source RISC-V Design

A full-stack open-source design methodology for modern processors

#### Prove the feasibility of designing high-performance chips in open-source mode



- Reduce the labor and IP cost of open chips
- Accelerate customized chip development
- Form an open-source chip ecosystem
- Promote the reform of the warehouse-scale computer (WSC) industry

In the future...

Bring a new perspective to future RISC-V architecture development

# Thank you RIOS