



# Syntacore open-source and commercial RISC-V solutions

Alexander Redkin, Executive Director



# Outline



- Company intro
- RISC-V compatible IP
- Customization services



# Syntacore introduction



### Semiconductor IP company, founding member of RISC-V foundation

### Develops and licenses state-of-the-art RISC-V cores

- Immediately available, silicon-proven and shipping in volume
- 5+ years of focused RISC-V development
- Core team has 10+ years of highly relevant background
- SDKs, samples in silicon, full collateral

### Full service to specialize CPU IP for customer needs

- One-stop workload-specific customization for 10x improvements
  - with tools/compiler support
- IP hardening at the required library node
- SoC integration and SW migration support



# Company background

Est. 2015, 70+ EEs

HQ in Cyprus (EU)

- R&D offices in St. Petersburg and Moscow (Russia)
- Representatives in APAC, EMEA, US

Japan: Syncom Co., LTD

### Team background:

- 10+ years in corporate R&D (major semiconductor MNC)
- Cores and SoCs developed by the team are in mass production

### Expertise:

- high-performance and low-power embedded cores and IP
- ASIP technologies and reconfigurable architectures
- Architectural exploration & workload characterization
- Compiler technologies











### Some current results



- State-of-the-art RISC-V CPU IP line with competitive features
  - commercially deployed in SoCs up to 5nm
- Customers in APAC, EMEA, US
  - References available

- MPWs and full-wafer production at clients
  - ✓ SoC volumes in the hundreds of thousands
  - ✓ Project example: 60-core SoC, ~700 mm² @ 7nm



### SCRx baseline cores







[Figure: SCRx baseline cores; axis label: area, power]

# State-of-the-art RISC-V CPU IP



| Features | | SCR1\* (free) | SCR3 | SCR4 | SCR5 | SCR7 |
|---|---|---|---|---|---|---|
| | | RTOS/Bare Metal | RTOS/Bare Metal | RTOS/Bare Metal | Linux/"Full" OS | Linux/"Full" OS |
| Width | 32-bit | • | • | • | • | |
| | 64-bit | | • | • | • | • |
| ISA | | RV32I\|E[MC] | RV[32\|64]IMC[A] | RV[32\|64]IMCF[AD] | RV[32\|64]IMC[AFD] | RV64IMCAFD |
| Pipeline type | | In-order | In-order | In-order | In-order | Superscalar |
| Pipeline, stages | | 2-4 | 3-5 | 3-5 | 7-9 | 10-12 |
| Branch prediction | | | Static BP, RAS | Static BP, RAS | Static BP, BTB, BHT, RAS | Dynamic BP, BTB, BHT, RAS |
| Execution priority levels | | Machine | User, Machine | User, Machine | User, Supervisor, Machine | User, Supervisor, Machine |
| Extensibility/customization | | • | • | • | • | • |
| Execution units | MUL/DIV, area-opt | • | ○ | ○ | | |
| | MUL/DIV, hi-perf | ○ | • | • | • | • |
| | FPU | | | • | • | • |
| Memory subsystem | … [w/ ECC\|parity] | ○ | ○ | ○ | ○ | ○ |
| | L1\$ [w/ ECC\|parity] | | ○ | ○ | • | • |
| | L2\$ [w/ ECC] | | | | ○ | ○ |
| | MPU | | • | • | • | • |
| | MMU, virtual memory | | | | • | • |
| Debug | Integrated JTAG debug | • | • | • | • | • |
| | HW BP | 1-2 | 1-8 adv ctrl | 1-8 adv ctrl | 1-8 adv ctrl | 1-8 adv ctrl |
| | Performance counters | ○ | ○ | ○ | ○ | ○ |
| Interrupt controller | IRQs | 8-32 | 8-1024 | 8-1024 | 8-1024 | 8-1024 |
| | Features | basic | advanced | advanced | advanced+ | advanced+ |
| SMP support | | | up to 4 cores with coherency | up to 4 cores with coherency | up to 4 cores with coherency | up to 8-16 cores |
| I/F options | AHB | • | ○ | ○ | ○ | |
| | AXI4 | ○ | • | • | • | • |
| | ACE | | | | | ○ |

#### Baseline cores:

- Clean-slate designs in SystemVerilog
- Configurable and extensible
- 100% compatible with major EDA flows



### SCR1 overview



Industry-grade compact MCU core for deeply embedded applications and accelerator control

- RV32I|E[MC] ISA
- 2- to 4-stage pipeline
- M-mode only
- Optional configurable IPIC
- Optional integrated Debug Controller
- Choice of optional MUL/DIV units
- Open-sourced under SHL (Apache 2.0 derivative) since 2017
  - Unrestricted commercial use allowed
- High-quality, silicon-proven, free MCU IP
- Among the top SystemVerilog GitHub repositories worldwide
  - https://github.com/syntacore/scr1
- Full collateral: TB and verification suite, SDK, specs, SW
- Best-effort support provided; commercial support offered





### SCR1 overview (cont.)



| Performance\*, per MHz | Compiler flags | Score |
|---|---|---|
| DMIPS | -O2 | 1.28 |
| DMIPS | -best\*\* | 1.89 |
| CoreMark | -best\*\* | 2.95 |

\* Dhrystone 2.1, CoreMark 1.0, GCC 8.1, BM, running from TCM

#### Synthesis data:

Minimal RV32EC config: 11 kGates

Default RV32IMC config: 32 kGates

Range 10..40+ kGates

250+ MHz @ tsmc90lp {typical, 1.0V, +25C}
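As a rough illustration of how the per-MHz figures scale, the sketch below multiplies them by a clock frequency. The 250 MHz value is the lower bound quoted for tsmc90lp; the names `SCR1_PER_MHZ` and `absolute_scores` are illustrative only, and real results depend on the core configuration and memory system.

```python
# Per-MHz scores quoted for SCR1 (Dhrystone 2.1, CoreMark 1.0).
SCR1_PER_MHZ = {
    "DMIPS (-O2)": 1.28,
    "DMIPS (-best)": 1.89,
    "CoreMark (-best)": 2.95,
}


def absolute_scores(per_mhz, freq_mhz):
    """Scale per-MHz benchmark scores linearly by a clock frequency in MHz."""
    return {name: score * freq_mhz for name, score in per_mhz.items()}


if __name__ == "__main__":
    # 250 MHz is the quoted lower bound at tsmc90lp (typical, 1.0V, +25C).
    for name, value in absolute_scores(SCR1_PER_MHZ, 250).items():
        print(f"{name}: {value:.1f} at 250 MHz")
```

Linear scaling is an assumption that holds only while the benchmark runs from tightly coupled memory, as in the quoted measurements.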

### What's new:

- Extensive user guide and quick start collateral
  - works out-of-the-box in all major sims
- Verilator support
- More tests/samples: RISC-V compliance and others
- Taped out at several companies
- Regular talks at ORConf
- Updated and maintained





\*\* -O3 -funroll-loops -fpeel-loops -fgcse-sm -fgcse-las -flto

### SCR1 SDK

https://github.com/syntacore/scr1-sdk

### Repository content:

- `docs`: SDK documentation
- `fpga`: SCR1 SDK FPGA projects
- `images`: precompiled binary files
- `scr1`: SCR1 core source files
- `sw`: sample SW projects

#### Supported platforms:

- Digilent Arty and Nexys 4 (Xilinx)
- Terasic DE10-Lite and Arria V GX starter (Intel)















#### Software:

- Bootloader
- Zephyr OS
- Tests/sample apps
- Pre-built GCC-based toolchain (Win/Linux)

Fully open SDK designs + pre-built images

One of the easiest paths to start with **RISC-V** 





# RV64 SCR7

### Efficient mid-range application core

- RV64GC ISA
- SMP up to 8, later 16 cores
- Flexible uarch template, 10-12 stage pipeline
- Initial SCR7 configuration:
  - Decode and dispatch up to two instructions per cycle
  - Out-of-order issue of up to four micro-ops
  - Out-of-order completion, in-order retirement
- M-, S- and U-modes
- Virtual memory support, full MMU, Linux
- 16-64KB L1, up to 2MB L2 cache with ECC
- 1.5 GHz+ @28nm
- Advanced debug with JTAG i/f
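The ISA strings used throughout this deck (RV32EC, RV32IMC, RV64GC) follow the RISC-V naming convention: a base width followed by single-letter extensions, with G as shorthand for IMAFD. A minimal sketch of decoding such strings follows. The `parse_isa` helper is hypothetical, covers only the single-letter extensions that appear in these slides, and ignores multi-letter extensions such as Zicsr and Zifencei, which recent versions of the specification also fold into G.

```python
def parse_isa(isa):
    """Split a simple RISC-V ISA string into (XLEN, set of extension letters)."""
    s = isa.upper()
    if not s.startswith("RV"):
        raise ValueError(f"not a RISC-V ISA string: {isa!r}")

    # Collect the width digits that follow the "RV" prefix.
    i = 2
    digits = ""
    while i < len(s) and s[i].isdigit():
        digits += s[i]
        i += 1
    if digits not in ("32", "64"):
        raise ValueError(f"unsupported width in {isa!r}")

    extensions = set()
    for letter in s[i:]:
        if letter == "G":
            # G is shorthand for the general-purpose set I, M, A, F, D.
            extensions.update("IMAFD")
        elif letter in "IEMAFDC":
            extensions.add(letter)
        else:
            raise ValueError(f"unhandled extension {letter!r} in {isa!r}")
    return int(digits), extensions


if __name__ == "__main__":
    print(parse_isa("RV64GC"))
    print(parse_isa("RV32EC"))
```

Under this reading, the RV64GC listed for SCR7 and the RV64IMCAFD shown in the feature table name the same set of single-letter extensions.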





| Performance\*, per MHz | Compiler flags | Score |
|---|---|---|
| DMIPS | -O2 | 3.25 |
| DMIPS | -best\*\* | 3.80 |
| CoreMark | -best\*\* | 5.12 |

\* Preliminary data, 2-way implementation; Dhrystone 2.1, CoreMark 1.0, GCC 8.1, BM

\*\* -O3 -funroll-loops -fpeel-loops -fgcse-sm -fgcse-las -flto

# SCR7 SpecInt 2017 in HW







# Fully featured SW development suite



### Stable IDE in production:

- GCC 10.2
- GNU Binutils 2.31.0
- Newlib 3.0
- GNU GDB 8.0.50
- Open On-Chip Debugger 0.10.0
- Eclipse 4.9.0

Hosts: Linux, Windows

Targets: BM, Linux (beta)

#### Also available:

- LLVM 5.0
- CompCert 3.1
- 3rd-party vendors

### Simulators:

- QEMU
- Spike
- 3rd-party vendors



### JTAG-based debug solutions:

Supports: SEGGER J-Link, Olimex ARM-USB-OCD family, Digilent JTAG-HS2; more vendors soon















# Wide support by 3rd party tools and SW vendors





Lauterbach Trace32



https://www.lauterbach.com/frames.html?pro/pro\_\_syntacore.html



Segger Embedded Studio

https://wiki.segger.com/Syntacore\_SCR1\_SDK\_Arty





IAR Embedded Workbench









### SCRx SDK



### Stable Eclipse/gcc based toolchain with IDE:

- GCC 10.2
- GNU Binutils 2.31.0
- Newlib 3.0
- GNU GDB 8.0.50
- Open On-Chip Debugger 0.10.0
- Eclipse 4.9.0

### HW platform based on standard FPGA dev.kits

- Multiple boards supported (Altera, Xilinx)
- Low-cost 3rd-party JTAG tools
- Open design for easy start

#### SW:

- Bootloader
- OS: Zephyr/FreeRTOS/Linux
- Application samples, tests, benchmarks







[Screenshot: serial console (Tera Term, COM4, 115200 baud) running a demo of preemptible and cooperative threads of differing priorities, dynamic mutexes, and thread sleeping]





https://www.altera.com/products/boards\_and\_kits/dev-kits/altera/kit-arriav-starter.html



# Extensibility/customization: how it works











### Extensibility features:

- Computational capabilities
  - New functions using existing HW
  - New functional units
- Extended storage
  - Mems/RF, addressable or state
  - Custom AGU
- I/O ports
- Specialized system behavior
  - Standard events processing
  - Custom events
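New functional units are typically exposed to software as instructions in one of the opcode ranges that the RISC-V specification reserves for custom extensions. As a minimal sketch (not Syntacore's actual encoding, which these slides do not describe), the helper below packs an R-type instruction word in the custom-0 opcode space. The `encode_r_type` name and the funct3/funct7 and register values in the example are hypothetical.

```python
CUSTOM0_OPCODE = 0x0B  # "custom-0" major opcode (0b0001011) in the RISC-V base opcode map


def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode=CUSTOM0_OPCODE):
    """Pack the fields of a 32-bit R-type RISC-V instruction into one word."""
    fields = (
        ("funct7", funct7, 7),
        ("rs2", rs2, 5),
        ("rs1", rs1, 5),
        ("funct3", funct3, 3),
        ("rd", rd, 5),
        ("opcode", opcode, 7),
    )
    # Reject any value that does not fit in its bit field.
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (
        (funct7 << 25)
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | opcode
    )


if __name__ == "__main__":
    # Hypothetical custom op: rd = x10 (a0), rs1 = x11 (a1), rs2 = x12 (a2).
    word = encode_r_type(funct7=0, rs2=12, rs1=11, funct3=0, rd=10)
    print(f"0x{word:08X}")
```

Reusing the standard R-type layout means a custom instruction can read two source registers and write one destination register through the existing register-file ports, which corresponds to the "new functions using existing HW" case.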

### Domain examples:

- Computationally intensive algorithms acceleration
- Specialized processors (including DSP)
- High-throughput applications
  - Wire Speed Processing/DPI/Realtime/Comms







# Custom ISA extension for AES and other crypto kernel acceleration on SCR5

- Data
  - RV32G: FPGA-based devkit, g++ 5.2.0, Linux 4.6, optimized C++ implementation
  - RV32G + custom: same setup, plus intrinsics
  - Core i7 6800K @ 3.4 GHz: g++ 5.4.0, 64-bit Linux, optimized C++ implementation
- 60x to 575x speedup for a modest area increase: 11.7% at the core level, 3.7% at the CPU cluster level



| Platform | Fmax, MHz | Encoding throughput, MB/s: Crypto-1 | Encoding throughput, MB/s: Crypto-2 | Encoding throughput, MB/s: AES-128 | Normalized per MHz, MB/s: Crypto-1 | Normalized per MHz, MB/s: Crypto-2 | Normalized per MHz, MB/s: AES-128 | RV32G + custom speed-up: Crypto-1 | RV32G + custom speed-up: Crypto-2 | RV32G + custom speed-up: AES-128 |
|---|---|---|---|---|---|---|---|---|---|---|
| RV32G | 20 | 0.025 | 0.129 | 0.238 | 0.00125 | 0.00645 | 0.0119 | 575.00 | 117.74 | 60.93 |
| RV32G + custom | 20 | 14.375 | 15.188 | 14.502 | 0.71875 | 0.7594 | 0.7251 | | | |
| Core i7 | 3400 | 79.115 | 235.343 | 335.212 | 0.02327 | 0.06922 | 0.09859 | 30.89 | 10.97 | 7.35 |
| Core i7 + NI | 3400 | | | 3874.552 | | | 1.13957 | | | 0.64 |

Disclaimer: the authors are aware that AES allows more efficient dedicated accelerator designs; it is used here only as an example algorithm
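The speed-up figures can be reproduced from the throughput and Fmax values in the table: each one matches the per-MHz throughput of RV32G + custom divided by the per-MHz throughput of the platform in that row. The sketch below redoes that arithmetic; the helper names `per_mhz` and `custom_speedup` are illustrative, and the data is copied from the table.

```python
# Fmax in MHz and encoding throughput in MB/s, copied from the table.
FMAX_MHZ = {
    "RV32G": 20,
    "RV32G + custom": 20,
    "Core i7": 3400,
    "Core i7 + NI": 3400,
}
THROUGHPUT = {
    "RV32G": {"Crypto-1": 0.025, "Crypto-2": 0.129, "AES-128": 0.238},
    "RV32G + custom": {"Crypto-1": 14.375, "Crypto-2": 15.188, "AES-128": 14.502},
    "Core i7": {"Crypto-1": 79.115, "Crypto-2": 235.343, "AES-128": 335.212},
    "Core i7 + NI": {"AES-128": 3874.552},  # only AES-128 was measured with NI
}


def per_mhz(platform, kernel):
    """Throughput normalized by clock frequency, in MB/s per MHz."""
    return THROUGHPUT[platform][kernel] / FMAX_MHZ[platform]


def custom_speedup(baseline, kernel):
    """Per-MHz speed-up of RV32G + custom over the given baseline platform."""
    return per_mhz("RV32G + custom", kernel) / per_mhz(baseline, kernel)


if __name__ == "__main__":
    for baseline in ("RV32G", "Core i7", "Core i7 + NI"):
        for kernel in THROUGHPUT[baseline]:
            print(f"{baseline:>13} {kernel:>8}: {custom_speedup(baseline, kernel):.2f}x")
```

Normalizing per MHz is what makes a 20 MHz FPGA prototype comparable to a 3.4 GHz desktop part; in absolute MB/s the Core i7 remains faster in every column.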



# Getting access/evaluation



### SCR1

- Is fully open: https://github.com/syntacore/scr1 (core), https://github.com/syntacore/scr1-sdk (SDK)
- SHL-licensed with unrestricted commercial use allowed
  - Commercial SLA-based support is available

### SCR3, SCR4, SCR5, SCR7

Full package\* access is available after a simple evaluation agreement

For more info: evaluation@syntacore.com

(\*) sufficient for simulation and synthesis



# IP collateral (what is included)



#### Standard core package (SCR3)

- RISC-V compatible core
  - RV[32|64]IMC[A] ISA
  - RTL (encrypted for evaluation stage), suitable for simulation and synthesis
  - Netlist for the required FPGA devices (Xilinx/Altera)
- Simulation and verification environment
  - Testbench, Integration verification environment
  - Architectural and compliance test suites (pre- and post-silicon)
- Synthesis support harness
  - sample scripts, SDC/timing constraints for the required flow
- Reference instantiation examples (for AHB and AXI sockets)
- Back-end support @ required process node (PDK access to be provided)
  - Full cycle: synthesis, floor-planning, netlist verification, PaR/CTS/timing closure, DRC, FEV, DFT
- Support for one tapeout, for up to a year, is included

#### Tools (pre-built & sources)

- GCC based toolchain
  - compiler, debugger, linker, functional simulator, binutils, newlib, openocd
- Eclipse-based IDE (Linux, Windows)

#### FPGA-based SDK

- Sample FPGA project (open design)
- Pre-built FPGA and SW images

#### SW:

- First stage bootloader (SC-BL)
- Zephyr OS/FreeRTOS for the SDK board, including BSP
- Application samples for BM env (tests)

#### Documentation

- SCRx user manual (quick-start/integration guide)
- SCRx EAS (External architecture specification)
- SCRx ISM (Instruction set manual)
- SCRx SDK guide
- Integration verification environment guide
- Tools guide (IDE & CLI)



# Summary



- Syntacore offers high-quality RISC-V compatible CPU IP
  - Founding member, fully focused on RISC-V since 2015
  - Silicon-proven and shipping in full-wafer production
  - Turnkey IP customization services
    - with full tools/compiler support

- Local contact in Japan: Syncom Co., LTD
  - Mr. Katsuhiro Katayama, katayama@synkom.co.jp





# info@syntacore.com

Thank you!