



SCRx family of the RISC-V compatible processor IP: compact MCU to octa-core SMP Linux

Alexander Redkin, RISC-V Day Tokyo, November 2020

## Outline



- Company intro
- RISC-V compatible IP
- Customization services







### Semiconductor IP company, founding member of the RISC-V Foundation

### Develops and licenses state-of-the-art RISC-V cores

- Immediately available, silicon-proven and shipping in volume
- 5+ years of *focused* RISC-V development
- Core team brings 10+ years of highly relevant background
- SDKs, samples in silicon, full collateral

### Full service to specialize CPU IP for customer needs

- One-stop workload-specific customization for 10x improvements
  - with tools/compiler support
- IP hardening at the required library node
- SoC integration and SW migration support



## Company background

Est. 2015, 60+ EEs

HQ in Cyprus (EU)

- R&D offices in St. Petersburg and Moscow (Russia)
- Representatives in APAC, EMEA, US
- Japan: Syncom Co.,LTD

#### Team background:

- 10+ years in corporate R&D (major semi MNC)
- Developed cores and SoCs are in mass production

#### Expertise:

- high-performance and low-power embedded cores and IP
- ASIP technologies and reconfigurable architectures
- Architectural exploration & workload characterization
- Compiler technologies











### SCRx baseline cores



**RV64IMCFDA** 





## State-of-the-art RISC-V CPU IP



| Features | | SCR1 (free) | SCR3 | SCR4 | SCR5 | SCR7 |
|---|---|---|---|---|---|---|
| Target OS | | RTOS / bare metal | RTOS / bare metal | RTOS / bare metal | Linux / "full" OS | Linux / "full" OS |
| Width | 32-bit | • | • | • | • | |
| | 64-bit | | • | • | • | • |
| ISA | | RV32I\|E[MC] | RV[32\|64]I[MCA] | RV[32\|64]IMCF[DA] | RV[32\|64]IMC[AFD] | RV64IMCAFD |
| Pipeline type | | In-order | In-order | In-order | In-order | Superscalar |
| Pipeline, stages | | 2-4 | 3-5 | 3-5 | 7-9 | 10-12 |
| Branch prediction | | | Static BP, RAS | Static BP, RAS | Static BP, BTB, BHT, RAS | Dynamic BP, BTB, BHT, RAS |
| Execution privilege levels | | Machine | User, Machine | User, Machine | User, Supervisor, Machine | User, Supervisor, Machine |
| Extensibility/customization | | • | • | • | • | • |
| Execution units | MUL/DIV, area-opt | • | ○ | ○ | | |
| | MUL/DIV, hi-perf | ○ | • | • | • | • |
| | FPU | | | • | • | • |
| Memory subsystem | TCM [w/ECC, parity] | ○ | ○ | ○ | ○ | ○ |
| | L1\$ [w/ECC, parity] | | ○ | ○ | • | ○ |
| | L2\$ [w/ECC] | | | | ○ | ○ |
| | MPU | | • | • | • | • |
| | MMU, virtual memory | | | | • | • |
| Debug | Integrated JTAG debug | • | • | • | • | • |
| | HW BP | 1-2 | 1-8 adv ctrl | 1-8 adv ctrl | 1-8 adv ctrl | 1-8 adv ctrl |
| | Performance counters | ○ | ○ | ○ | ○ | ○ |
| Interrupt controller | IRQs | 8-32 | 8-1024 | 8-1024 | 8-1024 | 8-1024 |
| | Features | basic | advanced | advanced | advanced+ | advanced+ |
| SMP support | | | up to 4 cores with coherency | up to 4 cores with coherency | up to 4 cores with coherency | up to 8-16 cores |
| I/F options | AHB | • | ○ | ○ | ○ | |
| | AXI4 | ○ | • | • | • | • |
| | ACE | | | | | ○ |

#### Baseline cores:

- Clean-slate designs in SystemVerilog
- Configurable and extensible
- 100% compatible with major EDA flows



### SCR1 overview



Industry-grade compact MCU core for deeply embedded applications and accelerator control

- RV32I|E[MC] ISA
- 2- to 4-stage pipeline
- M-mode only
- Optional configurable IPIC
- Optional integrated Debug Controller
- Choice of optional MUL/DIV units
- Open sourced under SHL (Apache 2.0 derivative) since 2017
  - Unrestricted commercial use allowed
- High quality, silicon-proven <u>free</u> MCU IP
- Among the top SystemVerilog GitHub repositories worldwide
  - https://github.com/syntacore/scr1
- Full collateral: TB and verification suite, SDK, specs, SW...
- Best-effort support provided, commercial support offered





### SCR1 overview (cont.)



| Performance\*, per MHz | | |
|---|---|---|
| DMIPS | -O2 | 1.28 |
| DMIPS | -best\*\* | 1.89 |
| Coremark | -best\*\* | 2.95 |

<sup>\*</sup> Dhrystone 2.1, Coremark 1.0, GCC 8.1 BM from TCM

#### Synthesis data:

Minimal RV32EC config: 11 kGates

Default RV32IMC config: 32 kGates

Range 10..40+ kGates

250+ MHz @ tsmc90lp {typical, 1.0V, +25C}
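
As a rough illustration of how the per-MHz scores combine with the synthesis clock, the sketch below scales the quoted SCR1 figures to an absolute score. The linear scaling, the choice of 250 MHz (the lower bound of the "250+ MHz" figure) and the helper name `absolute_scores` are assumptions made for this example, not vendor data.

```python
# Per-MHz scores quoted for SCR1 (Dhrystone 2.1, Coremark 1.0, GCC 8.1, BM from TCM).
SCR1_PER_MHZ = {
    ("DMIPS", "-O2"): 1.28,
    ("DMIPS", "-best"): 1.89,
    ("Coremark", "-best"): 2.95,
}


def absolute_scores(per_mhz, clock_mhz):
    """Scale per-MHz scores linearly to a given clock (hypothetical helper).

    Assumes memory keeps up with the core clock, which is plausible for
    code running from TCM but is not guaranteed for other memory setups.
    """
    return {key: round(score * clock_mhz, 2) for key, score in per_mhz.items()}


# Assumption: 250 MHz, the lower bound of the quoted synthesis result.
scores = absolute_scores(SCR1_PER_MHZ, 250)

for (benchmark, flags), value in scores.items():
    print(f"{benchmark:9s} {flags:6s} {value}")
```

Under these assumptions the -O2 Dhrystone figure corresponds to about 320 DMIPS at 250 MHz; a faster process or clock scales the estimate proportionally.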

#### What's new:

- Extensive user guide and quick start collateral
  - works out-of-the-box in all major sims
- Verilator support
- More tests/samples: RISC-V compliance, others
- Taped out at several companies
- Regular talks at ORConf
- Updated and maintained





<sup>\*\*</sup> -O3 -funroll-loops -fpeel-loops -fgcse-sm -fgcse-las -flto

### SCR1 SDK



#### https://github.com/syntacore/scr1-sdk

#### Repository content:

- docs: SDK documentation
- fpga: SCR1 SDK FPGA projects
- images: precompiled binary files
- scr1: SCR1 core source files
- sw: sample SW projects

#### Supported platforms:

- Digilent Arty and Nexys 4 (Xilinx)
- Terasic DE10-Lite and Arria V GX starter (Intel)














#### Software:

- Bootloader
- Zephyr OS
- Tests/sample apps
- Pre-built GCC-based toolchain (Win/Linux)

Fully open SDK designs + pre-built images

One of the easiest paths to start with **RISC-V** 



## SCR3: 32 or 64 bit

### High-performance multicore capable MCU-class core

- RV32I[MCA] or RV64I[MCA] ISA
- Machine and User privilege modes
- Optional MPU (Memory Protection Unit)
- Optional Tightly Coupled Memory (TCM), L1 caches with ECC/parity
- 32|64bit AHB or AXI4 external interface
- Optional high-performance or area-optimized MUL/DIV unit
- Integrated IRQ controller and PLIC
- Advanced debug with JTAG i/f
- Multicore configs up to 4 SCRx cores
  - SMP and heterogeneous
  - with memory coherency



Figure: SCR3 core top cluster block diagram (SCR3 core with integer pipeline, high-performance MUL/DIV, CSRF and MPU; IPIC; debug controller; AXI4 (AHB/OCP) bridges)

| Performance\*, per MHz | | RV32 | RV64 |
|---|---|---|---|
| DMIPS | -O2 | 1.86 | 1.97 |
| DMIPS | -best\*\* | 2.937 | 3.27 |
| Coremark | -best\*\* | 3.30 | 3.40 |

<sup>\*</sup> Dhrystone 2.1, Coremark 1.0, GCC 8.1 BM from TCM

<sup>\*\*</sup> -O3 -funroll-loops -fpeel-loops -fgcse-sm -fgcse-las -flto

## SCR4: 32 or 64 bit



### High-performance multicore capable MCU core with FPU

- RV32IMCF[DA] or RV64IMCF[DA] ISA
- U- and M-mode
- Configurable advanced BP, fast MUL/DIV
- Integrated IRQ controller and PLIC
- 32|64bit AHB or AXI4 external interface
- Optional MPU, TCM, L1 caches w/ECC
- Advanced debug controller with JTAG
- Configurable SP or DP FPU
  - IEEE 754-2008 compliant
- Multicore configs up to 4 SCRx cores
  - SMP and heterogeneous
  - with memory coherency



| Performance\*, per MHz | | RV32 | RV64 |
|---|---|---|---|
| DMIPS | -O2 | 1.86 | 1.97 |
| DMIPS | -best\*\* | 2.96 | 3.27 |
| Coremark | -best\*\* | 3.30 | 3.40 |
| DP Whetstone | -best\*\* | 1.22 | 1.22 |

<sup>\*</sup> Dhrystone 2.1, Coremark 1.0, GCC 8.1 BM from TCM



<sup>\*\*</sup> -O3 -funroll-loops -fpeel-loops -fgcse-sm -fgcse-las -flto

## SCR5: 32 or 64 bit



### Efficient entry-level APU/embedded core

- RV32IMC[AFD] or RV64IMC[AFD] ISA
- Multicore configs up to 4 SCRx cores
  - SMP and heterogeneous
- Advanced BP (BTB/BHT/RAS)
- IRQ controller (integrated and PLIC)
- M-, S- and U-modes
- Virtual memory support, full MMU
- L1, L2 caches with coherency, atomics, ECC
- High performance double-precision FPU
- Linux and FreeBSD support
- 1GHz+@28nm
- Advanced debug with JTAG i/f



Figure: SCR5 core top SMP cluster block diagram

| Performance\*, per MHz | | RV32 | RV64 |
|---|---|---|---|
| DMIPS | -O2 | 1.60 | |
| DMIPS | -best\*\* | 2.48 | 2.62 |
| Coremark | -best\*\* | 2.83 | 3.02 |

<sup>\*</sup> Dhrystone 2.1, Coremark 1.0, GCC 8.1 BM from TCM



<sup>\*\*</sup> -O3 -funroll-loops -fpeel-loops -fgcse-sm -fgcse-las -flto



## RV64 SCR7

### Efficient mid-range application core

- RV64GC ISA
- SMP up to 8, later 16 cores
- Flexible uarch template, 10-12 stage pipeline
- Initial SCR7 configuration:
  - Decode and dispatch up to two instructions per cycle
  - Out-of-order issue of up to four micro-ops
  - Out-of-order completion, in-order retirement
- M-, S- and U-modes
- Virtual memory support, full MMU, Linux
- 16-64KB L1, up to 2MB L2 cache with ECC
- 1.5 GHz+ @28nm
- Advanced debug with JTAG i/f



2-way SCR7 implementation

4-way SCR7 derivative



App-specific mix of Integer, FPU and LSU pipelines

| Performance\*, per MHz | | |
|---|---|---|
| DMIPS | -O2 | 3.25 |
| DMIPS | -best\*\* | 3.80 |
| Coremark | -best\*\* | 5.12 |

<sup>\*</sup> Preliminary data, 2-way implementation, Dhrystone 2.1, Coremark 1.0, GCC 8.1 BM



<sup>\*\*</sup> -O3 -funroll-loops -fpeel-loops -fgcse-sm -fgcse-las -flto

## Fully featured SW development suite



### Stable IDE in production:

- GCC 10.2
- GNU Binutils 2.31.0
- Newlib 3.0
- GNU GDB 8.0.50
- Open On-Chip Debugger 0.10.0
- Eclipse 4.9.0

Hosts: Linux, Windows

Targets: BM, Linux (beta)

#### Also available:

- LLVM 5.0
- CompCert 3.1
- 3<sup>rd</sup> party vendors

#### Simulators:

- QEMU
- Spike
- 3<sup>rd</sup> party vendors



#### JTAG-based debug solutions:

Supports: SEGGER J-Link, Olimex ARM-USB-OCD family, Digilent JTAG-HS2, more vendors soon















## Wide support by 3rd party tools and SW vendors







Lauterbach Trace32



https://www.lauterbach.com/frames.html?pro/pro\_\_syntacore.html



Segger Embedded Studio

https://wiki.segger.com/Syntacore\_SCR1\_SDK\_Arty





IAR Embedded Workbench

https://www.iar.com/iar-embedded-workbench/#!?architecture=RISC-V



...more in 2020



## Fully integrated FPGA-based SDKs



### Stable Eclipse/gcc based toolchain with IDE:

- GCC 10.2
- GNU Binutils 2.31.0
- Newlib 3.0
- GNU GDB 8.0.50
- Open On-Chip Debugger 0.10.0
- Eclipse 4.9.0

#### HW platform based on standard FPGA dev.kits

- Multiple boards supported (Altera, Xilinx)
- Low-cost 3<sup>rd</sup> party JTAG tools
- Open design for easy start

#### SW:

- Bootloader
- OS: Zephyr/FreeRTOS/Linux
- Application samples, tests, benchmarks














https://www.altera.com/products/boards\_and\_kits/dev-kits/altera/kit-arriav-starter.html



## Extensibility/customization: how it works











### Extensibility features:

- Computational capabilities
  - New functions using existing HW
  - New functional units
- Extended storage
  - Mems/RF, addressable or state
  - Custom AGU
- I/O ports
- Specialized system behavior
  - Standard events processing
  - Custom events

### Domain examples:

- Computationally intensive algorithms acceleration
- Specialized processors (including DSP)
- High-throughput applications
  - CV/ML/AI
  - Wireless/Comms
  - Network/DPI/real-time processing







Custom ISA extension accelerating AES and other crypto kernels on SCR5

- Data
  - RV32G: FPGA-based devkit, g++ 5.2.0, Linux 4.6, optimized C++ implementation
  - RV32G + custom: same + intrinsics
  - Core i7 6800K @ 3.4GHz, g++ 5.4.0, Linux 64, optimized C++ implementation
- 60..575x speedup at a modest area increase: 11.7% core, 3.7% at the CPU cluster level



| Platform | Fmax, MHz | Crypto-1, MB/s | Crypto-2, MB/s | AES-128, MB/s | Crypto-1, MB/s per MHz | Crypto-2, MB/s per MHz | AES-128, MB/s per MHz | RV32G + custom speed-up, Crypto-1 | RV32G + custom speed-up, Crypto-2 | RV32G + custom speed-up, AES-128 |
|---|---|---|---|---|---|---|---|---|---|---|
| RV32G | 20 | 0.025 | 0.129 | 0.238 | 0.00125 | 0.00645 | 0.0119 | 575.00 | 117.74 | 60.93 |
| RV32G + custom | 20 | 14.375 | 15.188 | 14.502 | 0.71875 | 0.7594 | 0.7251 | | | |
| Core i7 | 3400 | 79.115 | 235.343 | 335.212 | 0.02327 | 0.06922 | 0.09859 | 30.89 | 10.97 | 7.35 |
| Core i7 + NI | 3400 | | | 3874.552 | | | 1.13957 | | | 0.64 |

Disclaimer: the authors are aware that AES allows for more efficient dedicated accelerator designs; it is used here as an example algorithm
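
The normalized and speed-up columns follow from the measured throughput and Fmax. The sketch below recomputes them from the throughput figures in the table; the dictionary layout and the helper names `per_mhz` and `speedup` are assumptions made for this example.

```python
# Measured encoding throughput (MB/s) and clock (MHz), copied from the table.
PLATFORMS = {
    "RV32G": {
        "fmax_mhz": 20,
        "mbps": {"Crypto-1": 0.025, "Crypto-2": 0.129, "AES-128": 0.238},
    },
    "RV32G + custom": {
        "fmax_mhz": 20,
        "mbps": {"Crypto-1": 14.375, "Crypto-2": 15.188, "AES-128": 14.502},
    },
    "Core i7": {
        "fmax_mhz": 3400,
        "mbps": {"Crypto-1": 79.115, "Crypto-2": 235.343, "AES-128": 335.212},
    },
    "Core i7 + NI": {
        "fmax_mhz": 3400,
        "mbps": {"AES-128": 3874.552},
    },
}


def per_mhz(platform, kernel):
    """Throughput normalized by clock frequency, in MB/s per MHz."""
    entry = PLATFORMS[platform]
    return entry["mbps"][kernel] / entry["fmax_mhz"]


def speedup(kernel, baseline, accelerated="RV32G + custom"):
    """Per-MHz speed-up of the accelerated core over a baseline platform."""
    return per_mhz(accelerated, kernel) / per_mhz(baseline, kernel)


for baseline in ("RV32G", "Core i7", "Core i7 + NI"):
    for kernel in PLATFORMS[baseline]["mbps"]:
        print(f"{baseline:13s} {kernel:9s} {speedup(kernel, baseline):8.2f}")
```

Rounded to two decimals, the results match the speed-up columns: 575.00, 117.74 and 60.93 against plain RV32G, 30.89, 10.97 and 7.35 against the Core i7, and 0.64 against the Core i7 with AES-NI.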



## Getting access/evaluation



#### SCR1

- Is fully open: https://github.com/syntacore/scr1 (SDK: https://github.com/syntacore/scr1-sdk)
- SHL-licensed with unrestricted commercial use allowed
  - Commercial SLA-based support is available

### SCR3, SCR4, SCR5, SCR7

Full package\* access is available after a simple evaluation agreement

For more info: <a href="mailto:evaluation@syntacore.com">evaluation@syntacore.com</a>

(\*) sufficient for evaluation and tapeout



## Summary



- Syntacore offers high-quality RISC-V compatible CPU IP
  - Founding member, fully focused on RISC-V since 2015
  - Silicon-proven and shipping in full-wafer production
  - Turnkey IP customization services
    - with full tools/compiler support

- Local contact in Japan: Syncom Co.,LTD
  - Mr. Katsuhiro Katayama <u>katayama@synkom.co.jp</u>





# info@syntacore.com

Thank you!