





Technology Research Association of Secure IoT Edge application based on RISC-V Open architecture

# TEE Hardware for RISC-V Implementation

Authors: Ckristian Duran (UEC), Trong-Thuc Hoang (UEC/AIST), Akira Tsukamoto (AIST), Kuniyasu Suzaki (TRASIO/AIST), and Cong-Kha Pham (UEC)

### **Outline**

- 1. Introduction
- 2. Trusted Execution Environment
- 3. TEE-Hardware System
- 4. Crypto-cores Accelerators
- 5. Other Hardware Modules
- 6. Chip Results & Conclusion

### 1. Introduction (1/3)



Chipyard is an open-source framework for agile development of Chisel-based Systems-on-Chip, developed and open-sourced by Berkeley Architecture Research.

It makes it easy for small teams to design, integrate, simulate, and tape out a custom SoC.



### **Perks:**

- Most common RISC-V cores included: Rocket-chip, BOOM, Ariane (updated frequently to follow the upstream of those cores)
- FPGA accelerators included (Hwacha, Gemmini, NVDLA)
- Simulation supported (RTL: Verilator, FPGA: FireSim, VLSI: Hammer)

### 1. Introduction (2/3)

Based on Chipyard, a TEE-Hardware system was developed: https://github.com/uec-hanken/tee-hardware



### 1. Introduction (3/3)





### **TEE-HW** has demos on:



Altera: TR4

Altera: DE4

### 2. Trusted Execution Environment (1/10)

TEE in-action (using Keystone: A TEE Framework)

Remote PC connects to FPGA via Serial (*UART*) terminal or a TCP connection



TEE (*Keystone in this case*) creates the Trusted-Side based on the chain-of-trust across multiple operating layers.

It allows the client to create and operate an Enclave App on the Trusted Side.

### 2. Trusted Execution Environment (4/10)

**TEE in-action** (using Keystone: A TEE Framework)



1. Connection with the Enclave host

### 2. Trusted Execution Environment (5/10)

**TEE in-action** (using Keystone: A TEE Framework)



- 1. Connection with the Enclave host
- 2. Verify attestation report

### 2. Trusted Execution Environment (6/10)

### **TEE in-action** (using Keystone: A TEE Framework)

### An example of attestation report:



### 2. Trusted Execution Environment (7/10)

### **TEE in-action** (using Keystone: A TEE Framework)



### 2. Trusted Execution Environment (8/10)

**TEE in-action** (using Keystone: A TEE Framework)



- 1. Connection with the Enclave host
- 2. Verify attestation report
- 3. Exchange communication keys

### 2. Trusted Execution Environment (9/10)

### **TEE in-action** (using Keystone: A TEE Framework)

Keystone demo: (1) the client sends strings, (2) it requests a calculation from the Enclave, and (3) the Enclave replies with the number of words.



- 1. Connection with the Enclave host
- 2. Verify attestation report
- 3. Exchange communication keys
- 4. Client's app runs on the established TEE
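
The steps above can be illustrated with a minimal Python sketch. It is not Keystone's API: the report layout (a dict with an `enclave_hash` field), the function names, and the use of a bare SHA3-512 digest as the measurement are assumptions made for illustration. The signature checks on the report and the key exchange of step 3 are omitted.

```python
import hashlib
import hmac


def measure(enclave_binary: bytes) -> bytes:
    """Hypothetical measurement: SHA3-512 digest of the enclave binary."""
    return hashlib.sha3_512(enclave_binary).digest()


def verify_report(report: dict, expected_hash: bytes) -> bool:
    """Step 2 (simplified): compare the reported hash with the expected one.

    A real verifier would also validate the signatures carried by the report.
    """
    return hmac.compare_digest(report["enclave_hash"], expected_hash)


def enclave_word_count(text: str) -> int:
    """Step 4: the demo enclave replies with the number of words."""
    return len(text.split())


# Simulated run: the "device" builds a report, then the client checks it.
binary = b"hypothetical enclave image"
report = {"enclave_hash": measure(binary)}
trusted = verify_report(report, measure(binary))
tampered = verify_report(report, measure(b"modified image"))
words = enclave_word_count("hello trusted execution environment")
```

The comparison uses `hmac.compare_digest` so that the check takes the same time whether or not the hashes match.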

### 2. Trusted Execution Environment (10/10)

### **TEE Secure Boot Sequence** (with HW Accelerators)



- The $H_S$ value is automatically transferred between acts, so it is never exposed to the software.
- The data in write-only (W-only) memory are also not exposed to the software.

### 3. TEE-Hardware System (1/6)



- The system is not fixed at dual-core: the number of cores can be increased or decreased as needed, as long as the design fits the FPGA board.
- Some hardware modules can easily be included in or excluded from the system.

### 3. TEE-Hardware System (2/6)



- Available cores in the system are **Rocket-chip** and **BOOM**
- Because BOOMv3 is not yet stable, both BOOMv2 and BOOMv3 are available on GitHub in different branches.

### 3. TEE-Hardware System (3/6)



- System Bus (SBUS), Memory Bus (MBUS), and Peripheral Bus (PBUS) hierarchy.
- Several Peripheral devices for IO (GPIO, MMC, UART, PCIe, USB), memory (DDR, SPI ROM, Mask ROM), and Crypto-cores (SHA-3, ED25519, AES, PRNG)

### 3. TEE-Hardware System (4/6)

| Variable | Available options | Description |
|----------|-------------------|-------------|
| BOARD    | VC707<br>DE4<br>TR4 | Select the FPGA board |
| ISACONF  | RV64GC<br>RV64IMAC<br>RV32GC<br>RV32IMAC | Select the ISA |
| MBUS     | MBus64<br>MBus32 | Select the bit-width of the memory bus |
| BOOTSRC  | BOOTROM<br>QSPI | Select the boot source |
| PCIE     | WPCIe<br>WoPCIe | Include the PCIe module in the system<br>Remove the PCIe module from the system |
| DDRCLK   | WSepaDDRClk<br>WoSepaDDRClk | Separate the DDR clock from the system clock<br>Do not separate the DDR clock from the system clock |
| HYBRID   | Rocket<br>Boom<br>RocketBoom<br>BoomRocket | Two Rocket cores<br>Two Boom cores<br>Rocket core first, Boom core second<br>Boom core first, Rocket core second |

In the Makefile system, these variables are available.

Example usage:

BOARD=VC707
ISACONF=RV64GC
MBUS=MBus64
BOOTSRC=BOOTROM
PCIE=WoPCIe
DDRCLK=WoSepaDDRClk
HYBRID=Rocket
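
As a hedged illustration of how these variables combine, the Python sketch below validates a configuration against the options listed in the table and assembles a `make` invocation. The helper name `make_args` is hypothetical. Passing the variables on the `make` command line with no explicit target is also an assumption: the repository may expect them to be set in another way.

```python
# Options copied from the variable table; the helper name is hypothetical.
OPTIONS = {
    "BOARD": ["VC707", "DE4", "TR4"],
    "ISACONF": ["RV64GC", "RV64IMAC", "RV32GC", "RV32IMAC"],
    "MBUS": ["MBus64", "MBus32"],
    "BOOTSRC": ["BOOTROM", "QSPI"],
    "PCIE": ["WPCIe", "WoPCIe"],
    "DDRCLK": ["WSepaDDRClk", "WoSepaDDRClk"],
    "HYBRID": ["Rocket", "Boom", "RocketBoom", "BoomRocket"],
}


def make_args(config: dict) -> list:
    """Validate a configuration and return it as VAR=value arguments."""
    args = []
    for name, allowed in OPTIONS.items():
        value = config[name]  # raises KeyError if a variable is missing
        if value not in allowed:
            raise ValueError(f"{name}={value} is not one of {allowed}")
        args.append(f"{name}={value}")
    return args


# Same values as the example usage above.
example = {
    "BOARD": "VC707", "ISACONF": "RV64GC", "MBUS": "MBus64",
    "BOOTSRC": "BOOTROM", "PCIE": "WoPCIe",
    "DDRCLK": "WoSepaDDRClk", "HYBRID": "Rocket",
}
command = " ".join(["make"] + make_args(example))
```

Checking the values before invoking the build catches a mistyped option early, instead of partway through a long FPGA flow.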

### 3. TEE-Hardware System (5/6)

### **TEE-HW** with various core configurations

Output of `cat /proc/cpuinfo` for each configuration (in every case both harts report the same `isa` and `mmu`):

| Configuration      | isa        | mmu  | uarch (hart 0) | uarch (hart 1) |
|--------------------|------------|------|----------------|----------------|
| HYBRID: Boom       | rv64imafdc | sv39 | ucb-bar,boom0  | ucb-bar,boom0  |
| HYBRID: Rocket     | rv64imafdc | sv39 | sifive,rocket0 | sifive,rocket0 |
| HYBRID: BoomRocket | rv64imafdc | sv39 | ucb-bar,boom0  | sifive,rocket0 |
| HYBRID: RocketBoom | rv64imafdc | sv39 | sifive,rocket0 | ucb-bar,boom0  |
| ISACONF: RV64GC    | rv64imafdc | sv39 | sifive,rocket0 | sifive,rocket0 |
| ISACONF: RV64IMAC  | rv64imac   | sv39 | sifive,rocket0 | sifive,rocket0 |
| ISACONF: RV32GC    | rv32imafdc | sv32 | sifive,rocket0 | sifive,rocket0 |
| ISACONF: RV32IMAC  | rv32imac   | sv32 | sifive,rocket0 | sifive,rocket0 |

### 3. TEE-Hardware System (6/6)

### Summary table of FPGA logic utilization (on VC707) with various core configurations:

| ISACONF  | HYBRID Core0 | HYBRID Core1 | LUTs used (on VC707) | LUT utilization |
|----------|--------------|--------------|----------------------|-----------------|
| RV64GC   | Boom         | Boom         | 160,873              | 52.99%          |
| RV64GC   | Rocket       | Rocket       | 96,571               | 31.81%          |
| RV64GC   | Boom         | Rocket       | 128,708              | 42.39%          |
| RV64GC   | Rocket       | Boom         | 128,719              | 42.40%          |
| RV64GC   | Rocket       | Rocket       | 96,571               | 31.81%          |
| RV64IMAC | Rocket       | Rocket       | 72,007               | 23.72%          |
| RV32GC   | Rocket       | Rocket       | 89,356               | 29.43%          |
| RV32IMAC | Rocket       | Rocket       | 65,899               | 21.71%          |
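
The percentages in the table can be reproduced with the short Python sketch below. The total of 303,600 LUTs is the capacity of the XC7VX485T device on the VC707 board: it comes from the device documentation rather than from the slides, and the function and variable names are illustrative.

```python
# LUT capacity of the XC7VX485T FPGA on the VC707 board
# (from the device documentation, not from the slides).
VC707_LUTS = 303_600

# (ISACONF, Core0, Core1) -> LUTs used, copied from the summary table.
LUTS_USED = {
    ("RV64GC", "Boom", "Boom"): 160_873,
    ("RV64GC", "Rocket", "Rocket"): 96_571,
    ("RV64GC", "Boom", "Rocket"): 128_708,
    ("RV64GC", "Rocket", "Boom"): 128_719,
    ("RV64IMAC", "Rocket", "Rocket"): 72_007,
    ("RV32GC", "Rocket", "Rocket"): 89_356,
    ("RV32IMAC", "Rocket", "Rocket"): 65_899,
}


def utilization(luts: int) -> float:
    """Return LUT utilization as a percentage, rounded to two decimals."""
    return round(100 * luts / VC707_LUTS, 2)


percent = {cfg: utilization(luts) for cfg, luts in LUTS_USED.items()}

# Extra LUTs when one Rocket core is replaced by a Boom core (RV64GC).
boom_extra = (LUTS_USED[("RV64GC", "Boom", "Rocket")]
              - LUTS_USED[("RV64GC", "Rocket", "Rocket")])
```

With these figures, swapping one Rocket core for a Boom core costs about 32,000 additional LUTs, roughly a third of the whole dual-Rocket system.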

### 4. Crypto-core Accelerators (1/6)



**Crypto-cores:**

- SHA-3 512
- Ed25519 (genkey and signature)
- AES-128/256
- PRNG (pseudo-random number generator)

### 4. Crypto-core Accelerators (2/6)

### **Some feature notes**

- A crypto-core can be implemented as a custom instruction (RoCC)
- AES supports both 128-bit and 256-bit keys, and the key size can be changed on the fly
- Ed25519 contains:
  - Ed25519-Mult for key-pair generation
  - Ed25519-Sign for signature verification
- PRNG uses an LFSR (*Linear-Feedback Shift Register*) and is based on the ARM TrustZone RNG register model
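
To illustrate the LFSR principle behind the PRNG, here is a minimal Python sketch of a 16-bit Fibonacci LFSR. The width, the tap positions (16, 14, 13, 11), and the seed are textbook choices picked for illustration. They are assumptions, not the parameters of the PRNG core described here.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR by one step.

    Taps at bits 16, 14, 13 and 11 (polynomial x^16 + x^14 + x^13 + x^11 + 1),
    a textbook maximal-length choice, not the core's actual polynomial.
    """
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def lfsr16_period(seed: int = 0xACE1) -> int:
    """Count the steps until the register returns to the (non-zero) seed."""
    state = lfsr16_step(seed)
    steps = 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps


# Generate a few pseudo-random 16-bit words from the illustrative seed.
state = 0xACE1
samples = []
for _ in range(4):
    state = lfsr16_step(state)
    samples.append(state)
```

Because the polynomial is maximal-length, the register cycles through every non-zero 16-bit state before repeating, which `lfsr16_period()` confirms by counting 2^16 - 1 steps.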

### 4. Crypto-core Accelerators (3/6)

### **Crypto-cores on Stratix-IV FPGA**

|            | SHA-3 | AES-128/256 | Ed25519-Mult | Ed25519-Sign |
|------------|-------|-------------|--------------|--------------|
| ALUT       | 8,108 | 3,195       | 2,737        | 3,969        |
| Registers  | 2,790 | 2,854       | 4,778        | 4,617        |
| Fmax (MHz) | 100   | 100         | 100          | 100          |
| Memory     | 0     | 0           | 8KB          | 0            |
| DSP block  | 0     | 0           | 48           | 130          |
| Total (%)  | 1.1   | 0.6         | 3.3          | 5.9          |

### 4. Crypto-core Accelerators (4/6)

### **Crypto-cores in ASIC (ROHM-180nm)**

|                   | SHA-3               | AES-128/256          | Ed25519-Mult              | Ed25519-Sign              |
|-------------------|---------------------|----------------------|---------------------------|---------------------------|
| Size              | 1,150 µm × 1,150 µm | 808.96 µm × 806.4 µm | 1,694.72 µm × 1,693.44 µm | 1,346.56 µm × 1,345.68 µm |
| Gate-count (NAND) | 102,500             | 50,560               | 222,432                   | 140,442                   |
| Fmax (MHz)        | 104                 | 90                   | 106                       | 91                        |
| Power (mW)        | 42.745              | 37.566               | 53.061                    | 80.894                    |

### 4. Crypto-core Accelerators (5/6)

Results of using the crypto-core hardware accelerators (applied at the boot stage). The test was run on a Stratix-IV FPGA with a Rocket-chip RV64GC core.

Figure: software vs. hardware SHA-3 execution times in the TEE framework.

Hardware is about 2.5 decades faster, i.e. roughly $10^{2.5} \approx 316$ times.

### 4. Crypto-core Accelerators (6/6)

Results of using the crypto-core hardware accelerators (applied at the boot stage). The test was run on a Stratix-IV FPGA with a Rocket-chip RV64GC core.

Figure: software vs. hardware SHA-3 operation throughput.

### 5. Other Hardware Modules (1/4)



**QSPI:** used to access external Flash

### 5. Other Hardware Modules (2/4)



Flash modules are cheap, bundled, and easy to plug into FPGA boards.

The use of QSPI is easy to switch on or off:

- **BOOTROM** scenario:
  - Disable QSPI
  - BootROM at 0x20000000, ZSBL in BootROM
- **QSPI** scenario:
  - Enable QSPI at 0x20000000, ZSBL now in Flash
  - BootROM moved back to 0x10000; the BootROM now holds just a simple instruction that jumps directly to 0x20000000
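
The two scenarios can be summarized as a small address-map lookup. The Python sketch below is only an illustration: the addresses come from the list above, while the function name and the dictionary keys are hypothetical.

```python
ZSBL_BASE = 0x20000000      # where the ZSBL is fetched from in both scenarios
BOOTROM_MOVED = 0x10000     # BootROM base address in the QSPI scenario


def boot_layout(bootsrc: str) -> dict:
    """Return the boot memory layout for a boot source (BOOTROM or QSPI)."""
    if bootsrc == "BOOTROM":
        return {
            "qspi_enabled": False,
            "bootrom_base": ZSBL_BASE,   # BootROM holds the ZSBL itself
            "zsbl_in": "BootROM",
            "zsbl_base": ZSBL_BASE,
        }
    if bootsrc == "QSPI":
        return {
            "qspi_enabled": True,
            "qspi_base": ZSBL_BASE,      # Flash is mapped at the ZSBL address
            "bootrom_base": BOOTROM_MOVED,
            "bootrom_jumps_to": ZSBL_BASE,  # BootROM only jumps to the Flash
            "zsbl_in": "Flash",
            "zsbl_base": ZSBL_BASE,
        }
    raise ValueError(f"unknown boot source: {bootsrc}")


rom_boot = boot_layout("BOOTROM")
flash_boot = boot_layout("QSPI")
```

In both layouts the ZSBL sits at the same address, so only the device behind 0x20000000 changes between the two scenarios.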

### 5. Other Hardware Modules (3/4)



**TileLink Sync:** synchronize between different clock domains

### 5. Other Hardware Modules (4/4)



Separating the inner system clock from the outer DDR clock:

- Sometimes the inner system cannot run at high speed
  - → System-clock < DDR-clock
  - → The DDR bandwidth stays at high speed
- Sometimes (depending on the board) the DDR IP is fixed at a lower clock rate (for example, 100 MHz) than the CPU (for example, 125 MHz)
  - → System-clock > DDR-clock
  - → The CPU keeps running at the higher clock rate

### 6. Conclusion (1/4)



### **Features**

- Cores: Rocket-chip (x4)
- ISA: RV64GC (crypto-cores aren't included)
- Size: 4,512 µm × 7,172 µm
- Fmax: 92 MHz
- Power: 391.125 mW
- Process: ROHM 180nm
- Fabricated: 10/2019

Figures: layout and bare chip.

### 6. Conclusion (2/4)

### **Features**

- Cores: Rocket-chip (**x2**)
- ISA: RV64GC
- Crypto-cores: SHA3-512, AES-128/256, Ed25519 (both Mult and Sign)
- Other: QSPI (for Flash), USB1.1
- Size: 4,573 µm × 4,578 µm
- Fmax: 98 MHz
- Power: 706.635 mW
- Process: ROHM 180nm
- Fabricated: 01/2020

Figures: layout and bare chip.

### 6. Conclusion (3/4)

The DDR problem for the chip is solved by:

- 1. Using the DIMM RAM on the TR4
- 2. Mounting a PCB (with a chip socket) on the TR4





### 6. Conclusion (4/4)

- We presented a system platform for Trusted Execution Environment (TEE) featuring crypto-core accelerators.
- A complete TEE-Hardware system was developed with various configurations to fit specific needs, such as core options, hybrid options, and ISA options.
- The system was implemented and tested on various FPGAs (VC707, DE4, TR4) and ASIC (ROHM-180nm).
- The execution time of the TEE with hardware accelerators dropped significantly compared to software.

### Acknowledgment

- The presented work is based on results obtained from a project (JPNP16007) commissioned by the New Energy and Industrial Technology Development Organization (NEDO).







# THANK YOU FOR LISTENING