

## An FPGA-based Implementation of Quantum Computer Simulator Qulacs



Kaijie Wei\*, Hideharu Amano\*, Takefumi Miyoshi†, Yoshiki Yamaguchi‡, Ryohei Niwase‡

\* Graduate School of Science and Technology, Keio University

† WasaLabo, LLC.

‡ Graduate School of Science and Technology, University of Tsukuba





## 1. Quantum Simulator using State Vector



[Figure: IBM Quantum Composer]

- For a single qubit (quantum bit), the state is a coherent superposition of the basis states
  - **Linear combination** of $|0\rangle$ and $|1\rangle$: $|\psi\rangle = \alpha |0\rangle + \beta |1\rangle$
    - $|0\rangle = (1, 0)^T$, $|1\rangle = (0, 1)^T$, $|\alpha|^2 + |\beta|^2 = 1$
    - $|\alpha|^2$ and $|\beta|^2$ $\rightarrow$ the probabilities of measuring $|0\rangle$ and $|1\rangle$

$$|\psi\rangle = \alpha \begin{bmatrix} 1\\ 0 \end{bmatrix} + \beta \begin{bmatrix} 0\\ 1 \end{bmatrix}$$
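As a small illustration of the normalization condition, the following sketch computes the two measurement probabilities from the amplitudes $\alpha$ and $\beta$. The helper name `measurement_probabilities` is hypothetical and is not part of Qulacs.

```python
import math

def measurement_probabilities(alpha, beta):
    """Return (P(|0>), P(|1>)) for the state alpha|0> + beta|1>.

    Hypothetical helper for illustration only; not a Qulacs API.
    """
    p0 = abs(alpha) ** 2
    p1 = abs(beta) ** 2
    # The amplitudes must satisfy |alpha|^2 + |beta|^2 = 1.
    if not math.isclose(p0 + p1, 1.0, rel_tol=1e-9):
        raise ValueError("state is not normalized")
    return p0, p1

# Equal superposition: alpha = beta = 1/sqrt(2)
amp = 1 / math.sqrt(2)
p0, p1 = measurement_probabilities(amp, amp)
print(p0, p1)
```

Both probabilities come out at one half (up to floating-point rounding), and an unnormalized pair of amplitudes is rejected.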

- For n qubits
  - $|\psi\rangle = a_{0...00}|0...00\rangle + a_{0...01}|0...01\rangle + ... + a_{1...11}|1...11\rangle$
  - 2<sup>n</sup> basis states: represented as a complex vector
  - 2<sup>n</sup> amplitudes × double-precision complex (128 bits = 16 bytes each) $\rightarrow$ memory capacity: 2<sup>n+4</sup> bytes

$$|\psi\rangle = a_{0...00} \begin{bmatrix} 1\\0\\\vdots\\0 \end{bmatrix} + a_{0...01} \begin{bmatrix} 0\\1\\\vdots\\0 \end{bmatrix} + \dots + a_{1...11} \begin{bmatrix} 0\\0\\\vdots\\1 \end{bmatrix}$$
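The memory estimate can be checked with a few lines of arithmetic. This is a minimal sketch that assumes one double-precision complex number (16 bytes) per amplitude and counts nothing else; the function name `state_vector_bytes` is hypothetical.

```python
BYTES_PER_AMPLITUDE = 16  # double-precision complex: 2 x 64 bits = 128 bits

def state_vector_bytes(n_qubits):
    """Bytes needed to store all 2**n amplitudes of an n-qubit state vector."""
    return (2 ** n_qubits) * BYTES_PER_AMPLITUDE  # equals 2 ** (n_qubits + 4)

for n in (20, 28, 30, 40):
    size = state_vector_bytes(n)
    print(f"{n} qubits: {size} bytes = {size / 2**30:g} GiB")
```

For 28 qubits this gives 2<sup>32</sup> bytes, i.e. 4 GiB, and each additional qubit doubles the requirement.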


## 2. Quantum Simulator Qulacs VS. FPGA Implementation



[1] Suzuki, Y. et al. (2021) 'Qulacs: a fast and versatile quantum circuit simulator for research purpose', Quantum, 5, p. 559. doi:10.22331/q-2021-10-06-559.

## 3. Qulacs Implementation on FPGA

#### • 2-stage Implementation




- FPGA Trefoil Design: FPGA with extensible memory capacities
  - Response to enormous memory resource requirements
  - Multiple SATA-ports connection (Possible for 32 SATAs)
  - Pipelined data transfer (Higher throughput with less latency)

#### • HLS Design for Quantum Gates

| Quantum Gates            | Meaning                                                                   | Matrix                                                                                           |
|--------------------------|---------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|
| H<br>(Hadamard)          | Converts the qubit from a<br>basis state to a uniform<br>superposition    | $\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}$                                |
| S                        | Rotates the qubit by 90° counterclockwise around the Z axis               | $\begin{bmatrix} 1 & 0 \\ 0 & e^{i\frac{\pi}{2}} \end{bmatrix}$                                  |
| CNOT<br>(Controlled Not) | Entangles and disentangles Bell<br>states                                 | $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}$ |
| Matrix                   | Arbitrary 2 quantum gates multiplication                                  |                                                                                                  |
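To show how such gates act on a state vector, here is a minimal sketch in plain Python. It is not the authors' HLS kernel and not the Qulacs source: the function names are hypothetical, and the indexing convention (qubit 0 is the least significant bit of the amplitude index) is an assumption made for the example.

```python
import math

def apply_single_qubit_gate(state, target, gate):
    """Apply a 2x2 gate (nested list) to qubit `target` of `state`, in place."""
    stride = 1 << target
    for base in range(0, len(state), stride * 2):
        for offset in range(stride):
            i0 = base + offset   # amplitude index with target bit = 0
            i1 = i0 + stride     # amplitude index with target bit = 1
            a0, a1 = state[i0], state[i1]
            state[i0] = gate[0][0] * a0 + gate[0][1] * a1
            state[i1] = gate[1][0] * a0 + gate[1][1] * a1

def apply_cnot(state, control, target):
    """Swap each amplitude pair that differs in the target bit when the control bit is 1."""
    cmask, tmask = 1 << control, 1 << target
    for i in range(len(state)):
        if (i & cmask) and not (i & tmask):
            j = i | tmask
            state[i], state[j] = state[j], state[i]

INV_SQRT2 = 1 / math.sqrt(2)
H = [[INV_SQRT2, INV_SQRT2],
     [INV_SQRT2, -INV_SQRT2]]

# Two qubits, initial state |00>.
state = [1 + 0j, 0j, 0j, 0j]
apply_single_qubit_gate(state, 0, H)    # uniform superposition on qubit 0
apply_cnot(state, control=0, target=1)  # entangle into a Bell state
print(state)
```

Each single-qubit gate touches every amplitude exactly once, in independent pairs, so the work grows as 2<sup>n</sup> and the access pattern is regular. After H followed by CNOT, only the amplitudes of $|00\rangle$ and $|11\rangle$ are non-zero, which is the Bell state mentioned in the table.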

**FPGA platform**

- LiteX [2] design: Migen-based hardware (FHDL modules), build utilities, SoC cores, interconnect, integration, and software utilities
  - 10 open-source IPs: LiteDRAM, LiteEth, LiteVideo, and 7 more
  - 4 softcores [2018]: LM32, PicoRV32, VexRiscV, Mor1kx
  - Platforms: altera, xilinx, lattice, microsemi, sim
  - 21 boards [2018]: Arty7, TinyFPGA BX, Nexys4, KCU105, KC705, Versa ECP5, etc.
- PS part: Cortex A53 Quad Core, Cortex R5; DDR4 4GB; SATA, microSD card, QSPI flash, I2C devices, CAN x2, USB 3.0, USB Dual UART, JTAG, Display Port, Gigabit Ether
- PL part: DDR4 4GB; 1143 K logic cells, 70.6 Mb memory, 1968 DSPs; FMC x2, Firefly cables, GTH x8, GTY x4, Pmod x2; LED, DipSW, pushSW

[2] Kermarrec, Florent, et al. "LiteX: an open-source SoC builder and library based on Migen Python DSL." arXiv preprint arXiv:2005.02506 (2020).

## 4. Qulacs Optimization using HLS

#### Hadamard Gate Design





Execution Time for H Gate (28 qubits)




Resource utilization for H Gate (28 qubits)




# Please visit the poster session for more details

Thank you for listening