

# An FPGA-based Implementation of Quantum Computer Simulator Qulacs

Takefumi MIYOSHI<sup>2</sup>, Yoshiki YAMAGUCHI<sup>3</sup>, Kaijie WEI<sup>1</sup>, Hideharu AMANO<sup>1</sup>, **Ryohei NIWASE<sup>3</sup>**

<sup>1</sup>Graduate School of Science and Technology, Keio University; <sup>2</sup>WasaLabo, LLC; <sup>3</sup>Graduate School of Science and Technology, University of Tsukuba

#### Introduction

Quantum computing has achieved significant developments; nevertheless, most quantum hardware is accessible to the public only through cloud environments or supercomputers. Among the challenges in current emulator development, the memory explosion is the most severe problem for a practical quantum emulator: an n-qubit state vector holds $2^n$ amplitudes, so the required memory doubles with every added qubit. In this research, we propose to use the Trefoil FPGA with an extensive storage system to overcome resource limitations, as shown in Figure 1. We summarize our work as follows.

- Implementation of the high-speed quantum simulator Qulacs [1] on the M-KUBOS FPGA cluster
- HLS-based implementations of the H gate, S gate, and CNOT gate, and of a dense matrix computation
- Performance improved to a level similar to that of the Ryzen server
- Stable performance with an increasing number of qubits, achieved by adding boards

## **Qulacs Optimization using HLS**

Taking the H gate as an example:

- An n-qubit quantum circuit has $2^n$ amplitudes, stored in the vector `state[]`
- Target qubit: $t$
  - $index_0 = b_{n-1}b_{n-2}\dots 0_t b_{t-1}b_{t-2}\dots b_0$
  - $index_1 = b_{n-1}b_{n-2}\dots 1_t b_{t-1}b_{t-2}\dots b_0$
- Per step, two neighbouring pairs are processed (for $t \ge 1$, with $b_0 = 0$):
  - $x_0 = state[index_0];\ x_1 = state[index_1];\ y_0 = state[index_0 + 1];\ y_1 = state[index_1 + 1]$
  - $state[index_0] = \frac{1}{\sqrt{2}}(x_0 + x_1);\ state[index_1] = \frac{1}{\sqrt{2}}(x_0 - x_1)$
  - $state[index_0 + 1] = \frac{1}{\sqrt{2}}(y_0 + y_1);\ state[index_1 + 1] = \frac{1}{\sqrt{2}}(y_0 - y_1)$
- This is repeated over all values of the remaining bits $b_{n-1}\dots b_{t+1} b_{t-1}\dots b_1$

Streaming processing depends on the location of the target qubit $t$; the data stream is optimized using buffering:

Figure 3: Hadamard gate optimization using buffering

## **Quantum Gates**

A single-qubit quantum state is expressed as

$$|\psi\rangle = a_0|0\rangle + a_1|1\rangle \quad \left(|0\rangle = \begin{bmatrix} 1\\0 \end{bmatrix},\ |1\rangle = \begin{bmatrix} 0\\1 \end{bmatrix}\right) \tag{1}$$

For an n-qubit system:

$$|\psi\rangle = a_{0\ldots00}|0\ldots00\rangle + a_{0\ldots01}|0\ldots01\rangle + \dots + a_{1\ldots11}|1\ldots11\rangle$$

$$|\psi\rangle = a_{0\ldots00} \begin{bmatrix} 1\\0\\ \vdots\\0 \end{bmatrix} + a_{0\ldots01} \begin{bmatrix} 0\\1\\ \vdots\\0 \end{bmatrix} + \dots + a_{1\ldots11} \begin{bmatrix} 0\\0\\ \vdots\\1 \end{bmatrix} \tag{2}$$

**Versatile Quantum Gate Implementation**

| Quantum Gate          | Meaning                                                          | Matrix                                                                                           |
|-----------------------|------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|
| H (Hadamard)          | Converts a qubit from a basis state into a uniform superposition | $\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$                               |
| S                     | Rotates a qubit by 90° counterclockwise around the Z axis        | $\begin{bmatrix} 1 & 0 \\ 0 & e^{i\pi/2} \end{bmatrix}$                                          |
| CNOT (Controlled NOT) | Entangles and disentangles qubits (e.g., Bell states)            | $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}$ |
| Matrix                | Dense matrix multiplication for arbitrary quantum gates          |                                                                                                  |

## **Evaluation**

**Target Platform**

**PL part:** 1,143K logic cells, 70.6 Mb memory, 1,968 DSPs

| Item        | Specification                           |
|-------------|-----------------------------------------|
| Form factor | 244 mm × 244 mm (microATX)              |
| FPGA        | XCZU19EG-2FFVC1760                      |
| Memory      | PS: 4 GB DDR4-2400                      |
|             | PL: 1× 4 GB DDR4-2400 SODIMM socket     |
| I/O         | 4 × GTY 4TX/4RX (max 28.125 Gbps)       |
|             | 4 × GTH 8TX (max 16.3 Gbps)             |
|             | 4 × GTH 8RX (max 16.3 Gbps)             |
|             | USB 3.0 × 1                             |
|             | USB-UART × 1                            |
|             | 1 Gb Ethernet (RJ45)                    |
|             | DisplayPort 1.2                         |

Figure 2: M-KUBOS [2] architecture

#### Execution time of the naive implementation

Figure 4: Execution time of the naive implementation versus the target qubit

#### Time evaluation of the H gate after optimization

Figure 5: Hadamard gate evaluation after optimization

#### References

[1] Y. Suzuki, Y. Kawase, Y. Masumura, Y. Hiraga, M. Nakadai, J. Chen, K. M. Nakanishi, K. Mitarai, R. Imai, S. Tamiya, T. Yamamoto, T. Yan, T. Kawakubo, Y. O. Nakagawa, Y. Ibe, Y. Zhang, H. Yamashita, H. Yoshimura, A. Hayashi, and K. Fujii, "Qulacs: a fast and versatile quantum circuit simulator for research purpose," Quantum, vol. 5, p. 559, Oct. 2021. [Online]. Available: https://doi.org/10.22331%2Fq-2021-10-06-559

[2] K. Iizuka, H. Takagi, A. Kamei, K. Hironaka, and H. Amano, "Power analysis of directly-connected FPGA clusters," in 2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), 2022, pp. 1–6.