{
"cells": [
{
"cell_type": "markdown",
"source": [
"# 量子神经网络的贫瘠高原效应\n",
"\n",
"\n",
" Copyright (c) 2021 Institute for Quantum Computing, Baidu Inc. All Rights Reserved. "
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## 概览\n",
"\n",
"在经典神经网络的训练中,基于梯度的优化方法不仅仅会遇到局部最小值的问题,同时还要面对鞍点等附近梯度近似于零的几何结构。相对应的,在量子神经网络中也存在着一种**贫瘠高原效应**(barren plateaus)。这个奇特的现象首先是由 McClean et al. 在 2018 年发现 [[1]](https://arxiv.org/abs/1803.11173)。简单来说,就是当你选取的随机电路结构满足一定复杂程度时优化曲面(optimization landscape)会变得很平坦,从而导致基于梯度下降的优化方法很难找到全局最小值。对于大多数的变分量子算法(VQE等),这个现象意味着当量子比特数量越来越多时,选取随机结构的电路有可能效果并不好。这会让你设计的损失函数所对应的优化曲面变成一个巨大的高原,让从而导致量子神经网络的训练愈加困难。你随机找到的初始值很难逃离这个高原,梯度下降收敛速度会很缓慢。\n",
"\n",
"![BP-fig-barren](./figures/BP-fig-barren-cn.png)\n",
"\n",
"图片由 [Gradient Descent Viz](https://github.com/lilipads/gradient_descent_viz) 生成\n",
"\n",
"\n",
"基于梯度变化对这类变分量子算法训练的影响,我们在量桨(Paddle Quantum)平台提供了梯度分析工具模块,辅助用户对 QML 模型进行诊断,便于贫瘠高原等问题的研究。\n",
"\n",
"本教程主要讨论如何在量桨(Paddle Quantum)平台上展示贫瘠高原现象,以及如何使用梯度分析工具对用户自定义量子神经网络中的参数梯度进行分析。其中并不涉及任何算法创新,但能提升读者对于量子神经网络训练中梯度问题的认识。\n",
"\n",
"首先我们先引入必要的 library 和 package:\n"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 55,
"source": [
"# 需要用的包\n",
"import time\n",
"import numpy as np\n",
"import random\n",
"import paddle\n",
"from paddle import matmul\n",
"from paddle_quantum.circuit import UAnsatz\n",
"from paddle_quantum.utils import dagger\n",
"from paddle_quantum.state import density_op\n",
"# 画图工具\n",
"from matplotlib import pyplot as plt \n",
"# 忽略 waring 输出\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
],
"outputs": [],
"metadata": {
"ExecuteTime": {
"end_time": "2021-03-02T12:20:39.463025Z",
"start_time": "2021-03-02T12:20:36.336398Z"
}
}
},
{
"cell_type": "markdown",
"source": [
"## 随机的网络结构\n",
"\n",
"这里我们按照原作者 McClean (2018) [[1]](https://arxiv.org/abs/1803.11173) 论文中提及的类似方法搭建如下随机电路:\n",
"\n",
"![BP-fig-Barren_circuit](./figures/BP-fig-Barren_circuit.png)\n",
"\n",
"首先作用在所有量子比特上绕布洛赫球的 Y-轴旋转 $\\pi/4$。\n",
"\n",
"其余的结构加起来构成一个模块(Block), 每个模块共分为两层:\n",
"\n",
"- 第一层搭建随机的旋转门, 其中 $R_{\\ell,n} \\in \\{R_x, R_y, R_z\\}$。下标 $\\ell$ 表示处于第 $\\ell$ 个重复的模块, 上图中 $\\ell =1$。第二个下标 $n$ 表示作用在第几个量子比特上。\n",
"- 第二层由 CZ 门组成,作用在每两个相邻的量子比特上。\n",
"\n",
"在量桨中, 我们可以这么搭建。"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 56,
"source": [
"def rand_circuit(theta, target, num_qubits):\n",
" # 我们需要将 Numpy array 转换成 Paddle 中的 Tensor\n",
" const = paddle.to_tensor(np.array([np.pi/4]))\n",
" \n",
" # 初始化量子电路\n",
" cir = UAnsatz(num_qubits)\n",
" \n",
" # ============== 第一层 ==============\n",
" # 固定角度的 Ry 旋转门\n",
" for i in range(num_qubits):\n",
" cir.ry(const, i)\n",
"\n",
" # ============== 第二层 ==============\n",
" # target是一个随机的数组,用来帮助我们抽取随机的单比特门 \n",
" for i in range(num_qubits):\n",
" if target[i] == 0:\n",
" cir.rz(theta[i], i)\n",
" elif target[i] == 1:\n",
" cir.ry(theta[i], i)\n",
" else:\n",
" cir.rx(theta[i], i)\n",
" \n",
" # ============== 第三层 ==============\n",
" # 搭建两两相邻的 CZ 门\n",
" for i in range(num_qubits - 1):\n",
" cir.cz([i, i + 1])\n",
" \n",
" return cir"
],
"outputs": [],
"metadata": {
"ExecuteTime": {
"end_time": "2021-03-02T12:20:39.972053Z",
"start_time": "2021-03-02T12:20:39.962259Z"
}
}
},
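{
"cell_type": "markdown",
"source": [
"As a quick sanity check (an illustrative sketch, not part of the original tutorial), we can call `rand_circuit` once and inspect the resulting circuit unitary through the `.U` property, reusing only calls that already appear in this notebook:\n",
"\n",
"```python\n",
"theta = paddle.to_tensor(np.random.uniform(0, 2 * np.pi, size=2))\n",
"target = np.random.choice(3, 2)         # one random gate choice per qubit\n",
"U = rand_circuit(theta, target, 2).U    # the circuit unitary for 2 qubits\n",
"print(U.shape)                          # expect [4, 4]\n",
"```"
],
"metadata": {}
},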
{
"cell_type": "markdown",
"source": [
"## 损失函数与优化曲面 \n",
"\n",
"当我们确定了电路的结构之后,我们还需要定义一个损失函数(loss function)来确定优化曲面。按照原作者 McClean (2018) [[1]](https://arxiv.org/abs/1803.11173) 论文中提及的,我们采用 VQE算法中常用的损失函数:\n",
"\n",
"$$\n",
"\\mathcal{L}(\\boldsymbol{\\theta})= \\langle0| U^{\\dagger}(\\boldsymbol{\\theta})H U(\\boldsymbol{\\theta}) |0\\rangle,\n",
"\\tag{1}\n",
"$$\n",
"\n",
"其中的酉矩阵 $U(\\boldsymbol{\\theta})$ 就是我们上一部分搭建的带有随机结构的量子神经网络。对于其中的哈密顿量 $H$ 我们不妨先取最简单的形式 $H = |00\\cdots 0\\rangle\\langle00\\cdots 0|$。设定好这些后,我们就可以从最简单的两个量子比特的情形开始采样了 -- 生成300组随机网络结构和不同的随机初始参数 $\\{\\theta_{\\ell,n}^{(i)}\\}_{i=1}^{300}$。每次计算梯度都是按照 VQE 的解析梯度公式计算关于 **第一个参数 $\\theta_{1,1}$** 的偏导数。然后统计得到的这300个梯度的平均值和方差。其中解析梯度的公式为:\n",
"\n",
"$$\n",
"\\partial \\theta_{j} \n",
"\\equiv \\frac{\\partial \\mathcal{L}}{\\partial \\theta_j}\n",
"= \\frac{1}{2} \\big[\\mathcal{L}(\\theta_j + \\frac{\\pi}{2}) - \\mathcal{L}(\\theta_j - \\frac{\\pi}{2})\\big].\n",
"\\tag{2}\n",
"$$\n",
"\n",
"具体推导请参见:[arXiv:1803.00745](https://arxiv.org/abs/1803.00745)"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 26,
"source": [
"# 超参数设置\n",
"np.random.seed(42) # 固定 Numpy 的随机种子\n",
"N = 2 # 设置量子比特数量 \n",
"samples = 300 # 设定采样随机网络结构的数量\n",
"THETA_SIZE = N # 设置参数 theta 的大小\n",
"ITR = 1 # 设置迭代次数\n",
"LR = 0.2 # 设定学习速率\n",
"SEED = 1 # 固定优化器中随机初始化的种子\n",
"\n",
"# 初始化梯度数值的寄存器\n",
"grad_info = []\n",
"\n",
"paddle.seed(SEED)\n",
"\n",
"class manual_gradient(paddle.nn.Layer):\n",
" \n",
" # 初始化一个可学习参数列表,并用 [0, 2*pi] 的均匀分布来填充初始值\n",
" def __init__(self, shape, param_attr=paddle.nn.initializer.Uniform(low=0.0, high=2 * np.pi),dtype='float64'):\n",
" super(manual_gradient, self).__init__()\n",
" \n",
" # 将 Numpy array 转换成 Paddle 中的 Tensor\n",
" self.H = paddle.to_tensor(density_op(N))\n",
" \n",
" # 定义损失函数和前向传播机制 \n",
" def forward(self):\n",
" \n",
" # 初始化三个 theta 参数列表\n",
" theta_np = np.random.uniform(low=0., high= 2 * np.pi, size=(THETA_SIZE))\n",
" theta_plus_np = np.copy(theta_np) \n",
" theta_minus_np = np.copy(theta_np) \n",
" \n",
" # 修改用以计算解析梯度\n",
" theta_plus_np[0] += np.pi/2\n",
" theta_minus_np[0] -= np.pi/2\n",
" \n",
" # 将 Numpy array 转换成 Paddle 中的 Tensor\n",
" theta = paddle.to_tensor(theta_np)\n",
" theta_plus = paddle.to_tensor(theta_plus_np)\n",
" theta_minus = paddle.to_tensor(theta_minus_np)\n",
" \n",
" # 生成随机目标,在 rand_circuit 中随机选取电路门\n",
" target = np.random.choice(3, N) \n",
" \n",
" U = rand_circuit(theta, target, N).U\n",
" U_dagger = dagger(U) \n",
" U_plus = rand_circuit(theta_plus, target, N).U\n",
" U_plus_dagger = dagger(U_plus) \n",
" U_minus = rand_circuit(theta_minus, target, N).U\n",
" U_minus_dagger = dagger(U_minus) \n",
"\n",
" # 计算解析梯度\n",
" grad = (paddle.real(matmul(matmul(U_plus_dagger, self.H), U_plus))[0][0] \n",
" - paddle.real(matmul(matmul(U_minus_dagger, self.H), U_minus))[0][0])/2 \n",
" \n",
" return grad\n",
"\n",
"# 定义主程序段\n",
"def main():\n",
" \n",
" # 设置QNN的维度\n",
" sampling = manual_gradient(shape=[THETA_SIZE])\n",
"\n",
" # 采样获得梯度信息\n",
" grad = sampling() \n",
" \n",
" return grad.numpy()\n",
"\n",
"\n",
"# 记录运行时间\n",
"time_start = time.time()\n",
"\n",
"# 开始采样\n",
"for i in range(samples):\n",
" if __name__ == '__main__':\n",
" grad = main()\n",
" grad_info.append(grad)\n",
"\n",
"time_span = time.time() - time_start \n",
"print('主程序段总共运行了', time_span, '秒')\n",
"print(\"采样\", samples, \"个随机网络关于第一个参数梯度的均值是:\", np.mean(grad_info))\n",
"print(\"采样\", samples, \"个随机网络关于第一个参数梯度的方差是:\", np.var(grad_info))"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"主程序段总共运行了 5.709956169128418 秒\n",
"采样 300 个随机网络关于第一个参数梯度的均值是: 0.005925709445960606\n",
"采样 300 个随机网络关于第一个参数梯度的方差是: 0.028249053148446363\n"
]
}
],
"metadata": {
"ExecuteTime": {
"end_time": "2021-03-02T12:20:52.236108Z",
"start_time": "2021-03-02T12:20:40.850822Z"
}
}
},
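{
"cell_type": "markdown",
"source": [
"For two qubits the variance above is still sizable; the barren plateau manifests as this variance shrinking rapidly once the qubit count grows. As a hedged, library-independent illustration, the following NumPy sketch repeats the sampling for increasing $n$ on the same single-block circuit family (the helper names `kron_all`, `cz_chain`, and `loss` are our own, not the tutorial's API):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def rx(t): return np.array([[np.cos(t / 2), -1j * np.sin(t / 2)],\n",
"                            [-1j * np.sin(t / 2), np.cos(t / 2)]])\n",
"def ry(t): return np.array([[np.cos(t / 2), -np.sin(t / 2)],\n",
"                            [np.sin(t / 2), np.cos(t / 2)]])\n",
"def rz(t): return np.array([[np.exp(-1j * t / 2), 0],\n",
"                            [0, np.exp(1j * t / 2)]])\n",
"\n",
"def kron_all(mats):\n",
"    out = np.eye(1)\n",
"    for m in mats:\n",
"        out = np.kron(out, m)\n",
"    return out\n",
"\n",
"def cz_chain(n):\n",
"    # Product of CZ gates on adjacent pairs (diagonal gates, so the order is irrelevant)\n",
"    cz = np.diag([1, 1, 1, -1]).astype(complex)\n",
"    U = np.eye(2 ** n, dtype=complex)\n",
"    for i in range(n - 1):\n",
"        U = kron_all([np.eye(2)] * i + [cz] + [np.eye(2)] * (n - 2 - i)) @ U\n",
"    return U\n",
"\n",
"def loss(n, theta, target):\n",
"    gates = {0: rz, 1: ry, 2: rx}              # same gate choice encoding as rand_circuit\n",
"    psi = np.zeros(2 ** n, dtype=complex)\n",
"    psi[0] = 1.0                               # |0...0>\n",
"    psi = kron_all([ry(np.pi / 4)] * n) @ psi  # fixed Ry(pi/4) layer\n",
"    psi = kron_all([gates[target[q]](theta[q]) for q in range(n)]) @ psi\n",
"    psi = cz_chain(n) @ psi\n",
"    return np.abs(psi[0]) ** 2                 # <psi|H|psi> with H = |0...0><0...0|\n",
"\n",
"rng = np.random.default_rng(42)\n",
"for n in range(2, 7):\n",
"    grads = []\n",
"    for _ in range(200):\n",
"        theta = rng.uniform(0, 2 * np.pi, n)\n",
"        target = rng.integers(0, 3, n)\n",
"        tp, tm = theta.copy(), theta.copy()\n",
"        tp[0] += np.pi / 2\n",
"        tm[0] -= np.pi / 2\n",
"        grads.append((loss(n, tp, target) - loss(n, tm, target)) / 2)\n",
"    print(n, np.var(grads))   # the variance should shrink as n grows\n",
"```"
],
"metadata": {}
},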
{
"cell_type": "markdown",
"source": [
"## 优化曲面的可视化\n",
"\n",
"接下来我们试着利用 Matplotlib来可视化这个优化曲面(optimization landscape)。在**两个量子比特**的情况下,我们只有两个参数 $\\theta_1$ 和 $\\theta_2$, 并且第二层的随机电路电路总共有9种情况。这个很容易画出来:\n",
"\n",
"![BP-fig-landscape2](./figures/BP-fig-landscape2.png)\n",
"\n",
"可以看到的是最后一张图中 $R_z$-$R_z$ 结构所展示出的平原结构是我们非常不想见到的。在这种情况下,我们不能收敛到理论最小值。如果你想自己试着画一些优化曲面,不妨参见以下代码:"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 4,
"source": [
"# 引入必要的 package\n",
"from matplotlib import cm\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"from matplotlib.ticker import LinearLocator, FormatStrFormatter\n",
"\n",
"time_start = time.time()\n",
"N = 2\n",
"\n",
"# 设置图像比例 竖:横 = 0.3 \n",
"fig = plt.figure(figsize=plt.figaspect(0.3))\n",
"\n",
"# 生成 x, y 轴上的点\n",
"X = np.linspace(0, 2 * np.pi, 80)\n",
"Y = np.linspace(0, 2 * np.pi, 80)\n",
"\n",
"# 生成 2D 网格 (mesh)\n",
"xx, yy = np.meshgrid(X, Y)\n",
"\n",
"\n",
"# 定义必要的逻辑门\n",
"def rx(theta):\n",
" mat = np.array([[np.cos(theta/2), -1j * np.sin(theta/2)],\n",
" [-1j * np.sin(theta/2), np.cos(theta/2)]])\n",
" return mat\n",
"\n",
"def ry(theta):\n",
" mat = np.array([[np.cos(theta/2), -1 * np.sin(theta/2)],\n",
" [np.sin(theta/2), np.cos(theta/2)]])\n",
" return mat\n",
"\n",
"def rz(theta):\n",
" mat = np.array([[np.exp(-1j * theta/2), 0],\n",
" [0, np.exp(1j * theta/2)]])\n",
" return mat\n",
"\n",
"def CZ():\n",
" mat = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]])\n",
" return mat\n",
"\n",
"# ============= 第一张图 =============\n",
"# 我们可视化第二层是 kron(Ry, Ry) 的情况\n",
"ax = fig.add_subplot(1, 2, 1, projection='3d')\n",
"\n",
"# 向前传播计算损失函数:\n",
"def cost_yy(para):\n",
" L1 = np.kron(ry(np.pi/4), ry(np.pi/4))\n",
" L2 = np.kron(ry(para[0]), ry(para[1]))\n",
" U = np.matmul(np.matmul(L1, L2), CZ())\n",
" H = np.zeros((2 ** N, 2 ** N))\n",
" H[0, 0] = 1\n",
" val = (U.conj().T @ H@ U).real[0][0]\n",
" return val\n",
"\n",
"# 画出图像\n",
"Z = np.array([[cost_yy([x, y]) for x in X] for y in Y]).reshape(len(Y), len(X))\n",
"surf = ax.plot_surface(xx, yy, Z, cmap='plasma')\n",
"ax.set_xlabel(r\"$\\theta_1$\")\n",
"ax.set_ylabel(r\"$\\theta_2$\")\n",
"ax.set_title(\"Optimization Landscape for Ry-Ry Layer\")\n",
"\n",
"# ============= 第二张图 =============\n",
"# 我们可视化第二层是 kron(Rx, Rz) 的情况\n",
"ax = fig.add_subplot(1, 2, 2, projection='3d')\n",
"\n",
"\n",
"def cost_xz(para):\n",
" L1 = np.kron(ry(np.pi/4), ry(np.pi/4))\n",
" L2 = np.kron(rx(para[0]), rz(para[1]))\n",
" U = np.matmul(np.matmul(L1, L2), CZ())\n",
" H = np.zeros((2 ** N, 2 ** N))\n",
" H[0, 0] = 1\n",
" val = (U.conj().T @ H @ U).real[0][0]\n",
" return val\n",
"\n",
"Z = np.array([[cost_xz([x, y]) for x in X] for y in Y]).reshape(len(Y), len(X))\n",
"surf = ax.plot_surface(xx, yy, Z, cmap='viridis')\n",
"ax.set_xlabel(r\"$\\theta_1$\")\n",
"ax.set_ylabel(r\"$\\theta_2$\")\n",
"ax.set_title(\"Optimization Landscape for Rx-Rz Layer\")\n",
"\n",
"\n",
"plt.show()\n",
"\n",
"time_span = time.time() - time_start \n",
"print('主程序段总共运行了', time_span, '秒')"
],
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
"