{ "cells": [ { "cell_type": "markdown", "id": "chronic-tunisia", "metadata": {}, "source": [ "# 变分影子量子学习\n", "\n", " Copyright (c) 2021 Institute for Quantum Computing, Baidu Inc. All Rights Reserved. " ] }, { "cell_type": "markdown", "id": "metric-peace", "metadata": {}, "source": [ "## 概览\n", "\n", "本教程我们将讨论变分影子量子学习(variational shadow quantum learning, VSQL)[1] 的原理,以及如何使用 VSQL 来完成一个**二分类**任务。VSQL 是一个在监督学习框架下的量子–经典混合算法。它使用了参数化量子电路(parameterized quantum circuit, PQC)和经典影子(classical shadow),和通常使用的变分量子算法(variational quantum alogorithm, VQA)不同的是,VSQL 只从子空间获取局部特征,而不是从整个希尔伯特空间获取特征。\n", "\n", "### 背景\n", "\n", "我们考虑一个 $k$ 分类问题。输入是一个由 $N$ 个带标签的数据点组成的集合 $D=\\left\\{(\\mathbf{x}^i, \\mathbf{y}^i)\\right\\}_{i=1}^{N}$,这里的 $\\mathbf{x}^i\\in\\mathbb{R}^{m}$ 表示数据点,$\\mathbf{y}^i$ 是这个数据点的标签。**学习过程的目的在于训练一个能尽可能准确预测所有数据点标签的模型 $\\mathcal{F}$。** 我们需要注意,$\\mathbf{y}^i$ 是一个长度为 $k$ 的独热向量(one-hot vector),将标签值表示为独热向量是机器学习领域中的常规做法。我们举一个例子,当 $k=3$ 时,$\\mathbf{y}^{a}=[1, 0, 0]^\\text{T}$、$\\mathbf{y}^{b}=[0, 1, 0]^\\text{T}$ 和 $\\mathbf{y}^{c}=[0, 0, 1]^\\text{T}$ 分别表示第 $a$ 个、第 $b$ 个和第 $c$ 个数据点属于第一类、第二类和第三类。 \n", "在 VSQL 中,$\\mathcal{F}$ 由两部分组成:参数化局部量子电路和经典全连接神经网络(fully-connected neural network, FCNN)。首先,我们需要预处理经典信息,将经典信息编码到量子态。然后我们以卷积的方式将 $U(\\mathbf{\\theta})$ 连续作用在局部量子比特上,这里的 $\\mathbf{\\theta}$ 是参数化局部电路的参数。在这些局部量子比特上进行测量就能得到我们想要的期望值。除此之外,我们还需要一个经典 FCNN 来做数据的后处理。 \n", "我们可以把 VSQL 得到的 $\\mathcal{F}$ 的输出写为 $\\tilde{\\mathbf{y}}^i = \\mathcal{F}(\\mathbf{x}^i)$。这里的 $\\tilde{\\mathbf{y}}^i$ 是一个概率分布,$\\tilde{y}^i_j$ 表示第 $i$ 个数据点属于第 $j$ 类的概率。为了尽可能地预测到准确的标签,我们将预测标签 $\\tilde{\\mathbf{y}}^i$ 和实际标签 $\\mathbf{y}^i$ 之间的累计差距作为损失函数 $\\mathcal{L}$ 进行优化: \n", "\n", "$$\n", "\\mathcal{L}(\\mathbf{\\theta}, \\mathbf{W}, \\mathbf{b}) = -\\frac{1}{N}\\sum\\limits_{i=1}^{N}\\sum\\limits_{j=1}^{k}y^i_j\\log{\\tilde{y}^i_j}, \\tag{1}\n", "$$\n", "\n", "其中 $\\mathbf{W}$ 和 $\\mathbf{b}$ 是经典 FCNN 的权重(weight)和偏置(bias)。注意这个损失函数是由交叉熵(cross-entropy)[2] 推导得到的。\n", "\n", "### 方案流程 \n", "\n", "![pipeline](./figures/vsql-fig-pipeline-cn.png \"图1:VSQL 的流程图\")\n", "
图1:VSQL 的流程图
\n", "\n", "这里我们给出实现 VSQL 的流程。\n", "\n", "1. 将经典数据 $\\mathbf{x}^i$ 编码到量子态 $\\left|\\mathbf{x}^i\\right>$。\n", "2. 准备一个参数化局部量子电路 $U(\\mathbf{\\theta})$ 并且初始化它的参数 $\\mathbf{\\theta}$。\n", "3. 在前几个量子比特上作用 $U(\\mathbf{\\theta})$,然后通过测量局部可观测量(比如说泡利 $X\\otimes X\\cdots \\otimes X$ 算符)来获取一个局部影子特征。\n", "4. 每次将 $U(\\mathbf{\\theta})$ 向下移动一个量子比特,重复步骤3直到 $U(\\mathbf{\\theta})$ 作用到最后一个量子比特上。\n", "5. 将步骤3–4中得到的所有局部影子特征传入经典 FCNN 并通过激活函数得到预测的标签 $\\tilde{\\mathbf{y}}^i$。对于多分类问题来说,我们使用归一化指数函数 (softmax) 作为激活函数。\n", "6. 重复步骤3–5直到数据集内所有的数据点都经过了处理。然后计算损失函数 $\\mathcal{L}(\\mathbf{\\theta}, \\mathbf{W}, \\mathbf{b})$。\n", "7. 通过梯度下降等优化方法调整参数 $\\mathbf{\\theta}$、$\\mathbf{W}$ 和 $\\mathbf{b}$ 的值,从而最小化损失函数。这样我们就得到了优化后的模型 $\\mathcal{F}$。\n", "\n", "由于 VSQL 只获取局部影子特征,所以它可以比较容易地在有拓扑连接限制的量子设备上实现。除此之外,因为我们用同一个 $U(\\mathbf{\\theta})$ 来获取整个电路上的局部影子特征,所以需要训练的参数数量相对于通常使用的变分量子分类器来说大大减少。" ] }, { "cell_type": "markdown", "id": "attempted-ticket", "metadata": {}, "source": [ "## Paddle Quantum 实现\n", "\n", "接下来,我们将用 VSQL 来完成手写数字图像二分类的问题。我们使用的数据来源于常用于基准测试的公开数据集 MNIST [3],其中包含了标签为从 '0' 到 '9' 的十个类别的数据。为了便于展示,我们这里只考虑二分类问题,使用标签为 '0' 和 '1' 的数据。 \n", "\n", "首先,导入所需要的语言包:" ] }, { "cell_type": "code", "execution_count": 1, "id": "senior-blues", "metadata": {}, "outputs": [], "source": [ "import time\n", "import numpy as np\n", "import paddle\n", "from paddle_quantum.circuit import UAnsatz\n", "from paddle.vision.datasets import MNIST\n", "import paddle.nn.functional as F\n", "\n", "paddle.set_default_dtype(\"float64\")" ] }, { "cell_type": "markdown", "id": "korean-beverage", "metadata": {}, "source": [ "### 数据的预处理\n", "\n", "每个手写数字图像都是由 $28\\times 28$ 个取值在 $[0, 255]$ 之间的灰度像素点组成。我们需要先把这个 $28\\times 28$ 的二维矩阵转化为一个长度为784的一维向量 $\\mathbf{x}^i$,然后再使用振幅编码将每个 $\\mathbf{x}^i$ 编码到有10个量子比特的量子态 $\\left|\\mathbf{x}^i\\right>$。为了进行振幅编码,我们需要归一化每个向量,再在它的尾部补零使每个向量的长度与有10个量子比特的量子态一致。" ] }, { "cell_type": "code", "execution_count": 2, "id": "surgical-breast", "metadata": {}, "outputs": [], "source": [ "# 归一化处理向量并在尾部补零\n", "def norm_img(images):\n", " new_images = [np.pad(np.array(i).flatten(), (0, 240), constant_values=(0, 0)) for i in images]\n", " new_images = [paddle.to_tensor(i / np.linalg.norm(i), dtype='complex128') for i in new_images]\n", " \n", " return new_images" ] }, { "cell_type": "code", "execution_count": 3, "id": "synthetic-holly", "metadata": {}, "outputs": [], "source": [ "def data_loading(n_train=1000, n_test=100):\n", " # 我们使用 PaddlePaddle 提供的 MNIST\n", " train_dataset = MNIST(mode='train')\n", " test_dataset = MNIST(mode='test')\n", " # 选出标签为0和1的数据点\n", " train_dataset = np.array([i for i in train_dataset if i[1][0] == 0 or i[1][0] == 1], dtype=object)\n", " test_dataset = np.array([i for i in test_dataset if i[1][0] == 0 or i[1][0] == 1], dtype=object)\n", " np.random.shuffle(train_dataset)\n", " np.random.shuffle(test_dataset)\n", " # 将手写数字图像和标签分开\n", " train_images = train_dataset[:, 0][:n_train]\n", " train_labels = train_dataset[:, 1][:n_train].astype('int64')\n", " test_images = test_dataset[:, 0][:n_test]\n", " test_labels = test_dataset[:, 1][:n_test].astype('int64')\n", " # 归一化向量并补零\n", " x_train = norm_img(train_images)\n", " x_test = norm_img(test_images)\n", " # 将标签处理为独热向量\n", " train_targets = np.array(train_labels).reshape(-1)\n", " y_train = paddle.to_tensor(np.eye(2)[train_targets])\n", " test_targets = np.array(test_labels).reshape(-1)\n", " y_test = paddle.to_tensor(np.eye(2)[test_targets])\n", " \n", " return x_train, y_train, x_test, y_test" ] }, { "cell_type": "markdown", "id": "dominant-hartford", "metadata": {}, "source": [ "### 搭建局部影子电路\n", "\n", "接下来我们要搭建电路。在讲电路的细节之前,我们需要说明几个参数:\n", "- $n$:编码后量子态的量子比特数目。\n", "- $n_{qsc}$:量子影子电路的宽度。我们每次只在连续 $n_{qsc}$ 个量子比特上作用 $U(\\mathbf{\\theta})$。\n", "- $D$:电路的深度,表示 $U(\\mathbf{\\theta})$ 门中某一层电路重复的次数。\n", "\n", "这里我们给出 $n=4$、$n_{qsc}=2$ 时的一个例子:\n", "\n", "我们首先在前两个量子比特上作用 $U(\\mathbf{\\theta})$,并且获取第一个影子特征 $O_1$。\n", "\n", "![qubit0](./figures/vsql-fig-qubit0-cn.png \"图2:获取第一个影子特征\")\n", "
图2: 获取第一个影子特征
\n", "\n", "然后我们准备一样的输入态 $\\left|\\mathbf{x}^i\\right>$,在中间两个量子比特上作用 $U(\\mathbf{\\theta})$,得到第二个影子特征 $O_2$。\n", "\n", "![qubit1](./figures/vsql-fig-qubit1-cn.png \"图3:获取第二个影子特征\")\n", "
图3: 获取第二个影子特征
\n", "\n", "最后,我们再准备一个一样的输入态,在最后两个量子比特上作用 $U(\\mathbf{\\theta})$,得到影子特征 $O_3$。这样我们就处理完了这个数据点!\n", "\n", "![qubit2](./figures/vsql-fig-qubit2-cn.png \"图4:获取第三个影子特征\")\n", "
图4: 获取第三个影子特征
\n", "\n", "通常来说,处理一个数据点需要重复以上步骤 $n - n_{qsc} + 1$ 次。有一点需要指出的是,在上面这个例子中我们只使用了一个影子电路,在获取这三个影子特征时我们使用同样的参数 $\\mathbf{\\theta}$。你可以选择增加影子电路的数量来解决更复杂的问题,这里需要注意的是不同影子电路中的参数 $\\mathbf{\\theta}$ 不同。 \n", " \n", "在后面的 MNIST 二分类任务中,我们将使用2–局部影子电路,即 $n_{qsc}=2$。图5展示了这个影子电路的结构。\n", "\n", "![2-local](./figures/vsql-fig-2-local.png \"图5:$n_{qsc}=2$ 时 $U(\\mathbf{\\theta})$ 的结构\")\n", "
图5:$n_{qsc}=2$ 时 $U(\\mathbf{\\theta})$ 的结构
\n", "\n", "为了增强量子电路的表达能力,我们将重复 $D$ 次虚线框中的结构。$U(\\mathbf{\\theta})$ 的设计并不是唯一的,这里展示的仅仅是一个例子,读者不妨尝试设计自己的电路结构。" ] }, { "cell_type": "code", "execution_count": 4, "id": "orange-american", "metadata": {}, "outputs": [], "source": [ "# 搭建 U(theta)\n", "def U_theta(theta, n, n_start, n_qsc=2, depth=1):\n", " # 初始化电路\n", " cir = UAnsatz(n)\n", " # 先搭建广义的旋转层\n", " for i in range(n_qsc):\n", " cir.rx(theta[0][0][i], n_start + i)\n", " cir.ry(theta[0][1][i], n_start + i)\n", " cir.rx(theta[0][2][i], n_start + i)\n", " # 搭建纠缠层和 Ry 旋转层,重复 D 次\n", " for repeat in range(1, depth + 1):\n", " for i in range(n_qsc - 1):\n", " cir.cnot([n_start + i, n_start + i + 1])\n", " cir.cnot([n_start + n_qsc - 1, n_start])\n", " for i in range(n_qsc):\n", " cir.ry(theta[repeat][1][i], n_start + i)\n", "\n", " return cir" ] }, { "cell_type": "markdown", "id": "scientific-agreement", "metadata": {}, "source": [ "当 $n_{qsc}$ 比较大的时候,我们可以通过对上述的2–局部影子电路进行扩展来搭建 $n_{qsc}$–局部影子电路。我们可以打印一个深度为2的4–局部影子电路来看看这种扩展是怎么进行的:" ] }, { "cell_type": "code", "execution_count": 5, "id": "shaped-location", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--Rx(1.591)----Ry(1.112)----Rx(1.673)----*--------------X----Ry(2.861)----*--------------X----Ry(3.252)--\n", " | | | | \n", "--Rx(1.569)----Ry(0.516)----Rx(3.077)----X----*---------|----Ry(4.816)----X----*---------|----Ry(3.264)--\n", " | | | | \n", "--Rx(0.560)----Ry(6.122)----Rx(1.436)---------X----*----|----Ry(6.038)---------X----*----|----Ry(2.531)--\n", " | | | | \n", "--Rx(1.425)----Ry(1.378)----Rx(5.048)--------------X----*----Ry(5.986)--------------X----*----Ry(1.309)--\n", " \n", "---------------------------------------------------------------------------------------------------------\n", " \n", "---------------------------------------------------------------------------------------------------------\n", " \n" ] } ], "source": [ "N = 6\n", "NSTART = 0\n", "NQSC = 4\n", "D = 2\n", "theta = paddle.uniform([D + 1, 3, NQSC], min=0.0, max=2 * np.pi)\n", "cir = U_theta(theta, N, NSTART, n_qsc=NQSC, depth=D)\n", "print(cir)" ] }, { "cell_type": "markdown", "id": "continued-february", "metadata": {}, "source": [ "### 影子特征\n", "\n", "在前面的教程中,我们多次提到影子特征,那么究竟什么是影子特征呢?我们可以认为它是一个从希尔伯特空间到经典空间的投影。有非常多这样的投影都可以作为影子特征。这里我们将在 $n_{qsc}$ 个量子比特上用泡利 $X\\otimes X\\otimes \\cdots \\otimes X$ 算符进行测量得到的期望值作为影子特征。在我们前面举的 $n=4$ 的例子中,在前两个量子比特上测量得到的 $O_1 = \\left$ 就是我们用影子电路获取的第一个影子特征。" ] }, { "cell_type": "code", "execution_count": 6, "id": "advanced-clinic", "metadata": {}, "outputs": [], "source": [ "# 构造用来获取影子特征的可观测量\n", "def observable(n_start, n_qsc=2):\n", " pauli_str = ','.join('x' + str(i) for i in range(n_start, n_start + n_qsc))\n", " \n", " return [[1.0, pauli_str]]" ] }, { "cell_type": "markdown", "id": "handled-halloween", "metadata": {}, "source": [ "### 用 FCNN 进行经典后处理\n", "\n", "我们将获得的所有影子特征传入一个经典的 FCNN,并使用归一化指数函数作为激活函数。经过归一化指数函数后得到的输出是一个概率分布,输出中的第 $i$ 个元素代表这个数据点属于第 $i$ 类的概率,我们认为预测的类别就是输出中概率最高的类别。为了提高预测的准确率,我们将预测标签和实际标签之间的累计差距作为损失函数进行优化:\n", "\n", "$$\n", "\\mathcal{L}(\\mathbf{\\theta}, \\mathbf{W}, \\mathbf{b}) = -\\frac{1}{N}\\sum\\limits_{i=1}^{N}\\sum\\limits_{j=1}^{k}y^i_j\\log{\\tilde{y}^i_j}.\\tag{2}\n", "$$" ] }, { "cell_type": "code", "execution_count": 7, "id": "certified-details", "metadata": {}, "outputs": [], "source": [ "class Net(paddle.nn.Layer):\n", " def __init__(self,\n", " n, # 全局量子比特的数量: n \n", " n_qsc, # 影子电路的宽度\n", " depth=1, # 电路深度\n", " seed=3, # 随机数种子\n", " ):\n", " super(Net, self).__init__()\n", "\n", " self.n = n\n", " self.n_qsc = n_qsc\n", " self.depth = depth\n", " # 初始化参数列表 theta,并用 [0, 2*pi] 的均匀分布来填充初始值\n", " self.theta = self.create_parameter(shape=[depth + 1, 3, n_qsc],\n", " default_initializer=paddle.nn.initializer.Uniform(low=0.0, high=2 * np.pi))\n", " # FCNN, 采用高斯分布来初始化权重和偏置\n", " self.fc = paddle.nn.Linear(n - n_qsc + 1, 2,\n", " weight_attr=paddle.ParamAttr(initializer=paddle.nn.initializer.Normal()),\n", " bias_attr=paddle.ParamAttr(initializer=paddle.nn.initializer.Normal()))\n", " \n", " # 定义前向传播机制、计算损失函数 和交叉验证正确率\n", " def forward(self, batch_in, label):\n", " # 量子部分\n", " dim = len(batch_in)\n", " features = []\n", " for state in batch_in:\n", " f_i = []\n", " for st in range(self.n - self.n_qsc + 1):\n", " ob = observable(st, n_qsc=self.n_qsc)\n", " cir = U_theta(self.theta, self.n, st, n_qsc=self.n_qsc)\n", " cir.run_state_vector(state)\n", " # 计算期望值\n", " f_ij = cir.expecval(ob)\n", " f_i.append(f_ij)\n", " f_i = paddle.concat(f_i)\n", " features.append(f_i)\n", " features = paddle.stack(features)\n", " # 经典部分\n", " outputs = self.fc(features)\n", " outputs = F.log_softmax(outputs)\n", " # 计算损失函数和准确率\n", " loss = -paddle.mean(paddle.sum(outputs * label)) \n", " is_correct = 0\n", " for i in range(dim):\n", " if paddle.argmax(label[i], axis=-1) == paddle.argmax(outputs[i], axis=-1):\n", " is_correct = is_correct + 1\n", " acc = is_correct / dim\n", " \n", " return loss, acc" ] }, { "cell_type": "code", "execution_count": 8, "id": "classified-saying", "metadata": {}, "outputs": [], "source": [ "def ShadowClassifier(N=4, n_qsc=2, D=1, EPOCH=4, LR=0.1, BATCH=1, seed=1, N_train=1000, N_test=100):\n", " # 加载数据\n", " x_train, y_train, x_test, y_test = data_loading(n_train=N_train, n_test=N_test)\n", " # 初始化神经网络\n", " net = Net(N, n_qsc, depth=D, seed=seed)\n", " # 一般来说,我们利用 Adam 优化器来获得相对好的收敛\n", " # 当然你可以改成 SGD 或者是 RMSprop\n", " opt = paddle.optimizer.Adam(learning_rate=LR, parameters=net.parameters())\n", "\n", " # 优化循环\n", " for ep in range(EPOCH):\n", " for itr in range(N_train // BATCH):\n", " # 前向传播计算损失函数和准确率\n", " loss, batch_acc = net(x_train[itr * BATCH:(itr + 1) * BATCH],\n", " y_train[itr * BATCH:(itr + 1) * BATCH])\n", " # 反向传播极小化损失函数\n", " loss.backward()\n", " opt.minimize(loss)\n", " opt.clear_grad()\n", " # 在测试集上测试\n", " if itr % 10 == 0:\n", " # 计算测试集上的正确率 test_acc\n", " loss_useless, test_acc = net(x_test[0:N_test],\n", " y_test[0:N_test])\n", " # 打印测试结果\n", " print(\"epoch:%3d\" % ep, \" iter:%3d\" % itr,\n", " \" loss: %.4f\" % loss.numpy(),\n", " \" batch acc: %.4f\" % batch_acc,\n", " \" test acc: %.4f\" % test_acc)" ] }, { "cell_type": "markdown", "id": "passing-diagram", "metadata": {}, "source": [ "接下来我们来看看实际的训练效果,整个训练过程大概需要八分钟:" ] }, { "cell_type": "code", "execution_count": 9, "id": "chicken-trash", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epoch: 0 iter: 0 loss: 13.1132 batch acc: 0.6000 test acc: 0.5900\n", "epoch: 0 iter: 10 loss: 11.3892 batch acc: 0.9000 test acc: 0.9600\n", "epoch: 0 iter: 20 loss: 8.4291 batch acc: 0.9500 test acc: 0.9700\n", "epoch: 1 iter: 0 loss: 8.3585 batch acc: 0.9000 test acc: 0.9700\n", "epoch: 1 iter: 10 loss: 6.9244 batch acc: 0.9500 test acc: 0.9800\n", "epoch: 1 iter: 20 loss: 6.0455 batch acc: 0.9500 test acc: 0.9800\n", "epoch: 2 iter: 0 loss: 5.4859 batch acc: 0.9500 test acc: 0.9800\n", "epoch: 2 iter: 10 loss: 5.1867 batch acc: 0.9500 test acc: 0.9800\n", "epoch: 2 iter: 20 loss: 5.3668 batch acc: 0.9500 test acc: 0.9700\n", "time used: 551.4094610214233\n" ] } ], "source": [ "time_st = time.time()\n", "ShadowClassifier(\n", " N=10, # 全局量子比特的数量: n \n", " n_qsc=2, # 影子电路的宽度\n", " D=1, # 采用的电路深度\n", " EPOCH=3, # 训练 epoch 轮数\n", " LR=0.02, # 设置学习速率\n", " BATCH=20, # 训练时 batch 的大小\n", " seed=1024, # 随机数种子\n", " N_train=500, # 规定训练集大小\n", " N_test=100 # 规定测试集大小\n", ")\n", "print(\"time used:\", time.time() - time_st)" ] }, { "cell_type": "markdown", "id": "second-placement", "metadata": {}, "source": [ "## 总结\n", "\n", "VSQL 是一个基于经典影子的量子–经典混合算法,它以卷积的方式获取局部影子特征。VSQL 结合了参数化量子电路 $U(\\mathbf{\\theta})$ 和经典 FCNN,在二分类任务中表现出色。在[量子分类器](./QClassifier_CN.ipynb)的教程里,我们介绍了一种常用的使用参数化量子电路的分类器。这种分类器在所有的量子比特上作用参数化电路,所以优化器必须搜索整个希尔伯特空间来找到最优的 $\\mathbf{\\theta}$。和这种方法不同的是,VSQL 每次只在少量选中的量子比特上作用参数化局部量子电路 $U(\\mathbf{\\theta})$。对于一个 $k$ 分类问题来说,VSQL 需要训练的是参数化局部量子电路中的参数和经典 FCNN 中的参数,加在一起总共是 $n_{qsc}D + [(n-n_{qsc}+1)+1]k$ 个参数。我们可以看到,和常用的变分量子分类器需要训练的参数个数 $nD$ 比起来,VSQL 参数化局部量子电路部分需要训练的参数数量 $n_{qsc}D$ 大大减少,也就是说,VSQL 需要的量子资源大大减少。" ] }, { "cell_type": "markdown", "id": "nonprofit-motel", "metadata": {}, "source": [ "---\n", "\n", "## 参考文献\n", "\n", "[1] Li, Guangxi, Zhixin Song, and Xin Wang. \"VSQL: Variational Shadow Quantum Learning for Classification.\" [Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 9. 2021.](https://ojs.aaai.org/index.php/AAAI/article/view/17016)\n", "\n", "[2] Goodfellow, Ian, et al. Deep learning. Vol. 1. No. 2. Cambridge: MIT press, 2016.\n", "\n", "[3] LeCun, Yann. \"The MNIST database of handwritten digits.\" http://yann.lecun.com/exdb/mnist/ (1998)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 5 }