diff --git a/graph_convolutional_network/README.md b/graph_convolutional_network/README.md
index 64ebb9e4a1292d9b6b727518ab7b71879ed94a74..7aa30925a4b23d93d8296b84fb5c550a2d11bc46 100644
--- a/graph_convolutional_network/README.md
+++ b/graph_convolutional_network/README.md
@@ -1,6 +1,10 @@
-# Graph_Convolutional_Network
+# Graph Convolutional Network
 
-## Graph_Convolutional_Network Theory
+## Experiment Introduction
+
+The Graph Convolutional Network (GCN) is a neural network architecture that has become popular in recent years. Unlike traditional models such as LSTM and CNN, which only handle grid-based data, a GCN can process data with a general topological graph structure and mine its features and patterns in depth.
+
+This experiment walks through training a graph convolutional network with MindSpore on the downloaded Cora and Citeseer datasets.
 
 ### Origin of Graph Convolutional Networks
 
 > 1. In Euclidean spaces, typified by images, every node has a fixed number of neighbors; the green node, say, always has 8 neighbors. In a non-Euclidean space such as a graph, the number of neighbors is not fixed: here the green node has 2 neighbors, while other nodes may have 5.
 > 2. Convolution in Euclidean space uses a fixed-size, learnable kernel to extract pixel features. In non-Euclidean space, because a node's neighbors are not fixed, a traditional convolution kernel cannot be applied directly to extract node features on a graph.
 
-![GCN](images/gcn1.png)
+![Graph data](images/gcn1.png)
 
-To solve the problem that traditional convolution cannot be applied directly to non-Euclidean data whose nodes have varying numbers of neighbors.
+To solve the problem that traditional convolution cannot be applied directly to non-Euclidean data whose nodes have varying numbers of neighbors, there are currently two mainstream approaches:
 
 1. Devise a way to transform a graph in non-Euclidean space into Euclidean space.
-2. Find a convolution kernel that can handle a variable number of neighbor nodes and use it to extract features on the graph. (Graph convolution belongs to this kind, later divided into spatial-domain-based and spectral-domain-based)
+2. Find a convolution kernel that can handle a variable number of neighbor nodes and use it to extract features on the graph. Graph convolution belongs to this kind, and is further divided into spatial-domain and spectral-domain methods.
 
 ### Overview of Graph Convolutional Networks
 
-The essential purpose of GCN is to extract the spatial features of a topological graph.  Graph convolutional networks fall into two main classes: one based on the spatial domain (or vertex domain), the other based on the frequency domain (or spectral domain). GCN is a spectral-domain graph convolutional network.
+The essential purpose of GCN is to extract the spatial features of a topological graph. Graph convolutional networks fall into two main classes: one based on the spatial domain (or vertex domain), the other based on the frequency domain (or spectral domain). GCN is a spectral-domain graph convolutional network.
 
-Methods based on spatial convolution define the convolution operation directly on each node's connections, which is closer to the convolution in traditional convolutional neural networks. Representative methods in this category include  Message Passing Neural Networks (MPNN), GraphSage, Diffusion Convolution Neural Networks (DCNN), PATCHY-SAN, etc.
+Spatial-domain methods define the convolution operation directly on each node's connections, which is closer to the convolution in traditional convolutional neural networks. Representative methods in this category include Message Passing Neural Networks (MPNN), GraphSage, Diffusion Convolution Neural Networks (DCNN), PATCHY-SAN, etc.
 
 Spectral methods implement convolution on topological graphs with the help of spectral graph theory. Along the research timeline: scholars of GSP (graph signal processing) first defined the Fourier transform on graphs, then defined convolution on graphs, and finally combined this with deep learning to propose the Graph Convolutional Network (GCN).
 
@@ -34,12 +38,13 @@ The essential purpose of GCN is to extract the spatial features of a topological graph.
 Process:
 
-1. Define the Fourier Transformation on the graph
-2. Define convolution on the graph
+1. Define the Fourier Transformation on the graph;
+2. Define convolution on the graph.
 
 #### The Fourier Transform on Graphs
 
 The classical Fourier transform:
+
 $$
 F(w) = \int f(t)e^{-jwt}dt
 $$
 
@@ -48,11 +53,13 @@
 Note: definition of an eigenvector: in the eigenvalue equation $ AV= \lambda V $, $V$ is an eigenvector and $\lambda$ is the eigenvalue.
 
-Since graph nodes are discrete, we discretize the Fourier transform above and approximate F(w) with finitely many components , obtaining the Fourier transform on the graph:
+Since graph nodes are discrete, we discretize the Fourier transform above and approximate F(w) with finitely many components, obtaining the Fourier transform on the graph:
 
-​ $ F(\lambda _l) = \hat {f(\lambda _l)} = \sum_{i=1}^{N}{f(i)}u_l(i)i $
+$$
+F(\lambda_l) = \hat{f}(\lambda_l) = \sum_{i=1}^{N} f(i) u_l(i)
+$$
 
-where $u_l(i)$ is the basis function (an eigenvector), $\lambda _l$ is the eigenvalue corresponding to $u_l(i)$, and $f(i)$ is $f$ at eigenvalue $\lambda _l$. That is, the Fourier transform of $f$ at eigenvalue $\lambda _l$ is the inner product of that component with the eigenvector $u_l$ corresponding to $\lambda _l$. Here the Laplacian vector $u_l$ replaces the Fourier basis $ e^{-jwt} $; the Laplacian matrix is defined later.
+where $u_l(i)$ is the $i$-th component of the basis eigenvector $u_l$, $\lambda_l$ is the eigenvalue corresponding to $u_l$, and $f(i)$ is the value of $f$ at node $i$. That is, the Fourier transform of $f$ at eigenvalue $\lambda_l$ is the inner product of $f$ with the eigenvector $u_l$ corresponding to $\lambda_l$. Here the orthogonal Laplacian eigenvector $u_l$ replaces the Fourier basis $e^{-jwt}$; the Laplacian matrix is defined below.
 
 Using matrix multiplication, the Fourier transform on the graph generalizes to matrix form:
 
@@ -63,11 +70,12 @@ $$
 \left[ \begin{matrix} {f(1)} \\ {f(2)} \\ ... \\ {f(N)} \end{matrix} \right]
 $$
 
-That is, the matrix form of the Fourier transform of f on the graph is: $\hat f = U^Tf$
+That is:
 
-​ The matrix form of the inverse Fourier transform of f on the graph is: $f = U\hat f$
+- The matrix form of the Fourier transform of f on the graph is: $\hat f = U^Tf$
+- The matrix form of the inverse Fourier transform of f on the graph is: $f = U\hat f$
 
-where $U$ is the eigenvector matrix, the orthogonal basis from the Laplacian matrix (we will get to the Laplacian matrix later). It is a symmetric orthogonal matrix, so $U^T=U^{-1}$
+where $U$ is the eigenvector matrix, the orthogonal eigenbasis of the Laplacian matrix (introduced below). Since the Laplacian is real symmetric, $U$ is orthogonal, so $U^T=U^{-1}$.
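+
+As a concrete aside (not part of the original experiment code), the sketch below uses NumPy to build the Laplacian of a small made-up graph, eigendecompose it, and apply $\hat f = U^T f$ and $f = U \hat f$; the graph and signal values are toy assumptions.
+
+```python
+import numpy as np
+
+# Toy undirected 4-node graph and a scalar signal f on its nodes
+A = np.array([[0, 1, 0, 1],
+              [1, 0, 1, 0],
+              [0, 1, 0, 1],
+              [1, 0, 1, 0]], dtype=float)
+f = np.array([1.0, 2.0, 3.0, 4.0])
+
+D = np.diag(A.sum(axis=1))     # degree matrix
+L = D - A                      # combinatorial Laplacian
+lam, U = np.linalg.eigh(L)     # L is symmetric, so U is orthogonal: U.T == inv(U)
+
+f_hat = U.T @ f                # graph Fourier transform
+f_rec = U @ f_hat              # inverse transform recovers f
+print(np.allclose(f, f_rec))   # True
+```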
 
 #### Graph Convolution
 
@@ -119,15 +127,9 @@
 U is taken to be the graph's degree matrix (the degree matrix in the figure below), and H the graph's adjacency matrix (the adjacency matrix in the figure below).
 
-![GCN](images/gcn2.png)
+![png](images/gcn2.png)
 
-[Graph_Convolutional_Network theory cited from the paper]: https://arxiv.org/pdf/1609.02907.pdf
-
-## Experiment Introduction
-
-The Graph Convolutional Network (GCN) is a neural network architecture that has become popular in recent years. Unlike traditional models such as LSTM and CNN, which only handle grid-based data, a GCN can process data with a general topological graph structure and mine its features and patterns in depth.
-
-This experiment mainly introduces training a graph convolutional network with MindSpore on the downloaded Cora and Citeseer datasets.
+[1] The Graph Convolutional Network theory above is taken from the paper: https://arxiv.org/pdf/1609.02907.pdf
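+
+To make the layer rule concrete, here is a hedged NumPy sketch of the renormalization trick from the paper cited above, $\hat A = \tilde D^{-1/2}(A+I)\tilde D^{-1/2}$, followed by one propagation step $\mathrm{ReLU}(\hat A H W)$; all matrices are toy values and `W` merely stands in for a learnable weight.
+
+```python
+import numpy as np
+
+A = np.array([[0, 1, 0],
+              [1, 0, 1],
+              [0, 1, 0]], dtype=float)   # toy adjacency matrix
+H = np.eye(3)                            # toy node features
+W = np.random.randn(3, 2)                # stand-in for a learnable weight
+
+A_tilde = A + np.eye(3)                  # add self-loops
+d = A_tilde.sum(axis=1)
+D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
+A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt
+
+H_next = np.maximum(0, A_hat @ H @ W)    # one GCN layer: ReLU(A_hat H W)
+print(H_next.shape)                      # (3, 2)
+```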
 
 ## Experiment Objectives
 
@@ -149,43 +151,47 @@ U is taken to be the graph's degree matrix (the degree matrix in the figure below), and H the graph's adjacency matrix
 ### Dataset Preparation
 
-Download the Cora or Citeseer dataset provided by kimiyoung/planetoid from [github](https://github.com/kimiyoung/planetoid). This is an implementation of Planetoid; the following paper proposes a graph-based semi-supervised learning method: [Revisiting Semi-Supervised Learning with Graph Embeddings](https://arxiv.org/abs/1603.08861)
+Cora and CiteSeer are datasets commonly used with graph neural networks; the official dataset site is [LINQS Datasets](https://linqs.soe.ucsc.edu/data).
+
+The Cora dataset contains 2708 scientific publications grouped into seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding dictionary word. The dictionary contains 1433 unique words. The README file in the dataset provides more details.
+
+The CiteSeer dataset contains 3312 scientific publications grouped into six classes. The citation network consists of 4732 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding dictionary word. The dictionary contains 3703 unique words. The README file in the dataset provides more details.
+
+This experiment uses the preprocessed and pre-split data from [kimiyoung/planetoid](https://github.com/kimiyoung/planetoid/tree/master/data) on GitHub.
 
 Place the dataset under the required path; the folder should contain the following files:
 
 ```
-└─data
-  ├─ind.cora.allx
-  ├─ind.cora.ally
-  ├─...
-  ├─ind.cora.test.index
-  ├─trans.citeseer.tx
-  ├─trans.citeseer.ty
-  ├─...
-  └─trans.pubmed.y
+data
+├── ind.cora.allx
+├── ind.cora.ally
+├── ...
+├── ind.cora.test.index
+├── trans.citeseer.tx
+├── trans.citeseer.ty
+├── ...
+└── trans.pubmed.y
 ```
 
-The model inputs include:
+The inputs for the inductive model include:
 
-- `x`, the feature vectors marking training instances,
-- `y`, the popular labels of the labeled training instances,
-- `allx`, the feature vectors of labeled and unlabeled training instances (of the superset `x`),
-- `graph`, `dict` in the format `{index: [index_of_neighbor_nodes]}.`
+- `x`, the feature vectors of the labeled training instances,
+- `y`, the one-hot labels of the labeled training instances,
+- `allx`, the feature vectors of both labeled and unlabeled training instances (a superset of `x`),
+- `graph`, a `dict` in the format `{index: [index_of_neighbor_nodes]}.`
 
-Let n be the number of labeled and unlabeled training instances. These n instances should be indexed from 0 to n-1 `graph`, in the same order as in `allx`.
+Let n be the number of labeled and unlabeled training instances. In `graph`, these n instances should be indexed from 0 to n-1, in the same order as in `allx`.
 
 Besides `x`, `y`, `allx`, and `graph` described above, the preprocessed dataset also includes:
 
 - `tx`, the feature vectors of the test instances,
-- `ty`, the popular labels of the test instances,
-- `test.index`, `graph` for the inductive setting, the indices of the test instances in,
-- `ally`, the labels of instances in `allx`.
-
-In the transductive setting, the indices of the test instances in `graph` are from `#x` to `#x + #tx - 1`, in the same order as in `tx`.
+- `ty`, the one-hot labels of the test instances,
+- `test.index`, the indices of the test instances in `graph`,
+- `ally`, the labels for the instances in `allx`.
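+
+For orientation only, the snippet below shows one way these files could be inspected; the planetoid files are pickled SciPy/NumPy objects and `ind.cora.test.index` is plain text. The paths and the `latin1` unpickling encoding are assumptions based on the planetoid repository, not code from this experiment.
+
+```python
+import pickle
+import numpy as np
+
+with open('data/ind.cora.allx', 'rb') as f:
+    allx = pickle.load(f, encoding='latin1')    # scipy.sparse feature matrix
+with open('data/ind.cora.graph', 'rb') as f:
+    graph = pickle.load(f, encoding='latin1')   # dict {index: [neighbor indices]}
+
+test_idx = np.loadtxt('data/ind.cora.test.index', dtype=int)
+print(allx.shape, len(graph), test_idx.min(), test_idx.max())
+```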
 
 ### Script Preparation
 
-Download the GCN code from the [MindSpore model_zoo](https://gitee.com/mindspore/mindspore/tree/r0.5/model_zoo/gcn); download the scripts for this experiment on the [course gitee repository](https://gitee.com/mindspore/course).
+Download the GCN code from the [MindSpore model_zoo](https://gitee.com/mindspore/mindspore/tree/r0.5/model_zoo/gcn), and download the scripts for this experiment from the [course gitee repository](https://gitee.com/mindspore/course).
 
 ### Upload Files
 
@@ -195,16 +201,15 @@
 experiment
 ├── data
 ├── graph_to_mindrecord
-│   ├── citeseer
+│  ├── citeseer
 │  ├── cora
 │  ├── graph_map_schema.py
-│   ├── writer.py
+│  └── writer.py
 │── src
-│   ├── config.py
-│   ├── dataset.py
-│   ├── gcn.py
-│   ├── metrics.py
-│── README.md
+│  ├── config.py
+│  ├── dataset.py
+│  ├── gcn.py
+│  └── metrics.py
 └── main.py
 ```
 
@@ -214,52 +219,26 @@
 #### Data Processing
 
-Generate a mindrecord-format dataset from Cora or Citeseer. (Set `DATASET_NAME` in cfg to cora or citeseer to convert a different dataset)
+`graph_to_mindrecord/writer.py` converts the Cora or Citeseer dataset to mindrecord format, which improves the performance of dataset reading and processing. (Set `DATASET_NAME` in the cfg in `main.py` to cora or citeseer to switch datasets.)
 
 ```python
-def run(cfg):
-    args = read_args()
-    # create the output folder
-    cur_path = os.getcwd()
-    M_PATH = os.path.join(cur_path, cfg.MINDRECORD_PATH)
-    if os.path.exists(M_PATH):
-        shutil.rmtree(M_PATH)  # remove the folder
-    os.mkdir(M_PATH)
-    cfg.SRC_PATH = os.path.join(cur_path, cfg.SRC_PATH)
-    # arguments
-    args.mindrecord_script= cfg.DATASET_NAME
-    args.mindrecord_file=os.path.join(cfg.MINDRECORD_PATH,cfg.DATASET_NAME)
-    args.mindrecord_partitions=cfg.mindrecord_partitions
-    args.mindrecord_header_size_by_bit=cfg.mindrecord_header_size_by_bit
-    args.mindrecord_page_size_by_bit=cfg.mindrecord_header_size_by_bit
-    args.graph_api_args=cfg.SRC_PATH
-
-    start_time = time.time()
-    # pass mr_api arguments
-    os.environ['graph_api_args'] = args.graph_api_args
-
-    try:
-        mr_api = import_module('graph_to_mindrecord.'+args.mindrecord_script + '.mr_api')
-    except ModuleNotFoundError:
-        raise RuntimeError("Unknown module path: {}".format(args.mindrecord_script + '.mr_api'))
-
-    # init graph schema
-    graph_map_schema = GraphMapSchema()
-
-    num_features, feature_data_types, feature_shapes = mr_api.node_profile
-    graph_map_schema.set_node_feature_profile(num_features, feature_data_types, feature_shapes)
-
-    num_features, feature_data_types, feature_shapes = mr_api.edge_profile
-    graph_map_schema.set_edge_feature_profile(num_features, feature_data_types, feature_shapes)
-
-    graph_schema = graph_map_schema.get_schema()
+# init writer
+writer = init_writer(graph_schema)
+
+# write nodes data
+mindrecord_dict_data = mr_api.yield_nodes
+run_parallel_workers()
+
+# write edges data
+mindrecord_dict_data = mr_api.yield_edges
+run_parallel_workers()
 ```
 
-### Parameter Configuration
+#### Parameter Configuration
 
-Training parameters can be set in config.py.
+Training parameters can be set in `src/config.py`.
 
-```shell
+```python
 "learning_rate": 0.01,            # Learning rate
 "epochs": 200,                    # Epoch sizes for training
 "hidden1": 16,                    # Hidden size for the first graph convolution layer
@@ -268,12 +247,98 @@
 "early_stopping": 10,             # Tolerance for early stopping
 ```
 
-### Run Training
+#### Model Definition
 
-```shell
-def train(args_opt):
-    """Train model."""
-    np.random.seed(args_opt.seed)
+The graph convolutional network and the graph convolution operator it depends on are defined in `src/gcn.py`.
+
+The graph convolution operator is implemented on top of the MindSpore `nn.Dense()` and `P.MatMul()` operators.
+
+```python
+class GraphConvolution(nn.Cell):
+    """
+    GCN graph convolution layer.
+
+    Args:
+        feature_in_dim (int): The input feature dimension.
+        feature_out_dim (int): The output feature dimension.
+        dropout_ratio (float): Dropout ratio for the dropout layer. Default: None.
+        activation (str): Activation function applied to the output of the layer, e.g. 'relu'. Default: None.
+
+    Inputs:
+        - **adj** (Tensor) - Tensor of shape :math:`(N, N)`.
+        - **input_feature** (Tensor) - Tensor of shape :math:`(N, C)`.
+
+    Outputs:
+        Tensor, output tensor.
+    """
+
+    def __init__(self,
+                 feature_in_dim,
+                 feature_out_dim,
+                 dropout_ratio=None,
+                 activation=None):
+        super(GraphConvolution, self).__init__()
+        self.in_dim = feature_in_dim
+        self.out_dim = feature_out_dim
+        self.weight_init = glorot([self.out_dim, self.in_dim])
+        self.fc = nn.Dense(self.in_dim,
+                           self.out_dim,
+                           weight_init=self.weight_init,
+                           has_bias=False)
+        self.dropout_ratio = dropout_ratio
+        if self.dropout_ratio is not None:
+            self.dropout = nn.Dropout(keep_prob=1-self.dropout_ratio)
+        self.dropout_flag = self.dropout_ratio is not None
+        self.activation = get_activation(activation)
+        self.activation_flag = self.activation is not None
+        self.matmul = P.MatMul()
+
+    def construct(self, adj, input_feature):
+        dropout = input_feature
+        if self.dropout_flag:
+            dropout = self.dropout(dropout)
+
+        fc = self.fc(dropout)
+        output_feature = self.matmul(adj, fc)
+
+        if self.activation_flag:
+            output_feature = self.activation(output_feature)
+        return output_feature
+```
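+
+As a quick smoke test (not from the original scripts), a single layer can be exercised with random tensors. The shapes follow the docstring above; the identity matrix stands in for a precomputed normalized adjacency, and `GraphConvolution` with its helpers is assumed to be importable from `src/gcn.py`.
+
+```python
+import numpy as np
+from mindspore import Tensor
+
+adj = Tensor(np.eye(5, dtype=np.float32))                  # stand-in normalized adjacency, 5 nodes
+feat = Tensor(np.random.randn(5, 8).astype(np.float32))    # 8 input features per node
+
+layer = GraphConvolution(8, 4, dropout_ratio=None, activation="relu")
+out = layer(adj, feat)
+print(out.shape)   # (5, 4)
+```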
+ """ + + def __init__(self, config, adj, feature, output_dim): + super(GCN, self).__init__() + self.adj = Tensor(adj) + self.feature = Tensor(feature) + input_dim = feature.shape[1] + self.layer0 = GraphConvolution(input_dim, config.hidden1, activation="relu", dropout_ratio=config.dropout) + self.layer1 = GraphConvolution(config.hidden1, output_dim, dropout_ratio=None) + + def construct(self): + output0 = self.layer0(self.adj, self.feature) + output1 = self.layer1(self.adj, output0) + return output1 +``` + +#### 模型训练 + +训练和验证的主要逻辑在`main.py`中。包括数据集、网络、训练函数和验证函数的初始化,以及训练逻辑的控制。 + +```python config = ConfigGCN() adj, feature, label = get_adj_features_labels(args_opt.data_dir) @@ -305,10 +370,9 @@ def train(args_opt): eval_accuracy = eval_result[1].asnumpy() loss_list.append(eval_loss) - if epoch%10==0: - print("Epoch:", '%04d' % (epoch), "train_loss=", "{:.5f}".format(train_loss), - "train_acc=", "{:.5f}".format(train_accuracy), "val_loss=", "{:.5f}".format(eval_loss), - "val_acc=", "{:.5f}".format(eval_accuracy), "time=", "{:.5f}".format(time.time() - t)) + print("Epoch:", '%04d' % (epoch + 1), "train_loss=", "{:.5f}".format(train_loss), + "train_acc=", "{:.5f}".format(train_accuracy), "val_loss=", "{:.5f}".format(eval_loss), + "val_acc=", "{:.5f}".format(eval_accuracy), "time=", "{:.5f}".format(time.time() - t)) if epoch > config.early_stopping and loss_list[-1] > np.mean(loss_list[-(config.early_stopping+1):-1]): print("Early stopping...") @@ -321,55 +385,9 @@ def train(args_opt): test_accuracy = test_result[1].asnumpy() print("Test set results:", "loss=", "{:.5f}".format(test_loss), "accuracy=", "{:.5f}".format(test_accuracy), "time=", "{:.5f}".format(time.time() - t_test)) - -if __name__ == '__main__': - #------------------------定义变量------------------------------ - parser = argparse.ArgumentParser(description='GCN') - parser.add_argument('--data_url', type=str, default='./data', help='Dataset directory') - args_opt = parser.parse_args() - - dataname = 'cora' - datadir_save = './data_mr' - datadir = os.path.join(datadir_save, dataname) - cfg = edict({ - 'SRC_PATH': './data', - 'MINDRECORD_PATH': datadir_save, - 'DATASET_NAME': dataname, # citeseer,cora - 'mindrecord_partitions':1, - 'mindrecord_header_size_by_bit' : 18, - 'mindrecord_page_size_by_bit' : 20, - - 'data_dir': datadir, - 'seed' : 123, - 'train_nodes_num':140, - 'eval_nodes_num':500, - 'test_nodes_num':1000 - }) - - #转换数据格式 - print("============== Graph To Mindrecord ==============") - run(cfg) - #训练 - print("============== Starting Training ==============") - train(cfg) -``` - -MindSpore暂时没有提供直接访问OBS数据的接口,需要通过MoXing提供的API与OBS交互。将OBS中存储的数据拷贝至执行容器: - -```python -import moxing as mox -mox.file.copy_parallel(args.data_url, dst_url='./data') -``` - -将训练模型Checkpoint从执行容器拷贝至OBS: - -```python -mox.file.copy_parallel(src_url='data_mr', dst_url=cfg.MINDRECORD_PATH) ``` -### 实验结果 - -训练结果将存储在脚本路径中,该路径的文件夹名称以“ train”开头。可以在日志中找到类似以下结果。 +训练结果将存储在脚本路径中,该路径的文件夹名称以“train”开头。可以在日志中找到类似以下结果。 ```shell Epoch: 0000 train_loss= 1.95401 train_acc= 0.12143 val_loss= 1.94917 val_acc= 0.31400 time= 36.95478 @@ -406,17 +424,24 @@ Test set results: loss= 1.01702 accuracy= 0.81400 time= 6.51215 ```python import argparse parser = argparse.ArgumentParser() -parser.add_argument('--data_url', required=False, default=None, help='Location of data.') -parser.add_argument('--train_url', required=False, default=None, help='Location of training outputs.') -args = parser.parse_args() -dataset = args.dataset +parser.add_argument('--data_url', required=True, 
-
-### Experiment Results
-
-The training results are stored in the script path, in a folder whose name starts with " train". Results similar to the following can be found in the log.
+The training results are stored in the script path, in a folder whose name starts with "train". Results similar to the following can be found in the log.
 
 ```shell
 Epoch: 0000 train_loss= 1.95401 train_acc= 0.12143 val_loss= 1.94917 val_acc= 0.31400 time= 36.95478
 ...
 Test set results: loss= 1.01702 accuracy= 0.81400 time= 6.51215
 ```
 
@@ -406,17 +424,24 @@
 ```python
 import argparse
 parser = argparse.ArgumentParser()
-parser.add_argument('--data_url', required=False, default=None, help='Location of data.')
-parser.add_argument('--train_url', required=False, default=None, help='Location of training outputs.')
-args = parser.parse_args()
-dataset = args.dataset
+parser.add_argument('--data_url', required=True, default=None, help='Location of data.')
+parser.add_argument('--train_url', required=True, default=None, help='Location of training outputs.')
+args, unknown = parser.parse_known_args()
 ```
 
-MindSpore does not yet provide an interface for accessing OBS data directly; interaction with OBS goes through the API provided by MoXing. Copy the data stored in OBS into the execution container (see `start.py`):
+MindSpore does not yet provide an interface for accessing OBS data directly; interaction with OBS goes through the API provided by MoXing. Copy the data stored in OBS into the execution container:
+
+```python
+import moxing
+moxing.file.copy_parallel(src_url=args.data_url, dst_url='./data')
+```
+
+To copy training outputs (such as model Checkpoints) from the execution container to OBS, refer to:
 
 ```python
-import moxing as mox
-mox.file.copy_parallel(src_url=args.data_url, dst_url='data/')
+import os
+import moxing
+# dst_url takes the form 's3://OBS/PATH'; after the copy, the ckpt directory appears under `args.train_url` in OBS
+moxing.file.copy_parallel(src_url='ckpt', dst_url=os.path.join(args.train_url, 'ckpt'))
 ```
 
 ### Create a Training Job
 
@@ -427,7 +452,7 @@
 - Algorithm source: Preset frameworks -> Ascend-Powered-Engine -> MindSpore
 - Code directory: select the experiment directory in the OBS bucket created above
-- Boot file: select `start.py` under the experiment directory in the OBS bucket created above
+- Boot file: select `main.py` under the experiment directory in the OBS bucket created above
 - Data source: data storage location -> select the data directory under the experiment directory in the OBS bucket created above
- Training output location: select the experiment directory in the OBS bucket created above and create an output directory in it
 - Job log path: same as the training output location
 
@@ -441,3 +466,6 @@
 3. Click the running training job; the expanded window shows the job configuration and the training logs, which keep refreshing. After the job completes, the logs can also be downloaded for local inspection;
 4. Following the code walkthrough above, find the corresponding printed messages in the log to check whether the experiment succeeded.
 
+## Experiment Summary
+
+This experiment showed how to run GCN on the Cora and Citeseer datasets with MindSpore; GCN handles and learns from graph-structured data well.
diff --git a/nlp_lstm/README.md b/lstm/README.md
similarity index 76%
rename from nlp_lstm/README.md
rename to lstm/README.md
index 912e996e561d0742f1439e9df25fef478b7cb8b8..0369160b5e8c50e49eb62b57daef83bb4296870e 100644
--- a/nlp_lstm/README.md
+++ b/lstm/README.md
@@ -14,17 +14,13 @@
 RNN is a chain of many repeated neural network modules. In a standard RNN these repeated structures are usually very simple, for example containing only a single tanh layer.
 
 ![LSTM1](./images/LSTM1.png)
 
-​ **The repeating module in a standard RNN contains a single tanh layer**
-
 An LSTM has a similar chain structure, but its repeating module is different: it consists of four neural network layers interacting in a special way.
 
 ![LSTM2](./images/LSTM2.png)
 
-​ **LSTM diagram**
-
 First, let's look at the symbols in the figure:
 
-![LSTM3](./IMAGES/LSTM3.png)
+![LSTM3](./images/LSTM3.png)
 
 In the diagram, each line carries a complete vector from the output of one node to the inputs of others. The pink circles are pointwise operations, such as summation, while the yellow boxes are neural network layers to be learned. Merging lines denote concatenation, and a forking line denotes information being copied, with the copies going to different places.
 
@@ -66,19 +62,10 @@
 The sigmoid layer outputs numbers between 0 and 1, and the pointwise multiplication decides how much information passes through: at 0 nothing passes, at 1 everything does.
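+
+To tie the gate descriptions together, here is a hedged NumPy sketch of a single LSTM step, with sigmoid for the forget/input/output gates and tanh for the candidate state; the weights are random stand-ins, not trained parameters.
+
+```python
+import numpy as np
+
+def sigmoid(x):
+    return 1.0 / (1.0 + np.exp(-x))
+
+def lstm_step(x, h, c, W, b):
+    """One LSTM time step; W maps [h, x] to the four stacked gate pre-activations."""
+    z = W @ np.concatenate([h, x]) + b
+    H = h.size
+    f = sigmoid(z[0:H])        # forget gate: how much of c to keep
+    i = sigmoid(z[H:2*H])      # input gate: how much of the candidate to write
+    g = np.tanh(z[2*H:3*H])    # candidate cell state
+    o = sigmoid(z[3*H:4*H])    # output gate
+    c_new = f * c + i * g
+    h_new = o * np.tanh(c_new)
+    return h_new, c_new
+
+rng = np.random.default_rng(0)
+H, X = 4, 3
+W, b = rng.normal(size=(4 * H, H + X)), np.zeros(4 * H)
+h, c = lstm_step(rng.normal(size=X), np.zeros(H), np.zeros(H), W, b)
+print(h.shape, c.shape)   # (4,) (4,)
+```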
 
 ### Dataset Introduction
 
-IMDB is a movie-related website broadly similar to Douban in China, and the dataset used in this experiment consists of user reviews from that site. The IMDB dataset contains 50000 movie reviews, 25000 each for training and testing. Each review is labeled as positive or negative, so ffthis experiment can be treated as a binary classification problem.
+IMDB is a movie-related website broadly similar to Douban in China, and the dataset used in this experiment consists of user reviews from that site. The IMDB dataset contains 50000 movie reviews, 25000 each for training and testing. Each review is labeled as positive or negative, so this experiment can be treated as a binary classification problem. Official IMDB dataset site: [Large Movie Review Dataset](http://ai.stanford.edu/~amaas/data/sentiment/).
 
-l Obtain from Huawei Cloud Object Storage Service (OBS)
-
-Huawei Cloud provides the corresponding OBS data storage service; the dataset can be downloaded directly via the link.
-
-[Dataset link]: https://obs-deeplearning.obs.cn-north-1.myhuaweicloud.com/obs-80d2/aclImdb_v1.tar.gz
-
-l Obtain from the Stanford University website
-
-[Dataset link]: http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
-
+- Option 1: download [aclImdb_v1.tar.gz](http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz) from the Stanford website and extract it.
+- Option 2: download [aclImdb_v1.tar.gz](https://obs-deeplearning.obs.cn-north-1.myhuaweicloud.com/obs-80d2/aclImdb_v1.tar.gz) from Huawei Cloud OBS and extract it.
 
 ## Experiment Objectives
 
@@ -101,7 +88,7 @@
 ### Dataset Preparation
 
-The [IMDB review dataset](http://ai.stanford.edu/~amaas/data/sentiment/) is used as the experiment data. We also need to download the [GloVe](http://nlp.stanford.edu/data/glove.6B.zip) file and add a new line `400000 300` at the beginning of glove.6B.300d.txt, meaning that 400000 words in total will be read, each represented by a 300-dimensional word vector.
+The [IMDB review dataset](http://ai.stanford.edu/~amaas/data/sentiment/) is used as the experiment data. We also need to download the [GloVe](http://nlp.stanford.edu/data/glove.6B.zip) file and add a new line `400000 200` at the beginning of glove.6B.200d.txt, meaning that 400000 words in total will be read, each represented by a 200-dimensional word vector.
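+
+The header line can be added with a few lines of Python instead of a text editor; this is just a convenience sketch, and the path assumes the file sits in a local `glove` directory. The `400000 200` header is the vocabulary-size/dimension line that gensim's word2vec-format loader expects.
+
+```python
+# Prepend "400000 200" (word count and vector dimension) if it is not already there
+path = 'glove/glove.6B.200d.txt'
+with open(path, 'r', encoding='utf-8') as f:
+    content = f.read()
+if not content.startswith('400000 200'):
+    with open(path, 'w', encoding='utf-8') as f:
+        f.write('400000 200\n' + content)
+```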
 
 ### Evaluation Criteria
 
@@ -155,29 +142,23 @@ experiment
 Import the MindSpore modules and auxiliary modules:
 
 ```python
-import os
-import shutil
-import math
 import argparse
-import json
+import os
 from itertools import chain
 import numpy as np
-from config import lstm_cfg as cfg
+import gensim
+
+from easydict import EasyDict as edict
 
-import mindspore.nn as nn
-import mindspore.context as context
+from mindspore import Model
 import mindspore.dataset as ds
+from mindspore.nn import Accuracy
+from mindspore import Tensor, nn, context
+from mindspore.train.callback import Callback
 from mindspore.ops import operations as P
-from mindspore import Tensor
-from mindspore.common.initializer import initializer
-from mindspore.common.parameter import Parameter
 from mindspore.mindrecord import FileWriter
-from mindspore.train import Model
-from mindspore.nn.metrics import Accuracy
 from mindspore.train.serialization import load_checkpoint, load_param_into_net
-from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
-# Install gensim with 'pip install gensim'
-import gensim
+from mindspore.train.callback import CheckpointConfig, ModelCheckpoint, TimeMonitor, LossMonitor
 ```
 
 ### Preprocess the Dataset
 
@@ -282,7 +263,7 @@
             encoded_features.append(encoded_sentence)
         self.__features[seg] = encoded_features
 
-    def __padding_features(self, seg, maxlen=500, pad=0):
+    def __padding_features(self, seg, maxlen=200, pad=0):
         """ pad all features to the same length """
         padded_features = []
         for feature in self.__features[seg]:
@@ -374,7 +355,7 @@
     _convert_to_mindrecord(preprocess_path, test_features, test_labels, training=False)
 
-Define the dataset-creation function `lstm_create_dataset` and create the training set `ds_train`.
+Define the dataset-creation function `lstm_create_dataset` and create the training set `ds_train` and validation set `ds_eval`.
 
 ```python
 def lstm_create_dataset(data_home, batch_size, repeat_num=1, training=True):
@@ -392,6 +373,9 @@
     data_set = data_set.repeat(count=repeat_num)
 
     return data_set
+
+ds_train = lstm_create_dataset(args.preprocess_path, cfg.batch_size)
+ds_eval = lstm_create_dataset(args.preprocess_path, cfg.batch_size, training=False)
 ```
 
 ### Define the Network
 
 Define the `lstm_default_state` function to initialize network parameters and network state.
 
 ```python
+# Initialize short-term memory (h) and long-term memory (c) to 0
 def lstm_default_state(batch_size, hidden_size, num_layers, bidirectional):
     """init default input."""
     num_directions = 1
@@ -431,6 +416,7 @@
 ```python
 class SentimentNet(nn.Cell):
     """Sentiment network structure."""
+
     def __init__(self,
                  vocab_size,
                  embed_size,
@@ -441,6 +427,7 @@
                  weight,
                  batch_size):
         super(SentimentNet, self).__init__()
+        # Map words to vectors
         self.embedding = nn.Embedding(vocab_size,
                                       embed_size,
                                       embedding_table=weight)
@@ -463,16 +450,38 @@
         self.decoder = nn.Dense(num_hiddens * 2, num_classes)
 
     def construct(self, inputs):
-        # (64,500,300)
+        # input: (batch_size, maxlen, embed_size)
         embeddings = self.embedding(inputs)
         embeddings = self.trans(embeddings, self.perm)
         output, _ = self.encoder(embeddings, (self.h, self.c))
         # states[i] size(64,200) -> encoding.size(64,400)
-        encoding = self.concat((output[0], output[1]))
+        encoding = self.concat((output[0], output[199]))
         outputs = self.decoder(encoding)
         return outputs
 ```
 
+### Define the Callback Function
+
+Define the callback `EvalCallBack`: while training proceeds, validate the model's accuracy every fixed number of epochs. Once training finishes, inspecting how the accuracy evolved makes it quick to pick the relatively best model, so training and validation run in sync.
+
+```python
+class EvalCallBack(Callback):
+    def __init__(self, model, eval_dataset, eval_per_epoch, epoch_per_eval):
+        self.model = model
+        self.eval_dataset = eval_dataset
+        self.eval_per_epoch = eval_per_epoch
+        self.epoch_per_eval = epoch_per_eval
+
+    def epoch_end(self, run_context):
+        cb_param = run_context.original_args()
+        cur_epoch = cb_param.cur_epoch_num
+        if cur_epoch % self.eval_per_epoch == 0:
+            acc = self.model.eval(self.eval_dataset, dataset_sink_mode=False)
+            self.epoch_per_eval["epoch"].append(cur_epoch)
+            self.epoch_per_eval["acc"].append(acc["acc"])
+            print(acc)
+```
support "GPU", "CPU". Default: "GPU".') +args = parser.parse_args(['--device_target', 'CPU', '--preprocess', 'true']) + +context.set_context(mode=context.GRAPH_MODE, save_graphs=False, device_target=args.device_target) + +if args.preprocess == "true": + print("============== Starting Data Pre-processing ==============") + convert_to_mindrecord(cfg.embed_size, args.aclimdb_path, args.preprocess_path, args.glove_path) + print("======================= Successful =======================") + +#实例化SentimentNet,创建网络。 +embedding_table = np.loadtxt(os.path.join(args.preprocess_path, "weight.txt")).astype(np.float32) +network = SentimentNet(vocab_size=embedding_table.shape[0], + embed_size=cfg.embed_size, + num_hiddens=cfg.num_hiddens, + num_layers=cfg.num_layers, + bidirectional=cfg.bidirectional, + num_classes=cfg.num_classes, + weight=Tensor(embedding_table), + batch_size=cfg.batch_size) ``` 通过`create_dict_iterator`方法创建字典迭代器,读取已创建的数据集`ds_train`中的数据。 @@ -542,27 +545,32 @@ print(f"The feature of the first item in the first batch is below vector:\n{firs ### 定义优化器及损失函数 ```python - loss = nn.SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True) - opt = nn.Momentum(network.trainable_params(), cfg.learning_rate, cfg.momentum) - loss_cb = LossMonitor() +loss = nn.SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True) +opt = nn.Momentum(network.trainable_params(), cfg.learning_rate, cfg.momentum) +loss_cb = LossMonitor() ``` -### 训练并保存模型 +### 同步训练并验证模型 加载训练数据集(`ds_train`)并配置好`CheckPoint`生成信息,然后使用`model.train`接口,进行模型训练,此步骤在GPU上训练用时约7分钟。CPU上需更久;根据输出可以看到loss值随着训练逐步降低,最后达到0.262左右。 ```python - model = Model(network, loss, opt, {'acc': Accuracy()}) - print("============== Starting Training ==============") - config_ck = CheckpointConfig(save_checkpoint_steps=cfg.save_checkpoint_steps, - keep_checkpoint_max=cfg.keep_checkpoint_max) - ckpoint_cb = ModelCheckpoint(prefix="lstm", directory=args.ckpt_path, config=config_ck) - time_cb = TimeMonitor(data_size=ds_train.get_dataset_size()) - if args.device_target == "CPU": - model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb], dataset_sink_mode=False) - else: - model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb]) - print("============== Training Success ==============") +model = Model(network, loss, opt, {'acc': Accuracy()}) +print("============== Starting Training ==============") +config_ck = CheckpointConfig(save_checkpoint_steps=ds_train.get_dataset_size(), + keep_checkpoint_max=cfg.keep_checkpoint_max) +ckpoint_cb = ModelCheckpoint(prefix="lstm", directory=args.ckpt_path, + config=config_ck) +time_cb = TimeMonitor(data_size=ds_train.get_dataset_size()) +if args.device_target == "CPU": + epoch_per_eval = {"epoch": [], "acc": []} + eval_cb = EvalCallBack(model, ds_eval, 1, epoch_per_eval) + model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb, eval_cb], dataset_sink_mode=False) +else: + epoch_per_eval = {"epoch": [], "acc": []} + eval_cb = EvalCallBack(model, ds_eval, 1, epoch_per_eval) + model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb, eval_cb]) +print("============== Training Success ==============") ``` ``` @@ -577,49 +585,25 @@ epoch: 1 step: 7, loss is 0.6856 epoch: 1 step: 8, loss is 0.6819 epoch: 1 step: 9, loss is 0.7372 epoch: 1 step: 10, loss is 0.6948 - ... 
-epoch: 10 step: 380, loss is 0.3090
-epoch: 10 step: 381, loss is 0.2692
-epoch: 10 step: 382, loss is 0.3088
-epoch: 10 step: 383, loss is 0.2008
-epoch: 10 step: 384, loss is 0.1450
-epoch: 10 step: 385, loss is 0.2522
-epoch: 10 step: 386, loss is 0.2532
-epoch: 10 step: 387, loss is 0.3558
-epoch: 10 step: 388, loss is 0.2641
-epoch: 10 step: 389, loss is 0.2334
-epoch: 10 step: 390, loss is 0.1966
-Epoch time: 43320.815, per step time: 111.079, avg loss: 0.262
+epoch: 10 step 774, loss is 0.3010297119617462
+epoch: 10 step 775, loss is 0.4418136477470398
+epoch: 10 step 776, loss is 0.29638347029685974
+epoch: 10 step 777, loss is 0.38901057839393616
+epoch: 10 step 778, loss is 0.3772362470626831
+epoch: 10 step 779, loss is 0.4098552167415619
+epoch: 10 step 780, loss is 0.41440871357917786
+epoch: 10 step 781, loss is 0.2255304455757141
+Epoch time: 63056.078, per step time: 80.738
+Epoch time: 63056.078, per step time: 80.738, avg loss: 0.354
 ************************************************************
+{'acc': 0.8312996158770807}
 ============== Training Success ==============
 ```
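+
+After training, `epoch_per_eval` (filled in by `EvalCallBack`) holds one accuracy per epoch, so the best epoch can be picked in one line; a small sketch with made-up numbers:
+
+```python
+# epoch_per_eval was populated by EvalCallBack during training
+epoch_per_eval = {"epoch": [1, 2, 3], "acc": [0.71, 0.83, 0.80]}  # made-up values
+best = max(zip(epoch_per_eval["epoch"], epoch_per_eval["acc"]), key=lambda t: t[1])
+print("best epoch: %d, acc: %.4f" % best)
+```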
 
-### Model Validation
-
-Create and load the validation dataset (`ds_eval`), load the CheckPoint file saved by **training**, and run validation to check model quality. This step takes about 30 seconds.
-
-```python
-args.ckpt_path = f'./lstm-{cfg.num_epochs}_390.ckpt'
-    print("============== Starting Testing ==============")
-    ds_eval = lstm_create_dataset(args.preprocess_path, cfg.batch_size, training=False)
-    param_dict = load_checkpoint(args.ckpt_path)
-    load_param_into_net(network, param_dict)
-    if args.device_target == "CPU":
-        acc = model.eval(ds_eval, dataset_sink_mode=False)
-    else:
-        acc = model.eval(ds_eval)
-    print("============== {} ==============".format(acc))
-```
-
-```
-============== Starting Testing ==============
-============== {'acc': 0.8495592948717948} ==============
-```
-
 ### Evaluating the Training Results
 
-From the output of the code above, after 10 epochs the sentiment-analysis accuracy on the validation dataset is around 85%, a basically satisfactory result.
+From the output of the code above, after 10 epochs the sentiment-analysis accuracy on the validation dataset is around 83%, a basically satisfactory result.
 
 ## Experiment Summary
 
diff --git a/nlp_lstm/config.py b/lstm/config.py
similarity index 59%
rename from nlp_lstm/config.py
rename to lstm/config.py
index abe819ec3a12d163e84b08180bbd87a1aa92999f..5ccf1641506ef89ab8007498e6a6d4c0c33d71a4 100644
--- a/nlp_lstm/config.py
+++ b/lstm/config.py
@@ -8,12 +8,12 @@ lstm_cfg = edict({
     'num_classes': 2,
     'learning_rate': 0.1,
     'momentum': 0.9,
-    'num_epochs': 1,
-    'batch_size': 64,
-    'embed_size': 300,
+    'num_epochs': 10,
+    'batch_size': 32,
+    'embed_size': 200,
     'num_hiddens': 100,
-    'num_layers': 2,
-    'bidirectional': True,
-    'save_checkpoint_steps': 390,
+    'num_layers': 1,
+    'bidirectional': False,
+    'save_checkpoint_steps': 390*5,
     'keep_checkpoint_max': 10
 })
diff --git a/nlp_lstm/images/LSTM1.png b/lstm/images/LSTM1.png
similarity index 100%
rename from nlp_lstm/images/LSTM1.png
rename to lstm/images/LSTM1.png
diff --git a/nlp_lstm/images/LSTM2.png b/lstm/images/LSTM2.png
similarity index 100%
rename from nlp_lstm/images/LSTM2.png
rename to lstm/images/LSTM2.png
diff --git a/nlp_lstm/images/LSTM3.png b/lstm/images/LSTM3.png
similarity index 100%
rename from nlp_lstm/images/LSTM3.png
rename to lstm/images/LSTM3.png
diff --git a/nlp_lstm/images/LSTM4.png b/lstm/images/LSTM4.png
similarity index 100%
rename from nlp_lstm/images/LSTM4.png
rename to lstm/images/LSTM4.png
diff --git a/nlp_lstm/images/LSTM5.png b/lstm/images/LSTM5.png
similarity index 100%
rename from nlp_lstm/images/LSTM5.png
rename to lstm/images/LSTM5.png
diff --git a/nlp_lstm/images/LSTM6.png b/lstm/images/LSTM6.png
similarity index 100%
rename from nlp_lstm/images/LSTM6.png
rename to lstm/images/LSTM6.png
diff --git a/nlp_lstm/images/LSTM7.png b/lstm/images/LSTM7.png
similarity index 100%
rename from nlp_lstm/images/LSTM7.png
rename to lstm/images/LSTM7.png
diff --git a/nlp_lstm/images/LSTM8.png b/lstm/images/LSTM8.png
similarity index 100%
rename from nlp_lstm/images/LSTM8.png
rename to lstm/images/LSTM8.png
diff --git a/nlp_lstm/images/LSTM9.png b/lstm/images/LSTM9.png
similarity index 100%
rename from nlp_lstm/images/LSTM9.png
rename to lstm/images/LSTM9.png
diff --git a/nlp_lstm/main.py b/lstm/main.py
similarity index 90%
rename from nlp_lstm/main.py
rename to lstm/main.py
index 8552273efd88106b784ca16521d6906f222e8c45..a9aa8fb6fecf3738a1d83b31a7b84b2fa6c023ec 100644
--- a/nlp_lstm/main.py
+++ b/lstm/main.py
@@ -1,8 +1,5 @@
 import os
-import shutil
-import math
 import argparse
-import json
 from itertools import chain
 import numpy as np
 from config import lstm_cfg as cfg
@@ -12,10 +9,9 @@ import mindspore.context as context
 import mindspore.dataset as ds
 from mindspore.ops import operations as P
 from mindspore import Tensor
-from mindspore.common.initializer import initializer
-from mindspore.common.parameter import Parameter
 from mindspore.mindrecord import FileWriter
 from mindspore.train import Model
+from mindspore.train.callback import Callback
 from mindspore.nn.metrics import Accuracy
 from mindspore.train.serialization import load_checkpoint, load_param_into_net
 from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
@@ -119,7 +115,7 @@
             encoded_features.append(encoded_sentence)
         self.__features[seg] = encoded_features
 
-    def __padding_features(self, seg, maxlen=500, pad=0):
+    def __padding_features(self, seg, maxlen=200, pad=0):
         """ pad all features to the same length """
         padded_features = []
         for feature in self.__features[seg]:
@@ -287,11 +283,27 @@
         embeddings = self.trans(embeddings, self.perm)
         output, _ = self.encoder(embeddings, (self.h, self.c))
         # states[i] size(64,200) -> encoding.size(64,400)
-        encoding = self.concat((output[0], output[499]))
+        encoding = self.concat((output[0], output[199]))
         outputs = self.decoder(encoding)
         return outputs
 
+
+class EvalCallBack(Callback):
+    def __init__(self, model, eval_dataset, eval_per_epoch, epoch_per_eval):
+        self.model = model
+        self.eval_dataset = eval_dataset
+        self.eval_per_epoch = eval_per_epoch
+        self.epoch_per_eval = epoch_per_eval
+
+    def epoch_end(self, run_context):
+        cb_param = run_context.original_args()
+        cur_epoch = cb_param.cur_epoch_num
+        if cur_epoch % self.eval_per_epoch == 0:
+            acc = self.model.eval(self.eval_dataset, dataset_sink_mode=False)
+            self.epoch_per_eval["epoch"].append(cur_epoch)
+            self.epoch_per_eval["acc"].append(acc["acc"])
+            print(acc)
+
 if __name__ == '__main__':
     parser = argparse.ArgumentParser(description='MindSpore LSTM Example')
     parser.add_argument('--preprocess', type=str, default='true', choices=['true', 'false'],
@@ -310,10 +322,7 @@
Default: "GPU".') args = parser.parse_args(['--device_target', 'CPU', '--preprocess', 'true']) - context.set_context( - mode=context.GRAPH_MODE, - save_graphs=False, - device_target=args.device_target) + context.set_context(mode=context.GRAPH_MODE, save_graphs=False, device_target=args.device_target) if args.preprocess == "true": print("============== Starting Data Pre-processing ==============") @@ -321,6 +330,7 @@ if __name__ == '__main__': print("======================= Successful =======================") ds_train = lstm_create_dataset(args.preprocess_path, cfg.batch_size) + ds_eval = lstm_create_dataset(args.preprocess_path, cfg.batch_size, training=False) iterator = ds_train.create_dict_iterator().get_next() first_batch_label = iterator["label"] @@ -344,23 +354,16 @@ if __name__ == '__main__': model = Model(network, loss, opt, {'acc': Accuracy()}) print("============== Starting Training ==============") - config_ck = CheckpointConfig(save_checkpoint_steps=cfg.save_checkpoint_steps, + config_ck = CheckpointConfig(save_checkpoint_steps=ds_train.get_dataset_size(), keep_checkpoint_max=cfg.keep_checkpoint_max) ckpoint_cb = ModelCheckpoint(prefix="lstm", directory=args.ckpt_path, config=config_ck) time_cb = TimeMonitor(data_size=ds_train.get_dataset_size()) if args.device_target == "CPU": - model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb], dataset_sink_mode=False) - else: - model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb]) - print("============== Training Success ==============") - - args.ckpt_path = f'./lstm-{cfg.num_epochs}_390.ckpt' - print("============== Starting Testing ==============") - ds_eval = lstm_create_dataset(args.preprocess_path, cfg.batch_size, training=False) - param_dict = load_checkpoint(args.ckpt_path) - load_param_into_net(network, param_dict) - if args.device_target == "CPU": - acc = model.eval(ds_eval, dataset_sink_mode=False) + epoch_per_eval = {"epoch": [], "acc": []} + eval_cb = EvalCallBack(model, ds_eval, 1, epoch_per_eval) + model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb, eval_cb], dataset_sink_mode=False) else: - acc = model.eval(ds_eval) - print("============== {} ==============".format(acc)) \ No newline at end of file + epoch_per_eval = {"epoch": [], "acc": []} + eval_cb = EvalCallBack(model, ds_eval, 1, epoch_per_eval) + model.train(cfg.num_epochs, ds_train, callbacks=[time_cb, ckpoint_cb, loss_cb, eval_cb]) + print("============== Training Success ==============") \ No newline at end of file