Commit b95334c7 authored by mindspore-ci-bot, committed by Gitee

!5922 vgg16 support imagenet dataset on Ascend

Merge pull request !5922 from caojian05/ms_r0.5_vgg16_support_imagenet_on_ascend
# Contents
- [VGG Description](#vgg-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
- [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
- [Script and Sample Code](#script-and-sample-code)
- [Script Parameters](#script-parameters)
- [Parameter configuration](#parameter-configuration)
- [Training Process](#training-process)
- [Training](#training)
- [Evaluation Process](#evaluation-process)
- [Evaluation](#evaluation)
- [Model Description](#model-description)
- [Performance](#performance)
- [Training Performance](#training-performance)
- [Evaluation Performance](#evaluation-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [VGG Description](#contents)
VGG, a very deep convolutional network for large-scale image recognition, was proposed in 2014 and won 1st place in the object localization task and 2nd place in the image classification task of the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
[Paper](https://arxiv.org/abs/1409.1556): Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
# [Model Architecture](#contents)
The VGG16 network mainly consists of several basic modules (convolution and pooling layers) followed by three consecutive dense layers.
The basic modules are built from operations such as **3×3 convolution** and **2×2 max pooling**.
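For reference, the standard VGG-16 block configuration that `src/vgg.py` expands into these modules looks as follows (numbers are the output channels of 3×3 convolution blocks, `'M'` marks a 2×2 max-pooling layer; the variable name below is illustrative):

```python
# VGG-16 layer configuration expanded by _make_layer in src/vgg.py
vgg16_layers = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
                512, 512, 512, 'M', 512, 512, 512, 'M']
```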
# [Dataset](#contents)
#### Dataset used: [CIFAR-10](<http://www.cs.toronto.edu/~kriz/cifar.html>)
- CIFAR-10 dataset size: 175 MB, 60,000 32×32 color images in 10 classes
- Train: 146 MB, 50,000 images
- Test: 29.3 MB, 10,000 images
- Data format: binary files
- Note: Data will be processed in src/dataset.py
#### Dataset used: [ImageNet2012](http://www.image-net.org/)
- Dataset size: ~146 GB, 1.28 million color images in 1000 classes
- Train: 140 GB, 1,281,167 images
- Test: 6.4 GB, 50,000 images
- Data format: RGB images
- Note: Data will be processed in src/dataset.py
#### Dataset organization
CIFAR-10
> Unzip the CIFAR-10 dataset to any path you want and the folder structure should be as follows:
> ```
> .
> ├── cifar-10-batches-bin # train dataset
> └── cifar-10-verify-bin # infer dataset
> ```
ImageNet2012
> Unzip the ImageNet2012 dataset to any path you want and the folder should include train and eval dataset as follows:
>
> ```
> .
> └─dataset
> ├─ilsvrc # train dataset
> └─validation_preprocess # evaluate dataset
> ```
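The helpers in `src/dataset.py` consume these folder layouts. A minimal usage sketch (paths below are placeholders; the calls mirror how eval.py builds the evaluation datasets):

```python
from src.dataset import vgg_create_dataset, classification_dataset

# CIFAR-10: data_home must contain cifar-10-batches-bin / cifar-10-verify-bin
cifar_eval = vgg_create_dataset("/path/to/cifar", image_size=(224, 224),
                                batch_size=64, training=False)

# ImageNet2012: folder mode, one sub-directory per class
imagenet_eval = classification_dataset("/path/to/dataset/validation_preprocess",
                                       image_size=[224, 224], per_batch_size=32,
                                       mode='eval')
```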
# [Features](#contents)
## Mixed Precision
The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware.
For FP16 operators, if the input data type is FP32, the MindSpore backend will automatically reduce the precision. Users can check the reduced-precision operators by enabling the INFO log and searching for `reduce precision`.
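As a concrete illustration, the sketch below (assuming an Ascend environment and the CIFAR-10 config; it mirrors what eval.py in this repository does) enables auto mixed precision globally and runs the FP16-flagged network on an FP32 input:

```python
import numpy as np
from mindspore import Tensor, context
from mindspore.common import dtype as mstype
from src.config import cifar_cfg as cfg
from src.vgg import vgg16

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend",
                    enable_auto_mixed_precision=True)
net = vgg16(num_classes=cfg.num_classes, args=cfg)
net.add_flags_recursive(fp16=True)   # run the whole cell tree in FP16
net.set_train(False)
out = net(Tensor(np.zeros((1, 3, 224, 224)), mstype.float32))  # FP32 input is handled automatically
```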
# [Environment Requirements](#contents)
- Hardware (Ascend/GPU)
- Prepare hardware environment with an Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
- [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
- [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html)
- [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html)
# [Quick Start](#contents)
After installing MindSpore via the official website, you can start training and evaluation as follows:
- Running on Ascend
```python
# run training example
python train.py --data_path=[DATA_PATH] --device_id=[DEVICE_ID] > output.train.log 2>&1 &
# run distributed training example
sh run_distribute_train.sh [RANK_TABLE_JSON] [DATA_PATH]
# run evaluation example
python eval.py --data_path=[DATA_PATH] --pre_trained=[PRE_TRAINED] > output.eval.log 2>&1 &
```
For distributed training, an hccl configuration file in JSON format needs to be created in advance.
Please follow the instructions in the link below:
https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools

- Running on GPU

```python
# run training example
python train.py --device_target="GPU" --device_id=[DEVICE_ID] --dataset=[DATASET_TYPE] --data_path=[DATA_PATH] > output.train.log 2>&1 &

# run distributed training example
sh run_distribute_train_gpu.sh [DATA_PATH]

# run evaluation example
python eval.py --device_target="GPU" --device_id=[DEVICE_ID] --dataset=[DATASET_TYPE] --data_path=[DATA_PATH] --pre_trained=[PRE_TRAINED] > output.eval.log 2>&1 &
```
# [Script Description](#contents)
## [Script and Sample Code](#contents)
```
├── model_zoo
├── README.md // descriptions about all the models
├── vgg16
├── README.md // descriptions about vgg16
├── scripts
│ ├── run_distribute_train.sh // shell script for distributed training on Ascend
│ ├── run_distribute_train_gpu.sh // shell script for distributed training on GPU
├── src
│ ├── utils
│ │ ├── logging.py // logging format setting
│ │ ├── sampler.py // create sampler for dataset
│ │ ├── util.py // util function
│ │ ├── var_init.py // network parameter init method
│ ├── config.py // parameter configuration
│ ├── crossentropy.py // loss calculation
│ ├── dataset.py // creating dataset
│ ├── linear_warmup.py // linear learning rate
│ ├── warmup_cosine_annealing_lr.py // cosine annealing learning rate
│ ├── warmup_step_lr.py // step or multi step learning rate
│ ├── vgg.py // vgg architecture
├── train.py // training script
├── eval.py // evaluation script
```
## [Script Parameters](#contents)
### Training
```
usage: train.py [--device_target TARGET][--data_path DATA_PATH]
[--dataset DATASET_TYPE][--is_distributed VALUE]
[--device_id DEVICE_ID][--pre_trained PRE_TRAINED]
[--ckpt_path CHECKPOINT_PATH][--ckpt_interval INTERVAL_STEP]
parameters/options:
--device_target the training backend type, Ascend or GPU, default is Ascend.
--dataset the dataset type, cifar10 or imagenet2012.
--is_distributed whether to run distributed training, value can be 0 or 1.
--data_path the storage path of dataset
--device_id the device used to train the model.
--pre_trained the pretrained checkpoint file path.
--ckpt_path the path to save checkpoint.
--ckpt_interval the epoch interval for saving checkpoint.
```
### Evaluation
```
usage: eval.py [--device_target TARGET][--data_path DATA_PATH]
[--dataset DATASET_TYPE][--pre_trained PRE_TRAINED]
[--device_id DEVICE_ID]
parameters/options:
--device_target the evaluation backend type, Ascend or GPU, default is Ascend.
--dataset the dataset type, cifar10 or imagenet2012.
--data_path the storage path of the dataset.
--device_id the device used to evaluate the model.
--pre_trained the checkpoint file path used to evaluate the model.
```
## [Parameter configuration](#contents)
Parameters for both training and evaluation can be set in config.py.
- config for vgg16, CIFAR-10 dataset
```
"num_classes": 10, # dataset class num
"lr": 0.01, # learning rate
"lr_init": 0.01, # initial learning rate
"lr_max": 0.1, # max learning rate
"lr_epochs": '30,60,90,120', # lr changing based epochs
"lr_scheduler": "step", # learning rate mode
"warmup_epochs": 5, # number of warmup epoch
"batch_size": 64, # batch size of input tensor
"max_epoch": 70, # only valid for training, which is always 1 for inference
"momentum": 0.9, # momentum
"weight_decay": 5e-4, # weight decay
"loss_scale": 1.0, # loss scale
"label_smooth": 0, # label smooth
"label_smooth_factor": 0, # label smooth factor
"buffer_size": 10, # shuffle buffer size
"image_size": '224,224', # image size
"pad_mode": 'same', # pad mode for conv2d
"padding": 0, # padding value for conv2d
"has_bias": False, # whether has bias in conv2d
"batch_norm": True, # whether to use batch_norm in conv2d
"keep_checkpoint_max": 10, # only keep the last keep_checkpoint_max checkpoint
"initialize_mode": "XavierUniform", # conv2d init mode
"has_dropout": True # whether to use a Dropout layer
```
- config for vgg16, ImageNet2012 dataset
```
"num_classes": 1000, # dataset class num
"lr": 0.01, # learning rate
"lr_init": 0.01, # initial learning rate
"lr_max": 0.1, # max learning rate
"lr_epochs": '30,60,90,120', # lr changing based epochs
"lr_scheduler": "cosine_annealing", # learning rate mode
"warmup_epochs": 0, # number of warmup epoch
"batch_size": 32, # batch size of input tensor
"max_epoch": 150, # only valid for training, which is always 1 for inference
"momentum": 0.9, # momentum
"weight_decay": 1e-4, # weight decay
"loss_scale": 1024, # loss scale
"label_smooth": 1, # label smooth
"label_smooth_factor": 0.1, # label smooth factor
"buffer_size": 10, # shuffle buffer size
"image_size": '224,224', # image size
"pad_mode": 'pad', # pad mode for conv2d
"padding": 1, # padding value for conv2d
"has_bias": True, # whether has bias in conv2d
"batch_norm": False, # whether to use batch_norm in conv2d
"keep_checkpoint_max": 10, # only keep the last keep_checkpoint_max checkpoint
"initialize_mode": "KaimingNormal", # conv2d init mode
"has_dropout": True # whether to use a Dropout layer
```
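The configuration that actually gets used is selected from the `--dataset` argument; a small sketch of how the scripts pick it up (mirroring the parse_args functions shown later on this page):

```python
# Select the configuration according to the dataset type.
dataset = "imagenet2012"          # or "cifar10"
if dataset == "cifar10":
    from src.config import cifar_cfg as cfg
else:
    from src.config import imagenet_cfg as cfg
print(cfg.num_classes, cfg.batch_size, cfg.lr_scheduler)
```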
## [Training Process](#contents)
### Training
#### Run vgg16 on Ascend
- Training using a single device (1p), with the CIFAR-10 dataset by default
```
python train.py --data_path=your_data_path --device_id=6 > out.train.log 2>&1 &
```
The python command above will run in the background, you can view the results through the file `out.train.log`.

After training, you'll get some checkpoint files in the specified ckpt_path (./output by default).

You will get the loss value as following:
```
# grep "loss is " output.train.log
epoch: 1 step: 781, loss is 2.093086
epoch: 2 step: 781, loss is 1.827582
...
```
- Distributed Training
```
sh run_distribute_train.sh rank_table.json your_data_path
```
Logs for each device are saved under `train_parallel[X]/log`, for example:

```
train_parallel1/log:epoch: 2 step: 97, loss is 1.7133579
...
```
> About rank_table.json, you can refer to the [distributed training tutorial](https://www.mindspore.cn/tutorial/en/master/advanced_use/distributed_training.html).
#### Run vgg16 on GPU

- Training using single device(1p)

```
python train.py --device_target="GPU" --dataset="imagenet2012" --is_distributed=0 --data_path=$DATA_PATH > output.train.log 2>&1 &
```
- Distributed Training
```
# distributed training(8p)
bash scripts/run_distribute_train_gpu.sh /path/ImageNet2012/train
```
## [Evaluation Process](#contents)
### Evaluation
- Do eval as follows, specifying the dataset type as "cifar10" or "imagenet2012"

```
# when using cifar10 dataset
python eval.py --data_path=your_data_path --dataset="cifar10" --device_target="Ascend" --pre_trained=./*-70-781.ckpt > output.eval.log 2>&1 &

# when using imagenet2012 dataset
python eval.py --data_path=your_data_path --dataset="imagenet2012" --device_target="GPU" --pre_trained=./*-150-5004.ckpt > output.eval.log 2>&1 &
```

- The above python command will run in the background, you can view the results through the file `output.eval.log`. You will get the accuracy as following:

```
# when using cifar10 dataset
# grep "result: " output.eval.log
result: {'acc': 0.92}

# when using the imagenet2012 dataset
after allreduce eval: top1_correct=36636, tot=50000, acc=73.27%
after allreduce eval: top5_correct=45582, tot=50000, acc=91.16%
```
# [Model Description](#contents)
## [Performance](#contents)
### Training Performance
| Parameters | VGG16(Ascend) | VGG16(GPU) |
| -------------------------- | ---------------------------------------------- |------------------------------------|
| Model Version | VGG16 | VGG16 |
| Resource | Ascend 910; CPU 2.60GHz, 56 cores; memory 314G | NV SMX2 V100-32G |
| uploaded Date | 08/20/2020 |08/20/2020 |
| MindSpore Version | 0.5.0-alpha |0.5.0-alpha |
| Dataset | CIFAR-10 |ImageNet2012 |
| Training Parameters | epoch=70, steps=781, batch_size = 64, lr=0.1 |epoch=150, steps=40036, batch_size = 32, lr=0.1 |
| Optimizer | Momentum |Momentum |
| Loss Function | SoftmaxCrossEntropy |SoftmaxCrossEntropy |
| outputs | probability |probability |
| Loss | 0.01 |1.5~2.0 |
| Speed | 1pc: 79 ms/step; 8pcs: 104 ms/step | 1pc: 81 ms/step; 8pcs: 94.4 ms/step |
| Total time | 1pc: 72 mins; 8pcs: 11.8 mins |8pcs: 19.7 hours |
| Checkpoint for Fine tuning | 1.1G(.ckpt file) |1.1G(.ckpt file) |
| Scripts |[vgg16](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/vgg16) | |
### Evaluation Performance
| Parameters | VGG16(Ascend) | VGG16(GPU) |
| ------------------- | --------------------------- | --------------------- |
| Model Version | VGG16 | VGG16 |
| Resource | Ascend 910 | GPU |
| Uploaded Date | 08/20/2020 | 08/20/2020 |
| MindSpore Version | 0.5.0-alpha |0.5.0-alpha |
| Dataset | CIFAR-10, 10,000 images |ImageNet2012, 5000 images |
| batch_size | 64 | 32 |
| outputs | probability | probability |
| Accuracy | 1pc: 93.4% | 1pc: 73.0% |
# [Description of Random Situation](#contents)
In dataset.py, we set the seed inside the dataset creation functions. We also set random seeds in train.py.
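For reference, these are the seed settings referred to above, collected in one sketch (the individual calls live in train.py and src/dataset.py):

```python
import random
import numpy as np
import mindspore.dataset as de

random.seed(1)         # train.py
np.random.seed(1)      # train.py
de.config.set_seed(1)  # src/dataset.py, inside vgg_create_dataset
```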
# [ModelZoo Homepage](#contents)
Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Eval"""
import os
import time
import argparse
import datetime
import glob
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, context
from mindspore.nn.optim.momentum import Momentum
from mindspore.train.model import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore.common import dtype as mstype
from src.utils.logging import get_logger
from src.vgg import vgg16
from src.dataset import vgg_create_dataset
from src.dataset import classification_dataset
class ParameterReduce(nn.Cell):
"""ParameterReduce"""
def __init__(self):
super(ParameterReduce, self).__init__()
self.cast = P.Cast()
self.reduce = P.AllReduce()
def construct(self, x):
one = self.cast(F.scalar_to_array(1.0), mstype.float32)
out = x * one
ret = self.reduce(out)
return ret
def parse_args(cloud_args=None):
"""parse_args"""
parser = argparse.ArgumentParser('mindspore classification test')
parser.add_argument('--device_target', type=str, default='Ascend', choices=['Ascend', 'GPU'],
help='device where the code will be implemented. (Default: Ascend)')
parser.add_argument('--device_id', type=int, default=None, help='device id of GPU or Ascend. (Default: None)')
# dataset related
parser.add_argument('--dataset', type=str, choices=["cifar10", "imagenet2012"], default="cifar10")
parser.add_argument('--data_path', type=str, default='', help='eval data dir')
parser.add_argument('--per_batch_size', default=32, type=int, help='batch size for per npu')
# network related
parser.add_argument('--graph_ckpt', type=int, default=1, help='graph ckpt or feed ckpt')
    parser.add_argument('--pre_trained', default='', type=str, help='full path of the pretrained model to load. '
                        'If it is a directory, all ckpt files in it will be tested')
# logging related
parser.add_argument('--log_path', type=str, default='outputs/', help='path to save log')
parser.add_argument('--rank', type=int, default=0, help='local rank of distributed')
parser.add_argument('--group_size', type=int, default=1, help='world size of distributed')
args_opt = parser.parse_args()
args_opt = merge_args(args_opt, cloud_args)
if args_opt.dataset == "cifar10":
from src.config import cifar_cfg as cfg
else:
from src.config import imagenet_cfg as cfg
args_opt.image_size = cfg.image_size
args_opt.num_classes = cfg.num_classes
args_opt.per_batch_size = cfg.batch_size
args_opt.momentum = cfg.momentum
args_opt.weight_decay = cfg.weight_decay
args_opt.buffer_size = cfg.buffer_size
args_opt.pad_mode = cfg.pad_mode
args_opt.padding = cfg.padding
args_opt.has_bias = cfg.has_bias
args_opt.batch_norm = cfg.batch_norm
args_opt.initialize_mode = cfg.initialize_mode
args_opt.has_dropout = cfg.has_dropout
args_opt.image_size = list(map(int, args_opt.image_size.split(',')))
return args_opt
def get_top5_acc(top5_arg, gt_class):
sub_count = 0
for top5, gt in zip(top5_arg, gt_class):
if gt in top5:
sub_count += 1
return sub_count
def merge_args(args, cloud_args):
"""merge_args"""
args_dict = vars(args)
if isinstance(cloud_args, dict):
for key in cloud_args.keys():
val = cloud_args[key]
if key in args_dict and val:
arg_type = type(args_dict[key])
if arg_type is not type(None):
val = arg_type(val)
args_dict[key] = val
return args
def test(cloud_args=None):
"""test"""
args = parse_args(cloud_args)
context.set_context(mode=context.GRAPH_MODE, enable_auto_mixed_precision=True,
device_target=args.device_target, save_graphs=False)
if os.getenv('DEVICE_ID', "not_set").isdigit():
context.set_context(device_id=int(os.getenv('DEVICE_ID')))
args.outputs_dir = os.path.join(args.log_path,
datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))
args.logger = get_logger(args.outputs_dir, args.rank)
args.logger.save_args(args)
if args.dataset == "cifar10":
net = vgg16(num_classes=args.num_classes, args=args)
opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), 0.01, args.momentum,
weight_decay=args.weight_decay)
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean', is_grad=False)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})
param_dict = load_checkpoint(args.pre_trained)
load_param_into_net(net, param_dict)
net.set_train(False)
dataset = vgg_create_dataset(args.data_path, args.image_size, args.per_batch_size, training=False)
res = model.eval(dataset)
print("result: ", res)
else:
# network
args.logger.important_info('start create network')
if os.path.isdir(args.pre_trained):
models = list(glob.glob(os.path.join(args.pre_trained, '*.ckpt')))
print(models)
if args.graph_ckpt:
f = lambda x: -1 * int(os.path.splitext(os.path.split(x)[-1])[0].split('-')[-1].split('_')[0])
else:
f = lambda x: -1 * int(os.path.splitext(os.path.split(x)[-1])[0].split('_')[-1])
args.models = sorted(models, key=f)
else:
args.models = [args.pre_trained,]
for model in args.models:
dataset = classification_dataset(args.data_path, args.image_size, args.per_batch_size, mode='eval')
eval_dataloader = dataset.create_tuple_iterator()
network = vgg16(args.num_classes, args, phase="test")
# pre_trained
load_param_into_net(network, load_checkpoint(model))
network.add_flags_recursive(fp16=True)
img_tot = 0
top1_correct = 0
top5_correct = 0
network.set_train(False)
t_end = time.time()
it = 0
for data, gt_classes in eval_dataloader:
output = network(Tensor(data, mstype.float32))
output = output.asnumpy()
top1_output = np.argmax(output, (-1))
top5_output = np.argsort(output)[:, -5:]
t1_correct = np.equal(top1_output, gt_classes).sum()
top1_correct += t1_correct
top5_correct += get_top5_acc(top5_output, gt_classes)
img_tot += args.per_batch_size
if args.rank == 0 and it == 0:
t_end = time.time()
it = 1
if args.rank == 0:
time_used = time.time() - t_end
fps = (img_tot - args.per_batch_size) * args.group_size / time_used
args.logger.info('Inference Performance: {:.2f} img/sec'.format(fps))
results = [[top1_correct], [top5_correct], [img_tot]]
args.logger.info('before results={}'.format(results))
results = np.array(results)
args.logger.info('after results={}'.format(results))
top1_correct = results[0, 0]
top5_correct = results[1, 0]
img_tot = results[2, 0]
acc1 = 100.0 * top1_correct / img_tot
acc5 = 100.0 * top5_correct / img_tot
args.logger.info('after allreduce eval: top1_correct={}, tot={},'
'acc={:.2f}%(TOP1)'.format(top1_correct, img_tot, acc1))
args.logger.info('after allreduce eval: top5_correct={}, tot={},'
'acc={:.2f}%(TOP5)'.format(top5_correct, img_tot, acc5))
if __name__ == "__main__":
test()
# limitations under the License.
# ============================================================================
if [ $# != 2 ] && [ $# != 3 ]
then
echo "Usage: sh run_distribute_train.sh [RANK_TABLE_FILE] [DATA_PATH] [cifar10|imagenet2012]"
exit 1
fi
if [ ! -f $1 ]
then
echo "error: RANK_TABLE_FILE=$1 is not a file"
exit 1
fi
if [ ! -d $2 ]
then
echo "error: DATA_PATH=$2 is not a directory"
exit 1
fi
dataset_type='cifar10'
if [ $# == 3 ]
then
if [ $3 != "cifar10" ] && [ $3 != "imagenet2012" ]
then
echo "error: the selected dataset is neither cifar10 nor imagenet2012"
exit 1
fi
dataset_type=$3
fi
export DEVICE_NUM=8
export RANK_SIZE=8
export RANK_TABLE_FILE=$1
for((i=0;i<RANK_SIZE;i++))
do
export DEVICE_ID=$i
export RANK_ID=$i
rm -rf ./train_parallel$i
mkdir ./train_parallel$i
cp *.py ./train_parallel$i
cp -r src ./train_parallel$i
cd ./train_parallel$i || exit
echo "start training for rank $RANK_ID, device $DEVICE_ID, $dataset_type"
env > env.log
python train.py --data_path=$2 --device_target="Ascend" --device_id=$i --is_distributed=1 --dataset=$dataset_type &> log &
cd ..
done
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_distribute_train_gpu.sh DATA_PATH"
echo "for example: bash run_distribute_train_gpu.sh /path/ImageNet2012/train"
echo "=============================================================================================================="
DATA_PATH=$1
mpirun -n 8 python train.py \
--device_target="GPU" \
--dataset="imagenet2012" \
--is_distributed=1 \
--data_path=$DATA_PATH > output.train.log 2>&1 &
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
echo "=============================================================================================================="
echo "Please run the script as: "
echo "bash run_eval.sh DATA_PATH DATASET_TYPE DEVICE_TYPE CHECKPOINT_PATH"
echo "for example: bash run_eval.sh /path/ImageNet2012/train cifar10 Ascend /path/a.ckpt "
echo "=============================================================================================================="
DATA_PATH=$1
DATASET_TYPE=$2
DEVICE_TYPE=$3
CHECKPOINT_PATH=$4
python eval.py \
--data_path=$DATA_PATH \
--dataset=$DATASET_TYPE \
--device_target=$DEVICE_TYPE \
--pre_trained=$CHECKPOINT_PATH > output.eval.log 2>&1 &
# limitations under the License.
# ============================================================================
"""
network config setting, will be used in train.py and eval.py
"""
from easydict import EasyDict as edict
# config for vgg16, cifar10
cifar_cfg = edict({
"num_classes": 10,
"lr": 0.01,
"lr_init": 0.01,
"lr_max": 0.1,
"lr_epochs": '30,60,90,120',
"lr_scheduler": "step",
"warmup_epochs": 5,
"batch_size": 64,
"max_epoch": 70,
"momentum": 0.9,
"weight_decay": 5e-4,
"loss_scale": 1.0,
"label_smooth": 0,
"label_smooth_factor": 0,
"buffer_size": 10,
"image_size": '224,224',
"pad_mode": 'same',
"padding": 0,
"has_bias": False,
"batch_norm": True,
"keep_checkpoint_max": 10,
"initialize_mode": "XavierUniform",
"has_dropout": False
})
# config for vgg16, imagenet2012
imagenet_cfg = edict({
"num_classes": 1000,
"lr": 0.01,
"lr_init": 0.01,
"lr_max": 0.1,
"lr_epochs": '30,60,90,120',
"lr_scheduler": 'cosine_annealing',
"warmup_epochs": 0,
"batch_size": 32,
"max_epoch": 150,
"momentum": 0.9,
"weight_decay": 1e-4,
"loss_scale": 1024,
"label_smooth": 1,
"label_smooth_factor": 0.1,
"buffer_size": 10,
"image_size": '224,224',
"pad_mode": 'pad',
"padding": 1,
"has_bias": False,
"batch_norm": False,
"keep_checkpoint_max": 10,
"initialize_mode": "XavierUnifor",
"has_dropout": True
})
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""define loss function for network"""
from mindspore.nn.loss.loss import _Loss
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore import Tensor
from mindspore.common import dtype as mstype
import mindspore.nn as nn
class CrossEntropy(_Loss):
"""the redefined loss function with SoftmaxCrossEntropyWithLogits"""
def __init__(self, smooth_factor=0., num_classes=1001):
super(CrossEntropy, self).__init__()
self.onehot = P.OneHot()
self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
self.off_value = Tensor(1.0 * smooth_factor / (num_classes - 1), mstype.float32)
self.ce = nn.SoftmaxCrossEntropyWithLogits()
self.mean = P.ReduceMean(False)
def construct(self, logit, label):
one_hot_label = self.onehot(label, F.shape(logit)[1], self.on_value, self.off_value)
loss = self.ce(logit, one_hot_label)
loss = self.mean(loss, 0)
return loss
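A usage sketch for this loss with the smoothing values from the imagenet2012 config (how train.py actually wires it is an assumption here):

```python
from src.config import imagenet_cfg as cfg
from src.crossentropy import CrossEntropy

criterion = CrossEntropy(smooth_factor=cfg.label_smooth_factor,  # 0.1
                         num_classes=cfg.num_classes)            # 1000
```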
# limitations under the License.
# ============================================================================
"""
dataset processing.
"""
import os
from mindspore.common import dtype as mstype
import mindspore.dataset as de
import mindspore.dataset.transforms.c_transforms as C
import mindspore.dataset.transforms.vision.c_transforms as vision
from PIL import Image, ImageFile
from src.utils.sampler import DistributedSampler
ImageFile.LOAD_TRUNCATED_IMAGES = True
def vgg_create_dataset(data_home, image_size, batch_size, rank_id=0, rank_size=1, repeat_num=1, training=True):
"""Data operations."""
de.config.set_seed(1)
data_dir = os.path.join(data_home, "cifar-10-batches-bin")
if not training:
data_dir = os.path.join(data_home, "cifar-10-verify-bin")
data_set = de.Cifar10Dataset(data_dir, num_shards=rank_size, shard_id=rank_id)
rescale = 1.0 / 255.0
shift = 0.0
# define map operations
random_crop_op = vision.RandomCrop((32, 32), (4, 4, 4, 4)) # padding_mode default CONSTANT
random_horizontal_op = vision.RandomHorizontalFlip()
resize_op = vision.Resize(image_size) # interpolation default BILINEAR
rescale_op = vision.Rescale(rescale, shift)
normalize_op = vision.Normalize((0.4465, 0.4822, 0.4914), (0.2010, 0.1994, 0.2023))
changeswap_op = vision.HWC2CHW()
data_set = data_set.shuffle(buffer_size=10)
# apply batch operations
data_set = data_set.batch(batch_size=batch_size, drop_remainder=True)
return data_set
def classification_dataset(data_dir, image_size, per_batch_size, rank=0, group_size=1,
mode='train',
input_mode='folder',
root='',
num_parallel_workers=None,
shuffle=None,
sampler=None,
repeat_num=1,
class_indexing=None,
drop_remainder=True,
transform=None,
target_transform=None):
"""
A function that returns a dataset for classification. The mode of input dataset could be "folder" or "txt".
If it is "folder", all images within one folder have the same label. If it is "txt", all paths of images
are written into a textfile.
Args:
data_dir (str): Path to the root directory that contains the dataset for "input_mode="folder"".
Or path of the textfile that contains every image's path of the dataset.
image_size (str): Size of the input images.
per_batch_size (int): the batch size of evey step during training.
rank (int): The shard ID within num_shards (default=None).
group_size (int): Number of shards that the dataset should be divided
into (default=None).
mode (str): "train" or others. Default: " train".
input_mode (str): The form of the input dataset. "folder" or "txt". Default: "folder".
root (str): the images path for "input_mode="txt"". Default: " ".
num_parallel_workers (int): Number of workers to read the data. Default: None.
shuffle (bool): Whether or not to perform shuffle on the dataset
(default=None, performs shuffle).
sampler (Sampler): Object used to choose samples from the dataset. Default: None.
repeat_num (int): the num of repeat dataset.
class_indexing (dict): A str-to-int mapping from folder name to index
(default=None, the folder names will be sorted
alphabetically and each class will be given a
unique index starting from 0).
Examples:
>>> from mindvision.common.datasets.classification import classification_dataset
>>> # path to imagefolder directory. This directory needs to contain sub-directories which contain the images
>>> dataset_dir = "/path/to/imagefolder_directory"
        >>> de_dataset = classification_dataset(train_data_dir, image_size=[224, 224],
>>> per_batch_size=64, rank=0, group_size=4)
>>> # Path of the textfile that contains every image's path of the dataset.
>>> dataset_dir = "/path/to/dataset/images/train.txt"
>>> images_dir = "/path/to/dataset/images"
        >>> de_dataset = classification_dataset(train_data_dir, image_size=[224, 224],
>>> per_batch_size=64, rank=0, group_size=4,
>>> input_mode="txt", root=images_dir)
"""
mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
std = [0.229 * 255, 0.224 * 255, 0.225 * 255]
if transform is None:
if mode == 'train':
transform_img = [
vision.RandomCropDecodeResize(image_size, scale=(0.08, 1.0)),
vision.RandomHorizontalFlip(prob=0.5),
vision.Normalize(mean=mean, std=std),
vision.HWC2CHW()
]
else:
transform_img = [
vision.Decode(),
vision.Resize((256, 256)),
vision.CenterCrop(image_size),
vision.Normalize(mean=mean, std=std),
vision.HWC2CHW()
]
else:
transform_img = transform
if target_transform is None:
transform_label = [C.TypeCast(mstype.int32)]
else:
transform_label = target_transform
if input_mode == 'folder':
de_dataset = de.ImageFolderDatasetV2(data_dir, num_parallel_workers=num_parallel_workers,
shuffle=shuffle, sampler=sampler, class_indexing=class_indexing,
num_shards=group_size, shard_id=rank)
else:
dataset = TxtDataset(root, data_dir)
sampler = DistributedSampler(dataset, rank, group_size, shuffle=shuffle)
de_dataset = de.GeneratorDataset(dataset, ["image", "label"], sampler=sampler)
de_dataset.set_dataset_size(len(sampler))
de_dataset = de_dataset.map(input_columns="image", num_parallel_workers=8, operations=transform_img)
de_dataset = de_dataset.map(input_columns="label", num_parallel_workers=8, operations=transform_label)
columns_to_project = ["image", "label"]
de_dataset = de_dataset.project(columns=columns_to_project)
de_dataset = de_dataset.batch(per_batch_size, drop_remainder=drop_remainder)
de_dataset = de_dataset.repeat(repeat_num)
return de_dataset
class TxtDataset:
"""
create txt dataset.
    Args:
        root (str): images root directory.
        txt_name (str): path of the annotation txt file; each line is "<image_path> <label>".
    Returns:
        TxtDataset, yielding (image, label) pairs for GeneratorDataset.
"""
def __init__(self, root, txt_name):
super(TxtDataset, self).__init__()
self.imgs = []
self.labels = []
fin = open(txt_name, "r")
for line in fin:
img_name, label = line.strip().split(' ')
self.imgs.append(os.path.join(root, img_name))
self.labels.append(int(label))
fin.close()
def __getitem__(self, index):
img = Image.open(self.imgs[index]).convert('RGB')
return img, self.labels[index]
def __len__(self):
return len(self.imgs)
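A sketch of the "txt" input mode that TxtDataset implements: every line of the annotation file holds a relative image path and an integer label separated by a space (the paths and file contents below are placeholders):

```python
from src.dataset import classification_dataset

# /path/to/train.txt (placeholder content):
#   class_0/img_0001.jpg 0
#   class_1/img_0001.jpg 1
de_dataset = classification_dataset("/path/to/train.txt", image_size=[224, 224],
                                    per_batch_size=32, input_mode="txt",
                                    root="/path/to/images")
```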
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
linear warm up learning rate.
"""
def linear_warmup_lr(current_step, warmup_steps, base_lr, init_lr):
lr_inc = (float(base_lr) - float(init_lr)) / float(warmup_steps)
lr = float(init_lr) + lr_inc * current_step
return lr
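A quick, illustrative check of the helper above: with `warmup_steps=5`, `base_lr=0.1` and `init_lr=0.0` the learning rate ramps linearly up to the base value.

```python
from src.linear_warmup import linear_warmup_lr

print([round(linear_warmup_lr(step, 5, 0.1, 0.0), 3) for step in range(1, 6)])
# -> [0.02, 0.04, 0.06, 0.08, 0.1]
```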
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
get logger.
"""
import logging
import os
import sys
from datetime import datetime
class LOGGER(logging.Logger):
"""
set up logging file.
Args:
logger_name (string): logger name.
log_dir (string): path of logger.
Returns:
string, logger path
"""
def __init__(self, logger_name, rank=0):
super(LOGGER, self).__init__(logger_name)
if rank % 8 == 0:
console = logging.StreamHandler(sys.stdout)
console.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s')
console.setFormatter(formatter)
self.addHandler(console)
def setup_logging_file(self, log_dir, rank=0):
"""set up log file"""
self.rank = rank
if not os.path.exists(log_dir):
os.makedirs(log_dir, exist_ok=True)
log_name = datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S') + '_rank_{}.log'.format(rank)
self.log_fn = os.path.join(log_dir, log_name)
fh = logging.FileHandler(self.log_fn)
fh.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s')
fh.setFormatter(formatter)
self.addHandler(fh)
def info(self, msg, *args, **kwargs):
if self.isEnabledFor(logging.INFO):
self._log(logging.INFO, msg, args, **kwargs)
def save_args(self, args):
self.info('Args:')
args_dict = vars(args)
for key in args_dict.keys():
self.info('--> %s: %s', key, args_dict[key])
self.info('')
def important_info(self, msg, *args, **kwargs):
if self.isEnabledFor(logging.INFO) and self.rank == 0:
line_width = 2
important_msg = '\n'
important_msg += ('*'*70 + '\n')*line_width
important_msg += ('*'*line_width + '\n')*2
important_msg += '*'*line_width + ' '*8 + msg + '\n'
important_msg += ('*'*line_width + '\n')*2
important_msg += ('*'*70 + '\n')*line_width
self.info(important_msg, *args, **kwargs)
def get_logger(path, rank):
logger = LOGGER("mindversion", rank)
logger.setup_logging_file(path, rank)
return logger
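A usage sketch for the logger (the output directory below is a placeholder; train.py and eval.py build it from --log_path plus a timestamp):

```python
from src.utils.logging import get_logger

logger = get_logger("outputs/2020-08-20_time_12_00_00", rank=0)
logger.info("loss is %s", 2.093086)
logger.important_info("start create network")
```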
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
choose samples from the dataset
"""
import math
import numpy as np
class DistributedSampler():
"""
sampling the dataset.
Args:
Returns:
num_samples, number of samples.
"""
def __init__(self, dataset, rank, group_size, shuffle=True, seed=0):
self.dataset = dataset
self.rank = rank
self.group_size = group_size
self.dataset_length = len(self.dataset)
self.num_samples = int(math.ceil(self.dataset_length * 1.0 / self.group_size))
self.total_size = self.num_samples * self.group_size
self.shuffle = shuffle
self.seed = seed
def __iter__(self):
if self.shuffle:
self.seed = (self.seed + 1) & 0xffffffff
np.random.seed(self.seed)
indices = np.random.permutation(self.dataset_length).tolist()
else:
            indices = list(range(self.dataset_length))
indices += indices[:(self.total_size - len(indices))]
indices = indices[self.rank::self.group_size]
return iter(indices)
def __len__(self):
return self.num_samples
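A usage sketch showing how this sampler is combined with the TxtDataset from src/dataset.py and a GeneratorDataset (paths are placeholders; this mirrors the txt-mode branch of classification_dataset):

```python
import mindspore.dataset as de
from src.dataset import TxtDataset
from src.utils.sampler import DistributedSampler

dataset = TxtDataset("/path/to/images", "/path/to/train.txt")
sampler = DistributedSampler(dataset, rank=0, group_size=8, shuffle=True)
de_dataset = de.GeneratorDataset(dataset, ["image", "label"], sampler=sampler)
```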
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Util class or function."""
def get_param_groups(network):
"""Param groups for optimizer."""
decay_params = []
no_decay_params = []
for x in network.trainable_params():
parameter_name = x.name
if parameter_name.endswith('.bias'):
# all bias not using weight decay
no_decay_params.append(x)
elif parameter_name.endswith('.gamma'):
# bn weight bias not using weight decay, be carefully for now x not include BN
no_decay_params.append(x)
elif parameter_name.endswith('.beta'):
# bn weight bias not using weight decay, be carefully for now x not include BN
no_decay_params.append(x)
else:
decay_params.append(x)
return [{'params': no_decay_params, 'weight_decay': 0.0}, {'params': decay_params}]
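A sketch of how these groups are meant to be fed to the Momentum optimizer so that bias and BatchNorm gamma/beta parameters skip weight decay (the exact wiring in train.py is an assumption; the values follow the imagenet2012 config):

```python
from mindspore.nn.optim.momentum import Momentum
from src.config import imagenet_cfg as cfg
from src.utils.util import get_param_groups
from src.vgg import vgg16

net = vgg16(num_classes=cfg.num_classes, args=cfg)
opt = Momentum(params=get_param_groups(net), learning_rate=cfg.lr,
               momentum=cfg.momentum, weight_decay=cfg.weight_decay,
               loss_scale=cfg.loss_scale)
```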
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Initialize.
"""
import math
from functools import reduce
import numpy as np
import mindspore.nn as nn
from mindspore.common import initializer as init
def _calculate_gain(nonlinearity, param=None):
r"""
Return the recommended gain value for the given nonlinearity function.
The values are as follows:
================= ====================================================
nonlinearity gain
================= ====================================================
Linear / Identity :math:`1`
Conv{1,2,3}D :math:`1`
Sigmoid :math:`1`
Tanh :math:`\frac{5}{3}`
ReLU :math:`\sqrt{2}`
Leaky Relu :math:`\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}`
================= ====================================================
Args:
nonlinearity: the non-linear function
param: optional parameter for the non-linear function
Examples:
>>> gain = calculate_gain('leaky_relu', 0.2) # leaky_relu with negative_slope=0.2
"""
linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d']
if nonlinearity in linear_fns or nonlinearity == 'sigmoid':
return 1
if nonlinearity == 'tanh':
return 5.0 / 3
if nonlinearity == 'relu':
return math.sqrt(2.0)
if nonlinearity == 'leaky_relu':
if param is None:
negative_slope = 0.01
elif not isinstance(param, bool) and isinstance(param, int) or isinstance(param, float):
negative_slope = param
else:
raise ValueError("negative_slope {} not a valid number".format(param))
return math.sqrt(2.0 / (1 + negative_slope ** 2))
raise ValueError("Unsupported nonlinearity {}".format(nonlinearity))
def _assignment(arr, num):
"""Assign the value of `num` to `arr`."""
if arr.shape == ():
arr = arr.reshape((1))
arr[:] = num
arr = arr.reshape(())
else:
if isinstance(num, np.ndarray):
arr[:] = num[:]
else:
arr[:] = num
return arr
def _calculate_in_and_out(arr):
"""
Calculate n_in and n_out.
Args:
arr (Array): Input array.
Returns:
Tuple, a tuple with two elements, the first element is `n_in` and the second element is `n_out`.
"""
dim = len(arr.shape)
if dim < 2:
        raise ValueError("If initializing data with xavier uniform, the dimension of data must be greater than 1.")
n_in = arr.shape[1]
n_out = arr.shape[0]
if dim > 2:
counter = reduce(lambda x, y: x * y, arr.shape[2:])
n_in *= counter
n_out *= counter
return n_in, n_out
def _select_fan(array, mode):
mode = mode.lower()
valid_modes = ['fan_in', 'fan_out']
if mode not in valid_modes:
raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes))
fan_in, fan_out = _calculate_in_and_out(array)
return fan_in if mode == 'fan_in' else fan_out
class KaimingInit(init.Initializer):
r"""
Base Class. Initialize the array with He kaiming algorithm.
Args:
a: the negative slope of the rectifier used after this layer (only
used with ``'leaky_relu'``)
mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'``
preserves the magnitude of the variance of the weights in the
forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the
backwards pass.
nonlinearity: the non-linear function, recommended to use only with
``'relu'`` or ``'leaky_relu'`` (default).
"""
def __init__(self, a=0, mode='fan_in', nonlinearity='leaky_relu'):
super(KaimingInit, self).__init__()
self.mode = mode
self.gain = _calculate_gain(nonlinearity, a)
def _initialize(self, arr):
pass
class KaimingUniform(KaimingInit):
r"""
Initialize the array with He kaiming uniform algorithm. The resulting tensor will
have values sampled from :math:`\mathcal{U}(-\text{bound}, \text{bound})` where
.. math::
\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}
Input:
arr (Array): The array to be assigned.
Returns:
Array, assigned array.
Examples:
>>> w = np.empty(3, 5)
>>> KaimingUniform(w, mode='fan_in', nonlinearity='relu')
"""
def _initialize(self, arr):
fan = _select_fan(arr, self.mode)
bound = math.sqrt(3.0) * self.gain / math.sqrt(fan)
np.random.seed(0)
data = np.random.uniform(-bound, bound, arr.shape)
_assignment(arr, data)
class KaimingNormal(KaimingInit):
r"""
Initialize the array with He kaiming normal algorithm. The resulting tensor will
have values sampled from :math:`\mathcal{N}(0, \text{std}^2)` where
.. math::
\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}
Input:
arr (Array): The array to be assigned.
Returns:
Array, assigned array.
Examples:
>>> w = np.empty(3, 5)
>>> KaimingNormal(w, mode='fan_out', nonlinearity='relu')
"""
def _initialize(self, arr):
fan = _select_fan(arr, self.mode)
std = self.gain / math.sqrt(fan)
np.random.seed(0)
data = np.random.normal(0, std, arr.shape)
_assignment(arr, data)
def default_recurisive_init(custom_cell):
"""default_recurisive_init"""
for _, cell in custom_cell.cells_and_names():
if isinstance(cell, nn.Conv2d):
cell.weight.default_input = init.initializer(KaimingUniform(a=math.sqrt(5)),
cell.weight.shape,
cell.weight.dtype)
if cell.bias is not None:
fan_in, _ = _calculate_in_and_out(cell.weight)
bound = 1 / math.sqrt(fan_in)
np.random.seed(0)
cell.bias.default_input = init.initializer(init.Uniform(bound),
cell.bias.shape,
cell.bias.dtype)
elif isinstance(cell, nn.Dense):
cell.weight.default_input = init.initializer(KaimingUniform(a=math.sqrt(5)),
cell.weight.shape,
cell.weight.dtype)
if cell.bias is not None:
fan_in, _ = _calculate_in_and_out(cell.weight)
bound = 1 / math.sqrt(fan_in)
np.random.seed(0)
cell.bias.default_input = init.initializer(init.Uniform(bound),
cell.bias.shape,
cell.bias.dtype)
elif isinstance(cell, (nn.BatchNorm2d, nn.BatchNorm1d)):
pass
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Image classification.
"""
import math
import mindspore.nn as nn
import mindspore.common.dtype as mstype
from mindspore.common import initializer as init
from mindspore.common.initializer import initializer
from .utils.var_init import default_recurisive_init, KaimingNormal
def _make_layer(base, args, batch_norm):
"""Make stage network of VGG."""
layers = []
in_channels = 3
    for v in base:
if v == 'M':
layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
else:
weight = 'ones'
if args.initialize_mode == "XavierUniform":
weight_shape = (v, in_channels, 3, 3)
weight = initializer('XavierUniform', shape=weight_shape, dtype=mstype.float32).to_tensor()
conv2d = nn.Conv2d(in_channels=in_channels,
out_channels=v,
kernel_size=3,
padding=args.padding,
pad_mode=args.pad_mode,
has_bias=args.has_bias,
weight_init=weight)
if batch_norm:
layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU()]
>>> num_classes=1000, batch_norm=False, batch_size=1)
"""
def __init__(self, base, num_classes=1000, batch_norm=False, batch_size=1, args=None, phase="train"):
super(Vgg, self).__init__()
_ = batch_size
self.layers = _make_layer(base, args, batch_norm=batch_norm)
self.flatten = nn.Flatten()
dropout_ratio = 0.5
if not args.has_dropout or phase == "test":
dropout_ratio = 1.0
self.classifier = nn.SequentialCell([
nn.Dense(512 * 7 * 7, 4096),
nn.ReLU(),
nn.Dropout(dropout_ratio),
nn.Dense(4096, 4096),
nn.ReLU(),
nn.Dropout(dropout_ratio),
nn.Dense(4096, num_classes)])
if args.initialize_mode == "KaimingNormal":
default_recurisive_init(self)
self.custom_init_weight()
def construct(self, x):
x = self.layers(x)
        x = self.flatten(x)
x = self.classifier(x)
return x
def custom_init_weight(self):
"""
Init the weight of Conv2d and Dense in the net.
"""
for _, cell in self.cells_and_names():
if isinstance(cell, nn.Conv2d):
cell.weight.default_input = init.initializer(
KaimingNormal(a=math.sqrt(5), mode='fan_out', nonlinearity='relu'),
cell.weight.shape, cell.weight.dtype)
if cell.bias is not None:
cell.bias.default_input = init.initializer(
'zeros', cell.bias.shape, cell.bias.dtype)
elif isinstance(cell, nn.Dense):
cell.weight.default_input = init.initializer(
init.Normal(0.01), cell.weight.shape, cell.weight.dtype)
if cell.bias is not None:
cell.bias.default_input = init.initializer(
'zeros', cell.bias.shape, cell.bias.dtype)
cfg = {
'11': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    '13': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
    '16': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],
    '19': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M', 512, 512, 512, 512, 'M'],
}
def vgg16(num_classes=1000, args=None, phase="train"):
"""
Get Vgg16 neural network with batch normalization.
Args:
num_classes (int): Class numbers. Default: 1000.
args(namespace): param for net init.
phase(str): train or test mode.
Returns:
Cell, cell instance of Vgg16 neural network with batch normalization.
Examples:
>>> vgg16(num_classes=1000, args=args)
"""
if args is None:
from .config import cifar_cfg
args = cifar_cfg
net = Vgg(cfg['16'], num_classes=num_classes, args=args, batch_norm=args.batch_norm, phase=phase)
return net
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
warm up cosine annealing learning rate.
"""
import math
import numpy as np
from .linear_warmup import linear_warmup_lr
def warmup_cosine_annealing_lr(lr, steps_per_epoch, warmup_epochs, max_epoch, T_max, eta_min=0):
"""warm up cosine annealing learning rate."""
base_lr = lr
warmup_init_lr = 0
total_steps = int(max_epoch * steps_per_epoch)
warmup_steps = int(warmup_epochs * steps_per_epoch)
lr_each_step = []
for i in range(total_steps):
last_epoch = i // steps_per_epoch
if i < warmup_steps:
lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
else:
lr = eta_min + (base_lr - eta_min) * (1. + math.cos(math.pi*last_epoch / T_max)) / 2
lr_each_step.append(lr)
return np.array(lr_each_step).astype(np.float32)
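An illustrative call building the schedule used for imagenet2012 in this repository (lr 0.01, no warm-up, 150 epochs; the 5004 steps per epoch and `T_max=150` are assumptions taken from the README checkpoint naming):

```python
from src.warmup_cosine_annealing_lr import warmup_cosine_annealing_lr

lr_each_step = warmup_cosine_annealing_lr(lr=0.01, steps_per_epoch=5004,
                                          warmup_epochs=0, max_epoch=150, T_max=150)
print(lr_each_step.shape)  # (750600,)
```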
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
warm up step learning rate.
"""
from collections import Counter
import numpy as np
from .linear_warmup import linear_warmup_lr
def lr_steps(global_step, lr_init, lr_max, warmup_epochs, total_epochs, steps_per_epoch):
"""Set learning rate."""
lr_each_step = []
total_steps = steps_per_epoch * total_epochs
warmup_steps = steps_per_epoch * warmup_epochs
if warmup_steps != 0:
inc_each_step = (float(lr_max) - float(lr_init)) / float(warmup_steps)
else:
inc_each_step = 0
for i in range(total_steps):
if i < warmup_steps:
lr_value = float(lr_init) + inc_each_step * float(i)
else:
base = (1.0 - (float(i) - float(warmup_steps)) / (float(total_steps) - float(warmup_steps)))
lr_value = float(lr_max) * base * base
if lr_value < 0.0:
lr_value = 0.0
lr_each_step.append(lr_value)
current_step = global_step
lr_each_step = np.array(lr_each_step).astype(np.float32)
learning_rate = lr_each_step[current_step:]
return learning_rate
def warmup_step_lr(lr, lr_epochs, steps_per_epoch, warmup_epochs, max_epoch, gamma=0.1):
"""warmup_step_lr"""
base_lr = lr
warmup_init_lr = 0
total_steps = int(max_epoch * steps_per_epoch)
warmup_steps = int(warmup_epochs * steps_per_epoch)
milestones = lr_epochs
milestones_steps = []
for milestone in milestones:
milestones_step = milestone * steps_per_epoch
milestones_steps.append(milestones_step)
lr_each_step = []
lr = base_lr
milestones_steps_counter = Counter(milestones_steps)
for i in range(total_steps):
if i < warmup_steps:
lr = linear_warmup_lr(i + 1, warmup_steps, base_lr, warmup_init_lr)
else:
lr = lr * gamma**milestones_steps_counter[i]
lr_each_step.append(lr)
return np.array(lr_each_step).astype(np.float32)
def multi_step_lr(lr, milestones, steps_per_epoch, max_epoch, gamma=0.1):
return warmup_step_lr(lr, milestones, steps_per_epoch, 0, max_epoch, gamma=gamma)
def step_lr(lr, epoch_size, steps_per_epoch, max_epoch, gamma=0.1):
lr_epochs = []
for i in range(1, max_epoch):
if i % epoch_size == 0:
lr_epochs.append(i)
return multi_step_lr(lr, lr_epochs, steps_per_epoch, max_epoch, gamma=gamma)
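# Illustrative usage sketch (not part of the original file); the numbers below are assumptions.
# After a 5-epoch linear warm-up the LR is multiplied by `gamma` at each milestone epoch.
if __name__ == "__main__":
    schedule = warmup_step_lr(lr=0.1, lr_epochs=[30, 60, 90], steps_per_epoch=100,
                              warmup_epochs=5, max_epoch=120, gamma=0.1)
    print(schedule.shape)                    # (12000,)
    print(schedule[2999], schedule[3000])    # 0.1 just before the epoch-30 milestone, 0.01 right after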
@@ -17,6 +17,7 @@
python train.py --data_path=$DATA_HOME --device_id=$DEVICE_ID
"""
import argparse
import datetime
import os
import random
@@ -25,83 +26,213 @@ import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
from mindspore import context
from mindspore import ParallelMode
from mindspore.communication.management import init, get_rank, get_group_size
from mindspore.nn.optim.momentum import Momentum
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor
from mindspore.train.model import Model
from mindspore.train.serialization import load_param_into_net, load_checkpoint
from mindspore.train.loss_scale_manager import FixedLossScaleManager
from src.dataset import vgg_create_dataset
from src.dataset import classification_dataset
from src.crossentropy import CrossEntropy
from src.warmup_step_lr import warmup_step_lr
from src.warmup_cosine_annealing_lr import warmup_cosine_annealing_lr
from src.warmup_step_lr import lr_steps
from src.utils.logging import get_logger
from src.utils.util import get_param_groups
from src.vgg import vgg16
random.seed(1)
np.random.seed(1)
def parse_args(cloud_args=None):
    """Parse command-line arguments and merge in the selected dataset config."""
    parser = argparse.ArgumentParser('mindspore classification training')
    parser.add_argument('--device_target', type=str, default='Ascend', choices=['Ascend', 'GPU'],
                        help='device where the code will be implemented. (Default: Ascend)')
    parser.add_argument('--device_id', type=int, default=1, help='device id of GPU or Ascend. (Default: 1)')

    # dataset related
    parser.add_argument('--dataset', type=str, choices=["cifar10", "imagenet2012"], default="cifar10")
    parser.add_argument('--data_path', type=str, default='', help='train data dir')
# network related
parser.add_argument('--pre_trained', default='', type=str, help='model_path, local pretrained model to load')
parser.add_argument('--lr_gamma', type=float, default=0.1,
help='decrease lr by a factor of exponential lr_scheduler')
parser.add_argument('--eta_min', type=float, default=0., help='eta_min in cosine_annealing scheduler')
parser.add_argument('--T_max', type=int, default=150, help='T-max in cosine_annealing scheduler')
# logging and checkpoint related
parser.add_argument('--log_interval', type=int, default=100, help='logging interval')
parser.add_argument('--ckpt_path', type=str, default='outputs/', help='checkpoint save location')
parser.add_argument('--ckpt_interval', type=int, default=5, help='ckpt_interval')
parser.add_argument('--is_save_on_master', type=int, default=1, help='save ckpt on master or all rank')
    # distributed related
    parser.add_argument('--is_distributed', type=int, default=0, help='if multi device')
    parser.add_argument('--rank', type=int, default=0, help='local rank of distributed')
    parser.add_argument('--group_size', type=int, default=1, help='world size of distributed')
    args_opt = parser.parse_args()
    args_opt = merge_args(args_opt, cloud_args)

if args_opt.dataset == "cifar10":
from src.config import cifar_cfg as cfg
else:
from src.config import imagenet_cfg as cfg
args_opt.label_smooth = cfg.label_smooth
args_opt.label_smooth_factor = cfg.label_smooth_factor
args_opt.lr_scheduler = cfg.lr_scheduler
args_opt.loss_scale = cfg.loss_scale
args_opt.max_epoch = cfg.max_epoch
args_opt.warmup_epochs = cfg.warmup_epochs
args_opt.lr = cfg.lr
args_opt.lr_init = cfg.lr_init
args_opt.lr_max = cfg.lr_max
args_opt.momentum = cfg.momentum
args_opt.weight_decay = cfg.weight_decay
args_opt.per_batch_size = cfg.batch_size
args_opt.num_classes = cfg.num_classes
args_opt.buffer_size = cfg.buffer_size
args_opt.ckpt_save_max = cfg.keep_checkpoint_max
args_opt.pad_mode = cfg.pad_mode
args_opt.padding = cfg.padding
args_opt.has_bias = cfg.has_bias
args_opt.batch_norm = cfg.batch_norm
args_opt.initialize_mode = cfg.initialize_mode
args_opt.has_dropout = cfg.has_dropout
args_opt.lr_epochs = list(map(int, cfg.lr_epochs.split(',')))
args_opt.image_size = list(map(int, cfg.image_size.split(',')))
return args_opt
def merge_args(args_opt, cloud_args):
"""dictionary"""
args_dict = vars(args_opt)
if isinstance(cloud_args, dict):
for key_arg in cloud_args.keys():
val = cloud_args[key_arg]
if key_arg in args_dict and val:
arg_type = type(args_dict[key_arg])
if arg_type is not None:
val = arg_type(val)
args_dict[key_arg] = val
return args_opt
if __name__ == '__main__':
args = parse_args()
device_num = int(os.environ.get("DEVICE_NUM", 1))
    if args.is_distributed:
if args.device_target == "Ascend":
init()
context.set_context(device_id=args.device_id)
elif args.device_target == "GPU":
init("nccl")
args.rank = get_rank()
args.group_size = get_group_size()
device_num = args.group_size
context.reset_auto_parallel_context()
        context.set_auto_parallel_context(device_num=device_num, parallel_mode=ParallelMode.DATA_PARALLEL,
                                          parameter_broadcast=True, mirror_mean=True)
else:
context.set_context(device_id=args.device_id)
context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target)
# select for master rank save ckpt or all rank save, compatible for model parallel
args.rank_save_ckpt_flag = 0
if args.is_save_on_master:
if args.rank == 0:
args.rank_save_ckpt_flag = 1
else:
args.rank_save_ckpt_flag = 1
# logger
args.outputs_dir = os.path.join(args.ckpt_path,
datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))
args.logger = get_logger(args.outputs_dir, args.rank)
if args.dataset == "cifar10":
dataset = vgg_create_dataset(args.data_path, args.image_size, args.per_batch_size, args.rank, args.group_size,
repeat_num=args.max_epoch)
else:
dataset = classification_dataset(args.data_path, args.image_size, args.per_batch_size,
args.rank, args.group_size, repeat_num=args.max_epoch)
    batch_num = dataset.get_dataset_size()
args.steps_per_epoch = dataset.get_dataset_size()
args.logger.save_args(args)
# network
args.logger.important_info('start create network')
# get network and init
network = vgg16(args.num_classes, args)
    # pre_trained
if args.pre_trained:
load_param_into_net(network, load_checkpoint(args.pre_trained))
# lr scheduler
if args.lr_scheduler == 'exponential':
lr = warmup_step_lr(args.lr,
args.lr_epochs,
args.steps_per_epoch,
args.warmup_epochs,
args.max_epoch,
gamma=args.lr_gamma,
)
elif args.lr_scheduler == 'cosine_annealing':
lr = warmup_cosine_annealing_lr(args.lr,
args.steps_per_epoch,
args.warmup_epochs,
args.max_epoch,
args.T_max,
args.eta_min)
elif args.lr_scheduler == 'step':
lr = lr_steps(0, lr_init=args.lr_init, lr_max=args.lr_max, warmup_epochs=args.warmup_epochs,
total_epochs=args.max_epoch, steps_per_epoch=batch_num)
else:
raise NotImplementedError(args.lr_scheduler)
# optimizer
opt = Momentum(params=get_param_groups(network),
learning_rate=Tensor(lr),
momentum=args.momentum,
weight_decay=args.weight_decay,
loss_scale=args.loss_scale)
if args.dataset == "cifar10":
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean', is_grad=False)
model = Model(network, loss_fn=loss, optimizer=opt, metrics={'acc'},
amp_level="O2", keep_batchnorm_fp32=False, loss_scale_manager=None)
else:
if not args.label_smooth:
args.label_smooth_factor = 0.0
loss = CrossEntropy(smooth_factor=args.label_smooth_factor, num_classes=args.num_classes)
loss_scale_manager = FixedLossScaleManager(args.loss_scale, drop_overflow_update=False)
model = Model(network, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale_manager, amp_level="O2")
# define callbacks
    time_cb = TimeMonitor(data_size=batch_num)
    loss_cb = LossMonitor(per_print_times=batch_num)
callbacks = [time_cb, loss_cb]
if args.rank_save_ckpt_flag:
ckpt_config = CheckpointConfig(save_checkpoint_steps=args.ckpt_interval * args.steps_per_epoch,
keep_checkpoint_max=args.ckpt_save_max)
ckpt_cb = ModelCheckpoint(config=ckpt_config,
directory=args.outputs_dir,
prefix='{}'.format(args.rank))
callbacks.append(ckpt_cb)
model.train(args.max_epoch, dataset, callbacks=callbacks)
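# Illustrative launch commands (not from the original script); the dataset paths are placeholders.
#   python train.py --device_target=Ascend --device_id=0 --dataset=cifar10 --data_path=/path/to/cifar-10-batches-bin
#   python train.py --device_target=Ascend --device_id=0 --dataset=imagenet2012 --data_path=/path/to/imagenet/train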