# Getting Started
---
First, set up the environment by following [Installation](install_en.md), and then prepare the flower102 dataset as described in [Quick Start](quick_start_en.md).

## 1. Training and Evaluation on CPU or Single GPU

For training and evaluation on CPU or a single GPU, it is recommended to run `tools/train.py` and `tools/eval.py` directly.
For multi-GPU training and evaluation on Linux, please refer to [2. Training and evaluation on Linux+GPU](#2-training-and-evaluation-on-linuxgpu).

<a name="1.1"></a>
### 1.1 Model training

After preparing the configuration file, you can start training as follows.

```bash
python tools/train.py \
    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
    -o pretrained_model="" \
    -o use_gpu=False
```

Here, `-c` specifies the path of the configuration file and `-o` specifies parameters to be modified or added. `-o pretrained_model=""` means that no pretrained model is loaded, and `-o use_gpu=False` runs training on the CPU; set `use_gpu` to `True` to train on the GPU.

You can also modify the configuration file directly instead of passing `-o`. For details of the configuration parameters, please refer to the [Configuration Document](config_en.md).

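Conceptually, each `-o key=value` pair overwrites (or adds) one entry of the configuration loaded with `-c`. The sketch below illustrates that idea on a plain Python dict. `apply_overrides` and `parse_value` are hypothetical helpers, not PaddleClas functions; the dotted-key handling (for keys such as `ARCHITECTURE.name`) and the value-conversion rules are assumptions.

```python
# Hypothetical sketch: apply "-o key=value" overrides to a nested config dict.
# This is NOT PaddleClas's implementation; the parsing rules are assumptions.
import ast


def parse_value(text):
    """Convert an override string to a Python value where possible."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        # Handles ints, floats, quoted strings, lists, ...
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        # Fall back to the raw string, e.g. a path or a model name.
        return text


def apply_overrides(config, overrides):
    """Apply overrides such as "use_gpu=False" or "ARCHITECTURE.name=ResNet50"."""
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            # Walk (or create) the nested sections named by a dotted key.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return config
```

With this sketch, `-o pretrained_model=""` arrives as `pretrained_model=` (the shell strips the quotes) and becomes an empty string, while `use_gpu=False` becomes the boolean `False`.
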
* Examples of the output log are as follows:
    * If mixup or cutmix is used during training, top-1 and top-k accuracy (k defaults to 5) are not printed in the training log:

    ```
    ...
    epoch:0  , train step:20   , loss: 4.53660, lr: 0.003750, batch_cost: 1.23101 s, reader_cost: 0.74311 s, ips: 25.99489 images/sec, eta: 0:12:43
    ...
    END epoch:1   valid top1: 0.01569, top5: 0.06863, loss: 4.61747,  batch_cost: 0.26155 s, reader_cost: 0.16952 s, batch_cost_sum: 10.72348 s, ips: 76.46772 images/sec.
    ...
    ```

    * If mixup or cutmix is not used during training, top-1 and top-k accuracy (k defaults to 5) are also printed in the log, in addition to the information above:

    ```
    ...
    epoch:0  , train step:30  , top1: 0.06250, top5: 0.09375, loss: 4.62766, lr: 0.003728, batch_cost: 0.64089 s, reader_cost: 0.18857 s, ips: 49.93080 images/sec, eta: 0:06:18
    ...
    END epoch:0   train top1: 0.01310, top5: 0.04738, loss: 4.65124,  batch_cost: 0.64089 s, reader_cost: 0.18857 s, batch_cost_sum: 13.45863 s, ips: 49.93080 images/sec.
    ...
    ```

During training, you can monitor the loss in real time with `VisualDL`; see [VisualDL](../extension/VisualDL_en.md) for details.

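Besides VisualDL, the plain-text log can be inspected directly, since each log line shown above is a sequence of `name: value` fields. The following minimal sketch is not part of PaddleClas and assumes exactly the line layout of the examples above; `parse_log_line` and `loss_curve` are hypothetical names. It extracts the numeric fields and skips `eta`, whose `H:MM:SS` value is not a plain number.

```python
# Minimal sketch, not part of PaddleClas: pull numeric "name: value" fields out
# of training log lines like the examples above (layout assumed, not guaranteed).
import re

# A field name must start with a letter or underscore; the negative lookahead
# rejects values followed by ":" (the H:MM:SS eta) or by further digits/dots.
METRIC_RE = re.compile(r"([A-Za-z_]\w*):\s*(\d+(?:\.\d+)?)(?![\d.:])")


def parse_log_line(line):
    """Return a dict such as {"epoch": 0.0, "step": 20.0, "loss": 4.5366, ...}."""
    return {name: float(value) for name, value in METRIC_RE.findall(line)}


def loss_curve(lines):
    """Collect the loss of every per-step training line (lines with a "step" field)."""
    curve = []
    for line in lines:
        metrics = parse_log_line(line)
        if "step" in metrics and "loss" in metrics:
            curve.append(metrics["loss"])
    return curve
```

The regular-expression approach keeps the sketch dependency-free; if the log format changes, only `METRIC_RE` needs adjusting.
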
### 1.2 Model finetuning

After configuring the configuration file, you can finetune the model by loading pretrained weights, as shown below.

```bash
python tools/train.py \
    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
    -o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained" \
    -o use_gpu=True
```

Here, `-o pretrained_model` sets the path of the pretrained weights to load. Replace it with the path of your own pretrained weights, or modify the path directly in the configuration file.

PaddleClas also provides many models pretrained on the ImageNet-1k dataset. For the model list and download links, please refer to the [model library overview](../models/models_intro_en.md).

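As this document notes for checkpoints and dygraph models, weight paths such as `pretrained_model` are given as a prefix without the `.pdparams` or `.pdopt` suffix. A common mistake is to pass the full file name or a path that does not exist. The hypothetical helper below (not a PaddleClas API) checks a prefix before you launch finetuning; it assumes the weights live at `<prefix>.pdparams`.

```python
# Hypothetical helper (not a PaddleClas API): sanity-check a weight path prefix
# before launching training. Assumes dygraph weights are stored as "<prefix>.pdparams".
import os

SUFFIXES = (".pdparams", ".pdopt")


def check_weight_prefix(prefix):
    """Return the .pdparams file for `prefix`, or raise a descriptive error."""
    if prefix.endswith(SUFFIXES):
        # The suffix is added automatically, so a full file name is a mistake.
        raise ValueError(f"pass the prefix without the suffix: {os.path.splitext(prefix)[0]!r}")
    params_file = prefix + ".pdparams"
    if not os.path.isfile(params_file):
        raise FileNotFoundError(f"expected weights at {params_file!r}")
    return params_file
```

For example, `check_weight_prefix("./pretrained/MobileNetV3_large_x1_0_pretrained")` returns the `.pdparams` path if the file has been downloaded and raises otherwise.
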
### 1.3 Resume Training

If training is interrupted for some reason, you can load a checkpoint and continue training.

```bash
python tools/train.py \
    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
    -o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
    -o last_epoch=5 \
    -o use_gpu=True
```

The configuration file does not need to be modified. You only need to add the `checkpoints` parameter, which gives the path of the checkpoint; the model weights, learning rate, optimizer state and other information are loaded from it.

**Note**:
* The parameter `-o last_epoch=5` records the last completed training epoch as `5`, so this run starts counting epochs from `6`. The parameter defaults to `-1`, which means epochs are counted from `0`.

* The `-o checkpoints` parameter does not need to include the checkpoint file suffix. The training command above generates checkpoints as shown below during training. To continue training from epoch `5`, just set `checkpoints` to `./output/MobileNetV3_large_x1_0/5/ppcls`; PaddleClas automatically appends the `pdopt` and `pdparams` suffixes.

    ```shell
    output/
    └── MobileNetV3_large_x1_0
        ├── 0
        │   ├── ppcls.pdopt
        │   └── ppcls.pdparams
        ├── 1
        │   ├── ppcls.pdopt
        │   └── ppcls.pdparams
        .
        .
        .
    ```


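In the resume example above, `checkpoints` points at epoch `5` and `last_epoch` is `5`, so the two values are kept in step. The sketch below, which is not PaddleClas code, scans an output directory laid out like the tree above (`<model>/<epoch>/ppcls.pdparams` and `ppcls.pdopt`) and derives both values from the newest epoch. `latest_checkpoint` and `resume_args` are hypothetical names, and treating an epoch as complete only when both files exist is an assumption.

```python
# Minimal sketch, not PaddleClas code: locate the newest epoch checkpoint in the
# output layout shown above and derive the matching resume overrides.
import os


def latest_checkpoint(model_dir, name="ppcls"):
    """Return (epoch, prefix) for the highest-numbered complete epoch directory."""
    epochs = []
    for entry in os.listdir(model_dir):
        prefix = os.path.join(model_dir, entry, name)
        # Only numeric directories holding both files count as complete checkpoints;
        # this skips directories such as best_model.
        if entry.isdigit() and all(os.path.isfile(prefix + s) for s in (".pdparams", ".pdopt")):
            epochs.append(int(entry))
    if not epochs:
        raise FileNotFoundError(f"no checkpoints under {model_dir!r}")
    epoch = max(epochs)
    return epoch, os.path.join(model_dir, str(epoch), name)


def resume_args(model_dir):
    """Build the -o values used in the resume command above."""
    epoch, prefix = latest_checkpoint(model_dir)
    # last_epoch=N means the resumed run numbers its first epoch N + 1.
    return [f"checkpoints={prefix}", f"last_epoch={epoch}"]
```

For example, `resume_args("./output/MobileNetV3_large_x1_0")` would produce the two values to pass after `-o`.
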
### 1.4 Model evaluation

The model evaluation process can be started as follows.

```bash
python tools/eval.py \
    -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
    -o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls" \
    -o load_static_weights=False
```

The command above uses `./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model/ppcls`. You can change evaluation settings either by editing the configuration file or, as shown above, by overriding them with the `-o` parameter.

Some of the configurable evaluation parameters are described as follows:
* `ARCHITECTURE.name`: Model name
* `pretrained_model`: The path of the model file to be evaluated
* `load_static_weights`: Whether the model to be evaluated was trained as a static graph model

**Note:** If the model is a dynamic graph (dygraph) model, you only need to specify the prefix of the model file when loading it, without the suffix, as in [1.3 Resume Training](#13-resume-training).

<a name="2"></a>
## 2. Training and evaluation on Linux+GPU

If you run PaddleClas on Linux with GPUs, it is highly recommended to use `paddle.distributed.launch` to start the training script (`tools/train.py`) and the evaluation script (`tools/eval.py`), which makes running on multiple GPUs more convenient.

### 2.1 Model training

After preparing the configuration file, you can start training as follows. `paddle.distributed.launch` specifies which GPUs to run on via `--selected_gpus`:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3

python -m paddle.distributed.launch \
    --selected_gpus="0,1,2,3" \
    tools/train.py \
        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml
```

The configuration can be updated by adding the `-o` parameter.

```bash
python -m paddle.distributed.launch \
    --selected_gpus="0,1,2,3" \
    tools/train.py \
        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
        -o pretrained_model="" \
        -o use_gpu=True
```

The output log format is the same as above; see [1.1 Model training](#11-model-training) for details.

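If you start jobs from a script, the multi-GPU command above can be assembled programmatically. The sketch below is a hypothetical helper, not part of PaddleClas or Paddle: it builds the environment and argument list so that `CUDA_VISIBLE_DEVICES` and `--selected_gpus` name the same cards, as in the example above. Passing `--selected_gpus=0,1,2,3` as a single `--flag=value` argument is an assumption about how the launcher parses its options.

```python
# Hypothetical helper (not part of PaddleClas or Paddle): assemble the multi-GPU
# launch command shown above so CUDA_VISIBLE_DEVICES and --selected_gpus match.
import os
import sys


def build_launch_command(gpus, config, overrides=()):
    """Return (env, argv) for `python -m paddle.distributed.launch ... tools/train.py`."""
    gpu_list = ",".join(str(g) for g in gpus)
    # Copy the current environment and restrict the visible GPUs.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_list)
    argv = [sys.executable, "-m", "paddle.distributed.launch",
            f"--selected_gpus={gpu_list}", "tools/train.py", "-c", config]
    for item in overrides:
        # Each override becomes its own "-o key=value" pair, as in the commands above.
        argv += ["-o", item]
    return env, argv
```

Running `subprocess.run(argv, env=env, check=True)` from the repository root would then start the job.
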
### 2.2 Model finetuning

After configuring the configuration file, you can finetune the model by loading pretrained weights, as shown below.

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3

python -m paddle.distributed.launch \
    --selected_gpus="0,1,2,3" \
    tools/train.py \
        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
        -o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained"
```

Here, `pretrained_model` sets the path of the pretrained weights to load. Replace it with the path of your own pretrained weights, or modify the path directly in the configuration file.

The [Quick Start](./quick_start_en.md) tutorial contains many examples of model finetuning; you can follow it to finetune a model on a specific dataset.

<a name="model_resume"></a>
### 2.3 Resume Training

If training is interrupted for some reason, you can load a checkpoint and continue training.

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3

python -m paddle.distributed.launch \
    --selected_gpus="0,1,2,3" \
    tools/train.py \
        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
        -o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
        -o last_epoch=5 \
        -o use_gpu=True
```

The configuration file does not need to be modified. You only need to add the `checkpoints` parameter, which gives the path of the checkpoint; the model weights, learning rate, optimizer state and other information are loaded from it. For the `last_epoch` parameter, refer to [1.3 Resume Training](#13-resume-training) for details.

### 2.4 Model evaluation

The model evaluation process can be started as follows.

```bash
python tools/eval.py \
    -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
    -o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls" \
    -o load_static_weights=False
```

For parameter descriptions, see [1.4 Model evaluation](#14-model-evaluation).

<a name="model_infer"></a>
## 3. Use the pre-trained model to predict

After training is completed, you can run prediction with the model weights obtained from training, as follows:

```bash
python tools/infer/infer.py \
    -i ./test.jpeg \
    --model MobileNetV3_large_x1_0 \
    --pretrained_model "./output/MobileNetV3_large_x1_0/best_model/ppcls" \
    --use_gpu True \
    --load_static_weights False
```

The parameters are as follows:
+ `image_file` (`-i`): The path of the image file to be predicted, such as `./test.jpeg`;
+ `model`: Model name, such as `MobileNetV3_large_x1_0`;
+ `pretrained_model`: Weight file path, such as `./pretrained/MobileNetV3_large_x1_0_pretrained/`;
+ `use_gpu`: Whether to use the GPU, default `True`;
+ `load_static_weights`: Whether to load a pretrained model obtained from static graph training, default `False`;
+ `resize_short`: The target length of the shorter side of the image after it is scaled proportionally, default `256`;
+ `resize`: The side length of the square region center-cropped from the `resize_short`-scaled image, default `224`;
+ `pre_label_image`: Whether to pre-label the image data, default `False`;
+ `pre_label_out_idr`: The output path of the pre-labeled image data. When `pre_label_image=True`, subfolders are generated under this path, one per category, each storing all the images that the model predicts to belong to that category.

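The `pre_label_image` output layout described above can be pictured with a short sketch. This is not the code in infer.py: `pre_label` is a hypothetical helper, and naming each subfolder after the numeric class id and copying (rather than moving) the images are assumptions.

```python
# Hypothetical sketch of the pre-labeling output layout described above: one
# subfolder per predicted category, each holding the images assigned to it.
# Folder names (the class id) and copying instead of moving are assumptions.
import os
import shutil


def pre_label(predictions, out_dir):
    """Copy each image into out_dir/<class_id>/ according to `predictions`.

    predictions: mapping of image path -> predicted class id (int).
    Images with the same file name in one category overwrite each other.
    """
    for image_path, class_id in predictions.items():
        target_dir = os.path.join(out_dir, str(class_id))
        os.makedirs(target_dir, exist_ok=True)
        shutil.copy(image_path, target_dir)
```
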
**Note**: If you use Transformer series models such as `DeiT_***_384` or `ViT_***_384`, pay attention to the model's input size and set `resize_short=384` and `resize=384`.

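The `resize_short` and `resize` parameters describe a two-step preprocessing: scale the image so that its shorter side equals `resize_short` while keeping the aspect ratio, then take a centered `resize` x `resize` crop. The sketch below computes only the geometry; it is not PaddleClas's preprocessing code, and the function names and the rounding of the scaled size to the nearest integer are assumptions.

```python
# Minimal sketch of the geometry implied by resize_short/resize (not PaddleClas
# code): scale so the shorter side equals resize_short, keeping the aspect
# ratio, then take a centered resize x resize crop.


def resize_short_size(width, height, resize_short=256):
    """Return the (width, height) after scaling the shorter side to resize_short."""
    scale = resize_short / min(width, height)
    # Rounding to the nearest pixel is an assumption of this sketch.
    return round(width * scale), round(height * scale)


def center_crop_box(width, height, resize=224):
    """Return (left, top, right, bottom) of a centered resize x resize crop."""
    left = (width - resize) // 2
    top = (height - resize) // 2
    return left, top, left + resize, top + resize
```

For a 500x375 image with the defaults, the scaled size is 341x256 and the crop box is `(58, 16, 282, 240)`; with `resize_short=384` and `resize=384`, the same image is scaled to 512x384 and cropped to `(64, 0, 448, 384)`.
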
For more detailed information, refer to [infer.py](../../../tools/infer/infer.py).

<a name="model_inference"></a>
## 4. Use the inference model to predict

PaddlePaddle also supports prediction with an inference engine, which is introduced below.

First, export the inference model with `tools/export_model.py`.

```bash
python tools/export_model.py \
    --model MobileNetV3_large_x1_0 \
    --pretrained_model ./output/MobileNetV3_large_x1_0/best_model/ppcls \
    --output_path ./inference \
    --class_dim 1000
```

Here, `--model` specifies the model name, `--pretrained_model` specifies the path of the model file (without the file suffix), `--output_path` specifies where the exported inference model is stored, and `--class_dim` specifies the number of classes of the model (default `1000`).

**Note**:
1. If `--output_path=./inference`, three files are generated in the `inference` folder: `inference.pdiparams`, `inference.pdmodel` and `inference.pdiparams.info`.
2. You can set the shape of the model's input image with `--img_size`. The default is `224`, which means the input image is `224*224`. For Transformer series models such as `DeiT_***_384` and `ViT_***_384`, set `--img_size=384`.

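After exporting, you can check that the output directory contains the three files listed in the note above. The helpers below are hypothetical (not a PaddleClas API); `input_shape` also shows the input shape implied by `--img_size`, where the batch dimension and the NCHW layout with 3 color channels are assumptions.

```python
# Hypothetical check (not a PaddleClas API) that an export directory contains
# the files listed in the note above, plus the input shape implied by --img_size.
# The NCHW layout with 3 color channels is an assumption.
import os

EXPORTED_FILES = ("inference.pdmodel", "inference.pdiparams", "inference.pdiparams.info")


def missing_export_files(output_path):
    """Return the expected files that are absent from output_path."""
    return [f for f in EXPORTED_FILES if not os.path.isfile(os.path.join(output_path, f))]


def input_shape(img_size=224, batch_size=1):
    """Input tensor shape for a square image of side img_size (assumed NCHW, RGB)."""
    return [batch_size, 3, img_size, img_size]
```
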
The command above generates the model structure file (`inference.pdmodel`) and the model weight file (`inference.pdiparams`), which the inference engine can then use for prediction:

```bash
python tools/infer/predict.py \
    --image_file ./test.jpeg \
    --model_file "./inference/inference.pdmodel" \
    --params_file "./inference/inference.pdiparams" \
    --use_gpu=True \
    --use_tensorrt=False
```
The parameters are as follows:
+ `image_file`: The path of the image file to be predicted, such as `./test.jpeg`;
+ `model_file`: Model file path, such as `./MobileNetV3_large_x1_0/inference.pdmodel`;
+ `params_file`: Weight file path, such as `./MobileNetV3_large_x1_0/inference.pdiparams`;
+ `use_tensorrt`: Whether to use TensorRT, default `True`;
+ `use_gpu`: Whether to use the GPU, default `True`;
+ `enable_mkldnn`: Whether to use `MKL-DNN`, default `False`. When both `use_gpu` and `enable_mkldnn` are set to `True`, the GPU is used and `enable_mkldnn` is ignored;
+ `resize_short`: The target length of the shorter side of the image after it is scaled proportionally, default `256`;
+ `resize`: The side length of the square region center-cropped from the `resize_short`-scaled image, default `224`;
+ `enable_calc_topk`: Whether to calculate the top-k accuracy of the predictions, default `False`. Top-k accuracy is printed when it is set to `True`;
+ `gt_label_path`: Path of the file listing image names and their labels; used when `enable_calc_topk` is `True` to get the image list and labels.

**Note**: If you use Transformer series models such as `DeiT_***_384` or `ViT_***_384`, pay attention to the model's input size and set `resize_short=384` and `resize=384`.

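The `enable_calc_topk` and `enable_mkldnn` descriptions in the parameter list above can be illustrated with a short sketch. It is not the code in predict.py: `topk_accuracy` and `select_device` are hypothetical helpers that only mirror the documented behavior, namely top-k accuracy against ground-truth labels and the GPU taking precedence over MKL-DNN when both flags are `True`.

```python
# Minimal sketch, not predict.py's code: top-k accuracy as reported when
# enable_calc_topk=True, and the use_gpu / enable_mkldnn precedence rule.


def topk_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists; labels: list of int class ids.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores, highest first.
        topk = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in topk:
            hits += 1
    return hits / len(labels)


def select_device(use_gpu, enable_mkldnn):
    """GPU wins when both flags are True; MKL-DNN only applies on CPU."""
    if use_gpu:
        return "gpu"
    return "cpu+mkldnn" if enable_mkldnn else "cpu"
```
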
To evaluate the model's inference speed, it is recommended to use [predict.py](../../../tools/infer/predict.py) with TensorRT enabled for acceleration.