# Getting Started
---
First, refer to [Installation](install_en.md) to set up the environment, then prepare the flower102 dataset by following the instructions in [Quick Start](quick_start_en.md).

## 1. Training and Evaluation on CPU or Single GPU

If you train and evaluate on CPU or a single GPU, we recommend using `tools/train.py` and `tools/eval.py` directly.
For multi-GPU training and evaluation on Linux, refer to [2. Training and evaluation on Linux+GPU](#2-training-and-evaluation-on-linuxgpu).

<a name="1.1"></a>
### 1.1 Model training

After preparing the configuration file, start training as follows.

```bash
python tools/train.py \
    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
    -o pretrained_model="" \
    -o use_gpu=False
```

Here, `-c` specifies the path of the configuration file, and `-o` specifies parameters to modify or add. `-o pretrained_model=""` means no pretrained model is loaded.
`-o use_gpu=True` trains on GPU. To train on CPU, set `use_gpu` to `False`, as in the command above.


You can also update the configuration by editing the configuration file directly. For details on each configuration parameter, see the [Configuration Document](config_en.md).
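
Conceptually, each `-o key=value` pair overrides the matching key in the loaded YAML configuration. Dotted keys such as `ARCHITECTURE.name` reach into nested sections. The Python sketch below only illustrates that idea. `apply_overrides` is a hypothetical helper, not PaddleClas's own parser. The way it converts value strings (via `ast.literal_eval`, falling back to plain text) is an assumption and may differ from the real implementation.

```python
import ast
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply "key=value" strings to a nested config dict in place (hypothetical helper)."""
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        try:
            # Turn text such as "False", "5" or "0.1" into Python values where possible.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError, TypeError):
            # Keep unparseable text (e.g. a path, a model name, or "") as a string.
            value = raw
        # Walk dotted keys such as "ARCHITECTURE.name" into nested dicts.
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config
```

For example, `apply_overrides(cfg, ["use_gpu=False", "pretrained_model="])` sets `cfg["use_gpu"]` to `False` and `cfg["pretrained_model"]` to an empty string. This mirrors the CPU training command above, since the shell strips the quotes from `pretrained_model=""`.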

* The output log examples are as follows:
    * If mixup or cutmix is used in training, only the loss, lr (learning rate), and training time of the minibatch are printed in the log.

    ```
    train step:890  loss:  6.8473 lr: 0.100000 elapse: 0.157s
    ```

    * If mixup or cutmix is not used in training, the top-1 and top-k (k defaults to 5) accuracy are also printed, in addition to the loss, lr (learning rate), and training time of the minibatch.

    ```
    epoch:0    train    step:13    loss:7.9561    top1:0.0156    top5:0.1094    lr:0.100000    elapse:0.193s
    ```
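
The `top1` and `top5` values in the second log line are top-k accuracies over the minibatch. The plain-Python sketch below follows the standard definition: a sample counts as correct when its true label is among its k highest-scoring classes. `topk_accuracy` is a hypothetical helper and does not reproduce PaddleClas's own metric code.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 5) -> float:
    """Fraction of samples whose true label is among their k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        return 0.0
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Two samples, three classes; both true labels are class 1.
batch_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
batch_labels = [1, 1]
print(topk_accuracy(batch_scores, batch_labels, k=1))  # 0.5
print(topk_accuracy(batch_scores, batch_labels, k=2))  # 1.0
```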

During training, you can monitor loss changes in real time with `VisualDL`; see [VisualDL](https://github.com/PaddlePaddle/VisualDL) for details.
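
You can also get a quick look at the loss without VisualDL by scraping the text log. The sketch below assumes the log format is exactly as in the examples above: `name:value` pairs separated by spaces, optionally with a unit suffix such as `s`. `parse_log_line` and `losses_from_log` are hypothetical helpers, not part of PaddleClas.

```python
import re
from typing import Dict, Iterable, List

# A "name:value" pair, allowing spaces after the colon and a trailing unit such as "s".
_FIELD = re.compile(r"(\w+):\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_log_line(line: str) -> Dict[str, float]:
    """Return the numeric fields of one training-log line as floats."""
    return {name: float(value) for name, value in _FIELD.findall(line)}


def losses_from_log(lines: Iterable[str]) -> List[float]:
    """Collect the loss value from every line that reports one."""
    losses = []
    for line in lines:
        fields = parse_log_line(line)
        if "loss" in fields:
            losses.append(fields["loss"])
    return losses


example = "epoch:0    train    step:13    loss:7.9561    top1:0.0156    top5:0.1094    lr:0.100000    elapse:0.193s"
print(parse_log_line(example)["loss"])  # 7.9561
```

The resulting list can be passed to any plotting tool. Lines without a `loss` field are skipped.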

### 1.2 Model finetuning

After preparing the configuration file, you can finetune the model by loading pretrained weights, as shown below.

```bash
python tools/train.py \
    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
    -o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained" \
    -o use_gpu=True
```

Here, `-o pretrained_model` sets the path from which the pretrained weights are loaded. Replace it with the path to your own pretrained weights, or set the path directly in the configuration file.

We also provide many models pretrained on the ImageNet-1k dataset. For the model list and download links, see the [model library overview](../models/models_intro_en.md).

### 1.3 Resume Training

If training is interrupted for any reason, you can load a checkpoint to resume it.

```bash
python tools/train.py \
    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
    -o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
    -o last_epoch=5 \
    -o use_gpu=True
```

The configuration file does not need to be modified. Just add the `checkpoints` parameter, which specifies the checkpoint path. The model weights, learning rate, optimizer state, and other information are loaded from it.

**Note**:
* `-o last_epoch=5` records the last completed epoch as `5`, so the resumed training starts from epoch `6`. The parameter defaults to `-1`, which means training starts from epoch `0`.

* The `-o checkpoints` parameter does not need to include the checkpoint file suffix. The training commands above generate checkpoints as shown below. To continue training from epoch `5`, set `checkpoints` to `./output/MobileNetV3_large_x1_0/5/ppcls`; PaddleClas automatically appends the `pdopt` and `pdparams` suffixes.

    ```shell
    output/
    └── MobileNetV3_large_x1_0
        ├── 0
        │   ├── ppcls.pdopt
        │   └── ppcls.pdparams
        ├── 1
        │   ├── ppcls.pdopt
        │   └── ppcls.pdparams
        .
        .
        .
    ```
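
The Python sketch below summarizes how the checkpoint prefix, the files on disk, and `last_epoch` relate. It is based only on the directory layout and notes above, and the helper names are hypothetical. It also assumes that with a total of `epochs` epochs, training runs from the start epoch up to `epochs - 1`.

```python
import os
import posixpath
from typing import List

# Suffixes that, per the note above, PaddleClas appends to a checkpoint prefix.
CHECKPOINT_SUFFIXES = (".pdparams", ".pdopt")


def checkpoint_prefix(output_dir: str, model_name: str, epoch: int) -> str:
    """Build a suffix-free prefix such as ./output/MobileNetV3_large_x1_0/5/ppcls."""
    return posixpath.join(output_dir, model_name, str(epoch), "ppcls")


def missing_checkpoint_files(prefix: str) -> List[str]:
    """List expected checkpoint files that are not on disk (empty means OK to resume)."""
    return [prefix + s for s in CHECKPOINT_SUFFIXES if not os.path.isfile(prefix + s)]


def resumed_epochs(last_epoch: int, epochs: int) -> List[int]:
    """Epoch numbers that run after resuming; last_epoch=-1 means a fresh start at epoch 0."""
    return list(range(last_epoch + 1, epochs))


prefix = checkpoint_prefix("./output", "MobileNetV3_large_x1_0", 5)
print(prefix)                       # ./output/MobileNetV3_large_x1_0/5/ppcls
print(resumed_epochs(5, epochs=8))  # [6, 7]
```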


### 1.4 Model evaluation

The model evaluation process can be started as follows.

```bash
python tools/eval.py \
    -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
    -o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls" \
    -o load_static_weights=False
```

The command above uses `./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model/ppcls`. You can adjust evaluation by editing the parameters in the configuration file, or override them with `-o` as shown above.

Some of the configurable evaluation parameters are described as follows:
* `ARCHITECTURE.name`: Model name
* `pretrained_model`: The path of the model file to be evaluated
* `load_static_weights`: Whether the model to be evaluated is a static graph model


**Note:** If the model is a dygraph model, specify only the prefix of the model file when loading it, without the suffix, as in [1.3 Resume Training](#13-resume-training).

<a name="2"></a>
## 2. Training and evaluation on Linux+GPU

To run PaddleClas on Linux with GPUs, we strongly recommend using `paddle.distributed.launch` to start the training script (`tools/train.py`) and the evaluation script (`tools/eval.py`), which makes multi-GPU runs more convenient.

### 2.1 Model training

After preparing the configuration file, start training as follows. `paddle.distributed.launch` selects the GPU cards to run on via `selected_gpus`:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3

python -m paddle.distributed.launch \
    --selected_gpus="0,1,2,3" \
    tools/train.py \
        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml
```

The configuration can be updated by adding the `-o` parameter.

```bash
python -m paddle.distributed.launch \
    --selected_gpus="0,1,2,3" \
    tools/train.py \
        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
        -o pretrained_model="" \
        -o use_gpu=True
```
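
If you drive training from Python (for example, to sweep a hyperparameter), you can assemble the same command line programmatically. This minimal sketch assumes it runs from the PaddleClas root directory. `build_launch_command` is a hypothetical helper: it only builds the argument list and environment, and it does not check that Paddle is installed.

```python
import os
import sys
from typing import Dict, List, Sequence, Tuple


def build_launch_command(
    config: str, gpus: Sequence[int], overrides: Sequence[str] = ()
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) matching the shell commands above (hypothetical helper)."""
    gpu_list = ",".join(str(g) for g in gpus)
    argv = [sys.executable, "-m", "paddle.distributed.launch",
            f"--selected_gpus={gpu_list}", "tools/train.py", "-c", config]
    for item in overrides:
        # Each override becomes its own "-o key=value" pair, as in the shell examples.
        argv += ["-o", item]
    # Same effect as `export CUDA_VISIBLE_DEVICES=...` before launching.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_list)
    return argv, env


argv, env = build_launch_command(
    "./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml",
    gpus=[0, 1, 2, 3],
    overrides=["pretrained_model=", "use_gpu=True"],
)
# To actually start training: subprocess.run(argv, env=env, check=True)
```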

The output log format is the same as above; see [1.1 Model training](#11-model-training) for details.

### 2.2 Model finetuning

After preparing the configuration file, you can finetune the model by loading pretrained weights, as shown below.

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3

python -m paddle.distributed.launch \
    --selected_gpus="0,1,2,3" \
    tools/train.py \
        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
        -o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained"
```

Here, `pretrained_model` sets the path from which the pretrained weights are loaded. Replace it with the path to your own pretrained weights, or set the path directly in the configuration file.

[Quick Start](./quick_start_en.md) contains many examples of model finetuning. Refer to it to finetune a model on a specific dataset.

<a name="model_resume"></a>
### 2.3 Resume Training

If training is interrupted for any reason, you can load a checkpoint to resume it.

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3

python -m paddle.distributed.launch \
    --selected_gpus="0,1,2,3" \
    tools/train.py \
        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
        -o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
        -o last_epoch=5 \
        -o use_gpu=True
```

The configuration file does not need to be modified. Just add the `checkpoints` parameter, which specifies the checkpoint path. The model weights, learning rate, optimizer state, and other information are loaded from it. For the `last_epoch` parameter, see [1.3 Resume Training](#13-resume-training).

### 2.4 Model evaluation

The model evaluation process can be started as follows.

```bash
python tools/eval.py \
    -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
    -o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls" \
    -o load_static_weights=False
```

For parameter descriptions, see [1.4 Model evaluation](#14-model-evaluation).

<a name="model_infer"></a>
## 3. Use the pre-trained model to predict
After training is complete, you can run prediction with the trained model as follows:

```bash
python tools/infer/infer.py \
    -i ./test.jpeg \
    --model MobileNetV3_large_x1_0 \
    --pretrained_model "./output/MobileNetV3_large_x1_0/best_model/ppcls" \
    --use_gpu True \
    --load_static_weights False
```

The parameters are:
+ `image_file` (`-i`): The path of the image file to be predicted, such as `./test.jpeg`;
+ `model`: Model name, such as `MobileNetV3_large_x1_0`;
+ `pretrained_model`: Weight file path, such as `./pretrained/MobileNetV3_large_x1_0_pretrained/`;
+ `use_gpu`: Whether to use the GPU, defaults to `True`;
+ `load_static_weights`: Whether to load a pretrained model obtained from static-graph training, defaults to `False`;
+ `pre_label_image`: Whether to pre-label the image data, defaults to `False`;
+ `pre_label_out_idr`: The output path for pre-labeled image data. When `pre_label_image=True`, a subfolder is generated under this path for each category, holding all the images that the model predicts belong to that category.
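
The Python sketch below illustrates the `pre_label_image` output layout, with one subfolder per predicted category. `group_by_prediction` is hypothetical and takes predictions you already have as an `{image_path: class_id}` mapping. Naming each subfolder after the class id is an assumption; see infer.py for how PaddleClas actually names and fills these folders.

```python
import os
import shutil
from typing import Dict


def group_by_prediction(predictions: Dict[str, int], out_dir: str) -> Dict[int, int]:
    """Copy each image into out_dir/<class_id>/ and return how many images each class got."""
    counts: Dict[int, int] = {}
    for image_path, class_id in predictions.items():
        class_dir = os.path.join(out_dir, str(class_id))
        os.makedirs(class_dir, exist_ok=True)
        # shutil.copy keeps the original file name inside the class folder.
        shutil.copy(image_path, class_dir)
        counts[class_id] = counts.get(class_id, 0) + 1
    return counts
```

Copying rather than moving leaves the source images untouched, so a wrong prediction can be fixed by deleting the copy.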

For more details, see [infer.py](../../../tools/infer/infer.py).

<a name="model_inference"></a>
## 4. Use the inference model to predict

PaddlePaddle supports inference with its prediction engine, which is introduced below.

First, export the inference model with `tools/export_model.py`.

```bash
python tools/export_model.py \
    --model MobileNetV3_large_x1_0 \
    --pretrained_model ./output/MobileNetV3_large_x1_0/best_model/ppcls \
    --output_path ./inference/cls_infer
```

Here, `--model` specifies the model name, `--pretrained_model` specifies the model file path (without the file suffix), and `--output_path` specifies where the converted model is stored.

**Note**:
1. `--output_path` must include a file prefix. With `--output_path=./inference/cls_infer`, three files are generated in the `inference` folder: `cls_infer.pdiparams`, `cls_infer.pdmodel`, and `cls_infer.pdiparams.info`.
2. In `export_model.py` (line 53), the `shape` parameter sets the shape of the model input image; the default is `224*224`. Modify it to match your actual input size, as shown below:

```python
50 # Please modify the 'shape' according to actual needs
51 @to_static(input_spec=[
52     paddle.static.InputSpec(
53         shape=[None, 3, 224, 224], dtype='float32')
54 ])
```
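
Before moving on, you can confirm that the export produced the files listed in note 1. In this minimal Python sketch, `expected_export_files` and `missing_export_files` are hypothetical helpers. The three suffixes are taken from note 1 above.

```python
import os
from typing import List

# Files that, per note 1 above, export_model.py writes for a given --output_path prefix.
EXPORT_SUFFIXES = (".pdmodel", ".pdiparams", ".pdiparams.info")


def expected_export_files(output_path: str) -> List[str]:
    """Full paths of the exported files for an --output_path prefix."""
    return [output_path + suffix for suffix in EXPORT_SUFFIXES]


def missing_export_files(output_path: str) -> List[str]:
    """Expected files that do not exist yet; an empty list means the export looks complete."""
    return [f for f in expected_export_files(output_path) if not os.path.isfile(f)]


print(expected_export_files("./inference/cls_infer"))
```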

The export command generates the model structure file (`cls_infer.pdmodel`) and the model weight file (`cls_infer.pdiparams`). You can then run inference with the inference engine:

```bash
python tools/infer/predict.py \
    --image_file ./test.jpeg \
    --model_file "./inference/cls_infer.pdmodel" \
    --params_file "./inference/cls_infer.pdiparams" \
    --use_gpu=True \
    --use_tensorrt=False
```

The parameters are:
+ `image_file`: The path of the image file to be predicted, such as `./test.jpeg`;
+ `model_file`: Model file path, such as `./MobileNetV3_large_x1_0/cls_infer.pdmodel`;
+ `params_file`: Weight file path, such as `./MobileNetV3_large_x1_0/cls_infer.pdiparams`;
+ `use_tensorrt`: Whether to use TensorRT, defaults to `True`;
+ `use_gpu`: Whether to use the GPU, defaults to `True`;
+ `enable_mkldnn`: Whether to use `MKL-DNN`, defaults to `False`. When both `use_gpu` and `enable_mkldnn` are set to `True`, the GPU is used and `enable_mkldnn` is ignored.
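
The plain-Python sketch below shows how these switches combine. `resolve_backend` and `Backend` are hypothetical. The GPU-over-MKL-DNN precedence comes from the parameter list above. Treating TensorRT as effective only on the GPU path is this sketch's assumption, since TensorRT targets NVIDIA GPUs. PaddleClas's predict.py configures the Paddle inference engine itself and may handle edge cases differently.

```python
from typing import NamedTuple


class Backend(NamedTuple):
    device: str     # "gpu" or "cpu"
    mkldnn: bool    # whether MKL-DNN acceleration is active
    tensorrt: bool  # whether TensorRT acceleration is active


def resolve_backend(use_gpu: bool, enable_mkldnn: bool, use_tensorrt: bool) -> Backend:
    """Decide the effective backend from the three predict.py switches (illustrative only)."""
    if use_gpu:
        # GPU wins; enable_mkldnn is ignored, as the parameter list above states.
        return Backend("gpu", False, use_tensorrt)
    # Assumption: TensorRT has no effect on the CPU path.
    return Backend("cpu", enable_mkldnn, False)


print(resolve_backend(use_gpu=True, enable_mkldnn=True, use_tensorrt=False))
```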


To evaluate the speed of the model, we recommend using [predict.py](../../../tools/infer/predict.py) with TensorRT enabled for acceleration.