# TEXT ANGLE CLASSIFICATION

- [Method Introduction](#method-introduction)
- [Data Preparation](#data-preparation)
- [Training](#training)
- [Evaluation](#evaluation)
- [Prediction](#prediction)

<a name="method-introduction"></a>
## Method Introduction
The angle classifier is used when the text in an image is not upright (not at 0 degrees). In that case, the text lines detected in the image must be corrected before recognition. In the PaddleOCR system, each text line image obtained from text detection is sent to the recognition model after an affine transformation. At that point, the text only needs to be classified as 0 or 180 degrees, so the built-in PaddleOCR text angle classifier **only supports 0 and 180 degree classification**. If you need to support more angles, you can modify the algorithm yourself.
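
To illustrate how a 0/180 prediction might be used downstream, here is a minimal sketch in Python that flips a cropped text line by 180 degrees when the classifier predicts `'180'` with enough confidence. The function name `correct_orientation` and the `0.9` confidence threshold are hypothetical choices for this example, not PaddleOCR's API; the crop is assumed to be a NumPy array in height x width (x channels) layout.

```python
import numpy as np


def correct_orientation(img: np.ndarray, label: str, score: float,
                        thresh: float = 0.9) -> np.ndarray:
    """Rotate a text-line crop by 180 degrees if the classifier says so.

    Hypothetical helper: `label` is assumed to be '0' or '180' and `score`
    the classifier's confidence for that label.
    """
    if label == "180" and score > thresh:
        # A 180-degree rotation reverses both the row and column order;
        # np.rot90 acts on the first two axes, so HxWxC images work too.
        return np.rot90(img, k=2).copy()
    # Low-confidence or '0' predictions leave the crop unchanged.
    return img


if __name__ == "__main__":
    crop = np.arange(6).reshape(2, 3)
    print(correct_orientation(crop, "180", 0.99))
    # [[5 4 3]
    #  [2 1 0]]
```

The threshold guards against flipping a correctly oriented line on a low-confidence prediction; the right value depends on your data.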

Example of 0 and 180 degree data samples:

![](../imgs_results/angle_class_example.jpg)

<a name="data-preparation"></a>
## Data Preparation

The default storage path for training data is `PaddleOCR/train_data/cls`. If you already have a dataset on your disk, just create a soft link to the dataset directory:

```
ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/cls/dataset
```

Organize your data as described below.

- Training set

First, put the training images in one folder (train_images), and use a txt file (cls_gt_train.txt) to store the image paths and labels.

* Note: by default, the image path and the image label are separated by `\t`. Using any other separator will cause a training error.

The labels 0 and 180 indicate that the image is rotated by 0 degrees and 180 degrees, respectively.

```
" Image file name           Image annotation "

train/word_001.jpg   0
train/word_002.jpg   180
```
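
The label file format described above can be produced and checked with a few lines of Python. This sketch assumes each line holds a relative image path and a label (`0` or `180`) separated by a single `\t`; the helper names `write_label_file` and `read_label_file` are hypothetical and not part of PaddleOCR.

```python
from typing import List, Tuple

VALID_LABELS = {"0", "180"}


def write_label_file(path: str, samples: List[Tuple[str, str]]) -> None:
    """Write (image_path, label) pairs, one per line, separated by a tab."""
    with open(path, "w", encoding="utf-8") as f:
        for img_path, label in samples:
            if label not in VALID_LABELS:
                raise ValueError(f"unexpected label {label!r} for {img_path}")
            f.write(f"{img_path}\t{label}\n")


def read_label_file(path: str) -> List[Tuple[str, str]]:
    """Parse a label file, rejecting lines that are not '<path>\\t<label>'."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            parts = line.split("\t")
            # Space-separated lines end up as a single field and are rejected.
            if len(parts) != 2 or parts[1] not in VALID_LABELS:
                raise ValueError(f"line {lineno}: expected '<path>\\t<0|180>', got {line!r}")
            samples.append((parts[0], parts[1]))
    return samples
```

For example, `write_label_file("cls_gt_train.txt", [("train/word_001.jpg", "0")])` writes a single tab-separated line, and `read_label_file` catches files that were accidentally saved with spaces instead of tabs.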

The final training set should have the following file structure:

```
|-train_data
    |-cls
        |- cls_gt_train.txt
        |- train
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```

- Test set

Similar to the training set, the test set also needs a folder containing all images (test) and a cls_gt_test.txt file. The structure of the test set is as follows:

```
|-train_data
    |-cls
        |- cls_gt_test.txt
        |- test
            |- word_001.jpg
            |- word_002.jpg
            |- word_003.jpg
            | ...
```
<a name="training"></a>
## Training
Write the paths of the prepared txt file and image folder into the `Train/Eval.dataset.label_file_list` and `Train/Eval.dataset.data_dir` fields of the configuration file. The absolute path of each image is formed by joining the `Train/Eval.dataset.data_dir` field with the image name recorded in the txt file.
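
To make the path rule concrete, the sketch below joins `data_dir` with each relative path from a label file and reports entries whose image does not exist, a quick sanity check before starting training. It only assumes the tab-separated label format described above; `resolve_image_paths` is a hypothetical helper, not a PaddleOCR function.

```python
import os
from typing import List, Tuple


def resolve_image_paths(data_dir: str, label_file: str) -> Tuple[List[str], List[str]]:
    """Return (existing, missing) absolute image paths listed in label_file.

    Each image path is data_dir joined with the relative name in the txt
    file, mirroring how data_dir and label_file_list are described to combine.
    """
    existing, missing = [], []
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            rel_path = line.split("\t")[0]  # first field is the image name
            full_path = os.path.abspath(os.path.join(data_dir, rel_path))
            (existing if os.path.isfile(full_path) else missing).append(full_path)
    return existing, missing
```

With the default layout you would call `resolve_image_paths("train_data/cls", "train_data/cls/cls_gt_train.txt")` and inspect the second list before launching training.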

PaddleOCR provides training scripts, evaluation scripts, and prediction scripts.

Start training:

```
# Set PYTHONPATH
export PYTHONPATH=$PYTHONPATH:.

# GPU training supports both single-card and multi-card training; specify the cards with --gpus.
# The following command is also written in train.sh; to use it, just modify the configuration file path in that file.
python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7' tools/train.py -c configs/cls/cls_mv3.yml
```

- Data Augmentation

PaddleOCR provides a variety of data augmentation methods. If you want to add perturbation during training, uncomment the `RecAug` and `RandAugment` fields under `Train.dataset.transforms` in the configuration file.

The default perturbation methods are: cvtColor, blur, jitter, Gaussian noise, random crop, perspective, color reverse, and RandAugment.

Except for RandAugment, each perturbation method is applied with a 50% probability during training. For the specific implementation, see:

- [rec_img_aug.py](../../ppocr/data/imaug/rec_img_aug.py)
- [randaugment.py](../../ppocr/data/imaug/randaugment.py)
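
The "each method with 50% probability" rule can be sketched as independent coin flips per method. This is not the code in `rec_img_aug.py`: the two toy perturbations (`flip_colors`, `add_noise`) and the function `apply_random_perturbations` are hypothetical stand-ins that only illustrate the sampling scheme, with RandAugment left out since the text treats it separately.

```python
import random
from typing import Callable, List, Optional

import numpy as np


def flip_colors(img: np.ndarray) -> np.ndarray:
    """Toy 'color reverse': invert an 8-bit image."""
    return 255 - img


def add_noise(img: np.ndarray) -> np.ndarray:
    """Toy Gaussian noise with a fixed standard deviation of 5."""
    noisy = img.astype(np.float64) + np.random.normal(0.0, 5.0, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def apply_random_perturbations(img: np.ndarray,
                               methods: List[Callable[[np.ndarray], np.ndarray]],
                               prob: float = 0.5,
                               rng: Optional[random.Random] = None) -> np.ndarray:
    """Apply each perturbation independently with probability `prob`."""
    rng = rng or random.Random()
    for method in methods:
        # One coin flip per method, so any subset (including none) may apply.
        if rng.random() < prob:
            img = method(img)
    return img
```

Because the flips are independent, with eight methods at p=0.5 an image receives four perturbations on average, but occasionally none or all of them.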


- Training

PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/cls/cls_mv3.yml` to set the evaluation frequency; by default, the model is evaluated every 1000 iterations. The following files are saved during training:
```bash
├── best_accuracy.pdopt # Optimizer parameters for the best model
├── best_accuracy.pdparams # Parameters of the best model
├── best_accuracy.states # Metric info and epochs of the best model
├── config.yml # Configuration file for this experiment
├── latest.pdopt # Optimizer parameters for the latest model
├── latest.pdparams # Parameters of the latest model
├── latest.states # Metric info and epochs of the latest model
└── train.log # Training log
```
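
The alternating train/evaluate loop and the `best_accuracy`/`latest` checkpoints can be pictured with the toy loop below. It assumes `eval_batch_step` is a `[start_step, interval]` pair, so evaluation fires every `interval` iterations after `start_step`; that interpretation, the function names, and the fake accuracy values are assumptions for illustration, and the actual loop in `tools/train.py` may differ.

```python
from typing import Callable, Dict, List


def should_evaluate(global_step: int, start_step: int, interval: int) -> bool:
    """True when an evaluation is due (assumed [start_step, interval] semantics)."""
    return global_step > start_step and (global_step - start_step) % interval == 0


def run_schedule(total_steps: int, eval_batch_step: List[int],
                 evaluate: Callable[[int], float]) -> Dict[str, float]:
    """Simulate training steps, evaluating periodically and tracking the best model."""
    start_step, interval = eval_batch_step
    best_acc, best_step = -1.0, 0
    for step in range(1, total_steps + 1):
        # One optimization step would run here in real training.
        if should_evaluate(step, start_step, interval):
            acc = evaluate(step)
            if acc > best_acc:
                # This is where best_accuracy.* would be written.
                best_acc, best_step = acc, step
    # latest.* would be written regardless of accuracy.
    return {"best_acc": best_acc, "best_step": best_step}
```

A larger `interval` means fewer evaluations, which is the knob to turn when evaluation on a large set dominates training time.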

If the evaluation set is large, evaluation will be time-consuming. It is recommended to reduce the evaluation frequency or to evaluate only after training.

**Note that the configuration file for prediction/evaluation must be consistent with the training.**

<a name="evaluation"></a>
## Evaluation

The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/cls/cls_mv3.yml` file.

```
export CUDA_VISIBLE_DEVICES=0
# GPU evaluation; Global.checkpoints specifies the weights to be evaluated
python3 tools/eval.py -c configs/cls/cls_mv3.yml -o Global.checkpoints={path/to/weights}/best_accuracy
```
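
For a two-class angle classifier, a natural evaluation metric is top-1 accuracy: the fraction of images whose predicted label matches the ground truth. The sketch below computes it from predicted and true labels; whether `tools/eval.py` reports exactly this quantity is an assumption here, and `angle_accuracy` is a hypothetical name.

```python
from typing import Sequence


def angle_accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of samples whose predicted angle ('0' or '180') matches the label."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        return 0.0  # define accuracy on an empty set as 0
    correct = sum(p == t for p, t in zip(preds, labels))
    return correct / len(labels)


print(angle_accuracy(["0", "180", "0", "0"], ["0", "180", "180", "0"]))  # 0.75
```
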
<a name="prediction"></a>
## Prediction

* Training engine prediction

With a model trained by PaddleOCR, you can quickly get predictions using the following script.

Use `Global.infer_img` to specify the path of the image or folder to predict, and use `Global.pretrained_model` to specify the weights:

```
# Predict the text angle of an English word image
python3 tools/infer_cls.py -c configs/cls/cls_mv3.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.load_static_weights=false Global.infer_img=doc/imgs_words_en/word_10.png
```

Input image:

![](../imgs_words_en/word_10.png)

Prediction result for the input image:

```
infer_img: doc/imgs_words_en/word_10.png
     result: ('0', 0.9999995)
```
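
The `('0', 0.9999995)` result pairs a label with a confidence score. One common way to derive such a pair from a two-class output is a softmax followed by an argmax over the label list `['0', '180']`; the sketch below shows that step. The logits, the label order, and the function name `decode_angle` are assumptions for illustration, not taken from `tools/infer_cls.py`.

```python
import math
from typing import Sequence, Tuple

LABEL_LIST = ("0", "180")  # assumed class order


def decode_angle(logits: Sequence[float]) -> Tuple[str, float]:
    """Turn two raw class scores into (label, probability) via softmax + argmax."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_LIST[idx], probs[idx]


label, score = decode_angle([7.0, -7.5])
print(label, score)
```

A score this close to 1 means the two logits are far apart; ties resolve to the first label in `LABEL_LIST`.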