update rec doc

248669a8 · WenmuZhou · 8e697e34

Showing 131 additions and 116 deletions

doc/doc_ch/recognition.md +74 -66

doc/doc_en/recognition_en.md +57 -50

--- a/doc/doc_ch/recognition.md
+++ b/doc/doc_ch/recognition.md
 ## 文字识别


- [一、数据准备](#数据准备)
-    - [数据下载](#数据下载)
-    - [自定义数据集](#自定义数据集)  
-    - [字典](#字典)  
-    - [支持空格](#支持空格)
+- [1 数据准备](#数据准备)
+    - [1.1 自定义数据集](#自定义数据集)
+    - [1.2 数据下载](#数据下载)
+    - [1.3 字典](#字典)  
+    - [1.4 支持空格](#支持空格)

- [二、启动训练](#启动训练)
-    - [1. 数据增强](#数据增强)
-    - [2. 训练](#训练)
-    - [3. 小语种](#小语种)
+- [2 启动训练](#启动训练)
+    - [2.1 数据增强](#数据增强)
+    - [2.2 训练](#训练)
+    - [2.3 小语种](#小语种)

- [三、评估](#评估)
+- [3 评估](#评估)

- [四、预测](#预测)
-    - [1. 训练引擎预测](#训练引擎预测)
+- [4 预测](#预测)
+    - [4.1 训练引擎预测](#训练引擎预测)


 <a name="数据准备"></a>
-### 数据准备
+### 1 数据准备


-PaddleOCR 支持两种数据格式: `lmdb` 用于训练公开数据，调试算法; `通用数据` 训练自己的数据:
-
-请按如下步骤设置数据集：
+PaddleOCR 支持两种数据格式：
+ - `lmdb` 用于训练以lmdb格式存储的数据集；
+ - `通用数据` 用于训练以文本文件存储的数据集。

 训练数据的默认存储路径是 `PaddleOCR/train_data`,如果您的磁盘上已有数据集，只需创建软链接至数据集目录：

 ```
+# Linux and macOS
 ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
+# Windows
+mklink /d <path\to\paddle_ocr>\train_data\dataset <path\to\dataset>
 ```

-<a name="数据下载"></a>
-* 数据下载
-
-若您本地没有数据集，可以在官网下载 [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) 数据，用于快速验证。也可以参考[DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here)，下载 benchmark 所需的lmdb格式数据集。
+<a name="自定义数据集"></a>
+#### 1.1 自定义数据集
+下面以通用数据集为例，介绍如何准备数据集：

-<a name="自定义数据集"></a>
-* 使用自己数据集
+* 训练集

-若您希望使用自己的数据进行训练，请参考下文组织您的数据。
+建议将训练图片放入同一个文件夹，并用一个txt文件（rec_gt_train.txt）记录图片路径和标签，txt文件里的内容如下：

- 训练集
+**注意：** txt文件中默认请将图片路径和图片标签用 \t 分割，如用其他方式分割将造成训练报错。

-首先请将训练图片放入同一个文件夹（train_images），并用一个txt文件（rec_gt_train.txt）记录图片路径和标签。
+```
+" 图像文件名                 图像标注信息 "

-**注意：** 默认请将图片路径和图片标签用 \t 分割，如用其他方式分割将造成训练报错
+train_data/train/word_001.jpg   简单可依赖
+train_data/train/word_002.jpg   用科技让复杂的世界更简单
+...
+```

+最终训练集应有如下文件结构：
+```
+|-train_data
+    |- rec_gt_train.txt
+    |- train
+        |- word_001.jpg
+        |- word_002.jpg
+        |- word_003.jpg
+        | ...
 ```
-" 图像文件名                 图像标注信息 "

-train_data/train_0001.jpg   简单可依赖
-train_data/train_0002.jpg   用科技让复杂的世界更简单
+* 测试集
+
+同训练集类似，测试集也需要提供一个包含所有图片的文件夹（test）和一个rec_gt_test.txt，测试集的结构如下所示：
+
+```
+|-train_data
+    |- rec_gt_test.txt
+    |- test
+        |- word_001.jpg
+        |- word_002.jpg
+        |- word_003.jpg
+        | ...
 ```
-PaddleOCR 提供了一份用于训练 icdar2015 数据集的标签文件，通过以下方式下载：
+
+<a name="数据下载"></a>
+
+#### 1.2 数据下载
+
+若您本地没有数据集，可以在官网下载 [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) 数据，用于快速验证。也可以参考[DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here)，下载 benchmark 所需的lmdb格式数据集。
+
+如果你使用的是icdar2015的公开数据集，PaddleOCR 提供了一份用于训练 icdar2015 数据集的标签文件，通过以下方式下载：

 ```
 # 训练集标签
@@ -70,34 +100,8 @@ PaddleOCR 也提供了数据格式转换脚本，可以将官网 label 转换支
 python gen_label.py --mode="rec" --input_path="{path/of/origin/label}" --output_label="rec_gt_label.txt"
 ```
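The official ICDAR2015 word-recognition labels have to be rewritten into the tab-separated `image_path<TAB>label` format before training. The sketch below illustrates such a conversion; it is not the code of `gen_label.py`. It assumes each official line looks like `word_1.png, "Genaxis Theatre"`, and the helper name `convert_icdar_rec_label` and the `image_dir` prefix are hypothetical.

```python
import re

# Assumed ICDAR2015 word-recognition line format: word_1.png, "Genaxis Theatre"
_LINE_RE = re.compile(r'^\s*([^,]+?)\s*,\s*"(.*)"\s*$')


def convert_icdar_rec_label(src_lines, image_dir):
    """Turn assumed ICDAR-style lines into 'image_path<TAB>label' lines (hypothetical helper)."""
    out = []
    for raw in src_lines:
        # Drop a UTF-8 BOM and the trailing newline if present.
        line = raw.lstrip("\ufeff").rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        match = _LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized label line: {raw!r}")
        # The file name cannot contain a comma; the quoted text may.
        name, text = match.groups()
        out.append(f"{image_dir}/{name}\t{text}")
    return out


if __name__ == "__main__":
    sample = ['\ufeffword_1.png, "Genaxis Theatre"\n', 'word_2.png, "a, b"\n']
    for row in convert_icdar_rec_label(sample, "train_data/ic15_data/train"):
        print(row)
```

Splitting only at the first comma keeps labels that themselves contain commas intact.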

-最终训练集应有如下文件结构：
-```
-|-train_data
-    |-ic15_data
-        |- rec_gt_train.txt
-        |- train
-            |- word_001.png
-            |- word_002.jpg
-            |- word_003.jpg
-            | ...
-```
-
- 测试集
-
-同训练集类似，测试集也需要提供一个包含所有图片的文件夹（test）和一个rec_gt_test.txt，测试集的结构如下所示：
-
-```
-|-train_data
-    |-ic15_data
-        |- rec_gt_test.txt
-        |- test
-            |- word_001.jpg
-            |- word_002.jpg
-            |- word_003.jpg
-            | ...
-```
 <a name="字典"></a>
- 字典
+#### 1.3 字典

 最后需要提供一个字典（{word_dict_name}.txt），使模型在训练时，可以将所有出现的字符映射为字典的索引。

@@ -114,6 +118,10 @@ n

 word_dict.txt 每行有一个单字，将字符与数字索引映射在一起，“and” 将被映射成 [2 5 1]

+* 内置字典
+
+PaddleOCR内置了一部分字典，可以按需使用。
+
 `ppocr/utils/ppocr_keys_v1.txt` 是一个包含6623个字符的中文字典

 `ppocr/utils/ic15_dict.txt` 是一个包含36个字符的英文字典
@@ -129,7 +137,7 @@ word_dict.txt 每行有一个单字，将字符与数字索引映射在一起，
 `ppocr/utils/dict/en_dict.txt` 是一个包含63个字符的英文字典


-您可以按需使用。
+

 目前的多语言模型仍处在demo阶段，会持续优化模型并补充语种，**非常欢迎您为我们提供其他语言的字典和字体**，
 如您愿意可将字典文件提交至 [dict](../../ppocr/utils/dict) 将语料文件提交至[corpus](../../ppocr/utils/corpus)，我们会在Repo中感谢您。
@@ -140,13 +148,13 @@ word_dict.txt 每行有一个单字，将字符与数字索引映射在一起，
 并将 `character_type` 设置为 `ch`。

 <a name="支持空格"></a>
- 添加空格类别
+#### 1.4 添加空格类别

 如果希望支持识别"空格"类别, 请将yml文件中的 `use_space_char` 字段设置为 `True`。


 <a name="启动训练"></a>
-### 启动训练
+### 2 启动训练

 PaddleOCR提供了训练脚本、评估脚本和预测脚本，本节将以 CRNN 识别模型为例：

@@ -171,7 +179,7 @@ tar -xf rec_mv3_none_bilstm_ctc_v2.0_train.tar && rm -rf rec_mv3_none_bilstm_ctc
 python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_icdar15_train.yml
 ```
 <a name="数据增强"></a>
- 数据增强
+#### 2.1 数据增强

 PaddleOCR提供了多种数据增强方式，如果您希望在训练时加入扰动，请在配置文件中设置 `distort: true`。

@@ -182,7 +190,7 @@ PaddleOCR提供了多种数据增强方式，如果您希望在训练时加入
 *由于OpenCV的兼容性问题，扰动操作暂时只支持Linux*

 <a name="训练"></a>
- 训练
+#### 2.2 训练

 PaddleOCR支持训练和评估交替进行, 可以在 `configs/rec/rec_icdar15_train.yml` 中修改 `eval_batch_step` 设置评估频率，默认每500个iter评估一次。评估过程中默认将最佳acc模型，保存为 `output/rec_CRNN/best_accuracy` 。

@@ -268,7 +276,7 @@ Eval:
 **注意，预测/评估时的配置文件请务必与训练一致。**

 <a name="小语种"></a>
- 小语种
+#### 2.3 小语种

 PaddleOCR目前已支持26种（除中文外）语种识别，`configs/rec/multi_languages` 路径下提供了一个多语言的配置文件模版: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)。

@@ -411,7 +419,7 @@ Eval:
    ...
 ```
 <a name="评估"></a>
-### 评估
+### 3 评估

 评估数据集可以通过 `configs/rec/rec_icdar15_train.yml`  修改Eval中的 `label_file_path` 设置。

@@ -421,10 +429,10 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
 ```

 <a name="预测"></a>
-### 预测
+### 4 预测

 <a name="训练引擎预测"></a>
-* 训练引擎的预测
+#### 4.1 训练引擎的预测

 使用 PaddleOCR 训练好的模型，可以通过以下脚本进行快速预测。


--- a/doc/doc_en/recognition_en.md
+++ b/doc/doc_en/recognition_en.md
 ## TEXT RECOGNITION

- [DATA PREPARATION](#DATA_PREPARATION)
-    - [Dataset Download](#Dataset_download)
-    - [Costom Dataset](#Costom_Dataset)  
-    - [Dictionary](#Dictionary)  
-    - [Add Space Category](#Add_space_category)
+- [1 DATA PREPARATION](#DATA_PREPARATION)
+    - [1.1 Custom Dataset](#Costom_Dataset)
+    - [1.2 Dataset Download](#Dataset_download)
+    - [1.3 Dictionary](#Dictionary)  
+    - [1.4 Add Space Category](#Add_space_category)

- [TRAINING](#TRAINING)
-    - [Data Augmentation](#Data_Augmentation)
-    - [Training](#Training)
-    - [Multi-language](#Multi_language)
+- [2 TRAINING](#TRAINING)
+    - [2.1 Data Augmentation](#Data_Augmentation)
+    - [2.2 Training](#Training)
+    - [2.3 Multi-language](#Multi_language)

- [EVALUATION](#EVALUATION)
+- [3 EVALUATION](#EVALUATION)

- [PREDICTION](#PREDICTION)
-    - [Training engine prediction](#Training_engine_prediction)
+- [4 PREDICTION](#PREDICTION)
+    - [4.1 Training engine prediction](#Training_engine_prediction)

 <a name="DATA_PREPARATION"></a>
 ### DATA PREPARATION


-PaddleOCR supports two data formats: `LMDB` is used to train public data and evaluation algorithms; `general data` is used to train your own data:
+PaddleOCR supports two data formats:
+- `LMDB` is used to train datasets stored in LMDB format;
+- `general data` is used to train datasets stored in text files.

 Please organize the dataset as follows:

 The default storage path for training data is `PaddleOCR/train_data`, if you already have a dataset on your disk, just create a soft link to the dataset directory:

 ```
+# Linux and macOS
 ln -sf <path/to/dataset> <path/to/paddle_ocr>/train_data/dataset
+# Windows
+mklink /d <path\to\paddle_ocr>\train_data\dataset <path\to\dataset>
 ```
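Python's standard library can create the same directory link on any platform. A minimal sketch, assuming you run it from your own script; `link_dataset` is a hypothetical helper, not part of PaddleOCR. On Windows, `os.symlink` may need administrator rights or Developer Mode, just as `mklink /d` may.

```python
import os
import tempfile
from pathlib import Path


def link_dataset(dataset_dir, paddle_ocr_root, name="dataset"):
    """Create <paddle_ocr_root>/train_data/<name> as a symlink to dataset_dir (sketch)."""
    target = Path(dataset_dir).resolve()
    link = Path(paddle_ocr_root) / "train_data" / name
    link.parent.mkdir(parents=True, exist_ok=True)
    # Replace a previous link, but never delete a real directory or file.
    if link.is_symlink():
        link.unlink()
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    # target_is_directory only matters on Windows, when the target does not exist yet.
    os.symlink(target, link, target_is_directory=True)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp, "my_dataset")
        data.mkdir()
        print(link_dataset(data, Path(tmp, "PaddleOCR")))
```

Refusing to overwrite a non-symlink path is a deliberate safety choice, so an existing `train_data/dataset` directory is never removed by accident.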

-<a name="Dataset_download"></a>
-* Dataset download
-
-If you do not have a dataset locally, you can download it on the official website [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads). Also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here)，download the lmdb format dataset required for benchmark
-
-If you want to reproduce the paper indicators of SRN, you need to download offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA), extraction code: y3ry. The augmented data is obtained by rotation and perturbation of mjsynth and synthtext. Please unzip the data to {your_path}/PaddleOCR/train_data/data_lmdb_Release/training/path.
-
 <a name="Costom_Dataset"></a>
-* Use your own dataset:
+#### 1.1 Custom dataset

 If you want to use your own data for training, please refer to the following to organize your data.

 - Training set

-First put the training images in the same folder (train_images), and use a txt file (rec_gt_train.txt) to store the image path and label.
+It is recommended to put the training images in a single folder and use a txt file (rec_gt_train.txt) to record the image paths and labels. The contents of the txt file look like this:

 * Note: by default, the image path and image label are split with \t, if you use other methods to split, it will cause training error

 ```
 " Image file name           Image annotation "

-train_data/train_0001.jpg   简单可依赖
-train_data/train_0002.jpg   用科技让复杂的世界更简单
-```
-PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
-
-```
-# Training set label
-wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
-# Test Set Label
-wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
+train_data/train/word_001.jpg   简单可依赖
+train_data/train/word_002.jpg   用科技让复杂的世界更简单
+...
 ```
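Because a wrong delimiter only shows up as an error once training starts, it can help to check a label file beforehand. A minimal sketch, assuming the rule from the note above (path and label separated by `\t`, split at the first tab) and that image paths are relative to a data root you pass in; `read_rec_labels` and `find_missing_images` are hypothetical helpers, not PaddleOCR APIs.

```python
import tempfile
from pathlib import Path


def read_rec_labels(label_file):
    """Parse 'image_path<TAB>label' lines; return (samples, malformed_lines)."""
    samples, malformed = [], []
    with open(label_file, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            line = raw.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            # Split at the first tab only, so labels may contain spaces.
            path, sep, label = line.partition("\t")
            if not sep or not path or not label:
                malformed.append((lineno, line))
            else:
                samples.append((path, label))
    return samples, malformed


def find_missing_images(samples, data_root):
    """List image paths from the label file that do not exist under data_root."""
    root = Path(data_root)
    return [path for path, _ in samples if not (root / path).is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        label_file = Path(tmp, "rec_gt_train.txt")
        label_file.write_text(
            "train_data/train/word_001.jpg\t简单可依赖\n"
            "train_data/train/word_002.jpg 用科技让复杂的世界更简单\n",  # space, not tab
            encoding="utf-8",
        )
        samples, malformed = read_rec_labels(label_file)
        print(samples)    # one valid sample
        print(malformed)  # line 2 uses a space instead of a tab
        print(find_missing_images(samples, tmp))  # the image files were never created
```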

 The final training set should have the following file structure:

 ```
 |-train_data
-    |-ic15_data
-        |- rec_gt_train.txt
-        |- train
-            |- word_001.png
-            |- word_002.jpg
-            |- word_003.jpg
-            | ...
+    |- rec_gt_train.txt
+    |- train
+        |- word_001.jpg
+        |- word_002.jpg
+        |- word_003.jpg
+        | ...
 ```

 - Test set
@@ -90,8 +80,25 @@ Similar to the training set, the test set also needs to be provided a folder con
            |- word_003.jpg
            | ...
 ```
+
+<a name="Dataset_download"></a>
+#### 1.2 Dataset download
+
+If you do not have a dataset locally, you can download the [icdar2015](http://rrc.cvc.uab.es/?ch=4&com=downloads) data from the official website for quick verification. You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the LMDB-format datasets required for the benchmark.
+
+If you want to reproduce the SRN results reported in the paper, download the offline [augmented data](https://pan.baidu.com/s/1-HSZ-ZVdqBF2HaBZ5pRAKA) (extraction code: y3ry). The augmented data is obtained by rotating and perturbing MJSynth and SynthText. Unzip it to the `{your_path}/PaddleOCR/train_data/data_lmdb_Release/training/` path.
+
+If you use the public icdar2015 dataset, PaddleOCR provides label files for training on it, which can be downloaded as follows:
+
+```
+# Training set label
+wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
+# Test set label
+wget -P ./train_data/ic15_data  https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt
+```
+
 <a name="Dictionary"></a>
- Dictionary
+#### 1.3 Dictionary

 Finally, a dictionary ({word_dict_name}.txt) needs to be provided so that when the model is trained, all the characters that appear can be mapped to the dictionary index.

@@ -108,6 +115,8 @@ n

 In `word_dict.txt`, there is a single word in each line, which maps characters and numeric indexes together, e.g "and" will be mapped to [2 5 1]
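The line-to-index mapping can be sketched in a few lines. Assumptions: indices are the 0-based line numbers, a duplicated line keeps its first index, and characters missing from the dictionary are skipped. PaddleOCR's real label encoder may offset indices (for example to reserve a CTC blank), so these numbers need not match the `[2 5 1]` example. `load_char_dict`, `encode`, and the three-character dictionary are hypothetical.

```python
def load_char_dict(lines):
    """Map each dictionary line (one character per line) to its 0-based line index."""
    char_to_idx = {}
    for idx, raw in enumerate(lines):
        ch = raw.rstrip("\r\n")
        if ch and ch not in char_to_idx:  # keep the first index of a duplicated line
            char_to_idx[ch] = idx
    return char_to_idx


def encode(text, char_to_idx):
    """Encode text as dictionary indices, skipping characters the dictionary lacks."""
    return [char_to_idx[ch] for ch in text if ch in char_to_idx]


if __name__ == "__main__":
    # Hypothetical dictionary file contents: one character per line.
    char_to_idx = load_char_dict(["d\n", "n\n", "a\n"])
    print(encode("and", char_to_idx))  # → [2, 1, 0]
```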

+PaddleOCR ships with several built-in dictionaries, which you can use as needed.
+
 `ppocr/utils/ppocr_keys_v1.txt` is a Chinese dictionary with 6623 characters.

 `ppocr/utils/ic15_dict.txt` is an English dictionary with 63 characters
@@ -123,8 +132,6 @@ In `word_dict.txt`, there is a single word in each line, which maps characters a
 `ppocr/utils/dict/en_dict.txt` is a English dictionary with 63 characters


-You can use it on demand.
-
 The current multi-language model is still in the demo stage and will continue to optimize the model and add languages. **You are very welcome to provide us with dictionaries and fonts in other languages**,
 If you like, you can submit the dictionary file to [dict](../../ppocr/utils/dict) or corpus file to [corpus](../../ppocr/utils/corpus) and we will thank you in the Repo.

@@ -136,14 +143,14 @@ To customize the dict file, please modify the `character_dict_path` field in `co
 If you need to customize dic file, please add character_dict_path field in configs/rec/rec_icdar15_train.yml to point to your dictionary path. And set character_type to ch.

 <a name="Add_space_category"></a>
- Add space category
+#### 1.4 Add space category

 If you want to support the recognition of the `space` category, please set the `use_space_char` field in the yml file to `True`.

 **Note: use_space_char only takes effect when character_type=ch**
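One way to see what `use_space_char` changes is to check which label characters the character set cannot represent. A minimal sketch, assuming the space class is simply appended after the dictionary characters when the flag is on; `build_character_set` and `unsupported_chars` are hypothetical helpers, and the `character_type` condition from the note above is not modelled.

```python
def build_character_set(dict_lines, use_space_char=False):
    """Collect dictionary characters, optionally adding a space class (assumed behaviour)."""
    chars = [line.rstrip("\r\n") for line in dict_lines if line.rstrip("\r\n")]
    if use_space_char and " " not in chars:
        chars.append(" ")
    return chars


def unsupported_chars(label, chars):
    """Return the characters of a label that the character set cannot represent."""
    return set(label) - set(chars)


if __name__ == "__main__":
    dict_lines = ["a\n", "b\n"]
    label = "a b"
    print(unsupported_chars(label, build_character_set(dict_lines)))        # {' '}
    print(unsupported_chars(label, build_character_set(dict_lines, True)))  # set()
```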

 <a name="TRAINING"></a>
-### TRAINING
+### 2 TRAINING

 PaddleOCR provides training scripts, evaluation scripts, and prediction scripts. In this section, the CRNN recognition model will be used as an example:

@@ -166,7 +173,7 @@ Start training:
 python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_icdar15_train.yml
 ```
 <a name="Data_Augmentation"></a>
- Data Augmentation
+#### 2.1 Data Augmentation

 PaddleOCR provides a variety of data augmentation methods. If you want to add disturbance during training, please set `distort: true` in the configuration file.

@@ -175,7 +182,7 @@ The default perturbation methods are: cvtColor, blur, jitter, Gasuss noise, rand
 Each disturbance method is selected with a 50% probability during the training process. For specific code implementation, please refer to: [img_tools.py](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/ppocr/data/rec/img_tools.py)
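The 50% rule can be read as an independent coin flip for each perturbation. A minimal sketch of that selection logic under this reading; `reverse_colors` and `add_gaussian_noise` are toy stand-ins operating on a grayscale image stored as nested lists, not the implementations in `img_tools.py`.

```python
import random


def reverse_colors(img):
    """Toy 'color reverse': invert 8-bit grayscale values."""
    return [[255 - v for v in row] for row in img]


def add_gaussian_noise(img, sigma=10.0, rng=random):
    """Toy Gaussian noise, clamped to the 0..255 range."""
    return [[min(255, max(0, round(v + rng.gauss(0.0, sigma)))) for v in row] for row in img]


def random_perturb(img, ops, p=0.5, rng=random):
    """Apply each perturbation independently with probability p."""
    for op in ops:
        if rng.random() < p:
            img = op(img)
    return img


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the demo is repeatable
    image = [[0, 128, 255], [64, 32, 16]]
    ops = [reverse_colors, lambda im: add_gaussian_noise(im, rng=rng)]
    print(random_perturb(image, ops, rng=rng))
```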

 <a name="Training"></a>
- Training
+#### 2.2 Training

 PaddleOCR supports alternating training and evaluation. You can modify `eval_batch_step` in `configs/rec/rec_icdar15_train.yml` to set the evaluation frequency. By default, it is evaluated every 500 iter and the best acc model is saved under `output/rec_CRNN/best_accuracy` during the evaluation process.
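The alternating schedule amounts to evaluating every `eval_batch_step` iterations and saving only when accuracy improves. A minimal sketch, assuming `eval_batch_step` is a single interval and that `train_step`, `evaluate`, and `save_best` are placeholders for the real trainer (in PaddleOCR the best model is saved to `output/rec_CRNN/best_accuracy`).

```python
def train_with_periodic_eval(train_step, evaluate, save_best, total_iters, eval_batch_step=500):
    """Train, evaluate every eval_batch_step iterations, and save on a new best accuracy."""
    best_acc, best_iter = float("-inf"), None
    for it in range(1, total_iters + 1):
        train_step(it)
        if it % eval_batch_step == 0:
            acc = evaluate()
            if acc > best_acc:
                best_acc, best_iter = acc, it
                save_best(it, acc)  # e.g. write output/rec_CRNN/best_accuracy
    return best_acc, best_iter


if __name__ == "__main__":
    fake_accs = iter([0.52, 0.71, 0.64])  # made-up accuracies, one per evaluation
    saved = []
    result = train_with_periodic_eval(
        train_step=lambda it: None,
        evaluate=lambda: next(fake_accs),
        save_best=lambda it, acc: saved.append(it),
        total_iters=1500,
    )
    print(result, saved)  # → (0.71, 1000) [500, 1000]
```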

@@ -264,7 +271,7 @@ Eval:
 **Note that the configuration file for prediction/evaluation must be consistent with the training.**

 <a name="Multi_language"></a>
- Multi-language
+#### 2.3 Multi-language

 PaddleOCR currently supports 26 (except Chinese) language recognition. A multi-language configuration file template is
 provided under the path `configs/rec/multi_languages`: [rec_multi_language_lite_train.yml](../../configs/rec/multi_language/rec_multi_language_lite_train.yml)。
@@ -416,7 +423,7 @@ Eval:
 ```

 <a name="EVALUATION"></a>
-### EVALUATION
+### 3 EVALUATION

 The evaluation dataset can be set by modifying the `Eval.dataset.label_file_list` field in the `configs/rec/rec_icdar15_train.yml` file.

@@ -426,10 +433,10 @@ python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec
 ```
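Recognition evaluation compares each predicted string with its label. A minimal sketch of two common measures, exact-match accuracy and one minus the mean normalized edit distance; PaddleOCR's own metric code may preprocess strings differently (for example regarding spaces or case), so treat `rec_metrics` and its key names as illustrative and hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def rec_metrics(preds, labels):
    """Exact-match accuracy and 1 - mean normalized edit distance (illustrative)."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    n = max(len(labels), 1)
    correct = sum(p == t for p, t in zip(preds, labels))
    dist = sum(edit_distance(p, t) / max(len(p), len(t), 1) for p, t in zip(preds, labels))
    return {"acc": correct / n, "norm_edit_dis": 1 - dist / n}


if __name__ == "__main__":
    print(rec_metrics(["and", "cat"], ["and", "car"]))  # acc 0.5, norm_edit_dis about 0.83
```

Reporting both numbers is useful because exact-match accuracy ignores near misses, while the edit-distance score gives partial credit for them.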

 <a name="PREDICTION"></a>
-### PREDICTION
+### 4 PREDICTION

 <a name="Training_engine_prediction"></a>
-* Training engine prediction
+#### 4.1 Training engine prediction

 Using the model trained by paddleocr, you can quickly get prediction through the following script.