    Introduction

    This is a PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization". The paper presents a real-time, arbitrary-shape scene text detector that achieves state-of-the-art performance on standard benchmarks.

    Part of the code is inherited from MegReader.

    ToDo List

    • Release code
    • Document for Installation
    • Trained models
    • Document for testing and training
    • Evaluation
    • Demo script
    • Re-organize and clean the parameters

    Installation

    Requirements:

    • Python3
    • PyTorch >= 1.2
    • GCC >= 4.9 (This is important for PyTorch)
    • CUDA >= 9.0 (10.1 is recommended)
      # first, make sure that your conda is set up properly with the right environment
      # for that, check that `which conda`, `which pip` and `which python` point to the
      # right path. From a clean conda env, this is what you need to do
    
      conda create --name DB -y
      conda activate DB
    
      # this installs the right pip and dependencies for the fresh python
      conda install ipython pip
    
      # clone repo (requirement.txt is read from the repo root in the next step)
      git clone https://github.com/MhLiao/DB.git
      cd DB/
    
      # python dependencies
      pip install -r requirement.txt
    
      # install PyTorch with cuda-10.1
      conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
    
      # build deformable convolution operator
      # make sure your cuda path of $CUDA_HOME is the same version as your cuda in PyTorch
      # make sure GCC >= 4.9
      # you need to delete the build directory before you re-build it.
      echo $CUDA_HOME
      cd assets/ops/dcn/
      python setup.py build_ext --inplace
    

    Models

    Download the trained models from Baidu Drive (download code: p6u3) or Google Drive.

      pre-trained-model-synthtext   -- used to finetune models, not for evaluation
      td500_resnet18
      td500_resnet50
      totaltext_resnet18
      totaltext_resnet50

    Datasets

    The root of the dataset directory can be DB/datasets/.

    Download the converted ground truth and data lists from Baidu Drive (download code: mz0a) or Google Drive. The images of each dataset can be obtained from its official website.

    Testing

    Prepare dataset

    An example of the dataset paths:

      datasets/total_text/train_images
      datasets/total_text/train_gts
      datasets/total_text/train_list.txt
      datasets/total_text/test_images
      datasets/total_text/test_gts
      datasets/total_text/test_list.txt

    The data root directory and the data list file can be defined in base_totaltext.yaml.
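
    Before running evaluation, it can help to confirm that the layout listed above is in place. The following is a minimal sketch, not part of the repository: the helper name missing_entries and the constant EXPECTED_ENTRIES are hypothetical, and the root datasets/total_text is simply the example path from this section.

```python
from pathlib import Path

# Entries taken from the example layout in this README, relative to the dataset root.
EXPECTED_ENTRIES = [
    "train_images",
    "train_gts",
    "train_list.txt",
    "test_images",
    "test_gts",
    "test_list.txt",
]


def missing_entries(dataset_root):
    """Return the expected entries that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # Hypothetical usage with the example root from this README.
    missing = missing_entries("datasets/total_text")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

    The check only looks at names, not at file contents, so it catches a misplaced root directory but not a malformed list file.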

    Config file

    YAML files named base*.yaml should not be used directly as training or testing config files.

    Demo

    Run the model inference with a single image. Here is an example:

    CUDA_VISIBLE_DEVICES=0 python demo.py experiments/seg_detector/totaltext_resnet18_deform_thre.yaml --image_path datasets/total_text/test_images/img10.jpg --resume path-to-model-directory/totaltext_resnet18 --polygon --box_thresh 0.7 --visualize

    The results can be found in demo_results.

    Evaluate the performance

    Note that, for simplicity, we do not provide the evaluation protocols of all benchmarks. The evaluation protocol embedded in the code is modified from the ICDAR 2015 protocol so that it supports arbitrary-shape polygons. It produces almost the same results as the Pascal evaluation protocol on the Total-Text dataset.

    The img651.jpg in the test set of Total-Text contains EXIF metadata specifying a 90° rotation, so the ground truth does not match the image. You should read and re-write this image to get normal results. The converted image is also provided in the dataset links.

    The following commands reproduce the results in the paper:

    CUDA_VISIBLE_DEVICES=0 python eval.py experiments/seg_detector/totaltext_resnet18_deform_thre.yaml --resume path-to-model-directory/totaltext_resnet18 --polygon --box_thresh 0.7
    
    CUDA_VISIBLE_DEVICES=0 python eval.py experiments/seg_detector/totaltext_resnet50_deform_thre.yaml --resume path-to-model-directory/totaltext_resnet50 --polygon --box_thresh 0.6
    
    CUDA_VISIBLE_DEVICES=0 python eval.py experiments/seg_detector/td500_resnet18_deform_thre.yaml --resume path-to-model-directory/td500_resnet18 --box_thresh 0.5
    
    CUDA_VISIBLE_DEVICES=0 python eval.py experiments/seg_detector/td500_resnet50_deform_thre.yaml --resume path-to-model-directory/td500_resnet50 --box_thresh 0.5
    
    # short side 736, which can be changed in base_ic15.yaml
    CUDA_VISIBLE_DEVICES=0 python eval.py experiments/seg_detector/ic15_resnet18_deform_thre.yaml --resume path-to-model-directory/ic15_resnet18 --box_thresh 0.55
    
    # short side 736, which can be changed in base_ic15.yaml
    CUDA_VISIBLE_DEVICES=0 python eval.py experiments/seg_detector/ic15_resnet50_deform_thre.yaml --resume path-to-model-directory/ic15_resnet50 --box_thresh 0.6
    
    # short side 1152, which can be changed in base_ic15.yaml
    CUDA_VISIBLE_DEVICES=0 python eval.py experiments/seg_detector/ic15_resnet50_deform_thre.yaml --resume path-to-model-directory/ic15_resnet50 --box_thresh 0.6

    The results should be as follows:

    Model                 precision  recall  F-measure  precision (paper)  recall (paper)  F-measure (paper)
    totaltext-resnet18    88.9       77.6    82.9       88.3               77.9            82.8
    totaltext-resnet50    88.0       81.5    84.6       87.1               82.5            84.7
    td500-resnet18        86.5       79.4    82.8       90.4               76.3            82.8
    td500-resnet50        91.1       80.8    85.6       91.5               79.2            84.9
    ic15-resnet18 (736)   87.7       77.5    82.3       86.8               78.4            82.3
    ic15-resnet50 (736)   91.3       80.3    85.4       88.2               82.7            85.4
    ic15-resnet50 (1152)  90.7       84.0    87.2       91.8               83.2            87.3

    box_thresh balances precision against recall; the value that gives the best F-measure may differ between datasets. polygon is only used for arbitrary-shape text datasets. The size of the input images is defined in validate_data->processes->AugmentDetectionData in base_*.yaml.
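
    To make the precision/recall trade-off concrete, the sketch below computes the F-measure, assuming the standard definition (the harmonic mean of precision and recall). The function name f_measure is illustrative and is not taken from the repository; the numbers are the first three rows of the results table in this section.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the results table in this README.
reported = {
    "totaltext-resnet18": (88.9, 77.6),
    "totaltext-resnet50": (88.0, 81.5),
    "td500-resnet18": (86.5, 79.4),
}

for name, (p, r) in reported.items():
    print(f"{name}: F-measure = {f_measure(p, r):.1f}")
```

    Rounded to one decimal, these values agree with the F-measure column of the table (82.9, 84.6 and 82.8). Because the harmonic mean is dominated by the smaller of the two inputs, raising box_thresh to gain precision only pays off while recall does not drop faster.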

    Evaluate the speed

    Set adaptive to False in the YAML file to speed up inference without decreasing performance. The speed is evaluated by running a single test image 50 times to exclude extra IO time.

    CUDA_VISIBLE_DEVICES=0 python eval.py experiments/seg_detector/totaltext_resnet18_deform_thre.yaml --resume path-to-model-directory/totaltext_resnet18 --polygon --box_thresh 0.7 --speed

    Note that the speed depends on both the GPU and the CPU, since the model runs on the GPU and the post-processing algorithm runs on the CPU.
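
    The idea of timing one image repeatedly can be sketched as follows. This is not the code behind the --speed flag; measure_fps and run_once are hypothetical names, and the untimed warm-up call is an assumption added here rather than something stated above.

```python
import time


def measure_fps(run_once, repeats=50):
    """Call run_once() `repeats` times and return calls per second.

    run_once is a hypothetical callable wrapping inference plus
    post-processing on an image that is already loaded in memory,
    so disk IO is not part of the measured time.
    """
    run_once()  # warm-up call, not timed (an assumption of this sketch)
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    elapsed = time.perf_counter() - start
    return repeats / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with a real inference call.
    def fake_inference():
        sum(i * i for i in range(10000))

    print(f"{measure_fps(fake_inference):.1f} FPS")
```

    With a GPU model, CUDA kernels run asynchronously, so run_once should wait for the result (for example by calling torch.cuda.synchronize()) before returning; otherwise the loop measures only the kernel launch time.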

    Training

    Check the paths of data_dir and data_list in the base_*.yaml file. For better performance, you can first pre-train the model with SynthText and then fine-tune it with the specific real-world dataset.

    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py path-to-yaml-file --num_gpus 4

    You can also try distributed training. (Note that the distributed mode is not fully tested; I am not sure whether it achieves the same performance as non-distributed training.)

    CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train.py path-to-yaml-file --num_gpus 4

    Improvements

    Note that the current implementation is written in pure Python except for the deformable convolution operator. The code can therefore be optimized further, for example with TensorRT for the model forward pass and efficient C++ code for the post-processing function.

    Another option to increase speed is to run the model forward and the post-processing algorithm in parallel through a producer-consumer strategy.
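
    A producer-consumer arrangement of this kind could look like the sketch below, which uses only the Python standard library. It is an illustration under stated assumptions, not code from this repository: run_pipeline, forward, postprocess and max_pending are hypothetical names, with forward and postprocess standing in for the model forward pass and the post-processing step.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream


def run_pipeline(images, forward, postprocess, max_pending=4):
    """Overlap `forward` (producer) with `postprocess` (consumer thread)."""
    pending = queue.Queue(maxsize=max_pending)  # bounded to cap memory use
    results = []
    errors = []

    def consumer():
        while True:
            item = pending.get()
            if item is _SENTINEL:
                break
            if errors:
                continue  # keep draining so the producer never blocks
            try:
                results.append(postprocess(item))
            except Exception as exc:
                errors.append(exc)

    worker = threading.Thread(target=consumer)
    worker.start()
    try:
        for image in images:
            # Blocks when the queue is full, i.e. when the consumer falls behind.
            pending.put(forward(image))
    finally:
        pending.put(_SENTINEL)
        worker.join()
    if errors:
        raise errors[0]
    return results


if __name__ == "__main__":
    out = run_pipeline(range(5), forward=lambda x: x * 2, postprocess=lambda y: y + 1)
    print(out)  # [1, 3, 5, 7, 9]
```

    A single consumer keeps the results in input order. Threads overlap the two stages only while the forward pass releases the Python GIL, as native GPU code typically does; if the post-processing is dominated by pure Python code, a process-based variant may be needed instead.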

    Contributions or pull requests are welcome.

    Citing the related works

    Please cite the related works in your publications if it helps your research:

    @inproceedings{liao2020real,
      author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
      title={Real-time Scene Text Detection with Differentiable Binarization},
      booktitle={Proc. AAAI},
      year={2020}
    }
