CTPN is a text detection model based on an object-detection approach. It builds on Faster R-CNN and combines it with a bidirectional LSTM, which makes CTPN very effective for horizontal text detection. Another highlight of CTPN is that it transforms the text detection task into the detection of a series of small-scale text boxes. This idea was proposed in the paper "Detecting Text in Natural Image with Connectionist Text Proposal Network".

Paper: Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao, "Detecting Text in Natural Image with Connectionist Text Proposal Network", arXiv, vol. abs/1609.03605, 2016.

The overall network uses VGG16 as the backbone, applies a bidirectional LSTM to extract context features of the small-scale text boxes, and then uses an RPN (Region Proposal Network) to predict the bounding boxes and their probabilities.
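The data flow can be summarized with the following shape-only sketch (a minimal illustration, not the repository's implementation; the layer sizes follow the paper, and the input resolution here is an assumption):

```python
# Shape-only walkthrough of CTPN: VGG16 features -> BiLSTM context -> RPN heads.
import numpy as np

N, H, W = 1, 576, 960                          # assumed input resolution
feat = np.zeros((N, 512, H // 16, W // 16))    # VGG16 conv5 feature map (stride 16)

# The bidirectional LSTM slides along each row of the feature map, so the
# sequence length is the feature-map width; 128 hidden units per direction.
hidden = 128
context = np.zeros((N, 2 * hidden, H // 16, W // 16))

# The RPN predicts k vertical anchors per position: 2k text/non-text scores
# and 2k vertical regression offsets (k = 10 in the paper).
k = 10
cls_scores = np.zeros((N, 2 * k, H // 16, W // 16))
reg_offsets = np.zeros((N, 2 * k, H // 16, W // 16))
```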
Here we use six datasets for training and one dataset for evaluation.
.
└─ctpn
├── README.md # network readme
├── eval.py # eval net
├── scripts
│ ├── eval_res.sh # calculate precision and recall
│ ├── run_distribute_train_ascend.sh # launch distributed training with ascend platform(8p)
│ ├── run_eval_ascend.sh # launch evaluating with ascend platform
│ └── run_standalone_train_ascend.sh # launch standalone training with ascend platform(1p)
├── src
│ ├── CTPN
│ │ ├── BoundingBoxDecode.py # bounding box decode
│ │ ├── BoundingBoxEncode.py # bounding box encode
│ │ ├── __init__.py # package init file
│ │ ├── anchor_generator.py # anchor generator
│ │ ├── bbox_assign_sample.py # proposal layer
│ │ ├── proposal_generator.py # proposal generator
│ │ ├── rpn.py # region-proposal network
│ │ └── vgg16.py # backbone
│ ├── config.py # training configuration
│ ├── convert_icdar2015.py # convert icdar2015 dataset label
│ ├── convert_svt.py # convert svt label
│ ├── create_dataset.py # create mindrecord dataset
│ ├── ctpn.py # ctpn network definition
│ ├── dataset.py # data preprocessing
│ ├── lr_schedule.py # learning rate scheduler
│ ├── network_define.py # network definition
│ └── text_connector
│ ├── __init__.py # package init file
│ ├── connect_text_lines.py # connect text lines
│ ├── detector.py # detect box
│ ├── get_successions.py # get succession proposal
│ └── utils.py # commonly used utility functions
└── train.py # train net
To create the dataset, first download the raw datasets and convert their labels. We provide src/convert_svt.py and src/convert_icdar2015.py to convert the SVT and ICDAR2015 dataset labels. For the SVT dataset, run:
python convert_svt.py --dataset_path=/path/img --xml_file=/path/train.xml --location_dir=/path/location
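For reference, SVT's train.xml stores, for each image, a list of tagged rectangles given by x, y, width, and height. The sketch below illustrates the parsing step under that assumption; it is not the actual src/convert_svt.py:

```python
# Hypothetical illustration of reading SVT's train.xml; the real conversion
# script may differ in output format and options.
import xml.etree.ElementTree as ET

def parse_svt_xml(xml_file):
    """Return {image name: [(x1, y1, x2, y2), ...]} from an SVT XML file."""
    boxes = {}
    root = ET.parse(xml_file).getroot()
    for image in root.iter("image"):
        name = image.find("imageName").text
        rects = []
        for rect in image.iter("taggedRectangle"):
            x, y = int(rect.get("x")), int(rect.get("y"))
            w, h = int(rect.get("width")), int(rect.get("height"))
            rects.append((x, y, x + w, y + h))
        boxes[name] = rects
    return boxes
```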
For the ICDAR2015 dataset, run:
python convert_icdar2015.py --src_label_path=/path/train_label --target_label_path=/path/label
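Conceptually, the ICDAR2015 ground-truth files store one quadrilateral plus a transcription per line ("x1,y1,x2,y2,x3,y3,x4,y4,text"), with "###" marking don't-care regions. A minimal sketch of such a conversion, keeping an axis-aligned box per region, might look like the following (an illustration only, not the actual src/convert_icdar2015.py):

```python
# Hypothetical ICDAR2015 label conversion: quadrilaterals -> axis-aligned boxes.
import os

def convert_icdar2015(src_label_path, target_label_path):
    os.makedirs(target_label_path, exist_ok=True)
    for name in os.listdir(src_label_path):
        out_lines = []
        # utf-8-sig strips the BOM that ICDAR ground-truth files carry
        with open(os.path.join(src_label_path, name), encoding="utf-8-sig") as f:
            for line in f:
                parts = line.strip().split(",", 8)
                if parts[8].startswith("###"):   # skip don't-care regions
                    continue
                xs = [int(v) for v in parts[0:8:2]]
                ys = [int(v) for v in parts[1:8:2]]
                out_lines.append(f"{min(xs)},{min(ys)},{max(xs)},{max(ys)}")
        with open(os.path.join(target_label_path, name), "w") as f:
            f.write("\n".join(out_lines))
```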
Then modify src/config.py to add the dataset paths. For each dataset, add its IMAGE_PATH and LABEL_PATH as a list in the config. An example is shown below:
# create dataset
"coco_root": "/path/coco",
"coco_train_data_type": "train2017",
"cocotext_json": "/path/cocotext.v2.json",
"icdar11_train_path": ["/path/image/", "/path/label"],
"icdar13_train_path": ["/path/image/", "/path/label"],
"icdar15_train_path": ["/path/image/", "/path/label"],
"icdar13_test_path": ["/path/image/", "/path/label"],
"flick_train_path": ["/path/image/", "/path/label"],
"svt_train_path": ["/path/image/", "/path/label"],
"pretrain_dataset_path": "",
"finetune_dataset_path": "",
"test_dataset_path": "",
Then create the MindRecord dataset with src/create_dataset.py using the command below:
python src/create_dataset.py
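Under the hood, a MindRecord file is written with mindspore.mindrecord.FileWriter. The following is a minimal sketch of that step, assuming samples of (JPEG bytes, annotation string); the actual schema and field names used by src/create_dataset.py may differ:

```python
# Minimal MindRecord writing sketch; schema and field names are assumptions.
from mindspore.mindrecord import FileWriter

def write_mindrecord(samples, out_file="ctpn_dataset.mindrecord"):
    """samples: iterable of (jpeg_bytes, annotation_string) pairs."""
    writer = FileWriter(file_name=out_file, shard_num=1)
    schema = {"image": {"type": "bytes"}, "annotation": {"type": "string"}}
    writer.add_schema(schema, "ctpn dataset")
    for image_bytes, annotation in samples:
        writer.write_raw_data([{"image": image_bytes, "annotation": annotation}])
    writer.commit()
```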
# distribute training example(8p)
sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [TASK_TYPE] [PRETRAINED_PATH]
# standalone training
sh run_standalone_train_ascend.sh [TASK_TYPE] [PRETRAINED_PATH]
# evaluation:
sh run_eval_ascend.sh [IMAGE_PATH] [DATASET_PATH] [CHECKPOINT_PATH]
The PRETRAINED_PATH should be a checkpoint of VGG16 trained on ImageNet2012. The weight names in the checkpoint dict must match exactly, and batch normalization must be enabled when training VGG16; otherwise later steps will fail. To obtain the VGG16 backbone, use the network structure defined in src/CTPN/vgg16.py. To train the backbone, copy src/CTPN/vgg16.py under modelzoo/official/cv/vgg16/src/, and modify vgg16/train.py to fit the new construction, for example:
...
from src.vgg16 import VGG16  # backbone definition copied from src/CTPN/vgg16.py
...
network = VGG16()  # build the CTPN-style VGG16 backbone
...
Then you can train it with ImageNet2012.
Notes:
- RANK_TABLE_FILE can refer to Link, and the device_ip can be obtained as in Link. For large models like InceptionV4, it is better to export the environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend the HCCL connection-checking time from the default 120 seconds to 600 seconds; otherwise the connection may time out, because compilation time increases with model size.
- The scripts perform a processor-core binding operation based on `device_num` and the total number of processor cores. If you do not want this, remove the `taskset` operations in scripts/run_distribute_train.sh.
- TASK_TYPE is either Pretraining or Finetune. For Pretraining we use ICDAR2013, ICDAR2015, SVT, SCUT-FORU, and COCO-Text v2; for Finetune we use ICDAR2011, ICDAR2013, and SCUT-FORU to improve precision and recall. When doing Finetune, we use the checkpoint produced by Pretraining as PRETRAINED_PATH.
- COCO_TEXT_PARSER_PATH coco_text.py can refer to Link.
# training example
shell:
Ascend:
# distribute training example(8p)
sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [TASK_TYPE] [PRETRAINED_PATH]
# standalone training
sh run_standalone_train_ascend.sh [TASK_TYPE] [PRETRAINED_PATH]
Training results will be stored in the example path. Checkpoints are stored in ckpt_path by default, the training log is redirected to ./log, and the loss is written to ./loss_0.log like the following:
377 epoch: 1 step: 229 ,rpn_loss: 0.00355, rpn_cls_loss: 0.00047, rpn_reg_loss: 0.00103,
399 epoch: 2 step: 229 ,rpn_loss: 0.00327,rpn_cls_loss: 0.00047, rpn_reg_loss: 0.00093,
424 epoch: 3 step: 229 ,rpn_loss: 0.00910, rpn_cls_loss: 0.00385, rpn_reg_loss: 0.00175,
You can start evaluation using shell scripts. The usage is as follows:
sh run_eval_ascend.sh [IMAGE_PATH] [DATASET_PATH] [CHECKPOINT_PATH]
After evaluation, you will get an archive file named submit_ctpn-xx_xxxx.zip, whose name contains the name of your checkpoint file. To evaluate it, you can use the scripts provided by the ICDAR2013 network: download the DetEval scripts from the link, unzip them, and put them under ctpn/scripts, then use eval_res.sh to get the result. You will get files as below:
gt.zip
readme.txt
rrc_evaluation_funcs_1_1.py
script.py
Then run scripts/eval_res.sh to calculate the evaluation result:
bash eval_res.sh
The evaluation result will be stored in the example path; you can find results like the following in the log:
{"precision": 0.90791, "recall": 0.86118, "hmean": 0.88393}
| Parameters | Ascend |
|---|---|
| Model Version | CTPN |
| Resource | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB |
| Uploaded Date | 02/06/2021 |
| MindSpore Version | 1.1.1 |
| Dataset | 16930 images |
| Batch_size | 2 |
| Training Parameters | src/config.py |
| Optimizer | Momentum |
| Loss Function | SoftmaxCrossEntropyWithLogits for classification, SmoothL2Loss for bbox regression |
| Loss | ~0.04 |
| Total time (8p) | 6h |
| Scripts | ctpn script |
| Parameters | Ascend |
|---|---|
| Model Version | CTPN |
| Resource | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB |
| Uploaded Date | 02/06/2020 |
| MindSpore Version | 1.1.1 |
| Dataset | 229 images |
| Batch_size | 1 |
| Accuracy | precision = 0.9079, recall = 0.8611, F-measure = 0.8839 |
| Total time | 1 min |
| Model for inference | 135M (.ckpt file) |
| Device | Train Performance |
|---|---|
| 1p | 10 img/s |
| 8p | 84 img/s |
We set the random seed to 1 in train.py.
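In MindSpore this is typically done with set_seed; a one-line illustration (train.py's exact call may differ):

```python
# Fix the global random seed for reproducibility.
from mindspore.common import set_seed

set_seed(1)
```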
Please check the official homepage.