![logo](https://www.mindspore.cn/static/img/logo_black.6a5c850d.png)

# CTPN for Ascend

<!-- TOC -->

- [CTPN Description](#ctpn-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Training Process](#training-process)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Training Performance](#training-performance)
        - [Inference Performance](#inference-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)

<!-- /TOC -->
# [CTPN Description](#contents)

CTPN is a text detection model built on the object detection framework. It improves on Faster R-CNN by combining it with a bidirectional LSTM, which makes CTPN very effective for horizontal text detection. Another highlight of CTPN is that it transforms text detection into a series of fine-scale text proposal detections. This idea was proposed in the paper "Detecting Text in Natural Image with Connectionist Text Proposal Network".

[Paper](https://arxiv.org/pdf/1609.03605.pdf) Zhi Tian, Weilin Huang, Tong He, Pan He, Yu Qiao, "Detecting Text in Natural Image with Connectionist Text Proposal Network", ArXiv, vol. abs/1609.03605, 2016.
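To illustrate the fine-scale idea, the sketch below (illustrative only, not code from this repository) splits a ground-truth text box into fixed-width strips; the 16-pixel width follows the paper, where it equals the VGG16 feature-map stride.

```python
def split_into_fine_proposals(box, stride=16):
    """Split a text box (x1, y1, x2, y2) into fixed-width vertical strips,
    mirroring how CTPN detects a text line as a sequence of small proposals."""
    x1, y1, x2, y2 = box
    proposals = []
    left = (x1 // stride) * stride  # align to the stride-16 feature-map grid
    while left < x2:
        proposals.append((left, y1, min(left + stride, x2), y2))
        left += stride
    return proposals

# A 100-pixel-wide text line becomes seven 16-pixel-wide proposals:
print(split_into_fine_proposals((10, 20, 110, 40)))
```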
# [Model Architecture](#contents)

The overall network uses VGG16 as the backbone, applies a bidirectional LSTM to extract context features of the fine-scale text proposals, and then uses an RPN (Region Proposal Network) to predict the bounding boxes and their probabilities.

[Link](https://arxiv.org/pdf/1605.07314v1.pdf)
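A rough sketch of the data flow (the shapes and feature-map size are illustrative assumptions; the actual implementation lives in src/ctpn.py, src/CTPN/vgg16.py and src/CTPN/rpn.py):

```python
import numpy as np

# Hypothetical stride-16 VGG16 feature map for one input image.
batch, channels, height, width = 1, 512, 22, 36
features = np.zeros((batch, channels, height, width))

# Each feature-map row is treated as a sequence over the width dimension,
# so the bidirectional LSTM consumes (batch * height, width, channels).
lstm_in = features.transpose(0, 2, 3, 1).reshape(batch * height, width, channels)

# The RPN head predicts, for every spatial position, k vertical anchors with
# 2 class scores (text / non-text) and 2 regression targets (y-center, height);
# the paper uses k = 10 anchors of fixed width and varying heights.
k = 10
cls_scores_shape = (batch, height * width * k, 2)
reg_preds_shape = (batch, height * width * k, 2)
```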
# [Dataset](#contents)

Here we use 6 datasets for training and 1 dataset for evaluation.

- Dataset1: ICDAR 2013: Focused Scene Text
    - Train: 142MB, 229 images
    - Test: 110MB, 233 images
- Dataset2: ICDAR 2011: Born-Digital Images
    - Train: 27.7MB, 410 images
- Dataset3: ICDAR 2015
    - Train: 89MB, 1000 images
- Dataset4: SCUT-FORU: Flickr OCR Universal Database
    - Train: 388MB, 1715 images
- Dataset5: CocoText v2 (subset of MSCOCO2017)
    - Train: 13GB, 63686 images
- Dataset6: SVT (The Street View Dataset)
    - Train: 115MB, 349 images
# [Features](#contents)

# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare hardware environment with Ascend processor.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Script Description](#contents)

## [Script and Sample Code](#contents)

```shell
.
└─ ctpn
  ├── README.md                          # network readme
  ├── ascend310_infer                    # application for 310 inference
  ├── eval.py                            # eval net
  ├── scripts
  │   ├── eval_res.sh                    # calculate precision and recall
  │   ├── run_distribute_train_ascend.sh # launch distributed training on Ascend (8p)
  │   ├── run_eval_ascend.sh             # launch evaluation on Ascend
  │   ├── run_infer_310.sh               # shell script for 310 inference
  │   └── run_standalone_train_ascend.sh # launch standalone training on Ascend (1p)
  ├── src
  │   ├── CTPN
  │   │   ├── BoundingBoxDecode.py       # bounding box decode
  │   │   ├── BoundingBoxEncode.py       # bounding box encode
  │   │   ├── __init__.py                # package init file
  │   │   ├── anchor_generator.py        # anchor generator
  │   │   ├── bbox_assign_sample.py      # proposal layer
  │   │   ├── proposal_generator.py      # proposal generator
  │   │   ├── rpn.py                     # region proposal network
  │   │   └── vgg16.py                   # backbone
  │   ├── config.py                      # training configuration
  │   ├── convert_icdar2015.py           # convert icdar2015 dataset label
  │   ├── convert_svt.py                 # convert svt label
  │   ├── create_dataset.py              # create mindrecord dataset
  │   ├── ctpn.py                        # ctpn network definition
  │   ├── dataset.py                     # data preprocessing
  │   ├── lr_schedule.py                 # learning rate scheduler
  │   ├── network_define.py              # network definition
  │   └── text_connector
  │       ├── __init__.py                # package init file
  │       ├── connect_text_lines.py      # connect text lines
  │       ├── detector.py                # detect box
  │       ├── get_successions.py         # get succession proposals
  │       └── utils.py                   # commonly used utility functions
  ├── postprogress.py                    # post-processing for 310 inference
  ├── export.py                          # script to export AIR/MINDIR model
  └── train.py                           # train net
```
## [Training Process](#contents)

### Dataset

To create the dataset, download the datasets above and preprocess them. We provide src/convert_svt.py and src/convert_icdar2015.py to convert the SVT and ICDAR2015 dataset labels. For the SVT dataset, run:

```shell
python convert_svt.py --dataset_path=/path/img --xml_file=/path/train.xml --location_dir=/path/location
```

For the ICDAR2015 dataset, run:

```shell
python convert_icdar2015.py --src_label_path=/path/train_label --target_label_path=/path/label
```

Then modify src/config.py to add the dataset paths. For each dataset, add its IMAGE_PATH and LABEL_PATH into a list in the config. An example is shown below:

```python
# create dataset
"coco_root": "/path/coco",
"coco_train_data_type": "train2017",
"cocotext_json": "/path/cocotext.v2.json",
"icdar11_train_path": ["/path/image/", "/path/label"],
"icdar13_train_path": ["/path/image/", "/path/label"],
"icdar15_train_path": ["/path/image/", "/path/label"],
"icdar13_test_path": ["/path/image/", "/path/label"],
"flick_train_path": ["/path/image/", "/path/label"],
"svt_train_path": ["/path/image/", "/path/label"],
"pretrain_dataset_path": "",
"finetune_dataset_path": "",
"test_dataset_path": "",
```

Then you can create the MindRecord dataset with src/create_dataset.py:

```shell
python src/create_dataset.py
```
### Usage

- Ascend:

```bash
# distributed training example (8p)
sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [TASK_TYPE] [PRETRAINED_PATH]
# standalone training
sh run_standalone_train_ascend.sh [TASK_TYPE] [PRETRAINED_PATH]
# evaluation
sh run_eval_ascend.sh [IMAGE_PATH] [DATASET_PATH] [CHECKPOINT_PATH]
```
The `PRETRAINED_PATH` should be a checkpoint of VGG16 trained on ImageNet2012. The weight names in the checkpoint dict must match the network exactly, and batch normalization must be enabled when training VGG16, otherwise later steps will fail. For COCO_TEXT_PARSER_PATH, the coco_text.py parser can be obtained from [Link](https://github.com/andreasveit/coco-text). To get the VGG16 backbone, use the network structure defined in src/CTPN/vgg16.py. To train the backbone, copy src/CTPN/vgg16.py under modelzoo/official/cv/vgg16/src/, and modify vgg16/train.py to use the new construction, as below:

```python
...
from src.vgg16 import VGG16
...
network = VGG16()
...
```

Then you can train it with ImageNet2012.
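For reference, a minimal sketch (not part of this repository's scripts; the checkpoint path is hypothetical) of how ImageNet-pretrained weights can be loaded into the backbone in MindSpore:

```python
from mindspore.train.serialization import load_checkpoint, load_param_into_net

from src.CTPN.vgg16 import VGG16  # backbone structure defined in this repository

# Parameter names in the checkpoint must match the network exactly;
# load_param_into_net skips parameters whose names do not match.
backbone = VGG16()
param_dict = load_checkpoint("/path/vgg16.ckpt")  # hypothetical checkpoint path
load_param_into_net(backbone, param_dict)
```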
> Notes:
> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/distributed_training_ascend.html), and the device_ip can be obtained as in [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools). For large models like InceptionV4, it's better to export an external environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend the hccl connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could time out, since compiling time increases with model size.
>
> The scripts bind processor cores according to `device_num` and the total number of processor cores. If you do not want this behavior, remove the `taskset` operations in `scripts/run_distribute_train_ascend.sh`.
>
> TASK_TYPE is either Pretraining or Finetune. For Pretraining, we use ICDAR2013, ICDAR2015, SVT, SCUT-FORU and CocoText v2. For Finetune, we use ICDAR2011, ICDAR2013 and SCUT-FORU to improve precision and recall, and when doing Finetune we use the checkpoint trained in Pretraining as our PRETRAINED_PATH.
> For COCO_TEXT_PARSER_PATH, coco_text.py can refer to [Link](https://github.com/andreasveit/coco-text).
>
### Launch

```bash
# training example
# Ascend:
# distributed training example (8p)
sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [TASK_TYPE] [PRETRAINED_PATH]
# standalone training
sh run_standalone_train_ascend.sh [TASK_TYPE] [PRETRAINED_PATH]
```
### Result

Training results will be stored in the example path. Checkpoints will be stored at `ckpt_path` by default, the training log will be redirected to `./log`, and the loss will be written to `./loss_0.log`, like the following:

```text
377 epoch: 1 step: 229 ,rpn_loss: 0.00355, rpn_cls_loss: 0.00047, rpn_reg_loss: 0.00103,
399 epoch: 2 step: 229 ,rpn_loss: 0.00327, rpn_cls_loss: 0.00047, rpn_reg_loss: 0.00093,
424 epoch: 3 step: 229 ,rpn_loss: 0.00910, rpn_cls_loss: 0.00385, rpn_reg_loss: 0.00175,
```
## [Evaluation Process](#contents)

### Usage

You can start evaluation using shell scripts. The usage of the shell scripts is as follows:

- Ascend:

```bash
sh run_eval_ascend.sh [IMAGE_PATH] [DATASET_PATH] [CHECKPOINT_PATH]
```

After evaluation, you will get an archive file named submit_ctpn-xx_xxxx.zip, which is named after your checkpoint file. To evaluate it, you can use the scripts provided by the ICDAR2013 competition; download the Deteval scripts from the [link](https://rrc.cvc.uab.es/?com=downloads&action=download&ch=2&f=aHR0cHM6Ly9ycmMuY3ZjLnVhYi5lcy9zdGFuZGFsb25lcy9zY3JpcHRfdGVzdF9jaDJfdDFfZTItMTU3Nzk4MzA2Ny56aXA=).
After downloading the scripts, unzip them and put them under ctpn/scripts, then use eval_res.sh to get the result. You will get files as below:

```text
gt.zip
readme.txt
rrc_evaluation_funcs_1_1.py
script.py
```

Then you can run scripts/eval_res.sh to calculate the evaluation result.

```bash
bash eval_res.sh
```
### Result

The evaluation result will be stored in the example path; you can find results like the following in `log`.

```text
{"precision": 0.90791, "recall": 0.86118, "hmean": 0.88393}
```
## Model Export

```shell
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT]
```

`EXPORT_FORMAT` should be in ["AIR", "MINDIR"].
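For example, to export a MINDIR model (the checkpoint file name here is hypothetical):

```shell
python export.py --ckpt_file ./ctpn.ckpt --device_target Ascend --file_format MINDIR
```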
## [Inference Process](#contents)

### Usage

Before performing inference, the AIR file must be exported with the export script on an Ascend 910 environment.

```shell
# Ascend310 inference
bash run_infer_310.sh [MODEL_PATH] [DATA_PATH] [ANN_FILE_PATH] [DEVICE_ID]
```

After inference, you will get an archive file named submit.zip. To evaluate it, you can use the scripts provided by the ICDAR2013 competition; download the Deteval scripts from the [link](https://rrc.cvc.uab.es/?com=downloads&action=download&ch=2&f=aHR0cHM6Ly9ycmMuY3ZjLnVhYi5lcy9zdGFuZGFsb25lcy9zY3JpcHRfdGVzdF9jaDJfdDFfZTItMTU3Nzk4MzA2Ny56aXA=).
After downloading the scripts, unzip them and put them under ctpn/scripts, then use eval_res.sh to get the result. You will get files as below:

```text
gt.zip
readme.txt
rrc_evaluation_funcs_1_1.py
script.py
```

Then you can run scripts/eval_res.sh to calculate the evaluation result.

```bash
bash eval_res.sh
```
### Result

The evaluation result will be stored in the example path; you can find results like the following in `log`.

```text
{"precision": 0.88913, "recall": 0.86082, "hmean": 0.87475}
```
# [Model Description](#contents)

## [Performance](#contents)

### Training Performance

| Parameters | Ascend |
| -------------------------- | ------------------------------------------------------------ |
| Model Version | CTPN |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; memory 755G |
| Uploaded Date | 02/06/2021 |
| MindSpore Version | 1.1.1 |
| Dataset | 16930 images |
| Batch_size | 2 |
| Training Parameters | src/config.py |
| Optimizer | Momentum |
| Loss Function | SoftmaxCrossEntropyWithLogits for classification, SmoothL2Loss for bbox regression |
| Loss | ~0.04 |
| Total time (8p) | 6h |
| Scripts | [ctpn script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/ctpn) |
### Inference Performance

| Parameters | Ascend |
| ------------------- | --------------------------- |
| Model Version | CTPN |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; memory 755G |
| Uploaded Date | 02/06/2021 |
| MindSpore Version | 1.1.1 |
| Dataset | 229 images |
| Batch_size | 1 |
| Accuracy | precision=0.9079, recall=0.8611, F-measure=0.8839 |
| Total time | 1 min |
| Model for inference | 135M (.ckpt file) |
### Training performance results

| **Ascend** | train performance |
| :--------: | :---------------: |
| 1p | 10 img/s |
| 8p | 84 img/s |
# [Description of Random Situation](#contents)

We set seed to 1 in train.py.
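This corresponds to MindSpore's global seeding call (a sketch of the usual pattern; see train.py for the authoritative code):

```python
from mindspore.common import set_seed

# Fix the global random seed so weight initialization and dataset
# shuffling are reproducible across runs.
set_seed(1)
```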
# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).