# Contents

- [PSENet Description](#psenet-description)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
        - [Distributed Training](#distributed-training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Evaluation Performance](#evaluation-performance)
        - [Inference Performance](#inference-performance)
    - [How to use](#how-to-use)
        - [Inference](#inference)
        - [Continue Training on the Pretrained Model](#continue-training-on-the-pretrained-model)
        - [Transfer Learning](#transfer-learning)

# [PSENet Description](#contents)

With the development of convolutional neural networks, scene text detection has advanced rapidly. However, two problems still hinder its application in industry. First, most existing algorithms rely on quadrilateral bounding boxes, which cannot accurately locate text of arbitrary shape. Second, two text instances that are close to each other can cause a false detection that covers both instances. A segmentation-based approach can traditionally solve the first problem, but usually not the second. To solve both problems, the Progressive Scale Expansion Network (PSENet) is proposed, which can accurately detect text instances of arbitrary shape. More specifically, PSENet generates kernels of different scales for each text instance and gradually expands the minimal-scale kernel to the complete shape of the text instance. Because there are large geometric margins between the minimal-scale kernels, the method can effectively separate text instances that lie close together, making it easier to detect arbitrary-shape text instances. The effectiveness of PSENet has been verified by experiments on CTW1500, Total-Text, ICDAR 2015, and ICDAR 2017 MLT.

[Paper](https://openaccess.thecvf.com/content_CVPR_2019/html/Wang_Shape_Robust_Text_Detection_With_Progressive_Scale_Expansion_Network_CVPR_2019_paper.html): Wenhai Wang, Enze Xie, Xiang Li, Wenbo Hou, Tong Lu, Gang Yu, Shuai Shao; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9336-9345

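The progressive expansion described above can be sketched in plain Python. This is a minimal illustration, not the repository's implementation (which lives in `src/ETSNET/pse/adaptor.cpp`): the function name `progressive_scale_expansion`, the list-of-lists mask format, and the 4-connected neighbourhood are all assumptions made for the example.

```python
from collections import deque


def progressive_scale_expansion(kernels):
    """Grow instance labels from the smallest kernel through the larger ones.

    kernels: list of 2-D binary masks (lists of lists of 0/1), ordered from
    the smallest kernel to the largest. Returns a 2-D label map in which 0 is
    background and 1..n identify text instances.
    """
    height, width = len(kernels[0]), len(kernels[0][0])
    labels = [[0] * width for _ in range(height)]
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed 4-connectivity

    # Step 1: label the connected components of the smallest kernel.
    next_label = 0
    for r in range(height):
        for c in range(width):
            if kernels[0][r][c] and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for dy, dx in neighbours:
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and kernels[0][ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))

    # Step 2: breadth-first expansion into each larger kernel in turn.
    for kernel in kernels[1:]:
        queue = deque((r, c) for r in range(height)
                      for c in range(width) if labels[r][c])
        while queue:
            y, x = queue.popleft()
            for dy, dx in neighbours:
                ny, nx = y + dy, x + dx
                if (0 <= ny < height and 0 <= nx < width
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    # A pixel keeps the label of whichever instance reaches it first.
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
    return labels


# Two seed pixels inside one connected full-size text mask.
smallest = [[0, 1, 0, 0, 0, 1, 0]]
largest = [[1, 1, 1, 1, 1, 1, 1]]
label_map = progressive_scale_expansion([smallest, largest])
print(label_map)
```

In this toy input the largest mask is a single connected region, yet the two seeds from the smallest kernel remain two separate instances after expansion, which is the property that lets PSENet split adjacent text.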
# PSENet Example

## Description

Progressive Scale Expansion Network (PSENet) is a text detector that can reliably detect arbitrary-shape text in natural scenes.

# [Dataset](#contents)

Dataset used: [ICDAR2015](https://rrc.cvc.uab.es/?ch=4&com=tasks#TextLocalization)

- A training set of 1000 images containing about 4500 readable words
- A testing set containing about 2000 readable words

# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare the hardware environment with an Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
    - [MindSpore](http://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
- Install MindSpore
- Install [pybind11](https://github.com/pybind/pybind11)
- Install [OpenCV 3.4](https://docs.opencv.org/3.4.9/d7/d9f/tutorial_linux_install.html)

# [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

```shell
# run distributed training example
sh scripts/run_distribute_train.sh pretrained_model.ckpt

# set up the OpenCV library:
# download pybind11 and OpenCV 3.4, then install OpenCV 3.4

# build the shared library with the Makefile in src/ETSNET/pse
(cd src/ETSNET/pse && make libadaptor.so)

# run test.py
python test.py --ckpt=pretrained_model.ckpt

# download the evaluation method from https://rrc.cvc.uab.es/?ch=4&com=tasks#TextLocalization:
# click the "My Methods" button, then download the Evaluation Scripts (script.py)

# run evaluation example
sh scripts/run_eval_ascend.sh
```

# [Script Description](#contents)

## [Script and Sample Code](#contents)

```
└── PSENet
    ├── README.md                      // descriptions about PSENet
    ├── scripts
    │   ├── run_distribute_train.sh    // shell script for distributed training
    │   └── eval_ic15.sh               // shell script for evaluation
    ├── src
    │   ├── __init__.py
    │   ├── generate_hccn_file.py      // creating rank.json
    │   ├── ETSNET
    │   │   ├── __init__.py
    │   │   ├── base.py                // convolution and BN operator
    │   │   ├── dice_loss.py           // calculate PSENet loss value
    │   │   ├── etsnet.py              // Subnet in PSENet
    │   │   ├── fpn.py                 // Subnet in PSENet
    │   │   ├── resnet50.py            // Subnet in PSENet
    │   │   └── pse                    // Subnet in PSENet
    │   │       ├── __init__.py
    │   │       ├── adaptor.cpp
    │   │       ├── adaptor.h
    │   │       └── Makefile
    │   ├── config.py                  // parameter configuration
    │   ├── dataset.py                 // creating dataset
    │   └── network_define.py          // PSENet architecture
    ├── test.py                        // test script
    └── train.py                       // training script
```

## [Script Parameters](#contents)

```
Major parameters in train.py and config.py are:

--pre_trained: Whether to train from scratch or from a pre-trained
               model. Optional values are True, False.
--device_id:   Device ID used to train or evaluate the dataset. Ignore it
               when you use train.sh for distributed training.
--device_num:  Number of devices used when you use train.sh for
               distributed training.
```

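As an illustration of how these three options could be declared, here is a minimal `argparse` sketch. It follows the descriptions above and is not copied from `train.py`: the helper `str2bool`, the function `build_parser`, and all default values are assumptions. Note also that the inference example later in this README passes `args.pre_trained` to `load_checkpoint`, so the real script may accept a checkpoint path rather than a boolean.

```python
import argparse


def str2bool(value):
    """Interpret 'True'/'False' style strings as booleans (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Declare the options listed above; defaults are assumed, not from the source."""
    parser = argparse.ArgumentParser(description="PSENet options (illustrative)")
    parser.add_argument("--pre_trained", type=str2bool, default=False,
                        help="train from a pre-trained model (True) or from scratch (False)")
    parser.add_argument("--device_id", type=int, default=0,
                        help="device ID used for training or evaluation")
    parser.add_argument("--device_num", type=int, default=1,
                        help="number of devices used for distributed training")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--pre_trained", "True", "--device_num", "4"])
print(args.pre_trained, args.device_id, args.device_num)
```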
## [Training Process](#contents)

### Distributed Training

```
sh scripts/run_distribute_train.sh pretrained_model.ckpt
```

The above shell script runs distributed training in the background. You can view the results in the file `device_[X]/log`. The loss values will look like the following:

```
# grep "epoch: " device_*/log
device_0/log:epoch: 1, step: 20, loss is 0.80383
device_0/log:epoch: 2, step: 40, loss is 0.77951
...
device_1/log:epoch: 1, step: 20, loss is 0.78026
device_1/log:epoch: 2, step: 40, loss is 0.76629
```

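If you want to track the loss programmatically rather than with `grep`, the log lines can be parsed with a short script. This is a sketch that assumes every line follows the `epoch: N, step: N, loss is X` pattern shown above; the names `LOG_LINE` and `parse_loss_lines` are made up for the example.

```python
import re

# Matches lines such as "device_0/log:epoch: 1, step: 20, loss is 0.80383".
LOG_LINE = re.compile(r"epoch:\s*(\d+),\s*step:\s*(\d+),\s*loss is\s*([0-9.]+)")


def parse_loss_lines(lines):
    """Return (epoch, step, loss) tuples for every line that matches the pattern."""
    records = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            epoch, step, loss = match.groups()
            records.append((int(epoch), int(step), float(loss)))
    return records


sample = [
    "device_0/log:epoch: 1, step: 20, loss is 0.80383",
    "device_0/log:epoch: 2, step: 40, loss is 0.77951",
    "...",  # lines that do not match are skipped
]
records = parse_loss_lines(sample)
print(records)
```

To read a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.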
## [Evaluation Process](#contents)

### Eval Script for ICDAR2015

#### Usage

+ step 1: download the eval method from [here](https://rrc.cvc.uab.es/?ch=4&com=tasks#TextLocalization).
+ step 2: click the "My Methods" button, then download the Evaluation Scripts.
+ step 3: it is recommended to symlink the eval method root to $MINDSPORE/model_zoo/psenet/eval_ic15/. If your folder structure is different, you may need to change the corresponding paths in the eval script files.

```
sh ./scripts/run_eval_ascend.sh
```

#### Result

`Calculated!{"precision": 0.814796668299853, "recall": 0.8006740491092923, "hmean": 0.8076736279747451, "AP": 0}`

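The `hmean` value in this output is consistent with the harmonic mean of precision and recall, `2 * P * R / (P + R)`. The sketch below parses the result line and recomputes it as a sanity check; the function names `parse_result` and `harmonic_mean` are illustrative and not part of the evaluation script.

```python
import json


def parse_result(line):
    """Extract the JSON payload that follows the 'Calculated!' prefix."""
    return json.loads(line[line.index("{"):])


def harmonic_mean(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


line = ('Calculated!{"precision": 0.814796668299853, '
        '"recall": 0.8006740491092923, "hmean": 0.8076736279747451, "AP": 0}')
metrics = parse_result(line)
recomputed = harmonic_mean(metrics["precision"], metrics["recall"])
print(recomputed, metrics["hmean"])
```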
# [Model Description](#contents)

## [Performance](#contents)

### Evaluation Performance

| Parameters                 | PSENet                                                      |
| -------------------------- | ----------------------------------------------------------- |
| Model Version              | Inception V1                                                |
| Resource                   | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB          |
| Uploaded Date              | 09/15/2020 (month/day/year)                                 |
| MindSpore Version          | 1.0-alpha                                                   |
| Dataset                    | ICDAR2015                                                   |
| Training Parameters        | start_lr=0.1; lr_scale=0.1                                  |
| Optimizer                  | SGD                                                         |
| Loss Function              | LossCallBack                                                |
| Outputs                    | probability                                                 |
| Loss                       | 0.35                                                        |
| Speed                      | 1pc: 444 ms/step; 4pcs: 446 ms/step                         |
| Total time                 | 1pc: 75.48 h; 4pcs: 18.87 h                                 |
| Parameters (M)             | 27.36                                                       |
| Checkpoint for Fine tuning | 109.44M (.ckpt file)                                        |
| Scripts                    | https://gitee.com/mindspore/mindspore/tree/master/model_zoo/psenet |

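The parameter count and checkpoint size in the table are consistent with each parameter being stored as a 4-byte (float32) value. That storage format is an assumption inferred from the two numbers, not something the table states; the helper name `checkpoint_size_mb` is illustrative.

```python
def checkpoint_size_mb(params_millions, bytes_per_param=4):
    """Approximate checkpoint size in MB (10^6 bytes), assuming float32 weights."""
    return params_millions * bytes_per_param


size = checkpoint_size_mb(27.36)
print(f"{size:.2f} MB")
```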
### Inference Performance

| Parameters          | PSENet                      |
| ------------------- | --------------------------- |
| Model Version       | Inception V1                |
| Resource            | Ascend 910                  |
| Uploaded Date       | 09/15/2020 (month/day/year) |
| MindSpore Version   | 1.0-alpha                   |
| Dataset             | ICDAR2015                   |
| Outputs             | probability                 |
| Accuracy            | 1pc: 81%; 8pcs: 81%         |

## [How to use](#contents)

### Inference

If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/migrate_3rd_scripts.html). The following is a simple example:

```python
from mindspore import nn
from mindspore.train.model import Model
from mindspore.train.callback import TimeMonitor, CheckpointConfig, ModelCheckpoint
from mindspore.train.serialization import load_checkpoint, load_param_into_net
# ETSNet, DiceLoss, WithLossCell, TrainOneStepCell, LossCallBack, lr_generator,
# config, cfg, args and the dataset module are project-local names, assumed to
# come from this repository's src/ package and scripts.

# Load unseen dataset for inference
ds = dataset.create_dataset(cfg.data_path, 1, False)
step_size = ds.get_dataset_size()

# Define model
config.INFERENCE = False
net = ETSNet(config)
net = net.set_train()
param_dict = load_checkpoint(args.pre_trained)
load_param_into_net(net, param_dict)
print('Load Pretrained parameters done!')

criterion = DiceLoss(batch_size=config.TRAIN_BATCH_SIZE)
lrs = lr_generator(start_lr=1e-3, lr_scale=0.1, total_iters=config.TRAIN_TOTAL_ITER)
opt = nn.SGD(params=net.trainable_params(), learning_rate=lrs, momentum=0.99, weight_decay=5e-4)

# Wrap the model with the loss function and the optimizer
net = WithLossCell(net, criterion)
net = TrainOneStepCell(net, opt)

time_cb = TimeMonitor(data_size=step_size)
loss_cb = LossCallBack(per_print_times=20)

# Set and apply the checkpoint parameters
ckpoint_cf = CheckpointConfig(save_checkpoint_steps=1875, keep_checkpoint_max=2)
ckpoint_cb = ModelCheckpoint(prefix="ETSNet", config=ckpoint_cf, directory=config.TRAIN_MODEL_SAVE_PATH)

model = Model(net)
model.train(config.TRAIN_REPEAT_NUM, ds, dataset_sink_mode=False, callbacks=[time_cb, loss_cb, ckpoint_cb])

# Load pre-trained model
param_dict = load_checkpoint(cfg.checkpoint_path)
load_param_into_net(net, param_dict)
net.set_train(False)

# Make predictions on the unseen dataset
acc = model.eval(ds)
print("accuracy: ", acc)
```