# Contents

- [Contents](#contents)
- [CRNN Description](#crnn-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
    - [Dataset Prepare](#dataset-prepare)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
        - [Training Script Parameters](#training-script-parameters)
        - [Parameters Configuration](#parameters-configuration)
    - [Dataset Preparation](#dataset-preparation)
- [Training Process](#training-process)
    - [Training](#training)
    - [Distributed Training](#distributed-training)
- [Evaluation Process](#evaluation-process)
    - [Evaluation](#evaluation)
- [Inference Process](#inference-process)
    - [Export MindIR](#export-mindir)
    - [Infer on Ascend310](#infer-on-ascend310)
    - [result](#result)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Training Performance](#training-performance)
        - [Evaluation Performance](#evaluation-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
## [CRNN Description](#contents)

CRNN is a neural network for image-based sequence recognition and its application to scene text recognition. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most of the existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences of arbitrary lengths, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios.

[Paper](https://arxiv.org/abs/1507.05717): Baoguang Shi, Xiang Bai, Cong Yao, "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition", ArXiv, vol. abs/1507.05717, 2015.
## [Model Architecture](#contents)

CRNN uses a VGG16 structure for feature extraction, appends two bidirectional LSTM layers for sequence modeling, and finally uses CTC to calculate the loss. See `src/crnn.py` for details.
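As a quick orientation, the tensor shapes implied by the default configuration (see [Parameters Configuration](#parameters-configuration)) can be traced through the three stages. This is an illustrative sketch only; the authoritative definition is `src/crnn.py`.

```python
# Shape walkthrough of the CRNN pipeline using the default config values
# (image 32x100, input_size=512, num_step=24, hidden_size=256, class_num=37).
# Illustrative only; the real network is defined in src/crnn.py.
import numpy as np

batch = 64
images = np.zeros((batch, 3, 32, 100), np.float32)   # (N, C, H, W) input

# 1. VGG16-style CNN backbone: the height is squeezed away and 24 horizontal
#    positions remain, each described by a 512-dim feature vector.
features = np.zeros((24, batch, 512), np.float32)    # (num_step, N, input_size)

# 2. Two bidirectional LSTM layers (hidden_size=256): the sequence length is
#    preserved and each step carries 2 * 256 = 512 features.
rnn_out = np.zeros((24, batch, 2 * 256), np.float32)

# 3. Per-step projection to 37 logits (36 characters + 1 CTC blank), which
#    CTC aligns against the label sequence to compute the loss.
logits = np.zeros((24, batch, 37), np.float32)
print(images.shape, features.shape, rnn_out.shape, logits.shape)
```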
## [Dataset](#contents)

Note that you can run the scripts based on the dataset mentioned in the original paper or one widely used in the relevant domain/network architecture. In the following sections, we will introduce how to run the scripts using the related datasets below.

We use five datasets mentioned in the paper. For training, we use the synthetic datasets ([MJSynth](https://www.robots.ox.ac.uk/~vgg/data/text/) and [SynthText](https://github.com/ankush-me/SynthText)) released by Jaderberg et al. as the training data, which contain 8 million training images and their corresponding ground truth words. For evaluation, we use four popular benchmarks for scene text recognition, namely ICDAR 2003 ([IC03](http://www.iapr-tc11.org/mediawiki/index.php?title=ICDAR_2003_Robust_Reading_Competitions)), ICDAR 2013 ([IC13](https://rrc.cvc.uab.es/?ch=2&com=downloads)), IIIT 5k-word ([IIIT5k](https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset)), and Street View Text ([SVT](http://vision.ucsd.edu/~kai/grocr/)).
### [Dataset Prepare](#contents)

For the `IC03`, `IIIT5k` and `SVT` datasets, the original data from the official websites cannot be used directly in CRNN:

- `IC03`: the text needs to be cropped from the original images according to `words.xml`.
- `IIIT5k`: the annotations need to be extracted from the Matlab data files.
- `SVT`: the text needs to be cropped from the original images according to `train.xml` or `test.xml`.

We provide `convert_ic03.py`, `convert_iiit5k.py` and `convert_svt.py` as examples of the above preprocessing which you can refer to. A simplified version of the SVT cropping is sketched below.
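For instance, a minimal sketch of the SVT-style cropping, assuming SVT's `<taggedRectangle x/y/width/height>` layout in `train.xml`/`test.xml` (the project's real logic lives in `convert_svt.py`):

```python
# Crop SVT word images out of the full scene images using the XML annotation.
# A sketch only; see convert_svt.py for the authoritative conversion.
import os
import xml.etree.ElementTree as ET
from PIL import Image

def crop_svt(xml_path, image_root, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    annotations = []
    for image_node in ET.parse(xml_path).getroot().iter("image"):
        img = Image.open(os.path.join(image_root, image_node.find("imageName").text))
        for rect in image_node.iter("taggedRectangle"):
            x, y = int(float(rect.get("x"))), int(float(rect.get("y")))
            w, h = int(float(rect.get("width"))), int(float(rect.get("height")))
            word = rect.find("tag").text
            out_name = f"{len(annotations)}.jpg"
            img.crop((max(x, 0), max(y, 0), x + w, y + h)).save(os.path.join(out_dir, out_name))
            annotations.append(f"{out_name} {word}")
    with open(os.path.join(out_dir, "annotation.txt"), "w") as f:
        f.write("\n".join(annotations))

crop_svt("svt1/test.xml", "svt1", "svt_cropped")
```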
## [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare a hardware environment with Ascend processors.
- Framework
    - [MindSpore](https://gitee.com/mindspore/mindspore)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
## [Quick Start](#contents)

- After the dataset is prepared, you may start running the training or the evaluation scripts as follows:

- Running on Ascend

```shell
# distributed training example on Ascend
$ bash run_distribute_train.sh [DATASET_NAME] [RANK_TABLE_FILE] [DATASET_PATH]

# evaluation example on Ascend
$ bash run_eval.sh [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH] [PLATFORM]

# standalone training example on Ascend
$ bash run_standalone_train.sh [DATASET_NAME] [DATASET_PATH] [PLATFORM]

# offline inference on Ascend310
$ bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [ANN_FILE_PATH] [DATASET] [DEVICE_ID]
```

`DATASET_NAME` is one of `ic03`, `ic13`, `svt`, `iiit5k`, `synth`.

For distributed training, an hccl configuration file in JSON format needs to be created in advance. Please follow the instructions in [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools); an abridged example of the resulting rank table is sketched below.
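For reference, a single-server 8-device rank table has roughly the following layout. This is an abridged sketch; `server_id` and the `device_ip` values are placeholders that must match your machine, so prefer generating the file with hccl_tools.

```python
# Write an assumed single-server, 8-device hccl rank table; the field layout
# mirrors what hccl_tools produces, but all addresses here are placeholders.
import json

rank_table = {
    "version": "1.0",
    "server_count": "1",
    "server_list": [{
        "server_id": "10.0.0.1",  # placeholder host address
        "device": [
            {"device_id": str(i), "device_ip": f"192.168.100.{i + 1}", "rank_id": str(i)}
            for i in range(8)
        ],
    }],
    "status": "completed",
}

with open("hccl_8p.json", "w") as f:
    json.dump(rank_table, f, indent=4)
```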
- Run in Docker

Build the docker image (change the version to the one you actually use):

```shell
# build docker
docker build -t ssd:20.1.0 . --build-arg FROM_IMAGE_NAME=ascend-mindspore-arm:20.1.0
```

Create a container layer over the created image and start it:

```shell
# start docker
bash scripts/docker_start.sh ssd:20.1.0 [DATA_DIR] [MODEL_DIR]
```

Then you can run everything just as on Ascend.
## [Script Description](#contents)

### [Script and Sample Code](#contents)

```shell
crnn
├── README.md # Descriptions about CRNN
├── convert_ic03.py # Convert the original IC03 dataset
├── convert_iiit5k.py # Convert the original IIIT5K dataset
├── convert_svt.py # Convert the original SVT dataset
├── requirements.txt # Requirements for this model
├── scripts
│   ├── run_distribute_train.sh # Launch distributed training on Ascend (8 pcs)
│   ├── run_eval.sh # Launch evaluation
│   └── run_standalone_train.sh # Launch standalone training (1 pcs)
├── src
│   ├── config.py # Parameter configuration
│   ├── crnn.py # CRNN network definition
│   ├── crnn_for_train.py # CRNN network with loss and gradient clipping for training
│   ├── dataset.py # Data preprocessing for training and evaluation
│   ├── eval_callback.py # Callback for evaluation while training
│   ├── ic03_dataset.py # Data preprocessing for IC03
│   ├── ic13_dataset.py # Data preprocessing for IC13
│   ├── iiit5k_dataset.py # Data preprocessing for IIIT5K
│   ├── loss.py # CTC loss definition
│   ├── metric.py # Accuracy metric for the CRNN network
│   └── svt_dataset.py # Data preprocessing for SVT
├── eval.py # Evaluation script
└── train.py # Training script
```
### [Script Parameters](#contents)

#### Training Script Parameters

```shell
# distributed training on Ascend
Usage: bash run_distribute_train.sh [DATASET_NAME] [RANK_TABLE_FILE] [DATASET_PATH]

# standalone training
Usage: bash run_standalone_train.sh [DATASET_NAME] [DATASET_PATH] [PLATFORM]
```
#### Parameters Configuration

Parameters for both training and evaluation can be set in `config.py`.

```shell
"max_text_length": 23, # max length of the text in each image
"image_width": 100, # width of text images
"image_height": 32, # height of text images
"batch_size": 64, # batch size of input tensor
"epoch_size": 10, # number of epochs, only valid for training (evaluation always uses 1)
"hidden_size": 256, # hidden size in LSTM layers
"learning_rate": 0.02, # initial learning rate
"momentum": 0.95, # momentum of SGD optimizer
"nesterov": True, # enable Nesterov momentum in SGD optimizer
"save_checkpoint": True, # whether to save checkpoints or not
"save_checkpoint_steps": 1000, # the step interval between two checkpoints
"keep_checkpoint_max": 30, # only keep the last keep_checkpoint_max checkpoints
"save_checkpoint_path": "./", # path to save checkpoints
"class_num": 37, # number of dataset classes
"input_size": 512, # input size for the LSTM layer
"num_step": 24, # number of steps for the LSTM layer
"use_dropout": True, # whether to use dropout
"blank": 36 # class id of the CTC blank label
```
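These options live in `src/config.py`. A minimal sketch of the pattern, assuming the `EasyDict`-style container commonly used for model_zoo configs (check the actual file for the authoritative contents):

```python
# Sketch of how src/config.py typically exposes these options; values are
# copied from the table above, the EasyDict pattern is an assumption.
from easydict import EasyDict as ed

config = ed({
    "max_text_length": 23,
    "image_width": 100,
    "image_height": 32,
    "batch_size": 64,
    "epoch_size": 10,
    "hidden_size": 256,
    "learning_rate": 0.02,
    "momentum": 0.95,
    "nesterov": True,
    "class_num": 37,
    "input_size": 512,
    "num_step": 24,
    "blank": 36,
})

# The scripts then use attribute access anywhere, e.g.:
print(config.learning_rate, config.batch_size)
```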
### [Dataset Preparation](#contents)

- You may refer to "Generate dataset" in [Quick Start](#quick-start) to automatically generate a dataset, or you may choose to generate the text image dataset by yourself.

## [Training Process](#contents)

- Set options in `config.py`, including the learning rate and other network hyperparameters. See the [MindSpore dataset preparation tutorial](https://www.mindspore.cn/tutorial/training/zh-CN/master/use/data_preparation.html) for more information about datasets.
### [Training](#contents)

- Run `run_standalone_train.sh` for non-distributed training of the CRNN model. Only Ascend is supported now.

```bash
bash run_standalone_train.sh [DATASET_NAME] [DATASET_PATH] [PLATFORM](optional)
```

#### [Distributed Training](#contents)

- Run `run_distribute_train.sh` for distributed training of the CRNN model on Ascend.

```bash
bash run_distribute_train.sh [DATASET_NAME] [RANK_TABLE_FILE] [DATASET_PATH]
```

Check `train_parallel0/log.txt` and you will get outputs like the following:

```shell
epoch: 10 step: 14110, loss is 0.0029097411
Epoch time: 2743.688s, per step time: 0.097s
```
## [Evaluation Process](#contents)

### [Evaluation](#contents)

- Run `run_eval.sh` for evaluation.

```bash
bash run_eval.sh [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH] [PLATFORM](optional)
```

Check `eval/log.txt` and you will get outputs like the following:

```shell
result: {'CRNNAccuracy': (0.806)}
```

### Evaluation while training

To evaluate while training, add `run_eval` to the start shell and set it to True. You also need to add `eval_dataset` to select which dataset to evaluate on, and add `eval_dataset_path` to the start shell. In addition, you can set the optional arguments `save_best_ckpt`, `eval_start_epoch` and `eval_interval` when `run_eval` is True, as in the callback sketch below.
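A minimal sketch of how such a callback can be wired up, assuming the pattern in `src/eval_callback.py` (names and details here are illustrative, not the project's exact implementation):

```python
# Evaluate-while-training callback sketch; src/eval_callback.py is the
# authoritative implementation, this only shows the control flow.
from mindspore.train.callback import Callback

class EvalWhileTrain(Callback):
    def __init__(self, model, eval_dataset, eval_start_epoch=1,
                 eval_interval=1, save_best_ckpt=True):
        super().__init__()
        self.model = model
        self.eval_dataset = eval_dataset
        self.eval_start_epoch = eval_start_epoch
        self.eval_interval = eval_interval
        self.save_best_ckpt = save_best_ckpt
        self.best_acc = 0.0

    def epoch_end(self, run_context):
        cur_epoch = run_context.original_args().cur_epoch_num
        if cur_epoch < self.eval_start_epoch:
            return
        if (cur_epoch - self.eval_start_epoch) % self.eval_interval != 0:
            return
        acc = self.model.eval(self.eval_dataset)["CRNNAccuracy"]
        if self.save_best_ckpt and acc > self.best_acc:
            self.best_acc = acc  # a real callback would also save the ckpt here
        print(f"epoch {cur_epoch}: CRNNAccuracy = {acc}, best = {self.best_acc}")
```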
## [Inference Process](#contents)

### [Export MindIR](#contents)

```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```

The `ckpt_file` parameter is required, and `FILE_FORMAT` should be in ["AIR", "MINDIR"].
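The core of `export.py` follows the standard MindSpore export recipe, roughly as below. This is a sketch; the network constructor and the config import path are assumptions, so refer to the actual script.

```python
# Sketch of the usual MindSpore export flow; the CRNN constructor and the
# config import are hypothetical names, see export.py for the real script.
import numpy as np
from mindspore import Tensor, context, export, load_checkpoint, load_param_into_net

from src.config import config  # hypothetical import path
from src.crnn import crnn      # hypothetical constructor name

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
net = crnn(config)
load_param_into_net(net, load_checkpoint("crnn.ckpt"))

# Dummy input matching the training geometry: batch 1, 3 x 32 x 100 image.
inputs = Tensor(np.zeros([1, 3, config.image_height, config.image_width], np.float32))
export(net, inputs, file_name="crnn", file_format="MINDIR")
```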
### Infer on Ascend310

Before performing inference, the MindIR file must be exported by the export script on an Ascend 910 environment. We only provide an example of inference using the MINDIR model.
Currently, `batch_size` can only be set to 1. The inference result will be just the raw network outputs, which will be saved in binary files (see the decoding sketch below); the accuracy is then calculated by `src/metric.py`.

```shell
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [ANN_FILE_PATH] [DATASET] [DEVICE_ID]
```

- `MINDIR_PATH` is the MINDIR model exported by `export.py`.
- `DATA_PATH` is the path of the dataset. If the data has to be converted, pass the path to the converted data.
- `ANN_FILE_PATH` is the path of the annotation file. For converted data, the annotation file is generated by the convert scripts.
- `DATASET` is the name of the dataset, which should be in ["synth", "svt", "iiit5k", "ic03", "ic13"].
- `DEVICE_ID` is optional; the default value is 0.
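As an illustration of what those binary outputs contain, a greedy CTC decode of one raw logits file might look like the following. The file name, the `(num_step, class_num)` layout and the character ordering are assumptions based on the config above; `src/metric.py` is authoritative.

```python
# Greedy CTC decode of one assumed raw output file from the 310 run.
import numpy as np

NUM_STEP, CLASS_NUM, BLANK = 24, 37, 36
CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"  # 36 classes + blank id 36 (assumed order)

logits = np.fromfile("result_Files/example_0.bin", np.float32)
logits = logits.reshape(NUM_STEP, CLASS_NUM)  # batch_size is fixed to 1
ids = logits.argmax(axis=-1)

# Standard greedy CTC: collapse repeated ids, then drop blanks.
prev = BLANK
chars = []
for i in ids:
    if i != BLANK and i != prev:
        chars.append(CHARSET[i])
    prev = i
print("".join(chars))
```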
### result

The inference result is saved in the current path; you can find results like the following in the `acc.log` file.

```shell
correct num: 2042 , total num: 3000
result CRNNAccuracy is: 0.806666666666
```
## [Model Description](#contents)

### [Performance](#contents)

#### [Training Performance](#contents)

| Parameters                 | Ascend 910                                                   |
| -------------------------- | ------------------------------------------------------------ |
| Model Version              | v1.0                                                         |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date              | 12/15/2020 (month/day/year)                                  |
| MindSpore Version          | 1.0.1                                                        |
| Dataset                    | Synth                                                        |
| Training Parameters        | epoch=10, steps per epoch=14110, batch_size=64               |
| Optimizer                  | SGD                                                          |
| Loss Function              | CTCLoss                                                      |
| Outputs                    | probability                                                  |
| Loss                       | 0.0029097411                                                 |
| Speed                      | 118 ms/step (8 pcs)                                          |
| Total time                 | 557 mins                                                     |
| Parameters (M)             | 83M (.ckpt file)                                             |
| Checkpoint for Fine tuning | 20.3M (.ckpt file)                                           |
| Scripts                    | [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/crnn) |
#### [Evaluation Performance](#contents)

| Parameters          | SVT                         | IIIT5K                      |
| ------------------- | --------------------------- | --------------------------- |
| Model Version       | V1.0                        | V1.0                        |
| Resource            | Ascend 910; OS Euler2.8     | Ascend 910                  |
| Uploaded Date       | 12/15/2020 (month/day/year) | 12/15/2020 (month/day/year) |
| MindSpore Version   | 1.0.1                       | 1.0.1                       |
| Dataset             | SVT                         | IIIT5K                      |
| batch_size          | 1                           | 1                           |
| Outputs             | ACC                         | ACC                         |
| Accuracy            | 80.8%                       | 79.7%                       |
| Model for inference | 83M (.ckpt file)            | 83M (.ckpt file)            |
## [Description of Random Situation](#contents)

In `dataset.py`, we set the seed inside the `create_dataset` function. We also use a random seed in `train.py` for weight initialization.
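Typical seeding calls look like the following sketch; the exact seed values and placement are the project's choices in `dataset.py` and `train.py`.

```python
# Common ways the randomness gets pinned down in MindSpore projects; see
# dataset.py and train.py for where this project actually sets its seeds.
import numpy as np
import mindspore.dataset as ds
from mindspore.common import set_seed

set_seed(1)            # global seed, covers weight-initializer randomness
ds.config.set_seed(1)  # seed for shuffle order in the dataset pipeline
np.random.seed(1)      # seed for any numpy-based augmentation
```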
## [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).