# Contents

- [CNNCTC Description](#cnnctc-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
        - [Distributed Training](#distributed-training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Evaluation Performance](#evaluation-performance)
        - [Inference Performance](#evaluation-performance)
    - [How to use](#how-to-use)
        - [Inference](#inference)
        - [Continue Training on the Pretrained Model](#continue-training-on-the-pretrained-model)
        - [Transfer Learning](#transfer-learning)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [CNNCTC Description](#contents)

This paper proposes three major contributions to address scene text recognition (STR).
First, we examine the inconsistencies among training and evaluation datasets, and the performance gap that results from them.
Second, we introduce a unified four-stage STR framework that most existing STR models fit into.
Using this framework allows for the extensive evaluation of previously proposed STR modules and the discovery of previously unexplored module combinations.
Third, we analyze the module-wise contributions to performance in terms of accuracy, speed, and memory demand, under one consistent set of training and evaluation datasets.
These analyses clear away the obstacles that have hindered fair comparisons and clarify the performance gains of existing modules.

[Paper](https://arxiv.org/abs/1904.01906): J. Baek, G. Kim, J. Lee, S. Park, D. Han, S. Yun, S. J. Oh, and H. Lee, "What is wrong with scene text recognition model comparisons? Dataset and model analysis," ArXiv, vol. abs/1904.01906, 2019.
# [Model Architecture](#contents)

This is an example of training a CNN+CTC model for text recognition on the MJSynth and SynthText datasets with MindSpore. In the four-stage framework of the paper above, CNN+CTC uses no transformation stage, a CNN feature extractor, no sequence modeling stage, and a CTC-based predictor.
# [Dataset](#contents)

The [MJSynth](https://www.robots.ox.ac.uk/~vgg/data/text/) and [SynthText](https://github.com/ankush-me/SynthText) datasets are used for model training. The [IIIT 5K-word dataset](https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset) is used for evaluation.

- step 1:

All the datasets have been preprocessed, stored in .lmdb format, and can be downloaded [**HERE**](https://drive.google.com/drive/folders/192UfE9agQUMNq6AgU3_E05_FcPZK4hyt).

- step 2:

Uncompress the downloaded file, then rename the MJSynth dataset as MJ, the SynthText dataset as ST, and the IIIT dataset as IIIT.

- step 3:

Move the three datasets above into the `cnnctc_data` folder; the structure should be as follows:
```
|--- CNNCTC/
    |--- cnnctc_data/
        |--- ST/
            data.mdb
            lock.mdb
        |--- MJ/
            data.mdb
            lock.mdb
        |--- IIIT/
            data.mdb
            lock.mdb
        ......
```
- step 4:

Preprocess the dataset by running:

```
python src/preprocess_dataset.py
```

This takes around 75 minutes.
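
Before preprocessing, it can be worth sanity-checking that the three .lmdb folders are readable. Below is a minimal sketch using the `lmdb` package (listed under [Quick Start](#quick-start) dependencies); the `num-samples` metadata key is an assumption based on common .lmdb text-recognition datasets, not something this README guarantees:

```
import lmdb

# Open each dataset read-only and print its sample count, assuming the
# count is stored under the conventional b'num-samples' key.
for name in ("MJ", "ST", "IIIT"):
    env = lmdb.open("cnnctc_data/" + name, readonly=True, lock=False)
    with env.begin() as txn:
        print(name, "num-samples:", txn.get(b"num-samples"))
```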
# [Features](#contents)

## Mixed Precision

The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the training of deep neural networks by using both single-precision and half-precision data formats, while maintaining the network accuracy achieved with single-precision training. Mixed precision training accelerates computation, reduces memory usage, and enables training larger models or larger batch sizes on specific hardware.

For FP16 operators, if the input data type is FP32, the MindSpore backend will automatically handle it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and searching for "reduce precision".
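
In MindSpore, mixed precision is typically enabled through the `Model` wrapper, as the continue-training example later in this README does with `amp_level="O2"`. A minimal sketch, assuming `net`, `loss`, and `opt` are defined as in that example:

```
from mindspore import Model

# amp_level="O2" casts the network to FP16; keep_batchnorm_fp32=False
# additionally runs batch normalization in FP16, matching the setting
# used in the continue-training example below.
model = Model(net, loss_fn=loss, optimizer=opt,
              amp_level="O2", keep_batchnorm_fp32=False)
```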
# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare a hardware environment with an Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get access to the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html)
    - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html)
# [Quick Start](#contents)

- Install dependencies:

```
pip install lmdb
pip install Pillow
pip install tqdm
pip install six
```

- Standalone Training:

```
bash scripts/run_standalone_train_ascend.sh $PRETRAINED_CKPT
```

- Distributed Training:

```
bash scripts/run_distribute_train_ascend.sh $RANK_TABLE_FILE $PRETRAINED_CKPT
```

- Evaluation:

```
bash scripts/run_eval_ascend.sh $TRAINED_CKPT
```
# [Script Description](#contents)

## [Script and Sample Code](#contents)

The entire code structure is as follows:

```
|--- CNNCTC/
    |--- README.md                             // descriptions about cnnctc
    |--- train.py                              // train script
    |--- eval.py                               // eval script
    |--- scripts
        |--- run_standalone_train_ascend.sh    // shell script for standalone training on Ascend
        |--- run_distribute_train_ascend.sh    // shell script for distributed training on Ascend
        |--- run_eval_ascend.sh                // shell script for evaluation on Ascend
    |--- src
        |--- __init__.py                       // init file
        |--- cnn_ctc.py                        // cnn_ctc network
        |--- config.py                         // total config
        |--- callback.py                       // loss callback file
        |--- dataset.py                        // dataset processing
        |--- util.py                           // routine operations
        |--- generate_hccn_file.py             // generates the json file for distributed training
        |--- preprocess_dataset.py             // dataset preprocessing
```
## [Script Parameters](#contents)

Parameters for both training and evaluation can be set in `config.py`.

Arguments:

* `--CHARACTER`: Character labels.
* `--NUM_CLASS`: The number of classes, including all character labels and the `<blank>` label for CTCLoss.
* `--HIDDEN_SIZE`: Model hidden size.
* `--FINAL_FEATURE_WIDTH`: The number of features.
* `--IMG_H`: The height of the input image.
* `--IMG_W`: The width of the input image.
* `--TRAIN_DATASET_PATH`: The path to the training dataset.
* `--TRAIN_DATASET_INDEX_PATH`: The path to the training dataset index file, which determines the sample order.
* `--TRAIN_BATCH_SIZE`: Training batch size. The batch size and index file must together ensure that the input data has a fixed shape.
* `--TRAIN_DATASET_SIZE`: Training dataset size.
* `--TEST_DATASET_PATH`: The path to the test dataset.
* `--TEST_BATCH_SIZE`: Test batch size.
* `--TEST_DATASET_SIZE`: Test dataset size.
* `--TRAIN_EPOCHS`: Total training epochs.
* `--CKPT_PATH`: The path to a model checkpoint file; can be used to resume training or for evaluation.
* `--SAVE_PATH`: The path for saving model checkpoint files.
* `--LR`: Learning rate for standalone training.
* `--LR_PARA`: Learning rate for distributed training.
* `--MOMENTUM`: Momentum.
* `--LOSS_SCALE`: Loss scale to prevent gradient underflow.
* `--SAVE_CKPT_PER_N_STEP`: Save a model checkpoint file every N steps.
* `--KEEP_CKPT_MAX_NUM`: The maximum number of saved model checkpoint files.
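
For orientation, a hypothetical excerpt of `config.py` is sketched below. The field names match the arguments above, but the class name and the values shown are illustrative assumptions, not the shipped defaults:

```
class Config_CNNCTC:
    # Character set; NUM_CLASS adds the <blank> label required by CTCLoss.
    CHARACTER = '0123456789abcdefghijklmnopqrstuvwxyz'
    NUM_CLASS = len(CHARACTER) + 1
    # Model shape parameters.
    HIDDEN_SIZE = 512
    FINAL_FEATURE_WIDTH = 26
    IMG_H = 32
    IMG_W = 100
    # Training hyperparameters (illustrative values).
    TRAIN_BATCH_SIZE = 192
    TRAIN_EPOCHS = 3
    LR = 1e-4
    MOMENTUM = 0.9
    LOSS_SCALE = 1024
```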
## [Training Process](#contents)

### Training

- Standalone Training:

```
bash scripts/run_standalone_train_ascend.sh $PRETRAINED_CKPT
```

Results and checkpoints are written to the `./train` folder. The log can be found in `./train/log`, and loss values are recorded in `./train/loss.log`.

`$PRETRAINED_CKPT` is the path to a model checkpoint and is **optional**. If none is given, the model will be trained from scratch.

- Distributed Training:

```
bash scripts/run_distribute_train_ascend.sh $RANK_TABLE_FILE $PRETRAINED_CKPT
```

Results and checkpoints are written to the `./train_parallel_{i}` folder for each device `i`.
The log can be found in `./train_parallel_{i}/log_{i}.log`, and loss values are recorded in `./train_parallel_{i}/loss.log`.

`$RANK_TABLE_FILE` is needed when you run a distributed task on Ascend.
`$PRETRAINED_CKPT` is the path to a model checkpoint and is **optional**. If none is given, the model will be trained from scratch.
### Training Result

Training results are stored in the example path, in folders whose names begin with "train" or "train_parallel". You will find checkpoint files there, together with results like the following in loss.log.
```
# distribute training result(8p)
epoch: 1 step: 1 , loss is 76.25, average time per step is 0.335177839748392712
epoch: 1 step: 2 , loss is 73.46875, average time per step is 0.36798572540283203
epoch: 1 step: 3 , loss is 69.46875, average time per step is 0.3429678678512573
epoch: 1 step: 4 , loss is 64.3125, average time per step is 0.33512671788533527
epoch: 1 step: 5 , loss is 58.375, average time per step is 0.33149147033691406
epoch: 1 step: 6 , loss is 52.7265625, average time per step is 0.3292975425720215
...
epoch: 1 step: 8689 , loss is 9.706798802612482, average time per step is 0.3184656601312549
epoch: 1 step: 8690 , loss is 9.70612545289855, average time per step is 0.3184725407765116
epoch: 1 step: 8691 , loss is 9.70695776049204, average time per step is 0.31847309686135555
epoch: 1 step: 8692 , loss is 9.707279624277456, average time per step is 0.31847339290613375
epoch: 1 step: 8693 , loss is 9.70763437950938, average time per step is 0.3184720295013031
epoch: 1 step: 8694 , loss is 9.707695425072046, average time per step is 0.31847410284595573
epoch: 1 step: 8695 , loss is 9.708408273381295, average time per step is 0.31847338271072345
epoch: 1 step: 8696 , loss is 9.708703753591953, average time per step is 0.3184726025560777
epoch: 1 step: 8697 , loss is 9.709536406025824, average time per step is 0.31847212061114694
epoch: 1 step: 8698 , loss is 9.708542263610315, average time per step is 0.3184715309307257
```
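
Since each `loss.log` line follows the fixed format above, a small helper (a sketch, not part of the repository) can extract the loss values, for example for plotting training curves:

```
import re

def read_losses(path="./train/loss.log"):
    # Pull the floating-point loss out of lines like
    # "epoch: 1 step: 1 , loss is 76.25, average time per step is ..."
    pattern = re.compile(r"loss is ([0-9.]+)")
    with open(path) as f:
        return [float(m.group(1)) for m in map(pattern.search, f) if m]

print(read_losses()[:5])  # first five loss values
```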
## [Evaluation Process](#contents)

### Evaluation

- Evaluation:

```
bash scripts/run_eval_ascend.sh $TRAINED_CKPT
```

The model will be evaluated on the IIIT dataset; sample results and the overall accuracy will be printed.
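
The reported number is word-level accuracy: a prediction counts as correct only when the whole decoded string matches the label. A minimal sketch of that metric (the exact implementation in `eval.py` may differ; case-insensitive matching is an assumption):

```
def word_accuracy(preds, labels):
    # Exact per-sample string match, ignoring case.
    correct = sum(p.lower() == l.lower() for p, l in zip(preds, labels))
    return correct / len(labels)
```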
# [Model Description](#contents)

## [Performance](#contents)

### Training Performance

| Parameters | CNNCTC |
| -------------------------- | ----------------------------------------------------------- |
| Model Version | V1 |
| Resource | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G |
| Uploaded Date | 09/28/2020 (month/day/year) |
| MindSpore Version | 1.0.0 |
| Dataset | MJSynth, SynthText |
| Training Parameters | epoch=3, batch_size=192 |
| Optimizer | RMSProp |
| Loss Function | CTCLoss |
| Speed | 1pc: 300 ms/step; 8pcs: 310 ms/step |
| Total time | 1pc: 18 hours; 8pcs: 2.3 hours |
| Parameters (M) | 177 |
| Scripts | <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/cnnctc> |
### Evaluation Performance

| Parameters | CNNCTC |
| ------------------- | --------------------------- |
| Model Version | V1 |
| Resource | Ascend 910 |
| Uploaded Date | 09/28/2020 (month/day/year) |
| MindSpore Version | 1.0.0 |
| Dataset | IIIT5K |
| batch_size | 192 |
| outputs | Accuracy |
| Accuracy | 85% |
| Model for inference | 675M (.ckpt file) |
## [How to use](#contents)

### Inference

If you need to use the trained model to perform inference on multiple hardware platforms, such as GPU, Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/network_migration.html). The following is a simple example:

- Running on Ascend

```
# Imports are assumed here to make the snippet self-contained; `cfg` is
# the configuration object (e.g. from src/config.py).
from mindspore import context, Model
from mindspore.nn import Momentum
from mindspore.ops import operations as P
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.cnn_ctc import CNNCTC
from src import dataset as ds

# Set context
context.set_context(mode=context.GRAPH_MODE, device_target=cfg.device_target)
context.set_context(device_id=cfg.device_id)

# Load unseen dataset for inference
dataset = ds.create_dataset(cfg.data_path, 1, False)

# Define model
net = CNNCTC(cfg.NUM_CLASS, cfg.HIDDEN_SIZE, cfg.FINAL_FEATURE_WIDTH)
opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()), 0.01,
               cfg.momentum, weight_decay=cfg.weight_decay)
loss = P.CTCLoss(preprocess_collapse_repeated=False,
                 ctc_merge_repeated=True,
                 ignore_longer_outputs_than_inputs=False)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'})

# Load the pre-trained model
param_dict = load_checkpoint(cfg.checkpoint_path)
load_param_into_net(net, param_dict)
net.set_train(False)

# Make predictions on the unseen dataset
acc = model.eval(dataset)
print("accuracy: ", acc)
```
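
The `ctc_merge_repeated=True` argument above corresponds to the standard CTC collapsing rule: consecutive repeated symbols are merged, then `<blank>` labels are dropped. A minimal illustration (the blank index of 0 is an assumption; `NUM_CLASS` in `config.py` counts this blank label):

```
def ctc_collapse(path, blank=0):
    # Merge consecutive duplicates, then remove blanks, e.g.
    # [1, 1, 0, 1, 2, 2] -> [1, 1, 2]
    out, prev = [], None
    for p in path:
        if p != prev and p != blank:
            out.append(p)
        prev = p
    return out
```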
### Continue Training on the Pretrained Model

- Running on Ascend

```
# Imports are assumed here to make the snippet self-contained; `cfg` is
# the configuration object, and lr_steps is a learning-rate schedule helper.
from mindspore import Model, Tensor
from mindspore.nn import Momentum
from mindspore.ops import operations as P
from mindspore.train.callback import (CheckpointConfig, LossMonitor,
                                      ModelCheckpoint, TimeMonitor)
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from src.cnn_ctc import CNNCTC
from src.dataset import create_dataset

# Load dataset
dataset = create_dataset(cfg.data_path, 1)
batch_num = dataset.get_dataset_size()

# Define model
net = CNNCTC(cfg.NUM_CLASS, cfg.HIDDEN_SIZE, cfg.FINAL_FEATURE_WIDTH)

# Continue training if pre_trained is set to True
if cfg.pre_trained:
    param_dict = load_checkpoint(cfg.checkpoint_path)
    load_param_into_net(net, param_dict)

lr = lr_steps(0, lr_max=cfg.lr_init, total_epochs=cfg.epoch_size,
              steps_per_epoch=batch_num)
opt = Momentum(filter(lambda x: x.requires_grad, net.get_parameters()),
               Tensor(lr), cfg.momentum, weight_decay=cfg.weight_decay)
loss = P.CTCLoss(preprocess_collapse_repeated=False,
                 ctc_merge_repeated=True,
                 ignore_longer_outputs_than_inputs=False)
model = Model(net, loss_fn=loss, optimizer=opt, metrics={'acc'},
              amp_level="O2", keep_batchnorm_fp32=False, loss_scale_manager=None)

# Set callbacks
config_ck = CheckpointConfig(save_checkpoint_steps=batch_num * 5,
                             keep_checkpoint_max=cfg.keep_checkpoint_max)
time_cb = TimeMonitor(data_size=batch_num)
ckpoint_cb = ModelCheckpoint(prefix="train_cnnctc", directory="./",
                             config=config_ck)
loss_cb = LossMonitor()

# Start training
model.train(cfg.epoch_size, dataset, callbacks=[time_cb, ckpoint_cb, loss_cb])
print("train success")
```
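
Note the `amp_level="O2"` argument passed to `Model` here: it applies the mixed-precision training scheme described under [Mixed Precision](#mixed-precision).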
# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).