

# Contents

- [DenseNet121 Description](#densenet121-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
        - [Distributed Training](#distributed-training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Training accuracy results](#training-accuracy-results)
        - [Training performance results](#training-performance-results)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)

# [DenseNet121 Description](#contents)

DenseNet121 is a convolution-based neural network for image classification. The paper describing the model can be found [here](https://arxiv.org/abs/1608.06993). Huawei's DenseNet121 is an implementation on [MindSpore](https://www.mindspore.cn/).

The repository also contains scripts to launch training and inference routines.

# [Model Architecture](#contents)

DenseNet121 is built from 4 densely connected blocks. Within each dense block, every layer receives the feature maps of all preceding layers as additional input and passes its own feature maps on to all subsequent layers. The feature maps are combined by concatenation (rather than by summation, as in ResNet), so each layer receives the "collective knowledge" of all preceding layers.
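
To make the concatenation pattern concrete, the sketch below traces how the channel count grows through the four dense blocks. It assumes the DenseNet-BC-121 hyperparameters from the paper (growth rate k = 32, 6/12/24/16 layers per block, 64 stem channels, transition compression 0.5). These values are not read from `src/network/densenet.py`, and `channel_trace` is a hypothetical helper.

```python
# Sketch: channel growth through DenseNet-121.
# Assumed DenseNet-BC-121 values from the paper, not from this repository.
GROWTH_RATE = 32             # k: channels appended by every dense layer
BLOCK_LAYERS = (6, 12, 24, 16)
INIT_FEATURES = 64           # channels after the stem convolution + pooling
COMPRESSION = 0.5            # transition layers halve the channel count


def channel_trace(init=INIT_FEATURES, k=GROWTH_RATE,
                  blocks=BLOCK_LAYERS, theta=COMPRESSION):
    """Return (block_index, channels_in, channels_out) for every dense block."""
    trace = []
    channels = init
    for i, num_layers in enumerate(blocks):
        c_in = channels
        # Each layer sees the concatenation of all earlier feature maps and
        # appends k new channels, so the width grows linearly inside a block.
        channels = c_in + num_layers * k
        trace.append((i + 1, c_in, channels))
        # A transition layer (1x1 conv + pooling) follows every block but the last.
        if i < len(blocks) - 1:
            channels = int(channels * theta)
    return trace


for block, c_in, c_out in channel_trace():
    print(f"dense block {block}: {c_in} -> {c_out} channels")
```

Under the same assumptions, the "121" counts the stem convolution, two convolutions per dense layer (a 1×1 bottleneck and a 3×3), the three transition convolutions and the final fully connected layer: 1 + 2 × 58 + 3 + 1 = 121. The 1024 channels leaving the last block feed the classifier.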

# [Dataset](#contents)

Dataset used: ImageNet

The default dataset configuration is as follows:

- Training dataset preprocessing:
    - Input image size: 224\*224
    - Range (min, max) of the crop area relative to the original image area: (0.08, 1.0)
    - Range (min, max) of the crop aspect ratio: (0.75, 1.333)
    - Probability of the image being flipped: 0.5
    - Random adjustment of brightness, contrast and saturation with factors (0.4, 0.4, 0.4)
    - Normalization of the input image with respect to the mean and standard deviation
- Test dataset preprocessing:
    - Input image size: 224\*224 (images are resized to 256\*256, then center-cropped)
    - Normalization of the input image with respect to the mean and standard deviation
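
The crop parameters above match the widely used Inception-style random-resized-crop. The sketch below shows how such a crop box can be sampled, plus the box used for the evaluation center crop. It is a plain-Python illustration assuming that algorithm (as popularized by torchvision's `RandomResizedCrop`), not the MindSpore operator the repository actually calls. The function names are hypothetical, and the fallback to the full image is a simplification.

```python
import math
import random


def sample_random_resized_crop(width, height, scale=(0.08, 1.0),
                               ratio=(0.75, 1.333), rng=random, attempts=10):
    """Pick a crop box (left, top, w, h), Inception style.

    The box covers a random fraction `scale` of the image area with an
    aspect ratio drawn log-uniformly from `ratio`; it is later resized to
    224x224. Falls back to the full image if no valid box is found.
    """
    area = width * height
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(log_lo, log_hi))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            return left, top, w, h
    return 0, 0, width, height


def center_crop_box(width, height, size=224):
    """Box for the evaluation center crop after resizing (e.g. to 256x256)."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, size, size


rng = random.Random(0)                      # seeded for reproducibility
print(sample_random_resized_crop(500, 375, rng=rng))
print(center_crop_box(256, 256))            # (16, 16, 224, 224)
```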

# [Features](#contents)

## Mixed Precision

The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates deep neural network training by using both single-precision (FP32) and half-precision (FP16) data formats, while maintaining the accuracy achieved by single-precision training. Mixed precision training speeds up computation, reduces memory usage, and makes it possible to train larger models or larger batch sizes on specific hardware.

For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. You can find the reduced-precision operators by enabling INFO-level logging and searching for "reduce precision".
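
FP16 needs care because of its narrow precision and range. The following plain-Python sketch (standard-library `struct` only, unrelated to MindSpore's internals) round-trips values through IEEE 754 half and single precision to show both limits.

```python
import struct


def to_fp16(x):
    """Round a Python float (FP64) through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Precision: FP16 keeps only about 3 significant decimal digits.
print(to_fp32(0.1))      # 0.10000000149011612
print(to_fp16(0.1))      # 0.0999755859375

# Range: very small values (e.g. tiny gradients) flush to zero in FP16 ...
print(to_fp16(1e-8))     # 0.0
# ... the largest finite FP16 value is 65504, and anything much larger fails.
print(to_fp16(65504.0))  # 65504.0
try:
    to_fp16(70000.0)
except (OverflowError, struct.error) as err:
    print("overflow:", err)
```

This is why mixed precision training commonly keeps weight updates and other sensitive accumulations in FP32 while running the bulk of the arithmetic in FP16.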

# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare a hardware environment with the Ascend AI processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get access to the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)

# [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

```bash
# run training example
python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &

# run distributed training example
sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT

# run evaluation example
python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 &
# or
sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT
```

For distributed training, an HCCL configuration file in JSON format needs to be created in advance. Please follow the instructions at the link below:

https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools

# [Script Description](#contents)

## [Script and Sample Code](#contents)

```
├── model_zoo
    ├── README.md                            // descriptions about all the models
    ├── densenet121
        ├── README.md                        // descriptions about densenet121
        ├── scripts
        │   ├── run_distribute_train.sh      // shell script for distributed training on Ascend
        │   ├── run_distribute_eval.sh       // shell script for evaluation on Ascend
        ├── src
        │   ├── datasets                     // dataset processing function
        │   ├── losses
        │   │   ├── crossentropy.py          // densenet loss function
        │   ├── lr_scheduler
        │   │   ├── lr_scheduler.py          // densenet learning rate schedule function
        │   ├── network
        │   │   ├── densenet.py              // densenet architecture
        │   ├── optimizers                   // densenet optimizer function
        │   ├── utils
        │   │   ├── logging.py               // logging function
        │   │   ├── var_init.py              // densenet variable init function
        │   ├── config.py                    // network config
        ├── train.py                         // training script
        ├── eval.py                          // evaluation script
```

## [Script Parameters](#contents)

You can modify the training behaviour through the various flags in the `train.py` script. The flags are as follows:

```
--data_dir              train data dir
--num_classes           num of classes in dataset (default: 1000)
--image_size            image size of the dataset
--per_batch_size        mini-batch size per device (default: 256)
--pretrained            path of pretrained model
--lr_scheduler          type of LR schedule: exponential, cosine_annealing
--lr                    initial learning rate
--lr_epochs             epoch milestones at which the lr changes
--lr_gamma              factor by which the exponential lr_scheduler decreases the lr
--eta_min               eta_min in the cosine_annealing scheduler
--T_max                 T_max in the cosine_annealing scheduler
--max_epoch             max epoch num to train the model
--warmup_epochs         warmup epochs (used when the batch size is large)
--weight_decay          weight decay (default: 1e-4)
--momentum              momentum (default: 0.9)
--label_smooth          whether to use label smoothing in the cross-entropy loss
--label_smooth_factor   smoothing strength applied to the original one-hot labels
--log_interval          logging interval (default: 100)
--ckpt_path             path to save checkpoints
--ckpt_interval         interval at which checkpoints are saved
--is_save_on_master     whether to save checkpoints on the master rank only or on all ranks
--is_distributed        whether to use multiple devices (default: 1)
--rank                  local rank in distributed training (default: 0)
--group_size            world size of distributed training (default: 1)
```
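
The two `--lr_scheduler` options can be illustrated with the sketch below, which computes a per-epoch learning rate for both schedules with an optional linear warmup. It assumes the standard formulas (step decay by `lr_gamma` at each `lr_epochs` milestone, and cosine annealing from `lr` to `eta_min` over `T_max` epochs). The actual `src/lr_scheduler` code may update the rate per step and differ in details, and the example values are hypothetical.

```python
import math


def warmup_lr(base_lr, epoch, warmup_epochs):
    """Linear warmup reaching base_lr at the last warmup epoch (0-based epochs)."""
    return base_lr * (epoch + 1) / warmup_epochs


def exponential_lr(base_lr, epoch, lr_epochs, lr_gamma, warmup_epochs=0):
    """Step decay: multiply by lr_gamma once for every milestone already passed."""
    if epoch < warmup_epochs:
        return warmup_lr(base_lr, epoch, warmup_epochs)
    passed = sum(1 for milestone in lr_epochs if epoch >= milestone)
    return base_lr * lr_gamma ** passed


def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0, warmup_epochs=0):
    """Cosine decay from base_lr at epoch 0 to eta_min at epoch t_max."""
    if epoch < warmup_epochs:
        return warmup_lr(base_lr, epoch, warmup_epochs)
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


# Hypothetical settings: lr=0.1, milestones every 30 epochs, 120 epochs total.
for epoch in (0, 30, 60, 90, 119):
    step = exponential_lr(0.1, epoch, lr_epochs=[30, 60, 90], lr_gamma=0.1)
    cosine = cosine_annealing_lr(0.1, epoch, t_max=120)
    print(f"epoch {epoch:3d}: exponential={step:.5f} cosine={cosine:.5f}")
```

Warmup ramps the rate up gradually so that a large batch does not start with a full-size update on randomly initialized weights.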

## [Training Process](#contents)

### Training

- Running on Ascend

```
python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &
```

The Python command above runs in the background. The log and model checkpoints are generated in `output/202x-xx-xx_time_xx_xx_xx/`. The loss values in the log look as follows:

```
2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec
2020-08-22 16:58:56,619:INFO:local passed
2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec
2020-08-22 17:02:19,921:INFO:local passed
2020-08-22 17:05:43,112:INFO:epoch[2], iter[15011], loss:3.096, mean_fps:6304.53 imgs/sec
2020-08-22 17:05:43,113:INFO:local passed
...
```
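
To track these numbers over a long run, the log lines can be parsed. The sketch below assumes the exact line format shown above and a global batch of 256 images per step. That batch size is an inference: floor(1,281,167 / 256) = 5,004 matches the gap between consecutive `iter` counters for the 1,281,167-image ImageNet-1k training set. The helper names are hypothetical.

```python
import re
from datetime import datetime

# Matches the per-epoch lines shown above, e.g.
# 2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):INFO:"
    r"epoch\[(?P<epoch>\d+)\], iter\[(?P<iter>\d+)\], "
    r"loss:(?P<loss>[\d.]+), mean_fps:(?P<fps>[\d.]+) imgs/sec"
)


def parse_log(lines):
    """Yield one dict per epoch line; other lines ("local passed") are skipped."""
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            yield {
                "time": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f"),
                "epoch": int(m["epoch"]),
                "iter": int(m["iter"]),
                "loss": float(m["loss"]),
                "fps": float(m["fps"]),
            }


def throughput(prev, cur, global_batch=256):
    """Images/sec between two epoch records (global_batch is an assumption)."""
    steps = cur["iter"] - prev["iter"]
    seconds = (cur["time"] - prev["time"]).total_seconds()
    return steps * global_batch / seconds


SAMPLE = [
    "2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec",
    "2020-08-22 16:58:56,619:INFO:local passed",
    "2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec",
    "2020-08-22 17:02:19,921:INFO:local passed",
    "2020-08-22 17:05:43,112:INFO:epoch[2], iter[15011], loss:3.096, mean_fps:6304.53 imgs/sec",
]

records = list(parse_log(SAMPLE))
for prev, cur in zip(records, records[1:]):
    print(cur["epoch"], cur["loss"], round(throughput(prev, cur), 1))
```

For epochs 0 to 1 this gives about 6,301 img/s, in line with the logged `mean_fps`. To read a real log, pass an open file object to `parse_log` instead of `SAMPLE`.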

### Distributed Training

- Running on Ascend

```
sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT
```

The shell script above runs distributed training in the background. You can view the log and model checkpoints in `train[X]/output/202x-xx-xx_time_xx_xx_xx/`. The loss values in the log look as follows:

```
2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec
2020-08-22 17:02:19,188:INFO:epoch[1], iter[10007], loss:3.18, mean_fps:6260.18 imgs/sec
2020-08-22 17:05:42,490:INFO:epoch[2], iter[15011], loss:2.621, mean_fps:6301.11 imgs/sec
2020-08-22 17:09:05,686:INFO:epoch[3], iter[20015], loss:3.113, mean_fps:6304.37 imgs/sec
2020-08-22 17:12:28,925:INFO:epoch[4], iter[25019], loss:3.29, mean_fps:6303.07 imgs/sec
2020-08-22 17:15:52,167:INFO:epoch[5], iter[30023], loss:2.865, mean_fps:6302.98 imgs/sec
...
```
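
For readers adapting the launcher, the sketch below shows the general shape of what a script like `run_distribute_train.sh` does: one `train.py` process per device, each in its own `train[X]` working directory. The environment variable names (`RANK_TABLE_FILE`, `RANK_SIZE`, `RANK_ID`, `DEVICE_ID`) are assumptions based on common MindSpore Ascend launch scripts, not read from this repository's script, so check the script before relying on them. The helpers and the `log.txt` file name are hypothetical, and `build_jobs` only builds the commands.

```python
import os
import subprocess


def build_jobs(device_num, rank_table, data_dir, pretrained, script="train.py"):
    """Return one (workdir, env, argv) tuple per device; nothing is started."""
    script = os.path.abspath(script)
    jobs = []
    for rank in range(device_num):
        env = dict(os.environ)
        # Assumed variable names (see the lead-in); one device per rank.
        env.update({
            "RANK_TABLE_FILE": os.path.abspath(rank_table),
            "RANK_SIZE": str(device_num),
            "RANK_ID": str(rank),
            "DEVICE_ID": str(rank),
        })
        argv = ["python", script,
                "--data_dir", data_dir,
                "--pretrained", pretrained,
                "--is_distributed", "1"]
        jobs.append((f"train{rank}", env, argv))
    return jobs


def launch(jobs):
    """Start every job in the background, logging to <workdir>/log.txt."""
    procs = []
    for workdir, env, argv in jobs:
        os.makedirs(workdir, exist_ok=True)
        # The child inherits the file descriptor, so the parent can close it.
        with open(os.path.join(workdir, "log.txt"), "w") as log:
            procs.append(subprocess.Popen(argv, cwd=workdir, env=env,
                                          stdout=log, stderr=subprocess.STDOUT))
    return procs


jobs = build_jobs(8, "rank_table.json", "/PATH/TO/DATASET", "/PATH/TO/PRETRAINED_CKPT")
for workdir, env, argv in jobs[:2]:
    print(workdir, env["RANK_ID"], env["DEVICE_ID"], " ".join(argv))
# launch(jobs)  # only on an Ascend host with a valid rank table
```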

## [Evaluation Process](#contents)

### Evaluation

- Evaluation on Ascend

Run the command below for evaluation.

```
python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 &
# or
sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT
```

The Python command above runs in the background. You can view the results in the file `output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log`. The accuracy on the test dataset is reported as follows:

```
2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43%
2020-08-24 09:21:50,551:INFO:after allreduce eval: top5_correct=46224, tot=49920, acc=92.60%
```
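
The reported percentages follow directly from the counts. The sketch below checks them and also offers one explanation, consistent with the numbers but not confirmed by the source, for why `tot` is 49,920 rather than the 50,000 images of the ImageNet validation set: if the set is split into 8 shards and each shard drops its last incomplete batch with a per-device batch size such as 32 (an assumption), 80 images go unevaluated. The helper names are hypothetical.

```python
def accuracy(correct, total):
    """Accuracy in percent, as printed in the eval log."""
    return 100.0 * correct / total


def evaluated_images(dataset_size, num_shards, per_device_batch):
    """Images evaluated if every shard drops its last incomplete batch."""
    per_shard = dataset_size // num_shards
    full_batches = per_shard // per_device_batch
    return full_batches * per_device_batch * num_shards


print(f"top1 = {accuracy(37657, 49920):.2f}%")   # top1 = 75.43%
print(f"top5 = {accuracy(46224, 49920):.2f}%")   # top5 = 92.60%
# Hypothetical setting: 50,000 validation images, 8 devices, batch 32 per device.
print(evaluated_images(50000, 8, 32))            # 49920
```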

# [Model Description](#contents)

## [Performance](#contents)

### Training accuracy results

| Parameters          | DenseNet                    |
| ------------------- | --------------------------- |
| Model Version       | DenseNet121                 |
| Resource            | Ascend 910                  |
| Uploaded Date       | 09/15/2020 (month/day/year) |
| MindSpore Version   | 1.0.0                       |
| Dataset             | ImageNet                    |
| Epochs              | 120                         |
| Outputs             | probability                 |
| Accuracy            | Top1: 75.13%; Top5: 92.57%  |

### Training performance results

| Parameters          | DenseNet                                   |
| ------------------- | ------------------------------------------ |
| Model Version       | DenseNet121                                |
| Resource            | Ascend 910                                 |
| Uploaded Date       | 09/15/2020 (month/day/year)                |
| MindSpore Version   | 1.0.0                                      |
| Dataset             | ImageNet                                   |
| batch_size          | 32                                         |
| Outputs             | probability                                |
| Speed               | 1 device: 760 img/s; 8 devices: 6000 img/s |
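
The throughput figures translate into a scaling efficiency and a rough wall-clock estimate. The sketch assumes the 1,281,167-image ImageNet-1k training set and the 120 epochs listed in the accuracy table, and it ignores evaluation, checkpointing and start-up time, so treat the result as a lower bound rather than a measurement. The helper names are hypothetical.

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # ImageNet-1k training set size
EPOCHS = 120                       # from the accuracy table above


def scaling_efficiency(single_ips, multi_ips, num_devices):
    """Fraction of ideal linear speed-up achieved by num_devices."""
    return multi_ips / (num_devices * single_ips)


def training_hours(ips, num_images=IMAGENET_TRAIN_IMAGES, epochs=EPOCHS):
    """Pure compute time at a constant throughput, ignoring all overheads."""
    return num_images * epochs / ips / 3600


print(f"8-device scaling efficiency: {scaling_efficiency(760, 6000, 8):.1%}")  # 98.7%
print(f"1 device:  {training_hours(760):.1f} h")    # 56.2 h
print(f"8 devices: {training_hours(6000):.1f} h")   # 7.1 h
```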

# [Description of Random Situation](#contents)

In dataset.py, we set the seed inside the `create_dataset` function. We also use a random seed in train.py.

# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).