# Contents

- [Description](#description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
    - [Dataset Preparation](#dataset-preparation)
    - [Running](#running)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
            - [Running on Ascend](#running-on-ascend)
        - [Distributed Training](#distributed-training)
            - [Running on Ascend](#running-on-ascend-1)
    - [Evaluation Process](#evaluation-process)
        - [Running on Ascend](#running-on-ascend-2)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Accuracy](#accuracy)
            - [DPN92 (Training)](#dpn92-training)
        - [Efficiency](#efficiency)
            - [DPN92](#dpn92)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [Description](#contents)

Dual Path Network (DPN) is a convolution-based neural network for the task of image classification. It combines the advantages of both ResNeXt and DenseNet to achieve higher accuracy. More details about this model can be found in:

Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan, Jiashi Feng. "Dual Path Networks" (NIPS 2017).

This repository contains a MindSpore implementation of DPN based on cypw's original MXNet implementation (<https://github.com/cypw/DPNs>). The training and validation scripts are also included, and the validation results obtained with cypw's pretrained weights are shown in the [Performance](#performance) section.
# [Model Architecture](#contents)

The overall network architecture of DPN is shown below:

[Link](https://arxiv.org/pdf/1707.01629.pdf)
# [Dataset](#contents)

All the models in this repository are trained and validated on ImageNet-1K. The models can achieve the [results](#model-description) below with the following dataset preprocessing configurations:

- For the training dataset (see the transform sketch after this list):
    - The range (min, max) of the crop area, relative to the area of the original image, is (0.08, 1.0)
    - The range (min, max) of the crop aspect ratio is (0.75, 1.333)
    - Input images are resized to 224\*224 (width\*height)
    - The probability of a random horizontal flip is 50%
    - For normalization, the mean is (255\*0.485, 255\*0.456, 255\*0.406) and the standard deviation is (255\*0.229, 255\*0.224, 255\*0.225)
- For the evaluation dataset:
    - The input size of images is 224\*224 (resized to 256\*256, then cropped at the center)
    - For normalization, the mean is (255\*0.485, 255\*0.456, 255\*0.406) and the standard deviation is (255\*0.229, 255\*0.224, 255\*0.225)
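For reference, the preprocessing above corresponds to a pipeline like the following minimal sketch, written against the MindSpore 1.1 `c_transforms` vision API. The actual pipeline lives in `src/imagenet_dataset.py`; `train_dir` is a placeholder path, so treat this as an illustration rather than the repository's code.

```python
import mindspore.dataset as ds
import mindspore.dataset.vision.c_transforms as C

mean = [255 * 0.485, 255 * 0.456, 255 * 0.406]
std = [255 * 0.229, 255 * 0.224, 255 * 0.225]

# Training pipeline: random resized crop, random flip, normalization.
train_trans = [
    C.RandomCropDecodeResize(224, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
    C.RandomHorizontalFlip(prob=0.5),
    C.Normalize(mean=mean, std=std),
    C.HWC2CHW(),
]

# Evaluation pipeline: resize to 256, center-crop to 224, normalization.
eval_trans = [
    C.Decode(),
    C.Resize(256),
    C.CenterCrop(224),
    C.Normalize(mean=mean, std=std),
    C.HWC2CHW(),
]

# "train_dir" is a placeholder for the ImageNet train split directory.
dataset = ds.ImageFolderDataset("train_dir", num_parallel_workers=4, shuffle=True)
dataset = dataset.map(operations=train_trans, input_columns="image")
dataset = dataset.batch(32, drop_remainder=True)
```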
# [Features](#contents)

## [Mixed Precision](#contents)

The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the training of deep neural networks by using both single-precision and half-precision data formats, while maintaining the accuracy achieved by pure single-precision training. Mixed precision training speeds up computation, reduces memory usage, and enables larger models or batch sizes to be trained on specific hardware. For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log level and searching for "reduce precision".
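As an illustration only (the actual setup is in `train.py`), mixed precision in MindSpore is typically enabled through the `amp_level` argument of `Model`, combined with a fixed loss scale matching `config.loss_scale_num`; `net` and `loss` below are placeholders for the DPN network and the loss criterion.

```python
from mindspore import nn, Model
from mindspore.train.loss_scale_manager import FixedLossScaleManager

# "net" is the DPN network and "loss" the cross-entropy criterion (placeholders).
opt = nn.SGD(net.trainable_params(), learning_rate=0.1,
             momentum=0.9, weight_decay=1e-4)
loss_scale = FixedLossScaleManager(1024, drop_overflow_update=False)

# amp_level="O2" casts the network to FP16 while keeping BatchNorm in FP32.
model = Model(net, loss_fn=loss, optimizer=opt,
              amp_level="O2", loss_scale_manager=loss_scale,
              metrics={"top_1_accuracy", "top_5_accuracy"})
```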
# [Environment Requirements](#contents)

To run the Python scripts in this repository, you need to prepare the environment as follows:

- Hardware
    - Prepare a hardware environment with an Ascend or GPU processor.
- Python and dependencies
    - Python 3.7
    - MindSpore 1.1.0
- For more information, please check the resources below:
    - [MindSpore tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Quick Start](#contents)

## [Dataset Preparation](#contents)

The DPN models in this repository are trained and validated on the ImageNet-1K dataset. Download the dataset from [ImageNet.org](http://image-net.org/download). You can place it anywhere and pass its location to the scripts when running them.

## [Running](#contents)

To train the DPNs, run the shell script `scripts/train_standalone.sh` with the format below:

```shell
sh scripts/train_standalone.sh [device_id] [dataset_dir] [ckpt_path_to_save] [eval_each_epoch] [pretrained_ckpt(optional)]
```

To validate the DPNs, run the shell script `scripts/eval.sh` with the format below:

```shell
sh scripts/eval.sh [device_id] [dataset_dir] [pretrained_ckpt]
```
# [Script Description](#contents)

## [Script and Sample Code](#contents)

The structure of the files in this repository is shown below.

```text
└─ mindspore-dpns
  ├─ scripts
  │  ├─ eval.sh                // launch Ascend standalone evaluation
  │  ├─ train_distributed.sh   // launch Ascend distributed training
  │  └─ train_standalone.sh    // launch Ascend standalone training
  ├─ src
  │  ├─ config.py              // network and running config
  │  ├─ crossentropy.py        // loss function
  │  ├─ dpn.py                 // DPN implementation
  │  ├─ imagenet_dataset.py    // dataset processor and provider
  │  └─ lr_scheduler.py        // DPN learning rate scheduler
  ├─ eval.py                   // evaluation script
  ├─ train.py                  // training script
  ├─ export.py                 // export model
  └─ README.md                 // description of this repository
```
## [Script Parameters](#contents)

Parameters for both training and evaluation can be set in `src/config.py`.

- Configurations for DPN92 with the ImageNet-1K dataset

```python
# model config
config.image_size = (224, 224)         # input image size
config.num_classes = 1000              # number of dataset classes
config.backbone = 'dpn92'              # backbone network
config.is_save_on_master = True
# parallel config
config.num_parallel_workers = 4        # number of workers reading the data
config.rank = 0                        # local rank in distributed training
config.group_size = 1                  # group size in distributed training
# training config
config.batch_size = 32                 # batch size
config.global_step = 0                 # start step of the learning rate
config.epoch_size = 180                # number of epochs
config.loss_scale_num = 1024           # loss scale
# optimizer config
config.momentum = 0.9                  # momentum (SGD)
config.weight_decay = 1e-4             # weight decay (SGD)
# learning rate config
config.lr_schedule = 'warmup'          # learning rate schedule
config.lr_init = 0.01                  # initial learning rate
config.lr_max = 0.1                    # max learning rate
config.factor = 0.1                    # factor by which the lr drops
config.epoch_number_to_drop = [5, 15]  # the learning rate drops after these epochs
config.warmup_epochs = 5               # warmup epochs in the learning rate schedule
# dataset config
config.dataset = "imagenet-1K"         # dataset
config.label_smooth = False            # label smoothing
config.label_smooth_factor = 0.0       # label smoothing factor
# parameter save config
config.keep_checkpoint_max = 3         # keep only the last keep_checkpoint_max checkpoints
```
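To illustrate how the learning-rate parameters above interact: with `lr_schedule = 'warmup'`, the rate typically ramps linearly from `lr_init` to `lr_max` over `warmup_epochs`, then drops by `factor` at each epoch listed in `epoch_number_to_drop`. The sketch below is a hypothetical helper mirroring the config fields; the real logic is in `src/lr_scheduler.py` and may differ in detail.

```python
def warmup_step_lr(lr_init, lr_max, factor, epochs_to_drop,
                   warmup_epochs, total_epochs, steps_per_epoch):
    """Build a per-step learning-rate list: linear warmup, then step drops.

    Hypothetical helper for illustration; see src/lr_scheduler.py for the
    repository's actual schedule.
    """
    lr_each_step = []
    for epoch in range(total_epochs):
        if epoch < warmup_epochs:
            # Linear warmup from lr_init toward lr_max.
            lr = lr_init + (lr_max - lr_init) * (epoch + 1) / warmup_epochs
        else:
            # Multiply by `factor` once for every drop epoch already passed.
            drops = sum(1 for e in epochs_to_drop if epoch >= e)
            lr = lr_max * factor ** drops
        lr_each_step.extend([lr] * steps_per_epoch)
    return lr_each_step

# Example with the DPN92 defaults above (steps_per_epoch is dataset-dependent).
lrs = warmup_step_lr(0.01, 0.1, 0.1, [5, 15], 5, 180, steps_per_epoch=5004)
```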
## [Training Process](#contents)

### [Training](#contents)

#### Running on Ascend

Run `scripts/train_standalone.sh` to train the model in standalone mode. The usage of the script is:

```shell
sh scripts/train_standalone.sh [device_id] [dataset_dir] [ckpt_path_to_save] [eval_each_epoch] [pretrained_ckpt(optional)]
```

For example, you can run the shell command below to launch the training procedure.

```shell
sh scripts/train_standalone.sh 0 /data/dataset/imagenet/ scripts/pretrain/ 0
```

If `eval_each_epoch` is 1, the script evaluates after each epoch and saves the parameters with the highest accuracy, at the cost of a longer epoch time. If `eval_each_epoch` is 0, it saves the parameters every few epochs instead of evaluating during training.

The script runs training in the background; you can view the results through the file `train_log.txt`, which looks as follows (eval_each_epoch = 0):

```text
epoch: 1 step: 40036, loss is 3.6232593
epoch time: 10048893.336 ms, per step time: 250.996 ms
epoch: 2 step: 40036, loss is 3.200775
epoch time: 9306154.456 ms, per step time: 232.445 ms
...
```

or as follows (eval_each_epoch = 1):

```text
epoch: 1 step: 40036, loss is 3.6232593
epoch time: 10048893.336 ms, per step time: 250.996 ms
Save the maximum accuracy checkpoint,the accuracy is 0.2629158669225848
...
```

The model checkpoint will be saved into `[ckpt_path_to_save]`.
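Checkpoint retention follows `config.keep_checkpoint_max`. Below is a hedged sketch of how `train.py` plausibly wires this up with MindSpore's standard callbacks; `steps_per_epoch`, `ckpt_path_to_save`, `model`, and `dataset` are placeholders, and the exact code may differ.

```python
from mindspore.train.callback import (ModelCheckpoint, CheckpointConfig,
                                      LossMonitor, TimeMonitor)

# Save a checkpoint at the end of every epoch, keeping at most 3 files.
ckpt_cfg = CheckpointConfig(save_checkpoint_steps=steps_per_epoch,
                            keep_checkpoint_max=3)
ckpt_cb = ModelCheckpoint(prefix="dpn", directory=ckpt_path_to_save,
                          config=ckpt_cfg)

# LossMonitor/TimeMonitor produce the "loss is ..." / "epoch time: ..." lines above.
model.train(180, dataset, callbacks=[ckpt_cb, LossMonitor(), TimeMonitor()])
```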
### [Distributed Training](#contents)

#### Running on Ascend

Run `scripts/train_distributed.sh` to train the model in distributed mode. The usage of the script is:

```shell
sh scripts/train_distributed.sh [rank_table] [dataset_dir] [ckpt_path_to_save] [rank_size] [eval_each_epoch] [pretrained_ckpt(optional)]
```

For example, you can run the shell command below to launch the training procedure.

```shell
sh scripts/train_distributed.sh /home/rank_table.json /data/dataset/imagenet/ ../scripts 8 0 ../pretrain/dpn92.ckpt
```

The above shell script runs distributed training in the background. You can view the results through the file `train_parallel[X]/log.txt`, which looks as follows:

```text
epoch: 1 step 5004, loss is 4.5680037
epoch time: 2312519.441 ms, per step time: 462.134 ms
epoch: 2 step 5004, loss is 2.964888
Epoch time: 1350398.913 ms, per step time: 369.864 ms
...
```

The model checkpoint will be saved into `[ckpt_path_to_save]`.
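For context, data-parallel training on Ascend is typically initialized as in the sketch below before building the dataset and model; treat it as an outline of the standard MindSpore 1.1 pattern rather than the exact contents of `train.py`.

```python
from mindspore import context
from mindspore.context import ParallelMode
from mindspore.communication.management import init, get_rank, get_group_size

# The device ID and rank table are provided by scripts/train_distributed.sh.
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
init()                          # initialize HCCL communication
rank = get_rank()               # maps to config.rank
group_size = get_group_size()   # maps to config.group_size
context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL,
                                  gradients_mean=True,
                                  device_num=group_size)
```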
## [Evaluation Process](#contents)

### [Running on Ascend](#contents)

Run `scripts/eval.sh` to evaluate the model with one Ascend processor. The usage of the script is:

```shell
sh scripts/eval.sh [device_id] [dataset_dir] [pretrained_ckpt]
```

For example, you can run the shell command below to launch the validation procedure.

```shell
sh scripts/eval.sh 0 /data/dataset/imagenet/ pretrain/dpn-180_5004.ckpt
```

The above shell script runs evaluation in the background. You can view the results through the file `eval_log.txt`, which should contain lines like the following:

```text
Evaluation result: {'top_5_accuracy': 0.9449223751600512, 'top_1_accuracy': 0.7911731754161332}.
DPN evaluate success!
```
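The reported dictionary comes from MindSpore's metric interface. A hedged sketch of the core of `eval.py` follows; `net`, `loss`, and `eval_dataset` are placeholders built as in the sections above, and the actual script may differ in detail.

```python
from mindspore import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net

# Load the pretrained weights into the network.
param_dict = load_checkpoint("pretrain/dpn-180_5004.ckpt")
load_param_into_net(net, param_dict)

# model.eval returns a dict keyed by the metric names, as in eval_log.txt.
model = Model(net, loss_fn=loss,
              metrics={"top_1_accuracy", "top_5_accuracy"})
acc = model.eval(eval_dataset)
print("Evaluation result:", acc)
```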
# [Model Description](#contents)

## [Performance](#contents)

The evaluation of model performance is divided into two parts: accuracy and efficiency. The accuracy part reports how well the model classifies images on the ImageNet-1K dataset, measured by top-k accuracy. The efficiency part reports the time cost of model training on ImageNet-1K.

All results are validated at an image size of 224x224. The dataset preprocessing and training configurations are shown in the [Dataset](#dataset) section.

### [Accuracy](#contents)

#### DPN92 (Training)

| Parameters        | Ascend                      |
| ----------------- | --------------------------- |
| Model Version     | DPN92 (Train)               |
| Resource          | Ascend 910; OS Euler2.8     |
| Uploaded Date     | 12/20/2020 (month/day/year) |
| MindSpore Version | 1.1.0                       |
| Dataset           | ImageNet-1K                 |
| epochs            | 180                         |
| outputs           | probability                 |
| train performance | Top1: 78.91%; Top5: 94.53%  |

### [Efficiency](#contents)

#### DPN92

| Parameters        | Ascend                               |
| ----------------- | ------------------------------------ |
| Model Version     | DPN92                                |
| Resource          | Ascend 910; OS Euler2.8              |
| Uploaded Date     | 12/20/2020 (month/day/year)          |
| MindSpore Version | 1.1.0                                |
| Dataset           | ImageNet-1K                          |
| batch_size        | 32                                   |
| outputs           | probability                          |
| speed             | 1pc: 233 ms/step; 8pc: 240 ms/step   |
# [Description of Random Situation](#contents)

In `imagenet_dataset.py`, we set the seed inside the `create_dataset` function. We also use a random seed in `train.py`.
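For reproducibility, seeds are typically set once at the start of training. A minimal sketch of the standard MindSpore calls is shown below; the seed value is illustrative, and the exact calls in the scripts may differ.

```python
import mindspore.dataset as ds
from mindspore.common import set_seed

set_seed(1)            # fixes weight-init and op-level randomness (value illustrative)
ds.config.set_seed(1)  # fixes shuffle/augmentation randomness in the dataset pipeline
```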
# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).