

# DeepLabV3 for MindSpore

DeepLab is a series of image semantic segmentation models, and DeepLabV3 improves significantly over its previous versions. Two key points of DeepLabV3: its multi-grid atrous convolution handles segmenting objects at multiple scales better, and its augmented ASPP makes image-level features available to capture long-range information.

This repository provides scripts and a recipe to train the DeepLabV3 model and achieve state-of-the-art performance.
## Table Of Contents

* [Model overview](#model-overview)
* [Model Architecture](#model-architecture)
* [Default configuration](#default-configuration)
* [Setup](#setup)
  * [Requirements](#requirements)
* [Quick start guide](#quick-start-guide)
* [Performance](#performance)
  * [Results](#results)
    * [Training accuracy](#training-accuracy)
    * [Training performance](#training-performance)
    * [One-hour performance](#one-hour-performance)
## Model overview

Refer to [this paper][1] for network details.

`Chen L C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation[J]. arXiv preprint arXiv:1706.05587, 2017.`

[1]: https://arxiv.org/abs/1706.05587
## Default Configuration

- Network structure:
  ResNet-101 as the backbone, with atrous convolution for dense feature extraction.
- Preprocessing on training data:
  - crop size: 513 x 513
  - random scale: scale range 0.5 to 2.0
  - random flip
  - mean subtraction: means are [103.53, 116.28, 123.675]
- Preprocessing on validation data:
  The image's long side is resized to 513, then the image is padded to 513 x 513.
- Training parameters:
  - Momentum: 0.9
  - LR scheduler: cosine
  - Weight decay: 0.0001
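The training preprocessing above can be sketched in a few lines of numpy. This is illustrative only: the repository's actual data pipeline lives in its dataset code, and the function names here are hypothetical.

```python
import numpy as np

CROP_SIZE = 513
MEANS = np.array([103.53, 116.28, 123.675])

def resize_nearest(arr, new_h, new_w):
    # Nearest-neighbor resize via index selection (works for 2-D labels
    # and 3-D images alike).
    rows = np.arange(new_h) * arr.shape[0] // new_h
    cols = np.arange(new_w) * arr.shape[1] // new_w
    return arr[rows][:, cols]

def preprocess_train(image, label, rng):
    # Random scale in [0.5, 2.0].
    scale = rng.uniform(0.5, 2.0)
    new_h, new_w = int(image.shape[0] * scale), int(image.shape[1] * scale)
    image = resize_nearest(image, new_h, new_w)
    label = resize_nearest(label, new_h, new_w)

    # Pad up to the crop size (labels padded with the ignore value 255),
    # then take a random 513 x 513 crop.
    pad_h, pad_w = max(CROP_SIZE - new_h, 0), max(CROP_SIZE - new_w, 0)
    image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    label = np.pad(label, ((0, pad_h), (0, pad_w)), constant_values=255)
    top = rng.integers(0, image.shape[0] - CROP_SIZE + 1)
    left = rng.integers(0, image.shape[1] - CROP_SIZE + 1)
    image = image[top:top + CROP_SIZE, left:left + CROP_SIZE]
    label = label[top:top + CROP_SIZE, left:left + CROP_SIZE]

    # Random horizontal flip, applied to image and label together.
    if rng.random() < 0.5:
        image, label = image[:, ::-1], label[:, ::-1]

    # Mean subtraction.
    return image - MEANS, label
```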
## Setup

The following section lists the requirements to start training the DeepLabV3 model.

### Requirements

Before running the code in this project, ensure you have the following environment:

- [MindSpore](https://www.mindspore.cn/)
- A hardware environment with the Ascend AI processor

For more information about how to get started with MindSpore, see the following sections:

- [MindSpore's Tutorial](https://www.mindspore.cn/tutorial/zh-CN/master/index.html)
- [MindSpore's API](https://www.mindspore.cn/api/zh-CN/master/index.html)
## Quick Start Guide

### 1. Clone the repository

```
git clone xxx
cd ModelZoo_DeepLabV3_MS_MTI/00-access
```
### 2. Install the Python packages in requirements.txt
### 3. Download and preprocess the dataset

- Download the segmentation dataset.
- Prepare the training data list file. The list file stores the relative paths of image and annotation pairs, one pair per line:

```
JPEGImages/00001.jpg SegmentationClassGray/00001.png
JPEGImages/00002.jpg SegmentationClassGray/00002.png
JPEGImages/00003.jpg SegmentationClassGray/00003.png
JPEGImages/00004.jpg SegmentationClassGray/00004.png
......
```
- Configure and run build_data.sh to convert the dataset to MindRecords. Arguments in build_data.sh:

```
--data_root     root path of the training data
--data_lst      list of training data (prepared above)
--dst_path      where the MindRecords are saved
--num_shards    number of shards of the MindRecords
--shuffle       shuffle the data or not
```
### 4. Generate the config JSON file for 8-card training

```
# From the root of this project
cd tools
python get_multicards_json.py 10.111.*.*
# Replace 10.111.*.* with the machine's IP address.
```
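The generated file is a rank table describing the host's 8 Ascend devices. As a hedged illustration of roughly what such a JSON contains (the exact schema and the device IPs are assumptions here; the real get_multicards_json.py derives them from the machine):

```python
import json

def make_rank_table(server_ip, device_ips):
    # One entry per device: device id, device network IP, and rank id.
    devices = [
        {"device_id": str(i), "device_ip": ip, "rank_id": str(i)}
        for i, ip in enumerate(device_ips)
    ]
    return {
        "version": "1.0",
        "server_count": "1",
        "server_list": [{"server_id": server_ip, "device": devices}],
        "status": "completed",
    }

# Hypothetical host and device IPs, for illustration only.
table = make_rank_table("10.111.0.1", ["192.98.92.%d" % i for i in range(8)])
print(json.dumps(table, indent=2))
```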
### 5. Train

Following the original DeepLabV3 paper, we reproduce two training experiments on the vocaug (also called trainaug) dataset and evaluate on the VOC val dataset.

For single-device training, configure the parameters first; the training script is as follows:
```
# run_standalone_train.sh
python ${train_code_path}/train.py --data_file=/PATH/TO/MINDRECORD_NAME \
    --train_dir=${train_path}/ckpt \
    --train_epochs=200 \
    --batch_size=32 \
    --crop_size=513 \
    --base_lr=0.015 \
    --lr_type=cos \
    --min_scale=0.5 \
    --max_scale=2.0 \
    --ignore_label=255 \
    --num_classes=21 \
    --model=deeplab_v3_s16 \
    --ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL \
    --save_steps=1500 \
    --keep_checkpoint_max=200 >log 2>&1 &
```
For 8-device training, the training steps are as follows:

1. Train s16 with the vocaug dataset, fine-tuning from the ResNet-101 pretrained model. The script is as follows:
```
# run_distribute_train_s16_r1.sh
for((i=0;i<=$RANK_SIZE-1;i++));
do
    export RANK_ID=$i
    export DEVICE_ID=`expr $i + $RANK_START_ID`
    echo 'start rank='$i', device id='$DEVICE_ID'...'
    mkdir ${train_path}/device$DEVICE_ID
    cd ${train_path}/device$DEVICE_ID
    python ${train_code_path}/train.py --train_dir=${train_path}/ckpt \
        --data_file=/PATH/TO/MINDRECORD_NAME \
        --train_epochs=300 \
        --batch_size=32 \
        --crop_size=513 \
        --base_lr=0.08 \
        --lr_type=cos \
        --min_scale=0.5 \
        --max_scale=2.0 \
        --ignore_label=255 \
        --num_classes=21 \
        --model=deeplab_v3_s16 \
        --ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL \
        --is_distributed \
        --save_steps=410 \
        --keep_checkpoint_max=200 >log 2>&1 &
done
```
2. Train s8 with the vocaug dataset, fine-tuning from the model in the previous step. The training script is as follows:
```
# run_distribute_train_s8_r1.sh
for((i=0;i<=$RANK_SIZE-1;i++));
do
    export RANK_ID=$i
    export DEVICE_ID=`expr $i + $RANK_START_ID`
    echo 'start rank='$i', device id='$DEVICE_ID'...'
    mkdir ${train_path}/device$DEVICE_ID
    cd ${train_path}/device$DEVICE_ID
    python ${train_code_path}/train.py --train_dir=${train_path}/ckpt \
        --data_file=/PATH/TO/MINDRECORD_NAME \
        --train_epochs=800 \
        --batch_size=16 \
        --crop_size=513 \
        --base_lr=0.02 \
        --lr_type=cos \
        --min_scale=0.5 \
        --max_scale=2.0 \
        --ignore_label=255 \
        --num_classes=21 \
        --model=deeplab_v3_s8 \
        --loss_scale=2048 \
        --ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL \
        --is_distributed \
        --save_steps=820 \
        --keep_checkpoint_max=200 >log 2>&1 &
done
```
3. Train s8 with the voctrain dataset, fine-tuning from the model in the previous step. The training script is as follows:
```
# run_distribute_train_r2.sh
for((i=0;i<=$RANK_SIZE-1;i++));
do
    export RANK_ID=$i
    export DEVICE_ID=`expr $i + $RANK_START_ID`
    echo 'start rank='$i', device id='$DEVICE_ID'...'
    mkdir ${train_path}/device$DEVICE_ID
    cd ${train_path}/device$DEVICE_ID
    python ${train_code_path}/train.py --train_dir=${train_path}/ckpt \
        --data_file=/PATH/TO/MINDRECORD_NAME \
        --train_epochs=300 \
        --batch_size=16 \
        --crop_size=513 \
        --base_lr=0.008 \
        --lr_type=cos \
        --min_scale=0.5 \
        --max_scale=2.0 \
        --ignore_label=255 \
        --num_classes=21 \
        --model=deeplab_v3_s8 \
        --loss_scale=2048 \
        --ckpt_pre_trained=/PATH/TO/PRETRAIN_MODEL \
        --is_distributed \
        --save_steps=110 \
        --keep_checkpoint_max=200 >log 2>&1 &
done
```
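The --save_steps values in the three 8-device runs above correspond to checkpointing roughly every 10 epochs, which can be sanity-checked with a quick calculation. The split sizes are assumptions about your data (the commonly used trainaug/vocaug split has 10,582 images and the VOC 2012 train split has 1,464):

```python
# Steps per epoch across all 8 devices, with the per-device batch size.
def steps_per_epoch(num_images, batch_size, devices=8):
    return num_images // (batch_size * devices)

assert steps_per_epoch(10582, 32) == 41  # s16 run: save_steps=410 ~ 10 epochs
assert steps_per_epoch(10582, 16) == 82  # s8 run:  save_steps=820 ~ 10 epochs
assert steps_per_epoch(1464, 16) == 11   # voctrain run: save_steps=110 ~ 10 epochs
```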
### 6. Test

Configure the checkpoint with --ckpt_path and run the script; the mIOU is printed to ${eval_path}/eval_log.

```
./run_eval_s16.sh                   # test s16
./run_eval_s8.sh                    # test s8
./run_eval_s8_multiscale.sh         # test s8 + multiscale
./run_eval_s8_multiscale_flip.sh    # test s8 + multiscale + flip
```
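The mIOU reported in eval_log is the mean over classes of TP / (TP + FP + FN), accumulated in a confusion matrix over the val set. A minimal numpy sketch of that metric (the repository's eval.py is the authoritative implementation):

```python
import numpy as np

def miou(pred, label, num_classes, ignore_label=255):
    # Drop ignored pixels, then accumulate a num_classes x num_classes
    # confusion matrix with a single bincount.
    mask = label != ignore_label
    hist = np.bincount(
        num_classes * label[mask].astype(int) + pred[mask].astype(int),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    tp = np.diag(hist)
    union = hist.sum(0) + hist.sum(1) - tp
    valid = union > 0
    return (tp[valid] / union[valid]).mean()

# Perfect prediction on a toy label map gives mIOU 1.0.
label = np.array([[0, 0, 1], [1, 255, 2]])
print(miou(label.copy(), label, num_classes=3))
```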
An example test script is as follows:

```
python ${train_code_path}/eval.py --data_root=/PATH/TO/DATA \
    --data_lst=/PATH/TO/DATA_lst.txt \
    --batch_size=16 \
    --crop_size=513 \
    --ignore_label=255 \
    --num_classes=21 \
    --model=deeplab_v3_s8 \
    --scales=0.5 \
    --scales=0.75 \
    --scales=1.0 \
    --scales=1.25 \
    --scales=1.75 \
    --flip \
    --freeze_bn \
    --ckpt_path=/PATH/TO/PRETRAIN_MODEL >${eval_path}/eval_log 2>&1 &
```
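The repeated --scales flags and --flip combine at inference time by averaging predictions over scales and the horizontal flip before the final argmax. A toy sketch of that aggregation, where `infer` stands in for a network forward pass and is purely hypothetical (the real pipeline also resizes the input by each scale and resizes the logits back):

```python
import numpy as np

SCALES = [0.5, 0.75, 1.0, 1.25, 1.75]

def multiscale_flip_predict(image, infer, flip=True):
    # Sum per-pixel class scores over all scales (and the mirrored input,
    # un-mirrored afterwards), then take the argmax over classes.
    probs = None
    for s in SCALES:
        p = infer(image, s)
        if flip:
            p = p + infer(image[:, ::-1], s)[:, ::-1]
        probs = p if probs is None else probs + p
    return probs.argmax(-1)
```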
## Performance

### Results

Our results were obtained by running the applicable training scripts. To achieve the same results, follow the steps in the Quick Start Guide.

#### Training accuracy
| **Network** | OS=16 | OS=8 | MS | Flip | mIOU | mIOU in paper |
| :---------: | :---: | :--: | :--: | :--: | :---: | :-----------: |
| deeplab_v3 | √ | | | | 77.37 | 77.21 |
| deeplab_v3 | | √ | | | 78.84 | 78.51 |
| deeplab_v3 | | √ | √ | | 79.70 | 79.45 |
| deeplab_v3 | | √ | √ | √ | 79.89 | 79.77 |
#### Training performance

| **NPUs** | Train performance |
| :------: | :---------------: |
| 1 | 26 img/s |
| 8 | 131 img/s |