# Contents

- [Contents](#contents)
- [Unet Description](#unet-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
- [Training Process](#training-process)
    - [Training](#training)
        - [running on Ascend](#running-on-ascend)
    - [Distributed Training](#distributed-training)
- [Evaluation Process](#evaluation-process)
    - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Evaluation Performance](#evaluation-performance)
        - [Inference Performance](#inference-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
## [Unet Description](#contents)

The Unet3D model is widely used for 3D medical image segmentation. Its construction is similar to that of Unet; the main difference is that Unet3D uses 3D operations such as Conv3D, while Unet is an entirely 2D architecture. To learn more about the Unet3D network, you can read the original paper: 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation.

## [Model Architecture](#contents)

The Unet3D model builds on the earlier 2D Unet and consists of an encoder part and a decoder part. The encoder analyzes the whole image and extracts features, while the decoder generates the segmented block image. In this model, we also add a residual block to the base block to improve the network.
## [Dataset](#contents)

Dataset used: [LUNA16](https://luna16.grand-challenge.org/)

- Description: The task is to automatically detect the location of nodules in volumetric CT images. 888 CT scans from the LIDC-IDRI database are provided. The complete dataset is divided into 10 subsets that should be used for 10-fold cross-validation. All subsets are available as compressed zip files.
- Dataset size: 888 images
    - Train: 878 images
    - Test: 10 images
- Data format: zip
- Note: Data will be processed in convert_nifti.py
## [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare a hardware environment with Ascend processors.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
## [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

- Select the network and dataset to use

```shell
# Convert the dataset into NIfTI format.
python ./src/convert_nifti.py --input_path=/path/to/input_image/ --output_path=/path/to/output_image/
```

Refer to `src/config.py`. We support some parameter configurations for quick start.

- Run on Ascend

```shell
# run training example
python train.py --data_url=/path/to/data/ --seg_url=/path/to/segment/ > train.log 2>&1 &

# run distributed training example
bash scripts/run_distribute_train.sh [RANK_TABLE_FILE] [IMAGE_PATH] [SEG_PATH]

# run evaluation example
python eval.py --data_url=/path/to/data/ --seg_url=/path/to/segment/ --ckpt_path=/path/to/checkpoint/ > eval.log 2>&1 &
```
## [Script Description](#contents)

### [Script and Sample Code](#contents)

```text
.
└─unet3d
    ├── README.md                    // descriptions about Unet3D
    ├── scripts
    │   ├── run_distribute_train.sh  // shell script for distributed training on Ascend
    │   ├── run_standalone_train.sh  // shell script for standalone training on Ascend
    │   └── run_standalone_eval.sh   // shell script for evaluation on Ascend
    ├── src
    │   ├── config.py                // parameter configuration
    │   ├── dataset.py               // creating dataset
    │   ├── lr_schedule.py           // learning rate scheduler
    │   ├── transform.py             // handle dataset
    │   ├── convert_nifti.py         // convert dataset
    │   ├── loss.py                  // loss
    │   ├── utils.py                 // general components (callback functions)
    │   ├── unet3d_model.py          // Unet3D model
    │   └── unet3d_parts.py          // Unet3D parts
    ├── train.py                     // training script
    └── eval.py                      // evaluation script
```
### [Script Parameters](#contents)

Parameters for both training and evaluation can be set in config.py.

- config for Unet3d, LUNA16 dataset

```python
'model': 'Unet3d',            # model name
'lr': 0.0005,                 # learning rate
'epochs': 10,                 # total training epochs when running on 1 device
'batchsize': 1,               # training batch size
'warmup_step': 120,           # warm-up steps in the lr generator
'warmup_ratio': 0.3,          # warm-up ratio
'num_classes': 4,             # the number of classes in the dataset
'in_channels': 1,             # the number of input channels
'keep_checkpoint_max': 5,     # only keep the last keep_checkpoint_max checkpoints
'loss_scale': 256.0,          # loss scale
'roi_size': [224, 224, 96],   # random roi size
'overlap': 0.25,              # overlap rate
'min_val': -500,              # intensity original range min
'max_val': 1000,              # intensity original range max
'upper_limit': 5,             # upper limit of num_classes
'lower_limit': 3,             # lower limit of num_classes
```
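As a hedged illustration of how `warmup_step` and `warmup_ratio` might interact (the actual schedule lives in `src/lr_schedule.py`; the linear ramp below is an assumption for exposition, not the verified behavior of this model), a common pattern is to start at `lr * warmup_ratio` and ramp linearly up to `lr` over the first `warmup_step` steps:

```python
def warmup_lr(base_lr, warmup_ratio, warmup_step, step):
    """Linearly ramp the lr from base_lr * warmup_ratio up to base_lr
    over the first warmup_step steps, then hold it at base_lr.
    (Hypothetical helper; the real schedule is in src/lr_schedule.py.)"""
    if step >= warmup_step:
        return base_lr
    start = base_lr * warmup_ratio
    return start + (base_lr - start) * step / warmup_step

# With the config values above: lr=0.0005, warmup_ratio=0.3, warmup_step=120.
print(warmup_lr(0.0005, 0.3, 120, 0))    # 0.00015 (= 0.0005 * 0.3)
print(warmup_lr(0.0005, 0.3, 120, 120))  # 0.0005
```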
## [Training Process](#contents)

### Training

#### running on Ascend

```shell
python train.py --data_url=/path/to/data/ --seg_url=/path/to/segment/ > train.log 2>&1 &
```

The python command above will run in the background; you can view the results through the file `train.log`.

After training, you'll get some checkpoint files under the script folder by default. The loss values will be reported as follows:

```shell
epoch: 1 step: 878, loss is 0.55011123
epoch time: 1443410.353 ms, per step time: 1688.199 ms
epoch: 2 step: 878, loss is 0.58278626
epoch time: 1172136.839 ms, per step time: 1370.920 ms
epoch: 3 step: 878, loss is 0.43625978
epoch time: 1135890.834 ms, per step time: 1328.537 ms
epoch: 4 step: 878, loss is 0.06556784
epoch time: 1180467.795 ms, per step time: 1380.664 ms
```
#### Distributed Training

> Notes:
> RANK_TABLE_FILE can refer to [Link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/distributed_training_ascend.html), and the device_ip can be obtained as described in [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools). For large models like InceptionV4, it's better to export the environment variable `export HCCL_CONNECT_TIMEOUT=600` to extend the HCCL connection checking time from the default 120 seconds to 600 seconds. Otherwise, the connection could time out, since compilation time increases with model size.
>

```shell
bash scripts/run_distribute_train.sh [RANK_TABLE_FILE] [IMAGE_PATH] [SEG_PATH]
```

The above shell script will run distributed training in the background. You can view the results through the file `train_parallel[X]/log.txt`. The loss values will be reported as follows:

```shell
epoch: 1 step: 110, loss is 0.8294426
epoch time: 468891.643 ms, per step time: 4382.165 ms
epoch: 2 step: 110, loss is 0.58278626
epoch time: 165469.201 ms, per step time: 1546.441 ms
epoch: 3 step: 110, loss is 0.43625978
epoch time: 158915.771 ms, per step time: 1485.194 ms
...
epoch: 9 step: 110, loss is 0.016280059
epoch time: 172815.179 ms, per step time: 1615.095 ms
epoch: 10 step: 110, loss is 0.020185348
epoch time: 140476.520 ms, per step time: 1312.865 ms
```
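Note that the per-epoch step count drops from 878 in the single-device log to 110 here because the 878 training samples are sharded across the 8 devices (assuming even sharding with batch size 1, which matches the config above); a quick sanity check:

```python
import math

samples = 878      # training images (see the Dataset section)
devices = 8        # 8-device distributed run
batch_size = 1     # from config.py

# Each device sees roughly samples / devices examples per epoch;
# the remainder rounds up so no sample is dropped.
steps_per_epoch = math.ceil(samples / devices / batch_size)
print(steps_per_epoch)  # 110
```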
## [Evaluation Process](#contents)

### Evaluation

- evaluation on the dataset when running on Ascend

Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to an absolute path, e.g., "username/unet3d/Unet3d-10_110.ckpt".

```shell
python eval.py --data_url=/path/to/data/ --seg_url=/path/to/segment/ --ckpt_path=/path/to/checkpoint/ > eval.log 2>&1 &
```

The above python command will run in the background. You can view the results through the file "eval.log". The accuracy on the test dataset will be reported as follows:

```shell
# grep "eval average dice is:" eval.log
eval average dice is 0.9502010010453671
```
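For reference, the Dice score reported above measures the overlap between the predicted and ground-truth segmentation masks. A minimal sketch of the per-class Dice coefficient on flattened label arrays (plain Python for illustration; not the implementation in `src/loss.py` or `eval.py`):

```python
def dice_coefficient(pred, target, cls, eps=1e-8):
    """Dice = 2*|P & T| / (|P| + |T|) for the voxels labeled `cls`."""
    p = [x == cls for x in pred]
    t = [x == cls for x in target]
    intersection = sum(a and b for a, b in zip(p, t))
    return 2.0 * intersection / (sum(p) + sum(t) + eps)

# Toy flattened "volumes": class 1 is predicted on 3 voxels,
# present on 4 in the ground truth, overlapping on 3 of them.
pred   = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 1, 1, 0]
print(round(dice_coefficient(pred, target, cls=1), 4))  # 0.8571
```

A perfect prediction yields a Dice of 1.0, so the 0.9502 above indicates a close match between predicted and ground-truth masks.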
## [Model Description](#contents)

### [Performance](#contents)

#### Evaluation Performance

| Parameters          | Ascend                                                    |
| ------------------- | --------------------------------------------------------- |
| Model Version       | Unet3D                                                    |
| Resource            | Ascend 910; CPU 2.60 GHz, 192 cores; Memory 755 GB        |
| Uploaded Date       | 03/18/2021 (month/day/year)                               |
| MindSpore Version   | 1.2.0                                                     |
| Dataset             | LUNA16                                                    |
| Training Parameters | epoch = 10, batch_size = 1                                |
| Optimizer           | Adam                                                      |
| Loss Function       | SoftmaxCrossEntropyWithLogits                             |
| Speed               | 8pcs: 1795 ms/step                                        |
| Total time          | 8pcs: 0.62 hours                                          |
| Parameters (M)      | 34                                                        |
| Scripts             | [unet3d script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/unet3d) |
#### Inference Performance

| Parameters          | Ascend                      |
| ------------------- | --------------------------- |
| Model Version       | Unet3D                      |
| Resource            | Ascend 910                  |
| Uploaded Date       | 03/18/2021 (month/day/year) |
| MindSpore Version   | 1.2.0                       |
| Dataset             | LUNA16                      |
| batch_size          | 1                           |
| Dice                | dice = 0.9502               |
| Model for inference | 56M (.ckpt file)            |
## [Description of Random Situation](#contents)

We set the random seed to 1 in train.py.

## [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).