# Contents

- [Contents](#contents)
- [Unet Description](#unet-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
            - [running on Ascend](#running-on-ascend)
            - [Distributed Training](#distributed-training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Evaluation Performance](#evaluation-performance)
    - [How to use](#how-to-use)
        - [Inference](#inference)
            - [Running on Ascend 310](#running-on-ascend-310)
        - [Continue Training on the Pretrained Model](#continue-training-on-the-pretrained-model)
        - [Transfer training](#transfer-training)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
## [Unet Description](#contents)

Unet for 2D image segmentation. This implementation follows the original paper [UNet: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597). Unet won several of the top places in the 2015 ISBI cell tracking competition. The paper proposes a network model for medical image segmentation, together with a data augmentation method that uses the available annotated data effectively, addressing the shortage of annotated data in the medical field. A U-shaped network structure is used to extract context and location information.

UNet++ is a neural architecture for semantic and instance segmentation with re-designed skip pathways and deep supervision.

[U-Net Paper](https://arxiv.org/abs/1505.04597): Olaf Ronneberger, Philipp Fischer, Thomas Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." *conditionally accepted at MICCAI 2015*. 2015.

[UNet++ Paper](https://arxiv.org/abs/1912.05074): Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh and J. Liang, "UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation," in IEEE Transactions on Medical Imaging, vol. 39, no. 6, pp. 1856-1867, June 2020, doi: 10.1109/TMI.2019.2959609.
## [Model Architecture](#contents)

UNet introduces a U-shaped network structure that extracts and fuses high-level features, capturing both context information and spatial location information. The U-shaped network consists of an encoder and a decoder. The encoder repeats a block of two 3x3 convolutions followed by a 2x2 max pooling; the number of channels is doubled after each down-sampling step. The decoder consists of a 2x2 deconvolution, a concat layer and two 3x3 convolutions, followed by a final 1x1 convolution that produces the output.
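For illustration, here is a minimal sketch of one encoder stage (two 3x3 convolutions, each followed by ReLU, then 2x2 max pooling) in MindSpore. The actual blocks live in `src/unet_medical/unet_parts.py` and may differ in details such as padding and normalization:

```python
import mindspore.nn as nn

class DoubleConv(nn.Cell):
    """Two 3x3 convolutions, each followed by ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.SequentialCell([
            nn.Conv2d(in_ch, out_ch, kernel_size=3),
            nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3),
            nn.ReLU(),
        ])

    def construct(self, x):
        return self.block(x)

# One encoder step: down-sample by 2, then double the channels (e.g. 64 -> 128).
down_stage = nn.SequentialCell([nn.MaxPool2d(kernel_size=2, stride=2),
                                DoubleConv(64, 128)])
```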
## [Dataset](#contents)

Dataset used: [ISBI Challenge](http://brainiac2.mit.edu/isbi_challenge/home)

- Description: The training and test datasets are two stacks of 30 sections from a serial section Transmission Electron Microscopy (ssTEM) data set of the Drosophila first instar larva ventral nerve cord (VNC). The microcube measures approx. 2 x 2 x 1.5 microns, with a resolution of 4x4x50 nm/pixel.
- License: You are free to use this data set for the purpose of generating or testing non-commercial image segmentation software. If any scientific publications derive from the usage of this data set, you must cite TrakEM2 and the following publication: Cardona A, Saalfeld S, Preibisch S, Schmid B, Cheng A, Pulokas J, Tomancak P, Hartenstein V. 2010. An Integrated Micro- and Macroarchitectural Analysis of the Drosophila Brain by Computer-Assisted Serial Section Electron Microscopy. PLoS Biol 8(10): e1000502. doi:10.1371/journal.pbio.1000502.
- Dataset size: 22.5 MB
    - Train: 15 MB, 30 images (the training data consists of 2 multi-page TIF files, each containing 30 2D images; train-volume.tif and train-labels.tif contain the data and labels respectively)
    - Val: (we randomly split the training data into 5 folds and evaluate the model by 5-fold cross-validation)
    - Test: 7.5 MB, 30 images (the testing data consists of 1 multi-page TIF file containing 30 2D images; test-volume.tif contains the data)
- Data format: binary files (TIF files)
    - Note: Data will be processed in src/data_loader.py

We also support the cell nuclei dataset used in the [Unet++ original paper](https://arxiv.org/abs/1912.05074). If you want to use this dataset, please add `'dataset': 'Cell_nuclei'` in `src/config.py`.
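For orientation, here is a hedged sketch of reading the multi-page TIF stacks described above with Pillow; the repo's own loading and preprocessing logic lives in `src/data_loader.py`:

```python
import numpy as np
from PIL import Image, ImageSequence

def load_tif_stack(path):
    """Return a (pages, H, W) array from a multi-page TIF file."""
    with Image.open(path) as img:
        return np.stack([np.array(page) for page in ImageSequence.Iterator(img)])

images = load_tif_stack("train-volume.tif")  # (30, H, W) raw sections
labels = load_tif_stack("train-labels.tif")  # matching segmentation masks
```

(Adjust the paths to where you unpacked the dataset.)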
## [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare hardware environment with Ascend processor.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
## [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

- Select the network and dataset to use

  1. Select `cfg_unet` in `src/config.py`. We support unet and unet++, and we provide some parameter configurations for quick start.
  2. If you want other parameters, please refer to `src/config.py`. You can set `'model'` to `'unet_nested'` or `'unet_simple'` to select which net to use. We support two datasets, `ISBI` and `Cell_nuclei`; you can set `'dataset'` to `'Cell_nuclei'` to use the `Cell_nuclei` dataset. The default is `ISBI`.

- Run on Ascend
```shell
# run training example
python train.py --data_url=/path/to/data/ > train.log 2>&1 &
OR
bash scripts/run_standalone_train.sh [DATASET]

# run distributed training example
bash scripts/run_distribute_train.sh [RANK_TABLE_FILE] [DATASET]

# run evaluation example
python eval.py --data_url=/path/to/data/ --ckpt_path=/path/to/checkpoint/ > eval.log 2>&1 &
OR
bash scripts/run_standalone_eval.sh [DATASET] [CHECKPOINT]
```
- Run on docker

Build the docker image (change the version to the one you actually use):

```shell
# build docker
docker build -t unet:20.1.0 . --build-arg FROM_IMAGE_NAME=ascend-mindspore-arm:20.1.0
```

Create a container layer over the created image and start it:

```shell
# start docker
bash scripts/docker_start.sh unet:20.1.0 [DATA_DIR] [MODEL_DIR]
```

Then you can run everything just as on Ascend.
## [Script Description](#contents)

### [Script and Sample Code](#contents)

```shell
├── model_zoo
    ├── README.md                         // descriptions about all the models
    ├── unet
        ├── README.md                     // descriptions about Unet
        ├── ascend310_infer               // code of infer on ascend 310
        ├── scripts
        │   ├── docker_start.sh           // shell script for quick docker start
        │   ├── run_distribute_train.sh   // shell script for distributed on Ascend
        │   ├── run_infer_310.sh          // shell script for infer on ascend 310
        │   ├── run_standalone_train.sh   // shell script for standalone on Ascend
        │   ├── run_standalone_eval.sh    // shell script for evaluation on Ascend
        ├── src
        │   ├── config.py                 // parameter configuration
        │   ├── data_loader.py            // creating dataset
        │   ├── loss.py                   // loss
        │   ├── eval_callback.py          // evaluation callback while training
        │   ├── utils.py                  // general components (callback function)
        │   ├── unet_medical              // Unet medical architecture
        │   │   ├── __init__.py           // init file
        │   │   ├── unet_model.py         // unet model
        │   │   ├── unet_parts.py         // unet parts
        │   ├── unet_nested               // Unet++ architecture
        │   │   ├── __init__.py           // init file
        │   │   ├── unet_model.py         // unet model
        │   │   ├── unet_parts.py         // unet parts
        ├── train.py                      // training script
        ├── eval.py                       // evaluation script
        ├── export.py                     // export script
        ├── mindspore_hub_conf.py         // hub config file
        ├── postprocess.py                // unet 310 infer postprocess
        ├── preprocess.py                 // unet 310 infer preprocess dataset
        ├── requirements.txt              // requirements of third party packages
```
### [Script Parameters](#contents)

Parameters for both training and evaluation can be set in `config.py`.

- config for Unet, ISBI dataset
```python
'name': 'Unet',                      # model name
'lr': 0.0001,                        # learning rate
'epochs': 400,                       # total training epochs when running on 1 device
'repeat': 400,                       # repeat times per epoch
'distribute_epochs': 1600,           # total training epochs when running on 8 devices
'batchsize': 16,                     # training batch size
'cross_valid_ind': 1,                # cross-validation fold index
'num_classes': 2,                    # the number of classes in the dataset
'num_channels': 1,                   # the number of input channels
'keep_checkpoint_max': 10,           # only keep the last keep_checkpoint_max checkpoints
'weight_decay': 0.0005,              # weight decay value
'loss_scale': 1024.0,                # loss scale
'FixedLossScaleManager': 1024.0,     # fixed loss scale
'resume': False,                     # whether to train from a pretrained model
'resume_ckpt': './',                 # pretrained model path
'transfer_training': False,          # whether to do transfer training
'filter_weight': ["final.weight"],   # weight names to filter when doing transfer training
'run_eval': False,                   # run evaluation while training
'save_best_ckpt': True,              # save best checkpoint when run_eval is True
'eval_start_epoch': 0,               # evaluation start epoch when run_eval is True
'eval_interval': 1                   # evaluation interval when run_eval is True
```
- config for Unet++, cell nuclei dataset

```python
'model': 'unet_nested',              # model name
'dataset': 'Cell_nuclei',            # dataset name
'img_size': [96, 96],                # image size
'lr': 3e-4,                          # learning rate
'epochs': 200,                       # total training epochs when running on 1 device
'repeat': 10,                        # repeat times per epoch
'distribute_epochs': 1600,           # total training epochs when running on 8 devices
'batchsize': 16,                     # batch size
'num_classes': 2,                    # the number of classes in the dataset
'num_channels': 3,                   # the number of input image channels
'keep_checkpoint_max': 10,           # only keep the last keep_checkpoint_max checkpoints
'weight_decay': 0.0005,              # weight decay value
'loss_scale': 1024.0,                # loss scale
'FixedLossScaleManager': 1024.0,     # fixed loss scale
'use_bn': True,                      # whether to use batch normalization
'use_ds': True,                      # whether to use deep supervision
'use_deconv': True,                  # whether to use Conv2dTranspose
'resume': False,                     # whether to train from a pretrained model
'resume_ckpt': './',                 # pretrained model path
'transfer_training': False,          # whether to do transfer training
'filter_weight': ['final1.weight', 'final2.weight', 'final3.weight', 'final4.weight'],  # weight names to filter when doing transfer training
'run_eval': False,                   # run evaluation while training
'save_best_ckpt': True,              # save best checkpoint when run_eval is True
'eval_start_epoch': 0,               # evaluation start epoch when run_eval is True
'eval_interval': 1                   # evaluation interval when run_eval is True
```
*Note: the number of epochs actually passed to training is floor(epochs / repeat). Because the UNet datasets are usually small, we repeat the dataset within each epoch to avoid dropping too many images when batching.*
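As a concrete check of these numbers for the ISBI configuration above (assuming the 4/5 training split implied by 5-fold cross-validation, i.e. 24 of the 30 images):

```python
epochs, repeat, batch_size = 400, 400, 16
train_images = 24                            # 30 images minus the held-out fold

model_epochs = epochs // repeat              # 1 epoch is passed to training
steps = train_images * repeat // batch_size  # 24 * 400 / 16 = 600
print(model_epochs, steps)                   # 1 600 -- matches the 600 steps logged below
```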
## [Training Process](#contents)

### Training

#### running on Ascend

```shell
python train.py --data_url=/path/to/data/ > train.log 2>&1 &
OR
bash scripts/run_standalone_train.sh [DATASET]
```

The python command above runs in the background; you can view the results in the file `train.log`.

After training, you'll get some checkpoint files under the script folder by default. The loss values will look as follows:
```shell
# grep "loss is " train.log
step: 1, loss is 0.7011719, fps is 0.25025035060906264
step: 2, loss is 0.69433594, fps is 56.77693756377044
step: 3, loss is 0.69189453, fps is 57.3293877244179
step: 4, loss is 0.6894531, fps is 57.840651522059716
step: 5, loss is 0.6850586, fps is 57.89903776054361
step: 6, loss is 0.6777344, fps is 58.08073627299014
...
step: 597, loss is 0.19030762, fps is 58.28088370287449
step: 598, loss is 0.19958496, fps is 57.95493929352674
step: 599, loss is 0.18371582, fps is 58.04039977720966
step: 600, loss is 0.22070312, fps is 56.99692546024671
```
The model checkpoint will be saved in the current directory.

#### Distributed Training

```shell
bash scripts/run_distribute_train.sh [RANK_TABLE_FILE] [DATASET]
```

The above shell script runs distributed training in the background. You can view the results in the file `logs/device[X]/log.log`. The loss values will look as follows:

```shell
# grep "loss is" logs/device0/log.log
step: 1, loss is 0.70524895, fps is 0.15914689861221412
step: 2, loss is 0.6925452, fps is 56.43668656967454
...
step: 299, loss is 0.20551169, fps is 58.4039329983891
step: 300, loss is 0.18949677, fps is 57.63118508760329
```
#### Evaluation while training

If you want evaluation while training, add `run_eval` to the start shell and set it to True. When `run_eval` is True, you can also set the argument options `save_best_ckpt`, `eval_start_epoch`, `eval_interval` and `eval_metrics`.
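Conceptually, evaluation-during-training hangs an evaluation routine on the epoch boundary via a MindSpore `Callback`. The repo's actual implementation is `src/eval_callback.py`; the following is only a simplified sketch of the idea, with `eval_fn` standing in for whatever computes the metric:

```python
from mindspore.train.callback import Callback

class SimpleEvalCallback(Callback):
    """Run eval_fn every eval_interval epochs, starting at eval_start_epoch."""
    def __init__(self, eval_fn, eval_start_epoch=0, eval_interval=1):
        super().__init__()
        self.eval_fn = eval_fn
        self.eval_start_epoch = eval_start_epoch
        self.eval_interval = eval_interval
        self.best = 0.0

    def epoch_end(self, run_context):
        cur_epoch = run_context.original_args().cur_epoch_num
        if cur_epoch >= self.eval_start_epoch and \
                (cur_epoch - self.eval_start_epoch) % self.eval_interval == 0:
            metric = self.eval_fn()["dice_coeff"]
            # save_best_ckpt would hook in here to persist the best model.
            self.best = max(self.best, metric)
```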
## [Evaluation Process](#contents)

### Evaluation

- evaluation on ISBI dataset when running on Ascend

Before running the command below, please check the checkpoint path used for evaluation. Please set the checkpoint path to an absolute full path, e.g., "username/unet/ckpt_unet_medical_adam-48_600.ckpt".

```shell
python eval.py --data_url=/path/to/data/ --ckpt_path=/path/to/unet.ckpt > eval.log 2>&1 &
OR
bash scripts/run_standalone_eval.sh [DATASET] [CHECKPOINT]
```

The above python command runs in the background. You can view the results in the file "eval.log". The accuracy on the test dataset will look as follows:

```shell
# grep "Cross valid dice coeff is:" eval.log
============== Cross valid dice coeff is: {'dice_coeff': 0.9111}
```
## [Model Description](#contents)

### [Performance](#contents)

#### Evaluation Performance

| Parameters                 | Ascend                                                       |
| -------------------------- | ------------------------------------------------------------ |
| Model Version              | Unet                                                         |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G; OS Euler2.8 |
| Uploaded Date              | 09/15/2020 (month/day/year)                                  |
| MindSpore Version          | 1.2.0                                                        |
| Dataset                    | ISBI                                                         |
| Training Parameters        | 1pc: epoch=400, total steps=600, batch_size=16, lr=0.0001    |
| Optimizer                  | Adam                                                         |
| Loss Function              | Softmax Cross Entropy                                        |
| Outputs                    | probability                                                  |
| Loss                       | 0.22070312                                                   |
| Speed                      | 1pc: 267 ms/step                                             |
| Total time                 | 1pc: 2.67 mins                                               |
| Parameters (M)             | 93M                                                          |
| Checkpoint for Fine tuning | 355.11M (.ckpt file)                                         |
| Scripts                    | [unet script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/unet) |
### [How to use](#contents)

#### Inference

If you need to use the trained model to perform inference on multiple hardware platforms, such as Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/migrate_3rd_scripts.html). Below is a simple example of the steps:

##### Running on Ascend 310

Export MindIR:

```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```

The `ckpt_file` parameter is required, and `FILE_FORMAT` must be chosen from ["AIR", "MINDIR"].

Before performing inference, the MINDIR file must be exported by the export script in the Ascend 910 environment. Currently, batch_size can only be set to 1.
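For reference, here is a minimal sketch of what such an export script does, using the public MindSpore serialization API; the constructor name `UNetMedical` and the 572x572 input size are assumptions here, so match them to your configuration and to `export.py`:

```python
import numpy as np
import mindspore as ms
from src.unet_medical import UNetMedical  # hypothetical import path

# Build the network and load the trained weights.
net = UNetMedical(n_channels=1, n_classes=2)  # hypothetical constructor
ms.load_param_into_net(net, ms.load_checkpoint("unet.ckpt"))

# batch_size must be 1 for Ascend 310 inference, hence the leading 1 here.
dummy_input = ms.Tensor(np.zeros([1, 1, 572, 572], np.float32))
ms.export(net, dummy_input, file_name="unet", file_format="MINDIR")
```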
```shell
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]
```

`DEVICE_ID` is optional; the default value is 0.

The inference result is saved in the current path; you can find the result in the acc.log file.

```text
Cross valid dice coeff is: 0.9054352151297033
```
#### Continue Training on the Pretrained Model

Set the option `resume` to True in `config.py`, and set `resume_ckpt` to the path of your checkpoint, e.g.

```python
'resume': True,
'resume_ckpt': 'ckpt_0/ckpt_unet_sample_adam_1-1_600.ckpt',
'transfer_training': False,
'filter_weight': ["final.weight"]
```
#### Transfer training

Do the same as for resuming training above; in addition, set `transfer_training` to True. `filter_weight` lists the weights that will be filtered out for a different dataset. Usually the default value of `filter_weight` doesn't need to be changed; the defaults are the weights that depend on the class number, e.g.

```python
'resume': True,
'resume_ckpt': 'ckpt_0/ckpt_unet_sample_adam_1-1_600.ckpt',
'transfer_training': True,
'filter_weight': ["final.weight"]
```
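For intuition, here is roughly how such filtering can be applied when loading the checkpoint. This is a sketch, not the repo's exact code (see train.py and src/utils.py), and `net` is assumed to be the freshly built model:

```python
from mindspore import load_checkpoint, load_param_into_net

param_dict = load_checkpoint("ckpt_0/ckpt_unet_sample_adam_1-1_600.ckpt")
# Drop the weights tied to the old class number so they are re-initialized.
for name in ["final.weight"]:
    param_dict.pop(name, None)
load_param_into_net(net, param_dict)  # the remaining weights are transferred
```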
## [Description of Random Situation](#contents)

In data_loader.py, we set the seed inside the `_get_val_train_indices` function. We also use a random seed in train.py.
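The seeding amounts to something like the following sketch; the exact seed values and call sites are in src/data_loader.py and train.py:

```python
import numpy as np
import mindspore

mindspore.set_seed(1)  # fixes MindSpore's global RNG (seed value is illustrative)
np.random.seed(1)      # fixes the NumPy RNG used for the train/val split
```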
## [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).