# Contents

- [Unet Description](#unet-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
        - [Distributed Training](#distributed-training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Evaluation Performance](#evaluation-performance)
    - [How to use](#how-to-use)
        - [Inference](#inference)
        - [Continue Training on the Pretrained Model](#continue-training-on-the-pretrained-model)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [Unet Description](#contents)

Unet Medical is a model for 2D image segmentation. This implementation follows the original paper [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597). U-Net obtained many of the best results in the 2015 ISBI cell tracking challenge. The paper proposes a network model for medical image segmentation together with a data augmentation method that makes effective use of the annotated data, addressing the shortage of annotated data in the medical field. A U-shaped network structure is used to extract both context and location information.

[Paper](https://arxiv.org/abs/1505.04597): Olaf Ronneberger, Philipp Fischer, Thomas Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." *conditionally accepted at MICCAI 2015*. 2015.
# [Model Architecture](#contents)

U-Net proposes a U-shaped network structure that extracts and fuses high-level features to obtain both context information and spatial location information. The structure consists of an encoder and a decoder. The encoder repeatedly applies two 3x3 convolutions followed by a 2x2 max pooling, and the number of channels doubles after each downsampling. Each decoder stage consists of a 2x2 deconvolution, a concat layer and two 3x3 convolutions; a final 1x1 convolution produces the output.
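
The channel doubling in the encoder and the matching halving in the decoder can be sketched in a few lines. This is only an illustration: the base width of 64 channels and the depth of 4 downsampling steps are assumptions taken from the original paper, not values read from this repository, and `unet_stage_channels` is a hypothetical helper.

```python
def unet_stage_channels(base_channels=64, depth=4):
    """Return (encoder_channels, decoder_channels) for a U-Net style network.

    base_channels and depth are assumptions taken from the original paper,
    not values read from this repository's code.
    """
    # Encoder: channels double after every 2x2 max pooling.
    encoder = [base_channels * 2 ** level for level in range(depth + 1)]
    # Decoder: each 2x2 deconvolution halves the channels again; the concat
    # with the matching encoder feature map is followed by two 3x3 convs.
    decoder = encoder[-2::-1]
    return encoder, decoder


encoder, decoder = unet_stage_channels()
print(encoder)  # [64, 128, 256, 512, 1024]
print(decoder)  # [512, 256, 128, 64]
```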
# [Dataset](#contents)

Dataset used: [ISBI Challenge](http://brainiac2.mit.edu/isbi_challenge/home)

- Description: The training and test datasets are two stacks of 30 sections from a serial section Transmission Electron Microscopy (ssTEM) data set of the Drosophila first instar larva ventral nerve cord (VNC). The microcube measures 2 x 2 x 1.5 microns approx., with a resolution of 4x4x50 nm/pixel.
- License: You are free to use this data set for the purpose of generating or testing non-commercial image segmentation software. If any scientific publications derive from the usage of this data set, you must cite TrakEM2 and the following publication: Cardona A, Saalfeld S, Preibisch S, Schmid B, Cheng A, Pulokas J, Tomancak P, Hartenstein V. 2010. An Integrated Micro- and Macroarchitectural Analysis of the Drosophila Brain by Computer-Assisted Serial Section Electron Microscopy. PLoS Biol 8(10): e1000502. doi:10.1371/journal.pbio.1000502.
- Dataset size: 22.5M
    - Train: 15M, 30 images (The training data consists of 2 multi-page TIF files, each containing 30 2D images. train-volume.tif and train-labels.tif contain the data and the labels, respectively.)
    - Val: (We randomly divide the training data into 5 folds and evaluate the model with 5-fold cross-validation.)
    - Test: 7.5M, 30 images (The test data consists of 1 multi-page TIF file containing 30 2D images. test-volume.tif contains the data.)
- Data format: binary files (TIF file)
    - Note: Data will be processed in src/data_loader.py
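
The 5-fold split of the 30 training images can be pictured with the sketch below. It is not the code from src/data_loader.py: the helper name `five_fold_split`, the seed value and the shuffling strategy are all assumptions made for illustration.

```python
import random


def five_fold_split(num_images=30, fold_index=0, num_folds=5, seed=0):
    """Split image indices into (train, val) for one cross-validation fold.

    Hypothetical helper: the function name, the seed value and the shuffling
    strategy are assumptions, not the repository's implementation.
    """
    indices = list(range(num_images))
    # A fixed seed keeps the shuffle, and therefore the folds, reproducible.
    random.Random(seed).shuffle(indices)
    fold_size = num_images // num_folds
    start = fold_index * fold_size
    val = indices[start:start + fold_size]
    train = indices[:start] + indices[start + fold_size:]
    return train, val


train_idx, val_idx = five_fold_split(fold_index=1)
print(len(train_idx), len(val_idx))  # 24 6
```

With 30 images and 5 folds, each fold holds out 6 images for validation and trains on the remaining 24.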
# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare a hardware environment with an Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

- running on Ascend

```shell
# run training example
python train.py --data_url=/path/to/data/ > train.log 2>&1 &
# OR
bash scripts/run_standalone_train.sh [DATASET]

# run distributed training example
bash scripts/run_distribute_train.sh [RANK_TABLE_FILE] [DATASET]

# run evaluation example
python eval.py --data_url=/path/to/data/ --ckpt_path=/path/to/checkpoint/ > eval.log 2>&1 &
# OR
bash scripts/run_standalone_eval.sh [DATASET] [CHECKPOINT]
```
# [Script Description](#contents)

## [Script and Sample Code](#contents)

```text
├── model_zoo
    ├── README.md                           // descriptions about all the models
    ├── unet
        ├── README.md                       // descriptions about Unet
        ├── scripts
        │   ├──run_standalone_train.sh      // shell script for standalone training on Ascend
        │   ├──run_standalone_eval.sh       // shell script for evaluation on Ascend
        ├── src
        │   ├──config.py                    // parameter configuration
        │   ├──data_loader.py               // creating dataset
        │   ├──loss.py                      // loss
        │   ├──utils.py                     // General components (callback function)
        │   ├──unet.py                      // Unet architecture
                ├──__init__.py              // init file
                ├──unet_model.py            // unet model
                ├──unet_parts.py            // unet part
        ├── train.py                        // training script
        ├──launch_8p.py                     // training 8P script
        ├── eval.py                         // evaluation script
```
## [Script Parameters](#contents)

Parameters for both training and evaluation can be set in config.py.

- config for Unet, ISBI dataset

```python
'name': 'Unet',                     # model name
'lr': 0.0001,                       # learning rate
'epochs': 400,                      # total training epochs when running on 1 device (1p)
'distribute_epochs': 1600,          # total training epochs when running on 8 devices (8p)
'batchsize': 16,                    # training batch size
'cross_valid_ind': 1,               # cross-validation index
'num_classes': 2,                   # the number of classes in the dataset
'num_channels': 1,                  # the number of channels
'keep_checkpoint_max': 10,          # only keep the last keep_checkpoint_max checkpoints
'weight_decay': 0.0005,             # weight decay value
'loss_scale': 1024.0,               # loss scale
'FixedLossScaleManager': 1024.0,    # fixed loss scale
'resume': False,                    # whether to train from a pretrained model
'resume_ckpt': './',                # pretrained model path
```
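
As background for the `loss_scale` entry, the sketch below illustrates why a loss scale helps when gradients are stored in float16. It is a generic illustration of the technique using only the Python standard library, not MindSpore code; the gradient value `1e-8` is a hypothetical number chosen so that it underflows in half precision.

```python
import struct


def to_float16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


loss_scale = 1024.0      # same value as 'loss_scale' in the config
tiny_gradient = 1e-8     # hypothetical gradient, chosen to underflow in float16

# Without scaling, the gradient is lost when stored in float16.
unscaled = to_float16(tiny_gradient)

# With scaling, the gradient survives in float16 and is divided by the
# scale afterwards in higher precision.
recovered = to_float16(tiny_gradient * loss_scale) / loss_scale

print(unscaled)   # 0.0
print(recovered)  # close to 1e-08
```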
## [Training Process](#contents)

### Training

- running on Ascend

```shell
python train.py --data_url=/path/to/data/ > train.log 2>&1 &
# OR
bash scripts/run_standalone_train.sh [DATASET]
```

The python command above runs in the background; you can view the results in the file `train.log`.

After training, checkpoint files are saved under the script folder by default. The loss values look like this:

```shell
# grep "loss is " train.log
step: 1, loss is 0.7011719, fps is 0.25025035060906264
step: 2, loss is 0.69433594, fps is 56.77693756377044
step: 3, loss is 0.69189453, fps is 57.3293877244179
step: 4, loss is 0.6894531, fps is 57.840651522059716
step: 5, loss is 0.6850586, fps is 57.89903776054361
step: 6, loss is 0.6777344, fps is 58.08073627299014
...
step: 597, loss is 0.19030762, fps is 58.28088370287449
step: 598, loss is 0.19958496, fps is 57.95493929352674
step: 599, loss is 0.18371582, fps is 58.04039977720966
step: 600, loss is 0.22070312, fps is 56.99692546024671
```

The model checkpoint will be saved in the current directory.
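
If you want to plot or post-process the loss instead of reading it with `grep`, the log lines can be parsed with a small script. The sketch below assumes the exact line format shown in the log excerpt above; `parse_train_log` is a hypothetical helper and not part of the repository.

```python
import re

# Matches lines such as "step: 1, loss is 0.7011719, fps is 0.25025035060906264".
LOG_LINE = re.compile(r"step: (\d+), loss is ([\d.]+), fps is ([\d.]+)")


def parse_train_log(lines):
    """Return a list of (step, loss, fps) tuples from log lines."""
    records = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            step, loss, fps = match.groups()
            records.append((int(step), float(loss), float(fps)))
    return records


sample = [
    "step: 1, loss is 0.7011719, fps is 0.25025035060906264",
    "step: 600, loss is 0.22070312, fps is 56.99692546024671",
]
records = parse_train_log(sample)
print(records[-1][1])  # 0.22070312
```

To read a real log, pass an open file instead of the sample list, for example `parse_train_log(open("train.log"))`.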
### Distributed Training

```shell
bash scripts/run_distribute_train.sh [RANK_TABLE_FILE] [DATASET]
```

The shell script above runs distributed training in the background. You can view the results in the file `logs/device[X]/log.log`. The loss values look like this:

```shell
# grep "loss is" logs/device0/log.log
step: 1, loss is 0.70524895, fps is 0.15914689861221412
step: 2, loss is 0.6925452, fps is 56.43668656967454
...
step: 299, loss is 0.20551169, fps is 58.4039329983891
step: 300, loss is 0.18949677, fps is 57.63118508760329
```
## [Evaluation Process](#contents)

### Evaluation

- evaluation on ISBI dataset when running on Ascend

Before running the command below, please check the checkpoint path used for evaluation. Set the checkpoint path to the absolute full path, e.g., "username/unet/ckpt_unet_medical_adam-48_600.ckpt".

```shell
python eval.py --data_url=/path/to/data/ --ckpt_path=/path/to/checkpoint/ > eval.log 2>&1 &
# OR
bash scripts/run_standalone_eval.sh [DATASET] [CHECKPOINT]
```

The python command above runs in the background. You can view the results in the file "eval.log". The cross-validation Dice coefficient is reported as follows:

```shell
# grep "Cross valid dice coeff is:" eval.log
============== Cross valid dice coeff is: {'dice_coeff': 0.9085704886070473}
```
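
For readers unfamiliar with the metric, the Dice coefficient compares a predicted mask with the ground truth as twice the overlap divided by the total number of foreground pixels in both masks. The sketch below shows the textbook definition on flat binary masks; the repository's own metric may differ in details such as smoothing terms, and `dice_coefficient` is a hypothetical helper.

```python
def dice_coefficient(pred, target):
    """Dice coefficient 2 * |A and B| / (|A| + |B|) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly; this convention is an assumption.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(pred, target))  # 0.6666666666666666
```

A value of 1.0 means the masks match exactly and 0.0 means they do not overlap at all.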
# [Model Description](#contents)

## Performance

### Evaluation Performance

| Parameters                 | Ascend                                                       |
| -------------------------- | ------------------------------------------------------------ |
| Model Version              | Unet                                                         |
| Resource                   | Ascend 910; CPU 2.60GHz, 192 cores; Memory 755G              |
| Uploaded Date              | 09/15/2020 (month/day/year)                                  |
| MindSpore Version          | 1.0.0                                                        |
| Dataset                    | ISBI                                                         |
| Training Parameters        | 1pc: epoch=400, total steps=600, batch_size = 16, lr=0.0001  |
|                            | 8pc: epoch=1600, total steps=300, batch_size = 16, lr=0.0001 |
| Optimizer                  | ADAM                                                         |
| Loss Function              | Softmax Cross Entropy                                        |
| Outputs                    | probability                                                  |
| Loss                       | 0.22070312                                                   |
| Speed                      | 1pc: 267 ms/step; 8pc: 280 ms/step                           |
| Total time                 | 1pc: 2.67 mins; 8pc: 1.40 mins                               |
| Parameters (M)             | 93M                                                          |
| Checkpoint for Fine tuning | 355.11M (.ckpt file)                                         |
| Scripts                    | [unet script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/unet) |
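
The step counts and total times in the table are consistent with a simple calculation, sketched below. It rests on two assumptions that this README does not state explicitly: 24 of the 30 images (4 of the 5 folds) are used for training, and the dataset is repeated `epochs` times before being batched. The helpers `total_steps` and `total_minutes` are hypothetical.

```python
def total_steps(epochs, train_images, batch_size, devices=1):
    """Steps needed when the data is repeated `epochs` times, then batched."""
    return epochs * train_images // (batch_size * devices)


def total_minutes(steps, ms_per_step):
    """Convert a step count and a per-step time in milliseconds to minutes."""
    return steps * ms_per_step / 1000 / 60


train_images = 30 * 4 // 5  # assumption: 4 of the 5 folds are used for training

steps_1p = total_steps(400, train_images, 16)
steps_8p = total_steps(1600, train_images, 16, devices=8)
print(steps_1p, steps_8p)  # 600 300
print(round(total_minutes(steps_1p, 267), 2))  # 2.67
print(round(total_minutes(steps_8p, 280), 2))  # 1.4
```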
## [How to use](#contents)

### Inference

If you need to use the trained model to perform inference on multiple hardware platforms, such as Ascend 910 or Ascend 310, you can refer to this [Link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/migrate_3rd_scripts.html). The following is a simple example:

- Running on Ascend

```python
import os

from mindspore import context
from mindspore.train.model import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net

# UNet, CrossEntropyWithLogits, create_dataset and cfg come from the modules
# under src/; data_dir, ckpt_path and cross_valid_ind are set by the caller.

# Set context
device_id = int(os.getenv('DEVICE_ID'))
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=True, device_id=device_id)

# Load unseen dataset for inference
_, valid_dataset = create_dataset(data_dir, 1, 1, False, cross_valid_ind, False)

# Define model and load pre-trained model
net = UNet(n_channels=cfg['num_channels'], n_classes=cfg['num_classes'])
param_dict = load_checkpoint(ckpt_path)
load_param_into_net(net, param_dict)
criterion = CrossEntropyWithLogits()
# Evaluation needs no optimizer, so only the loss and metrics are passed.
model = Model(net, loss_fn=criterion, metrics={'acc'})

# Make predictions on the unseen dataset
print("============== Starting Evaluating ============")
dice_score = model.eval(valid_dataset, dataset_sink_mode=False)
print("============== Cross valid dice coeff is:", dice_score)
```

- Running on Ascend 310

Export MindIR

```shell
python export.py --ckpt_file [CKPT_PATH] --file_name [FILE_NAME] --file_format [FILE_FORMAT]
```

The `ckpt_file` parameter is required.
`FILE_FORMAT` should be in ["AIR", "MINDIR"].

Before performing inference, the MINDIR file must be exported by the export script in the Ascend 910 environment.
Currently, batch_size can only be set to 1.

```shell
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [DEVICE_ID]
```

`DEVICE_ID` is optional; the default value is 0.

The inference result is saved in the current path; you can find it in the acc.log file.

```text
Cross valid dice coeff is: 0.9054352151297033
```
### Continue Training on the Pretrained Model

- running on Ascend

```python
import mindspore
import mindspore.nn as nn
from mindspore.train.model import Model
from mindspore.train.callback import CheckpointConfig, ModelCheckpoint
from mindspore.train.serialization import load_checkpoint, load_param_into_net

# Define model
net = UNet(n_channels=cfg['num_channels'], n_classes=cfg['num_classes'])
# Continue training if 'resume' is set to True
if cfg['resume']:
    param_dict = load_checkpoint(cfg['resume_ckpt'])
    load_param_into_net(net, param_dict)

# Load dataset
train_dataset, _ = create_dataset(data_dir, epochs, batch_size, True, cross_valid_ind, run_distribute)
train_data_size = train_dataset.get_dataset_size()

optimizer = nn.Adam(params=net.trainable_params(), learning_rate=lr, weight_decay=cfg['weight_decay'],
                    loss_scale=cfg['loss_scale'])
criterion = CrossEntropyWithLogits()
loss_scale_manager = mindspore.train.loss_scale_manager.FixedLossScaleManager(cfg['FixedLossScaleManager'], False)

model = Model(net, loss_fn=criterion, loss_scale_manager=loss_scale_manager, optimizer=optimizer, amp_level="O3")

# Set callbacks
ckpt_config = CheckpointConfig(save_checkpoint_steps=train_data_size,
                               keep_checkpoint_max=cfg['keep_checkpoint_max'])
ckpoint_cb = ModelCheckpoint(prefix='ckpt_unet_medical_adam',
                             directory='./ckpt_{}/'.format(device_id),
                             config=ckpt_config)

print("============== Starting Training ==============")
model.train(1, train_dataset, callbacks=[StepLossTimeMonitor(batch_size=batch_size), ckpoint_cb],
            dataset_sink_mode=False)
print("============== End Training ==============")
```
# [Description of Random Situation](#contents)

In data_loader.py, we set the seed inside the `_get_val_train_indices` function. We also use a random seed in train.py.

# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).