# Inception-v3 Example

## Description

This is an example of training Inception-v3 in MindSpore.

## Requirements

- Install [MindSpore](http://www.mindspore.cn/install/en).
- Download the dataset.

## Structure

```shell
.
└─Inception-v3
  ├─README.md
  ├─scripts
  │ ├─run_standalone_train_for_gpu.sh  # launch standalone training on GPU (1p)
  │ ├─run_distribute_train_for_gpu.sh  # launch distributed training on GPU (8p)
  │ └─run_eval_for_gpu.sh              # launch evaluation on GPU
  ├─src
  │ ├─config.py                        # parameter configuration
  │ ├─dataset.py                       # data preprocessing
  │ ├─inception_v3.py                  # network definition
  │ ├─loss.py                          # customized cross-entropy loss function (sketched below)
  │ └─lr_generator.py                  # learning rate generator
  ├─eval.py                            # evaluate the trained network
  ├─export.py                          # convert a checkpoint for export
  └─train.py                           # train the network
```
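
`loss.py` provides the customized cross-entropy loss. Based on the `smooth_factor` and `aux_factor` settings documented below, it combines label smoothing with a weighted term for the auxiliary classifier. The snippet below is only a minimal sketch of that idea using standard MindSpore primitives; the class name `CrossEntropyWithAux` and its constructor arguments are illustrative assumptions, not the actual interface of `loss.py`.

```python
# Minimal sketch of a label-smoothing cross-entropy with an auxiliary-logit term
# (illustrative only; the real loss.py may be structured differently).
import mindspore.nn as nn
import mindspore.ops as ops
import mindspore.common.dtype as mstype
from mindspore import Tensor


class CrossEntropyWithAux(nn.Cell):
    """Label-smoothed cross-entropy plus a down-weighted auxiliary-classifier loss."""

    def __init__(self, smooth_factor=0.1, num_classes=1000, aux_factor=0.2):
        super().__init__()
        self.aux_factor = aux_factor
        self.onehot = ops.OneHot()
        self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
        self.off_value = Tensor(smooth_factor / (num_classes - 1), mstype.float32)
        # expects dense (one-hot) labels, averages over the batch
        self.ce = nn.SoftmaxCrossEntropyWithLogits(reduction='mean')

    def construct(self, logits, aux_logits, label):
        depth = ops.shape(logits)[1]
        smoothed = self.onehot(label, depth, self.on_value, self.off_value)
        main_loss = self.ce(logits, smoothed)
        aux_loss = self.ce(aux_logits, smoothed)
        return main_loss + self.aux_factor * aux_loss
```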

## Parameter Configuration

Parameters for both training and evaluation can be set in `config.py`.

```
'random_seed': 1,             # fix the random seed
'rank': 0,                    # local rank in distributed training
'group_size': 1,              # world size in distributed training
'work_nums': 8,               # number of workers to read the data
'decay_method': 'cosine',     # learning rate scheduler mode (see the sketch below)
'loss_scale': 1,              # loss scale
'batch_size': 128,            # input batch size
'epoch_size': 250,            # total number of epochs
'num_classes': 1000,          # number of dataset classes
'smooth_factor': 0.1,         # label smoothing factor
'aux_factor': 0.2,            # loss weight of the auxiliary logits
'lr_init': 0.00004,           # initial learning rate
'lr_max': 0.4,                # upper bound of the learning rate
'lr_end': 0.000004,           # lower bound of the learning rate
'warmup_epochs': 1,           # number of warmup epochs
'weight_decay': 0.00004,      # weight decay
'momentum': 0.9,              # momentum
'opt_eps': 1.0,               # optimizer epsilon
'keep_checkpoint_max': 100,   # maximum number of checkpoints to keep
'ckpt_path': './checkpoint/', # path for saving checkpoints
'is_save_on_master': 1        # in distributed training, save checkpoints on rank 0 only
```
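
The `decay_method`, `lr_init`, `lr_max`, `lr_end`, and `warmup_epochs` settings describe a linear warmup followed by cosine decay. The following is a minimal sketch of such a schedule, roughly what `lr_generator.py` produces; the function name `get_lr` and its argument list are assumptions for illustration, not the repository's exact signature.

```python
# Minimal sketch of a warmup + cosine-decay learning rate schedule
# (the function name and signature are assumptions).
import math

import numpy as np


def get_lr(lr_init, lr_max, lr_end, warmup_epochs, total_epochs, steps_per_epoch):
    """Return one learning-rate value per training step."""
    total_steps = total_epochs * steps_per_epoch
    warmup_steps = warmup_epochs * steps_per_epoch
    lr_each_step = []
    for step in range(total_steps):
        if step < warmup_steps:
            # linear warmup from lr_init up to lr_max
            lr = lr_init + (lr_max - lr_init) * step / warmup_steps
        else:
            # cosine decay from lr_max down to lr_end
            progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
            lr = lr_end + (lr_max - lr_end) * 0.5 * (1.0 + math.cos(math.pi * progress))
        lr_each_step.append(lr)
    return np.array(lr_each_step, dtype=np.float32)


# Example with the values from config.py above and 1251 steps per epoch:
lr_schedule = get_lr(0.00004, 0.4, 0.000004, warmup_epochs=1, total_epochs=250, steps_per_epoch=1251)
```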

## Running the Example

### Train

#### Usage

```
# distributed training example (8p)
sh run_distribute_train_for_gpu.sh DATA_DIR

# standalone training
sh run_standalone_train_for_gpu.sh DEVICE_ID DATA_DIR
```

#### Launch

```bash
# distributed training example (8p) for GPU
sh scripts/run_distribute_train_for_gpu.sh /dataset/train

# standalone training example for GPU
sh scripts/run_standalone_train_for_gpu.sh 0 /dataset/train
```
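
Both scripts ultimately launch `train.py`, which wires together the dataset, network, loss, learning rate schedule, and optimizer defined in `src`. The outline below shows one plausible way these pieces fit together with the standard MindSpore `Model` API; the `create_dataset`, `InceptionV3`, `get_lr`, and `CrossEntropy` names and their arguments are assumptions inferred from the file layout, not the script's exact interfaces.

```python
# Hypothetical outline of the training pipeline behind train.py
# (helper names and signatures are assumptions, not the repository's actual code).
import mindspore.nn as nn
from mindspore import Model, Tensor
from mindspore.train.callback import CheckpointConfig, LossMonitor, ModelCheckpoint

from src.config import config              # parameter dictionary shown above (assumed name)
from src.dataset import create_dataset     # assumed dataset helper
from src.inception_v3 import InceptionV3   # assumed network class
from src.lr_generator import get_lr        # assumed LR helper
from src.loss import CrossEntropy          # assumed customized loss class

dataset = create_dataset('/dataset/train', batch_size=config['batch_size'])
steps_per_epoch = dataset.get_dataset_size()

net = InceptionV3(num_classes=config['num_classes'])
loss = CrossEntropy(smooth_factor=config['smooth_factor'], num_classes=config['num_classes'])
lr = Tensor(get_lr(config['lr_init'], config['lr_max'], config['lr_end'],
                   config['warmup_epochs'], config['epoch_size'], steps_per_epoch))
opt = nn.Momentum(net.trainable_params(), lr, config['momentum'],
                  weight_decay=config['weight_decay'], loss_scale=config['loss_scale'])

ckpt_cfg = CheckpointConfig(save_checkpoint_steps=steps_per_epoch,
                            keep_checkpoint_max=config['keep_checkpoint_max'])
ckpt_cb = ModelCheckpoint(prefix='inceptionv3', directory=config['ckpt_path'], config=ckpt_cfg)

model = Model(net, loss_fn=loss, optimizer=opt)
model.train(config['epoch_size'], dataset, callbacks=[LossMonitor(), ckpt_cb])
```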

#### Result

You can find the checkpoint files together with the training results in the log.

### Evaluation

#### Usage

```
# Evaluation
sh run_eval_for_gpu.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
```

#### Launch

```bash
# Evaluation with a checkpoint
sh scripts/run_eval_for_gpu.sh 0 /dataset/val ./checkpoint/inceptionv3-rank3-247_1251.ckpt
```

> The checkpoint is produced during the training process.
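
`eval.py` restores the trained parameters from the checkpoint and measures top-1/top-5 accuracy on the validation set. The sketch below shows the general shape of such an evaluation with MindSpore, reusing the same hypothetical `create_dataset` and `InceptionV3` names as above.

```python
# Hypothetical outline of eval.py (helper names are assumptions, as above).
import mindspore.nn as nn
from mindspore import Model, load_checkpoint, load_param_into_net

from src.config import config
from src.dataset import create_dataset
from src.inception_v3 import InceptionV3

net = InceptionV3(num_classes=config['num_classes'])
load_param_into_net(net, load_checkpoint('./checkpoint/inceptionv3-rank3-247_1251.ckpt'))

loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
model = Model(net, loss_fn=loss, metrics={'top_1_accuracy', 'top_5_accuracy'})

dataset = create_dataset('/dataset/val', batch_size=config['batch_size'], is_training=False)
print(model.eval(dataset))  # e.g. {'top_1_accuracy': 0.78..., 'top_5_accuracy': 0.94...}
```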

#### Result

The evaluation result is stored in the scripts path, where you can find output like the following in the log:

```
acc=78.75%(TOP1)
acc=94.07%(TOP5)
```