# ResNeXt50 Example
## Description
This is an example of training ResNeXt50 on the ImageNet2012 dataset with MindSpore.
## Requirements
- Install [MindSpore](http://www.mindspore.cn/install/en).
- Download the ImageNet2012 dataset (the expected layout is sketched below).
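
The launch commands below pass `train`/`val` directories (e.g. `/ImageNet/train`), so the dataset is assumed to be in the standard ImageNet folder layout, one subdirectory per class; the class names shown here are illustrative:
```shell
ImageNet/
├─train
│ ├─n01440764   # one folder per class, containing that class's training images
│ └─...
└─val
  ├─n01440764   # same per-class layout for the validation images
  └─...
```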
## Structure
```shell
.
└─resnext50
  ├─README.md
  ├─scripts
  │ ├─run_standalone_train.sh    # launch standalone training (1p)
  │ ├─run_distribute_train.sh    # launch distributed training (8p)
  │ └─run_eval.sh                # launch evaluation
  ├─src
  │ ├─backbone
  │ │ ├─__init__.py              # initialize
  │ │ └─resnet.py                # resnext50 backbone
  │ ├─utils
  │ │ ├─__init__.py              # initialize
  │ │ ├─cunstom_op.py            # network operations
  │ │ ├─logging.py               # print log
  │ │ ├─optimizers__init__.py    # get parameters
  │ │ ├─sampler.py               # distributed sampler
  │ │ └─var_init.py              # calculate gain value
  │ ├─__init__.py                # initialize
  │ ├─config.py                  # parameter configuration
  │ ├─crossentropy.py            # CrossEntropy loss function
  │ ├─dataset.py                 # data preprocessing
  │ ├─head.py                    # common head
  │ ├─image_classification.py    # get resnet
  │ ├─linear_warmup.py           # linear warmup learning rate
  │ ├─warmup_cosine_annealing.py # learning rate for each step
  │ └─warmup_step_lr.py          # warmup step learning rate
  ├─eval.py                      # eval net
  └─train.py                     # train net
```
## Parameter Configuration
Parameters for both training and evaluation can be set in config.py.
```
"image_size": '224,224',            # image size (height,width)
"num_classes": 1000,                # number of dataset classes
"per_batch_size": 128,              # batch size of the input tensor
"lr": 0.05,                         # base learning rate
"lr_scheduler": 'cosine_annealing', # learning rate mode
"lr_epochs": '30,60,90,120',        # epochs at which lr changes
"lr_gamma": 0.1,                    # lr decay factor for the exponential lr_scheduler
"eta_min": 0,                       # eta_min in the cosine_annealing scheduler
"T_max": 150,                       # T_max in the cosine_annealing scheduler
"max_epoch": 150,                   # maximum number of epochs to train the model
"backbone": 'resnext50',            # backbone network
"warmup_epochs": 1,                 # number of warmup epochs
"weight_decay": 0.0001,             # weight decay
"momentum": 0.9,                    # momentum
"is_dynamic_loss_scale": 0,         # whether to use dynamic loss scaling
"loss_scale": 1024,                 # loss scale
"label_smooth": 1,                  # whether to apply label smoothing
"label_smooth_factor": 0.1,         # label smoothing factor
"ckpt_interval": 2000,              # checkpoint save interval
"ckpt_path": 'outputs/',            # checkpoint save location
"is_save_on_master": 1,             # save checkpoints only on the master device
"rank": 0,                          # local rank in distributed training
"group_size": 1                     # world size in distributed training
```
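
When `lr_scheduler` is set to `cosine_annealing`, the schedule implied by these settings is a linear warmup over `warmup_epochs` followed by cosine decay from `lr` down to `eta_min` over `T_max` epochs. The following is a minimal sketch of that standard formula using the values above; it is an illustration, not a verbatim copy of `src/warmup_cosine_annealing.py`:

```python
import math

def warmup_cosine_annealing_lr(epoch, base_lr=0.05, warmup_epochs=1,
                               t_max=150, eta_min=0.0):
    """Sketch of linear warmup + cosine annealing (values from config.py)."""
    if epoch < warmup_epochs:
        # Linear warmup: ramp up to base_lr across the warmup epochs.
        return base_lr * (epoch + 1) / warmup_epochs
    # Standard cosine annealing: decay from base_lr to eta_min over t_max epochs.
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))
```

Since `max_epoch` equals `T_max` (both 150), the rate decays smoothly toward `eta_min` by the end of training.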
## Running the Example
### Train
#### Usage
```bash
# distributed training example (8p)
sh run_distribute_train.sh MINDSPORE_HCCL_CONFIG_PATH DATA_PATH
# standalone training
sh run_standalone_train.sh DEVICE_ID DATA_PATH
```
#### Launch
```bash
# distributed training example (8p)
sh scripts/run_distribute_train.sh MINDSPORE_HCCL_CONFIG_PATH /ImageNet/train
# standalone training example
sh scripts/run_standalone_train.sh 0 /ImageNet_Original/train
```
#### Result
Checkpoint files are saved under the configured checkpoint path (`outputs/` by default), and the training results can be found in the log.
### Evaluation
#### Usage
```bash
# Evaluation
sh run_eval.sh DEVICE_ID DATA_PATH PRETRAINED_CKPT_PATH
```
#### Launch
```bash
# Evaluation with a checkpoint
sh scripts/run_eval.sh 0 /opt/npu/datasets/classification/val /resnext50_100.ckpt
```
> The checkpoint is produced during the training process.
#### Result
The evaluation result is stored in the scripts path, where you can find results like the following in the log:
```
acc=78.16%(TOP1)
acc=93.88%(TOP5)
```