# ResNext50 Example

## Description

This is an example of training ResNeXt50 on the ImageNet2012 dataset with MindSpore.

## Requirements

- Install [MindSpore](http://www.mindspore.cn/install/en).
- Download the ImageNet2012 dataset.

## Structure

```shell
.
└─resnext50
  ├─README.md
  ├─scripts
    ├─run_standalone_train.sh     # launch standalone training (1p)
    ├─run_distribute_train.sh     # launch distributed training (8p)
    └─run_eval.sh                 # launch evaluation
  ├─src
    ├─backbone
      ├─__init__.py               # initialize
      └─resnet.py                 # resnext50 backbone
    ├─utils
      ├─__init__.py               # initialize
      ├─cunstom_op.py             # network operations
      ├─logging.py                # print log
      ├─optimizers__init__.py     # get optimizer parameters
      ├─sampler.py                # distributed sampler
      └─var_init.py               # weight initialization (gain calculation)
    ├─__init__.py                 # initialize
    ├─config.py                   # parameter configuration
    ├─crossentropy.py             # CrossEntropy loss function
    ├─dataset.py                  # data preprocessing
    ├─head.py                     # common head
    ├─image_classification.py     # get resnet
    ├─linear_warmup.py            # linear warmup learning rate
    ├─warmup_cosine_annealing.py  # cosine annealing learning rate with warmup
    └─warmup_step_lr.py           # warmup step learning rate
  ├─eval.py                       # evaluate the network
  └─train.py                      # train the network
```

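The crossentropy.py module listed above provides the training loss. As a point of reference, here is a minimal NumPy sketch of cross-entropy with label smoothing, the behavior toggled by the label_smooth settings in the configuration below (function and variable names are illustrative, not the repository's exact implementation):

```python
import numpy as np

def smoothed_cross_entropy(logits, label, num_classes=1000, smooth_factor=0.1):
    """Cross-entropy with label smoothing (illustrative sketch).

    The smoothed target puts 1 - smooth_factor on the true class and spreads
    smooth_factor / (num_classes - 1) over the remaining classes.
    """
    # Numerically stable softmax over the logits.
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # Build the smoothed target distribution.
    target = np.full(num_classes, smooth_factor / (num_classes - 1))
    target[label] = 1.0 - smooth_factor
    # Cross-entropy between the smoothed target and the predictions.
    return -np.sum(target * np.log(probs + 1e-12))
```
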
## Parameter Configuration

Parameters for both training and evaluation can be set in config.py.

```
"image_size": '224,224',             # image size
"num_classes": 1000,                 # dataset class number
"per_batch_size": 128,               # batch size of input tensor
"lr": 0.05,                          # base learning rate
"lr_scheduler": 'cosine_annealing',  # learning rate mode
"lr_epochs": '30,60,90,120',         # epochs at which lr changes
"lr_gamma": 0.1,                     # decrease lr by a factor of the exponential lr_scheduler
"eta_min": 0,                        # eta_min in the cosine_annealing scheduler
"T_max": 150,                        # T_max in the cosine_annealing scheduler
"max_epoch": 150,                    # maximum number of epochs to train the model
"backbone": 'resnext50',             # backbone network
"warmup_epochs": 1,                  # warmup epochs
"weight_decay": 0.0001,              # weight decay
"momentum": 0.9,                     # momentum
"is_dynamic_loss_scale": 0,          # dynamic loss scale
"loss_scale": 1024,                  # loss scale
"label_smooth": 1,                   # whether to apply label smoothing
"label_smooth_factor": 0.1,          # label smoothing factor
"ckpt_interval": 2000,               # checkpoint save interval
"ckpt_path": 'outputs/',             # checkpoint save location
"is_save_on_master": 1,              # save checkpoint on master rank only
"rank": 0,                           # local rank in distributed training
"group_size": 1                      # world size in distributed training
```

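For intuition, the cosine_annealing mode selected above combines a linear warmup over warmup_epochs with cosine annealing from lr down to eta_min over T_max epochs. A minimal sketch of the standard formulation (illustrative names, not the exact code in warmup_cosine_annealing.py):

```python
import math

def warmup_cosine_annealing_lr(step, steps_per_epoch, base_lr=0.05,
                               warmup_epochs=1, T_max=150, eta_min=0.0):
    """Per-step learning rate: linear warmup, then cosine annealing."""
    warmup_steps = warmup_epochs * steps_per_epoch
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr during warmup.
        return base_lr * (step + 1) / warmup_steps
    # Cosine-anneal from base_lr toward eta_min over T_max epochs.
    epoch = min(step / steps_per_epoch, T_max)
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / T_max)) / 2
```
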
## Running the Example

### Train

#### Usage

```
# distributed training example (8p)
sh run_distribute_train.sh MINDSPORE_HCCL_CONFIG_PATH DATA_PATH
# standalone training example
sh run_standalone_train.sh DEVICE_ID DATA_PATH
```

#### Launch

```bash
# distributed training example (8p)
sh scripts/run_distribute_train.sh MINDSPORE_HCCL_CONFIG_PATH /ImageNet/train
# standalone training example
sh scripts/run_standalone_train.sh 0 /ImageNet_Original/train
```

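For context, an 8p run launches one process per device and initializes HCCL communication from MINDSPORE_HCCL_CONFIG_PATH. The following is a minimal sketch of the distributed setup a MindSpore training script typically performs (illustrative and version-dependent; train.py may differ in details):

```python
from mindspore import context
from mindspore.context import ParallelMode
from mindspore.communication.management import init, get_rank, get_group_size

# Graph mode on Ascend, as used by the launch scripts above.
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
init()  # set up HCCL communication; reads MINDSPORE_HCCL_CONFIG_PATH
rank, group_size = get_rank(), get_group_size()
context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL,
                                  device_num=group_size,
                                  gradients_mean=True)
```
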
#### Result

Checkpoint files are saved under outputs/ (see ckpt_path above), and training results can be found in the log.

### Evaluation

#### Usage

```
# Evaluation
sh run_eval.sh DEVICE_ID DATA_PATH PRETRAINED_CKPT_PATH
```

#### Launch

```bash
# Evaluation with checkpoint
sh scripts/run_eval.sh 0 /opt/npu/datasets/classification/val /resnext50_100.ckpt
```

> The checkpoint is produced during the training process.

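For reference, a trained checkpoint can be restored with MindSpore's standard serialization API before evaluation (a minimal sketch; net is assumed to be the model built by image_classification.py):

```python
from mindspore.train.serialization import load_checkpoint, load_param_into_net

def restore_checkpoint(net, ckpt_path):
    """Load parameters from a .ckpt file into an instantiated network."""
    param_dict = load_checkpoint(ckpt_path)  # read the saved parameter dict
    load_param_into_net(net, param_dict)     # copy parameters into the net
    return net
```
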
#### Result

The evaluation result will be stored in the scripts path, where you can find results like the following in the log.

```
acc=78.16%(TOP1)
acc=93.88%(TOP5)
```
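
The TOP1/TOP5 numbers above are standard top-k accuracies. A minimal NumPy sketch of how they can be computed from batched logits (illustrative, not the repository's evaluation code):

```python
import numpy as np

def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k largest logits."""
    topk = np.argsort(logits, axis=1)[:, -k:]     # indices of the k largest per row
    hits = (topk == labels[:, None]).any(axis=1)  # is the true label among them?
    return hits.mean()
```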