NASNet Example
- Description
- Requirements
- Structure
- Parameter Configuration
- Running the example
  - Train
    - Usage
    - Launch
    - Result
  - Evaluation
    - Usage
    - Launch
    - Result

NASNet Example

Description

This is an example of training NASNet-A-Mobile in MindSpore.

Requirements

Install MindSpore.
Download the dataset.

Structure

.
└─nasnet
  ├─README.md
  ├─scripts
  │ ├─run_standalone_train_for_gpu.sh       # launch standalone training with gpu platform(1p)
  │ ├─run_distribute_train_for_gpu.sh       # launch distributed training with gpu platform(8p)
  │ └─run_eval_for_gpu.sh                   # launch evaluating with gpu platform
  ├─src
  │ ├─config.py                             # parameter configuration
  │ ├─dataset.py                            # data preprocessing
  │ ├─loss.py                               # Customized CrossEntropy loss function
  │ ├─lr_generator.py                       # learning rate generator
  │ └─nasnet_a_mobile.py                    # network definition
  ├─eval.py                                 # eval net
  ├─export.py                               # convert checkpoint
  └─train.py                                # train net

Parameter Configuration

Parameters for both training and evaluation can be set in src/config.py.

'random_seed': 1,                # fixed random seed
'rank': 0,                       # local rank in distributed training
'group_size': 1,                 # world size in distributed training
'work_nums': 8,                  # number of workers that read the data
'epoch_size': 250,               # total number of epochs
'keep_checkpoint_max': 100,      # maximum number of checkpoints to keep
'ckpt_path': './checkpoint/',    # path where checkpoints are saved
'is_save_on_master': 1,          # save checkpoints on rank 0 only (distributed training)
'batch_size': 32,                # input batch size
'num_classes': 1000,             # number of classes in the dataset
'label_smooth_factor': 0.1,      # label smoothing factor
'aux_factor': 0.4,               # loss factor of the auxiliary logits
'lr_init': 0.04,                 # initial learning rate
'lr_decay_rate': 0.97,           # decay rate of the learning rate
'num_epoch_per_decay': 2.4,      # number of epochs between learning rate decays
'weight_decay': 0.00004,         # weight decay
'momentum': 0.9,                 # momentum
'opt_eps': 1.0,                  # optimizer epsilon
'rmsprop_decay': 0.9,            # RMSProp decay
'loss_scale': 1,                 # loss scale
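
To illustrate how `lr_init`, `lr_decay_rate` and `num_epoch_per_decay` might interact, here is a minimal sketch of a staircase exponential decay schedule. It is an assumption about the general technique, not the actual contents of `src/lr_generator.py`; the function name `exponential_decay_lr`, the `staircase` flag and the toy `steps_per_epoch` value are hypothetical.

```python
import math


def exponential_decay_lr(lr_init, decay_rate, num_epoch_per_decay,
                         steps_per_epoch, total_epochs, staircase=True):
    """Return one learning rate per training step (hypothetical helper).

    The rate is lr_init * decay_rate ** (step / decay_steps), where
    decay_steps is the number of steps in num_epoch_per_decay epochs.
    With staircase=True the exponent is floored, so the rate drops in steps.
    """
    decay_steps = steps_per_epoch * num_epoch_per_decay
    lrs = []
    for step in range(steps_per_epoch * total_epochs):
        exponent = step / decay_steps
        if staircase:
            exponent = math.floor(exponent)
        lrs.append(lr_init * decay_rate ** exponent)
    return lrs


# Toy run: config values from above, but only 10 steps per epoch and 5 epochs
# (both chosen for illustration, not taken from the example).
lr_schedule = exponential_decay_lr(0.04, 0.97, 2.4,
                                   steps_per_epoch=10, total_epochs=5)
```

In a real run, `steps_per_epoch` would be the dataset size divided by the global batch size, and `total_epochs` would be `epoch_size`.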

Running the example

Train

Usage

# distributed training (8p)
sh run_distribute_train_for_gpu.sh DATA_DIR
# standalone training
sh run_standalone_train_for_gpu.sh DEVICE_ID DATA_DIR

Launch

# distributed training example(8p) for GPU
sh scripts/run_distribute_train_for_gpu.sh /dataset/train
# standalone training example for GPU
sh scripts/run_standalone_train_for_gpu.sh 0 /dataset/train

Result

After training, you can find the checkpoint files, and the training results are recorded in the log.

Evaluation

Usage

# Evaluation
sh run_eval_for_gpu.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT

Launch

# Evaluation with checkpoint
sh scripts/run_eval_for_gpu.sh 0 /dataset/val ./checkpoint/nasnet-a-mobile-rank0-248_10009.ckpt

Checkpoints are produced during training.
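
When many checkpoints are kept, it can help to pick the most recent one programmatically before passing it as PATH_CHECKPOINT. The sketch below assumes checkpoint names follow a `<prefix>-<epoch>_<step>.ckpt` pattern, as the file name in the launch command suggests; the helpers `parse_ckpt_name` and `latest_checkpoint` are hypothetical and not part of this example's scripts.

```python
import re

# Assumed naming pattern: <prefix>-<epoch>_<step>.ckpt
CKPT_PATTERN = re.compile(r"^(?P<prefix>.+)-(?P<epoch>\d+)_(?P<step>\d+)\.ckpt$")


def parse_ckpt_name(filename):
    """Split a checkpoint file name into (prefix, epoch, step), or return None."""
    match = CKPT_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("prefix"), int(match.group("epoch")), int(match.group("step"))


def latest_checkpoint(filenames):
    """Return the name with the highest (epoch, step), ignoring other files."""
    candidates = []
    for name in filenames:
        parsed = parse_ckpt_name(name)
        if parsed is not None:
            _, epoch, step = parsed
            candidates.append((epoch, step, name))
    return max(candidates)[2] if candidates else None


names = [
    "nasnet-a-mobile-rank0-9_10009.ckpt",
    "nasnet-a-mobile-rank0-248_10009.ckpt",
    "notes.txt",
]
newest = latest_checkpoint(names)
```

Comparing the parsed integers rather than the raw strings avoids ordering epoch 9 after epoch 248.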

Result

Evaluation results are stored in the scripts path, where you can find the result in the log.
