| @@ -1,23 +1,78 @@ | |||
| # ResNext50 Example | |||
| # Contents | |||
| ## Description | |||
| - [ResNeXt50 Description](#resnext50-description) | |||
| - [Model Architecture](#model-architecture) | |||
| - [Dataset](#dataset) | |||
| - [Features](#features) | |||
| - [Mixed Precision](#mixed-precision) | |||
| - [Environment Requirements](#environment-requirements) | |||
| - [Quick Start](#quick-start) | |||
| - [Script Description](#script-description) | |||
| - [Script and Sample Code](#script-and-sample-code) | |||
| - [Script Parameters](#script-parameters) | |||
| - [Training Process](#training-process) | |||
| - [Evaluation Process](#evaluation-process) | |||
| - [Model Description](#model-description) | |||
| - [Performance](#performance) | |||
| - [Training Performance](#training-performance) | |||
| - [Inference Performance](#inference-performance) | |||
| - [Description of Random Situation](#description-of-random-situation) | |||
| - [ModelZoo Homepage](#modelzoo-homepage) | |||
| This is an example of training ResNext50 in MindSpore. | |||
| # [ResNeXt50 Description](#contents) | |||
| ## Requirements | |||
| ResNeXt is a simple, highly modularized network architecture for image classification. Its design results in a homogeneous, multi-branch architecture that has only a few hyper-parameters to set. This strategy exposes a new dimension, which the authors call “cardinality” (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width. | |||
| - Install [Mindspore](http://www.mindspore.cn/install/en). | |||
| - Download the dataset. | |||
| [Paper](https://arxiv.org/abs/1611.05431): Xie S, Girshick R, Dollár, Piotr, et al. Aggregated Residual Transformations for Deep Neural Networks. 2016. | |||
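To make "cardinality" concrete, the sketch below counts the weights of one bottleneck block with and without grouped convolution. The channel sizes (256 input channels, width 64 for the plain block, width 128 with 32 groups for the grouped block) follow the 32x4d template in the paper cited above; the helper names `conv_params` and `bottleneck_params` are illustrative and are not part of this repository.

```python
def conv_params(in_ch, out_ch, kernel, groups=1):
    """Number of weights in a 2-D convolution without bias."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    # Each output channel only sees in_ch / groups input channels.
    return (in_ch // groups) * out_ch * kernel * kernel


def bottleneck_params(in_ch, width, cardinality):
    """Weights of a 1x1 -> 3x3 (grouped) -> 1x1 bottleneck block."""
    reduce = conv_params(in_ch, width, 1)
    transform = conv_params(width, width, 3, groups=cardinality)
    expand = conv_params(width, in_ch, 1)
    return reduce + transform + expand


# Plain bottleneck: one path, width 64.
resnet_block = bottleneck_params(256, 64, cardinality=1)
# Grouped bottleneck: 32 paths of width 4 each (total width 128).
resnext_block = bottleneck_params(256, 128, cardinality=32)

print(resnet_block, resnext_block)
```

Both blocks come out at roughly 70k weights, which is the point of the design: cardinality is raised while the parameter budget stays about the same.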
| ## Structure | |||
| # [Model architecture](#contents) | |||
| ```shell | |||
| The overall network architecture of ResNeXt is shown below: | |||
| [Link](https://arxiv.org/abs/1611.05431) | |||
| # [Dataset](#contents) | |||
| Dataset used: [imagenet](http://www.image-net.org/) | |||
| - Dataset size: ~125G, about 1.2 million color images in 1000 classes | |||
| - Train: 120G, about 1.2 million images | |||
| - Test: 5G, 50000 images | |||
| - Data format: RGB images. | |||
| - Note: Data will be processed in src/dataset.py | |||
| # [Features](#contents) | |||
| ## [Mixed Precision](#contents) | |||
| The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. | |||
| For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. You can check which operators run with reduced precision by enabling the INFO log and searching for ‘reduce precision’. | |||
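As a rough illustration of why single precision is kept alongside half precision, the sketch below accumulates many small values once in FP16 and once in FP32. It uses only the Python standard library to emulate the two formats and does not call MindSpore; the helper names are illustrative.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, count, cast):
    """Add `step` to a running total `count` times, rounding after each add."""
    total = cast(0.0)
    step = cast(step)
    for _ in range(count):
        total = cast(total + step)
    return total


# The exact result of adding 1e-4 ten thousand times is 1.0.
fp16_sum = accumulate(1e-4, 10_000, to_fp16)
fp32_sum = accumulate(1e-4, 10_000, to_fp32)

print("fp16 accumulator:", fp16_sum)
print("fp32 accumulator:", fp32_sum)
```

The FP16 accumulator stops growing once the step becomes smaller than half the spacing between neighbouring FP16 values, while the FP32 accumulator stays close to 1.0. This is the kind of precision loss that keeping part of the computation in single precision avoids.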
| # [Environment Requirements](#contents) | |||
| - Hardware(Ascend/GPU) | |||
| - Prepare a hardware environment with an Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. | |||
| - Framework | |||
| - [MindSpore](http://10.90.67.50/mindspore/archive/20200506/OpenSource/me_vm_x86/) | |||
| - For more information, please check the resources below: | |||
| - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html) | |||
| - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html) | |||
| # [Script description](#contents) | |||
| ## [Script and sample code](#contents) | |||
| ```python | |||
| . | |||
| └─resnext50 | |||
| ├─README.md | |||
| ├─scripts | |||
| ├─run_standalone_train.sh # launch standalone training(1p) | |||
| ├─run_distribute_train.sh # launch distributed training(8p) | |||
| ├─run_standalone_train.sh # launch standalone training for ascend(1p) | |||
| ├─run_distribute_train.sh # launch distributed training for ascend(8p) | |||
| ├─run_standalone_train_for_gpu.sh # launch standalone training for gpu(1p) | |||
| ├─run_distribute_train_for_gpu.sh # launch distributed training for gpu(8p) | |||
| └─run_eval.sh # launch evaluating | |||
| ├─src | |||
| ├─backbone | |||
| @@ -44,9 +99,9 @@ This is an example of training ResNext50 in MindSpore. | |||
| ``` | |||
| ## Parameter Configuration | |||
| ## [Script Parameters](#contents) | |||
| Parameters for both training and evaluating can be set in config.py | |||
| Parameters for both training and evaluating can be set in config.py. | |||
| ``` | |||
| "image_height": '224,224' # image size | |||
| @@ -74,17 +129,29 @@ Parameters for both training and evaluating can be set in config.py | |||
| "group_size": 1 # world size of distributed | |||
| ``` | |||
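The image size above is written as a comma-separated string. The following is a minimal sketch of how such a value could be turned into a pair of integers; the helper `parse_image_size` is hypothetical, and how config.py or the training script actually consumes the value is an assumption.

```python
def parse_image_size(value):
    """Parse a 'height,width' string such as '224,224' into two ints."""
    parts = [int(p.strip()) for p in value.split(",")]
    if len(parts) != 2 or min(parts) <= 0:
        raise ValueError(f"expected 'height,width', got {value!r}")
    return parts[0], parts[1]


# Only the two entries visible in the listing above; the rest is omitted.
config = {"image_height": "224,224", "group_size": 1}

height, width = parse_image_size(config["image_height"])
print(height, width)
```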
| ## Running the example | |||
| ### Train | |||
| ## [Training Process](#contents) | |||
| #### Usage | |||
| You can start training with the Python script: | |||
| ``` | |||
| # distribute training example(8p) | |||
| sh run_distribute_train.sh RANK_TABLE_FILE DATA_PATH | |||
| # standalone training | |||
| sh run_standalone_train.sh DEVICE_ID DATA_PATH | |||
| python train.py --data_dir ~/imagenet/train/ --platform Ascend --is_distributed 0 | |||
| ``` | |||
| or with the shell scripts: | |||
| ``` | |||
| Ascend: | |||
| # distribute training example(8p) | |||
| sh run_distribute_train.sh RANK_TABLE_FILE DATA_PATH | |||
| # standalone training | |||
| sh run_standalone_train.sh DEVICE_ID DATA_PATH | |||
| GPU: | |||
| # distribute training example(8p) | |||
| sh run_distribute_train_for_gpu.sh DATA_PATH | |||
| # standalone training | |||
| sh run_standalone_train_for_gpu.sh DEVICE_ID DATA_PATH | |||
| ``` | |||
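The `python train.py` command shown earlier passes three flags. As a hedged sketch, they could be parsed as below. Only the flag names and the Ascend/GPU choice come from this README; the types, defaults, and help strings are assumptions, and the real train.py may define more options.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(description="sketch of train.py flags")
    parser.add_argument("--data_dir", type=str, required=True,
                        help="path to the training images")
    parser.add_argument("--platform", type=str, default="Ascend",
                        choices=("Ascend", "GPU"),
                        help="target device type")
    parser.add_argument("--is_distributed", type=int, default=0,
                        help="1 for distributed training, 0 for standalone")
    return parser


# Same arguments as the standalone example command.
args = build_parser().parse_args([
    "--data_dir", "~/imagenet/train/",
    "--platform", "Ascend",
    "--is_distributed", "0",
])
print(args.data_dir, args.platform, args.is_distributed)
```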
| #### Launch | |||
| @@ -101,13 +168,19 @@ sh scripts/run_distribute_train_for_gpu.sh /dataset/train | |||
| sh scripts/run_standalone_train_for_gpu.sh 0 /dataset/train | |||
| ``` | |||
| #### Result | |||
| You can find the checkpoint file together with the results in the log. | |||
| ### Evaluation | |||
| ## [Evaluation Process](#contents) | |||
| #### Usage | |||
| ### Usage | |||
| You can start evaluation with the Python script: | |||
| ``` | |||
| python eval.py --data_dir ~/imagenet/val/ --platform Ascend --pretrained resnext.ckpt | |||
| ``` | |||
| or with the shell script: | |||
| ``` | |||
| # Evaluation | |||
| @@ -122,8 +195,6 @@ PLATFORM is Ascend or GPU, default is Ascend. | |||
| sh scripts/run_eval.sh 0 /opt/npu/datasets/classification/val /resnext50_100.ckpt Ascend | |||
| ``` | |||
| > The checkpoint can be produced during the training process. | |||
| #### Result | |||
| The evaluation result is stored in the scripts path. There you can find results like the following in the log. | |||
| @@ -131,4 +202,45 @@ Evaluation result will be stored in the scripts path. Under this, you can find r | |||
| ``` | |||
| acc=78.16%(TOP1) | |||
| acc=93.88%(TOP5) | |||
| ``` | |||
| ``` | |||
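The TOP1 and TOP5 figures above are top-k accuracies: a sample counts as correct when its true label is among the k highest-scoring classes. The sketch below shows the computation on a toy three-class example (so top-2 stands in for top-5). It is an illustration in plain Python, not the metric code used by eval.py.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose label is among the k best-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


scores = [
    [0.1, 0.7, 0.2],  # best class: 1
    [0.5, 0.3, 0.2],  # best class: 0, second: 1
    [0.2, 0.3, 0.5],  # best class: 2
    [0.6, 0.3, 0.1],  # best class: 0, second: 1
]
labels = [1, 1, 2, 2]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"acc={top1:.2%}(TOP1)")
print(f"acc={top2:.2%}(TOP2)")
```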
| # [Model description](#contents) | |||
| ## [Performance](#contents) | |||
| ### Training Performance | |||
| | Parameters | ResNeXt50 | | | |||
| | -------------------------- | ---------------------------------------------------------- | ------------------------- | | |||
| | Resource | Ascend 910, cpu:2.60GHz 56cores, memory:314G | NV SXM2 V100-32G | | |||
| | Uploaded Date | 06/30/2020 | 07/23/2020 | | |||
| | MindSpore Version | 0.5.0 | 0.6.0 | | |||
| | Dataset | ImageNet | ImageNet | | |||
| | Training Parameters | src/config.py | src/config.py | | |||
| | Optimizer | Momentum | Momentum | | |||
| | Loss Function | SoftmaxCrossEntropy | SoftmaxCrossEntropy | | |||
| | Loss | 1.76592 | 1.8965 | | |||
| | Accuracy | 78%(TOP1) | 77.8%(TOP1) | | |||
| | Total time | 7.8 h 8ps | 21.5 h 8ps | | |||
| | Checkpoint for Fine tuning | 192 M(.ckpt file) | 192 M(.ckpt file) | | |||
| ### Inference Performance | |||
| | Parameters | | | | | |||
| | -------------------------- | ----------------------------- | ------------------------- | -------------------- | | |||
| | Resource | Huawei 910 | NV SXM2 V100-32G | Huawei 310 | | |||
| | Uploaded Date | 06/30/2020 | 07/23/2020 | 07/23/2020 | | |||
| | MindSpore Version | 0.5.0 | 0.6.0 | 0.6.0 | | |||
| | Dataset | ImageNet, 1.2W | ImageNet, 1.2W | ImageNet, 1.2W | | |||
| | batch_size | 1 | 1 | 1 | | |||
| | outputs | probability | probability | probability | | |||
| | Accuracy | acc=78.16%(TOP1) | acc=78.05%(TOP1) | | | |||
| # [Description of Random Situation](#contents) | |||
| In dataset.py, we set the seed inside the `create_dataset` function. We also use a random seed in train.py. | |||
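The effect of fixing a seed can be sketched with the standard library alone: the same seed reproduces the same shuffle order, and a different seed generally gives a different one. The function `shuffled_indices` is hypothetical and only stands in for the seeded shuffling mentioned above; it does not reproduce the actual calls in dataset.py or train.py.

```python
import random


def shuffled_indices(num_samples, seed):
    """Return sample indices in a shuffled order that depends only on `seed`."""
    rng = random.Random(seed)  # private generator, leaves global state alone
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return indices


first_run = shuffled_indices(10, seed=1)
second_run = shuffled_indices(10, seed=1)

# Same seed, same order: two runs see the data in the same sequence.
print(first_run == second_run)
```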
| # [ModelZoo Homepage](#contents) | |||
| Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo). | |||