From: @huangbo77 Reviewed-by: Signed-off-by:tags/v1.2.0-rc1
| @@ -15,6 +15,7 @@ In order to facilitate developers to enjoy the benefits of MindSpore framework, | |||
| - [Official](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official) | |||
| - [Computer Vision](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv) | |||
| - [Image Classification](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv) | |||
| - [DenseNet](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/densenet/README.md) | |||
| - [GoogleNet](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/googlenet/README.md) | |||
| - [ResNet50[benchmark]](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet/README.md) | |||
| - [ResNet50_Quant](https://gitee.com/mindspore/mindspore/blob/master/model_zoo/official/cv/resnet50_quant/README.md) | |||
| @@ -15,6 +15,7 @@ | |||
| - [官方](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official) | |||
| - [计算机视觉](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv) | |||
| - [图像分类](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv) | |||
| - [DenseNet](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/densenet/README.md) | |||
| - [GoogleNet](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/googlenet/README.md) | |||
| - [ResNet-50[基准]](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/resnet/README.md) | |||
| - [ResNet50_Quant](https://gitee.com/mindspore/mindspore/blob/master/model_zoo/official/cv/resnet50_quant/README.md) | |||
| @@ -1,6 +1,6 @@ | |||
| # Contents | |||
| - [DenseNet121 Description](#densenet121-description) | |||
| - [DenseNet Description](#densenet-description) | |||
| - [Model Architecture](#model-architecture) | |||
| - [Dataset](#dataset) | |||
| - [Features](#features) | |||
| @@ -16,25 +16,28 @@ | |||
| - [Evaluation Process](#evaluation-process) | |||
| - [Evaluation](#evaluation) | |||
| - [Model Description](#model-description) | |||
| - [Performance](#performance) | |||
| - [Performance](#performance) | |||
| - [Training accuracy results](#training-accuracy-results) | |||
| - [Training performance results](#training-performance-results) | |||
| - [Description of Random Situation](#description-of-random-situation) | |||
| - [ModelZoo Homepage](#modelzoo-homepage) | |||
| # [DenseNet121 Description](#contents) | |||
| # [DenseNet Description](#contents) | |||
| DenseNet121 is a convolution based neural network for the task of image classification. The paper describing the model can be found [here](https://arxiv.org/abs/1608.06993). HuaWei’s DenseNet121 is a implementation on [MindSpore](https://www.mindspore.cn/). | |||
| DenseNet is a convolution-based neural network for image classification. The paper describing the model can be found [here](https://arxiv.org/abs/1608.06993). Huawei's DenseNet is an implementation on [MindSpore](https://www.mindspore.cn/). | |||
| The repository also contains scripts to launch training and inference routines. | |||
| # [Model Architecture](#contents) | |||
| DenseNet121 builds on 4 densely connected block. In every dense block, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. Concatenation is used. Each layer is receiving a “collective knowledge” from all preceding layers. | |||
| DenseNet supports two variants, DenseNet100 and DenseNet121, where the number indicates the number of layers in the network. | |||
| DenseNet121 is built from 4 densely connected blocks and DenseNet100 from 3. In every dense block, each layer takes the feature maps of all preceding layers as additional input and passes its own feature maps on to all subsequent layers. The feature maps are combined by concatenation, so each layer receives the “collective knowledge” of all preceding layers. | |||
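The channel bookkeeping implied by concatenation can be sketched in plain Python. The numbers used here (growth rate 32, block sizes 6/12/24/16, 64 initial channels, transition layers that halve the channel count) follow the DenseNet-121 configuration from the paper. Whether the network code in this repository uses exactly these values is an assumption, and both helper names are hypothetical.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Return the input channel count seen by each layer and the block's output channels."""
    # Layer i receives the block input plus the outputs of the i layers before it.
    per_layer_inputs = [in_channels + i * growth_rate for i in range(num_layers)]
    out_channels = in_channels + num_layers * growth_rate
    return per_layer_inputs, out_channels


def densenet_feature_channels(block_sizes, growth_rate=32, init_channels=64, compression=0.5):
    """Track the channel count through all dense blocks and the transitions between them."""
    channels = init_channels
    for index, num_layers in enumerate(block_sizes):
        _, channels = dense_block_channels(channels, growth_rate, num_layers)
        # A transition layer follows every block except the last one.
        if index != len(block_sizes) - 1:
            channels = int(channels * compression)
    return channels


if __name__ == "__main__":
    inputs, out = dense_block_channels(64, 32, 6)
    print(inputs)  # [64, 96, 128, 160, 192, 224]
    print(out)     # 256
    print(densenet_feature_channels((6, 12, 24, 16)))  # 1024
```

Each layer only adds `growth_rate` new channels, so the input width grows linearly within a block while every earlier feature map stays directly accessible.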
| # [Dataset](#contents) | |||
| Dataset used: ImageNet | |||
| Dataset used in DenseNet121: ImageNet | |||
| The default configuration of the dataset is as follows: | |||
| - Training Dataset preprocess: | |||
| @@ -49,11 +52,27 @@ The default configuration of the Dataset are as follows: | |||
| - Input size of images is 224\*224 (resize to 256\*256, then crop the image at the center) | |||
| - Normalize the input image with respect to mean and standard deviation | |||
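The resize-then-center-crop step for the ImageNet test set can be illustrated with a small helper that computes the crop window. This is a minimal sketch in plain Python; `center_crop_box` is a hypothetical name and is not part of the repository's dataset code.

```python
def center_crop_box(height, width, crop_height, crop_width):
    """Return (top, left, bottom, right) of a crop window centered in the image."""
    if crop_height > height or crop_width > width:
        raise ValueError("crop size must not exceed image size")
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return top, left, top + crop_height, left + crop_width


if __name__ == "__main__":
    # A 256x256 resized image cropped to the 224x224 network input.
    print(center_crop_box(256, 256, 224, 224))  # (16, 16, 240, 240)
```

With the sizes listed above, 16 pixels are discarded on each side of the resized image.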
| Dataset used in DenseNet100: Cifar-10 | |||
| The default configuration of the dataset is as follows: | |||
| - Training Dataset preprocess: | |||
| - Input size of images is 32\*32 | |||
| - Random cropping is applied to the image with padding=4 | |||
| - The image is flipped with probability 0.5 | |||
| - Randomly adjust the brightness, contrast, saturation (0.4, 0.4, 0.4) | |||
| - Normalize the input image with respect to mean and standard deviation | |||
| - Test Dataset preprocess: | |||
| - Input size of images is 32\*32 | |||
| - Normalize the input image with respect to mean and standard deviation | |||
| # [Features](#contents) | |||
| ## Mixed Precision | |||
| The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware. | |||
| For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. Users can find the reduced-precision operators by enabling the INFO log and searching for ‘reduce precision’. | |||
| # [Environment Requirements](#contents) | |||
| @@ -74,15 +93,15 @@ After installing MindSpore via the official website, you can start training and | |||
| ```python | |||
| # run training example | |||
| python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & | |||
| python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & | |||
| # run distributed training example | |||
| sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT | |||
| sh scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT | |||
| # run evaluation example | |||
| python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & | |||
| python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & | |||
| OR | |||
| sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT | |||
| sh scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT | |||
| ``` | |||
| For distributed training, an HCCL configuration file in JSON format needs to be created in advance. | |||
| @@ -95,17 +114,19 @@ After installing MindSpore via the official website, you can start training and | |||
| For running on GPU, please change `platform` from `Ascend` to `GPU` | |||
| ```python | |||
| # run training example | |||
| export CUDA_VISIBLE_DEVICES=0 | |||
| python train.py --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & | |||
| python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & | |||
| # run distributed training example | |||
| sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [DATASET_PATH] | |||
| sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH] | |||
| # run evaluation example | |||
| python eval.py --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & | |||
| python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & | |||
| OR | |||
| sh run_distribute_eval_gpu.sh 1 0 [DATASET_PATH] [CHECKPOINT_PATH] | |||
| sh run_distribute_eval_gpu.sh 1 0 [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH] | |||
| ``` | |||
| # [Script Description](#contents) | |||
| @@ -114,8 +135,8 @@ After installing MindSpore via the official website, you can start training and | |||
| ```text | |||
| ├── model_zoo | |||
| ├── README.md // descriptions about all the models | |||
| ├── densenet121 | |||
| ├── README.md // descriptions about densenet121 | |||
| ├── densenet | |||
| ├── README.md // descriptions about densenet | |||
| ├── scripts | |||
| │ ├── run_distribute_train.sh // shell script for distributed on Ascend | |||
| │ ├── run_distribute_train_gpu.sh // shell script for distributed on GPU | |||
| @@ -144,9 +165,9 @@ You can modify the training behaviour through the various flags in the `train.py | |||
| ```python | |||
| --data_dir train data dir | |||
| --num_classes num of classes in dataset(default:1000) | |||
| --num_classes num of classes in dataset(default:1000 for densenet121; 10 for densenet100) | |||
| --image_size image size of the dataset | |||
| --per_batch_size mini-batch size (default: 256) per gpu | |||
| --per_batch_size mini-batch size (default: 32 for densenet121; 64 for densenet100) per gpu | |||
| --pretrained path of pretrained model | |||
| --lr_scheduler type of LR schedule: exponential, cosine_annealing | |||
| --lr initial learning rate | |||
| @@ -176,10 +197,10 @@ You can modify the training behaviour through the various flags in the `train.py | |||
| - running on Ascend | |||
| ```python | |||
| python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & | |||
| python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & | |||
| ``` | |||
| The python command above will run in the background, The log and model checkpoint will be generated in `output/202x-xx-xx_time_xx_xx_xx/`. The loss value will be achieved as follows: | |||
| The Python command above runs in the background. The log and model checkpoints are generated in `output/202x-xx-xx_time_xx_xx_xx/`. When training DenseNet121 on ImageNet, the loss values look like the following: | |||
| ```shell | |||
| 2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec | |||
| @@ -195,22 +216,30 @@ You can modify the training behaviour through the various flags in the `train.py | |||
| ```python | |||
| export CUDA_VISIBLE_DEVICES=0 | |||
| python train.py --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & | |||
| python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & | |||
| ``` | |||
| The Python command above runs in the background. You can view the results in the file `train.log`. | |||
| After training, you'll get some checkpoint files under the folder `./ckpt_0/` by default. | |||
| - running on CPU | |||
| ```python | |||
| python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='CPU' > train.log 2>&1 & | |||
| ``` | |||
| The Python command above runs in the background. The log and model checkpoints are generated in `output/202x-xx-xx_time_xx_xx_xx/`. | |||
| ### Distributed Training | |||
| - running on Ascend | |||
| ```bash | |||
| sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT | |||
| sh scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT | |||
| ``` | |||
| The above shell script will run distribute training in the background. You can view the results log and model checkpoint through the file `train[X]/output/202x-xx-xx_time_xx_xx_xx/`. The loss value will be achieved as follows: | |||
| The above shell script runs distributed training in the background. You can find the log and model checkpoints in `train[X]/output/202x-xx-xx_time_xx_xx_xx/`. When training DenseNet121 on ImageNet, the loss values look like the following: | |||
| ```log | |||
| 2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec | |||
| @@ -227,7 +256,7 @@ You can modify the training behaviour through the various flags in the `train.py | |||
| ```bash | |||
| cd scripts | |||
| sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [DATASET_PATH] | |||
| sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH] | |||
| ``` | |||
| The above shell script runs distributed training in the background. You can view the results in the file `train/train.log`. | |||
| @@ -241,14 +270,14 @@ You can modify the training behaviour through the various flags in the `train.py | |||
| Run the command below for evaluation. | |||
| ```python | |||
| python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & | |||
| python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & | |||
| OR | |||
| sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT | |||
| sh scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT | |||
| ``` | |||
| The above python command will run in the background. You can view the results through the file "output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log". The accuracy of the test dataset will be as follows: | |||
| The above Python command runs in the background. You can view the results in the file "output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log". Evaluating DenseNet121 on the ImageNet test dataset gives the following accuracy: | |||
| ```shell | |||
| ```log | |||
| 2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43% | |||
| 2020-08-24 09:21:50,551:INFO:after allreduce eval: top5_correct=46224, tot=49920, acc=92.60% | |||
| ``` | |||
| @@ -258,27 +287,49 @@ You can modify the training behaviour through the various flags in the `train.py | |||
| Run the command below for evaluation. | |||
| ```python | |||
| python eval.py --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & | |||
| python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & | |||
| OR | |||
| sh run_distribute_eval_gpu.sh 1 0 [DATASET_PATH] [CHECKPOINT_PATH] | |||
| sh run_distribute_eval_gpu.sh 1 0 [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH] | |||
| ``` | |||
| The above python command will run in the background. You can view the results through the file "eval/eval.log". The accuracy of the test dataset will be as follows: | |||
| The above Python command runs in the background. You can view the results in the file "eval/eval.log". Evaluating DenseNet121 on the ImageNet test dataset gives the following accuracy: | |||
| ```shell | |||
| ```log | |||
| 2021-02-04 14:20:50,551:INFO:after allreduce eval: top1_correct=37637, tot=49984, acc=75.30% | |||
| 2021-02-04 14:20:50,551:INFO:after allreduce eval: top5_correct=46370, tot=49984, acc=92.77% | |||
| ``` | |||
| Evaluating DenseNet100 on the Cifar-10 test dataset gives the following accuracy: | |||
| ```log | |||
| 2021-03-12 18:04:07,893:INFO:after allreduce eval: top1_correct=9536, tot=9984, acc=95.51% | |||
| ``` | |||
| - evaluation on CPU | |||
| Run the command below for evaluation. | |||
| ```python | |||
| python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='CPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & | |||
| ``` | |||
| The above Python command runs in the background. You can view the results in the file "eval/eval.log". Evaluating DenseNet100 on the Cifar-10 test dataset gives the following accuracy: | |||
| ```log | |||
| 2021-03-18 09:06:43,247:INFO:after allreduce eval: top1_correct=9492, tot=9984, acc=95.07% | |||
| ``` | |||
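The `acc` value in these log lines is the ratio of `top{k}_correct` to `tot`. The sketch below parses one such line and recomputes the percentage. It assumes the log format shown in the samples above; `parse_eval_line` and `LOG_PATTERN` are hypothetical names and are not part of `eval.py`.

```python
import re

# Matches the "after allreduce eval" lines shown in the README samples.
LOG_PATTERN = re.compile(
    r"top(?P<k>\d+)_correct=(?P<correct>\d+), tot=(?P<total>\d+), acc=(?P<acc>[\d.]+)%"
)


def parse_eval_line(line):
    """Extract the counts from one eval log line and recompute the accuracy in percent."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    correct = int(match.group("correct"))
    total = int(match.group("total"))
    return {
        "k": int(match.group("k")),
        "correct": correct,
        "total": total,
        "logged_acc": float(match.group("acc")),
        "recomputed_acc": round(100.0 * correct / total, 2),
    }


if __name__ == "__main__":
    sample = ("2021-03-18 09:06:43,247:INFO:after allreduce eval: "
              "top1_correct=9492, tot=9984, acc=95.07%")
    print(parse_eval_line(sample))
```

Comparing `logged_acc` with `recomputed_acc` is a quick sanity check when collecting results from several runs.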
| # [Model Description](#contents) | |||
| ## [Performance](#contents) | |||
| ### DenseNet121 | |||
| ### Training accuracy results | |||
| | Parameters | Ascend | GPU | | |||
| | ------------------- | --------------------------- | --------------------------- | | |||
| | Model Version | Inception V1 | Inception V1 | | |||
| | Model Version | DenseNet121 | DenseNet121 | | |||
| | Resource | Ascend 910 | Tesla V100-PCIE | | |||
| | Uploaded Date | 09/15/2020 (month/day/year) | 01/27/2021 (month/day/year) | | |||
| | MindSpore Version | 1.0.0 | 1.1.0 | | |||
| @@ -291,7 +342,7 @@ You can modify the training behaviour through the various flags in the `train.py | |||
| | Parameters | Ascend | GPU | | |||
| | ------------------- | --------------------------- | ---------------------------- | | |||
| | Model Version | Inception V1 | Inception V1 | | |||
| | Model Version | DenseNet121 | DenseNet121 | | |||
| | Resource | Ascend 910 | Tesla V100-PCIE | | |||
| | Uploaded Date | 09/15/2020 (month/day/year) | 02/04/2021 (month/day/year) | | |||
| | MindSpore Version | 1.0.0 | 1.1.1 | | |||
| @@ -300,6 +351,23 @@ You can modify the training behaviour through the various flags in the `train.py | |||
| | outputs | probability | probability | | |||
| | speed | 1pc:760 img/s;8pc:6000 img/s| 1pc:161 img/s;8pc:1288 img/s | | |||
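The single-device and 8-device throughput figures in the speed row can be turned into a scaling efficiency. This is a small arithmetic sketch using only the numbers from the table; `scaling_efficiency` is a hypothetical helper name.

```python
def scaling_efficiency(single_device_speed, multi_device_speed, num_devices):
    """Ratio of measured multi-device throughput to ideal linear scaling."""
    return multi_device_speed / (single_device_speed * num_devices)


if __name__ == "__main__":
    # Ascend 910: 760 img/s on one device, 6000 img/s on eight.
    print(f"{scaling_efficiency(760, 6000, 8):.3f}")  # 0.987
    # Tesla V100-PCIE: 161 img/s on one device, 1288 img/s on eight.
    print(f"{scaling_efficiency(161, 1288, 8):.3f}")  # 1.000
```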
| ### DenseNet100 | |||
| ### Training performance | |||
| | Parameters | GPU | | |||
| | ------------------- | ---------------------------- | | |||
| | Model Version | DenseNet100 | | |||
| | Resource | Tesla V100-PCIE | | |||
| | Uploaded Date | 03/18/2021 (month/day/year) | | |||
| | MindSpore Version | 1.2.0 | | |||
| | Dataset | Cifar-10 | | |||
| | batch_size | 64 | | |||
| | epochs | 300 | | |||
| | outputs | probability | | |||
| | accuracy | 95.31% | | |||
| | speed | 1pc: 600.07 img/sec | | |||
| # [Description of Random Situation](#contents) | |||
| In dataset.py, we set the seed inside the `create_dataset` function. We also use a random seed in train.py. | |||
| @@ -3,7 +3,7 @@ | |||
| <!-- TOC --> | |||
| - [目录](#目录) | |||
| - [DenseNet121描述](#densenet121描述) | |||
| - [DenseNet描述](#densenet描述) | |||
| - [模型架构](#模型架构) | |||
| - [数据集](#数据集) | |||
| - [特性](#特性) | |||
| @@ -27,32 +27,50 @@ | |||
| <!-- /TOC --> | |||
| # DenseNet121描述 | |||
| # DenseNet描述 | |||
| DenseNet-121是一个基于卷积的神经网络,用于图像分类。有关该模型的描述,可查阅[此论文](https://arxiv.org/abs/1608.06993)。华为的DenseNet-121是[MindSpore](https://www.mindspore.cn/)上的一个实现。 | |||
| DenseNet是一个基于卷积的神经网络,用于图像分类。有关该模型的描述,可查阅[此论文](https://arxiv.org/abs/1608.06993)。华为的DenseNet是[MindSpore](https://www.mindspore.cn/)上的一个实现。 | |||
| 仓库中还包含用于启动训练和推理例程的脚本。 | |||
| # 模型架构 | |||
| DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都会接受其前面所有层作为其额外的输入,并将自己的特征映射传递给后续所有层。会使用到级联。每一层都从前几层接受“集体知识”。 | |||
| DenseNet模型支持两种模式:DenseNet-100 和DenseNet-121。数字表示网络中包含的卷积层数量。 | |||
| DenseNet-121构建在4个密集连接块上, DenseNet-100则构建在3个密集连接块上。各个密集块中,每个层都会接受其前面所有层作为其额外的输入,并将自己的特征映射传递给后续所有层。会使用到级联。每一层都从前几层接受“集体知识”。 | |||
| # 数据集 | |||
| 使用的数据集: ImageNet | |||
| DenseNet-121使用的数据集: ImageNet | |||
| 数据集的默认配置如下: | |||
| - 训练数据集预处理: | |||
| - 图像的输入尺寸:224\*224 | |||
| - 裁剪的原始尺寸大小范围(最小值,最大值):(0.08, 1.0) | |||
| - 裁剪的宽高比范围(最小值,最大值):(0.75, 1.333) | |||
| - 图像翻转概率:0.5 | |||
| - 随机调节亮度、对比度、饱和度:(0.4, 0.4, 0.4) | |||
| - 根据平均值和标准偏差对输入图像进行归一化 | |||
| - 测试数据集预处理: | |||
| - 图像的输入尺寸:224\*224(将图像缩放到256\*256,然后在中央区域裁剪图像) | |||
| - 根据平均值和标准偏差对输入图像进行归一化 | |||
| DenseNet-100使用的数据集: Cifar-10 | |||
| 数据集的默认配置如下: | |||
| - 训练数据集预处理: | |||
| - 图像的输入尺寸:224\*224 | |||
| - 裁剪的原始尺寸大小范围(最小值,最大值):(0.08, 1.0) | |||
| - 裁剪的宽高比范围(最小值,最大值):(0.75, 1.333) | |||
| - 图像翻转概率:0.5 | |||
| - 随机调节亮度、对比度、饱和度:(0.4, 0.4, 0.4) | |||
| - 根据平均值和标准偏差对输入图像进行归一化 | |||
| - 图像的输入尺寸:32\*32 | |||
| - 随机裁剪的边界填充值:4 | |||
| - 图像翻转概率:0.5 | |||
| - 随机调节亮度、对比度、饱和度:(0.4, 0.4, 0.4) | |||
| - 根据平均值和标准偏差对输入图像进行归一化 | |||
| - 测试数据集预处理: | |||
| - 图像的输入尺寸:224\*224(将图像缩放到256\*256,然后在中央区域裁剪图像) | |||
| - 根据平均值和标准偏差对输入图像进行归一化 | |||
| - 图像的输入尺寸:32\*32 | |||
| - 根据平均值和标准偏差对输入图像进行归一化 | |||
| # 特性 | |||
| @@ -79,15 +97,15 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| ```python | |||
| # 训练示例 | |||
| python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & | |||
| python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & | |||
| # 分布式训练示例 | |||
| sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT | |||
| sh scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT | |||
| # 评估示例 | |||
| python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & | |||
| python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & | |||
| OR | |||
| sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT | |||
| sh scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT | |||
| ``` | |||
| 分布式训练需要提前创建JSON格式的HCCL配置文件。 | |||
| @@ -101,15 +119,15 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| ```python | |||
| # 训练示例 | |||
| export CUDA_VISIBLE_DEVICES=0 | |||
| python train.py --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & | |||
| python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & | |||
| # 分布式训练示例 | |||
| sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [DATASET_PATH] | |||
| sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH] | |||
| # 评估示例 | |||
| python eval.py --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & | |||
| python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & | |||
| OR | |||
| sh run_distribute_eval_gpu.sh 1 0 [DATASET_PATH] [CHECKPOINT_PATH] | |||
| sh run_distribute_eval_gpu.sh 1 0 [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH] | |||
| ``` | |||
| # 脚本说明 | |||
| @@ -119,8 +137,8 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| ```shell | |||
| ├── model_zoo | |||
| ├── README.md // 所有模型的说明 | |||
| ├── densenet121 | |||
| ├── README.md // DenseNet-121相关说明 | |||
| ├── densenet | |||
| ├── README.md // DenseNet相关说明 | |||
| ├── scripts | |||
| │ ├── run_distribute_train.sh // Ascend分布式shell脚本 | |||
| │ ├── run_distribute_train_gpu.sh // GPU分布式shell脚本 | |||
| @@ -148,10 +166,10 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| 可通过`train.py`脚本中的参数修改训练行为。`train.py`脚本中的参数如下: | |||
| ```param | |||
| --Data_dir 训练数据目录 | |||
| --num_classes 数据集中的类个数(默认为1000) | |||
| --data_dir 训练数据目录 | |||
| --num_classes 数据集中的类个数(DenseNet-121中默认为1000,DenseNet-100中默认为10) | |||
| --image_size 数据集图片大小 | |||
| --per_batch_size 每GPU的迷你批次大小(默认为256) | |||
| --per_batch_size 每GPU的迷你批次大小(DenseNet-121中默认为32, DenseNet-100中默认为64) | |||
| --pretrained 预训练模型的路径 | |||
| --lr_scheduler LR调度类型,取值包括 exponential,cosine_annealing | |||
| --lr 初始学习率 | |||
| @@ -181,10 +199,10 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| - Ascend处理器环境运行 | |||
| ```python | |||
| python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & | |||
| python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 & | |||
| ``` | |||
| 以上python命令在后台运行,在`output/202x-xx-xx_time_xx_xx/`目录下生成日志和模型检查点。损失值的实现如下: | |||
| 以上python命令在后台运行,在`output/202x-xx-xx_time_xx_xx/`目录下生成日志和模型检查点。在ImageNet数据集上训练DenseNet-121的损失值的实现如下: | |||
| ```log | |||
| 2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec | |||
| @@ -200,7 +218,15 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| ```python | |||
| export CUDA_VISIBLE_DEVICES=0 | |||
| python train.py --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & | |||
| python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 & | |||
| ``` | |||
| 以上python命令在后台运行,在`output/202x-xx-xx_time_xx_xx/`目录下生成日志和模型检查点。 | |||
| - CPU处理器环境运行 | |||
| ```python | |||
| python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='CPU' > train.log 2>&1 & | |||
| ``` | |||
| 以上python命令在后台运行,在`output/202x-xx-xx_time_xx_xx/`目录下生成日志和模型检查点。 | |||
| @@ -210,10 +236,10 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| - Ascend处理器环境运行 | |||
| ```shell | |||
| sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT | |||
| sh scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT | |||
| ``` | |||
| 上述shell脚本将在后台进行分布式训练。可以通过文件`train[X]/output/202x-xx-xx_time_xx_xx_xx/`查看结果日志和模型检查点。损失值的实现如下: | |||
| 上述shell脚本将在后台进行分布式训练。可以通过文件`train[X]/output/202x-xx-xx_time_xx_xx_xx/`查看结果日志和模型检查点。在ImageNet数据集上训练DenseNet-121的损失值的实现如下: | |||
| ```log | |||
| 2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec | |||
| @@ -230,7 +256,7 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| ```bash | |||
| cd scripts | |||
| sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [DATASET_PATH] | |||
| sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH] | |||
| ``` | |||
| 上述shell脚本将在后台进行分布式训练。可以通过文件`train[X]/output/202x-xx-xx_time_xx_xx_xx/`查看结果日志和模型检查点。 | |||
| @@ -244,12 +270,12 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| 运行以下命令进行评估。 | |||
| ```eval | |||
| python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & | |||
| python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 & | |||
| OR | |||
| sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT | |||
| sh scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT | |||
| ``` | |||
| 上述python命令在后台运行。可以通过“output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log”文件查看结果。测试数据集的准确率如下: | |||
| 上述python命令在后台运行。可以通过“output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log”文件查看结果。DenseNet-121在ImageNet的测试数据集的准确率如下: | |||
| ```log | |||
| 2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43% | |||
| @@ -261,27 +287,49 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| 运行以下命令进行评估。 | |||
| ```eval | |||
| python eval.py --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & | |||
| python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & | |||
| OR | |||
| sh run_distribute_eval_gpu.sh 1 0 [DATASET_PATH] [CHECKPOINT_PATH] | |||
| sh run_distribute_eval_gpu.sh 1 0 [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH] | |||
| ``` | |||
| 上述python命令在后台运行。可以通过“eval/eval.log”文件查看结果。测试数据集的准确率如下: | |||
| 上述python命令在后台运行。可以通过“eval/eval.log”文件查看结果。DenseNet-121在ImageNet的测试数据集的准确率如下: | |||
| ```log | |||
| 2021-02-04 14:20:50,551:INFO:after allreduce eval: top1_correct=37637, tot=49984, acc=75.30% | |||
| 2021-02-04 14:20:50,551:INFO:after allreduce eval: top5_correct=46370, tot=49984, acc=92.77% | |||
| ``` | |||
| DenseNet-100在Cifar-10的测试数据集的准确率如下: | |||
| ```log | |||
| 2021-03-12 18:04:07,893:INFO:after allreduce eval: top1_correct=9536, tot=9984, acc=95.51% | |||
| ``` | |||
| - CPU处理器环境 | |||
| 运行以下命令进行评估。 | |||
| ```eval | |||
| python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='CPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 & | |||
| ``` | |||
| 上述python命令在后台运行。可以通过“eval/eval.log”文件查看结果。DenseNet-100在Cifar-10的测试数据集的准确率如下: | |||
| ```log | |||
| 2021-03-18 09:06:43,247:INFO:after allreduce eval: top1_correct=9492, tot=9984, acc=95.07% | |||
| ``` | |||
| # 模型描述 | |||
| ## 性能 | |||
| ### DenseNet121 | |||
| ### 训练准确率结果 | |||
| | 参数 | Ascend | GPU | | |||
| | ------------------- | -------------------------- | -------------------------- | | |||
| | 模型版本 | Inception V1 | Inception V1 | | |||
| | 模型版本 | DenseNet-121 | DenseNet-121 | | |||
| | 资源 | Ascend 910 | Tesla V100-PCIE | | |||
| | 上传日期 | 2020/9/15 | 2021/2/4 | | |||
| | MindSpore版本 | 1.0.0 | 1.1.1 | | |||
| @@ -294,7 +342,7 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| | 参数 | Ascend | GPU | | |||
| | ------------------- | -------------------------------- | -------------------------------- | | |||
| | 模型版本 | Inception V1 | Inception V1 | | |||
| | 模型版本 | DenseNet-121 | DenseNet-121 | | |||
| | 资源 | Ascend 910 | Tesla V100-PCIE | | |||
| | 上传日期 | 2020/9/15 | 2021/2/4 | | |||
| | MindSpore版本 | 1.0.0 | 1.1.1 | | |||
| @@ -303,6 +351,23 @@ DenseNet-121构建在4个密集连接块上。各个密集块中,每个层都 | |||
| | 输出 | 概率 | 概率 | | |||
| | 速度 | 单卡:760 img/s;8卡:6000 img/s | 单卡:161 img/s;8卡:1288 img/s | | |||
| ### DenseNet100 | |||
| ### 训练结果 | |||
| | 参数 | GPU | | |||
| | ------------------- | -------------------------------- | | |||
| | 模型版本 | DenseNet-100 | | |||
| | 资源 | Tesla V100-PCIE | | |||
| | 上传日期 | 2021/03/18 | | |||
| | MindSpore版本 | 1.2.0 | | |||
| | 数据集 | Cifar-10 | | |||
| | 轮次 | 300 | | |||
| | batch_size | 64 | | |||
| | 输出 | 概率 | | |||
| | 训练性能 | Top1:95.28% | | |||
| | 速度 | 单卡:600.07 img/sec | | |||
| # 随机情况说明 | |||
| dataset.py中设置了“create_dataset”函数内的种子,同时还使用了train.py中的随机种子。 | |||
| @@ -310,4 +375,3 @@ dataset.py中设置了“create_dataset”函数内的种子,同时还使用 | |||
| # ModelZoo主页 | |||
| 请浏览官网[主页](https://gitee.com/mindspore/mindspore/tree/master/model_zoo)。 | |||
| @@ -1,4 +1,4 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # Copyright 2020-2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| @@ -15,7 +15,7 @@ | |||
| """ | |||
| ##############test densenet example################# | |||
| python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT | |||
| python eval.py --net densenet121 --dataset imagenet --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT | |||
| """ | |||
| import os | |||
| @@ -34,10 +34,6 @@ from mindspore.ops import functional as F | |||
| from mindspore.common import dtype as mstype | |||
| from src.utils.logging import get_logger | |||
| from src.datasets import classification_dataset | |||
| from src.network import DenseNet121 | |||
| from src.config import config | |||
| class ParameterReduce(nn.Cell): | |||
| """ | |||
| @@ -61,10 +57,13 @@ def parse_args(cloud_args=None): | |||
| """ | |||
| parser = argparse.ArgumentParser('mindspore classification test') | |||
| # network and dataset choices | |||
| parser.add_argument('--net', type=str, default='', help='Densenet Model, densenet100 or densenet121') | |||
| parser.add_argument('--dataset', type=str, default='', help='Dataset, either cifar10 or imagenet') | |||
| # dataset related | |||
| parser.add_argument('--data_dir', type=str, default='', help='eval data dir') | |||
| parser.add_argument('--num_classes', type=int, default=1000, help='num of classes in dataset') | |||
| parser.add_argument('--image_size', type=str, default='224,224', help='image size of the dataset') | |||
| # network related | |||
| parser.add_argument('--backbone', default='resnet50', help='backbone') | |||
| parser.add_argument('--pretrained', default='', type=str, help='full path of pretrained model to load.' | |||
| @@ -80,12 +79,21 @@ def parse_args(cloud_args=None): | |||
| parser.add_argument('--train_url', type=str, default="", help='train url') | |||
| # platform | |||
| parser.add_argument('--device_target', type=str, default='Ascend', choices=('Ascend', 'GPU'), help='device target') | |||
| parser.add_argument('--device_target', type=str, default='Ascend', choices=('Ascend', 'GPU', 'CPU'), | |||
| help='device target') | |||
| args, _ = parser.parse_known_args() | |||
| args = merge_args(args, cloud_args) | |||
| if args.net == "densenet100": | |||
| from src.config import config_100 as config | |||
| else: | |||
| from src.config import config_121 as config | |||
| args.per_batch_size = config.per_batch_size | |||
| args.image_size = config.image_size | |||
| args.num_classes = config.num_classes | |||
| args.image_size = list(map(int, args.image_size.split(','))) | |||
| return args | |||
| @@ -151,7 +159,8 @@ def generate_results(model, rank, group_size, top1_correct, top5_correct, img_to | |||
| def test(cloud_args=None): | |||
| """ | |||
| network eval function. Get top1 and top5 ACC from classification. | |||
| network eval function. Get top1 and top5 ACC from classification for imagenet, | |||
| and top1 ACC for cifar10. | |||
| The result will be saved at [./outputs] by default. | |||
| """ | |||
| args = parse_args(cloud_args) | |||
| @@ -185,13 +194,23 @@ def test(cloud_args=None): | |||
| else: | |||
| args.models = [args.pretrained,] | |||
| if args.net == "densenet100": | |||
| from src.network.densenet import DenseNet100 as DenseNet | |||
| else: | |||
| from src.network.densenet import DenseNet121 as DenseNet | |||
| if args.dataset == "cifar10": | |||
| from src.datasets import classification_dataset_cifar10 as classification_dataset | |||
| else: | |||
| from src.datasets import classification_dataset_imagenet as classification_dataset | |||
| for model in args.models: | |||
| de_dataset = classification_dataset(args.data_dir, image_size=args.image_size, | |||
| per_batch_size=args.per_batch_size, | |||
| max_epoch=1, rank=args.rank, group_size=args.group_size, | |||
| mode='eval') | |||
| eval_dataloader = de_dataset.create_tuple_iterator() | |||
| network = DenseNet121(args.num_classes) | |||
| network = DenseNet(args.num_classes) | |||
| param_dict = load_checkpoint(model) | |||
| param_dict_new = {} | |||
| @@ -240,15 +259,13 @@ def test(cloud_args=None): | |||
| img_tot = results[2, 0] | |||
| acc1 = 100.0 * top1_correct / img_tot | |||
| acc5 = 100.0 * top5_correct / img_tot | |||
| args.logger.info('after allreduce eval: top1_correct={}, tot={}, acc={:.2f}%'.format(top1_correct, | |||
| img_tot, | |||
| args.logger.info('after allreduce eval: top1_correct={}, tot={}, acc={:.2f}%'.format(top1_correct, img_tot, | |||
| acc1)) | |||
| args.logger.info('after allreduce eval: top5_correct={}, tot={}, acc={:.2f}%'.format(top5_correct, | |||
| img_tot, | |||
| acc5)) | |||
| if args.dataset == 'imagenet': | |||
| args.logger.info('after allreduce eval: top5_correct={}, tot={}, acc={:.2f}%'.format(top5_correct, img_tot, | |||
| acc5)) | |||
| if args.is_distributed: | |||
| release() | |||
| if __name__ == "__main__": | |||
| test() | |||
| @@ -1,4 +1,4 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # Copyright 2020-2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| @@ -20,14 +20,13 @@ from mindspore.common import dtype as mstype | |||
| from mindspore import context, Tensor | |||
| from mindspore.train.serialization import export, load_checkpoint, load_param_into_net | |||
| from src.network import DenseNet121 | |||
| from src.config import config | |||
| parser = argparse.ArgumentParser(description="densenet export") | |||
| parser = argparse.ArgumentParser(description="densenet121 export") | |||
| parser.add_argument("--net", type=str, default='', help="Densenet Model, densenet100 or densenet121") | |||
| parser.add_argument("--device_id", type=int, default=0, help="Device id") | |||
| parser.add_argument("--batch_size", type=int, default=32, help="batch size") | |||
| parser.add_argument("--ckpt_file", type=str, required=True, help="Checkpoint file path.") | |||
| parser.add_argument("--file_name", type=str, default="densenet121", help="output file name.") | |||
| parser.add_argument("--file_name", type=str, default="densenet", help="output file name.") | |||
| parser.add_argument("--file_format", type=str, choices=["AIR", "ONNX", "MINDIR"], default="AIR", help="file format") | |||
| parser.add_argument("--device_target", type=str, choices=["Ascend", "GPU", "CPU"], default="Ascend", | |||
| help="device target") | |||
| @@ -37,8 +36,15 @@ context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target) | |||
| if args.device_target == "Ascend": | |||
| context.set_context(device_id=args.device_id) | |||
| if args.net == "densenet100": | |||
| from src.config import config_100 as config | |||
| from src.network.densenet import DenseNet100 as DenseNet | |||
| else: | |||
| from src.config import config_121 as config | |||
| from src.network.densenet import DenseNet121 as DenseNet | |||
| if __name__ == "__main__": | |||
| network = DenseNet121(config.num_classes) | |||
| network = DenseNet(config.num_classes) | |||
| param_dict = load_checkpoint(args.ckpt_file) | |||
| @@ -1,4 +1,4 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # Copyright 2020-2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| @@ -13,9 +13,11 @@ | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| """hub config.""" | |||
| from src.network import DenseNet121 | |||
| from src.network import DenseNet121, DenseNet100 | |||
| def create_network(name, *args, **kwargs): | |||
| if name == 'densenet121': | |||
| return DenseNet121(*args, **kwargs) | |||
| if name == 'densenet100': | |||
| return DenseNet100(*args, **kwargs) | |||
| raise NotImplementedError(f"{name} is not implemented in the repo") | |||
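The name-based dispatch in `create_network` can be tried without MindSpore by substituting stand-in classes. The sketch below is only an illustration under stated assumptions: `FakeDenseNet121` and `FakeDenseNet100` are hypothetical placeholders for the real `src.network` classes, and the dictionary lookup is an alternative layout of the same if-chain, not the code in this change.

```python
class FakeDenseNet121:
    """Hypothetical stand-in for src.network.DenseNet121."""
    def __init__(self, num_classes):
        self.num_classes = num_classes


class FakeDenseNet100:
    """Hypothetical stand-in for src.network.DenseNet100."""
    def __init__(self, num_classes):
        self.num_classes = num_classes


# Table-driven equivalent of the if-chain in the hub config.
_NETWORKS = {
    "densenet121": FakeDenseNet121,
    "densenet100": FakeDenseNet100,
}


def create_network(name, *args, **kwargs):
    """Return an instance of the named network or raise NotImplementedError."""
    try:
        net_cls = _NETWORKS[name]
    except KeyError:
        raise NotImplementedError(f"{name} is not implemented in the repo") from None
    return net_cls(*args, **kwargs)


if __name__ == "__main__":
    net = create_network("densenet100", 10)
    print(type(net).__name__, net.num_classes)
```

With a lookup table, supporting another variant would mean adding one dictionary entry rather than another `if` branch; unknown names still fail with the same `NotImplementedError`.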
| @@ -1,5 +1,5 @@ | |||
| #!/bin/bash | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # Copyright 2020-2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| @@ -16,8 +16,8 @@ | |||
| echo "==============================================================================================================" | |||
| echo "Please run the script as: " | |||
| echo "sh scripts/run_distribute_eval.sh DEVICE_NUM RANK_TABLE_FILE DATASET CKPT_PATH" | |||
| echo "for example: sh scripts/run_distribute_train.sh 8 /data/hccl.json /path/to/dataset /path/to/ckpt" | |||
| echo "sh scripts/run_distribute_eval.sh DEVICE_NUM RANK_TABLE_FILE NET_NAME DATASET_NAME DATASET CKPT_PATH" | |||
| echo "for example: sh scripts/run_distribute_eval.sh 8 /data/hccl.json densenet121 imagenet /path/to/dataset /path/to/ckpt" | |||
| echo "It is better to use absolute path." | |||
| echo "=================================================================================================================" | |||
| @@ -25,8 +25,10 @@ echo "After running the script, the network runs in the background. The log will | |||
| export RANK_SIZE=$1 | |||
| export RANK_TABLE_FILE=$2 | |||
| DATASET=$3 | |||
| CKPT_PATH=$4 | |||
| NET_NAME=$3 | |||
| DATASET_NAME=$4 | |||
| DATASET=$5 | |||
| CKPT_PATH=$6 | |||
| for((i=0;i<RANK_SIZE;i++)) | |||
| do | |||
| @@ -40,9 +42,10 @@ do | |||
| echo "start inferring for rank $i, device $DEVICE_ID" | |||
| env > env.log | |||
| python eval.py \ | |||
| --net=$NET_NAME \ | |||
| --dataset=$DATASET_NAME \ | |||
| --data_dir=$DATASET \ | |||
| --pretrained=$CKPT_PATH > log.txt 2>&1 & | |||
| cd ../ | |||
| done | |||
| @@ -14,26 +14,26 @@ | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| if [ $# -lt 4 ] | |||
| if [ $# -lt 6 ] | |||
| then | |||
| echo "Usage: sh run_distribute_eval_gpu.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH] [CHECKPOINT_PATH]" | |||
| exit 1 | |||
| echo "Usage: sh run_distribute_eval_gpu.sh [DEVICE_NUM] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH]" | |||
| exit 1 | |||
| fi | |||
| if [ $1 -lt 1 ] || [ $1 -gt 8 ] | |||
| then | |||
| echo "error: DEVICE_NUM=$1 is not in (1-8)" | |||
| exit 1 | |||
| exit 1 | |||
| fi | |||
| export DEVICE_NUM=$1 | |||
| export RANK_SIZE=$1 | |||
| # check checkpoint file | |||
| if [ ! -f $4 ] | |||
| if [ ! -f $6 ] | |||
| then | |||
| echo "error: CHECKPOINT_PATH=$4 is not a file" | |||
| exit 1 | |||
| echo "error: CHECKPOINT_PATH=$6 is not a file" | |||
| exit 1 | |||
| fi | |||
| BASEPATH=$(cd "`dirname $0`" || exit; pwd) | |||
| @@ -51,13 +51,17 @@ export CUDA_VISIBLE_DEVICES="$2" | |||
| if [ $1 -gt 1 ] | |||
| then | |||
| mpirun -n $1 --allow-run-as-root python3 ${BASEPATH}/../eval.py \ | |||
| --data_dir=$3 \ | |||
| --net=$3 \ | |||
| --dataset=$4 \ | |||
| --data_dir=$5 \ | |||
| --device_target='GPU' \ | |||
| --pretrained=$4 > eval.log 2>&1 & | |||
| --pretrained=$6 > eval.log 2>&1 & | |||
| else | |||
| python3 ${BASEPATH}/../eval.py \ | |||
| --data_dir=$3 \ | |||
| --net=$3 \ | |||
| --dataset=$4 \ | |||
| --data_dir=$5 \ | |||
| --device_target='GPU' \ | |||
| --pretrained=$4 > eval.log 2>&1 & | |||
| --pretrained=$6 > eval.log 2>&1 & | |||
| fi | |||
| @@ -1,5 +1,5 @@ | |||
| #!/bin/bash | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # Copyright 2020-2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| @@ -16,8 +16,8 @@ | |||
| echo "==============================================================================================================" | |||
| echo "Please run the script as: " | |||
| echo "sh scripts/run_distribute_train.sh DEVICE_NUM RANK_TABLE_FILE DATASET CKPT_FILE" | |||
| echo "for example: sh scripts/run_distribute_train.sh 8 /data/hccl.json /path/to/dataset ckpt_file" | |||
| echo "sh scripts/run_distribute_train.sh DEVICE_NUM RANK_TABLE_FILE NET_NAME DATASET_NAME DATASET CKPT_FILE" | |||
| echo "for example: sh scripts/run_distribute_train.sh 8 /data/hccl.json densenet121 imagenet /path/to/dataset ckpt_file" | |||
| echo "It is better to use absolute path." | |||
| echo "=================================================================================================================" | |||
| @@ -25,8 +25,10 @@ echo "After running the script, the network runs in the background. The log will | |||
| export RANK_SIZE=$1 | |||
| export RANK_TABLE_FILE=$2 | |||
| DATASET=$3 | |||
| CKPT_FILE=$4 | |||
| NET_NAME=$3 | |||
| DATASET_NAME=$4 | |||
| DATASET=$5 | |||
| CKPT_FILE=$6 | |||
| for((i=0;i<RANK_SIZE;i++)) | |||
| do | |||
| @@ -41,9 +43,9 @@ do | |||
| env > env.log | |||
| if [ -f "$CKPT_FILE" ] | |||
| then | |||
| python train.py --data_dir=$DATASET --pretrained=$CKPT_FILE > log.txt 2>&1 & | |||
| python train.py --net=$NET_NAME --dataset=$DATASET_NAME --data_dir=$DATASET --pretrained=$CKPT_FILE > log.txt 2>&1 & | |||
| else | |||
| python train.py --data_dir=$DATASET > log.txt 2>&1 & | |||
| python train.py --net=$NET_NAME --dataset=$DATASET_NAME --data_dir=$DATASET > log.txt 2>&1 & | |||
| fi | |||
| cd ../ | |||
| @@ -14,16 +14,16 @@ | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| if [ $# -lt 3 ] | |||
| if [ $# -lt 5 ] | |||
| then | |||
| echo "Usage: sh run_distribute_train_gpu.sh [DEVICE_NUM] [VISIABLE_DEVICES(0,1,2,3,4,5,6,7)] [DATASET_PATH] [PRE_TRAINED](optional)" | |||
| exit 1 | |||
| echo "Usage: sh run_distribute_train_gpu.sh [DEVICE_NUM] [VISIBLE_DEVICES(0,1,2,3,4,5,6,7)] [NET_NAME] [DATASET_NAME] [DATASET_PATH] [PRE_TRAINED](optional)" | |||
| exit 1 | |||
| fi | |||
| if [ $1 -lt 1 ] || [ $1 -gt 8 ] | |||
| then | |||
| echo "error: DEVICE_NUM=$1 is not in (1-8)" | |||
| exit 1 | |||
| exit 1 | |||
| fi | |||
| export DEVICE_NUM=$1 | |||
| @@ -40,30 +40,38 @@ cd ../train || exit | |||
| export CUDA_VISIBLE_DEVICES="$2" | |||
| if [ -f $4 ] # pretrained ckpt | |||
| then | |||
| if [ -f "$6" ] # pretrained ckpt | |||
| then | |||
| if [ $1 -gt 1 ] | |||
| then | |||
| mpirun -n $1 --allow-run-as-root python3 ${BASEPATH}/../train.py \ | |||
| --data_dir=$3 \ | |||
| --net=$3 \ | |||
| --dataset=$4 \ | |||
| --data_dir=$5 \ | |||
| --device_target='GPU' \ | |||
| --pretrained=$4 > train.log 2>&1 & | |||
| --pretrained=$6 > train.log 2>&1 & | |||
| else | |||
| python3 ${BASEPATH}/../train.py \ | |||
| --data_dir=$3 \ | |||
| --net=$3 \ | |||
| --dataset=$4 \ | |||
| --data_dir=$5 \ | |||
| --is_distributed=0 \ | |||
| --device_target='GPU' \ | |||
| --pretrained=$4 > train.log 2>&1 & | |||
| --pretrained=$6 > train.log 2>&1 & | |||
| fi | |||
| else | |||
| if [ $1 -gt 1 ] | |||
| then | |||
| mpirun -n $1 --allow-run-as-root python3 ${BASEPATH}/../train.py \ | |||
| --data_dir=$3 \ | |||
| --net=$3 \ | |||
| --dataset=$4 \ | |||
| --data_dir=$5 \ | |||
| --device_target='GPU' > train.log 2>&1 & | |||
| else | |||
| python3 ${BASEPATH}/../train.py \ | |||
| --data_dir=$3 \ | |||
| --net=$3 \ | |||
| --dataset=$4 \ | |||
| --data_dir=$5 \ | |||
| --is_distributed=0 \ | |||
| --device_target='GPU' > train.log 2>&1 & | |||
| fi | |||
| @@ -0,0 +1,46 @@ | |||
| #!/bin/bash | |||
| # Copyright 2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| if [ $# -lt 4 ] | |||
| then | |||
| echo "Usage: sh run_eval_cpu.sh [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH]" | |||
| exit 1 | |||
| fi | |||
| # check checkpoint file | |||
| if [ ! -f $4 ] | |||
| then | |||
| echo "error: CHECKPOINT_PATH=$4 is not a file" | |||
| exit 1 | |||
| fi | |||
| BASEPATH=$(cd "`dirname $0`" || exit; pwd) | |||
| export PYTHONPATH=${BASEPATH}:$PYTHONPATH | |||
| if [ -d "../eval" ]; | |||
| then | |||
| rm -rf ../eval | |||
| fi | |||
| mkdir ../eval | |||
| cd ../eval || exit | |||
| python ${BASEPATH}/../eval.py \ | |||
| --net=$1 \ | |||
| --dataset=$2 \ | |||
| --data_dir=$3 \ | |||
| --device_target='CPU' \ | |||
| --is_distributed=0 \ | |||
| --pretrained=$4 > eval.log 2>&1 & | |||
| @@ -0,0 +1,49 @@ | |||
| #!/bin/bash | |||
| # Copyright 2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| if [ $# -lt 3 ] | |||
| then | |||
| echo "Usage: sh run_train_cpu.sh [NET_NAME] [DATASET_NAME] [DATASET_PATH] [PRE_TRAINED](optional)" | |||
| exit 1 | |||
| fi | |||
| BASEPATH=$(cd "`dirname $0`" || exit; pwd) | |||
| export PYTHONPATH=${BASEPATH}:$PYTHONPATH | |||
| if [ -d "../train" ]; | |||
| then | |||
| rm -rf ../train | |||
| fi | |||
| mkdir ../train | |||
| cd ../train || exit | |||
| if [ -f "$4" ] # pretrained ckpt | |||
| then | |||
| python ${BASEPATH}/../train.py \ | |||
| --net=$1 \ | |||
| --dataset=$2 \ | |||
| --data_dir=$3 \ | |||
| --is_distributed=0 \ | |||
| --device_target='CPU' \ | |||
| --pretrained=$4 > train.log 2>&1 & | |||
| else | |||
| python ${BASEPATH}/../train.py \ | |||
| --net=$1 \ | |||
| --dataset=$2 \ | |||
| --data_dir=$3 \ | |||
| --is_distributed=0 \ | |||
| --device_target='CPU' > train.log 2>&1 & | |||
| fi | |||
| @@ -1,4 +1,4 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # Copyright 2020-2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| @@ -15,7 +15,39 @@ | |||
| """config""" | |||
| from easydict import EasyDict as ed | |||
| config = ed({ | |||
| #config for densenet100 and cifar10 | |||
| config_100 = ed({ | |||
| "image_size": '32, 32', | |||
| "num_classes": 10, | |||
| "lr": 0.1, | |||
| "lr_scheduler": 'exponential', | |||
| "lr_epochs": '150, 225, 300', | |||
| "lr_gamma": 0.1, | |||
| "eta_min": 0, | |||
| "T_max": 120, | |||
| "max_epoch": 300, | |||
| "per_batch_size": 64, | |||
| "warmup_epochs": 0, | |||
| "weight_decay": 0.0001, | |||
| "momentum": 0.9, | |||
| "is_dynamic_loss_scale": 0, | |||
| "loss_scale": 1024, | |||
| "label_smooth": 0, | |||
| "label_smooth_factor": 0.1, | |||
| "log_interval": 100, | |||
| "ckpt_interval": 3124, | |||
| "ckpt_path": 'outputs_cifar10/', | |||
| "is_save_on_master": 1, | |||
| "rank": 0, | |||
| "group_size": 1 | |||
| }) | |||
| # config for densenet121 and imagenet | |||
| config_121 = ed({ | |||
| "image_size": '224,224', | |||
| "num_classes": 1000, | |||
| @@ -38,7 +70,7 @@ config = ed({ | |||
| "log_interval": 100, | |||
| "ckpt_interval": 50000, | |||
| "ckpt_path": 'outputs/', | |||
| "ckpt_path": 'outputs_imagenet/', | |||
| "is_save_on_master": 1, | |||
| "rank": 0, | |||
| @@ -1,4 +1,4 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # Copyright 2020-2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| @@ -17,6 +17,6 @@ | |||
| read dataset for classification | |||
| """ | |||
| from .classification import classification_dataset | |||
| from .classification import classification_dataset_cifar10, classification_dataset_imagenet | |||
| __all__ = ["classification_dataset"] | |||
| __all__ = ["classification_dataset_cifar10", "classification_dataset_imagenet"] | |||
| @@ -1,4 +1,4 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # Copyright 2020-2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| @@ -50,17 +50,10 @@ class TxtDataset(): | |||
| return len(self.imgs) | |||
| def classification_dataset(data_dir, image_size, per_batch_size, max_epoch, rank, group_size, | |||
| mode='train', | |||
| input_mode='folder', | |||
| root='', | |||
| num_parallel_workers=None, | |||
| shuffle=None, | |||
| sampler=None, | |||
| class_indexing=None, | |||
| drop_remainder=True, | |||
| transform=None, | |||
| target_transform=None): | |||
| def classification_dataset_imagenet(data_dir, image_size, per_batch_size, max_epoch, rank, group_size, mode='train', | |||
| input_mode='folder', root='', num_parallel_workers=None, shuffle=None, | |||
| sampler=None, class_indexing=None, drop_remainder=True, transform=None, | |||
| target_transform=None): | |||
| """ | |||
| A function that returns a dataset for classification. The mode of input dataset could be "folder" or "txt". | |||
| If it is "folder", all images within one folder have the same label. If it is "txt", all paths of images | |||
| @@ -88,7 +81,7 @@ def classification_dataset(data_dir, image_size, per_batch_size, max_epoch, rank | |||
| unique index starting from 0). | |||
| Examples: | |||
| >>> from src.datasets.classification import classification_dataset | |||
| >>> from src.datasets.classification import classification_dataset_imagenet | |||
| >>> # path to imagefolder directory. This directory needs to contain sub-directories which contain the images | |||
| >>> data_dir = "/path/to/imagefolder_directory" | |||
| >>> de_dataset = classification_dataset_imagenet(data_dir, image_size=[224, 224], | |||
| @@ -152,3 +145,77 @@ def classification_dataset(data_dir, image_size, per_batch_size, max_epoch, rank | |||
| de_dataset = de_dataset.repeat(1) | |||
| return de_dataset | |||
| def classification_dataset_cifar10(data_dir, image_size, per_batch_size, max_epoch, rank, group_size, mode='train', | |||
| num_parallel_workers=None, shuffle=None, sampler=None, drop_remainder=True, | |||
| transform=None, target_transform=None): | |||
| """ | |||
| A function that returns cifar10 dataset for classification. | |||
| Args: | |||
| data_dir (str): Path to the root directory that contains the dataset's bin files. | |||
| image_size (Union(int, sequence)): Size of the input images. | |||
| per_batch_size (int): the batch size of every step during training. | |||
| max_epoch (int): the number of epochs. | |||
| rank (int): The shard ID within num_shards (default=None). | |||
| group_size (int): Number of shards that the dataset should be divided | |||
| into (default=None). | |||
| mode (str): "train" or others. Default: "train". | |||
| input_mode (str): The form of the input dataset. "folder" or "txt". Default: "folder". | |||
| root (str): the images path for "input_mode="txt"". Default: " ". | |||
| num_parallel_workers (int): Number of workers to read the data. Default: None. | |||
| shuffle (bool): Whether or not to perform shuffle on the dataset | |||
| (default=None, performs shuffle). | |||
| sampler (Sampler): Object used to choose samples from the dataset. Default: None. | |||
| Examples: | |||
| >>> from src.datasets.classification import classification_dataset_cifar10 | |||
| >>> # path to imagefolder directory. This directory needs to contain bin files of data. | |||
| >>> data_dir = "/path/to/datafolder_directory" | |||
| >>> de_dataset = classification_dataset_cifar10(data_dir, image_size=[32, 32], | |||
| >>> per_batch_size=64, max_epoch=100, | |||
| >>> rank=0, group_size=1) | |||
| """ | |||
| mean = [0.5 * 255, 0.5 * 255, 0.5 * 255] | |||
| std = [0.5 * 255, 0.5 * 255, 0.5 * 255] | |||
| if transform is None: | |||
| if mode == 'train': | |||
| transform_img = [ | |||
| vision_C.RandomCrop(image_size, padding=4), | |||
| vision_C.RandomHorizontalFlip(prob=0.5), | |||
| vision_C.RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4), | |||
| vision_C.Normalize(mean=mean, std=std), | |||
| vision_C.HWC2CHW() | |||
| ] | |||
| else: | |||
| transform_img = [ | |||
| vision_C.Normalize(mean=mean, std=std), | |||
| vision_C.HWC2CHW() | |||
| ] | |||
| else: | |||
| transform_img = transform | |||
| if target_transform is None: | |||
| transform_label = [ | |||
| normal_C.TypeCast(mstype.int32) | |||
| ] | |||
| else: | |||
| transform_label = target_transform | |||
| de_dataset = de.Cifar10Dataset(data_dir, num_parallel_workers=num_parallel_workers, shuffle=shuffle, | |||
| sampler=sampler, num_shards=group_size, | |||
| shard_id=rank) | |||
| de_dataset = de_dataset.map(input_columns="image", num_parallel_workers=8, operations=transform_img) | |||
| de_dataset = de_dataset.map(input_columns="label", num_parallel_workers=8, operations=transform_label) | |||
| columns_to_project = ["image", "label"] | |||
| de_dataset = de_dataset.project(columns=columns_to_project) | |||
| de_dataset = de_dataset.batch(per_batch_size, drop_remainder=drop_remainder) | |||
| de_dataset = de_dataset.repeat(1) | |||
| return de_dataset | |||
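The CIFAR-10 pipeline above normalizes every channel with `mean = std = 0.5 * 255`. As a rough illustration of what that does to raw pixel values, the sketch below applies the usual `(pixel - mean) / std` formula to scalars in plain Python. The helper name `normalize` is hypothetical and this is not the MindSpore operator itself, only the arithmetic it is assumed to perform per element.

```python
MEAN = 0.5 * 255  # 127.5, the per-channel mean used by the CIFAR-10 pipeline
STD = 0.5 * 255   # 127.5, the per-channel std used by the CIFAR-10 pipeline


def normalize(pixel):
    """Apply (pixel - mean) / std to one raw channel value in [0, 255]."""
    return (pixel - MEAN) / STD


if __name__ == "__main__":
    for raw in (0, 127.5, 255):
        print(raw, "->", normalize(raw))
```

Under this assumption the raw range [0, 255] maps onto [-1, 1], with mid-gray landing at 0.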
| @@ -1,4 +1,4 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # Copyright 2020-2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| @@ -15,4 +15,4 @@ | |||
| """ | |||
| densenet network | |||
| """ | |||
| from .densenet import DenseNet121 | |||
| from .densenet import DenseNet121, DenseNet100 | |||
| @@ -1,4 +1,4 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # Copyright 2020-2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| @@ -25,7 +25,7 @@ from mindspore.ops import operations as P | |||
| from mindspore.common import initializer as init | |||
| from src.utils.var_init import default_recurisive_init, KaimingNormal | |||
| __all__ = ["DenseNet121"] | |||
| __all__ = ["DenseNet121", "DenseNet100"] | |||
| class GlobalAvgPooling(nn.Cell): | |||
| """ | |||
| @@ -123,13 +123,17 @@ class _Transition(nn.Cell): | |||
| """ | |||
| the transition layer | |||
| """ | |||
| def __init__(self, num_input_features, num_output_features): | |||
| def __init__(self, num_input_features, num_output_features, avgpool=False): | |||
| super(_Transition, self).__init__() | |||
| if avgpool: | |||
| poollayer = nn.AvgPool2d(kernel_size=2, stride=2) | |||
| else: | |||
| poollayer = nn.MaxPool2d(kernel_size=2, stride=2) | |||
| self.features = nn.SequentialCell(OrderedDict([ | |||
| ('norm', nn.BatchNorm2d(num_input_features)), | |||
| ('relu', nn.ReLU()), | |||
| ('conv', conv1x1(num_input_features, num_output_features)), | |||
| ('pool', nn.MaxPool2d(kernel_size=2, stride=2)) | |||
| ('pool', poollayer) | |||
| ])) | |||
| def construct(self, x): | |||
| @@ -142,17 +146,23 @@ class Densenet(nn.Cell): | |||
| """ | |||
| __constants__ = ['features'] | |||
| def __init__(self, growth_rate, block_config, num_init_features, bn_size=4, drop_rate=0): | |||
| def __init__(self, growth_rate, block_config, num_init_features=None, bn_size=4, drop_rate=0): | |||
| super(Densenet, self).__init__() | |||
| layers = OrderedDict() | |||
| layers['conv0'] = conv7x7(3, num_init_features, stride=2, padding=3) | |||
| layers['norm0'] = nn.BatchNorm2d(num_init_features) | |||
| layers['relu0'] = nn.ReLU() | |||
| layers['pool0'] = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same') | |||
| if num_init_features: | |||
| layers['conv0'] = conv7x7(3, num_init_features, stride=2, padding=3) | |||
| layers['norm0'] = nn.BatchNorm2d(num_init_features) | |||
| layers['relu0'] = nn.ReLU() | |||
| layers['pool0'] = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same') | |||
| num_features = num_init_features | |||
| else: | |||
| layers['conv0'] = conv3x3(3, growth_rate*2, stride=1, padding=1) | |||
| layers['norm0'] = nn.BatchNorm2d(growth_rate*2) | |||
| layers['relu0'] = nn.ReLU() | |||
| num_features = growth_rate * 2 | |||
| # Each denseblock | |||
| num_features = num_init_features | |||
| for i, num_layers in enumerate(block_config): | |||
| block = _DenseBlock( | |||
| num_layers=num_layers, | |||
| @@ -165,8 +175,12 @@ class Densenet(nn.Cell): | |||
| num_features = num_features + num_layers*growth_rate | |||
| if i != len(block_config)-1: | |||
| trans = _Transition(num_input_features=num_features, | |||
| num_output_features=num_features // 2) | |||
| if num_init_features: | |||
| trans = _Transition(num_input_features=num_features, num_output_features=num_features // 2, | |||
| avgpool=False) | |||
| else: | |||
| trans = _Transition(num_input_features=num_features, num_output_features=num_features // 2, | |||
| avgpool=True) | |||
| layers['transition%d'%(i+1)] = trans | |||
| num_features = num_features // 2 | |||
| @@ -184,6 +198,11 @@ class Densenet(nn.Cell): | |||
| def get_out_channels(self): | |||
| return self.out_channels | |||
| def _densenet100(**kwargs): | |||
| return Densenet(growth_rate=12, block_config=(16, 16, 16), **kwargs) | |||
| def _densenet121(**kwargs): | |||
| return Densenet(growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64, **kwargs) | |||
| @@ -200,6 +219,38 @@ def _densenet201(**kwargs): | |||
| return Densenet(growth_rate=32, block_config=(6, 12, 48, 32), num_init_features=64, **kwargs) | |||
| class DenseNet100(nn.Cell): | |||
| """ | |||
| the densenet100 architecture | |||
| """ | |||
| def __init__(self, num_classes, include_top=True): | |||
| super(DenseNet100, self).__init__() | |||
| self.backbone = _densenet100() | |||
| out_channels = self.backbone.get_out_channels() | |||
| self.include_top = include_top | |||
| if self.include_top: | |||
| self.head = CommonHead(num_classes, out_channels) | |||
| default_recurisive_init(self) | |||
| for _, cell in self.cells_and_names(): | |||
| if isinstance(cell, nn.Conv2d): | |||
| cell.weight.set_data(init.initializer(KaimingNormal(a=math.sqrt(5), mode='fan_out', | |||
| nonlinearity='relu'), | |||
| cell.weight.shape, | |||
| cell.weight.dtype)) | |||
| elif isinstance(cell, nn.BatchNorm2d): | |||
| cell.gamma.set_data(init.initializer('ones', cell.gamma.shape)) | |||
| cell.beta.set_data(init.initializer('zeros', cell.beta.shape)) | |||
| elif isinstance(cell, nn.Dense): | |||
| cell.bias.set_data(init.initializer('zeros', cell.bias.shape)) | |||
| def construct(self, x): | |||
| x = self.backbone(x) | |||
| if not self.include_top: | |||
| return x | |||
| x = self.head(x) | |||
| return x | |||
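The channel bookkeeping in `Densenet.__init__` can be followed with plain integers. The sketch below is a hypothetical helper (`backbone_channels` is not part of the repository) that assumes, as the visible lines suggest, that each dense block adds `num_layers * growth_rate` channels, that every transition halves the count, and that the stem emits `growth_rate * 2` channels when `num_init_features` is not given.

```python
def backbone_channels(growth_rate, block_config, num_init_features=None):
    """Trace the channel count through the stem, dense blocks and transitions."""
    # Without num_init_features the stem emits growth_rate * 2 channels.
    num_features = num_init_features if num_init_features else growth_rate * 2
    trace = [num_features]
    for i, num_layers in enumerate(block_config):
        # A dense block concatenates growth_rate new channels per layer.
        num_features += num_layers * growth_rate
        trace.append(num_features)
        # Every block except the last is followed by a halving transition.
        if i != len(block_config) - 1:
            num_features //= 2
            trace.append(num_features)
    return trace


if __name__ == "__main__":
    print("densenet100:", backbone_channels(12, (16, 16, 16)))
    print("densenet121:", backbone_channels(32, (6, 12, 24, 16), num_init_features=64))
```

For the DenseNet-100 settings (`growth_rate=12`, three blocks of 16 layers) the trace ends at 342 channels, and for DenseNet-121 it ends at 1024, which is the width the classification head would receive under these assumptions.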
| class DenseNet121(nn.Cell): | |||
| """ | |||
| @@ -1,4 +1,4 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # Copyright 2020-2021 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| @@ -32,21 +32,18 @@ from mindspore.context import ParallelMode | |||
| from mindspore.common import set_seed | |||
| from src.optimizers import get_param_groups | |||
| from src.network import DenseNet121 | |||
| from src.datasets import classification_dataset | |||
| from src.losses.crossentropy import CrossEntropy | |||
| from src.lr_scheduler import MultiStepLR, CosineAnnealingLR | |||
| from src.utils.logging import get_logger | |||
| from src.config import config | |||
| set_seed(1) | |||
| class BuildTrainNetwork(nn.Cell): | |||
| """build training network""" | |||
| def __init__(self, network, criterion): | |||
| def __init__(self, net, crit): | |||
| super(BuildTrainNetwork, self).__init__() | |||
| self.network = network | |||
| self.criterion = criterion | |||
| self.network = net | |||
| self.criterion = crit | |||
| def construct(self, input_data, label): | |||
| output = self.network(input_data) | |||
| @@ -108,6 +105,10 @@ def parse_args(cloud_args=None): | |||
| """parameters""" | |||
| parser = argparse.ArgumentParser('mindspore classification training') | |||
| # network and dataset choices | |||
| parser.add_argument('--net', type=str, default='', help='Densenet Model, densenet100 or densenet121') | |||
| parser.add_argument('--dataset', type=str, default='', help='Dataset, either cifar10 or imagenet') | |||
| # dataset related | |||
| parser.add_argument('--data_dir', type=str, default='', help='train data dir') | |||
| @@ -121,10 +122,17 @@ def parse_args(cloud_args=None): | |||
| parser.add_argument('--train_url', type=str, default="", help='train url') | |||
| # platform | |||
| parser.add_argument('--device_target', type=str, default='Ascend', choices=('Ascend', 'GPU'), help='device target') | |||
| parser.add_argument('--device_target', type=str, default='Ascend', choices=('Ascend', 'GPU', 'CPU'), | |||
| help='device target') | |||
| args, _ = parser.parse_known_args() | |||
| args = merge_args(args, cloud_args) | |||
| if args.net == "densenet100": | |||
| from src.config import config_100 as config | |||
| else: | |||
| from src.config import config_121 as config | |||
| args.image_size = config.image_size | |||
| args.num_classes = config.num_classes | |||
| args.lr = config.lr | |||
| @@ -158,15 +166,26 @@ def merge_args(args, cloud_args): | |||
| """dictionary""" | |||
| args_dict = vars(args) | |||
| if isinstance(cloud_args, dict): | |||
| for key in cloud_args.keys(): | |||
| val = cloud_args[key] | |||
| if key in args_dict and val: | |||
| arg_type = type(args_dict[key]) | |||
| for k in cloud_args.keys(): | |||
| val = cloud_args[k] | |||
| if k in args_dict and val: | |||
| arg_type = type(args_dict[k]) | |||
| if arg_type is not type(None): | |||
| val = arg_type(val) | |||
| args_dict[key] = val | |||
| args_dict[k] = val | |||
| return args | |||
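The behaviour of `merge_args` can be checked in isolation. The sketch below re-implements the post-change version of the function from this hunk with plain `argparse`. The parser arguments and the `cloud_args` values are illustrative assumptions, not the ones used by the training script.

```python
import argparse


def merge_args(args, cloud_args):
    """Overlay truthy values from cloud_args onto the parsed args, casting to the existing type."""
    args_dict = vars(args)  # vars() returns the namespace's own dict, so writes update args
    if isinstance(cloud_args, dict):
        for k in cloud_args.keys():
            val = cloud_args[k]
            if k in args_dict and val:
                arg_type = type(args_dict[k])
                if arg_type is not type(None):
                    val = arg_type(val)
                args_dict[k] = val
    return args


# Hypothetical parser with a small subset of options, for illustration only.
parser = argparse.ArgumentParser('merge_args demo')
parser.add_argument('--net', type=str, default='')
parser.add_argument('--data_dir', type=str, default='/data')
parser.add_argument('--per_batch_size', type=int, default=32)
args, _ = parser.parse_known_args([])

merged = merge_args(args, {
    'net': 'densenet100',     # overrides the empty default
    'per_batch_size': '64',   # string is cast to int, the type of the existing value
    'data_dir': '',           # falsy, so the existing value is kept
    'unknown': 'x',           # key not defined by the parser, so it is ignored
})
```

One consequence of the `and val` check is that a falsy cloud value such as `0` or `''` can never override a parsed argument.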
| def get_lr_scheduler(args): | |||
| if args.lr_scheduler == 'exponential': | |||
| lr_scheduler = MultiStepLR(args.lr, args.lr_epochs, args.lr_gamma, args.steps_per_epoch, args.max_epoch, | |||
| warmup_epochs=args.warmup_epochs) | |||
| elif args.lr_scheduler == 'cosine_annealing': | |||
| lr_scheduler = CosineAnnealingLR(args.lr, args.T_max, args.steps_per_epoch, args.max_epoch, | |||
| warmup_epochs=args.warmup_epochs, eta_min=args.eta_min) | |||
| else: | |||
| raise NotImplementedError(args.lr_scheduler) | |||
| return lr_scheduler | |||
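The `cosine_annealing` branch passes `T_max`, `eta_min`, `steps_per_epoch`, `max_epoch` and `warmup_epochs`. As a rough illustration of what a per-step schedule built from those parameters could look like, here is a minimal pure-Python sketch. It is not the implementation in `src/lr_scheduler`, which this diff does not show. The function name, the linear warmup, and the per-epoch cosine update are all assumptions.

```python
import math


def cosine_annealing_lr(base_lr, t_max, steps_per_epoch, max_epoch,
                        warmup_epochs=0, eta_min=0.0):
    """Return one learning rate per training step (hypothetical helper).

    Assumptions: linear warmup over the first warmup_epochs, then a cosine
    curve that is updated once per epoch and decays from base_lr towards eta_min.
    """
    total_steps = steps_per_epoch * max_epoch
    warmup_steps = steps_per_epoch * warmup_epochs
    lrs = []
    for step in range(total_steps):
        if step < warmup_steps:
            # Linear ramp that reaches base_lr on the last warmup step.
            lrs.append(base_lr * (step + 1) / warmup_steps)
        else:
            epoch = step // steps_per_epoch
            cosine = (1.0 + math.cos(math.pi * epoch / t_max)) / 2.0
            lrs.append(eta_min + (base_lr - eta_min) * cosine)
    return lrs


lrs = cosine_annealing_lr(0.1, t_max=10, steps_per_epoch=5, max_epoch=10, warmup_epochs=1)
```

A flat list like this has the same shape as what `train()` wraps in `Tensor(lr_schedule)` before handing it to the optimizer: one value per step.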
| def train(cloud_args=None): | |||
| """training process""" | |||
| args = parse_args(cloud_args) | |||
| @@ -200,9 +219,18 @@ def train(cloud_args=None): | |||
| datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S')) | |||
| args.logger = get_logger(args.outputs_dir, args.rank) | |||
| if args.net == "densenet100": | |||
| from src.network.densenet import DenseNet100 as DenseNet | |||
| else: | |||
| from src.network.densenet import DenseNet121 as DenseNet | |||
| if args.dataset == "cifar10": | |||
| from src.datasets import classification_dataset_cifar10 as classification_dataset | |||
| else: | |||
| from src.datasets import classification_dataset_imagenet as classification_dataset | |||
| # dataloader | |||
| de_dataset = classification_dataset(args.data_dir, args.image_size, | |||
| args.per_batch_size, args.max_epoch, | |||
| de_dataset = classification_dataset(args.data_dir, args.image_size, args.per_batch_size, args.max_epoch, | |||
| args.rank, args.group_size) | |||
| de_dataset.map_model = 4 | |||
| args.steps_per_epoch = de_dataset.get_dataset_size() | |||
| @@ -212,12 +240,11 @@ def train(cloud_args=None): | |||
| # network | |||
| args.logger.important_info('start create network') | |||
| # get network and init | |||
| network = DenseNet121(args.num_classes) | |||
| network = DenseNet(args.num_classes) | |||
| # loss | |||
| if not args.label_smooth: | |||
| args.label_smooth_factor = 0.0 | |||
| criterion = CrossEntropy(smooth_factor=args.label_smooth_factor, | |||
| num_classes=args.num_classes) | |||
| criterion = CrossEntropy(smooth_factor=args.label_smooth_factor, num_classes=args.num_classes) | |||
| # load pretrain model | |||
| if os.path.isfile(args.pretrained): | |||
| @@ -234,30 +261,12 @@ def train(cloud_args=None): | |||
| args.logger.info('load model {} success'.format(args.pretrained)) | |||
| # lr scheduler | |||
| if args.lr_scheduler == 'exponential': | |||
| lr_scheduler = MultiStepLR(args.lr, | |||
| args.lr_epochs, | |||
| args.lr_gamma, | |||
| args.steps_per_epoch, | |||
| args.max_epoch, | |||
| warmup_epochs=args.warmup_epochs) | |||
| elif args.lr_scheduler == 'cosine_annealing': | |||
| lr_scheduler = CosineAnnealingLR(args.lr, | |||
| args.T_max, | |||
| args.steps_per_epoch, | |||
| args.max_epoch, | |||
| warmup_epochs=args.warmup_epochs, | |||
| eta_min=args.eta_min) | |||
| else: | |||
| raise NotImplementedError(args.lr_scheduler) | |||
| lr_scheduler = get_lr_scheduler(args) | |||
| lr_schedule = lr_scheduler.get_lr() | |||
| # optimizer | |||
| opt = Momentum(params=get_param_groups(network), | |||
| learning_rate=Tensor(lr_schedule), | |||
| momentum=args.momentum, | |||
| weight_decay=args.weight_decay, | |||
| loss_scale=args.loss_scale) | |||
| opt = Momentum(params=get_param_groups(network), learning_rate=Tensor(lr_schedule), | |||
| momentum=args.momentum, weight_decay=args.weight_decay, loss_scale=args.loss_scale) | |||
| # mixed precision training | |||
| criterion.add_flags_recursive(fp32=True) | |||
| @@ -280,6 +289,8 @@ def train(cloud_args=None): | |||
| model = Model(train_net, optimizer=opt, metrics=None, loss_scale_manager=loss_scale_manager, amp_level="O3") | |||
| elif args.device_target == 'GPU': | |||
| model = Model(train_net, optimizer=opt, metrics=None, loss_scale_manager=loss_scale_manager, amp_level="O0") | |||
| elif args.device_target == 'CPU': | |||
| model = Model(train_net, optimizer=opt, metrics=None, loss_scale_manager=loss_scale_manager, amp_level="O0") | |||
| else: | |||
| raise ValueError("Unsupported device target.") | |||
| @@ -290,8 +301,7 @@ def train(cloud_args=None): | |||
| ckpt_max_num = args.max_epoch * args.steps_per_epoch // args.ckpt_interval | |||
| ckpt_config = CheckpointConfig(save_checkpoint_steps=args.ckpt_interval, | |||
| keep_checkpoint_max=ckpt_max_num) | |||
| ckpt_cb = ModelCheckpoint(config=ckpt_config, | |||
| directory=args.outputs_dir, | |||
| ckpt_cb = ModelCheckpoint(config=ckpt_config, directory=args.outputs_dir, | |||
| prefix='{}'.format(args.rank)) | |||
| callbacks.append(ckpt_cb) | |||
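The `ckpt_max_num` expression can be worked through with small numbers. The values below are made up for illustration. The sketch assumes one checkpoint every `ckpt_interval` steps and does not count any extra checkpoint the callback may write when training ends.

```python
# Illustrative values, not taken from the repository's config.
max_epoch = 3
steps_per_epoch = 100
ckpt_interval = 40

total_steps = max_epoch * steps_per_epoch

# * and // share the same precedence and associate left to right,
# so this is (max_epoch * steps_per_epoch) // ckpt_interval.
ckpt_max_num = max_epoch * steps_per_epoch // ckpt_interval

# Steps at which an interval-based checkpoint would be due.
save_steps = list(range(ckpt_interval, total_steps + 1, ckpt_interval))
```

With these numbers `ckpt_max_num` equals the number of interval-based saves, so `keep_checkpoint_max` is large enough that none of them is rotated out.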