From: @Gogery Reviewed-by: @guoqi1024 Signed-off-by: @guoqi1024 Tag: v1.1.0
@@ -0,0 +1,234 @@ README.md
# Contents

- [Contents](#contents)
- [Xception Description](#xception-description)
- [Model architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision(Ascend)](#mixed-precisionascend)
- [Environment Requirements](#environment-requirements)
- [Script description](#script-description)
    - [Script and sample code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training process](#training-process)
        - [Usage](#usage)
        - [Launch](#launch)
        - [Result](#result)
    - [Eval process](#eval-process)
        - [Usage](#usage-1)
        - [Launch](#launch-1)
        - [Result](#result-1)
- [Model description](#model-description)
    - [Performance](#performance)
        - [Training Performance](#training-performance)
        - [Inference Performance](#inference-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [Xception Description](#contents)

Xception, proposed by Google, is an extreme version of Inception: by replacing the Inception modules with modified depthwise separable convolutions, it even outperforms Inception-v3. The paper was published in 2017.

[Paper](https://arxiv.org/pdf/1610.02357v3.pdf): François Chollet. "Xception: Deep Learning with Depthwise Separable Convolutions." 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.

# [Model architecture](#contents)

The overall network architecture of Xception is shown in the paper: [Link](https://arxiv.org/pdf/1610.02357v3.pdf)
# [Dataset](#contents)

The dataset used is described in the paper.

- Dataset size: 125G, 1250k color images in 1000 classes
    - Train: 120G, 1200k images
    - Test: 5G, 50k images
- Data format: RGB images
    - Note: Data will be processed in src/dataset.py (see the sketch below).
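
A minimal sketch of how the data pipeline in `src/dataset.py` is invoked (the dataset path is a placeholder):

```python
# Sketch only: the path below is a placeholder for your local ImageNet layout.
from src.dataset import create_dataset

# Training pipeline: decode + random crop/resize to 299x299, random flip,
# color jitter, normalize to roughly [-1, 1], HWC->CHW, then batch.
train_ds = create_dataset("/path/to/imagenet/train", do_train=True,
                          batch_size=128, device_num=1, rank=0)
print("steps per epoch:", train_ds.get_dataset_size())
```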
# [Features](#contents)

## [Mixed Precision(Ascend)](#contents)

The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the training of deep neural networks by using both single-precision and half-precision data formats, while maintaining the network accuracy achieved with single-precision training. Mixed precision training can accelerate computation, reduce memory usage, and enable larger models or batch sizes on specific hardware.

For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and then searching for "reduce precision".
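
As a minimal sketch of how this model enables mixed precision (mirroring the settings in `train.py` below; `net`, `loss`, `opt`, and `loss_scale` are constructed elsewhere in that script):

```python
# Sketch based on train.py in this PR; surrounding object construction elided.
from mindspore import dtype as mstype
from mindspore.train.model import Model

net.to_float(mstype.float16)                  # run most operators in FP16
model = Model(net, loss_fn=loss, optimizer=opt,
              loss_scale_manager=loss_scale,  # FixedLossScaleManager(1024, drop_overflow_update=False)
              amp_level='O3',                 # full FP16 training
              keep_batchnorm_fp32=True)       # BatchNorm stays in FP32 for stability
```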
# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare a hardware environment with Ascend processors. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get access to the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
# [Script description](#contents)

## [Script and sample code](#contents)

```shell
.
└─Xception
  ├─README.md
  ├─scripts
  │ ├─run_standalone_train.sh    # launch standalone training on Ascend (1p)
  │ ├─run_distribute_train.sh    # launch distributed training on Ascend (8p)
  │ └─run_eval.sh                # launch evaluation on Ascend
  ├─src
  │ ├─config.py                  # parameter configuration
  │ ├─dataset.py                 # data preprocessing
  │ ├─Xception.py                # network definition
  │ ├─loss.py                    # customized CrossEntropySmooth loss function
  │ └─lr_generator.py            # learning rate generator
  ├─train.py                     # train net
  └─eval.py                      # eval net
```
## [Script Parameters](#contents)

Major parameters in train.py and config.py are:

```python
'class_num': 1000                 # number of dataset classes
'batch_size': 128                 # input batch size
'loss_scale': 1024                # loss scale
'momentum': 0.9                   # momentum
'weight_decay': 1e-4              # weight decay
'epoch_size': 250                 # total number of epochs
'save_checkpoint': True           # whether to save checkpoints
'save_checkpoint_epochs': 1       # save a checkpoint every N epochs
'keep_checkpoint_max': 5          # maximum number of checkpoints to keep
'save_checkpoint_path': "./"      # checkpoint save path
'warmup_epochs': 1                # number of warmup epochs
'lr_decay_mode': "linear"         # lr decay mode: "steps", "poly", "cosine", or linear (default)
'use_label_smooth': True          # whether to use label smoothing
'finish_epoch': 0                 # number of already-finished epochs (for resuming)
'label_smooth_factor': 0.1        # label smoothing factor
'lr_init': 0.00004                # initial learning rate
'lr_max': 0.4                     # maximum learning rate
'lr_end': 0.00004                 # minimum learning rate
'weight_init': 'xavier_uniform'   # weight initialization mode
```
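
For a quick sanity check of these values, the schedule they produce can be inspected offline (a runnable sketch; `steps_per_epoch` is an assumption derived from 1200k training images at batch size 128):

```python
# Standalone sketch: inspect the learning-rate schedule built from config.py.
from src.config import config
from src.lr_generator import get_lr

steps_per_epoch = 1251  # assumed: ~1200k train images / batch_size 128
lr = get_lr(lr_init=config.lr_init, lr_end=config.lr_end, lr_max=config.lr_max,
            warmup_epochs=config.warmup_epochs, total_epochs=config.epoch_size,
            steps_per_epoch=steps_per_epoch, lr_decay_mode=config.lr_decay_mode)
print(len(lr), lr[0], lr.max(), lr[-1])  # 312750 steps; warmup from ~4e-5, peak 0.4, decay back to ~4e-5
```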
## [Training process](#contents)

### Usage

You can start training using python or shell scripts. The usage of the shell scripts is as follows:

- Ascend:

```shell
# distributed training example (8p)
sh scripts/run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
# standalone training
sh scripts/run_standalone_train.sh DEVICE_ID DATA_PATH
```

> Notes: For RANK_TABLE_FILE, refer to [Link](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/distributed_training_ascend.html), and the device_ip can be obtained as described in [Link](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
> The distributed script binds processor cores to each rank according to `device_num` and the total number of processor cores, as illustrated below. If you do not want this binding, remove the `taskset` invocations in `scripts/run_distribute_train.sh`.
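
For illustration (the core count here is an assumption), on a host with 96 logical cores and `RANK_SIZE=8`, each rank gets a 12-core slice; rank 3 would be bound as follows:

```shell
# Assumed host: 96 logical cores, RANK_SIZE=8 -> avg_core_per_rank=12.
cores=$(grep -c processor /proc/cpuinfo)      # e.g. 96
avg=$((cores / 8))                            # 12 cores per rank
start=$((3 * avg)); end=$((start + avg - 1))
echo "rank 3 runs under: taskset -c ${start}-${end}"   # -> taskset -c 36-47
```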
### Launch

```shell
# training example
python:
    Ascend:
        python train.py --device_target Ascend --dataset_path /dataset/train

shell:
    Ascend:
        # distributed training example (8p)
        sh scripts/run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
        # standalone training
        sh scripts/run_standalone_train.sh DEVICE_ID DATA_PATH
```
### Result

Training results will be stored in the example path. Checkpoints will be stored at `./model_0` by default, and the training log will be redirected to `log.txt`, which looks like the following:

```shell
epoch: [  0/250], step:[ 1250/ 1251], loss:[4.761/5.613], time:[529.305], lr:[0.400]
epoch time: 1128662.862, per step time: 902.209, avg loss: 5.609
epoch: [  1/250], step:[ 1250/ 1251], loss:[4.164/4.318], time:[503.708], lr:[0.398]
epoch time: 889163.081, per step time: 710.762, avg loss: 4.312
```
## [Eval process](#contents)

### Usage

You can start evaluation using python or shell scripts. The usage of the shell script is as follows:

```shell
sh scripts/run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
```
### Launch

```shell
# eval example
python:
    Ascend: python eval.py --device_target Ascend --checkpoint_path PATH_CHECKPOINT --dataset_path DATA_DIR

shell:
    Ascend: sh scripts/run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
```

> The checkpoint file is produced during the training process.
### Result

Evaluation results will be stored in the example path. You can find results like the following in `eval.log`:

```shell
result: {'Loss': 1.7797744848789312, 'Top_1_Acc': 0.7985777243589743, 'Top_5_Acc': 0.9485777243589744}
```
# [Model description](#contents)

## [Performance](#contents)

### Training Performance

| Parameters                 | Ascend                                         |
| -------------------------- | ---------------------------------------------- |
| Model Version              | Xception                                       |
| Resource                   | HUAWEI CLOUD ModelArts                         |
| Uploaded Date              | 11/15/2020                                     |
| MindSpore Version          | 1.0.0                                          |
| Dataset                    | 1200k images                                   |
| Batch_size                 | 128                                            |
| Training Parameters        | src/config.py                                  |
| Optimizer                  | Momentum                                       |
| Loss Function              | CrossEntropySmooth                             |
| Loss                       | 1.78                                           |
| Accuracy (8p)              | Top1[79.9%] Top5[94.9%]                        |
| Total time (8p)            | 63 h                                           |
| Params (M)                 | 180                                            |
| Scripts                    | [Xception script](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/cv/Xception) |
### Inference Performance

| Parameters          | Ascend                      |
| ------------------- | --------------------------- |
| Model Version       | Xception                    |
| Resource            | HUAWEI CLOUD ModelArts      |
| Uploaded Date       | 11/15/2020                  |
| MindSpore Version   | 1.0.0                       |
| Dataset             | 50k images                  |
| Batch_size          | 128                         |
| Accuracy            | Top1[79.9%] Top5[94.9%]     |
| Total time          | 3 min                       |
# [Description of Random Situation](#contents)

In `dataset.py`, we set the seed inside the `create_dataset` function. We also set a random seed in `train.py`.

# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
@@ -0,0 +1,63 @@ eval.py
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""eval Xception."""
import argparse

from mindspore import context, nn
from mindspore.train.model import Model
from mindspore.common import set_seed
from mindspore.train.serialization import load_checkpoint, load_param_into_net

from src.Xception import xception
from src.config import config
from src.dataset import create_dataset
from src.loss import CrossEntropySmooth

set_seed(1)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Image classification')
    parser.add_argument('--device_target', type=str, default='Ascend', help='Device target')
    parser.add_argument('--device_id', type=int, default=0, help='Device id')
    parser.add_argument('--checkpoint_path', type=str, default=None, help='Checkpoint file path')
    parser.add_argument('--dataset_path', type=str, default=None, help='Dataset path')
    args_opt = parser.parse_args()

    context.set_context(device_id=args_opt.device_id)
    context.set_context(mode=context.GRAPH_MODE, device_target=args_opt.device_target, save_graphs=False)

    # create dataset
    dataset = create_dataset(args_opt.dataset_path, do_train=False, batch_size=config.batch_size, device_num=1, rank=0)
    step_size = dataset.get_dataset_size()

    # define net
    net = xception(class_num=config.class_num)

    # load checkpoint
    param_dict = load_checkpoint(args_opt.checkpoint_path)
    load_param_into_net(net, param_dict)
    net.set_train(False)

    # define loss
    loss = CrossEntropySmooth(smooth_factor=config.label_smooth_factor, num_classes=config.class_num)

    # define model with evaluation metrics
    eval_metrics = {'Loss': nn.Loss(),
                    'Top_1_Acc': nn.Top1CategoricalAccuracy(),
                    'Top_5_Acc': nn.Top5CategoricalAccuracy()}
    model = Model(net, loss_fn=loss, metrics=eval_metrics)

    # eval model
    res = model.eval(dataset, dataset_sink_mode=False)
    print("result:", res, "ckpt=", args_opt.checkpoint_path)
@@ -0,0 +1,50 @@ scripts/run_distribute_train.sh
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
DATA_DIR=$2
export RANK_TABLE_FILE=$1
export RANK_SIZE=8

# Split the logical cores evenly across the ranks for taskset binding.
cores=`cat /proc/cpuinfo | grep "processor" | wc -l`
echo "the number of logical cores:" $cores
avg_core_per_rank=`expr $cores \/ $RANK_SIZE`
core_gap=`expr $avg_core_per_rank \- 1`
echo "avg_core_per_rank:" $avg_core_per_rank
echo "core_gap:" $core_gap

for((i=0;i<RANK_SIZE;i++))
do
    start=`expr $i \* $avg_core_per_rank`
    export DEVICE_ID=$i
    export RANK_ID=$i
    export DEPLOY_MODE=0
    export GE_USE_STATIC_MEMORY=1
    end=`expr $start \+ $core_gap`
    cmdopt=$start"-"$end

    rm -rf train_parallel$i
    mkdir ./train_parallel$i
    cp *.py ./train_parallel$i
    cd ./train_parallel$i || exit
    echo "start training for rank $i, device $DEVICE_ID"
    env > env.log
    taskset -c $cmdopt python ../train.py \
        --is_distributed \
        --device_target=Ascend \
        --dataset_path=$DATA_DIR > log.txt 2>&1 &
    cd ../
done
@@ -0,0 +1,25 @@ scripts/run_eval.sh
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
export DEVICE_ID=$1
DATA_DIR=$2
PATH_CHECKPOINT=$3

python ./eval.py \
    --device_target=Ascend \
    --device_id=$DEVICE_ID \
    --checkpoint_path=$PATH_CHECKPOINT \
    --dataset_path=$DATA_DIR > eval.log 2>&1 &
@@ -0,0 +1,22 @@ scripts/run_standalone_train.sh
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
export DEVICE_ID=$1
DATA_DIR=$2

python ./train.py \
    --device_target=Ascend \
    --dataset_path=$DATA_DIR > log.txt 2>&1 &
@@ -0,0 +1,186 @@ src/Xception.py
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""Xception."""
import mindspore.nn as nn
import mindspore.ops.operations as P

from src.config import config


class SeparableConv2d(nn.Cell):
    """Depthwise convolution followed by a 1x1 pointwise convolution."""

    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, padding=0):
        super(SeparableConv2d, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, in_channels, kernel_size, stride, group=in_channels, pad_mode='pad',
                               padding=padding, weight_init=config.weight_init)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, pad_mode='valid',
                                   weight_init=config.weight_init)

    def construct(self, x):
        x = self.conv1(x)
        x = self.pointwise(x)
        return x


class Block(nn.Cell):
    """Xception block: a stack of separable convolutions with a residual skip connection."""

    def __init__(self, in_filters, out_filters, reps, strides=1, start_with_relu=True, grow_first=True):
        super(Block, self).__init__()
        if out_filters != in_filters or strides != 1:
            self.skip = nn.Conv2d(in_filters, out_filters, 1, stride=strides, pad_mode='valid', has_bias=False,
                                  weight_init=config.weight_init)
            self.skipbn = nn.BatchNorm2d(out_filters, momentum=0.9)
        else:
            self.skip = None
        self.relu = nn.ReLU()
        rep = []
        filters = in_filters
        if grow_first:
            rep.append(nn.ReLU())
            rep.append(SeparableConv2d(in_filters, out_filters, kernel_size=3, stride=1, padding=1))
            rep.append(nn.BatchNorm2d(out_filters, momentum=0.9))
            filters = out_filters
        for _ in range(reps - 1):
            rep.append(nn.ReLU())
            rep.append(SeparableConv2d(filters, filters, kernel_size=3, stride=1, padding=1))
            rep.append(nn.BatchNorm2d(filters, momentum=0.9))
        if not grow_first:
            rep.append(nn.ReLU())
            rep.append(SeparableConv2d(in_filters, out_filters, kernel_size=3, stride=1, padding=1))
            rep.append(nn.BatchNorm2d(out_filters, momentum=0.9))
        if not start_with_relu:
            rep = rep[1:]
        else:
            rep[0] = nn.ReLU()
        if strides != 1:
            rep.append(nn.MaxPool2d(3, strides, pad_mode="same"))
        self.rep = nn.SequentialCell(*rep)
        self.add = P.TensorAdd()

    def construct(self, inp):
        x = self.rep(inp)
        if self.skip is not None:
            skip = self.skip(inp)
            skip = self.skipbn(skip)
        else:
            skip = inp
        x = self.add(x, skip)
        return x


class Xception(nn.Cell):
    """
    Xception optimized for the ImageNet dataset, as specified in
    https://arxiv.org/abs/1610.02357
    """

    def __init__(self, num_classes=1000):
        """ Constructor
        Args:
            num_classes: number of classes.
        """
        super(Xception, self).__init__()
        self.num_classes = num_classes
        self.conv1 = nn.Conv2d(3, 32, 3, 2, pad_mode='valid', weight_init=config.weight_init)
        self.bn1 = nn.BatchNorm2d(32, momentum=0.9)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(32, 64, 3, pad_mode='valid', weight_init=config.weight_init)
        self.bn2 = nn.BatchNorm2d(64, momentum=0.9)

        # Entry flow
        self.block1 = Block(64, 128, 2, 2, start_with_relu=False, grow_first=True)
        self.block2 = Block(128, 256, 2, 2, start_with_relu=True, grow_first=True)
        self.block3 = Block(256, 728, 2, 2, start_with_relu=True, grow_first=True)

        # Middle flow
        self.block4 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block5 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block6 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block7 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block8 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block9 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block10 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)
        self.block11 = Block(728, 728, 3, 1, start_with_relu=True, grow_first=True)

        # Exit flow
        self.block12 = Block(728, 1024, 2, 2, start_with_relu=True, grow_first=False)
        self.conv3 = SeparableConv2d(1024, 1536, 3, 1, 1)
        self.bn3 = nn.BatchNorm2d(1536, momentum=0.9)
        self.conv4 = SeparableConv2d(1536, 2048, 3, 1, 1)
        self.bn4 = nn.BatchNorm2d(2048, momentum=0.9)
        self.avg_pool = nn.AvgPool2d(10)
        self.dropout = nn.Dropout()
        self.fc = nn.Dense(2048, num_classes)

    def construct(self, x):
        shape = P.Shape()
        reshape = P.Reshape()
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)
        x = self.block6(x)
        x = self.block7(x)
        x = self.block8(x)
        x = self.block9(x)
        x = self.block10(x)
        x = self.block11(x)
        x = self.block12(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.relu(x)
        x = self.conv4(x)
        x = self.bn4(x)
        x = self.relu(x)
        x = self.avg_pool(x)
        x = self.dropout(x)
        x = reshape(x, (shape(x)[0], -1))
        x = self.fc(x)
        return x


def xception(class_num=1000):
    """
    Get Xception neural network.

    Args:
        class_num (int): Class number.

    Returns:
        Cell, cell instance of Xception neural network.

    Examples:
        >>> net = xception(1000)
    """
    return Xception(class_num)
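
A minimal smoke-test sketch for the network above (not part of the original scripts; MindSpore installation is assumed, and `PYNATIVE_MODE` on CPU is chosen only for illustration):

```python
import numpy as np
from mindspore import context, Tensor
import mindspore.common.dtype as mstype
from src.Xception import xception

context.set_context(mode=context.PYNATIVE_MODE, device_target="CPU")  # assumed for a quick local check
net = xception(class_num=1000)
net.set_train(False)
x = Tensor(np.random.randn(1, 3, 299, 299), mstype.float32)  # one 299x299 RGB image
print(net(x).shape)  # expected: (1, 1000)
```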
@@ -0,0 +1,41 @@ src/config.py
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
network config setting, will be used in train.py and eval.py
"""
from easydict import EasyDict as ed

# config for Xception, ImageNet-2012.
config = ed({
    "class_num": 1000,
    "batch_size": 128,
    "loss_scale": 1024,
    "momentum": 0.9,
    "weight_decay": 1e-4,
    "epoch_size": 250,
    "save_checkpoint": True,
    "save_checkpoint_epochs": 1,
    "keep_checkpoint_max": 5,
    "save_checkpoint_path": "./",
    "warmup_epochs": 1,
    "lr_decay_mode": "linear",  # any value other than "steps"/"poly"/"cosine" selects the linear schedule
    "use_label_smooth": True,
    "finish_epoch": 0,
    "label_smooth_factor": 0.1,
    "lr_init": 0.00004,
    "lr_max": 0.4,
    "lr_end": 0.00004,
    "weight_init": 'xavier_uniform'
})
@@ -0,0 +1,66 @@ src/dataset.py
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
Data operations, will be used in train.py and eval.py
"""
import mindspore.common.dtype as mstype
import mindspore.dataset.engine as de
import mindspore.dataset.transforms.c_transforms as C2
import mindspore.dataset.vision.c_transforms as C


def create_dataset(dataset_path, do_train, batch_size=16, device_num=1, rank=0):
    """
    Create a train or eval dataset.

    Args:
        dataset_path (string): the path of dataset.
        do_train (bool): whether dataset is used for train or eval.
        batch_size (int): the batch size of dataset. Default: 16.
        device_num (int): number of shards that the dataset should be divided into. Default: 1.
        rank (int): the shard ID within num_shards. Default: 0.

    Returns:
        dataset
    """
    if device_num == 1:
        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True)
    else:
        ds = de.ImageFolderDataset(dataset_path, num_parallel_workers=8, shuffle=True,
                                   num_shards=device_num, shard_id=rank)

    # define map operations
    if do_train:
        trans = [
            C.RandomCropDecodeResize(299),
            C.RandomHorizontalFlip(prob=0.5),
            C.RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4)
        ]
    else:
        trans = [
            C.Decode(),
            C.Resize(320),
            C.CenterCrop(299)
        ]
    trans += [
        C.Normalize(mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5]),
        C.HWC2CHW(),
        C2.TypeCast(mstype.float32)
    ]
    type_cast_op = C2.TypeCast(mstype.int32)
    ds = ds.map(input_columns="image", operations=trans, num_parallel_workers=8)
    ds = ds.map(input_columns="label", operations=type_cast_op, num_parallel_workers=8)

    # apply batch operations
    ds = ds.batch(batch_size, drop_remainder=True)
    return ds
@@ -0,0 +1,38 @@ src/loss.py
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""define loss function for network"""
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.common import dtype as mstype
from mindspore.nn.loss.loss import _Loss
from mindspore.ops import functional as F
from mindspore.ops import operations as P


class CrossEntropySmooth(_Loss):
    """CrossEntropy with label smoothing."""

    def __init__(self, sparse=True, reduction='mean', smooth_factor=0., num_classes=1000):
        super(CrossEntropySmooth, self).__init__()
        self.onehot = P.OneHot()
        self.sparse = sparse
        self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
        self.off_value = Tensor(1.0 * smooth_factor / (num_classes - 1), mstype.float32)
        self.ce = nn.SoftmaxCrossEntropyWithLogits(reduction=reduction)

    def construct(self, logit, label):
        if self.sparse:
            label = self.onehot(label, F.shape(logit)[1], self.on_value, self.off_value)
        loss = self.ce(logit, label)
        return loss
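
As a quick numeric check of the smoothing above (plain Python; the values follow directly from the constructor): with `smooth_factor=0.1` and `num_classes=1000`, the one-hot target puts 0.9 on the true class and about 1e-4 on each other class, still summing to 1.

```python
# Standalone arithmetic check of CrossEntropySmooth's on/off values.
smooth_factor, num_classes = 0.1, 1000
on_value = 1.0 - smooth_factor                 # weight of the true class: 0.9
off_value = smooth_factor / (num_classes - 1)  # weight of each other class: ~0.0001
print(on_value, off_value, on_value + off_value * (num_classes - 1))  # 0.9 0.000100... 1.0
```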
@@ -0,0 +1,87 @@ src/lr_generator.py
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""learning rate generator"""
import math

import numpy as np


def get_lr(lr_init, lr_end, lr_max, warmup_epochs, total_epochs, steps_per_epoch, lr_decay_mode):
    """
    Generate the per-step learning rate array.

    Args:
        lr_init (float): initial learning rate
        lr_end (float): end learning rate
        lr_max (float): max learning rate
        warmup_epochs (int): number of warmup epochs
        total_epochs (int): total number of training epochs
        steps_per_epoch (int): steps of one epoch
        lr_decay_mode (string): learning rate decay mode: 'steps', 'poly', 'cosine',
            or any other value for the default linear schedule

    Returns:
        np.array, learning rate array
    """
    lr_each_step = []
    total_steps = steps_per_epoch * total_epochs
    warmup_steps = steps_per_epoch * warmup_epochs
    if lr_decay_mode == 'steps':
        # piecewise-constant decay at 30%, 60% and 80% of training
        decay_epoch_index = [0.3 * total_steps, 0.6 * total_steps, 0.8 * total_steps]
        for i in range(total_steps):
            if i < decay_epoch_index[0]:
                lr = lr_max
            elif i < decay_epoch_index[1]:
                lr = lr_max * 0.1
            elif i < decay_epoch_index[2]:
                lr = lr_max * 0.01
            else:
                lr = lr_max * 0.001
            lr_each_step.append(lr)
    elif lr_decay_mode == 'poly':
        if warmup_steps != 0:
            inc_each_step = (float(lr_max) - float(lr_init)) / float(warmup_steps)
        else:
            inc_each_step = 0
        for i in range(total_steps):
            if i < warmup_steps:
                lr = float(lr_init) + inc_each_step * float(i)
            else:
                base = (1.0 - (float(i) - float(warmup_steps)) / (float(total_steps) - float(warmup_steps)))
                lr = float(lr_max) * base * base
                if lr < 0.0:
                    lr = 0.0
            lr_each_step.append(lr)
    elif lr_decay_mode == 'cosine':
        decay_steps = total_steps - warmup_steps
        for i in range(total_steps):
            if i < warmup_steps:
                lr_inc = (float(lr_max) - float(lr_init)) / float(warmup_steps)
                lr = float(lr_init) + lr_inc * (i + 1)
            else:
                linear_decay = (total_steps - i) / decay_steps
                cosine_decay = 0.5 * (1 + math.cos(math.pi * 2 * 0.47 * i / decay_steps))
                decayed = linear_decay * cosine_decay + 0.00001
                lr = lr_max * decayed
            lr_each_step.append(lr)
    else:
        # default: linear warmup followed by linear decay
        for i in range(total_steps):
            if i < warmup_steps:
                lr = lr_init + (lr_max - lr_init) * i / warmup_steps
            else:
                lr = lr_max - (lr_max - lr_end) * (i - warmup_steps) / (total_steps - warmup_steps)
            lr_each_step.append(lr)
    lr_each_step = np.array(lr_each_step).astype(np.float32)
    return lr_each_step
@@ -0,0 +1,173 @@ train.py
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""train Xception."""
import os
import time
import argparse

import numpy as np
from mindspore import context
from mindspore import Tensor
from mindspore.nn.optim.momentum import Momentum
from mindspore.train.model import Model, ParallelMode
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, Callback
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.communication.management import init, get_rank, get_group_size
from mindspore.train.loss_scale_manager import FixedLossScaleManager
from mindspore.common import dtype as mstype
from mindspore.common import set_seed

from src.lr_generator import get_lr
from src.Xception import xception
from src.config import config
from src.dataset import create_dataset
from src.loss import CrossEntropySmooth

set_seed(1)


class Monitor(Callback):
    """
    Monitor loss and time.

    Args:
        lr_init (numpy array): train lr

    Returns:
        None

    Examples:
        >>> Monitor(lr_init=Tensor([0.05]*100).asnumpy())
    """

    def __init__(self, lr_init=None):
        super(Monitor, self).__init__()
        self.lr_init = lr_init
        self.lr_init_len = len(lr_init)

    def epoch_begin(self, run_context):
        self.losses = []
        self.epoch_time = time.time()

    def epoch_end(self, run_context):
        cb_params = run_context.original_args()
        epoch_mseconds = (time.time() - self.epoch_time) * 1000
        per_step_mseconds = epoch_mseconds / cb_params.batch_num
        print("epoch time: {:5.3f}, per step time: {:5.3f}, avg loss: {:5.3f}".format(epoch_mseconds,
                                                                                      per_step_mseconds,
                                                                                      np.mean(self.losses)))

    def step_begin(self, run_context):
        self.step_time = time.time()

    def step_end(self, run_context):
        cb_params = run_context.original_args()
        step_mseconds = (time.time() - self.step_time) * 1000
        step_loss = cb_params.net_outputs

        if isinstance(step_loss, (tuple, list)) and isinstance(step_loss[0], Tensor):
            step_loss = step_loss[0]
        if isinstance(step_loss, Tensor):
            step_loss = np.mean(step_loss.asnumpy())

        self.losses.append(step_loss)
        cur_step_in_epoch = (cb_params.cur_step_num - 1) % cb_params.batch_num

        print("epoch: [{:3d}/{:3d}], step:[{:5d}/{:5d}], loss:[{:5.3f}/{:5.3f}], time:[{:5.3f}], lr:[{:5.3f}]".format(
            cb_params.cur_epoch_num - 1 + config.finish_epoch, cb_params.epoch_num + config.finish_epoch,
            cur_step_in_epoch, cb_params.batch_num, step_loss,
            np.mean(self.losses), step_mseconds, self.lr_init[cb_params.cur_step_num - 1]))
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='image classification training')
    parser.add_argument('--is_distributed', action='store_true', default=False, help='distributed training')
    parser.add_argument('--device_target', type=str, default='Ascend', help='run platform')
    parser.add_argument('--dataset_path', type=str, default=None, help='dataset path')
    parser.add_argument('--resume', type=str, default='', help='resume training with existed checkpoint')
    args_opt = parser.parse_args()

    # init distributed
    if args_opt.is_distributed:
        if os.getenv('DEVICE_ID', "not_set").isdigit():
            context.set_context(device_id=int(os.getenv('DEVICE_ID')))
        init()
        rank = get_rank()
        group_size = get_group_size()
        parallel_mode = ParallelMode.DATA_PARALLEL
        context.set_auto_parallel_context(parallel_mode=parallel_mode, device_num=group_size, gradients_mean=True)
    else:
        rank = 0
        group_size = 1
        context.set_context(device_id=0)

    if args_opt.device_target == "Ascend":
        # train on Ascend
        context.set_context(mode=context.GRAPH_MODE, device_target='Ascend', save_graphs=False)

        # define network
        net = xception(class_num=config.class_num)
        net.to_float(mstype.float16)

        # define loss
        if not config.use_label_smooth:
            config.label_smooth_factor = 0.0
        loss = CrossEntropySmooth(smooth_factor=config.label_smooth_factor, num_classes=config.class_num)

        # define dataset
        dataset = create_dataset(args_opt.dataset_path, do_train=True, batch_size=config.batch_size,
                                 device_num=group_size, rank=rank)
        step_size = dataset.get_dataset_size()

        # resume
        if args_opt.resume:
            ckpt = load_checkpoint(args_opt.resume)
            load_param_into_net(net, ckpt)

        # get learning rate
        loss_scale = FixedLossScaleManager(config.loss_scale, drop_overflow_update=False)
        lr = Tensor(get_lr(lr_init=config.lr_init,
                           lr_end=config.lr_end,
                           lr_max=config.lr_max,
                           warmup_epochs=config.warmup_epochs,
                           total_epochs=config.epoch_size,
                           steps_per_epoch=step_size,
                           lr_decay_mode=config.lr_decay_mode))

        # define optimizer
        opt = Momentum(net.trainable_params(), lr, config.momentum, config.weight_decay, config.loss_scale)

        # define model
        model = Model(net, loss_fn=loss, optimizer=opt, loss_scale_manager=loss_scale, metrics={'acc'},
                      amp_level='O3', keep_batchnorm_fp32=True)

        # define callbacks
        cb = [Monitor(lr_init=lr.asnumpy())]
        if config.save_checkpoint:
            save_ckpt_path = os.path.join(config.save_checkpoint_path, 'model_' + str(rank) + '/')
            config_ck = CheckpointConfig(save_checkpoint_steps=config.save_checkpoint_epochs * step_size,
                                         keep_checkpoint_max=config.keep_checkpoint_max)
            ckpt_cb = ModelCheckpoint(f"Xception-rank{rank}", directory=save_ckpt_path, config=config_ck)
            # only rank 0 saves checkpoints in distributed training
            if not args_opt.is_distributed or rank == 0:
                cb += [ckpt_cb]

        # begin training
        model.train(config.epoch_size - config.finish_epoch, dataset, callbacks=cb, dataset_sink_mode=False)
        print("train success")
    else:
        raise ValueError("Unsupported device_target.")
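
To resume an interrupted run, `train.py` accepts a `--resume` checkpoint, and `finish_epoch` in `src/config.py` controls how many epochs are treated as already done. A sketch (the checkpoint file name below is a placeholder):

```shell
# Set "finish_epoch" in src/config.py to the number of completed epochs first,
# so the remaining epoch count and the printed epoch indices line up.
python train.py --device_target Ascend \
    --dataset_path /dataset/train \
    --resume ./model_0/Xception-rank0-125_1251.ckpt   # placeholder checkpoint name
```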