@@ -0,0 +1,272 @@
# Contents

- [DenseNet121 Description](#densenet121-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
        - [Distributed Training](#distributed-training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Training accuracy results](#training-accuracy-results)
        - [Training performance results](#training-performance-results)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
# [DenseNet121 Description](#contents)

DenseNet121 is a convolutional neural network for the task of image classification. The paper describing the model can be found [here](https://arxiv.org/abs/1608.06993). Huawei's DenseNet121 is an implementation on [MindSpore](https://www.mindspore.cn/).

The repository also contains scripts to launch training and inference routines.

# [Model Architecture](#contents)

DenseNet121 is built from 4 densely connected blocks. Within every dense block, each layer obtains additional inputs from all preceding layers and passes its own feature maps on to all subsequent layers. The feature maps are combined by concatenation, so each layer receives the "collective knowledge" of all preceding layers.
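The sketch below illustrates this connectivity pattern with a single, simplified dense layer. `_DenseLayerSketch` and `growth_rate` are illustrative names, not the repository's exact implementation; the real network lives in `src/network/densenet.py`.

```python
import mindspore.nn as nn
from mindspore.ops import operations as P

class _DenseLayerSketch(nn.Cell):
    """One conceptual dense layer: BN -> ReLU -> Conv, then concatenate."""
    def __init__(self, in_channels, growth_rate):
        super(_DenseLayerSketch, self).__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU()
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3,
                              pad_mode='pad', padding=1)
        self.concat = P.Concat(axis=1)

    def construct(self, x):
        new_features = self.conv(self.relu(self.bn(x)))
        # pass both the input and the new feature maps to the next layer
        return self.concat((x, new_features))
```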
# [Dataset](#contents)

Dataset used: ImageNet

The default dataset configuration is as follows:

- Training dataset preprocessing:
    - Input size of images is 224\*224
    - Range (min, max) of the area fraction of the original image to be cropped is (0.08, 1.0)
    - Range (min, max) of the aspect ratio to be cropped is (0.75, 1.333)
    - Probability of the image being flipped is set to 0.5
    - Randomly adjust brightness, contrast, and saturation (0.4, 0.4, 0.4)
    - Normalize the input image with respect to the mean and standard deviation
- Test dataset preprocessing:
    - Input size of images is 224\*224 (resize to 256\*256, then crop images at the center)
    - Normalize the input image with respect to the mean and standard deviation
# [Features](#contents)

## Mixed Precision

The [mixed precision](https://www.mindspore.cn/tutorial/zh-CN/master/advanced_use/mixed_precision.html) training method accelerates the training process of a deep neural network by using both the single-precision and half-precision data formats, while maintaining the accuracy achieved with pure single-precision training. Mixed precision training accelerates computation, reduces memory usage, and enables a larger model or batch size to be trained on specific hardware.

For FP16 operators, if the input data type is FP32, the MindSpore backend will automatically handle it with reduced precision. Users can check the reduced-precision operators by enabling the INFO log and then searching for "reduce precision".
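As a minimal sketch of how this repository applies FP16 (mirroring `network.add_flags_recursive(fp16=True)` in `eval.py` below; assumes you run from the densenet121 directory so that `src` is importable):

```python
from mindspore import context
from src.network import DenseNet121

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")
network = DenseNet121(1000)
# cast the whole network so supported operators run in FP16, as eval.py does
network.add_flags_recursive(fp16=True)
```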
# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare a hardware environment with Ascend AI processors. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get access to the resources.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore tutorials](https://www.mindspore.cn/tutorial/zh-CN/master/index.html)
    - [MindSpore API](https://www.mindspore.cn/api/zh-CN/master/index.html)
# [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

```
# run training example
python train.py --data_dir /PATH/TO/DATASET --is_distributed 0 > train.log 2>&1 &

# run distributed training example
sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET

# run evaluation example
python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 &
OR
sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT
```

For distributed training, an HCCL configuration file in JSON format needs to be created in advance.
Please follow the instructions in the link below:
https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.
# [Script Description](#contents)

## [Script and Sample Code](#contents)

```
├── model_zoo
    ├── README.md                         // descriptions about all the models
    ├── densenet121
        ├── README.md                     // descriptions about densenet121
        ├── scripts
        │   ├── run_distribute_train.sh   // shell script for distributed training on Ascend
        │   ├── run_distribute_eval.sh    // shell script for evaluation on Ascend
        ├── src
        │   ├── datasets                  // dataset processing functions
        │   ├── losses
        │   │   ├── crossentropy.py       // densenet loss function
        │   ├── lr_scheduler
        │   │   ├── lr_scheduler.py       // densenet learning rate schedule functions
        │   ├── network
        │   │   ├── densenet.py           // densenet architecture
        │   ├── optimizers                // densenet optimizer functions
        │   ├── utils
        │   │   ├── logging.py            // logging function
        │   │   ├── var_init.py           // densenet variable init function
        │   ├── config.py                 // network config
        ├── train.py                      // training script
        ├── eval.py                       // evaluation script
```
## [Script Parameters](#contents)

You can modify the training behaviour through the various flags in the `train.py` script. Flags in the `train.py` script are as follows:

```
--data_dir              train data dir
--num_classes           num of classes in dataset (default: 1000)
--image_size            image size of the dataset
--per_batch_size        mini-batch size (default: 256) per device
--pretrained            path of pretrained model
--lr_scheduler          type of LR schedule: exponential, cosine_annealing
--lr                    initial learning rate
--lr_epochs             epoch milestones for lr changing
--lr_gamma              decrease lr by a factor of exponential lr_scheduler
--eta_min               eta_min in cosine_annealing scheduler
--T_max                 T_max in cosine_annealing scheduler
--max_epoch             max num of epochs to train the model
--warmup_epochs         warmup epochs (when batch size is large)
--weight_decay          weight decay (default: 1e-4)
--momentum              momentum (default: 0.9)
--label_smooth          whether to use label smoothing in CE
--label_smooth_factor   smoothing strength of the original one-hot labels
--log_interval          logging interval (default: 100)
--ckpt_path             path to save checkpoints
--ckpt_interval         the interval to save checkpoints
--is_save_on_master     save checkpoint on master rank or all ranks
--is_distributed        whether to run on multiple devices (default: 1)
--rank                  local rank in distributed training (default: 0)
--group_size            world size of distributed training (default: 1)
```
## [Training Process](#contents)

### Training

- running on Ascend

```
python train.py --data_dir /PATH/TO/DATASET --is_distributed 0 > train.log 2>&1 &
```

The python command above runs in the background. The log and model checkpoints will be generated in `output/202x-xx-xx_time_xx_xx_xx/`. The loss values will be logged as follows:

```
2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec
2020-08-22 16:58:56,619:INFO:local passed
2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec
2020-08-22 17:02:19,921:INFO:local passed
2020-08-22 17:05:43,112:INFO:epoch[2], iter[15011], loss:3.096, mean_fps:6304.53 imgs/sec
2020-08-22 17:05:43,113:INFO:local passed
...
```
### Distributed Training

- running on Ascend

```
sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET
```

The above shell script runs distributed training in the background. You can view the log and model checkpoints under `train[X]/output/202x-xx-xx_time_xx_xx_xx/`. The loss values will be logged as follows:

```
2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec
2020-08-22 17:02:19,188:INFO:epoch[1], iter[10007], loss:3.18, mean_fps:6260.18 imgs/sec
2020-08-22 17:05:42,490:INFO:epoch[2], iter[15011], loss:2.621, mean_fps:6301.11 imgs/sec
2020-08-22 17:09:05,686:INFO:epoch[3], iter[20015], loss:3.113, mean_fps:6304.37 imgs/sec
2020-08-22 17:12:28,925:INFO:epoch[4], iter[25019], loss:3.29, mean_fps:6303.07 imgs/sec
2020-08-22 17:15:52,167:INFO:epoch[5], iter[30023], loss:2.865, mean_fps:6302.98 imgs/sec
...
```
## [Evaluation Process](#contents)

### Evaluation

- evaluation on Ascend

Run the command below for evaluation:

```
python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 &
OR
sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT
```

The above python command runs in the background. You can view the results in the file "output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log". The accuracy on the test dataset will be reported as follows:

```
2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43%
2020-08-24 09:21:50,551:INFO:after allreduce eval: top5_correct=46224, tot=49920, acc=92.60%
```
# [Model Description](#contents)

## [Performance](#contents)

### Training accuracy results

| Parameters          | DenseNet                    |
| ------------------- | --------------------------- |
| Model Version       | DenseNet121                 |
| Resource            | Ascend 910                  |
| Uploaded Date       | 09/15/2020 (month/day/year) |
| MindSpore Version   | 1.0.0                       |
| Dataset             | ImageNet                    |
| epochs              | 120                         |
| outputs             | probability                 |
| train performance   | Top1: 75.13%; Top5: 92.57%  |

### Training performance results

| Parameters          | DenseNet                        |
| ------------------- | ------------------------------- |
| Model Version       | DenseNet121                     |
| Resource            | Ascend 910                      |
| Uploaded Date       | 09/15/2020 (month/day/year)     |
| MindSpore Version   | 1.0.0                           |
| Dataset             | ImageNet                        |
| batch_size          | 32                              |
| outputs             | probability                     |
| speed               | 1pc: 760 img/s; 8pc: 6000 img/s |
# [Description of Random Situation](#contents)

In dataset.py, we set the seed inside the `create_dataset` function. We also use random seed in train.py.
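A minimal sketch of this kind of seeding (the exact call sites live in the dataset and training scripts; the seed value here is illustrative):

```python
import numpy as np
import mindspore.dataset as de

de.config.set_seed(1)  # make dataset shuffling reproducible
np.random.seed(1)      # seed numpy randomness, e.g. the DistributedSampler
```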
# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).
@@ -0,0 +1,244 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
##############test densenet example#################
python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT
"""
import os
import argparse
import datetime
import glob
import numpy as np

from mindspore import context
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.communication.management import init, get_rank, get_group_size, release
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore.common import dtype as mstype

from src.utils.logging import get_logger
from src.datasets import classification_dataset
from src.network import DenseNet121
from src.config import config

devid = int(os.getenv('DEVICE_ID'))
context.set_context(mode=context.GRAPH_MODE, device_target="Davinci",
                    save_graphs=True, device_id=devid)
class ParameterReduce(nn.Cell):
    """
    reduce parameter
    """
    def __init__(self):
        super(ParameterReduce, self).__init__()
        self.cast = P.Cast()
        self.reduce = P.AllReduce()

    def construct(self, x):
        # multiply by a float32 one so the value is cast to float32
        # before it is summed across devices with AllReduce
        one = self.cast(F.scalar_to_array(1.0), mstype.float32)
        out = x * one
        ret = self.reduce(out)
        return ret
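
# A hypothetical usage sketch (variable names are illustrative; assumes the
# communication group has been initialized with init()):
#
#     reduce_sum = ParameterReduce()
#     local_correct = Tensor(128.0, mstype.float32)  # e.g. correct predictions on this rank
#     total_correct = reduce_sum(local_correct)      # AllReduce sum over the group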
def parse_args(cloud_args=None):
    """
    parse args
    """
    parser = argparse.ArgumentParser('mindspore classification test')

    # dataset related
    parser.add_argument('--data_dir', type=str, default='', help='eval data dir')
    parser.add_argument('--num_classes', type=int, default=1000, help='num of classes in dataset')
    parser.add_argument('--image_size', type=str, default='224,224', help='image size of the dataset')
    # network related
    parser.add_argument('--backbone', default='resnet50', help='backbone')
    parser.add_argument('--pretrained', default='', type=str, help='full path of the pretrained model to load. '
                        'If it is a directory, every ckpt inside it will be tested')
    # logging related
    parser.add_argument('--log_path', type=str, default='outputs/', help='path to save log')
    parser.add_argument('--is_distributed', type=int, default=1, help='if multi device')
    parser.add_argument('--rank', type=int, default=0, help='local rank of distributed')
    parser.add_argument('--group_size', type=int, default=1, help='world size of distributed')
    # roma obs
    parser.add_argument('--train_url', type=str, default="", help='train url')

    args, _ = parser.parse_known_args()
    args = merge_args(args, cloud_args)
    args.per_batch_size = config.per_batch_size
    args.image_size = list(map(int, args.image_size.split(',')))
    return args
def get_top5_acc(top5_arg, gt_class):
    """count how many ground-truth labels appear in the top-5 predictions"""
    sub_count = 0
    for top5, gt in zip(top5_arg, gt_class):
        if gt in top5:
            sub_count += 1
    return sub_count
def merge_args(args, cloud_args):
    """
    merge args and cloud_args
    """
    args_dict = vars(args)
    if isinstance(cloud_args, dict):
        for key in cloud_args.keys():
            val = cloud_args[key]
            if key in args_dict and val:
                arg_type = type(args_dict[key])
                if arg_type is not type(None):
                    val = arg_type(val)
                args_dict[key] = val
    return args
def test(cloud_args=None):
    """
    network eval function. Get top1 and top5 accuracy for classification.
    The result will be saved at [./outputs] by default.
    """
    args = parse_args(cloud_args)

    # init distributed
    if args.is_distributed:
        init()
        args.rank = get_rank()
        args.group_size = get_group_size()

    args.outputs_dir = os.path.join(args.log_path,
                                    datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S'))
    args.logger = get_logger(args.outputs_dir, args.rank)
    args.logger.save_args(args)

    # network
    args.logger.important_info('start create network')
    if os.path.isdir(args.pretrained):
        models = list(glob.glob(os.path.join(args.pretrained, '*.ckpt')))
        # sort the checkpoints by the number embedded in the file name, newest first
        f = lambda x: -1 * int(os.path.splitext(os.path.split(x)[-1])[0].split('-')[-1].split('_')[0])
        args.models = sorted(models, key=f)
    else:
        args.models = [args.pretrained,]
    for model in args.models:
        de_dataset = classification_dataset(args.data_dir, image_size=args.image_size,
                                            per_batch_size=args.per_batch_size,
                                            max_epoch=1, rank=args.rank, group_size=args.group_size,
                                            mode='eval')
        eval_dataloader = de_dataset.create_tuple_iterator()
        network = DenseNet121(args.num_classes)

        param_dict = load_checkpoint(model)
        param_dict_new = {}
        for key, values in param_dict.items():
            if key.startswith('moments.'):
                continue
            elif key.startswith('network.'):
                param_dict_new[key[8:]] = values
            else:
                param_dict_new[key] = values
        load_param_into_net(network, param_dict_new)
        args.logger.info('load model {} success'.format(model))
        network.add_flags_recursive(fp16=True)

        img_tot = 0
        top1_correct = 0
        top5_correct = 0
        network.set_train(False)
        for data, gt_classes in eval_dataloader:
            output = network(Tensor(data, mstype.float32))
            output = output.asnumpy()
            gt_classes = gt_classes.asnumpy()

            top1_output = np.argmax(output, (-1))
            top5_output = np.argsort(output)[:, -5:]

            t1_correct = np.equal(top1_output, gt_classes).sum()
            top1_correct += t1_correct
            top5_correct += get_top5_acc(top5_output, gt_classes)
            img_tot += args.per_batch_size

        results = [[top1_correct], [top5_correct], [img_tot]]
        args.logger.info('before results={}'.format(results))
        if args.is_distributed:
            # file-based synchronization: every rank dumps its partial counts
            # to disk, waits until all ranks have done so, then sums them up
            model_md5 = model.replace('/', '')
            tmp_dir = '../cache'
            if not os.path.exists(tmp_dir):
                os.mkdir(tmp_dir)
            top1_correct_npy = '{}/top1_rank_{}_{}.npy'.format(tmp_dir, args.rank, model_md5)
            top5_correct_npy = '{}/top5_rank_{}_{}.npy'.format(tmp_dir, args.rank, model_md5)
            img_tot_npy = '{}/img_tot_rank_{}_{}.npy'.format(tmp_dir, args.rank, model_md5)
            np.save(top1_correct_npy, top1_correct)
            np.save(top5_correct_npy, top5_correct)
            np.save(img_tot_npy, img_tot)
            while True:
                rank_ok = True
                for other_rank in range(args.group_size):
                    top1_correct_npy = '{}/top1_rank_{}_{}.npy'.format(tmp_dir, other_rank, model_md5)
                    top5_correct_npy = '{}/top5_rank_{}_{}.npy'.format(tmp_dir, other_rank, model_md5)
                    img_tot_npy = '{}/img_tot_rank_{}_{}.npy'.format(tmp_dir, other_rank, model_md5)
                    if not os.path.exists(top1_correct_npy) or not os.path.exists(top5_correct_npy) \
                       or not os.path.exists(img_tot_npy):
                        rank_ok = False
                if rank_ok:
                    break

            top1_correct_all = 0
            top5_correct_all = 0
            img_tot_all = 0
            for other_rank in range(args.group_size):
                top1_correct_npy = '{}/top1_rank_{}_{}.npy'.format(tmp_dir, other_rank, model_md5)
                top5_correct_npy = '{}/top5_rank_{}_{}.npy'.format(tmp_dir, other_rank, model_md5)
                img_tot_npy = '{}/img_tot_rank_{}_{}.npy'.format(tmp_dir, other_rank, model_md5)
                top1_correct_all += np.load(top1_correct_npy)
                top5_correct_all += np.load(top5_correct_npy)
                img_tot_all += np.load(img_tot_npy)
            results = [[top1_correct_all], [top5_correct_all], [img_tot_all]]
            results = np.array(results)
        else:
            results = np.array(results)

        args.logger.info('after results={}'.format(results))
        top1_correct = results[0, 0]
        top5_correct = results[1, 0]
        img_tot = results[2, 0]
        acc1 = 100.0 * top1_correct / img_tot
        acc5 = 100.0 * top5_correct / img_tot
        args.logger.info('after allreduce eval: top1_correct={}, tot={}, acc={:.2f}%'.format(top1_correct,
                                                                                             img_tot,
                                                                                             acc1))
        args.logger.info('after allreduce eval: top5_correct={}, tot={}, acc={:.2f}%'.format(top5_correct,
                                                                                             img_tot,
                                                                                             acc5))
    if args.is_distributed:
        release()


if __name__ == "__main__":
    test()
@@ -0,0 +1,48 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
| echo "==============================================================================================================" | |||
| echo "Please run the scipt as: " | |||
| echo "sh run_distribute_eval.sh DEVICE_NUM RANK_TABLE_FILE DATASET CKPT_PATH" | |||
| echo "for example: sh run_distribute_train.sh 8 /data/hccl.json /path/to/dataset /path/to/ckpt" | |||
| echo "It is better to use absolute path." | |||
| echo "=================================================================================================================" | |||
| echo "After running the scipt, the network runs in the background. The log will be generated in eval_x/log.txt" | |||
| export RANK_SIZE=$1 | |||
| export RANK_TABLE_FILE=$2 | |||
| DATASET=$3 | |||
| CKPT_PATH=$4 | |||
| for((i=0;i<RANK_SIZE;i++)) | |||
| do | |||
| export DEVICE_ID=$i | |||
| rm -rf eval_$i | |||
| mkdir ./eval_$i | |||
| cp ./*.py ./eval_$i | |||
| cp -r ./src ./eval_$i | |||
| cd ./eval_$i || exit | |||
| export RANK_ID=$i | |||
| echo "start training for rank $i, device $DEVICE_ID" | |||
| env > env.log | |||
| python eval.py \ | |||
| --data_dir=$DATASET \ | |||
| --pretrained=$CKPT_PATH > log.txt 2>&1 & | |||
| cd ../ | |||
| done | |||
@@ -0,0 +1,45 @@
#!/bin/bash
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
| echo "==============================================================================================================" | |||
| echo "Please run the scipt as: " | |||
| echo "sh scipts/run_distribute_train.sh DEVICE_NUM RANK_TABLE_FILE DATASET" | |||
| echo "for example: sh scipts/run_distribute_train.sh 8 /data/hccl.json /path/to/dataset" | |||
| echo "It is better to use absolute path." | |||
| echo "=================================================================================================================" | |||
| echo "After running the scipt, the network runs in the background. The log will be generated in train_x/log.txt" | |||
| export RANK_SIZE=$1 | |||
| export RANK_TABLE_FILE=$2 | |||
| DATASET=$3 | |||
| for((i=0;i<RANK_SIZE;i++)) | |||
| do | |||
| export DEVICE_ID=$i | |||
| rm -rf train_$i | |||
| mkdir ./train_$i | |||
| cp ./*.py ./train_$i | |||
| cp -r ./src ./train_$i | |||
| cd ./train_$i || exit | |||
| export RANK_ID=$i | |||
| echo "start training for rank $i, device $DEVICE_ID" | |||
| env > env.log | |||
| python train.py \ | |||
| --data_dir=$DATASET > log.txt 2>&1 & | |||
| cd ../ | |||
| done | |||
@@ -0,0 +1,46 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""config"""
from easydict import EasyDict as ed

config = ed({
    "image_size": '224,224',
    "num_classes": 1000,

    "lr": 0.1,
    "lr_scheduler": 'cosine_annealing',
    "lr_epochs": '30,60,90,120',
    "lr_gamma": 0.1,
    "eta_min": 0,
    "T_max": 120,
    "max_epoch": 120,
    "per_batch_size": 32,
    "warmup_epochs": 0,

    "weight_decay": 0.0001,
    "momentum": 0.9,
    "is_dynamic_loss_scale": 0,
    "loss_scale": 1024,
    "label_smooth": 0,
    "label_smooth_factor": 0.1,

    "log_interval": 100,
    "ckpt_interval": 2000,
    "ckpt_path": 'outputs/',
    "is_save_on_master": 1,

    "rank": 0,
    "group_size": 1
})
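
# Usage sketch: scripts read these defaults directly; for example, eval.py
# sets args.per_batch_size = config.per_batch_size.
#
#     from src.config import config
#     print(config.per_batch_size)   # 32
#     print(config.lr_scheduler)     # 'cosine_annealing'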
@@ -0,0 +1,22 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
read dataset for classification
"""
from .classification import classification_dataset

__all__ = ["classification_dataset"]
@@ -0,0 +1,155 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
A function that returns a dataset for classification.
"""
import os
from PIL import Image, ImageFile
from mindspore import dtype as mstype
import mindspore.dataset as de
import mindspore.dataset.vision.c_transforms as vision_C
import mindspore.dataset.transforms.c_transforms as normal_C
from src.datasets.sampler import DistributedSampler

ImageFile.LOAD_TRUNCATED_IMAGES = True
class TxtDataset():
    """
    read a dataset from a txt file, where each line is "<image name> <label>"
    """
    def __init__(self, root, txt_name):
        super(TxtDataset, self).__init__()
        self.imgs = []
        self.labels = []
        fin = open(txt_name, "r")
        for line in fin:
            img_name, label = line.strip().split(' ')
            self.imgs.append(os.path.join(root, img_name))
            self.labels.append(int(label))
        fin.close()

    def __getitem__(self, index):
        img = Image.open(self.imgs[index]).convert('RGB')
        return img, self.labels[index]

    def __len__(self):
        return len(self.imgs)
def classification_dataset(data_dir, image_size, per_batch_size, max_epoch, rank, group_size,
                           mode='train',
                           input_mode='folder',
                           root='',
                           num_parallel_workers=None,
                           shuffle=None,
                           sampler=None,
                           class_indexing=None,
                           drop_remainder=True,
                           transform=None,
                           target_transform=None):
    """
    A function that returns a dataset for classification. The mode of the input dataset could be "folder" or "txt".
    If it is "folder", all images within one folder have the same label. If it is "txt", all paths of images
    are written into a text file.

    Args:
        data_dir (str): Path to the root directory that contains the dataset for input_mode="folder",
            or path of the text file that contains every image path for input_mode="txt".
        image_size (list): Size of the input images.
        per_batch_size (int): the batch size of every step during training.
        max_epoch (int): the number of epochs.
        rank (int): The shard ID within num_shards (default=None).
        group_size (int): Number of shards that the dataset should be divided
            into (default=None).
        mode (str): "train" or others. Default: "train".
        input_mode (str): The form of the input dataset. "folder" or "txt". Default: "folder".
        root (str): the images path for input_mode="txt". Default: "".
        num_parallel_workers (int): Number of workers to read the data. Default: None.
        shuffle (bool): Whether or not to perform shuffle on the dataset
            (default=None, performs shuffle).
        sampler (Sampler): Object used to choose samples from the dataset. Default: None.
        class_indexing (dict): A str-to-int mapping from folder name to index
            (default=None, the folder names will be sorted
            alphabetically and each class will be given a
            unique index starting from 0).

    Examples:
        >>> from src.datasets.classification import classification_dataset
        >>> # path to imagefolder directory. This directory needs to contain sub-directories which contain the images
        >>> dataset_dir = "/path/to/imagefolder_directory"
        >>> de_dataset = classification_dataset(dataset_dir, image_size=[224, 224],
        >>>                                     per_batch_size=64, max_epoch=100,
        >>>                                     rank=0, group_size=4)
        >>> # Path of the text file that contains every image path of the dataset.
        >>> dataset_dir = "/path/to/dataset/images/train.txt"
        >>> images_dir = "/path/to/dataset/images"
        >>> de_dataset = classification_dataset(dataset_dir, image_size=[224, 224],
        >>>                                     per_batch_size=64, max_epoch=100,
        >>>                                     rank=0, group_size=4,
        >>>                                     input_mode="txt", root=images_dir)
    """
    mean = [0.485 * 255, 0.456 * 255, 0.406 * 255]
    std = [0.229 * 255, 0.224 * 255, 0.225 * 255]

    if transform is None:
        if mode == 'train':
            transform_img = [
                vision_C.RandomCropDecodeResize(image_size, scale=(0.08, 1.0), ratio=(0.75, 1.333)),
                vision_C.RandomHorizontalFlip(prob=0.5),
                vision_C.RandomColorAdjust(brightness=0.4, contrast=0.4, saturation=0.4),
                vision_C.Normalize(mean=mean, std=std),
                vision_C.HWC2CHW()
            ]
        else:
            transform_img = [
                vision_C.Decode(),
                vision_C.Resize((256, 256)),
                vision_C.CenterCrop(image_size),
                vision_C.Normalize(mean=mean, std=std),
                vision_C.HWC2CHW()
            ]
    else:
        transform_img = transform

    if target_transform is None:
        transform_label = [
            normal_C.TypeCast(mstype.int32)
        ]
    else:
        transform_label = target_transform

    if input_mode == 'folder':
        de_dataset = de.ImageFolderDataset(data_dir, num_parallel_workers=num_parallel_workers,
                                           shuffle=shuffle, sampler=sampler, class_indexing=class_indexing,
                                           num_shards=group_size, shard_id=rank)
    else:
        dataset = TxtDataset(root, data_dir)
        sampler = DistributedSampler(dataset, rank, group_size, shuffle=shuffle)
        de_dataset = de.GeneratorDataset(dataset, ["image", "label"], sampler=sampler)
        de_dataset.set_dataset_size(len(sampler))

    de_dataset = de_dataset.map(input_columns="image", num_parallel_workers=8, operations=transform_img)
    de_dataset = de_dataset.map(input_columns="label", num_parallel_workers=8, operations=transform_label)

    columns_to_project = ["image", "label"]
    de_dataset = de_dataset.project(columns=columns_to_project)

    de_dataset = de_dataset.batch(per_batch_size, drop_remainder=drop_remainder)
    de_dataset = de_dataset.repeat(1)

    return de_dataset
@@ -0,0 +1,51 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
shuffle and distribute sample
"""
import math
import numpy as np


class DistributedSampler():
    """
    function to distribute and shuffle sample
    """
    def __init__(self, dataset, rank, group_size, shuffle=True, seed=0):
        self.dataset = dataset
        self.rank = rank
        self.group_size = group_size
        self.dataset_length = len(self.dataset)
        self.num_samples = int(math.ceil(self.dataset_length * 1.0 / self.group_size))
        self.total_size = self.num_samples * self.group_size
        self.shuffle = shuffle
        self.seed = seed

    def __iter__(self):
        if self.shuffle:
            self.seed = (self.seed + 1) & 0xffffffff
            np.random.seed(self.seed)
            indices = np.random.permutation(self.dataset_length).tolist()
        else:
            indices = list(range(self.dataset_length))

        # pad the index list so it divides evenly among the group, then take
        # every group_size-th index starting from this rank
        indices += indices[:(self.total_size - len(indices))]
        indices = indices[self.rank::self.group_size]
        return iter(indices)

    def __len__(self):
        return self.num_samples
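
# Usage sketch mirroring classification.py (paths and the shard layout are
# illustrative):
#
#     from src.datasets.classification import TxtDataset
#     import mindspore.dataset as de
#     dataset = TxtDataset(root='/path/to/images', txt_name='/path/to/train.txt')
#     sampler = DistributedSampler(dataset, rank=0, group_size=8, shuffle=True)
#     de_dataset = de.GeneratorDataset(dataset, ["image", "label"], sampler=sampler)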
@@ -0,0 +1,19 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
loss function
"""
from .crossentropy import *
@@ -0,0 +1,44 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
loss function CrossEntropy
"""
from mindspore.nn.loss.loss import _Loss
from mindspore.ops import operations as P
from mindspore.ops import functional as F
from mindspore import Tensor
from mindspore.common import dtype as mstype
import mindspore.nn as nn


class CrossEntropy(_Loss):
    """
    loss function CrossEntropy with optional label smoothing
    """
    def __init__(self, smooth_factor=0., num_classes=1000):
        super(CrossEntropy, self).__init__()
        self.onehot = P.OneHot()
        self.on_value = Tensor(1.0 - smooth_factor, mstype.float32)
        self.off_value = Tensor(1.0 * smooth_factor / (num_classes - 1), mstype.float32)
        self.ce = nn.SoftmaxCrossEntropyWithLogits()
        self.mean = P.ReduceMean(False)

    def construct(self, logit, label):
        one_hot_label = self.onehot(label,
                                    F.shape(logit)[1], self.on_value, self.off_value)
        loss = self.ce(logit, one_hot_label)
        loss = self.mean(loss, 0)
        return loss
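
# Usage sketch with the default smoothing factor from src/config.py (0.1);
# the tensor shapes in the comments are illustrative:
#
#     loss_fn = CrossEntropy(smooth_factor=0.1, num_classes=1000)
#     # logit: Tensor of shape (batch, num_classes); label: Tensor of shape (batch,)
#     loss = loss_fn(logit, label)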
@@ -0,0 +1,19 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
learning rate scheduler
"""
from .lr_scheduler import *
@@ -0,0 +1,656 @@
# Copyright 2020 Huawei Technologies Co., Ltd
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
"""
learning rate scheduler
"""
import math
from collections import Counter
import numpy as np

__all__ = ["LambdaLR", "MultiplicativeLR", "StepLR", "MultiStepLR", "ExponentialLR",
           "CosineAnnealingLR", "CyclicLR", "CosineAnnealingWarmRestarts", "OneCycleLR"]
class _WarmUp():
    def __init__(self, warmup_init_lr):
        self.warmup_init_lr = warmup_init_lr

    def get_lr(self):
        # Get learning rate during warmup
        raise NotImplementedError


class _LinearWarmUp(_WarmUp):
    """
    linear warmup function
    """
    def __init__(self, lr, warmup_epochs, steps_per_epoch, warmup_init_lr=0):
        self.base_lr = lr
        self.warmup_init_lr = warmup_init_lr
        self.warmup_steps = int(warmup_epochs * steps_per_epoch)
        super(_LinearWarmUp, self).__init__(warmup_init_lr)

    def get_warmup_steps(self):
        return self.warmup_steps

    def get_lr(self, current_step):
        lr_inc = (float(self.base_lr) - float(self.warmup_init_lr)) / float(self.warmup_steps)
        lr = float(self.warmup_init_lr) + lr_inc * current_step
        return lr


class _ConstWarmUp(_WarmUp):
    def get_lr(self):
        return self.warmup_init_lr


class _LRScheduler():
    def __init__(self, lr, max_epoch, steps_per_epoch):
        self.base_lr = lr
        self.steps_per_epoch = steps_per_epoch
        self.total_steps = int(max_epoch * steps_per_epoch)

    def get_lr(self):
        # Compute learning rate using chainable form of the scheduler
        raise NotImplementedError
class LambdaLR(_LRScheduler):
    """Sets the learning rate to the initial lr times a given function.

    Args:
        lr (float): Initial learning rate which is the
            lower boundary in the cycle.
        steps_per_epoch (int): The number of steps per epoch to train for. This is
            used along with epochs in order to infer the total number of steps in the cycle.
        max_epoch (int): The number of epochs to train for. This is used along
            with steps_per_epoch in order to infer the total number of steps in the cycle.
        lr_lambda (function or list): A function which computes a multiplicative
            factor given an integer parameter epoch.
        warmup_epochs (int): The number of epochs to warm up.
            Default: 0

    Example:
        >>> lambda1 = lambda epoch: epoch // 30
        >>> scheduler = LambdaLR(lr=0.1, lr_lambda=lambda1, steps_per_epoch=5000,
        >>>                      max_epoch=90, warmup_epochs=0)
        >>> lr = scheduler.get_lr()
    """

    def __init__(self, lr, lr_lambda, steps_per_epoch, max_epoch, warmup_epochs=0):
        self.lr_lambda = lr_lambda
        self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
        super(LambdaLR, self).__init__(lr, max_epoch, steps_per_epoch)

    def get_lr(self):
        warmup_steps = self.warmup.get_warmup_steps()
        lr_each_step = []
        for i in range(self.total_steps):
            if i < warmup_steps:
                lr = self.warmup.get_lr(i + 1)
            else:
                cur_ep = i // self.steps_per_epoch
                lr = self.base_lr * self.lr_lambda(cur_ep)
            lr_each_step.append(lr)

        return np.array(lr_each_step).astype(np.float32)
class MultiplicativeLR(_LRScheduler):
    """Multiply the learning rate by the factor given
    in the specified function.

    Args:
        lr_lambda (function or list): A function which computes a multiplicative
            factor given an integer parameter epoch.

    Example:
        >>> lmbda = lambda epoch: 0.95
        >>> scheduler = MultiplicativeLR(lr=0.1, lr_lambda=lmbda, steps_per_epoch=5000,
        >>>                              max_epoch=90, warmup_epochs=0)
        >>> lr = scheduler.get_lr()
    """

    def __init__(self, lr, lr_lambda, steps_per_epoch, max_epoch, warmup_epochs=0):
        self.lr_lambda = lr_lambda
        self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
        super(MultiplicativeLR, self).__init__(lr, max_epoch, steps_per_epoch)

    def get_lr(self):
        warmup_steps = self.warmup.get_warmup_steps()
        lr_each_step = []
        current_lr = self.base_lr
        for i in range(self.total_steps):
            if i < warmup_steps:
                lr = self.warmup.get_lr(i + 1)
            else:
                cur_ep = i // self.steps_per_epoch
                if i % self.steps_per_epoch == 0 and cur_ep > 0:
                    current_lr = current_lr * self.lr_lambda(cur_ep)
                lr = current_lr
            lr_each_step.append(lr)

        return np.array(lr_each_step).astype(np.float32)
class StepLR(_LRScheduler):
    """Decays the learning rate by gamma every epoch_size epochs.

    Args:
        lr (float): Initial learning rate which is the
            lower boundary in the cycle.
        steps_per_epoch (int): The number of steps per epoch to train for. This is
            used along with epochs in order to infer the total number of steps in the cycle.
        max_epoch (int): The number of epochs to train for. This is used along
            with steps_per_epoch in order to infer the total number of steps in the cycle.
        epoch_size (int): Period of learning rate decay.
        gamma (float): Multiplicative factor of learning rate decay.
            Default: 0.1.
        warmup_epochs (int): The number of epochs to warm up.
            Default: 0

    Example:
        >>> # lr = 0.1    if epoch < 30
        >>> # lr = 0.01   if 30 <= epoch < 60
        >>> # lr = 0.001  if 60 <= epoch < 90
        >>> # ...
        >>> scheduler = StepLR(lr=0.1, epoch_size=30, gamma=0.1, steps_per_epoch=5000,
        >>>                    max_epoch=90, warmup_epochs=0)
        >>> lr = scheduler.get_lr()
    """

    def __init__(self, lr, epoch_size, gamma, steps_per_epoch, max_epoch, warmup_epochs=0):
        self.epoch_size = epoch_size
        self.gamma = gamma
        self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
        super(StepLR, self).__init__(lr, max_epoch, steps_per_epoch)

    def get_lr(self):
        warmup_steps = self.warmup.get_warmup_steps()
        lr_each_step = []
        for i in range(self.total_steps):
            if i < warmup_steps:
                lr = self.warmup.get_lr(i + 1)
            else:
                cur_ep = i // self.steps_per_epoch
                lr = self.base_lr * self.gamma**(cur_ep // self.epoch_size)
            lr_each_step.append(lr)

        return np.array(lr_each_step).astype(np.float32)
class MultiStepLR(_LRScheduler):
    """Decays the learning rate by gamma once the number of epochs reaches one
    of the milestones.

    Args:
        lr (float): Initial learning rate which is the
            lower boundary in the cycle.
        steps_per_epoch (int): The number of steps per epoch to train for. This is
            used along with epochs in order to infer the total number of steps in the cycle.
        max_epoch (int): The number of epochs to train for. This is used along
            with steps_per_epoch in order to infer the total number of steps in the cycle.
        milestones (list): List of epoch indices. Must be increasing.
        gamma (float): Multiplicative factor of learning rate decay.
            Default: 0.1.
        warmup_epochs (int): The number of epochs to warm up.
            Default: 0

    Example:
        >>> # lr = 0.1    if epoch < 30
        >>> # lr = 0.01   if 30 <= epoch < 80
        >>> # lr = 0.001  if epoch >= 80
        >>> scheduler = MultiStepLR(lr=0.1, milestones=[30,80], gamma=0.1, steps_per_epoch=5000,
        >>>                         max_epoch=90, warmup_epochs=0)
        >>> lr = scheduler.get_lr()
    """

    def __init__(self, lr, milestones, gamma, steps_per_epoch, max_epoch, warmup_epochs=0):
        self.milestones = Counter(milestones)
        self.gamma = gamma
        self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
        super(MultiStepLR, self).__init__(lr, max_epoch, steps_per_epoch)

    def get_lr(self):
        warmup_steps = self.warmup.get_warmup_steps()
        lr_each_step = []
        current_lr = self.base_lr
        for i in range(self.total_steps):
            if i < warmup_steps:
                lr = self.warmup.get_lr(i + 1)
            else:
                cur_ep = i // self.steps_per_epoch
                if i % self.steps_per_epoch == 0 and cur_ep in self.milestones:
                    current_lr = current_lr * self.gamma
                lr = current_lr
            lr_each_step.append(lr)

        return np.array(lr_each_step).astype(np.float32)
class ExponentialLR(_LRScheduler):
    """Decays the learning rate by gamma every epoch.

    Args:
        lr (float): Initial learning rate which is the
            lower boundary in the cycle.
        gamma (float): Multiplicative factor of learning rate decay.
        steps_per_epoch (int): The number of steps per epoch to train for. This is
            used along with epochs in order to infer the total number of steps in the cycle.
        max_epoch (int): The number of epochs to train for. This is used along
            with steps_per_epoch in order to infer the total number of steps in the cycle.
        warmup_epochs (int): The number of epochs to warm up.
            Default: 0
    """

    def __init__(self, lr, gamma, steps_per_epoch, max_epoch, warmup_epochs=0):
        self.gamma = gamma
        self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
        super(ExponentialLR, self).__init__(lr, max_epoch, steps_per_epoch)

    def get_lr(self):
        warmup_steps = self.warmup.get_warmup_steps()
        lr_each_step = []
        current_lr = self.base_lr
        for i in range(self.total_steps):
            if i < warmup_steps:
                lr = self.warmup.get_lr(i + 1)
            else:
                if i % self.steps_per_epoch == 0 and i > 0:
                    current_lr = current_lr * self.gamma
                lr = current_lr
            lr_each_step.append(lr)

        return np.array(lr_each_step).astype(np.float32)
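
# Usage sketch (values illustrative): precompute a per-step learning-rate
# array and pass it to a MindSpore optimizer that accepts per-step rates.
#
#     scheduler = ExponentialLR(lr=0.1, gamma=0.9, steps_per_epoch=5000, max_epoch=90)
#     lr_each_step = scheduler.get_lr()   # numpy array, one entry per step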
class CosineAnnealingLR(_LRScheduler):
    r"""Set the learning rate using a cosine annealing schedule, where
    :math:`\eta_{max}` is set to the initial lr and :math:`T_{cur}` is the
    number of epochs since the last restart in SGDR:

    .. math::
        \begin{aligned}
            \eta_t & = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1
            + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right),
            & T_{cur} \neq (2k+1)T_{max}; \\
            \eta_{t+1} & = \eta_{t} + \frac{1}{2}(\eta_{max} - \eta_{min})
            \left(1 - \cos\left(\frac{1}{T_{max}}\pi\right)\right),
            & T_{cur} = (2k+1)T_{max}.
        \end{aligned}

    It has been proposed in
    `SGDR: Stochastic Gradient Descent with Warm Restarts`_. Note that this only
    implements the cosine annealing part of SGDR, and not the restarts.

    Args:
        lr (float): Initial learning rate which is the
            lower boundary in the cycle.
        T_max (int): Maximum number of iterations.
        eta_min (float): Minimum learning rate. Default: 0.
        steps_per_epoch (int): The number of steps per epoch to train for. This is
            used along with epochs in order to infer the total number of steps in the cycle.
        max_epoch (int): The number of epochs to train for. This is used along
            with steps_per_epoch in order to infer the total number of steps in the cycle.
        warmup_epochs (int): The number of epochs to warm up.
            Default: 0

    .. _SGDR\: Stochastic Gradient Descent with Warm Restarts:
        https://arxiv.org/abs/1608.03983
    """

    def __init__(self, lr, T_max, steps_per_epoch, max_epoch, warmup_epochs=0, eta_min=0):
        self.T_max = T_max
        self.eta_min = eta_min
        self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch)
        super(CosineAnnealingLR, self).__init__(lr, max_epoch, steps_per_epoch)

    def get_lr(self):
        warmup_steps = self.warmup.get_warmup_steps()
        lr_each_step = []
        current_lr = self.base_lr
        for i in range(self.total_steps):
            if i < warmup_steps:
                lr = self.warmup.get_lr(i + 1)
            else:
                cur_ep = i // self.steps_per_epoch
                if i % self.steps_per_epoch == 0 and i > 0:
                    current_lr = self.eta_min + \
                        (self.base_lr - self.eta_min) * (1. + math.cos(math.pi * cur_ep / self.T_max)) / 2
                lr = current_lr
            lr_each_step.append(lr)

        return np.array(lr_each_step).astype(np.float32)
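
# Usage sketch with this repository's defaults from src/config.py
# (lr=0.1, T_max=120, max_epoch=120, warmup_epochs=0, eta_min=0);
# steps_per_epoch is illustrative:
#
#     scheduler = CosineAnnealingLR(lr=0.1, T_max=120, steps_per_epoch=5000,
#                                   max_epoch=120, warmup_epochs=0, eta_min=0)
#     lr_each_step = scheduler.get_lr()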
| class CyclicLR(_LRScheduler): | |||
| r"""Sets the learning rate according to cyclical learning rate policy (CLR). | |||
| The policy cycles the learning rate between two boundaries with a constant | |||
| frequency, as detailed in the paper `Cyclical Learning Rates for Training | |||
| Neural Networks`_. The distance between the two boundaries can be scaled on | |||
| a per-iteration or per-cycle basis. | |||
| Cyclical learning rate policy changes the learning rate after every batch. | |||
| This class has three built-in policies, as put forth in the paper: | |||
| * "triangular": A basic triangular cycle without amplitude scaling. | |||
| * "triangular2": A basic triangular cycle that scales initial amplitude by half each cycle. | |||
| * "exp_range": A cycle that scales initial amplitude by :math:`\text{gamma}^{\text{cycle iterations}}` | |||
| at each cycle iteration. | |||
| This implementation was adapted from the github repo: `bckenstler/CLR`_ | |||
| Args: | |||
| lr (float): Initial learning rate which is the | |||
| lower boundary in the cycle. | |||
| max_lr (float): Upper learning rate boundaries in the cycle. | |||
| Functionally, it defines the cycle amplitude (max_lr - base_lr). | |||
| The lr at any cycle is the sum of base_lr and some scaling | |||
| of the amplitude; therefore max_lr may not actually be reached | |||
| depending on scaling function. | |||
| steps_per_epoch (int): The number of steps per epoch to train for. This is | |||
| used along with epochs in order to infer the total number of steps in the cycle. | |||
| max_epoch (int): The number of epochs to train for. This is used along | |||
| with steps_per_epoch in order to infer the total number of steps in the cycle. | |||
| step_size_up (int): Number of training iterations in the | |||
| increasing half of a cycle. Default: 2000 | |||
| step_size_down (int): Number of training iterations in the | |||
| decreasing half of a cycle. If step_size_down is None, | |||
| it is set to step_size_up. Default: None | |||
| mode (str): One of {triangular, triangular2, exp_range}. | |||
| Values correspond to policies detailed above. | |||
| If scale_fn is not None, this argument is ignored. | |||
| Default: 'triangular' | |||
| gamma (float): Constant in 'exp_range' scaling function: | |||
| gamma**(cycle iterations) | |||
| Default: 1.0 | |||
| scale_fn (function): Custom scaling policy defined by a single | |||
| argument lambda function, where | |||
| 0 <= scale_fn(x) <= 1 for all x >= 0. | |||
| If specified, then 'mode' is ignored. | |||
| Default: None | |||
| scale_mode (str): {'cycle', 'iterations'}. | |||
| Defines whether scale_fn is evaluated on | |||
| cycle number or cycle iterations (training | |||
| iterations since start of cycle). | |||
| Default: 'cycle' | |||
| warmup_epochs (int): The number of epochs to Warmup. | |||
| Default: 0 | |||
| .. _Cyclical Learning Rates for Training Neural Networks: https://arxiv.org/abs/1506.01186 | |||
| .. _bckenstler/CLR: https://github.com/bckenstler/CLR | |||
| """ | |||
| def __init__(self, | |||
| lr, | |||
| max_lr, | |||
| steps_per_epoch, | |||
| max_epoch, | |||
| step_size_up=2000, | |||
| step_size_down=None, | |||
| mode='triangular', | |||
| gamma=1., | |||
| scale_fn=None, | |||
| scale_mode='cycle', | |||
| warmup_epochs=0): | |||
| self.max_lr = max_lr | |||
| step_size_up = float(step_size_up) | |||
| step_size_down = float(step_size_down) if step_size_down is not None else step_size_up | |||
| self.total_size = step_size_up + step_size_down | |||
| self.step_ratio = step_size_up / self.total_size | |||
| if mode not in ['triangular', 'triangular2', 'exp_range'] \ | |||
| and scale_fn is None: | |||
| raise ValueError("mode must be one of 'triangular', 'triangular2' or 'exp_range' " | |||
| "when scale_fn is None, but got '{}'".format(mode)) | |||
| self.mode = mode | |||
| self.gamma = gamma | |||
| if scale_fn is None: | |||
| if self.mode == 'triangular': | |||
| self.scale_fn = self._triangular_scale_fn | |||
| self.scale_mode = 'cycle' | |||
| elif self.mode == 'triangular2': | |||
| self.scale_fn = self._triangular2_scale_fn | |||
| self.scale_mode = 'cycle' | |||
| elif self.mode == 'exp_range': | |||
| self.scale_fn = self._exp_range_scale_fn | |||
| self.scale_mode = 'iterations' | |||
| else: | |||
| self.scale_fn = scale_fn | |||
| self.scale_mode = scale_mode | |||
| self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch) | |||
| super(CyclicLR, self).__init__(lr, max_epoch, steps_per_epoch) | |||
| def _triangular_scale_fn(self, x): | |||
| return 1. | |||
| def _triangular2_scale_fn(self, x): | |||
| return 1 / (2. ** (x - 1)) | |||
| def _exp_range_scale_fn(self, x): | |||
| return self.gamma**x | |||
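| # Worked examples of the scale functions (illustrative): on cycle 3, | |||
| # _triangular2_scale_fn(3) == 1 / 2**2 == 0.25, so the amplitude is a quarter of | |||
| # the first cycle's; with gamma=0.99, _exp_range_scale_fn(100) ~= 0.99**100 ~= 0.366. | |||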
| def get_lr(self): | |||
| warmup_steps = self.warmup.get_warmup_steps() | |||
| lr_each_step = [] | |||
| for i in range(self.total_steps): | |||
| if i < warmup_steps: | |||
| lr = self.warmup.get_lr(i+1) | |||
| else: | |||
| # Calculates the learning rate at batch index. | |||
| cycle = math.floor(1 + i / self.total_size) | |||
| x = 1. + i / self.total_size - cycle | |||
| if x <= self.step_ratio: | |||
| scale_factor = x / self.step_ratio | |||
| else: | |||
| scale_factor = (x - 1) / (self.step_ratio - 1) | |||
| base_height = (self.max_lr - self.base_lr) * scale_factor | |||
| if self.scale_mode == 'cycle': | |||
| lr = self.base_lr + base_height * self.scale_fn(cycle) | |||
| else: | |||
| lr = self.base_lr + base_height * self.scale_fn(i) | |||
| lr_each_step.append(lr) | |||
| return np.array(lr_each_step).astype(np.float32) | |||
| class CosineAnnealingWarmRestarts(_LRScheduler): | |||
| r"""Set the learning rate using a cosine annealing schedule, where | |||
| :math:`\eta_{max}` is set to the initial lr, :math:`T_{cur}` is the | |||
| number of epochs since the last restart and :math:`T_{i}` is the number | |||
| of epochs between two warm restarts in SGDR: | |||
| .. math:: | |||
| \eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + | |||
| \cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right) | |||
| When :math:`T_{cur}=T_{i}`, set :math:`\eta_t = \eta_{min}`. | |||
| When :math:`T_{cur}=0` after restart, set :math:`\eta_t=\eta_{max}`. | |||
| It has been proposed in | |||
| `SGDR: Stochastic Gradient Descent with Warm Restarts`_. | |||
| Args: | |||
| lr (float): Initial learning rate. | |||
| steps_per_epoch (int): The number of steps per epoch to train for. This is | |||
| used along with epochs in order to infer the total number of steps in the cycle. | |||
| max_epoch (int): The number of epochs to train for. This is used along | |||
| with steps_per_epoch in order to infer the total number of steps in the cycle. | |||
| T_0 (int): Number of iterations for the first restart. | |||
| T_mult (int, optional): The factor by which :math:`T_{i}` increases after each restart. Default: 1. | |||
| eta_min (float, optional): Minimum learning rate. Default: 0. | |||
| warmup_epochs (int): The number of warmup epochs. | |||
| Default: 0 | |||
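| Examples: | |||
| Illustrative only: with T_0=10 and T_mult=2, restarts occur after | |||
| epochs 10, 30 and 70, since T_i doubles after each restart. | |||
| >>> scheduler = CosineAnnealingWarmRestarts(lr=0.1, steps_per_epoch=100, | |||
| ...                                         max_epoch=90, T_0=10, T_mult=2) | |||
| >>> lr_each_step = scheduler.get_lr() | |||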
| .. _SGDR\: Stochastic Gradient Descent with Warm Restarts: | |||
| https://arxiv.org/abs/1608.03983 | |||
| """ | |||
| def __init__(self, lr, steps_per_epoch, max_epoch, T_0, T_mult=1, eta_min=0, warmup_epochs=0): | |||
| if T_0 <= 0 or not isinstance(T_0, int): | |||
| raise ValueError("Expected positive integer T_0, but got {}".format(T_0)) | |||
| if T_mult < 1 or not isinstance(T_mult, int): | |||
| raise ValueError("Expected integer T_mult >= 1, but got {}".format(T_mult)) | |||
| self.T_0 = T_0 | |||
| self.T_i = T_0 | |||
| self.T_mult = T_mult | |||
| self.eta_min = eta_min | |||
| self.T_cur = 0 | |||
| self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch) | |||
| super(CosineAnnealingWarmRestarts, self).__init__(lr, max_epoch, steps_per_epoch) | |||
| def get_lr(self): | |||
| warmup_steps = self.warmup.get_warmup_steps() | |||
| lr_each_step = [] | |||
| for i in range(self.total_steps): | |||
| if i < warmup_steps: | |||
| lr = self.warmup.get_lr(i+1) | |||
| else: | |||
| if i % self.steps_per_epoch == 0 and i > 0: | |||
| self.T_cur += 1 | |||
| if self.T_cur >= self.T_i: | |||
| self.T_cur = self.T_cur - self.T_i | |||
| self.T_i = self.T_i * self.T_mult | |||
| lr = self.eta_min + (self.base_lr - self.eta_min) * \ | |||
| (1 + math.cos(math.pi * self.T_cur / self.T_i)) / 2 | |||
| lr_each_step.append(lr) | |||
| return np.array(lr_each_step).astype(np.float32) | |||
| class OneCycleLR(_LRScheduler): | |||
| r"""Sets the learning rate of each parameter group according to the | |||
| 1cycle learning rate policy. The 1cycle policy anneals the learning | |||
| rate from an initial learning rate to some maximum learning rate and then | |||
| from that maximum learning rate to some minimum learning rate much lower | |||
| than the initial learning rate. | |||
| This policy was initially described in the paper `Super-Convergence: | |||
| Very Fast Training of Neural Networks Using Large Learning Rates`_. | |||
| The 1cycle learning rate policy changes the learning rate after every batch. | |||
| This scheduler is not chainable. | |||
| Args: | |||
| lr (float): Initial learning rate. | |||
| steps_per_epoch (int): The number of steps per epoch to train for. This is | |||
| used along with epochs in order to infer the total number of steps in the cycle. | |||
| max_epoch (int): The number of epochs to train for. This is used along | |||
| with steps_per_epoch in order to infer the total number of steps in the cycle. | |||
| pct_start (float): The percentage of the cycle (in number of steps) spent | |||
| increasing the learning rate. | |||
| Default: 0.3 | |||
| anneal_strategy (str): {'cos', 'linear'} | |||
| Specifies the annealing strategy: "cos" for cosine annealing, "linear" for | |||
| linear annealing. | |||
| Default: 'cos' | |||
| div_factor (float): Determines the max learning rate via | |||
| max_lr = lr * div_factor | |||
| Default: 25 | |||
| final_div_factor (float): Determines the minimum learning rate via | |||
| min_lr = lr / final_div_factor | |||
| Default: 1e4 | |||
| warmup_epochs (int): The number of warmup epochs. | |||
| Default: 0 | |||
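| Examples: | |||
| Illustrative only: with lr=0.004 and the defaults, the schedule rises from | |||
| 0.004 to max_lr = 0.004 * 25 = 0.1 over the first 30% of steps, then anneals | |||
| down to min_lr = 0.004 / 1e4 = 4e-7. | |||
| >>> scheduler = OneCycleLR(lr=0.004, steps_per_epoch=100, max_epoch=10) | |||
| >>> lr_each_step = scheduler.get_lr() | |||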
| .. _Super-Convergence\: Very Fast Training of Neural Networks Using Large Learning Rates: | |||
| https://arxiv.org/abs/1708.07120 | |||
| """ | |||
| def __init__(self, | |||
| lr, | |||
| steps_per_epoch, | |||
| max_epoch, | |||
| pct_start=0.3, | |||
| anneal_strategy='cos', | |||
| div_factor=25., | |||
| final_div_factor=1e4, | |||
| warmup_epochs=0): | |||
| self.warmup = _LinearWarmUp(lr, warmup_epochs, steps_per_epoch) | |||
| super(OneCycleLR, self).__init__(lr, max_epoch, steps_per_epoch) | |||
| # Validate pct_start before deriving the two phase lengths from it | |||
| if pct_start < 0 or pct_start > 1 or not isinstance(pct_start, float): | |||
| raise ValueError("Expected float between 0 and 1 for pct_start, but got {}".format(pct_start)) | |||
| # Validate anneal_strategy | |||
| if anneal_strategy not in ['cos', 'linear']: | |||
| raise ValueError("anneal_strategy must be one of 'cos' or 'linear', instead got {}".format(anneal_strategy)) | |||
| self.step_size_up = float(pct_start * self.total_steps) - 1 | |||
| self.step_size_down = float(self.total_steps - self.step_size_up) - 1 | |||
| if anneal_strategy == 'cos': | |||
| self.anneal_func = self._annealing_cos | |||
| elif anneal_strategy == 'linear': | |||
| self.anneal_func = self._annealing_linear | |||
| # Initialize learning rate variables | |||
| self.max_lr = lr * div_factor | |||
| self.min_lr = lr / final_div_factor | |||
| def _annealing_cos(self, start, end, pct): | |||
| "Cosine anneal from `start` to `end` as pct goes from 0.0 to 1.0." | |||
| cos_out = math.cos(math.pi * pct) + 1 | |||
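| # cos_out runs from 2 (pct=0) down to 0 (pct=1), moving the lr smoothly from `start` to `end` | |||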
| return end + (start - end) / 2.0 * cos_out | |||
| def _annealing_linear(self, start, end, pct): | |||
| "Linearly anneal from `start` to `end` as pct goes from 0.0 to 1.0." | |||
| return (end - start) * pct + start | |||
| def get_lr(self): | |||
| warmup_steps = self.warmup.get_warmup_steps() | |||
| lr_each_step = [] | |||
| for i in range(self.total_steps): | |||
| if i < warmup_steps: | |||
| lr = self.warmup.get_lr(i+1) | |||
| else: | |||
| if i <= self.step_size_up: | |||
| lr = self.anneal_func(self.base_lr, self.max_lr, i / self.step_size_up) | |||
| else: | |||
| down_step_num = i - self.step_size_up | |||
| lr = self.anneal_func(self.max_lr, self.min_lr, down_step_num / self.step_size_down) | |||
| lr_each_step.append(lr) | |||
| return np.array(lr_each_step).astype(np.float32) | |||
| @@ -0,0 +1,18 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| """ | |||
| densenet network | |||
| """ | |||
| from .densenet import DenseNet121 | |||
| @@ -0,0 +1,230 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| """ | |||
| model architecture of densenet | |||
| """ | |||
| import math | |||
| from collections import OrderedDict | |||
| import mindspore.nn as nn | |||
| from mindspore.ops import operations as P | |||
| from mindspore.common import initializer as init | |||
| from src.utils.var_init import default_recurisive_init, KaimingNormal | |||
| __all__ = ["DenseNet121"] | |||
| class GlobalAvgPooling(nn.Cell): | |||
| """ | |||
| GlobalAvgPooling function. | |||
| """ | |||
| def __init__(self): | |||
| super(GlobalAvgPooling, self).__init__() | |||
| self.mean = P.ReduceMean(True) | |||
| self.shape = P.Shape() | |||
| self.reshape = P.Reshape() | |||
| def construct(self, x): | |||
| x = self.mean(x, (2, 3)) | |||
| b, c, _, _ = self.shape(x) | |||
| x = self.reshape(x, (b, c)) | |||
| return x | |||
| class CommonHead(nn.Cell): | |||
| def __init__(self, num_classes, out_channels): | |||
| super(CommonHead, self).__init__() | |||
| self.avgpool = GlobalAvgPooling() | |||
| self.fc = nn.Dense(out_channels, num_classes, has_bias=True) | |||
| def construct(self, x): | |||
| x = self.avgpool(x) | |||
| x = self.fc(x) | |||
| return x | |||
| def conv7x7(in_channels, out_channels, stride=1, padding=3, has_bias=False): | |||
| return nn.Conv2d(in_channels, out_channels, kernel_size=7, stride=stride, has_bias=has_bias, | |||
| padding=padding, pad_mode="pad") | |||
| def conv3x3(in_channels, out_channels, stride=1, padding=1, has_bias=False): | |||
| return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, has_bias=has_bias, | |||
| padding=padding, pad_mode="pad") | |||
| def conv1x1(in_channels, out_channels, stride=1, padding=0, has_bias=False): | |||
| return nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, has_bias=has_bias, | |||
| padding=padding, pad_mode="pad") | |||
| class _DenseLayer(nn.Cell): | |||
| """ | |||
| the dense layer, include 2 conv layer | |||
| """ | |||
| def __init__(self, num_input_features, growth_rate, bn_size, drop_rate): | |||
| super(_DenseLayer, self).__init__() | |||
| self.norm1 = nn.BatchNorm2d(num_input_features) | |||
| self.relu1 = nn.ReLU() | |||
| self.conv1 = conv1x1(num_input_features, bn_size*growth_rate) | |||
| self.norm2 = nn.BatchNorm2d(bn_size*growth_rate) | |||
| self.relu2 = nn.ReLU() | |||
| self.conv2 = conv3x3(bn_size*growth_rate, growth_rate) | |||
| # nn.Dropout in MindSpore takes keep_prob, unlike PyTorch's drop probability | |||
| self.keep_prob = 1.0 - drop_rate | |||
| self.dropout = nn.Dropout(keep_prob=self.keep_prob) | |||
| def construct(self, features): | |||
| bottleneck = self.conv1(self.relu1(self.norm1(features))) | |||
| new_features = self.conv2(self.relu2(self.norm2(bottleneck))) | |||
| if self.keep_prob < 1: | |||
| new_features = self.dropout(new_features) | |||
| return new_features | |||
| class _DenseBlock(nn.Cell): | |||
| """ | |||
| the dense block | |||
| """ | |||
| def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate): | |||
| super(_DenseBlock, self).__init__() | |||
| self.cell_list = nn.CellList() | |||
| for i in range(num_layers): | |||
| layer = _DenseLayer( | |||
| num_input_features + i * growth_rate, | |||
| growth_rate=growth_rate, | |||
| bn_size=bn_size, | |||
| drop_rate=drop_rate | |||
| ) | |||
| self.cell_list.append(layer) | |||
| self.concate = P.Concat(axis=1) | |||
| def construct(self, init_features): | |||
| features = init_features | |||
| for layer in self.cell_list: | |||
| new_features = layer(features) | |||
| features = self.concate((features, new_features)) | |||
| return features | |||
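| # Channel bookkeeping (illustrative): with num_input_features=64, growth_rate=32 | |||
| # and num_layers=6 (the first DenseNet121 block), the concatenated output has | |||
| # 64 + 6 * 32 = 256 channels. | |||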
| class _Transition(nn.Cell): | |||
| """ | |||
| the transiton layer | |||
| """ | |||
| def __init__(self, num_input_features, num_output_features): | |||
| super(_Transition, self).__init__() | |||
| self.features = nn.SequentialCell(OrderedDict([ | |||
| ('norm', nn.BatchNorm2d(num_input_features)), | |||
| ('relu', nn.ReLU()), | |||
| ('conv', conv1x1(num_input_features, num_output_features)), | |||
| ('pool', nn.MaxPool2d(kernel_size=2, stride=2)) | |||
| ])) | |||
| def construct(self, x): | |||
| x = self.features(x) | |||
| return x | |||
| class Densenet(nn.Cell): | |||
| """ | |||
| the densenet architecture | |||
| """ | |||
| __constants__ = ['features'] | |||
| def __init__(self, growth_rate, block_config, num_init_features, bn_size=4, drop_rate=0): | |||
| super(Densenet, self).__init__() | |||
| layers = OrderedDict() | |||
| layers['conv0'] = conv7x7(3, num_init_features, stride=2, padding=3) | |||
| layers['norm0'] = nn.BatchNorm2d(num_init_features) | |||
| layers['relu0'] = nn.ReLU() | |||
| layers['pool0'] = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same') | |||
| # Each denseblock | |||
| num_features = num_init_features | |||
| for i, num_layers in enumerate(block_config): | |||
| block = _DenseBlock( | |||
| num_layers=num_layers, | |||
| num_input_features=num_features, | |||
| bn_size=bn_size, | |||
| growth_rate=growth_rate, | |||
| drop_rate=drop_rate | |||
| ) | |||
| layers['denseblock%d'%(i+1)] = block | |||
| num_features = num_features + num_layers*growth_rate | |||
| if i != len(block_config)-1: | |||
| trans = _Transition(num_input_features=num_features, | |||
| num_output_features=num_features // 2) | |||
| layers['transition%d'%(i+1)] = trans | |||
| num_features = num_features // 2 | |||
| # Final batch norm | |||
| layers['norm5'] = nn.BatchNorm2d(num_features) | |||
| layers['relu5'] = nn.ReLU() | |||
| self.features = nn.SequentialCell(layers) | |||
| self.out_channels = num_features | |||
| def construct(self, x): | |||
| x = self.features(x) | |||
| return x | |||
| def get_out_channels(self): | |||
| return self.out_channels | |||
| def _densenet121(**kwargs): | |||
| return Densenet(growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64, **kwargs) | |||
| def _densenet161(**kwargs): | |||
| return Densenet(growth_rate=48, block_config=(6, 12, 36, 24), num_init_features=96, **kwargs) | |||
| def _densenet169(**kwargs): | |||
| return Densenet(growth_rate=32, block_config=(6, 12, 32, 32), num_init_features=64, **kwargs) | |||
| def _densenet201(**kwargs): | |||
| return Densenet(growth_rate=32, block_config=(6, 12, 48, 32), num_init_features=64, **kwargs) | |||
| class DenseNet121(nn.Cell): | |||
| """ | |||
| the densenet121 architectur | |||
| """ | |||
| def __init__(self, num_classes): | |||
| super(DenseNet121, self).__init__() | |||
| self.backbone = _densenet121() | |||
| out_channels = self.backbone.get_out_channels() | |||
| self.head = CommonHead(num_classes, out_channels) | |||
| default_recurisive_init(self) | |||
| for _, cell in self.cells_and_names(): | |||
| if isinstance(cell, nn.Conv2d): | |||
| cell.weight.set_data(init.initializer(KaimingNormal(a=math.sqrt(5), mode='fan_out', | |||
| nonlinearity='relu'), | |||
| cell.weight.shape, | |||
| cell.weight.dtype)) | |||
| elif isinstance(cell, nn.BatchNorm2d): | |||
| cell.gamma.set_data(init.initializer('ones', cell.gamma.shape)) | |||
| cell.beta.set_data(init.initializer('zeros', cell.beta.shape)) | |||
| elif isinstance(cell, nn.Dense): | |||
| cell.bias.set_data(init.initializer('zeros', cell.bias.shape)) | |||
| def construct(self, x): | |||
| x = self.backbone(x) | |||
| x = self.head(x) | |||
| return x | |||
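| # Illustrative usage (hypothetical shapes, assuming an ImageNet-style input): | |||
| #     net = DenseNet121(num_classes=1000) | |||
| #     logits = net(Tensor(np.zeros((1, 3, 224, 224), np.float32)))  # -> (1, 1000) | |||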
| @@ -0,0 +1,41 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| """ | |||
| get parameter function | |||
| """ | |||
| def get_param_groups(network): | |||
| """ | |||
| get parameter groups | |||
| """ | |||
| decay_params = [] | |||
| no_decay_params = [] | |||
| for x in network.trainable_params(): | |||
| parameter_name = x.name | |||
| if parameter_name.endswith('.bias'): | |||
| # biases are excluded from weight decay | |||
| no_decay_params.append(x) | |||
| elif parameter_name.endswith('.gamma'): | |||
| # BatchNorm gamma (scale) is excluded from weight decay | |||
| no_decay_params.append(x) | |||
| elif parameter_name.endswith('.beta'): | |||
| # BatchNorm beta (shift) is excluded from weight decay | |||
| no_decay_params.append(x) | |||
| else: | |||
| decay_params.append(x) | |||
| return [{'params': no_decay_params, 'weight_decay': 0.0}, {'params': decay_params}] | |||
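| # Illustrative usage (mirrors train.py below): the two groups let the optimizer | |||
| # apply weight decay only to conv/dense weights: | |||
| #     opt = Momentum(params=get_param_groups(network), learning_rate=lr, | |||
| #                    momentum=0.9, weight_decay=1e-4) | |||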
| @@ -0,0 +1,14 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| @@ -0,0 +1,82 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| """ | |||
| get logger. | |||
| """ | |||
| import logging | |||
| import os | |||
| import sys | |||
| from datetime import datetime | |||
| class LOGGER(logging.Logger): | |||
| """ | |||
| set up logging file. | |||
| Args: | |||
| logger_name (string): logger name. | |||
| log_dir (string): path of logger. | |||
| Returns: | |||
| string, logger path | |||
| """ | |||
| def __init__(self, logger_name, rank=0): | |||
| super(LOGGER, self).__init__(logger_name) | |||
| if rank % 8 == 0: | |||
| console = logging.StreamHandler(sys.stdout) | |||
| console.setLevel(logging.INFO) | |||
| formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s') | |||
| console.setFormatter(formatter) | |||
| self.addHandler(console) | |||
| def setup_logging_file(self, log_dir, rank=0): | |||
| """set up log file""" | |||
| self.rank = rank | |||
| if not os.path.exists(log_dir): | |||
| os.makedirs(log_dir, exist_ok=True) | |||
| log_name = datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S') + '_rank_{}.log'.format(rank) | |||
| self.log_fn = os.path.join(log_dir, log_name) | |||
| fh = logging.FileHandler(self.log_fn) | |||
| fh.setLevel(logging.INFO) | |||
| formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(message)s') | |||
| fh.setFormatter(formatter) | |||
| self.addHandler(fh) | |||
| def info(self, msg, *args, **kwargs): | |||
| if self.isEnabledFor(logging.INFO): | |||
| self._log(logging.INFO, msg, args, **kwargs) | |||
| def save_args(self, args): | |||
| self.info('Args:') | |||
| args_dict = vars(args) | |||
| for key in args_dict.keys(): | |||
| self.info('--> %s: %s', key, args_dict[key]) | |||
| self.info('') | |||
| def important_info(self, msg, *args, **kwargs): | |||
| if self.isEnabledFor(logging.INFO) and self.rank == 0: | |||
| line_width = 2 | |||
| important_msg = '\n' | |||
| important_msg += ('*'*70 + '\n')*line_width | |||
| important_msg += ('*'*line_width + '\n')*2 | |||
| important_msg += '*'*line_width + ' '*8 + msg + '\n' | |||
| important_msg += ('*'*line_width + '\n')*2 | |||
| important_msg += ('*'*70 + '\n')*line_width | |||
| self.info(important_msg, *args, **kwargs) | |||
| def get_logger(path, rank): | |||
| logger = LOGGER("mindversion", rank) | |||
| logger.setup_logging_file(path, rank) | |||
| return logger | |||
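| # Illustrative usage: | |||
| #     logger = get_logger('./outputs', 0) | |||
| #     logger.info('train start') | |||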
| @@ -0,0 +1,204 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| """ | |||
| Initialize. | |||
| """ | |||
| import math | |||
| from functools import reduce | |||
| import numpy as np | |||
| import mindspore.nn as nn | |||
| from mindspore import Tensor | |||
| from mindspore.common import initializer as init | |||
| def _calculate_gain(nonlinearity, param=None): | |||
| r""" | |||
| Return the recommended gain value for the given nonlinearity function. | |||
| The values are as follows: | |||
| ================= ==================================================== | |||
| nonlinearity gain | |||
| ================= ==================================================== | |||
| Linear / Identity :math:`1` | |||
| Conv{1,2,3}D :math:`1` | |||
| Sigmoid :math:`1` | |||
| Tanh :math:`\frac{5}{3}` | |||
| ReLU :math:`\sqrt{2}` | |||
| Leaky Relu :math:`\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}` | |||
| ================= ==================================================== | |||
| Args: | |||
| nonlinearity: the non-linear function | |||
| param: optional parameter for the non-linear function | |||
| Examples: | |||
| >>> gain = _calculate_gain('leaky_relu', 0.2) # leaky_relu with negative_slope=0.2 | |||
| """ | |||
| linear_fns = ['linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d', 'conv_transpose2d', 'conv_transpose3d'] | |||
| if nonlinearity in linear_fns or nonlinearity == 'sigmoid': | |||
| return 1 | |||
| if nonlinearity == 'tanh': | |||
| return 5.0 / 3 | |||
| if nonlinearity == 'relu': | |||
| return math.sqrt(2.0) | |||
| if nonlinearity == 'leaky_relu': | |||
| if param is None: | |||
| negative_slope = 0.01 | |||
| elif (not isinstance(param, bool) and isinstance(param, int)) or isinstance(param, float): | |||
| negative_slope = param | |||
| else: | |||
| raise ValueError("negative_slope {} not a valid number".format(param)) | |||
| return math.sqrt(2.0 / (1 + negative_slope ** 2)) | |||
| raise ValueError("Unsupported nonlinearity {}".format(nonlinearity)) | |||
| def _assignment(arr, num): | |||
| """Assign the value of `num` to `arr`.""" | |||
| if arr.shape == (): | |||
| arr = arr.reshape((1,)) | |||
| arr[:] = num | |||
| arr = arr.reshape(()) | |||
| else: | |||
| if isinstance(num, np.ndarray): | |||
| arr[:] = num[:] | |||
| else: | |||
| arr[:] = num | |||
| return arr | |||
| def _calculate_in_and_out(arr): | |||
| """ | |||
| Calculate n_in and n_out. | |||
| Args: | |||
| arr (Array): Input array. | |||
| Returns: | |||
| Tuple, a tuple with two elements, the first element is `n_in` and the second element is `n_out`. | |||
| """ | |||
| dim = len(arr.shape) | |||
| if dim < 2: | |||
| raise ValueError("If initialize data with xavier uniform, the dimension of data must greater than 1.") | |||
| n_in = arr.shape[1] | |||
| n_out = arr.shape[0] | |||
| if dim > 2: | |||
| counter = reduce(lambda x, y: x * y, arr.shape[2:]) | |||
| n_in *= counter | |||
| n_out *= counter | |||
| return n_in, n_out | |||
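| # Example (illustrative): a Conv2d weight of shape (64, 3, 7, 7) gives | |||
| # n_in = 3 * 7 * 7 = 147 and n_out = 64 * 7 * 7 = 3136. | |||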
| def _select_fan(array, mode): | |||
| mode = mode.lower() | |||
| valid_modes = ['fan_in', 'fan_out'] | |||
| if mode not in valid_modes: | |||
| raise ValueError("Mode {} not supported, please use one of {}".format(mode, valid_modes)) | |||
| fan_in, fan_out = _calculate_in_and_out(array) | |||
| return fan_in if mode == 'fan_in' else fan_out | |||
| class KaimingInit(init.Initializer): | |||
| r""" | |||
| Base class. Initialize the array with the He (Kaiming) initialization algorithm. | |||
| Args: | |||
| a: the negative slope of the rectifier used after this layer (only | |||
| used with ``'leaky_relu'``) | |||
| mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'`` | |||
| preserves the magnitude of the variance of the weights in the | |||
| forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the | |||
| backwards pass. | |||
| nonlinearity: the non-linear function, recommended to use only with | |||
| ``'relu'`` or ``'leaky_relu'`` (default). | |||
| """ | |||
| def __init__(self, a=0, mode='fan_in', nonlinearity='leaky_relu'): | |||
| super(KaimingInit, self).__init__() | |||
| self.mode = mode | |||
| self.gain = _calculate_gain(nonlinearity, a) | |||
| def _initialize(self, arr): | |||
| pass | |||
| class KaimingUniform(KaimingInit): | |||
| r""" | |||
| Initialize the array with He kaiming uniform algorithm. The resulting tensor will | |||
| have values sampled from :math:`\mathcal{U}(-\text{bound}, \text{bound})` where | |||
| .. math:: | |||
| \text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}} | |||
| Input: | |||
| arr (Array): The array to be assigned. | |||
| Returns: | |||
| Array, assigned array. | |||
| Examples: | |||
| >>> w = init.initializer(KaimingUniform(mode='fan_in', nonlinearity='relu'), (3, 5)) | |||
| """ | |||
| def _initialize(self, arr): | |||
| fan = _select_fan(arr, self.mode) | |||
| bound = math.sqrt(3.0) * self.gain / math.sqrt(fan) | |||
| data = np.random.uniform(-bound, bound, arr.shape) | |||
| _assignment(arr, data) | |||
| class KaimingNormal(KaimingInit): | |||
| r""" | |||
| Initialize the array with He kaiming normal algorithm. The resulting tensor will | |||
| have values sampled from :math:`\mathcal{N}(0, \text{std}^2)` where | |||
| .. math:: | |||
| \text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}} | |||
| Input: | |||
| arr (Array): The array to be assigned. | |||
| Returns: | |||
| Array, assigned array. | |||
| Examples: | |||
| >>> w = init.initializer(KaimingNormal(mode='fan_out', nonlinearity='relu'), (3, 5)) | |||
| """ | |||
| def _initialize(self, arr): | |||
| fan = _select_fan(arr, self.mode) | |||
| std = self.gain / math.sqrt(fan) | |||
| data = np.random.normal(0, std, arr.shape) | |||
| _assignment(arr, data) | |||
| def default_recurisive_init(custom_cell): | |||
| """default_recurisive_init""" | |||
| for _, cell in custom_cell.cells_and_names(): | |||
| if isinstance(cell, nn.Conv2d): | |||
| cell.weight.set_data(init.initializer(KaimingUniform(a=math.sqrt(5)), cell.weight.shape, cell.weight.dtype)) | |||
| if cell.bias is not None: | |||
| fan_in, _ = _calculate_in_and_out(cell.weight.asnumpy()) | |||
| bound = 1 / math.sqrt(fan_in) | |||
| cell.bias.set_data(Tensor(np.random.uniform(-bound, bound, cell.bias.shape), cell.bias.dtype)) | |||
| elif isinstance(cell, nn.Dense): | |||
| cell.weight.set_data(init.initializer(KaimingUniform(a=math.sqrt(5)), cell.weight.shape, cell.weight.dtype)) | |||
| if cell.bias is not None: | |||
| fan_in, _ = _calculate_in_and_out(cell.weight.asnumpy()) | |||
| bound = 1 / math.sqrt(fan_in) | |||
| cell.bias.set_data(Tensor(np.random.uniform(-bound, bound, cell.bias.shape), cell.bias.dtype)) | |||
| elif isinstance(cell, (nn.BatchNorm2d, nn.BatchNorm1d)): | |||
| pass | |||
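| # Illustrative usage: call once right after building a network, as DenseNet121 | |||
| # does in its constructor: | |||
| #     net = SomeCell()          # hypothetical cell | |||
| #     default_recurisive_init(net) | |||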
| @@ -0,0 +1,290 @@ | |||
| # Copyright 2020 Huawei Technologies Co., Ltd | |||
| # | |||
| # Licensed under the Apache License, Version 2.0 (the "License"); | |||
| # you may not use this file except in compliance with the License. | |||
| # You may obtain a copy of the License at | |||
| # | |||
| # http://www.apache.org/licenses/LICENSE-2.0 | |||
| # | |||
| # Unless required by applicable law or agreed to in writing, software | |||
| # distributed under the License is distributed on an "AS IS" BASIS, | |||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |||
| # See the License for the specific language governing permissions and | |||
| # limitations under the License. | |||
| # ============================================================================ | |||
| """train launch.""" | |||
| import os | |||
| import time | |||
| import argparse | |||
| import datetime | |||
| import mindspore.nn as nn | |||
| from mindspore import Tensor | |||
| from mindspore.nn.optim import Momentum | |||
| from mindspore.communication.management import init, get_rank, get_group_size | |||
| from mindspore.train.callback import ModelCheckpoint | |||
| from mindspore.train.callback import CheckpointConfig, Callback | |||
| from mindspore.train.serialization import load_checkpoint, load_param_into_net | |||
| from mindspore.train.model import Model | |||
| from mindspore.train.loss_scale_manager import DynamicLossScaleManager, FixedLossScaleManager | |||
| from mindspore import context | |||
| from mindspore.context import ParallelMode | |||
| from mindspore.common import set_seed | |||
| from src.optimizers import get_param_groups | |||
| from src.network import DenseNet121 | |||
| from src.datasets import classification_dataset | |||
| from src.losses.crossentropy import CrossEntropy | |||
| from src.lr_scheduler import MultiStepLR, CosineAnnealingLR | |||
| from src.utils.logging import get_logger | |||
| from src.config import config | |||
| devid = int(os.getenv('DEVICE_ID')) | |||
| context.set_context(mode=context.GRAPH_MODE, enable_auto_mixed_precision=True, | |||
| device_target="Davinci", save_graphs=False, device_id=devid) | |||
| set_seed(1) | |||
| class BuildTrainNetwork(nn.Cell): | |||
| """build training network""" | |||
| def __init__(self, network, criterion): | |||
| super(BuildTrainNetwork, self).__init__() | |||
| self.network = network | |||
| self.criterion = criterion | |||
| def construct(self, input_data, label): | |||
| output = self.network(input_data) | |||
| loss = self.criterion(output, label) | |||
| return loss | |||
| class ProgressMonitor(Callback): | |||
| """monitor loss and time""" | |||
| def __init__(self, args): | |||
| super(ProgressMonitor, self).__init__() | |||
| self.me_epoch_start_time = 0 | |||
| self.me_epoch_start_step_num = 0 | |||
| self.args = args | |||
| self.ckpt_history = [] | |||
| def begin(self, run_context): | |||
| self.args.logger.info('start network train...') | |||
| def epoch_begin(self, run_context): | |||
| pass | |||
| def epoch_end(self, run_context, *me_args): | |||
| """process epoch end""" | |||
| cb_params = run_context.original_args() | |||
| me_step = cb_params.cur_step_num - 1 | |||
| real_epoch = me_step // self.args.steps_per_epoch | |||
| time_used = time.time() - self.me_epoch_start_time | |||
| fps_mean = self.args.per_batch_size * (me_step-self.me_epoch_start_step_num) * self.args.group_size / time_used | |||
| self.args.logger.info('epoch[{}], iter[{}], loss:{}, ' | |||
| 'mean_fps:{:.2f} imgs/sec'.format(real_epoch, me_step, cb_params.net_outputs, fps_mean)) | |||
| if self.args.rank_save_ckpt_flag: | |||
| import glob | |||
| ckpts = glob.glob(os.path.join(self.args.outputs_dir, '*.ckpt')) | |||
| for ckpt in ckpts: | |||
| ckpt_fn = os.path.basename(ckpt) | |||
| if not ckpt_fn.startswith('{}-'.format(self.args.rank)): | |||
| continue | |||
| if ckpt in self.ckpt_history: | |||
| continue | |||
| self.ckpt_history.append(ckpt) | |||
| self.args.logger.info('epoch[{}], iter[{}], loss:{}, ckpt:{}, ' | |||
| 'ckpt_fn:{}'.format(real_epoch, me_step, cb_params.net_outputs, ckpt, ckpt_fn)) | |||
| self.me_epoch_start_step_num = me_step | |||
| self.me_epoch_start_time = time.time() | |||
| def step_begin(self, run_context): | |||
| pass | |||
| def step_end(self, run_context, *me_args): | |||
| pass | |||
| def end(self, run_context): | |||
| self.args.logger.info('end network train...') | |||
| def parse_args(cloud_args=None): | |||
| """parameters""" | |||
| parser = argparse.ArgumentParser('mindspore classification training') | |||
| # dataset related | |||
| parser.add_argument('--data_dir', type=str, default='', help='train data dir') | |||
| # network related | |||
| parser.add_argument('--pretrained', default='', type=str, help='model_path, local pretrained model to load') | |||
| # distributed related | |||
| parser.add_argument('--is_distributed', type=int, default=1, help='if multi device') | |||
| # roma obs | |||
| parser.add_argument('--train_url', type=str, default="", help='train url') | |||
| args, _ = parser.parse_known_args() | |||
| args = merge_args(args, cloud_args) | |||
| args.image_size = config.image_size | |||
| args.num_classes = config.num_classes | |||
| args.lr = config.lr | |||
| args.lr_scheduler = config.lr_scheduler | |||
| args.lr_epochs = config.lr_epochs | |||
| args.lr_gamma = config.lr_gamma | |||
| args.eta_min = config.eta_min | |||
| args.T_max = config.T_max | |||
| args.max_epoch = config.max_epoch | |||
| args.warmup_epochs = config.warmup_epochs | |||
| args.weight_decay = config.weight_decay | |||
| args.momentum = config.momentum | |||
| args.is_dynamic_loss_scale = config.is_dynamic_loss_scale | |||
| args.loss_scale = config.loss_scale | |||
| args.label_smooth = config.label_smooth | |||
| args.label_smooth_factor = config.label_smooth_factor | |||
| args.ckpt_interval = config.ckpt_interval | |||
| args.ckpt_path = config.ckpt_path | |||
| args.is_save_on_master = config.is_save_on_master | |||
| args.rank = config.rank | |||
| args.group_size = config.group_size | |||
| args.log_interval = config.log_interval | |||
| args.per_batch_size = config.per_batch_size | |||
| args.lr_epochs = list(map(int, args.lr_epochs.split(','))) | |||
| args.image_size = list(map(int, args.image_size.split(','))) | |||
| return args | |||
| def merge_args(args, cloud_args): | |||
| """dictionary""" | |||
| args_dict = vars(args) | |||
| if isinstance(cloud_args, dict): | |||
| for key in cloud_args.keys(): | |||
| val = cloud_args[key] | |||
| if key in args_dict and val: | |||
| arg_type = type(args_dict[key]) | |||
| if arg_type is not type(None): | |||
| val = arg_type(val) | |||
| args_dict[key] = val | |||
| return args | |||
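| # Illustrative usage (hypothetical cloud_args): keys already present in the CLI | |||
| # args are overridden, with values cast to the existing argument's type: | |||
| #     args = parse_args(cloud_args={'data_dir': '/cache/imagenet', 'is_distributed': '1'}) | |||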
| def train(cloud_args=None): | |||
| """training process""" | |||
| args = parse_args(cloud_args) | |||
| # init distributed | |||
| if args.is_distributed: | |||
| init() | |||
| args.rank = get_rank() | |||
| args.group_size = get_group_size() | |||
| if args.is_dynamic_loss_scale == 1: | |||
| args.loss_scale = 1 # with dynamic loss scaling, the loss scale must not also be set in the Momentum optimizer | |||
| # select whether only the master rank saves checkpoints or every rank does; compatible with model parallel | |||
| args.rank_save_ckpt_flag = 0 | |||
| if args.is_save_on_master: | |||
| if args.rank == 0: | |||
| args.rank_save_ckpt_flag = 1 | |||
| else: | |||
| args.rank_save_ckpt_flag = 1 | |||
| # logger | |||
| args.outputs_dir = os.path.join(args.ckpt_path, | |||
| datetime.datetime.now().strftime('%Y-%m-%d_time_%H_%M_%S')) | |||
| args.logger = get_logger(args.outputs_dir, args.rank) | |||
| # dataloader | |||
| de_dataset = classification_dataset(args.data_dir, args.image_size, | |||
| args.per_batch_size, args.max_epoch, | |||
| args.rank, args.group_size) | |||
| de_dataset.map_model = 4 | |||
| args.steps_per_epoch = de_dataset.get_dataset_size() | |||
| args.logger.save_args(args) | |||
| # network | |||
| args.logger.important_info('start create network') | |||
| # get network and init | |||
| network = DenseNet121(args.num_classes) | |||
| # loss | |||
| if not args.label_smooth: | |||
| args.label_smooth_factor = 0.0 | |||
| criterion = CrossEntropy(smooth_factor=args.label_smooth_factor, | |||
| num_classes=args.num_classes) | |||
| # load pretrain model | |||
| if os.path.isfile(args.pretrained): | |||
| param_dict = load_checkpoint(args.pretrained) | |||
| param_dict_new = {} | |||
| for key, values in param_dict.items(): | |||
| if key.startswith('moments.'): | |||
| continue | |||
| elif key.startswith('network.'): | |||
| param_dict_new[key[8:]] = values | |||
| else: | |||
| param_dict_new[key] = values | |||
| load_param_into_net(network, param_dict_new) | |||
| args.logger.info('load model {} success'.format(args.pretrained)) | |||
| # lr scheduler | |||
| if args.lr_scheduler == 'exponential': | |||
| lr_scheduler = MultiStepLR(args.lr, | |||
| args.lr_epochs, | |||
| args.lr_gamma, | |||
| args.steps_per_epoch, | |||
| args.max_epoch, | |||
| warmup_epochs=args.warmup_epochs) | |||
| elif args.lr_scheduler == 'cosine_annealing': | |||
| lr_scheduler = CosineAnnealingLR(args.lr, | |||
| args.T_max, | |||
| args.steps_per_epoch, | |||
| args.max_epoch, | |||
| warmup_epochs=args.warmup_epochs, | |||
| eta_min=args.eta_min) | |||
| else: | |||
| raise NotImplementedError(args.lr_scheduler) | |||
| lr_schedule = lr_scheduler.get_lr() | |||
| # optimizer | |||
| opt = Momentum(params=get_param_groups(network), | |||
| learning_rate=Tensor(lr_schedule), | |||
| momentum=args.momentum, | |||
| weight_decay=args.weight_decay, | |||
| loss_scale=args.loss_scale) | |||
| # mixed precision training | |||
| criterion.add_flags_recursive(fp32=True) | |||
| # wrap the forward pass and loss into a single training cell | |||
| train_net = BuildTrainNetwork(network, criterion) | |||
| if args.is_distributed: | |||
| parallel_mode = ParallelMode.DATA_PARALLEL | |||
| else: | |||
| parallel_mode = ParallelMode.STAND_ALONE | |||
| if args.is_dynamic_loss_scale == 1: | |||
| loss_scale_manager = DynamicLossScaleManager(init_loss_scale=65536, scale_factor=2, scale_window=2000) | |||
| else: | |||
| loss_scale_manager = FixedLossScaleManager(args.loss_scale, drop_overflow_update=False) | |||
| context.set_auto_parallel_context(parallel_mode=parallel_mode, device_num=args.group_size, | |||
| parameter_broadcast=True, gradients_mean=True) | |||
| model = Model(train_net, optimizer=opt, metrics=None, loss_scale_manager=loss_scale_manager, amp_level="O3") | |||
| # checkpoint save | |||
| progress_cb = ProgressMonitor(args) | |||
| callbacks = [progress_cb,] | |||
| if args.rank_save_ckpt_flag: | |||
| ckpt_max_num = args.max_epoch * args.steps_per_epoch // args.ckpt_interval | |||
| ckpt_config = CheckpointConfig(save_checkpoint_steps=args.ckpt_interval, | |||
| keep_checkpoint_max=ckpt_max_num) | |||
| ckpt_cb = ModelCheckpoint(config=ckpt_config, | |||
| directory=args.outputs_dir, | |||
| prefix='{}'.format(args.rank)) | |||
| callbacks.append(ckpt_cb) | |||
| model.train(args.max_epoch, de_dataset, callbacks=callbacks) | |||
| if __name__ == "__main__": | |||
| train() | |||
| @@ -40,7 +40,7 @@ config1 = ed({ | |||
| # config for resnet50, imagenet2012 | |||
| config2 = ed({ | |||
| "class_num": 1001, | |||
| "batch_size": 32, | |||
| "batch_size": 256, | |||
| "loss_scale": 1024, | |||
| "momentum": 0.9, | |||
| "weight_decay": 1e-4, | |||
| @@ -55,7 +55,7 @@ config2 = ed({ | |||
| "use_label_smooth": True, | |||
| "label_smooth_factor": 0.1, | |||
| "lr_init": 0, | |||
| "lr_max": 0.1, | |||
| "lr_max": 0.8, | |||
| "lr_end": 0.0 | |||
| }) | |||
| @@ -292,6 +292,7 @@ train_parallel1/log:epcoh: 2 step: 97, loss is 1.7133579 | |||
| ``` | |||
| > About rank_table.json, you can refer to the [distributed training tutorial](https://www.mindspore.cn/tutorial/en/master/advanced_use/distributed_training.html). | |||
| > **Attention** This will bind the processor cores according to `device_num` and the total number of processor cores. If you do not want to bind processor cores while pretraining, remove the `taskset` operations in `scripts/run_distribute_train.sh`. For example, on a 96-core host with 8 devices, device 0 is bound to cores 0-11 and device 7 to cores 84-95. | |||
| #### Run vgg16 on GPU | |||
| @@ -44,13 +44,20 @@ then | |||
| dataset_type=$3 | |||
| fi | |||
| export DEVICE_NUM=8 | |||
| export RANK_SIZE=8 | |||
| export RANK_TABLE_FILE=$1 | |||
| cpus=`cat /proc/cpuinfo| grep "processor"| wc -l` | |||
| avg=`expr $cpus \/ $RANK_SIZE` | |||
| gap=`expr $avg \- 1` | |||
| for((i=0;i<RANK_SIZE;i++)) | |||
| do | |||
| start=`expr $i \* $avg` | |||
| end=`expr $start \+ $gap` | |||
| cmdopt=$start"-"$end | |||
| export DEVICE_ID=$i | |||
| export RANK_ID=$i | |||
| rm -rf ./train_parallel$i | |||
| @@ -60,6 +67,6 @@ do | |||
| cd ./train_parallel$i || exit | |||
| echo "start training for rank $RANK_ID, device $DEVICE_ID, $dataset_type" | |||
| env > env.log | |||
| python train.py --data_path=$2 --device_target="Ascend" --device_id=$i --is_distributed=1 --dataset=$dataset_type &> log & | |||
| taskset -c $cmdopt python train.py --data_path=$2 --device_target="Ascend" --device_id=$i --is_distributed=1 --dataset=$dataset_type &> log & | |||
| cd .. | |||
| done | |||
| @@ -66,7 +66,7 @@ class FeedForwardNet(nn.Cell): | |||
| ) | |||
| self.get_shape = P.Shape() | |||
| self.reshape = P.Reshape() | |||
| self.dropout = nn.Dropout(keep_prob=1 - hidden_dropout_prob) | |||
| self.dropout = nn.Dropout(keep_prob=1.0 - hidden_dropout_prob) | |||
| def construct(self, input_tensor): | |||
| """ | |||
| @@ -133,7 +133,7 @@ class MultiHeadAttention(nn.Cell): | |||
| self.matmul = P.BatchMatMul() | |||
| self.softmax = nn.Softmax() | |||
| self.dropout = nn.Dropout(1 - attention_dropout_prob) | |||
| self.dropout = nn.Dropout(1.0 - attention_dropout_prob) | |||
| if self.has_attention_mask: | |||
| self.expand_dims = P.ExpandDims() | |||
| @@ -31,7 +31,7 @@ class ResidualConnection(nn.Cell): | |||
| def __init__(self, dropout_prob=0.1): | |||
| super(ResidualConnection, self).__init__() | |||
| self.add = P.TensorAdd() | |||
| self.dropout = nn.Dropout(1 - dropout_prob) | |||
| self.dropout = nn.Dropout(1.0 - dropout_prob) | |||
| def construct(self, hidden_tensor, residual): | |||
| """ | |||
| @@ -104,7 +104,7 @@ class Transformer(nn.Cell): | |||
| self.dtype = config.dtype | |||
| self.cast_compute_type = SaturateCast(dst_type=config.compute_type) | |||
| self.slice = P.StridedSlice() | |||
| self.dropout = nn.Dropout(keep_prob=1 - config.hidden_dropout_prob) | |||
| self.dropout = nn.Dropout(keep_prob=1.0 - config.hidden_dropout_prob) | |||
| self._create_attention_mask_from_input_mask = CreateAttentionMaskFromInputMask(config) | |||