Huawei_Technology
/
mindspore

# Contents

- [DenseNet Description](#densenet-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Features](#features)
    - [Mixed Precision](#mixed-precision)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
        - [Distributed Training](#distributed-training)  
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Training accuracy results](#training-accuracy-results)
        - [Training performance results](#training-performance-results)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)

# [DenseNet Description](#contents)

DenseNet is a convolution based neural network for the task of image classification. The paper describing the model can be found [here](https://arxiv.org/abs/1608.06993). HuaWei’s DenseNet is a implementation on [MindSpore](https://www.mindspore.cn/).

The repository also contains scripts to launch training and inference routines.

# [Model Architecture](#contents)

DenseNet supports two kinds of implementations: DenseNet100 and DenseNet121, where the number represents number of layers in the network.

DenseNet121 builds on 4 densely connected block and DenseNet100 builds on 3. In every dense block, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. Concatenation is used. Each layer is receiving a “collective knowledge” from all preceding layers.

# [Dataset](#contents)

Dataset used in DenseNet121: ImageNet

The default configuration of the Dataset are as follows:

- Training Dataset preprocess:
    - Input size of images is 224\*224
    - Range (min, max) of respective size of the original size to be cropped is (0.08, 1.0)
    - Range (min, max) of aspect ratio to be cropped is (0.75, 1.333)
    - Probability of the image being flipped set to 0.5
    - Randomly adjust the brightness, contrast, saturation (0.4, 0.4, 0.4)
    - Normalize the input image with respect to mean and standard deviation

- Test Dataset preprocess:
    - Input size of images is 224\*224 (Resize to 256\*256 then crops images at the center)
    - Normalize the input image with respect to mean and standard deviation

Dataset used in DenseNet100: Cifar-10

The default configuration of the Dataset are as follows:

- Training Dataset preprocess:
    - Input size of images is 32\*32
    - Randomly cropping is applied to the image with padding=4
    - Probability of the image being flipped set to 0.5
    - Randomly adjust the brightness, contrast, saturation (0.4, 0.4, 0.4)
    - Normalize the input image with respect to mean and standard deviation

- Test Dataset preprocess:
    - Input size of images is 32\*32
    - Normalize the input image with respect to mean and standard deviation

# [Features](#contents)

## Mixed Precision

The [mixed precision](https://www.mindspore.cn/tutorial/training/en/master/advanced_use/enable_mixed_precision.html) training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware.

For FP16 operators, if the input data type is FP32, the backend of MindSpore will automatically handle it with reduced precision. Users could check the reduced-precision operators by enabling INFO log and then searching ‘reduce precision’.

# [Environment Requirements](#contents)

- Hardware（Ascend/GPU）
    - Prepare hardware environment with Ascend or GPU processor.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below：
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)

# [Quick Start](#contents)

After installing MindSpore via the official website, you can start training and evaluation as follows:

- running on Ascend

  ```python
  # run training example
  python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &

  # run distributed training example
  sh scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT

  # run evaluation example
  python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 &
  OR
  sh scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT
  ```

  For distributed training, a hccl configuration file with JSON format needs to be created in advance.

  Please follow the instructions in the link below:

  <https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools>.

- running on GPU

  For running on GPU, please change `platform` from `Ascend` to `GPU`

  ```python
  # run training example
  export CUDA_VISIBLE_DEVICES=0
  python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 &

  # run distributed training example
  sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7  [NET_NAME] [DATASET_NAME] [DATASET_PATH]

  # run evaluation example
  python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 &
  OR
  sh run_distribute_eval_gpu.sh 1 0  [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH]
  ```

# [Script Description](#contents)

## [Script and Sample Code](#contents)

```text
├── model_zoo
    ├── README.md                          // descriptions about all the models
    ├── densenet
        ├── README.md                    // descriptions about densenet
        ├── scripts
        │   ├── run_distribute_train.sh             // shell script for distributed on Ascend
        │   ├── run_distribute_train_gpu.sh             // shell script for distributed on GPU
        │   ├── run_distribute_eval.sh              // shell script for evaluation on Ascend
        │   ├── run_distribute_eval_gpu.sh              // shell script for evaluation on GPU
        ├── src
        │   ├── datasets             // dataset processing function
        │   ├── losses
        │       ├──crossentropy.py            // densenet loss function
        │   ├── lr_scheduler
        │       ├──lr_scheduler.py            // densenet learning rate schedule function
        │   ├── network
        │       ├──densenet.py            // densenet architecture
        │   ├──optimizers            // densenet optimize function
        │   ├──utils
        │       ├──logging.py            // logging function
        │       ├──var_init.py            // densenet variable init function
        │   ├── config.py             // network config
        ├── train.py               // training script
        ├── eval.py               //  evaluation script
```

## [Script Parameters](#contents)

You can modify the training behaviour through the various flags in the `train.py` script. Flags in the `train.py` script are as follows:

```python
  --data_dir              train data dir
  --num_classes           num of classes in dataset(default:1000 for densenet121; 10 for densenet100)
  --image_size            image size of the dataset
  --per_batch_size        mini-batch size (default: 32 for densenet121; 64 for densenet100) per gpu
  --pretrained            path of pretrained model
  --lr_scheduler          type of LR schedule: exponential, cosine_annealing
  --lr                    initial learning rate
  --lr_epochs             epoch milestone of lr changing
  --lr_gamma              decrease lr by a factor of exponential lr_scheduler
  --eta_min               eta_min in cosine_annealing scheduler
  --T_max                 T_max in cosine_annealing scheduler
  --max_epoch             max epoch num to train the model
  --warmup_epochs         warmup epoch(when batchsize is large)
  --weight_decay          weight decay (default: 1e-4)
  --momentum              momentum(default: 0.9)
  --label_smooth          whether to use label smooth in CE
  --label_smooth_factor   smooth strength of original one-hot
  --log_interval          logging interval(default:100)
  --ckpt_path             path to save checkpoint
  --ckpt_interval         the interval to save checkpoint
  --is_save_on_master     save checkpoint on master or all rank
  --is_distributed        if multi device(default: 1)
  --rank                  local rank of distributed(default: 0)
  --group_size            world size of distributed(default: 1)
```

## [Training Process](#contents)

### Training

- running on Ascend

  ```python
  python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &
  ```

  The python command above will run in the background, The log and model checkpoint will be generated in `output/202x-xx-xx_time_xx_xx_xx/`. The loss value of training DenseNet121 on ImageNet will be achieved as follows:

  ```shell
  2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec
  2020-08-22 16:58:56,619:INFO:local passed
  2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec
  2020-08-22 17:02:19,921:INFO:local passed
  2020-08-22 17:05:43,112:INFO:epoch[2], iter[15011], loss:3.096, mean_fps:6304.53 imgs/sec
  2020-08-22 17:05:43,113:INFO:local passed
  ...
  ```

- running on GPU

  ```python
  export CUDA_VISIBLE_DEVICES=0
  python train.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='GPU' > train.log 2>&1 &
  ```

  The python command above will run in the background, you can view the results through the file `train.log`.

  After training, you'll get some checkpoint files under the folder `./ckpt_0/` by default.

- running on CPU

  ```python
  python train.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --is_distributed=0 --device_target='CPU' > train.log 2>&1 &
  ```

  The python command above will run in the background, The log and model checkpoint will be generated in `output/202x-xx-xx_time_xx_xx_xx/`.

### Distributed Training

- running on Ascend

  ```bash
  sh scripts/run_distribute_train.sh 8 rank_table.json [NET_NAME]  [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT
  ```

  The above shell script will run distribute training in the background. You can view the results log and model checkpoint through the file `train[X]/output/202x-xx-xx_time_xx_xx_xx/`. The loss value of training DenseNet121 on ImageNet will be achieved as follows:

  ```log
  2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec
  2020-08-22 17:02:19,188:INFO:epoch[1], iter[10007], loss:3.18, mean_fps:6260.18 imgs/sec
  2020-08-22 17:05:42,490:INFO:epoch[2], iter[15011], loss:2.621, mean_fps:6301.11 imgs/sec
  2020-08-22 17:09:05,686:INFO:epoch[3], iter[20015], loss:3.113, mean_fps:6304.37 imgs/sec
  2020-08-22 17:12:28,925:INFO:epoch[4], iter[25019], loss:3.29, mean_fps:6303.07 imgs/sec
  2020-08-22 17:15:52,167:INFO:epoch[5], iter[30023], loss:2.865, mean_fps:6302.98 imgs/sec
  ...
  ...
  ```

- running on GPU

  ```bash
  cd scripts
  sh run_distribute_train_gpu.sh 8 0,1,2,3,4,5,6,7 [NET_NAME] [DATASET_NAME] [DATASET_PATH]
  ```

  The above shell script will run distribute training in the background. You can view the results through the file `train/train.log`.

## [Evaluation Process](#contents)

### Evaluation

- evaluation on Ascend

  running the command below for evaluation.

  ```python
  python eval.py --net [NET_NAME] --dataset [DATASET_NAME] --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 &
  OR
  sh scripts/run_distribute_eval.sh 8 rank_table.json [NET_NAME] [DATASET_NAME] /PATH/TO/DATASET /PATH/TO/CHECKPOINT
  ```

  The above python command will run in the background. You can view the results through the file "output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log". The accuracy of evaluating DenseNet121 on the test dataset of ImageNet will be as follows:

  ```log
  2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43%
  2020-08-24 09:21:50,551:INFO:after allreduce eval: top5_correct=46224, tot=49920, acc=92.60%
  ```

- evaluation on GPU

  running the command below for evaluation.

  ```python
  python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='GPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 &
  OR
  sh run_distribute_eval_gpu.sh 1 0 [NET_NAME] [DATASET_NAME] [DATASET_PATH] [CHECKPOINT_PATH]
  ```

  The above python command will run in the background. You can view the results through the file "eval/eval.log". The accuracy of evaluating DenseNet121 on the test dataset of ImageNet will be as follows:

  ```log
  2021-02-04 14:20:50,551:INFO:after allreduce eval: top1_correct=37637, tot=49984, acc=75.30%
  2021-02-04 14:20:50,551:INFO:after allreduce eval: top5_correct=46370, tot=49984, acc=92.77%
  ```

  The accuracy of evaluating DenseNet100 on the test dataset of Cifar-10 will be as follows:

  ```log
  2021-03-12 18:04:07,893:INFO:after allreduce eval: top1_correct=9536, tot=9984, acc=95.51%
  ```

- evaluation on CPU

  running the command below for evaluation.

  ```python
  python eval.py --net=[NET_NAME] --dataset=[DATASET_NAME] --data_dir=[DATASET_PATH] --device_target='CPU' --pretrained=[CHECKPOINT_PATH] > eval.log 2>&1 &
  ```

  The above python command will run in the background. You can view the results through the file "eval/eval.log".  The accuracy of evaluating DenseNet100 on the test dataset of Cifar-10 will be as follows:

  ```log
  2021-03-18 09:06:43,247:INFO:after allreduce eval: top1_correct=9492, tot=9984, acc=95.07%
  ```

# [Model Description](#contents)

## [Performance](#contents)

### DenseNet121

### Training accuracy results

| Parameters          | Ascend                      | GPU                         |
| ------------------- | --------------------------- | --------------------------- |
| Model Version       | DenseNet121               | DenseNet121               |
| Resource            | Ascend 910; OS Euler2.8                  | Tesla V100-PCIE             |
| Uploaded Date       | 09/15/2020 (month/day/year) | 01/27/2021 (month/day/year) |
| MindSpore Version   | 1.0.0                       | 1.1.0                       |
| Dataset             | ImageNet                    | ImageNet                    |
| epochs              | 120                         | 120                         |
| outputs             | probability                 | probability                 |
| accuracy            | Top1:75.13%; Top5:92.57%    | Top1:75.30%; Top5:92.77%    |

### Training performance results

| Parameters          | Ascend                      | GPU                          |
| ------------------- | --------------------------- | ---------------------------- |
| Model Version       | DenseNet121              | DenseNet121               |
| Resource            | Ascend 910; OS Euler2.8                  | Tesla V100-PCIE              |
| Uploaded Date       | 09/15/2020 (month/day/year) | 02/04/2021 (month/day/year)  |
| MindSpore Version   | 1.0.0                       | 1.1.1                        |
| Dataset             | ImageNet                    | ImageNet                     |
| batch_size          | 32                          | 32                           |
| outputs             | probability                 | probability                  |
| speed               | 1pc:760 img/s;8pc:6000 img/s| 1pc:161 img/s;8pc:1288 img/s |

### DenseNet100

### Training performance

| Parameters          | GPU                          |
| ------------------- | ---------------------------- |
| Model Version       | DenseNet100               |
| Resource            | Tesla V100-PCIE              |
| Uploaded Date       | 03/18/2021 (month/day/year)  |
| MindSpore Version   | 1.2.0                        |
| Dataset             | Cifar-10                     |
| batch_size          | 64                           |
| epochs              | 300                         |
| outputs             | probability                  |
| accuracy            | 95.31%                  |
| speed               | 1pc: 600.07 img/sec    |

# [Description of Random Situation](#contents)

In dataset.py, we set the seed inside “create_dataset" function. We also use random seed in train.py.

# [ModelZoo Homepage](#contents)  

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).