DenseNet121 is a convolution-based neural network for image classification. The model is described in the paper "Densely Connected Convolutional Networks" (Huang et al., CVPR 2017). Huawei's DenseNet121 is an implementation of this model on MindSpore.
The repository also contains scripts to launch training and inference routines.
DenseNet121 is built from 4 densely connected blocks. Within each dense block, every layer receives the feature maps of all preceding layers as additional input and passes its own feature maps on to all subsequent layers. Feature maps are combined by concatenation rather than summation, so each layer receives the "collective knowledge" of all preceding layers.
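The following toy NumPy sketch illustrates the concatenation pattern. It is not the repository's network code: each composite layer (BN-ReLU-convolution in the paper) is replaced by a hypothetical random 1x1 convolution plus ReLU, and the channel bookkeeping assumes the DenseNet-121 configuration from the original paper (64 initial channels, growth rate 32, dense blocks of 6, 12, 24 and 16 layers, transition layers halving the channel count), which has not been checked against `src/network/densenet.py`.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Toy dense block on a (channels, height, width) feature map.

    Each layer sees the concatenation of the block input and every earlier
    layer's output, and contributes `growth_rate` new channels.
    """
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)  # all preceding feature maps
        # Hypothetical stand-in for H_l: random 1x1 convolution followed by ReLU.
        weight = rng.standard_normal((growth_rate, inp.shape[0])) * 0.1
        out = np.maximum(np.einsum("oc,chw->ohw", weight, inp), 0.0)
        features.append(out)
    return np.concatenate(features, axis=0)

def densenet_channels(init_features=64, growth_rate=32,
                      block_config=(6, 12, 24, 16), compression=0.5):
    """Channel count at the end of each dense block.

    Defaults are the DenseNet-121 values from the paper (assumed here).
    """
    channels = init_features
    history = []
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth_rate
        history.append(channels)
        if i != len(block_config) - 1:  # transition layer between blocks only
            channels = int(channels * compression)
    return history

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 8))
y = dense_block(x, num_layers=6, growth_rate=32, rng=rng)
print(y.shape)              # (256, 8, 8)
print(densenet_channels())  # [256, 512, 1024, 1024]
```

Because each layer adds only `growth_rate` channels while reusing all earlier feature maps, the channel count grows linearly inside a block, and the transition layers keep it bounded between blocks.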
Dataset used: ImageNet
The default dataset configuration is as follows:
Training dataset preprocessing:
Test dataset preprocessing:
Mixed precision training accelerates deep neural network training by using both single-precision (FP32) and half-precision (FP16) data formats, while maintaining the accuracy achieved by single-precision training. It can speed up computation, reduce memory usage, and allow a larger model or batch size to be trained on specific hardware.
For FP16 operators, if the input data type is FP32, the MindSpore backend automatically handles it with reduced precision. Users can identify the reduced-precision operators by enabling INFO-level logging and searching for "reduce precision".
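A rough NumPy illustration of the trade-off described above follows. NumPy merely stands in for the Ascend backend here (this is not MindSpore's API): an FP32 input fed to an "FP16 operator" is cast to half precision, which halves the memory per tensor and introduces a small rounding error relative to an all-FP32 computation.

```python
import numpy as np

rng = np.random.default_rng(0)
a32 = rng.standard_normal((128, 128)).astype(np.float32)
b32 = rng.standard_normal((128, 128)).astype(np.float32)

# Reference result computed entirely in single precision.
out32 = a32 @ b32

# Simulated FP16 operator receiving FP32 inputs (NumPy stand-in, not MindSpore):
# cast the inputs down to half precision, run the operator, cast the result back.
a16 = a32.astype(np.float16)
b16 = b32.astype(np.float16)
out16 = (a16 @ b16).astype(np.float32)

memory_ratio = a16.nbytes / a32.nbytes
rel_err = float(np.abs(out16 - out32).max() / np.abs(out32).max())

print(memory_ratio)                                        # 0.5
print(np.finfo(np.float16).eps, np.finfo(np.float32).eps)  # machine epsilon of each format
print(rel_err)                                             # small but non-zero
```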
After installing MindSpore via the official website, you can start training and evaluation as follows:
```shell
# run training example
python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &

# run distributed training example
sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT

# run evaluation example
python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 &
# OR
sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT
```
For distributed training, an HCCL configuration file in JSON format must be created in advance.
Please follow the instructions at the link below:
https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools
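As a sanity check before launching, the sketch below parses a rank table and verifies that its rank ids cover exactly the number of devices passed to the launch script. Both the JSON layout (`server_list`, `device`, `rank_id`) and the reading of the first script argument (`8`) as the device count are assumptions based on typical hccl_tools output, not confirmed by this README; the sample table and its placeholder IPs are hypothetical.

```python
import json

# Hypothetical single-server table for 8 devices. The layout is an assumption
# modelled on typical hccl_tools output; IPs are documentation placeholders.
SAMPLE_RANK_TABLE = json.dumps({
    "version": "1.0",
    "server_count": "1",
    "server_list": [{
        "server_id": "192.0.2.1",
        "device": [
            {"device_id": str(i), "device_ip": f"192.0.2.{10 + i}", "rank_id": str(i)}
            for i in range(8)
        ],
        "host_nic_ip": "reserve",
    }],
    "status": "completed",
})

def check_rank_table(text, device_num):
    """Return the sorted rank ids after checking they are exactly 0..device_num-1."""
    table = json.loads(text)
    rank_ids = sorted(
        int(dev["rank_id"])
        for server in table["server_list"]
        for dev in server["device"]
    )
    if rank_ids != list(range(device_num)):
        raise ValueError(f"expected ranks 0..{device_num - 1}, got {rank_ids}")
    return rank_ids

print(check_rank_table(SAMPLE_RANK_TABLE, 8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```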
```text
├── model_zoo
    ├── README.md                          // descriptions about all the models
    ├── densenet121
        ├── README.md                      // descriptions about densenet121
        ├── scripts
        │   ├── run_distribute_train.sh    // shell script for distributed training on Ascend
        │   ├── run_distribute_eval.sh     // shell script for distributed evaluation on Ascend
        ├── src
        │   ├── datasets                   // dataset processing function
        │   ├── losses
        │   │   ├── crossentropy.py        // DenseNet loss function
        │   ├── lr_scheduler
        │   │   ├── lr_scheduler.py        // DenseNet learning rate schedule function
        │   ├── network
        │   │   ├── densenet.py            // DenseNet architecture
        │   ├── optimizers                 // DenseNet optimizer function
        │   ├── utils
        │   │   ├── logging.py             // logging function
        │   │   ├── var_init.py            // DenseNet variable initialization function
        │   ├── config.py                  // network config
        ├── train.py                       // training script
        ├── eval.py                        // evaluation script
```
You can modify the training behaviour through the flags of the train.py script, which are as follows:
```text
--data_dir              training data directory
--num_classes           number of classes in the dataset (default: 1000)
--image_size            image size of the dataset
--per_batch_size        mini-batch size per device (default: 256)
--pretrained            path of the pretrained model
--lr_scheduler          type of LR schedule: exponential, cosine_annealing
--lr                    initial learning rate
--lr_epochs             epoch milestones at which the LR changes
--lr_gamma              factor by which the exponential lr_scheduler decreases the LR
--eta_min               eta_min in the cosine_annealing scheduler
--T_max                 T_max in the cosine_annealing scheduler
--max_epoch             maximum number of epochs to train the model
--warmup_epochs         number of warmup epochs (useful when the batch size is large)
--weight_decay          weight decay (default: 1e-4)
--momentum              momentum (default: 0.9)
--label_smooth          whether to use label smoothing in the cross-entropy loss
--label_smooth_factor   smoothing strength applied to the original one-hot labels
--log_interval          logging interval (default: 100)
--ckpt_path             path to save checkpoints
--ckpt_interval         interval at which to save checkpoints
--is_save_on_master     save checkpoints on the master rank only, or on all ranks
--is_distributed        whether to run on multiple devices (default: 1)
--rank                  local rank in distributed training (default: 0)
--group_size            world size in distributed training (default: 1)
```
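To illustrate how the schedule-related flags interact, here is a minimal sketch of the two schedule types named by `--lr_scheduler`. It assumes standard step decay (multiply by `lr_gamma` at each milestone in `lr_epochs`), standard cosine annealing from `lr` to `eta_min` over `T_max` epochs, and linear warmup; the actual formulas in `src/lr_scheduler/lr_scheduler.py` may differ (for example, by updating per step rather than per epoch). The milestone values in the usage lines are hypothetical.

```python
import math

def lr_at_epoch(epoch, base_lr, lr_scheduler, *, lr_epochs=(), lr_gamma=0.1,
                eta_min=0.0, T_max=1, warmup_epochs=0):
    """Learning rate for a given (0-indexed) epoch.

    Parameter names mirror the train.py flags; the formulas are assumed
    textbook step decay and cosine annealing, not copied from the repository.
    """
    if epoch < warmup_epochs:
        # Linear warmup: ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    if lr_scheduler == "exponential":
        # Multiply by lr_gamma once for every milestone already reached.
        passed = sum(1 for milestone in lr_epochs if epoch >= milestone)
        return base_lr * lr_gamma ** passed
    if lr_scheduler == "cosine_annealing":
        # Cosine decay from base_lr down to eta_min over T_max epochs.
        return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / T_max)) / 2
    raise ValueError(f"unknown lr_scheduler: {lr_scheduler}")

# Hypothetical settings, for illustration only.
step = [lr_at_epoch(e, 0.1, "exponential", lr_epochs=(30, 60, 90)) for e in (0, 30, 60, 90)]
cos = [lr_at_epoch(e, 0.1, "cosine_annealing", T_max=120) for e in (0, 60, 120)]
print(step)  # approximately [0.1, 0.01, 0.001, 0.0001]
print(cos)   # approximately [0.1, 0.05, 0.0]
```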
Training (single device), running on Ascend

```shell
python train.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/PRETRAINED_CKPT --is_distributed 0 > train.log 2>&1 &
```

The python command above runs in the background. The log and model checkpoints are written to output/202x-xx-xx_time_xx_xx_xx/, and the loss values are logged as follows:

```text
2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec
2020-08-22 16:58:56,619:INFO:local passed
2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec
2020-08-22 17:02:19,921:INFO:local passed
2020-08-22 17:05:43,112:INFO:epoch[2], iter[15011], loss:3.096, mean_fps:6304.53 imgs/sec
2020-08-22 17:05:43,113:INFO:local passed
...
```
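The `mean_fps` figure can be cross-checked against the timestamps. The sketch below assumes, without confirmation from the source, that `mean_fps` is the number of images processed since the previous log line divided by the elapsed wall-clock time, and that each iteration consumes a global batch of 256 images.

```python
from datetime import datetime

# Two consecutive epoch lines copied from the training log above.
LOG_LINES = [
    "2020-08-22 16:58:56,617:INFO:epoch[0], iter[5003], loss:4.367, mean_fps:0.00 imgs/sec",
    "2020-08-22 17:02:19,920:INFO:epoch[1], iter[10007], loss:3.193, mean_fps:6301.11 imgs/sec",
]

GLOBAL_BATCH = 256  # assumption: images consumed per iteration across all devices

def parse(line):
    """Return (timestamp, iteration) from one log line."""
    stamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
    iteration = int(line.split("iter[")[1].split("]")[0])
    return stamp, iteration

(t0, i0), (t1, i1) = (parse(line) for line in LOG_LINES)
images = (i1 - i0) * GLOBAL_BATCH
fps = images / (t1 - t0).total_seconds()
print(round(fps, 1))  # 6301.1, close to the logged mean_fps of 6301.11
```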
Distributed training, running on Ascend

```shell
sh scripts/run_distribute_train.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/PRETRAINED_CKPT
```

The shell script above runs distributed training in the background. The logs and model checkpoints are written to train[X]/output/202x-xx-xx_time_xx_xx_xx/, and the loss values are logged as follows:

```text
2020-08-22 16:58:54,556:INFO:epoch[0], iter[5003], loss:3.857, mean_fps:0.00 imgs/sec
2020-08-22 17:02:19,188:INFO:epoch[1], iter[10007], loss:3.18, mean_fps:6260.18 imgs/sec
2020-08-22 17:05:42,490:INFO:epoch[2], iter[15011], loss:2.621, mean_fps:6301.11 imgs/sec
2020-08-22 17:09:05,686:INFO:epoch[3], iter[20015], loss:3.113, mean_fps:6304.37 imgs/sec
2020-08-22 17:12:28,925:INFO:epoch[4], iter[25019], loss:3.29, mean_fps:6303.07 imgs/sec
2020-08-22 17:15:52,167:INFO:epoch[5], iter[30023], loss:2.865, mean_fps:6302.98 imgs/sec
...
```
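The iteration counters in these logs follow from the dataset size and the global batch size. This sketch assumes a per-device batch of 32 (the batch_size in the performance table), 8 devices, the 1,281,167-image ImageNet-1k training split, and that incomplete final batches are dropped; under those assumptions it reproduces the logged `iter` values for the first three epochs.

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # size of the ImageNet-1k training split

def steps_per_epoch(num_images, per_batch_size, group_size):
    """Optimizer steps per epoch, assuming incomplete final batches are dropped."""
    global_batch = per_batch_size * group_size
    return num_images // global_batch

steps = steps_per_epoch(IMAGENET_TRAIN_IMAGES, per_batch_size=32, group_size=8)
# Assumed: the logged counter is cumulative and 0-indexed.
last_iter = [steps * (epoch + 1) - 1 for epoch in range(3)]
print(steps)      # 5004
print(last_iter)  # [5003, 10007, 15011]
```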
Evaluation, running on Ascend

Run one of the commands below for evaluation:

```shell
python eval.py --data_dir /PATH/TO/DATASET --pretrained /PATH/TO/CHECKPOINT > eval.log 2>&1 &
# OR
sh scripts/run_distribute_eval.sh 8 rank_table.json /PATH/TO/DATASET /PATH/TO/CHECKPOINT
```

The python command above runs in the background. You can view the results in the file "output/202x-xx-xx_time_xx_xx_xx/202x_xxxx.log". The accuracy on the test dataset is reported as follows:

```text
2020-08-24 09:21:50,551:INFO:after allreduce eval: top1_correct=37657, tot=49920, acc=75.43%
2020-08-24 09:21:50,551:INFO:after allreduce eval: top5_correct=46224, tot=49920, acc=92.60%
```
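The eval log reports top-1 and top-5 hit counts that appear to be summed across devices ("after allreduce") and then divided by the total. The sketch below shows a generic way to count top-k hits with NumPy on a hypothetical toy batch; it is not taken from eval.py. The last line recomputes the logged percentages from the logged counts.

```python
import numpy as np

def topk_correct(logits, labels, k):
    """Count samples whose true label is among the k highest-scoring classes."""
    topk = np.argsort(logits, axis=1)[:, -k:]  # indices of the k largest logits
    return int((topk == labels[:, None]).any(axis=1).sum())

# Hypothetical toy batch: 4 samples, 6 classes.
logits = np.array([
    [0.1, 0.9, 0.0, 0.0, 0.0, 0.0],  # label 1 ranked 1st
    [0.5, 0.1, 0.4, 0.0, 0.0, 0.0],  # label 2 ranked 2nd
    [0.0, 0.0, 0.1, 0.2, 0.3, 0.4],  # label 0 ranked last
    [0.0, 0.0, 0.0, 0.0, 0.2, 0.8],  # label 5 ranked 1st
])
labels = np.array([1, 2, 0, 5])
print(topk_correct(logits, labels, 1), topk_correct(logits, labels, 2))  # 2 3

# Summed (all-reduced) counts divided by the summed total, as in the log above.
print(f"{37657 / 49920:.2%}", f"{46224 / 49920:.2%}")  # 75.43% 92.60%
```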
Accuracy results

| Parameters | DenseNet121 |
|---|---|
| Model Version | DenseNet121 |
| Resource | Ascend 910 |
| Uploaded Date | 09/15/2020 (month/day/year) |
| MindSpore Version | 1.0.0 |
| Dataset | ImageNet |
| Epochs | 120 |
| Outputs | probability |
| Accuracy | Top1: 75.13%; Top5: 92.57% |
Performance results

| Parameters | DenseNet121 |
|---|---|
| Model Version | DenseNet121 |
| Resource | Ascend 910 |
| Uploaded Date | 09/15/2020 (month/day/year) |
| MindSpore Version | 1.0.0 |
| Dataset | ImageNet |
| Batch size | 32 |
| Outputs | probability |
| Speed | 1 device: 760 img/s; 8 devices: 6000 img/s |
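The speed row also gives the data-parallel scaling efficiency: ideal linear scaling would reach 8 × 760 = 6080 img/s on 8 devices. A one-function sketch of that arithmetic, using only the figures in the table:

```python
def scaling_efficiency(single_device_fps, multi_device_fps, num_devices):
    """Fraction of ideal linear speedup achieved by data-parallel training."""
    return multi_device_fps / (num_devices * single_device_fps)

eff = scaling_efficiency(760, 6000, 8)
print(f"{eff:.1%}")  # 98.7%
```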
In dataset.py, the seed is set inside the `create_dataset` function. A random seed is also set in train.py.
Please check the official MindSpore ModelZoo homepage.