From: @ttudu Reviewed-by: Signed-off-by:tags/v1.2.0-rc1
| @@ -0,0 +1,5 @@ | |||
| ARG FROM_IMAGE_NAME=ascend-mindspore-arm:20.1.0 | |||
| FROM ${FROM_IMAGE_NAME} | |||
| COPY requirements.txt . | |||
| RUN pip3.7 install -r requirements.txt | |||
| @@ -46,6 +46,12 @@ Dataset used: [COCO2017](<https://cocodataset.org/>) | |||
| # Environment Requirements | |||
| - Hardware(Ascend) | |||
| - Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. | |||
| - Docker base image | |||
| - [Ascend Hub](https://ascend.huawei.com/ascendhub/#/home) | |||
| - Install [MindSpore](https://www.mindspore.cn/install/en). | |||
| - Download the dataset COCO2017. | |||
| @@ -104,6 +110,39 @@ sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL] | |||
| sh run_eval_ascend.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH] | |||
| ``` | |||
| # Run in docker | |||
| 1. Build docker images | |||
| ```shell | |||
| # build docker | |||
| docker build -t fasterrcnn:20.1.0 . --build-arg FROM_IMAGE_NAME=ascend-mindspore-arm:20.1.0 | |||
| ``` | |||
| 2. Create a container layer over the created image and start it | |||
| ```shell | |||
| # start docker | |||
| bash scripts/docker_start.sh fasterrcnn:20.1.0 [DATA_DIR] [MODEL_DIR] | |||
| ``` | |||
| 3. Train | |||
| ```shell | |||
| # standalone training | |||
| sh run_standalone_train_ascend.sh [PRETRAINED_MODEL] | |||
| # distributed training | |||
| sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL] | |||
| ``` | |||
| 4. Eval | |||
| ```shell | |||
| # eval | |||
| sh run_eval_ascend.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH] | |||
| ``` | |||
| # Script Description | |||
| ## Script and Sample Code | |||
| @@ -150,9 +189,36 @@ sh run_standalone_train_ascend.sh [PRETRAINED_MODEL] | |||
| sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL] | |||
| ``` | |||
| > A rank_table.json file, specified by RANK_TABLE_FILE, is required when you run a distributed task. You can generate it with [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools). | |||
| > PRETRAINED_MODEL should be a ResNet50 checkpoint trained on ImageNet2012. Ready-made pretrained_models are not available now. Stay tuned. | |||
| > The original dataset path must be set in config.py; you can use either "coco_root" or "image_dir". | |||
| Notes: | |||
| 1. A rank_table.json file, specified by RANK_TABLE_FILE, is required when you run a distributed task. You can generate it with [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools). | |||
| 2. PRETRAINED_MODEL should be a trained ResNet50 checkpoint. If you need to load a ready-made pretrained FasterRcnn checkpoint instead, change the train.py script as follows. | |||
| ```python | |||
| # Comment out the following code | |||
| # load_path = args_opt.pre_trained | |||
| # if load_path != "": | |||
| #     param_dict = load_checkpoint(load_path) | |||
| #     for item in list(param_dict.keys()): | |||
| #         if not item.startswith('backbone'): | |||
| #             param_dict.pop(item) | |||
| #     load_param_into_net(net, param_dict) | |||
| # Add the following code after the optimizer definition, since the FasterRcnn checkpoint includes optimizer parameters: | |||
| lr = Tensor(dynamic_lr(config, rank_size=device_num), mstype.float32) | |||
| opt = SGD(params=net.trainable_params(), learning_rate=lr, momentum=config.momentum, | |||
|           weight_decay=config.weight_decay, loss_scale=config.loss_scale) | |||
| load_path = args_opt.pre_trained  # still required: the assignment above is now commented out | |||
| if load_path != "": | |||
|     param_dict = load_checkpoint(load_path) | |||
|     for item in list(param_dict.keys()): | |||
|         if item in ("global_step", "learning_rate") or "rcnn.reg_scores" in item or "rcnn.cls_scores" in item: | |||
|             param_dict.pop(item) | |||
|     load_param_into_net(opt, param_dict) | |||
|     load_param_into_net(net, param_dict) | |||
| ``` | |||
| 3. The original dataset path must be set in config.py; you can use either "coco_root" or "image_dir". | |||
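
The key filtering described in note 2 can be tried in isolation. The following is a minimal sketch, assuming the checkpoint is represented as a plain Python dict keyed by parameter name. `filter_checkpoint_keys` is a hypothetical helper, not part of MindSpore or train.py, and the sample keys are invented for illustration.

```python
def filter_checkpoint_keys(param_dict,
                           exact=("global_step", "learning_rate"),
                           substrings=("rcnn.reg_scores", "rcnn.cls_scores")):
    """Return a copy of param_dict without step/learning-rate state and head layers.

    Hypothetical helper: mirrors the pop() loop from the README snippet,
    but leaves the input dict untouched.
    """
    kept = {}
    for name, value in param_dict.items():
        # Drop entries matched by exact name or by substring, as in the snippet.
        if name in exact or any(s in name for s in substrings):
            continue
        kept[name] = value
    return kept


if __name__ == "__main__":
    # Invented parameter names, for illustration only.
    sample = {
        "global_step": 1,
        "learning_rate": 0.1,
        "backbone.conv1.weight": 2,
        "rcnn.cls_scores.weight": 3,
    }
    print(sorted(filter_checkpoint_keys(sample)))
```

Returning a new dict instead of popping in place keeps the loaded checkpoint available for inspection; the README snippet mutates `param_dict` directly, which is equally valid there.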
| ### Result | |||
| @@ -47,6 +47,12 @@ Faster R-CNN is a two-stage object detection network that uses an RPN, which can | |||
| # Environment Requirements | |||
| - Hardware (Ascend) | |||
| - Set up the hardware environment with Ascend processors. To try Ascend processors, send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Resources are granted once the application is approved. | |||
| - Get the base image | |||
| - [Ascend Hub](https://ascend.huawei.com/ascendhub/#/home) | |||
| - Install [MindSpore](https://www.mindspore.cn/install). | |||
| - Download the COCO 2017 dataset. | |||
| @@ -107,6 +113,39 @@ sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL] | |||
| sh run_eval_ascend.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH] | |||
| ``` | |||
| # Run in docker | |||
| 1. Build the image | |||
| ```shell | |||
| # build the image | |||
| docker build -t fasterrcnn:20.1.0 . --build-arg FROM_IMAGE_NAME=ascend-mindspore-arm:20.1.0 | |||
| ``` | |||
| 2. Start a container instance | |||
| ```shell | |||
| # start a container instance | |||
| bash scripts/docker_start.sh fasterrcnn:20.1.0 [DATA_DIR] [MODEL_DIR] | |||
| ``` | |||
| 3. Train | |||
| ```shell | |||
| # standalone training | |||
| sh run_standalone_train_ascend.sh [PRETRAINED_MODEL] | |||
| # distributed training | |||
| sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL] | |||
| ``` | |||
| 4. Eval | |||
| ```shell | |||
| # eval | |||
| sh run_eval_ascend.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH] | |||
| ``` | |||
| # Script Description | |||
| ## Script and Sample Code | |||
| @@ -153,9 +192,36 @@ sh run_standalone_train_ascend.sh [PRETRAINED_MODEL] | |||
| sh run_distribute_train_ascend.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL] | |||
| ``` | |||
| > The rank_table.json file specified by RANK_TABLE_FILE is required when you run a distributed task. You can generate it with [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools). | |||
| > PRETRAINED_MODEL should be a ResNet-50 checkpoint trained on ImageNet 2012. Ready-made pretrained_models are not available yet. Stay tuned. | |||
| > config.py contains the original dataset path; you can choose "coco_root" or "image_dir". | |||
| Notes: | |||
| 1. The rank_table.json file specified by RANK_TABLE_FILE is required when you run a distributed task. You can generate it with [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools). | |||
| 2. PRETRAINED_MODEL should be a trained ResNet-50 checkpoint. To load a trained FasterRcnn checkpoint instead, modify train.py as follows: | |||
| ```python | |||
| # Comment out the following code | |||
| # load_path = args_opt.pre_trained | |||
| # if load_path != "": | |||
| #     param_dict = load_checkpoint(load_path) | |||
| #     for item in list(param_dict.keys()): | |||
| #         if not item.startswith('backbone'): | |||
| #             param_dict.pop(item) | |||
| #     load_param_into_net(net, param_dict) | |||
| # Loading a trained FasterRcnn checkpoint requires loading both the network parameters and the optimizer, so add the following code after the optimizer definition: | |||
| lr = Tensor(dynamic_lr(config, rank_size=device_num), mstype.float32) | |||
| opt = SGD(params=net.trainable_params(), learning_rate=lr, momentum=config.momentum, | |||
|           weight_decay=config.weight_decay, loss_scale=config.loss_scale) | |||
| load_path = args_opt.pre_trained  # still required: the assignment above is now commented out | |||
| if load_path != "": | |||
|     param_dict = load_checkpoint(load_path) | |||
|     for item in list(param_dict.keys()): | |||
|         if item in ("global_step", "learning_rate") or "rcnn.reg_scores" in item or "rcnn.cls_scores" in item: | |||
|             param_dict.pop(item) | |||
|     load_param_into_net(opt, param_dict) | |||
|     load_param_into_net(net, param_dict) | |||
| ``` | |||
| 3. config.py contains the original dataset path; you can choose "coco_root" or "image_dir". | |||
| ### Result | |||
| @@ -0,0 +1,3 @@ | |||
| Cython | |||
| pycocotools | |||
| mmcv==0.2.14 | |||
| @@ -0,0 +1,28 @@ | |||
| #!/bin/bash | |||
| docker_image=$1 | |||
| data_dir=$2 | |||
| model_dir=$3 | |||
| docker run -it --ipc=host \ | |||
| --device=/dev/davinci0 \ | |||
| --device=/dev/davinci1 \ | |||
| --device=/dev/davinci2 \ | |||
| --device=/dev/davinci3 \ | |||
| --device=/dev/davinci4 \ | |||
| --device=/dev/davinci5 \ | |||
| --device=/dev/davinci6 \ | |||
| --device=/dev/davinci7 \ | |||
| --device=/dev/davinci_manager \ | |||
| --device=/dev/devmm_svm \ | |||
| --device=/dev/hisi_hdc \ | |||
| -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \ | |||
| -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons \ | |||
| -v "${data_dir}:${data_dir}" \ | |||
| -v "${model_dir}:${model_dir}" \ | |||
| -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf \ | |||
| -v /var/log/npu/slog/:/var/log/npu/slog/ \ | |||
| -v /var/log/npu/profiling/:/var/log/npu/profiling \ | |||
| -v /var/log/npu/dump/:/var/log/npu/dump \ | |||
| -v /var/log/npu/:/usr/slog ${docker_image} \ | |||
| /bin/bash | |||
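
The `docker run` invocation above repeats one `--device` flag per Ascend device. As a rough sketch of how the same argument list could be assembled for a different device count, the hypothetical helper `build_docker_run` below generates the device flags and the two user-supplied mounts. It is not part of the repository, and it omits the driver and log mounts from the script for brevity.

```python
import shlex


def build_docker_run(image, data_dir, model_dir, num_devices=8):
    """Build a `docker run` argument list similar to docker_start.sh.

    Hypothetical helper: covers only the device flags and the data/model
    mounts; the driver and log volume mounts are left out.
    """
    cmd = ["docker", "run", "-it", "--ipc=host"]
    # One flag per compute device: /dev/davinci0 .. /dev/davinci{n-1}.
    cmd += [f"--device=/dev/davinci{i}" for i in range(num_devices)]
    # Management devices that the script always passes through.
    for dev in ("davinci_manager", "devmm_svm", "hisi_hdc"):
        cmd.append(f"--device=/dev/{dev}")
    # Mount the dataset and model directories at the same path in the container.
    for path in (data_dir, model_dir):
        cmd += ["-v", f"{path}:{path}"]
    cmd += [image, "/bin/bash"]
    return cmd


if __name__ == "__main__":
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(build_docker_run("fasterrcnn:20.1.0", "/data/coco", "/models")))
```

Building the command as a list rather than a string avoids quoting problems when a directory path contains spaces.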
| @@ -0,0 +1,5 @@ | |||
| ARG FROM_IMAGE_NAME=ascend-mindspore-arm:20.1.0 | |||
| FROM ${FROM_IMAGE_NAME} | |||
| COPY requirements.txt . | |||
| RUN pip3.7 install -r requirements.txt | |||
| @@ -55,6 +55,8 @@ Note that you can run the scripts based on the dataset mentioned in original pap | |||
| - Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources. | |||
| - Framework | |||
| - [MindSpore](https://gitee.com/mindspore/mindspore) | |||
| - Docker base image | |||
| - [Ascend Hub](https://ascend.huawei.com/ascendhub/#/home) | |||
| - For more information, please check the resources below: | |||
| - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html) | |||
| - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html) | |||
| @@ -120,6 +122,39 @@ pip install mmcv==0.2.14 | |||
| Note: | |||
| 1. VALIDATION_JSON_FILE is a label json file for evaluation. | |||
| # Run in docker | |||
| 1. Build docker images | |||
| ```shell | |||
| # build docker | |||
| docker build -t maskrcnn:20.1.0 . --build-arg FROM_IMAGE_NAME=ascend-mindspore-arm:20.1.0 | |||
| ``` | |||
| 2. Create a container layer over the created image and start it | |||
| ```shell | |||
| # start docker | |||
| bash scripts/docker_start.sh maskrcnn:20.1.0 [DATA_DIR] [MODEL_DIR] | |||
| ``` | |||
| 3. Train | |||
| ```shell | |||
| # standalone training | |||
| bash run_standalone_train.sh [PRETRAINED_CKPT] | |||
| # distributed training | |||
| bash run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_CKPT] | |||
| ``` | |||
| 4. Eval | |||
| ```shell | |||
| # Evaluation | |||
| bash run_eval.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH] | |||
| ``` | |||
| # [Script Description](#contents) | |||
| ## [Script and Sample Code](#contents) | |||
| @@ -336,9 +371,37 @@ bash run_standalone_train.sh [PRETRAINED_MODEL] | |||
| bash run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL] | |||
| ``` | |||
| > An hccl.json file, specified by RANK_TABLE_FILE, is required when you run a distributed task. You can generate it with [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools). | |||
| > If PRETRAINED_MODEL is not set, the model is trained from scratch. Ready-made pretrained_models are not available now. Stay tuned. | |||
| > The script binds processor cores according to `device_num` and the total number of processors. If you do not want this, remove the `taskset` operations in `scripts/run_distribute_train.sh`. | |||
| - Notes | |||
| 1. An hccl.json file, specified by RANK_TABLE_FILE, is required when you run a distributed task. You can generate it with [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools). | |||
| 2. PRETRAINED_MODEL should be a trained ResNet50 checkpoint. If it is not set, the model is trained from scratch. If you need to load a ready-made pretrained FasterRcnn checkpoint instead, change the train.py script as follows. | |||
| ```python | |||
| # Comment out the following code | |||
| # load_path = args_opt.pre_trained | |||
| # if load_path != "": | |||
| #     param_dict = load_checkpoint(load_path) | |||
| #     for item in list(param_dict.keys()): | |||
| #         if not item.startswith('backbone'): | |||
| #             param_dict.pop(item) | |||
| #     load_param_into_net(net, param_dict) | |||
| # Add the following code after the optimizer definition, since the FasterRcnn checkpoint includes optimizer parameters: | |||
| lr = Tensor(dynamic_lr(config, rank_size=device_num, start_steps=config.pretrain_epoch_size * dataset_size), | |||
|             mstype.float32) | |||
| opt = Momentum(params=net.trainable_params(), learning_rate=lr, momentum=config.momentum, | |||
|                weight_decay=config.weight_decay, loss_scale=config.loss_scale) | |||
| load_path = args_opt.pre_trained  # still required: the assignment above is now commented out | |||
| if load_path != "": | |||
|     param_dict = load_checkpoint(load_path) | |||
|     if config.pretrain_epoch_size == 0: | |||
|         for item in list(param_dict.keys()): | |||
|             if item in ("global_step", "learning_rate") or "rcnn.cls" in item or "rcnn.mask" in item: | |||
|                 param_dict.pop(item) | |||
|     load_param_into_net(net, param_dict) | |||
|     load_param_into_net(opt, param_dict) | |||
| ``` | |||
| 3. The script binds processor cores according to `device_num` and the total number of processors. If you do not want this, remove the `taskset` operations in `scripts/run_distribute_train.sh`. | |||
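
To illustrate the core binding mentioned in note 3, here is a minimal sketch that assumes the cores are split evenly and contiguously across devices; the actual logic in `scripts/run_distribute_train.sh` is not shown in this change and may differ. `core_ranges` is a hypothetical helper that returns one CPU range string per device, in the form accepted by `taskset -c`.

```python
def core_ranges(total_cores, device_num):
    """Split total_cores into device_num contiguous, equally sized ranges.

    Hypothetical helper: assumes an even split; leftover cores (when
    total_cores is not a multiple of device_num) are left unassigned.
    """
    if device_num <= 0 or total_cores < device_num:
        raise ValueError("need at least one core per device")
    per_device = total_cores // device_num
    ranges = []
    for i in range(device_num):
        start = i * per_device
        end = start + per_device - 1  # inclusive upper bound
        ranges.append(f"{start}-{end}")
    return ranges


if __name__ == "__main__":
    for rank, cpus in enumerate(core_ranges(96, 8)):
        print(f"rank {rank}: taskset -c {cpus} ...")
```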
| ### [Training Result](#contents) | |||
| @@ -0,0 +1,3 @@ | |||
| Cython | |||
| pycocotools | |||
| mmcv==0.2.14 | |||
| @@ -0,0 +1,28 @@ | |||
| #!/bin/bash | |||
| docker_image=$1 | |||
| data_dir=$2 | |||
| model_dir=$3 | |||
| docker run -it --ipc=host \ | |||
| --device=/dev/davinci0 \ | |||
| --device=/dev/davinci1 \ | |||
| --device=/dev/davinci2 \ | |||
| --device=/dev/davinci3 \ | |||
| --device=/dev/davinci4 \ | |||
| --device=/dev/davinci5 \ | |||
| --device=/dev/davinci6 \ | |||
| --device=/dev/davinci7 \ | |||
| --device=/dev/davinci_manager \ | |||
| --device=/dev/devmm_svm \ | |||
| --device=/dev/hisi_hdc \ | |||
| -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \ | |||
| -v /usr/local/Ascend/add-ons/:/usr/local/Ascend/add-ons \ | |||
| -v "${data_dir}:${data_dir}" \ | |||
| -v "${model_dir}:${model_dir}" \ | |||
| -v /var/log/npu/conf/slog/slog.conf:/var/log/npu/conf/slog/slog.conf \ | |||
| -v /var/log/npu/slog/:/var/log/npu/slog/ \ | |||
| -v /var/log/npu/profiling/:/var/log/npu/profiling \ | |||
| -v /var/log/npu/dump/:/var/log/npu/dump \ | |||
| -v /var/log/npu/:/usr/slog ${docker_image} \ | |||
| /bin/bash | |||