# [MaskRCNN Description](#contents)
MaskRCNN is a conceptually simple, flexible, and general framework for object instance segmentation. The approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding-box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework.

It shows top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing single-model entries on every task, including the COCO 2016 challenge winners.
# [Model Architecture](#contents)
MaskRCNN is a two-stage target detection network. It extends FasterRCNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. This network uses a region proposal network (RPN), which shares the convolution features of the whole image with the detection network, so that region proposal computation is almost cost free. The whole network further combines the RPN and the mask branch into one network by sharing the convolution features.
[Paper](http://cn.arxiv.org/pdf/1703.06870v3): Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick. "Mask R-CNN". ICCV 2017.
# [Dataset](#contents)
- [COCO2017](https://cocodataset.org/) is a popular dataset with bounding-box and pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning. There are 118K/5K images for train/val.
- We use COCO2017 as the training dataset in this example by default, and you can also use your own datasets.
- Dataset size: 19G
- Train: 18G, 118000 images
- Val: 1G, 5000 images
- Annotations: 241M, instances, captions, person_keypoints, etc.
- Data format: image and json files
- Note: Data will be processed in dataset.py
# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources.

After the environment is ready, prepare the dataset and start training as follows:

1. Download the dataset COCO2017. If the coco dataset is used, **select dataset to coco when running the script.**

Install Cython and pycocotools; you can also install mmcv to process data.

```
pip install Cython
pip install pycocotools
```

The expected directory structure of the dataset is as follows:
```
.
└─cocodataset
  ├─annotations
    ├─instance_train2017.json
    └─instance_val2017.json
  ├─val2017
  └─train2017
```
Note that the coco2017 dataset will be converted to MindRecord, which is a data format used in MindSpore. The dataset conversion may take about 4 hours.
2. If your own dataset is used, **select dataset to other when running the script.**

Organize the dataset information into a TXT file. Each row is one image annotation split by spaces: the first column is the relative path of the image, and the remaining columns contain box and class information in the format [xmin,ymin,xmax,ymax,class]. Images are read from the path obtained by joining `IMAGE_DIR` (the dataset directory) with the relative path listed in `ANNO_PATH` (the TXT file path); both can be set in `config.py`.
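A row might then look like the following (illustrative relative path and box values, not taken from a real annotation file):

```
train2017/0000001.jpg 50,30,200,180,1 10,60,95,150,3
```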
3. Execute the train script.
After dataset preparation, you can start training as follows:
```
# distributed training
sh run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_CKPT]
# standalone training
sh run_standalone_train.sh [PRETRAINED_CKPT]
```
Note:
1. To speed up data preprocessing, MindSpore provides a data format named MindRecord, hence the first step is to generate MindRecord files based on the COCO2017 dataset before training. Converting the raw COCO2017 dataset to MindRecord format may take about 4 hours.
2. For distributed training, an [hccl configuration file](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools) (rank table) in JSON format needs to be created in advance; see the example after these notes.
3. PRETRAINED_CKPT is a ResNet-50 checkpoint trained on ImageNet2012.
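For example, on a single machine with 8 Ascend devices, the rank table is usually generated with the hccl_tools script linked above. The exact invocation below is an assumption; confirm the flags with the tool's help output:

```bash
# assumed usage of model_zoo/utils/hccl_tools/hccl_tools.py; verify with `python hccl_tools.py --help`
python hccl_tools.py --device_num "[0,8)"
```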
After training, you can start evaluation as follows:
```bash
# Evaluation
sh run_eval.sh [VALIDATION_JSON_FILE] [CHECKPOINT_PATH]
```
Note:
1. VALIDATION_JSON_FILE is the label JSON file used for evaluation.
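For instance, with the dataset laid out as shown above, an evaluation command could look like the following (illustrative paths; substitute your own annotation file and trained checkpoint):

```bash
# illustrative paths, not shipped with the repository
sh run_eval.sh ./cocodataset/annotations/instance_val2017.json ./ckpt/mask_rcnn.ckpt
```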
# [Script Description](#contents)
## [Script and Sample Code](#contents)
```shell
.
└─MaskRcnn
  ├─README.md                          # README
  ├─scripts                            # shell scripts
    ├─run_standalone_train.sh          # training in standalone mode (1 device)
    ├─run_distribute_train.sh          # training in parallel mode (8 devices)
    └─run_eval.sh                      # evaluation
  ├─src
    ├─maskrcnn
      ├─__init__.py
      ├─anchor_generator.py            # generate base bounding box anchors
      ├─bbox_assign_sample.py          # filter positive and negative bboxes for the first-stage learning
      ├─bbox_assign_sample_stage2.py   # filter positive and negative bboxes for the second-stage learning
      ├─mask_rcnn_r50.py               # main network architecture of maskrcnn
      ├─fpn_neck.py                    # fpn network
      ├─proposal_generator.py          # generate proposals based on feature map
      ├─rcnn_cls.py
      ├─rcnn_mask.py
      ├─resnet50.py
      ├─roi_align.py
      └─rpn.py
    ├─config.py
    ├─dataset.py
    ├─lr_schedule.py
    ├─network_define.py
    └─util.py
  ├─eval.py
  └─train.py
```
- Set options in `config.py`, including loss_scale, learning rate and other network hyperparameters. Click [here](https://www.mindspore.cn/tutorial/zh-CN/master/use/data_preparation/loading_the_datasets.html#mindspore) for more information about loading datasets.
### [Training](#content)
- Run `run_standalone_train.sh` for non-distributed training of the MaskRCNN model.
```
# standalone training
sh run_standalone_train.sh [PRETRAINED_MODEL]
```
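A concrete run might look like the following, assuming a ResNet-50 checkpoint pretrained on ImageNet2012 has been saved locally (the path is illustrative):

```bash
# illustrative checkpoint path; point this at your own pretrained ResNet-50 checkpoint
sh run_standalone_train.sh ./pretrained/resnet50.ckpt
```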
### [Distributed Training](#content)
- Run `run_distribute_train.sh` for distributed training of the MaskRCNN model.
```
sh run_distribute_train.sh [RANK_TABLE_FILE] [PRETRAINED_MODEL]
```
> hccl.json, which is specified by RANK_TABLE_FILE, is needed when you are running a distributed task. You can generate it by using the [hccl_tools](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
> As for PRETRAINED_MODEL, if not set, the model will be trained from the very beginning. Ready-made pretrained_models are not available now. Stay tuned.
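A concrete invocation might look like this, assuming the rank table was generated with hccl_tools and a pretrained checkpoint is available (both file names are illustrative):

```bash
# illustrative file names; as noted above, training starts from scratch if PRETRAINED_MODEL is not set
sh run_distribute_train.sh ./hccl_8p.json ./pretrained/resnet50.ckpt
```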
### [Training Result](#content)
Training results will be stored in the example path, in a folder whose name begins with "train" or "train_parallel". You can find checkpoint files together with results like the following in loss.log.
# [WarpCTC Description](#contents)
This is an example of training WarpCTC with self-generated captcha image dataset in MindSpore.
# [Model Architecture](#content)
WarpCTC is a two-layer stacked LSTM followed by a one-layer fully connected (FC) neural network. See src/warpctc.py for details.
# [Dataset](#content)
The dataset is self-generated using a third-party library called [captcha](https://github.com/lepture/captcha), which randomly generates images of digits from 0 to 9. In this network, the number of digits in each image varies from 1 to 4.
# [Environment Requirements](#contents)
- Hardware(Ascend/GPU)
- Prepare hardware environment with Ascend or GPU processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. You will be able to have access to related resources once approved.
Run the script `scripts/run_process_data.sh` to generate a dataset. By default, the shell script will generate 10000 test images and 50000 train images separately.
```
$ cd scripts
$ sh run_process_data.sh
# after execution, you will find the dataset as follows:
.
└─warpctc
└─data
├─ train # train dataset
└─ test # evaluate dataset
```
- After the dataset is prepared, you may start running the training or the evaluation scripts as follows:
- You may refer to "Generate dataset" in [Quick Start](#quick-start) to automatically generate a dataset, or you may choose to generate a captcha dataset by yourself.
- Set options in `config.py`, including learning rate and other network hyperparameters. Click [MindSpore dataset preparation tutorial](https://www.mindspore.cn/tutorial/zh-CN/master/use/data_preparation/loading_the_datasets.html#mindspore) for more information about dataset.
### [Training](#contents)
- Run `run_standalone_train.sh` for non-distributed training of WarpCTC model, either on Ascend or on GPU.
- Run `run_distribute_train.sh` for distributed training of WarpCTC model on Ascend.
> About rank_table.json, you can refer to the [distributed training tutorial](https://www.mindspore.cn/tutorial/en/master/advanced_use/distributed_training.html).
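The argument lists for these scripts are not shown in this README. As a rough sketch (hypothetical arguments; check the usage message printed by each script before running), a standalone run on Ascend might look like:

```bash
# hypothetical arguments; consult the script's usage before running
sh run_standalone_train.sh ../data/train Ascend
```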
#### Result
Training results will be stored under the `scripts` folder, in a subfolder whose name begins with "train" or "train_parallel". There you can find checkpoint files together with results like the following in the log.