|
- # 3: Train with customized models and standard datasets
-
- This note shows how to train, test, and run inference with your own customized models on standard datasets. As an example, we train a customized Cascade Mask R-CNN R50 model on the Cityscapes dataset: it uses [`AugFPN`](https://github.com/Gus-Guo/AugFPN) in place of the default `FPN` neck and adds `Rotate` or `Translate` as training-time auto augmentation.
-
- The basic steps are as follows:
-
- 1. Prepare the standard dataset
- 2. Prepare your own customized model
- 3. Prepare a config
- 4. Train, test, and run inference with models on the standard dataset.
-
- ## Prepare the standard dataset
-
- This note uses the standard Cityscapes dataset as an example.
-
- It is recommended to symlink the dataset root to `$MMDETECTION/data`.
- If your folder structure is different, you may need to change the corresponding paths in config files.
-
- ```none
- mmdetection
- ├── mmdet
- ├── tools
- ├── configs
- ├── data
- │ ├── coco
- │ │ ├── annotations
- │ │ ├── train2017
- │ │ ├── val2017
- │ │ ├── test2017
- │ ├── cityscapes
- │ │ ├── annotations
- │ │ ├── leftImg8bit
- │ │ │ ├── train
- │ │ │ ├── val
- │ │ ├── gtFine
- │ │ │ ├── train
- │ │ │ ├── val
- │ ├── VOCdevkit
- │ │ ├── VOC2007
- │ │ ├── VOC2012
-
- ```
-
- The Cityscapes annotations must be converted to COCO format using `tools/dataset_converters/cityscapes.py`:
-
- ```shell
- pip install cityscapesscripts
- python tools/dataset_converters/cityscapes.py ./data/cityscapes --nproc 8 --out-dir ./data/cityscapes/annotations
- ```
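
As a quick sanity check after conversion, the sketch below counts the images, annotations, and categories in a COCO-style JSON file. The helper name `summarize_coco` is hypothetical, and the file name in the usage note is an assumption about what the converter writes into `--out-dir`; adjust it to whatever you actually find there.

```python
import json
from collections import Counter


def summarize_coco(path):
    """Return basic counts for a COCO-style annotation file."""
    with open(path) as f:
        coco = json.load(f)
    # Map category id -> name so that per-class counts are readable.
    id_to_name = {c['id']: c['name'] for c in coco.get('categories', [])}
    per_class = Counter(
        id_to_name.get(a['category_id'], 'unknown')
        for a in coco.get('annotations', []))
    return {
        'num_images': len(coco.get('images', [])),
        'num_annotations': len(coco.get('annotations', [])),
        'num_categories': len(id_to_name),
        'per_class': dict(per_class),
    }
```

For example, calling `summarize_coco('data/cityscapes/annotations/instancesonly_filtered_gtFine_train.json')` (file name assumed) and checking that `num_categories` is 8 would confirm that the annotations agree with the `num_classes=8` setting used in the config later in this note.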
-
- Currently, the config files in `cityscapes` use COCO pre-trained weights for initialization.
- If your network connection is unavailable or slow, download the pre-trained models in advance; otherwise, errors will occur at the beginning of training.
-
- ## Prepare your own customized model
-
- The second step is to use your own module or training setting. Suppose we want to implement a new neck called `AugFPN` to replace the default `FPN` in the existing Cascade Mask R-CNN R50 detector. The following implements `AugFPN` in MMDetection.
-
- ### 1. Define a new neck (e.g. AugFPN)
-
- First, create a new file `mmdet/models/necks/augfpn.py`.
-
- ```python
- import torch.nn as nn
-
- from ..builder import NECKS
-
-
- # Register the class so that configs can refer to it as type='AugFPN'.
- @NECKS.register_module()
- class AugFPN(nn.Module):
-
-     def __init__(self,
-                  in_channels,
-                  out_channels,
-                  num_outs,
-                  start_level=0,
-                  end_level=-1,
-                  add_extra_convs=False):
-         # implementation is omitted
-         pass
-
-     def forward(self, inputs):
-         # implementation is omitted
-         pass
- ```
-
- ### 2. Import the module
-
- You can either add the following line to `mmdet/models/necks/__init__.py`,
-
- ```python
- from .augfpn import AugFPN
- ```
-
- or alternatively add
-
- ```python
- custom_imports = dict(
-     imports=['mmdet.models.necks.augfpn'],
-     allow_failed_imports=False)
- ```
-
- to the config file and avoid modifying the original code.
-
- ### 3. Modify the config file
-
- ```python
- neck=dict(
- type='AugFPN',
- in_channels=[256, 512, 1024, 2048],
- out_channels=256,
- num_outs=5)
- ```
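
The `type` field works because `@NECKS.register_module()` records the class under its name, and the builder later looks that name up and passes the remaining keys as constructor arguments. The sketch below imitates this mechanism with a tiny stand-in registry. `MiniRegistry` and `DummyNeck` are hypothetical names used only for illustration; this is not MMCV's actual implementation.

```python
class MiniRegistry:
    """A simplified stand-in for a name -> class registry."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        # Used as a decorator: stores the class under its own name.
        def _register(cls):
            self._modules[cls.__name__] = cls
            return cls
        return _register

    def build(self, cfg):
        # Copy so that the caller's config dict is left untouched.
        args = dict(cfg)
        cls_name = args.pop('type')
        if cls_name not in self._modules:
            raise KeyError(f'{cls_name} is not in the {self.name} registry')
        return self._modules[cls_name](**args)


NECKS = MiniRegistry('neck')


@NECKS.register_module()
class DummyNeck:
    """Placeholder neck that only stores its constructor arguments."""

    def __init__(self, in_channels, out_channels, num_outs):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.num_outs = num_outs


neck = NECKS.build(
    dict(type='DummyNeck',
         in_channels=[256, 512, 1024, 2048],
         out_channels=256,
         num_outs=5))
```

If the module that defines the class is never imported, the decorator never runs and the lookup fails, which is why the import step above is required.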
-
- For more details on customizing your own models (e.g., implementing a new backbone, head, or loss) and runtime training settings (e.g., defining a new optimizer, using gradient clipping, or customizing training schedules and hooks), please refer to [Customize Models](tutorials/customize_models.md) and [Customize Runtime Settings](tutorials/customize_runtime.md), respectively.
-
- ## Prepare a config
-
- The third step is to prepare a config for your own training setting. Suppose we want to add `AugFPN` and the `Rotate` or `Translate` augmentation to the existing Cascade Mask R-CNN R50 and train it on the Cityscapes dataset. Assuming the config is placed under `configs/cityscapes/` and named `cascade_mask_rcnn_r50_augfpn_autoaug_10e_cityscapes.py`, it looks as follows.
-
- ```python
- # The new config inherits the base configs to highlight the necessary modification
- _base_ = [
- '../_base_/models/cascade_mask_rcnn_r50_fpn.py',
- '../_base_/datasets/cityscapes_instance.py', '../_base_/default_runtime.py'
- ]
-
- model = dict(
- # set None to avoid loading ImageNet pretrained backbone,
- # instead here we set `load_from` to load from COCO pretrained detectors.
- backbone=dict(init_cfg=None),
- # replace the default `FPN` neck with our newly implemented `AugFPN` module
- neck=dict(
- type='AugFPN',
- in_channels=[256, 512, 1024, 2048],
- out_channels=256,
- num_outs=5),
- # We also need to change the num_classes in head from 80 to 8, to match the
- # cityscapes dataset's annotation. This modification involves `bbox_head` and `mask_head`.
- roi_head=dict(
- bbox_head=[
- dict(
- type='Shared2FCBBoxHead',
- in_channels=256,
- fc_out_channels=1024,
- roi_feat_size=7,
- # change the number of classes from the COCO default to Cityscapes
- num_classes=8,
- bbox_coder=dict(
- type='DeltaXYWHBBoxCoder',
- target_means=[0., 0., 0., 0.],
- target_stds=[0.1, 0.1, 0.2, 0.2]),
- reg_class_agnostic=True,
- loss_cls=dict(
- type='CrossEntropyLoss',
- use_sigmoid=False,
- loss_weight=1.0),
- loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
- loss_weight=1.0)),
- dict(
- type='Shared2FCBBoxHead',
- in_channels=256,
- fc_out_channels=1024,
- roi_feat_size=7,
- # change the number of classes from the COCO default to Cityscapes
- num_classes=8,
- bbox_coder=dict(
- type='DeltaXYWHBBoxCoder',
- target_means=[0., 0., 0., 0.],
- target_stds=[0.05, 0.05, 0.1, 0.1]),
- reg_class_agnostic=True,
- loss_cls=dict(
- type='CrossEntropyLoss',
- use_sigmoid=False,
- loss_weight=1.0),
- loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
- loss_weight=1.0)),
- dict(
- type='Shared2FCBBoxHead',
- in_channels=256,
- fc_out_channels=1024,
- roi_feat_size=7,
- # change the number of classes from the COCO default to Cityscapes
- num_classes=8,
- bbox_coder=dict(
- type='DeltaXYWHBBoxCoder',
- target_means=[0., 0., 0., 0.],
- target_stds=[0.033, 0.033, 0.067, 0.067]),
- reg_class_agnostic=True,
- loss_cls=dict(
- type='CrossEntropyLoss',
- use_sigmoid=False,
- loss_weight=1.0),
- loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
- ],
- mask_head=dict(
- type='FCNMaskHead',
- num_convs=4,
- in_channels=256,
- conv_out_channels=256,
- # change the number of classes from the COCO default to Cityscapes
- num_classes=8,
- loss_mask=dict(
- type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))))
-
- # overwrite `train_pipeline` to add the new `AutoAugment` training setting
- img_norm_cfg = dict(
- mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
- train_pipeline = [
- dict(type='LoadImageFromFile'),
- dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
- dict(
- type='AutoAugment',
- policies=[
- [dict(
- type='Rotate',
- level=5,
- img_fill_val=(124, 116, 104),
- prob=0.5,
- scale=1)
- ],
- [dict(type='Rotate', level=7, img_fill_val=(124, 116, 104)),
- dict(
- type='Translate',
- level=5,
- prob=0.5,
- img_fill_val=(124, 116, 104))
- ],
- ]),
- dict(
- type='Resize', img_scale=[(2048, 800), (2048, 1024)], keep_ratio=True),
- dict(type='RandomFlip', flip_ratio=0.5),
- dict(type='Normalize', **img_norm_cfg),
- dict(type='Pad', size_divisor=32),
- dict(type='DefaultFormatBundle'),
- dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
- ]
-
- # set batch_size per gpu, and set new training pipeline
- data = dict(
- samples_per_gpu=1,
- workers_per_gpu=3,
- # over-write `pipeline` with new training pipeline setting
- train=dict(dataset=dict(pipeline=train_pipeline)))
-
- # Set optimizer
- optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
- optimizer_config = dict(grad_clip=None)
- # Set customized learning policy
- lr_config = dict(
- policy='step',
- warmup='linear',
- warmup_iters=500,
- warmup_ratio=0.001,
- step=[8])
- runner = dict(type='EpochBasedRunner', max_epochs=10)
-
- # Initialize from the COCO pretrained Cascade Mask R-CNN R50 model for more stable performance
- load_from = 'https://download.openmmlab.com/mmdetection/v2.0/cascade_rcnn/cascade_mask_rcnn_r50_fpn_1x_coco/cascade_mask_rcnn_r50_fpn_1x_coco_20200203-9d4dcb24.pth'
- ```
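
Because the config inherits from `_base_` files, dict values are merged key by key with the base, while lists are replaced as a whole. This is presumably why the three `bbox_head` entries above are written out in full just to change `num_classes`. The sketch below imitates that merge rule under simplifying assumptions: `merge_cfg` is a hypothetical helper, not the MMCV implementation, the base values are toy data, and special keys such as `_delete_` are ignored.

```python
import copy


def merge_cfg(base, override):
    """Recursively merge `override` into a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested dicts are merged key by key.
            merged[key] = merge_cfg(merged[key], value)
        else:
            # Scalars and lists replace the base value entirely.
            merged[key] = copy.deepcopy(value)
    return merged


# Toy base config standing in for the inherited `_base_` files.
base = dict(
    neck=dict(type='FPN', in_channels=[256, 512, 1024, 2048],
              out_channels=256, num_outs=5),
    roi_head=dict(bbox_head=[dict(num_classes=80), dict(num_classes=80)]))
override = dict(
    neck=dict(type='AugFPN'),
    roi_head=dict(bbox_head=[dict(num_classes=8)]))

merged = merge_cfg(base, override)
print(merged['neck']['type'])                # AugFPN
print(merged['neck']['num_outs'])            # 5 (kept from the base)
print(len(merged['roi_head']['bbox_head']))  # 1 (the list was replaced)
```

In this toy run the `neck` dict keeps the base `num_outs` while its `type` changes, whereas the `bbox_head` list loses its second entry because the override list has only one.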
-
- ## Train a new model
-
- To train a model with the new config, you can simply run
-
- ```shell
- python tools/train.py configs/cityscapes/cascade_mask_rcnn_r50_augfpn_autoaug_10e_cityscapes.py
- ```
-
- For more detailed usage, please refer to [Case 1](1_exist_data_model.md).
-
- ## Test and inference
-
- To test the trained model, you can simply run
-
- ```shell
- python tools/test.py configs/cityscapes/cascade_mask_rcnn_r50_augfpn_autoaug_10e_cityscapes.py work_dirs/cascade_mask_rcnn_r50_augfpn_autoaug_10e_cityscapes/latest.pth --eval bbox segm
- ```
-
- For more detailed usage, please refer to [Case 1](1_exist_data_model.md).
|