Detectron2 addresses some legacy issues left in Detectron, as a result, their models
are not compatible:
running inference with the same model weights will produce different results in the two code bases.
The major differences regarding inference are:
The height and width of a box with corners (x1, y1) and (x2, y2) is now computed more naturally as
width = x2 - x1 and height = y2 - y1;
In Detectron, a "+ 1" was added both height and width.
Note that the relevant ops in Caffe2 have adopted this change of convention
with an extra option.
So it is still possible to run inference with a Detectron2-trained model in Caffe2.
The change in height/width calculations most notably changes:
RPN now uses simpler anchors with fewer quantization artifacts.
In Detectron, the anchors were quantized and
do not have accurate areas.
In Detectron2, the anchors are center-aligned to feature grid points and not quantized.
Classification layers have a different ordering of class labels.
This involves any trainable parameter with shape (..., num_categories + 1, ...).
In Detectron2, integer labels [0, K-1] correspond to the K = num_categories object categories
and the label "K" corresponds to the special "background" category.
In Detectron, label "0" means background, and labels [1, K] correspond to the K categories.
ROIAlign is implemented differently. The new implementation is available in Caffe2.
All the ROIs are shifted by half a pixel compared to Detectron in order to create better image-feature-map alignment.
See layers/roi_align.py for details.
To enable the old behavior, use ROIAlign(aligned=False), or POOLER_TYPE=ROIAlign instead of
ROIAlignV2 (the default).
The ROIs are not required to have a minimum size of 1.
This will lead to tiny differences in the output, but should be negligible.
Mask inference function is different.
In Detectron2, the "paste_mask" function is different and should be more accurate than in Detectron. This change
can improve mask AP on COCO by ~0.5% absolute.
There are some other differences in training as well, but they won't affect
model-level compatibility. The major ones are:
RPN.POST_NMS_TOPK_TRAIN per-image, rather than per-batch.[0, width] or [0, height], and the coordinates in[0, width - 1] or [0, height - 1].We will later share more details and rationale behind the above mentioned issues
about pixels, coordinates, and "+1"s.
As mentioned above, despite the incompatibilities with Detectron, the relevant
ops have been implemented in Caffe2, in PR1
and PR2.
Therefore, models trained with detectron2 can be used in Caffe2.
A conversion script will be available later, for easier deployment.
Most ops are available in TensorFlow, although some tiny differences in
the implementation of resize / ROIAlign / padding need to be addressed.
A working conversion script is provided by tensorpack FasterRCNN
to run a standard Detectron2 model in TensorFlow.