Technical details of the implementation
The model is a ResNet-based Keypoint Feature Pyramid Network (KFPN), as proposed in the RTM3D paper. An unofficial PyTorch implementation of the RTM3D paper is available here.
Input:
- RGB images: (H, W, 3)

Outputs:
- Main center heatmap: (H/S, W/S, C), where S = 4 (the down-sample ratio) and C = 3 (the number of classes)
- Center offset: (H/S, W/S, 2)
- Heading angle (yaw): (H/S, W/S, 2). The model estimates the imaginary and the real fractions (the sin(yaw) and cos(yaw) values).
- Dimensions: (H/S, W/S, 3)
- z coordinate: (H/S, W/S, 1)

Targets: 7 degrees of freedom (7-DOF) of objects: (cx, cy, cz, l, w, h, θ)
- cx, cy, cz: the center coordinates
- l, w, h: the length, width, and height of the bounding box
- θ: the heading angle of the bounding box, in radians

Objects: Cars, Pedestrians, Cyclists.
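As a sanity check on the head shapes listed above, the mapping from an input image size to the five output tensors can be sketched as follows (the function name, the head labels, and the example input size are illustrative, not identifiers from the repository):

```python
# Sketch: output shapes of the five detection heads for an (H, W, 3) input,
# with S = 4 (down-sample ratio) and C = 3 (number of classes) as above.
# The dictionary keys are descriptive labels, not names from the code.

def head_shapes(H, W, S=4, C=3):
    h, w = H // S, W // S
    return {
        "main_center_heatmap": (h, w, C),
        "center_offset": (h, w, 2),
        "heading_angle": (h, w, 2),   # (im, re) = (sin(yaw), cos(yaw))
        "dimensions": (h, w, 3),
        "z_coordinate": (h, w, 1),
    }

shapes = head_shapes(608, 608)
print(shapes["main_center_heatmap"])  # (152, 152, 3)
```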
Losses:
- For the main center heatmap: focal loss.
- For the heading angle (yaw): the im and re fractions are directly regressed using l1_loss.
- For the z coordinate and the 3 dimensions (height, width, length): I used the balanced L1 loss proposed in the paper "Libra R-CNN: Towards Balanced Learning for Object Detection".
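The balanced L1 loss from Libra R-CNN can be sketched as below. This is a NumPy version for illustration (the training code would operate on PyTorch tensors); α = 0.5 and γ = 1.5 are the paper's defaults, and the function name is mine:

```python
import numpy as np

def balanced_l1_loss(diff, alpha=0.5, gamma=1.5, beta=1.0):
    """Balanced L1 loss (Libra R-CNN) on regression errors `diff`.

    The factor b is chosen so the gradient is continuous at |diff| = beta:
    alpha * log(b + 1) = gamma  =>  b = e^(gamma / alpha) - 1.
    Small errors get a softened log-shaped penalty; large errors grow
    linearly with slope gamma, like smooth L1 but with rebalanced gradients.
    """
    diff = np.abs(diff)
    b = np.e ** (gamma / alpha) - 1
    return np.where(
        diff < beta,
        alpha / b * (b * diff + 1) * np.log(b * diff / beta + 1) - alpha * diff,
        gamma * diff + gamma / b - alpha * beta,
    )
```

In training this would be averaged over the z and dimension regression targets at the object center locations.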
Training:
- Loss weights: 1.0 for all component losses.
- Learning rate scheduler: cosine, with an initial learning rate of 0.001.
- Batch size: 16 (on a single GTX 1080Ti).

Inference: A 3 × 3 max-pooling operation is applied on the center heatmap (as a cheap substitute for non-maximum suppression), then only the top 50 predictions are kept. The yaw angle is decoded as arctan(imaginary fraction / real fraction).
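The inference steps above (3 × 3 max-pool peak picking on the heatmap, top-k selection, and yaw decoding) can be sketched in NumPy as follows. The function name is illustrative, and the roll-based pooling stands in for the max_pool2d call a PyTorch implementation would use:

```python
import numpy as np

def decode_centers(heatmap, im, re, top_k=50):
    """Pick up to `top_k` peak centers from an (H, W) heatmap and decode yaw.

    A stride-1 3x3 max-pooling pass keeps only local maxima (a cheap
    substitute for non-maximum suppression); the highest-scoring peaks
    are then taken.
    """
    H, W = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # 3x3 max pooling with stride 1: maximum over each pixel's neighborhood,
    # built by shifting the padded map and taking an element-wise max.
    pooled = np.max(
        [np.roll(np.roll(padded, dy, 0), dx, 1)[1:-1, 1:-1]
         for dy in (-1, 0, 1) for dx in (-1, 0, 1)],
        axis=0,
    )
    peaks = heatmap * (heatmap == pooled)  # zero out non-maxima
    order = np.argsort(peaks, axis=None)[::-1][:top_k]
    ys, xs = np.unravel_index(order, (H, W))
    # yaw = arctan(imaginary fraction / real fraction); atan2 recovers the
    # correct quadrant from (sin, cos).
    yaws = np.arctan2(im[ys, xs], re[ys, xs])
    return ys, xs, peaks[ys, xs], yaws
```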