# Use Models

Models (and their sub-models) in detectron2 are built by
functions such as `build_model`, `build_backbone`, `build_roi_heads`:
```python
from detectron2.modeling import build_model
model = build_model(cfg)  # returns a torch.nn.Module
```

To load an existing checkpoint into the model, use
`DetectionCheckpointer(model).load(file_path)`.
Detectron2 recognizes models in PyTorch's `.pth` format, as well as the `.pkl` files
in our model zoo.
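
For example (a minimal sketch; the checkpoint path is a placeholder):

```python
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.modeling import build_model

model = build_model(cfg)  # cfg is a detectron2 config object
# Load weights from a .pth checkpoint or a model-zoo .pkl file; the path is a placeholder.
DetectionCheckpointer(model).load("output/model_final.pth")
```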

You can use a model by calling `outputs = model(inputs)`.
Next, we explain the input/output formats used by the builtin models in detectron2.

### Model Input Format

All builtin models take a `list[dict]` as input. Each dict
corresponds to information about one image.
The dict may contain the following keys (a small construction example follows the list):
  18. * "image": `Tensor` in (C, H, W) format. The meaning of channels are defined by `cfg.INPUT.FORMAT`.
  19. * "instances": an `Instances` object, with the following fields:
  20. + "gt_boxes": `Boxes` object storing N boxes, one for each instance.
  21. + "gt_classes": `Tensor` of long type, a vector of N labels, in range [0, num_categories).
  22. + "gt_masks": a `PolygonMasks` object storing N masks, one for each instance.
  23. + "gt_keypoints": a `Keypoints` object storing N keypoint sets, one for each instance.
  24. * "proposals": an `Instances` object used in Fast R-CNN style models, with the following fields:
  25. + "proposal_boxes": `Boxes` object storing P proposal boxes.
  26. + "objectness_logits": `Tensor`, a vector of P scores, one for each proposal.
  27. * "height", "width": the *desired* output height and width of the image, not necessarily the same
  28. as the height or width of the `image` when input into the model, which might be after resizing.
  29. For example, it can be the *original* image height and width before resizing.
  30. If provided, the model will produce output in this resolution,
  31. rather than in the resolution of the `image` as input into the model. This is more efficient and accurate.
  32. * "sem_seg": `Tensor[int]` in (H, W) format. The semantic segmentation ground truth.

#### How it connects to the data loader:

The output of the default [DatasetMapper](../modules/data.html#detectron2.data.DatasetMapper) is a dict
that follows the above format.
After the data loader performs batching, it becomes a `list[dict]`, which the builtin models support.
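
For instance, one batch from the default train loader can be inspected like this (a sketch, assuming a fully populated `cfg`):

```python
from detectron2.data import build_detection_train_loader

data_loader = build_detection_train_loader(cfg)
batch = next(iter(data_loader))   # a list[dict], one dict per image
print(batch[0].keys())            # e.g. dict_keys(['file_name', 'image', 'instances', ...])
```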

### Model Output Format

When in training mode, the builtin models output a `dict[str->ScalarTensor]` with all the losses.

When in inference mode, the builtin models output a `list[dict]`, one dict for each image. Each dict may contain:
  40. * "instances": [Instances](../modules/structures.html#detectron2.structures.Instances)
  41. object with the following fields:
  42. * "pred_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing N boxes, one for each detected instance.
  43. * "scores": `Tensor`, a vector of N scores.
  44. * "pred_classes": `Tensor`, a vector of N labels in range [0, num_categories).
  45. + "pred_masks": a `Tensor` of shape (N, H, W), masks for each detected instance.
  46. + "pred_keypoints": a `Tensor` of shape (N, num_keypoint, 3).
  47. Each row in the last dimension is (x, y, score).
  48. * "sem_seg": `Tensor` of (num_categories, H, W), the semantic segmentation prediction.
  49. * "proposals": [Instances](../modules/structures.html#detectron2.structures.Instances)
  50. object with the following fields:
  51. * "proposal_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes)
  52. object storing N boxes.
  53. * "objectness_logits": a torch vector of N scores.
  54. * "panoptic_seg": A tuple of (Tensor, list[dict]). The tensor has shape (H, W), where each element
  55. represent the segment id of the pixel. Each dict describes one segment id and has the following fields:
  56. * "id": the segment id
  57. * "isthing": whether the segment is a thing or stuff
  58. * "category_id": the category id of this segment. It represents the thing
  59. class id when `isthing==True`, and the stuff class id otherwise.
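
As an illustration of reading these outputs (a sketch, assuming an R-CNN-style detection model whose inference result is already stored in `outputs`):

```python
# `outputs` is the list[dict] returned by the model in inference (eval) mode.
instances = outputs[0]["instances"]    # results for the first image
boxes = instances.pred_boxes.tensor    # (N, 4) tensor of box coordinates
scores = instances.scores              # (N,) tensor of confidence scores
classes = instances.pred_classes       # (N,) tensor of predicted class labels
```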

### How to use a model in your code:

Construct your own `list[dict]` with the necessary keys.
For example, for inference, provide dicts with "image", and optionally "height" and "width".
Note that when in training mode, all models are required to be used under an `EventStorage`.
The training statistics will be put into the storage:
```python
from detectron2.utils.events import EventStorage
with EventStorage() as storage:
    losses = model(inputs)
```
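
For inference, a minimal call might look like the sketch below; `image` stands for a (C, H, W) tensor in the format described above, and the height/width values are placeholders:

```python
import torch

model.eval()                           # switch the model to inference mode
with torch.no_grad():
    outputs = model([{"image": image, "height": 480, "width": 640}])
```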
