
# Benchmarks

Here we benchmark the training speed of a Mask R-CNN in detectron2,
against several other popular open-source Mask R-CNN implementations.

### Settings

* Hardware: 8 NVIDIA V100s with NVLink.
* Software: Python 3.7, CUDA 10.0, cuDNN 7.6.4, PyTorch 1.3.0 (from
  [this link](https://download.pytorch.org/whl/nightly/cu100/torch-1.3.0%2Bcu100-cp37-cp37m-linux_x86_64.whl)),
  TensorFlow 1.15.0rc2, Keras 2.2.5, MxNet 1.6.0b20190820.
* Model: an end-to-end R-50-FPN Mask R-CNN model, using the same hyperparameters as the
  [Detectron baseline config](https://github.com/facebookresearch/Detectron/blob/master/configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml).
* Metric: we use the average throughput over iterations 100-500 to skip GPU warmup time.
  Note that for R-CNN-style models, the throughput of a model typically changes during training,
  because it depends on the predictions of the model. This metric is therefore not directly
  comparable with "train speed" in the model zoo, which is the average speed over the entire training run.
### Main Results
```eval_rst
+-----------------------------+--------------------+
| Implementation              | Throughput (img/s) |
+=============================+====================+
| Detectron2                  | 59                 |
+-----------------------------+--------------------+
| maskrcnn-benchmark_         | 51                 |
+-----------------------------+--------------------+
| tensorpack_                 | 50                 |
+-----------------------------+--------------------+
| mmdetection_                | 41                 |
+-----------------------------+--------------------+
| simpledet_                  | 39                 |
+-----------------------------+--------------------+
| Detectron_                  | 19                 |
+-----------------------------+--------------------+
| `matterport/Mask_RCNN`__    | 14                 |
+-----------------------------+--------------------+

.. _maskrcnn-benchmark: https://github.com/facebookresearch/maskrcnn-benchmark/
.. _tensorpack: https://github.com/tensorpack/tensorpack/tree/master/examples/FasterRCNN
.. _mmdetection: https://github.com/open-mmlab/mmdetection/
.. _simpledet: https://github.com/TuSimple/simpledet/
.. _Detectron: https://github.com/facebookresearch/Detectron
__ https://github.com/matterport/Mask_RCNN/
```
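As a rough sanity check, throughput can be converted to time per training iteration, assuming each iteration processes 16 images in total (2 images per GPU across 8 GPUs, as in the baseline config):

```python
# Rough conversion of the table's throughput (img/s) into seconds per
# training iteration, assuming 16 images per iteration (2 per GPU x 8 GPUs).
throughputs = {"Detectron2": 59, "maskrcnn-benchmark": 51, "Detectron": 19}
sec_per_iter = {name: 16.0 / t for name, t in throughputs.items()}
print(round(sec_per_iter["Detectron2"], 2))  # -> 0.27
```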
Details for each implementation:

* __Detectron2__:
  ```
  python tools/train_net.py --config-file configs/Detectron1-Comparisons/mask_rcnn_R_50_FPN_noaug_1x.yaml --num-gpus 8
  ```

* __maskrcnn-benchmark__: use commit `0ce8f6f` with `sed -i 's/torch.uint8/torch.bool/g' **/*.py` to make it compatible with the latest PyTorch.
  Then, run training with
  ```
  python -m torch.distributed.launch --nproc_per_node=8 tools/train_net.py --config-file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml
  ```
  The speed we observed is faster than in its model zoo, likely due to different software versions.

* __tensorpack__: at commit `caafda`, `export TF_CUDNN_USE_AUTOTUNE=0`, then run
  ```
  mpirun -np 8 ./train.py --config DATA.BASEDIR=/data/coco TRAINER=horovod BACKBONE.STRIDE_1X1=True TRAIN.STEPS_PER_EPOCH=50 --load ImageNet-R50-AlignPadding.npz
  ```

* __mmdetection__: at commit `4d9a5f`, apply the following diff, then run
  ```
  ./tools/dist_train.sh configs/mask_rcnn_r50_fpn_1x.py 8
  ```
  The speed we observed is faster than in its model zoo, likely due to different software versions.
  <details>
  <summary>
  (diff to make it use the same architecture - click to expand)
  </summary>

  ```diff
  diff --git i/configs/mask_rcnn_r50_fpn_1x.py w/configs/mask_rcnn_r50_fpn_1x.py
  index 04f6d22..ed721f2 100644
  --- i/configs/mask_rcnn_r50_fpn_1x.py
  +++ w/configs/mask_rcnn_r50_fpn_1x.py
  @@ -1,14 +1,15 @@
   # model settings
   model = dict(
       type='MaskRCNN',
  -    pretrained='torchvision://resnet50',
  +    pretrained='open-mmlab://resnet50_caffe',
       backbone=dict(
           type='ResNet',
           depth=50,
           num_stages=4,
           out_indices=(0, 1, 2, 3),
           frozen_stages=1,
  -        style='pytorch'),
  +        norm_cfg=dict(type="BN", requires_grad=False),
  +        style='caffe'),
       neck=dict(
           type='FPN',
           in_channels=[256, 512, 1024, 2048],
  @@ -115,7 +116,7 @@ test_cfg = dict(
   dataset_type = 'CocoDataset'
   data_root = 'data/coco/'
   img_norm_cfg = dict(
  -    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
  +    mean=[123.675, 116.28, 103.53], std=[1.0, 1.0, 1.0], to_rgb=False)
   train_pipeline = [
       dict(type='LoadImageFromFile'),
       dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
  ```

  </details>
* __SimpleDet__: at commit `9187a1`, run
  ```
  python detection_train.py --config config/mask_r50v1_fpn_1x.py
  ```

* __Detectron__: run
  ```
  python tools/train_net.py --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_1x.yaml
  ```
  Note that many of its ops run on CPUs, therefore its performance is limited.

* __matterport/Mask_RCNN__: at commit `3deaec`, apply the following diff, `export TF_CUDNN_USE_AUTOTUNE=0`, then run
  ```
  python coco.py train --dataset=/data/coco/ --model=imagenet
  ```
  Note that many small details in this implementation might be different
  from Detectron's standards.
  <details>
  <summary>
  (diff to make it use the same hyperparameters - click to expand)
  </summary>

  ```diff
  diff --git i/mrcnn/model.py w/mrcnn/model.py
  index 62cb2b0..61d7779 100644
  --- i/mrcnn/model.py
  +++ w/mrcnn/model.py
  @@ -2367,8 +2367,8 @@ class MaskRCNN():
               epochs=epochs,
               steps_per_epoch=self.config.STEPS_PER_EPOCH,
               callbacks=callbacks,
  -            validation_data=val_generator,
  -            validation_steps=self.config.VALIDATION_STEPS,
  +            #validation_data=val_generator,
  +            #validation_steps=self.config.VALIDATION_STEPS,
               max_queue_size=100,
               workers=workers,
               use_multiprocessing=True,
  diff --git i/mrcnn/parallel_model.py w/mrcnn/parallel_model.py
  index d2bf53b..060172a 100644
  --- i/mrcnn/parallel_model.py
  +++ w/mrcnn/parallel_model.py
  @@ -32,6 +32,7 @@ class ParallelModel(KM.Model):
           keras_model: The Keras model to parallelize
           gpu_count: Number of GPUs. Must be > 1
           """
  +        super().__init__()
           self.inner_model = keras_model
           self.gpu_count = gpu_count
           merged_outputs = self.make_parallel()
  diff --git i/samples/coco/coco.py w/samples/coco/coco.py
  index 5d172b5..239ed75 100644
  --- i/samples/coco/coco.py
  +++ w/samples/coco/coco.py
  @@ -81,7 +81,10 @@ class CocoConfig(Config):
       IMAGES_PER_GPU = 2
       # Uncomment to train on 8 GPUs (default is 1)
  -    # GPU_COUNT = 8
  +    GPU_COUNT = 8
  +    BACKBONE = "resnet50"
  +    STEPS_PER_EPOCH = 50
  +    TRAIN_ROIS_PER_IMAGE = 512
       # Number of classes (including background)
       NUM_CLASSES = 1 + 80  # COCO has 80 classes
  @@ -496,29 +499,10 @@ if __name__ == '__main__':
           # *** This training schedule is an example. Update to your needs ***
           # Training - Stage 1
  -        print("Training network heads")
           model.train(dataset_train, dataset_val,
                       learning_rate=config.LEARNING_RATE,
                       epochs=40,
  -                    layers='heads',
  -                    augmentation=augmentation)
  -
  -        # Training - Stage 2
  -        # Finetune layers from ResNet stage 4 and up
  -        print("Fine tune Resnet stage 4 and up")
  -        model.train(dataset_train, dataset_val,
  -                    learning_rate=config.LEARNING_RATE,
  -                    epochs=120,
  -                    layers='4+',
  -                    augmentation=augmentation)
  -
  -        # Training - Stage 3
  -        # Fine tune all layers
  -        print("Fine tune all layers")
  -        model.train(dataset_train, dataset_val,
  -                    learning_rate=config.LEARNING_RATE / 10,
  -                    epochs=160,
  -                    layers='all',
  +                    layers='3+',
                       augmentation=augmentation)
       elif args.command == "evaluate":
  ```

  </details>
