# Corruption Benchmarking

## Introduction

We provide tools to test object detection and instance segmentation models on the image corruption benchmark defined in [Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming](https://arxiv.org/abs/1907.07484).
This page provides basic tutorials on how to use the benchmark.
```latex
@article{michaelis2019winter,
  title={Benchmarking Robustness in Object Detection:
  Autonomous Driving when Winter is Coming},
  author={Michaelis, Claudio and Mitzkus, Benjamin and
  Geirhos, Robert and Rusak, Evgenia and
  Bringmann, Oliver and Ecker, Alexander S. and
  Bethge, Matthias and Brendel, Wieland},
  journal={arXiv:1907.07484},
  year={2019}
}
```
![image corruption example](../resources/corruptions_sev_3.png)

## About the benchmark

To submit results to the benchmark, please visit the [benchmark homepage](https://github.com/bethgelab/robust-detection-benchmark).
The benchmark is modelled after the [imagenet-c benchmark](https://github.com/hendrycks/robustness), which was originally published in [Benchmarking Neural Network Robustness to Common Corruptions and Perturbations](https://arxiv.org/abs/1903.12261) (ICLR 2019) by Dan Hendrycks and Thomas Dietterich.

The image corruption functions are included in this library but can be installed separately using:
```shell
pip install imagecorruptions
```
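
Once installed, the corruption functions can also be applied to a single image directly in Python. The following is a minimal sketch using the `corrupt` function from the `imagecorruptions` package; the random array is only a stand-in for a real image loaded with your preferred image library.

```python
import numpy as np
from imagecorruptions import corrupt

# Stand-in for a real image: any HxWx3 uint8 array works.
image = np.random.randint(0, 255, size=(224, 224, 3), dtype=np.uint8)

# Apply a single corruption at medium severity (1 = weakest, 5 = strongest).
corrupted = corrupt(image, corruption_name='gaussian_noise', severity=3)
print(corrupted.shape, corrupted.dtype)
```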
Compared to imagenet-c, a few changes had to be made to handle images of arbitrary size and greyscale images.
We also modified the 'motion blur' and 'snow' corruptions to remove a dependency on a Linux-specific library, which would otherwise have to be installed separately. For details please refer to the [imagecorruptions repository](https://github.com/bethgelab/imagecorruptions).
## Inference with pretrained models

We provide a testing script to evaluate a model's performance on any combination of the corruptions provided in the benchmark.
### Test a dataset

- [x] single GPU testing
- [ ] multiple GPU testing
- [ ] visualize detection results
You can use the following commands to test a model's performance under the 15 corruptions used in the benchmark.
```shell
# single-gpu testing
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
```
Alternatively, different groups of corruptions can be selected.
```shell
# noise
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --corruptions noise
# blur
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --corruptions blur
# weather
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --corruptions weather
# digital
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --corruptions digital
```
Or a custom set of corruptions, e.g.:
```shell
# gaussian noise, zoom blur and snow
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --corruptions gaussian_noise zoom_blur snow
```
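
The individual corruption names follow the naming used in the `imagecorruptions` package. If you are unsure of the exact spelling, a quick way to list the 15 benchmark corruptions is the small sketch below (assuming the package is installed as described above):

```python
from imagecorruptions import get_corruption_names

# Prints the corruption names used in the benchmark,
# e.g. gaussian_noise, shot_noise, ..., jpeg_compression.
for name in get_corruption_names():
    print(name)
```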
Finally, the corruption severities to evaluate can be chosen.
Severity 0 corresponds to clean data and the effect increases from 1 to 5.
```shell
# severity 1
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --severities 1
# severities 0,2,4
python tools/analysis_tools/test_robustness.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] --severities 0 2 4
```
## Results for modelzoo models

The results on COCO 2017val are shown in the table below.

Model | Backbone | Style | Lr schd | box AP clean | box AP corr. | box % | mask AP clean | mask AP corr. | mask % |
:-----:|:---------:|:-------:|:-------:|:------------:|:------------:|:-----:|:-------------:|:-------------:|:------:|
Faster R-CNN | R-50-FPN | pytorch | 1x | 36.3 | 18.2 | 50.2 | - | - | - |
Faster R-CNN | R-101-FPN | pytorch | 1x | 38.5 | 20.9 | 54.2 | - | - | - |
Faster R-CNN | X-101-32x4d-FPN | pytorch | 1x | 40.1 | 22.3 | 55.5 | - | - | - |
Faster R-CNN | X-101-64x4d-FPN | pytorch | 1x | 41.3 | 23.4 | 56.6 | - | - | - |
Faster R-CNN | R-50-FPN-DCN | pytorch | 1x | 40.0 | 22.4 | 56.1 | - | - | - |
Faster R-CNN | X-101-32x4d-FPN-DCN | pytorch | 1x | 43.4 | 26.7 | 61.6 | - | - | - |
Mask R-CNN | R-50-FPN | pytorch | 1x | 37.3 | 18.7 | 50.1 | 34.2 | 16.8 | 49.1 |
Mask R-CNN | R-50-FPN-DCN | pytorch | 1x | 41.1 | 23.3 | 56.7 | 37.2 | 20.7 | 55.7 |
Cascade R-CNN | R-50-FPN | pytorch | 1x | 40.4 | 20.1 | 49.7 | - | - | - |
Cascade Mask R-CNN | R-50-FPN | pytorch | 1x | 41.2 | 20.7 | 50.2 | 35.7 | 17.6 | 49.3 |
RetinaNet | R-50-FPN | pytorch | 1x | 35.6 | 17.8 | 50.1 | - | - | - |
Hybrid Task Cascade | X-101-64x4d-FPN-DCN | pytorch | 1x | 50.6 | 32.7 | 64.7 | 43.8 | 28.1 | 64.0 |

Results may vary slightly due to the stochastic application of the corruptions.
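
The `box %` and `mask %` columns report performance under corruption relative to clean performance, which appears to correspond to the relative performance under corruption (rPC) measure from the benchmark paper: the AP averaged over all 15 corruptions and severities 1 to 5, divided by the clean AP. A minimal sketch of that calculation with hypothetical numbers (not taken from the table):

```python
# Relative performance under corruption (rPC), following the definition in the
# benchmark paper: mean AP over corruptions and severities 1-5 divided by clean AP.
clean_ap = 40.0  # hypothetical clean box AP

# Hypothetical per-corruption APs, each already averaged over severities 1-5;
# the real benchmark averages over all 15 corruptions.
corrupted_ap = {'gaussian_noise': 16.5, 'snow': 21.0, 'fog': 26.2}

mpc = sum(corrupted_ap.values()) / len(corrupted_ap)  # mean performance under corruption
rpc = 100 * mpc / clean_ap                            # corresponds to the "box %" column
print(f'mPC = {mpc:.1f}, rPC = {rpc:.1f}%')
```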
