# Contents

- [TNT Description](#tnt-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
- [Training Process](#training-process)
- [Evaluation Process](#evaluation-process)
    - [Evaluation](#evaluation)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Training Performance](#evaluation-performance)
        - [Inference Performance](#evaluation-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)
## [TNT Description](#contents)

The TNT (Transformer in Transformer) network is a pure transformer model for visual recognition. TNT treats an image as a sequence of patches and each patch as a sequence of pixels. A TNT block uses an outer transformer block to process the sequence of patches and an inner transformer block to process the sequence of pixels.

[Paper](https://arxiv.org/abs/2103.00112): Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang. Transformer in Transformer. arXiv preprint, 2021.
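As a rough illustration of that dataflow (not the repository code, which lives in src/tnt.py), the NumPy sketch below uses a toy parameter-free attention as a stand-in for the full inner and outer transformer blocks; the token counts and embedding sizes are illustrative only.

```python
import numpy as np

def toy_attention(x):
    """Parameter-free single-head self-attention with a residual add;
    a stand-in for a full transformer block (MSA + MLP + LayerNorm)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])          # (tokens, tokens)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ x                           # residual connection

# A 224x224 image -> 14x14 = 196 patches; each patch -> 16 "pixel" tokens.
num_patches, pixels_per_patch = 196, 16
inner_dim, outer_dim = 24, 384                       # illustrative sizes

pixel_tokens = np.random.rand(num_patches, pixels_per_patch, inner_dim)
patch_tokens = np.random.rand(num_patches, outer_dim)

# Inner transformer block: relations between pixels within each patch.
inner_out = np.stack([toy_attention(p) for p in pixel_tokens])

# Fold each patch's pixel tokens back into one vector and add it to the
# corresponding patch embedding (a learned projection in the paper).
proj = np.random.rand(pixels_per_patch * inner_dim, outer_dim) * 0.01
patch_tokens = patch_tokens + inner_out.reshape(num_patches, -1) @ proj

# Outer transformer block: relations between patches across the image.
patch_tokens = toy_attention(patch_tokens)
print(patch_tokens.shape)                            # (196, 384)
```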
## [Model Architecture](#contents)

The overall network architecture of TNT is shown below:

![TNT architecture](./fig/tnt.PNG)
## [Dataset](#contents)

Dataset used: [Oxford-IIIT Pet](https://www.robots.ox.ac.uk/~vgg/data/pets/)

- Dataset size: 7049 color images in 37 classes
    - Train: 3680 images
    - Test: 3369 images
- Data format: RGB images
- Note: Data will be processed in src/pet_dataset.py. A loading sketch is shown below.
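Assuming the images have already been converted to MindRecord (the format eval.py expects via `--dataset_path`), loading the test split might look roughly like this; the column names "image" and "label" are assumptions, and the actual preprocessing is in src/pet_dataset.py.

```python
import os
import mindspore.dataset as ds

# Path to the preprocessed MindRecord test split (see --dataset_path in eval.py).
mindrecord_path = os.path.expanduser("~/Pets/test.mindrecord")

# Column names below are assumed; check src/pet_dataset.py for the real ones.
dataset = ds.MindDataset(mindrecord_path, columns_list=["image", "label"])
dataset = dataset.batch(32, drop_remainder=False)

for batch in dataset.create_dict_iterator():
    print(batch["image"].shape, batch["label"].shape)
    break
```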
## [Environment Requirements](#contents)

- Hardware (Ascend/GPU)
    - Prepare a hardware environment with an Ascend or GPU processor.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
## [Script Description](#contents)

### [Script and Sample Code](#contents)

```text
TNT
├── eval.py                # inference entry
├── fig
│   └── tnt.png            # illustration of the TNT network
├── readme.md              # README
└── src
    ├── config.py          # config of model and data
    ├── pet_dataset.py     # dataset loader
    └── tnt.py             # TNT network
```
## [Training Process](#contents)

To be done.
## [Evaluation Process](#contents)

### Usage

After installing MindSpore via the official website, you can start evaluation as follows:

### Launch

```bash
# infer example
GPU: python eval.py --model tnt-b --dataset_path ~/Pets/test.mindrecord --platform GPU --checkpoint_path [CHECKPOINT_PATH]
```

> The checkpoint can be downloaded at https://www.mindspore.cn/resources/hub.
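For orientation, the steps eval.py performs are roughly the following; the constructor `tnt_b` and the helper `create_dataset` are hypothetical names used only for illustration (check src/tnt.py and src/pet_dataset.py for the real interfaces).

```python
from mindspore import context, nn
from mindspore.train.model import Model
from mindspore.train.serialization import load_checkpoint, load_param_into_net

from src.tnt import tnt_b                    # hypothetical constructor name
from src.pet_dataset import create_dataset   # hypothetical helper name

context.set_context(mode=context.GRAPH_MODE, device_target="GPU")

net = tnt_b(num_classes=37)                  # Oxford-IIIT Pet has 37 classes
load_param_into_net(net, load_checkpoint("tnt-b-pets.ckpt"))
net.set_train(False)

dataset = create_dataset("~/Pets/test.mindrecord", do_train=False)
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction="mean")
model = Model(net, loss_fn=loss, metrics={"acc"})
print(model.eval(dataset))                   # e.g. {'acc': 0.95}
```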
### Result

```bash
result: {'acc': 0.95} ckpt= ./tnt-b-pets.ckpt
```
## [Model Description](#contents)

### [Performance](#contents)

#### Evaluation Performance

##### TNT on ImageNet2012
| Parameters        | TNT-B                       | TNT-S                       |
| ----------------- | --------------------------- | --------------------------- |
| Uploaded Date     | 21/03/2021 (day/month/year) | 21/03/2021 (day/month/year) |
| MindSpore Version | 1.1                         | 1.1                         |
| Dataset           | ImageNet2012                | ImageNet2012                |
| Input size        | 224x224                     | 224x224                     |
| Parameters (M)    | 86.4                        | 23.8                        |
| FLOPs (G)         | 14.1                        | 5.2                         |
| Accuracy (Top-1)  | 82.8                        | 81.3                        |
##### TNT on Oxford-IIIT Pet

| Parameters        | TNT-B                       | TNT-S                       |
| ----------------- | --------------------------- | --------------------------- |
| Uploaded Date     | 21/03/2021 (day/month/year) | 21/03/2021 (day/month/year) |
| MindSpore Version | 1.1                         | 1.1                         |
| Dataset           | Oxford-IIIT Pet             | Oxford-IIIT Pet             |
| Input size        | 384x384                     | 384x384                     |
| Parameters (M)    | 86.4                        | 23.8                        |
| Accuracy (Top-1)  | 95.0                        | 94.7                        |
## [Description of Random Situation](#contents)

In the dataset script (src/pet_dataset.py), we set the seed inside the "create_dataset" function. We also use a random seed in train.py.
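For reference, seeding in MindSpore projects typically combines the global framework seed, the dataset-pipeline seed, and NumPy's seed, along these lines (illustrative; the actual calls live in the scripts mentioned above).

```python
import numpy as np
import mindspore.dataset as ds
from mindspore.common import set_seed

set_seed(1)             # seeds MindSpore weight initialization and random ops
ds.config.set_seed(1)   # seeds dataset shuffling and random augmentations
np.random.seed(1)       # seeds any NumPy-based preprocessing
```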
## [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).