# Contents

- [Face Recognition Description](#face-recognition-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Running Example](#running-example)
- [Model Description](#model-description)
    - [Performance](#performance)
- [ModelZoo Homepage](#modelzoo-homepage)
# [Face Recognition Description](#contents)

This is a face recognition network based on ResNet, with support for training and evaluation on Ascend 910.

ResNet (residual neural network) was proposed by Kaiming He and his colleagues at Microsoft Research. By stacking residual units they successfully trained a 152-layer network and won the ILSVRC 2015 classification competition, achieving a top-5 error rate of 3.57% with fewer parameters than VGGNet. Traditional convolutional or fully connected networks lose some information as it is passed forward, and they also suffer from vanishing or exploding gradients, which makes very deep networks hard to train. ResNet alleviates these problems: the shortcut connection passes the input directly to the output, preserving the information, so each block only needs to learn the residual between its input and output, which simplifies the learning objective. As a result, ResNet both speeds up training and markedly improves accuracy, and its residual structure has been widely adopted by other architectures such as Inception-ResNet.

[Paper](https://arxiv.org/pdf/1512.03385.pdf): Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. "Deep Residual Learning for Image Recognition"
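The residual idea is easiest to see in code. Below is a minimal, illustrative residual block written with the MindSpore `Cell` API; it is not the implementation in `src/backbone/resnet.py`, and the channel count is a placeholder:

```python
import mindspore.nn as nn

class ResidualBlock(nn.Cell):
    """Illustrative residual block: output = ReLU(F(x) + x)."""

    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        # F(x): two 3x3 convolutions with batch norm, keeping the spatial size.
        self.conv1 = nn.Conv2d(channels, channels, 3, pad_mode='same')
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, pad_mode='same')
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def construct(self, x):
        identity = x  # shortcut carries the input through unchanged
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The block only has to learn the residual F(x) = H(x) - x.
        return self.relu(out + identity)
```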
# [Model Architecture](#contents)

Face Recognition uses a ResNet network for feature extraction; for details, see the [paper](https://arxiv.org/pdf/1512.03385.pdf).

# [Dataset](#contents)

In this example we use about 4.7 million face images as the training set and 1.1 million as the evaluation set. You can also use your own dataset or open-source datasets (e.g. face_emore).

The directory structure is as follows:
```python
.
└─ dataset
   ├─ train dataset
   │  ├─ ID1
   │  │  ├─ ID1_0001.jpg
   │  │  ├─ ID1_0002.jpg
   │  │  ...
   │  ├─ ID2
   │  │  ...
   │  ├─ ID3
   │  │  ...
   │  ...
   └─ test dataset
      ├─ ID1
      │  ├─ ID1_0001.jpg
      │  ├─ ID1_0002.jpg
      │  ...
      ├─ ID2
      │  ...
      ├─ ID3
      │  ...
      ...
```
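As a quick sanity check of this layout, the hypothetical snippet below counts identities and images per split. It assumes the tree above, with one folder per identity; the `dataset` root path is illustrative:

```python
import os

def summarize_split(split_dir):
    """Count identity folders and images in one split of the dataset."""
    ids = [d for d in os.listdir(split_dir)
           if os.path.isdir(os.path.join(split_dir, d))]
    n_images = sum(len(os.listdir(os.path.join(split_dir, d))) for d in ids)
    return len(ids), n_images

for split in ("train dataset", "test dataset"):
    path = os.path.join("dataset", split)  # assumed dataset root
    n_ids, n_imgs = summarize_split(path)
    print(f"{split}: {n_ids} identities, {n_imgs} images")
```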
# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare hardware environment with Ascend processor.
- Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
- For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
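To confirm that MindSpore can reach the Ascend backend before launching a long run, a minimal sketch (assuming a MindSpore 1.x installation on an Ascend host) is:

```python
import numpy as np
import mindspore
from mindspore import Tensor, context

# Select the Ascend backend; this raises an error if no Ascend device is available.
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")

print("MindSpore version:", mindspore.__version__)
# Run a trivial op to confirm the device executes kernels.
x = Tensor(np.ones((2, 2), dtype=np.float32))
print((x + x).asnumpy())
```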
# [Script Description](#contents)

## [Script and Sample Code](#contents)

The entire code structure is as follows:
```python
└─ face_recognition
  ├── README.md                          // descriptions about face_recognition
  ├── scripts
  │   ├── run_distribute_train_base.sh   // shell script for distributed training of the base model on Ascend
  │   ├── run_distribute_train_beta.sh   // shell script for distributed training of the beta model on Ascend
  │   ├── run_eval.sh                    // shell script for evaluation on Ascend
  │   ├── run_export.sh                  // shell script for exporting an AIR model
  │   ├── run_standalone_train_base.sh   // shell script for standalone training of the base model on Ascend
  │   └── run_standalone_train_beta.sh   // shell script for standalone training of the beta model on Ascend
  ├── src
  │   ├── backbone
  │   │   ├── head.py                    // head unit
  │   │   └── resnet.py                  // ResNet architecture
  │   ├── callback_factory.py            // callback logging
  │   ├── custom_dataset.py              // custom dataset and sampler
  │   ├── custom_net.py                  // custom cell definitions
  │   ├── dataset_factory.py             // dataset creation
  │   ├── init_network.py                // network parameter initialization
  │   ├── my_logging.py                  // logging format settings
  │   ├── loss_factory.py                // loss calculation
  │   ├── lrsche_factory.py              // learning rate schedule
  │   ├── me_init.py                     // network parameter initialization methods
  │   └── metric_factory.py              // metric FC layer
  ├── utils
  │   ├── __init__.py                    // init file
  │   ├── config.py                      // parameter parsing
  │   ├── device_adapter.py              // device adapter
  │   ├── local_adapter.py               // local adapter
  │   └── moxing_adapter.py              // moxing adapter
  ├── base_config.yaml                   // parameter configuration for the base model
  ├── beta_config.yaml                   // parameter configuration for the beta model
  ├── inference_config.yaml              // parameter configuration for inference
  ├── train.py                           // training script
  ├── eval.py                            // evaluation script
  └── export.py                          // script for exporting an AIR model
```
## [Running Example](#contents)

### Train

- Standalone mode

    - base model

        ```bash
        cd ./scripts
        sh run_standalone_train_base.sh [USE_DEVICE_ID]
        ```

        for example:

        ```bash
        cd ./scripts
        sh run_standalone_train_base.sh 0
        ```

    - beta model

        ```bash
        cd ./scripts
        sh run_standalone_train_beta.sh [USE_DEVICE_ID]
        ```

        for example:

        ```bash
        cd ./scripts
        sh run_standalone_train_beta.sh 0
        ```

- Distributed mode (recommended)

    - base model

        ```bash
        cd ./scripts
        sh run_distribute_train_base.sh [RANK_TABLE]
        ```

        for example:

        ```bash
        cd ./scripts
        sh run_distribute_train_base.sh ./rank_table_8p.json
        ```

    - beta model

        ```bash
        cd ./scripts
        sh run_distribute_train_beta.sh [RANK_TABLE]
        ```

        for example:

        ```bash
        cd ./scripts
        sh run_distribute_train_beta.sh ./rank_table_8p.json
        ```
- ModelArts (If you want to run in ModelArts, please check the official documentation of [ModelArts](https://support.huaweicloud.com/modelarts/), and you can start training as follows)

    - base model

        ```python
        # (1) Add "config_path='/path_to_code/base_config.yaml'" on the website UI interface.
        # (2) Perform a or b.
        #       a. Set "enable_modelarts=True" in the base_config.yaml file.
        #          Set "is_distributed=1" in the base_config.yaml file.
        #          Set other parameters you need in the base_config.yaml file.
        #       b. Add "enable_modelarts=True" on the website UI interface.
        #          Add "is_distributed=1" on the website UI interface.
        #          Add other parameters on the website UI interface.
        # (3) Upload a zip dataset to an S3 bucket. (You could also upload the original dataset, but that can be quite slow.)
        # (4) Set the code directory to "/path/FaceRecognition" on the website UI interface.
        # (5) Set the startup file to "train.py" on the website UI interface.
        # (6) Set the "Dataset path", "Output file path" and "Job log path" on the website UI interface.
        # (7) Create your job.
        ```

    - beta model

        ```python
        # (1) Copy or upload your trained model to an S3 bucket.
        # (2) Add "config_path='/path_to_code/beta_config.yaml'" on the website UI interface.
        # (3) Perform a or b.
        #       a. Set "enable_modelarts=True" in the beta_config.yaml file.
        #          Set "is_distributed=1" in the beta_config.yaml file.
        #          Set "pretrained='/cache/checkpoint_path/model.ckpt'" in the beta_config.yaml file.
        #          Set "checkpoint_url=/The path of checkpoint in S3/" in the beta_config.yaml file.
        #       b. Add "enable_modelarts=True" on the website UI interface.
        #          Add "is_distributed=1" on the website UI interface.
        #          Add "pretrained='/cache/checkpoint_path/model.ckpt'" on the website UI interface.
        #          Add "checkpoint_url=/The path of checkpoint in S3/" on the website UI interface.
        # (4) Upload a zip dataset to an S3 bucket. (You could also upload the original dataset, but that can be quite slow.)
        # (5) Set the code directory to "/path/FaceRecognition" on the website UI interface.
        # (6) Set the startup file to "train.py" on the website UI interface.
        # (7) Set the "Dataset path", "Output file path" and "Job log path" on the website UI interface.
        # (8) Create your job.
        ```
You will get loss values like the following in "./scripts/data_parallel_log_[DEVICE_ID]/outputs/logs/[TIME].log" or "./scripts/log_parallel_graph/face_recognition_[DEVICE_ID].log":

```python
epoch[0], iter[100], loss:(Tensor(shape=[], dtype=Float32, value= 50.2733), Tensor(shape=[], dtype=Bool, value= False), Tensor(shape=[], dtype=Float32, value= 32768)), cur_lr:0.000660, mean_fps:743.09 imgs/sec
epoch[0], iter[200], loss:(Tensor(shape=[], dtype=Float32, value= 49.3693), Tensor(shape=[], dtype=Bool, value= False), Tensor(shape=[], dtype=Float32, value= 32768)), cur_lr:0.001314, mean_fps:4426.42 imgs/sec
epoch[0], iter[300], loss:(Tensor(shape=[], dtype=Float32, value= 48.7081), Tensor(shape=[], dtype=Bool, value= False), Tensor(shape=[], dtype=Float32, value= 16384)), cur_lr:0.001968, mean_fps:4428.09 imgs/sec
epoch[0], iter[400], loss:(Tensor(shape=[], dtype=Float32, value= 45.7791), Tensor(shape=[], dtype=Bool, value= False), Tensor(shape=[], dtype=Float32, value= 16384)), cur_lr:0.002622, mean_fps:4428.17 imgs/sec
...
epoch[8], iter[27300], loss:(Tensor(shape=[], dtype=Float32, value= 2.13556), Tensor(shape=[], dtype=Bool, value= False), Tensor(shape=[], dtype=Float32, value= 65536)), cur_lr:0.004000, mean_fps:4429.38 imgs/sec
epoch[8], iter[27400], loss:(Tensor(shape=[], dtype=Float32, value= 2.36922), Tensor(shape=[], dtype=Bool, value= False), Tensor(shape=[], dtype=Float32, value= 65536)), cur_lr:0.004000, mean_fps:4429.88 imgs/sec
epoch[8], iter[27500], loss:(Tensor(shape=[], dtype=Float32, value= 2.08594), Tensor(shape=[], dtype=Bool, value= False), Tensor(shape=[], dtype=Float32, value= 65536)), cur_lr:0.004000, mean_fps:4430.59 imgs/sec
epoch[8], iter[27600], loss:(Tensor(shape=[], dtype=Float32, value= 2.38706), Tensor(shape=[], dtype=Bool, value= False), Tensor(shape=[], dtype=Float32, value= 65536)), cur_lr:0.004000, mean_fps:4430.37 imgs/sec
```
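Each log line packs three tensors into the `loss:` tuple, which appears to be the scalar loss, an overflow flag, and the current loss scale (the pattern of loss-scale training with dynamic scaling). As a hypothetical convenience for plotting, the sketch below extracts the scalar loss and learning rate from such lines with a regular expression derived from the sample above:

```python
import re

# Matches lines like:
# epoch[0], iter[100], loss:(Tensor(shape=[], dtype=Float32, value= 50.2733), ...), cur_lr:0.000660, ...
LINE_RE = re.compile(
    r"epoch\[(\d+)\], iter\[(\d+)\], "
    r"loss:\(Tensor\(shape=\[\], dtype=Float32, value= ([\d.]+)\).*"
    r"cur_lr:([\d.]+)"
)

def parse_log(path):
    """Yield (epoch, iteration, loss, lr) tuples from a training log file."""
    with open(path) as f:
        for line in f:
            m = LINE_RE.search(line)
            if m:
                yield (int(m.group(1)), int(m.group(2)),
                       float(m.group(3)), float(m.group(4)))

# Example usage (log path is hypothetical):
# records = list(parse_log("./scripts/log_parallel_graph/face_recognition_0.log"))
# print(records[-1])  # last logged (epoch, iter, loss, lr)
```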
### Evaluation

```bash
cd ./scripts
sh run_eval.sh [USE_DEVICE_ID]
```

You will get a result like the following in "./scripts/log_inference/outputs/models/logs/[TIME].log":

[test_dataset]: zj2jk=0.9495, jk2zj=0.9480, avg=0.9487

If you want to run in ModelArts, please check the official documentation of [ModelArts](https://support.huaweicloud.com/modelarts/), and you can start evaluation as follows:
```python
# run evaluation on ModelArts example
# (1) Copy or upload your trained model to an S3 bucket.
# (2) Add "config_path='/path_to_code/inference_config.yaml'" on the website UI interface.
# (3) Perform a or b.
#       a. Set "weight='/cache/checkpoint_path/model.ckpt'" in the inference_config.yaml file.
#          Set "checkpoint_url=/The path of checkpoint in S3/" in the inference_config.yaml file.
#       b. Add "weight='/cache/checkpoint_path/model.ckpt'" on the website UI interface.
#          Add "checkpoint_url=/The path of checkpoint in S3/" on the website UI interface.
# (4) Upload a zip dataset to an S3 bucket. (You could also upload the original dataset, but that can be quite slow.)
# (5) Set the code directory to "/path/FaceRecognition" on the website UI interface.
# (6) Set the startup file to "eval.py" on the website UI interface.
# (7) Set the "Dataset path", "Output file path" and "Job log path" on the website UI interface.
# (8) Create your job.
```
### Convert model

If you want to run inference on Ascend 310, you should convert the model to AIR format first:

```bash
cd ./scripts
sh run_export.sh [BATCH_SIZE] [USE_DEVICE_ID] [PRETRAINED_BACKBONE]
```

for example:

```bash
cd ./scripts
sh run_export.sh 16 0 ./0-1_1.ckpt
```
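Under the hood, run_export.sh invokes export.py. For orientation, a minimal sketch of what an AIR export typically looks like in MindSpore is shown below; the stand-in network, input shape (112x112 RGB) and checkpoint path are placeholder assumptions, not the exact code in export.py:

```python
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, context
from mindspore.train.serialization import export, load_checkpoint, load_param_into_net

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")

# Stand-in network; the real script builds the ResNet backbone from src/backbone.
net = nn.SequentialCell([nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten()])

# Restore trained weights (checkpoint path taken from the example above).
load_param_into_net(net, load_checkpoint("./0-1_1.ckpt"))

# Trace the graph with a dummy input batch and save it in AIR format.
dummy_input = Tensor(np.zeros((16, 3, 112, 112), dtype=np.float32))
export(net, dummy_input, file_name="face_recognition", file_format="AIR")
```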
# [Model Description](#contents)

## [Performance](#contents)

### Training Performance

| Parameters                 | Face Recognition                                                  |
| -------------------------- | ----------------------------------------------------------------- |
| Model Version              | V1                                                                |
| Resource                   | Ascend 910; CPU 2.60 GHz, 192 cores; memory 755 GB; EulerOS 2.8   |
| Uploaded Date              | 09/30/2020 (month/day/year)                                       |
| MindSpore Version          | 1.0.0                                                             |
| Dataset                    | 4.7 million images                                                |
| Training Parameters        | epoch=100, batch_size=192, momentum=0.9                           |
| Optimizer                  | Momentum                                                          |
| Loss Function              | Cross Entropy                                                     |
| Outputs                    | probability                                                       |
| Speed                      | 1 pc: 350-600 fps; 8 pcs: 2500-4500 fps                           |
| Total Time                 | 1 pc: NA; 8 pcs: 10 hours                                         |
| Checkpoint for Fine-tuning | 584M (.ckpt file)                                                 |
### Evaluation Performance

| Parameters          | Face Recognition            |
| ------------------- | --------------------------- |
| Model Version       | V1                          |
| Resource            | Ascend 910; EulerOS 2.8     |
| Uploaded Date       | 09/30/2020 (month/day/year) |
| MindSpore Version   | 1.0.0                       |
| Dataset             | 1.1 million images          |
| batch_size          | 512                         |
| Outputs             | ACC                         |
| ACC                 | 0.9                         |
| Model for Inference | 584M (.ckpt file)           |
# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).