# Contents

- [BERT Description](#bert-description)
- [Model Architecture](#model-architecture)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Dataset Preparation](#dataset-preparation)
    - [Training Process](#training-process)
    - [Evaluation Process](#evaluation-process)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Training Performance](#training-performance)
        - [Evaluation Performance](#evaluation-performance)
- [Description of Random Situation](#description-of-random-situation)
- [ModelZoo Homepage](#modelzoo-homepage)

# [BERT Description](#contents)

The BERT network was proposed by Google in 2018 and made a breakthrough in the field of NLP. It uses a pre-training plus fine-tuning paradigm: without modifying the network structure, multiple kinds of text tasks can be handled simply by adding an output layer. The backbone of BERT adopts the Encoder structure of the Transformer, and the attention mechanism enables the output layer to capture high-dimensional global semantic information. Pre-training uses denoising auto-encoding tasks, namely MLM (Masked Language Model) and NSP (Next Sentence Prediction). Since no labeled data is needed, pre-training can be performed on massive amounts of text, and only a small amount of fine-tuning is required to obtain good results on downstream tasks. The pre-training plus fine-tuning paradigm established by BERT has been widely adopted by subsequent NLP networks.

[Paper](https://arxiv.org/abs/1810.04805): Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805). arXiv preprint arXiv:1810.04805.

[Paper](https://arxiv.org/abs/1909.00204): Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen, Qun Liu. [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204). arXiv preprint arXiv:1909.00204.

# [Model Architecture](#contents)

The backbone structure of BERT is the Transformer. For BERT_base, the Transformer contains 12 encoder modules; each encoder contains one self-attention module, and each self-attention module contains one attention module. For BERT_NEZHA, the Transformer contains 24 encoder modules; each encoder contains one self-attention module, and each self-attention module contains one attention module. The difference between BERT_base and BERT_NEZHA is that BERT_base uses absolute position encoding to produce the position embedding vector, while BERT_NEZHA uses relative position encoding.

# [Dataset](#contents)

- Download the zhwiki or enwiki dataset for pre-training. Extract and refine the texts in the dataset with [WikiExtractor](https://github.com/attardi/wikiextractor), then convert the dataset to TFRecord format. Please refer to the create_pretraining_data.py file in the [BERT](https://github.com/google-research/bert) repository; a sketch of this step is shown below.
- Download datasets for fine-tuning and evaluation such as CLUENER, TNEWS, SQuAD v1.1, etc. To convert dataset files from JSON format to TFRecord format, please refer to run_classifier.py in the [BERT](https://github.com/google-research/bert) repository.

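As a rough reference, the pre-training TFRecord can be generated with the flags exposed by create_pretraining_data.py from the google-research/bert repository. The input/output paths and the vocabulary file below are placeholders for your own environment; treat this as an illustrative sketch rather than a prescribed command.

```bash
# Illustrative only: input/output paths and vocab.txt are placeholders for your local setup.
python create_pretraining_data.py \
    --input_file=/path/zhwiki_extracted/wiki_00.txt \
    --output_file=/path/cn-wiki-128/wiki_00.tfrecord \
    --vocab_file=/path/vocab.txt \
    --do_lower_case=True \
    --max_seq_length=128 \
    --max_predictions_per_seq=20 \
    --masked_lm_prob=0.15 \
    --random_seed=12345 \
    --dupe_factor=5
```
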
# [Environment Requirements](#contents)

- Hardware (Ascend)
    - Prepare hardware environment with Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get access to the resources.
- Framework
    - [MindSpore](https://gitee.com/mindspore/mindspore)
- For more information, please check the resources below:
    - [MindSpore tutorials](https://www.mindspore.cn/tutorial/en/master/index.html)
    - [MindSpore API](https://www.mindspore.cn/api/en/master/index.html)

# [Quick Start](#contents)

After installing MindSpore via the official website, you can start pre-training, fine-tuning and evaluation as follows:

```bash
# run standalone pre-training example
bash scripts/run_standalone_pretrain_ascend.sh 0 1 /path/cn-wiki-128

# run distributed pre-training example
bash scripts/run_distributed_pretrain_ascend.sh /path/cn-wiki-128 /path/hccl.json

# run fine-tuning and evaluation examples
# - If you are going to run a fine-tuning task, please prepare a checkpoint from pre-training.
# - Set the BERT network config and optimizer hyperparameters in `finetune_eval_config.py`.

# Classification task: set task-related hyperparameters in scripts/run_classifier.sh,
# then run the script to fine-tune the BERT-base or BERT-NEZHA model.
bash scripts/run_classifier.sh

# NER task: set task-related hyperparameters in scripts/run_ner.sh,
# then run the script to fine-tune the BERT-base or BERT-NEZHA model.
bash scripts/run_ner.sh

# SQuAD task: set task-related hyperparameters in scripts/run_squad.sh,
# then run the script to fine-tune the BERT-base or BERT-NEZHA model.
bash scripts/run_squad.sh
```

For distributed training, an HCCL configuration file in JSON format needs to be created in advance. Please follow the instructions at https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools.

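The hccl_tools utility at that link can generate the JSON file. As an assumption about its interface (please confirm against the utility's own README), an 8-device configuration might be produced like this:

```bash
# Hypothetical invocation of hccl_tools.py; the flag name and value format are assumptions,
# so check the utility's README for the authoritative usage.
python hccl_tools.py --device_num "[0,8)"
# Pass the generated hccl JSON file to run_distributed_pretrain_ascend.sh as its second argument.
```
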
# [Script Description](#contents)

## [Script and Sample Code](#contents)

```shell
.
└─bert
  ├─README.md
  ├─scripts
    ├─ascend_distributed_launcher
        ├─__init__.py
        ├─hyper_parameter_config.ini            # hyper parameter for distributed pretraining
        ├─run_distribute_pretrain.py            # script for distributed pretraining
        ├─README.md
    ├─run_classifier.sh                         # shell script for standalone classifier task
    ├─run_ner.sh                                # shell script for standalone NER task
    ├─run_squad.sh                              # shell script for standalone SQuAD task
    ├─run_standalone_pretrain_ascend.sh         # shell script for standalone pretrain on ascend
    ├─run_distributed_pretrain_ascend.sh        # shell script for distributed pretrain on ascend
    └─run_standaloned_pretrain_gpu.sh           # shell script for standalone pretrain on gpu
  ├─src
    ├─__init__.py
    ├─assessment_method.py                      # assessment method for evaluation
    ├─bert_for_finetune.py                      # backbone code of network
    ├─bert_for_pre_training.py                  # backbone code of network
    ├─bert_model.py                             # backbone code of network
    ├─clue_classification_dataset_precess.py    # data preprocessing
    ├─cluner_evaluation.py                      # evaluation for cluner
    ├─config.py                                 # parameter configuration for pretraining
    ├─CRF.py                                    # assessment method for clue dataset
    ├─dataset.py                                # data preprocessing
    ├─finetune_eval_config.py                   # parameter configuration for finetuning
    ├─finetune_eval_model.py                    # backbone code of network
    ├─fused_layer_norm.py                       # layer normalization optimized for Ascend
    ├─sample_process.py                         # sample processing
    └─utils.py                                  # util function
  ├─pretrain_eval.py                            # train and eval net
  ├─run_classifier.py                           # finetune and eval net for classifier task
  ├─run_ner.py                                  # finetune and eval net for ner task
  ├─run_pretrain.py                             # train net for pretraining phase
  └─run_squad.py                                # finetune and eval net for squad task
```

## [Script Parameters](#contents)

### Pre-Training

```
usage: run_pretrain.py  [--distribute DISTRIBUTE] [--epoch_size N] [--device_num N] [--device_id N]
                        [--enable_save_ckpt ENABLE_SAVE_CKPT] [--device_target DEVICE_TARGET]
                        [--enable_lossscale ENABLE_LOSSSCALE] [--do_shuffle DO_SHUFFLE]
                        [--enable_data_sink ENABLE_DATA_SINK] [--data_sink_steps N]
                        [--accumulation_steps N]
                        [--save_checkpoint_path SAVE_CHECKPOINT_PATH]
                        [--load_checkpoint_path LOAD_CHECKPOINT_PATH]
                        [--save_checkpoint_steps N] [--save_checkpoint_num N]
                        [--data_dir DATA_DIR] [--schema_dir SCHEMA_DIR] [--train_steps N]

options:
    --device_target            device where the code will be implemented: "Ascend" | "GPU", default is "Ascend"
    --distribute               pre-training by several devices: "true" (training by more than 1 device) | "false", default is "false"
    --epoch_size               epoch size: N, default is 1
    --device_num               number of used devices: N, default is 1
    --device_id                device id: N, default is 0
    --enable_save_ckpt         enable saving checkpoints: "true" | "false", default is "true"
    --enable_lossscale         enable loss scale: "true" | "false", default is "true"
    --do_shuffle               enable shuffle: "true" | "false", default is "true"
    --enable_data_sink         enable data sink: "true" | "false", default is "true"
    --data_sink_steps          set data sink steps: N, default is 1
    --accumulation_steps       accumulate gradients N times before weight update: N, default is 1
    --save_checkpoint_path     path to save checkpoint files: PATH, default is ""
    --load_checkpoint_path     path to load checkpoint files: PATH, default is ""
    --save_checkpoint_steps    steps for saving checkpoint files: N, default is 1000
    --save_checkpoint_num      number of checkpoint files to keep: N, default is 1
    --train_steps              training steps: N, default is -1
    --data_dir                 path to dataset directory: PATH, default is ""
    --schema_dir               path to schema.json file: PATH, default is ""
```

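For reference, a single-device pre-training run could also be launched directly with the options listed above; the standalone script wraps a similar command. The paths below are placeholders, so adapt them to your environment.

```bash
# Illustrative invocation built only from the documented options; adjust paths for your setup.
python run_pretrain.py \
    --device_target="Ascend" \
    --distribute="false" \
    --epoch_size=1 \
    --device_id=0 \
    --enable_save_ckpt="true" \
    --enable_lossscale="true" \
    --do_shuffle="true" \
    --enable_data_sink="true" \
    --data_sink_steps=100 \
    --save_checkpoint_steps=1000 \
    --save_checkpoint_num=1 \
    --data_dir="/path/cn-wiki-128" \
    --schema_dir="/path/schema.json"
```
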
### Fine-Tuning and Evaluation

```
usage: run_ner.py   [--device_target DEVICE_TARGET] [--do_train DO_TRAIN] [--do_eval DO_EVAL]
                    [--assessment_method ASSESSMENT_METHOD] [--use_crf USE_CRF]
                    [--device_id N] [--epoch_num N] [--vocab_file_path VOCAB_FILE_PATH]
                    [--label2id_file_path LABEL2ID_FILE_PATH]
                    [--train_data_shuffle TRAIN_DATA_SHUFFLE]
                    [--eval_data_shuffle EVAL_DATA_SHUFFLE]
                    [--save_finetune_checkpoint_path SAVE_FINETUNE_CHECKPOINT_PATH]
                    [--load_pretrain_checkpoint_path LOAD_PRETRAIN_CHECKPOINT_PATH]
                    [--train_data_file_path TRAIN_DATA_FILE_PATH]
                    [--eval_data_file_path EVAL_DATA_FILE_PATH]
                    [--schema_file_path SCHEMA_FILE_PATH]

options:
    --device_target                   device where the code will be implemented: "Ascend" | "GPU", default is "Ascend"
    --do_train                        whether to run training on the training set: true | false
    --do_eval                         whether to run evaluation on the dev set: true | false
    --assessment_method               assessment method for evaluation: f1 | clue_benchmark
    --use_crf                         whether to use CRF to calculate loss: true | false
    --device_id                       device id to run the task
    --epoch_num                       total number of training epochs to perform
    --num_class                       number of classes for labeling
    --train_data_shuffle              enable train data shuffle, default is true
    --eval_data_shuffle               enable eval data shuffle, default is true
    --vocab_file_path                 the vocabulary file that the BERT model was trained on
    --label2id_file_path              label-to-id JSON file
    --save_finetune_checkpoint_path   path to save the generated fine-tuning checkpoint
    --load_pretrain_checkpoint_path   initial checkpoint (usually from a pre-trained BERT model)
    --load_finetune_checkpoint_path   provide a fine-tuning checkpoint path if only doing eval
    --train_data_file_path            NER TFRecord for training, e.g., train.tfrecord
    --eval_data_file_path             NER TFRecord for predictions if f1 is used to evaluate the result, NER JSON for predictions if clue_benchmark is used to evaluate the result
    --schema_file_path                path to the datafile schema file

usage: run_squad.py [--device_target DEVICE_TARGET] [--do_train DO_TRAIN] [--do_eval DO_EVAL]
                    [--device_id N] [--epoch_num N] [--num_class N]
                    [--vocab_file_path VOCAB_FILE_PATH]
                    [--eval_json_path EVAL_JSON_PATH]
                    [--train_data_shuffle TRAIN_DATA_SHUFFLE]
                    [--eval_data_shuffle EVAL_DATA_SHUFFLE]
                    [--save_finetune_checkpoint_path SAVE_FINETUNE_CHECKPOINT_PATH]
                    [--load_pretrain_checkpoint_path LOAD_PRETRAIN_CHECKPOINT_PATH]
                    [--load_finetune_checkpoint_path LOAD_FINETUNE_CHECKPOINT_PATH]
                    [--train_data_file_path TRAIN_DATA_FILE_PATH]
                    [--eval_data_file_path EVAL_DATA_FILE_PATH]
                    [--schema_file_path SCHEMA_FILE_PATH]

options:
    --device_target                   device where the code will be implemented: "Ascend" | "GPU", default is "Ascend"
    --do_train                        whether to run training on the training set: true | false
    --do_eval                         whether to run evaluation on the dev set: true | false
    --device_id                       device id to run the task
    --epoch_num                       total number of training epochs to perform
    --num_class                       number of classes to classify, usually 2 for the SQuAD task
    --train_data_shuffle              enable train data shuffle, default is true
    --eval_data_shuffle               enable eval data shuffle, default is true
    --vocab_file_path                 the vocabulary file that the BERT model was trained on
    --eval_json_path                  path to the SQuAD dev JSON file
    --save_finetune_checkpoint_path   path to save the generated fine-tuning checkpoint
    --load_pretrain_checkpoint_path   initial checkpoint (usually from a pre-trained BERT model)
    --load_finetune_checkpoint_path   provide a fine-tuning checkpoint path if only doing eval
    --train_data_file_path            SQuAD TFRecord for training, e.g., train1.1.tfrecord
    --eval_data_file_path             SQuAD TFRecord for predictions, e.g., dev1.1.tfrecord
    --schema_file_path                path to the datafile schema file

usage: run_classifier.py [--device_target DEVICE_TARGET] [--do_train DO_TRAIN] [--do_eval DO_EVAL]
                         [--assessment_method ASSESSMENT_METHOD] [--device_id N] [--epoch_num N] [--num_class N]
                         [--save_finetune_checkpoint_path SAVE_FINETUNE_CHECKPOINT_PATH]
                         [--load_pretrain_checkpoint_path LOAD_PRETRAIN_CHECKPOINT_PATH]
                         [--load_finetune_checkpoint_path LOAD_FINETUNE_CHECKPOINT_PATH]
                         [--train_data_shuffle TRAIN_DATA_SHUFFLE]
                         [--eval_data_shuffle EVAL_DATA_SHUFFLE]
                         [--train_data_file_path TRAIN_DATA_FILE_PATH]
                         [--eval_data_file_path EVAL_DATA_FILE_PATH]
                         [--schema_file_path SCHEMA_FILE_PATH]

options:
    --device_target                   targeted device to run the task: Ascend | GPU
    --do_train                        whether to run training on the training set: true | false
    --do_eval                         whether to run evaluation on the dev set: true | false
    --assessment_method               assessment method for evaluation: accuracy | f1 | mcc | spearman_correlation
    --device_id                       device id to run the task
    --epoch_num                       total number of training epochs to perform
    --num_class                       number of classes for labeling
    --train_data_shuffle              enable train data shuffle, default is true
    --eval_data_shuffle               enable eval data shuffle, default is true
    --save_finetune_checkpoint_path   path to save the generated fine-tuning checkpoint
    --load_pretrain_checkpoint_path   initial checkpoint (usually from a pre-trained BERT model)
    --load_finetune_checkpoint_path   provide a fine-tuning checkpoint path if only doing eval
    --train_data_file_path            TFRecord for training, e.g., train.tfrecord
    --eval_data_file_path             TFRecord for predictions, e.g., dev.tfrecord
    --schema_file_path                path to the datafile schema file
```

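As an illustration, a classification fine-tuning plus evaluation run could be invoked directly with the options documented above. In practice scripts/run_classifier.sh is expected to set these for you; the values and paths below are placeholders.

```bash
# Illustrative direct invocation using only the documented options; adjust values and paths for your task.
python run_classifier.py \
    --device_target="Ascend" \
    --do_train="true" \
    --do_eval="true" \
    --assessment_method="accuracy" \
    --device_id=0 \
    --epoch_num=3 \
    --num_class=2 \
    --train_data_shuffle="true" \
    --eval_data_shuffle="true" \
    --save_finetune_checkpoint_path="/path/finetune_ckpt/" \
    --load_pretrain_checkpoint_path="/path/pretrain/checkpoint_100_300.ckpt" \
    --train_data_file_path="/path/train.tfrecord" \
    --eval_data_file_path="/path/dev.tfrecord" \
    --schema_file_path="/path/schema.json"
```

The run_ner.py and run_squad.py scripts follow the same pattern with their task-specific options (e.g., --use_crf and --label2id_file_path for NER, --eval_json_path for SQuAD).
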
## Options and Parameters

Parameters for training and evaluation can be set in `config.py` and `finetune_eval_config.py`, respectively.

### Options:

```
config for loss scale and so on.
    bert_network        version of BERT model: base | nezha, default is base
    loss_scale_value    initial value of loss scale: N, default is 2^32
    scale_factor        factor used to update loss scale: N, default is 2
    scale_window        steps for one update of loss scale: N, default is 1000
    optimizer           optimizer used in the network: AdamWeightDecayDynamicLR | Lamb | Momentum, default is "Lamb"
```

### Parameters:

```
Parameters for dataset and network (Pre-Training/Fine-Tuning/Evaluation):
    batch_size                      batch size of input dataset: N, default is 16
    seq_length                      length of input sequence: N, default is 128
    vocab_size                      vocabulary size used for the token embedding: N, must be consistent with the dataset you use. Default is 21136
    hidden_size                     size of bert encoder layers: N, default is 768
    num_hidden_layers               number of hidden layers: N, default is 12
    num_attention_heads             number of attention heads: N, default is 12
    intermediate_size               size of intermediate layer: N, default is 3072
    hidden_act                      activation function used: ACTIVATION, default is "gelu"
    hidden_dropout_prob             dropout probability for BertOutput: Q, default is 0.1
    attention_probs_dropout_prob    dropout probability for BertAttention: Q, default is 0.1
    max_position_embeddings         maximum length of sequences: N, default is 512
    type_vocab_size                 size of token type vocab: N, default is 16
    initializer_range               initialization value of TruncatedNormal: Q, default is 0.02
    use_relative_positions          use relative positions or not: True | False, default is False
    input_mask_from_dataset         use the input mask loaded from the dataset or not: True | False, default is True
    token_type_ids_from_dataset     use the token type ids loaded from the dataset or not: True | False, default is True
    dtype                           data type of input: mstype.float16 | mstype.float32, default is mstype.float32
    compute_type                    compute type in BertTransformer: mstype.float16 | mstype.float32, default is mstype.float16

Parameters for optimizer:
    AdamWeightDecay:
        decay_steps                 steps of the learning rate decay: N
        learning_rate               value of learning rate: Q
        end_learning_rate           value of end learning rate: Q, must be positive
        power                       power: Q
        warmup_steps                steps of the learning rate warm up: N
        weight_decay                weight decay: Q
        eps                         term added to the denominator to improve numerical stability: Q

    Lamb:
        decay_steps                 steps of the learning rate decay: N
        learning_rate               value of learning rate: Q
        end_learning_rate           value of end learning rate: Q
        power                       power: Q
        warmup_steps                steps of the learning rate warm up: N
        weight_decay                weight decay: Q

    Momentum:
        learning_rate               value of learning rate: Q
        momentum                    momentum for the moving average: Q
```

## [Training Process](#contents)

### Training

#### running on Ascend

```bash
bash scripts/run_standalone_pretrain_ascend.sh 0 1 /path/cn-wiki-128
```

The command above will run in the background; you can view the results in the file pretraining_log.txt. After training, you will get some checkpoint files under the script folder by default. The loss values will be displayed as follows:

```
# grep "epoch" pretraining_log.txt
epoch: 0.0, current epoch percent: 0.000, step: 1, outpus are (Tensor(shape=[1], dtype=Float32, [ 1.0856101e+01]), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
epoch: 0.0, current epoch percent: 0.000, step: 2, outpus are (Tensor(shape=[1], dtype=Float32, [ 1.0821701e+01]), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
...
```

### Distributed Training

#### running on Ascend

```bash
bash scripts/run_distributed_pretrain_ascend.sh /path/cn-wiki-128 /path/hccl.json
```

The command above will run in the background; you can view the results in the file pretraining_log.txt. After training, you will get some checkpoint files under the LOG* folder by default. The loss values will be displayed as follows:

```
# grep "epoch" LOG*/pretraining_log.txt
epoch: 0.0, current epoch percent: 0.001, step: 100, outpus are (Tensor(shape=[1], dtype=Float32, [ 1.08209e+01]), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
epoch: 0.0, current epoch percent: 0.002, step: 200, outpus are (Tensor(shape=[1], dtype=Float32, [ 1.07566e+01]), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
...
epoch: 0.0, current epoch percent: 0.001, step: 100, outpus are (Tensor(shape=[1], dtype=Float32, [ 1.08218e+01]), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
epoch: 0.0, current epoch percent: 0.002, step: 200, outpus are (Tensor(shape=[1], dtype=Float32, [ 1.07770e+01]), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
...
```

## [Evaluation Process](#contents)

### Evaluation

#### evaluation on cola dataset when running on Ascend

Before running the command below, please make sure the load pretrain checkpoint path has been set. Please set the checkpoint path to an absolute full path, e.g., "/username/pretrain/checkpoint_100_300.ckpt".

```bash
bash scripts/run_classifier.sh
```

The command above will run in the background; you can view the results in the file classfier_log.txt. If you choose accuracy as the assessment method, the result will be as follows:

```
acc_num XXX, total_num XXX, accuracy 0.588986
```

#### evaluation on cluener dataset when running on Ascend

```bash
bash scripts/run_ner.sh
```

The command above will run in the background; you can view the results in the file ner_log.txt. If you choose F1 as the assessment method, the result will be as follows:

```
Precision 0.920507
Recall 0.948683
F1 0.920507
```

#### evaluation on squad v1.1 dataset when running on Ascend

```bash
bash scripts/run_squad.sh
```

The command above will run in the background; you can view the results in the file squad_log.txt. The result will be as follows:

```
{"exact_match": 80.3878923040233284, "f1": 87.6902384023850329}
```

## [Model Description](#contents)

## [Performance](#contents)

### Pretraining Performance

| Parameters                 | BERT                                         | BERT                |
| -------------------------- | -------------------------------------------- | ------------------- |
| Model Version              | base                                         | base                |
| Resource                   | Ascend 910, cpu:2.60GHz 56cores, memory:314G | NV SMX2 V100-32G    |
| uploaded Date              | 08/22/2020                                   | 05/06/2020          |
| MindSpore Version          | 0.6.0                                        | 0.3.0               |
| Dataset                    | cn-wiki-128                                  | ImageNet            |
| Training Parameters        | src/config.py                                | src/config.py       |
| Optimizer                  | Lamb                                         | Momentum            |
| Loss Function              | SoftmaxCrossEntropy                          | SoftmaxCrossEntropy |
| outputs                    | probability                                  |                     |
| Loss                       |                                              | 1.913               |
| Speed                      | 116.5 ms/step                                | 1.913               |
| Total time                 |                                              |                     |
| Params (M)                 | 110M                                         |                     |
| Checkpoint for Fine tuning | 1.2G(.ckpt file)                             |                     |

| Parameters                 | BERT                                         | BERT                |
| -------------------------- | -------------------------------------------- | ------------------- |
| Model Version              | NEZHA                                        | NEZHA               |
| Resource                   | Ascend 910, cpu:2.60GHz 56cores, memory:314G | NV SMX2 V100-32G    |
| uploaded Date              | 08/20/2020                                   | 05/06/2020          |
| MindSpore Version          | 0.6.0                                        | 0.3.0               |
| Dataset                    | cn-wiki-128                                  | ImageNet            |
| Training Parameters        | src/config.py                                | src/config.py       |
| Optimizer                  | Lamb                                         | Momentum            |
| Loss Function              | SoftmaxCrossEntropy                          | SoftmaxCrossEntropy |
| outputs                    | probability                                  |                     |
| Loss                       |                                              | 1.913               |
| Speed                      |                                              | 1.913               |
| Total time                 |                                              |                     |
| Params (M)                 | 340M                                         |                     |
| Checkpoint for Fine tuning | 3.2G(.ckpt file)                             |                     |

#### Inference Performance

| Parameters          |                  |                           |                |
| ------------------- | ---------------- | ------------------------- | -------------- |
| Model Version       | V1               |                           |                |
| Resource            | Huawei 910       | NV SMX2 V100-32G          | Huawei 310     |
| uploaded Date       | 08/22/2020       | 05/22/2020                |                |
| MindSpore Version   | 0.6.0            | 0.2.0                     | 0.2.0          |
| Dataset             | cola, 1.2W       | ImageNet, 1.2W            | ImageNet, 1.2W |
| batch_size          | 32(1P)           | 130(8P)                   |                |
| Accuracy            | 0.588986         | ACC1[72.07%] ACC5[90.90%] |                |
| Speed               | 59.25ms/step     |                           |                |
| Total time          |                  |                           |                |
| Model for inference | 1.2G(.ckpt file) |                           |                |

# [Description of Random Situation](#contents)

In run_standalone_pretrain.sh and run_distributed_pretrain.sh, we set do_shuffle to shuffle the dataset.

In run_classifier.sh, run_ner.sh and run_squad.sh, we set train_data_shuffle and eval_data_shuffle to shuffle the dataset.

In config.py, we set hidden_dropout_prob and attention_probs_dropout_prob to drop out some network nodes.

In run_pretrain.py, we set a random seed to make sure distributed training starts from the same initial weights.

# [ModelZoo Homepage](#contents)

Please check the official [homepage](https://gitee.com/mindspore/mindspore/tree/master/model_zoo).