![](https://www.mindspore.cn/static/img/logo.a3e472c9.png)
<!-- TOC -->

- [GNMT v2 For MindSpore](#gnmt-v2-for-mindspore)
- [Model Structure](#model-structure)
- [Dataset](#dataset)
- [Environment Requirements](#environment-requirements)
    - [Platform](#platform)
    - [Software](#software)
- [Quick Start](#quick-start)
- [Script Description](#script-description)
    - [Dataset Preparation](#dataset-preparation)
    - [Configuration File](#configuration-file)
    - [Training Process](#training-process)
    - [Evaluation Process](#evaluation-process)
- [Model Description](#model-description)
    - [Performance](#performance)
        - [Result](#result)
            - [Training Performance](#training-performance)
            - [Inference Performance](#inference-performance)
    - [Practice](#practice)
        - [Dataset Preprocessing](#dataset-preprocessing)
        - [Training](#training-1)
        - [Inference](#inference-1)
- [Random Situation Description](#random-situation-description)
- [Others](#others)
- [ModelZoo Homepage](#modelzoo-homepage)

<!-- /TOC -->
# GNMT v2 For MindSpore

The GNMT v2 model is similar to the model described in [Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation](https://arxiv.org/abs/1609.08144), and is mainly used for corpus translation.

# Model Structure

The GNMTv2 model consists mainly of an encoder, a decoder, and an attention mechanism, where the encoder and the decoder share a word embedding vector.

Encoder: consists of four long short-term memory (LSTM) layers. The first LSTM layer is bidirectional, while the other three layers are unidirectional.

Decoder: consists of four unidirectional LSTM layers and a fully connected classifier. The output embedding dimension of the LSTM layers is 1024.

Attention mechanism: uses the normalized Bahdanau attention mechanism. First, the output of the first decoder layer is used as the input of the attention mechanism. Then, the computing result of the attention mechanism is concatenated with the input of the decoder LSTM and used as the input of the subsequent LSTM layers.
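For orientation, below is a minimal structural sketch of the encoder stack described above, written against `mindspore.nn`. It is illustrative only and is not the repository's implementation (the real code lives in `src/gnmt_model/encoder.py` and uses `DynamicRNN`); all shapes and names here are assumptions.

```python
# Illustrative sketch only; see src/gnmt_model/encoder.py for the real code.
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor

hidden_size = 1024
vocab_size = 32000          # assumed BPE vocabulary size
seq_len, batch = 10, 2      # toy shapes for the sketch

# Word embedding (shared with the decoder in the real model).
embedding = nn.Embedding(vocab_size, hidden_size)
# Layer 1: bidirectional, so its output width is 2 * hidden_size.
bi_lstm = nn.LSTM(hidden_size, hidden_size, bidirectional=True)
# Layers 2-4: unidirectional, stacked.
uni_lstm = nn.LSTM(2 * hidden_size, hidden_size, num_layers=3)

ids = Tensor(np.zeros((seq_len, batch), np.int32))
x = embedding(ids)

# Explicit zero initial states: (num_layers * num_directions, batch, hidden).
h0 = Tensor(np.zeros((2, batch, hidden_size), np.float32))
c0 = Tensor(np.zeros((2, batch, hidden_size), np.float32))
x, _ = bi_lstm(x, (h0, c0))

h1 = Tensor(np.zeros((3, batch, hidden_size), np.float32))
c1 = Tensor(np.zeros((3, batch, hidden_size), np.float32))
x, _ = uni_lstm(x, (h1, c1))
print(x.shape)  # (seq_len, batch, hidden_size)
```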
# Dataset

Note that you can run the scripts with the dataset mentioned in the original paper or with datasets widely used in this domain/network architecture. The following sections describe how to run the scripts using the datasets below.

- *WMT English-German* for training.
- *WMT newstest2014* for evaluation.
# Environment Requirements

## Platform

- Hardware (Ascend)
    - Prepare a hardware environment with the Ascend processor. If you want to try Ascend, please send the [application form](https://obs-9be7.obs.cn-east-2.myhuaweicloud.com/file/other/Ascend%20Model%20Zoo%E4%BD%93%E9%AA%8C%E8%B5%84%E6%BA%90%E7%94%B3%E8%AF%B7%E8%A1%A8.docx) to ascend@huawei.com. Once approved, you can get the resources for trial.
- Framework
    - Install [MindSpore](https://www.mindspore.cn/install/en).
- For more information, please check the resources below:
    - [MindSpore tutorials](https://www.mindspore.cn/tutorial/en/master/index.html)
    - [MindSpore API](https://www.mindspore.cn/api/en/master/index.html)
## Software

```txt
numpy
sacrebleu==1.2.10
sacremoses==0.0.19
subword_nmt==0.3.7
```
# Quick Start

After dataset preparation, you can start training and evaluation as follows:

```bash
# run training example
python train.py --config /home/workspace/gnmt_v2/config/config.json

# run distributed training example
cd ./scripts
sh run_distributed_train_ascend.sh

# run evaluation example
cd ./scripts
sh run_standalone_eval_ascend.sh
```
# Script Description

The GNMT network script and code structure are as follows:

```text
├── gnmt
  ├── README.md                          // Introduction of the GNMTv2 model.
  ├── config
  │   ├── __init__.py                    // User interface.
  │   ├── config.py                      // Configuration instance definition.
  │   ├── config.json                    // Configuration file for pre-training or fine-tuning.
  │   ├── config_test.json               // Configuration file for test.
  ├── src
  │   ├── __init__.py                    // User interface.
  │   ├── dataset
  │   │   ├── __init__.py                // User interface.
  │   │   ├── base.py                    // Base class of the data loader.
  │   │   ├── bi_data_loader.py          // Bilingual data loader.
  │   │   ├── load_dataset.py            // Dataset loader to feed into the model.
  │   │   ├── schema.py                  // Defines the schema of mindrecord.
  │   │   ├── tokenizer.py               // Tokenizer class.
  │   ├── gnmt_model
  │   │   ├── __init__.py                // User interface.
  │   │   ├── attention.py               // Bahdanau attention mechanism.
  │   │   ├── beam_search.py             // Beam search decoder for inference.
  │   │   ├── bleu_calculate.py          // Calculates the BLEU score.
  │   │   ├── components.py              // Components.
  │   │   ├── create_attention.py        // Recurrent attention.
  │   │   ├── create_attn_padding.py     // Creates attention paddings from input paddings.
  │   │   ├── decoder.py                 // GNMT decoder component.
  │   │   ├── decoder_beam_infer.py      // GNMT decoder component for beam search.
  │   │   ├── dynamic_rnn.py             // DynamicRNN.
  │   │   ├── embedding.py               // Embedding component.
  │   │   ├── encoder.py                 // GNMT encoder component.
  │   │   ├── gnmt.py                    // GNMT model architecture.
  │   │   ├── gnmt_for_infer.py          // Uses GNMT for inference.
  │   │   ├── gnmt_for_train.py          // Uses GNMT for training.
  │   │   ├── grad_clip.py               // Gradient clipping.
  │   ├── utils
  │   │   ├── __init__.py                // User interface.
  │   │   ├── initializer.py             // Parameter initializer.
  │   │   ├── load_weights.py            // Loads weights from a checkpoint or NPZ file.
  │   │   ├── loss_moniter.py            // Callback for monitoring the loss during training.
  │   │   ├── lr_scheduler.py            // Learning rate scheduler.
  │   │   ├── optimizer.py               // Optimizer.
  ├── scripts
  │   ├── run_distributed_train_ascend.sh    // Shell script for distributed training on Ascend.
  │   ├── run_standalone_eval_ascend.sh      // Shell script for standalone evaluation on Ascend.
  │   ├── run_standalone_train_ascend.sh     // Shell script for standalone training on Ascend.
  ├── create_dataset.py                  // Dataset preparation.
  ├── eval.py                            // Inference API entry.
  ├── requirements.txt                   // Requirements of third-party packages.
  ├── train.py                           // Train API entry.
```
## Dataset Preparation

You may use this [shell script](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Translation/GNMT/scripts/wmt16_en_de.sh) to download and preprocess the WMT English-German dataset. After it finishes, you should have the following files:

- train.tok.clean.bpe.32000.en
- train.tok.clean.bpe.32000.de
- vocab.bpe.32000
- bpe.32000
- newstest2014.en
- newstest2014.de

Convert the original data to mindrecord for training and evaluation:

```bash
python create_dataset.py --src_folder /home/workspace/wmt16_de_en --output_folder /home/workspace/dataset_menu
```
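If you need to preprocess additional raw text with the same pipeline, the pinned `sacremoses` and `subword_nmt` packages listed under Software can be used directly. A hedged example, assuming the file layout produced by the download script above (the sample sentence is arbitrary):

```python
# Hedged example: tokenize a raw English sentence and apply the learned
# BPE codes (bpe.32000) produced by the download script above.
from sacremoses import MosesTokenizer
from subword_nmt.apply_bpe import BPE

tokenizer = MosesTokenizer(lang="en")

with open("/home/workspace/wmt16_de_en/bpe.32000", encoding="utf-8") as codes:
    bpe = BPE(codes)

sentence = "Machine translation bridges the gap between languages."
tokenized = tokenizer.tokenize(sentence, return_str=True)
print(bpe.process_line(tokenized))  # subword units separated by "@@"
```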
## Configuration File

The JSON files in the `config/` directory are the template configuration files.

Almost all required options and parameters can be easily assigned, including the training platform, dataset and model configuration, and optimizer parameters. By setting the corresponding options, you can also enable optional functions such as loss scale and checkpoint.

For more information about the attributes, see the `config/config.py` file.
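As a quick way to see which attribute groups the template defines before editing it, the file can be loaded with the standard library. This is a hedged illustration; the exact key layout is defined by `config/config.py`:

```python
# Hedged illustration: list the option groups in the template configuration.
import json

with open("config/config.json", encoding="utf-8") as f:
    cfg = json.load(f)

for key, value in cfg.items():
    # Print nested groups as their key lists, scalar options directly.
    print(key, "->", list(value) if isinstance(value, dict) else value)
```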
## Training Process

Model training uses the shell script `scripts/run_standalone_train_ascend.sh`. The script sets the environment variables and runs the training script `train.py` in `gnmt_v2/`.

To start training on a single device, run the following command in bash:

```bash
cd ./scripts
sh run_standalone_train_ascend.sh
```

To train on multiple devices, run the following command in bash from `scripts/`:

```bash
cd ./scripts
sh run_distributed_train_ascend.sh
```

Note: Ensure that the hccl_json file is assigned when distributed training is running.
Currently, inconsecutive device IDs are not supported in `scripts/run_distributed_train_ascend.sh`. The device IDs must start from 0 in the `distribute_script/rank_table_8p.json` file.
## Evaluation Process

Set the options in `config/config_test.json`. Make sure that `existed_ckpt`, `dataset_schema`, and `test_dataset` are set to your own paths.

Run `scripts/run_standalone_eval_ascend.sh` to process the output token IDs and get the BLEU scores.

```bash
cd ./scripts
sh run_standalone_eval_ascend.sh
```
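Under the hood, `src/gnmt_model/bleu_calculate.py` computes the score. A minimal stand-alone sketch with the pinned `sacrebleu` package looks like the following; the file names are placeholders for detokenized system output and the reference translations:

```python
# Hedged sketch: score detokenized hypotheses against references with
# sacrebleu (the dependency pinned under Software). File names are placeholders.
import sacrebleu

with open("hypotheses.detok.txt", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("newstest2014.de", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hyps, [refs])
print(f"BLEU = {bleu.score:.2f}")
```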
# Model Description

## Performance

### Result

#### Training Performance

| Parameters | Ascend |
| -------------------------- | -------------------------------------------------------------- |
| Resource | Ascend 910 |
| Uploaded Date | 11/06/2020 (month/day/year) |
| MindSpore Version | 1.0.0 |
| Dataset | WMT English-German |
| Training Parameters | epoch=6, batch_size=128 |
| Optimizer | Adam |
| Loss Function | Softmax Cross Entropy |
| BLEU Score | 24.05 |
| Speed | 344 ms/step (8 devices) |
| Loss | 63.35 |
| Params (M) | 613 |
| Checkpoint for inference | 1.8 GB (.ckpt file) |
| Scripts | [gnmt_v2](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/official/nlp/gnmt_v2) |
#### Inference Performance

| Parameters | Ascend |
| ------------------- | --------------------------- |
| Resource | Ascend 910 |
| Uploaded Date | 11/06/2020 (month/day/year) |
| MindSpore Version | 1.0.0 |
| Dataset | WMT newstest2014 |
| batch_size | 128 |
| Outputs | BLEU score |
| Accuracy | BLEU = 24.05 |
## Practice

GNMTv2 performs the text translation task as follows:

1. Download the WMT16 data corpus and extract the dataset. For details, see the section "_Dataset_" above.
2. Preprocess the dataset.
3. Perform training.
4. Perform inference.
### Dataset Preprocessing

Convert the downloaded corpus into the dataset format used for training and inference:

```bash
python create_dataset.py --src_folder /home/work_space/wmt16_de_en --output_folder /home/work_space/dataset_menu
```
### Training

For a pre-trained model, configure the following options in the `config/config.json` file (a sketch for setting these keys programmatically follows the list):

- Assign `pre_train_dataset` and `dataset_schema` to the training dataset path.
- Select an optimizer (`momentum`, `adam`, or `lamb`).
- Specify `ckpt_prefix` and `ckpt_path` in `checkpoint_path` to save the model file.
- Set other parameters, including the dataset configuration and network configuration.
- If a pre-trained model exists, assign `existed_ckpt` to the path of the existing model during fine-tuning.
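The following is a hypothetical helper, not part of the repository, for scripting those edits. The key names follow this README; the real template may nest them inside sub-sections (check `config/config.py`), and the mindrecord path is an assumed example:

```python
# Hypothetical helper: set the training options listed above in the
# JSON template. Key names and the dataset path are assumptions.
import json

path = "config/config.json"
with open(path, encoding="utf-8") as f:
    cfg = json.load(f)

cfg["pre_train_dataset"] = "/home/workspace/dataset_menu/train.mindrecord"
cfg["optimizer"] = "adam"
cfg["existed_ckpt"] = ""  # set to a .ckpt path when fine-tuning

with open(path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2)
```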
Run the shell script `run_standalone_train_ascend.sh`:

```bash
cd ./scripts
sh run_standalone_train_ascend.sh
```
### Inference

For inference using a trained model on multiple hardware platforms, such as GPU, Ascend 910, and Ascend 310, see [Network Migration](https://www.mindspore.cn/tutorial/en/master/advanced_use/network_migration.html).

For inference, configure the following options in the `config/config_test.json` file:

- Assign `test_dataset` and `dataset_schema` to the inference dataset path.
- Assign `existed_ckpt` and `checkpoint_path` to the path of the model file generated during training.
- Set other parameters, including the dataset configuration and network configuration.

Run the shell script `run_standalone_eval_ascend.sh`:

```bash
cd ./scripts
sh run_standalone_eval_ascend.sh
```
# Random Situation Description

There are three sources of randomness:

- Shuffling of the dataset.
- Initialization of some model weights.
- Dropout operations.

Some seeds have already been set in `train.py` to avoid the randomness of dataset shuffling and weight initialization. If you want to disable dropout, set the corresponding `dropout_prob` parameter to 0 in `config/config.json`.
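For reference, the kind of seeding involved looks like the following hedged sketch; the exact calls and seed values in `train.py` may differ:

```python
# Illustrative only: fix the random sources that affect dataset shuffling
# and weight initialization (exact calls in train.py may differ).
import random
import numpy as np
from mindspore.common import set_seed

random.seed(1)      # Python-level randomness
np.random.seed(1)   # numpy-based shuffling and initializers
set_seed(1)         # MindSpore global seed for parameter initialization
```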
# Others

This model has been validated in the Ascend environment and is not validated on CPU and GPU.
# ModelZoo Homepage

[Link](https://gitee.com/mindspore/mindspore/tree/master/mindspore/model_zoo)