This example implements training and evaluation of the Transformer model, which was introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017).
Note: If you are running an evaluation task, prepare the corresponding checkpoint file.
.
└─Transformer
  ├─README.md
  ├─scripts
    ├─process_output.sh
    ├─replace-quote.perl
    ├─run_distribute_train.sh
    └─run_standalone_train.sh
  ├─src
    ├─__init__.py
    ├─beam_search.py
    ├─config.py
    ├─dataset.py
    ├─eval_config.py
    ├─lr_schedule.py
    ├─process_output.py
    ├─tokenization.py
    ├─transformer_for_train.py
    ├─transformer_model.py
    └─weight_init.py
  ├─create_data.py
  ├─eval.py
  └─train.py
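You may use this shell script to download and preprocess the WMT English-German dataset. The steps below assume you have obtained the files used in the following commands: train.tok.clean.bpe.32000.en, train.tok.clean.bpe.32000.de, vocab.bpe.32000, newstest2014.tok.bpe.32000.en and newstest2014.tok.bpe.32000.de.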
Convert the original data to mindrecord for training:
paste train.tok.clean.bpe.32000.en train.tok.clean.bpe.32000.de > train.all
python create_data.py --input_file train.all --vocab_file vocab.bpe.32000 --output_file /path/ende-l128-mindrecord --max_seq_length 128
Convert the original data to mindrecord for evaluation:
paste newstest2014.tok.bpe.32000.en newstest2014.tok.bpe.32000.de > test.all
python create_data.py --input_file test.all --vocab_file vocab.bpe.32000 --output_file /path/newstest2014-l128-mindrecord --num_splits 1 --max_seq_length 128 --clip_to_max_len True
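The `paste` commands above join each English line with its German counterpart, separated by a tab. A minimal Python sketch of that pairing step follows. The helper name `paste_lines` and the sample sentences are made up for illustration, and how create_data.py parses the combined file is not described in this README.

```python
def paste_lines(src_lines, tgt_lines, delimiter="\t"):
    """Join corresponding source and target lines with a tab, like `paste a b`.

    Unlike `paste`, which pads the shorter file with empty fields, this sketch
    rejects inputs of different length, since a parallel corpus should have
    exactly one target line per source line.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    return [
        src.rstrip("\n") + delimiter + tgt.rstrip("\n")
        for src, tgt in zip(src_lines, tgt_lines)
    ]


# Hypothetical BPE-tokenized sample lines (not taken from the real dataset).
en = ["Hello wor@@ ld\n", "Good morning\n"]
de = ["Hallo We@@ lt\n", "Guten Morgen\n"]

pairs = paste_lines(en, de)
for line in pairs:
    print(line)
```

Checking the line counts before pairing catches a misaligned corpus early, which `paste` alone would silently accept.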
Set options in config.py, including loss_scale, learning rate and network hyperparameters. Click here for more information about the dataset.
Run run_standalone_train.sh for non-distributed training of the Transformer model.
sh scripts/run_standalone_train.sh DEVICE_ID EPOCH_SIZE DATA_PATH
Run run_distribute_train.sh for distributed training of the Transformer model.
sh scripts/run_distribute_train.sh DEVICE_NUM EPOCH_SIZE DATA_PATH MINDSPORE_HCCL_CONFIG_PATH
Set options in eval_config.py. Make sure 'data_file', 'model_file' and 'output_file' are set to your own paths.
Run eval.py for evaluation of the Transformer model.
python eval.py
Run process_output.sh to convert the output token ids into the actual translation text.
sh scripts/process_output.sh REF_DATA EVAL_OUTPUT VOCAB_FILE
You will get two files, REF_DATA.forbleu and EVAL_OUTPUT.forbleu, for BLEU score calculation.
To calculate the BLEU score, you may use this perl script and run the following command:
perl multi-bleu.perl REF_DATA.forbleu < EVAL_OUTPUT.forbleu
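For readers who want to see what the score measures, here is a minimal Python sketch of corpus-level BLEU with 1- to 4-gram precision and a brevity penalty. It is an illustration only, not a replacement for multi-bleu.perl: it assumes one reference per hypothesis and whitespace tokenization, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU (0-100) with a single reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tok, hyp_tok = ref.split(), hyp.split()
        ref_len += len(ref_tok)
        hyp_len += len(hyp_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clipped matches: a hypothesis n-gram counts at most as often
            # as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize hypotheses that are shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


refs = ["the cat sat on the mat", "there is a dog in the garden"]
print(corpus_bleu(refs, refs))
```

Scores from this sketch may differ from the perl script on real data, so use the script for reported results.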
usage: train.py [--distribute DISTRIBUTE] [--epoch_size N] [--device_num N] [--device_id N]
                [--enable_save_ckpt ENABLE_SAVE_CKPT]
                [--enable_lossscale ENABLE_LOSSSCALE] [--do_shuffle DO_SHUFFLE]
                [--enable_data_sink ENABLE_DATA_SINK] [--save_checkpoint_steps N]
                [--save_checkpoint_num N] [--save_checkpoint_path SAVE_CHECKPOINT_PATH]
                [--data_path DATA_PATH]
options:
    --distribute               training by several devices: "true" (training by more than 1 device) | "false", default is "false"
    --epoch_size               epoch size: N, default is 52
    --device_num               number of used devices: N, default is 1
    --device_id                device id: N, default is 0
    --enable_save_ckpt         enable save checkpoint: "true" | "false", default is "true"
    --enable_lossscale         enable lossscale: "true" | "false", default is "true"
    --do_shuffle               enable shuffle: "true" | "false", default is "true"
    --enable_data_sink         enable data sink: "true" | "false", default is "false"
    --checkpoint_path          path to load checkpoint files: PATH, default is ""
    --save_checkpoint_steps    steps for saving checkpoint files: N, default is 2500
    --save_checkpoint_num      number for saving checkpoint files: N, default is 30
    --save_checkpoint_path     path to save checkpoint files: PATH, default is "./checkpoint/"
    --data_path                path to dataset file: PATH, default is ""
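The options above map naturally onto an argument parser. The following is a sketch of how a subset of them could be declared with `argparse`, using the defaults listed here. It is an assumption for illustration and may not match how train.py actually defines its arguments; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare a subset of the documented train.py options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Transformer training (sketch)")
    # Boolean-like switches are documented as the strings "true" / "false".
    parser.add_argument("--distribute", type=str, default="false", choices=["true", "false"])
    parser.add_argument("--enable_save_ckpt", type=str, default="true", choices=["true", "false"])
    parser.add_argument("--epoch_size", type=int, default=52)
    parser.add_argument("--device_num", type=int, default=1)
    parser.add_argument("--device_id", type=int, default=0)
    parser.add_argument("--save_checkpoint_steps", type=int, default=2500)
    parser.add_argument("--save_checkpoint_num", type=int, default=30)
    parser.add_argument("--save_checkpoint_path", type=str, default="./checkpoint/")
    parser.add_argument("--data_path", type=str, default="")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(
    ["--epoch_size", "10", "--data_path", "/path/ende-l128-mindrecord"]
)
print(args.epoch_size, args.device_num, args.distribute)
```

Because the switches are strings rather than booleans, code consuming them would compare against "true", for example `args.distribute == "true"`.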
The configuration consists of Transformer model parameters and options for training and evaluation, which are set in config.py and eval_config.py respectively.
config.py:
    transformer_network             version of Transformer model: base | large, default is large
    init_loss_scale_value           initial value of loss scale: N, default is 2^10
    scale_factor                    factor used to update loss scale: N, default is 2
    scale_window                    steps for one update of loss scale: N, default is 2000
    optimizer                       optimizer used in the network: Adam, default is "Adam"
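To show how the three loss-scale options interact, here is a plain-Python sketch of a common dynamic loss scaling rule: shrink the scale by scale_factor when gradients overflow, and grow it by scale_factor after scale_window consecutive overflow-free steps. The class name is hypothetical, and the exact update rule used by this example's training code is an assumption here.

```python
class DynamicLossScaleSketch:
    """Illustrative dynamic loss scale; not the framework's implementation."""

    def __init__(self, init_value=2 ** 10, scale_factor=2, scale_window=2000):
        self.value = float(init_value)
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, overflow):
        if overflow:
            # Gradients overflowed: reduce the scale (never below 1) and restart the count.
            self.value = max(self.value / self.scale_factor, 1.0)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps >= self.scale_window:
                # A full window without overflow: try a larger scale.
                self.value *= self.scale_factor
                self.good_steps = 0
        return self.value


scaler = DynamicLossScaleSketch()
for _ in range(2000):
    scaler.update(overflow=False)
after_window = scaler.value      # 2048.0 after 2000 clean steps
after_overflow = scaler.update(overflow=True)  # back to 1024.0
print(after_window, after_overflow)
```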
eval_config.py:
    transformer_network             version of Transformer model: base | large, default is large
    data_file                       data file: PATH
    model_file                      checkpoint file to be loaded: PATH
    output_file                     output file of evaluation: PATH
Parameters for dataset and network (Training/Evaluation):
    batch_size                      batch size of input dataset: N, default is 96
    seq_length                      length of input sequence: N, default is 128
    vocab_size                      size of the vocabulary: N, default is 36560
    hidden_size                     size of Transformer encoder layers: N, default is 1024
    num_hidden_layers               number of hidden layers: N, default is 6
    num_attention_heads             number of attention heads: N, default is 16
    intermediate_size               size of intermediate layer: N, default is 4096
    hidden_act                      activation function used: ACTIVATION, default is "relu"
    hidden_dropout_prob             dropout probability for TransformerOutput: Q, default is 0.3
    attention_probs_dropout_prob    dropout probability for TransformerAttention: Q, default is 0.3
    max_position_embeddings         maximum length of sequences: N, default is 128
    initializer_range               initialization value of TruncatedNormal: Q, default is 0.02
    label_smoothing                 label smoothing setting: Q, default is 0.1
    input_mask_from_dataset         use the input mask loaded from dataset or not: True | False, default is True
    beam_width                      beam width setting: N, default is 4
    max_decode_length               max decode length in evaluation: N, default is 80
    length_penalty_weight           weight used to normalize translation scores by length: Q, default is 1.0
    compute_type                    compute type in Transformer: mstype.float16 | mstype.float32, default is mstype.float16
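As a quick sanity check on how these defaults fit together, the sketch below collects a few of them in a dataclass and derives the per-head dimension and the embedding table size. The class `TransformerConfigSketch` is hypothetical and is not the configuration class in src/config.py.

```python
from dataclasses import dataclass


@dataclass
class TransformerConfigSketch:
    """A few of the documented defaults, gathered for illustration."""
    batch_size: int = 96
    seq_length: int = 128
    vocab_size: int = 36560
    hidden_size: int = 1024
    num_hidden_layers: int = 6
    num_attention_heads: int = 16
    intermediate_size: int = 4096
    max_position_embeddings: int = 128

    def head_size(self):
        # Multi-head attention splits hidden_size evenly across the heads.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads

    def embedding_parameters(self):
        # One hidden_size-dimensional vector per vocabulary entry.
        return self.vocab_size * self.hidden_size


cfg = TransformerConfigSketch()
print(cfg.head_size())             # 64
print(cfg.embedding_parameters())  # 37437440
# Input sequences must fit within the position embedding table.
print(cfg.seq_length <= cfg.max_position_embeddings)  # True
```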
Parameters for learning rate:
    learning_rate                   value of learning rate: Q
    warmup_steps                    steps of the learning rate warm up: N
    start_decay_step                step of the learning rate to decay: N
    min_lr                          minimal learning rate: Q
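One plausible way these four parameters combine is a linear warm-up, a constant phase until start_decay_step, and then an inverse square-root decay floored at min_lr. The sketch below implements that shape under those assumptions. It is not taken from src/lr_schedule.py, and the numeric values are illustrative, not the defaults.

```python
import math


def learning_rate_at(step, learning_rate, warmup_steps, start_decay_step, min_lr):
    """Learning rate for a 1-based training step (assumed schedule shape)."""
    # Decay cannot begin before warm-up has finished.
    start_decay_step = max(start_decay_step, warmup_steps)
    if step <= warmup_steps:
        # Linear warm-up from learning_rate / warmup_steps up to learning_rate.
        return learning_rate * step / warmup_steps
    if step <= start_decay_step:
        # Hold the peak value until decay starts.
        return learning_rate
    # Inverse square-root decay, shifted so it is continuous at start_decay_step.
    decayed = learning_rate * math.sqrt(warmup_steps / (step - start_decay_step + warmup_steps))
    return max(decayed, min_lr)


# Illustrative values only.
params = dict(learning_rate=1.0, warmup_steps=4, start_decay_step=8, min_lr=0.1)
for step in (2, 4, 8, 20, 10000):
    print(step, learning_rate_at(step, **params))
```

With these values the rate rises to 1.0 at step 4, stays there through step 8, falls to 0.5 at step 20, and is held at the 0.1 floor for very large steps.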
MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.