|
|
|
@@ -162,7 +162,7 @@ usage: run_general_distill.py [--distribute DISTRIBUTE] [--epoch_size N] [----
 options:
 --device_target device where the code will be implemented: "Ascend" | "GPU", default is "Ascend"
---distribute pre_training by serveral devices: "true"(training by more than 1 device) | "false", default is "false"
+--distribute pre_training by several devices: "true"(training by more than 1 device) | "false", default is "false"
 --epoch_size epoch size: N, default is 1
 --device_id device id: N, default is 0
 --device_num number of used devices: N, default is 1
@@ -241,7 +241,7 @@ Parameters for optimizer:
 ```text
 Parameters for bert network:
 seq_length length of input sequence: N, default is 128
-vocab_size size of each embedding vector: N, must be consistant with the dataset you use. Default is 30522
+vocab_size size of each embedding vector: N, must be consistent with the dataset you use. Default is 30522
 hidden_size size of bert encoder layers: N
 num_hidden_layers number of hidden layers: N
 num_attention_heads number of attention heads: N, default is 12
@@ -275,8 +275,8 @@ The command above will run in the background, you can view the results the file
 
 ```text
 # grep "epoch" log.txt
-epoch: 1, step: 100, outpus are (Tensor(shape=[1], dtype=Float32, 28.2093), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
-epoch: 2, step: 200, outpus are (Tensor(shape=[1], dtype=Float32, 30.1724), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
+epoch: 1, step: 100, outputs are (Tensor(shape=[1], dtype=Float32, 28.2093), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
+epoch: 2, step: 200, outputs are (Tensor(shape=[1], dtype=Float32, 30.1724), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
 ...
 ```
 
@@ -294,7 +294,7 @@ The command above will run in the background, you can view the results the file
 
 ```text
 # grep "epoch" log.txt
-epoch: 1, step: 100, outpus are 28.2093
+epoch: 1, step: 100, outputs are 28.2093
 ...
 ```
 
@@ -312,9 +312,9 @@ The command above will run in the background, you can view the results the file
 
 ```text
 # grep "epoch" LOG*/log.txt
-epoch: 1, step: 100, outpus are (Tensor(shape=[1], dtype=Float32, 28.1478), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
+epoch: 1, step: 100, outputs are (Tensor(shape=[1], dtype=Float32, 28.1478), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
 ...
-epoch: 1, step: 100, outpus are (Tensor(shape=[1], dtype=Float32, 30.5901), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
+epoch: 1, step: 100, outputs are (Tensor(shape=[1], dtype=Float32, 30.5901), Tensor(shape=[], dtype=Bool, False), Tensor(shape=[], dtype=Float32, 65536))
 ...
 ```
 
@@ -330,7 +330,7 @@ The command above will run in the background, you can view the results the file
 
 ```text
 # grep "epoch" LOG*/log.txt
-epoch: 1, step: 1, outpus are 63.4098
+epoch: 1, step: 1, outputs are 63.4098
 ...
 ```
 
@@ -410,7 +410,7 @@ The best acc is 0.891176
 | Resource | Ascend 910, cpu:2.60GHz 192cores, memory:755G | NV SMX2 V100-32G, cpu:2.10GHz 64cores, memory:251G |
 | uploaded Date | 08/20/2020 | 08/24/2020 |
 | MindSpore Version | 1.0.0 | 1.0.0 |
-| Dataset | cn-wiki-128 | cn-wiki-128 |
+| Dataset | en-wiki-128 | en-wiki-128 |
 | Training Parameters | src/gd_config.py | src/gd_config.py |
 | Optimizer | AdamWeightDecay | AdamWeightDecay |
 | Loss Function | SoftmaxCrossEntropy | SoftmaxCrossEntropy |