@@ -411,20 +411,22 @@ epoch: 0.0, current epoch percent: 0.002, step: 200, outpus are (Tensor(shape=[1
Before running the command below, please make sure the path of the pretrained checkpoint to load has been set. Please set the checkpoint path to the absolute full path, e.g. "/username/pretrain/checkpoint_100_300.ckpt".
```
bash scripts/run_classifier.sh
```
The command above will run in the background; you can view the training log in classfier_log.txt.
If you choose accuracy as the assessment method, the result will be as follows:
```
acc_num XXX, total_num XXX, accuracy 0.588986
```
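The reported accuracy is simply acc_num divided by total_num. A minimal sketch of that arithmetic, with assumed counts standing in for the XXX placeholders (illustrative only):
```
# Assumed counts, for illustration only; the XXX placeholders above hide the real values.
acc_num, total_num = 1767, 3000
accuracy = acc_num / total_num
print(f"acc_num {acc_num}, total_num {total_num}, accuracy {accuracy:.6f}")
```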
#### evaluation on cluener dataset when running on Ascend
```
bash scripts/ner.sh
```
The command above will run in the background; you can view the training log in ner_log.txt.
If you choose F1 as the assessment method, the result will be as follows:
```
Precision 0.920507
Recall 0.948683
F1 0.920507
```
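F1 here is the harmonic mean of precision and recall. A minimal sketch of the formula, using illustrative values rather than the numbers reported above:
```
# Illustrative precision/recall values only, not taken from the report above.
precision, recall = 0.90, 0.95
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(f"Precision {precision:.6f}\nRecall {recall:.6f}\nF1 {f1:.6f}")
```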
@@ -433,9 +435,10 @@ F1 0.920507
#### evaluation on squad v1.1 dataset when running on Ascend
```
bash scripts/squad.sh
```
The command above will run in the background; you can view the training log in squad_log.txt.
The result will be as follows:
```
{"exact_match": 80.3878923040233284, "f1": 87.6902384023850329}
```
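Because the summary is a single JSON object, it can also be consumed programmatically. A minimal sketch, assuming the line has been captured from squad_log.txt into a string:
```
import json

# Assumed: the result line captured from squad_log.txt.
line = '{"exact_match": 80.3878923040233284, "f1": 87.6902384023850329}'
metrics = json.loads(line)
print(f"exact_match={metrics['exact_match']:.2f}, f1={metrics['f1']:.2f}")
```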
@@ -1,11 +1,11 @@
# Run distribute pretrain
## description
-The number of D chips can be automatically allocated based on the device_num set in hccl config file, You don not need to specify that.
+The number of Ascend accelerators can be automatically allocated based on the device_num set in the hccl config file; you do not need to specify it manually.
## how to use
-For example, if we want to generate the launch command of the distributed training of Bert model on D chip, we can run the following command in `/bert/` dir:
+For example, to generate the launch command for distributed training of the Bert model on Ascend accelerators, run the following command in the `/bert/` dir:
```
python ./scripts/ascend_distributed_launcher/get_distribute_pretrain_cmd.py --run_script_dir ./run_pretrain.py --hyper_parameter_config_dir ./scripts/ascend_distributed_launcher/hyper_parameter_config.ini --data_dir /path/dataset/ --hccl_config_dir model_zoo/utils/hccl_tools/hccl_2p_56_x.x.x.x.json
```
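A minimal sketch of the auto-allocation idea described above, assuming a rank-table style layout in which the hccl config JSON holds a server_list whose entries carry a device array (these key names are assumptions and may differ between config versions):
```
import json

def count_devices(hccl_config_path):
    # Assumed layout: {"server_list": [{"device": [...]}, ...]}.
    with open(hccl_config_path) as f:
        cfg = json.load(f)
    return sum(len(server.get("device", [])) for server in cfg.get("server_list", []))
```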
@@ -59,7 +59,7 @@ def append_cmd_env(cmd, key, value):
def distribute_pretrain():
    """
-    distribute pretrain scripts. The number of D chips can be automatically allocated
+    distribute pretrain scripts. The number of Ascend accelerators can be automatically allocated
    based on the device_num set in the hccl config file; you do not need to specify it manually.
    """
    cmd = ""
@@ -1,6 +1,6 @@
# description
-mindspore distributed training launch helper utilty that will generate hccl config file.
+MindSpore distributed training launch helper utility that will generate the hccl config file.
# use
@@ -14,4 +14,4 @@ hccl_[device_num]p_[which device]_[server_ip].json
# Note
-Please note that the D chips used must be continuous, such [0,4) means to use four chips 0,1,2,3; [0,1) means to use chip 0; The first four chips are a group, and the last four chips are a group. In addition to the [0,8) chips are allowed, other cross-group such as [3,6) are prohibited.
+Please note that the Ascend accelerators used must be contiguous: [0,4) means using the four accelerators 0, 1, 2, 3, and [0,1) means using accelerator 0. The first four accelerators form one group and the last four form another; apart from [0,8), cross-group intervals such as [3,6) are prohibited, as sketched below.
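A minimal sketch of that grouping rule, assuming the half-open [first,last) notation used above (illustrative only, not the tool's actual code):
```
def is_valid_interval(first, last):
    # Valid if the interval fits entirely inside one four-accelerator group,
    # or spans all eight accelerators exactly.
    if (first, last) == (0, 8):
        return True
    return (0 <= first < last <= 4) or (4 <= first < last <= 8)

assert is_valid_interval(0, 4)      # one full group
assert is_valid_interval(0, 1)      # a single accelerator
assert not is_valid_interval(3, 6)  # cross-group, prohibited
```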
@@ -37,7 +37,7 @@ def parse_args():
| "helper utilty that will generate hccl" | "helper utilty that will generate hccl" | ||||
| " config file") | " config file") | ||||
| parser.add_argument("--device_num", type=str, default="[0,8)", | parser.add_argument("--device_num", type=str, default="[0,8)", | ||||
| help="The number of the D chip used. please note that the D chips" | |||||
| help="The number of the Ascend accelerators used. please note that the Ascend accelerators" | |||||
| "used must be continuous, such [0,4) means to use four chips " | "used must be continuous, such [0,4) means to use four chips " | ||||
| "0,1,2,3; [0,1) means to use chip 0; The first four chips are" | "0,1,2,3; [0,1) means to use chip 0; The first four chips are" | ||||
| "a group, and the last four chips are a group. In addition to" | "a group, and the last four chips are a group. In addition to" | ||||