# Run distributed pretrain
## Description
The number of D chips is allocated automatically based on the `device_num` set in the HCCL config file, so you do not need to specify it.
## How to use
For example, to run distributed training of the BERT model on D chips, run the following command from the `/bert/` directory:
```
python model_zoo/utils/ascend_distributed_launcher/run_distribute_pretrain.py --run_script_dir ./run_pretrain.py --hyper_parameter_config_dir model_zoo/utils/ascend_distributed_launcher/hyper_parameter_config.ini --data_dir /path/dataset/ --hccl_config_dir model_zoo/utils/hccl_tools/hccl_2p_56_x.x.x.x.json
```
Output:
```
hccl_config_dir: model_zoo/utils/hccl_tools/hccl_2p_56_x.x.x.x.json
the number of logical core: 192
avg_core_per_rank: 96
rank_size: 2
start training for rank 0, device 5:
rank_id: 0
device_id: 5
core nums: 0-95
epoch_size: 8
data_dir: /data/small_512/
schema_dir:
log file dir: ./LOG5/log.txt
start training for rank 1, device 6:
rank_id: 1
device_id: 6
core nums: 96-191
epoch_size: 8
data_dir: /data/small_512/
schema_dir:
log file dir: ./LOG6/log.txt
```
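The `core nums` values in the log above are consistent with splitting the logical cores evenly into contiguous ranges, one per rank. The following is a minimal sketch of that arithmetic, not the launcher's actual code; the function name `core_ranges` is hypothetical and used only for illustration.

```python
def core_ranges(total_cores, rank_size):
    """Split logical cores 0..total_cores-1 into contiguous per-rank ranges.

    Hypothetical helper: mirrors the arithmetic visible in the sample log
    (192 cores, 2 ranks -> 96 cores per rank), not the launcher's source.
    """
    if rank_size <= 0:
        raise ValueError("rank_size must be positive")
    # Integer division; any remainder cores are left unassigned in this sketch.
    avg_core_per_rank = total_cores // rank_size
    ranges = []
    for rank_id in range(rank_size):
        start = rank_id * avg_core_per_rank
        end = start + avg_core_per_rank - 1
        ranges.append(f"{start}-{end}")
    return ranges


if __name__ == "__main__":
    # Values taken from the sample log: 192 logical cores, rank_size 2.
    for rank_id, cores in enumerate(core_ranges(192, 2)):
        print(f"rank_id: {rank_id}, core nums: {cores}")
```

A range string of this form is the kind of value a launcher could pass to a CPU-affinity tool such as `taskset -c` to pin each rank's process, although whether this launcher does so is an assumption here.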
## Note
1. `hccl_2p_56_x.x.x.x.json` can be generated with [hccl_tools.py](https://gitee.com/mindspore/mindspore/tree/master/model_zoo/utils/hccl_tools).
2. For hyperparameters, customize the file `hyper_parameter_config.ini`. The following two hyperparameters must not be configured there:
    - device_id
    - device_num
3. For other models, customize the `--run_script_dir` option and the corresponding `hyper_parameter_config.ini`.
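
Before launching, it can help to check that a customized `hyper_parameter_config.ini` does not set either of the two disallowed keys. The snippet below is a minimal sketch using Python's standard `configparser`; the function name `find_forbidden_keys` and the `[config]` section name in the sample are assumptions for illustration, not taken from the launcher.

```python
import configparser

# Keys the note above says must not appear in hyper_parameter_config.ini.
FORBIDDEN_KEYS = {"device_id", "device_num"}


def find_forbidden_keys(ini_text):
    """Return the sorted list of disallowed keys found in any section.

    Hypothetical helper; configparser lowercases option names by default,
    so "Device_ID" would also be reported as "device_id".
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    found = set()
    for section in parser.sections():
        found |= FORBIDDEN_KEYS & set(parser[section].keys())
    return sorted(found)


if __name__ == "__main__":
    # The section name "config" is an assumption made for this example.
    sample = "[config]\nepoch_size=8\ndevice_id=0\n"
    bad = find_forbidden_keys(sample)
    if bad:
        print("remove these keys from the config:", ", ".join(bad))
```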