@@ -49,6 +49,24 @@ You will get the accuracy as follows:
result: {'acc': 0.92}
```
### Distributed Training
```
sh run_distribute_train.sh rank_table.json your_data_path
```
The above shell script will run distributed training in the background. You can view the results through the file `train_parallel[X]/log`.
You will get the loss values as follows:
```
# grep "loss is " train_parallel*/log
train_parallel0/log:epoch: 1 step: 97, loss is 1.9060308
train_parallel0/log:epoch: 2 step: 97, loss is 1.6003821
...
train_parallel1/log:epoch: 1 step: 97, loss is 1.7095519
train_parallel1/log:epoch: 2 step: 97, loss is 1.7133579
...
...
```
> For details about `rank_table.json`, refer to the [distributed training tutorial](https://www.mindspore.cn/tutorial/en/master/advanced_use/distributed_training.html).
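Before launching, it can help to confirm that the rank table is at least well-formed JSON and to follow one device's log while training runs in the background. The snippet below is a generic sketch using standard tools (`python3` and `tail`) rather than part of the provided scripts; the file name `rank_table.json` and the `train_parallel0` directory follow the examples above.
```
# Generic sanity check: make sure the rank table parses as JSON.
python3 -c "import json, sys; json.load(open(sys.argv[1]))" rank_table.json && echo "rank_table.json parses as JSON"

# Follow the log of device 0 while training runs in the background;
# change the directory name to inspect another rank.
tail -f train_parallel0/log
```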
## Usage:
@@ -75,4 +93,14 @@ parameters/options:
--data_path          the storage path of the dataset.
--device_id          the device used to evaluate the model.
--checkpoint_path    the checkpoint file path used to evaluate the model.
```
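If the evaluation entry point is a Python script, a call that passes these options could look like the line below. The script name `eval.py` and the paths are assumptions for illustration; the Usage line above gives the real entry point.
```
# eval.py and both paths are placeholders; see the Usage line above for the real entry point.
python eval.py --data_path /path/to/dataset --device_id 0 --checkpoint_path /path/to/model.ckpt
```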
### Distributed Training
```
Usage: sh run_distribute_train.sh [MINDSPORE_HCCL_CONFIG_PATH] [DATA_PATH]
parameters/options:
MINDSPORE_HCCL_CONFIG_PATH    HCCL configuration file path.
DATA_PATH                     the storage path of the dataset.
```
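As a concrete example, a launch could look like the line below; both arguments are placeholders to replace with your own HCCL configuration file and dataset directory.
```
# Placeholder paths; substitute your actual rank table file and dataset location.
sh run_distribute_train.sh /path/to/rank_table.json /path/to/dataset
```
Each rank then writes its output to its own `train_parallel[X]/log` file, as described in the Distributed Training section above.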