wjtes20220926-log.txt 5.7 kB

  1. /home/work
  2. start loading script
  3. finish loading script
  4. 2022/09/26 16:24:20 Start to download master.zip
  5. 2022/09/26 16:24:20 Total parts count 1
  6. 2022/09/26 16:24:21 part(1) finished
  7. 2022/09/26 16:24:21 Download object finished, downloadPath:/cache/code/master.zip
  8. panic: runtime error: index out of range [4] with length 4
  9. goroutine 1 [running]:
  10. main.main()
  11. /home/houysh/openi/lewis/sync_for_grampus/downloader_for_obs.go:41 +0x4e0
  12. unzip finished;start to exec code;
  13. do nothing
  14. [Modelarts Service Log]user: uid=1101(work) gid=1101(work) groups=1101(work),1000(HwHiAiUser)
  15. [Modelarts Service Log]pwd: /home/work
  16. [Modelarts Service Log]boot_file: /cache/code/npu_test/npu/train_for_c2net.py
  17. [Modelarts Service Log]log_url: /tmp/log/train.log
  18. [Modelarts Service Log]command: /cache/code/npu_test/npu/train_for_c2net.py
  19. [Modelarts Service Log]local_code_dir:
  20. [Modelarts Service Log]Training start at 2022-09-26-16:24:21
  21. [Modelarts Service Log][modelarts_create_log] modelarts-pipe found
  22. [ModelArts Service Log]modelarts-pipe: will create log file /tmp/log/train.log
  23. [Modelarts Service Log][modelarts_logger] modelarts-pipe found
  24. [ModelArts Service Log]modelarts-pipe: will create log file /tmp/log/train.log
  25. [ModelArts Service Log]modelarts-pipe: will write log file /tmp/log/train.log
  26. [ModelArts Service Log]modelarts-pipe: param for max log length: 1073741824
  27. [ModelArts Service Log]modelarts-pipe: param for whether exit on overflow: 0
  28. INFO:root:Using MoXing-v2.0.0.rc2.4b57a67b-4b57a67b
  29. INFO:root:Using OBS-Python-SDK-3.20.9.1
  30. [Modelarts Service Log]2022-09-26 16:24:22,746 - INFO - Ascend Driver: Version=22.0.0.3
  31. [Modelarts Service Log]2022-09-26 16:24:22,747 - INFO - you are advised to use ASCEND_DEVICE_ID env instead of DEVICE_ID, as the DEVICE_ID env will be discarded in later versions
  32. [Modelarts Service Log]2022-09-26 16:24:22,747 - INFO - particularly, ${ASCEND_DEVICE_ID} == ${DEVICE_ID}, it's the logical device id
  33. [Modelarts Service Log]2022-09-26 16:24:22,747 - INFO - Davinci training command
  34. [Modelarts Service Log]2022-09-26 16:24:22,747 - INFO - ['/usr/bin/python', '/cache/code/npu_test/npu/train_for_c2net.py']
  35. [Modelarts Service Log]2022-09-26 16:24:22,747 - INFO - Wait for Rank table file ready
  36. [Modelarts Service Log]2022-09-26 16:24:22,748 - INFO - Rank table file (K8S generated) is ready for read
  37. [Modelarts Service Log]2022-09-26 16:24:22,748 - INFO -
  38. {
  39. "status": "completed",
  40. "group_count": "1",
  41. "group_list": [
  42. {
  43. "group_name": "job-wjtes2022092616t2327",
  44. "device_count": "1",
  45. "instance_count": "1",
  46. "instance_list": [
  47. {
  48. "pod_name": "joba57ac677-job-wjtes2022092616t2327-0",
  49. "server_id": "192.168.0.189",
  50. "devices": [
  51. {
  52. "device_id": "3",
  53. "device_ip": "192.4.68.236"
  54. }
  55. ]
  56. }
  57. ]
  58. }
  59. ]
  60. }
  61. [Modelarts Service Log]2022-09-26 16:24:22,748 - INFO - Rank table file (C7x)
  62. [Modelarts Service Log]2022-09-26 16:24:22,748 - INFO -
  63. {
  64. "status": "completed",
  65. "version": "1.0",
  66. "server_count": "1",
  67. "server_list": [
  68. {
  69. "server_id": "192.168.0.189",
  70. "device": [
  71. {
  72. "device_id": "3",
  73. "device_ip": "192.4.68.236",
  74. "rank_id": "0"
  75. }
  76. ]
  77. }
  78. ]
  79. }
  80. [Modelarts Service Log]2022-09-26 16:24:22,749 - INFO - Rank table file (C7x) is generated
  81. [Modelarts Service Log]2022-09-26 16:24:22,749 - INFO - Current server
  82. [Modelarts Service Log]2022-09-26 16:24:22,749 - INFO -
  83. {
  84. "server_id": "192.168.0.189",
  85. "device": [
  86. {
  87. "device_id": "3",
  88. "device_ip": "192.4.68.236",
  89. "rank_id": "0"
  90. }
  91. ]
  92. }
  93. [Modelarts Service Log]2022-09-26 16:24:22,750 - INFO - bootstrap proc-rank-0-device-0
  94. args:
  95. Namespace(device_target='Ascend', epoch_size=5)
  96. Traceback (most recent call last):
  97. File "/cache/code/npu_test/npu/train_for_c2net.py", line 50, in <module>
  98. cfg.batch_size)
  99. File "/cache/code/npu_test/npu/dataset.py", line 32, in create_dataset
  100. mnist_ds = ds.MnistDataset(data_path)
  101. File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/dataset/engine/validators.py", line 343, in new_method
  102. check_dir(dataset_dir)
  103. File "/usr/local/ma/python3.7/lib/python3.7/site-packages/mindspore/dataset/core/validator_helpers.py", line 551, in check_dir
  104. raise ValueError("The folder {} does not exist or is not a directory or permission denied!".format(dataset_dir))
  105. ValueError: The folder /cache/dataset/train does not exist or is not a directory or permission denied!
  106. [Modelarts Service Log]2022-09-26 16:24:31,765 - ERROR - proc-rank-0-device-0 (pid: 159) has exited with non-zero code: 1
  107. [Modelarts Service Log]2022-09-26 16:24:31,765 - INFO - Begin destroy training processes
  108. [Modelarts Service Log]2022-09-26 16:24:31,765 - INFO - proc-rank-0-device-0 (pid: 159) has exited
  109. [Modelarts Service Log]2022-09-26 16:24:31,765 - INFO - End destroy training processes
  110. [ModelArts Service Log]modelarts-pipe: total length: 3763
  111. [Modelarts Service Log]Training end with return code: 1
  112. [Modelarts Service Log]Training end at 2022-09-26-16:24:31
  113. [Modelarts Service Log]Training completed.
  114. 2022/09/26 16:24:51 start uploading model
  115. 2022/09/26 16:24:51 file:train.log
  116. 2022/09/26 16:24:52 finish uploading model
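
The log above prints the rank table twice: first as generated by K8S (group_list / instance_list / devices), then in the C7x layout (server_list / device, with a rank_id per device). The following is a minimal Python sketch of that conversion, using the values from this log. It assumes rank ids are assigned sequentially in listing order, which the single-device log cannot confirm; the function name `k8s_to_c7x` is hypothetical and this is not the ModelArts implementation.

```python
import json


def k8s_to_c7x(k8s_table):
    """Flatten a K8S-style rank table into the C7x layout shown in the log.

    Assumption: rank ids are handed out sequentially, in the order that
    instances and devices are listed. The log only shows one device, so
    this ordering is a guess.
    """
    server_list = []
    rank_id = 0
    for group in k8s_table["group_list"]:
        for instance in group["instance_list"]:
            devices = []
            for dev in instance["devices"]:
                devices.append({
                    "device_id": dev["device_id"],
                    "device_ip": dev["device_ip"],
                    "rank_id": str(rank_id),
                })
                rank_id += 1
            server_list.append({
                "server_id": instance["server_id"],
                "device": devices,
            })
    return {
        "status": k8s_table["status"],
        "version": "1.0",  # version string as printed in the log
        "server_count": str(len(server_list)),
        "server_list": server_list,
    }


# K8S-generated rank table, copied from the log.
k8s_table = {
    "status": "completed",
    "group_count": "1",
    "group_list": [
        {
            "group_name": "job-wjtes2022092616t2327",
            "device_count": "1",
            "instance_count": "1",
            "instance_list": [
                {
                    "pod_name": "joba57ac677-job-wjtes2022092616t2327-0",
                    "server_id": "192.168.0.189",
                    "devices": [
                        {"device_id": "3", "device_ip": "192.4.68.236"}
                    ],
                }
            ],
        }
    ],
}

c7x_table = k8s_to_c7x(k8s_table)
print(json.dumps(c7x_table, indent=4))
```

For the single-device job in this log, the result has one server entry (192.168.0.189) with device 3 mapped to rank 0, matching the "Rank table file (C7x)" entry.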