# MindRecord generating guidelines

<!-- TOC -->

- [MindRecord generating guidelines](#mindrecord-generating-guidelines)
    - [Create work space](#create-work-space)
    - [Implement data generator](#implement-data-generator)
    - [Run data generator](#run-data-generator)

<!-- /TOC -->

## Create work space

Assume the dataset name is 'xyz'.

* Create the work space from the template

```shell
cd ${your_mindspore_home}/example/convert_to_mindrecord
cp -r template xyz
```

## Implement data generator

Edit the dictionary data generator.

* Edit the file

```shell
cd ${your_mindspore_home}/example/convert_to_mindrecord
vi xyz/mr_api.py
```

Two APIs, 'mindrecord_task_number' and 'mindrecord_dict_data', must be implemented:

- 'mindrecord_task_number()' returns the number of tasks. Return 1 if data rows are generated serially. Return N if the generator can be split into N tasks that run in parallel.
- 'mindrecord_dict_data(task_id)' yields dictionary data row by row. 'task_id' ranges from 0 to N-1, where N is the return value of 'mindrecord_task_number()'.
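
As a rough illustration of the two functions described above, here is a minimal serial sketch. The function names come from this guide. The in-memory `SAMPLES` list and the field names `id` and `text` are assumptions made up for the example, and the real template may expect a different layout or additional definitions (for example a schema).

```python
# Hypothetical sketch of xyz/mr_api.py for the serial case.
# SAMPLES and its field names are illustrative assumptions, not part of the template.

SAMPLES = [
    {"id": 0, "text": "first row"},
    {"id": 1, "text": "second row"},
    {"id": 2, "text": "third row"},
]


def mindrecord_task_number():
    """Return the number of tasks; 1 means rows are generated serially."""
    return 1


def mindrecord_dict_data(task_id):
    """Yield one dictionary per data row for the given task."""
    if not 0 <= task_id < mindrecord_task_number():
        raise ValueError("task_id out of range: {}".format(task_id))
    for row in SAMPLES:
        # Yield a copy so callers cannot mutate the source rows.
        yield dict(row)
```

With a single task, `list(mindrecord_dict_data(0))` returns all three rows in order.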

Tricks for parallel runs:

- For imagenet, one directory can be a task.
- For TFRecord with multiple files, each file can be a task.
- For TFRecord with only one file, the data can still be split into N tasks: the task with task_id=K picks a data row only if (count % N == K), where count is the running index of the row.
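
The modulo rule from the last bullet can be sketched as follows. This is only an illustration: `read_rows()` is a hypothetical stand-in for reading rows from the single input file, and `TASK_NUMBER = 4` is an arbitrary choice of N.

```python
# Hypothetical sketch: split one input source into N tasks with count % N == K.

TASK_NUMBER = 4  # N, an assumed number of parallel tasks


def read_rows():
    """Stand-in for reading rows from a single input file (assumption)."""
    for i in range(10):
        yield {"id": i}


def mindrecord_task_number():
    """Return N, the number of parallel tasks."""
    return TASK_NUMBER


def mindrecord_dict_data(task_id):
    """Yield only the rows whose running count satisfies count % N == task_id."""
    for count, row in enumerate(read_rows()):
        if count % TASK_NUMBER == task_id:
            yield row
```

Here the task with `task_id=1` yields the rows with ids 1, 5 and 9, and across the four tasks every row is produced exactly once. Each task still scans the whole input, so this scheme spreads the per-row work across tasks but not the cost of reading the file.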

## Run data generator

* Run the Python script

```shell
cd ${your_mindspore_home}/example/convert_to_mindrecord
python writer.py --mindrecord_script imagenet [...]
```