
datasets.h 58 kB

Commit history:

- added python api based on cpp api
- 1st draft of python iterator
- Added Cifar10 and Cifar100 pybind port
- Change pybind to use IR for Skip and Manifest (Signed-off-by: alex-yuyue <yue.yu1@huawei.com>)
- DatasetNode as a base for all IR nodes; namespace change
- Fix the namespace issue and make ut tests work (Signed-off-by: alex-yuyue <yue.yu1@huawei.com>)
- Add VOCDataset
- !63 Added RandomDataset
- add imagefolder ir
- Pybind switch: CelebA and UT
- !61 CLUE example with class definition (merge branch 'python-api' of gitee.com:ezphlow/mindspore into clue_class_pybind; passing testcases; added CLUE, not working)
- add ManifestDataset IR (Signed-off-by: alex-yuyue <yue.yu1@huawei.com>)
- Update Coco & VOC & TFReader; update clang-format; reorder datasets_binding
- !69 Add Generator and move c_dataset.Iterator to dataset.Iterator (add GeneratorDataset to c_dataset)
- !67 Moving c_datasets and adding sampler wrapper (need to add create() method in datasets.py; migration from c_dataset to dataset, part 1)
- !71 Fix indentation error
- !72 Fix c_api test cases
- !73 Added CSVDataset
- pybind switch: Take and CelebA fixes
- !75 Move c_dataset functionality to datasets (fixed existing testcases; added working CLUE and ImageFolder; added sampler conversion from pybind; added sampler creation)
- !77 Add Python API tree
- add minddataset; TextFileDataset pybind
- Rename to skip test_concat.py and test_minddataset_exception.py
- !80 Add batch IR to python-api branch, most test cases work (staging III; staging, add pybind)
- Enable more c_api Take and CelebA tests; delete util_c_api
- !84 Schema changes in datasets.py
- !85 Remove input_indexes from sub-classes (remove input_index from each subclass)
- !83 Remove C datasets (removed c_dataset package)
- !82 pybind switch: shuffle
- !86 Add build_vocab
- Rebase with upstream/master (_shuffle conflict; BatchNode error)
- !88 Fix rebase problem
- Enable more unit tests; code typo/nit fixes
- !91 Fix python vocab hang
- !89 Added BucketBatchByLength pybind switch
- Update and enable more test_c_api_*.py tests
- !95 Add BuildSentencePieceVocab
- !96 Fix more tests (fix some tests; enable more test_c_api_*; add syncwait)
- !99 pybind switch for device op
- !93 Add getters to python API
- !101 Validate tree, error if graph (add sync wait)
- !103 TFRecord/Random datasets schema problem
- !102 Added Filter pybind switch
- !104 Fix num_samples
- !105 Fix to_device hang
- !94 Add Cache support for CLUE dataset (added cache for all dataset ops; format change; added CLUE cache support; added Cache conversion)
- Add save pybind; fix compile error; modify concat_node
- !107 Fix some test cases
- Enable and fix more tests
- !109 pybind switch for get_dataset_size
- Some check-code fixes for pylint, cpplint and clang-format
- !113 Add callback (dataset_sz 1 line; fix typo; get callback to work)
- !114 Make Android compile clean
- Fix build issues due to rebase
- !115 Fix more tests (fix test cases; !93 add getters to python API)
- fix test_profiling.py
- !116 Fix get_dataset_size
- !117 GetColumnNames pybind switch
- code-check fixes: clang-format, cppcheck, cpplint, pylint
- Delete duplicate test_c_api_*.py files; more lint fixes
- !121 Fix cpp tests (remove extra call to getNext in cpp tests)
- !122 Fix Schema with Generator
- Fix some cases of csv & mindrecord
- !124 Fix tfrecord get_dataset_size and add some UTs
- !125 Getter separation
- !126 Fix sampler.GetNumSamples
- !127 Assign runtime getter to each get function
- Fix compile issues
- !128 Match master code
- !129 Cleanup DeviceOp/save code (cleanup ToDevice/Save code)
- !130 Add cache fix (cache fix for map and image folder)
- !132 Fix testing team issues (pass queue_name from python to C++; add Schema.from_json)
- !131 Fix Cache op issues and delete de_pipeline (rolled back C++ change; removed de_pipeline and passing all cache tests; fixed cache tests)
- !134 Cleanup datasets.py, part 1
- !133 Updated validation for SentencePieceVocab.from_dataset (added type_check for column names)
- Rebase on master 181120 10:20
- fix profiling
- Temporary solution for catching status from Node.Build()
- !141 ToDevice termination
- pylint fixes
- !137 Fix test team issues and add some corresponding tests
- !138 TreeGetter changes to use OptPass (Zirui)
- Rebase fix
- !143 Fix cpplint issue
- pylint fixes in updated testcases
- !145 Reset exceptions testcase (reset exception test to master)
- !146 Fix Check_Pylint error
- !147 Fix android
- !148 ToDevice changes (add ToDevice to the iterator list for cleanup at exit)
- !149 Pylint issue; !150 Pylint 2
- !152 ExecutionTree (ET) destructor error
- !153 In getter_pass, only remove callback without deleting map op (getter pass no longer removes map)
- !156 Early __del__ of iterator/to_device
- !155 Address review comments (Eric), rounds 1 and 2 (added one-liner fix to validators.py; rolled back signature fix; lint fixes; C++ lint fix)
- !158 Review rework for dataset bindings, part 1 (reorder nodes Repeat and Rename)
- !154 Fix minor problems in the comments (datasets.py, python_tree_consumer.cc, iterators_bindings.cc, and iterators.py)
- !157 Add replace_none to datasets.py; address comments in tests
- Trying to resolve copy: override the deepcopy method of device op; create_ir_tree method; delete to_device if it already exists
- Cache getters' shapes and types; get shapes and types together (added yolov3 relaxation, to be rolled back; bypass yolo; revert Yolo; revert Thor)
- NumWorkers for MapOp
- Debug code: print more info; update LOG INFO to LOG ERROR; print batch size; add log to tree_consumer and device_queue op
- Do not remove epochctrl for getter pass; remove repeat(1)
- Revert PR 8744 (Signed-off-by: alex-yuyue <yue.yu1@huawei.com>)
- __del__ toDevice
- !165 Add ifndef ENABLE_ANDROID to device queue print
- revert some changes
- !166 Getter: get_data_info
- !168 Add back tree print (revert info to warning in one log; add back the missed print-tree log)
- Release GIL in GetDataInfo
Last changed: 5 years ago
5 years ago
370470570670770870971071171271371471571671771871972072172272372472572672772872973073173273373473573673773873974074174274374474574674774874975075175275375475575675775875976076176276376476576676776876977077177277377477577677777877978078178278378478578678778878979079179279379479579679779879980080180280380480580680780880981081181281381481581681781881982082182282382482582682782882983083183283383483583683783883984084184284384484584684784884985085185285385485585685785885986086186286386486586686786886987087187287387487587687787887988088188288388488588688788888989089189289389489589689789889990090190290390490590690790890991091191291391491591691791891992092192292392492592692792892993093193293393493593693793893994094194294394494594694794894995095195295395495595695795895996096196296396496596696796896997097197297397497597697797897998098198298398498598698798898999099199299399499599699799899910001001100210031004100510061007100810091010101110121013101410151016101710181019102010211022102310241025102610271028102910301031103210331034103510361037103810391040104110421043104410451046
/**
 * Copyright 2020 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#ifndef MINDSPORE_CCSRC_MINDDATA_DATASET_INCLUDE_DATASETS_H_
#define MINDSPORE_CCSRC_MINDDATA_DATASET_INCLUDE_DATASETS_H_

#include <sys/stat.h>
#include <unistd.h>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

#include "minddata/dataset/include/iterator.h"
#include "minddata/dataset/include/samplers.h"
#include "minddata/dataset/include/tensor.h"
#include "minddata/dataset/include/text.h"
#include "minddata/dataset/include/type_id.h"

namespace mindspore {
namespace dataset {

class Tensor;
class TensorRow;
class TensorShape;
class TreeAdapter;
class TreeGetters;
#ifndef ENABLE_ANDROID
class Vocab;
#endif
class DatasetCache;
class DatasetNode;
class Iterator;
class TensorOperation;
class SchemaObj;
class SamplerObj;
class CsvBase;

// Dataset classes (in alphabetical order)
class BatchDataset;
class MapDataset;
class ProjectDataset;
class ShuffleDataset;
#ifndef ENABLE_ANDROID
class BucketBatchByLengthDataset;
class FilterDataset;
class CSVDataset;
class TransferDataset;
class ConcatDataset;
class RenameDataset;
#endif
#ifndef ENABLE_ANDROID
class SentencePieceVocab;
enum class SentencePieceModel;
#endif
class DSCallback;
class RepeatDataset;
#ifndef ENABLE_ANDROID
class SkipDataset;
class TakeDataset;
class ZipDataset;
#endif
/// \class Dataset datasets.h
/// \brief A base class to represent a dataset in the data pipeline.
class Dataset : public std::enable_shared_from_this<Dataset> {
 public:
  // need friend class so they can access the children_ field
  friend class Iterator;
  friend class TransferNode;

  /// \brief Constructor
  Dataset();

  /// \brief Destructor
  ~Dataset() = default;

  /// \brief Gets the dataset size
  /// \param[in] estimate This is only supported by some of the ops and is used to speed up the process of getting
  ///     the dataset size at the expense of accuracy.
  /// \return Dataset size. If failed, return -1
  int64_t GetDatasetSize(bool estimate = false);

  /// \brief Gets the output types
  /// \return A vector of DataType. If failed, return an empty vector
  std::vector<DataType> GetOutputTypes();

  /// \brief Gets the output shapes
  /// \return A vector of TensorShape. If failed, return an empty vector
  std::vector<TensorShape> GetOutputShapes();

  /// \brief Gets the batch size
  /// \return int64_t
  int64_t GetBatchSize();

  /// \brief Gets the repeat count
  /// \return int64_t
  int64_t GetRepeatCount();

  /// \brief Gets the number of classes
  /// \return Number of classes. If failed, return -1
  int64_t GetNumClasses();

  /// \brief Gets the column names
  /// \return Names of the columns. If failed, return an empty vector
  std::vector<std::string> GetColumnNames();

  /// \brief Gets the class indexing
  /// \return A vector of (class name, class indices) pairs. If failed, return an empty vector
  std::vector<std::pair<std::string, std::vector<int32_t>>> GetClassIndexing();

  /// \brief Setter function for the runtime number of workers
  /// \param[in] num_workers The number of threads in this operator
  /// \return Shared pointer to the original object
  std::shared_ptr<Dataset> SetNumWorkers(int32_t num_workers);

  /// \brief Function to create an Iterator over the Dataset pipeline
  /// \param[in] columns List of columns to be used to specify the order of columns
  /// \param[in] num_epochs Number of epochs to run through the pipeline (default=-1, which means infinite epochs).
  ///     An empty row is returned at the end of each epoch
  /// \return Shared pointer to the Iterator
  std::shared_ptr<Iterator> CreateIterator(std::vector<std::string> columns = {}, int32_t num_epochs = -1);
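  /// \par Example
  /// A minimal usage sketch (illustrative only: "ds" stands for a pipeline created elsewhere,
  /// and the GetNextRow/Stop methods are assumed from the iterator API included above):
  /// \code
  ///   std::shared_ptr<Iterator> iter = ds->CreateIterator({}, 1);  // one epoch
  ///   std::unordered_map<std::string, std::shared_ptr<Tensor>> row;
  ///   iter->GetNextRow(&row);
  ///   while (!row.empty()) {  // an empty row marks the end of the epoch
  ///     // ... consume the tensors in row ...
  ///     iter->GetNextRow(&row);
  ///   }
  ///   iter->Stop();
  /// \endcode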
#ifndef ENABLE_ANDROID
  /// \brief Function to transfer data through a device.
  /// \note If the device is Ascend, data features are transferred one by one, with a limit of
  ///     256 MB per transfer.
  /// \param[in] queue_name Channel name (default="", create a new unique name).
  /// \param[in] device_type Type of device (default="", get it from MSContext).
  /// \param[in] num_epochs Number of epochs (default=-1, infinite epochs).
  /// \param[in] send_epoch_end Whether or not to send the end of sequence to the device (default=true).
  /// \param[in] total_batches Number of batches to be sent to the device (default=0, all data).
  /// \param[in] create_data_info_queue Whether or not to create a queue which stores the types and shapes
  ///     of the data (default=false).
  /// \return Returns true if no error was encountered, else false.
  bool DeviceQueue(std::string queue_name = "", std::string device_type = "", int32_t num_epochs = -1,
                   bool send_epoch_end = true, int32_t total_batches = 0, bool create_data_info_queue = false);
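  /// \par Example
  /// A hedged sketch ("ds" stands for a pipeline created elsewhere; with empty queue_name and
  /// device_type, a unique channel name is generated and the device is taken from MSContext):
  /// \code
  ///   bool rc = ds->DeviceQueue("", "", 1);  // send one epoch to the default device
  /// \endcode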
  /// \brief Function to create a Saver to save the dynamic data processed by the dataset pipeline
  /// \note Usage restrictions:
  ///     1. Supported dataset format: 'mindrecord' only.
  ///     2. To save the samples in order, set the dataset's shuffle to false and num_files to 1.
  ///     3. Before calling the function, do not use a batch operator, a repeat operator, or data augmentation
  ///        operators with random attributes in a map operator.
  ///     4. Mindrecord does not support bool, uint64, multi-dimensional uint8 (drop dimension), or
  ///        multi-dimensional string.
  /// \param[in] dataset_path Path to the dataset file
  /// \param[in] num_files Number of dataset files (default=1)
  /// \param[in] dataset_type Dataset format (default="mindrecord")
  /// \return Returns true if no error was encountered, else false
  bool Save(std::string dataset_path, int32_t num_files = 1, std::string dataset_type = "mindrecord");
#endif
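  /// \par Example
  /// Illustrative only; "ds" stands for a pipeline created elsewhere and the output path is a placeholder:
  /// \code
  ///   // Write the whole pipeline output into a single mindrecord file.
  ///   bool rc = ds->Save("/path/to/out.mindrecord");
  /// \endcode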
  /// \brief Function to create a BatchDataset
  /// \note Combines batch_size consecutive rows into batches
  /// \param[in] batch_size The number of rows each batch is created with
  /// \param[in] drop_remainder Determines whether or not to drop the last possibly incomplete batch.
  ///     If true, and if there are fewer than batch_size rows available to make the last batch,
  ///     then those rows will be dropped and not propagated to the next node
  /// \return Shared pointer to the current BatchDataset
  std::shared_ptr<BatchDataset> Batch(int32_t batch_size, bool drop_remainder = false);
#ifndef ENABLE_ANDROID
  /// \brief Function to create a BucketBatchByLengthDataset
  /// \note Buckets elements according to their lengths. Each bucket is padded and batched when it is full.
  /// \param[in] column_names Columns passed to element_length_function
  /// \param[in] bucket_boundaries A list consisting of the upper boundaries of the buckets.
  ///     Must be strictly increasing. If there are n boundaries, n+1 buckets are created: one bucket for
  ///     [0, bucket_boundaries[0]), one bucket for [bucket_boundaries[i], bucket_boundaries[i+1]) for each
  ///     0<i<n, and one bucket for [bucket_boundaries[n-1], inf).
  /// \param[in] bucket_batch_sizes A list consisting of the batch sizes for each bucket.
  ///     Must contain one element more than bucket_boundaries.
  /// \param[in] element_length_function A function pointer that takes in a TensorRow and outputs a TensorRow.
  ///     The output must contain a single tensor containing a single int32_t. If no value is provided,
  ///     the size of column_names must be 1, and the size of the first dimension of that column will be taken
  ///     as the length (default=nullptr)
  /// \param[in] pad_info Represents how to batch each column. The key corresponds to the column name; the value must
  ///     be a tuple of 2 elements. The first element corresponds to the shape to pad to, and the second element
  ///     corresponds to the value to pad with. If a column is not specified, then that column will be padded to the
  ///     longest in the current batch, and 0 will be used as the padding value. Any unspecified dimensions will be
  ///     padded to the longest in the current batch, unless pad_to_bucket_boundary is true. If no padding is
  ///     wanted, set pad_info to None (default=empty dictionary).
  /// \param[in] pad_to_bucket_boundary If true, each unspecified dimension in pad_info will be padded to the
  ///     bucket_boundary minus 1. If there are any elements that fall into the last bucket,
  ///     an error will occur (default=false).
  /// \param[in] drop_remainder If true, the last batch of each bucket is dropped if it is not a full batch
  ///     (default=false).
  /// \return Shared pointer to the current BucketBatchByLengthDataset
  std::shared_ptr<BucketBatchByLengthDataset> BucketBatchByLength(
    const std::vector<std::string> &column_names, const std::vector<int32_t> &bucket_boundaries,
    const std::vector<int32_t> &bucket_batch_sizes,
    std::function<TensorRow(TensorRow)> element_length_function = nullptr,
    const std::map<std::string, std::pair<TensorShape, std::shared_ptr<Tensor>>> &pad_info = {},
    bool pad_to_bucket_boundary = false, bool drop_remainder = false) {
    return std::make_shared<BucketBatchByLengthDataset>(shared_from_this(), column_names, bucket_boundaries,
                                                        bucket_batch_sizes, element_length_function, pad_info,
                                                        pad_to_bucket_boundary, drop_remainder);
  }
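  /// \par Example
  /// With bucket_boundaries {10, 20}, three buckets are formed: [0, 10), [10, 20), and
  /// [20, inf), so bucket_batch_sizes must have three entries. A sketch ("ds" and the
  /// "text" column name are placeholders):
  /// \code
  ///   ds = ds->BucketBatchByLength({"text"}, {10, 20}, {32, 16, 8});
  /// \endcode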
  /// \brief Function to create a SentencePieceVocab from the source dataset
  /// \note Builds a SentencePieceVocab from a dataset.
  /// \param[in] col_names Column names to get words from. It can be a vector of column names
  /// \param[in] vocab_size Vocabulary size.
  /// \param[in] character_coverage Percentage of characters covered by the model. Must be between
  ///     0.98 and 1.0. Good defaults are 0.9995 for languages with rich character sets like
  ///     Japanese or Chinese, and 1.0 for other languages with small character sets.
  /// \param[in] model_type Model type. Choose from unigram (default), bpe, char, or word.
  ///     The input sentence must be pretokenized when using the word type.
  /// \param[in] params A vector containing more optional parameters for the sentencepiece library
  /// \return Shared pointer to the SentencePieceVocab
  std::shared_ptr<SentencePieceVocab> BuildSentencePieceVocab(
    const std::vector<std::string> &col_names, int32_t vocab_size, float character_coverage,
    SentencePieceModel model_type, const std::unordered_map<std::string, std::string> &params);
  /// \brief Function to create a Vocab from the source dataset
  /// \note Builds a vocab from a dataset. This collects all the unique words in the dataset and returns a vocab
  ///     containing the top_k most frequent words (if top_k is specified)
  /// \param[in] columns Column names to get words from. It can be a vector of column names
  /// \param[in] freq_range A pair of integers (min_frequency, max_frequency). Words within the frequency
  ///     range will be kept. 0 <= min_frequency <= max_frequency <= total_words. min_frequency/max_frequency
  ///     can be left at their defaults, which correspond to 0/total_words respectively
  /// \param[in] top_k Number of words to be built into the vocab. The top_k most frequent words are
  ///     taken, after applying freq_range. If there are fewer than top_k words, all words will be taken
  /// \param[in] special_tokens A list of strings, each one a special token
  /// \param[in] special_first Whether special_tokens will be prepended or appended to the vocab. If special_tokens
  ///     is specified and special_first is left at its default, special_tokens will be prepended
  /// \return Shared pointer to the current Vocab
  std::shared_ptr<Vocab> BuildVocab(const std::vector<std::string> &columns = {},
                                    const std::pair<int64_t, int64_t> &freq_range = {0, kDeMaxFreq},
                                    int64_t top_k = kDeMaxTopk, const std::vector<std::string> &special_tokens = {},
                                    bool special_first = true);
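  /// \par Example
  /// A sketch ("ds" and the "text" column name are placeholders):
  /// \code
  ///   // Keep the 5000 most frequent words seen at least twice, with the special tokens first.
  ///   std::shared_ptr<Vocab> vocab =
  ///     ds->BuildVocab({"text"}, {2, kDeMaxFreq}, 5000, {"<pad>", "<unk>"}, true);
  /// \endcode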
  /// \brief Function to create a ConcatDataset
  /// \note Concatenates the datasets in the input
  /// \param[in] datasets List of shared pointers to the datasets that should be concatenated together
  /// \return Shared pointer to the current ConcatDataset
  std::shared_ptr<ConcatDataset> Concat(const std::vector<std::shared_ptr<Dataset>> &datasets) {
    std::vector<std::shared_ptr<Dataset>> all_datasets{shared_from_this()};
    all_datasets.insert(std::end(all_datasets), std::begin(datasets), std::end(datasets));
    return std::make_shared<ConcatDataset>(all_datasets);
  }

  /// \brief Function to filter the dataset by a predicate
  /// \note If input_columns is not provided or empty, all columns will be used
  /// \param[in] predicate Function callable which returns a boolean value. If false, the element is filtered out
  /// \param[in] input_columns List of names of the input columns to filter on
  /// \return Shared pointer to the current FilterDataset
  std::shared_ptr<FilterDataset> Filter(std::function<TensorRow(TensorRow)> predicate,
                                        const std::vector<std::string> &input_columns = {}) {
    return std::make_shared<FilterDataset>(shared_from_this(), predicate, input_columns);
  }
#endif
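  /// \par Example
  /// A sketch; keep_row is a hypothetical predicate that wraps its boolean result in a
  /// single-tensor TensorRow, as the signature above requires ("ds" is a placeholder pipeline):
  /// \code
  ///   ds = ds->Filter(keep_row, {"label"});  // drop rows for which keep_row is false
  /// \endcode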
  /// \brief Function to create a MapDataset
  /// \note Applies each operation in operations to this dataset
  /// \param[in] operations Vector of operations to be applied on the dataset. Operations are
  ///     applied in the order they appear in this list
  /// \param[in] input_columns Vector of the names of the columns that will be passed to the first
  ///     operation as input. The size of this list must match the number of
  ///     input columns expected by the first operator. The default input_columns
  ///     is the first column
  /// \param[in] output_columns Vector of names assigned to the columns output by the last operation.
  ///     This parameter is mandatory if len(input_columns) != len(output_columns).
  ///     The size of this list must match the number of output columns of the
  ///     last operation. The default output_columns will have the same
  ///     names as the input columns, i.e., the columns will be replaced
  /// \param[in] project_columns A list of column names to project
  /// \param[in] cache Tensor cache to use (default=nullptr, which means no cache is used).
  /// \param[in] callbacks List of Dataset callbacks to be called (default={}).
  /// \return Shared pointer to the current MapDataset
  std::shared_ptr<MapDataset> Map(std::vector<std::shared_ptr<TensorOperation>> operations,
                                  const std::vector<std::string> &input_columns = {},
                                  const std::vector<std::string> &output_columns = {},
                                  const std::vector<std::string> &project_columns = {},
                                  const std::shared_ptr<DatasetCache> &cache = nullptr,
                                  std::vector<std::shared_ptr<DSCallback>> callbacks = {}) {
    return std::make_shared<MapDataset>(shared_from_this(), operations, input_columns, output_columns, project_columns,
                                        cache, callbacks);
  }

  /// \brief Function to create a ProjectDataset
  /// \note Applies project to the dataset
  /// \param[in] columns The names of the columns to project
  /// \return Shared pointer to the current ProjectDataset
  std::shared_ptr<ProjectDataset> Project(const std::vector<std::string> &columns) {
    return std::make_shared<ProjectDataset>(shared_from_this(), columns);
  }
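  /// \par Example
  /// A sketch; decode_op and resize_op stand for std::shared_ptr<TensorOperation> instances
  /// created from the transforms API, which is not declared in this header:
  /// \code
  ///   // Decode then resize the "image" column in place.
  ///   ds = ds->Map({decode_op, resize_op}, {"image"});
  /// \endcode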
#ifndef ENABLE_ANDROID
  /// \brief Function to create a RenameDataset
  /// \note Renames the columns in the input dataset
  /// \param[in] input_columns List of the input columns to rename
  /// \param[in] output_columns List of the output column names
  /// \return Shared pointer to the current RenameDataset
  std::shared_ptr<RenameDataset> Rename(const std::vector<std::string> &input_columns,
                                        const std::vector<std::string> &output_columns) {
    return std::make_shared<RenameDataset>(shared_from_this(), input_columns, output_columns);
  }
#endif

  /// \brief Function to create a RepeatDataset
  /// \note Repeats this dataset count times. Repeats indefinitely if count is -1
  /// \param[in] count Number of times the dataset should be repeated
  /// \return Shared pointer to the current RepeatDataset
  std::shared_ptr<RepeatDataset> Repeat(int32_t count = -1) {
    return std::make_shared<RepeatDataset>(shared_from_this(), count);
  }
#ifndef ENABLE_ANDROID
  /// \brief Function to create a ShuffleDataset
  /// \note Randomly shuffles the rows of this dataset
  /// \param[in] buffer_size The size of the buffer (must be larger than 1) for shuffling
  /// \return Shared pointer to the current ShuffleDataset
  std::shared_ptr<ShuffleDataset> Shuffle(int32_t buffer_size) {
    return std::make_shared<ShuffleDataset>(shared_from_this(), buffer_size);
  }

  /// \brief Function to create a SkipDataset
  /// \note Skips count elements in this dataset.
  /// \param[in] count Number of elements to skip.
  /// \return Shared pointer to the current SkipDataset
  std::shared_ptr<SkipDataset> Skip(int32_t count) { return std::make_shared<SkipDataset>(shared_from_this(), count); }

  /// \brief Function to create a TakeDataset
  /// \note Takes count elements from this dataset.
  /// \param[in] count Number of elements to take (default=-1).
  /// \return Shared pointer to the current TakeDataset
  std::shared_ptr<TakeDataset> Take(int32_t count = -1) {
    return std::make_shared<TakeDataset>(shared_from_this(), count);
  }

  /// \brief Function to create a ZipDataset
  /// \note Applies zip to the dataset
  /// \param[in] datasets A list of shared pointers to the datasets that we want to zip
  /// \return Shared pointer to the current ZipDataset
  std::shared_ptr<ZipDataset> Zip(const std::vector<std::shared_ptr<Dataset>> &datasets) {
    std::vector<std::shared_ptr<Dataset>> all_datasets = datasets;
    all_datasets.push_back(shared_from_this());
    return std::make_shared<ZipDataset>(all_datasets);
  }
#endif
  std::shared_ptr<DatasetNode> IRNode() { return ir_node_; }

 protected:
  std::shared_ptr<TreeGetters> tree_getters_;
  std::shared_ptr<DatasetNode> ir_node_;
};
class SchemaObj {
 public:
  /// \brief Constructor
  explicit SchemaObj(const std::string &schema_file = "");

  /// \brief Destructor
  ~SchemaObj() = default;

  /// \brief SchemaObj Init function
  /// \return Status code, OK if schema initialization was successful
  Status Init();

  /// \brief Add a new column to the schema with an unknown shape of rank 1
  /// \param[in] name Name of the column.
  /// \param[in] de_type Data type of the column (TypeId).
  /// \return Status code
  Status add_column(const std::string &name, TypeId de_type);

  /// \brief Add a new column to the schema with an unknown shape of rank 1
  /// \param[in] name Name of the column.
  /// \param[in] de_type Data type of the column (std::string).
  /// \return Status code
  Status add_column(const std::string &name, const std::string &de_type);

  /// \brief Add a new column to the schema
  /// \param[in] name Name of the column.
  /// \param[in] de_type Data type of the column (TypeId).
  /// \param[in] shape Shape of the column.
  /// \return Status code
  Status add_column(const std::string &name, TypeId de_type, const std::vector<int32_t> &shape);

  /// \brief Add a new column to the schema
  /// \param[in] name Name of the column.
  /// \param[in] de_type Data type of the column (std::string).
  /// \param[in] shape Shape of the column.
  /// \return Status code
  Status add_column(const std::string &name, const std::string &de_type, const std::vector<int32_t> &shape);

  /// \brief Get a JSON string of the schema
  /// \return JSON string of the schema
  std::string to_json();

  /// \brief Get a JSON string of the schema
  std::string to_string() { return to_json(); }

  /// \brief Set a new value for dataset_type
  inline void set_dataset_type(std::string dataset_type) { dataset_type_ = std::move(dataset_type); }

  /// \brief Set a new value for num_rows
  inline void set_num_rows(int32_t num_rows) { num_rows_ = num_rows; }

  /// \brief Get the current num_rows
  inline int32_t get_num_rows() const { return num_rows_; }

  /// \brief Populate the schema from a JSON string
  /// \param[in] json_string JSON string to be parsed.
  /// \return Status code
  Status FromJSONString(const std::string &json_string);

  /// \brief Parse and add column information
  /// \param[in] json_string JSON string with the column dataset attribute information, decoded from the schema file.
  /// \return Status code
  Status ParseColumnString(const std::string &json_string);

 private:
  /// \brief Parse the columns and add them to columns_
  /// \param[in] columns Dataset attribute information, decoded from the schema file.
  ///     Supports both nlohmann::json::value_t::array and nlohmann::json::value_t::object.
  /// \return Status code
  Status parse_column(nlohmann::json columns);

  /// \brief Populate the schema from a parsed JSON object
  /// \param[in] json_obj Parsed JSON object
  /// \return Status code
  Status from_json(nlohmann::json json_obj);

  int32_t num_rows_;
  std::string dataset_type_;
  std::string schema_file_;
  nlohmann::json columns_;
};
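/// \par Example
/// A sketch of building a schema by hand with the Schema() factory declared below
/// (the Status return values are ignored for brevity; the column names, types, and
/// shapes are placeholders):
/// \code
///   std::shared_ptr<SchemaObj> schema = Schema();
///   schema->add_column("image", "uint8", {1, 28, 28});
///   schema->add_column("label", "int32", {1});
///   schema->set_num_rows(100);
/// \endcode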
class BatchDataset : public Dataset {
 public:
  BatchDataset(std::shared_ptr<Dataset> input, int32_t batch_size, bool drop_remainder = false);
  ~BatchDataset() = default;
};

#ifndef ENABLE_ANDROID
class BucketBatchByLengthDataset : public Dataset {
 public:
  BucketBatchByLengthDataset(
    std::shared_ptr<Dataset> input, const std::vector<std::string> &column_names,
    const std::vector<int32_t> &bucket_boundaries, const std::vector<int32_t> &bucket_batch_sizes,
    std::function<TensorRow(TensorRow)> element_length_function = nullptr,
    const std::map<std::string, std::pair<TensorShape, std::shared_ptr<Tensor>>> &pad_info = {},
    bool pad_to_bucket_boundary = false, bool drop_remainder = false);
  ~BucketBatchByLengthDataset() = default;
};

class ConcatDataset : public Dataset {
 public:
  explicit ConcatDataset(const std::vector<std::shared_ptr<Dataset>> &input);
  ~ConcatDataset() = default;
};

class FilterDataset : public Dataset {
 public:
  FilterDataset(std::shared_ptr<Dataset> input, std::function<TensorRow(TensorRow)> predicate,
                const std::vector<std::string> &input_columns);
  ~FilterDataset() = default;
};
#endif

class MapDataset : public Dataset {
 public:
  MapDataset(std::shared_ptr<Dataset> input, std::vector<std::shared_ptr<TensorOperation>> operations,
             const std::vector<std::string> &input_columns, const std::vector<std::string> &output_columns,
             const std::vector<std::string> &project_columns, const std::shared_ptr<DatasetCache> &cache,
             std::vector<std::shared_ptr<DSCallback>> callbacks);
  ~MapDataset() = default;
};

class ProjectDataset : public Dataset {
 public:
  ProjectDataset(std::shared_ptr<Dataset> input, const std::vector<std::string> &columns);
  ~ProjectDataset() = default;
};

#ifndef ENABLE_ANDROID
class RenameDataset : public Dataset {
 public:
  RenameDataset(std::shared_ptr<Dataset> input, const std::vector<std::string> &input_columns,
                const std::vector<std::string> &output_columns);
  ~RenameDataset() = default;
};
#endif

class RepeatDataset : public Dataset {
 public:
  RepeatDataset(std::shared_ptr<Dataset> input, int32_t count);
  ~RepeatDataset() = default;
};

class ShuffleDataset : public Dataset {
 public:
  ShuffleDataset(std::shared_ptr<Dataset> input, int32_t buffer_size);
  ~ShuffleDataset() = default;
};

#ifndef ENABLE_ANDROID
class SkipDataset : public Dataset {
 public:
  SkipDataset(std::shared_ptr<Dataset> input, int32_t count);
  ~SkipDataset() = default;
};

class TakeDataset : public Dataset {
 public:
  TakeDataset(std::shared_ptr<Dataset> input, int32_t count);
  ~TakeDataset() = default;
};

class ZipDataset : public Dataset {
 public:
  explicit ZipDataset(const std::vector<std::shared_ptr<Dataset>> &inputs);
  ~ZipDataset() = default;
};
#endif
/// \brief Function to create a SchemaObj
/// \param[in] schema_file Path to the schema file
/// \return Shared pointer to the current SchemaObj
std::shared_ptr<SchemaObj> Schema(const std::string &schema_file = "");

class AlbumDataset : public Dataset {
 public:
  AlbumDataset(const std::string &dataset_dir, const std::string &data_schema,
               const std::vector<std::string> &column_names = {}, bool decode = false,
               const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
               const std::shared_ptr<DatasetCache> &cache = nullptr);
  ~AlbumDataset() = default;
};

/// \brief Function to create an AlbumDataset
/// \note The generated dataset is specified through setting a schema
/// \param[in] dataset_dir Path to the root directory that contains the dataset
/// \param[in] data_schema Path to the dataset schema file
/// \param[in] column_names Column names used to specify the columns to load; if empty, all columns will be read
///     (default = {})
/// \param[in] decode The option to decode the images in the dataset (default = false)
/// \param[in] sampler Object used to choose samples from the dataset. If sampler is not given,
///     a `RandomSampler` will be used to randomly iterate the entire dataset (default = RandomSampler())
/// \param[in] cache Tensor cache to use (default=nullptr, which means no cache is used).
/// \return Shared pointer to the current AlbumDataset
std::shared_ptr<AlbumDataset> Album(const std::string &dataset_dir, const std::string &data_schema,
                                    const std::vector<std::string> &column_names = {}, bool decode = false,
                                    const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                                    const std::shared_ptr<DatasetCache> &cache = nullptr);
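/// \par Example
/// A sketch (both paths and the column names are placeholders):
/// \code
///   std::shared_ptr<AlbumDataset> ds =
///     Album("/path/to/album", "/path/to/schema.json", {"image", "label"});
/// \endcode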
  500. #ifndef ENABLE_ANDROID
  501. class CelebADataset : public Dataset {
  502. public:
  503. explicit CelebADataset(const std::string &dataset_dir, const std::string &usage = "all",
  504. const std::shared_ptr<SamplerObj> &sampler = RandomSampler(), bool decode = false,
  505. const std::set<std::string> &extensions = {},
  506. const std::shared_ptr<DatasetCache> &cache = nullptr);
  507. ~CelebADataset() = default;
  508. };
/// \brief Function to create a CelebADataset
/// \notes The generated dataset has two columns ['image', 'attr'].
///     The type of the image tensor is uint8. The attr tensor is uint32 and one hot type.
/// \param[in] dataset_dir Path to the root directory that contains the dataset.
/// \param[in] usage One of "all", "train", "valid" or "test" (default = "all").
/// \param[in] sampler Object used to choose samples from the dataset. If sampler is not given,
///     a `RandomSampler` will be used to randomly iterate the entire dataset (default = RandomSampler())
/// \param[in] decode Decode the images after reading (default=false).
/// \param[in] extensions Set of file extensions to be included in the dataset (default={}).
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current Dataset
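/// \par Example
/// A minimal usage sketch; the dataset path below is a placeholder:
/// \code
///   std::shared_ptr<Dataset> ds = CelebA("/path/to/celeba_dataset", "train");
/// \endcode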
std::shared_ptr<CelebADataset> CelebA(const std::string &dataset_dir, const std::string &usage = "all",
                                      const std::shared_ptr<SamplerObj> &sampler = RandomSampler(), bool decode = false,
                                      const std::set<std::string> &extensions = {},
                                      const std::shared_ptr<DatasetCache> &cache = nullptr);
class Cifar10Dataset : public Dataset {
 public:
  explicit Cifar10Dataset(const std::string &dataset_dir, const std::string &usage = "all",
                          const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                          const std::shared_ptr<DatasetCache> &cache = nullptr);
  ~Cifar10Dataset() = default;
};
/// \brief Function to create a Cifar10 Dataset
/// \notes The generated dataset has two columns ["image", "label"]
/// \param[in] dataset_dir Path to the root directory that contains the dataset
/// \param[in] usage Usage of CIFAR10, can be "train", "test" or "all" (default = "all").
/// \param[in] sampler Object used to choose samples from the dataset. If sampler is not given,
///     a `RandomSampler` will be used to randomly iterate the entire dataset (default = RandomSampler())
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current Dataset
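/// \par Example
/// A minimal usage sketch; the dataset path below is a placeholder:
/// \code
///   std::shared_ptr<Dataset> ds = Cifar10("/path/to/cifar10_dataset", "train");
/// \endcode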
std::shared_ptr<Cifar10Dataset> Cifar10(const std::string &dataset_dir, const std::string &usage = "all",
                                        const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                                        const std::shared_ptr<DatasetCache> &cache = nullptr);
class Cifar100Dataset : public Dataset {
 public:
  explicit Cifar100Dataset(const std::string &dataset_dir, const std::string &usage = "all",
                           const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                           const std::shared_ptr<DatasetCache> &cache = nullptr);
  ~Cifar100Dataset() = default;
};
/// \brief Function to create a Cifar100 Dataset
/// \notes The generated dataset has three columns ["image", "coarse_label", "fine_label"]
/// \param[in] dataset_dir Path to the root directory that contains the dataset
/// \param[in] usage Usage of CIFAR100, can be "train", "test" or "all" (default = "all").
/// \param[in] sampler Object used to choose samples from the dataset. If sampler is not given,
///     a `RandomSampler` will be used to randomly iterate the entire dataset (default = RandomSampler())
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current Dataset
std::shared_ptr<Cifar100Dataset> Cifar100(const std::string &dataset_dir, const std::string &usage = "all",
                                          const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                                          const std::shared_ptr<DatasetCache> &cache = nullptr);
class CLUEDataset : public Dataset {
 public:
  explicit CLUEDataset(const std::vector<std::string> &dataset_files, const std::string &task = "AFQMC",
                       const std::string &usage = "train", int64_t num_samples = 0,
                       ShuffleMode shuffle = ShuffleMode::kGlobal, int32_t num_shards = 1, int32_t shard_id = 0,
                       const std::shared_ptr<DatasetCache> &cache = nullptr);
  ~CLUEDataset() = default;
};
/// \brief Function to create a CLUEDataset
/// \notes The generated dataset has a variable number of columns depending on the task and usage
/// \param[in] dataset_files List of files to be read to search for a pattern of files. The list
///     will be sorted in a lexicographical order.
/// \param[in] task The kind of task, one of "AFQMC", "TNEWS", "IFLYTEK", "CMNLI", "WSC" and "CSL" (default="AFQMC").
/// \param[in] usage Part of the dataset to be used, can be "train", "test" or "eval" (default="train").
/// \param[in] num_samples The number of samples to be included in the dataset.
///     (Default = 0 means all samples.)
/// \param[in] shuffle The mode for shuffling data every epoch. (Default=ShuffleMode::kGlobal)
///     Can be any of:
///     ShuffleMode::kFalse - No shuffling is performed.
///     ShuffleMode::kFiles - Shuffle files only.
///     ShuffleMode::kGlobal - Shuffle both the files and samples.
/// \param[in] num_shards Number of shards that the dataset should be divided into. (Default = 1)
/// \param[in] shard_id The shard ID within num_shards. This argument should be
///     specified only when num_shards is also specified. (Default = 0)
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current CLUEDataset
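/// \par Example
/// A minimal usage sketch; the dataset file below is a placeholder:
/// \code
///   std::shared_ptr<Dataset> ds = CLUE({"/path/to/afqmc/train.json"}, "AFQMC", "train");
/// \endcode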
std::shared_ptr<CLUEDataset> CLUE(const std::vector<std::string> &dataset_files, const std::string &task = "AFQMC",
                                  const std::string &usage = "train", int64_t num_samples = 0,
                                  ShuffleMode shuffle = ShuffleMode::kGlobal, int32_t num_shards = 1,
                                  int32_t shard_id = 0, const std::shared_ptr<DatasetCache> &cache = nullptr);
class CocoDataset : public Dataset {
 public:
  CocoDataset(const std::string &dataset_dir, const std::string &annotation_file, const std::string &task = "Detection",
              const bool &decode = false, const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
              const std::shared_ptr<DatasetCache> &cache = nullptr);
  ~CocoDataset() = default;
};
/// \brief Function to create a CocoDataset
/// \notes The generated dataset has multi-columns :
///     - task='Detection', column: [['image', dtype=uint8], ['bbox', dtype=float32], ['category_id', dtype=uint32],
///       ['iscrowd', dtype=uint32]].
///     - task='Stuff', column: [['image', dtype=uint8], ['segmentation', dtype=float32], ['iscrowd', dtype=uint32]].
///     - task='Keypoint', column: [['image', dtype=uint8], ['keypoints', dtype=float32],
///       ['num_keypoints', dtype=uint32]].
///     - task='Panoptic', column: [['image', dtype=uint8], ['bbox', dtype=float32], ['category_id', dtype=uint32],
///       ['iscrowd', dtype=uint32], ['area', dtype=uint32]].
/// \param[in] dataset_dir Path to the root directory that contains the dataset
/// \param[in] annotation_file Path to the annotation json
/// \param[in] task Set the task type of reading coco data; 'Detection', 'Stuff', 'Panoptic' and 'Keypoint'
///     are supported (default = 'Detection')
/// \param[in] decode Decode the images after reading (default = false)
/// \param[in] sampler Object used to choose samples from the dataset. If sampler is not given,
///     a `RandomSampler` will be used to randomly iterate the entire dataset (default = RandomSampler())
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current Dataset
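/// \par Example
/// A minimal usage sketch; the paths below are placeholders:
/// \code
///   std::shared_ptr<Dataset> ds = Coco("/path/to/coco_dataset", "/path/to/annotation.json", "Detection", true);
/// \endcode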
std::shared_ptr<CocoDataset> Coco(const std::string &dataset_dir, const std::string &annotation_file,
                                  const std::string &task = "Detection", const bool &decode = false,
                                  const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                                  const std::shared_ptr<DatasetCache> &cache = nullptr);
class CSVDataset : public Dataset {
 public:
  explicit CSVDataset(const std::vector<std::string> &dataset_files, char field_delim = ',',
                      const std::vector<std::shared_ptr<CsvBase>> &column_defaults = {},
                      const std::vector<std::string> &column_names = {}, int64_t num_samples = 0,
                      ShuffleMode shuffle = ShuffleMode::kGlobal, int32_t num_shards = 1, int32_t shard_id = 0,
                      const std::shared_ptr<DatasetCache> &cache = nullptr);
  ~CSVDataset() = default;
};
/// \brief Function to create a CSVDataset
/// \notes The generated dataset has a variable number of columns
/// \param[in] dataset_files List of files to be read to search for a pattern of files. The list
///     will be sorted in a lexicographical order.
/// \param[in] field_delim A char that indicates the delimiter to separate fields (default=',').
/// \param[in] column_defaults List of default values for the CSV fields (default={}). Each item in the list is
///     of a valid type (float, int, or string). If this is not provided, all columns are treated as string type.
/// \param[in] column_names List of column names of the dataset (default={}). If this is not provided, the
///     column names are inferred from the first row of the CSV file.
/// \param[in] num_samples The number of samples to be included in the dataset.
///     (Default = 0 means all samples.)
/// \param[in] shuffle The mode for shuffling data every epoch. (Default=ShuffleMode::kGlobal)
///     Can be any of:
///     ShuffleMode::kFalse - No shuffling is performed.
///     ShuffleMode::kFiles - Shuffle files only.
///     ShuffleMode::kGlobal - Shuffle both the files and samples.
/// \param[in] num_shards Number of shards that the dataset should be divided into. (Default = 1)
/// \param[in] shard_id The shard ID within num_shards. This argument should be
///     specified only when num_shards is also specified. (Default = 0)
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current Dataset
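/// \par Example
/// A minimal usage sketch; the file path and column names below are placeholders:
/// \code
///   std::shared_ptr<Dataset> ds = CSV({"/path/to/file.csv"}, ',', {}, {"col1", "col2"});
/// \endcode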
std::shared_ptr<CSVDataset> CSV(const std::vector<std::string> &dataset_files, char field_delim = ',',
                                const std::vector<std::shared_ptr<CsvBase>> &column_defaults = {},
                                const std::vector<std::string> &column_names = {}, int64_t num_samples = 0,
                                ShuffleMode shuffle = ShuffleMode::kGlobal, int32_t num_shards = 1,
                                int32_t shard_id = 0, const std::shared_ptr<DatasetCache> &cache = nullptr);
class ImageFolderDataset : public Dataset {
 public:
  explicit ImageFolderDataset(const std::string &dataset_dir, bool decode = false,
                              const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                              const std::set<std::string> &extensions = {},
                              const std::map<std::string, int32_t> &class_indexing = {},
                              const std::shared_ptr<DatasetCache> &cache = nullptr);
  ~ImageFolderDataset() = default;
};
/// \brief Function to create an ImageFolderDataset
/// \notes A source dataset that reads images from a tree of directories.
///     All images within one folder have the same label.
///     The generated dataset has two columns ["image", "label"]
/// \param[in] dataset_dir Path to the root directory that contains the dataset
/// \param[in] decode A flag to decode the images after reading (default = false)
/// \param[in] sampler Object used to choose samples from the dataset. If sampler is not given,
///     a `RandomSampler` will be used to randomly iterate the entire dataset (default = RandomSampler())
/// \param[in] extensions Set of file extensions to be read (default = {})
/// \param[in] class_indexing A str-to-int mapping from class name to label index (default = {})
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current ImageFolderDataset
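/// \par Example
/// A minimal usage sketch; the path, extension set and class mapping below are placeholders:
/// \code
///   std::shared_ptr<Dataset> ds = ImageFolder("/path/to/image_folder", true, RandomSampler(), {".jpg"},
///                                             {{"cat", 0}, {"dog", 1}});
/// \endcode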
std::shared_ptr<ImageFolderDataset> ImageFolder(const std::string &dataset_dir, bool decode = false,
                                                const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                                                const std::set<std::string> &extensions = {},
                                                const std::map<std::string, int32_t> &class_indexing = {},
                                                const std::shared_ptr<DatasetCache> &cache = nullptr);
class ManifestDataset : public Dataset {
 public:
  explicit ManifestDataset(const std::string &dataset_file, const std::string &usage = "train",
                           const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                           const std::map<std::string, int32_t> &class_indexing = {}, bool decode = false,
                           const std::shared_ptr<DatasetCache> &cache = nullptr);
  ~ManifestDataset() = default;
};
/// \brief Function to create a ManifestDataset
/// \notes The generated dataset has two columns ["image", "label"]
/// \param[in] dataset_file The dataset file to be read
/// \param[in] usage Part of the dataset to be used, can be "train", "eval" or "inference" (default="train")
/// \param[in] sampler Object used to choose samples from the dataset. If sampler is not given,
///     a `RandomSampler` will be used to randomly iterate the entire dataset (default = RandomSampler())
/// \param[in] class_indexing A str-to-int mapping from label name to index (default={}, the folder
///     names will be sorted alphabetically and each class will be given a unique index starting from 0).
/// \param[in] decode Decode the images after reading (default=false).
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current ManifestDataset
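/// \par Example
/// A minimal usage sketch; the manifest file below is a placeholder:
/// \code
///   std::shared_ptr<Dataset> ds = Manifest("/path/to/data.manifest", "train");
/// \endcode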
std::shared_ptr<ManifestDataset> Manifest(const std::string &dataset_file, const std::string &usage = "train",
                                          const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                                          const std::map<std::string, int32_t> &class_indexing = {},
                                          bool decode = false, const std::shared_ptr<DatasetCache> &cache = nullptr);
class MindDataDataset : public Dataset {
 public:
  explicit MindDataDataset(const std::string &dataset_file, const std::vector<std::string> &columns_list = {},
                           const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                           nlohmann::json padded_sample = nullptr, int64_t num_padded = 0);
  explicit MindDataDataset(const std::vector<std::string> &dataset_files,
                           const std::vector<std::string> &columns_list = {},
                           const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                           nlohmann::json padded_sample = nullptr, int64_t num_padded = 0);
  ~MindDataDataset() = default;
};
/// \brief Function to create a MindDataDataset
/// \param[in] dataset_file File name of one component of a mindrecord source. Other files with identical source
///     in the same path will be found and loaded automatically.
/// \param[in] columns_list List of columns to be read (default={})
/// \param[in] sampler Object used to choose samples from the dataset. If sampler is not given,
///     a `RandomSampler` will be used to randomly iterate the entire dataset (default = RandomSampler()),
///     supported sampler list: SubsetRandomSampler, PkSampler, RandomSampler, SequentialSampler, DistributedSampler.
/// \param[in] padded_sample Samples will be appended to the dataset, where keys are the same as columns_list.
/// \param[in] num_padded Number of padding samples. Dataset size plus num_padded should be divisible by num_shards.
/// \return Shared pointer to the current MindDataDataset
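/// \par Example
/// A minimal usage sketch; the mindrecord file and column names below are placeholders:
/// \code
///   std::shared_ptr<Dataset> ds = MindData("/path/to/file.mindrecord", {"image", "label"});
/// \endcode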
std::shared_ptr<MindDataDataset> MindData(const std::string &dataset_file,
                                          const std::vector<std::string> &columns_list = {},
                                          const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                                          nlohmann::json padded_sample = nullptr, int64_t num_padded = 0);
/// \brief Function to create a MindDataDataset
/// \param[in] dataset_files List of dataset files to be read directly.
/// \param[in] columns_list List of columns to be read (default={})
/// \param[in] sampler Object used to choose samples from the dataset. If sampler is not given,
///     a `RandomSampler` will be used to randomly iterate the entire dataset (default = RandomSampler()),
///     supported sampler list: SubsetRandomSampler, PkSampler, RandomSampler, SequentialSampler, DistributedSampler.
/// \param[in] padded_sample Samples will be appended to the dataset, where keys are the same as columns_list.
/// \param[in] num_padded Number of padding samples. Dataset size plus num_padded should be divisible by num_shards.
/// \return Shared pointer to the current MindDataDataset
std::shared_ptr<MindDataDataset> MindData(const std::vector<std::string> &dataset_files,
                                          const std::vector<std::string> &columns_list = {},
                                          const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                                          nlohmann::json padded_sample = nullptr, int64_t num_padded = 0);
#endif
class MnistDataset : public Dataset {
 public:
  explicit MnistDataset(const std::string &dataset_dir, const std::string &usage = "all",
                        const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                        const std::shared_ptr<DatasetCache> &cache = nullptr);
  ~MnistDataset() = default;
};
/// \brief Function to create a MnistDataset
/// \notes The generated dataset has two columns ["image", "label"]
/// \param[in] dataset_dir Path to the root directory that contains the dataset
/// \param[in] usage Usage of MNIST, can be "train", "test" or "all" (default = "all").
/// \param[in] sampler Object used to choose samples from the dataset. If sampler is not given,
///     a `RandomSampler` will be used to randomly iterate the entire dataset (default = RandomSampler())
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current MnistDataset
std::shared_ptr<MnistDataset> Mnist(const std::string &dataset_dir, const std::string &usage = "all",
                                    const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                                    const std::shared_ptr<DatasetCache> &cache = nullptr);
#ifndef ENABLE_ANDROID
/// \brief Function to create a ConcatDataset
/// \notes Overloads the "+" operator to concatenate two datasets
/// \param[in] datasets1 Shared pointer to the first dataset to be concatenated
/// \param[in] datasets2 Shared pointer to the second dataset to be concatenated
/// \return Shared pointer to the current ConcatDataset
std::shared_ptr<ConcatDataset> operator+(const std::shared_ptr<Dataset> &datasets1,
                                         const std::shared_ptr<Dataset> &datasets2);
class RandomDataDataset : public Dataset {
 public:
  RandomDataDataset(const int32_t &total_rows, std::shared_ptr<SchemaObj> schema,
                    const std::vector<std::string> &columns_list, std::shared_ptr<DatasetCache> cache);
  RandomDataDataset(const int32_t &total_rows, std::string schema_path, const std::vector<std::string> &columns_list,
                    std::shared_ptr<DatasetCache> cache);
  ~RandomDataDataset() = default;
};
/// \brief Function to create a RandomDataset
/// \param[in] total_rows Number of rows for the dataset to generate (default=0, number of rows is random)
/// \param[in] schema SchemaObj to set column type, data type and data shape
/// \param[in] columns_list List of columns to be read (default={}, read all columns)
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current Dataset
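/// \par Example
/// A minimal usage sketch; the schema path below is a placeholder (a SchemaObj can be passed instead):
/// \code
///   std::shared_ptr<Dataset> ds = RandomData(50, "/path/to/schema.json");
/// \endcode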
template <typename T = std::shared_ptr<SchemaObj>>
std::shared_ptr<RandomDataDataset> RandomData(const int32_t &total_rows = 0, const T &schema = nullptr,
                                              const std::vector<std::string> &columns_list = {},
                                              const std::shared_ptr<DatasetCache> &cache = nullptr) {
  std::shared_ptr<RandomDataDataset> ds;
  if constexpr (std::is_same<T, std::nullptr_t>::value || std::is_same<T, std::shared_ptr<SchemaObj>>::value) {
    std::shared_ptr<SchemaObj> schema_obj = schema;
    ds = std::make_shared<RandomDataDataset>(total_rows, std::move(schema_obj), columns_list, cache);
  } else {
    // Note: schema and columns_list are const references; std::move on them would silently copy, so it is omitted.
    ds = std::make_shared<RandomDataDataset>(total_rows, schema, columns_list, cache);
  }
  return ds;
}
class TextFileDataset : public Dataset {
 public:
  explicit TextFileDataset(const std::vector<std::string> &dataset_files, int64_t num_samples = 0,
                           ShuffleMode shuffle = ShuffleMode::kGlobal, int32_t num_shards = 1, int32_t shard_id = 0,
                           const std::shared_ptr<DatasetCache> &cache = nullptr);
  ~TextFileDataset() = default;
};
/// \brief Function to create a TextFileDataset
/// \notes The generated dataset has one column ['text']
/// \param[in] dataset_files List of files to be read to search for a pattern of files. The list
///     will be sorted in a lexicographical order.
/// \param[in] num_samples The number of samples to be included in the dataset.
///     (Default = 0 means all samples.)
/// \param[in] shuffle The mode for shuffling data every epoch. (Default=ShuffleMode::kGlobal)
///     Can be any of:
///     ShuffleMode::kFalse - No shuffling is performed.
///     ShuffleMode::kFiles - Shuffle files only.
///     ShuffleMode::kGlobal - Shuffle both the files and samples.
/// \param[in] num_shards Number of shards that the dataset should be divided into. (Default = 1)
/// \param[in] shard_id The shard ID within num_shards. This argument should be
///     specified only when num_shards is also specified. (Default = 0)
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current TextFileDataset
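/// \par Example
/// A minimal usage sketch; the text file paths below are placeholders:
/// \code
///   std::shared_ptr<Dataset> ds = TextFile({"/path/to/file1.txt", "/path/to/file2.txt"}, 0, ShuffleMode::kFiles);
/// \endcode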
std::shared_ptr<TextFileDataset> TextFile(const std::vector<std::string> &dataset_files, int64_t num_samples = 0,
                                          ShuffleMode shuffle = ShuffleMode::kGlobal, int32_t num_shards = 1,
                                          int32_t shard_id = 0, const std::shared_ptr<DatasetCache> &cache = nullptr);
class TFRecordDataset : public Dataset {
 public:
  TFRecordDataset(const std::vector<std::string> &dataset_files, std::string schema,
                  const std::vector<std::string> &columns_list, int64_t num_samples, ShuffleMode shuffle,
                  int32_t num_shards, int32_t shard_id, bool shard_equal_rows, std::shared_ptr<DatasetCache> cache);
  /// \brief Constructor
  /// \note Parameter 'schema' is shared pointer to Schema object
  TFRecordDataset(const std::vector<std::string> &dataset_files, std::shared_ptr<SchemaObj> schema,
                  const std::vector<std::string> &columns_list, int64_t num_samples, ShuffleMode shuffle,
                  int32_t num_shards, int32_t shard_id, bool shard_equal_rows, std::shared_ptr<DatasetCache> cache);
  ~TFRecordDataset() = default;
};
/// \brief Function to create a TFRecordDataset
/// \param[in] dataset_files List of files to be read to search for a pattern of files. The list
///     will be sorted in a lexicographical order.
/// \param[in] schema SchemaObj or a string path to the schema file. (Default = nullptr, which means that the
///     meta data from the TFRecord files is considered the schema.)
/// \param[in] columns_list List of columns to be read. (Default = {}, read all columns)
/// \param[in] num_samples The number of samples to be included in the dataset.
///     (Default = 0 means all samples.)
///     If num_samples is 0 and numRows (parsed from schema) does not exist, read the full dataset;
///     If num_samples is 0 and numRows (parsed from schema) is greater than 0, read numRows rows;
///     If both num_samples and numRows (parsed from schema) are greater than 0, read num_samples rows.
/// \param[in] shuffle The mode for shuffling data every epoch. (Default = ShuffleMode::kGlobal)
///     Can be any of:
///     ShuffleMode::kFalse - No shuffling is performed.
///     ShuffleMode::kFiles - Shuffle files only.
///     ShuffleMode::kGlobal - Shuffle both the files and samples.
/// \param[in] num_shards Number of shards that the dataset should be divided into. (Default = 1)
/// \param[in] shard_id The shard ID within num_shards. This argument should be specified only
///     when num_shards is also specified. (Default = 0)
/// \param[in] shard_equal_rows Get equal rows for all shards. (Default = false, the number of rows in
///     each shard may not be equal)
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current TFRecordDataset
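/// \par Example
/// A minimal usage sketch; the file path below is a placeholder. With no schema argument, the meta data
/// in the files is used as the schema:
/// \code
///   std::shared_ptr<Dataset> ds = TFRecord({"/path/to/file.tfrecord"});
/// \endcode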
template <typename T = std::shared_ptr<SchemaObj>>
std::shared_ptr<TFRecordDataset> TFRecord(const std::vector<std::string> &dataset_files, const T &schema = nullptr,
                                          const std::vector<std::string> &columns_list = {}, int64_t num_samples = 0,
                                          ShuffleMode shuffle = ShuffleMode::kGlobal, int32_t num_shards = 1,
                                          int32_t shard_id = 0, bool shard_equal_rows = false,
                                          const std::shared_ptr<DatasetCache> &cache = nullptr) {
  std::shared_ptr<TFRecordDataset> ds = nullptr;
  if constexpr (std::is_same<T, std::nullptr_t>::value || std::is_same<T, std::shared_ptr<SchemaObj>>::value) {
    std::shared_ptr<SchemaObj> schema_obj = schema;
    ds = std::make_shared<TFRecordDataset>(dataset_files, schema_obj, columns_list, num_samples, shuffle, num_shards,
                                           shard_id, shard_equal_rows, cache);
  } else {
    std::string schema_path = schema;
    if (!schema_path.empty()) {
      struct stat sb;
      int rc = stat(common::SafeCStr(schema_path), &sb);
      if (rc == -1 && errno != ENOENT) {
        MS_LOG(WARNING) << "Unable to query the status of [" << schema_path << "]. Errno = " << errno << ".";
      }
      if (rc != 0) {
        MS_LOG(ERROR) << "TFRecordDataset: schema path [" << schema_path << "] is invalid or does not exist.";
        return nullptr;
      }
    }
    ds = std::make_shared<TFRecordDataset>(dataset_files, schema_path, columns_list, num_samples, shuffle, num_shards,
                                           shard_id, shard_equal_rows, cache);
  }
  return ds;
}
class VOCDataset : public Dataset {
 public:
  explicit VOCDataset(const std::string &dataset_dir, const std::string &task = "Segmentation",
                      const std::string &usage = "train", const std::map<std::string, int32_t> &class_indexing = {},
                      bool decode = false, const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                      const std::shared_ptr<DatasetCache> &cache = nullptr);
  ~VOCDataset() = default;
};
/// \brief Function to create a VOCDataset
/// \notes The generated dataset has multi-columns :
///     - task='Detection', column: [['image', dtype=uint8], ['bbox', dtype=float32], ['label', dtype=uint32],
///       ['difficult', dtype=uint32], ['truncate', dtype=uint32]].
///     - task='Segmentation', column: [['image', dtype=uint8], ['target', dtype=uint8]].
/// \param[in] dataset_dir Path to the root directory that contains the dataset
/// \param[in] task Set the task type of reading voc data; only "Segmentation" or "Detection" are supported
///     (default = "Segmentation")
/// \param[in] usage The type of data list text file to be read (default = "train").
/// \param[in] class_indexing A str-to-int mapping from label name to index, only valid in "Detection" task
/// \param[in] decode Decode the images after reading (default = false)
/// \param[in] sampler Object used to choose samples from the dataset. If sampler is not given,
///     a `RandomSampler` will be used to randomly iterate the entire dataset (default = RandomSampler())
/// \param[in] cache Tensor cache to use. (default=nullptr which means no cache is used).
/// \return Shared pointer to the current Dataset
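/// \par Example
/// A minimal usage sketch; the dataset path and class mapping below are placeholders:
/// \code
///   std::shared_ptr<Dataset> ds = VOC("/path/to/voc_dataset", "Detection", "train",
///                                     {{"car", 0}, {"person", 1}}, true);
/// \endcode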
std::shared_ptr<VOCDataset> VOC(const std::string &dataset_dir, const std::string &task = "Segmentation",
                                const std::string &usage = "train",
                                const std::map<std::string, int32_t> &class_indexing = {}, bool decode = false,
                                const std::shared_ptr<SamplerObj> &sampler = RandomSampler(),
                                const std::shared_ptr<DatasetCache> &cache = nullptr);
/// \brief Function to create a cache to be attached to a dataset
/// \param id A user assigned session id for the current pipeline.
/// \param mem_sz Size of the memory set aside for the row caching (default=0 which means unlimited,
///     note that it might bring in the risk of running out of memory on the machine).
/// \param spill Spill to disk if out of memory (default=false).
/// \param hostname optional host name (default="127.0.0.1").
/// \param port optional port (default=50052).
/// \param num_connections optional number of connections (default=12).
/// \param prefetch_sz optional prefetch size (default=20).
/// \return Shared pointer to DatasetCache. If error, nullptr is returned.
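/// \par Example
/// A minimal usage sketch; the session id below is a placeholder and is assumed to have been generated
/// by a running cache server:
/// \code
///   std::shared_ptr<DatasetCache> cache = CreateDatasetCache(1, 0, false);
/// \endcode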
std::shared_ptr<DatasetCache> CreateDatasetCache(session_id_type id, uint64_t mem_sz, bool spill,
                                                 std::optional<std::string> hostname = std::nullopt,
                                                 std::optional<int32_t> port = std::nullopt,
                                                 std::optional<int32_t> num_connections = std::nullopt,
                                                 std::optional<int32_t> prefetch_sz = std::nullopt);
/// \brief Function to create a ZipDataset
/// \notes Applies zip to the dataset
/// \param[in] datasets List of shared pointers to the datasets that we want to zip
/// \return Shared pointer to the current Dataset
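/// \par Example
/// A minimal usage sketch; ds1 and ds2 stand for any two previously created datasets whose column names
/// do not overlap:
/// \code
///   std::shared_ptr<Dataset> ds = Zip({ds1, ds2});
/// \endcode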
std::shared_ptr<ZipDataset> Zip(const std::vector<std::shared_ptr<Dataset>> &datasets);
#endif
}  // namespace dataset
}  // namespace mindspore
#endif  // MINDSPORE_CCSRC_MINDDATA_DATASET_INCLUDE_DATASETS_H_