
execution_tree.h 12 kB

/**
 * Copyright 2019 Huawei Technologies Co., Ltd
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#ifndef MINDSPORE_CCSRC_MINDDATA_DATASET_ENGINE_EXECUTION_TREE_H_
#define MINDSPORE_CCSRC_MINDDATA_DATASET_ENGINE_EXECUTION_TREE_H_

#include <functional>
#include <memory>
#include <stack>
#include <string>
#include <vector>
#ifndef ENABLE_ANDROID
#if !defined(_WIN32) && !defined(_WIN64)
#include <sys/sysinfo.h>
#include <opencv2/imgproc/imgproc.hpp>
#endif
#endif
#include "minddata/dataset/engine/datasetops/dataset_op.h"
#include "minddata/dataset/util/status.h"
#include "mindspore/ccsrc/minddata/dataset/engine/perf/profiling.h"
namespace mindspore {
namespace dataset {
// Forward declarations
class TaskGroup;
class DatasetOp;
class Pass;

using OptPass = std::vector<std::unique_ptr<Pass>>;

class ExecutionTree {
 public:
  // Prepare flags used during the tree prepare phase
  enum PrepareFlags {
    kDePrepNone = 0,
    kDePrepRepeat = 1,  // Processing a repeat operation
    kDePrepCache = 2    // Processing a cache operation
  };

  // State flags for the lifecycle of the tree
  enum TreeState {
    kDeTStateInit = 0,   // The freshly initialized state after construction
    kDeTStateBuilding,   // The tree is being built; nodes are being added
    kDeTStatePrepare,    // The tree has been assigned a root node and is pending prepare
    kDeTStateReady,      // The tree has been prepared and is ready to be launched
    kDeTStateExecuting,  // The tree has been launched and is executing
    kDeTStateEpochEnd,   // The tree has received the end-of-epoch signal (used for profiling)
    kDeTStateFinished    // The tree has been drained; the dataset iterator received EOF
  };
  class Iterator {
   public:
    // Constructor
    // @param root The root node to start iterating from
    explicit Iterator(const std::shared_ptr<DatasetOp> &root = nullptr);

    // Destructor
    ~Iterator() {}

    // Prefix ++ overload
    Iterator &operator++() {
      ++ind_;
      return *this;
    }

    // Postfix ++ overload
    Iterator operator++(int) {
      Iterator it = *this;
      ind_++;
      return it;
    }

    // Prefix -- overload
    Iterator &operator--() {
      --ind_;
      return *this;
    }

    // Postfix -- overload
    Iterator operator--(int) {
      Iterator it = *this;
      ind_--;
      return it;
    }

    DatasetOp &operator*() { return *nodes_[ind_]; }  // dereference operator
    std::shared_ptr<DatasetOp> operator->() { return nodes_[ind_]; }

    // Getter function
    // @return Shared pointer to the current operator
    std::shared_ptr<DatasetOp> get() { return nodes_[ind_]; }

    bool operator==(const Iterator &rhs) { return nodes_[ind_] == rhs.nodes_[rhs.ind_]; }
    bool operator!=(const Iterator &rhs) { return nodes_[ind_] != rhs.nodes_[rhs.ind_]; }

    int32_t NumNodes() { return nodes_.size(); }

   private:
    int32_t ind_;                                    // the current node the Iterator points to
    std::vector<std::shared_ptr<DatasetOp>> nodes_;  // stores the nodes in post order
    void PostOrderTraverse(const std::shared_ptr<DatasetOp> &);
  };
  // Constructor
  ExecutionTree();

  // Destructor
  ~ExecutionTree();

  // Associates a DatasetOp with this tree. This assigns a valid node id to the operator and
  // provides it with a link to the tree. A node cannot form any relationships (parent/child) with
  // other nodes unless they are associated with the same tree.
  // @param op - The operator to associate
  // @return Status - The error code returned
  Status AssociateNode(const std::shared_ptr<DatasetOp> &op);

  // Sets the root node of the tree
  // @param op - The operator to assign as root
  // @return Status - The error code returned
  Status AssignRoot(const std::shared_ptr<DatasetOp> &op);

  // Starts the execution of the tree
  // @return Status - The error code returned
  Status Launch();

  /// A print method, typically used for debugging
  /// \param out - The output stream to write output to
  void Print(std::ostream &out, const std::shared_ptr<DatasetOp> &op = nullptr) const;

  // Returns an iterator positioned at the start
  // @return Iterator - The iterator
  ExecutionTree::Iterator begin(const std::shared_ptr<DatasetOp> &root = nullptr) const {
    return Iterator(root == nullptr ? root_ : root);
  }

  // Returns an iterator positioned at the end
  // @return Iterator - The iterator
  ExecutionTree::Iterator end() const { return Iterator(nullptr); }

  // << stream output operator overload
  // @notes This allows you to write the debug print info using stream operators
  // @param out - Reference to the output stream being overloaded
  // @param exe_tree - Reference to the execution tree to display
  // @return The output stream
  friend std::ostream &operator<<(std::ostream &out, ExecutionTree &exe_tree) {
    exe_tree.Print(out);
    return out;
  }

  // Given the number of workers, launches the worker entry function for each one. Essentially a
  // wrapper for the TaskGroup handling that is stored inside the execution tree.
  // @param num_workers - The number of workers to launch
  // @param func - The function entry point that workers will execute
  // @return Status - The error code returned
  Status LaunchWorkers(int32_t num_workers, std::function<Status(uint32_t)> func, std::string name = "");

  // Getter method
  // @return shared_ptr to the root operator
  std::shared_ptr<DatasetOp> root() const { return root_; }

  // Getter method
  // @return The prepare flags
  uint32_t PrepareFlags() const { return prepare_flags_; }
  // The driver of the prepare phase of the execution tree.
  // The prepare phase consists of three sub-phases:
  //
  // 1. PrepareTreePreAction()
  //    Compulsory transformation/action before optimization.
  //    For example, CacheOp insertion.
  //
  // 2. Optimize()
  //    Optional optimization transformations/actions.
  //    For example, MapOp fusion.
  //
  // 3. PrepareTreePostAction()
  //    Compulsory transformation/action after optimization.
  //    For example, RepeatOp inlining.
  //
  // @param num_epochs - The total number of epochs that will be run on this tree
  // @return Status - The error code returned
  Status Prepare(int num_epochs = -1);

  // Compulsory transformation/action before optimization.
  // @return Status - The error code returned
  Status PrepareTreePreAction();

  // Compulsory transformation/action after optimization.
  // @return Status - The error code returned
  Status PrepareTreePostAction();

  // Optional optimization transformations/actions.
  // @return Status - The error code returned
  Status Optimize();

  // The DEPRECATED driver of the prepare phase of the execution tree. The prepare phase recursively
  // walks the tree to perform modifications to the tree, or to specific nodes within the tree, to get
  // it ready for execution.
  // @return Status - The error code returned
  Status PrepareDeprecated();

  // Recursive function used during the prepare phase to visit a node and drive any pre- and post-
  // node actions during a tree walk.
  // @param dataset_op - The dataset op to work on
  // @return Status - The error code returned
  Status PrepareNode(const std::shared_ptr<DatasetOp> &dataset_op);
  // Returns the pointer to the TaskGroup
  // @return Raw pointer to the TaskGroup
  TaskGroup *AllTasks() const { return tg_.get(); }

  // Returns whether the ExecutionTree is in the end-of-epoch state
  // @return bool - true if the ExecutionTree is in the end-of-epoch state
  bool IsEpochEnd() const { return tree_state_ == TreeState::kDeTStateEpochEnd; }

  // Sets the ExecutionTree to the EOE state
  void SetEpochEnd() { tree_state_ = TreeState::kDeTStateEpochEnd; }

  // Sets the ExecutionTree to the executing state
  void SetExecuting() { tree_state_ = TreeState::kDeTStateExecuting; }

  // Returns whether the ExecutionTree is finished (the iterator received EOF).
  // @return bool - true if the ExecutionTree is finished
  bool isFinished() const { return tree_state_ == TreeState::kDeTStateFinished; }

  // Returns whether the ExecutionTree is ready.
  // @return bool - true if the ExecutionTree is ready
  bool isPrepared() const {
    return tree_state_ == TreeState::kDeTStateReady || tree_state_ == kDeTStateExecuting ||
           tree_state_ == kDeTStateFinished;
  }

  // Sets the ExecutionTree to the finished state.
  void SetFinished() { tree_state_ = TreeState::kDeTStateFinished; }

  // Getter for the profiling manager; no ownership is transferred
  ProfilingManager *GetProfilingManager() { return profiling_manager_.get(); }

  // Sets the optional optimization flag if the tree has not been prepared yet
  Status SetOptimize(bool value) {
    if (tree_state_ != kDeTStateInit && tree_state_ != kDeTStateBuilding) {
      std::string optimize = optimize_ ? "true" : "false";
      std::string msg = "Tree has already been prepared with OPTIMIZE set to " + optimize;
      RETURN_STATUS_UNEXPECTED(msg);
    } else {
      optimize_ = value;
      return Status::OK();
    }
  }

  // Optional optimization status
  bool OptimizationEnabled() const { return optimize_; }

  // Getter function for the total number of epochs to be run on this tree.
  // @return Total number of epochs
  int32_t num_epochs() { return num_epochs_; }

  // Sets the function pointer that overrides the pre-pass, which allows the caller to adjust the
  // existing pre-pass and introduce new passes. E.g. the caller can override num_epochs in EpochInjectionPass.
  void SetPrePassOverride(std::function<OptPass(OptPass)> pre_pass_override) { pre_pass_override_ = pre_pass_override; }
 private:
  // A helper function for doing the recursive printing
  // @param dataset_op - The dataset op to print
  // @param indent - An indent string for aligning child levels in the output
  // @param last - An indicator of whether it's the last child
  // @param detailed - Whether to display the detailed node output or the summary line
  void PrintNode(std::ostream &out, const std::shared_ptr<DatasetOp> &dataset_op, std::string indent, bool last,
                 bool detailed) const;

  std::unique_ptr<TaskGroup> tg_;                        // Class for worker management
  std::shared_ptr<DatasetOp> root_;                      // The root node of the tree
  int32_t id_count_;                                     // Counter for generating operator ids
  uint32_t prepare_flags_;                               // Flags used during tree prepare
  TreeState tree_state_;                                 // Tracks the current tree state
  int32_t num_epochs_;                                   // Total number of epochs to run for this tree
  std::unique_ptr<ProfilingManager> profiling_manager_;  // Profiling manager
  bool optimize_;                                        // Flag to enable optional optimizations
  std::function<OptPass(OptPass)> pre_pass_override_;    // Function that overrides the pre-pass; called in PrePrepare()
};
}  // namespace dataset
}  // namespace mindspore
#endif  // MINDSPORE_CCSRC_MINDDATA_DATASET_ENGINE_EXECUTION_TREE_H_