In the last commit (015d7354c7), I deleted the check on whether different ValueNodes have
the same tensor value, but overlooked the case where several nodes use the same ValueNode.
In this case, the function creates several parameters for the same ValueNode, but every use of
the ValueNode is replaced with the first parameter, and the remaining parameters are left unused.
This results in a "parameter has no user" error.
Using a std::set for the ValueNodes resolves this problem.
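The deduplication idea can be sketched as follows. This is a minimal illustration, not the actual MindSpore code: the `ValueNode` struct and the helper `CollectUniqueValueNodes` are hypothetical stand-ins, and only the use of std::set comes from the message above.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a ValueNode; not the MindSpore class.
struct ValueNode {
  std::string name;
};
using ValueNodePtr = std::shared_ptr<ValueNode>;

// Returns each distinct ValueNode once, in first-seen order, even when several
// users reference the same node. One parameter can then be created per entry.
std::vector<ValueNodePtr> CollectUniqueValueNodes(const std::vector<ValueNodePtr> &inputs) {
  std::set<ValueNodePtr> seen;  // compares the pointers, not the tensor values
  std::vector<ValueNodePtr> unique;
  for (const auto &vn : inputs) {
    assert(vn != nullptr);
    if (seen.insert(vn).second) {  // true only for the first occurrence
      unique.push_back(vn);
    }
  }
  return unique;
}
```

Because the set is keyed on the node pointer, two different ValueNodes holding equal tensors still get separate parameters, which matches the removal of the equal-tensor check.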
Robin-hood-hashing (https://github.com/martinus/robin-hood-hashing)
is considered faster than std::unordered_map/set,
so we use it to improve MindSpore performance.
1. Put the robin_hood header file in `third_party/robin_hood/include`;
2. In `utils/hash_map.h` and `utils/hash_set.h`, we define:
- mindspore::HashMap as an alias of robin_hood::unordered_map;
- mindspore::HashSet as an alias of robin_hood::unordered_set;
3. Replace:
- `#include <unordered_map>` --> `#include "utils/hash_map.h"`;
- `#include <unordered_set>` --> `#include "utils/hash_set.h"`;
- `std::unordered_map` --> `mindspore::HashMap`;
- `std::unordered_set` --> `mindspore::HashSet`;
- `map.insert(std::pair(key, value))` --> `map.emplace(key, value)`;
- `[] (const std::pair<K, V> &p) {..} ` --> `[] (const auto &p) {..} `;
4. Fix issues found by switching to robin_hood:
- AnfNodeConfig hash and equal;
- Fix a bug in `Slice::operator==()`;
- Fix a bug in `CNode::HasPrimalAttr()`;
- Fix map.erase() usage bugs: `map.erase(iter++)` --> `iter = map.erase(iter)`;
- Fix some iterator invalidation problems;
5. Some std::unordered_map/set cannot be replaced by robin_hood:
- Those used as parameters of functions that are exposed to Python by pybind11;
- Those using a bad hash that causes a robin_hood map overflow, such as AbstractBasePtrListHasher;
6. Update cpp unit tests;
7. Add build option '-F' to enable robin_hood, default on.
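The alias and the erase idiom from items 2 to 4 can be sketched as below. This is only an illustration under stated assumptions: `std::unordered_map` stands in for `robin_hood::unordered_map` so that the snippet compiles without the third-party header, and the namespace `sketch` and the function `EraseNegative` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

namespace sketch {
// Assumption: std::unordered_map is used as a stand-in here; in the commit the
// alias in utils/hash_map.h targets robin_hood::unordered_map.
template <typename K, typename V>
using HashMap = std::unordered_map<K, V>;

// Erases all entries with a negative value and returns how many were removed.
inline std::size_t EraseNegative(HashMap<std::string, int> *map) {
  assert(map != nullptr);
  std::size_t removed = 0;
  for (auto iter = map->begin(); iter != map->end();) {
    if (iter->second < 0) {
      iter = map->erase(iter);  // continue from the iterator returned by erase
      ++removed;
    } else {
      ++iter;
    }
  }
  return removed;
}
}  // namespace sketch
```

Continuing from the iterator returned by `erase` does not depend on how the container rearranges its elements during erasure, which is why it is the safer form when the underlying map implementation changes.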
Changed the callback function GetProcessorFromContext to GetTargetFromContext,
so that we can use it to filter the clusterable op list, and added GetProcessorByTarget to AkgKernelJsonGenerator.
Moved the functions IsKeepBasicNode, GetValidOps and OpListFilter from graph_kernel_helper
to graph_kernel_utils, and combined GetValidOps and OpListFilter.
Decoupled the pass getitem_tuple from "optimizer/common/helper.h" by deleting the
input size check; cnode->input(i) already checks the input index.
1. Move functions from graph_kernel_helper.cc to graph_builder.cc:
the EliminateMakeTuple, now implemented with SpreadTuples.
the ConvertNonscalarTensorToParameter, with the equal-tensor check removed.
the IsTupleOutput (originally IsMakeTupleOut), now using recursion.
the CreateNewFuseCNode, with the "output" argument removed; SetNewKernelInfo is called in it.
the ReplaceNewFuseCNode.
the BuildSingleGraphFromNodes (originally MixedNodesTransToGraph).
the ReplaceNodesWithGraphKernelNode (originally FuseNodesToSubGraph).
2. Create graph_kernel_utils.cc.
ExtractGraphKernelName and SpreadTuples were moved to this file.
3. Add SetNewKernelInfo to the callback functions.
only the GET functions are implemented now.
Remove the calls to AnfAlgo's GET functions for node info from AkgKernelJsonGenerator.
Also fixed a bug in the pass reorder_ops, which set attrs on the same prim::Cast primitive shared by different CNodes.
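The hazard behind this bug can be illustrated with a small sketch. The message does not say how the fix was done, so the clone-before-modify approach below is an assumption, and `Primitive`, `CNode` and `SetAttrOnOwnPrimitive` are hypothetical stand-ins rather than the MindSpore classes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins; not the MindSpore classes.
struct Primitive {
  std::string name;
  std::map<std::string, std::string> attrs;
};
using PrimitivePtr = std::shared_ptr<Primitive>;

struct CNode {
  PrimitivePtr prim;
};

// Gives the node its own copy of the primitive before setting the attribute,
// so other nodes that share the original primitive are not affected.
void SetAttrOnOwnPrimitive(CNode *node, const std::string &key, const std::string &value) {
  assert(node != nullptr && node->prim != nullptr);
  auto cloned = std::make_shared<Primitive>(*node->prim);
  cloned->attrs[key] = value;
  node->prim = cloned;
}
```

If the attribute were written through the shared pointer directly, every CNode holding that primitive would observe the change.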
Changed the function to "TypeIdToString", which now uses the Type::ToString() function
instead of a TypeId-to-string map.
Changed DtypeToTypeId accordingly, so that the original StringToType can be used.
Added a new interface StringToTypeId.
It is unreasonable to change the node when generating the kernel json;
instead, it should be set in a pass.
Most of the operators in the original akg_kernel_attrs_process are no longer used,
so we deleted them, leaving only "Cast" and "MatMul/BatchMatMul".
Only Linux is supported now.
Change the default value of `ENABLE_AKG` to off; it is controlled by option `-K`.
`ENABLE_AKG` is automatically enabled when `ENABLE_GPU` or `ENABLE_D` is on.
From now on, we can use `ENABLE_AKG` to control the compilation of graphkernel
and akg code.
Fix the usage description for option "-K"; it should be "[-K on|off]".
LLVM is required by akg for cpu kernels, so AKG for cpu is disabled by default now.
* Move the graphkernel pass code (backend/optimizer/graph_kernel/*) to the
new namespace `mindspore::graphkernel`, to decouple it from `mindspore::opt`.
* Rename the original `mindspore::opt::graphkernel` to `mindspore::graphkernel::inner` (graph_kernel/model)
* Rename the original `mindspore::opt::expanders` to `mindspore::graphkernel::expanders` (graph_kernel/expanders)
TODO: modify graph_kernel_flags, kernel_compiler/akg/
The "throw" statement is not allowed in the MindSpore project (codedex check),
so we remove the self-defined exception and replace it with MS_LOG(EXCEPTION).
In GraphKernelExpanders, we check the return value instead.
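Checking a return value instead of catching an exception might look like the sketch below. The names `ExpandOp` and `ExpandOrKeep`, the signatures and the failure condition are all hypothetical; the sketch only shows the control flow, not the actual GraphKernelExpanders code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical expander: reports failure through its return value instead of throwing.
bool ExpandOp(const std::string &op_name, const std::vector<int> &input_shape,
              std::vector<std::string> *expanded) {
  assert(expanded != nullptr);
  if (input_shape.empty()) {
    return false;  // a check failed; no exception is thrown
  }
  expanded->push_back(op_name + "_step0");
  expanded->push_back(op_name + "_step1");
  return true;
}

// Hypothetical caller: keeps the original op when the expansion fails.
std::vector<std::string> ExpandOrKeep(const std::string &op_name,
                                      const std::vector<int> &input_shape) {
  std::vector<std::string> expanded;
  if (!ExpandOp(op_name, input_shape, &expanded)) {
    return {op_name};
  }
  return expanded;
}
```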
The rollback function in ArithmeticSimplify / TransformOpOptimizer
is not supported now.
What's more,
changed the C++ op expanders from .h files to .cc files;
OpExpanderRegister is called in each .cc file, like
the operator registers in MindSpore.
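A per-file registration of this kind is commonly built from a static registrar object. The sketch below shows that general pattern only: `ExpanderFactory`, `ExpanderRegistrar` and `ReluExpander` are hypothetical names, and the real OpExpanderRegister interface may differ.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical expander interface.
struct OpExpander {
  virtual ~OpExpander() = default;
  virtual std::string Name() const = 0;
};

// Hypothetical factory that maps an op name to a creator function.
class ExpanderFactory {
 public:
  using Creator = std::function<std::unique_ptr<OpExpander>()>;
  static ExpanderFactory &Instance() {
    static ExpanderFactory factory;
    return factory;
  }
  void Register(const std::string &op, Creator creator) {
    assert(static_cast<bool>(creator));
    creators_[op] = std::move(creator);
  }
  std::unique_ptr<OpExpander> Create(const std::string &op) const {
    auto it = creators_.find(op);
    if (it == creators_.end()) {
      return nullptr;
    }
    return it->second();
  }

 private:
  std::map<std::string, Creator> creators_;
};

// Registers a creator when a static instance is constructed at program start.
struct ExpanderRegistrar {
  ExpanderRegistrar(const std::string &op, ExpanderFactory::Creator creator) {
    ExpanderFactory::Instance().Register(op, std::move(creator));
  }
};

// The part below would live in one expander's own .cc file.
struct ReluExpander : public OpExpander {
  std::string Name() const override { return "Relu"; }
};
static const ExpanderRegistrar g_relu_registrar(
    "Relu", [] { return std::unique_ptr<OpExpander>(new ReluExpander()); });
```

With this layout, adding an expander only touches its own .cc file, and no header has to list all expanders.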