Changed the callback function GetProcessorFromContext to GetTargetFromContext so that it can be used to filter the clusterable op list,
and added GetProcessorByTarget to AkgKernelJsonGenerator.
Moved the functions IsKeepBasicNode, GetValidOps, and OpListFilter from graph_kernel_helper
to graph_kernel_utils, and combined GetValidOps and OpListFilter.
Decoupled the pass getitem_tuple from "optimizer/common/helper.h" by deleting the input-size check,
which is redundant because cnode->input(i) also checks the input index.
fix typo from code check
fix namespace error
add CollectFusedJsonWithSingleKernel to generate json for custom op
fix typo
drop changes in CreateOutputsJson
only the GET functions are implemented now.
remove the calls to AnfAlgo's GET functions for node info from AkgKernelJsonGenerator.
Also fix a bug in pass reorder_ops, which set attrs on the same prim::Cast primitive shared by different CNodes.
* move GetInputTensorValue from common_utils to json_generator
* get dtype size by `Number.nbits()` instead of the `GetDtypeNbyte` map.
* get attrs from the AnfNode manually, instead of using `AnfAlgo::GetNodeAttr`
* replace `AnfAlgo::GetCNodePrimitive` with `GetCNodePrimitive` in anf.cc
* the `AnfAlgo::IsRealKernel` check is no longer used in the inner function.
clean-code jobs:
* remove the `Clean` function from AkgKernelJsonGenerator
* delete the json key "id" so that the mutex in AkgKernelJsonGenerator can be removed
changed the function to "TypeIdToString", which uses the Type::ToString() function
instead of a TypeId-to-string map.
changed DtypeToTypeId accordingly, so that the original StringToType can be used.
added a new interface StringToTypeId.
it's unreasonable to modify the node while generating the kernel json;
instead, it should be set in a pass.
most of the operators in the original akg_kernel_attrs_process are no longer used,
so we deleted them, keeping only "Cast" and "MatMul/BatchMatMul".
only Linux system is supported now.
change the default value of `ENABLE_AKG` to off; it is now controlled by the option `-K`.
`ENABLE_AKG` is automatically enabled when `ENABLE_GPU` or `ENABLE_D` is on.
from now on, `ENABLE_AKG` controls the compilation of both the graphkernel
and akg code.
fix the usage description for option "-K"; it should be "[-K on|off]".
LLVM is required by akg for CPU kernels, so AKG for CPU is now disabled by default.
This reverts commit b077aa1cab.
Revert "[feat] [assistant] [I3T96X] add new Dataset operator LibriSpeechDataset"
This reverts commit 4e6f7dc97d.
delete pass_registry_test.cc
comment out the line related to hiai_nlu_model_multi.pb
Add a subdirectory "model" in the "backend/optimizer/graph_kernel" for litegraph.
Implement two interfaces "AnfGraph2LiteGraph" and "LiteGraph2AnfGraph".
The litegraph will be the base data structure when we migrate the GraphKernel code
from Python ("mindspore/_extends/graph_kernel") to C++.
1. Add a new message type "AKG/ATTR" in AkgKernelBuilder.
   The attrs are sent before the kernel infos.
2. Send the "online_tuning" attribute when the flag is not zero,
   but an error occurs with the latest akg submodule.
3. Send the "repository_path" attribute when the flag is not empty.
4. Add a new value "compute_capability" into kernel info when the processor is GPU.
Refactor the original "PassManager" class, and derive "GraphKernelPassManager" from it.
GraphKernel's ir files are dumped into a new sub-directory "graph_kernel" inside the original "verbose_ir_files" directory.
All GraphKernel passes are divided into 3 levels and are controlled by the flag "opt_level" by default:
a pass runs when opt_level is greater than or equal to the pass's level.
The default "opt_level" is 2 when GraphKernel is enabled.
Levels:
1. Basic features, like cluster, splitter, and some preprocess, postprocess.
2. All stable features, mainly the optimization passes.
3. Experimental features, like stitch-fusion, parallel-fusion.
The two flags "enable_pass" and "disable_pass" are also added in this commit.
Users can manually enable a pass that is disabled by "opt_level", or disable an enabled pass,
by specifying it in the format "stage_id.pass_id" or "stage_name.pass_name"; multiple passes are separated by commas (",").
The stage/pass index and stage/pass name can be found in the ir filename.
e.g. "--enable_pass=cluster.graph_kernel_expander,1.1,1.2"
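A minimal C++ sketch of the selection rule described above, for illustration only. The names `PassInfo`, `SplitByComma`, `Matches`, and `ShouldRunPass` are hypothetical and not the GraphKernelPassManager API; it also assumes that an explicit "disable_pass" entry wins when a pass appears in both lists, which the description does not specify.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one pass (illustrative fields only).
struct PassInfo {
  size_t stage_id;
  size_t pass_id;
  std::string stage_name;
  std::string pass_name;
  unsigned level;  // 1 = basic, 2 = stable, 3 = experimental
};

// Split a flag value such as "cluster.graph_kernel_expander,1.1,1.2" on commas.
std::vector<std::string> SplitByComma(const std::string &value) {
  std::vector<std::string> items;
  std::stringstream ss(value);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (!item.empty()) items.push_back(item);
  }
  return items;
}

// A token matches a pass if it equals "stage_id.pass_id" or "stage_name.pass_name".
bool Matches(const PassInfo &pass, const std::string &token) {
  const std::string by_id = std::to_string(pass.stage_id) + "." + std::to_string(pass.pass_id);
  const std::string by_name = pass.stage_name + "." + pass.pass_name;
  return token == by_id || token == by_name;
}

bool InList(const PassInfo &pass, const std::vector<std::string> &tokens) {
  for (const auto &token : tokens) {
    if (Matches(pass, token)) return true;
  }
  return false;
}

// Assumption: disable_pass takes precedence over enable_pass; otherwise the
// pass runs when opt_level >= pass.level.
bool ShouldRunPass(const PassInfo &pass, unsigned opt_level,
                   const std::vector<std::string> &enable_pass,
                   const std::vector<std::string> &disable_pass) {
  if (InList(pass, disable_pass)) return false;
  if (InList(pass, enable_pass)) return true;
  return opt_level >= pass.level;
}
```

For example, a level-3 pass is skipped at the default opt_level 2, but `ShouldRunPass(p, 2, SplitByComma("cluster.graph_kernel_expander"), {})` returns true when `p` is the hypothetical pass `{1, 1, "cluster", "graph_kernel_expander", 3}`.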
Others:
1. Remove the pass "tensor_promotion", which is not useful.
2. put the pass "InsertPadOps" before "ArithmeticSimplify".