only Linux system is supported now.
change the default value of `ENABLE_AKG` to off, and controlled by option `-K`.
the `ENABLE_AKG` is auto enabled when `ENABLE_GPU` or `ENABLE_D` is on.
since now, we can use `ENABLE_AKG` to control the compilation of graphkernel
and akg codes.
fix usage description for option "-K", it should be "[-K on|off]".
LLVM is required by akg for cpu kernels, so AKG for cpu is default disabled now.
modifications for pass transform_op_optimizer:
1. Changed the maxflow-mincut algorithm to the Dinic's Algorithm,
since bug exists in the original ISAP codes.
if the algorithm is slow, we can apply some optimization for it. (e.g. current-arc optimization)
2. Added the pass TransformOpOptimizer in OptLevel_3.
this pass collects nodes around the specific transform operator (only Transpose now),
and use the mincut algorithm to get a plan, then re-link the original graph and
re-inference the shape and format of graph.
modifications for litegraph:
1. the class Node inherits from std::enable_shared_from_this,so we can get a shared_ptr by a pure pointer.
2. modified the Infer interface. it don't change the node, only inference the infos and return them.
1. Add a new message type "AKG/ATTR" in AkgKernelBuilder.
the attrs was sent before the kernel infos.
2. Send "online_tuning" attribute when the flag is not zero,
but error occurs in the latest akg submodule.
3. Send "repository_path" attribute when the flag is not empty.
4. Add a new value "compute_capability" into kernel info when the processor is GPU.
some flags, like "enable_stitch_fusion" and "enable_parallel_fusion", are enabled by default when opt_level=3.
and user can manually disable these flags.
Refactor the original "PassManager" class, and derive the "GraphKernelPassManager"
GraphKernel's ir files are dumped into a new sub-directory "graph_kernel" in the original "verbose_ir_files"
All GraphKernel's passes are divided into 3 levels, and controlled by the flag "opt_level" by default.
when the opt_level is greaterequal to the pass's level, this pass will run.
The default "opt_level" is 2 when GraphKernel is enabled.
Levels:
1. Basic features, like cluster, splitter, and some preprocess, postprocess.
2. All stable features, mainly includes the optimization passes.
3. Experimental features, like stitch-fusion, parallel-fusion.
The two flags "enable_pass" and "disable_pass" are available in this commit.
User can manually enable some passes when it's disabled by "opt_level", or disable the enabled passes,
by specifying that pass in this format: "stage_id.pass_id" or "stage_name.pass_name", multiple passes are separated by comma(",")
the stage/pass index and stage/pass name can be found from the ir filename.
e.g. "--enable_pass=cluster.graph_kernel_expander,1.1,1.2"
Others:
1. the pass "tensor_promotion" is not useful, remove it.
2. put the pass "InsertPadOps" before "ArithmeticSimplify".
used the flag "opt_level" to control GraphKernel,
0 means disabled while non-zero value means enabled.
the default value is controlled by context "enable_graph_kernel",
but if it's also set in "graph_kernel_flags", then the flag will prevail.
supported the whitelist and blacklist operators for GraphKernelExpander.
"enable_expand_ops", "enable_expand_ops_only", "disable_expand_ops".