1. Removed the deprecated pass "EliminateGetitemForControlDepend"
2. Spread the MakeTuple in UpdateState's input at PreProcess, so that all inputs are directly connected
to UpdateState and we no longer need to handle the "Getitem-MakeTuple-UpdateState" pattern.
After this pass, UpdateState(U, make_tuple(op1, op2, ...)) is changed to UpdateState(U, op1, op2, ...).
3. Shrink the UpdateState's inputs at PostProcess: the reverse of the above pass,
which recovers the UpdateState's original format for the processing after GraphKernel.
4. Add a pass ExtendOutputForUpdateState; this is the main job of this commit.
Consider this situation:
a Cast op has multiple users in a composite kernel, while it is also in the output list and connects to
an external UpdateState. In the pass "ShapeOpsSplitter", it will be duplicated. After that, only one replica will be connected
to the external UpdateState, while the others will be connected to its original users respectively.
After the pass "GraphKernelSplitter", only one part will be connected to this UpdateState, so the execution order of the other nodes cannot be ensured.
This pass extends the nodes that connect to UpdateState: if a node has an external UpdateState user, all outputs that depend on this node
will be connected to this UpdateState. It may add many redundant edges; the next pass will handle them.
5. Add a pass MergeOutputForUpdateState after GraphKernelSplitter.
If an UpdateState has multiple inputs from the same node, only one edge is kept.
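The spread/shrink idea from items 2 and 3 above can be sketched with a toy node representation (a minimal sketch; `Node` and its fields are illustrative, not the real MindSpore AnfNode API):

```python
# Toy IR node: an op name plus a list of input nodes.
class Node:
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = inputs

def spread_update_state(us):
    """PreProcess: UpdateState(U, make_tuple(op1, op2, ...)) -> UpdateState(U, op1, op2, ...)."""
    u, *rest = us.inputs
    new_inputs = [u]
    for inp in rest:
        if inp.op == "MakeTuple":
            new_inputs.extend(inp.inputs)  # connect tuple elements directly
        else:
            new_inputs.append(inp)
    return Node("UpdateState", new_inputs)

def shrink_update_state(us):
    """PostProcess: the reverse operation; wrap the extra inputs back into a MakeTuple."""
    u, *rest = us.inputs
    if len(rest) <= 1:
        return us
    return Node("UpdateState", [u, Node("MakeTuple", rest)])
```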
Changed the expander to a class, and used the class name to identify the operator.
Moved the original op logic into the `_expand` function; added a `_check` function to check op inputs.
Used a decorator to register the whitelist of formats that the operator supports;
the decorator wraps the `_check` function so it also checks the formats.
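The class-plus-decorator pattern described above might look like the following sketch (names such as `Expander`, `register_formats`, and the `input_formats` field are assumptions for illustration, not the real graph-kernel API):

```python
class CheckError(Exception):
    """Raised when an op's inputs or formats fail validation."""

def register_formats(*allowed):
    """Class decorator: record the format whitelist and wrap _check to enforce it."""
    def deco(cls):
        original_check = cls._check
        def checked(self):
            for fmt in self.input_formats:
                if fmt not in allowed:
                    raise CheckError(f"{cls.__name__} does not support format {fmt}")
            original_check(self)  # then run the op's own input checks
        cls._check = checked
        cls.supported_formats = allowed
        return cls
    return deco

class Expander:
    def __init__(self, input_formats):
        self.input_formats = input_formats
    def _check(self):
        pass  # per-op input checks go here
    def _expand(self, graph_builder):
        raise NotImplementedError
    def run(self, graph_builder=None):
        self._check()  # validate before expanding
        return self._expand(graph_builder)

@register_formats("DefaultFormat", "NCHW")
class SqrtGradExpander(Expander):
    def _expand(self, graph_builder):
        return "expanded"  # placeholder for the real op logic
```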
The basic idea is to exploit data dependencies to control the execution order
of side-effect operations while keeping the ANF semantics unchanged.
The ControlDepend primitive is removed and two primitives are added:
1. UpdateState:
```
a = Assign(para, value)
```
became:
```
a = Assign(para, value, u)
u = UpdateState(u, a)
```
2. Load:
```
x = Add(para, value)
```
became:
```
p = Load(para, u)
x = Add(p, value)
u = UpdateState(u, p)
```
Decoupled from the front-end interfaces.
1. Removed the call to "Renormalize".
Completed the infer-format in model_builder.
Only the device shape and device format are used to
infer an abstract shape, without considering padding.
2. Removed the calls to Python's Primitive interfaces.
"Renormalize" relies on PrimitivePy, so they can be
removed together. After that, the functions "ConstAttrToInput",
"DeleteAttrInInput" and related ones can be removed.
3. Reused the AkgKernelJsonGenerator in GraphKernelExpander.
1) Set the attribute "extract_opinfo_from_anf" to true, so that
the generator can handle basic operators with ANF info.
2) Added a function "extract_expand_info" in expander.py
to convert the json into a more friendly format. The attrs
are converted to a dict instead of a list.
4. Scalars only support DefaultFormat.
Removed the argument "format" from graph_builder.value
5. Moved the expander op list from graph_kernel_helper.cc to graph_kernel_expander.cc
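The attrs-to-dict conversion from item 3 could be sketched as below (the field names follow a common AKG kernel-json layout but are assumptions here, as is the exact return shape):

```python
def extract_expand_info(kernel_json):
    """Convert a kernel json into a friendlier dict-based form.

    The "attr" field arrives as a list of {"name": ..., "value": ...}
    entries; it is flattened into a plain dict for easy lookup.
    """
    attrs_list = kernel_json.get("attr") or []
    attrs = {item["name"]: item["value"] for item in attrs_list}
    return {
        "name": kernel_json.get("name"),
        "input_desc": kernel_json.get("input_desc", []),
        "output_desc": kernel_json.get("output_desc", []),
        "attr": attrs,  # dict instead of list
    }
```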
1. Added a pass to replace Assign with InplaceAssign.
2. Bugfix in eliminate_redundant_output: the side-effect node should not be eliminated.
3. Bugfix in graph_kernel/splitter.py: a kernel that includes InplaceAssign should be a composite node.
4. Added two tool functions, GetAllInputDeviceTypes and GetAllOutputDeviceTypes, to AnfAlgo.
5. Do not fuse a single Assign in the pass BasicOpsFusion.
Cast the float16 input to float32 before ReduceSum, and cast back to float16 after ReduceSum.
If the op after this ReduceSum is a cast from float16 to float32, it can be eliminated.
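The accumulate-in-float32 pattern above can be illustrated numerically with NumPy (a sketch of the numerics only, not the graph transformation itself):

```python
import numpy as np

def reduce_sum_fp16(x_fp16, axis=None):
    """ReduceSum on a float16 tensor with float32 accumulation."""
    x32 = x_fp16.astype(np.float32)   # cast up before the reduction
    s32 = x32.sum(axis=axis)          # accumulate in float32 for precision
    return s32.astype(np.float16)     # cast back to float16 afterwards

# If the consumer immediately casts the result back to float32, the
# Cast(fp32->fp16) + Cast(fp16->fp32) pair is redundant: returning s32
# directly gives the same (or better) result with two fewer casts.
```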
This commit reverts the modification to basic_ops_fusion.cc in 8af78cd5c;
the getitem should be fused with all of its users.
(Not a bug, but when the network is large it runs very slowly; this is a temporary solution.)
Fix bug in ReplaceNewFuseCNode
Add a pass to eliminate repeated outputs after CSE
Fix bug in graph_kernel_splitter
Do not fuse the Reshape op as output in costmodel.
Add Tile expander
Add BroadcastTo in model
Fix BroadcastTo op calling error and shape inference
Rewrite Tile expander
Do not split broadcast_to
Add SqrtGrad expander
Refactor BasicOpsFusion and CompositeOpsFusion into one pass.
Add a pass to eliminate the redundant output.
TODO: rename the file basic_ops_fusion and delete the file composite_ops_fusion