# Release 1.2.0

## Bug fixes

* Fixed local memory promotion for large thread extents (!2980).
* Fixed the reduce-dimension binding issue on the GPU platform (!ff38).

# Release 1.2.0-rc1

## Major Features and Improvements

* [STABLE] Rebuild the AKG repository to provide a new way of supporting the Ascend backend: link a static library that contains all the Ascend passes. (Ascend)
* [STABLE] Optimize the reduction add operation in the Ascend backend. (Ascend)
* [STABLE] Add AutoTuning support for elementwise and reduction operators. (GPU)

## Bug fixes

* Fixed a problem where data prefetch could not be enabled through attributes in the DSL.
* Fixed bugs in the auto-tiling algorithms on the Ascend platform (tiles too small, matmul+bias not adapted, etc.).

## Contributors

Thanks goes to these wonderful people:

yangsijia, xxxxxxw, polyhedral, zhangrenwei, yiyanzhi, xixixian, hujiahui8, zhengzuohe, lishanni, zhangzhaochuang, xuhui, liuchao, gengzhen, xiaruijie, chenlei_autodiff, lingyunli63, wYann, lvwenyuan, peiwenfang, hanhuifeng, gaoxiong, chengyun

Contributions of any kind are welcome!

# Release 1.1.1

## Major Features and Improvements

* Enable Tensor Cores for GEMM operators in AKG by using poly to automatically create the schedule needed by the tensor core pass.
* Implement an AKG MMA library with inlined PTX code instead of the CUDA WMMA interface (a sketch of the inline-PTX style follows this list).
* Enable one-dimensional mapping to optimize memory promotion.

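As a rough illustration of the inline-PTX style (not AKG's actual library code), the fragment below issues one warp-level `mma.sync` through inline assembly instead of the `nvcuda::wmma` API. It assumes an sm_80 target and the `m16n8k16` shape; shapes and fragment layouts differ on other architectures, and the function name is made up.

```cuda
// One warp-synchronous 16x8x16 MMA step, d = a * b + c, emitted as raw PTX.
// Following the PTX fragment layout, each thread holds 8 halves of A packed
// into four .b32 registers, 4 halves of B in two .b32 registers, and
// 4 floats each of C and D.
__device__ void mma_m16n8k16(float d[4], const unsigned a[4],
                             const unsigned b[2], const float c[4]) {
  asm volatile(
      "mma.sync.aligned.m16n8k16.row.col.f32.f16.f16.f32 "
      "{%0,%1,%2,%3}, {%4,%5,%6,%7}, {%8,%9}, {%10,%11,%12,%13};\n"
      : "=f"(d[0]), "=f"(d[1]), "=f"(d[2]), "=f"(d[3])
      : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
        "r"(b[0]), "r"(b[1]),
        "f"(c[0]), "f"(c[1]), "f"(c[2]), "f"(c[3]));
}
```

Unlike the WMMA intrinsics, raw PTX exposes the per-thread fragment layout, which gives a generated schedule finer control over register allocation and operand reuse.
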
## Bugfixes

* Fix a segmentation fault when mapping the outer band in MindSpore (!321).
* Fix memory promotion issues (!306).
* Fix bugs in tuning-space generation for scalar ops (!326).

## Contributors

Thanks goes to these wonderful people:

chengyun, chendeshi, chenlei_autodiff, gengzhen, hanhuifeng, lvwenyuan, lishanni513, hujiahui8, polyhedral, shiliang, wYann, xixixian, xxxxxxw, xuhui, xiaruijie, yangsijia, yiyanzhi, zhangzhaochuang, zhengzuohe

Contributions of any kind are welcome!

# Release 1.1.0

## Major Features and Improvements

* GPU operator improvements
    * Propose a new strategy for reduction operators: the reduce axes are detected and rescheduled as a separate band in the schedule tree and then mapped to blocks; the codegen pass then calls akg_reduce_lib, which uses atomic operations to finish the reduction (see the sketch after this list). Experimental results show that AKG improves execution performance relative to cuDNN for large shapes.
    * Optimize the auto-tiling algorithms, which dramatically improves the performance of reduction operators in most scenarios.
* Support AutoTuning for composite operators on GPU.
* Refactor the composite framework to enable optimization at the DSL level.
* Enhance CSE to support eliminating redundant vmadd on Ascend.
* Update scipy to 1.5.3.

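For illustration only, a minimal CUDA sketch of that reduction shape: each block reduces its slice in shared memory, and one atomic per block combines the partial sums, so no second kernel launch is needed. This is not what AKG or akg_reduce_lib actually emits; the kernel name and the fixed 256-thread block are assumptions, and `out` must be zero-initialized before launch.

```cuda
// Sum-reduce n floats into *out. Launch with blockDim.x == 256.
__global__ void reduce_sum(const float *in, float *out, int n) {
  __shared__ float buf[256];
  int tid = threadIdx.x;
  float v = 0.0f;
  // Grid-stride loop: each thread accumulates its share of the input.
  for (int i = blockIdx.x * blockDim.x + tid; i < n;
       i += blockDim.x * gridDim.x) {
    v += in[i];
  }
  buf[tid] = v;
  __syncthreads();
  // Tree reduction within the block's shared memory.
  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (tid < s) buf[tid] += buf[tid + s];
    __syncthreads();
  }
  // One atomic per block merges the partial sums across blocks.
  if (tid == 0) atomicAdd(out, buf[0]);
}
```

The atomic finish trades a deterministic summation order for a single kernel launch, a trade-off that pays off on large shapes.
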
## Bugfixes

* TensorAdd supports FRACTAL_NZ and DefaultFormat (!228).
* GPU: fix cast fp32 -> uint8 (!216).
* Fix a bug in opt_broadcast (!272).
* Fix vadds for int32 (!250).

## Contributors

Thanks goes to these wonderful people:

chengyun, chendeshi, chenlei_autodiff, gaoxiong, gengzhen, guanxiaowei, hanhuifeng, laekov, luoyin, lvwenyuan, liuchang, lishanni513, lingyunli63, polyhedral, shiliang, wYann, wangrao124, xiaruijie, xixixian, xuhui, 要术甲杰, yiyanzhi_akane, yangshuo, yangsijia, zhangzhaochuang, zhengzuohe, zhangrenwei, zengzitao

Contributions of any kind are welcome!

# Release 1.0.0

## Major Features and Improvements

* GPU Support
    * AKG can now generate GPU CUDA kernels with no user-provided schedule by using polyhedral techniques: the AutoPoly pass automatically creates the initial schedule, tiles the outer bands, maps them to blocks and threads, and performs memory promotion.
    * Add some primitive and fused operators (mostly element-wise and reduce operators), together with the corresponding test cases.
* Schedule-template enhancements
    * Optimize the original TVM schedule templates to get better performance in some reduce cases.
    * Support fusing multiple outputs into one kernel for element-wise operators.
* Davinci enhancements
    * Eliminate unnecessary broadcasts by reordering element-wise computations that satisfy the commutative and associative laws, e.g. `D[i, j] = A[i] + B[i, j] + C[i]` -> `D[i, j] = A[i] + C[i] + B[i, j]` (see the sketch after this list).
    * Enhance the to_three_address pass to match more cases for vmadd.

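A loop-nest view of that rewrite (illustrative only; the function below is hypothetical, not AKG output): evaluated left to right, `A[i] + B[i, j]` is already a rank-2 intermediate, so `C[i]` must be broadcast a second time, whereas regrouping to `(A[i] + C[i]) + B[i, j]` folds the rank-1 terms first and leaves a single broadcast.

```cuda
// D[i][j] = (A[i] + C[i]) + B[i][j] after the reordering.
void elemwise_after(const float *A, const float *B, const float *C,
                    float *D, int m, int n) {
  for (int i = 0; i < m; ++i) {
    float t = A[i] + C[i];             // rank-1 sum, hoisted out of j
    for (int j = 0; j < n; ++j) {
      D[i * n + j] = t + B[i * n + j]; // only one broadcast of t over j
    }
  }
}
```
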
## Bugfixes

* Fix a bug where the random test case segment_max failed (!127).
* Fix the permission-denied error when rewriting a meta_file with the same name (!147).
* Fix the warning for unsupported GPU built-in ops (!148).

## Contributors

Thanks goes to these wonderful people:

baita, ConnZhai, gengzhen, guanxiaowei, hanhuifeng, hujiahui8, laekov, lvwenyuan, lishanni513, lingyunli63, polyhedral, wYann, wangrao124, xixixian, xuhui, 要术甲杰, yiyanzhi_akane, yangsijia, zhengzuohe, zhangrenwei, zengzitao

Contributions of any kind are welcome!

# Release 0.7.0-beta

## Major Features and Improvements

* Backend refactoring
    * Rewrite the instruction-args calculation module in EmitInsn with a new computing strategy based on axis splitting, which improves both performance and code simplicity.

## Bugfixes

* Fix a dump-code error when running GPU operators with the environment variable MS_AKG_DUMP_CODE=ON (!113).

## Contributors

Thanks goes to these wonderful people:

lvwenyuan, shiliang, xuhui, wYann

Contributions of any kind are welcome!

# Release 0.6.0-beta

## Major Features and Improvements

* Refactor AutoPoly to support integrating multiple backend targets easily
    * Employ a pass/passmgr framework to manage all transformations of the ISL schedule tree, in which each transformation (such as InitialSchTree and TileOuterBand) is treated as a pass over the schedule tree; a sketch of this pattern follows this list.
    * Refactor some poly data structures so that they are decoupled from the Davinci chips.
* Backend refactoring
    * Enhance min-alignment analysis with more accurate propagation conditions.
    * Fine-tune pragmas using alignment information before the EmitInsn pass.
    * Simplify the EmitInsn pass by unifying the emit method for different patterns.
* Change the way TVM is used
    * Delete the ktvm repository and reference TVM directly in the source code (third_party/incubator-tvm).
    * Enable GPU operator generation, which had been trimmed out of ktvm.

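The pass/passmgr pattern might look like the sketch below; all type and class names here are hypothetical stand-ins, not AKG's real interfaces. The point is that each schedule-tree transformation implements one interface, so a new backend only changes which passes get registered.

```cuda
#include <memory>
#include <vector>

struct ScheduleTree;  // stand-in for the ISL schedule tree

// Every transformation of the schedule tree is a pass.
class SchedulePass {
 public:
  virtual ~SchedulePass() = default;
  virtual ScheduleTree *Run(ScheduleTree *sch) = 0;
};

class InitialSchTree : public SchedulePass {
 public:
  ScheduleTree *Run(ScheduleTree *sch) override {
    // ... build the initial schedule ...
    return sch;
  }
};

class TileOuterBand : public SchedulePass {
 public:
  ScheduleTree *Run(ScheduleTree *sch) override {
    // ... tile the outer band ...
    return sch;
  }
};

// The manager runs registered passes in order; a backend picks its own list.
class PassMgr {
 public:
  void Register(std::unique_ptr<SchedulePass> pass) {
    passes_.push_back(std::move(pass));
  }
  ScheduleTree *RunAll(ScheduleTree *sch) {
    for (auto &pass : passes_) sch = pass->Run(sch);
    return sch;
  }

 private:
  std::vector<std::unique_ptr<SchedulePass>> passes_;
};
```
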
## Bugfixes

* Fix a wrong-hoist problem in the multicore loop-switch-hoist pass (!87).
* Fix a scalar rearrange bug (!84).
* Fix matmul tuning and support full-space tuning (!73).
* Fix a redefinition error of the variable broadcast_idx when the dma_copy pragma is replaced by opt_broadcast (!45).
* Fix a bug in broadcast_rewrite (!22).
* Fix bugs in multi-core processing (!33).
* Fix a bug where an extra pipe_barrier was inserted in the loop (!30).
* Fix inefficient auto-tiling for axes with tails and remove a duplicated check (!6).

## Contributors

Thanks goes to these wonderful people:

brovensmile, chengyun, chenlei_autodiff, chengbin, ConnZhai, fuxiaoteng, gaoxiong, gengzhen, hanhuifeng, KasonChan, luoyin, lvwenyuan, peiwenfang, xuhui, yangsijia, wangzhuo325, wYann

Contributions of any kind are welcome!

# Release 0.5.0-beta

## Major Features

* Support auto-schedule and code generation on the Ascend platform.
* Provide C++ APIs for the basic operators used in MindSpore.
* Support the Elementwise-Elementwise and Reduce-Elementwise fusion patterns in BERT.
* Support LambUpdateWithLR, LambNextMv, and BatchMatmul optimizations for BERT.

## Initial Version

* Upload the initial framework.
* Basic support for the Ascend 910 platform and GPU V100.
* Integration with the GraphKernel fusion of MindSpore.