i-robot
d0792507bd
!26736 [MS][LITE]support parameter cache and distribution predict
Merge pull request !26736 from 张学同/distribution_cache_dev
4 years ago
wanyiming
d559b686fe
mod_pad
4 years ago
i-robot
953acc0335
!26672 Use GPU mem Allocator and workspace instead of self allocator
Merge pull request !26672 from wuwenbing/master
4 years ago
i-robot
a78b4fd60f
!26722 vm err log modify
Merge pull request !26722 from zhaosida/code_clean_master
4 years ago
i-robot
fa5ea7b3a6
!26370 DynamicRNNGrad support `hidden_size not multiple of 16` scene
Merge pull request !26370 from yuchaojie/ir_fusion4
4 years ago
zhangxuetong
9ef0fa84bb
support parameter cache and distribution predict
4 years ago
zhaosida
8759ac41ad
fix err log
4 years ago
VectorSL
710289a72d
add tensor array
4 years ago
i-robot
cfc6ea32ff
!24714 replace rtmemcpyxx with acl memcpy
Merge pull request !24714 from jjfeing/br_replace_rtmemcpyxx_with_acl_api
4 years ago
wenbean
31053edbe4
Use Allocator and workspace to pre-allocate mem in GPU
4 years ago
i-robot
30d182ac18
!26626 fix reduce ops axis multiple bug in GPU
Merge pull request !26626 from zhangbuxue/fix_reduce_ops_axis_multiple_bug_in_GPU
4 years ago
i-robot
3d0b785241
!26680 Decouple GraphKernelCluster from ME backend
Merge pull request !26680 from DeshiChen/1122_cluster
4 years ago
i-robot
b1d878ca6b
!26648 add more log to locate op compile failed reason
Merge pull request !26648 from liubuyu/SB
4 years ago
i-robot
3fc995a6ae
!26664 Add support float64 as input type for ReduceProd GPU op.
Merge pull request !26664 from hezhenhao1/add_prod
4 years ago
lby
47dbd2dd9c
add more log to locate compile failed reason
4 years ago
dayschan
2038295a25
Decouple GraphKernelCluster from ME backend
Changed the callback function GetProcessorFromContext to GetTargetFromContext
so that it can be used to filter the clusterable op list, and added GetProcessorByTarget to AkgKernelJsonGenerator.
Moved the functions IsKeepBasicNode, GetValidOps and OpListFilter from graph_kernel_helper
to graph_kernel_utils, and combined GetValidOps and OpListFilter.
Decoupled the pass getitem_tuple from "optimizer/common/helper.h" by deleting the check
of the input size; cnode->input(i) already checks the input index.
4 years ago
buxue
89a688f3be
fix reduce ops axis multiple bug in GPU
4 years ago
i-robot
821aba8907
!26245 sync commercial pclint clean
Merge pull request !26245 from zhaodezan/master
4 years ago
yuchaojie
b760eba23a
DynamicRNNGrad support `hidden_size not multiple of 16` scene
4 years ago
hezhenhao1
accc6368aa
Add support float64 as input type for ReduceProd GPU op.
4 years ago
i-robot
69b9598f49
!26225 Support akg kernel json generation of multi output ops
Merge pull request !26225 from zichun_ye/akg_json_multi_output
4 years ago
i-robot
24d4eac7a9
!26598 refactor rank cpu operator
Merge pull request !26598 from zhujingxuan/code_style
4 years ago
i-robot
a04fdd04c9
!26605 Fix gpu mem leak bug, add cuda memmalloc result check
Merge pull request !26605 from wuwenbing/master
4 years ago
i-robot
0d11abc7c2
!26518 tag environment implement
Merge pull request !26518 from chenweifeng/tag-env-implement
4 years ago
i-robot
1a7a04e4c9
!25132 [PyNative][MindRT][GPU] Op Lazy Build
Merge pull request !25132 from caifubi/master-pynative-mindrt-gpu-async-build
4 years ago
jjfeing
05485d991c
replace api with acl api
4 years ago
i-robot
56e61892bf
!26573 [MS][LITE][develop] optimize deconv fp16 ram
Merge pull request !26573 from sunsuodong/fix_deconv_winograd_fp16
4 years ago
caifubi
38352c1ba8
PyNative MindRT Op Lazy Build
4 years ago
i-robot
69c4f470e4
!26546 Unify GPU/CPU ops input/output (col/row major), modify related test cases, add linalg function and test cases
Merge pull request !26546 from wuwenbing/master
4 years ago
zhujingxuan
79d08508e8
refactor
4 years ago
sunsuodong
47785a8f12
fix_deconv_winograd_fp16
4 years ago
wilfChen
79b37042aa
tag environment implement
4 years ago
Zichun Ye
996e7c39b3
support generating json for ops with multiple outputs in akg
fix typo from code check
fix namespace error
add CollectFusedJsonWithSingleKernel to generate json for customp[
fix typo
fix typo
drop changes in CreateOutputsJson
4 years ago
i-robot
1b7a38ccd8
!26329 Add exchangeKeys ops and getKeys ops for cross-silo STABLE_PW_ENCRYPT
Merge pull request !26329 from jxlang910/master
4 years ago
i-robot
3269c9b881
!26335 Support MindSpore on MacOS
Merge pull request !26335 from xulei/ms_mac_compile_br
4 years ago
wenbean
26d4bf6350
Fix mem leak bug, add result check
4 years ago
i-robot
302b2ba3ee
!26485 [MS][LITE][develop] add fp16 kernel
Merge pull request !26485 from sunsuodong/add_fp16_kernel
4 years ago
i-robot
bc34adc3e3
!26516 refine the log
Merge pull request !26516 from liubuyu/master
4 years ago
zhaozhenlong
e885bce606
cast between fp16 and int8
4 years ago
wenbean
13409f519f
Unify GPU/CPU ops input/output (col/row major), modify related test cases, add linalg function and test cases
4 years ago
sunsuodong
bf208e243e
add fp16 kernel
4 years ago
lby
63bc0ae7ff
operator whitelist
4 years ago
i-robot
b5c02a4ee0
!26426 gpu environment kernel
Merge pull request !26426 from chenweifeng/gpu-environment-kernel
4 years ago
i-robot
45947b51ba
!26312 add grad support dynamic shape
Merge pull request !26312 from wangnan39/add_grad_support_dynamic_shape
4 years ago
i-robot
d837e2c969
!26429 add cpu op fp64 register to fix bugs of gmres for backend cpu
Merge pull request !26429 from zhuzhongrui/gmres
4 years ago
jin-xiulang
f972de90d0
Add exchangeKeys ops and getKeys ops for STABLE_PW_ENCRYPT
fix review suggestions
4 years ago
i-robot
f72820404c
!26282 Add GPU eigenvalue/eigenvector ops for real and complex
Merge pull request !26282 from wuwenbing/master
4 years ago
i-robot
8805939be4
!26444 optimize matmul broadcast
Merge pull request !26444 from wangyanling/optimizematmul
4 years ago
wilfChen
68260a6a94
gpu environment kernel implement
4 years ago
王南
f082f4ccec
add grad support dynamic
4 years ago