mindspore-ci-bot
84607e3a51
!15890 fix an allreduce calculation bug in pynative mode
From: @lvchangquan
Reviewed-by:
Signed-off-by:
4 years ago
lvchangquan
0b09fdf94c
fix an allreduce bug caused by a two-stream sync problem
4 years ago
dayschan
c688116f9d
move the akg kernel build timer into AkgKernelBuilder::AkgKernelParallelBuild, so that it can time the Ascend kernel builder
4 years ago
John Tzanakakis
89c78069ce
use host_type vs deprecated type_id
4 years ago
limingqi107
179c677fef
fix graph output address setting in the one-time memory allocation scenario
4 years ago
mindspore-ci-bot
25ce6a104e
!15600 add contiguous memory allocation for communication kernels in the actor runtime
From: @limingqi107
Reviewed-by: @wilfchen,@cristoval
Signed-off-by: @cristoval
4 years ago
limingqi107
fba1dd8f2f
add contiguous memory allocation for communication kernels in the actor runtime
4 years ago
wilfChen
ba9bbfadf8
gpu inference mixed precision
4 years ago
mindspore-ci-bot
a000f39764
!15552 use host shape instead of device shape for debugger
From: @john_tzanakakis
Reviewed-by: @yelihua,@pandoublefeng
Signed-off-by: @pandoublefeng
4 years ago
zengzitao
1fd87c6d83
add address state to fix a cache problem when graph kernel is enabled
4 years ago
John Tzanakakis
ddae425e0c
use host instead of device shape for debugger
4 years ago
limingqi107
b3a5ccebc3
fix codedex
4 years ago
mindspore-ci-bot
4bdad7ef49
!14802 Avoid calling SetDevice too many times in PyNative mode
From: @jojobugfree
Reviewed-by: @kisnwang,@zhoufeng54
Signed-off-by: @zhoufeng54
5 years ago
caifubi
e76e7d4a27
Fix bug of wrong cuda device id
5 years ago
mindspore-ci-bot
da93717c5d
!14682 GPU set tensor limitation to 2G
From: @VectorSL
Reviewed-by: @cristoval,@wilfchen,@cristoval
Signed-off-by: @cristoval
5 years ago
mindspore-ci-bot
83b25e10e9
!13009 [debugger] offline debug feature
From: @islam_amin
Reviewed-by:
Signed-off-by:
5 years ago
VectorSL
7b9b84d651
add tensor size limitation to 2G
5 years ago
John Tzanakakis
da3b13a0e1
Offline debugger
Authors: John Tzanakakis, Adel Shafiei, Amir Lashkari, Islam Amin
5 years ago
hwjiaorui
dac67cbabb
clean code
5 years ago
lizhenyu
cf2244f1ef
[bugfix] Device id not set in asynchronous compile and run graph
5 years ago
mindspore-ci-bot
825ce95756
!13832 fix a bug that cuda error shows the wrong name in the log
From: @hanhuifeng2020
Reviewed-by: @dylangeng,@anyrenwei
Signed-off-by: @anyrenwei
5 years ago
mindspore-ci-bot
e9d57490c5
!14155 [MS][RDR] fix fps degradation of gpu training in master branch
From: @louie5
Reviewed-by: @ouwenchang,@lixiaohui33
Signed-off-by: @lixiaohui33
5 years ago
wilfChen
e1d443efe3
tensor-rt library dynamic loading
5 years ago
louei5
f23ce6c7d9
optimize recording of gpu memory information
5 years ago
mindspore-ci-bot
a48785cdcc
!14052 add op atomic clean to clear input addr in launch allreduce
From: @lvchangquan
Reviewed-by: @kisnwang,@chujinjin
Signed-off-by: @chujinjin
5 years ago
lizhenyu
3f9d9c5b2e
add error log when set device id failed
5 years ago
lvchangquan
0a7df321fe
add op atomic clean to clear input addr in launch allreduce
5 years ago
dayschan
11ee3b1624
add context graph_kernel_flags
use the flag "opt_level" to control GraphKernel:
0 means disabled, while a non-zero value means enabled.
The default value is controlled by the context "enable_graph_kernel",
but if "opt_level" is also set in "graph_kernel_flags", the flag prevails.
Support whitelist and blacklist operators for GraphKernelExpander:
"enable_expand_ops", "enable_expand_ops_only", "disable_expand_ops".
5 years ago
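The precedence described in the graph_kernel_flags commit above (an "opt_level" in the flag string overrides the "enable_graph_kernel" context switch) can be sketched as a small helper. This is an illustrative sketch, not MindSpore's implementation; the function name and flag syntax (`--opt_level=N`) are assumptions based on the commit message.

```python
def graph_kernel_enabled(enable_graph_kernel: bool, graph_kernel_flags: str = "") -> bool:
    """Decide whether GraphKernel is on, mirroring the precedence in the
    commit message: if "opt_level" appears in graph_kernel_flags it prevails;
    otherwise fall back to the enable_graph_kernel context switch."""
    for token in graph_kernel_flags.split():
        if token.startswith("--opt_level="):
            # opt_level 0 disables GraphKernel; any non-zero value enables it.
            return int(token.split("=", 1)[1]) != 0
    return enable_graph_kernel
```

For example, `graph_kernel_enabled(True, "--opt_level=0")` is False: the explicit flag wins over the context default.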
hanhuifeng2020
9629977242
fix a bug that cuda error shows the wrong name in the log
5 years ago
mindspore-ci-bot
5fd3d140b6
!13344 add DeviceContext module
From: @zyli2020
Reviewed-by:
Signed-off-by:
5 years ago
mindspore-ci-bot
8e8f3043f9
!12115 IR operators of GPU and CPU are unified as batchnorm
From: @ding_fei_fei
Reviewed-by:
Signed-off-by:
5 years ago
lizhenyu
95565aa7b8
add hardware abstract layer
5 years ago
dingpeifei
87e41aaeee
IR operators of GPU and CPU are unified as batchnorm
5 years ago
mindspore-ci-bot
defcc51641
!13304 refactor RDR to support single name
From: @luopengting
Reviewed-by: @ouwenchang,@lixiaohui33
Signed-off-by: @lixiaohui33
5 years ago
luopengting
c8ba7694c5
refactor RDR to support single name
1. support single name
2. add hash method for pair
3. make the constructor and destructor of MemAddressInfo public
4. remove graph_id
5. modify interval for somas info
5 years ago
mindspore-ci-bot
eb1c0310a9
!13307 GPU fix shared_ptr in GpuKernel
From: @VectorSL
Reviewed-by: @cristoval,@chujinjin
Signed-off-by: @chujinjin
5 years ago
mindspore-ci-bot
77cda67b3f
!13012 add mul fusion based on allreduce fusion
From: @lvchangquan
Reviewed-by:
Signed-off-by:
5 years ago
lvchangquan
31f9e6a42c
add op_mul fusion based on allreduce fusion in pynative mode
5 years ago
VectorSL
36e11ae17c
fix the use of shared_ptr in GpuKernel
5 years ago
limingqi107
a046a5eb43
optimize GPU format transform
5 years ago
mindspore-ci-bot
a21c8e13b5
!13010 Add device id log
From: @zpac
Reviewed-by: @cristoval,@wilfchen
Signed-off-by: @cristoval
5 years ago
tanghuikang
dac64f30ee
Support ms_function + heterogeneous
5 years ago
ZPaC
f2edee750a
Add device id log
5 years ago
wenfangpei
d6b3a07b4a
parallel build gpu ops about graph kernel
5 years ago
mindspore-ci-bot
2f312dac66
!12091 Performance optimization for PyNative AllReduce
From: @jojobugfree
Reviewed-by:
Signed-off-by:
5 years ago
mindspore-ci-bot
4365c332e6
!12813 unify AvgPoolGrad's MindIR
From: @yuchaojie
Reviewed-by: @kisnwang
Signed-off-by:
5 years ago
yuchaojie
d2cb3aa1c2
unify AvgPoolGrad
5 years ago
louei5
99203038a5
support recording gpu memory information and graph execute order
5 years ago
caifubi
171b468bb3
PyNative AllReduce Bucket
5 years ago
mindspore-ci-bot
50542793c8
!12077 optimize gpu backend logger
From: @wilfchen
Reviewed-by: @cristoval,@limingqi107
Signed-off-by: @limingqi107
5 years ago