mindspore-ci-bot
da93717c5d
!14682 GPU set tensor limitation to 2G
From: @VectorSL
Reviewed-by: @cristoval,@wilfchen
Signed-off-by: @cristoval
5 years ago
mindspore-ci-bot
83b25e10e9
!13009 [debugger] offline debug feature
From: @islam_amin
Reviewed-by:
Signed-off-by:
5 years ago
VectorSL
7b9b84d651
add tensor size limitation to 2G
5 years ago
John Tzanakakis
da3b13a0e1
Offline debugger
Authors: John Tzanakakis, Adel Shafiei, Amir Lashkari, Islam Amin
5 years ago
hwjiaorui
dac67cbabb
clean code
5 years ago
lizhenyu
cf2244f1ef
[bugfix] Device id not set in asynchronous compile and run graph
5 years ago
mindspore-ci-bot
825ce95756
!13832 fix a bug where cuda errors show the wrong name in the log
From: @hanhuifeng2020
Reviewed-by: @dylangeng,@anyrenwei
Signed-off-by: @anyrenwei
5 years ago
mindspore-ci-bot
e9d57490c5
!14155 [MS][RDR] fix fps degradation of gpu training in master branch
From: @louie5
Reviewed-by: @ouwenchang,@lixiaohui33
Signed-off-by: @lixiaohui33
5 years ago
wilfChen
e1d443efe3
tensor-rt library dynamic loading
5 years ago
louei5
f23ce6c7d9
optimize record gpu memory information
5 years ago
mindspore-ci-bot
a48785cdcc
!14052 add op atomic clean to clear input addr in launch allreduce
From: @lvchangquan
Reviewed-by: @kisnwang,@chujinjin
Signed-off-by: @chujinjin
5 years ago
lizhenyu
3f9d9c5b2e
add error log when set device id failed
5 years ago
lvchangquan
0a7df321fe
add op atomic clean to clear input addr in launch allreduce
5 years ago
dayschan
11ee3b1624
add context graph_kernel_flags
use the flag "opt_level" to control GraphKernel:
0 means disabled, while a non-zero value means enabled.
The default value is controlled by the context option "enable_graph_kernel",
but if "opt_level" is also set in "graph_kernel_flags", the flag prevails.
Support whitelist and blacklist operators for GraphKernelExpander:
"enable_expand_ops", "enable_expand_ops_only", "disable_expand_ops".
5 years ago
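For context, the flags described in the commit above would be passed through the MindSpore context. A minimal sketch (assuming a MindSpore build that includes this commit; the exact op names in the blacklist are illustrative, not from the source):

```python
import mindspore.context as context

# Enable GraphKernel via graph_kernel_flags: a non-zero opt_level enables it
# and overrides the enable_graph_kernel context default; disable_expand_ops
# blacklists specific operators from GraphKernelExpander.
context.set_context(
    graph_kernel_flags="--opt_level=2 --disable_expand_ops=BatchNorm"
)
```

This is a config fragment only; it requires a GPU/Ascend MindSpore installation to take effect.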
hanhuifeng2020
9629977242
fix a bug that cuda error show wrong name in log
5 years ago
mindspore-ci-bot
5fd3d140b6
!13344 add DeviceContext module
From: @zyli2020
Reviewed-by:
Signed-off-by:
5 years ago
mindspore-ci-bot
8e8f3043f9
!12115 IR operators of GPU and CPU are unified as batchnorm
From: @ding_fei_fei
Reviewed-by:
Signed-off-by:
5 years ago
lizhenyu
95565aa7b8
add hardware abstract layer
5 years ago
dingpeifei
87e41aaeee
IR operators of GPU and CPU are unified as batchnorm
5 years ago
mindspore-ci-bot
defcc51641
!13304 refactor RDR to support single name
From: @luopengting
Reviewed-by: @ouwenchang,@lixiaohui33
Signed-off-by: @lixiaohui33
5 years ago
luopengting
c8ba7694c5
refactor RDR to support single name
1. support single name
2. add hash method for pair
3. make the constructor and destructor of MemAddressInfo public
4. remove graph_id
5. modify interval for somas info
5 years ago
mindspore-ci-bot
eb1c0310a9
!13307 GPU fix shared_ptr in GpuKernel
From: @VectorSL
Reviewed-by: @cristoval,@chujinjin
Signed-off-by: @chujinjin
5 years ago
mindspore-ci-bot
77cda67b3f
!13012 add mul fusion based on allreduce fusion
From: @lvchangquan
Reviewed-by:
Signed-off-by:
5 years ago
lvchangquan
31f9e6a42c
add op_mul fusion based on allreduce fusion in pynative mode
5 years ago
VectorSL
36e11ae17c
fix GPUKernelMod usage of shared_ptr
5 years ago
limingqi107
a046a5eb43
optimize GPU format transform
5 years ago
mindspore-ci-bot
a21c8e13b5
!13010 Add device id log
From: @zpac
Reviewed-by: @cristoval,@wilfchen
Signed-off-by: @cristoval
5 years ago
tanghuikang
dac64f30ee
Support ms_function + heterogeneous
5 years ago
ZPaC
f2edee750a
Add device id log
5 years ago
wenfangpei
d6b3a07b4a
parallel build of gpu ops for graph kernel
5 years ago
mindspore-ci-bot
2f312dac66
!12091 Performance optimization for PyNative AllReduce
From: @jojobugfree
Reviewed-by:
Signed-off-by:
5 years ago
mindspore-ci-bot
4365c332e6
!12813 unify AvgPoolGrad's MindIR
From: @yuchaojie
Reviewed-by: @kisnwang
Signed-off-by:
5 years ago
yuchaojie
d2cb3aa1c2
unify AvgPoolGrad
5 years ago
louei5
99203038a5
support recording gpu memory information and graph execute order
5 years ago
caifubi
171b468bb3
PyNative AllReduce Bucket
5 years ago
mindspore-ci-bot
50542793c8
!12077 optimize gpu backend logger
From: @wilfchen
Reviewed-by: @cristoval,@limingqi107
Signed-off-by: @limingqi107
5 years ago
wilfChen
58196f1faf
modify gpu backend logger
5 years ago
zuochuanyong
3fa26683ac
nlp perf(Pynative): change memory sync mode from synchronous to asynchronous in SyncHostToDevice
5 years ago
He Wei
7d9a783993
[auto-monad] Support side-effects by auto-monad
The basic idea is to exploit data dependencies to control the execution order
of side-effect operations while keeping the semantics of ANF unchanged.
The ControlDepend primitive is removed and two new primitives are added:
1. UpdateState:
```
a = Assign(para, value)
```
becomes:
```
a = Assign(para, value, u)
u = UpdateState(u, a)
```
2. Load:
```
x = Add(para, value)
```
becomes:
```
p = Load(para, u)
x = Add(p, value)
u = UpdateState(u, p)
```
5 years ago
lizhenyu
6649153c49
add input data type check for ps cache mode
5 years ago
limingqi107
366f3e668d
optimize the memory alloc error info
5 years ago
mindspore-ci-bot
4364abc7ee
!11798 Support RunOpsInGraph on CPU&GPU in pynative mode
From: @HulkTang
Reviewed-by:
Signed-off-by:
5 years ago
lizhenyu
f17534af08
ps cache support sparse
5 years ago
tanghuikang
6f2cd92aba
Support RunOpsInGraph on CPU&GPU in pynative mode
5 years ago
mindspore-ci-bot
03f88c6f44
!11271 Fix the bug where step_trace cannot get the step_trace_point name in the callback scenario
From: @gzhcv
Reviewed-by: @ouwenchang,@lilongfei15
Signed-off-by: @lilongfei15
5 years ago
mindspore-ci-bot
d8323b5d51
!11342 Support device memory profiling
From: @yanghaitao1
Reviewed-by: @wangyue01,@lilongfei15
Signed-off-by: @lilongfei15
5 years ago
yuchaojie
1932d87a26
update some op's attr name
5 years ago
yanghaitao1
8d147deb07
profiler memory
5 years ago
gzhcv
ce66e8cf4e
Fix the bug where step_trace cannot get the step_trace_point name in the callback scenario
5 years ago
chujinjin
ade9a82c2b
fix device memory leak
5 years ago