i-robot
c2212f88b4
!31164 Fix missing AllReduce insertion for global norm
Merge pull request !31164 from huangxinjing/fx_global_norm_error
4 years ago
i-robot
bf03f0e030
!31252 Implementation of element wise parallel ops
Merge pull request !31252 from liuluobin/element_wise_ops
4 years ago
liuluobin
f13d342986
Implementation of element wise parallel ops
4 years ago
huangxinjing
31f55b6525
1. The main goal: fix missing insertion of the AllReduce when no mirror operator appears
2. Remove the pattern-match error, as the original pattern match finds no operator if there is only one parameter
4 years ago
yangzhenzhang
1f98ffb79c
adafactor parallel: skip handling reshape
4 years ago
i-robot
dcb5cd670c
!30953 Dynamic Weight Decay
Merge pull request !30953 from wanyiming/dynamic_wd
4 years ago
lilei
690c58ebcf
fix VirtualDataset bug for master
4 years ago
i-robot
c2a5cc1486
!31040 Produce parallel operators for Argmin/max, SquareSumAll and UnsortedSegmentProd
Merge pull request !31040 from Bert0108/reduce_operators_arg
4 years ago
i-robot
216e7c6a92
!31041 add check for conv2d
Merge pull request !31041 from yangzhenzhang/add-check-for-conv2d
4 years ago
yangzhenzhang
c00d29f223
rebase
4 years ago
liuluobin
8f045d02e3
Fix a bug where ROIAlign and CropAndResize distributed op do not support GPU
4 years ago
Bert0108
bfc5e4345c
add distributed operators for ArgMax/ArgMin, SquareSumAll and UnsortedSegmentProd
4 years ago
wanyiming
a124ec4de7
add dynamic_decay
4 years ago
i-robot
335ef1c270
!30459 Add ut validate function for parallel
Merge pull request !30459 from liuluobin/ut_master
4 years ago
liuluobin
b797a410cc
Add validate function for parallel ut
4 years ago
Bert0108
dfc92f1791
add distributed parallel operators for reduceall and reduceprod
4 years ago
yao_yf
b60e54e0d5
support not only powers of 2
4 years ago
wangjun
46612fabfb
add st for shard
4 years ago
i-robot
ad9757ccf0
!30661 [Auto parallel] [MoE] Fix an error of configuring MoE parallel
Merge pull request !30661 from Xiaoda/124-moe-changes
4 years ago
Xiaoda Zhang
81e5abe580
fix an error of configuring parallel
4 years ago
huangxinjing
896daee845
[AutoParallel]Fix insert error for the mirror
4 years ago
yangzhenzhang
43e6e16da3
check platform for resizebilinear
4 years ago
i-robot
0341d96dd6
!30469 add shard function to support part of the graph executed in auto_parallel under pynative mode
Merge pull request !30469 from wangjun/0223_pp
4 years ago
i-robot
cfe0f76d2b
!30491 ut for allgather fusion
Merge pull request !30491 from jiahongQian/master
4 years ago
wangjun
24d448239c
add pynative_parallel
4 years ago
i-robot
981eae461a
!30118 Automatic optimizer parallel feature
Merge pull request !30118 from zhuyuxiao/I4S85V
4 years ago
jiahongQian
25f57505bf
ut for allgather fusion
4 years ago
i-robot
bbcfbce9e0
!29997 [Auto parallel] [MoE] Support data_parallel + expert_parallel
Merge pull request !29997 from Xiaoda/124-moe-changes
4 years ago
zhuyuxiao
d0e0e305d3
good
4 years ago
i-robot
f2130e7434
!30483 [AutoParallel]Pipeline Automatic detection Opt
Merge pull request !30483 from lichen/pipeline_opt_detection
4 years ago
yao_yf
e21f878e14
adasum ut fix
4 years ago
Xiaoda Zhang
b714451937
implementing expert_parallel+data_parallel in MoE:
1) extending _Linear's input to a 4-dimensional tensor [outer_batch, expert_dim, -1, hidden], so that _Linear's BatchMatMul becomes BatchMatMul(4_dim_tensor, 3_dim_tensor);
2) configuring the _Linear's BatchMatMul sharding strategy as [[dp, ep, 1, 1], [ep, 1, mp]];
3) introducing a new parameter 'expert_parallel' in TransformerOpParallelConfig, creating a new class MoEParallelConfig to include 'data_parallel', 'model_parallel' and 'expert_parallel';
4) changing parallel config for FeedForward, TransformerEncoderLayer, TransformerDecoderLayer.
4 years ago
wangshengnan12@huawei.com
acbefd80ea
pipeline_opt_detection
4 years ago
i-robot
81260a2319
!30466 takedown test_auto_parallel_adasum.py to ensure stability, again
Merge pull request !30466 from yanghaoran/master
4 years ago
i-robot
14393503b7
!30431 allreduce allgather fusion
Merge pull request !30431 from jiahongQian/master
4 years ago
yanghaoran
71d6b7d506
takedown test_auto_parallel_adasum.py to ensure stability, again
4 years ago
i-robot
2e8eac8341
!30367 auto_parallel_adasum_support_data_parallel
Merge pull request !30367 from yao_yf/auto_parallel_adasum_support_data_parallel
4 years ago
jiahongQian
8a2151d8bb
allgather reducescatter fusion
4 years ago
i-robot
5bee7156b9
!30369 add_virtualdataset_ut
Merge pull request !30369 from lilei/add_virtualdataset_ut
4 years ago
yao_yf
19236b1a70
auto parallel adasum support data parallel and hybrid parallel
4 years ago
huangxinjing
092ba035e3
Add global norm parallel support
4 years ago
yanghaoran
bfe139b662
takedown test_auto_parallel_adasum.py to ensure gate stability
4 years ago
lilei
bc62e24d94
add_virtualdataset_ut
4 years ago
i-robot
94c8c6355c
!30294 auto_parallel_adasum_checks_and_ut.
Merge pull request !30294 from yao_yf/auto_parallel_adasum_checks_and_ut
4 years ago
huangxinjing
5e325ac336
[AUTO_PARALLEL]Fix insert nodes error
4 years ago
i-robot
7386612515
!29820 moe_topk routing
Merge pull request !29820 from wangshengnan123/moe_topk_routing
4 years ago
yao_yf
4b79d4c425
auto parallel adasum uts and checks
4 years ago
wangshengnan123
7322426648
top_k routing
4 years ago
i-robot
48d4f34576
!30167 Add UT case for 'Convolution+Transformer' structure
Merge pull request !30167 from Bert0108/ut_conformer
4 years ago
Bert0108
25a9c73a08
add ut case for monitoring the conformer structure
4 years ago