mindspore2022

584 MB

Tree: cdd3e50e98

Author	SHA1	Message	Date
b00518648	93da6bab46	fix bugs of moe: only use a fewer dp in moe	4 years ago
Xiaoda Zhang	81e5abe580	fix an error of configuring parallel	4 years ago
Xiaoda Zhang	b714451937	implementing expert_parallel+data_parallel in MoE: 1) extending _Linear's input to a 4-dimensional tensor: [outer_batch, expert_dim, -1, hidden], so that _Linear's BatchMatMul becomes BatchMatMul(4_dim_tensor, 3_dim_tensor); 2) configuring the sharding strategy of _Linear's BatchMatMul as [[dp, ep, 1, 1], [ep, 1, mp]]; 3) introducing a new parameter 'expert_parallel' in TransformerOpParallelConfig, and creating a new class MoEParallelConfig to include 'data_parallel', 'model_parallel' and 'expert_parallel'; 4) changing the parallel config for FeedForward, TransformerEncoderLayer, TransformerDecoderLayer.	4 years ago
wangshengnan123	7322426648	top_k routing	4 years ago
linqingke	acde7febef	update pangu reshape and softmax performance; add layer norm judge; fix layer norm name error; fix input type check; fix UT test; add 3D support	4 years ago
huanghui	ba66c0d491	add security isolate for save_graphs	4 years ago
Xiaoda Zhang	5613c0b974	add a moe implementation: 1) extend the Linear cell to include a BatchMatMul implementation, in which the first dimension indicates the expert number; 2) implement a Switch (top1) router; 3) implement a MoE cell, which extends the FeedForward cell.	4 years ago
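
The shapes named in commit b714451937 can be illustrated with a minimal sketch. It is written in plain Python rather than with MindSpore operators, so the indexing is explicit; it is not the repository's code. The helper names `expert_batch_matmul` and `shard_shape` are hypothetical, and the reading of a sharding strategy as "split each dimension into that many slices" is an assumption about how [[dp, ep, 1, 1], [ep, 1, mp]] is meant to be applied.

```python
def expert_batch_matmul(x, w):
    """Multiply a 4-dim input by a 3-dim per-expert weight.

    x is nested lists shaped [outer_batch][expert][tokens][hidden];
    w is nested lists shaped [expert][hidden][out].
    The result is shaped [outer_batch][expert][tokens][out], and the
    tokens routed to expert e are only ever multiplied by w[e].
    """
    return [
        [
            [
                [sum(tok[h] * w[e][h][o] for h in range(len(tok)))
                 for o in range(len(w[e][0]))]
                for tok in tokens
            ]
            for e, tokens in enumerate(batch)
        ]
        for batch in x
    ]


def shard_shape(shape, strategy):
    """Per-device slice shape, assuming each dimension is split evenly
    into the number of slices given by the strategy (an assumption)."""
    if len(shape) != len(strategy):
        raise ValueError("strategy must have one entry per dimension")
    if any(dim % parts for dim, parts in zip(shape, strategy)):
        raise ValueError("every dimension must divide evenly")
    return [dim // parts for dim, parts in zip(shape, strategy)]


# Two outer batches, two experts, one token each, hidden size 2.
x = [[[[1, 2]], [[3, 4]]],
     [[[5, 6]], [[7, 8]]]]
# Expert 0 uses the identity matrix, expert 1 doubles its input.
w = [[[1, 0], [0, 1]],
     [[2, 0], [0, 2]]]
y = expert_batch_matmul(x, w)

# Illustrative sizes only: dp=2, ep=4, mp=2.
dp, ep, mp = 2, 4, 2
input_slice = shard_shape([8, 4, 16, 32], [dp, ep, 1, 1])
weight_slice = shard_shape([4, 32, 64], [ep, 1, mp])
```

Under these assumptions each device holds the tokens and weights of a single expert (`ep` splits the expert dimension of both operands), while `dp` splits the outer batch and `mp` splits the output features of the weight.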