peixu_ren cccb230f7b Add random normal cuda implementation on GPU 5 years ago
adam_impl.cu add adam optimizer 5 years ago
adam_impl.cuh add adam optimizer 5 years ago
adam_weight_decay_impl.cu Gpu Adam Fusion 5 years ago
adam_weight_decay_impl.cuh Gpu Adam Fusion 5 years ago
argmax_impl.cu initial version 6 years ago
argmax_impl.cuh initial version 6 years ago
argmaxwithvalue_impl.cu update argmaxwithvalue 5 years ago
argmaxwithvalue_impl.cuh update argmaxwithvalue 5 years ago
assign_add_impl.cu initial version 6 years ago
assign_add_impl.cuh initial version 6 years ago
batchnorm_fold2_impl.cu add quantizaiton gpu op 5 years ago
batchnorm_fold2_impl.cuh add quantizaiton gpu op 5 years ago
batchnorm_fold_impl.cu add quantizaiton gpu op 5 years ago
batchnorm_fold_impl.cuh add quantizaiton gpu op 5 years ago
broadcast_grad_impl.cu Gpu Minimum & Maximum kernels support int32 5 years ago
broadcast_grad_impl.cuh gpu support MinimumGrad & MaximumGrad kernel 5 years ago
broadcast_impl.cu gpu support bert finetune 5 years ago
broadcast_impl.cuh gpu support bert finetune 5 years ago
concatv2_impl.cu gpu concat kernel support 4 inputs 5 years ago
concatv2_impl.cuh gpu concat kernel support 4 inputs 5 years ago
correction_mul_impl.cu add quantizaiton gpu op 5 years ago
correction_mul_impl.cuh add quantizaiton gpu op 5 years ago
cross_entropy_impl.cu Add protection in cross entropy kernel. 5 years ago
cross_entropy_impl.cuh fix bug in cross entropy error 5 years ago
dropout_impl.cu gpu dropout rewrite 5 years ago
dropout_impl.cuh gpu dropout rewrite 5 years ago
equalcount_impl.cu initial version 6 years ago
equalcount_impl.cuh initial version 6 years ago
fake_quant_perchannel_impl.cu add fake quant test case for gpu 5 years ago
fake_quant_perchannel_impl.cuh add fake quant test case for gpu 5 years ago
fake_quant_perlayer_impl.cu add fake quant test case for gpu 5 years ago
fake_quant_perlayer_impl.cuh add fake quant test case for gpu 5 years ago
float_status_impl.cu gpu add float_status kernel 5 years ago
float_status_impl.cuh gpu add float_status kernel 5 years ago
ftrl_impl.cu add ftrl optimizer 5 years ago
ftrl_impl.cuh add ftrl optimizer 5 years ago
gather.cu initial version 6 years ago
gather.cuh add quantizaiton gpu op 5 years ago
gelu_impl.cu gpu Gelu kernel support fp16 5 years ago
gelu_impl.cuh gpu support Gelu & GeluGrad kernels 5 years ago
layer_norm_grad_impl.cu gpu momentum layernorm layernormgrad support fp16 5 years ago
layer_norm_grad_impl.cuh Gpu support LayerNorm kernel 5 years ago
layer_norm_impl.cu gpu momentum layernorm layernormgrad support fp16 5 years ago
layer_norm_impl.cuh gpu momentum layernorm layernormgrad support fp16 5 years ago
minmax_update_impl.cu fix perchannel num_channels not set bug and adjust quant.py params order 5 years ago
minmax_update_impl.cuh fix perchannel num_channels not set bug and adjust quant.py params order 5 years ago
momentum_impl.cu gpu momentum layernorm layernormgrad support fp16 5 years ago
momentum_impl.cuh gpu momentum layernorm layernormgrad support fp16 5 years ago
one_hot_impl.cu initial version 6 years ago
one_hot_impl.cuh initial version 6 years ago
pad_impl.cu initial version 6 years ago
pad_impl.cuh initial version 6 years ago
random_op_impl.cu Add random normal cuda implementation on GPU 5 years ago
random_op_impl.cuh Add random normal cuda implementation on GPU 5 years ago
rmsprop_impl.cu fixed validator for CumProd, ReduceProd, ApplyRMSProp 5 years ago
rmsprop_impl.cuh fixed validator for CumProd, ReduceProd, ApplyRMSProp 5 years ago
select_impl.cu gpu add kernel select 5 years ago
select_impl.cuh gpu add kernel select 5 years ago
sigmoid_cross_entropy_with_logits_grad_impl.cu add SigmoidCrossEntropyWithLogitsGrad op 5 years ago
sigmoid_cross_entropy_with_logits_grad_impl.cuh add SigmoidCrossEntropyWithLogitsGrad op 5 years ago
sigmoid_cross_entropy_with_logits_impl.cu add SigmoidCrossEntropyWithLogits op 5 years ago
sigmoid_cross_entropy_with_logits_impl.cuh add SigmoidCrossEntropyWithLogits op 5 years ago
slice_impl.cu gpu fix slice 5 years ago
slice_impl.cuh Gpu Slice kernel performance improvement 5 years ago
smooth_l1_loss_impl.cu gpu support smoothl1loss 5 years ago
smooth_l1_loss_impl.cuh gpu support smoothl1loss 5 years ago
sparse_cross_entropy_cuda_impl.cu add quantizaiton gpu op 5 years ago
sparse_cross_entropy_cuda_impl.cuh add quantizaiton gpu op 5 years ago
transpose_impl.cu initial version 6 years ago
transpose_impl.cuh initial version 6 years ago
unary_op_impl.cu update argmaxwithvalue 5 years ago
unary_op_impl.cuh gpu queue support unary 5 years ago
unsorted_segment_sum.cu gpu support UnsortedSegmentSum kernel 5 years ago
unsorted_segment_sum.cuh gpu support UnsortedSegmentSum kernel 5 years ago