lizhenyu cf2244f1ef [bugfix]Not set device id in asynchronous compile and run graph 5 years ago
..
distribution GPU supports p2p nccl interfaces 5 years ago
mpi code warning clean 5 years ago
blocking_queue.cc add push opt logic 5 years ago
blocking_queue.h add input data type check for ps cache mode 5 years ago
cuda_common.h add more dtypes support for gatherdgrad and other bugfix 5 years ago
cuda_driver.cc add error log when set device id failed 5 years ago
cuda_driver.h add error log when set device id failed 5 years ago
cuda_env_checker.cc search nvcc in entire PATH 5 years ago
cuda_env_checker.h search nvcc in entire PATH 5 years ago
gpu_bucket.cc add op atomic clean to clear input addr in launch allreduce 5 years ago
gpu_bucket.h add op atomic clean to clear input addr in launch allreduce 5 years ago
gpu_buffer_mgr.cc add error log when set device id failed 5 years ago
gpu_buffer_mgr.h add push opt logic 5 years ago
gpu_common.h fix a bug that cuda error show wrong name in log 5 years ago
gpu_device_address.cc nlp perf(Pynative): change memory sync mode from synchronous to asynchronous in SyncHostToDevice 5 years ago
gpu_device_address.h fix device memory leak 5 years ago
gpu_device_manager.cc add error log when set device id failed 5 years ago
gpu_device_manager.h Fix GPU sync stream Segmentation fault 5 years ago
gpu_event.cc PyNative AllReduce Bucket 5 years ago
gpu_event.h PyNative AllReduce Bucket 5 years ago
gpu_kernel_build.cc add hardware abstract layer 5 years ago
gpu_kernel_build.h add hardware abstract layer 5 years ago
gpu_kernel_runtime.cc [bugfix]Not set device id in asynchronous compile and run graph 5 years ago
gpu_kernel_runtime.h Support ms_function + heterogenous 5 years ago
gpu_launch_kernel.cc add hardware abstract layer 5 years ago
gpu_launch_kernel.h add op atomic clean to clear input addr in launch allreduce 5 years ago
gpu_launch_mul.cc add op_mul fusion based on allreduce fusion in pynative mode 5 years ago
gpu_launch_mul.h add op atomic clean to clear input addr in launch allreduce 5 years ago
gpu_memory_allocator.cc Refactor ms_context implementation 5 years ago
gpu_memory_allocator.h Unified code style 5 years ago
gpu_memory_copy_manager.cc refine GPU memory swap performance 5 years ago
gpu_memory_copy_manager.h refine GPU memory swap performance 5 years ago
gpu_memory_manager.cc optimize the memory alloc error info 5 years ago
gpu_memory_manager.h profiler memory 5 years ago
gpu_stream_assign.cc fix GPUKernelMod about the using of shared_ptr 5 years ago
gpu_stream_assign.h fix_consecutive_allreduce_bug 5 years ago
kernel_info_setter.cc IR operators of GPU and CPU are unified as batchnorm 5 years ago
kernel_info_setter.h IR operators of GPU and CPU are unified as batchnorm 5 years ago
queue_common.h add trace for gpu error/exception log 5 years ago
readme.md mindspore path adjust 5 years ago
trt_loader.cc tensor-rt library dynamic loading 5 years ago
trt_loader.h tensor-rt library dynamic loading 5 years ago