Example of latency using CUDA Zero Copy instead of regular Shared Memory over CPU

Install

pip install dora-rs-cli # if not already present

# Install pyarrow with gpu support
conda install pyarrow "arrow-cpp-proc=*=cuda" -c conda-forge

## Test installation with
python -c "import pyarrow.cuda"

# Install numba for translation from arrow to torch
pip install numba

## Test installation with
python -c "import numba.cuda"

# Install torch if it's not already present
pip install torch

## Test installation with
python -c "import torch; assert torch.cuda.is_available()"

Run

dora run cpu_bench.yml

dora run cuda_bench.yml

cat benchmark_data.csv

To run the demo code

dora up
dora start demo_bench.yml --detach
python demo_receiver.py
dora destroy

802 B Raw Permalink Blame History

Example of latency using CUDA Zero Copy instead of regular Shared Memory over CPU

Install

Run

To run the demo code

802 B

Raw Permalink Blame History