Bumping system metrics to the latest version removes NVML warnings that
used to create a deadlock.
It should also improve system-metrics performance by reducing the number
of open files.
As noted in #978, the current Python CUDA IPC API fails when sending
specific tensors.
My investigation suggests that `pyarrow.cuda` (used by `dora.cuda`) is
the likely source of this failure. During testing with two processes (a
sender and a receiver), the receiver gets a `pyarrow.CudaBuffer` with
corrupted content, and there appears to be nothing we can do about this
on our side.
However, I found that the `numba.cuda` module is a suitable alternative
for implementing CUDA IPC. As we already depend on `numba`, no new
dependencies are introduced. I have refactored the API to use
`numba.cuda`, and it has passed the tests proposed in #978.
The example and docstrings are updated accordingly.
**Note**: this change is not backward compatible.
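As a hedged illustration (this is the general `numba.cuda` IPC pattern, not dora's actual refactored API), the sender exports an IPC handle from a device array with `get_ipc_handle()`, pickles it to another process, and the receiver opens it to map the same device memory. The sketch below guards on numba/GPU availability so it degrades gracefully on machines without CUDA:

```python
# Sketch of CUDA IPC via numba.cuda (not dora's actual API).
# Requires a CUDA-capable GPU; numba IPC is unsupported on Windows.
import multiprocessing as mp

import numpy as np

try:
    from numba import cuda
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False


def receiver(handle, out_queue):
    # The pickled IpcArrayHandle maps the sender's device memory
    # into this process; no copy crosses the process boundary.
    with handle.open() as d_arr:
        out_queue.put(d_arr.copy_to_host())


def send_via_ipc():
    d_arr = cuda.to_device(np.arange(8, dtype=np.float32))
    handle = d_arr.get_ipc_handle()  # picklable IPC handle
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=receiver, args=(handle, q))
    p.start()
    result = q.get()
    p.join()
    return result


if __name__ == "__main__" and HAVE_NUMBA and cuda.is_available():
    # Round-trips the device array through a second process via IPC.
    print(send_via_ipc())
```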
### Summary
This example had two issues:
- the path to the dora rerun executable in the dataflow file was set incorrectly
- it could not be run via the `cargo run` command (only via a uv installation + `dora build` + `dora run`)
### Test it
via cargo run:
`cargo run --example rerun-viewer`
via uv setup:
```
uv venv -p 3.11 --seed
uv pip install -e ../../apis/python/node --reinstall
dora build dataflow.yml --uv
dora run dataflow.yml --uv
```
Resolves #966
This PR improves the current benchmark implementation by increasing the
queue size for throughput testing and adding a time gap between
throughput tests of different sizes to ensure accurate measurements.
In addition, a warning message is now logged when events are discarded
due to the queue size limit.
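The discard-warning behavior can be sketched as follows. This is a hypothetical stand-alone model (`BoundedEventQueue` and its methods are illustrative names, not dora's actual implementation): when the queue is full, the oldest event is dropped and a warning is logged.

```python
import logging
from collections import deque

logger = logging.getLogger("benchmark")


class BoundedEventQueue:
    """Illustrative bounded queue: drops the oldest event when full
    and logs a warning, mirroring the behavior described above."""

    def __init__(self, maxsize: int):
        self.maxsize = maxsize
        self._events = deque()
        self.discarded = 0

    def push(self, event):
        if len(self._events) >= self.maxsize:
            # Queue is full: discard the oldest event and warn.
            self._events.popleft()
            self.discarded += 1
            logger.warning(
                "queue full (size=%d): discarding oldest event", self.maxsize
            )
        self._events.append(event)

    def pop(self):
        return self._events.popleft()

    def __len__(self):
        return len(self._events)


q = BoundedEventQueue(maxsize=3)
for i in range(5):
    q.push(i)
# The two oldest events (0 and 1) were discarded to respect the limit.
print(len(q), q.discarded)  # -> 3 2
```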
This PR adds the `uninstall` sub-command under the `self` command. It
also adds the missing description of the `self` command. I used the
`self-replace` crate for this purpose.