Bumping the system-metrics dependency to the latest version removes NVML warnings that previously caused a deadlock.
It should also improve system-metrics performance by reducing the number of open files.
As noted in #978, the current Python CUDA IPC API fails when sending
specific tensors.
My investigation suggests that `pyarrow.cuda` (used by `dora.cuda`) is
likely the source of this failure. During testing with two processes
(a sender and a receiver), the receiver gets a `pyarrow.CudaBuffer` with
corrupted content, and there appears to be no way to work around this on
our side.
However, I found that the `numba.cuda` module is a suitable alternative
for implementing CUDA IPC. As we already depend on `numba`, no new
dependencies are introduced. I have refactored the API to use
`numba.cuda`, and it has passed the tests proposed in #978.
The example and docstrings are updated accordingly.
**Note**: this change is not backward compatible.
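For reference, the `numba.cuda`-based IPC pattern looks roughly like this. This is a sketch, not the actual `dora.cuda` code: `send`/`recv` stand in for a hypothetical transport, and running it for real requires a CUDA device and two separate processes (CUDA IPC handles cannot be opened in the exporting process):

```python
import pickle

import numpy as np
from numba import cuda

# --- sender process ---
d_arr = cuda.to_device(np.arange(1024, dtype=np.float32))
handle = d_arr.get_ipc_handle()   # picklable CUDA IPC handle
send(pickle.dumps(handle))        # `send` is a hypothetical transport

# --- receiver process ---
handle = pickle.loads(recv())     # `recv` is a hypothetical transport
with handle.open() as shared:     # maps the sender's device memory
    host_copy = shared.copy_to_host()
```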
### Summary
This example had two issues:
- the path to the dora rerun executable in the dataflow file was set
incorrectly
- it could not be run via `cargo run` (only via the uv installation +
`dora build` + `dora run` route)
### Test it
via `cargo run`:
`cargo run --example rerun-viewer`
via `uv` setup:
```
uv venv -p 3.11 --seed
uv pip install -e ../../apis/python/node --reinstall
dora build dataflow.yml --uv
dora run dataflow.yml --uv
```
We don't otherwise wait for dynamic nodes when the dataflow finishes, so they might stop only after the dataflow is done. This commit fixes a daemon error that occurred in this case.
The coordinator now sends an immediate `DataflowStartTriggered` reply when receiving a `DataflowStart` command. This enables the CLI to directly attach to the dataflow and observe the build output.
To wait until the build/spawning is done, this commit introduces a new `WaitForSpawn` command, to which the coordinator replies with a `DataflowSpawned` message once the nodes have been started. The `dora build` command uses this new command.
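The resulting handshake can be sketched as follows. This is a Python model of the flow, not the actual coordinator (which is written in Rust); only the message names come from this change, everything else is illustrative:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataflowStartTriggered:
    """Immediate reply to `DataflowStart`: lets the CLI attach right away."""
    uuid: str


@dataclass
class DataflowSpawned:
    """Reply to `WaitForSpawn`, sent once all nodes have been started."""
    uuid: str


class Coordinator:
    def __init__(self) -> None:
        self._spawned = set()

    def dataflow_start(self, uuid: str) -> DataflowStartTriggered:
        # Reply immediately instead of waiting for build/spawn to finish,
        # so the CLI can attach and stream the build output.
        return DataflowStartTriggered(uuid)

    def mark_spawned(self, uuid: str) -> None:
        # Called once every node of the dataflow has been started.
        self._spawned.add(uuid)

    def wait_for_spawn(self, uuid: str) -> Optional[DataflowSpawned]:
        # `dora build` blocks on this until spawning is done.
        return DataflowSpawned(uuid) if uuid in self._spawned else None
```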