This ensures that the data is not modified, even if shared with Python as zero-copy arrow array. Modifying the data could result in undefined behavior since there might be other subscribers that read the data at the same time.
Instead of doing an additional copy to send them from the operator thread to the runtime thread.
This commit uses the new `allocate_data_sample` and `send_output_sample` methods introduced in d7cd370.
We don't want to keep the input open until all drop tokens were released for two reasons:
- It adds an unnecessary delay. It is already clear that the output is finished, so by reporting it directly receivers can react earlier.
- Receivers might be blocked while waiting for new events, which prevents them from sending finished drop tokens. By closing the outputs, they will be unblocked through a new `InputClosed` event, which allows them to send their finished drop tokens. This way, we receive the remaining drop tokens faster in the sender.
The Python garbage collection will drop them non-deterministically in the background. Also, the dropping cannot happen while the GIL is held by our wait code.
There might still be some pending drop tokens after the receiving end of the event stream was closed. So we don't want to break from the receiver thread directly. Instead we keep it running until the control channel signals that it expects no more drop tokens by closing the `finished_drop_tokens` channel. This happens either when all required drop tokens were received, or because of a timeout.
Don't bind the lifetime of the event to the `next` call anymore. This makes it possible to use the original event in Python, which does not support borrowed data or lifetimes. The drawback is that we no longer have the guarantee that an event is freed before the next call to `recv`.
This commit migrates the allocation of shared memory samples from the daemon to the individual nodes. This way, the nodes can prepare the memory themselves when sending outputs. Thus, we avoid the extra roundtrip to the daemon that we used before (the prepare request and reply). The other advantage is that we can now queue shared memory outputs on the TCP socket without waiting for replies from the daemon (like we already do for `Vec`-backed messages).
The intent of this commit is to remove the quantity of log that is being pushed to user.
This commit removes the redeclaration of a set up tracing methods to centralise
the tokio-tracing subscriber within the extension crate. It also add the
feature to filter information based on Environment variable.
This makes it possible to define the log level for tokio tracing like
this:
```
RUST_LOG=debug dora-daemon --run-dataflow dataflow.yml
```
I have also unified the feature flag to make it easier to manage tracing features among the workspace.
I did not change the default behaviour of tracing in our crates and therefore by
using the command above you should get the same tracing log as before.
fix merge conflict generated