@@ -1,33 +0,0 @@
name: github pages

on:
  push:
    branches:
      - main
  pull_request:

permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-20.04
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}
    steps:
      - uses: actions/checkout@v3

      - name: Setup mdBook
        uses: peaceiris/actions-mdbook@v1
        with:
          mdbook-version: "0.4.10"
          # mdbook-version: 'latest'

      - run: mdbook build docs

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: ${{ github.ref == 'refs/heads/main' }}
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs/book
@@ -1 +0,0 @@
book/
@@ -1,6 +0,0 @@
[book]
authors = ["Haixuan Xavier Tao", "Philipp Oppermann"]
language = "en"
multilingual = false
src = "src"
title = "dora-rs"
@@ -1,25 +0,0 @@
# Summary

[Introduction](./introduction.md)

---

# User Guide

- [Installation](./installation.md)
- [Getting Started](./getting-started.md)

# Reference Guide

- [Overview](overview.md): a high-level overview of dora's components and of the differences between operators and custom nodes.
- [Dataflow Configuration](dataflow-config.md): specification of dora's YAML-based dataflow configuration format.
- [Rust API](rust-api.md)
- [C API](c-api.md)
- [Python API](python-api.md)

# Brainstorming Ideas

- [State Management](state-management.md): describes dora's state management capabilities for state sharing and recovery.
- [Library vs Framework](library-vs-framework.md)
- [Middleware Layer Abstraction](./communication-layer.md)
@@ -1,77 +0,0 @@
# C API

## Operator

The operator API is a framework for you to implement. The implemented operator will be managed by `dora`. This framework enables us to apply optimisations and to provide advanced features.

An operator definition is composed of three functions: `dora_init_operator`, which initialises the operator and its context; `dora_drop_operator`, which frees its memory; and `dora_on_event`, which runs the operator's logic when an event such as an input arrives.

```c
{{#include ../../examples/c-dataflow/operator.c:0:29}}
```

### Try it out!

- Create an `operator.c` file:

```c
{{#include ../../examples/c-dataflow/operator.c}}
```

{{#include ../../examples/c-dataflow/README.md:40:46}}

- Link it in your graph as:

```yaml
{{#include ../../examples/c-dataflow/dataflow.yml:13:20}}
```
## Custom Node

The custom node API allows you to integrate `dora` into your application: you can retrieve inputs and send outputs in any fashion you want.

#### `init_dora_context_from_env`

`init_dora_context_from_env` initialises a node from environment variables set by `dora-coordinator`:

```c
void *dora_context = init_dora_context_from_env();
```

#### `dora_next_event`

`dora_next_event` waits for the next event (e.g. an input). Use `read_dora_event_type` to read the event's type. Inputs are of type `DoraEventType_Input`. To extract the ID and data of an input event, use `read_dora_input_id` and `read_dora_input_data` on the returned pointer. It is safe to ignore any events and handle only the events that are relevant to the node.

```c
void *event = dora_next_event(dora_context);

if (read_dora_event_type(event) == DoraEventType_Input)
{
    // read out the ID as a UTF-8 encoded string
    char *id;
    size_t id_len;
    read_dora_input_id(event, &id, &id_len);

    // read out the data as a byte array
    char *data;
    size_t data_len;
    read_dora_input_data(event, &data, &data_len);
}
```

#### `dora_send_output`

`dora_send_output` sends data from the node:

```c
char out_id[] = "tick";
char out_data[50];
size_t out_data_len = sizeof(out_data);
dora_send_output(dora_context, out_id, strlen(out_id), out_data, out_data_len);
```
### Try it out!

- Create a `node.c` file:

```c
{{#include ../../examples/c-dataflow/node.c}}
```

{{#include ../../examples/c-dataflow/README.md:26:35}}

- Link it in your graph as:

```yaml
{{#include ../../examples/c-dataflow/dataflow.yml:6:12}}
```
@@ -1,32 +0,0 @@
# [Middleware (communication) layer abstraction (MLA)](https://github.com/dora-rs/dora/discussions/53)

`dora` needs to implement the MLA as a separate crate that provides a middleware abstraction layer enabling scalable, high-performance communication between async tasks, between OS threads within a process, between processes on a single machine, and between different machines in a computer network. The MLA needs to support different communication patterns:

- publish-subscribe push/push pattern: the published message is pushed to subscribers
- publish-subscribe push/pull pattern: the published message is written to storage and later pulled by subscribers
- request/reply pattern
- point-to-point pattern
- client/server pattern

The MLA needs to abstract over the following details:

- inter-async-task (e.g. tokio channels), intra-process (OS threads, e.g. shared memory), inter-process, and inter-host/inter-network communication
- different transport layer implementations (shared memory, UDP, TCP)
- built-in support for multiple serialization/deserialization protocols, e.g. capnproto, protobuf, flatbuffers, etc.
- different language bindings to Rust: Python, C, C++, etc.
- telemetry tools for logs, metrics, distributed tracing, live data monitoring (e.g. tapping live data), recording, and replay

The Rust ecosystem has abundant crates that provide the underlying communication, e.g.:

- tokio / crossbeam provide different types of channels serving different purposes: mpsc, oneshot, broadcast, watch, etc.
- Tonic provides gRPC services
- Tower provides request/reply services
- the Zenoh middleware provides many different pub/sub capabilities

The MLA also needs to provide high-level APIs:

- `publish(topic, value, optional fields)`: optional fields may contain the sender's identity to help the MLA logic satisfy the above requirements
- `subscribe(topic, optional fields) -> future streams`
- `put(key, value, optional fields)`
- `get(key, optional fields) -> value`
- `send(key, msg, optional fields)`
- `recv(key, optional fields) -> value`
More info here: [#53](https://github.com/dora-rs/dora/discussions/53)
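To make the key-value part of the API sketch concrete, here is a minimal, self-contained Rust sketch. The trait name, method shapes, and the in-memory backend are illustrative assumptions, not dora's actual MLA; real backends would wrap shared memory, TCP/UDP, or a zenoh session.

```rust
use std::collections::HashMap;

// Illustrative sketch only: names and signatures are assumptions,
// not part of dora. A real implementation would abstract over
// transports (shared memory, TCP, UDP) behind this interface.
pub trait MiddlewareLayer {
    fn put(&mut self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// A trivial in-memory backend, useful as a reference for tests.
pub struct InMemoryMla {
    store: HashMap<String, Vec<u8>>,
}

impl InMemoryMla {
    pub fn new() -> Self {
        Self { store: HashMap::new() }
    }
}

impl MiddlewareLayer for InMemoryMla {
    fn put(&mut self, key: &str, value: Vec<u8>) {
        self.store.insert(key.to_string(), value);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.store.get(key).cloned()
    }
}
```

Swapping the backend without touching callers is exactly the kind of abstraction the MLA is meant to provide.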
@@ -1,108 +0,0 @@
# Dataflow Specification

Dataflows are specified through a YAML file. This section presents our current draft for the file format. It only includes basic functionality for now; we will extend it later when we introduce more advanced features.

## Dataflow

Dataflows are specified in the following format:

```yaml
nodes:
  - id: foo
    # ... (see below)
  - id: bar
    # ... (see below)
```

### Inputs and Outputs

Each operator or custom node has a separate namespace for its outputs. To refer to an output, the `<operator>/<output>` syntax is used. This way, there are no name conflicts between operators.

Input operands are specified using the `<name>: <operator>/<output>` syntax, where `<name>` is the internal name that should be used for the operand. The main advantage of this name mapping is that the same operator executable can be reused multiple times with different inputs.
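As an illustration, a hypothetical two-node snippet (the node IDs and paths are made up for this example): the `detector` node receives `camera`'s `image` output under its own internal name `frame`:

```yaml
nodes:
  - id: camera
    custom:
      source: path/to/camera      # hypothetical executable
      outputs:
        - image
  - id: detector
    custom:
      source: path/to/detector    # hypothetical executable
      inputs:
        frame: camera/image       # internal name `frame` maps to camera's `image` output
```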
## Nodes

Nodes are defined in the following format:

```yaml
nodes:
  - id: some-unique-id
    # For nodes with multiple operators
    operators:
      - id: operator-1
        # ... (see below)
      - id: operator-2
        # ... (see below)

  - id: some-unique-id-2
    custom:
      source: path/to/timestamp
      env:
        - ENVIRONMENT_VARIABLE_1: true
      working-directory: some/path
      inputs:
        input_1: operator_2/output_4
        input_2: custom_node_2/output_4
      outputs:
        - output_1

  # Unique operator
  - id: some-unique-id-3
    operator:
      # ... (see below)
```

Nodes must provide either an `operators` field or a `custom` field, but not both. Nodes with an `operators` field run a dora runtime process, which runs and manages the specified operators. Nodes with a `custom` field run a custom executable.

### Custom Nodes

Custom nodes specify the executable name and arguments, like a normal shell invocation, through the `source` field. Through the optional `env` field, it is possible to set environment variables for the process. The optional `working-directory` field allows overwriting the directory in which the program is started.

To integrate with the rest of the dora dataflow, custom nodes must specify their inputs and outputs, similar to operators. They can reference outputs of both operators and other custom nodes.
## Operators

Operators are defined in the following format:

```yaml
- id: unique-operator-id
  name: Human-Readable Operator Name
  description: An optional description of the operator's purpose.
  inputs:
    input_1: source_operator_2/output_1
    input_2: custom_node_1/output_1
  outputs:
    - output_1
  ## ONE OF:
  shared_library: "path/to/shared_lib" # file extension and `lib` prefix are added automatically
  python: "path/to/python_file.py"
  wasm: "path/to/wasm_file.wasm"
```

Operators must list all their inputs and outputs. Inputs can be linked to arbitrary outputs of other operators or custom nodes.

There are multiple ways to implement an operator:

- as a C-compatible shared library
- as a Python object
- as a WebAssembly (WASM) module

Each operator must specify exactly one implementation. The implementation must follow a specific format that is specified by dora.

## Example

```yaml
{{#include ../../examples/rust-dataflow/dataflow.yml}}
```

## TODO: Integration with ROS 1/2

To integrate dora-rs operators with ROS 1 or ROS 2 operators, we plan to provide special _bridge operators_. These act as a sink in one dataflow framework and push all messages to a different dataflow framework, where they act as a source.

For example, we plan to provide a `to_ros_2` operator, which takes a single `data` input that is then published to a specified ROS 2 dataflow.
@@ -1,123 +0,0 @@
### Create a Rust workspace

- Initiate the workspace with:

```bash
mkdir my_first_dataflow
cd my_first_dataflow
```

- Create the `Cargo.toml` file that will configure the entire workspace:

`Cargo.toml`
```toml
[workspace]
members = [
    "rust-dataflow-example-node",
]
```

### Write your first node

Let's write a node that sends the current time periodically and stops after 100 iterations. The other nodes/operators will then exit as well, because all sources have closed.
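The core of this node can be sketched as a plain loop. This is a self-contained illustration in which `emit` stands in for dora's send-output call; the real source, using the dora node API, is included further below.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

// Self-contained sketch of the node's core loop; `emit` is a stand-in
// for dora's send-output call, which the real example uses instead.
fn run_timer(iterations: u32, interval: Duration, mut emit: impl FnMut(Instant)) {
    for _ in 0..iterations {
        sleep(interval);
        emit(Instant::now());
    }
    // Returning closes the node's outputs, which signals downstream
    // nodes/operators that this source has finished.
}
```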
- Generate a new Rust binary (application):

```bash
cargo new rust-dataflow-example-node
```

with `Cargo.toml`:
```toml
{{#include ../../examples/rust-dataflow/node/Cargo.toml}}
```

with `src/main.rs`:
```rust
{{#include ../../examples/rust-dataflow/node/src/main.rs}}
```

### Write your first operator

- Generate a new Rust library:

```bash
cargo new rust-dataflow-example-operator --lib
```

with `Cargo.toml`:
```toml
{{#include ../../examples/rust-dataflow/operator/Cargo.toml}}
```

with `src/lib.rs`:
```rust
{{#include ../../examples/rust-dataflow/operator/src/lib.rs}}
```

- And modify the root `Cargo.toml`:

```toml
[workspace]
members = [
    "rust-dataflow-example-node",
    "rust-dataflow-example-operator",
]
```
### Write your sink node

Let's write a `logger` node that prints incoming data.

- Generate a new Rust binary (application):

```bash
cargo new rust-dataflow-example-sink
```

with `Cargo.toml`:
```toml
{{#include ../../examples/rust-dataflow/sink/Cargo.toml}}
```

with `src/main.rs`:
```rust
{{#include ../../examples/rust-dataflow/sink/src/main.rs}}
```

- And modify the root `Cargo.toml`:

```toml
[workspace]
members = [
    "rust-dataflow-example-node",
    "rust-dataflow-example-operator",
    "rust-dataflow-example-sink",
]
```

### Compile everything

```bash
cargo build --all --release
```

### Write a graph definition

Let's write the graph definition so that each node knows which nodes to communicate with.

`dataflow.yml`
```yaml
{{#include ../../examples/rust-dataflow/dataflow.yml}}
```

### Run it!

- Run the `dataflow`:

```bash
dora-daemon --run-dataflow dataflow.yml
```
@@ -1,26 +0,0 @@
# Installation

This project is in early development: many features are not yet implemented, and breaking changes are expected. Please don't take the current design for granted. The installation process will be streamlined in the future.

### Download the binaries from GitHub

Install the `dora` binaries from GitHub releases:

```bash
wget https://github.com/dora-rs/dora/releases/download/<version>/dora-<version>-x86_64-Linux.zip
unzip dora-<version>-x86_64-Linux.zip
python3 -m pip install dora-rs==<version> ## For the Python API
PATH=$PATH:$(pwd)
dora --help
```

#### Or compile from source

Build it using:

```bash
git clone https://github.com/dora-rs/dora.git
cd dora
cargo build --all --release
PATH=$PATH:$(pwd)/target/release
```
@@ -1,32 +0,0 @@
# Welcome to `dora`!

`dora`'s goal is to be a low-latency, composable, and distributed dataflow framework.

By using `dora`, you can define robotic applications as a graph of nodes that can easily be swapped and replaced. Those nodes can be shared and implemented in different languages, such as Rust, Python, or C. `dora` will then connect those nodes and try to provide as many features as possible to facilitate the dataflow.

## ✨ Features

Composability as:
- [x] `YAML` declarative programming
- [x] polyglot:
  - [x] Rust
  - [x] C
  - [x] C++
  - [x] Python
- [x] Isolated operators and custom nodes that can be reused.

Low latency as:
- [x] written in <i>...Cough...blazingly fast ...Cough...</i> Rust.
- [x] PubSub communication with shared memory!
- [ ] Zero-copy on read!

Distributed as:
- [ ] PubSub communication between machines with [`zenoh`](https://github.com/eclipse-zenoh/zenoh)
- [x] Distributed telemetry with [`opentelemetry`](https://github.com/open-telemetry/opentelemetry-rust)

## ⚖️ LICENSE

This project is licensed under Apache-2.0. Check out [NOTICE.md](NOTICE.md) for more information.
@@ -1,38 +0,0 @@
# Framework

- Runtime process
  - Talks with other runtime processes
    - Across machines
  - loop:
    - listen for inputs
    - invoke corresponding operator(s)
    - collect and forward outputs
- Operators
  - Connected to runtime
    - Via TCP socket (can be a separate process)
      - Single connection with a high-level message format, or
      - Separate connection per input/output
    - Dynamically linked as a shared library
      - Runtime invokes the specific handler function directly with the input(s)
      - Outputs either:
        - Return a collection as the result, or
        - Call a runtime function to send out the result
- Input aggregation (i.e. waiting until multiple inputs are available)
  - by runtime -> aggregation specified in the config file
  - by operator -> custom handling possible

# Library

- All sources/operators/sinks are separate processes that link a runtime library
- "Orchestrator" process
  - reads the config file
  - launches processes accordingly
  - passes the node config
    - as an argument, or
    - via an env variable
    - including input and output names
- Runtime library provides (async) functions to
  - wait for one or multiple inputs
    - with timeouts
  - send out outputs
@@ -1,33 +0,0 @@
# Design Overview

The dora framework is structured into different components:

![](overview.png)

The following main components exist:

- **Nodes:** Dora nodes are separate, isolated processes that communicate with other nodes through the dora library. A node can be either a custom, user-specified program or a dora runtime node, which makes it possible to run dora _operators_. Nodes implement their own `main` function and thus have full control over their execution.

  Nodes use the dora _library_ to communicate with other nodes, which is available for multiple languages (Rust, C; maybe Python, WASM). Communication happens through a _communication layer_, which will be `zenoh` in our first version. We plan to add more options in the future. All communication layer implementations should be robust against disconnections, so operators should be able to keep running even if they lose the connection to the coordinator.

- **Operators:** Operators are lightweight, cooperative, library-based components that are executed by a dora runtime node. They must implement a specific interface, depending on the language used. Operators can use a wide range of advanced features provided by the dora runtime, for example priority scheduling or native deadline support.

- **Coordinator:** The coordinator is responsible for reading the dataflow from a YAML file, verifying it, and deploying the nodes and operators to the specified or automatically determined machines. It monitors the operators' health and implements high-level cluster management functionality. For example, we could implement automatic scaling for cloud nodes, or operator replication and restarts. The coordinator can be controlled through a command line program (CLI).

## Operators vs Custom Nodes

There are two ways to implement an operation in dora: either as a dora operator or as a custom node. Both approaches have their advantages and drawbacks, as explained below. In general, it is recommended to create dora operators and to use custom nodes only when necessary.

Operators have the following advantages:

- They can use a wide range of advanced functionality provided by the dora runtime nodes. This includes special scheduling strategies and features such as deadlines.
- They are _lightweight_, so they occupy only minimal amounts of memory. This makes it possible to run thousands of operators on the same machine.
- They can use runtime-managed state storage, for robustness or for sharing state with other operators.
- They _share the address space_ with other operators on the same node, which makes communication much faster.

Custom nodes provide a different set of advantages:

- Each node is a separate, isolated process, which can be important for security-critical operations.
- They support pinned resources: for example, a CPU core can be pinned to a custom node through the dataflow configuration file.
- They have full control over their execution, which allows them to create complex input and wake-up rules.
@@ -1,74 +0,0 @@
# Python API

## Operator

The operator API is a framework for you to implement. The implemented operator will be managed by `dora`. This framework enables us to apply optimisations and to provide advanced features. It is the recommended way of using `dora`.

An operator requires an `on_event` method and must return a `DoraStatus`, depending on whether it wants to continue or stop.

```python
{{#include ../../examples/python-operator-dataflow/object_detection.py:0:25}}
```

> For Python, we recommend allocating the operators on a single runtime. A runtime will share the same GIL across several operators, making those operators run almost sequentially. See: [https://docs.rs/pyo3/latest/pyo3/marker/struct.Python.html#deadlocks](https://docs.rs/pyo3/latest/pyo3/marker/struct.Python.html#deadlocks)
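To make the shape concrete, here is a minimal, self-contained sketch of such an operator. `DoraStatus` is stubbed here so the example runs on its own, and the event dictionary and `send_output` callback are simplifications of what dora actually passes.

```python
from enum import Enum


class DoraStatus(Enum):  # stub standing in for dora's DoraStatus
    CONTINUE = 0
    STOP = 1


class Operator:
    """Counts incoming inputs and stops itself after ten of them."""

    def __init__(self):
        self.count = 0

    def on_event(self, event, send_output):
        if event["type"] == "INPUT":
            self.count += 1
            # forward the running count under the output ID "counter"
            send_output("counter", str(self.count).encode())
        return DoraStatus.STOP if self.count >= 10 else DoraStatus.CONTINUE
```

The returned status is how the operator tells the runtime whether to keep invoking it or to shut it down.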
### Try it out!

- Create an operator Python file called `object_detection.py`:

```python
{{#include ../../examples/python-dataflow/object_detection.py}}
```

- Link it in your graph as:

```yaml
{{#include ../../examples/python-dataflow/dataflow.yml:14:20}}
```
## Custom Node

The custom node API allows you to integrate `dora` into your application: you can retrieve inputs and send outputs in any fashion you want.

#### `Node()`

`Node()` initialises a node from environment variables set by `dora-coordinator`:

```python
from dora import Node

node = Node()
```

#### `.next()` or `__next__()` as an iterator

`.next()` gives you the next input that the node has received. It blocks until the next input becomes available. It returns `None` when all senders have been dropped.

```python
input_id, value, metadata = node.next()

# or

for input_id, value, metadata in node:
    ...
```

#### `.send_output(output_id, data)`

`send_output` sends data from the node.

```python
node.send_output("string", b"string", {"open_telemetry_context": "7632e76"})
```

### Try it out!

- Install the Python node API:

```bash
pip install dora-rs
```

- Create a Python file called `webcam.py`:

```python
{{#include ../../examples/python-dataflow/webcam.py}}
```

- Link it in your graph as:

```yaml
{{#include ../../examples/python-dataflow/dataflow.yml:6:12}}
```
@@ -1,97 +0,0 @@
# Rust API

## Operator

The operator API is a framework for you to implement. The implemented operator will be managed by `dora`. This framework enables us to apply optimisations and to provide advanced features. It is the recommended way of using `dora`.

An operator must be registered and must implement the `DoraOperator` trait. The trait is composed of an `on_event` method that defines the behaviour of the operator when there is an event, such as receiving an input.

```rust
{{#include ../../examples/rust-dataflow/operator/src/lib.rs:0:17}}
```

### Try it out!

- Generate a new Rust library:

```bash
cargo new rust-dataflow-example-operator --lib
```

`Cargo.toml`
```toml
{{#include ../../examples/rust-dataflow/operator/Cargo.toml}}
```

`src/lib.rs`
```rust
{{#include ../../examples/rust-dataflow/operator/src/lib.rs}}
```

- Build it:

```bash
cargo build --release
```

- Link it in your graph as:

```yaml
{{#include ../../examples/rust-dataflow/dataflow.yml:13:21}}
```

This example can be found in the `examples` folder.
## Custom Node

The custom node API allows you to integrate `dora` into your application: you can retrieve inputs and send outputs in any fashion you want.

#### `DoraNode::init_from_env()`

`DoraNode::init_from_env()` initialises a node from environment variables set by `dora-coordinator`:

```rust
let (mut node, mut events) = DoraNode::init_from_env()?;
```

#### `.recv()`

`.recv()` waits for the next event on the events stream:

```rust
let event = events.recv();
```

#### `.send_output(...)`

`send_output` sends data from the node to the other nodes. It takes a closure as input to enable zero-copy sends:

```rust
node.send_output(
    &data_id,
    metadata.parameters,
    data.len(),
    |out| {
        out.copy_from_slice(data);
    },
)?;
```
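To see why the closure helps, here is a self-contained sketch of the pattern (not the real dora API): the sender allocates the destination buffer once, and the closure writes directly into it, avoiding an intermediate copy.

```rust
// Self-contained illustration of the closure-based write pattern
// (not the real dora API). In dora, `buf` would be a shared-memory
// region; here it is a plain Vec for demonstration.
fn send_with(len: usize, write: impl FnOnce(&mut [u8])) -> Vec<u8> {
    let mut buf = vec![0u8; len]; // allocated once at the destination
    write(&mut buf);              // caller fills it in place
    buf
}
```

Because the caller writes straight into the destination, the data never passes through a temporary buffer owned by the sender.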
### Try it out!

- Generate a new Rust binary (application):

```bash
cargo new rust-dataflow-example-node
```

`Cargo.toml`
```toml
{{#include ../../examples/rust-dataflow/node/Cargo.toml}}
```

`src/main.rs`
```rust
{{#include ../../examples/rust-dataflow/node/src/main.rs}}
```

- Link it in your graph as:

```yaml
{{#include ../../examples/rust-dataflow/dataflow.yml:6:12}}
```
@@ -1,52 +0,0 @@
# State Management

Most operations require keeping some sort of state between calls. This document describes the different ways to handle state in dora.

## Internal State

Operators are `struct` or object instances, so they can keep internal state between invocations. This state is private to the operator. When an operator exits or crashes, its internal state is lost.

## Saving State

To make themselves resilient against crashes, operators can use dora's state management. The dora runtime provides each operator with a private key-value store (KVS). Operators can save serialized state into the KVS by using the `save_state` function of the runtime:

```rust
fn save_state(key: &str, value: Vec<u8>)
```

The runtime only stores the latest value for each key, so subsequent writes to the same key replace the earlier values. Serialization is required because the state must be self-contained (i.e. no pointers to other memory) and consistent (i.e. no half-updated state). Otherwise, state recovery might not be possible after an operator crash.

To keep the performance overhead of this function low, it is recommended to use a suitable serialization format that stores the data with minimal memory and compute overhead. Text-based formats such as JSON are not recommended. Also, fast-changing state should be stored under a separate key to minimize the amount of state that needs to be written.
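As a rough illustration of the size difference, the sketch below (with made-up field names) encodes the same small state both as JSON and as a fixed binary layout, using only Python's standard library:

```python
import json
import struct

# Hypothetical operator state (the field names are made up).
state = {"x": 1.5, "y": -2.25, "tick": 42}

# Text-based encoding: human-readable, but larger and slower to parse.
text_blob = json.dumps(state).encode("utf-8")

# Fixed binary layout: two little-endian f64 values and one u32.
binary_blob = struct.pack("<ddI", state["x"], state["y"], state["tick"])

# The binary form is self-contained and considerably smaller.
assert len(binary_blob) == 20
assert len(binary_blob) < len(text_blob)

# Decoding restores the exact values.
x, y, tick = struct.unpack("<ddI", binary_blob)
assert (x, y, tick) == (1.5, -2.25, 42)
```

In a real operator, `binary_blob` is what would be passed to `save_state` as the value for a key.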
### State Recovery

When an operator crashes, the dora runtime restarts it and supplies it with the last version of the saved state. It does this by calling the operator's `restore_state` method:

```rust
fn restore_state(&mut self, state: HashMap<String, Vec<u8>>)
```

In this method, the operator should deserialize and apply all state entries, and perform all custom consistency checks that are necessary.

## Sharing State

To share state between operators, dora provides access to a node-local key-value store:

```rust
fn kvs_write(key: &str, value: Vec<u8>)
```

```rust
fn kvs_read(key: &str) -> Vec<u8>
```

Todo:

- Consistency?
- Anna?

## Custom Nodes

Custom nodes have full control over their execution, so they can implement their own state management. Shared state can be accessed through the `kvs_read` and `kvs_write` functions of the dora library, which are equivalent to the respective functions provided by the dora runtime.

Since custom nodes cannot use the recovery feature of the dora runtime, the `save_state`/`restore_state` functions are not available to them.