
Remove old documentation

tags/v0.2.4
haixuanTao 2 years ago
parent commit 1cb67f5213
16 changed files with 0 additions and 761 deletions
  1. +0 -33 .github/workflows/gh-pages.yml
  2. +0 -1 docs/.gitignore
  3. +0 -6 docs/book.toml
  4. +0 -25 docs/src/SUMMARY.md
  5. +0 -77 docs/src/c-api.md
  6. +0 -32 docs/src/communication-layer.md
  7. +0 -108 docs/src/dataflow-config.md
  8. +0 -123 docs/src/getting-started.md
  9. +0 -26 docs/src/installation.md
  10. +0 -32 docs/src/introduction.md
  11. +0 -38 docs/src/library-vs-framework.md
  12. +0 -33 docs/src/overview.md
  13. +0 -4 docs/src/overview.svg
  14. +0 -74 docs/src/python-api.md
  15. +0 -97 docs/src/rust-api.md
  16. +0 -52 docs/src/state-management.md

+ 0
- 33
.github/workflows/gh-pages.yml

@@ -1,33 +0,0 @@
name: github pages

on:
  push:
    branches:
      - main
  pull_request:

permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-20.04
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}
    steps:
      - uses: actions/checkout@v3

      - name: Setup mdBook
        uses: peaceiris/actions-mdbook@v1
        with:
          mdbook-version: "0.4.10"
          # mdbook-version: 'latest'

      - run: mdbook build docs

      - name: Deploy
        uses: peaceiris/actions-gh-pages@v3
        if: ${{ github.ref == 'refs/heads/main' }}
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./docs/book

+ 0
- 1
docs/.gitignore

@@ -1 +0,0 @@
book/

+ 0
- 6
docs/book.toml

@@ -1,6 +0,0 @@
[book]
authors = ["Haixuan Xavier Tao", "Philipp Oppermann"]
language = "en"
multilingual = false
src = "src"
title = "dora-rs"

+ 0
- 25
docs/src/SUMMARY.md

@@ -1,25 +0,0 @@
# Summary

[Introduction](./introduction.md)

---

# User Guide

- [Installation](./installation.md)
- [Getting started](./getting-started.md)

# Reference Guide


- [Overview](overview.md) A high level overview of dora's different components and the differences between operators and custom nodes.
- [Dataflow Configuration](dataflow-config.md) Specification of dora's YAML-based dataflow configuration format.
- [Rust API](rust-api.md)
- [C API](c-api.md)
- [Python API](python-api.md)

# Brainstorming Ideas

- [State Management](state-management.md) Describes dora's state management capabilities for state sharing and recovery.
- [Library vs Framework](library-vs-framework.md)
- [Middleware Layer Abstraction](./communication-layer.md)

+ 0
- 77
docs/src/c-api.md

@@ -1,77 +0,0 @@
# C API

## Operator

The operator API is a framework for you to implement. The implemented operator will be managed by `dora`. This framework enables us to make optimisations and provide advanced features.

The operator definition is composed of 3 functions: `dora_init_operator`, which initialises the operator and its context; `dora_drop_operator`, which frees its memory; and `dora_on_event`, which runs the operator's logic when an event such as an input arrives.

```c
{{#include ../../examples/c-dataflow/operator.c:0:29}}
```
### Try it out!

- Create an `operator.c` file:
```c
{{#include ../../examples/c-dataflow/operator.c}}
```

{{#include ../../examples/c-dataflow/README.md:40:46}}

- Link it in your graph as:
```yaml
{{#include ../../examples/c-dataflow/dataflow.yml:13:20}}
```

## Custom Node

The custom node API allows you to integrate `dora` into your application. It allows you to retrieve inputs and send outputs in any fashion you want.

#### `init_dora_context_from_env`

`init_dora_context_from_env` initiates a node from environment variables set by `dora-coordinator`.

```c
void *dora_context = init_dora_context_from_env();
```

#### `dora_next_event`

`dora_next_event` waits for the next event (e.g. an input). Use `read_dora_event_type` to read the event's type. Inputs are of type `DoraEventType_Input`. To extract the ID and data of an input event, use `read_dora_input_id` and `read_dora_input_data` on the returned pointer. It is safe to ignore any events and handle only the events that are relevant to the node.

```c
void *event = dora_next_event(dora_context);

// read out the ID as a UTF8-encoded string
char *id;
size_t id_len;
read_dora_input_id(event, &id, &id_len);

// read out the data as a byte array
char *data;
size_t data_len;
read_dora_input_data(event, &data, &data_len);
```

#### `dora_send_output`

`dora_send_output` sends data from the node.

```c
char out_id[] = "tick";
char out_data[50];
size_t out_data_len = sizeof(out_data);
dora_send_output(dora_context, out_id, strlen(out_id), out_data, out_data_len);
```
### Try it out!

- Create a `node.c` file:
```c
{{#include ../../examples/c-dataflow/node.c}}
```

{{#include ../../examples/c-dataflow/README.md:26:35}}

- Link it in your graph as:
```yaml
{{#include ../../examples/c-dataflow/dataflow.yml:6:12}}
```

+ 0
- 32
docs/src/communication-layer.md

@@ -1,32 +0,0 @@
# [Middleware (communication) layer abstraction (MLA)](https://github.com/dora-rs/dora/discussions/53)

`dora` needs to implement the MLA as a separate crate that provides a middleware abstraction layer enabling scalable, high-performance communication between async tasks, between OS threads within a process, between processes on a single computer node, and between different nodes in a computer network. The MLA needs to support different communication patterns:
- publish-subscribe push/push pattern - the published message is pushed to subscribers
- publish-subscribe push/pull pattern - the published message is written to storage and later pulled by subscribers
- request/reply pattern
- point-to-point pattern
- client/server pattern

The MLA needs to abstract the following details:

- communication between async tasks (e.g., tokio channels), within a process (OS threads, e.g., shared memory), between processes, and between hosts/networks
- different transport layer implementations (shared memory, UDP, TCP)
- built-in support for multiple serialization/deserialization protocols, e.g., capnproto, protobuf, flatbuffers, etc.
- different language bindings to Rust, Python, C, C++, etc.
- telemetry tools for logs, metrics, distributed tracing, live data monitoring (e.g., tapping live data), recording, and replay

The Rust ecosystem has abundant crates that provide the underlying communication, e.g.:
- tokio / crossbeam provide different types of channels serving different purposes: mpsc, oneshot, broadcast, watch, etc.
- Tonic provides gRPC services
- Tower provides request/reply services
- the Zenoh middleware provides many different pub/sub capabilities

The MLA also needs to provide high-level APIs:
- publish(topic, value, optional fields) - optional fields may contain the sender's identity to help the MLA logic satisfy the above requirements
- subscribe(topic, optional fields) -> future stream
- put(key, value, optional fields)
- get(key, optional fields) -> value
- send(key, msg, optional fields)
- recv(key, optional fields) -> value
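The pub/sub and put/get halves of this API can be sketched as an in-memory stand-in. Note this is only an illustration of the intended semantics; the `Broker` class and its method signatures are hypothetical, not dora's actual MLA:

```python
from collections import defaultdict
from queue import Queue

class Broker:
    """In-memory stand-in for the middleware layer abstraction (illustrative only)."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic -> list of subscriber queues
        self._store = {}                  # key -> value, for the put/get pattern

    def publish(self, topic, value, **fields):
        # push pattern: deliver the message to every current subscriber
        for q in self._topics[topic]:
            q.put((value, fields))

    def subscribe(self, topic):
        # returns a queue the subscriber can drain (stand-in for a future stream)
        q = Queue()
        self._topics[topic].append(q)
        return q

    def put(self, key, value):
        # push/pull pattern: write to storage for later retrieval
        self._store[key] = value

    def get(self, key):
        return self._store[key]

broker = Broker()
sub = broker.subscribe("pose")
broker.publish("pose", b"\x01\x02")
value, fields = sub.get_nowait()
```

A real implementation would back these calls with channels, shared memory, or Zenoh sessions rather than a Python dict.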

More info here: [#53](https://github.com/dora-rs/dora/discussions/53)

+ 0
- 108
docs/src/dataflow-config.md

@@ -1,108 +0,0 @@
# Dataflow Specification

Dataflows are specified through a YAML file. This section presents our current draft for the file format. It only includes basic functionality for now, we will extend it later when we introduce more advanced features.

## Dataflow

Dataflows are specified through the following format:

```yaml
nodes:
  - id: foo
    # ... (see below)
  - id: bar
    # ... (see below)
```

### Inputs and Outputs

Each operator or custom node has a separate namespace for its outputs. To refer to outputs, the `<operator>/<output>` syntax is used. This way, there are no name conflicts between operators.

Input operands are specified using the `<name>: <operator>/<output>` syntax, where `<name>` is the internal name that should be used for the operand. The main advantage of this name mapping is that the same operator executable can be reused multiple times on different inputs.
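The name mapping can be illustrated with a small sketch using plain dicts (the operator names and values below are made up, and this is not dora's internal data structure):

```python
# Hypothetical table of produced outputs, keyed by "<operator>/<output>".
outputs = {
    "timer/tick": 42,
    "camera/image": b"raw-bytes",
}

# The same operator executable can be reused with different input mappings:
inputs_a = {"trigger": "timer/tick"}
inputs_b = {"trigger": "camera/image"}

def resolve(input_mapping, outputs):
    # Translate each internal input name to the referenced output's value.
    return {name: outputs[ref] for name, ref in input_mapping.items()}

resolved_a = resolve(inputs_a, outputs)  # {"trigger": 42}
resolved_b = resolve(inputs_b, outputs)
```

Because the operator only ever sees the internal name (`trigger`), it stays agnostic to which upstream output feeds it.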

## Nodes

Nodes are defined using the following format:

```yaml
nodes:
  - id: some-unique-id
    # For nodes with multiple operators
    operators:
      - id: operator-1
        # ... (see below)
      - id: operator-2
        # ... (see below)

  - id: some-unique-id-2
    custom:
      source: path/to/timestamp
      env:
        - ENVIRONMENT_VARIABLE_1: true
      working-directory: some/path

      inputs:
        input_1: operator_2/output_4
        input_2: custom_node_2/output_4
      outputs:
        - output_1

  # Unique operator
  - id: some-unique-id-3
    operator:
      # ... (see below)
```

Nodes must provide either an `operators` field or a `custom` field, but not both. Nodes with an `operators` field run a dora runtime process, which runs and manages the specified operators. Nodes with a `custom` field run a custom executable.

### Custom Nodes

Custom nodes specify the executable name and arguments like a normal shell invocation through the `source` field. Through the optional `env` field, it is possible to set environment variables for the process. The optional `working-directory` field allows overriding the directory in which the program is started.

To integrate with the rest of the dora dataflow, custom nodes must specify their inputs and outputs, similar to operators. They can reference outputs of both operators and other custom nodes.
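What a launcher does with the `env` and `working-directory` fields can be approximated with the standard library. The `config` dict below merely mirrors the YAML fields above; how dora actually spawns custom nodes is internal to the runtime:

```python
import os
import subprocess
import sys

# Hypothetical node config mirroring the YAML `custom` section above.
# We use the Python interpreter itself as a stand-in executable.
config = {
    "source": sys.executable,
    "args": ["-c", "import os; print(os.environ['ENVIRONMENT_VARIABLE_1'])"],
    "env": {"ENVIRONMENT_VARIABLE_1": "true"},
    "working-directory": os.getcwd(),
}

# The optional `env` field extends the inherited environment.
env = dict(os.environ)
env.update(config["env"])

result = subprocess.run(
    [config["source"], *config["args"]],
    env=env,
    cwd=config["working-directory"],  # optional `working-directory` overrides cwd
    capture_output=True,
    text=True,
)
```

Running this prints `true` from the child process, showing the variable was passed through.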

## Operators

Operators are defined through the following format:

```yaml
- id: unique-operator-id
  name: Human-Readable Operator Name
  description: An optional description of the operator's purpose.

  inputs:
    input_1: source_operator_2/output_1
    input_2: custom_node_1/output_1
  outputs:
    - output_1

  ## ONE OF:
  shared_library: "path/to/shared_lib" # file extension and `lib` prefix are added automatically
  python: "path/to/python_file.py"
  wasm: "path/to/wasm_file.wasm"
```

Operators must list all their inputs and outputs. Inputs can be linked to arbitrary outputs of other operators or custom nodes.

There are multiple ways to implement an operator:

- as a C-compatible shared library
- as a Python object
- as a WebAssembly (WASM) module

Each operator must specify exactly one implementation. The implementation must follow a specific format that is specified by dora.

## Example

```yaml
{{#include ../../examples/rust-dataflow/dataflow.yml}}
```

## TODO: Integration with ROS 1/2

To integrate dora-rs operators with ROS 1 or ROS 2 operators, we plan to provide special _bridge operators_. These operators act as a sink in one dataflow framework and push all messages to a different dataflow framework, where they act as a source.

For example, we plan to provide a `to_ros_2` operator, which takes a single `data` input, which is then published to a specified ROS 2 dataflow.


+ 0
- 123
docs/src/getting-started.md

@@ -1,123 +0,0 @@
### Create a Rust workspace

- Initiate the workspace with:

```bash
mkdir my_first_dataflow
cd my_first_dataflow
```

- Create the Cargo.toml file that will configure the entire workspace:

`Cargo.toml`
```toml
[workspace]

members = [
    "rust-dataflow-example-node",
]
```

### Write your first node

Let's write a node that sends the current time periodically and stops after 100 iterations. The other nodes/operators will then exit as well, because all sources have closed.

- Generate a new Rust binary (application):

```bash
cargo new rust-dataflow-example-node
```

with `Cargo.toml`:
```toml
{{#include ../../examples/rust-dataflow/node/Cargo.toml}}
```

with `src/main.rs`:
```rust
{{#include ../../examples/rust-dataflow/node/src/main.rs}}
```

### Write your first operator

- Generate a new Rust library:

```bash
cargo new rust-dataflow-example-operator --lib
```

with `Cargo.toml`:
```toml
{{#include ../../examples/rust-dataflow/operator/Cargo.toml}}
```

with `src/lib.rs`:
```rust
{{#include ../../examples/rust-dataflow/operator/src/lib.rs}}
```

- And modify the root `Cargo.toml`:
```toml
[workspace]

members = [
    "rust-dataflow-example-node",
    "rust-dataflow-example-operator",
]
```



### Write your sink node

Let's write a `logger` which will print incoming data.

- Generate a new Rust binary (application):

```bash
cargo new rust-dataflow-example-sink
```

with `Cargo.toml`:
```toml
{{#include ../../examples/rust-dataflow/sink/Cargo.toml}}
```

with `src/main.rs`:
```rust
{{#include ../../examples/rust-dataflow/sink/src/main.rs}}
```

- And modify the root `Cargo.toml`:
```toml
[workspace]

members = [
    "rust-dataflow-example-node",
    "rust-dataflow-example-operator",
    "rust-dataflow-example-sink",
]
```

### Compile everything

```bash
cargo build --all --release
```


### Write a graph definition

Let's write the graph definition so that the nodes know who to communicate with.

`dataflow.yml`
```yaml
{{#include ../../examples/rust-dataflow/dataflow.yml}}
```

### Run it!

- Run the `dataflow`:
```bash
dora-daemon --run-dataflow dataflow.yml
```

+ 0
- 26
docs/src/installation.md

@@ -1,26 +0,0 @@
# Installation

This project is in early development: many features have yet to be implemented, and breaking changes should be expected. Please don't take the current design for granted. The installation process will be streamlined in the future.

### Download the binaries from Github


Install `dora` binaries from GitHub releases:

```bash
wget https://github.com/dora-rs/dora/releases/download/<version>/dora-<version>-x86_64-Linux.zip
unzip dora-<version>-x86_64-Linux.zip
python3 -m pip install dora-rs==<version> ## For Python API
PATH=$PATH:$(pwd)
dora --help
```

#### Or compile from Source

Build it using:
```bash
git clone https://github.com/dora-rs/dora.git
cd dora
cargo build --all --release
PATH=$PATH:$(pwd)/target/release
```

+ 0
- 32
docs/src/introduction.md

@@ -1,32 +0,0 @@
# Welcome to `dora`!

`dora`'s goal is to be a low-latency, composable, and distributed dataflow framework.

By using `dora`, you can define robotic applications as a graph of nodes that can be easily swapped and replaced. Those nodes can be shared and implemented in different languages, such as Rust, Python, or C. `dora` then connects those nodes and tries to provide as many features as possible to facilitate the dataflow.

## ✨ Features

Composability as:
- [x] `YAML` declarative programming
- [x] polyglot:
- [x] Rust
- [x] C
- [x] C++
- [x] Python
- [x] Isolated operators and custom nodes that can be reused.

Low latency as:
- [x] written in <i>...Cough...blazingly fast ...Cough...</i> Rust.
- [x] PubSub communication with shared memory!
- [ ] Zero-copy on read!

Distributed as:
- [ ] PubSub communication between machines with [`zenoh`](https://github.com/eclipse-zenoh/zenoh)
- [x] Distributed telemetry with [`opentelemetry`](https://github.com/open-telemetry/opentelemetry-rust)



## ⚖️ LICENSE

This project is licensed under Apache-2.0. Check out [NOTICE.md](NOTICE.md) for more information.


+ 0
- 38
docs/src/library-vs-framework.md

@@ -1,38 +0,0 @@
# Framework

- Runtime process
  - Talks with other runtime processes
    - Across machines
  - loop
    - listen for inputs
    - invoke corresponding operator(s)
    - collect and forward outputs
- Operators
  - Connected to runtime
    - Via TCP socket (can be a separate process)
      - Single connection with high-level message format, or
      - Separate connection per input/output
    - Dynamically linked as shared library
      - Runtime invokes a specific handler method directly with the input(s)
  - Outputs either:
    - Return a collection as result
    - Call runtime function to send out result
  - Input aggregation (i.e. waiting until multiple inputs are available)
    - by runtime -> aggregation specified in config file
    - by operator -> custom handling possible

# Library

- All sources/operators/sinks are separate processes that link a runtime library
- "Orchestrator" process
  - reads config file
  - launches processes accordingly
  - passes node config
    - as argument
    - via env variable
    - including input and output names
- Runtime library provides (async) functions to
  - wait for one or multiple inputs
    - with timeouts
  - send out outputs
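The "pass node config via env variable" step above can be sketched with the standard library. The variable name `EXAMPLE_NODE_CONFIG` and the JSON shape are illustrative assumptions, not dora's actual convention:

```python
import json
import os

# Orchestrator side: serialize the node's config (inputs/outputs from the
# dataflow file) into an environment variable before launching the process.
node_config = {"inputs": {"tick": "timer/tick"}, "outputs": ["status"]}
os.environ["EXAMPLE_NODE_CONFIG"] = json.dumps(node_config)

# Node side: the runtime library reads the config back at startup.
loaded = json.loads(os.environ["EXAMPLE_NODE_CONFIG"])
```

In a real deployment the orchestrator would set the variable on the child process it spawns rather than in its own environment.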


+ 0
- 33
docs/src/overview.md

@@ -1,33 +0,0 @@
# Design Overview

The dora framework is structured into different components:

![design diagram](overview.svg)


The following main components exist:

- **Nodes:** Dora nodes are separate, isolated processes that communicate with other nodes through the dora library. Nodes can be either a custom, user-specified program, or a dora runtime node, which allows running dora _operators_. Nodes implement their own `main` function and thus have full control over their execution.

Nodes use the dora _library_ to communicate with other nodes, which is available for multiple languages (Rust, C; maybe Python, WASM). Communication happens through a _communication layer_, which will be `zenoh` in our first version. We plan to add more options in the future. All communication layer implementations should be robust against disconnections, so operators should be able to keep running even if they lose the connection to the coordinator.
- **Operators:** Operators are light-weight, cooperative, library-based components that are executed by a dora runtime node. They must implement a specific interface, depending on the used language. Operators can use a wide range of advanced features provided by the dora runtime, for example priority scheduling or native deadline support.
- **Coordinator:** The coordinator is responsible for reading the dataflow from a YAML file, verifying it, and deploying the nodes and operators to the specified or automatically determined machines. It monitors the operators' health and implements high-level cluster management functionality. For example, we could implement automatic scaling for cloud nodes or operator replication and restarts. The coordinator can be controlled through a command line program (CLI).


## Operators vs Custom Nodes

There are two ways to implement an operation in dora: either as a dora operator, or as a custom node. Both approaches have their advantages and drawbacks, as explained below. In general, it is recommended to create dora operators and only use custom nodes when necessary.

Operators have the following advantages:

- They can use a wide range of advanced functionality provided by the dora runtime nodes. This includes special scheduling strategies and features such as deadlines.
- They are _light-weight_, so they only occupy minimal amounts of memory. This makes it possible to run thousands of operators on the same machine.
- They can use runtime-managed state storage, for robustness or for sharing state with other operators.
- They _share the address space_ with other operators on the same node, which makes communication much faster.

Custom nodes provide a different set of advantages:

- Each node is a separate, isolated process, which can be important for security-critical operations.
- They support pinned resources. For example, a CPU core can be pinned to a custom node through the dataflow configuration file.
- They have full control over their execution, which allows creating complex input and wake-up rules.


+ 0
- 4
docs/src/overview.svg
File diff suppressed because it is too large


+ 0
- 74
docs/src/python-api.md

@@ -1,74 +0,0 @@
# Python API

## Operator

The operator API is a framework for you to implement. The implemented operator will be managed by `dora`. This framework enables us to make optimisations and provide advanced features. It is the recommended way of using `dora`.

An operator requires an `on_event` method and must return a `DoraStatus`, depending on whether it needs to continue or stop.

```python
{{#include ../../examples/python-operator-dataflow/object_detection.py:0:25}}
```

> For Python, we recommend allocating the operators on a single runtime. A runtime shares the same GIL between several operators, making those operators run almost sequentially. See: [https://docs.rs/pyo3/latest/pyo3/marker/struct.Python.html#deadlocks](https://docs.rs/pyo3/latest/pyo3/marker/struct.Python.html#deadlocks)
### Try it out!

- Create an operator python file called `object_detection.py`:
```python
{{#include ../../examples/python-dataflow/object_detection.py}}
```

- Link it in your graph as:
```yaml
{{#include ../../examples/python-dataflow/dataflow.yml:14:20}}
```

## Custom Node

The custom node API allows you to integrate `dora` into your application. It allows you to retrieve inputs and send outputs in any fashion you want.
#### `Node()`

`Node()` initiates a node from environment variables set by `dora-coordinator`.

```python
from dora import Node

node = Node()
```

#### `.next()` or `__next__()` as an iterator

`.next()` gives you the next input that the node has received. It blocks until the next input becomes available. It returns `None` when all senders have been dropped.

```python
input_id, value, metadata = node.next()

# or

for input_id, value, metadata in node:
```
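The iteration behaviour described above (blocking `.next()` that yields `None` once every sender is dropped, plus the iterator protocol) can be mimicked with a small mock. `FakeNode` is purely illustrative and is not dora's `Node` class:

```python
class FakeNode:
    """Illustrative mock of the dora node iterator; not the real implementation."""

    def __init__(self, pending):
        # Queued (input_id, value, metadata) tuples standing in for live inputs.
        self._pending = list(pending)

    def next(self):
        # Returns the next input, or None once all senders have been dropped.
        if self._pending:
            return self._pending.pop(0)
        return None

    def __iter__(self):
        return self

    def __next__(self):
        event = self.next()
        if event is None:
            # The for-loop form ends exactly when .next() would return None.
            raise StopIteration
        return event

node = FakeNode([("image", b"\x00", {}), ("image", b"\x01", {})])
received = [value for _, value, _ in node]
```

The `for` loop terminating when `.next()` returns `None` is the key equivalence between the two forms shown above.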

#### `.send_output(output_id, data)`

`send_output` sends data from the node.

```python
node.send_output("string", b"string", {"open_telemetry_context": "7632e76"})
```

### Try it out!

- Install python node API:
```bash
pip install dora-rs
```

- Create a python file called `webcam.py`:
```python
{{#include ../../examples/python-dataflow/webcam.py}}
```

- Link it in your graph as:
```yaml
{{#include ../../examples/python-dataflow/dataflow.yml:6:12}}
```

+ 0
- 97
docs/src/rust-api.md

@@ -1,97 +0,0 @@
# Rust API

## Operator

The operator API is a framework for you to implement. The implemented operator will be managed by `dora`. This framework enables us to make optimisations and provide advanced features. It is the recommended way of using `dora`.

An operator must be registered and implement the `DoraOperator` trait, which is composed of an `on_event` method defining the operator's behaviour when an event occurs, such as receiving an input.

```rust
{{#include ../../examples/rust-dataflow/operator/src/lib.rs:0:17}}
```

### Try it out!

- Generate a new Rust library

```bash
cargo new rust-dataflow-example-operator --lib
```

`Cargo.toml`
```toml
{{#include ../../examples/rust-dataflow/operator/Cargo.toml}}
```

`src/lib.rs`
```rust
{{#include ../../examples/rust-dataflow/operator/src/lib.rs}}
```

- Build it:
```bash
cargo build --release
```

- Link it in your graph as:
```yaml
{{#include ../../examples/rust-dataflow/dataflow.yml:13:21}}
```

This example can be found in `examples`.

## Custom Node

The custom node API allows you to integrate `dora` into your application. It allows you to retrieve inputs and send outputs in any fashion you want.
#### `DoraNode::init_from_env()`

`DoraNode::init_from_env()` initiates a node from environment variables set by `dora-coordinator`.

```rust
let (mut node, mut events) = DoraNode::init_from_env()?;
```

#### `.recv()`

`.recv()` waits for the next event on the events stream.

```rust
let event = events.recv();
```

#### `.send_output(...)`

`send_output` sends data from the node to the other nodes.
It takes a closure as input to enable zero-copy sending.

```rust
node.send_output(
    &data_id,
    metadata.parameters,
    data.len(),
    |out| {
        out.copy_from_slice(data);
    },
)?;
```

### Try it out!

- Generate a new Rust binary (application):

```bash
cargo new rust-dataflow-example-node
```

`Cargo.toml`
```toml
{{#include ../../examples/rust-dataflow/node/Cargo.toml}}
```

`src/main.rs`
```rust
{{#include ../../examples/rust-dataflow/node/src/main.rs}}
```

- Link it in your graph as:
```yaml
{{#include ../../examples/rust-dataflow/dataflow.yml:6:12}}
```

+ 0
- 52
docs/src/state-management.md

@@ -1,52 +0,0 @@
# State Management

Most operations require keeping some sort of state between calls. This document describes the different ways to handle state in dora.

## Internal State

Operators are `struct` or object instances, so they can keep internal state between invocations. This state is private to the operator. When an operator exits or crashes, its internal state is lost.

## Saving State

To make themselves resilient against crashes, operators can use dora's state management. The dora runtime provides each operator with a private key-value store (KVS). Operators can save serialized state into the KVS by using the `save_state` function of the runtime:

```rust
fn save_state(key: &str, value: Vec<u8>)
```

The runtime only stores the latest value for each key, so subsequent writes to the same key replace the earlier values. Serialization is required because the state must be self-contained (i.e. no pointers to other memory) and consistent (i.e. no half-updated state). Otherwise, state recovery might not be possible after an operator crash.

To keep the performance overhead of this function low, it is recommended to use a suitable serialization format that stores the data with minimal memory and compute overhead. Text-based formats such as JSON are not recommended. Also, fast-changing state should be stored under a separate key to minimize the amount of state that needs to be written.

### State Recovery

When an operator crashes, the dora runtime restarts it and supplies it with the last version of the saved state. It does this by calling the operator's `restore_state` method:

```rust
fn restore_state(&mut self, state: HashMap<String, Vec<u8>>)
```

In this method, the operator should deserialize and apply all state entries, and perform all custom consistency checks that are necessary.
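The save/restore contract can be sketched in Python. The `StateStore` class and the way the snapshot is handed back are illustrative assumptions; the runtime's actual KVS is not exposed this way:

```python
class StateStore:
    """Illustrative stand-in for the runtime's per-operator key-value store."""

    def __init__(self):
        self._kvs = {}

    def save_state(self, key, value):
        # Only the latest value per key is kept; earlier writes are replaced.
        self._kvs[key] = bytes(value)

    def snapshot(self):
        # What the runtime would hand to `restore_state` after a crash.
        return dict(self._kvs)

class CounterOperator:
    """Toy operator persisting a counter with a compact (non-JSON) encoding."""

    def __init__(self):
        self.count = 0

    def on_event(self, store):
        self.count += 1
        # Fast-changing state goes under its own key, serialized compactly.
        store.save_state("count", self.count.to_bytes(8, "little"))

    def restore_state(self, state):
        # Deserialize and apply each relevant entry; check consistency here.
        if "count" in state:
            self.count = int.from_bytes(state["count"], "little")

store = StateStore()
op = CounterOperator()
for _ in range(3):
    op.on_event(store)

# Simulated crash: a fresh instance recovers from the last saved snapshot.
recovered = CounterOperator()
recovered.restore_state(store.snapshot())
```

Note how the store keeps only the final counter value, matching the latest-value-per-key semantics described above.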

## Sharing State

To share state between operators, dora provides access to a node-local key-value store:

```rust
fn kvs_write(key: &str, value: Vec<u8>)
```
```rust
fn kvs_read(key: &str) -> Vec<u8>
```

Todo:

- Consistency?
- Anna?


## Custom Nodes

Custom nodes have full control over their execution, so they can implement their own state management. Shared state can be accessed through the `kvs_read` and `kvs_write` functions of the dora library, which are equivalent to the respective functions provided by the dora runtime.

Since custom nodes cannot use the recovery feature of the dora runtime, the `save_state`/`restore_state` functions are not available for them.
