This PR fixes some small issues that were preventing the examples from running on macOS.
@@ -0,0 +1,9 @@
+# Quick Python ROS2 example
+To get started:
+```bash
+source /opt/ros/humble/setup.bash && ros2 run turtlesim turtlesim_node &
+source /opt/ros/humble/setup.bash && ros2 run examples_rclcpp_minimal_service service_main &
+cargo run --example python-ros2-dataflow --features="ros2-examples"
+```
@@ -0,0 +1,7 @@
+# Quick Rust example
+To get started:
+```bash
+cargo run --example rust-dataflow
+```
@@ -1,12 +1,31 @@
-# Dora echo example
+# Dora Speech to Text example
 Make sure to have dora, pip and cargo installed.
 ```bash
-dora up
-dora build dataflow.yml
-dora start dataflow.yml
+dora build https://raw.githubusercontent.com/dora-rs/dora/main/examples/speech-to-text/whisper.yml
+dora run https://raw.githubusercontent.com/dora-rs/dora/main/examples/speech-to-text/whisper.yml
+# Wait for the whisper model to download, which can take a bit of time.
+```
+## Graph Visualization
+```mermaid
+flowchart TB
+  dora-microphone
+  dora-vad
+  dora-distil-whisper
+  dora-rerun[/dora-rerun\]
+  subgraph ___dora___ [dora]
+    subgraph ___timer_timer___ [timer]
+      dora/timer/secs/2[\secs/2/]
+    end
+  end
+  dora/timer/secs/2 -- tick --> dora-microphone
+  dora-microphone -- audio --> dora-vad
+  dora-vad -- audio as input --> dora-distil-whisper
+  dora-distil-whisper -- text as original_text --> dora-rerun
-# In another terminal
-terminal-print
 ```
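The graph above is a single chain from the timer to the visualizer. As a toy sanity check (not part of dora), its edges can be walked to confirm that microphone audio reaches `dora-rerun`:

```python
# Edges of the speech-to-text dataflow, taken from the mermaid graph above.
EDGES = {
    "dora/timer/secs/2": ["dora-microphone"],
    "dora-microphone": ["dora-vad"],
    "dora-vad": ["dora-distil-whisper"],
    "dora-distil-whisper": ["dora-rerun"],
}

def reachable(start, graph):
    """Return every node reachable from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

If a node were miswired in the YAML, the corresponding edge would disappear and the check below would fail.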
@@ -2,6 +2,8 @@ nodes:
   - id: dora-microphone
     build: pip install -e ../../node-hub/dora-microphone
     path: dora-microphone
+    inputs:
+      tick: dora/timer/millis/2000
     outputs:
       - audio
@@ -0,0 +1,33 @@
+nodes:
+  - id: dora-microphone
+    description: Microphone
+    build: pip install dora-microphone
+    path: dora-microphone
+    inputs:
+      tick: dora/timer/millis/2000
+    outputs:
+      - audio
+  - id: dora-vad
+    build: pip install dora-vad
+    path: dora-vad
+    inputs:
+      audio: dora-microphone/audio
+    outputs:
+      - audio
+  - id: dora-whisper
+    build: pip install dora-distil-whisper
+    path: dora-distil-whisper
+    inputs:
+      input: dora-vad/audio
+    outputs:
+      - text
+    env:
+      TARGET_LANGUAGE: english
+  - id: dora-rerun
+    build: pip install dora-rerun
+    path: dora-rerun
+    inputs:
+      original_text: dora-whisper/text
@@ -1 +1,11 @@
 # Quick example on using a VLM with dora-rs
+Make sure to have dora, pip and cargo installed.
+```bash
+dora build https://raw.githubusercontent.com/dora-rs/dora/main/examples/vlm/qwenvl.yml
+dora run https://raw.githubusercontent.com/dora-rs/dora/main/examples/vlm/qwenvl.yml
+# Wait for the qwenvl and whisper models to download, which can take a bit of time.
+```
@@ -1,37 +0,0 @@
-nodes:
-  - id: camera
-    build: pip install -e ../../node-hub/opencv-video-capture
-    path: opencv-video-capture
-    inputs:
-      tick: dora/timer/millis/50
-    outputs:
-      - image
-    env:
-      CAPTURE_PATH: 0
-      IMAGE_WIDTH: 640
-      IMAGE_HEIGHT: 480
-  - id: dora-qwenvl
-    build: pip install -e ../../node-hub/dora-qwenvl
-    path: dora-qwenvl
-    inputs:
-      image:
-        source: camera/image
-        queue_size: 1
-      tick: dora/timer/millis/400
-    outputs:
-      - text
-      - tick
-    env:
-      DEFAULT_QUESTION: Describe the image in a very short sentence.
-      # For China
-      # USE_MODELSCOPE_HUB: true
-  - id: plot
-    build: pip install -e ../../node-hub/opencv-plot
-    path: opencv-plot
-    inputs:
-      image:
-        source: camera/image
-        queue_size: 1
-      text: dora-qwenvl/tick
@@ -1,4 +1,30 @@
 nodes:
+  - id: dora-microphone
+    build: pip install -e ../../node-hub/dora-microphone
+    path: dora-microphone
+    inputs:
+      tick: dora/timer/millis/2000
+    outputs:
+      - audio
+  - id: dora-vad
+    build: pip install -e ../../node-hub/dora-vad
+    path: dora-vad
+    inputs:
+      audio: dora-microphone/audio
+    outputs:
+      - audio
+  - id: dora-distil-whisper
+    build: pip install -e ../../node-hub/dora-distil-whisper
+    path: dora-distil-whisper
+    inputs:
+      input: dora-vad/audio
+    outputs:
+      - text
+    env:
+      TARGET_LANGUAGE: english
   - id: camera
     build: pip install -e ../../node-hub/opencv-video-capture
     path: opencv-video-capture
@@ -18,10 +44,9 @@ nodes:
       image:
         source: camera/image
         queue_size: 1
-      tick: dora/timer/millis/400
+      text: dora-distil-whisper/text
     outputs:
       - text
-      - tick
     env:
       DEFAULT_QUESTION: Describe the image in a very short sentence.
       # USE_MODELSCOPE_HUB: true
@@ -33,7 +58,8 @@ nodes:
       image:
         source: camera/image
         queue_size: 1
-      text: dora-qwenvl/tick
+      text_qwenvl: dora-qwenvl/text
+      text_whisper: dora-distil-whisper/text
     env:
       IMAGE_WIDTH: 640
       IMAGE_HEIGHT: 480
@@ -0,0 +1,67 @@
+nodes:
+  - id: dora-microphone
+    build: pip install dora-microphone
+    path: dora-microphone
+    inputs:
+      tick: dora/timer/millis/2000
+    outputs:
+      - audio
+  - id: dora-vad
+    build: pip install dora-vad
+    path: dora-vad
+    inputs:
+      audio: dora-microphone/audio
+    outputs:
+      - audio
+  - id: dora-distil-whisper
+    build: pip install dora-distil-whisper
+    path: dora-distil-whisper
+    inputs:
+      input: dora-vad/audio
+    outputs:
+      - text
+    env:
+      TARGET_LANGUAGE: english
+  - id: camera
+    build: pip install opencv-video-capture
+    path: opencv-video-capture
+    inputs:
+      tick: dora/timer/millis/50
+    outputs:
+      - image
+    env:
+      CAPTURE_PATH: 0
+      IMAGE_WIDTH: 640
+      IMAGE_HEIGHT: 480
+  - id: dora-qwenvl
+    build: pip install dora-qwenvl
+    path: dora-qwenvl
+    inputs:
+      image:
+        source: camera/image
+        queue_size: 1
+      text: dora-distil-whisper/text
+    outputs:
+      - text
+    env:
+      DEFAULT_QUESTION: Describe the image in a very short sentence.
+      # USE_MODELSCOPE_HUB: true
+  - id: plot
+    build: pip install dora-rerun
+    path: dora-rerun
+    inputs:
+      image:
+        source: camera/image
+        queue_size: 1
+      text_qwenvl: dora-qwenvl/text
+      text_whisper: dora-distil-whisper/text
+    env:
+      IMAGE_WIDTH: 640
+      IMAGE_HEIGHT: 480
+      README: |
+        # Visualization of QwenVL2
@@ -1,3 +1,30 @@
-# Dora Node for transforming speech to text (English only)
+# Dora Whisper Node for transforming speech to text
+Check the example at [examples/speech-to-text](examples/speech-to-text).
+## YAML Specification
+This node is meant to be used as follows:
+```yaml
+- id: dora-distil-whisper
+  build: pip install dora-distil-whisper
+  path: dora-distil-whisper
+  inputs:
+    input: dora-vad/audio
+  outputs:
+    - text
+  env:
+    TARGET_LANGUAGE: english
+```
+## Examples
+- Speech to Text
+  - github: https://github.com/dora-rs/dora/blob/main/examples/speech-to-text
+  - website: https://dora-rs.ai/docs/examples/stt
+- Vision Language Model
+  - github: https://github.com/dora-rs/dora/blob/main/examples/vlm
+  - website: https://dora-rs.ai/docs/examples/vlm
+## License
+Dora-whisper's code and model weights are released under the MIT License.
@@ -1,5 +1,48 @@
-# Dora Node for recording data from microphone
+# Collect data from microphone
 This node will send data as soon as the microphone volume is higher than a threshold.
+Check the example at [examples/speech-to-text](examples/speech-to-text).
+It uses the Python sounddevice library.
+It records the audio stream in chunks, with a maximum recording duration per chunk, to avoid leaving downstream nodes without input for too long.
+## Input/Output Specification
+- inputs:
+  - tick: This is used to detect when the dataflow is finished.
+- outputs:
+  - audio: 16kHz sampled audio sent in chunks
+## YAML Specification
+```yaml
+- id: dora-microphone
+  description: Microphone
+  build: pip install dora-microphone
+  path: dora-microphone
+  inputs:
+    tick: dora/timer/millis/2000
+  outputs:
+    - audio
+```
+## Reference documentation
+- dora-microphone
+  - github: https://github.com/dora-rs/dora/blob/main/node-hub/dora-microphone
+  - website: http://dora-rs.ai/docs/nodes/microphone
+- sounddevice
+  - website: https://python-sounddevice.readthedocs.io/en/0.5.1/
+  - github: https://github.com/spatialaudio/python-sounddevice/tree/master
+## Examples
+- Speech to Text
+  - github: https://github.com/dora-rs/dora/blob/main/examples/speech-to-text
+  - website: https://dora-rs.ai/docs/examples/stt
+## License
+The code and model weights are released under the MIT License.
@@ -16,13 +16,19 @@ def main():
     start_recording_time = tm.time()
     node = Node()
+    always_none = node.next(timeout=0.001) is None
+    finished = False

     # pylint: disable=unused-argument
     def callback(indata, frames, time, status):
-        nonlocal buffer, node, start_recording_time
+        nonlocal buffer, node, start_recording_time, finished
         if tm.time() - start_recording_time > MAX_DURATION:
             audio_data = np.array(buffer).ravel().astype(np.float32) / 32768.0
             node.send_output("audio", pa.array(audio_data))
+            if not always_none:
+                event = node.next(timeout=0.001)
+                finished = event is None
             buffer = []
             start_recording_time = tm.time()
         else:
@@ -32,5 +38,5 @@ def main():
     with sd.InputStream(
         callback=callback, dtype=np.int16, channels=1, samplerate=SAMPLE_RATE
     ):
-        while True:
-            sd.sleep(int(100 * 1000))
+        while not finished:
+            sd.sleep(int(1000))
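The shutdown logic above probes `node.next()` once at startup: if it already returns `None`, events can never signal the end of the dataflow and the check is skipped; otherwise, after each sent chunk the node polls once more, and a `None` means the dataflow has ended. A minimal sketch of that pattern, with a hypothetical `StubNode` standing in for `dora.Node` and the same int16-to-float scaling used before `send_output`:

```python
class StubNode:
    """Hypothetical stand-in for dora.Node, for illustration only."""
    def __init__(self, events):
        self._events = list(events)

    def next(self, timeout=None):
        # Returns the next event, or None once the dataflow has ended.
        return self._events.pop(0) if self._events else None

def int16_to_float32(buffer):
    """Scale int16 PCM samples to [-1.0, 1.0), as the node does before send_output."""
    return [s / 32768.0 for s in buffer]

def run(node):
    # Probe once: if next() is already None, events cannot signal shutdown.
    always_none = node.next(timeout=0.001) is None
    finished = False
    chunks = []
    while not finished:
        # Pretend one chunk of int16 audio was recorded, then check for shutdown.
        chunks.append(int16_to_float32([0, 16384, -32768]))
        if not always_none:
            finished = node.next(timeout=0.001) is None
    return chunks
```

The probe avoids calling `next()` on every chunk when the node runs without an event source, at the cost of one consumed event at startup.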
@@ -1,3 +1,32 @@
 # Dora QwenVL2 node
 Experimental node for using a VLM within dora.
+## YAML Specification
+This node is meant to be used as follows:
+```yaml
+- id: dora-qwenvl
+  build: pip install dora-qwenvl
+  path: dora-qwenvl
+  inputs:
+    image:
+      source: camera/image
+      queue_size: 1
+    text: dora-distil-whisper/text
+  outputs:
+    - text
+  env:
+    DEFAULT_QUESTION: Describe the image in a very short sentence.
+```
+## Additional documentation
+- QwenVL: https://github.com/QwenLM/Qwen-VL
+## Examples
+- Vision Language Model
+  - github: https://github.com/dora-rs/dora/blob/main/examples/vlm
+  - website: https://dora-rs.ai/docs/examples/vlm
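The `env` section above suggests the node asks `DEFAULT_QUESTION` when no text has arrived yet from whisper; the sketch below illustrates that fallback (the helper name and exact behavior are assumptions, not the node's actual code):

```python
import os

def current_question(last_text):
    """Hypothetical helper: use the latest whisper text if any,
    otherwise fall back to the DEFAULT_QUESTION environment variable."""
    if last_text:
        return last_text
    return os.environ.get(
        "DEFAULT_QUESTION", "Describe the image in a very short sentence."
    )
```

This keeps the VLM producing captions between utterances instead of idling with an empty prompt.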
@@ -85,7 +85,12 @@ def generate(frames: dict, question):
         return_tensors="pt",
     )
-    device = "cuda:0" if torch.cuda.is_available() else "cpu"
+    if torch.backends.mps.is_available():
+        device = torch.device("mps")
+    elif torch.cuda.is_available():
+        device = torch.device("cuda", 0)
+    else:
+        device = torch.device("cpu")
     inputs = inputs.to(device)

     # Inference: Generation of the output
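The change above is a three-way precedence: prefer MPS (Apple Silicon, the macOS fix this PR targets), then CUDA, then CPU. A small illustrative helper (not part of the node) takes the availability flags explicitly so the order is testable without torch installed:

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Mirror the precedence in the diff above: MPS first, then CUDA, then CPU."""
    if mps_available:
        return "mps"
    if cuda_available:
        return "cuda:0"
    return "cpu"
```

On a Mac with Apple Silicon, `torch.backends.mps.is_available()` is true while CUDA is not, so the old one-liner would have silently fallen back to CPU.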
@@ -181,7 +186,7 @@ def main():
             )
         elif event_type == "ERROR":
-            raise RuntimeError(event["error"])
+            print("Event Error:" + event["error"])

 if __name__ == "__main__":
@@ -15,7 +15,7 @@ python = "^3.7"
 dora-rs = "^0.3.6"
 numpy = "< 2.0.0"
 torch = "^2.2.0"
-torchvision = "^0.19"
+torchvision = "^0.20"
 transformers = "^4.45"
 qwen-vl-utils = "^0.0.2"
 accelerate = "^0.33"
@@ -1 +1 @@
-Subproject commit b2889e65cfe62571ced3ce88f00e7d80b41fee69
+Subproject commit 198374ea8c4a2ec2ddae86c35448d21aa9756f37
@@ -7,25 +7,27 @@ This node is still experimental, and the format for passing images, bounding boxes,
 ## Getting Started
 ```bash
-cargo install --force rerun-cli@0.15.1
-
-## To install this package
-git clone git@github.com:dora-rs/dora.git
-cargo install --git https://github.com/dora-rs/dora dora-rerun
+pip install dora-rerun
 ```

 ## Adding to existing graph:
 ```yaml
-- id: rerun
-  custom:
-    source: dora-rerun
-    inputs:
-      image: webcam/image
-      text: webcam/text
-      boxes2d: object_detection/bbox
-    envs:
-      RERUN_MEMORY_LIMIT: 25%
+- id: plot
+  build: pip install dora-rerun
+  path: dora-rerun
+  inputs:
+    image:
+      source: camera/image
+      queue_size: 1
+    text_qwenvl: dora-qwenvl/text
+    text_whisper: dora-distil-whisper/text
+  env:
+    IMAGE_WIDTH: 640
+    IMAGE_HEIGHT: 480
+    README: |
+      # Visualization
+    RERUN_MEMORY_LIMIT: 25%
 ```

 ## Input definition
@@ -67,3 +69,25 @@ Make sure to name the dataflow as follows:
 ## Configurations
 - RERUN_MEMORY_LIMIT: Rerun memory limit
+## Reference documentation
+- dora-rerun
+  - github: https://github.com/dora-rs/dora/blob/main/node-hub/dora-rerun
+  - website: http://dora-rs.ai/docs/nodes/rerun
+- rerun
+  - github: https://github.com/rerun-io/rerun
+  - website: https://rerun.io
+## Examples
+- speech to text
+  - github: https://github.com/dora-rs/dora/blob/main/examples/speech-to-text
+  - website: https://dora-rs.ai/docs/examples/stt
+- vision language model
+  - github: https://github.com/dora-rs/dora/blob/main/examples/vlm
+  - website: https://dora-rs.ai/docs/examples/vlm
+## License
+The code and model weights are released under the MIT License.
@@ -1,3 +1,45 @@
 # Voice Activity Detection (VAD)
 This uses Silero VAD.
+It detects the beginning and end of voice activity within a stream of audio and returns the parts that contain activity.
+There is a maximum voice duration, to avoid going too long without input.
+## Input/Output Specification
+- inputs:
+  - audio: 8kHz or 16kHz sample rate.
+- outputs:
+  - audio: Same as input but truncated
+## YAML Specification
+```yaml
+- id: dora-vad
+  description: "Voice activity detection. See: <a href='https://github.com/snakers4/silero-vad'>silero-vad</a>"
+  build: pip install dora-vad
+  path: dora-vad
+  inputs:
+    audio: dora-microphone/audio
+  outputs:
+    - audio
+```
+## Reference documentation
+- dora-vad
+  - github: https://github.com/dora-rs/dora/blob/main/node-hub/dora-vad
+  - website: http://dora-rs.ai/docs/nodes/sidero
+- Silero VAD
+  - github: https://github.com/snakers4/silero-vad
+## Examples
+- Speech to Text
+  - github: https://github.com/dora-rs/dora/blob/main/examples/speech-to-text
+  - website: https://dora-rs.ai/docs/examples/stt
+## License
+The code and model weights are released under the MIT License.
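Silero VAD uses a neural model; to illustrate only the segmentation idea (find where activity starts and ends, keep those spans), here is a toy energy-threshold detector. The frame length and threshold are arbitrary assumptions, not Silero's behavior:

```python
def rms(frame):
    """Root-mean-square energy of one frame of float samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def active_segments(samples, frame_len=160, threshold=0.1):
    """Return (start, end) sample indices of contiguous runs of frames
    whose energy exceeds the threshold."""
    segments = []
    current = None  # start index of the run in progress, if any
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i : i + frame_len]
        if rms(frame) >= threshold:
            if current is None:
                current = i  # activity begins
        elif current is not None:
            segments.append((current, i))  # activity ends
            current = None
    if current is not None:
        segments.append((current, len(samples)))
    return segments
```

A real VAD classifies frames with a learned model instead of raw energy, but the bookkeeping around segment boundaries is the same shape.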
@@ -1,19 +0,0 @@
-import os
-import sys
-from pathlib import Path
-
-# Define the path to the README file relative to the package directory
-readme_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), "README.md")
-
-# Read the content of the README file
-try:
-    with open(readme_path, "r", encoding="utf-8") as f:
-        __doc__ = f.read()
-except FileNotFoundError:
-    __doc__ = "README file not found."
-
-# Set up the import hook
-submodule_path = Path(__file__).resolve().parent / "RoboticsDiffusionTransformer"
-sys.path.insert(0, str(submodule_path))