
Improve speech to text example within the macOS ecosystem (#741)

This PR fixes some small issues that were preventing the examples from running on
macOS.
tags/v0.3.9-rc1
Haixuan Xavier Tao · 1 year ago
commit 5ab3ea46e1
GPG Key ID: B5690EEEBB952194 (no known key found for this signature in database)
19 changed files with 383 additions and 90 deletions

 1. examples/python-ros2-dataflow/README.md (+9 −0)
 2. examples/rust-dataflow/README.md (+7 −0)
 3. examples/speech-to-text/README.md (+25 −6)
 4. examples/speech-to-text/whisper-dev.yml (+2 −0)
 5. examples/speech-to-text/whisper.yml (+33 −0)
 6. examples/vlm/README.md (+10 −0)
 7. examples/vlm/dataflow.yml (+0 −37)
 8. examples/vlm/qwenvl-dev.yml (+29 −3)
 9. examples/vlm/qwenvl.yml (+67 −0)
10. node-hub/dora-distil-whisper/README.md (+29 −2)
11. node-hub/dora-microphone/README.md (+45 −2)
12. node-hub/dora-microphone/dora_microphone/main.py (+9 −3)
13. node-hub/dora-qwenvl/README.md (+29 −0)
14. node-hub/dora-qwenvl/dora_qwenvl/main.py (+7 −2)
15. node-hub/dora-qwenvl/pyproject.toml (+1 −1)
16. node-hub/dora-rdt-1b/dora_rdt_1b/RoboticsDiffusionTransformer (+1 −1)
17. node-hub/dora-rerun/README.md (+38 −14)
18. node-hub/dora-vad/README.md (+42 −0)
19. node-hub/dora_rdt_1b/__init__.py (+0 −19)

examples/python-ros2-dataflow/README.md (+9 −0)

@@ -0,0 +1,9 @@
# Quick Python ROS2 example

To get started:

```bash
source /opt/ros/humble/setup.bash && ros2 run turtlesim turtlesim_node &
source /opt/ros/humble/setup.bash && ros2 run examples_rclcpp_minimal_service service_main &
cargo run --example python-ros2-dataflow --features="ros2-examples"
```

examples/rust-dataflow/README.md (+7 −0)

@@ -0,0 +1,7 @@
# Quick Rust example

To get started:

```bash
cargo run --example rust-dataflow
```

examples/speech-to-text/README.md (+25 −6)

@@ -1,12 +1,31 @@
-# Dora echo example
+# Dora Speech to Text example
 
 Make sure to have dora, pip and cargo installed.
 
 ```bash
-dora up
-dora build dataflow.yml
-dora start dataflow.yml
-
-# In another terminal
-terminal-print
+dora build https://raw.githubusercontent.com/dora-rs/dora/main/examples/speech-to-text/whisper.yml
+dora run https://raw.githubusercontent.com/dora-rs/dora/main/examples/speech-to-text/whisper.yml
+
+# Wait for the whisper model to download, which can take a bit of time.
 ```
+
+## Graph Visualization
+
+```mermaid
+flowchart TB
+  dora-microphone
+  dora-vad
+  dora-distil-whisper
+  dora-rerun[/dora-rerun\]
+  subgraph ___dora___ [dora]
+    subgraph ___timer_timer___ [timer]
+      dora/timer/secs/2[\secs/2/]
+    end
+  end
+  dora/timer/secs/2 -- tick --> dora-microphone
+  dora-microphone -- audio --> dora-vad
+  dora-vad -- audio as input --> dora-distil-whisper
+  dora-distil-whisper -- text as original_text --> dora-rerun
+```

examples/speech-to-text/dataflow.yml → examples/speech-to-text/whisper-dev.yml (+2 −0)

@@ -2,6 +2,8 @@ nodes:
   - id: dora-microphone
     build: pip install -e ../../node-hub/dora-microphone
     path: dora-microphone
+    inputs:
+      tick: dora/timer/millis/2000
     outputs:
       - audio

examples/speech-to-text/whisper.yml (+33 −0)

@@ -0,0 +1,33 @@
nodes:
  - id: dora-microphone
    description: Microphone
    build: pip install dora-microphone
    path: dora-microphone
    inputs:
      tick: dora/timer/millis/2000
    outputs:
      - audio

  - id: dora-vad
    build: pip install dora-vad
    path: dora-vad
    inputs:
      audio: dora-microphone/audio
    outputs:
      - audio

  - id: dora-whisper
    build: pip install dora-distil-whisper
    path: dora-distil-whisper
    inputs:
      input: dora-vad/audio
    outputs:
      - text
    env:
      TARGET_LANGUAGE: english

  - id: dora-rerun
    build: pip install dora-rerun
    path: dora-rerun
    inputs:
      original_text: dora-whisper/text

examples/vlm/README.md (+10 −0)

@@ -1 +1,11 @@
 # Quick example on using a VLM with dora-rs
+
+Make sure to have dora, pip and cargo installed.
+
+```bash
+dora build https://raw.githubusercontent.com/dora-rs/dora/main/examples/vlm/qwenvl.yml
+
+dora run https://raw.githubusercontent.com/dora-rs/dora/main/examples/vlm/qwenvl.yml
+
+# Wait for the qwenvl and whisper models to download, which can take a bit of time.
+```

examples/vlm/dataflow.yml (+0 −37)

@@ -1,37 +0,0 @@
nodes:
  - id: camera
    build: pip install -e ../../node-hub/opencv-video-capture
    path: opencv-video-capture
    inputs:
      tick: dora/timer/millis/50
    outputs:
      - image
    env:
      CAPTURE_PATH: 0
      IMAGE_WIDTH: 640
      IMAGE_HEIGHT: 480

  - id: dora-qwenvl
    build: pip install -e ../../node-hub/dora-qwenvl
    path: dora-qwenvl
    inputs:
      image:
        source: camera/image
        queue_size: 1
      tick: dora/timer/millis/400
    outputs:
      - text
      - tick
    env:
      DEFAULT_QUESTION: Describe the image in a very short sentence.
      # For China
      # USE_MODELSCOPE_HUB: true

  - id: plot
    build: pip install -e ../../node-hub/opencv-plot
    path: opencv-plot
    inputs:
      image:
        source: camera/image
        queue_size: 1
      text: dora-qwenvl/tick

examples/vlm/dataflow_rerun.yml → examples/vlm/qwenvl-dev.yml (+29 −3)

@@ -1,4 +1,30 @@
 nodes:
+  - id: dora-microphone
+    build: pip install -e ../../node-hub/dora-microphone
+    path: dora-microphone
+    inputs:
+      tick: dora/timer/millis/2000
+    outputs:
+      - audio
+
+  - id: dora-vad
+    build: pip install -e ../../node-hub/dora-vad
+    path: dora-vad
+    inputs:
+      audio: dora-microphone/audio
+    outputs:
+      - audio
+
+  - id: dora-distil-whisper
+    build: pip install -e ../../node-hub/dora-distil-whisper
+    path: dora-distil-whisper
+    inputs:
+      input: dora-vad/audio
+    outputs:
+      - text
+    env:
+      TARGET_LANGUAGE: english
+
   - id: camera
     build: pip install -e ../../node-hub/opencv-video-capture
     path: opencv-video-capture
@@ -18,10 +44,9 @@ nodes:
       image:
         source: camera/image
         queue_size: 1
-      tick: dora/timer/millis/400
+      text: dora-distil-whisper/text
     outputs:
       - text
-      - tick
     env:
       DEFAULT_QUESTION: Describe the image in a very short sentence.
       # USE_MODELSCOPE_HUB: true
@@ -33,7 +58,8 @@ nodes:
       image:
         source: camera/image
         queue_size: 1
-      text: dora-qwenvl/tick
+      text_qwenvl: dora-qwenvl/text
+      text_whisper: dora-distil-whisper/text
     env:
       IMAGE_WIDTH: 640
       IMAGE_HEIGHT: 480

examples/vlm/qwenvl.yml (+67 −0)

@@ -0,0 +1,67 @@
nodes:
  - id: dora-microphone
    build: pip install dora-microphone
    path: dora-microphone
    inputs:
      tick: dora/timer/millis/2000
    outputs:
      - audio

  - id: dora-vad
    build: pip install dora-vad
    path: dora-vad
    inputs:
      audio: dora-microphone/audio
    outputs:
      - audio

  - id: dora-distil-whisper
    build: pip install dora-distil-whisper
    path: dora-distil-whisper
    inputs:
      input: dora-vad/audio
    outputs:
      - text
    env:
      TARGET_LANGUAGE: english

  - id: camera
    build: pip install opencv-video-capture
    path: opencv-video-capture
    inputs:
      tick: dora/timer/millis/50
    outputs:
      - image
    env:
      CAPTURE_PATH: 0
      IMAGE_WIDTH: 640
      IMAGE_HEIGHT: 480

  - id: dora-qwenvl
    build: pip install dora-qwenvl
    path: dora-qwenvl
    inputs:
      image:
        source: camera/image
        queue_size: 1
      text: dora-distil-whisper/text
    outputs:
      - text
    env:
      DEFAULT_QUESTION: Describe the image in a very short sentence.
      # USE_MODELSCOPE_HUB: true

  - id: plot
    build: pip install dora-rerun
    path: dora-rerun
    inputs:
      image:
        source: camera/image
        queue_size: 1
      text_qwenvl: dora-qwenvl/text
      text_whisper: dora-distil-whisper/text
    env:
      IMAGE_WIDTH: 640
      IMAGE_HEIGHT: 480
      README: |
        # Visualization of QwenVL2

node-hub/dora-distil-whisper/README.md (+29 −2)

@@ -1,3 +1,30 @@
-# Dora Node for transforming speech to text (English only)
+# Dora Whisper Node for transforming speech to text
 
-Check example at [examples/speech-to-text](examples/speech-to-text)
+## YAML Specification
+
+This node is supposed to be used as follows:
+
+```yaml
+- id: dora-distil-whisper
+  build: pip install dora-distil-whisper
+  path: dora-distil-whisper
+  inputs:
+    input: dora-vad/audio
+  outputs:
+    - text
+  env:
+    TARGET_LANGUAGE: english
+```
+
+## Examples
+
+- Speech to Text
+  - github: https://github.com/dora-rs/dora/blob/main/examples/speech-to-text
+  - website: https://dora-rs.ai/docs/examples/stt
+- Vision Language Model
+  - github: https://github.com/dora-rs/dora/blob/main/examples/vlm
+  - website: https://dora-rs.ai/docs/examples/vlm
+
+## License
+
+Dora-whisper's code and model weights are released under the MIT License.

node-hub/dora-microphone/README.md (+45 −2)

@@ -1,5 +1,48 @@
-# Dora Node for recording data from microphone
+# Collect data from microphone
 
 This node will send data as soon as the microphone volume is higher than a threshold.
 
-Check example at [examples/speech-to-text](examples/speech-to-text)
+This is using python Sounddevice.
+
+It detects the beginning and end of voice activity within a stream of audio and returns the parts that contain activity.
+
+There is a maximum voice duration, to avoid going too long without sending any input.
+
+## Input/Output Specification
+
+- inputs:
+  - tick: This is used to detect when the dataflow is finished.
+- outputs:
+  - audio: 16kHz sampled audio, sent by chunk
+
+## YAML Specification
+
+```yaml
+- id: dora-microphone
+  build: pip install dora-microphone
+  path: dora-microphone
+  inputs:
+    tick: dora/timer/millis/2000
+  outputs:
+    - audio
+```
+
+## Reference documentation
+
+- dora-microphone
+  - github: https://github.com/dora-rs/dora/blob/main/node-hub/dora-microphone
+  - website: http://dora-rs.ai/docs/nodes/microphone
+- sounddevice
+  - website: https://python-sounddevice.readthedocs.io/en/0.5.1/
+  - github: https://github.com/spatialaudio/python-sounddevice/tree/master
+
+## Examples
+
+- Speech to Text
+  - github: https://github.com/dora-rs/dora/blob/main/examples/speech-to-text
+  - website: https://dora-rs.ai/docs/examples/stt
+
+## License
+
+The code and model weights are released under the MIT License.
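The volume-threshold behaviour described in this README can be illustrated without audio hardware. A minimal sketch (pure numpy; `should_send` and the threshold value are hypothetical names for illustration, not the node's actual API):

```python
import numpy as np

def should_send(frame: np.ndarray, threshold: float = 500.0) -> bool:
    """Decide whether a chunk of int16 audio is loud enough to forward.

    Computes the RMS volume of the frame and compares it to a threshold,
    mirroring the "send when volume is higher than a threshold" behaviour.
    """
    rms = np.sqrt(np.mean(frame.astype(np.float64) ** 2))
    return bool(rms > threshold)

# Silence stays below the threshold; a loud sine wave crosses it.
silence = np.zeros(1600, dtype=np.int16)
speech = (np.sin(np.linspace(0, 100, 1600)) * 5000).astype(np.int16)
print(should_send(silence))  # → False
print(should_send(speech))   # → True
```

In the real node the frames come from a sounddevice `InputStream` callback rather than synthetic arrays.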

node-hub/dora-microphone/dora_microphone/main.py (+9 −3)

@@ -16,13 +16,19 @@ def main():
     start_recording_time = tm.time()
     node = Node()
 
+    always_none = node.next(timeout=0.001) is None
+    finished = False
+
     # pylint: disable=unused-argument
     def callback(indata, frames, time, status):
-        nonlocal buffer, node, start_recording_time
+        nonlocal buffer, node, start_recording_time, finished
 
         if tm.time() - start_recording_time > MAX_DURATION:
             audio_data = np.array(buffer).ravel().astype(np.float32) / 32768.0
             node.send_output("audio", pa.array(audio_data))
+            if not always_none:
+                event = node.next(timeout=0.001)
+                finished = event is None
             buffer = []
             start_recording_time = tm.time()
         else:
@@ -32,5 +38,5 @@ def main():
     with sd.InputStream(
         callback=callback, dtype=np.int16, channels=1, samplerate=SAMPLE_RATE
     ):
-        while True:
-            sd.sleep(int(100 * 1000))
+        while not finished:
+            sd.sleep(int(1000))
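The shutdown logic in this diff (stop once `node.next()` returns `None`, unless it returned `None` from the very start) can be exercised without a running dataflow or a microphone. A sketch with a hypothetical `FakeNode` standing in for dora's `Node`:

```python
import numpy as np

class FakeNode:
    """Hypothetical stand-in for dora's Node: next() yields events, then None."""
    def __init__(self, n_events):
        self.remaining = n_events

    def next(self, timeout=None):
        if self.remaining > 0:
            self.remaining -= 1
            return {"type": "INPUT"}
        return None  # dataflow finished

def run_chunks(node, chunks):
    """Mirror the callback's bookkeeping: forward chunks until next() is None."""
    always_none = node.next(timeout=0.001) is None
    finished = False
    sent = []
    for chunk in chunks:
        if finished:
            break
        audio = np.asarray(chunk, dtype=np.float32) / 32768.0
        sent.append(audio)
        if not always_none:
            finished = node.next(timeout=0.001) is None
    return sent

# Two events remain after the probe, so exactly two chunks get sent.
sent = run_chunks(FakeNode(2), [[1, 2], [3, 4], [5, 6], [7, 8]])
print(len(sent))  # → 2
```

The `always_none` probe guards against a runtime where `next()` never returns events, in which case the node keeps streaming instead of exiting immediately.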

node-hub/dora-qwenvl/README.md (+29 −0)

@@ -1,3 +1,32 @@
 # Dora QwenVL2 node
 
 Experimental node for using a VLM within dora.
+
+## YAML Specification
+
+This node is supposed to be used as follows:
+
+```yaml
+- id: dora-qwenvl
+  build: pip install dora-qwenvl
+  path: dora-qwenvl
+  inputs:
+    image:
+      source: camera/image
+      queue_size: 1
+    text: dora-distil-whisper/text
+  outputs:
+    - text
+  env:
+    DEFAULT_QUESTION: Describe the image in a very short sentence.
+```
+
+## Additional documentation
+
+- Qwenvl: https://github.com/QwenLM/Qwen-VL
+
+## Examples
+
+- Vision Language Model
+  - Github: https://github.com/dora-rs/dora/blob/main/examples/vlm
+  - Website: https://dora-rs.ai/docs/examples/vlm

node-hub/dora-qwenvl/dora_qwenvl/main.py (+7 −2)

@@ -85,7 +85,12 @@ def generate(frames: dict, question):
         return_tensors="pt",
     )
 
-    device = "cuda:0" if torch.cuda.is_available() else "cpu"
+    if torch.backends.mps.is_available():
+        device = torch.device("mps")
+    elif torch.cuda.is_available():
+        device = torch.device("cuda", 0)
+    else:
+        device = torch.device("cpu")
     inputs = inputs.to(device)
 
     # Inference: Generation of the output
@@ -181,7 +186,7 @@ def main():
         )
 
         elif event_type == "ERROR":
-            raise RuntimeError(event["error"])
+            print("Event Error:" + event["error"])
 
 
 if __name__ == "__main__":


node-hub/dora-qwenvl/pyproject.toml (+1 −1)

@@ -15,7 +15,7 @@ python = "^3.7"
 dora-rs = "^0.3.6"
 numpy = "< 2.0.0"
 torch = "^2.2.0"
-torchvision = "^0.19"
+torchvision = "^0.20"
 transformers = "^4.45"
 qwen-vl-utils = "^0.0.2"
 accelerate = "^0.33"
accelerate = "^0.33" accelerate = "^0.33"


node-hub/dora-rdt-1b/dora_rdt_1b/RoboticsDiffusionTransformer (+1 −1)

@@ -1 +1 @@
-Subproject commit b2889e65cfe62571ced3ce88f00e7d80b41fee69
+Subproject commit 198374ea8c4a2ec2ddae86c35448d21aa9756f37

node-hub/dora-rerun/README.md (+38 −14)

@@ -7,25 +7,27 @@ This nodes is still experimental and format for passing Images, Bounding boxes,
 ## Getting Started
 
 ```bash
-cargo install --force rerun-cli@0.15.1
-
-## To install this package
-git clone git@github.com:dora-rs/dora.git
-cargo install --git https://github.com/dora-rs/dora dora-rerun
+pip install dora-rerun
 ```
 
 ## Adding to existing graph:
 
 ```yaml
-- id: rerun
-  custom:
-    source: dora-rerun
-    inputs:
-      image: webcam/image
-      text: webcam/text
-      boxes2d: object_detection/bbox
-    envs:
-      RERUN_MEMORY_LIMIT: 25%
+- id: plot
+  build: pip install dora-rerun
+  path: dora-rerun
+  inputs:
+    image:
+      source: camera/image
+      queue_size: 1
+    text_qwenvl: dora-qwenvl/text
+    text_whisper: dora-distil-whisper/text
+  env:
+    IMAGE_WIDTH: 640
+    IMAGE_HEIGHT: 480
+    README: |
+      # Visualization
+    RERUN_MEMORY_LIMIT: 25%
 ```
 
 ## Input definition
@@ -67,3 +69,25 @@ Make sure to name the dataflow as follows:
 ## Configurations
 
 - RERUN_MEMORY_LIMIT: Rerun memory limit
+
+## Reference documentation
+
+- dora-rerun
+  - github: https://github.com/dora-rs/dora/blob/main/node-hub/dora-rerun
+  - website: http://dora-rs.ai/docs/nodes/rerun
+- rerun
+  - github: https://github.com/rerun-io/rerun
+  - website: https://rerun.io
+
+## Examples
+
+- speech to text
+  - github: https://github.com/dora-rs/dora/blob/main/examples/speech-to-text
+  - website: https://dora-rs.ai/docs/examples/stt
+- vision language model
+  - github: https://github.com/dora-rs/dora/blob/main/examples/vlm
+  - website: https://dora-rs.ai/docs/examples/vlm
+
+## License
+
+The code and model weights are released under the MIT License.

node-hub/dora-vad/README.md (+42 −0)

@@ -1,3 +1,45 @@
 # Speech Activity Detection (VAD)
 
 This is using Silero VAD.
+
+It detects the beginning and end of voice activity within a stream of audio and returns the parts that contain activity.
+
+There is a maximum voice duration, to avoid going too long without any input.
+
+## Input/Output Specification
+
+- inputs:
+  - audio: 8kHz or 16kHz sample rate.
+- outputs:
+  - audio: Same as input but truncated
+
+## YAML Specification
+
+```yaml
+- id: dora-vad
+  description: Voice activity detection. See <a href='https://github.com/snakers4/silero-vad'>silero</a>
+  build: pip install dora-vad
+  path: dora-vad
+  inputs:
+    audio: dora-microphone/audio
+  outputs:
+    - audio
+```
+
+## Reference documentation
+
+- dora-vad
+  - github: https://github.com/dora-rs/dora/blob/main/node-hub/dora-vad
+  - website: http://dora-rs.ai/docs/nodes/sidero
+- Silero VAD
+  - github: https://github.com/snakers4/silero-vad
+
+## Examples
+
+- Speech to Text
+  - github: https://github.com/dora-rs/dora/blob/main/examples/speech-to-text
+  - website: https://dora-rs.ai/docs/examples/stt
+
+## License
+
+The code and model weights are released under the MIT License.
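Silero VAD uses a neural model, but the "detect beginning and end of voice activity, return only the active parts" idea can be illustrated with a toy energy gate. A sketch (the `active_segments` helper is hypothetical, not the node's actual code):

```python
import numpy as np

def active_segments(audio: np.ndarray, frame_len: int = 160, threshold: float = 0.01):
    """Toy VAD: mark frames whose mean absolute amplitude exceeds a threshold,
    then merge consecutive active frames into (start, end) sample ranges."""
    n_frames = len(audio) // frame_len
    active = [np.mean(np.abs(audio[i * frame_len:(i + 1) * frame_len])) > threshold
              for i in range(n_frames)]
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i * frame_len       # voice activity begins
        elif not is_active and start is not None:
            segments.append((start, i * frame_len))  # voice activity ends
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments

audio = np.zeros(1600, dtype=np.float32)
audio[480:960] = 0.5  # a burst of "voice" in the middle
print(active_segments(audio))  # → [(480, 960)]
```

A real VAD replaces the per-frame energy test with a learned speech-probability score, but the segmentation bookkeeping is the same shape.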

node-hub/dora_rdt_1b/__init__.py (+0 −19)

@@ -1,19 +0,0 @@
import os
import sys
from pathlib import Path

# Define the path to the README file relative to the package directory
readme_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), "README.md")

# Read the content of the README file
try:
    with open(readme_path, "r", encoding="utf-8") as f:
        __doc__ = f.read()
except FileNotFoundError:
    __doc__ = "README file not found."


# Set up the import hook
submodule_path = Path(__file__).resolve().parent / "RoboticsDiffusionTransformer"
sys.path.insert(0, str(submodule_path))
