A Dora node that provides access to LLaMA models using llama-cpp-python for efficient CPU/GPU inference.
To install:

```shell
uv venv -p 3.11 --seed
uv pip install -e .
```
The node can be configured in your dataflow YAML file:
```yaml
# Using a HuggingFace model
- id: dora-llama-cpp-python
  build: pip install -e path/to/dora-llama-cpp-python
  path: dora-llama-cpp-python
  inputs:
    text: source_node/text # Input text to generate a response for
  outputs:
    - text # Generated response text
  env:
    MODEL_NAME_OR_PATH: "TheBloke/Llama-2-7B-Chat-GGUF"
    MODEL_FILE_PATTERN: "*Q4_K_M.gguf"
    SYSTEM_PROMPT: "You're a very succinct AI assistant with short answers."
    ACTIVATION_WORDS: "what how who where you"
    MAX_TOKENS: "512"
    N_GPU_LAYERS: "35"   # Enable GPU acceleration
    N_THREADS: "4"       # CPU threads
    CONTEXT_SIZE: "4096" # Maximum context window
```
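The `ACTIVATION_WORDS` setting gates when the model responds: the node only generates a reply if the incoming text contains one of the listed words. A minimal sketch of that check (illustrative; the node's exact parsing is an assumption):

```python
import os

# Read the space-separated activation words, mirroring the env config above
# (falling back to the documented default when the variable is unset).
activation_words = os.environ.get(
    "ACTIVATION_WORDS", "what how who where you"
).split()

def should_respond(text: str) -> bool:
    """Return True if any activation word appears in the input text."""
    return any(word in activation_words for word in text.lower().split())
```

With the default words, `should_respond("What is Dora?")` fires while `should_respond("tell me a joke")` does not.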
The following environment variables configure the node:

- `MODEL_NAME_OR_PATH`: Path to a local model file or HuggingFace repo id (default: `"TheBloke/Llama-2-7B-Chat-GGUF"`)
- `MODEL_FILE_PATTERN`: Pattern to match the model file when downloading from HuggingFace (default: `"*Q4_K_M.gguf"`)
- `SYSTEM_PROMPT`: Customizes the AI assistant's personality/behavior
- `ACTIVATION_WORDS`: Space-separated list of words that trigger a model response
- `MAX_TOKENS`: Maximum number of tokens to generate (default: 512)
- `N_GPU_LAYERS`: Number of layers to offload to the GPU (default: 0; set to 35 for GPU acceleration)
- `N_THREADS`: Number of CPU threads to use (default: 4)
- `CONTEXT_SIZE`: Maximum context window size (default: 4096)

This example shows how to create a conversational AI pipeline that captures microphone audio, detects speech, transcribes it to text, generates a response with the LLaMA model, and speaks the result:
```yaml
nodes:
  - id: dora-microphone
    build: pip install dora-microphone
    path: dora-microphone
    inputs:
      tick: dora/timer/millis/2000
    outputs:
      - audio

  - id: dora-vad
    build: pip install dora-vad
    path: dora-vad
    inputs:
      audio: dora-microphone/audio
    outputs:
      - audio
      - timestamp_start

  - id: dora-whisper
    build: pip install dora-distil-whisper
    path: dora-distil-whisper
    inputs:
      input: dora-vad/audio
    outputs:
      - text

  - id: dora-llama-cpp-python
    build: pip install -e .
    path: dora-llama-cpp-python
    inputs:
      text: dora-whisper/text
    outputs:
      - text
    env:
      MODEL_NAME_OR_PATH: "TheBloke/Llama-2-7B-Chat-GGUF"
      MODEL_FILE_PATTERN: "*Q4_K_M.gguf"
      SYSTEM_PROMPT: "You're a helpful assistant."
      ACTIVATION_WORDS: "hey help what how"
      MAX_TOKENS: "512"
      N_GPU_LAYERS: "35"
      N_THREADS: "4"
      CONTEXT_SIZE: "4096"

  - id: dora-tts
    build: pip install dora-kokoro-tts
    path: dora-kokoro-tts
    inputs:
      text: dora-llama-cpp-python/text
    outputs:
      - audio
```
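Under the hood, a node like this typically combines `SYSTEM_PROMPT` with the transcribed text into a chat-style message list before calling the model (llama-cpp-python's `create_chat_completion` accepts messages in this shape; the node's exact internals are an assumption here):

```python
def build_messages(system_prompt: str, user_text: str) -> list:
    # Chat-format message list: the system persona first, then the user turn.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]

messages = build_messages(
    "You're a helpful assistant.",  # SYSTEM_PROMPT from the env block above
    "hey, what time is it?",        # text arriving from dora-whisper
)
```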
Build and run the dataflow:

```shell
dora build example.yml
dora run example.yml
```
Lint with ruff:

```shell
uv pip install ruff
uv run ruff check . --fix  # Lint and auto-fix
uv run ruff check .        # Lint only
```
Run the tests with pytest:

```shell
uv pip install pytest
uv run pytest .
```
dora-llama-cpp-python is released under the MIT License.
DORA (Dataflow-Oriented Robotic Architecture) is middleware designed to streamline and simplify the creation of AI-based robotic applications. It offers low-latency, composable, and distributed dataflows.