# dora-llama-cpp-python

A Dora node that provides access to LLaMA models using llama-cpp-python for efficient CPU/GPU inference.

## Features

- GPU acceleration support (CUDA on Linux, Metal on macOS)
- Easy integration with speech-to-text and text-to-speech pipelines
- Configurable system prompts and activation words
- Lightweight CPU inference with GGUF models
- Thread-level CPU optimization
- Adjustable context window size

## Getting started

### Installation

```bash
uv venv -p 3.11 --seed
uv pip install -e .
```

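To verify the install, a short smoke test along these lines should work (a minimal sketch: `Llama.from_pretrained` requires `huggingface-hub` to be available, and the repo id and filename pattern here simply mirror this node's defaults):

```python
# Smoke test: fetch the default quantized GGUF model and run one short completion.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",  # default model repo for this node
    filename="*Q4_K_M.gguf",                  # default quantization pattern
    n_ctx=2048,                               # a small context is enough for a smoke test
    verbose=False,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in five words."}],
    max_tokens=32,
)
print(out["choices"][0]["message"]["content"])
```
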
## Usage

The node can be configured in your dataflow YAML file:

```yaml
# Using a HuggingFace model
- id: dora-llama-cpp-python
  build: pip install -e path/to/dora-llama-cpp-python
  path: dora-llama-cpp-python
  inputs:
    text: source_node/text # Input text to generate a response for
  outputs:
    - text # Generated response text
  env:
    MODEL_NAME_OR_PATH: "TheBloke/Llama-2-7B-Chat-GGUF"
    MODEL_FILE_PATTERN: "*Q4_K_M.gguf"
    SYSTEM_PROMPT: "You're a very succinct AI assistant with short answers."
    ACTIVATION_WORDS: "what how who where you"
    MAX_TOKENS: "512"
    N_GPU_LAYERS: "35" # Enable GPU acceleration
    N_THREADS: "4" # CPU threads
    CONTEXT_SIZE: "4096" # Maximum context window
```

### Configuration Options

- `MODEL_NAME_OR_PATH`: Path to a local model file or HuggingFace repo id (default: "TheBloke/Llama-2-7B-Chat-GGUF")
- `MODEL_FILE_PATTERN`: Pattern used to match the model file when downloading from HuggingFace (default: "*Q4_K_M.gguf")
- `SYSTEM_PROMPT`: Customizes the AI assistant's personality/behavior
- `ACTIVATION_WORDS`: Space-separated list of words that trigger a model response (see the sketch after this list)
- `MAX_TOKENS`: Maximum number of tokens to generate (default: 512)
- `N_GPU_LAYERS`: Number of layers to offload to the GPU (default: 0; set to 35 for GPU acceleration)
- `N_THREADS`: Number of CPU threads to use (default: 4)
- `CONTEXT_SIZE`: Maximum context window size (default: 4096)

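These options map fairly directly onto `llama_cpp.Llama` constructor arguments plus a keyword gate on the incoming text. The sketch below shows one plausible wiring, not the node's actual source: the helper names are hypothetical, and `MODEL_NAME_OR_PATH` is assumed to be a local `.gguf` path here (a HuggingFace repo id would instead go through `Llama.from_pretrained` with `MODEL_FILE_PATTERN`):

```python
# Illustrative wiring of the env options into llama-cpp-python. Helper names
# are hypothetical; the env-var names and Llama keyword arguments match the
# options documented above.
import os

from llama_cpp import Llama


def load_model() -> Llama:
    """Build a Llama instance from the node's environment variables."""
    return Llama(
        model_path=os.getenv("MODEL_NAME_OR_PATH", "model.gguf"),  # local .gguf path assumed
        n_gpu_layers=int(os.getenv("N_GPU_LAYERS", "0")),  # 0 = pure CPU inference
        n_threads=int(os.getenv("N_THREADS", "4")),
        n_ctx=int(os.getenv("CONTEXT_SIZE", "4096")),
    )


def should_respond(text: str) -> bool:
    """Gate generation on the space-separated ACTIVATION_WORDS list."""
    words = os.getenv("ACTIVATION_WORDS", "what how who where you").split()
    return any(w in text.lower().split() for w in words)


def generate(llm: Llama, text: str) -> str:
    """Run one chat completion with the configured system prompt."""
    resp = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": os.getenv("SYSTEM_PROMPT", "")},
            {"role": "user", "content": text},
        ],
        max_tokens=int(os.getenv("MAX_TOKENS", "512")),
    )
    return resp["choices"][0]["message"]["content"]
```
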
## Example: Speech Assistant Pipeline

This example shows how to create a conversational AI pipeline that:

1. Captures audio from a microphone
2. Converts speech to text
3. Generates AI responses
4. Converts the responses back to speech

```yaml
nodes:
  - id: dora-microphone
    build: pip install dora-microphone
    path: dora-microphone
    inputs:
      tick: dora/timer/millis/2000
    outputs:
      - audio

  - id: dora-vad
    build: pip install dora-vad
    path: dora-vad
    inputs:
      audio: dora-microphone/audio
    outputs:
      - audio
      - timestamp_start

  - id: dora-whisper
    build: pip install dora-distil-whisper
    path: dora-distil-whisper
    inputs:
      input: dora-vad/audio
    outputs:
      - text

  - id: dora-llama-cpp-python
    build: pip install -e .
    path: dora-llama-cpp-python
    inputs:
      text: dora-whisper/text
    outputs:
      - text
    env:
      MODEL_NAME_OR_PATH: "TheBloke/Llama-2-7B-Chat-GGUF"
      MODEL_FILE_PATTERN: "*Q4_K_M.gguf"
      SYSTEM_PROMPT: "You're a helpful assistant."
      ACTIVATION_WORDS: "hey help what how"
      MAX_TOKENS: "512"
      N_GPU_LAYERS: "35"
      N_THREADS: "4"
      CONTEXT_SIZE: "4096"

  - id: dora-tts
    build: pip install dora-kokoro-tts
    path: dora-kokoro-tts
    inputs:
      text: dora-llama-cpp-python/text
    outputs:
      - audio
```

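Inside the `dora-llama-cpp-python` node, the main loop presumably follows the standard dora Python pattern: read `text` input events, gate and generate, and emit the reply on the `text` output. A hedged sketch (the `Node` API is dora's standard Python interface; `load_model`, `should_respond`, and `generate` are the hypothetical helpers from the earlier sketch):

```python
# Illustrative dora event loop for a text-in/text-out LLM node.
import pyarrow as pa
from dora import Node


def main() -> None:
    node = Node()
    llm = load_model()  # hypothetical helper from the configuration sketch
    for event in node:
        if event["type"] == "INPUT" and event["id"] == "text":
            text = event["value"][0].as_py()  # dora inputs arrive as pyarrow arrays
            if should_respond(text):
                node.send_output("text", pa.array([generate(llm, text)]))


if __name__ == "__main__":
    main()
```
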
### Running the Example

```bash
dora build example.yml
dora run example.yml
```

## Contribution Guide

- Format with [ruff](https://docs.astral.sh/ruff/):

```bash
uv pip install ruff
uv run ruff check . --fix
```

- Lint with ruff:

```bash
uv run ruff check .
```

- Test with [pytest](https://github.com/pytest-dev/pytest):

```bash
uv pip install pytest
uv run pytest . # Test
```

## License

dora-llama-cpp-python is released under the MIT License.