Martin Evans
321d0b58c4
Merge pull request #202 from martindevans/multi_gpu
Multi GPU
2 years ago
Martin Evans
a03fe003de
Fixed decoding of text "accumulating" over time (never properly clearing buffer)
2 years ago
Martin Evans
51d4411a58
Added two new classes for detokenization tasks:
- `AntipromptProcessor` accepts chunks of text and returns a value indicating if any antiprompt has been detected.
- `StreamingTokenDecoder` decodes tokens into text, maintaining some internal state to handle single characters which are encoded as multiple tokens.
Added tests for these classes and updated StatelessExecutor to use them.
Removed most DeTokenize methods, marked the rest as obsolete (should always use a `StreamingTokenDecoder`).
2 years ago
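The problem that `StreamingTokenDecoder` addresses can be illustrated outside LLamaSharp. The sketch below is not the library's implementation. It is a minimal Python illustration, assuming each token maps to a fragment of UTF-8 bytes; the class name `StreamingDecoderSketch` and its `add` method are hypothetical. It relies on the standard library's incremental decoder to hold back incomplete byte sequences until the rest of the character arrives.

```python
import codecs


class StreamingDecoderSketch:
    """Hypothetical illustration, not LLamaSharp's API."""

    def __init__(self) -> None:
        # The incremental decoder buffers trailing bytes that do not yet
        # form a complete character.
        self._decoder = codecs.getincrementaldecoder("utf-8")()

    def add(self, token_bytes: bytes) -> str:
        """Feed the bytes of one token; return whatever text is now complete."""
        return self._decoder.decode(token_bytes)


decoder = StreamingDecoderSketch()
euro = "€".encode("utf-8")      # three bytes: E2 82 AC
first = decoder.add(euro[:2])   # incomplete character, nothing emitted yet
second = decoder.add(euro[2:])  # final byte arrives, character is emitted
```

Decoding each fragment in isolation would either fail or produce replacement characters, which is why per-token conversion cannot be correct when one character spans several tokens.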
Martin Evans
efdf3d630c
- Removed all `TokenToString` methods (it's never correct to use them, because sometimes one single character may be represented by multiple tokens).
- Built a new (hacky) `Detokenize` method which handles this
2 years ago
Martin Evans
1d0620e634
Created a test that "roundtrips" strings through tokenization. This reveals some flaws with certain characters
2 years ago
Martin Evans
04acbf8c42
Improved doc comment on `tensor_split`
2 years ago
Martin Evans
15db194c17
Added multi GPU support
2 years ago
Martin Evans
e89ca5cc17
Fixed a few minor warnings
2 years ago
Martin Evans
9daf586ba8
Assorted cleanup leftover after the huge change in the last PR (comments, syntax style, etc)
2 years ago
Martin Evans
1f8c94e386
Added in the `special` parameter to the tokenizer (introduced in https://github.com/ggerganov/llama.cpp/pull/3538)
2 years ago
Martin Evans
2a38808bca
- Added threads to context params, replaced all thread args with `uint?`
- Replaced all binaries
2 years ago
Martin Evans
9a0a0ae9fe
Removed cloning support
2 years ago
Martin Evans
0d40338692
Fixed out-of-context handling in stateless executor
2 years ago
Martin Evans
b306ac23dd
Added `Decode` method to `SafeLLamaContextHandle`
2 years ago
Martin Evans
9e958e896b
safe handle for batch
2 years ago
Martin Evans
ce1fc51163
Added some more native methods
2 years ago
Martin Evans
bca55eace0
Initial changes to match the llama.cpp changes
2 years ago
Haiping
10678a83d6
Merge pull request #65 from martindevans/alternative_dependency_loading
CPU Feature Detection
2 years ago
Martin Evans
daf09eae64
Skipping tokenization of empty strings (saves allocating an empty array every time)
2 years ago
Martin Evans
bba801f4b7
Added a property to get the KV cache size from a context
2 years ago
sa_ddam213
09d8f434f2
Extract LLamaLogLevel, Remove Logger class
2 years ago
Martin Evans
d3b8ee988c
Beam Search (#155)
* Added the low level bindings to beam search.
2 years ago
Martin Evans
614ba40948
- Added a `TokensEndsWithAnyString` extension to `IReadOnlyList<int>` which efficiently checks if a set of tokens ends with one of a set of strings.
  - Minimal amount of characters converted
  - Allocation free
- Added `TokensToSpan` to `SafeLlamaModelHandle` which converts as many tokens as possible into a character span
  - Allocation free
2 years ago
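The idea of converting only a minimal number of characters can be sketched as follows. This is an assumption-laden Python illustration, not the code from this commit: the function name `tokens_end_with_any` and the `pieces` lookup table (token id to text) are hypothetical. It walks the tokens from the end and stops as soon as enough text has been gathered to compare against the longest query string.

```python
from typing import Mapping, Sequence


def tokens_end_with_any(
    tokens: Sequence[int],
    pieces: Mapping[int, str],  # hypothetical token id -> text table
    queries: Sequence[str],
) -> bool:
    """Return True if the decoded tokens end with any non-empty query string."""
    longest = max((len(q) for q in queries), default=0)
    if longest == 0:
        return False

    tail = []   # decoded pieces, collected last-to-first
    count = 0   # number of characters collected so far
    for token in reversed(tokens):
        piece = pieces[token]
        tail.append(piece)
        count += len(piece)
        if count >= longest:
            break  # no earlier token can affect the comparison

    text = "".join(reversed(tail))
    return any(q and text.endswith(q) for q in queries)


vocab = {0: "Hello", 1: " ", 2: "User", 3: ":"}
found = tokens_end_with_any([0, 1, 2, 3], vocab, ["User:"])
missing = tokens_end_with_any([0, 1, 2, 3], vocab, ["Bot:"])
```

Unlike the commit's description, this sketch allocates a small list and string; it only demonstrates the early-exit logic.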
Martin Evans
6a842014ac
Removed duplicate `llama_sample_classifier_free_guidance` method
2 years ago
Martin Evans
8f58a40fb9
Added Linux dependency loading
2 years ago
Martin Evans
dd4957471f
Changed paths to match what the GitHub build action produces
2 years ago
Martin Evans
756a1ad0ba
Added a new way to load dependencies, performing CPU feature detection
2 years ago
Rinne
4e83e48ad1
Merge pull request #122 from martindevans/gguf
Add GGUF support
2 years ago
Martin Evans
bcf06e2652
Added some comments on various native methods
2 years ago
Martin Evans
a70c7170dd
- Created a higher level `Grammar` class which is immutable and contains a list of grammar rules. This is the main "entry point" to the grammar system.
- Made all the mechanics of grammar parsing (GBNFGrammarParser, ParseState) internal. Just call `Grammar.Parse("whatever")`.
- Added a `GrammarRule` class which validates elements on construction (this allows constructing grammar without parsing GBNF).
  - It should be impossible for a `GrammarRule` to represent an invalid rule.
2 years ago
Mihai
0bd495276b
Add initial tests + fix bugs. Still WIP since the test is failing.
2 years ago
Martin Evans
2022b82947
Added binaries generated by this action: https://github.com/SciSharp/LLamaSharp/actions/runs/6002797872/job/16279896150
Based on this version: 6b73ef1201
2 years ago
Martin Evans
31287b5e6e
Rewrote TokenToSpan/TokenToString to better fit the new way it's done in llama.cpp, with a few different options:
- Just convert it to a `string`, nice and simple
- Write the bytes to a `Span<byte>` (no allocations)
- Write the chars to a `StringBuilder` (potentially no allocations)
2 years ago
Martin Evans
0c98ae1955
Passing ctx to `llama_token_nl(_ctx)`
2 years ago
Martin Evans
6ffa28f964
Removed `LLAMA_MAX_DEVICES` (not used)
2 years ago
Martin Evans
2056078aef
Initial changes required for GGUF support
2 years ago
Martin Evans
cf4754db44
Removed unnecessary parameters from some low level sampler methods
2 years ago
Martin Evans
f70525fec2
Two small improvements to the native sampling API:
- Modified `llama_sample_token_mirostat` and `llama_sample_token_mirostat_v2` to take `ref float` instead of `float*`. Fewer pointers is always good.
- Modified `llama_sample_repetition_penalty` and `llama_sample_frequency_and_presence_penalties` to take pointers instead of arrays. This allows the use of non-allocating types (e.g. `Span`) instead of arrays
- Modified higher level API to accept `Memory<int>` instead of `int[]`, which can be used to reduce allocations at call sites
2 years ago
Martin Evans
a911b77dec
Various minor changes, resolving about 100 ReSharper code quality warnings
2 years ago
Martin Evans
ebacdb666d
- Moved the lower level state get/set methods onto SafeLLamaContextHandle
- Used those methods to add a `Clone` method to SafeLLamaContextHandle
- Simplified `LLamaContext` by using the new methods
- Sealed `LLamaContext` and `LLamaEmbedder`
2 years ago
Martin Evans
829f32b27d
- Added `Obsolete` attributes to the entire `OldVersion` namespace, so they can be removed in the future
- Minor changes to clean up some of the compiler warnings
2 years ago
zombieguy
45b01d5a78
Improved type conversion
Type conversion is now done in the property rather than the utils class and uses the System.Convert class to ensure consistency.
2 years ago
Martin Evans
2830e5755c
- Applied a lot of minor R# code quality suggestions. Lots of unnecessary imports removed.
- Deleted `NativeInfo` (internal class, not used anywhere)
2 years ago
Martin Evans
4b7d718551
Added native symbol for CFG
2 years ago
Martin Evans
759ae26f36
Merge branch 'master' into grammar_basics
2 years ago
Martin Evans
a9e6f21ab8
- Creating and destroying contexts in the stateless executor, saving memory. It now uses zero memory when not inferring!
- Passing encoding in the `IModelParams`, which reduces how often encoding needs to be passed around
2 years ago
Martin Evans
ae8ef17a4a
- Added various convenience overloads to `LLamaContext.Eval`
- Converted `SafeLLamaContextHandle` to take a `ReadOnlySpan` for Eval; the narrower type better represents what's really needed
2 years ago
Martin Evans
64416ca23c
- Created a slightly nicer way to create grammar (from `IReadOnlyList<IReadOnlyList<LLamaGrammarElement>>`)
- Integrated grammar into sampling
- Added a test for the grammar sampling
2 years ago
Martin Evans
0294bb1303
Some of the basics of the grammar API
2 years ago
Rinne
62331852bc
Merge pull request #90 from martindevans/proposal_multi_context
Multi Context
2 years ago