LLamaSharp

Commit Graph

Author	SHA1	Message	Date
Martin Evans	b6d242193e	Debugging slowdown by removing some things: - Removed all `record struct` uses in native code - Removed usage of `readonly` in native structs Minor fix: - Added sequential layout to `LLamaModelQuantizeParams`	2 years ago
Martin Evans	51c292ebd8	Added a safe method for `llama_get_logits_ith`	2 years ago
Martin Evans	7e3cde4c13	Moved helper methods into `LLamaBatchSafeHandle`	2 years ago
Martin Evans	c7fdb9712c	Added binaries, built from ``6961c4bd0b``	2 years ago
Martin Evans	e81b3023d5	Rewritten sampling API to be accessed through the `LLamaTokenDataArray` object	2 years ago
Martin Evans	3c5547b2b7	Reduced some uses of `NativeApi` in `BatchedDecoding` by adding some helper methods	2 years ago
Martin Evans	a024d2242e	It works! had to update binary to `b1426`	2 years ago
Martin Evans	8cd81251b4	initial setup	2 years ago
Martin Evans	321d0b58c4	Merge pull request #202 from martindevans/multi_gpu Multi GPU	2 years ago
Martin Evans	a03fe003de	Fixed decoding of text "accumulating" over time (never properly clearing buffer)	2 years ago
Martin Evans	51d4411a58	Added two new classes for detokenization tasks: - `AntipromptProcessor` accepts chunks of text and returns a value indicating if any antiprompt has been detected. - `StreamingTokenDecoder` decodes tokens into text, maintaining some internal state to handle single characters which are encoded as multiple tokens. Added tests for these classes and updated StatelessExecutor to use them. Removed most DeTokenize methods, marked the rest as obsolete (should always use a `StreamingTokenDecoder`).	2 years ago
Martin Evans	efdf3d630c	- Removed all `TokenToString` methods (it's never correct to use them, because sometimes one single character may be represented by multiple tokens). - Built a new (hacky) `Detokenize` method which handles this	2 years ago
Martin Evans	1d0620e634	Created a test that "roundtrips" strings through tokenization. This reveals some flaws with certain characters	2 years ago
Martin Evans	04acbf8c42	Improved doc comment on `tensor_split`	2 years ago
Martin Evans	15db194c17	Added multi GPU support	2 years ago
Martin Evans	e89ca5cc17	Fixed a few minor warnings	2 years ago
Martin Evans	9daf586ba8	Assorted cleanup leftover after the huge change in the last PR (comments, syntax style, etc)	2 years ago
Martin Evans	1f8c94e386	Added in the `special` parameter to the tokenizer (introduced in https://github.com/ggerganov/llama.cpp/pull/3538 )	2 years ago
Martin Evans	2a38808bca	- Added threads to context params, replaced all thread args with `uint?` - Replaced all binaries	2 years ago
Martin Evans	9a0a0ae9fe	Removed cloning support	2 years ago
Martin Evans	0d40338692	Fixed out-of-context handling in stateless executor	2 years ago
Martin Evans	b306ac23dd	Added `Decode` method to `SafeLLamaContextHandle`	2 years ago
Martin Evans	9e958e896b	safe handle for batch	2 years ago
Martin Evans	ce1fc51163	Added some more native methods	2 years ago
Martin Evans	bca55eace0	Initial changes to match the llama.cpp changes	2 years ago
Haiping	10678a83d6	Merge pull request #65 from martindevans/alternative_dependency_loading CPU Feature Detection	2 years ago
Martin Evans	daf09eae64	Skipping tokenization of empty strings (saves allocating an empty array every time)	2 years ago
Martin Evans	bba801f4b7	Added a property to get the KV cache size from a context	2 years ago
sa_ddam213	09d8f434f2	Extract LLamaLogLevel, Remove Logger class	2 years ago
Martin Evans	d3b8ee988c	Beam Search (#155) * Added the low level bindings to beam search.	2 years ago
Martin Evans	614ba40948	- Added a `TokensEndsWithAnyString` extension to `IReadOnlyList<int>` which efficiently checks if a set of tokens ends with one of a set of strings. - Minimal amount of characters converted - Allocation free - Added `TokensToSpan` to `SafeLlamaModelHandle` which converts as many tokens as possible into a character span - Allocation free	2 years ago
Martin Evans	6a842014ac	Removed duplicate `llama_sample_classifier_free_guidance` method	2 years ago
Martin Evans	8f58a40fb9	Added Linux dependency loading	2 years ago
Martin Evans	dd4957471f	Changed paths to match what the GitHub build action produces	2 years ago
Martin Evans	756a1ad0ba	Added a new way to load dependencies, performing CPU feature detection	2 years ago
Rinne	4e83e48ad1	Merge pull request #122 from martindevans/gguf Add GGUF support	2 years ago
Martin Evans	bcf06e2652	Added some comments on various native methods	2 years ago
Martin Evans	a70c7170dd	- Created a higher level `Grammar` class which is immutable and contains a list of grammar rules. This is the main "entry point" to the grammar system. - Made all the mechanics of grammar parsing (GBNFGrammarParser, ParseState) internal. Just call `Grammar.Parse("whatever")`. - Added a `GrammarRule` class which validates elements on construction (this allows constructing grammar without parsing GBNF). - It should be impossible for a `GrammarRule` to represent an invalid rule.	2 years ago
Mihai	0bd495276b	Add initial tests + fix bugs. Still WIP since the test is failing.	2 years ago
Martin Evans	2022b82947	Added binaries generated by this action: https://github.com/SciSharp/LLamaSharp/actions/runs/6002797872/job/16279896150 Based on this version: `6b73ef1201`	2 years ago
Martin Evans	31287b5e6e	Rewritten TokenToSpan/TokenToString to better fit the new way it's done in llama.cpp with a few different options: - Just convert it to a `string`, nice and simple - Write the bytes to a `Span<byte>` no allocations - Write the chars to a `StringBuilder` potentially no allocations	2 years ago
Martin Evans	0c98ae1955	Passing ctx to `llama_token_nl(_ctx)`	2 years ago
Martin Evans	6ffa28f964	Removed `LLAMA_MAX_DEVICES` (not used)	2 years ago
Martin Evans	2056078aef	Initial changes required for GGUF support	2 years ago
Martin Evans	cf4754db44	Removed unnecessary parameters from some low level sampler methods	2 years ago
Martin Evans	f70525fec2	Two small improvements to the native sampling API: - Modified `llama_sample_token_mirostat` and `llama_sample_token_mirostat_v2` to take `ref float` instead of as a `float*`. Less pointers is always good. - Modified `llama_sample_repetition_penalty` and `llama_sample_frequency_and_presence_penalties` to take pointers instead of arrays. This allows the use of non-allocating types (e.g. Span) instead of arrays - Modified higher level API to accept `Memory<int>` instead of `int[]`, which can be used to reduce allocations at call sites	2 years ago
Martin Evans	a911b77dec	Various minor changes, resolving about 100 ReSharper code quality warnings	2 years ago
Martin Evans	ebacdb666d	- Moved the lower level state get/set methods onto SafeLLamaContextHandle - Used those methods to add a `Clone` method to SafeLLamaContextHandle - Simplified `LLamaContext` by using the new methods - Sealed `LLamaContext` and `LLamaEmbedder`	2 years ago
Martin Evans	829f32b27d	- Added `Obsolete` attributes to the entire `OldVersion` namespace, so they can be removed in the future - Minor changes to cleanup some of the compiler warnings	2 years ago
zombieguy	45b01d5a78	Improved type conversion Type conversion is now done in the property rather than the utils class and uses the System.Convert class to ensure consistency.	2 years ago

115 Commits (112e33eee8aa4aed8074ec9dfe4356a174f341bf)