LLamaSharp

Commit Graph

Author	SHA1	Message	Date
Martin Evans	00df7c1516	- Added `LLamaWeights.LoadFromFileAsync`. - Async loading supports cancellation through a `CancellationToken`. If loading is cancelled an `OperationCanceledException` is thrown. If it fails for another reason a `LoadWeightsFailedException` is thrown. - Updated examples to use `LoadFromFileAsync`	2 years ago
Martin Evans	18586cc43b	Merge pull request #696 from martindevans/safe_handle_constructor_refactor Removed Unnecessary Constructor From Safe Handles	2 years ago
Martin Evans	e9fd7f96e0	Merge pull request #691 from martindevans/empty_batch_check Empty batch check	2 years ago
Martin Evans	a2f8573831	Merge pull request #698 from martindevans/slightly_safer_quantize_params Slightly Safer Quantize Params	2 years ago
Martin Evans	d4f793a7eb	Using `is` check instead of `== null`	2 years ago
Martin Evans	ecb359c9e7	- Using more specific `LoadWeightsFailedException` when a llava model fails to load (#697) - Passing model path, instead of a message, to `LoadWeightsFailedException` constructor	2 years ago
Martin Evans	58ec798bff	Modified `llama_model_quantize` to accept argument by `ref` instead of pointer.	2 years ago
Martin Evans	54dab273cd	- Removed unnecessary constructors from safe handles - Returning SafeLLamaGrammarHandle directly from `llama_grammar_init` and `llama_grammar_copy`	2 years ago
Martin Evans	25812762c9	Added checks in `Decode` to skip doing anything if the batch is empty.	2 years ago
Martin Evans	3c76440957	- Added tests for generating embeddings with generative model and embedding model - Rewritten native API methods for embeddings to return pointers - null is a valid value for these methods to return so `Span` is not appropriate	2 years ago
Martin Evans	c325ac9127	April 2024 Binary Update (#662) * Updated binaries, using [this build](https://github.com/SciSharp/LLamaSharp/actions/runs/8654672719/job/23733195669) for llama.cpp commit `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7`. - Added all new functions. - Moved some functions (e.g. `SafeLlamaModelHandle` specific functions) into `SafeLlamaModelHandle.cs` - Exposed tokens on `SafeLlamaModelHandle` and `LLamaWeights` through a `Tokens` property. As new special tokens are added in the future they can be added here. - Changed all token properties to return nullable tokens, to handle models that lack some tokens. - Fixed `DefaultSamplingPipeline` to handle models that have no newline token. * Moved native methods to more specific locations. - Context specific things have been moved into `SafeLLamaContextHandle.cs` and made private - they're exposed through C# properties and methods already. - Checking that GPU layer count is zero if GPU offload is not supported. - Moved methods for creating default structs (`llama_model_quantize_default_params` and `llama_context_default_params`) into relevant structs. * Removed exception if `GpuLayerCount > 0` when GPU is not supported. * - Added low level wrapper methods for new per-sequence state load/save in `SafeLLamaContextHandle` - Added high level wrapper methods (save/load with `State` object or memory mapped file) in `LLamaContext` - Moved native methods for per-sequence state load/save into `SafeLLamaContextHandle` * Added update and defrag methods for KV cache in `SafeLLamaContextHandle` * Updated submodule to `f7001ccc5aa359fcf41bba19d1c99c3d25c9bcc7` * Passing the sequence ID when saving a single sequence state	2 years ago
Martin Evans	58107bb5b9	Logging interceptor (#649) * - Added `NativeLogConfig` which allows overriding the llama.cpp log callback - Delaying binding of this into llama.cpp until after `NativeLibraryConfig` has loaded * Using the log callback to show loading log messages during loading. * Registering log callbacks before any calls to llama.cpp except `llama_empty_call`; this method was specifically chosen because it does nothing and exists only to trigger DLL loading. * - Removed much of the complexity of logging from `NativeApi.Load`. It always calls whatever log callbacks you have registered. - Removed the alternative path for `ILogger` in NativeLibraryConfig; instead, the `ILogger` is wrapped in a delegate. * Saving a GC handle to keep the log callback alive * Removed prefix, logger should already do that. * Buffering up messages until a newline is encountered before passing log message to ILogger. * - Added trailing `\n` to log messages from loading. - Using `ThreadLocal<StringBuilder>` to ensure messages from separate threads don't get mixed together.	2 years ago
evolcano	353412923f	Merge branch 'master' of https://github.com/SciSharp/LLamaSharp	2 years ago
evolcano	9d091c0316	Add path to find llama.dll for MAUI This change was originally made by lcarrere in https://github.com/SciSharp/LLamaSharp/issues/180 . I have confirmed this modification works on my Windows 11 laptop, and made this commit at the request of AsakusaRinne.	2 years ago
SignalRT	2d9a114f66	Include comments and include some checks	2 years ago
SignalRT	e8732efadd	Example InteractiveExecutor Add an Example and modifications to the interactive executor to enable Llava Models. Just a preview / demo	2 years ago
Martin Evans	e2705be6c8	Fixed off by one error in LLamaBatch sampling position (#626)	2 years ago
Martin Evans	91d72e7465	Keeping track of positions where logits will be generated in a batch and what sequence those logits are associated with. (#624)	2 years ago
Martin Evans	024787225b	`SetDllImportResolver` based loading (#603) - Modified library loading to be based on `SetDllImportResolver`. This replaces the built in loading system and ensures there can't be two libraries loaded at once. - llava and llama are loaded separately, as needed. - All the previous loading logic is still used, within the `SetDllImportResolver` - Split out CUDA, AVX and MacOS paths to separate helper methods. - `Description` now specifies if it is for `llama` or `llava`	2 years ago
jlsantiago	3b2836eac4	Llava api (#563) * Add llava_binaries, update all binaries to make the test * Llava API + LlavaTest Preliminary * First prototype of Load + Unit Test * Temporarily run tests on branch LlavaAPI * Disable Embed test to review the rest of the test * Restore Embedding test * Use BatchThread to eval image embeddings Test Threads default value to ensure it doesn't produce problems. * Rename test file * Update action versions * Test only one method, no release embeddings * Revert "Test only one method, no release embeddings" This reverts commit `264e176dcc`. * Correct API call * Only test llava related functionality * Cuda and Cblast binaries * Restore build policy * Changes related to code review * Add SafeHandles * Set overwrite to upload-artifact@v4 * Revert to upload-artifact@v3 * revert to upload-artifact@v3	2 years ago
Martin Evans	ce4de7d607	llama_decode lock (#595) * Added a lock object into `SafeLlamaModelHandle` which all calls to `llama_decode` (in the `SafeLLamaContextHandle`) lock first. This prevents two contexts from running inference on the same model at the same time, which seems to be unsafe in llama.cpp. * Modified the lock to be global over _all_ inferences. This seems to be necessary (at least with the CUDA backend).	2 years ago
Clovis Henrique Ribeiro	d0f79814e9	Added conditional compilation code to progress_callback (in LlamaModelParams struct) so the struct plays nice with legacy .NET Framework 4.8 (#593)	2 years ago
Martin Evans	f0b0bbcbb7	Mutable Logits (#586) Modified LLamaBatch to not share tokens with other sequences if logits is true. This ensures that the logit span at the end is used by exactly one sequence - therefore it's safe to mutate. This removes the need for copying _very_ large arrays (vocab size) and simplifies sampling pipelines.	2 years ago
Martin Evans	a8ba9f05b3	March Binary Update (#565) * Updated binaries to llama.cpp `3ab8b3a92ede46df88bc5a2dfca3777de4a2b2b6` (build run: https://github.com/SciSharp/LLamaSharp/actions/runs/8118890586) * Added abort callback * Added properties to get/set thread count on `LLamaContext` * Fixed LLamaLogLevel numbering	2 years ago
Martin Evans	8ac1634233	Removed `llama_eval`. It is going to be completely removed in the next version of llama.cpp (#553)	2 years ago
Martin Evans	f0e7e7cc0a	Removed `SamplingApi`. It has been marked as Obsolete for a while, replaced by instance methods on `LLamaTokenDataArray` (#552)	2 years ago
Martin Evans	7d84625a67	Classifier Free Guidance (#536) * Added a `Guidance` method to `LLamaTokenDataArray` which applies classifier free guidance * Factored out a safer `llama_sample_apply_guidance` method based on spans * Created a guided sampling demo using the batched executor * fixed comment, "classifier free" not "context free" * Rebased onto master and fixed breakage due to changes in `BaseSamplingPipeline` * Asking user for guidance weight * Progress bar in batched fork demo * Improved fork example (using tree display) * Added proper disposal of resources in batched examples * Added some more comments in BatchedExecutorGuidance	2 years ago
Scott W Harden	a6394001a1	NativeLibraryConfig: WithLogs(LLamaLogLevel) (#529) Adds a NativeLibraryConfig.WithLogs() overload to let the user indicate the log level (with "info" as the default)	2 years ago
Martin Evans	c7d0dc915a	Assorted small changes to clean up some code warnings	2 years ago
Martin Evans	e9d9042576	Added `Divide` to `KvAccessor`	2 years ago
Martin Evans	949861a581	- Added a `Modify` method to `Conversation`. This grants temporary access to directly modify the KV cache. - Re-implemented `Rewind` as an extension method using `Modify` internally - Implemented `ShiftLeft`, which shifts everything over except for some starting tokens. This is the same as the `StatelessExecutor` out-of-context handling. - Starting batch at epoch 1, this ensures that conversations (starting at zero) are below the current epoch. It also means `0` can always be used as a value guaranteed to be below the current epoch.	2 years ago
Martin Evans	b0acecf080	Created a new `BatchedExecutor` which processes multiple "Conversations" in one single inference batch. This is faster, even when the conversations are unrelated, and is much faster if the conversations share some overlap (e.g. a common system prompt prefix). Conversations can be "forked", to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it again and again for each individual conversation. Conversations can also be "rewound" to an earlier state. Added two new examples, demonstrating forking and rewinding.	2 years ago
Martin Evans	90915c5a99	Added increment and decrement operators to `LLamaPos`	2 years ago
Martin Evans	c5146bac23	- Exposed KV debug view through `SafeLLamaContextHandle` - Added `KvCacheSequenceDivide` - Moved count tokens/cells methods to `SafeLLamaContextHandle`	2 years ago
Martin Evans	15a98b36d8	Updated everything to work with llama.cpp `ce32060198`	2 years ago
Martin Evans	5da2a2f64b	- Removed one of the constructors of `SafeLLamaHandleBase`, which implicitly states that memory is owned. Better to be explicit about this kind of thing! - Also fixed `ToString()` in `SafeLLamaHandleBase`	2 years ago
Jason Couture	ec59c5bf9e	Fix missing library name prefix for cuda	2 years ago
Jason Couture	443ce4fff4	While the dllimport changes work, manual path searching needed to be updated	2 years ago
Jason Couture	db7e1e88f8	Use llama instead of libllama in `[DllImport]` This results in Windows users not needing to rename the DLL. This allows native llama builds to be dropped in, even on Windows. I also took the time to update the documentation, removing references to renaming the files, since the names now match. Fixes #463	2 years ago
Martin Evans	92b9bbe779	Added methods to `SafeLLamaContextHandle` for KV cache manipulation	2 years ago
Martin Evans	96c26c25f5	Merge pull request #445 from martindevans/stateless_executor_llama_decode Swapped `StatelessExecutor` to use `llama_decode`!	2 years ago
Martin Evans	9fe878ae1f	- Fixed example - Growing more than double, if necessary	2 years ago
Martin Evans	9ede1bedc2	Automatically growing batch n_seq_max when exceeded. This means no parameters need to be picked when the batch is created.	2 years ago
Martin Evans	a2e29d393c	Swapped `StatelessExecutor` to use `llama_decode`! - Added `logits_i` argument to `Context.ApplyPenalty` - Added a new exception type for `llama_decode` return code	2 years ago
Martin Evans	99969e538e	- Removed some unused `eval` methods. - Added a `DecodeAsync` overload which runs the work in a task - Replaced some `NativeHandle` usage in `BatchedDecoding` with higher level equivalents. - Made the `LLamaBatch` grow when token capacity is exceeded, removing the need to manage token capacity externally.	2 years ago
Martin Evans	36a9335588	Removed `LLamaBatchSafeHandle` (using unmanaged memory, created by llama.cpp) and replaced it with a fully managed `LLamaBatch`. Modified the `BatchedDecoding` example to use new managed batch.	2 years ago
Martin Evans	1472704e12	Added a test with examples of troublesome strings from 0.9.1	2 years ago
Martin Evans	73172bbaba	Merge pull request #438 from martindevans/cleanup_model_unnecessary_unsafe Model Metadata Loading Cleanup	2 years ago
Martin Evans	ce1d302e7e	Moved some native methods into `SafeLlamaModelHandle`, these methods are all wrapped in safer accessors with no extra costs so there is no need to expose them.	2 years ago
Martin Evans	1e86755071	- Removed unnecessary `unsafe` block in model metadata loading - Clarified comments on native metadata loading methods	2 years ago


237 Commits (84bb5a36aba609d20992a9712cbda4c19f762033)