Scott W Harden
4c3077d0f0
ChatSession: improve exception message
The original message contained the word "preceeded" which should be spelled as "preceded"
2 years ago
Martin Evans
c7d0dc915a
Assorted small changes to clean up some code warnings
2 years ago
Martin Evans
174f21a385
0.10.0
2 years ago
Martin Evans
d03c1a9201
Merge pull request #503 from martindevans/batched_executor_again
Introduced a new `BatchedExecutor`
2 years ago
Martin Evans
d47b6afe4d
Normalizing embeddings in `LLamaEmbedder`. As is done in llama.cpp: 2891c8aa9a/examples/embedding/embedding.cpp (L92)
2 years ago
Martin Evans
e9d9042576
Added `Divide` to `KvAccessor`
2 years ago
Martin Evans
1cc463b9b7
Added a finalizer to `BatchedExecutor`
2 years ago
Martin Evans
0c2cff0e1c
Added a Finalizer for `Conversation` in case it is not correctly disposed.
2 years ago
Martin Evans
949861a581
- Added a `Modify` method to `Conversation`. This grants **temporary** access to directly modify the KV cache.
- Re-implemented `Rewind` as an extension method using `Modify` internally
- Implemented `ShiftLeft`, which shifts everything over except for some starting tokens. This is the same as the `StatelessExecutor` out-of-context handling.
- Starting batch at epoch 1. This ensures that conversations (starting at zero) are below the current epoch. It also means `0` can always be used as a value guaranteed to be below the current epoch.
2 years ago
Martin Evans
b0acecf080
Created a new `BatchedExecutor` which processes multiple "Conversations" in one single inference batch. This is faster, even when the conversations are unrelated, and is much faster if the conversations share some overlap (e.g. a common system prompt prefix).
Conversations can be "forked", to create a copy of a conversation at a given point. This allows e.g. prompting a conversation with a system prefix just once and then forking it again and again for each individual conversation. Conversations can also be "rewound" to an earlier state.
Added two new examples, demonstrating forking and rewinding.
2 years ago
Martin Evans
90915c5a99
Added increment and decrement operators to `LLamaPos`
2 years ago
Martin Evans
82c471eac4
Merge pull request #500 from martindevans/improved_kv_cache_methods
Small KV Cache Handling Improvements
2 years ago
Martin Evans
c5146bac23
- Exposed KV debug view through `SafeLLamaContextHandle`
- Added `KvCacheSequenceDivide`
- Moved count tokens/cells methods to `SafeLLamaContextHandle`
2 years ago
Martin Evans
744758f110
Using `AddRange` in `LLamaEmbedder`
2 years ago
Martin Evans
c7103e86e4
Added new file types to quantisation
2 years ago
Martin Evans
17385e12b6
Merge pull request #479 from martindevans/update_binaries_feb_2024
Update binaries feb 2024
2 years ago
Martin Evans
bac40a3b7a
Added new binaries, from this run: https://github.com/SciSharp/LLamaSharp/actions/runs/7792319886
2 years ago
Jason Couture
c963b051e2
Add nuspec for OpenCL (CLBLAST)
2 years ago
Martin Evans
765c697f77
Fixed number type
2 years ago
Martin Evans
b2e815d51e
Updated all binaries (from this run: https://github.com/SciSharp/LLamaSharp/actions/runs/7746303349)
2 years ago
Martin Evans
15a98b36d8
Updated everything to work with llama.cpp ce32060198
2 years ago
Martin Evans
c9c8cd0d62
- Swapped embeddings generator to use `llama_decode`
- Modified `GetEmbeddings` method to be async
2 years ago
Martin Evans
22aba9a671
Merge pull request #473 from martindevans/base_handle_removed_constructor
Removed `SafeLLamaHandleBase` Constructor
2 years ago
Martin Evans
5da2a2f64b
- Removed one of the constructors of `SafeLLamaHandleBase`, which implicitly states that memory is owned. Better to be explicit about this kind of thing!
- Also fixed `ToString()` in `SafeLLamaHandleBase`
2 years ago
Martin Evans
9b995510d6
Removed all setters in `IModelParams` and `IContextParams`, allowing implementations to be immutable.
2 years ago
Jason Couture
ec59c5bf9e
Fix missing library name prefix for cuda
2 years ago
Jason Couture
443ce4fff4
While the dllimport changes work, manual path searching needed to be updated
2 years ago
Jason Couture
db7e1e88f8
Use llama instead of libllama in `[DllImport]`
This means Windows users no longer need to rename the DLL. It allows native llama builds to be dropped in, even on Windows.
I also took the time to update the documentation, removing references to renaming the files, since the names now match.
Fixes #463
2 years ago
dependabot[bot]
d8eb817bf5
build(deps): bump System.Text.Json from 8.0.0 to 8.0.1
Bumps [System.Text.Json](https://github.com/dotnet/runtime) from 8.0.0 to 8.0.1.
- [Release notes](https://github.com/dotnet/runtime/releases)
- [Commits](https://github.com/dotnet/runtime/compare/v8.0.0...v8.0.1)
---
updated-dependencies:
- dependency-name: System.Text.Json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
2 years ago
Martin Evans
92b9bbe779
Added methods to `SafeLLamaContextHandle` for KV cache manipulation
2 years ago
Martin Evans
a690db5d3e
Fixed build error caused by extra unnecessary parameter
2 years ago
Martin Evans
96c26c25f5
Merge pull request #445 from martindevans/stateless_executor_llama_decode
Swapped `StatelessExecutor` to use `llama_decode`!
2 years ago
Martin Evans
9fe878ae1f
- Fixed example
- Growing more than double, if necessary
2 years ago
Martin Evans
9ede1bedc2
Automatically growing batch n_seq_max when exceeded. This means no parameters need to be picked when the batch is created.
2 years ago
Martin Evans
a2e29d393c
Swapped `StatelessExecutor` to use `llama_decode`!
- Added `logits_i` argument to `Context.ApplyPenalty`
- Added a new exception type for `llama_decode` return code
2 years ago
Martin Evans
5b6e82a594
Improved the BatchedDecoding demo:
- Using less `NativeHandle`
- Using `StreamingTokenDecoder` instead of obsolete detokenize method
2 years ago
Martin Evans
99969e538e
- Removed some unused `eval` methods.
- Added a `DecodeAsync` overload which runs the work in a task
- Replaced some `NativeHandle` usage in `BatchedDecoding` with higher level equivalents.
- Made the `LLamaBatch` grow when token capacity is exceeded, removing the need to manage token capacity externally.
2 years ago
Martin Evans
36a9335588
Removed `LLamaBatchSafeHandle` (using unmanaged memory, created by llama.cpp) and replaced it with a fully managed `LLamaBatch`. Modified the `BatchedDecoding` example to use new managed batch.
2 years ago
Martin Evans
1472704e12
Added a test with examples of troublesome strings from 0.9.1
2 years ago
Martin Evans
73172bbaba
Merge pull request #438 from martindevans/cleanup_model_unnecessary_unsafe
Model Metadata Loading Cleanup
2 years ago
Martin Evans
ce1d302e7e
Moved some native methods into `SafeLlamaModelHandle`. These methods are all wrapped in safer accessors with no extra cost, so there is no need to expose them.
2 years ago
Martin Evans
1e86755071
- Removed unnecessary `unsafe` block in model metadata loading
- Clarified comments on native metadata loading methods
2 years ago
Martin Evans
de2b20aae5
- Added a specific exception for failing to load model weights.
- Checking if model is readable
2 years ago
Martin Evans
096e0e75f8
Check that the model file actually exists immediately before loading it. Improve #395
2 years ago
Martin Evans
3c6af909dd
Merge pull request #434 from martindevans/stateless_eos_check
Added a check for EOS token in LLamaStatelessExecutor
2 years ago
Martin Evans
f160fbd6d1
Added a check for EOS token in LLamaStatelessExecutor
2 years ago
Martin Evans
2ea2048b78
- Added a test for tokenizing just a new line (reproduce issue https://github.com/SciSharp/LLamaSharp/issues/430)
- Properly displaying `LLamaToken`
- Removed all tokenisation code in `SafeLLamaContextHandle` - just pass it all through to the `SafeLlamaModelHandle`
- Improved `SafeLlamaModelHandle` tokenisation:
  - Renting an array, for one less allocation
  - Not using `&tokens[0]` to take a pointer to an array; this is redundant and doesn't work on empty arrays
2 years ago
Martin Evans
98635a0d5a
Fixed decoding of large tokens (over 16 bytes) in streaming text decoder
2 years ago
Martin Evans
402a110a3a
Merge pull request #404 from martindevans/switched_to_LLamaToken_struct
LLamaToken Struct
2 years ago
Steven Kennedy
988f2fa302
Reverted Net8.0
2 years ago