kv-cache : rework kv_cell (#13706)

* kv-cache : rework kv_cell

ggml-ci

* kv-cells : use "shift" instead of "delta" consistently

ggml-ci

* llama : add llama_max_parallel_sequences()

ggml-ci

* kv-cells : update comments [no ci]

* context : fail upon construction if sequences exceed max value

ggml-ci

* kv-cells : get_pos() -> pos_get() + comments

ggml-ci

* kv-cells : fix tracking of "used" cells

ggml-ci
2025-05-25 16:34:36 +03:00
commit de2ef53a4b
parent c508256db2
8 changed files with 470 additions and 253 deletions
--- a/include/llama.h
+++ b/include/llama.h
@@ -471,6 +471,7 @@ extern "C" {
    LLAMA_API int64_t llama_time_us(void);

    LLAMA_API size_t llama_max_devices(void);
+    LLAMA_API size_t llama_max_parallel_sequences(void);

    LLAMA_API bool llama_supports_mmap       (void);
    LLAMA_API bool llama_supports_mlock      (void);
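
The commit message pairs the new `llama_max_parallel_sequences()` declaration with "fail upon construction if sequences exceed max value". A caller could run the same check up front, before building a context. The following is a minimal sketch under stated assumptions, not the library's implementation: `llama_max_parallel_sequences_stub()` stands in for the real function so the snippet compiles without `llama.h`, its return value of 64 is a placeholder rather than a documented limit, and `n_seq_max_is_valid()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

// Stand-in for llama_max_parallel_sequences() from llama.h so that this
// sketch is self-contained. The value 64 is a placeholder (assumption);
// in real code, call the library function instead.
static size_t llama_max_parallel_sequences_stub(void) {
    return 64;
}

// Hypothetical helper: returns true when the requested number of parallel
// sequences does not exceed the limit reported by the library.
static bool n_seq_max_is_valid(size_t n_seq_max) {
    const size_t limit = llama_max_parallel_sequences_stub();
    if (n_seq_max > limit) {
        fprintf(stderr, "n_seq_max = %zu exceeds the maximum of %zu parallel sequences\n",
                n_seq_max, limit);
        return false;
    }
    return true;
}
```

Querying the limit at run time, rather than hard-coding a constant in the caller, keeps the check correct if the library is rebuilt with a different maximum.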