kv-cache : separate recurrent vs non-recurrent impl (#12799)

* kv-cache : separate recurrent vs non-recurrent impl (wip)

ggml-ci

* kv-cache : init -> constructor + add llama_memory_params

ggml-ci

* kv-cache : fix callback reference

ggml-ci

* context : llama_kv_cache -> llama_memory_i

ggml-ci

* context : move memory creation logic to model

ggml-ci

* llama : remove reference of memory during encode

ggml-ci

* kv-cache : hide padding details in the implementation

ggml-ci

* kv-cache : add ubatch_next()

ggml-ci

* context : simplify sbatch logic

ggml-ci

* kv-cache : hide defrag logic in the implementation

ggml-ci

* context : hide kv cache details in implementation

ggml-ci

* build : fix

ggml-ci

* cont : another fix

ggml-ci

* kv-cache : simplify interface (wip)

ggml-ci

* kv-cache : use separate KV cell structs for unified/recurrent

ggml-ci

* kv-cache : clean-up

ggml-ci

* model : better llama_model::create_model() signature

ggml-ci

* kv-cache : fix recurrent seq_rm()

ggml-ci

* kv-cache : replace `struct callbacks` with `llama_model &`

ggml-ci

* kv-cache : replace `struct graph_params` with `llama_context &`

ggml-ci

* kv-cache : fix offload check

ggml-ci

* context : avoid passing unique_ptr

ggml-ci

* kv-cache : avoid using the backends from the llama_context

ref #13113

ggml-ci

* kv-cache : more consistent debug logs [no ci]

* kv-cache : do not pass the full llama_context for kv graphs

ggml-ci

* kv-cache : remove comment

* kv-cache : ggml_rope_ext_inplace -> ggml_rope_ext

ggml-ci

* kv-cache : fix recurrent multi-user case

ggml-ci

* memory : remove comments [no ci]

Georgi Gerganov

2025-05-02 17:48:36 +03:00

committed by GitHub

parent cb06a3c363

commit c642bc014c

GPG key ID: B5690EEEBB952194

11 changed files with 1960 additions and 1048 deletions

									
										3

src/llama-batch.h
									
										View file
										
@@ -70,7 +70,8 @@ struct llama_sbatch {
     // sequence-wise split
     llama_ubatch split_seq(size_t n_ubatch);

-    void from_batch(const llama_batch & batch, size_t n_embd, bool simple_split = false, bool logits_all = false);
+    llama_sbatch() = default;
+    llama_sbatch(const llama_batch & batch, size_t n_embd, bool simple_split = false, bool logits_all = false);
 };

 // temporary allocate memory for the input batch if needed
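
The hunk above swaps a two-step `from_batch()` initializer for a constructor, while keeping a defaulted constructor for empty placeholders. A minimal sketch of that pattern follows. `toy_batch` and `toy_sbatch` are hypothetical stand-ins invented for illustration, not the real `llama_batch` / `llama_sbatch` types, and the splitting logic is omitted entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an input batch (NOT the real llama_batch).
struct toy_batch {
    std::vector<int> tokens;
};

// Hypothetical stand-in for llama_sbatch, showing only the init -> constructor change.
struct toy_sbatch {
    size_t n_tokens = 0;
    size_t n_embd   = 0;
    bool   ready    = false;

    // An empty object is still allowed, e.g. as a member that is assigned later.
    toy_sbatch() = default;

    // Construction and initialization happen in one step, so a caller cannot
    // forget the separate from_batch() call that the old interface required.
    toy_sbatch(const toy_batch & batch, size_t n_embd_, bool simple_split = false, bool logits_all = false)
        : n_tokens(batch.tokens.size()), n_embd(n_embd_), ready(true) {
        // The flags mirror the signature in the diff; this sketch ignores them.
        (void) simple_split;
        (void) logits_all;
    }
};

// Usage: build a fully initialized object directly from a batch.
inline toy_sbatch make_toy_sbatch(const toy_batch & batch, size_t n_embd) {
    return toy_sbatch(batch, n_embd);
}
```

With this shape, a call site that previously default-constructed the object and then called `from_batch()` can assign a freshly constructed value instead, which is presumably what the "context : simplify sbatch logic" step relies on.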
