* kv-cache : simplify the "struct llama_kv_cache" interface ggml-ci
* kv-cache : revert the (n_swa + n_ubatch) change (for next PR) ggml-ci
* kv-cache : some comments ggml-ci
* context : fix graph reserve for multiple sequences ggml-ci
* kv-cache : fix typo [no ci]
* kv-cache : fix find_slot() logic for free slots ggml-ci
* llama : add TODO for deprecating the defrag API in the future
* kv-cache : improve find_slot() using min/max seq pos info ggml-ci
* llama : handle aborts and compute errors ggml-ci
* memory : extract state into llama_memory_state ggml-ci
* kv-cache : add comments ggml-ci
* server : update batching logic to reset n_batch on successful decode
* server : upon full re-processing, remove the sequence from the cache
* kv-cache : add TODO for doing split_equal when split_simple fails ggml-ci

||
---|---|---|
batched-bench | ||
cvector-generator | ||
export-lora | ||
gguf-split | ||
imatrix | ||
llama-bench | ||
main | ||
mtmd | ||
perplexity | ||
quantize | ||
rpc | ||
run | ||
server | ||
tokenize | ||
tts | ||
CMakeLists.txt |