* llama: propagating the results of `graph_compute` to the user interface * llama: reverting kv_cache in case of failed compute * llama: `llama_kv_cache_state` was removed, only the result of `llama_graph_compute` is returned * llama: restore a kv_cache in case of failed computation * llama: correct reverting of the entire batch. also updates `llama_kv_cache_find_slot`, will correctly count the number of `used` cells for recurrent models * llama: updated comments * llama : add comments about KV cache state after error --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> |
||
|---|---|---|
| .. | ||
| CMakeLists.txt | ||
| llama-grammar.cpp | ||
| llama-grammar.h | ||
| llama-impl.h | ||
| llama-sampling.cpp | ||
| llama-sampling.h | ||
| llama-vocab.cpp | ||
| llama-vocab.h | ||
| llama.cpp | ||
| unicode-data.cpp | ||
| unicode-data.h | ||
| unicode.cpp | ||
| unicode.h | ||