llama.cpp/ggml
Georgi Gerganov a19b5cef16
llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)
* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci
2025-04-08 19:54:51 +03:00
cmake           scripts : update sync + fix cmake merge                                2025-03-27 10:09:29 +02:00
include         metal : improve FA + improve MoE (#12612)                              2025-03-28 20:21:59 +02:00
src             llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)    2025-04-08 19:54:51 +03:00
.gitignore      vulkan : cmake integration (#8119)                                     2024-07-13 18:12:39 +02:00
CMakeLists.txt  ggml : add logging for native build options/vars (whisper/2935)         2025-03-30 08:33:31 +03:00