llama.cpp

Georgi Gerganov a19b5cef16 llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)	2025-04-08 19:54:51 +03:00
* ggml : FA supports F32 V
* graph : cast KV to F16 when the KV cache is not used
  ggml-ci
* server : add test that exercises embeddings with FA enabled
  ggml-ci
CMakeLists.txt	ggml : skip intermediate .air file when compiling .metallib (#12247)	2025-03-07 14:15:27 +01:00
ggml-metal-impl.h	metal : improve FA + improve MoE (#12612)	2025-03-28 20:21:59 +02:00
ggml-metal.m	llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)	2025-04-08 19:54:51 +03:00
ggml-metal.metal	metal : use F32 prec in FA kernels (#12688)	2025-04-01 14:57:19 +03:00