llama.cpp

History

Jeff Bolz 015022bb53 vulkan: enable coopmat2 FA gqa and split_k optimizations more often (#12931 ) The grouped query attention optmization doesn't require a power of two ratio, the only thing relying on it was the modulo operation written as bitwise &. split_k need not depend on gqa_ratio - enable it any time there's only one workgroup in the X dimension. The shader gets the split index from the x coord, and multiple workgroups in the X dimension (pre-split) indicates a larger FA operation that wouldn't need splitting.		2025-04-16 20:37:25 +02:00
..
cmake	scripts : update sync + fix cmake merge	2025-03-27 10:09:29 +02:00
include	ggml : add bilinear upscale support (ggml/1185)	2025-04-11 00:17:47 +03:00
src	vulkan: enable coopmat2 FA gqa and split_k optimizations more often (#12931 )	2025-04-16 20:37:25 +02:00
.gitignore	vulkan : cmake integration (#8119 )	2024-07-13 18:12:39 +02:00
CMakeLists.txt	CUDA/HIP: Share the same unified memory allocation logic. (#12934 )	2025-04-15 11:20:38 +02:00