llama.cpp/ggml
Jeff Bolz 015022bb53
vulkan: enable coopmat2 FA gqa and split_k optimizations more often (#12931)
The grouped query attention optimization doesn't require a power-of-two ratio;
the only thing relying on it was the modulo operation written as a bitwise &.
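For context, a modulo written as a bitwise & is only correct when the divisor is a
power of two; with a plain % any gqa ratio works. A minimal standalone C++ sketch
(identifiers are illustrative, not taken from the shader):

    // Illustrative only: names are hypothetical, not taken from the Vulkan shader.
    #include <cassert>
    #include <cstdint>

    // Modulo written as bitwise & -- correct only when gqa_ratio is a power of two.
    static uint32_t mod_pow2(uint32_t i, uint32_t gqa_ratio) {
        return i & (gqa_ratio - 1);
    }

    // Plain modulo -- correct for any gqa_ratio, which is what allows the
    // optimization to be enabled for non-power-of-two ratios.
    static uint32_t mod_any(uint32_t i, uint32_t gqa_ratio) {
        return i % gqa_ratio;
    }

    int main() {
        assert(mod_pow2(13, 4) == mod_any(13, 4)); // ratio 4: both forms agree
        assert(mod_any(13, 6) == 1);               // ratio 6: only % is correct
        assert(mod_pow2(13, 6) == 5);              // & (ratio - 1) gives the wrong answer
        return 0;
    }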

split_k need not depend on gqa_ratio: enable it any time there's only one
workgroup in the X dimension. The shader gets the split index from the x coordinate,
and multiple workgroups in the X dimension (pre-split) indicate a larger
FA operation that wouldn't need splitting.
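A rough C++ sketch of the enable condition described above (hypothetical names,
not the actual ggml-vulkan code):

    // Hypothetical sketch of the host-side decision; the real function and
    // variable names in ggml-vulkan differ.
    #include <cstdint>

    // Enable split_k whenever the pre-split dispatch would use only a single
    // workgroup in the X dimension. More than one X workgroup indicates an FA
    // operation already large enough to fill the GPU without splitting.
    static bool should_use_split_k(uint32_t workgroups_x_pre_split) {
        return workgroups_x_pre_split == 1;
    }

    // With split_k enabled, the dispatch uses split_k workgroups along X, so the
    // shader can recover its split index directly from the x coordinate.
    static uint32_t split_index(uint32_t workgroup_id_x) {
        return workgroup_id_x;
    }

    int main() {
        return (should_use_split_k(1) && split_index(3) == 3) ? 0 : 1;
    }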
2025-04-16 20:37:25 +02:00
cmake          | scripts : update sync + fix cmake merge                                      | 2025-03-27 10:09:29 +02:00
include        | ggml : add bilinear upscale support (ggml/1185)                              | 2025-04-11 00:17:47 +03:00
src            | vulkan: enable coopmat2 FA gqa and split_k optimizations more often (#12931) | 2025-04-16 20:37:25 +02:00
.gitignore     | vulkan : cmake integration (#8119)                                           | 2024-07-13 18:12:39 +02:00
CMakeLists.txt | CUDA/HIP: Share the same unified memory allocation logic. (#12934)           | 2025-04-15 11:20:38 +02:00