llama.cpp/ggml
Daniele cf2270e4d3
vulkan: subgroup size tuning (#12087)
* vulkan: subgroup size test

* Vulkan: Add device architecture enum and logic to recognize AMD generations

* vulkan: use new architecture logic to specify subgroup size

* Initial vulkan subgroup size tuning for RDNA3

* vulkan: commonize RDNA subgroup tuning

* vulkan: override subgroup size if required_subgroup_size = 0

* vulkan: disable wave32 for RDNA3

* vulkan: fine-tuned RDNA1 subgroup sizes

* vulkan: adjusted subgroup size map

* vulkan: fixed RDNA2 subgroup map

---------

Co-authored-by: 0cc4m <picard12@live.de>
2025-03-17 12:42:33 +01:00
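
The commit message above describes three pieces: an architecture enum for AMD generations, a per-architecture subgroup size map, and an override that applies only when required_subgroup_size is 0. The following is a minimal sketch of how those pieces could fit together, not the code from this commit. Every identifier (GpuArch, DeviceInfo, SubgroupTuning, tuned_size, pick_subgroup_size), the pipeline names, and all numeric values are hypothetical placeholders, not measured tuning results. The min/max fields are assumed to mirror the minSubgroupSize and maxSubgroupSize limits that Vulkan's subgroup size control feature reports.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical architecture enum; the real backend may name these differently.
enum class GpuArch { Other, AmdGcn, AmdRdna1, AmdRdna2, AmdRdna3 };

// Subset of device properties this sketch needs (assumed field names).
struct DeviceInfo {
    GpuArch  arch;
    uint32_t min_subgroup_size; // smallest subgroup size the device accepts
    uint32_t max_subgroup_size; // largest subgroup size the device accepts
};

// Per-architecture tuning: a default plus optional per-pipeline overrides.
struct SubgroupTuning {
    uint32_t default_size;
    std::map<std::string, uint32_t> per_pipeline;
};

// Look up the tuned size for (arch, pipeline); 0 means "no tuning entry".
// The values below are illustrative only.
static uint32_t tuned_size(GpuArch arch, const std::string & pipeline) {
    static const std::map<GpuArch, SubgroupTuning> table = {
        { GpuArch::AmdRdna1, SubgroupTuning{ 64, { { "soft_max", 32 } } } },
        { GpuArch::AmdRdna2, SubgroupTuning{ 64, {} } },
        // No RDNA3 entry: that architecture keeps the driver default here.
    };
    const auto arch_it = table.find(arch);
    if (arch_it == table.end()) {
        return 0;
    }
    const SubgroupTuning & tuning = arch_it->second;
    const auto pipe_it = tuning.per_pipeline.find(pipeline);
    return pipe_it != tuning.per_pipeline.end() ? pipe_it->second
                                                : tuning.default_size;
}

// Returns the subgroup size to request for a pipeline, or 0 to leave the
// choice to the driver. An explicit non-zero request always wins; the tuned
// map is consulted only when required_subgroup_size == 0.
uint32_t pick_subgroup_size(const DeviceInfo & dev,
                            const std::string & pipeline,
                            uint32_t required_subgroup_size) {
    if (required_subgroup_size != 0) {
        return required_subgroup_size;
    }
    const uint32_t tuned = tuned_size(dev.arch, pipeline);
    // Reject sizes outside the device's supported range (this also covers
    // tuned == 0, since min_subgroup_size is at least 1 on real devices).
    if (tuned < dev.min_subgroup_size || tuned > dev.max_subgroup_size) {
        return 0;
    }
    return tuned;
}
```

With these placeholder tables, a hypothetical RDNA1 device reporting a 32 to 64 range would get 32 for "soft_max" and 64 for any other pipeline, while a device with no table entry always gets 0 and keeps its driver default.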
cmake cmake : enable building llama.cpp using system libggml (#12321) 2025-03-17 11:05:23 +02:00
include ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154) 2025-03-06 02:26:10 +01:00
src vulkan: subgroup size tuning (#12087) 2025-03-17 12:42:33 +01:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt opencl: use OpenCL C standard supported by the device (#12221) 2025-03-10 09:57:00 -07:00