llama.cpp/ggml/src
Jeff Bolz 772703c8ff
vulkan: Optimize some mat-vec mul quant shaders (#10296)
Compute two result elements per workgroup (for Q{4,5}_{0,1}). This reuses
the B loads across the rows and also reuses some addressing calculations.
This required manually partially unrolling the loop, since the compiler
is less willing to unroll outer loops.
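The idea of producing two outputs per workgroup so that each B (vector) load serves both rows can be sketched in plain C. This is not the Vulkan shader from the commit. The outer loop stands in for one workgroup. The block type `q4_block` and the function `matvec_q4_two_rows` are hypothetical names, and the scale is stored as a `float` where ggml's Q4_0 uses fp16. The guard for an odd row count is a design choice of this sketch, not something the commit message describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Q4_0-style block: 32 weights as 4-bit values plus one scale.
   ggml stores the scale as fp16; a float keeps this sketch self-contained. */
#define QK 32

typedef struct {
    float   d;          /* per-block scale */
    uint8_t qs[QK / 2]; /* element j in low nibble of qs[j], element j+16 in high nibble */
} q4_block;

/* y = A * x, where A has `rows` rows of `nblocks` blocks each (row-major)
   and x has nblocks*QK floats. Each outer iteration models one workgroup
   and produces two outputs, so every x value loaded is used by both rows. */
static void matvec_q4_two_rows(const q4_block *a, const float *x, float *y,
                               size_t rows, size_t nblocks) {
    for (size_t r = 0; r < rows; r += 2) {
        const int has_second = (r + 1 < rows); /* guard for an odd row count */
        const q4_block *row0 = a + r * nblocks;
        const q4_block *row1 = has_second ? row0 + nblocks : NULL;
        float acc0 = 0.0f, acc1 = 0.0f;

        for (size_t b = 0; b < nblocks; ++b) {
            const float *xb = x + b * QK;
            float s0 = 0.0f, s1 = 0.0f;
            for (size_t j = 0; j < QK / 2; ++j) {
                /* Load the two x values once; both rows reuse them. */
                const float xlo = xb[j];
                const float xhi = xb[j + QK / 2];
                s0 += ((row0[b].qs[j] & 0x0F) - 8) * xlo
                    + ((row0[b].qs[j] >> 4) - 8) * xhi;
                if (has_second) {
                    s1 += ((row1[b].qs[j] & 0x0F) - 8) * xlo
                        + ((row1[b].qs[j] >> 4) - 8) * xhi;
                }
            }
            acc0 += row0[b].d * s0;
            if (has_second) acc1 += row1[b].d * s1;
        }
        y[r] = acc0;
        if (has_second) y[r + 1] = acc1;
    }
}
```

On a GPU, the payoff comes from the shared `xlo`/`xhi` loads and shared index arithmetic. In this scalar C version it only shows the data reuse pattern.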

Add bounds-checking on the last iteration of the loop. I think this was at
least partly broken before.
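The general pattern of bounds-checking only the last iteration can be illustrated with a dot product. Full iterations of the manually unrolled body run without per-element checks, and only the final, possibly partial, step tests each index. This is a generic C sketch, not the shader loop. The names `dot_tail_checked` and `STEP` are hypothetical.

```c
#include <stddef.h>

#define STEP 4 /* elements handled per unrolled iteration (assumed) */

/* Dot product of a and b over n elements. Complete chunks of STEP elements
   run unchecked; only the last, possibly partial, chunk checks indices. */
static float dot_tail_checked(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    const size_t full = n / STEP; /* number of complete chunks */
    size_t i = 0;
    for (size_t c = 0; c < full; ++c, i += STEP) {
        /* Manually unrolled body: every index is known to be < n. */
        acc += a[i] * b[i] + a[i + 1] * b[i + 1]
             + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3];
    }
    /* Last iteration: bounds-check each element before reading it. */
    for (size_t k = 0; k < STEP; ++k) {
        if (i + k < n) acc += a[i + k] * b[i + k];
    }
    return acc;
}
```

Keeping the check out of the main body means the common case pays nothing for it. Reading past the end is avoided only because the tail is handled separately.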

Optimize the Q4_K shader to vectorize most loads and reduce the number of
bit twiddling instructions.
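One common way to vectorize loads and cut bit-twiddling for 4-bit data is to load packed bytes as a 32-bit word and extract all low (or all high) nibbles with a single mask. The C sketch below shows that idea only. It does not model the Q4_K block layout or reproduce the shader's code, and `unpack_bytewise` and `unpack_word` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Baseline: one mask and one shift per byte. */
static void unpack_bytewise(const uint8_t q[4], uint8_t lo[4], uint8_t hi[4]) {
    for (int i = 0; i < 4; ++i) {
        lo[i] = q[i] & 0x0F;
        hi[i] = q[i] >> 4;
    }
}

/* Word version: one 32-bit load, then one mask for all four low nibbles and
   one shift+mask for all four high nibbles. Byte lanes stay in place, so
   copying the results back to bytes matches the bytewise output on any
   endianness. */
static void unpack_word(const uint8_t q[4], uint32_t *lo4, uint32_t *hi4) {
    uint32_t w;
    memcpy(&w, q, sizeof w);       /* single 32-bit load */
    *lo4 = w & 0x0F0F0F0Fu;        /* low nibble of every byte */
    *hi4 = (w >> 4) & 0x0F0F0F0Fu; /* high nibble of every byte */
}
```

The shift in `unpack_word` lets bits leak from one byte into the top half of its neighbor, and the `0x0F0F0F0F` mask discards exactly those bits.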
2024-11-16 07:26:57 +01:00
ggml-amx ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-blas ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-cann ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-cpu ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324) 2024-11-16 01:53:37 +01:00
ggml-cuda ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-hip ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-kompute ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-metal ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-musa ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-rpc ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-sycl sycl: Use syclcompat::dp4a (#10267) 2024-11-15 11:09:12 +08:00
ggml-vulkan vulkan: Optimize some mat-vec mul quant shaders (#10296) 2024-11-16 07:26:57 +01:00
CMakeLists.txt ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-aarch64.c ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324) 2024-11-16 01:53:37 +01:00
ggml-aarch64.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-alloc.c ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) 2024-10-16 11:28:01 +03:00
ggml-backend-impl.h llama : refactor model loader with backend registry (#10026) 2024-10-30 02:01:23 +01:00
ggml-backend-reg.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-backend.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-common.h ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) 2024-09-05 21:48:47 -04:00
ggml-impl.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-quants.c ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-quants.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml.c ggml : fix some build issues 2024-11-15 21:45:32 +02:00