llama.cpp/tests
Jeff Bolz eddfb43850
vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505)
* tests: add mul_mat perf/functional tests for p021/nc vulkan shaders

* vulkan: Optimize mul_mat_vec p021 and nc shaders.

These shaders are used in attention calculations, and when the KV cache grows
large they start to dominate the run time. For the nc shader (which is called
with large 'k' dimension), use unrolling and vector loads. For the p021 shader
(which is called with large 'm' and small 'k' dimensions), take advantage of
grouped query attention to reuse loads from the A matrix for the whole group,
and reduce the number of workgroups (too much overhead from tiny dispatches).

Using subgroupAdd in the p021 shader also helps, so it is used conditionally.
2025-03-22 09:40:11 +01:00
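
The load-reuse idea described for the p021 shader can be illustrated with a small CPU-side sketch. This is not the Vulkan shader code from the commit. It is a plain-Python model under stated assumptions: `a` holds one matrix per KV head, `b` holds one vector per query head, and `gqa_ratio` query heads share each KV head. The function names and the `loads` counter are hypothetical and exist only to make the saving visible.

```python
def mul_mat_vec_naive(a, b, gqa_ratio):
    """One pass per query head: each row of A is fetched once per query head."""
    loads = 0
    out = []
    for q, vec in enumerate(b):
        kv = q // gqa_ratio  # query heads in a group share one KV head
        res = []
        for row in a[kv]:
            loads += 1  # models a fetch of one row of A
            res.append(sum(x * y for x, y in zip(row, vec)))
        out.append(res)
    return out, loads


def mul_mat_vec_grouped(a, b, gqa_ratio):
    """One pass per KV head: each row of A is fetched once for the whole group."""
    loads = 0
    out = [[0] * len(a[0]) for _ in b]
    for kv, mat in enumerate(a):
        group = range(kv * gqa_ratio, (kv + 1) * gqa_ratio)
        for r, row in enumerate(mat):
            loads += 1  # single fetch, reused by every query head in the group
            for q in group:
                out[q][r] = sum(x * y for x, y in zip(row, b[q]))
    return out, loads


# Hypothetical sizes: 2 KV heads, 3 rows each, k = 2, 4 query heads.
a = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]
b = [[1, 0], [0, 1], [1, 1], [2, -1]]

naive_out, naive_loads = mul_mat_vec_naive(a, b, gqa_ratio=2)
grouped_out, grouped_loads = mul_mat_vec_grouped(a, b, gqa_ratio=2)
print(naive_out == grouped_out, naive_loads, grouped_loads)
```

In this model the grouped version produces the same results while fetching rows of A `gqa_ratio` times less often. It also makes one pass per KV head rather than one per query head, which loosely mirrors the reduced workgroup count mentioned in the commit message.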
.gitignore tests : gitignore ggml-common.h 2024-03-09 14:17:11 +02:00
CMakeLists.txt sampling : support for llguidance grammars (#10224) 2025-02-02 09:55:32 +02:00
get-model.cpp ci : add model tests + script wrapper (#4586) 2024-01-26 14:18:00 +02:00
get-model.h ci : add model tests + script wrapper (#4586) 2024-01-26 14:18:00 +02:00
run-json-schema-to-grammar.mjs server : revamp chat UI with vuejs and daisyui (#10175) 2024-11-07 17:31:10 -04:00
test-arg-parser.cpp speculative : refactor and add a simpler example (#10362) 2024-11-25 09:58:41 +02:00
test-autorelease.cpp llama : add llama_vocab, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
test-backend-ops.cpp vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505) 2025-03-22 09:40:11 +01:00
test-barrier.cpp ggml : move CPU backend to a separate file (#10144) 2024-11-03 19:34:08 +01:00
test-c.c Nomic Vulkan backend (#4456) 2024-01-29 15:50:50 -05:00
test-chat-template.cpp tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900) 2025-02-18 18:03:23 +00:00
test-chat.cpp server: extract <think> tags from qwq outputs (#12297) 2025-03-10 10:59:03 +00:00
test-double-float.cpp ggml : minor naming changes (#8433) 2024-07-12 10:46:02 +03:00
test-gguf.cpp cleanup: fix compile warnings associated with gnu_printf (#11811) 2025-02-12 10:06:53 -04:00
test-grammar-integration.cpp sampling : support for llguidance grammars (#10224) 2025-02-02 09:55:32 +02:00
test-grammar-llguidance.cpp sampling : support for llguidance grammars (#10224) 2025-02-02 09:55:32 +02:00
test-grammar-parser.cpp llama : refactor sampling v2 (#9294) 2024-09-07 15:16:19 +03:00
test-json-schema-to-grammar.cpp tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) 2025-03-05 13:05:13 +00:00
test-llama-grammar.cpp llama : minor grammar refactor (#10897) 2024-12-19 17:42:13 +02:00
test-log.cpp common : use common_ prefix for common library functions (#9805) 2024-10-10 22:57:42 +02:00
test-lora-conversion-inference.sh ci : use -no-cnv in gguf-split tests (#11254) 2025-01-15 18:28:35 +02:00
test-model-load-cancel.cpp llama : update llama_model API names (#11063) 2025-01-06 10:55:18 +02:00
test-opt.cpp ggml : inttypes.h -> cinttypes (#0) 2024-11-17 08:30:29 +02:00
test-quantize-fns.cpp tests : fix test-quantize-fns to init the CPU backend (#12306) 2025-03-10 14:07:15 +02:00
test-quantize-perf.cpp ggml : inttypes.h -> cinttypes (#0) 2024-11-17 08:30:29 +02:00
test-rope.cpp llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
test-sampling.cpp sampling: add Top-nσ sampler (#11223) 2025-02-13 08:45:57 +02:00
test-tokenizer-0.cpp llama : add llama_vocab, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
test-tokenizer-0.py py : logging and flake8 suppression refactoring (#7081) 2024-05-05 08:07:48 +03:00
test-tokenizer-0.sh tests : fix test-tokenizer-0.sh 2024-05-28 15:04:09 +03:00
test-tokenizer-1-bpe.cpp llama : add llama_vocab, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
test-tokenizer-1-spm.cpp llama : add llama_vocab, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
test-tokenizer-random.py llama : add llama_vocab, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00