* vulkan: scalar flash attention implementation
* vulkan: always use fp32 for scalar flash attention
* vulkan: use vector loads in scalar flash attention shader
* vulkan: remove PV matrix, helps with register usage
* vulkan: reduce register usage in scalar FA, but perf may be slightly worse
* vulkan: load each Q value once. optimize O reduction. more tuning
* vulkan: support q4_0/q8_0 KV in scalar FA
* CI: increase timeout to accommodate newly-supported tests
* vulkan: for scalar FA, select between 1 and 8 rows
* vulkan: avoid using Float16 capability in scalar FA
Workflow files in `.github/workflows`:

- bench.yml.disabled
- build-linux-cross.yml
- build.yml
- close-issue.yml
- docker.yml
- editorconfig.yml
- gguf-publish.yml
- labeler.yml
- python-check-requirements.yml
- python-lint.yml
- python-type-check.yml
- release.yml
- server.yml