llama.cpp/.github
Jeff Bolz · dc1d2adfc0
vulkan: scalar flash attention implementation (#13324)
* vulkan: scalar flash attention implementation

* vulkan: always use fp32 for scalar flash attention

* vulkan: use vector loads in scalar flash attention shader

* vulkan: remove PV matrix, helps with register usage

* vulkan: reduce register usage in scalar FA, but perf may be slightly worse

* vulkan: load each Q value once. optimize O reduction. more tuning

* vulkan: support q4_0/q8_0 KV in scalar FA

* CI: increase timeout to accommodate newly-supported tests

* vulkan: for scalar FA, select between 1 and 8 rows

* vulkan: avoid using Float16 capability in scalar FA
2025-05-10 08:07:07 +02:00
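
Several of the bullets above (always accumulating in fp32, removing the PV matrix, optimizing the O reduction) refer to the online-softmax recurrence at the heart of flash attention. As context, here is a minimal CPU sketch of that recurrence in C++; it is illustrative only, not the GLSL shader from #13324, and the function and parameter names (attend_row, q, k, v, D, N, scale) are hypothetical.

```cpp
// Minimal CPU sketch (fp32 throughout) of the online-softmax recurrence a
// scalar flash-attention kernel evaluates per query row. Illustrative only,
// not the shader from #13324; all names here are hypothetical.
#include <algorithm>
#include <cmath>
#include <vector>

// One query row `q` against N key/value rows, head dimension D.
// The output is rescaled incrementally, so no P (softmax) or P*V
// intermediate is ever materialized; only m, l, and o persist across
// iterations.
std::vector<float> attend_row(const float *q,
                              const std::vector<std::vector<float>> &k,
                              const std::vector<std::vector<float>> &v,
                              int D, int N, float scale) {
    float m = -INFINITY;            // running max of the scaled scores
    float l = 0.0f;                 // running sum of exp(score - m)
    std::vector<float> o(D, 0.0f);  // unnormalized output accumulator

    for (int j = 0; j < N; ++j) {
        float s = 0.0f;             // s = scale * dot(q, k_j)
        for (int d = 0; d < D; ++d) s += q[d] * k[j][d];
        s *= scale;

        float m_new = std::max(m, s);
        float corr  = std::exp(m - m_new);  // rescales prior partial sums
        float p     = std::exp(s - m_new);

        l = l * corr + p;
        for (int d = 0; d < D; ++d) o[d] = o[d] * corr + p * v[j][d];
        m = m_new;
    }
    for (int d = 0; d < D; ++d) o[d] /= l;  // single deferred normalization
    return o;
}
```

Keeping only m, l, and the rescaled O accumulator per query row is what makes it possible to avoid a materialized P*V intermediate, which is plausibly the register-usage saving the "remove PV matrix" bullet refers to; in a shader these accumulators live in registers, so their footprint drives how many rows per workgroup (the "between 1 and 8 rows" tuning above) are affordable.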
Name                       Last commit                                                  Date
actions                    ci : move release workflow to a separate file (#13362)      2025-05-08 13:15:28 +02:00
ISSUE_TEMPLATE             repo : update links to new url (#11886)                     2025-02-15 16:40:57 +02:00
workflows                  vulkan: scalar flash attention implementation (#13324)      2025-05-10 08:07:07 +02:00
labeler.yml                llama : move end-user examples to tools directory (#13249)  2025-05-02 20:27:13 +02:00
pull_request_template.md   repo : update links to new url (#11886)                     2025-02-15 16:40:57 +02:00