* vulkan: scalar flash attention implementation
* vulkan: always use fp32 for scalar flash attention
* vulkan: use vector loads in scalar flash attention shader
* vulkan: remove PV matrix, helps with register usage
* vulkan: reduce register usage in scalar FA, but perf may be slightly worse
* vulkan: load each Q value once. optimize O reduction. more tuning
* vulkan: support q4_0/q8_0 KV in scalar FA
* CI: increase timeout to accommodate newly-supported tests
* vulkan: for scalar FA, select between 1 and 8 rows
* vulkan: avoid using Float16 capability in scalar FA
Workflow files in `.github/workflows`:

- bench.yml.disabled
- build-linux-cross.yml
- build.yml
- close-issue.yml
- docker.yml
- editorconfig.yml
- gguf-publish.yml
- labeler.yml
- python-check-requirements.yml
- python-lint.yml
- python-type-check.yml
- release.yml
- server.yml