llama.cpp/ggml
Johannes Gäßler  73e2ed3ce3  2025-02-17 14:03:24 +01:00
CUDA: use async data loading for FlashAttention (#11894)
Co-authored-by: Diego Devesa <slarengh@gmail.com>
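The commit title above refers to asynchronous data loading: copying tiles from global to shared memory without stalling the threads that issue the copy, so the load can overlap with computation. The snippet below is not the llama.cpp FlashAttention kernel; it is a minimal sketch of the general technique using the cooperative-groups `memcpy_async` API, with an assumed kernel name, tile size, and toy reduction as the consumer.

```cuda
// Minimal sketch of asynchronous global->shared data loading with
// cooperative-groups memcpy_async. NOT the llama.cpp FlashAttention kernel;
// kernel name, tile size, and the toy reduction are illustrative assumptions.
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>

namespace cg = cooperative_groups;

constexpr int TILE = 256; // threads per block == elements per tile (assumed)

__global__ void tile_sum_async(const float * src, float * block_sums) {
    __shared__ float tile[TILE];
    cg::thread_block block = cg::this_thread_block();

    const int base = blockIdx.x * TILE;

    // Issue an asynchronous copy of one tile from global to shared memory.
    // On Ampere and newer this lowers to cp.async, so the copy bypasses
    // registers and can overlap with independent work issued before wait();
    // on older architectures it falls back to a synchronous copy.
    cg::memcpy_async(block, tile, src + base, sizeof(float) * TILE);

    // ... independent computation could be scheduled here ...

    cg::wait(block); // block until the tile has landed in shared memory

    // Trivial consumer: tree reduction over the tile in shared memory.
    for (int stride = TILE / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        block.sync();
    }
    if (threadIdx.x == 0) {
        block_sums[blockIdx.x] = tile[0];
    }
}
```

A host would launch this as `tile_sum_async<<<n / TILE, TILE>>>(d_src, d_sums)` with `n` assumed to be a multiple of `TILE`; the benefit of the asynchronous copy comes from scheduling independent work between issuing the copy and waiting on it.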
Name            Last commit                                                Last updated
cmake           cmake: add ggml find package (#11369)                      2025-01-26 12:07:48 -04:00
include         repo : update links to new url (#11886)                    2025-02-15 16:40:57 +02:00
src             CUDA: use async data loading for FlashAttention (#11894)   2025-02-17 14:03:24 +01:00
.gitignore      vulkan : cmake integration (#8119)                         2024-07-13 18:12:39 +02:00
CMakeLists.txt  cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096) 2025-02-04 12:59:15 +02:00