llama.cpp/ggml
Bizhao Shi 2d38b6e400
CANN: Add basic support for the Flash Attention kernel (#13627)
* cann: add the basic FA support

* cann: update the readme

* cann: update the FlashAttention with PSEShift

* cann: update the input parameters in FA

* cann: update the ALiBi handling with max_bias

* cann: add the constraints on softcap

* cann: update the docs CANN.md

* cann: update the docs CANN.md

* cann: fix typo of CANN.md

* cann: add some comments and update the CANN.md

* cann: update the CANN.md

* cann: update the innerPrecise mode for fusedInferAttention

* cann: update the constraints of flash_attn_ext in ggml-cann.cpp (see the sketch after this log)

* cann: clean the whitespace

* cann: clean the whitespace

* cann: add a new endline
2025-05-26 10:20:18 +08:00
cmake           scripts : update sync + fix cmake merge                          2025-03-27 10:09:29 +02:00
include         ggml : fix the order of ggml_unary_op (#13718)                   2025-05-23 08:12:48 +02:00
src             CANN: Add basic support for the Flash Attention kernel (#13627)  2025-05-26 10:20:18 +08:00
.gitignore      vulkan : cmake integration (#8119)                               2024-07-13 18:12:39 +02:00
CMakeLists.txt  sycl: use oneDNN for matrices multiplication (#12972)            2025-05-15 16:53:41 +02:00