llama.cpp

Alan Gray 3f9da22c2b
Simplify and improve CUDA graphs through use of indirect copy pointers (#9017)

* CUDA: Simplify and improve CUDA graphs through use of indirect copy pointers

  Previously there was complexity in the CUDA graphs implementation due to frequently changing parameters to copy kernels associated with K and V cache pointers. This patch simplifies by using indirection to avoid such parameters frequently changing, avoiding the need for frequent graph updates.

  Fixes #12152

* Addressed comments
* fix HIP builds
* properly sync to stream
* removed ggml_cuda_cpy_fn_ptrs
* move stream sync before free
* guard to only use indirection with graphs
* style fixes
* check for errors

---------

Co-authored-by: slaren <slarengh@gmail.com>

2025-04-03 03:31:15 +02:00
cmake	scripts : update sync + fix cmake merge	2025-03-27 10:09:29 +02:00
include	metal : improve FA + improve MoE (#12612)	2025-03-28 20:21:59 +02:00
src	Simplify and improve CUDA graphs through use of indirect copy pointers (#9017)	2025-04-03 03:31:15 +02:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	ggml : add logging for native build options/vars (whisper/2935)	2025-03-30 08:33:31 +03:00