R0CKSTAR
e291450b76
musa: fix build warning ( #13129 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-27 13:22:49 +02:00
Alan Gray
207c22ec2d
ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes ( #12970 )
2025-04-17 15:19:42 +02:00
Sigbjørn Skjæret
7538246e7c
cuda : add f32 to bf16 copy op ( #12806 )
...
This allows BF16 KV-cache on CUDA.
2025-04-08 23:21:31 +02:00
R0CKSTAR
916c83bfe7
musa: fix compilation warnings in mp_22/31 ( #12780 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-06 15:23:54 +02:00
Alan Gray
3f9da22c2b
Simplify and improve CUDA graphs through use of indirect copy pointers ( #9017 )
...
* CUDA: Simplify and improve CUDA graphs through use of indirect copy pointers
Previously, the CUDA graphs implementation was complicated by
frequently changing parameters to the copy kernels associated with the
K and V cache pointers. This patch simplifies it by using indirection
so that these parameters no longer change frequently, avoiding the
need for frequent graph updates.
Fixes #12152
* Addressed comments
* fix HIP builds
* properly sync to stream
* removed ggml_cuda_cpy_fn_ptrs
* move stream sync before free
* guard to only use indirection with graphs
* style fixes
* check for errors
---------
Co-authored-by: slaren <slarengh@gmail.com>
2025-04-03 03:31:15 +02:00
Gian-Carlo Pascutto
d70908421f
cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. ( #12000 )
2025-02-22 09:43:24 +01:00
Ivan
116efee0ee
cuda: add q8_0->f32 cpy operation ( #9571 )
...
llama: enable K-shift for quantized KV cache
It will fail on unsupported backends or quant types.
2024-09-24 02:14:24 +02:00
slaren
4db04784f9
cuda : fix defrag with quantized KV ( #9319 )
2024-09-05 11:13:11 +02:00
slaren
2b1f616b20
ggml : reduce hash table reset cost ( #8698 )
...
* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string
2024-07-27 04:41:55 +02:00
Clint Herron
07a3fc0608
Removes multiple newlines at the end of files that are breaking the editorconfig step of CI. ( #8258 )
2024-07-02 12:18:10 -04:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00