llama.cpp/ggml/src
Akarshan Biswas 228f34c9ce
SYCL: Implement a few same-quantized-type copy kernels (#13739)
* SYCL: Implement a few same-quantized-type copy kernels

* Use memcpy for copying contiguous tensors

ggml-ci

* feat(sycl): add contiguous tensor copy support and device checks

Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.
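
A minimal sketch of what that fast path can look like, assuming the public ggml helpers ggml_is_contiguous and ggml_nbytes and a plain SYCL queue; the wrapper name try_copy_contiguous is illustrative, not the PR's actual function:

```cpp
#include <sycl/sycl.hpp>
#include "ggml.h"

// Sketch of the contiguous same-type fast path (illustrative, not the exact
// code from the PR). When src and dst share a type and are both contiguous,
// the whole buffer moves with a single device memcpy instead of an
// element-wise copy kernel.
static bool try_copy_contiguous(sycl::queue & q,
                                const ggml_tensor * src, ggml_tensor * dst) {
    if (src->type == dst->type &&
        ggml_is_contiguous(src) && ggml_is_contiguous(dst)) {
        q.memcpy(dst->data, src->data, ggml_nbytes(src));
        return true;  // handled; no kernel launch needed
    }
    return false;     // caller falls back to the type-specific copy kernels
}
```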

* refactor: replace specific block copy functions with template

The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function, cpy_blck_q_q. Because a same-type copy moves each block verbatim, one generic template works for any block type, cutting duplication and improving maintainability while preserving behavior. The template is instantiated with specific block types (e.g., block_q8_0) where needed.
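
A plausible shape for that template, following the (const char *, char *) signature convention of ggml's existing cpy_blck_* helpers; the body is a sketch rather than the verbatim kernel:

```cpp
// Generic same-type block copy: for identical quantized types the block can
// be copied verbatim, so one template covers q4_0, q5_0, q8_0, ... block
// types (defined in ggml-common.h).
template <typename block_t>
static void cpy_blck_q_q(const char * cxi, char * cdsti) {
    const block_t * xi   = (const block_t *) cxi;
    block_t       * dsti = (block_t       *) cdsti;
    *dsti = *xi;  // whole-block copy, scale and quants together
}

// Instantiated with a concrete block type where needed, e.g.
// cpy_blck_q_q<block_q8_0>(src_ptr, dst_ptr) takes the place of a dedicated
// q8_0-to-q8_0 copy function.
```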

* Exclude BF16 support for the COPY op for now
ggml-ci

* perf: adjust SYCL copy kernel block sizes for efficiency

Use ceil_div to guarantee full element coverage and update the nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in the copy operations (see the sketch below).
2025-06-07 18:58:20 +05:30
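
A sketch of that launch-geometry idea, assuming a plain f32 element copy; ceil_div, launch_copy_f32, and the wg_size parameter are illustrative names, and the in-kernel bounds check guards the rounded-up tail:

```cpp
#include <sycl/sycl.hpp>

static inline size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

// Illustrative copy launch: round the global range up to a multiple of the
// work-group size so every element is covered, then guard the tail.
static void launch_copy_f32(sycl::queue & q, const float * src, float * dst,
                            size_t ne, size_t wg_size) {
    const size_t global = ceil_div(ne, wg_size) * wg_size;
    q.parallel_for(
        sycl::nd_range<1>(sycl::range<1>(global), sycl::range<1>(wg_size)),
        [=](sycl::nd_item<1> it) {
            const size_t i = it.get_global_linear_id();
            if (i >= ne) {
                return;  // work-item in the rounded-up tail: nothing to copy
            }
            dst[i] = src[i];
        });
}
```
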
ggml-blas cmake : Fix broken CMake error messages (ggml/1252) 2025-06-01 13:43:57 +03:00
ggml-cann CANN: Add SOC TYPE printing in cmake configuration (#13837) 2025-05-28 11:54:20 +08:00
ggml-cpu releases : use dl backend for linux release, remove arm64 linux release (#13996) 2025-06-04 13:15:54 +02:00
ggml-cuda CUDA: fix FTZ in FA for Gemma 3 (#13991) 2025-06-04 08:57:05 +02:00
ggml-hip CUDA/HIP: Share the same unified memory allocation logic. (#12934) 2025-04-15 11:20:38 +02:00
ggml-kompute llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
ggml-metal metal : use F32 accumulators in FA kernels (#13975) 2025-06-02 21:33:40 +03:00
ggml-musa musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647) 2025-05-21 09:58:49 +08:00
ggml-opencl opencl: add backend_synchronize (#13939) 2025-06-02 16:54:58 -07:00
ggml-rpc rpc : add rpc_msg_set_tensor_hash_req (#13353) 2025-05-09 10:31:07 +03:00
ggml-sycl SYCL: Implement a few same-quantized-type copy kernels (#13739) 2025-06-07 18:58:20 +05:30
ggml-vulkan vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001) 2025-06-05 16:00:29 +02:00
CMakeLists.txt llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) 2025-06-05 11:57:42 +02:00
ggml-alloc.c ggml: Don't assert fail when tensor data changes (#13222) 2025-05-01 22:46:10 +02:00
ggml-backend-impl.h ggml : upgrade init_tensor API to return a ggml_status (#11854) 2025-02-28 14:41:47 +01:00
ggml-backend-reg.cpp ggml-backend : fix backend search path (#12330) 2025-03-11 14:25:17 +01:00
ggml-backend.cpp sched : avoid changing cur_copy when a graph is already allocated (#13922) 2025-05-30 18:56:19 +02:00
ggml-common.h musa: fix all warnings, re-enable -DLLAMA_FATAL_WARNINGS=ON in ci and update doc (#12611) 2025-03-30 10:59:38 +02:00
ggml-impl.h ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) 2025-06-01 13:43:57 +03:00
ggml-opt.cpp mnist: fix segmentation fault (ggml/1227) 2025-05-19 13:29:56 +03:00
ggml-quants.c whisper: remove MSVC warnings pragmas (whisper/3090) 2025-05-07 17:28:36 +03:00
ggml-quants.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.h remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) 2024-12-12 19:02:49 +01:00
ggml.c sync : whisper.cpp (ggml/1250) 2025-06-01 13:43:57 +03:00
ggml.cpp ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) 2025-06-01 13:43:57 +03:00
gguf.cpp gguf: fix failure on version == 0 (#13956) 2025-06-01 18:08:05 +02:00