llama.cpp

Akarshan Biswas 228f34c9ce SYCL: Implement few same quantized type copy kernels (#13739)		2025-06-07 18:58:20 +05:30

* SYCL: Implement few same quantized type copy kernels

* Use memcpy for copying contiguous tensors

ggml-ci

* feat(sycl): add contiguous tensor copy support and device checks

Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.

* refactor: replace specific block copy functions with template

The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.

* Exclude BF16 support for COPY tensors for now

ggml-ci

* perf: adjust SYCL copy kernel block sizes for efficiency

Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
cmake	cmake: Factor out CPU architecture detection (#13883)	2025-05-29 12:50:25 +02:00
include	ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247)	2025-06-01 13:43:57 +03:00
src	SYCL: Implement few same quantized type copy kernels (#13739)	2025-06-07 18:58:20 +05:30
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013)	2025-06-05 11:57:42 +02:00