llama.cpp

Author	SHA1	Message	Date
Nicolò Scipione	8308f98c7f	sycl: add usage of enqueue_functions extension (#14244) * Add header and namespace to use enqueue_functions extension * Convert submit and parallel_for to use new extension in convert.cpp * Convert submit and parallel_for to use extension in ggml-sycl.cpp * Convert submit and parallel_for to use extension in gla.cpp * Convert submit and parallel_for in mmq.cpp * Convert submit and parallel_for in mmvq.cpp * Convert submit and parallel_for in remaining files * Convert all simple parallel_for to nd_launch from enqueue_functions extension * Wrapping extension in general function Create a general function that enable the enqueue_functions extension if it is enable in the compiler, otherwise call the general SYCL function to launch kernels. --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2025-06-20 15:07:21 +02:00
Romain Biessy	9012eb9b45	sycl: Add more debug prints (#13640)	2025-05-26 10:28:53 +02:00
Łukasz Ślusarczyk	a53f7f7b88	fixed compilation warnings in ggml-sycl (#12424)	2025-03-18 08:51:25 +08:00
Akarshan Biswas	8303e8b0fb	SYCL: Fix GGML_SYCL_DEBUG macro (#11995)	2025-02-24 10:18:25 +00:00
Akarshan Biswas	6e84b0ab8e	SYCL : SOFTMAX F16 mask support and other fixes (#11261) Implemented ggml_sycl_op_soft_max() F16 src1(mask) support for which a pragma deprecation warning was added during #5021. To do this, had to decouple it from ggml_sycl_op_flatten which always considered src1 to be of fp32 type(many OP functions are dependent on it). * SYCL: SOFTMAX F16 mask support and other fixes * test-backend-ops: Add F16 mask test cases	2025-01-28 09:56:58 +00:00
Akarshan Biswas	83ed24a97b	SYCL: Reduce most of the compiler warnings (#10748) * Try to reduce some unused and typecast warnings * Reduce compiler warnings step 2 * add a newline at the end of the file * Initialize nreduce as size_t * [SYCL] Remove pragma directives from mmq.cpp * SYCL: mmq add condition to prevent blocks_per_tile_x_row variable from becoming 0 * SYCL softmax: Initialize nreduce as size_t * ggml-sycl.cpp: fix some trailing whitespaces * SYCL: remove the unused variables instead of commenting it out * SYCL poo2d kernel: set NAN for invalid pooling op * SYCL gemm.hpp: remove pragma directives * SYCL gemm.hpp: use const cast to properly support dnnl::memory * SYCL: wkv6 remove a comment * SYCL: clean comments step 2 * SYCL: clean comments and variables step 3 * SYCL: Use GGML_UNUSED for unused variables * SYCL: remove extra empty lines and a comment * Remove TODO * cleanup spaces * add a stdout for unsupported op * use sycl printf over fprintf * remove prints for CI * SYCL ggml-sycl: pool2D use sycl::nan and remove if-else block --------- Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-12-13 12:12:15 +05:30
luoyu-intel	063d99ad11	[SYCL] fix scratch size of softmax (#8642)	2024-07-23 15:43:28 +08:00
AidanBeltonS	f4444d992c	[SYCL] Use multi_ptr to clean up deprecated warnings (#8256)	2024-07-10 16:10:49 +01:00
luoyu-intel	a9554e20b6	[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) * fix group_norm ut * split softmax * fix softmax * add concat support condition * revert debug code * move QK_WARP_SIZE to presets.hpp	2024-07-05 13:06:13 +08:00

9 commits