llama.cpp

Author	SHA1	Message	Date
Nicolò Scipione	d394a9aedc	sycl : Remove waits from function calls (#13702) * removes the waits in async memcpy functions	2025-05-22 12:54:43 +01:00
Ewan Crawford	6b56a64690	SYCL: Avoid using with SYCL-Graph for unsupported nodes (#13587) Currently, on a CUDA backend to SYCL, running `GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` hits two operations that throw an exception from the blocking waits during queue recording. * `-o CONCAT`: Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187 * `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074 We've noticed that `ggml-cuda.cu` has the `check_node_graph_compatibility_and_refresh_copy_ops` method (`ggml/src/ggml-cuda/ggml-cuda.cu`, line 2458 at commit `39e73ae0d6`) for checking if a graph can be used, even if enabled. I've taken a similar approach in this PR by adding a method to `ggml-sycl.cpp` for checking if a graph can be used for the operations even if a user has asked for it to be enabled.	2025-05-22 16:24:09 +08:00
Svetlozar Georgiev	4245e622e0	sycl: disable reorder for sycl mulmat (#13536)	2025-05-20 11:34:15 +02:00
Nicolò Scipione	f7c9429c85	sycl : Overcoming workaround for mmap() allocation on Windows (#13482) * Remove mmap workaround on Windows After some testing I found that mmap is supported on Windows and for many GPUs on Linux. Therefore I remove the workaround for Windows since it is not necessary. * Update llama-bench README SYCL backend introduced a workaround that allows execution of llama-bench also without specifying `--mmp 0` flag	2025-05-20 08:54:43 +08:00
Łukasz Ślusarczyk	0a338ed013	sycl : fixed compilation warnings (#13582)	2025-05-16 18:15:29 +08:00
Atharva Dubey	02cdd2d8b0	sycl: simplify bin_bcast_kernel (#13383)	2025-05-15 17:39:52 +02:00
Svetlozar Georgiev	64bb51cf90	sycl: reordered Q4_K MMVQ (#13109)	2025-05-15 17:35:44 +02:00
Łukasz Ślusarczyk	9c404ed54c	sycl: use oneDNN for matrices multiplication (#12972)	2025-05-15 16:53:41 +02:00
Atharva Dubey	14492144c2	enable dpcpp nightly builds with libraries (#13406)	2025-05-12 13:15:32 +08:00
Alberto Cabrera Pérez	17512a94d6	sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858) * sycl : Implemented reorder Q4_0 mmvq Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com> * sycl : Fixed mmvq being called when reorder is disabled * sycl : Improved comments in the quants header Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com> * Use static_assert * safe_div -> ceil_div * Clarify qi comment * change the reorder tensor from init to execute OP * dbg * Undo changes to test-backend-ops * Refactor changes on top of q4_0 reorder fix * Missing Reverts * Refactored opt_for_reorder logic to simplify code path * Explicit inlining and unroll * Renamed mul_mat_algo enum for consistency --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com> Co-authored-by: romain.biessy <romain.biessy@codeplay.com>	2025-05-09 16:34:08 +01:00
Alberto Cabrera Pérez	8733e0cf6e	sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343) * sycl: fixed non-contiguous src1 mul_mats (nc and batched) * Fixed wrong static_cast inside kernel	2025-05-08 10:08:01 +01:00
Daniel Bevenius	13b0a04597	whisper: remove MSVC warnings pragmas (whisper/3090) * ggml : remove MSVC warnings pragmas This commit removes the MSVC-specific pragmas as these are now handled in ggml/CMakeLists.txt. * whisper : remove MSVC warning pragmas This commit removes the MSVC-specific pragmas. These are now handled in the ggml/CMakeLists.txt file.	2025-05-07 17:28:36 +03:00
Akarshan Biswas	1e333d5bba	SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (#13254) * SYCL: Do not set tensor extras when reorder optimize is disabled * SYCL: Disable reorder optimize by default	2025-05-06 20:27:06 +05:30
Akarshan Biswas	66645a5285	SYCL: Disable mul_mat kernels for noncontiguous tensor b (#13308) ggml-ci	2025-05-05 13:39:10 +05:30
Akarshan Biswas	a4c340f974	SYCL: Add all missing unary kernels (#13074) * SYCL: Add all missing unary kernels ggml-ci * decouple kernel launch range from data size using strided loop * use ceil_div helper for num_blocks ggml-ci * clean auto imported header files	2025-04-28 11:33:25 +02:00
Neo Zhang Jianyu	514c45608f	change the reorder tensor from init to execute OP (#13003)	2025-04-25 17:37:51 +08:00
Akarshan Biswas	5368ddda7a	SYCL: Add non-contiguous support in ROPE (#12993) ggml-ci	2025-04-21 19:13:30 +05:30
Akarshan Biswas	8d66005763	SYCL: Refactor and enable FP16 in binary broadcast OPs (#12975) * SYCL: refactor move to a separate file * Fix binbcast * Remove duplicates * fix include formatting * fix typo	2025-04-18 15:57:56 +02:00
Akarshan Biswas	510676475f	SYCL: Add ROPE vision kernel (#12887) * SYCL: Add ROPE vision kernel * Add comment about rope mode	2025-04-15 10:37:42 +02:00
Akarshan Biswas	75afa0ae31	SYCL: Fix im2col (#12910) * SYCL: Fix im2col * restore local workgroup size adjustments for large inputs * restore format	2025-04-14 14:23:53 +02:00
Ewan Crawford	578754b315	sycl: Support sycl_ext_oneapi_limited_graph (#12873) The current usage of the SYCL-Graph extension checks for the `sycl_ext_oneapi_graph` device aspect. However, it is also possible to support `sycl_ext_oneapi_limited_graph` devices that don't support update.	2025-04-11 15:32:14 +02:00
Akarshan Biswas	fccf9cae83	SYCL: Add fp16 type support to unary op kernels (#12788) * SYCL: Add fp16 support to some elementwise OP kernels * remove comment ggml-ci * Use static_cast directly * remove not needed cast from tanh * Use static cast and remove unneeded castings * Adjust device_support_op for unary OPs * Use cast_data and typed_data struct to deduplicate casting code	2025-04-11 16:03:50 +08:00
Diego Devesa	fe92821ea9	ggml : add bilinear upscale support (ggml/1185)	2025-04-11 00:17:47 +03:00
Neo Zhang Jianyu	656babd6c2	Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" (#12812) * Revert "sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_s…" This reverts commit `518a01480e`. * Update ggml/src/ggml-sycl/ggml-sycl.cpp * Update ggml/src/ggml-sycl/ggml-sycl.cpp * rm tail space	2025-04-08 15:03:21 +08:00
zhouwg	518a01480e	sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor (#12734)	2025-04-07 17:22:57 +02:00
Nicolò Scipione	94148ba330	sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution (#12625)	2025-04-04 16:00:46 +02:00
Romain Biessy	8293970542	SYCL: Rename oneMKL to oneMath (#12192) * Rename oneMKL Interface to oneMath * Use oneMath for Intel vendor * Rename occurrences to mkl * clang-format * Silence verbose warnings * Set oneMath HIP_TARGETS * Fix silence warnings * Remove step to build oneMath from build instructions * Use fixed oneMath version * Remove INTEL_CPU * Fold CMake oneDNN conditions * Use Intel oneMKL for Intel devices * Improve CMake message * Link against MKL::MKL_SYCL::BLAS only * Move oneMath documentation to Nvidia and AMD sections	2025-04-01 16:24:29 +08:00
Akarshan Biswas	8bbf26083d	SYCL: switch to SYCL namespace (#12674)	2025-04-01 10:11:39 +02:00
Akarshan Biswas	6c02a032fa	SYCL: Remove misleading ggml_sycl_op_flatten function (#12387) * SYCL: Remove misleading ggml_sycl_op_flatten function * remove trailing whitespace * Fix L2 norm from rebase * remove try catch block from element_wise.cpp * remove comment from common.hpp * ggml-sycl.cpp: Add try catch sycl::exception block in compute_forward * norm.cpp: remove try catch exception block	2025-03-31 11:25:24 +02:00
Akarshan Biswas	f17a3bb4e8	SYCL: implement memset ggml backend buffer interface (#12580) * SYCL: implement memset ggml backend buffer interface * use GGML_ABORT macro * Do not wait for all queues to finish for memset operation	2025-03-27 09:46:00 +08:00
Akarshan Biswas	e2f560175a	SYCL: disable Q4_0 reorder optimization (#12560) ggml-ci	2025-03-25 18:40:18 +08:00
Svetlozar Georgiev	9ffcc9e374	sycl: cleanup oneDNN related code (#12097)	2025-03-21 10:15:56 +08:00
Łukasz Ślusarczyk	35cae5ba05	SYCL: using graphs is configurable by environment variable and compile option (#12371) * alberto changes * enable sycl graphs by env variable * fixed compilation warnings in ggml-sycl.cpp * renamed graph variables * fix markdown in docs/backend/SYCL.md Co-authored-by: Romain Biessy <romain.biessy@codeplay.com> * fix markdown in docs/backend/SYCL.md again * compiling graphs by default, renamed graph_enable to graph_disable --------- Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>	2025-03-18 11:16:31 +01:00
Łukasz Ślusarczyk	a53f7f7b88	fixed compilation warnings in ggml-sycl (#12424)	2025-03-18 08:51:25 +08:00
Molly Sophia	7dfad387e3	llama: Add support for RWKV v7 architecture (#12412) * ggml: Add op l2_norm Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * ggml: Add op rwkv_wkv7 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: Add support for RWKV7 and ARWKV7 models Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: fix inference with RWKV6Qwen2 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: add more (a)rwkv7 variants in size Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Apply code-format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * fix MUSA build Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: fix shape error with rwkv using llama-parallel Signed-off-by: Molly Sophia <mollysophia379@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-03-18 07:27:50 +08:00
Akarshan Biswas	b3c9a65673	SYCL: set extras only on GGML_TYPE_Q4_0 (#12366) * SYCL: set extras only on GGML_TYPE_Q4_0 * release tensor_extras in reset buffer interface	2025-03-17 09:45:12 +08:00
aubreyli	3d35d87b41	SYCL: Delete redundant plus sign and space (#12391)	2025-03-15 15:49:03 +01:00
fairydreaming	b19bd064c0	SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (#12399) * sycl : support non-contiguous tensors in binary ops * sycl : silence unused variable warning --------- Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2025-03-15 22:19:30 +08:00
Alberto Cabrera Pérez	363f8c5d67	sycl : variable sg_size support for mmvq kernels (#12336)	2025-03-12 09:57:32 +00:00
Akarshan Biswas	5e43f104cc	SYCL: Disable f16 Unary OPs as not supported by the kernels (#12201)	2025-03-05 16:58:23 +01:00
Akarshan Biswas	ece9745bb8	SYCL: Move CPY kernels to a separate file and add few missing kernels (#12133) * SYCL: refactor and move cpy kernels to a separate file * Add few missing cpy kernels * refactor and add debug logs	2025-03-03 11:07:22 +01:00
William Tambellini	70680c48e5	ggml : upgrade init_tensor API to return a ggml_status (#11854) * Upgrade init_tensor API to return a ggml_status To prepare for an 'abort-free' ggml (ggml not to abort on OOMs but return an OOM status), as agreed with Diego in the ggml repo, upgrade the init_tensor() and view_init() APIs to return a ggml_status. * misc fixes --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-02-28 14:41:47 +01:00
Neo Zhang Jianyu	08d5986290	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035) * opt performance by reorder for Intel GPU * detect hw type and save opt feature, and print opt feature * correct name * support optimize graph once when compute graph, record the opt status in tensor->extra, make CI passed * add env variable GGML_SYCL_DISABLE_OPT for debug * use syclex::architecture replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT * add performance data * mv getrows functions to separated files * fix global variables --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>	2025-02-24 22:33:23 +08:00
Akarshan Biswas	8303e8b0fb	SYCL: Fix GGML_SYCL_DEBUG macro (#11995)	2025-02-24 10:18:25 +00:00
Akarshan Biswas	ec3bc8270b	SYCL: remove XMX info from print devices (#11712)	2025-02-07 09:27:53 +00:00
Akarshan Biswas	194b2e69f8	SYCL: Adjust support condition for norm operators (#11674) SYCL does not support non contiguous tensors for norm operations	2025-02-06 11:42:35 +00:00
Akarshan Biswas	6e84b0ab8e	SYCL : SOFTMAX F16 mask support and other fixes (#11261) Implemented ggml_sycl_op_soft_max() F16 src1(mask) support for which a pragma deprecation warning was added during #5021. To do this, had to decouple it from ggml_sycl_op_flatten which always considered src1 to be of fp32 type (many OP functions are dependent on it). * SYCL: SOFTMAX F16 mask support and other fixes * test-backend-ops: Add F16 mask test cases	2025-01-28 09:56:58 +00:00
Nicolò Scipione	99487b57d4	SYCL: Introducing memory host pool (#11251) * Implement host pool for matrix_info Creating a new memory pool on the host to store memory location for matrix_info needed to launch gemm_batch from oneMKL/oneMath. Removing complex support in gemm_batch since it is not used in llama.cpp * Remove unnecessary headers and cast * Reorder member variable to avoid warning on initialization * Formatting * Remove unused variable * Address PR review feedback - remove warning --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2025-01-19 21:33:34 +08:00
Akarshan Biswas	f446c2cf6a	SYCL: Add gated linear attention kernel (#11175) * SYCL: Add Gated Linear attention kernel * gla.hpp: add a space at the end of file * gla: Put the barrier inside the main logic loop	2025-01-15 11:20:17 +08:00
Molly Sophia	ee7136c6d1	llama: add support for QRWKV6 model architecture (#11001) * WIP: Add support for RWKV6Qwen2 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV: Some graph simplification Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Add support for RWKV6Qwen2 with cpu and cuda GLA Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix some typos Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * code format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix wkv test & add gla test Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix cuda warning Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update README.md Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update ggml/src/ggml-cuda/gla.cu Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix fused lerp weights loading with RWKV6 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * better sanity check skipping for QRWKV6 in llama-quant thanks @compilade Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: compilade <git@compilade.net> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: compilade <git@compilade.net>	2025-01-10 09:58:08 +08:00

95 commits