llama.cpp

Neo Zhang Jianyu 08d5986290 [SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
* opt performance by reorder for Intel GPU
* detect hw type and save opt feature, and print opt feature
* correct name
* support optimize graph once when compute graph, record the opt status in tensor->extra, make CI passed
* add env variable GGML_SYCL_DISABLE_OPT for debug
* use syclex::architecture replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT
* add performance data
* mv getrows functions to separeted files
* fix global variables
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>

dpct	SYCL: Introducing memory host pool (#11251)	2025-01-19 21:33:34 +08:00
backend.hpp	SYCL: Add gated linear attention kernel (#11175)	2025-01-15 11:20:17 +08:00
CMakeLists.txt	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
common.cpp	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
common.hpp	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
concat.cpp	SYCL: Refactor ggml_sycl_compute_forward (#11121)	2025-01-10 08:13:03 +08:00
concat.hpp	SYCL: Refactor ggml_sycl_compute_forward (#11121)	2025-01-10 08:13:03 +08:00
conv.cpp	SYCL: Refactor ggml_sycl_compute_forward (#11121)	2025-01-10 08:13:03 +08:00
conv.hpp	SYCL: Refactor ggml_sycl_compute_forward (#11121)	2025-01-10 08:13:03 +08:00
convert.cpp	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
convert.hpp	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
dequantize.hpp	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
dmmv.cpp	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
dmmv.hpp	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
element_wise.cpp	SYCL: Refactor ggml_sycl_compute_forward (#11121)	2025-01-10 08:13:03 +08:00
element_wise.hpp	SYCL: Refactor ggml_sycl_compute_forward (#11121)	2025-01-10 08:13:03 +08:00
gemm.hpp	SYCL: Reduce most of the compiler warnings (#10748)	2024-12-13 12:12:15 +05:30
getrows.cpp	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
getrows.hpp	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
ggml-sycl.cpp	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
gla.cpp	SYCL: Add gated linear attention kernel (#11175)	2025-01-15 11:20:17 +08:00
gla.hpp	SYCL: Add gated linear attention kernel (#11175)	2025-01-15 11:20:17 +08:00
im2col.cpp	SYCL: Reduce most of the compiler warnings (#10748)	2024-12-13 12:12:15 +05:30
im2col.hpp	[SYCL] Fix SYCL `im2col` and `convert` Overflow with Large Dims (#9052)	2024-08-20 23:06:51 +08:00
mmq.cpp	SYCL: Reduce most of the compiler warnings (#10748)	2024-12-13 12:12:15 +05:30
mmq.hpp	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
mmvq.cpp	SYCL: Reduce most of the compiler warnings (#10748)	2024-12-13 12:12:15 +05:30
mmvq.hpp	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
norm.cpp	SYCL: Reduce most of the compiler warnings (#10748)	2024-12-13 12:12:15 +05:30
norm.hpp	[SYCL] Fix the sub group size of Intel (#8106)	2024-07-02 10:16:00 +08:00
outprod.cpp	SYCL: Refactor ggml_sycl_compute_forward (#11121)	2025-01-10 08:13:03 +08:00
outprod.hpp	SYCL: Refactor ggml_sycl_compute_forward (#11121)	2025-01-10 08:13:03 +08:00
presets.hpp	Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133)	2024-11-07 15:19:10 +08:00
rope.cpp	SYCL: Reduce most of the compiler warnings (#10748)	2024-12-13 12:12:15 +05:30
rope.hpp	[SYCL] Update SYCL-Rope op and Refactor (#8157)	2024-07-01 19:39:06 +08:00
softmax.cpp	SYCL: Fix GGML_SYCL_DEBUG macro (#11995)	2025-02-24 10:18:25 +00:00
softmax.hpp	SYCL : SOFTMAX F16 mask support and other fixes (#11261)	2025-01-28 09:56:58 +00:00
sycl_hw.cpp	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
sycl_hw.hpp	[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)	2025-02-24 22:33:23 +08:00
tsembd.cpp	SYCL: Refactor ggml_sycl_compute_forward (#11121)	2025-01-10 08:13:03 +08:00
tsembd.hpp	SYCL: Refactor ggml_sycl_compute_forward (#11121)	2025-01-10 08:13:03 +08:00
vecdotq.hpp	sycl: Use syclcompat::dp4a (#10267)	2024-11-15 11:09:12 +08:00
wkv6.cpp	llama: add support for QRWKV6 model architecture (#11001)	2025-01-10 09:58:08 +08:00
wkv6.hpp	SYCL: Refactor ggml_sycl_compute_forward (#11121)	2025-01-10 08:13:03 +08:00