llama.cpp

Author	SHA1	Message	Date
Georgi Gerganov	8551c44d84	context : always use non-causal attention for encoder graphs (#12447 ) * context : always use non-causal attention for encoder graphs ggml-ci * context : move the change to llama_context::encode() ggml-ci	2025-03-18 13:05:49 +02:00
Łukasz Ślusarczyk	35cae5ba05	SYCL: using graphs is configurable by environment variable and compile option (#12371 ) * alberto changes * enable sycl graphs by env variable * fixed compilation warnings in ggml-sycl.cpp * renamed graph variables * fix markdown in docs/backend/SYCL.md Co-authored-by: Romain Biessy <romain.biessy@codeplay.com> * fix markdown in docs/backend/SYCL.md again * compiling graphs by default, renamed graph_enable to graph_disable --------- Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>	2025-03-18 11:16:31 +01:00
Georgi Gerganov	810e0af3f5	server : fix warmup draft cache type (#12446 ) ggml-ci	2025-03-18 12:05:42 +02:00
Prajwal B Mehendarkar	eba92d64c3	cmake : fix PowerPC build (#12241 ) Closes #12240	2025-03-18 11:37:33 +02:00
fj-y-saito	d9a14523bb	ggml : add SVE support for q6_K_q8_K (#12361 )	2025-03-18 10:14:39 +02:00
0cc4m	fd123cfead	Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (#12434 )	2025-03-18 07:21:40 +01:00
Łukasz Ślusarczyk	a53f7f7b88	fixed compilation warnings in ggml-sycl (#12424 )	2025-03-18 08:51:25 +08:00
Molly Sophia	7dfad387e3	llama: Add support for RWKV v7 architecture (#12412 ) * ggml: Add op l2_norm Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * ggml: Add op rwkv_wkv7 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: Add support for RWKV7 and ARWKV7 models Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: fix inference with RWKV6Qwen2 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: add more (a)rwkv7 variants in size Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Apply code-format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * fix MUSA build Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: fix shape error with rwkv using llama-parallel Signed-off-by: Molly Sophia <mollysophia379@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-03-18 07:27:50 +08:00
Sigbjørn Skjæret	60c902926c	docs : bring llama-cli conversation/template docs up-to-date (#12426 )	2025-03-17 21:14:32 +01:00
Gaurav Garg	b1b132efcb	cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394 ) * Enable CUDA Graph on CTK < 12.x `cudaGraphExecUpdate` API was changed on 12.x. For this reason CUDA graph support was disabled on older CUDA toolkit. This change enables CUDA support in CTK version < 12.x by using older API if CTK < 12.x. * Fix compilation errors with MUSA * Disable CUDA Graph for MUSA	2025-03-17 20:25:13 +02:00
Guus Waals	01e8f2138b	ggml-vulkan: remove unused find_program(glslc) (#12416 ) It's already found by FindVulkan.cmake in the parent CMakeLists	2025-03-17 13:35:43 -03:00
Jeff Bolz	484a8ab513	vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (#12312 )	2025-03-17 09:26:18 -05:00
Daniele	cf2270e4d3	vulkan: subgroup size tuning (#12087 ) * vulkan: subgroup size test * Vulkan: Add device architecture enum and logic to recognize AMD generations * vulkan: use new architecture logic to specify subgroup size * Initial vulkan subgroup size tuning for RDNA3 * vulkan: commonize RDNA subgroup tuning * vulkan: override subgroup size if required_subgroup_size = 0 * vulkan: disable warp 32 for RDNA3 * vulkan: fine tuned RDNA1 subgroup sizes * vulkan: adjusted subgroup size map * vulkan: fixed RDNA2 subgroup map --------- Co-authored-by: 0cc4m <picard12@live.de>	2025-03-17 12:42:33 +01:00
Jeff Bolz	f07690c930	vulkan: use fp32 in coopmat2 q4_k dequant function (#12309 )	2025-03-17 10:43:35 +01:00
Jeff Bolz	891c63956d	vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (#12273 ) * vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking	2025-03-17 10:41:59 +01:00
Jeff Bolz	2f21123c1d	vulkan: Adjust coopmat2 tile sizes and selection heuristic (#12258 )	2025-03-17 10:35:00 +01:00
Christian Kastner	374101fd74	cmake : enable building llama.cpp using system libggml (#12321 ) * cmake: Factor out compiler flag function from ggml llama.cpp's build requires it, too, and we may want to make use of it without add_subdirectory(ggml). * cmake: Enable building against system ggml This facilitates package maintenance for Linux distributions, where the libggml library most likely will be shipped as an individual package upon which a llama.cpp package depends.	2025-03-17 11:05:23 +02:00
Akarshan Biswas	b3c9a65673	SYCL: set extras only on GGML_TYPE_Q4_0 (#12366 ) * SYCL: set extras only on GGML_TYPE_Q4_0 * release tensor_extras in reset buffer interface	2025-03-17 09:45:12 +08:00
Sigbjørn Skjæret	8ba95dca20	llama : fix OLMo-2-0325-32B-Instruct K-norm size (#12400 )	2025-03-16 19:46:36 +02:00
Georgi Gerganov	dc079cfdff	context : fix init of n_outputs (#12397 ) ggml-ci	2025-03-16 19:29:36 +02:00
Daniel Bevenius	7b61bcc87c	ci : add --symlinks to xcframework zip command (#12409 ) This commit adds the --symlinks option to the zip command used to create the xcframework zip file. This is necessary to create symlinks in the zip file. Without this option, the Versions symlink is stored as a regular directory entry in the zip file, rather than as a symlink in the zip which causes the following error in xcode: ```console Couldn't resolve framework symlink for '/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current': readlink(/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current): Invalid argument (22) ``` Refs: https://github.com/ggml-org/llama.cpp/pull/11996#issuecomment-2727026377	2025-03-16 18:22:05 +01:00
marcoStocchi	f4c3dd5daa	llama-tts : add '-o' option (#12398 ) * added -o option to specify an output file name * llama-tts returns ENOENT in case of file write error note : PR #12042 is closed as superseded with this one.	2025-03-15 17:23:11 +01:00
aubreyli	3d35d87b41	SYCL: Delete redundant plus sign and space (#12391 )	2025-03-15 15:49:03 +01:00
fairydreaming	b19bd064c0	SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (#12399 ) * sycl : support non-contiguous tensors in binary ops * sycl : silence unused variable warning --------- Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2025-03-15 22:19:30 +08:00
Chenguang Li	92a391327e	[CANN]MUL_MAT optimization (#12382 )	2025-03-15 09:31:08 +08:00
Eric Curtin	9f2250ba72	Add CLI arg to llama-run to adjust the number of threads used (#12370 ) We default to 4, sometimes we want to manually adjust this Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-03-14 16:41:20 +00:00
Sigbjørn Skjæret	774973b8f3	main : add -sysf / --system-prompt-file (#12249 ) (#12250 ) * add system_prompt_file * add -sysf / --system-prompt-file * remove system_prompt_file	2025-03-14 16:57:05 +01:00
fairydreaming	8fcb563613	Load all MoE experts during warmup (#11571 ) * llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup * common : use new API to enable warmup mode during model warmup --------- Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>	2025-03-14 13:47:05 +01:00
Victor	add2a3aa5a	server: fix "--grammar-file" parameter (#12285 )	2025-03-14 11:21:17 +01:00
Georgi Gerganov	c522ce4143	graph : simplify attn input build for unified KV cache (#12381 ) ggml-ci	2025-03-14 10:47:44 +02:00
Georgi Gerganov	081bee8c64	hparams : add SWA rope parameters (#12374 ) ggml-ci	2025-03-14 09:03:24 +02:00
Georgi Gerganov	84d5475541	llama : fix Gemma3 SWA KV cache shift (#12373 ) * llama : fix Gemma3 SWA KV cache shift ggml-ci * hparams : add comment [no ci]	2025-03-13 19:08:07 +02:00
Xuan-Son Nguyen	be7c303410	arg : no n_predict = -2 for examples except for main and infill (#12364 )	2025-03-13 12:34:54 +01:00
Georgi Gerganov	e0dbec0bc6	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 ) * llama : refactor llama_context, llama_kv_cache, llm_build_context ggml-ci * graph : don't mutate the KV cache during defrag ggml-ci * context : reduce virtuals + remove test function ggml-ci * context : move interface implementation to source file + factory ggml-ci * graph : move KV cache build functions to llama_context impl ggml-ci * graph : remove model reference from build_pooling ggml-ci * graph : remove llama_model reference ggml-ci * kv_cache : provide rope factors ggml-ci * graph : rework inputs to use only unique_ptr, remove attn input abstraction ggml-ci * context : remove llama_context_i abstraction ggml-ci * context : clean-up ggml-ci * graph : clean-up ggml-ci * llama : remove redundant keywords (struct, enum) ggml-ci * model : adapt gemma3 ggml-ci * graph : restore same attention ops as on master ggml-ci * llama : remove TODO + fix indent ggml-ci	2025-03-13 12:35:44 +02:00
Ishaan Gandhi	2048b5913d	server : fix crash when using verbose output with input tokens that are not in printable range (#12178 ) (#12338 ) * Fix DOS index bug * Remove new APIs * remove extra line * Remove from API * Add extra newline * Update examples/server/server.cpp --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-03-13 11:10:05 +01:00
Oscar Barenys	f08f4b3187	Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301 )	2025-03-12 20:06:58 +01:00
Daniel Bevenius	80a02aa858	llama.swiftui : fix xcframework dir in README [no ci] (#12353 ) This commit fixes the path to the xcframework in the README file which I had forgotten to change after renaming the build directory.	2025-03-12 13:45:32 +01:00
Alberto Cabrera Pérez	363f8c5d67	sycl : variable sg_size support for mmvq kernels (#12336 )	2025-03-12 09:57:32 +00:00
uvos	34c961b181	CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315 ) When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to selectable warp size; however, the fattn-vec kernels don't work with 64-wide warps for now, so we need to avoid launching them with parameters for warp64	2025-03-12 10:14:11 +01:00
Xuan-Son Nguyen	7841fc723e	llama : Add Gemma 3 support (+ experimental vision capability) (#12343 ) * llama : Add Gemma 3 text-only support * fix python coding style * fix compile on ubuntu * python: fix style * fix ubuntu compile * fix build on ubuntu (again) * fix ubuntu build, finally * clip : Experimental support for Gemma 3 vision (#12344) * clip : Experimental support for Gemma 3 vision * fix build * PRId64	2025-03-12 09:30:24 +01:00
Jeff Bolz	bf69cfe62f	vulkan: fix bug in coopmat1 mul_mat_id (#12316 ) * tests: run mul_mat_id with a larger N * vulkan: fix bug in coopmat1 mul_mat_id	2025-03-12 06:59:19 +01:00
uvos	10f2e81809	CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (#12177 ) refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-03-11 20:16:03 +01:00
jklincn	ba7654380a	ggml-backend : fix backend search path (#12330 ) * Fix backend search path * replace .native() with '/' * reverted .native()	2025-03-11 14:25:17 +01:00
BB-fat	6ab2e4765a	metal : Cache the Metal library at the device context level (#12265 )	2025-03-11 13:45:02 +02:00
Xuan-Son Nguyen	96e1280839	clip : bring back GPU support (#12322 ) * clip : bring back GPU support * use n_gpu_layers param * fix double free * ggml_backend_init_by_type * clean up	2025-03-11 09:20:16 +01:00
Eve	2c9f833d17	mat vec double buffer (#12188 )	2025-03-10 19:28:11 +00:00
R0CKSTAR	251364549f	musa: support new arch mp_31 and update doc (#12296 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-03-10 18:18:25 +01:00
Henry Linjamäki	8acdacb3ea	opencl: use OpenCL C standard supported by the device (#12221 ) This patch nudges the llama.cpp a bit to be supported on PoCL which doesn't support OpenCL C CL2.0. The issue is solved by querying the device for the supported OpenCL C versions and using the highest one available.	2025-03-10 09:57:00 -07:00
John Bean	89b2b56e86	readme: added Sidekick to available UIs (#12311 )	2025-03-10 16:13:09 +02:00
Georgi Gerganov	e128a1bf5b	tests : fix test-quantize-fns to init the CPU backend (#12306 ) ggml-ci	2025-03-10 14:07:15 +02:00

5064 commits