llama.cpp

Author	SHA1	Message	Date
Nicolò Scipione	f7c9429c85	sycl : Overcoming workaround for mmap() allocation on Windows (#13482 ) * Remove mmap workaround on windows After some testing I found that mmap is supported on windows and for many GPUs on Linux. Therefore I remove the workaround for windows since it is not necessary. * Update llama-bench README SYCL backend introduced a workaround that allows execution of llama-bench also without specifying `--mmp 0` flag	2025-05-20 08:54:43 +08:00
psocolovsky	1dfbf2cf3a	common : add load_progress_callback (#13617 )	2025-05-19 21:17:36 +02:00
0cc4m	8960efd0a6	Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence (#13607 )	2025-05-19 17:54:08 +02:00
Alberto Cabrera Pérez	725f23f1f3	sycl : backend documentation review (#13544 ) * sycl: reviewing and updating docs * Updates Runtime error codes * Improves OOM troubleshooting entry * Added a llama 3 sample * Updated supported models * Updated releases table	2025-05-19 14:38:20 +01:00
Xuan-Son Nguyen	92ecdcc06a	mtmd : add vision support for llama 4 (#13282 ) * wip llama 4 conversion * rm redundant __init__ * fix conversion * fix conversion * test impl * try this * reshape patch_embeddings_0 * fix view * rm ffn_post_norm * cgraph ok * f32 for pos embd * add image marker tokens * Llama4UnfoldConvolution * correct pixel shuffle * fix merge conflicts * correct * add debug_graph * logits matched, but it still preceives the image incorrectly * fix style * add image_grid_pinpoints * handle llama 4 preprocessing * rm load_image_size * rm unused line * fix * small fix 2 * add test & docs * fix llava-1.6 test * test: add notion of huge models * add comment * add warn about degraded quality	2025-05-19 13:04:14 +02:00
Alberto Cabrera Pérez	f71f40a284	ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532 )	2025-05-19 11:46:09 +01:00
Georgi Gerganov	d30cb5a7fa	sync : ggml ggml-ci	2025-05-19 13:29:56 +03:00
Johannes Gäßler	6c35981a64	mnist: fix segmentation fault (ggml/1227)	2025-05-19 13:29:56 +03:00
Diego Devesa	8b5e19aea6	ggml : fix apple OS check in ggml_print_backtrace (ggml/1229)	2025-05-19 13:29:56 +03:00
Daniel Tang	60aea028b5	ggml : Fix missing backtrace on Linux (ggml/1228) * Modern Linux defaults /proc/sys/kernel/yama/ptrace_scope to 1 * Fixed lldb attach * Simplify by having the child do ggml_print_backtrace_symbols	2025-05-19 13:29:56 +03:00
Nick	9c55e5c5c2	fix: check model pointer validity before use (#13631 )	2025-05-19 13:25:41 +03:00
Chenguang Li	33d7aed4a8	CANN: Support MOE Model MUL_MAT_ID (#13042 ) Signed-off-by: noemotiovon <757486878@qq.com>	2025-05-19 14:21:17 +08:00
Isaac McFadyen	6a2bc8bfb7	server : added --no-prefill-assistant flag (#13608 ) * added no-prefill-assistant flag * reworded documentation comment * updated server README.md	2025-05-17 23:59:48 +02:00
Gilad S.	e3a7cf6c5b	cmake: use the current build config for vulkan-shaders-gen (#13595 ) * fix: use the current build config for `vulkan-shaders-gen` * fix: only pass a valid build type to `--config`	2025-05-17 15:26:43 -03:00
Georgi Gerganov	518329b2d4	parallel : add option for non-shared and larger prompts (#13598 ) * parallel : add option for non-shared and larger prompts * parallel : update readme [no ci] * cont : add note about base models [no ci] * parallel : better var name ggml-ci	2025-05-17 12:58:55 +03:00
Jeff Bolz	2f5a4e1e09	vulkan: move common FA code to flash_attn_base.comp (#13556 ) * vulkan: move common FA code to flash_attn_base.comp * vulkan: move common FA index/stride setup code to flash_attn_base.comp * build fix	2025-05-17 09:14:55 +02:00
Jeff Bolz	4f41ee11d6	vulkan: use scalar FA rather than coopmat2 when N==1 (#13554 )	2025-05-17 08:35:47 +02:00
Z	3e0be1cace	llguidance : official v0.7.20 release (no actual changes) [noci] (#13594 )	2025-05-16 22:56:28 +02:00
Xuan-Son Nguyen	6aa892ec2a	server : do not return error out of context (with ctx shift disabled) (#13577 )	2025-05-16 21:50:00 +02:00
Xuan-Son Nguyen	aea9f8b4e7	webui : improve accessibility for visually impaired people (#13551 ) * webui : improve accessibility for visually impaired people * add a11y for extra contents * fix some labels being read twice * add skip to main content	2025-05-16 21:49:01 +02:00
Xuan-Son Nguyen	06c1e4abc1	readme : add list of dependencies and their license (#13591 )	2025-05-16 20:04:18 +02:00
Diego Devesa	415e40a357	releases : use arm version of curl for arm releases (#13592 )	2025-05-16 19:36:51 +02:00
Georgi Gerganov	654a67794f	metal : add FA-vec kernel for head size 64 (#13583 ) ggml-ci	2025-05-16 20:32:58 +03:00
Diego Devesa	5364ae4ba5	llama : print hint when loading a model when no backends are loaded (#13589 )	2025-05-16 16:38:07 +02:00
Sigbjørn Skjæret	7c07ac244d	ci : add ppc64el to build-linux-cross (#13575 )	2025-05-16 14:54:23 +02:00
Łukasz Ślusarczyk	0a338ed013	sycl : fixed compilation warnings (#13582 )	2025-05-16 18:15:29 +08:00
Olivier Chafik	bc098c3cf0	minja: sync (qwen3) (#13573 ) * minja: sync `f06140fa52` - https://github.com/google/minja/pull/67 (@grf53) - https://github.com/google/minja/pull/66 (@taha-yassine) - https://github.com/google/minja/pull/63 (@grf53) - https://github.com/google/minja/pull/58 --------- Co-authored-by: ochafik <ochafik@google.com>	2025-05-15 23:29:10 +01:00
Diego Devesa	c6a2c9e741	gguf : use ggml log system (#13571 ) * gguf : use ggml log system * llama : remove unnecessary new lines in exception messages	2025-05-15 19:13:11 +02:00
Daniel Tang	07ad2b6db3	gguf-py : fix disconnect-before-connect in editor-gui (#13569 ) The bug caused a crash upon load with venvs created with --system-site-packages to use python3-pyside6.qtwidgets=python3-pyside6.qtwidgets=6.6.2-4 from Kubuntu 24.10.	2025-05-15 18:47:10 +02:00
Xuan-Son Nguyen	c531edfa34	convert : fix conversion for llama 4 (#13567 )	2025-05-15 17:40:07 +02:00
Atharva Dubey	02cdd2d8b0	sycl: simplify bin_bcast_kernel (#13383 )	2025-05-15 17:39:52 +02:00
Svetlozar Georgiev	64bb51cf90	sycl: reordered Q4_K MMVQ (#13109 )	2025-05-15 17:35:44 +02:00
Łukasz Ślusarczyk	9c404ed54c	sycl: use oneDNN for matrices multiplication (#12972 )	2025-05-15 16:53:41 +02:00
Diego Devesa	6c8b91500e	llama-bench : fix -ot with dl backends (#13563 )	2025-05-15 15:46:55 +02:00
Xuan-Son Nguyen	3cc1f1f1d2	webui : handle PDF input (as text or image) + convert pasted long content to file (#13562 ) * webui : handle PDF input (as text or image) * handle the case where pdf image + server without mtmd * fix bug missing pages	2025-05-15 14:24:50 +02:00
Piotr Wilkin (ilintar)	c753d7bed0	server : proper error handling for missing elements in messages array (OpenAI compatible backend) (#13540 )	2025-05-15 08:40:58 +02:00
Georgi Gerganov	b2838049cc	bench : handle decode errors (#13548 ) ggml-ci	2025-05-15 05:57:02 +03:00
Olivier Chafik	aa48e373f2	`server`: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802 ) * Inject date_string in llama 3.x + fix for functionary v2 https://github.com/ggml-org/llama.cpp/issues/12729 * move/fix detection of functionary v3.1 before llama 3.x, fix & test their non-tool mode Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * generate more tokens in test_completion_with_required_tool_tiny_fast to avoid truncation --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-05-15 02:39:51 +01:00
Georgi Gerganov	e3a9421b78	kv-cache : fix out-of-bounds view during reserve graph (#13547 ) * kv-cache : fix reserve graph out-of-bounds access ggml-ci * cont : add comment * cont : fix comments [no ci] * cont : more correct comment [no ci]	2025-05-14 23:15:15 +03:00
Yibo Cai	5ab5d5fb25	arm64: optimize q6_k_q8_k kernel with i8mm (#13519 ) This PR improves q6_k_q8_k gemm kernel with arm64 i8mm instruction. Tested on neoverse-n2 with llama3 8b q6_k quantization model. - 40% ~ 54% S_PP uplift for all batch sizes - 16% ~ 47% S_TG uplift for batch size 4 and above Perplexity doesn't change with this PR.
```
// tested on neoverse-n2
$ llama-batched-bench \
    -m Meta-Llama-3-8B-Instruct-Q6_K.gguf \
    --no-mmap -fa \
    -c 8192 -b 4096 -ub 512 -npp 128 -ntg 128 \
    -npl 1,2,4,8,16,32 \
    -t 64
---------------------------------------------------------------------
|    PP |     TG |    B |       S_PP t/s      |       S_TG t/s      |
|       |        |      | original | this pr  | original | this pr  |
|-------|--------|------|----------|----------|----------|----------|
|   128 |    128 |    1 |    78.52 |   109.18 |    18.63 |    18.88 |
|   128 |    128 |    2 |    84.62 |   123.94 |    34.54 |    36.92 |
|   128 |    128 |    4 |    84.36 |   122.49 |    52.65 |    61.32 |
|   128 |    128 |    8 |    90.52 |   138.87 |    63.46 |    84.41 |
|   128 |    128 |   16 |    90.11 |   138.56 |    71.04 |   101.33 |
|   128 |    128 |   32 |    89.81 |   137.79 |    75.14 |   110.47 |
---------------------------------------------------------------------
```
	2025-05-14 21:53:52 +02:00
Olivier Chafik	3198405e98	`common`: add partial regex support (#12808 ) * move string_find_partial_stop & string_ends_with to common * add common_regex (supports partial matches) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/regex-partial.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/regex-partial.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/regex-partial.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * partial regex: add missing iterator end checks * string utils: use string_views * direct throw to avoid ggml.h include * regex-partial: replace missed ggml_asserts --------- Co-authored-by: ochafik <ochafik@google.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-05-14 19:50:57 +01:00
Sigbjørn Skjæret	f5170c1d7a	editorconfig : fix trailing whitespace from #13542 (#13546 )	2025-05-14 21:22:49 +03:00
Gilad S.	017f10b5fa	fix: crash when calling `llama_state_get_size` on a context without a KV cache (#13542 )	2025-05-14 19:18:18 +03:00
Johannes Gäßler	4696d56749	CUDA: fix crash on large batch size for quant. MoE (#13537 )	2025-05-14 16:41:02 +02:00
Diego Devesa	b7d2672082	llama : fix quantize with dl backends (#13539 )	2025-05-14 16:12:36 +02:00
Johannes Gäßler	6da34fa276	CUDA: faster Deepseek FA, add Turing support (#13435 )	2025-05-14 16:08:20 +02:00
Gabe Goodhart	5e7d95e22e	fix: Move build_inp_pos to the top of the graph section for build_granite (#13538 ) This matches how others do it, but will still avoid the extra initialization when rope is disabled. Branch: GraniteFour Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2025-05-14 15:53:59 +03:00
Georgi Gerganov	053174436f	server : passthrough the /models endpoint during loading (#13535 ) * server : passthrough the /models endpoint during loading * server : update readme + return json for "meta" field	2025-05-14 15:42:10 +03:00
Xuan-Son Nguyen	360a9c98e1	server : fix cache_tokens bug with no cache_prompt (#13533 )	2025-05-14 13:35:07 +02:00
bandoti	09d13d94fb	cmake: simplify vulkan shader test logic (#13263 )	2025-05-14 07:53:57 -03:00


5427 commits