llama.cpp

Author	SHA1	Message	Date
Xuan-Son Nguyen	5fa9e63be8	clip : refactor set input for cgraph + fix qwen2.5vl input (#13136 ) * clip : refactor set input for cgraph * more strict assert * minicpmv : use clip_n_mmproj_embd instead of copying the same code everywhere * split qwen2 and qwen2.5 code blocks * minor style fix	2025-04-28 12:18:59 +02:00
Akarshan Biswas	a4c340f974	SYCL: Add all missing unary kernels (#13074 ) * SYCL: Add all missing unary kernels ggml-ci * decouple kernel launch range from data size using strided loop * use ciel_div helper for num_blocks ggml-ci * clean auto imported header files	2025-04-28 11:33:25 +02:00
Georgi Gerganov	d0a417f3c7	readme : update hot topics (#13150 )	2025-04-28 12:10:18 +03:00
Georgi Gerganov	43f2b07193	common : fix noreturn compile warning (#13151 ) ggml-ci	2025-04-28 11:57:19 +03:00
Xuan-Son Nguyen	e5d6c2554e	llama-chat : fix typo GML --> GLM (#13143 )	2025-04-28 10:11:58 +02:00
R0CKSTAR	f0dd6a1926	musa: fix typo in cc control (#13144 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-04-28 09:33:28 +02:00
Johannes Gäßler	69699be48a	CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137 )	2025-04-28 09:29:26 +02:00
Xuan-Son Nguyen	85f36e5e71	arg : fix unused variable (#13142 )	2025-04-28 08:16:59 +03:00
4onen	c0a97b762e	llama-bench : Add `--override-tensors` arg (#12922 ) * Add --override-tensors option to llama-bench * Correct llama-bench --override-tensors to --override-tensor * llama-bench: Update --override-tensors parsing to match --tensor-split, appear in test matrix. * Make new llama-bench util functions static to fix Ubuntu CI * llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)	2025-04-27 23:48:26 +02:00
matteo	ced44be342	llama-chat : fix wrong template in GLM4-0414 (#13140 ) * fix wrong template in GLM4-0414 * fix spaces * no bos token since it is already in the template * moved the chatgml4 check to higher priority * restored template for old GLM models * moved the GLM4 template check in the correct place with correct check	2025-04-27 21:57:32 +02:00
R0CKSTAR	e291450b76	musa: fix build warning (#13129 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-04-27 13:22:49 +02:00
LostRuins Concedo	59e991c23c	Fixes Qwen2.5VL segfault during inference with https://github.com/ggml-org/llama.cpp/pull/12402 as has_qwen2vl_merger migration was incomplete (#13133 )	2025-04-27 12:43:37 +02:00
HimariO	ca2bb89eac	clip : Add Qwen2.5VL support (#12402 ) * implment vision model architecture, gguf convertor * handle window attention inputs * add debug utils * fix few incorrect tensor memory layout * move position id remap out of ggml to avoid int32 cuda operations * cleaning up * ignore transformers Qwen2_5_xxx type check * remove not so often use `qwen2vl-cli` debug functions * remove commented-out code blocks * fix attn weight scaling after rebase * add `PROJECTOR_TYPE_QWEN2_5_VL` * remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM` * replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN` * remove `attn_window_size` from gguf * fix model conversion * clean up * fix merging problem * add test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-04-27 10:10:34 +02:00
Xuan-Son Nguyen	2d451c8059	common : add common_remote_get_content (#13123 ) * common : add common_remote_get_content * support max size and timeout * add tests	2025-04-26 22:58:12 +02:00
Xuan-Son Nguyen	4753791e70	clip : improve projector naming (#13118 ) * clip : improve projector naming * no more kv has_llava_projector * rm unused kv * rm more unused	2025-04-26 22:39:47 +02:00
SXX	77d5e9a76a	ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107 ) * ggml: dynamic x86_64 feature detection for FP32 <-> FP16/BF16 conversion * move fp converter to ggml-cpu * Switch ggml_compute_forward_get_rows_f16/bf16 to new ggml_cpu_fp16/bf16_to_fp32	2025-04-26 16:05:31 +02:00
frob	d5fe4e81bd	grammar : handle maxItems == 0 in JSON schema (#13117 ) Co-authored-by: Richard Lyons <frob@cloudstaff.com>	2025-04-26 10:10:20 +02:00
Diego Devesa	295354ea68	llama : fix K-shift with quantized K and BLAS backend (#13113 )	2025-04-25 19:40:11 +02:00
City	558a764713	Force FP32 compute in GLM4 FFN Down (#13101 ) * Force FP32 compute in cuBLAS GEMM * Revert "Force FP32 compute in cuBLAS GEMM" This reverts commit 6efd872732159ab88ee7b3c1d77ba5ebc83079bd. * Force F32 compute in GLM4 ffn down * Edit comment to clarify issue Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-04-25 14:38:34 +02:00
Xuan-Son Nguyen	edb18b6e8f	clip : fix pixtral on some GPU backends (#13097 ) * clip : fix pixtral on some GPU backends * refactor inp_raw set * rm outdated comment * fix dynamic size * add TODO	2025-04-25 14:31:42 +02:00
Neo Zhang Jianyu	514c45608f	change the reorder tensor from init to execute OP (#13003 )	2025-04-25 17:37:51 +08:00
Radoslav Gerganov	553a5c3a9f	rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943 ) RPC_CMD_SET_TENSOR always returns an empty response and we send this 4 times per token. We can improve TG speed if we don't wait for this empty response. The performance impact of this change depends on the network latency.	2025-04-25 10:08:08 +03:00
Xuan-Son Nguyen	13be08daf9	clip : remove boi/eoi embeddings for GLM-edge model (#13081 )	2025-04-24 22:17:04 +02:00
Georgi Gerganov	226251ed56	embeddings : fix batch sizes (#13076 ) ggml-ci	2025-04-24 22:29:22 +03:00
Georgi Gerganov	87616f0680	ggml : fix trailing whitespaces (#0 )	2025-04-24 17:32:47 +03:00
Georgi Gerganov	63b4911494	sync : ggml ggml-ci	2025-04-24 17:32:47 +03:00
Acly	c6e8cc28c1	ggml : Depthwise 2D convolution (ggml/1152) * ggml-cpu : kernels for faster depthwise 2D convolution * fix compile: remove static after moving to ops.cpp * add dilation for depthwise_conv_2d * review: rename to ggml_conv_2d_dw_direct, remove redundant struct keywords, pass by ref, whitespace * review: rename depthwise_conv_2d -> conv_2d_dw everywhere	2025-04-24 17:32:47 +03:00
Johannes Gäßler	b10d8bfdb1	CUDA: use switch statements in constexpr functions (#13095 )	2025-04-24 15:57:10 +02:00
Georgi Gerganov	13b4548877	cmake : do not include ./src as public for libllama (#13062 ) * cmake : do not include ./src as public for libllama ggml-ci * cmake : rework tests ggml-ci * llguidance : remove unicode include ggml-ci * cmake : make c++17 private ggml-ci	2025-04-24 16:00:10 +03:00
Georgi Gerganov	572b3141d3	clang-tidy : disable warning about missing math parenthesis (#13091 )	2025-04-24 15:44:05 +03:00
Xuan-Son Nguyen	7c727fbe39	arg : add --no-mmproj-offload (#13093 ) * arg : add --no-mmproj-offload * Update common/arg.cpp	2025-04-24 14:04:14 +02:00
Xuan-Son Nguyen	80982e815e	arg : clean up handling --mmproj with -hf (#13082 ) * arg : clean up handling --mmproj with -hf * rm change about no_mmproj * Revert "rm change about no_mmproj" This reverts commit 2cac8e0efb629d66c612f137e75d562f94bb9e6c. * handle no_mmproj explicitly * skip download mmproj on examples not using it	2025-04-24 12:14:13 +02:00
Georgi Gerganov	7604a7d6b8	metal : fix floating-point range of attention scores in FA kernels (#13090 ) ggml-ci	2025-04-24 10:38:30 +03:00
Eve	b3b6d862cf	vulkan: matmul gcn tuning (#13016 ) * tune matmul for gcn * this one is more power efficient * Update ggml/src/ggml-vulkan/ggml-vulkan.cpp Co-authored-by: 0cc4m <picard12@live.de> * disable this tune for the proprietary driver --------- Co-authored-by: 0cc4m <picard12@live.de>	2025-04-24 09:18:33 +02:00
pl752	5630406959	llama-mtmd-cli: Sigint rework in mtmd vision example (#13080 ) * Sigint rework in mtmd vision example * Applied suggestions on mtmd-cli PR * Forgot to invert one of the conditions * Update examples/llava/mtmd-cli.cpp * Removed redundant exit check --------- Co-authored-by: pl752 <maximpl752@gmail.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-04-23 23:32:35 +02:00
Xuan-Son Nguyen	ecda2ec4b3	mtmd : Support Pixtral 12B (#13065 ) * add pixtral text model (vision is wip) * cgraph ok, just missing 2D RoPE * fix bad rebase * first working version * fix problem with img_break token * support dynamic image size * update docs * update test script	2025-04-23 20:21:59 +02:00
piDack	eb1776b15a	convert : Append mult-eos,half-rope,bos to GLM4-0414 and Z (#13021 ) * append mult-eos,half-rope,bos to GLM4-0414 * remove unset var	2025-04-23 16:59:14 +02:00
Radoslav Gerganov	2cca6c01e4	rpc : add command line option for number of threads for the CPU backend (#13060 ) closes #13051	2025-04-23 10:32:49 +03:00
Johannes Gäßler	658987cfc9	CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (#13014 ) * CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID * fix logic for RoPE support, CUDA graphs	2025-04-22 21:27:40 +02:00
Xuan-Son Nguyen	dc39a5e7a8	mtmd : support SmolVLM (version 1 and 2) (#13050 ) * mtmd : support SmolVLM (version 1 and 2) * correct chat template * fix n_patches * scale_factor is an int * add more models to test	2025-04-22 16:24:54 +02:00
Georgi Gerganov	ab47dec3d3	security : add note about RPC and server functionality (#13061 ) * security : add note about RPC functionality * security : add note about llama-server	2025-04-22 16:16:10 +03:00
Georgi Gerganov	7b53389c24	metal : add memory pool for temp allocs (#12850 ) * metal : add memory pool for temp allocs (wip) [no ci] * cont : free buffers from the heap * cont : resize heap [no ci] * cont : refactor heap [no ci] * cont : heap for each cmd buffer [no ci] * cont : fix free * wip * cont : fix alignment [no ci] * cont : not working .. [no ci] * cont : heap allocation now works [no ci] * cont : use MTLHeapTypePlacement ggml-ci * metal : use dynamic MTLHeap allocations ggml-ci * metal : add comments * metal : disable softmax use of mem_pool ggml-ci * metal : final touches	2025-04-22 16:15:51 +03:00
Xuan-Son Nguyen	243453533e	llava : update documentations (#13055 ) * llava : update documentations * fix typo	2025-04-22 10:37:00 +02:00
Diego Devesa	1d735c0b4f	ggml : add SSE 4.2 and x64 base variant for CPUs without AVX (#12871 ) * ggml : add SSE 4.2 variant for CPUs without AVX * ggml : add x64 base ABI variant	2025-04-21 18:13:51 +02:00
Akarshan Biswas	5368ddda7a	SYCL: Add non-contiguous support in ROPE (#12993 ) ggml-ci	2025-04-21 19:13:30 +05:30
Xuan-Son Nguyen	84a9bf2fc2	mtmd : merge llava, gemma3 and minicpmv CLI into single `llama-mtmd-cli` (#13012 ) * mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli` * support for minicpmv * remove cpp files of llava and minicpmv * update hot topics * mtmd : add not supported msg for qwen2vl * Update examples/llava/mtmd.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-04-21 15:32:58 +02:00
Xuan-Son Nguyen	2016f07bd1	convert : experimental support for `--mmproj` flag (#13023 ) * convert : experimental support for `--mmproj` flag * fix bad ctrl+f replace * fix style * split into subclasses TextModel and VisionModel * rename Mode --> ModelBase * small fix * correct CLIP_VISION arch name (because existing GGUF already use it) * Apply suggestions from code review Co-authored-by: compilade <git@compilade.net> * fix Mistral3Model * fix typo Co-authored-by: compilade <git@compilade.net> --------- Co-authored-by: compilade <git@compilade.net>	2025-04-20 23:29:36 +02:00
Jeffrey Morgan	6602304814	llava: fix errors in clip.h on certain compilers (#13030 )	2025-04-20 12:15:41 +02:00
Jeff Bolz	66168204be	vulkan: support noncontiguous rms_norm (#13031 )	2025-04-20 10:50:02 +02:00
Jeffrey Morgan	4ba9d711ba	metal: add neg operator (#13029 )	2025-04-20 08:28:40 +03:00


5358 commits