Commit graph

  • 13c9a3319b
    arg : remove CURLINFO_EFFECTIVE_METHOD (#13228) Xuan-Son Nguyen 2025-05-01 10:23:25 +02:00
  • a70183eb00
    llama-model : fix the reported size class for nomic-embed-text-v2-moe (#13223) Jared Van Bortel 2025-05-01 03:09:41 -04:00
  • 8d33d740c3
    sync : ggml Georgi Gerganov 2025-05-01 09:59:02 +03:00
  • 4254bb4951
    ggml : fix ggml_gallocr_ptr type (ggml/1205) Diego Devesa 2025-04-30 15:20:40 +02:00
  • 9998540149
    cuda : fix unused variable compile warning (whisper/0) Georgi Gerganov 2025-04-24 18:59:06 +03:00
  • e1e8e0991f
    CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199) Johannes Gäßler 2025-04-30 23:12:59 +02:00
  • 6f67cf1f48
    arg : -hf do not fail if url mismatch (#13219) Xuan-Son Nguyen 2025-04-30 22:29:15 +02:00
  • 16a457facd
    fix typo: n_ctx_pre_seq -> n_ctx_per_seq (#13221) ddh0 2025-04-30 15:28:43 -05:00
  • 3e168bede4
    convert : improve model arch handling (#13122) Xuan-Son Nguyen 2025-04-30 16:56:24 +02:00
  • ceda28ef8e
    llava : remove duplicate include (#13207) Tatsuya Tanaka 2025-04-30 22:25:20 +09:00
  • 3b127c7385
    common : add -jf / --json-schema-file flag (#12011) Olivier Chafik 2025-04-30 13:52:35 +01:00
  • e5007a5edf
    vulkan: use uint array index to avoid glslang bug (#13193) Jeff Bolz 2025-04-30 07:38:37 -05:00
  • 416313773b
    ggml : fix ppc64le build (#13176) shalinib-ibm 2025-04-30 16:47:08 +05:30
  • 07c2e2f76c
    convert : correct typo image_mean --> image_std (#13208) Xuan-Son Nguyen 2025-04-30 13:06:15 +02:00
  • 44cd8d91ff
    feat(ggml-cpu): enable z17 compile (#13182) Aaron Teo 2025-04-30 17:47:35 +08:00
  • 5933e6fdc9
    arg : allow using -hf offline (#13202) Xuan-Son Nguyen 2025-04-30 10:46:32 +02:00
  • da84c04d8f
    docker : do not build tests (#13204) Xuan-Son Nguyen 2025-04-30 10:44:07 +02:00
  • a0f7016d17
    rpc : fix cache directory initialization (#13188) xiaofei 2025-04-30 14:29:22 +08:00
  • 19e899ce21
    scripts: n_depth for compare-llama-bench [no ci] (#13201) Johannes Gäßler 2025-04-29 23:32:04 +02:00
  • e2e1ddb93a
    server : Prefilling assistant message in openai compatible API (#13174) matteo 2025-04-29 20:33:10 +02:00
  • d9d398f84f
    sampling : when top-k <= 0 -> noop (#13173) Georgi Gerganov 2025-04-29 20:22:57 +03:00
  • 5a63980117
    llama-bench: fixed size of fields to correctly map to values (#13183) Alberto Cabrera Pérez 2025-04-29 16:24:36 +01:00
  • cdf76586b2
    CUDA: fix non-cont. inputs for batched mat mul (#13155) Johannes Gäßler 2025-04-29 16:00:27 +02:00
  • 7d3af70b08
    llama : llm_type order by size (#13177) Sigbjørn Skjæret 2025-04-29 13:25:53 +02:00
  • 00e3e5a194
    mtmd : add qwen2vl and qwen2.5vl (#13141) Xuan-Son Nguyen 2025-04-29 11:47:04 +02:00
  • e98b3692be
    llama : set qwen3 model type sizes (#13175) Sigbjørn Skjæret 2025-04-29 11:00:31 +02:00
  • b6ce7430b7
    llama-graph : fix text position for mrope (#13159) Xuan-Son Nguyen 2025-04-29 08:45:49 +02:00
  • 5f5e39e1ba
    model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466) AT 2025-04-28 15:52:15 -04:00
  • eaea325324
    clip : fix model size display (#13153) Xuan-Son Nguyen 2025-04-28 21:23:19 +02:00
  • 43ddab6eee
    fix(rpc): Improve input validation and error handling (#13069) Ville Vesilehto 2025-04-28 21:00:20 +03:00
  • 1831f538f7
    llama-bench: add -d depth arg (#13096) Vishal Agarwal 2025-04-28 20:20:39 +05:30
  • 4e87962e34
    mtmd : fix glm-edge redundant token count (#13139) Xuan-Son Nguyen 2025-04-28 16:12:56 +02:00
  • fb0471d175
    context : do not clear output buffer on reserve (#13152) pockers21 2025-04-28 06:45:40 -07:00
  • d2b2031e5f
    llama : (mrope) allow using normal 1D position for text token (#13138) Xuan-Son Nguyen 2025-04-28 14:20:56 +02:00
  • 5fa9e63be8
    clip : refactor set input for cgraph + fix qwen2.5vl input (#13136) Xuan-Son Nguyen 2025-04-28 12:18:59 +02:00
  • a4c340f974
    SYCL: Add all missing unary kernels (#13074) Akarshan Biswas 2025-04-28 15:03:25 +05:30
  • d0a417f3c7
    readme : update hot topics (#13150) Georgi Gerganov 2025-04-28 12:10:18 +03:00
  • 43f2b07193
    common : fix noreturn compile warning (#13151) Georgi Gerganov 2025-04-28 11:57:19 +03:00
  • e5d6c2554e
    llama-chat : fix typo GML --> GLM (#13143) Xuan-Son Nguyen 2025-04-28 10:11:58 +02:00
  • f0dd6a1926
    musa: fix typo in cc control (#13144) R0CKSTAR 2025-04-28 15:33:28 +08:00
  • 69699be48a
    CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137) Johannes Gäßler 2025-04-28 09:29:26 +02:00
  • 85f36e5e71
    arg : fix unused variable (#13142) Xuan-Son Nguyen 2025-04-28 07:16:59 +02:00
  • c0a97b762e
    llama-bench : Add --override-tensors arg (#12922) 4onen 2025-04-27 14:48:26 -07:00
  • ced44be342
    llama-chat : fix wrong template in GLM4-0414 (#13140) matteo 2025-04-27 21:57:32 +02:00
  • e291450b76
    musa: fix build warning (#13129) R0CKSTAR 2025-04-27 19:22:49 +08:00
  • 59e991c23c
    Fixes Qwen2.5VL segfault during inference with https://github.com/ggml-org/llama.cpp/pull/12402 as has_qwen2vl_merger migration was incomplete (#13133) LostRuins Concedo 2025-04-27 18:43:37 +08:00
  • ca2bb89eac
    clip : Add Qwen2.5VL support (#12402) HimariO 2025-04-27 16:10:34 +08:00
  • 2d451c8059
    common : add common_remote_get_content (#13123) Xuan-Son Nguyen 2025-04-26 22:58:12 +02:00
  • 4753791e70
    clip : improve projector naming (#13118) Xuan-Son Nguyen 2025-04-26 22:39:47 +02:00
  • 77d5e9a76a
    ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107) SXX 2025-04-26 22:05:31 +08:00
  • d5fe4e81bd
    grammar : handle maxItems == 0 in JSON schema (#13117) frob 2025-04-26 10:10:20 +02:00
  • 295354ea68
    llama : fix K-shift with quantized K and BLAS backend (#13113) Diego Devesa 2025-04-25 19:40:11 +02:00
  • 558a764713
    Force FP32 compute in GLM4 FFN Down (#13101) City 2025-04-25 14:38:34 +02:00
  • edb18b6e8f
    clip : fix pixtral on some GPU backends (#13097) Xuan-Son Nguyen 2025-04-25 14:31:42 +02:00
  • 514c45608f
    change the reorder tensor from init to execute OP (#13003) Neo Zhang Jianyu 2025-04-25 17:37:51 +08:00
  • 553a5c3a9f
    rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943) Radoslav Gerganov 2025-04-25 10:08:08 +03:00
  • 13be08daf9
    clip : remove boi/eoi embeddings for GLM-edge model (#13081) Xuan-Son Nguyen 2025-04-24 22:17:04 +02:00
  • 226251ed56
    embeddings : fix batch sizes (#13076) Georgi Gerganov 2025-04-24 22:29:22 +03:00
  • 87616f0680
    ggml : fix trailing whitespaces (#0) Georgi Gerganov 2025-04-24 17:22:27 +03:00
  • 63b4911494
    sync : ggml Georgi Gerganov 2025-04-24 16:47:43 +03:00
  • c6e8cc28c1
    ggml : Depthwise 2D convolution (ggml/1152) Acly 2025-04-17 14:16:45 +02:00
  • b10d8bfdb1
    CUDA: use switch statements in constexpr functions (#13095) Johannes Gäßler 2025-04-24 15:57:10 +02:00
  • 13b4548877
    cmake : do not include ./src as public for libllama (#13062) Georgi Gerganov 2025-04-24 16:00:10 +03:00
  • 572b3141d3
    clang-tidy : disable warning about missing math parenthesis (#13091) Georgi Gerganov 2025-04-24 15:44:05 +03:00
  • 7c727fbe39
    arg : add --no-mmproj-offload (#13093) Xuan-Son Nguyen 2025-04-24 14:04:14 +02:00
  • 80982e815e
    arg : clean up handling --mmproj with -hf (#13082) Xuan-Son Nguyen 2025-04-24 12:14:13 +02:00
  • 7604a7d6b8
    metal : fix floating-point range of attention scores in FA kernels (#13090) Georgi Gerganov 2025-04-24 10:38:30 +03:00
  • b3b6d862cf
    vulkan: matmul gcn tuning (#13016) Eve 2025-04-24 07:18:33 +00:00
  • 5630406959
    llama-mtmd-cli: Sigint rework in mtmd vision example (#13080) pl752 2025-04-24 02:32:35 +05:00
  • ecda2ec4b3
    mtmd : Support Pixtral 12B (#13065) Xuan-Son Nguyen 2025-04-23 20:21:59 +02:00
  • eb1776b15a
    convert : Append mult-eos,half-rope,bos to GLM4-0414 and Z (#13021) piDack 2025-04-23 22:59:14 +08:00
  • 2cca6c01e4
    rpc : add command line option for number of threads for the CPU backend (#13060) Radoslav Gerganov 2025-04-23 10:32:49 +03:00
  • 658987cfc9
    CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (#13014) Johannes Gäßler 2025-04-22 21:27:40 +02:00
  • dc39a5e7a8
    mtmd : support SmolVLM (version 1 and 2) (#13050) Xuan-Son Nguyen 2025-04-22 16:24:54 +02:00
  • ab47dec3d3
    security : add note about RPC and server functionality (#13061) Georgi Gerganov 2025-04-22 16:16:10 +03:00
  • 7b53389c24
    metal : add memory pool for temp allocs (#12850) Georgi Gerganov 2025-04-22 16:15:51 +03:00
  • 243453533e
    llava : update documentations (#13055) Xuan-Son Nguyen 2025-04-22 10:37:00 +02:00
  • 1d735c0b4f
    ggml : add SSE 4.2 and x64 base variant for CPUs without AVX (#12871) Diego Devesa 2025-04-21 18:13:51 +02:00
  • 5368ddda7a
    SYCL: Add non-contiguous support in ROPE (#12993) Akarshan Biswas 2025-04-21 19:13:30 +05:30
  • 84a9bf2fc2
    mtmd : merge llava, gemma3 and minicpmv CLI into single llama-mtmd-cli (#13012) Xuan-Son Nguyen 2025-04-21 15:32:58 +02:00
  • 2016f07bd1
    convert : experimental support for --mmproj flag (#13023) Xuan-Son Nguyen 2025-04-20 23:29:36 +02:00
  • 6602304814
    llava: fix errors in clip.h on certain compilers (#13030) Jeffrey Morgan 2025-04-20 03:15:41 -07:00
  • 66168204be
    vulkan: support noncontiguous rms_norm (#13031) Jeff Bolz 2025-04-20 03:50:02 -05:00
  • 4ba9d711ba
    metal: add neg operator (#13029) Jeffrey Morgan 2025-04-19 22:28:40 -07:00
  • 00137157fc
    Disable CI cross-compile builds (#13022) bandoti 2025-04-19 13:05:03 -03:00
  • fb28f4f80e
    gguf-py : fix upload python package workflow (#13020) Sigbjørn Skjæret 2025-04-19 16:26:38 +02:00
  • 37b9f0d29d
    clip : refactor, add image_manipulation and llava_uhd classes (#13011) Xuan-Son Nguyen 2025-04-19 09:15:45 +02:00
  • 6408210082
    main : Fix Ctrl+D/newline handling (#12951) Daniel Tang 2025-04-18 16:02:55 -04:00
  • aff9d107b0
    gguf-py : GGUF Editor GUI - Python + Qt6 (#12930) Chris Thompson 2025-04-18 12:30:41 -06:00
  • 35370ba945
    server : use std::move whenever possible (#12936) Xuan-Son Nguyen 2025-04-18 19:58:12 +02:00
  • 8d66005763
    SYCL: Refactor and enable FP16 in binary broadcast OPs (#12975) Akarshan Biswas 2025-04-18 19:27:56 +05:30
  • b9154ecff9
    mtmd : add methods to access mtmd_image_tokens (#12906) Xuan-Son Nguyen 2025-04-18 10:04:51 +02:00
  • 2db9ba1464
    rpc : add RPC_CMD_HELLO (#12955) Radoslav Gerganov 2025-04-18 10:13:42 +03:00
  • 2f74c354c0
    graph : make FA compatible with MLA + add initial Metal kernels (#12953) Georgi Gerganov 2025-04-17 18:16:36 +03:00
  • 207c22ec2d
    ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (#12970) Alan Gray 2025-04-17 14:19:42 +01:00
  • 7a395f67a7
    CANN: Add support for async operator submission (#12864) hipudding 2025-04-17 20:34:16 +08:00
  • 971f245b3b
    llama : recognize IBM Granite 3.3 FIM tokens (#12988) Mikko Juola 2025-04-17 01:37:05 -07:00
  • 12b17501e6
    opencl: fix incorrect local_size index in profiling log (#12868) kimminsu 2025-04-17 06:25:57 +09:00
  • 015022bb53
    vulkan: enable coopmat2 FA gqa and split_k optimizations more often (#12931) Jeff Bolz 2025-04-16 13:37:25 -05:00
  • b43d89e311
    CANN: Add 310P operator support check (#12962) Chenguang Li 2025-04-16 16:21:05 +08:00