Commit graph

  • 0208355f42
    CUDA: fix race conditions in FlashAttention kernels (#13438) Johannes Gäßler 2025-05-10 22:22:48 +02:00
  • d2a4ef05c6
    vocab : add ByteDance-Seed/Seed-Coder (#13423) Sigbjørn Skjæret 2025-05-10 22:08:07 +02:00
  • 15e6125a39
    mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434) Xuan-Son Nguyen 2025-05-10 19:57:54 +02:00
  • 3b24d26c22
    server : update docs (#13432) Xuan-Son Nguyen 2025-05-10 18:44:49 +02:00
  • 43dfd741a5
    llguidance : set tokenizer slices to default (#13424) Sigbjørn Skjæret 2025-05-10 17:19:52 +02:00
  • b064a51a4e
    ci: free_disk_space flag enabled for intel variant (#13426) Thammachart Chinvarapon 2025-05-10 21:34:48 +07:00
  • 053367d149
    mtmd : support InternVL 2.5 and 3 (#13422) Xuan-Son Nguyen 2025-05-10 16:26:42 +02:00
  • d8919424f1
    CUDA: fix FlashAttention on Turing (#13415) Johannes Gäßler 2025-05-10 09:16:52 +02:00
  • 7fef11766c
    arg : add env var to control mmproj (#13416) Xuan-Son Nguyen 2025-05-10 08:16:29 +02:00
  • dc1d2adfc0
    vulkan: scalar flash attention implementation (#13324) Jeff Bolz 2025-05-09 23:07:07 -07:00
  • 7c28a74e07
    chore(llguidance): use tagged version that does not break the build (#13413) Helton Reis 2025-05-09 17:15:39 -03:00
  • 33eff40240
    server : vision support via libmtmd (#12898) Xuan-Son Nguyen 2025-05-09 19:29:37 +02:00
  • 17512a94d6
    sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858) Alberto Cabrera Pérez 2025-05-09 16:34:08 +01:00
  • 611aa914ef
    metal : optimize MoE for large batches (#13388) Georgi Gerganov 2025-05-09 15:14:56 +03:00
  • 0cf6725e9f
    CUDA: FA support for Deepseek (Ampere or newer) (#13306) Johannes Gäßler 2025-05-09 13:34:58 +02:00
  • 27ebfcacba
    llama : do not crash if there is no CPU backend (#13395) Diego Devesa 2025-05-09 13:02:07 +02:00
  • 5c86c9ed3e
    CUDA: fix crash on large batch size for MoE models (#13384) Johannes Gäßler 2025-05-09 12:14:04 +02:00
  • efb8b47eda
    imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389) Bartowski 2025-05-09 05:53:58 -04:00
  • 0527771dd8
    llama-run: add support for downloading models from ModelScope (#13370) R0CKSTAR 2025-05-09 17:25:50 +08:00
  • 2189fd3b63
    mtmd : fix batch_view for m-rope (#13397) Xuan-Son Nguyen 2025-05-09 11:18:02 +02:00
  • 3f96aeff39
    llama : one-off chat template fix for Mistral-Small-2503 (#13398) Xuan-Son Nguyen 2025-05-09 11:17:51 +02:00
  • b486ba05bf
    rpc : add rpc_msg_set_tensor_hash_req (#13353) Radoslav Gerganov 2025-05-09 10:31:07 +03:00
  • 02115dcd9a
    vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326) Jeff Bolz 2025-05-09 02:23:41 -05:00
  • d9c4accaff
    server : (webui) rename has_multimodal --> modalities (#13393) Xuan-Son Nguyen 2025-05-09 09:06:37 +02:00
  • 15e03282bb
    ci : limit write permission to only the release step + fixes (#13392) Diego Devesa 2025-05-08 23:45:22 +02:00
  • f05a6d71a0
    mtmd : Expose helper_decode_image_chunk (#13366) Matt Clayton 2025-05-08 14:25:39 -04:00
  • ee01d71e58
    server : (webui) fix a very small misalignment (#13387) Xuan-Son Nguyen 2025-05-08 18:51:45 +02:00
  • 8c83449cb7
    server : (webui) revamp the input area, plus many small UI improvements (#13365) Xuan-Son Nguyen 2025-05-08 15:37:29 +02:00
  • 1a844be132
    convert : support rope_scaling type and rope_type (#13349) Sigbjørn Skjæret 2025-05-08 15:34:29 +02:00
  • 0ccc121354
    mtmd : fix the calculation of n_tokens for smolvlm (#13381) welix 2025-05-08 22:03:53 +09:00
  • 6562e5a4d6
    context : allow cache-less context for embeddings (#13108) Georgi Gerganov 2025-05-08 14:28:33 +03:00
  • 51fb96b1ff
    context : remove logits_all flag (#13284) Georgi Gerganov 2025-05-08 14:26:50 +03:00
  • 70a6991edf
    ci : move release workflow to a separate file (#13362) Diego Devesa 2025-05-08 13:15:28 +02:00
  • f061021206
    llama : print size and type of overridden tensors (#13364) Diego Devesa 2025-05-08 13:15:15 +02:00
  • 8733e0cf6e
    sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343) Alberto Cabrera Pérez 2025-05-08 10:08:01 +01:00
  • 814f795e06
    docker : disable arm64 and intel images (#13356) Diego Devesa 2025-05-07 16:36:33 +02:00
  • d879433824
    sync : ggml Georgi Gerganov 2025-05-07 16:39:36 +03:00
  • 13b0a04597
    whisper: remove MSVC warnings pragmas (whisper/3090) Daniel Bevenius 2025-05-05 13:09:35 +02:00
  • bba9d945c1
    cmake : removed stdc++fs (whisper/3097) Jared Tweed 2025-05-02 02:41:35 -07:00
  • bc4e1128f7
    llama : deci : support ffn-free with attention (#13296) Sigbjørn Skjæret 2025-05-07 12:49:27 +02:00
  • 39e73ae0d6
    common : Add a warning when we can't match samplers from a string or char. (#13330) Ycros 2025-05-07 18:23:28 +10:00
  • 1f73301b63
    cuda : remove nrows_x in mul_mat_q_process_tile (#13325) R0CKSTAR 2025-05-07 15:48:23 +08:00
  • 4773d7a02f
    examples : remove infill (#13283) Georgi Gerganov 2025-05-07 10:28:02 +03:00
  • 6c7fd67b64
    llama : support tie embedding for chatglm models (#13328) piDack 2025-05-07 15:23:11 +08:00
  • 141a908a59
    CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135) Johannes Gäßler 2025-05-06 23:35:51 +02:00
  • 32916a4907
    clip : refactor graph builder (#13321) Xuan-Son Nguyen 2025-05-06 22:40:24 +02:00
  • ffc727203a
    sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345) DocShotgun 2025-05-06 13:36:24 -07:00
  • 91a86a6f35
    sampling : don't consider -infinity values in top_n_sigma (#13344) oobabooga 2025-05-06 15:24:15 -03:00
  • f4ed10b69c
    cmake : remove arm64 msvc presets (#13342) Diego Devesa 2025-05-06 20:15:31 +02:00
  • 1e333d5bba
    SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (#13254) Akarshan Biswas 2025-05-06 20:27:06 +05:30
  • 2f54e348ad
    llama : fix build_ffn without gate (#13336) Xuan-Son Nguyen 2025-05-06 14:25:40 +02:00
  • 2356fb1d53
    CUDA: fix bad asserts for partial offload (#13337) Johannes Gäßler 2025-05-06 13:58:51 +02:00
  • 764b85627b
    convert : qwen2/3moe : set yarn metadata if present (#13331) Sigbjørn Skjæret 2025-05-06 11:12:06 +02:00
  • 15a28ec8c7
    CUDA: fix --split-mode row for MMQ (#13323) Johannes Gäßler 2025-05-06 08:36:46 +02:00
  • a7366faa5b
    gguf-py : avoid requiring pyside6 for other scripts (#13036) compilade 2025-05-05 22:27:31 -04:00
  • 9070365020
    CUDA: fix logic for clearing padding with -ngl 0 (#13320) Johannes Gäßler 2025-05-05 22:32:13 +02:00
  • 233461f812
    sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264) oobabooga 2025-05-05 17:12:19 -03:00
  • b34c859146
    server : Webui - change setText command from parent window to also send the message. (#13309) igardev 2025-05-05 17:03:31 +03:00
  • 9b61acf060
    mtmd : rename llava directory to mtmd (#13311) Xuan-Son Nguyen 2025-05-05 16:02:55 +02:00
  • 5215b91e93
    clip : fix confused naming ffn_up and ffn_down (#13290) Xuan-Son Nguyen 2025-05-05 12:54:44 +02:00
  • ae803bfc3d
    convert : bailingmoe : set yarn metadata if present (#13312) Sigbjørn Skjæret 2025-05-05 12:34:26 +02:00
  • 66645a5285
    SYCL: Disable mul_mat kernels for noncontiguous tensor b (#13308) Akarshan Biswas 2025-05-05 13:39:10 +05:30
  • 27aa259532
    mtmd : add C public API (#13184) Xuan-Son Nguyen 2025-05-04 23:43:42 +02:00
  • 9fdfcdaedd
    rpc : use backend registry, support dl backends (#13304) Diego Devesa 2025-05-04 21:25:43 +02:00
  • 6eb7d25c70
    ggml : activate s390x simd for Q3_K (#13301) Aaron Teo 2025-05-05 01:49:12 +08:00
  • 86bd60d3fe
    llava/mtmd : fixes to fully support dl backends (#13303) Diego Devesa 2025-05-04 17:05:20 +02:00
  • 9f2da5871f
    llama : build windows releases with dl backends (#13220) Diego Devesa 2025-05-04 14:20:49 +02:00
  • 93c4e23905
    CUDA: fix race condition in MMQ stream-k fixup (#13299) Johannes Gäßler 2025-05-04 14:16:39 +02:00
  • 8afbd96818
    CUDA: fix race condition in MMQ ids_dst (#13294) Johannes Gäßler 2025-05-04 13:58:38 +02:00
  • 8ae5ebcf85
    vulkan: Additional type support for unary, binary, and copy (#13266) Jeff Bolz 2025-05-04 00:17:16 -05:00
  • 3e959f0976
    imatrix: fix oob writes if src1 is not contiguous (#13286) Johannes Gäßler 2025-05-04 00:50:37 +02:00
  • 36667c8edc
    clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change) (#13259) Xuan-Son Nguyen 2025-05-03 20:07:54 +02:00
  • 3bf785f3ef
    llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843) ymcki 2025-05-03 23:39:51 +08:00
  • 1d36b3670b
    llama : move end-user examples to tools directory (#13249) Diego Devesa 2025-05-02 20:27:13 +02:00
  • b34443923c
    sync : ggml (#13268) Georgi Gerganov 2025-05-02 20:54:30 +03:00
  • a75cb30dc9
    context : fix reorder logic (#13267) Georgi Gerganov 2025-05-02 20:54:13 +03:00
  • 3f3769ba76
    ggml : Enable MMA for BF16 in llamafile_sgemm (#13148) shalinib-ibm 2025-05-02 22:23:12 +05:30
  • 2f567611c0
    llama-model : support Qwen2 embedding models and pooling_mode_lasttoken (#13245) Jared Van Bortel 2025-05-02 11:42:30 -04:00
  • 7d2123484e
    convert : use correct context length for nomic-embed-text-v2 (#13216) Jared Van Bortel 2025-05-02 11:41:54 -04:00
  • 074e42ab31
    convert : converting mmproj for Qwen2/2.5VL from convert_hf_to_gguf (#13209) Xuan-Son Nguyen 2025-05-02 17:17:15 +02:00
  • c642bc014c
    kv-cache : separate recurrent vs non-recurrent impl (#12799) Georgi Gerganov 2025-05-02 17:48:36 +03:00
  • cb06a3c363
    llama : orion rope type is neox (#13261) Sigbjørn Skjæret 2025-05-02 12:44:24 +02:00
  • 626083faf7
    llama : plamo rope type is neox (#13260) Sigbjørn Skjæret 2025-05-02 12:40:56 +02:00
  • 2af6880178
    llama-chat : reset glmedge chat template (#13253) piDack 2025-05-02 17:06:09 +08:00
  • e84773ab60
    mtmd-cli : fix out_of_range when input image path is empty (#13244) Shakil Ahmed 2025-05-02 14:20:27 +06:00
  • fab647e884
    server : add cache reuse card link to help (#13230) Georgi Gerganov 2025-05-02 09:48:31 +03:00
  • dcf886007d
    convert : explicitly disable trust_remote_code for AutoConfig (#13246) Xuan-Son Nguyen 2025-05-02 08:45:10 +02:00
  • d24d592808
    ci: fix cross-compile sync issues (#12804) bandoti 2025-05-01 19:06:39 -03:00
  • 8efbdadc61
    rpc : avoid uninitialized memory in serialize_tensor (#13210) Justin Santa Barbara 2025-05-01 17:32:11 -04:00
  • f057808ffa
    ggml: Don't assert fail when tensor data changes (#13222) Jesse Gross 2025-05-01 13:46:10 -07:00
  • d7a14c42a1
    build : fix build info on windows (#13239) Diego Devesa 2025-05-01 21:48:08 +02:00
  • b6e4ff69b8
    clip : (minicpmv) Re-enable upscaling of images smaller than the CLIP image size (#13237) Loïc Carrère 2025-05-01 21:32:21 +02:00
  • e0f572c846
    llama-chat : update GLM4 chat template (#13238) matteo 2025-05-01 21:16:38 +02:00
  • 79f26e9e12
    vulkan: Add bfloat16 support (#12554) Jeff Bolz 2025-05-01 13:49:39 -05:00
  • fc727bcdd5
    vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (#13191) Jeff Bolz 2025-05-01 13:19:31 -05:00
  • b0ecbd434b
    test: non-cont. b in test-backend-ops -o MUL_MAT (#13187) Johannes Gäßler 2025-05-01 20:18:56 +02:00
  • b1dd4d08e8
    sync : ggml Georgi Gerganov 2025-05-01 17:07:13 +03:00
  • 99881f77d8
    whisper : add check that target name exists (whisper/3103) Daniel Bevenius 2025-05-01 10:05:24 +02:00
  • b5769d92b4
    ggml : suppress Windows compiler warnings (whisper/3075) Daniel Bevenius 2025-04-29 15:47:55 +02:00
  • 8936784f7a
    mtmd : add **vision** support for Mistral Small 3.1 (#13231) Xuan-Son Nguyen 2025-05-01 17:05:42 +02:00