llama.cpp

ver4a/llama.cpp

Commit graph

80f19b4186

opencl: split ggml-opencl.cl into multiple files and cleanup (#12886) lhez 2025-04-15 12:26:00 -07:00
f8f820cc4d

metal : add FA-vec kernels for head size 96 (#12952) Georgi Gerganov 2025-04-15 14:45:05 +03:00
54a7272043

CANN: Add x86 build ci (#12950) hipudding 2025-04-15 19:08:55 +08:00
84778e9770

CUDA/HIP: Share the same unified memory allocation logic. (#12934) David Huang 2025-04-15 17:20:38 +08:00
510676475f

SYCL: Add ROPE vision kernel (#12887) Akarshan Biswas 2025-04-15 14:07:42 +05:30
daa422881a

llama : DeepSeek V2/V3 MLA implementation (#12801) Juk Armstrong 2025-04-15 07:49:57 +01:00
eccc7a1602

ggml : Add AVX512 implementation of GEMM - Q4_Kx8 (#12829) Srihari-mcw 2025-04-15 11:52:36 +05:30
0019279bb5

CANN: Opt ROPE optimization (#12865) Chenguang Li 2025-04-15 10:09:35 +08:00
b0c75ac9f9

CANN: Optimize CANN buffer pool memory management (#12875) Xinpeng Dou 2025-04-15 10:04:24 +08:00
d6d2c2ab8c

Add performance print for gemma3 in example (#12929) Russyyds 2025-04-15 01:18:20 +08:00
75afa0ae31

SYCL: Fix im2col (#12910) Akarshan Biswas 2025-04-14 17:53:53 +05:30
c772d54926

rpc : use ggml_context_ptr (#12938) Radoslav Gerganov 2025-04-14 13:59:34 +03:00
81c7e64fc2

disable curl lib check, this action is missed by commit bd3f59f812 (#12761) (#12937) Neo Zhang Jianyu 2025-04-14 18:19:07 +08:00
526739b879

sync : ggml Georgi Gerganov 2025-04-14 08:52:10 +03:00
a25355e264

cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal when running test-backend-ops with only the CPU backend (ggml/1190) cmdr2 2025-04-11 12:14:19 +05:30
e959d32b1c

ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register (#12773) SXX 2025-04-14 13:47:55 +08:00
307bfa253d

ggml: disable CUDA graphs for unsupported DUP and CONT node types (#12891) Alan Gray 2025-04-13 22:12:21 +01:00
71e90e8813

quantize: Handle user-defined quantization levels for additional tensors (#12511) Ed Addario 2025-04-13 19:29:28 +01:00
bc091a4dc5

common : Define cache directory on AIX (#12915) Prajwal B Mehendarkar 2025-04-12 21:03:39 +05:30
a4837577aa

vulkan: use aligned loads for flash attention mask (#12853) Jeff Bolz 2025-04-12 03:44:48 -05:00
e59ea539b8

llava: Fix cpu-only clip image encoding segfault (#12907) Matt Clayton 2025-04-12 01:29:03 -04:00
c94085df28

server : add VSCode's Github Copilot Chat support (#12896) Georgi Gerganov 2025-04-11 23:37:41 +03:00
e8a62631b3

rpc : Set cache directory in rpc-server.cpp on FreeBSD (#12903) yuri@FreeBSD 2025-04-11 13:04:14 -07:00
b6930ebc42

tool-call: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates (#12900) Olivier Chafik 2025-04-11 12:47:52 -07:00
68b08f36d0

common : Define cache directory on FreeBSD (#12892) yuri@FreeBSD 2025-04-11 12:45:44 -07:00
578754b315

sycl: Support sycl_ext_oneapi_limited_graph (#12873) Ewan Crawford 2025-04-11 15:32:14 +02:00
b2034c2b55

contrib: support modelscope community (#12664) tastelikefeet 2025-04-11 20:01:56 +08:00
06bb53ad9b

llama-model : add Glm4Model implementation for GLM-4-0414 (#12867) Yuxuan Zhang 2025-04-11 18:10:10 +08:00
0c50923944

clip : use smart pointer (⚠️ breaking change) (#12869) Xuan-Son Nguyen 2025-04-11 12:09:39 +02:00
fccf9cae83

SYCL: Add fp16 type support to unary op kernels (#12788) Akarshan Biswas 2025-04-11 13:33:50 +05:30
ec6c09d0fa

convert : Llama4 RoPE fix (#12889) Daniel Han 2025-04-11 00:49:09 -07:00
8ac9f5d765

ci : Replace freediskspace to free_disk_space in docker.yml (#12861) R0CKSTAR 2025-04-11 15:26:17 +08:00
12e9158f25

xcf : add check for visionos build version (#12854) Daniel Bevenius 2025-04-11 09:24:34 +02:00
5b1f13cb64

convert : proper tensor name mapping for llama4 (#12870) Xuan-Son Nguyen 2025-04-11 09:23:37 +02:00
8b91d5355a

llama : correct rms norm for llama 4 (#12882) Xuan-Son Nguyen 2025-04-11 08:49:50 +02:00
0fed24c347

ggml: fix compilation error s390x (#12848) Aaron Teo 2025-04-11 13:20:07 +08:00
47ba87d0a4

sync : ggml Georgi Gerganov 2025-04-11 00:08:23 +03:00
1d2b613445

tests : fix init order (#0) Georgi Gerganov 2025-04-11 00:04:25 +03:00
eb420e1148

sync : ggml Georgi Gerganov 2025-04-10 23:59:16 +03:00
cb79c2e7fa

ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187) cmdr2 2025-04-10 17:53:08 +05:30
fe92821ea9

ggml : add bilinear upscale support (ggml/1185) Diego Devesa 2025-04-09 12:32:13 +02:00
459895c326

ggml : add more generic custom op, remove deprecated custom ops (ggml/1183) Diego Devesa 2025-04-09 12:31:34 +02:00
e4bf72d631

scripts : fix sync-ggml-am.sh Georgi Gerganov 2025-04-10 23:59:01 +03:00
8b9cc7cdd8

llava : introduce libmtmd (#12849) Xuan-Son Nguyen 2025-04-10 22:57:16 +02:00
64eda5deb9

convert : ability to lazy-load safetensors remotely without downloading to disk (#12820) Xuan-Son Nguyen 2025-04-10 17:24:44 +02:00
fe5b78c896

CANN: Support more ops (#12841) Chenguang Li 2025-04-10 08:51:52 +08:00
11d07e1e69

Fixes #12823 (#12830) Prajwal B Mehendarkar 2025-04-10 04:48:01 +05:30
b0091ecc1e

docker : added all CPU to GPU images (#12749) Rudi Servo 2025-04-09 23:17:12 +00:00
31f7803bc4

ggml-cpu-impl.h: do not redefine bool on POWER9 (#12856) Piotr Kubaj 2025-04-09 23:00:34 +00:00
2391506ace

ggml-impl.h: fix build on POWER9 (#12855) Piotr Kubaj 2025-04-09 23:00:25 +00:00
d3bd7193ba

llama : Support Qwen3 and Qwen3MoE (#12828) Bo Zheng 2025-04-09 17:47:36 +08:00
d9a63b2f2e

musa: enable freediskspace for docker image build (#12839) R0CKSTAR 2025-04-09 17:22:30 +08:00
8ed71242f4

sycl: update documentation to use -no-cnv (#12845) Romain Biessy 2025-04-09 11:22:04 +02:00
381603a775

ci: detach common from the library (#12827) Plamen Minev 2025-04-09 11:11:11 +03:00
65a69e6e1b

clip : do not print ftype (#12832) Xuan-Son Nguyen 2025-04-09 10:09:53 +02:00
47277d6d1d

readme : add rpc backend (#12842) Georgi Gerganov 2025-04-09 10:54:42 +03:00
6e1c4cebdb

CANN: Support Opt CONV_TRANSPOSE_1D and ELU (#12786) Chenguang Li 2025-04-09 14:04:14 +08:00
0090950f67

vulkan: In coopmat2 mmq, load q4_k/q5_k scales through shared memory (#12833) Jeff Bolz 2025-04-09 00:25:08 -05:00
7ecd780b1a

vulkan: Use fp16 for the flash attention P*V multiplication (#12783) Jeff Bolz 2025-04-09 00:12:57 -05:00
7538246e7c

cuda : add f32 to bf16 copy op (#12806) Sigbjørn Skjæret 2025-04-08 23:21:31 +02:00
b32efad2bc

llava: improve clip_ctx destructor to not memleak load_image_size (#12834) Matt Clayton 2025-04-08 16:01:58 -04:00
a19b5cef16

llama : fix FA when KV cache is not used (i.e. embeddings) (#12825) Georgi Gerganov 2025-04-08 19:54:51 +03:00
78a1ba0a4f

server : fix thread.join() on exit (#12831) Xuan-Son Nguyen 2025-04-08 18:37:06 +02:00
2dabf759e7

llava: add more helper functions to check projector types in clip context (#12824) dm4 2025-04-08 21:49:13 +08:00
1d343b4069

arg : Including limits file on AIX (#12822) Prajwal B Mehendarkar 2025-04-08 18:00:59 +05:30
8ca6e1c3a4

server : webui : Improve Chat Input with Auto-Sizing Textarea (#12785) characharm 2025-04-08 14:14:59 +05:00
656babd6c2

Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" (#12812) Neo Zhang Jianyu 2025-04-08 15:03:21 +08:00
a226bc7a9a

gguf-py : support lazy tensor splitting (#12809) compilade 2025-04-08 03:03:07 -04:00
1466621e73

llama : Support llama 4 text-only (#12791) Xuan-Son Nguyen 2025-04-07 23:06:44 +02:00
82974011f3

opencl: better identify Adreno GPU (#12760) lhez 2025-04-07 13:22:54 -07:00
4ccea213bc

hellaswag: display estimated score confidence interval (#12797) stduhpf 2025-04-07 17:47:08 +02:00
1a1ab7e7a4

cuda : fix HIP and MUSA BF16 (#0) Georgi Gerganov 2025-04-07 13:18:07 +03:00
a4e46e28f9

sync : ggml Georgi Gerganov 2025-04-07 12:32:39 +03:00
ff067dbcb9

ggml : simplify Arm fp16 CPU logic (ggml/1177) Georgi Gerganov 2025-04-07 12:25:15 +03:00
36ca8b3628

CUDA: don't convert BF16 weights to FP32 (ggml/1174) Sigbjørn Skjæret 2025-04-04 21:05:12 +02:00
995083e4ed

cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167) cmdr2 2025-04-02 17:46:16 +05:30
518a01480e

sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor (#12734) zhouwg 2025-04-07 23:22:57 +08:00
e391d3ee8d

ci : no curl on ggml-ci (#12796) Xuan-Son Nguyen 2025-04-07 14:37:28 +02:00
bd3f59f812

cmake : enable curl by default (#12761) Xuan-Son Nguyen 2025-04-07 13:35:19 +02:00
52b3d71f12

CANN: fix typo in ggml-cann (#12733) zhouwg 2025-04-07 19:34:14 +08:00
d0d5b2232b

CANN: Refactor to reduce duplicate code (#12731) hipudding 2025-04-07 17:10:36 +08:00
916c83bfe7

musa: fix compilation warnings in mp_22/31 (#12780) R0CKSTAR 2025-04-06 21:23:54 +08:00
0c74b04376

vulkan: fix NaN issue in flash attention shader (#12776) Jeff Bolz 2025-04-06 04:03:47 -05:00
80b717d493

vulkan: Use unclamped loads for flash attention mask (#12720) Jeff Bolz 2025-04-06 03:47:13 -05:00
6bf28f0111

Vulkan: Tune Vulkan mmq int dot shader for performance (#12767) 0cc4m 2025-04-05 18:04:03 +02:00
f1e3eb4249

common : fix includes in arg.cpp and gemma3-cli.cpp (#12766) Sergey Fedorov 2025-04-05 23:46:00 +08:00
0364178ca2

clip : refactor clip_init, add tests (#12757) Xuan-Son Nguyen 2025-04-05 17:17:40 +02:00
c6ff5d2a8d

common: custom hf endpoint support (#12769) エシュナヴァリシア 2025-04-05 21:31:42 +08:00
7a84777f42

sync: minja (#12739) Olivier Chafik 2025-04-04 13:16:39 -07:00
3e1d29348b

kv-cache : simplify + fix warning for recurrent models (#12756) Georgi Gerganov 2025-04-04 21:48:10 +03:00
1be76e4620

ci: add Linux cross-compile build (#12428) bandoti 2025-04-04 14:05:12 -03:00
b772394297

server : webui : Upgrade daisyui, tailwindcss. (#12735) Nauful Shaikh 2025-04-04 09:09:52 -05:00
23106f94ea

gguf-split : --merge now respects --dry-run option (#12681) nick huang 2025-04-04 22:09:12 +08:00
94148ba330

sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution (#12625) Nicolò Scipione 2025-04-04 16:00:46 +02:00
9ac4d611d0

cmake: fix ggml-shaders-gen compiler paths containing spaces (#12747) Ronny Brendel 2025-04-04 15:12:40 +02:00
348888e0dc

docs : add XCFramework section to README.md [no ci] (#12746) Daniel Bevenius 2025-04-04 10:24:12 +02:00
74d4f5b041

vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (#12630) Jeff Bolz 2025-04-04 00:54:35 -05:00
35e592eb30

vulkan: set cmake minimum and project name in vulkan-shaders (#12744) Jeff Bolz 2025-04-04 00:53:20 -05:00
7d7b1bafa7

opencl: update doc for OpenCL (#12702) lhez 2025-04-03 22:18:17 -07:00
c262beddf2

CUDA: Prefer vector flash decoding kernel for Gemma models (#12738) Gaurav Garg 2025-04-03 21:50:29 +05:30