llama.cpp

ver4a/llama.cpp

Commit graph

5dd5d1ab00

vocab : use string_view::find() to avoid unnecessary looking up beyond the fragment range (#12706) yumeyao 2025-04-03 23:32:54 +08:00
1c059995e0

vulkan: Fix missing cmake logic for dot product extension (#12721) Jeff Bolz 2025-04-03 10:08:26 -05:00
2004644b7a

ci : add env variable in ggml-ci and document the same in SYCL.md (#12736) Atharva Dubey 2025-04-03 13:12:39 +01:00
5f696e88e0

sync : minja (inclusionAI/Ling) and update tests (#12699) R0CKSTAR 2025-04-03 19:51:35 +08:00
193c3e03a6

fix MUSA compiler warning (#12704) a3sh 2025-04-03 15:32:55 +08:00
65cfe136a0

CANN: Support operator SIN COS ARGMAX (#12709) Chenguang Li 2025-04-03 15:18:08 +08:00
3f9da22c2b

Simplify and improve CUDA graphs through use of indirect copy pointers (#9017) Alan Gray 2025-04-03 02:31:15 +01:00
2a0dc97e56

CANN: Fix failed test cases (#12708) hipudding 2025-04-03 08:49:51 +08:00
97a20c012b

opencl: use max_alloc_size in backend ctx instead of querying again (#12705) lhez 2025-04-02 17:01:42 -07:00
f01bd02376

vulkan: Implement split_k for coopmat2 flash attention. (#12627) Jeff Bolz 2025-04-02 14:25:08 -05:00
6f3bd38640

cmake: remove caching from vulkan coopmat checks (#12719) bandoti 2025-04-02 14:56:26 -03:00
be0a0f8cae

vulkan: Implement grouped query attention in the coopmat2 FA shader (#12559) Jeff Bolz 2025-04-02 12:40:32 -05:00
92e3006bb6

Vulkan: Fix mmq int dot float cache size (#12722) 0cc4m 2025-04-02 19:12:30 +02:00
833e2b7409

model : print tensor size during load (#12711) Georgi Gerganov 2025-04-02 16:38:54 +03:00
e0e912f49b

llama : add option to override model tensor buffers (#11397) Diego Devesa 2025-04-02 14:52:01 +02:00
a10b36c91a

llama : refactor kv cache guard (#12695) Georgi Gerganov 2025-04-02 14:32:59 +03:00
83a88bd6af

vocab : BailingMoE : change possessive quantifiers to greedy (#12677) Sigbjørn Skjæret 2025-04-02 11:21:48 +02:00
42eb248f46

common : remove json.hpp from common.cpp (#12697) Xuan-Son Nguyen 2025-04-02 09:58:34 +02:00
9bacd6b374

[CANN] get_rows and dup optimization (#12671) Chenguang Li 2025-04-02 15:22:13 +08:00
267c1399f1

common : refactor downloading system, handle mmproj with -hf option (#12694) Xuan-Son Nguyen 2025-04-01 23:44:05 +02:00
f423981ac8

opencl : fix memory allocation size (#12649) Junil Kim 2025-04-02 01:54:34 +09:00
e39e727e9a

llama : use LLM_KV_GENERAL_FILE_TYPE instead of gguf_find_key (#12672) jklincn 2025-04-01 20:54:28 +08:00
5936a616e4

convert : BailingMoE : fix qkv split when head_dim is 0 (#12687) Sigbjørn Skjæret 2025-04-01 14:37:13 +02:00
3fd072a540

metal : use F32 prec in FA kernels (#12688) Georgi Gerganov 2025-04-01 14:57:19 +03:00
a6f32f0b34

Fix clang warning in gguf_check_reserved_keys (#12686) R0CKSTAR 2025-04-01 19:12:53 +08:00
2bb3597e42

vulkan: fix build when glslc doesn't support coopmat (#12683) Wagner Bruna 2025-04-01 06:38:07 -03:00
8293970542

SYCL: Rename oneMKL to oneMath (#12192) Romain Biessy 2025-04-01 10:24:29 +02:00
8bbf26083d

SYCL: switch to SYCL namespace (#12674) Akarshan Biswas 2025-04-01 13:41:39 +05:30
35782aeedb

convert : BailingMoE : avoid setting rope_dim to 0 (#12678) Sigbjørn Skjæret 2025-03-31 23:09:48 +02:00
c80a7759da

vocab : add special infill tokens for CodeLlama (#11850) Daniel Bevenius 2025-03-31 18:40:56 +02:00
250d7953e8

ggml : faster ssm scan (#10558) a3sh 2025-04-01 00:05:13 +08:00
403fbacbbc

convert : Qwerky : use lora_rank_tokenshift and lora_rank_decay if present (#12667) Sigbjørn Skjæret 2025-03-31 16:36:25 +02:00
a8a1f33567

Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135) 0cc4m 2025-03-31 14:37:01 +02:00
1790e73157

cmake : fix whitespace (#0) Georgi Gerganov 2025-03-31 15:05:30 +03:00
0114a32da0

sync : ggml Georgi Gerganov 2025-03-31 14:59:21 +03:00
a7724480fd

cmake: improve Vulkan cooperative matrix support checks (whisper/2966) Sandro Hanea 2025-03-31 12:44:36 +02:00
1a85949067

llava : proper description fix (#12668) Sigbjørn Skjæret 2025-03-31 11:28:30 +02:00
6c02a032fa

SYCL: Remove misleading ggml_sycl_op_flatten function (#12387) Akarshan Biswas 2025-03-31 14:55:24 +05:30
f52d59d771

llava : fix clip loading GGUFs with missing description (#12660) Sigbjørn Skjæret 2025-03-31 11:07:07 +02:00
52de2e5949

tts : remove printfs (#12640) marcoStocchi 2025-03-31 10:20:30 +02:00
2c3f8b850a

llama : support BailingMoE (Ling) (#12634) Sigbjørn Skjæret 2025-03-30 22:21:03 +02:00
4663bd353c

metal : use constexpr in FA kernels + fix typedef (#12659) Georgi Gerganov 2025-03-30 22:04:04 +03:00
b3de7cac73

llama : add Trillion 7B model support (#12556) Juyoung Suk 2025-03-31 03:38:33 +09:00
7242dd9675

llama-chat : Add Yandex instruct model template support (#12621) Sergei Vorobyov 2025-03-30 21:12:03 +03:00
492d7f1ff7

musa: fix all warnings, re-enable -DLLAMA_FATAL_WARNINGS=ON in ci and update doc (#12611) R0CKSTAR 2025-03-30 16:59:38 +08:00
d3f1f0acfb

sync : ggml Georgi Gerganov 2025-03-29 15:37:54 +02:00
360dc22c00

cpu : rm unused variable (ggml/1166) Xuan-Son Nguyen 2025-03-29 11:59:56 +01:00
a62d7fa7a9

cpu: de-duplicate some of the operators and refactor (ggml/1144) cmdr2 2025-03-29 11:37:13 +05:30
e408d4351a

ggml : add logging for native build options/vars (whisper/2935) Daniel Bevenius 2025-03-24 09:53:38 +01:00
3891e183c6

examples : command.wasm updates (whisper/2904) Daniel Bevenius 2025-03-20 07:02:18 +01:00
af6ae1efb2

llama : fix non-causal mask for gemma 3 (#12615) Xuan-Son Nguyen 2025-03-30 00:07:37 +01:00
0bb2919335

llama : change cpu_buft_list order: ACCEL -> GPU host -> CPU extra -> CPU (#12632) Djip007 2025-03-29 14:07:37 +01:00
a69f846351

cmake : fix ccache conflict (#12522) Jay 2025-03-29 18:04:58 +08:00
d07a0d7a79

CANN : remove clang-format in ggml-cann (#12607) hipudding 2025-03-29 18:03:28 +08:00
3714c3ee1a

llama : fix incorrect Qwen2Moe ffn_moe_out graph callback (#12631) Sigbjørn Skjæret 2025-03-28 22:13:02 +01:00
b4ae50810e

metal : improve FA + improve MoE (#12612) Georgi Gerganov 2025-03-28 20:21:59 +02:00
b86f600723

vulkan: fix coopmat shader generation when cross-compiling (#12272) Icenowy Zheng 2025-03-29 01:51:06 +08:00
dd373dd3bf

llama: fix error on bad grammar (#12628) Johannes Gäßler 2025-03-28 18:08:52 +01:00
5d01670266

server : include speculative decoding stats when timings_per_token is enabled (#12603) Benson Wong 2025-03-28 01:05:44 -07:00
ef03229ff4

rpc : update README for cache usage (#12620) Radoslav Gerganov 2025-03-28 09:44:13 +02:00
13731766db

llamafile : ppc64le GEMV forwarding for FP32. (#12594) amritahs-ibm 2025-03-28 13:13:22 +05:30
ab6ab8f809

rpc : send hash when tensor data is above some fixed threshold (#12496) Radoslav Gerganov 2025-03-28 08:18:04 +02:00
2099a9d5db

server : Support listening on a unix socket (#12613) Piotr 2025-03-27 23:41:04 +01:00
2969019837

media : add SVG logo [no ci] (#12616) Georgi Gerganov 2025-03-27 23:09:05 +02:00
5dec47dcd4

opencl: add multi and vision rope, gelu_quick and im2col (#12600) lhez 2025-03-27 08:08:08 -07:00
f125b8dccf

llama : add PLM GGUF Conversion & Inference Support (#12457) Si1w 2025-03-27 10:49:15 +00:00
953c2a62cf

model : restore support for T5Encoder (#12590) HighDoping 2025-03-27 18:43:33 +08:00
d5c6309d91

convert : Support Qwen2_5_VLForConditionalGeneration (#12595) Csaba Kecskemeti 2025-03-27 03:11:23 -07:00
029c693fdc

sync : ggml Georgi Gerganov 2025-03-27 09:36:13 +02:00
771d84371c

scripts : update sync + fix cmake merge Georgi Gerganov 2025-03-27 09:22:30 +02:00
df0665a483

sync : ggml Georgi Gerganov 2025-03-27 09:01:21 +02:00
0306aad1ca

cmake : sync/merge PowerPC build commands (#0) Georgi Gerganov 2025-03-27 09:00:57 +02:00
c7b43ab608

llamafile : ppc64le MMA implementation for Q4_0. (#12489) amritahs-ibm 2025-03-27 12:21:47 +05:30
24feaec057

ggml : riscv: add 128-bit RVV support (#12530) xctan 2025-03-27 14:38:34 +08:00
f28bc4c286

llama : make loras compatible with repacking (#12593) Georgi Gerganov 2025-03-27 08:24:10 +02:00
f17a3bb4e8

SYCL: implement memset ggml backend buffer interface (#12580) Akarshan Biswas 2025-03-27 07:16:00 +05:30
bd40678df7

HIP: Add support for RDNA4 targets (#12372) Slobodan Josic 2025-03-26 23:46:30 +01:00
b3298fa47a

metal : refactor mat-vec code (#12569) Georgi Gerganov 2025-03-26 21:38:38 +02:00
2447ad8a98

upgrade to llguidance 0.7.10 (#12576) Michał Moskal 2025-03-26 11:06:09 -07:00
02082f1519

clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566) Ivy233 2025-03-26 22:06:04 +08:00
df4d20cd53

convert : fix squeeze for ssm_conv tensors (#12573) Georgi Gerganov 2025-03-26 14:21:05 +02:00
5ed38b6852

ggml : fix MUL_MAT_ID repack with Q8_K (#12544) Georgi Gerganov 2025-03-26 13:02:00 +02:00
fd7855f8f5

doc: [MUSA] minor changes (#12583) R0CKSTAR 2025-03-26 15:09:48 +08:00
53af4dba42

convert: fix Mistral3/Gemma3 model hparams init (#12571) Sigbjørn Skjæret 2025-03-25 23:03:10 +01:00
ef19c71769

run: de-duplicate fmt and format functions and optimize (#11596) Eric Curtin 2025-03-25 17:46:11 +00:00
053b3f9aae

ggml-cpu : update KleidiAI to v1.5.0 (#12568) Dan Johansson 2025-03-25 12:10:18 +01:00
e2f560175a

SYCL: disable Q4_0 reorder optimization (#12560) Akarshan Biswas 2025-03-25 16:10:18 +05:30
36ee06dd2d

docs : add build instructions for KleidiAI (#12563) Dan Johansson 2025-03-25 10:35:20 +01:00
3cd3a39532

ci: [MUSA] add CI and update doc (#12562) R0CKSTAR 2025-03-25 15:45:08 +08:00
2d77d88e70

context : fix worst-case reserve outputs (#12545) Georgi Gerganov 2025-03-25 09:19:23 +02:00
c95fa362b3

ci: [SYCL] ggml-ci Use main GPU and enable sysman (#12547) Akarshan Biswas 2025-03-24 23:05:38 +05:30
2b65ae3029

opencl: simplify kernel embedding logic in cmakefile (#12503) lhez 2025-03-24 09:20:47 -07:00
48d7021c61

CI: fix SYCL build (#12546) Akarshan Biswas 2025-03-24 18:28:32 +05:30
3361e2deba

docs: update: improve the Fedora CUDA guide (#12536) Tei Home 2025-03-24 19:02:26 +08:00
00d53800e0

llama-vocab : add SuperBPE pre-tokenizer (#12532) compilade 2025-03-24 06:47:24 -04:00
7ea75035b6

CUDA: Fix clang warnings (#12540) R0CKSTAR 2025-03-24 18:28:34 +08:00
c54f6b7988

mmap : skip resource limit checks on AIX (#12541) Prajwal B Mehendarkar 2025-03-24 15:47:10 +05:30
9b169a4d4e

vulkan: fix mul_mat_vec failure in backend tests (#12529) Jeff Bolz 2025-03-24 01:56:17 -05:00
77f9c6bbe5

server : Add verbose output to OAI compatible chat endpoint. (#12246) Marius Gerdes 2025-03-23 19:30:26 +01:00
18b663d8e4

install : add macports (#12518) Lars Sonchocky-Helldorf 2025-03-23 09:21:48 +01:00