llama.cpp

ver4a/llama.cpp

Commit graph

9596506965 kv-cache : fix split_equal handling in unified implementation (#14130) Georgi Gerganov 2025-06-12 10:02:15 +03:00
a20b2b05bc context : round n_tokens to next multiple of n_seqs when reserving (#14140) compilade 2025-06-12 02:56:04 -04:00
2e89f76b7a common: fix issue with regex_escape routine on windows (#14133) bandoti 2025-06-11 17:19:44 -03:00
532802f938 Implement GGML_CPU_ALL_VARIANTS for ARM (#14080) Christian Kastner 2025-06-11 19:07:44 +00:00
d4e0d95cf5 chore : clean up relative source dir paths (#14128) Sigbjørn Skjæret 2025-06-11 19:04:23 +02:00
cc66a7f78f tests : add test-tokenizers-repo (#14017) Sigbjørn Skjæret 2025-06-11 17:16:32 +02:00
bd248d4dc7 vulkan: Better thread-safety for command pools/buffers (#14116) Jeff Bolz 2025-06-11 09:48:52 -05:00
7781e5fe99 webui: Wrap long numbers instead of infinite horizontal scroll (#14062) Aman 2025-06-11 22:42:25 +08:00
89a184fa71 kv-cache : relax SWA masking condition (#14119) Georgi Gerganov 2025-06-11 16:48:45 +03:00
2baf07727f server : pass default --keep argument (#14120) Taylor 2025-06-11 06:43:43 -04:00
7ae2932116 kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121) Georgi Gerganov 2025-06-11 12:52:45 +03:00
1f7d50b293 vulkan: Track descriptor pools/sets per-context (#14109) Jeff Bolz 2025-06-11 00:19:25 -05:00
4c763c8d1b opencl: add mul_mv_id_q4_0_f32_8x_flat (#14003) lhez 2025-06-10 16:55:58 -07:00
dad5c44398 kv-cache : avoid modifying recurrent cells when setting inputs (#13834) compilade 2025-06-10 18:20:14 -04:00
55f6b9fa65 convert : fix duplicate key DeepSeek-R1 conversion error (#14103) Sigbjørn Skjæret 2025-06-10 23:29:52 +02:00
3678b838bb llama : support GEGLU for jina-bert-v2 (#14090) Sigbjørn Skjæret 2025-06-10 18:02:08 +02:00
652b70e667 vulkan: force device 0 in CI (#14106) Jeff Bolz 2025-06-10 10:53:47 -05:00
3a12db23b6 Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104) Juk Armstrong 2025-06-10 16:48:07 +01:00
ae92c1855b sync : ggml Georgi Gerganov 2025-06-10 17:37:45 +03:00
b7ce1ad1e3 ggml : fix weak alias win32 (whisper/0) Georgi Gerganov 2025-06-10 11:34:10 +03:00
97340b4c99 Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (#14099) 0cc4m 2025-06-10 14:01:33 +02:00
2bb0467043 rpc : nicer error messages for RPC server crash (#14076) Isaac McFadyen 2025-06-10 02:41:01 -04:00
b8e2194efc sync : ggml Georgi Gerganov 2025-06-10 09:20:51 +03:00
1a3b5e80f7 Add in-build ggml::ggml ALIAS library (ggml/1260) Kai Pastor 2025-06-03 12:33:28 +02:00
1f63e75f3b metal : use less stack memory in FA kernel (#14088) Georgi Gerganov 2025-06-09 23:05:02 +03:00
40cbf571c9 kv-cache : fix shift and defrag logic (#14081) Georgi Gerganov 2025-06-09 23:04:35 +03:00
7f4fbe5183 llama : allow building all tests on windows when not using shared libs (#13980) Diego Devesa 2025-06-09 11:03:09 -07:00
f470bc36be ggml-cpu : split arch-specific implementations (#13892) xctan 2025-06-09 22:47:13 +08:00
8f47e25f56 cuda : fix device sync on buffer clear (#14033) Diego Devesa 2025-06-09 07:36:26 -07:00
201b31dc2e graph : fix geglu (#14077) Georgi Gerganov 2025-06-09 17:17:31 +03:00
e21d2d4ae2 CANN: Simplify the environment variable setting(#13104) Xinpeng Dou 2025-06-09 19:47:39 +08:00
dc0623fddb webui: fix sidebar being covered by main content (#14082) R0CKSTAR 2025-06-09 18:01:17 +08:00
87d34b381d server : fix LRU check (#14079) Georgi Gerganov 2025-06-09 12:57:58 +03:00
b460d16ae8 sycl: Add reorder to Q6_K mmvq implementation (#13885) Nicolò Scipione 2025-06-09 11:47:07 +02:00
91a8ee6a6f add geglu activation function (#14074) Đinh Trọng Huy 2025-06-09 13:15:31 +09:00
056eb74534 CANN: Enable labeler for Ascend NPU (#13914) Yuanhao Ji 2025-06-09 11:20:06 +08:00
247e5c6e44 cuda : fix buffer type check with integrated GPUs (#14069) Diego Devesa 2025-06-08 11:39:56 -07:00
5787b5da57 ci: add LoongArch cross-compile build (#13944) 吴小白 2025-06-07 21:39:11 +08:00
228f34c9ce SYCL: Implement few same quantized type copy kernels (#13739) Akarshan Biswas 2025-06-07 18:58:20 +05:30
0974ad7a7c llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050) Sigbjørn Skjæret 2025-06-07 14:13:12 +02:00
745aa5319b llama : deprecate llama_kv_self_ API (#14030) Georgi Gerganov 2025-06-06 14:11:15 +03:00
487a5e0401 context : fix SWA-related warning for multiple sequences (#14045) Georgi Gerganov 2025-06-06 13:29:18 +03:00
d17a809ef0 llama : support multiple classifier outputs and labels (#13940) Sigbjørn Skjæret 2025-06-06 09:03:25 +02:00
1caae7fc6c gguf-py : add add_classifier_output_labels method to writer (#14031) Sigbjørn Skjæret 2025-06-05 17:42:31 +02:00
669c13e0f6 vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001) Masato Nakasaka 2025-06-05 23:00:29 +09:00
146b88e8b3 ci: fix CUDA build failure on autodl cloud machines (#14005) pockers21 2025-06-05 06:25:29 -07:00
7f37b6cf1e memory : migrate from llama_kv_cache to more generic llama_memory (#14006) Georgi Gerganov 2025-06-05 15:29:22 +03:00
3a077146a4 llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) Diego Devesa 2025-06-05 02:57:42 -07:00
d01d112abb readme : add badge (#13938) Olexandr88 2025-06-05 10:50:55 +03:00
9f47fa5792 vocab : warn about missing mask token (#14022) Sigbjørn Skjæret 2025-06-05 09:29:18 +02:00
9e31bec4fd context : fix pos_min initialization upon error decode (#14008) Georgi Gerganov 2025-06-05 09:06:29 +03:00
5a8ae3053c vulkan: automatically deduce size of push constants (#13936) Jeff Bolz 2025-06-05 00:17:58 -05:00
0d3984424f ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813) Ervin Áron Tasnádi 2025-06-04 22:02:00 +02:00
3e63a58ef7 kv-cache : refactor the update/defrag mechanism (#13988) Georgi Gerganov 2025-06-04 18:58:20 +03:00
2589ad3704 ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997) Diego Devesa 2025-06-04 06:37:40 -07:00
482548716f releases : use dl backend for linux release, remove arm64 linux release (#13996) Diego Devesa 2025-06-04 04:15:54 -07:00
3ac67535c8 llama-graph : use ggml_repeat_4d (#13998) Xuan-Son Nguyen 2025-06-04 10:11:26 +02:00
0b4be4c435 CUDA: fix FTZ in FA for Gemma 3 (#13991) Johannes Gäßler 2025-06-04 08:57:05 +02:00
e0e806f52e kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985) Georgi Gerganov 2025-06-04 09:50:32 +03:00
7e00e60ef8 vulkan: fix warnings in perf logger querypool code (#13937) Jeff Bolz 2025-06-03 13:30:22 -05:00
ea1431b0fa docs : add "Quick start" section for new users (#13862) Xuan-Son Nguyen 2025-06-03 13:09:36 +02:00
71e74a3ac9 opencl: add backend_synchronize (#13939) lhez 2025-06-02 16:54:58 -07:00
bfb1e012a0 OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840) rmatif 2025-06-02 23:53:36 +00:00
3637576288 server : disable speculative decoding for SWA models (#13970) Georgi Gerganov 2025-06-02 21:34:40 +03:00
ea394d7ab1 metal : use F32 accumulators in FA kernels (#13975) Georgi Gerganov 2025-06-02 21:33:40 +03:00
5582c49c39 gemma : more consistent attention scaling for v2 and v3 (#13951) Georgi Gerganov 2025-06-02 20:54:26 +03:00
c9bbc77931 server: update deepseek reasoning format (pass reasoning_content as diffs) (#13933) Olivier Chafik 2025-06-02 10:15:44 -07:00
bfd322796c mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961) Xuan-Son Nguyen 2025-06-02 16:29:28 +02:00
093e3f1feb cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966) shalinib-ibm 2025-06-02 17:48:36 +05:30
663445b0de sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826) Atharva Dubey 2025-06-02 10:12:20 +01:00
7675c555a1 gguf: fix failure on version == 0 (#13956) Johannes Gäßler 2025-06-01 18:08:05 +02:00
5e1c3aed40 convert : fix nomic-bert-moe mask token (#13757) Sigbjørn Skjæret 2025-06-01 18:07:21 +02:00
c496fe0b1d convert : fix vocab padding code for bert models (#13954) Sigbjørn Skjæret 2025-06-01 17:23:11 +02:00
e57bb87ced ggml: check if non-native endian model is being loaded (#13943) Aaron Teo 2025-06-01 22:53:57 +08:00
f3a4b1659c sync : ggml Georgi Gerganov 2025-06-01 12:23:14 +03:00
108009f5c7 vulkan : Remove unexpected ; (ggml/1253) Kai Pastor 2025-05-31 12:49:55 +02:00
d337252acf cmake : Fix broken CMake error messages (ggml/1252) Kai Pastor 2025-05-31 12:39:19 +02:00
af6f91db47 ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247) Radoslav Gerganov 2025-05-30 09:11:09 +03:00
a7b8d35f78 sync : whisper.cpp (ggml/1250) Georgi Gerganov 2025-05-29 13:29:50 +03:00
6eba72b71c ggml : install dynamic backends (ggml/1240) Radoslav Gerganov 2025-05-29 08:34:46 +03:00
fedf034a98 ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) Daniel Tang 2025-05-27 20:58:46 -04:00
8726392d3d readme : update bindings (#13950) ddh0 2025-06-01 03:44:30 -05:00
c04621711a parallel : fix n_junk == 0 (#13952) Georgi Gerganov 2025-06-01 11:42:16 +03:00
0fc16b42e8 kv-cache : split implementation in separate sources (#13920) Georgi Gerganov 2025-06-01 11:39:27 +03:00
053b1539c0 threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995) Max Krasnyansky 2025-05-31 15:39:19 -07:00
b3a89c3d9e docs : Note about necessity of having libcurl installed for standard build. (#13945) Jiří Podivín 2025-05-31 18:58:35 +02:00
e15898d1c7 server: allow unclosed thinking tags (#13931) Olivier Chafik 2025-05-31 08:26:10 -07:00
803f8baf4f llama : deprecate explicit kv_self defrag/update calls (#13921) Georgi Gerganov 2025-05-31 15:58:33 +03:00
3600cc2886 llama : use n_swa + n_ubatch cells for SWA cache (#13833) Georgi Gerganov 2025-05-31 15:57:44 +03:00
c7e0a2054b webui : Replace alert and confirm with custom modals. (#13711) igardev 2025-05-31 12:56:08 +03:00
3f55f781f1 llama : auto-batch preparation (#13845) Georgi Gerganov 2025-05-31 12:55:57 +03:00
51fa76f172 mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change) (#13917) Xuan-Son Nguyen 2025-05-31 10:14:29 +02:00
12d0188c0d kv-cache : refactor + add llama_memory_state_i (#13746) Georgi Gerganov 2025-05-31 10:24:04 +03:00
eb3949938e CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) (#13895) Shawn yang 2025-05-31 14:48:04 +08:00
e562eece7c CUDA: fix typo in FlashAttention code (#13926) Johannes Gäßler 2025-05-30 21:22:03 +02:00
b47ab7b8e9 sched : avoid changing cur_copy when a graph is already allocated (#13922) Diego Devesa 2025-05-30 09:56:19 -07:00
dd665cc9d4 parallel : increase the variability of the prompt lengths (#13927) Georgi Gerganov 2025-05-30 19:38:07 +03:00
df0c0c7d02 cuda : prevent using split buffers with 3d/4d matrices (#13919) Diego Devesa 2025-05-30 07:37:18 -07:00
b49a8ff96b SYCL: Add mrope kernel (#13755) Akarshan Biswas 2025-05-30 19:40:57 +05:30
53f925074d sync : vendor (#13901) Georgi Gerganov 2025-05-30 16:25:45 +03:00