Commit graph

  • ccc4c16970
    Update rocm build master ver4a 2025-05-10 01:01:02 +02:00
  • 3a9457df96
    vulkan: update windows SDK in CI (#14334) Jeff Bolz 2025-06-23 03:19:24 -05:00
  • fa4a9f2a1c
    quantize : handle user-defined pruning of whole layers (blocks) (#13037) Ed Addario 2025-06-22 22:16:26 +01:00
  • 238005c2dc
    gguf-py : fix SpecialVocab parsing when post_processor is null (#14330) Sigbjørn Skjæret 2025-06-22 19:46:17 +02:00
  • 66aba7aca9
    run : avoid double tokenization (#14327) Ruikai Peng 2025-06-23 01:28:06 +08:00
  • f1f5e82df6
    examples : fix is_first logic for tokenization (#14329) Georgi Gerganov 2025-06-22 20:10:07 +03:00
  • af3373f1ad
    HIP: enable vec fattn on RDNA4 (#14323) uvos 2025-06-22 16:51:23 +02:00
  • 5d5c066de8
    mtmd : fix Pixtral OOM with large images by capping image_size to 1024 (#14326) yuiseki 2025-06-22 21:44:57 +09:00
  • 40bfa04c95
    common : use std::string_view now that we target c++17 (#14319) Sigbjørn Skjæret 2025-06-22 07:37:43 +02:00
  • aa064b2eb7
    CUDA: add mean operation (#14313) Aman Gupta 2025-06-22 12:39:54 +08:00
  • aa0ef5c578
    gguf-py : fix Qwen3-Embedding eos token (#14314) Sigbjørn Skjæret 2025-06-21 18:12:05 +02:00
  • bb16041cae
    Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792) Markus Tavenrath 2025-06-21 08:17:12 +02:00
  • 58cba76a9a
    gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312) Sigbjørn Skjæret 2025-06-21 07:33:21 +02:00
  • 67ae5312e2
    metal : fix thread-safety (#14300) Georgi Gerganov 2025-06-21 08:04:18 +03:00
  • 692e3cdd0a
    memory : rename interface to llama_memory_context_i (#14296) Georgi Gerganov 2025-06-21 08:03:46 +03:00
  • b23fa0b3f4
    convert : fix Llama 4 conversion (#14311) Daniel Han 2025-06-20 21:32:01 -07:00
  • 06cbedfca1
    sync : ggml Georgi Gerganov 2025-06-20 20:50:24 +03:00
  • b7147673f2
    Add ggml_roll (ggml/1274) Acly 2025-06-18 13:34:50 +02:00
  • d860dd99a4
    docs : fix the link to llama.h (#14293) David Chiu 2025-06-21 01:43:35 +08:00
  • c959f462a0
    CUDA: add conv_2d_transpose (#14287) Aman Gupta 2025-06-20 22:48:24 +08:00
  • 22015b2092
    lint : remove trailing whitespace (#14304) Sigbjørn Skjæret 2025-06-20 16:37:44 +02:00
  • dd6e6d0b6a
    vocab : prevent tokenizer overflow (#14301) Ruikai Peng 2025-06-20 22:13:06 +08:00
  • 8308f98c7f
    sycl: add usage of enqueue_functions extension (#14244) Nicolò Scipione 2025-06-20 15:07:21 +02:00
  • 6369be0735
    Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286) Christian Kastner 2025-06-20 12:17:32 +00:00
  • 88fc854b4b
    llama : improve sep token handling (#14272) Sigbjørn Skjæret 2025-06-20 14:04:09 +02:00
  • e28c1b93fd
    cuda : synchronize graph capture and cublas handle destruction (#14288) Diego Devesa 2025-06-20 04:57:36 -07:00
  • d27b3ca175
    ggml : fix repack work size for mul_mat_id (#14292) Georgi Gerganov 2025-06-20 11:19:15 +03:00
  • 9230dbe2c7
    ggml: Update KleidiAI to v1.9.0 (#14277) Charles Xu 2025-06-20 09:51:01 +02:00
  • 812939a9e9
    model : more uniform output id handling (#14275) Georgi Gerganov 2025-06-20 10:50:27 +03:00
  • 4c9fdfbe15
    ubatch : new splitting logic (#14217) Georgi Gerganov 2025-06-20 10:14:14 +03:00
  • 9eaa51e7f0
    CUDA: add conv_2d_dw (#14265) Aman Gupta 2025-06-20 09:50:24 +08:00
  • 8f71d0f3e8
    ggml-cpu : remove unnecessary arm feature detection (#14281) Diego Devesa 2025-06-19 12:24:14 -07:00
  • 381174bbda
    gguf-py : make sentencepiece optional (#14200) Alex Trotta 2025-06-19 09:56:12 -04:00
  • d67341dc18
    server : add server parameters for draft model cache type (#13782) aa956 2025-06-19 16:01:03 +03:00
  • 456af35eb7
    build : suppress gcc15 compile warnings (#14261) fanyang 2025-06-19 20:49:48 +08:00
  • 600e3e9b50
    sycl: Cleanup codepaths in Get Rows in sycl backend (#14215) Anton Mitkov 2025-06-19 11:40:21 +01:00
  • fffcce535e
    llama-bench : add --no-warmup flag (#14224) (#14270) bashayer hijji 2025-06-19 13:24:12 +03:00
  • 5fc7856815
    convert : fix remote option in Windows (#14100) pqnet 2025-06-19 12:21:40 +02:00
  • faed5a5f5d
    llamafile : support s390x SIMD instruction set (#14273) Aaron Teo 2025-06-19 17:48:54 +08:00
  • 10bb545c5b
    Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (#14249) 0cc4m 2025-06-19 09:15:42 +02:00
  • edc4a29eff
    memory : Hybrid recurrent cache (#13979) Gabe Goodhart 2025-06-19 00:08:14 -05:00
  • ed3290ab34
    metal : add mean kernel (#14267) Georgi Gerganov 2025-06-19 08:05:21 +03:00
  • 8d94713654
    docs: add s390x build documentation (#14264) Aaron Teo 2025-06-19 01:10:26 +08:00
  • 50d2227953
    ggml-cpu: reduce asm calls for hsum (#14037) Aaron Teo 2025-06-19 01:10:08 +08:00
  • 6231c5cd6d
    ggml-cpu: fix uncaught underscore terminators (#14023) Aaron Teo 2025-06-19 01:06:49 +08:00
  • ef035803eb
    ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258) Charles Xu 2025-06-18 13:40:07 +02:00
  • 413977de32
    mtmd : refactor llava-uhd preprocessing logic (#14247) Xuan-Son Nguyen 2025-06-18 10:43:57 +02:00
  • 95402553a5
    llama-chat : fix multiple system message for gemma, orion (#14246) Xuan-Son Nguyen 2025-06-18 09:58:43 +02:00
  • 3865cff4f5
    convert : fix null head_dim AutoConfig regression (#14248) Sigbjørn Skjæret 2025-06-18 09:52:07 +02:00
  • d03172cc79
    sync : ggml Georgi Gerganov 2025-06-18 09:58:23 +03:00
  • dd8e59f443
    ggml : disable warnings for tests when using MSVC (ggml/1273) Daniel Bevenius 2025-06-13 15:06:42 +02:00
  • bbe98d2784
    ggml : remove unused ggml_context_container (ggml/1272) Daniel Bevenius 2025-06-13 09:05:44 +02:00
  • c2056ed6d4
    examples : include examples in msvc disable warn (ggml/1270) Daniel Bevenius 2025-06-12 12:27:09 +02:00
  • c46503014d
    cmake: remove shader-gen step-targets from ggml-vulkan (#14226) bandoti 2025-06-17 17:33:25 -03:00
  • 860a9e4eef
    ggml-cpu : remove the weak alias trick (#14221) xctan 2025-06-17 17:58:32 +08:00
  • fe9d60e74a
    musa: fix build warning (unused variable) (#14231) R0CKSTAR 2025-06-17 17:48:08 +08:00
  • e434e69183
    common : suggest --jinja when autodetection fails (#14222) Sigbjørn Skjæret 2025-06-16 21:58:42 +02:00
  • 89fea80d29
    server : fix incorrect usage of llama_get_embeddings() (#14225) Georgi Gerganov 2025-06-16 22:33:27 +03:00
  • 6adc3c3ebc
    llama : add thread safety test (#14035) Diego Devesa 2025-06-16 08:11:43 -07:00
  • 0dbcabde8c
    cmake: clean up external project logic for vulkan-shaders-gen (#14179) bandoti 2025-06-16 10:32:13 -03:00
  • ad590be98c
    model : add NeoBERT (#14164) Đinh Trọng Huy 2025-06-16 21:53:41 +09:00
  • 7d6d91babf
    HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202) uvos 2025-06-16 13:47:38 +02:00
  • d3e64b9f49
    llama : rework embeddings logic (#14208) Georgi Gerganov 2025-06-16 14:14:00 +03:00
  • 3ba0d843c6
    ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206) Charles Xu 2025-06-16 11:47:57 +02:00
  • 0bf49eb668
    convert : remove arcee change in convert_hf_to_gguf_update.py (#14207) Bartowski 2025-06-16 09:16:06 +01:00
  • 4ad243677b
    gguf-py : allow key override when adding value to GGUFWriter (#14194) Đinh Trọng Huy 2025-06-16 16:20:59 +09:00
  • c89c2d1ab9
    vulkan: mutex around vkQueueSubmit (#14127) Jeff Bolz 2025-06-16 00:21:08 -06:00
  • 3555b3004b
    ggml-cpu : rework weak alias on apple targets (#14146) xctan 2025-06-16 13:54:15 +08:00
  • d7da8dc83a
    model : Add support for Arcee AI's upcoming AFM model (#14185) Bartowski 2025-06-16 00:04:06 +01:00
  • cd355eda7d
    server : When listening on a unix domain socket don't print http:// and port (#14180) Eric Curtin 2025-06-15 23:36:22 +02:00
  • 30e5b01de2
    quantize : change int to unsigned int for KV overrides (#14197) Ed Addario 2025-06-15 17:53:45 +01:00
  • e54b394082
    CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196) uvos 2025-06-15 17:30:13 +02:00
  • 2c2caa4443
    HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183) uvos 2025-06-15 15:45:27 +02:00
  • 5fce5f948d
    kv-cache : fix use-after-move of defrag info (#14189) Georgi Gerganov 2025-06-15 10:52:11 +03:00
  • 9ae4143bc6
    model : add dots.llm1 architecture support (#14044) (#14118) Mikko Juola 2025-06-15 00:52:06 -07:00
  • c311ac664d
    cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) Georgi Gerganov 2025-06-15 10:08:58 +03:00
  • b9912ac570
    batch : auto-gen positions + verify multi-sequence input (#14177) Georgi Gerganov 2025-06-15 09:18:37 +03:00
  • 00ba772610
    docs : remove WIP since PR has been merged (#13912) Pepijn de Vos 2025-06-15 08:06:37 +02:00
  • 3cb203c89f
    llama-chat : Do not throw when tool parsing fails (#14012) Piotr 2025-06-14 18:25:15 +02:00
  • 2e42be42bd
    compare-llama-bench: add option to plot (#14169) Aman Gupta 2025-06-14 16:34:20 +08:00
  • fb85a288d7
    vocab : fix build (#14175) Georgi Gerganov 2025-06-13 20:03:05 +03:00
  • 40643edb86
    sycl: fix docker image (#14144) Svetlozar Georgiev 2025-06-13 17:32:56 +01:00
  • 3cfbbdb44e
    Merge commit from fork Guy Goldenberg 2025-06-13 19:20:25 +03:00
  • 80709b70a2
    batch : add LLAMA_BATCH_DEBUG environment variable (#14172) Georgi Gerganov 2025-06-13 18:35:00 +03:00
  • 26ff3685bf
    docs : Update multimodal.md (#14122) ddpasa 2025-06-13 15:17:53 +02:00
  • 60c666347b
    batch : rework llama_batch_allocr (#14153) Georgi Gerganov 2025-06-13 13:47:55 +03:00
  • b7cc7745e3
    readme : remove survey link (#14168) Georgi Gerganov 2025-06-13 11:55:44 +03:00
  • cc8d081879
    cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) Christian Kastner 2025-06-13 08:38:52 +00:00
  • d714dadb57
    pooling : make cls_b and cls_out_b optional (#14165) Đinh Trọng Huy 2025-06-13 17:34:08 +09:00
  • ffad043973
    server : fix SWA condition for full context reprocess (#14163) Georgi Gerganov 2025-06-13 11:18:25 +03:00
  • 0889eba570
    sycl: Adding additional cpy dbg print output (#14034) Anton Mitkov 2025-06-13 08:51:39 +01:00
  • c61285e739
    SYCL: Bump oneMath commit (#14152) Ewan Crawford 2025-06-13 08:45:37 +01:00
  • 09cf2c7c65
    cmake : Improve build-info.cpp generation (#14156) Christian Kastner 2025-06-13 06:51:34 +00:00
  • c33fe8b8c4
    vocab : prevent heap overflow when vocab is too small (#14145) Georgi Gerganov 2025-06-13 08:03:54 +03:00
  • ed52f3668e
    sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125) Anton Mitkov 2025-06-12 14:15:11 +01:00
  • a681b4ba83
    readme : remove project status link (#14149) Georgi Gerganov 2025-06-12 14:43:09 +03:00
  • 7d516443dd
    server : re-enable SWA speculative decoding (#14131) Georgi Gerganov 2025-06-12 11:51:38 +03:00
  • f6e1a7aa87
    context : simplify output counting logic during decode (#14142) Georgi Gerganov 2025-06-12 11:50:01 +03:00
  • c3ee46fab4
    batch : remove logits_all flag (#14141) Georgi Gerganov 2025-06-12 11:49:26 +03:00
  • e2c0b6e46a
    cmake : handle whitespaces in path during metal build (#14126) Georgi Gerganov 2025-06-12 10:14:24 +03:00