ver4a/llama.cpp

Commit graph

abd4d0bc4f

speculative : update default params (#11954) Georgi Gerganov 2025-02-19 13:29:42 +02:00
9626d9351a

llama : fix indentation in llama-grammar [no ci] (#11943) Daniel Bevenius 2025-02-19 06:16:23 +01:00
b58934c183

server : (webui) Enable communication with parent html (if webui is in iframe) (#11940) igardev 2025-02-19 00:01:44 +02:00
63e489c025

tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900) Olivier Chafik 2025-02-18 18:03:23 +00:00
63ac128563

server : add TEI API format for /rerank endpoint (#11942) Xuan-Son Nguyen 2025-02-18 14:21:41 +01:00
5137da7b8c

scripts: corrected encoding when getting chat template (#11866) (#11907) MoonRide303 2025-02-18 10:30:16 +01:00
09aaf4f1f5

docs : Fix duplicated file extension in test command (#11935) xiaobing318 2025-02-18 17:12:49 +08:00
73e2ed3ce3

CUDA: use async data loading for FlashAttention (#11894) Johannes Gäßler 2025-02-17 14:03:24 +01:00
f7b1116af1

update release requirements (#11897) Eve 2025-02-17 11:20:23 +00:00
c4d29baf32

server : fix divide-by-zero in metrics reporting (#11915) Antoine Viallon 2025-02-17 11:25:12 +01:00
2eea03d86a

vulkan: implement several ops relevant for ggml_opt (#11769) Rémy O 2025-02-17 07:55:57 +01:00
0f2bbe6564

server : bump httplib to 0.19.0 (#11908) Xuan-Son Nguyen 2025-02-16 18:11:22 +01:00
fe163d5bf3

common : Fix a typo in help (#11899) standby24x7 2025-02-16 18:51:13 +09:00
818a340ea8

ci : fix (again) arm64 build fails (#11895) Xuan-Son Nguyen 2025-02-16 10:36:39 +01:00
bf42a23d0a

vulkan: support multi/vision rope, and noncontiguous rope (#11902) Jeff Bolz 2025-02-16 01:52:23 -06:00
c2ea16f260

metal : fix the crash caused by the lack of residency set support on Intel Macs. (#11904) Hale Chan 2025-02-16 14:50:26 +08:00
6dde178248

scripts: fix compare-llama-bench commit hash logic (#11891) Johannes Gäßler 2025-02-15 20:23:22 +01:00
fc10c38ded

examples: fix typo in imatrix/README.md (#11884) 708-145 2025-02-15 20:03:30 +01:00
22885105a6

metal : optimize dequant q6_K kernel (#11892) Adrian Kretz 2025-02-15 19:39:20 +01:00
c2cd24fbfd

readme : add notice about new package registry (#11890) Georgi Gerganov 2025-02-15 20:29:56 +02:00
68ff663a04

repo : update links to new url (#11886) Georgi Gerganov 2025-02-15 16:40:57 +02:00
f355229692

server: fix type promotion typo causing crashes w/ --jinja w/o tools (#11880) Olivier Chafik 2025-02-15 10:11:36 +00:00
fc1b0d0936

vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528) Rémy O 2025-02-15 09:01:40 +01:00
89daa2564f

llguidance build fixes for Windows (#11664) Michał Moskal 2025-02-14 12:46:08 -08:00
300907b211

opencl: Fix rope and softmax (#11833) lhez 2025-02-14 11:12:23 -08:00
94b87f87b5

cuda : add ampere to the list of default architectures (#11870) Diego Devesa 2025-02-14 15:33:52 +01:00
dbc2ec59b5

docker : drop to CUDA 12.4 (#11869) Georgi Gerganov 2025-02-14 14:48:40 +02:00
3d68f034da

llama : add completion for --chat-template-file (#11860) Daniel Bevenius 2025-02-14 11:16:56 +01:00
38e32eb6a0

ggml: optimize some vec dot functions for LoongArch ASX (#11842) Jinyang He 2025-02-14 16:54:27 +08:00
a4f011e8d0

vulkan: linux builds + small subgroup size fixes (#11767) Eve 2025-02-14 02:59:40 +00:00
a7b8ce2260

llama-bench : fix unexpected global variable initialize sequence issue (#11832) theraininsky 2025-02-14 09:13:43 +08:00
04045bb842

readme : minor Georgi Gerganov 2025-02-14 00:16:56 +02:00
8a8c4ceb60

llamafile: use member variable instead of constant for iq4nlt (#11780) Jeffrey Morgan 2025-02-13 09:05:04 -08:00
c1f958c038

server : (docs) Update wrong tool calling example (#11809) Reza Rahemtola 2025-02-13 17:22:44 +01:00
c48f630d1c

llama : add --completion-bash option (#11846) Daniel Bevenius 2025-02-13 14:46:59 +01:00
bd6e55bfd3

musa: bump MUSA SDK version to rc3.1.1 (#11822) R0CKSTAR 2025-02-13 20:28:18 +08:00
c7f460ab88

server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless --reasoning-format none (#11607) Olivier Chafik 2025-02-13 10:05:16 +00:00
27e8a23300

sampling: add Top-nσ sampler (#11223) Vinesh Janarthanan 2025-02-13 00:45:57 -06:00
e4376270d9

llama.cpp: fix warning message (#11839) Oleksandr Kuvshynov 2025-02-13 01:25:34 -05:00
3e69319772

llama : update llama_decode_internal ref [no ci] (#11840) Daniel Bevenius 2025-02-13 07:07:51 +01:00
a394039db0

ggml-cpu : add chunking support to mul_mat_id (#11666) Diego Devesa 2025-02-13 01:02:38 +01:00
be3bbd6215

ggml : x2 speed for WASM by optimizing SIMD (#11453) Xuan-Son Nguyen 2025-02-13 00:33:45 +01:00
31afcbee0e

server : (webui) Give copy button back to all message bubbles (#11814) Woof Dog 2025-02-12 22:47:11 +00:00
5c4284d57b

HIP: Remove GCN from list of devices that avoid MMQ (#11831) uvos 2025-02-12 22:25:28 +01:00
bfd11a2344

Fix: Compile failure due to Microsoft STL breaking change (#11836) JC 2025-02-12 20:36:11 +00:00
0fb77f821f

sync : ggml Georgi Gerganov 2025-02-12 21:46:02 +02:00
e598697d63

HIP: Switch to std::vector in rocblas version check (#11820) uvos 2025-02-12 17:25:03 +01:00
fef0cbeadf

cleanup: fix compile warnings associated with gnu_printf (#11811) bandoti 2025-02-12 10:06:53 -04:00
748ee9fe93

ggml : fix multi-threaded clamp_f32 (#11824) Richard 2025-02-12 13:57:33 +00:00
198b1ec611

ggml-cpu: Fix duplicate MATMUL_INT8 (#11817) Weizhao Ouyang 2025-02-12 20:22:58 +08:00
c3d6af7cd2

CUDA: fix CUDART_VERSION checks (#11821) Johannes Gäßler 2025-02-12 13:16:39 +01:00
369be5598a

llama : fix typo in llama-grammar.h [no ci] (#11816) Daniel Bevenius 2025-02-12 08:40:01 +01:00
4078c77f98

docs: add OpenCL (#11697) lhez 2025-02-11 14:04:13 -08:00
90e4dba461

Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (#11803) Sheldon Robinson 2025-02-11 10:55:45 -05:00
a18f481f99

server : use common_token_to_piece instead of common_detokenize (#11740) Daniel Bevenius 2025-02-11 14:06:45 +01:00
b9ab0a4d0b

CUDA: use arch list for compatibility check (#11775) Johannes Gäßler 2025-02-11 00:17:22 +01:00
7b891bdc86

fix: typos in documentation files (#11791) Maxim Evtush 2025-02-10 23:21:31 +01:00
81732619fd

docs: utilize the forward slash (/) as the path separator for Unix-like systems (#11770) jason_w 2025-02-11 06:17:48 +08:00
507f9174fe

server : (webui) introduce conversation branching + idb storage (#11792) Xuan-Son Nguyen 2025-02-10 21:23:17 +01:00
19b392d58d

llama-mmap: fix missing include (#11796) Wilken Gottwalt 2025-02-10 19:58:18 +01:00
0893e0114e

server : correct signal handler (#11795) Xuan-Son Nguyen 2025-02-10 18:03:28 +01:00
d7b31a9d84

sync: minja (a72057e519) (#11774) Olivier Chafik 2025-02-10 09:34:09 +00:00
9ac3457b39

Update README.md [no ci] (#11781) pascal-lc 2025-02-10 16:05:57 +08:00
c2a67efe38

vulkan: Make Vulkan optional at runtime (#11493). (#11494) Danny Milosavljevic 2025-02-10 07:17:21 +01:00
b044a0fe3c

vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (#11592) Wagner Bruna 2025-02-10 03:08:22 -03:00
19d3c8293b

There's a better way of clearing lines (#11756) Eric Curtin 2025-02-09 10:34:49 +00:00
98f6b0fd1e

vulkan: account for lookup tables when checking shared memory size (#11502) Jeff Bolz 2025-02-09 01:43:51 -06:00
55ac8c7791

server : (webui) revamp Settings dialog, add Pyodide interpreter (#11759) Xuan-Son Nguyen 2025-02-08 21:54:50 +01:00
e6e6583199

server : (webui) increase edit textarea size (#11763) Woof Dog 2025-02-08 19:09:55 +00:00
aaa5505307

server : minor log updates (#11760) Georgi Gerganov 2025-02-08 18:08:43 +02:00
bdcf8b6a56

cont : fix mmap flag print (#11699) Georgi Gerganov 2025-02-08 16:49:38 +02:00
4d3465c5ae

ggml: Fix data race in ggml threadpool (#11736) Karol Kontny 2025-02-08 15:30:53 +01:00
d80be897ac

CUDA: fix min. version for movmatrix (#11751) Johannes Gäßler 2025-02-08 10:46:07 +01:00
3ab410f55f

readme : update front-end framework (#11753) Nikolaos Pothitos 2025-02-08 11:43:04 +02:00
0cf867160c

server : (webui) fix numeric settings being saved as string (#11739) Xuan-Son Nguyen 2025-02-08 10:42:34 +01:00
d2fe216fb2

Make logging more verbose (#11714) Eric Curtin 2025-02-07 14:42:46 +00:00
ed926d8833

llama : fix defrag logic (#11707) Georgi Gerganov 2025-02-07 16:05:34 +02:00
2d219b389e

vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729) Christian Fillion 2025-02-07 08:55:47 -05:00
333820d749

llama : fix progress dots (#11730) magicse 2025-02-07 15:48:47 +02:00
c026ba3c23

vulkan: print shared memory size (#11719) Jeff Bolz 2025-02-07 04:26:03 -06:00
7ee953a64a

llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727) Christian Fillion 2025-02-07 04:33:27 -05:00
ec3bc8270b

SYCL: remove XMX info from print devices (#11712) Akarshan Biswas 2025-02-07 14:57:53 +05:30
b7552cfcbc

common : add default embeddings presets (#11677) Daniel Bevenius 2025-02-07 09:15:22 +01:00
225bbbfa39

ggml : optimize and build warning fix for LoongArch (#11709) Jinyang He 2025-02-07 15:38:31 +08:00
855cd0734a

llama : fix old glm4 models (#11670) tv1wnd 2025-02-06 22:48:51 +01:00
8a59053f63

sync : ggml Georgi Gerganov 2025-02-06 21:23:03 +02:00
1d20e53c40

rpc: fix known RCE in rpc-server (ggml/1103) Patrick Peng 2025-02-06 09:29:13 -05:00
2fb3c32a16

server : (webui) migrate project to ReactJS with typescript (#11688) Xuan-Son Nguyen 2025-02-06 17:32:29 +01:00
9ab42dc722

docs: update fedora cuda guide for 12.8 release (#11393) Tei Home 2025-02-06 20:16:15 +08:00
194b2e69f8

SYCL: Adjust support condition for norm operators (#11674) Akarshan Biswas 2025-02-06 17:12:35 +05:30
9dd7a0390f

llama : add log about loading model tensors (#11699) Georgi Gerganov 2025-02-06 13:41:37 +02:00
c0d4843225

build : fix llama.pc (#11658) Adrien Gallouët 2025-02-06 12:08:13 +01:00
8d4d2be143

ggml : fix LoongArch compile error with 128-bit SIMD (#11701) junchao-zhao 2025-02-06 17:20:00 +08:00
2c6c8df56d

vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521) Jeff Bolz 2025-02-06 00:15:30 -06:00
8a7e3bf17a

vulkan: initial support for IQ4_XS quantization (#11501) Rémy O 2025-02-06 07:09:59 +01:00
1b598b3058

vulkan: use smaller combined allocations to avoid fragmentation (#11551) Jeff Bolz 2025-02-06 00:02:18 -06:00
902368a06b

metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690) Charles Duffy 2025-02-05 19:52:31 -06:00
c3db0480bb

readme : add link to Autopen under UIs (#11684) Matvey Soloviev 2025-02-06 01:55:25 +01:00
d774ab3acc

metal : adjust support conditions for norm operators (#11671) Georgi Gerganov 2025-02-05 10:57:42 +02:00
fa62da9b2d

CUDA: support for mat. mul. with ne03 != ne13 (#11656) Johannes Gäßler 2025-02-05 08:58:31 +01:00