llama.cpp

ver4a/llama.cpp

Commit graph

1af6945eb0

cmake : avoid -march=native when reproducible build is wanted (#11366) Bernhard M. Wiedemann 2025-01-24 12:21:35 +01:00
01f37edf1a

Update llama-run README.md (#11386) Eric Curtin 2025-01-24 09:39:24 +00:00
c07e87f38b

server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364) stduhpf 2025-01-24 09:02:38 +01:00
564804b79b

tests: fix some mul_mat test gaps (#11375) Jeff Bolz 2025-01-23 14:51:24 -06:00
05f63cc9ee

Update documentation (#11373) Eric Curtin 2025-01-23 20:04:31 +00:00
f7fb43cd0b

Add -ngl (#11372) Eric Curtin 2025-01-23 16:16:18 +00:00
5845661640

server : add more clean up when cancel_tasks is called (#11340) Xuan Son Nguyen 2025-01-23 13:56:05 +01:00
f211d1dc10

Treat hf.co/ prefix the same as hf:// (#11350) Eric Curtin 2025-01-23 10:38:20 +00:00
955a6c2d91

Vulkan-run-test: fix mmq_wg_denoms (#11343) amd-dwang 2025-01-23 15:14:28 +08:00
1971adf55e

vulkan: sort shaders for more deterministic binary (#11315) Jeff Bolz 2025-01-23 01:07:50 -06:00
5245729e33

vulkan: fix diag_mask_inf (#11323) Jeff Bolz 2025-01-23 01:01:17 -06:00
6152129d05

main : update README documentation for batch size (#11353) Diego Devesa 2025-01-22 19:22:20 +01:00
16d3df7ab0

readme : add plugin links (#11355) Georgi Gerganov 2025-01-22 19:44:26 +02:00
12c2bdf2de

server : fix draft context not being released (#11354) Diego Devesa 2025-01-22 17:44:40 +01:00
c64d2becb1

minja: sync at 0f5f7f2b37 (#11352) Olivier Chafik 2025-01-22 16:16:27 +00:00
96f4053934

Adding logprobs to /v1/completions (#11344) Jiří Podivín 2025-01-22 12:51:32 +01:00
a94f3b2727

common: utils to split / join / repeat strings (from json converter) (#11342) Olivier Chafik 2025-01-22 09:51:44 +00:00
3e3357fd77

llava : support Minicpm-omni (#11289) tc-mb 2025-01-22 15:35:48 +08:00
6171c9d258

Add Jinja template support (#11016) Olivier Chafik 2025-01-21 13:18:51 +00:00
e28245f35f

export-lora : fix tok_embd tensor (#11330) Xuan Son Nguyen 2025-01-21 14:07:12 +01:00
6da5bec81c

rpc : better caching of the base buffer pointer (#11331) Radoslav Gerganov 2025-01-21 15:06:41 +02:00
2e2f8f093c

linenoise.cpp refactoring (#11301) Eric Curtin 2025-01-21 09:32:35 +00:00
2139667ec4

metal : fix out-of-bounds write (#11314) Georgi Gerganov 2025-01-21 08:48:13 +02:00
80d0d6b4b7

common : add -hfd option for the draft model (#11318) Georgi Gerganov 2025-01-20 22:29:43 +02:00
aea8ddd516

vulkan: fix coopmat2 validation failures (#11284) Jeff Bolz 2025-01-20 10:38:32 -06:00
9f7add1cde

examples : fix add_special conditions (#11311) Georgi Gerganov 2025-01-20 16:36:08 +02:00
90d987b105

mmap: add include for cerrno (#11296) Christopher Nielsen 2025-01-20 09:02:43 -05:00
a4251edd6f

cmake: fix shell command quoting in build-info script (#11309) Michael Podvitskiy 2025-01-20 15:02:15 +01:00
ec7f3ac9ab

llama : add support for Deepseek-R1-Qwen distill model (#11310) Xuan Son Nguyen 2025-01-20 14:35:07 +01:00
ef6dada60c

cont : fix whitespaces (#11305) Georgi Gerganov 2025-01-20 09:29:32 +02:00
ae3c1db2f9

llama : re-add LLM_ARCH_PHIMOE (#11305) Kyle Bruene 2025-01-20 01:21:01 -06:00
92bc493917

tests : increase timeout when sanitizers are enabled (#11300) Georgi Gerganov 2025-01-19 20:22:30 +02:00
b9daaffe02

simple-chat : fix BOS being added to each message (#11278) Georgi Gerganov 2025-01-19 18:12:09 +02:00
99487b57d4

SYCL: Introducing memory host pool (#11251) Nicolò Scipione 2025-01-19 14:33:34 +01:00
a1649cc13f

Adding linenoise.cpp to llama-run (#11252) Eric Curtin 2025-01-18 14:42:31 +00:00
4dd34ff831

cmake : add sanitizer flags for llama.cpp (#11279) Georgi Gerganov 2025-01-18 16:18:15 +02:00
f30f099228

server : implement cancellable request (#11285) Xuan Son Nguyen 2025-01-18 14:12:05 +01:00
f26c874179

scripts : restore hf.sh (#11288) Georgi Gerganov 2025-01-18 13:18:32 +02:00
6390a998bf

tts : add guide tokens support (#11186) LostRuins Concedo 2025-01-18 18:20:57 +08:00
44e18ef939

vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281) Jeff Bolz 2025-01-18 02:26:50 -06:00
3edfa7d375

llama.android: add field formatChat to control whether to parse special tokens when send message (#11270) codezjx 2025-01-17 20:57:56 +08:00
667d72846c

rpc : early register backend devices (#11262) Radoslav Gerganov 2025-01-17 10:57:09 +02:00
a133566d34

vocab : fix double-eos check (#11273) Georgi Gerganov 2025-01-17 09:28:00 +02:00
960ec65273

llama : fix deprecation message: vocabable -> vocab (#11269) David Renshaw 2025-01-17 02:12:01 -05:00
7a689c415e

README : added kalavai to infrastructure list (#11216) musoles 2025-01-17 00:10:49 +00:00
bd38ddea01

vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166) Jeff Bolz 2025-01-16 15:47:10 -06:00
466300fe14

vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206) Jeff Bolz 2025-01-16 15:23:49 -06:00
206bc53422

vulkan: optimize coopmat2 q2_k dequant function (#11130) Jeff Bolz 2025-01-16 15:16:39 -06:00
4dbc8b9cb7

llama : add internlm3 support (#11233) RunningLeon 2025-01-17 02:10:38 +08:00
9c8dcefe17

CUDA: backwards pass for misc. ops, add tests (#11257) Johannes Gäßler 2025-01-16 16:43:38 +01:00
681149ced2

llama : add llama_model_load_from_splits (#11255) Xuan Son Nguyen 2025-01-16 13:54:08 +01:00
c67cc9837d

ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227) fj-y-saito 2025-01-16 18:11:49 +09:00
adc5dd92e8

vulkan: scale caching for k quants + misc fixes (#11081) Eve 2025-01-15 19:50:13 +00:00
f11cfdfd7f

ci : use -no-cnv in gguf-split tests (#11254) Georgi Gerganov 2025-01-15 18:28:35 +02:00
1d8504338e

fix: ggml: fix vulkan-shaders-gen build (#10448) Junil Kim 2025-01-15 22:17:42 +09:00
432df2d5f9

RoPE: fix back, CUDA support for back + noncont. (#11240) Johannes Gäßler 2025-01-15 12:51:37 +01:00
0ccd7f3eb2

examples : add embd_to_audio to tts-outetts.py [no ci] (#11235) Daniel Bevenius 2025-01-15 05:44:38 +01:00
f446c2cf6a

SYCL: Add gated linear attention kernel (#11175) Akarshan Biswas 2025-01-15 08:50:17 +05:30
b4d92a59a2

ci : add -no-cnv for tests (#11238) Xuan Son Nguyen 2025-01-14 15:42:23 +01:00
bbf3e55e35

vocab : add dummy tokens for "no_vocab" type (#11231) Georgi Gerganov 2025-01-14 12:54:58 +02:00
c5bf0d1bd7

server : Improve code snippets direction between RTL text (#11221) ebraminio 2025-01-14 14:09:33 +03:30
091592d758

Refactor test-chat-template.cpp (#11224) Olivier Chafik 2025-01-14 10:16:41 +00:00
44d1e796d0

sync : ggml Georgi Gerganov 2025-01-14 10:39:42 +02:00
a4f3f5d8e6

scripts : sync gguf (cont) Georgi Gerganov 2025-01-14 09:40:15 +02:00
48e1ae0e61

scripts : sync gguf Georgi Gerganov 2025-01-14 09:36:58 +02:00
d00a80e89d

scripts : sync opencl Georgi Gerganov 2025-01-14 09:19:58 +02:00
504af20ee4

server : (UI) Improve messages bubble shape in RTL (#11220) ebraminio 2025-01-13 22:53:31 +03:30
84a44815f7

cli : auto activate conversation mode if chat template is available (#11214) Xuan Son Nguyen 2025-01-13 20:18:12 +01:00
39509fb082

cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (#11042) Andreas Kieslinger 2025-01-13 16:45:53 +01:00
a29f0870d4

contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:59:26 +02:00
437e05f714

server : (UI) Support for RTL text as models input or output (#11208) ebraminio 2025-01-13 17:16:39 +03:30
ca001f6656

contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:08:44 +02:00
00b4c3da62

common : support tag-based --hf-repo like on ollama (#11195) Xuan Son Nguyen 2025-01-13 13:56:23 +01:00
7426a26b24

contrib : add naming guidelines (#11177) Georgi Gerganov 2025-01-13 14:46:36 +02:00
8f70fc3d1b

llama : remove 'd' from bad special token log (#11212) Daniel Bevenius 2025-01-13 13:38:20 +01:00
1244cdcf14

ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (#11211) Radoslav Gerganov 2025-01-13 13:31:41 +02:00
924518e2e5

Reset color before we exit (#11205) Eric Curtin 2025-01-12 18:23:10 +00:00
9a483999a6

llama : fix chat template gguf key (#11201) Xuan Son Nguyen 2025-01-12 13:45:14 +01:00
08f10f69c3

llama : remove notion of CLS token (#11064) Georgi Gerganov 2025-01-12 12:15:53 +02:00
afa8a9ec9b

llama : add llama_vocab, functions -> methods, naming (#11110) Georgi Gerganov 2025-01-12 11:32:42 +02:00
c05e8c9934

gguf-py: fixed local detection of gguf package (#11180) Vinesh Janarthanan 2025-01-11 03:42:31 -06:00
2739a71e4b

convert : sort print supported models [no ci] (#11179) Daniel Bevenius 2025-01-11 05:50:33 +01:00
ba8a1f9c5b

examples : add README.md to tts example [no ci] (#11155) Daniel Bevenius 2025-01-10 13:16:16 +01:00
ff3fcabc72

convert : add --print-supported-models option (#11172) Daniel Bevenius 2025-01-10 11:30:53 +01:00
c3f9d25706

Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (#11161) 0cc4m 2025-01-10 06:39:33 +01:00
ee7136c6d1

llama: add support for QRWKV6 model architecture (#11001) Molly Sophia 2025-01-10 09:58:08 +08:00
c6860cc734

SYCL: Refactor ggml_sycl_compute_forward (#11121) Akarshan Biswas 2025-01-10 05:43:03 +05:30
1204f97270

doc: add cuda guide for fedora (#11135) Tei Home 2025-01-09 19:32:06 +08:00
8eceb888d7

server : add tooltips to settings and themes btn (#11154) Daniel Bevenius 2025-01-09 11:28:29 +01:00
f8feb4b01a

model: Add support for PhiMoE arch (#11003) Pierrick Hymbert 2025-01-09 11:21:41 +01:00
be0e950c91

media : remove old img [no ci] Georgi Gerganov 2025-01-09 11:15:15 +02:00
d9feae1c06

llama-chat : add phi 4 template (#11148) Xuan Son Nguyen 2025-01-09 10:07:33 +01:00
8d59d91171

fix: add missing msg in static_assert (#11143) hydai 2025-01-09 04:03:28 +08:00
8a1d9c25fa

gguf-py : move scripts directory (#11116) Vinesh Janarthanan 2025-01-08 12:54:58 -06:00
1bf839b1e8

Enhance user input handling for llama-run (#11138) Eric Curtin 2025-01-08 18:47:05 +00:00
f7cd13301c

ci : use actions from ggml-org (#11140) Xuan Son Nguyen 2025-01-08 16:09:20 +01:00
4d2b3d8804

lora : improve compat with mergekit-extract-lora (#11131) Xuan Son Nguyen 2025-01-08 15:59:53 +01:00
c07d437bbd

llama : avoid hardcoded QK_K (#11061) Georgi Gerganov 2025-01-08 16:19:36 +02:00
99a3755a3c

sync : ggml Georgi Gerganov 2025-01-08 13:40:30 +02:00
c792dcf488

ggml : allow loading backend with env variable (ggml/1059) Radoslav Gerganov 2025-01-05 09:50:37 +02:00