Commit graph

  • abd4d0bc4f
    speculative : update default params (#11954) Georgi Gerganov 2025-02-19 13:29:42 +02:00
  • 9626d9351a
    llama : fix indentation in llama-grammar [no ci] (#11943) Daniel Bevenius 2025-02-19 06:16:23 +01:00
  • b58934c183
    server : (webui) Enable communication with parent html (if webui is in iframe) (#11940) igardev 2025-02-19 00:01:44 +02:00
  • 63e489c025
    tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900) Olivier Chafik 2025-02-18 18:03:23 +00:00
  • 63ac128563
    server : add TEI API format for /rerank endpoint (#11942) Xuan-Son Nguyen 2025-02-18 14:21:41 +01:00
  • 5137da7b8c
    scripts: corrected encoding when getting chat template (#11866) (#11907) MoonRide303 2025-02-18 10:30:16 +01:00
  • 09aaf4f1f5
    docs : Fix duplicated file extension in test command (#11935) xiaobing318 2025-02-18 17:12:49 +08:00
  • 73e2ed3ce3
    CUDA: use async data loading for FlashAttention (#11894) Johannes Gäßler 2025-02-17 14:03:24 +01:00
  • f7b1116af1
    update release requirements (#11897) Eve 2025-02-17 11:20:23 +00:00
  • c4d29baf32
    server : fix divide-by-zero in metrics reporting (#11915) Antoine Viallon 2025-02-17 11:25:12 +01:00
  • 2eea03d86a
    vulkan: implement several ops relevant for ggml_opt (#11769) Rémy O 2025-02-17 07:55:57 +01:00
  • 0f2bbe6564
    server : bump httplib to 0.19.0 (#11908) Xuan-Son Nguyen 2025-02-16 18:11:22 +01:00
  • fe163d5bf3
    common : Fix a typo in help (#11899) standby24x7 2025-02-16 18:51:13 +09:00
  • 818a340ea8
    ci : fix (again) arm64 build fails (#11895) Xuan-Son Nguyen 2025-02-16 10:36:39 +01:00
  • bf42a23d0a
    vulkan: support multi/vision rope, and noncontiguous rope (#11902) Jeff Bolz 2025-02-16 01:52:23 -06:00
  • c2ea16f260
    metal : fix the crash caused by the lack of residency set support on Intel Macs. (#11904) Hale Chan 2025-02-16 14:50:26 +08:00
  • 6dde178248
    scripts: fix compare-llama-bench commit hash logic (#11891) Johannes Gäßler 2025-02-15 20:23:22 +01:00
  • fc10c38ded
    examples: fix typo in imatrix/README.md (#11884) 708-145 2025-02-15 20:03:30 +01:00
  • 22885105a6
    metal : optimize dequant q6_K kernel (#11892) Adrian Kretz 2025-02-15 19:39:20 +01:00
  • c2cd24fbfd
    readme : add notice about new package registry (#11890) Georgi Gerganov 2025-02-15 20:29:56 +02:00
  • 68ff663a04
    repo : update links to new url (#11886) Georgi Gerganov 2025-02-15 16:40:57 +02:00
  • f355229692
    server: fix type promotion typo causing crashes w/ --jinja w/o tools (#11880) Olivier Chafik 2025-02-15 10:11:36 +00:00
  • fc1b0d0936
    vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528) Rémy O 2025-02-15 09:01:40 +01:00
  • 89daa2564f
    llguidance build fixes for Windows (#11664) Michał Moskal 2025-02-14 12:46:08 -08:00
  • 300907b211
    opencl: Fix rope and softmax (#11833) lhez 2025-02-14 11:12:23 -08:00
  • 94b87f87b5
    cuda : add ampere to the list of default architectures (#11870) Diego Devesa 2025-02-14 15:33:52 +01:00
  • dbc2ec59b5
    docker : drop to CUDA 12.4 (#11869) Georgi Gerganov 2025-02-14 14:48:40 +02:00
  • 3d68f034da
    llama : add completion for --chat-template-file (#11860) Daniel Bevenius 2025-02-14 11:16:56 +01:00
  • 38e32eb6a0
    ggml: optimize some vec dot functions for LoongArch ASX (#11842) Jinyang He 2025-02-14 16:54:27 +08:00
  • a4f011e8d0
    vulkan: linux builds + small subgroup size fixes (#11767) Eve 2025-02-14 02:59:40 +00:00
  • a7b8ce2260
    llama-bench : fix unexpected global variable initialize sequence issue (#11832) theraininsky 2025-02-14 09:13:43 +08:00
  • 04045bb842
    readme : minor Georgi Gerganov 2025-02-14 00:16:56 +02:00
  • 8a8c4ceb60
    llamafile: use member variable instead of constant for iq4nlt (#11780) Jeffrey Morgan 2025-02-13 09:05:04 -08:00
  • c1f958c038
    server : (docs) Update wrong tool calling example (#11809) Reza Rahemtola 2025-02-13 17:22:44 +01:00
  • c48f630d1c
    llama : add --completion-bash option (#11846) Daniel Bevenius 2025-02-13 14:46:59 +01:00
  • bd6e55bfd3
    musa: bump MUSA SDK version to rc3.1.1 (#11822) R0CKSTAR 2025-02-13 20:28:18 +08:00
  • c7f460ab88
    server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command 7RB & DeepSeek R1) unless --reasoning-format none (#11607) Olivier Chafik 2025-02-13 10:05:16 +00:00
  • 27e8a23300
    sampling: add Top-nσ sampler (#11223) Vinesh Janarthanan 2025-02-13 00:45:57 -06:00
  • e4376270d9
    llama.cpp: fix warning message (#11839) Oleksandr Kuvshynov 2025-02-13 01:25:34 -05:00
  • 3e69319772
    llama : update llama_decode_internal ref [no ci] (#11840) Daniel Bevenius 2025-02-13 07:07:51 +01:00
  • a394039db0
    ggml-cpu : add chunking support to mul_mat_id (#11666) Diego Devesa 2025-02-13 01:02:38 +01:00
  • be3bbd6215
    ggml : x2 speed for WASM by optimizing SIMD (#11453) Xuan-Son Nguyen 2025-02-13 00:33:45 +01:00
  • 31afcbee0e
    server : (webui) Give copy button back to all message bubbles (#11814) Woof Dog 2025-02-12 22:47:11 +00:00
  • 5c4284d57b
    HIP: Remove GCN from list of devices that avoid MMQ (#11831) uvos 2025-02-12 22:25:28 +01:00
  • bfd11a2344
    Fix: Compile failure due to Microsoft STL breaking change (#11836) JC 2025-02-12 20:36:11 +00:00
  • 0fb77f821f
    sync : ggml Georgi Gerganov 2025-02-12 21:46:02 +02:00
  • e598697d63
    HIP: Switch to std::vector in rocblas version check (#11820) uvos 2025-02-12 17:25:03 +01:00
  • fef0cbeadf
    cleanup: fix compile warnings associated with gnu_printf (#11811) bandoti 2025-02-12 10:06:53 -04:00
  • 748ee9fe93
    ggml : fix multi-threaded clamp_f32 (#11824) Richard 2025-02-12 13:57:33 +00:00
  • 198b1ec611
    ggml-cpu: Fix duplicate MATMUL_INT8 (#11817) Weizhao Ouyang 2025-02-12 20:22:58 +08:00
  • c3d6af7cd2
    CUDA: fix CUDART_VERSION checks (#11821) Johannes Gäßler 2025-02-12 13:16:39 +01:00
  • 369be5598a
    llama : fix typo in llama-grammar.h [no ci] (#11816) Daniel Bevenius 2025-02-12 08:40:01 +01:00
  • 4078c77f98
    docs: add OpenCL (#11697) lhez 2025-02-11 14:04:13 -08:00
  • 90e4dba461
    Fix #11802: Compile bug - RegQueryValueExA changed to RegQueryValueEx (#11803) Sheldon Robinson 2025-02-11 10:55:45 -05:00
  • a18f481f99
    server : use common_token_to_piece instead of common_detokenize (#11740) Daniel Bevenius 2025-02-11 14:06:45 +01:00
  • b9ab0a4d0b
    CUDA: use arch list for compatibility check (#11775) Johannes Gäßler 2025-02-11 00:17:22 +01:00
  • 7b891bdc86
    fix: typos in documentation files (#11791) Maxim Evtush 2025-02-10 23:21:31 +01:00
  • 81732619fd
    docs: utilize the forward slash (/) as the path separator for Unix-like systems (#11770) jason_w 2025-02-11 06:17:48 +08:00
  • 507f9174fe
    server : (webui) introduce conversation branching + idb storage (#11792) Xuan-Son Nguyen 2025-02-10 21:23:17 +01:00
  • 19b392d58d
    llama-mmap: fix missing include (#11796) Wilken Gottwalt 2025-02-10 19:58:18 +01:00
  • 0893e0114e
    server : correct signal handler (#11795) Xuan-Son Nguyen 2025-02-10 18:03:28 +01:00
  • d7b31a9d84
    sync: minja (a72057e519) (#11774) Olivier Chafik 2025-02-10 09:34:09 +00:00
  • 9ac3457b39
    Update README.md [no ci] (#11781) pascal-lc 2025-02-10 16:05:57 +08:00
  • c2a67efe38
    vulkan: Make Vulkan optional at runtime (#11493). (#11494) Danny Milosavljevic 2025-02-10 07:17:21 +01:00
  • b044a0fe3c
    vulkan: add environment variable GGML_VK_PREFER_HOST_MEMORY to avoid VRAM allocation (#11592) Wagner Bruna 2025-02-10 03:08:22 -03:00
  • 19d3c8293b
    There's a better way of clearing lines (#11756) Eric Curtin 2025-02-09 10:34:49 +00:00
  • 98f6b0fd1e
    vulkan: account for lookup tables when checking shared memory size (#11502) Jeff Bolz 2025-02-09 01:43:51 -06:00
  • 55ac8c7791
    server : (webui) revamp Settings dialog, add Pyodide interpreter (#11759) Xuan-Son Nguyen 2025-02-08 21:54:50 +01:00
  • e6e6583199
    server : (webui) increase edit textarea size (#11763) Woof Dog 2025-02-08 19:09:55 +00:00
  • aaa5505307
    server : minor log updates (#11760) Georgi Gerganov 2025-02-08 18:08:43 +02:00
  • bdcf8b6a56
    cont : fix mmap flag print (#11699) Georgi Gerganov 2025-02-08 16:49:38 +02:00
  • 4d3465c5ae
    ggml: Fix data race in ggml threadpool (#11736) Karol Kontny 2025-02-08 15:30:53 +01:00
  • d80be897ac
    CUDA: fix min. version for movmatrix (#11751) Johannes Gäßler 2025-02-08 10:46:07 +01:00
  • 3ab410f55f
    readme : update front-end framework (#11753) Nikolaos Pothitos 2025-02-08 11:43:04 +02:00
  • 0cf867160c
    server : (webui) fix numeric settings being saved as string (#11739) Xuan-Son Nguyen 2025-02-08 10:42:34 +01:00
  • d2fe216fb2
    Make logging more verbose (#11714) Eric Curtin 2025-02-07 14:42:46 +00:00
  • ed926d8833
    llama : fix defrag logic (#11707) Georgi Gerganov 2025-02-07 16:05:34 +02:00
  • 2d219b389e
    vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729) Christian Fillion 2025-02-07 08:55:47 -05:00
  • 333820d749
    llama : fix progress dots (#11730) magicse 2025-02-07 15:48:47 +02:00
  • c026ba3c23
    vulkan: print shared memory size (#11719) Jeff Bolz 2025-02-07 04:26:03 -06:00
  • 7ee953a64a
    llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727) Christian Fillion 2025-02-07 04:33:27 -05:00
  • ec3bc8270b
    SYCL: remove XMX info from print devices (#11712) Akarshan Biswas 2025-02-07 14:57:53 +05:30
  • b7552cfcbc
    common : add default embeddings presets (#11677) Daniel Bevenius 2025-02-07 09:15:22 +01:00
  • 225bbbfa39
    ggml : optimize and build warning fix for LoongArch (#11709) Jinyang He 2025-02-07 15:38:31 +08:00
  • 855cd0734a
    llama : fix old glm4 models (#11670) tv1wnd 2025-02-06 22:48:51 +01:00
  • 8a59053f63
    sync : ggml Georgi Gerganov 2025-02-06 21:23:03 +02:00
  • 1d20e53c40
    rpc: fix known RCE in rpc-server (ggml/1103) Patrick Peng 2025-02-06 09:29:13 -05:00
  • 2fb3c32a16
    server : (webui) migrate project to ReactJS with typescript (#11688) Xuan-Son Nguyen 2025-02-06 17:32:29 +01:00
  • 9ab42dc722
    docs: update fedora cuda guide for 12.8 release (#11393) Tei Home 2025-02-06 20:16:15 +08:00
  • 194b2e69f8
    SYCL: Adjust support condition for norm operators (#11674) Akarshan Biswas 2025-02-06 17:12:35 +05:30
  • 9dd7a0390f
    llama : add log about loading model tensors (#11699) Georgi Gerganov 2025-02-06 13:41:37 +02:00
  • c0d4843225
    build : fix llama.pc (#11658) Adrien Gallouët 2025-02-06 12:08:13 +01:00
  • 8d4d2be143
    ggml : fix LoongArch compile error with 128-bit SIMD (#11701) junchao-zhao 2025-02-06 17:20:00 +08:00
  • 2c6c8df56d
    vulkan: optimize coopmat2 iq2/iq3 callbacks (#11521) Jeff Bolz 2025-02-06 00:15:30 -06:00
  • 8a7e3bf17a
    vulkan: initial support for IQ4_XS quantization (#11501) Rémy O 2025-02-06 07:09:59 +01:00
  • 1b598b3058
    vulkan: use smaller combined allocations to avoid fragmentation (#11551) Jeff Bolz 2025-02-06 00:02:18 -06:00
  • 902368a06b
    metal : avoid breaking build when metal API predates TARGET_OS_VISION (#11690) Charles Duffy 2025-02-05 19:52:31 -06:00
  • c3db0480bb
    readme : add link to Autopen under UIs (#11684) Matvey Soloviev 2025-02-06 01:55:25 +01:00
  • d774ab3acc
    metal : adjust support conditions for norm operators (#11671) Georgi Gerganov 2025-02-05 10:57:42 +02:00
  • fa62da9b2d
    CUDA: support for mat. mul. with ne03 != ne13 (#11656) Johannes Gäßler 2025-02-05 08:58:31 +01:00