llama.cpp

ver4a/llama.cpp

Commit graph

05c3a444b8

server : fill usage info in embeddings and rerank responses (#10852) krystiancha 2024-12-17 16:00:24 +00:00
382bc7f2e8

llama : add Falcon3 support (#10864) Billel Mokeddem 2024-12-17 19:24:56 +04:00
4f51968aca

readme : update typos (#10863) Ruan 2024-12-17 17:47:20 +08:00
227d7c5a7f

server : (UI) fix missing async generator on safari (#10857) Xuan Son Nguyen 2024-12-17 09:52:09 +01:00
7b1ec53f56

vulkan: bugfixes for small subgroup size systems + llvmpipe test (#10809) Eve 2024-12-17 05:52:55 +00:00
160bc039c8

rwkv6: add wkv6 support for Vulkan backend (#10829) Zhiyuan Li 2024-12-17 05:00:46 +08:00
08ea539df2

unicode : improve naming style (#10838) Georgi Gerganov 2024-12-16 12:31:45 +02:00
644fd71b44

sampling : refactor + optimize penalties sampler (#10803) Georgi Gerganov 2024-12-16 12:31:14 +02:00
4ddd199f6f

llava : Allow locally downloaded models for QwenVL (#10833) Bartowski 2024-12-15 15:43:25 -05:00
a0974156f3

llama : add Deepseek MoE v1 & GigaChat models (#10827) Valentin Mamedov 2024-12-16 00:02:46 +07:00
87cf323cef

scripts : change build path to "build-bench" for compare-commits.sh (#10836) Georgi Gerganov 2024-12-15 18:44:47 +02:00
5478bbcd17

server: (UI) add syntax highlighting and latex math rendering (#10808) Vinesh Janarthanan 2024-12-15 05:55:54 -06:00
b5ae1ddff9

gguf-py : bump to v0.13.0 Georgi Gerganov 2024-12-15 13:16:42 +02:00
89d604f2c8

server: Fix has_next_line in JSON response (#10818) Michelle Tan 2024-12-14 22:29:45 +00:00
e52aba537a

nix: allow to override rocm gpu targets (#10794) Evgeny Kurnevsky 2024-12-14 18:17:36 +00:00
ba1cb19cdd

llama : add Qwen2VL support + multimodal RoPE (#10361) HimariO 2024-12-14 20:43:46 +08:00
56eea0781c

Removes spurious \r in output that causes logging in journalctl to treat lines as binary and therefore hidden by default (#10771) cduk 2024-12-13 23:21:49 +01:00
a76c56fa1a

Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693) lhez 2024-12-13 12:23:52 -08:00
c27ac678dd

Opt class for positional argument handling (#10508) Eric Curtin 2024-12-13 18:34:25 +00:00
11e07fd63b

fix: graceful shutdown for Docker images (#10815) Corentin REGAL 2024-12-13 18:23:50 +01:00
4601a8bb67

gguf-py : numpy 2 newbyteorder fix (#9772) Jett Janiak 2024-12-13 15:48:44 +01:00
9f35e44592

Fix crash caused by ggml_backend_load_all when launching on Android Activity (#10812) 谢乃闻 2024-12-13 12:56:07 +00:00
64ae065511

vulkan: small mul_mat_vec optimizations (#10665) Eve 2024-12-13 08:42:04 +00:00
83ed24a97b

SYCL: Reduce most of the compiler warnings (#10748) Akarshan Biswas 2024-12-13 12:12:15 +05:30
d583cd03f6

ggml : Fix compilation issues on ARM platform when building without fp16 (#10811) Karol Kontny 2024-12-13 01:04:19 +01:00
adffa6ffd5

common : improve -ctv -ctk CLI arguments (#10806) Xuan Son Nguyen 2024-12-12 22:53:05 +01:00
274ec65af6

contrib : add ngxson as codeowner (#10804) Xuan Son Nguyen 2024-12-12 20:52:28 +01:00
8faa1d4dd4

CUDA: faster non-contiguous concat (#10760) a3sh 2024-12-13 02:09:50 +08:00
cb13ef85a4

remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) Diego Devesa 2024-12-12 19:02:49 +01:00
4064c0e3b6

Vulkan: Use improved q4_k and q5_k dequant code in dequant shaders (#10798) 0cc4m 2024-12-12 18:36:00 +01:00
dc5301d565

Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats (#10721) 0cc4m 2024-12-12 18:35:37 +01:00
9fdb124304

common : add missing env var for speculative (#10801) Xuan Son Nguyen 2024-12-12 16:57:32 +01:00
5555c0c1f6

docs: update server streaming mode documentation (#9519) CentricStorm 2024-12-11 22:40:40 +00:00
973f328b1e

Merge pull request #10788 from ggerganov/gg/gguf-py-0.11.0 Georgi Gerganov 2024-12-11 23:14:46 +02:00
fb18934a97

gguf-py : bump version to 0.11.0 Georgi Gerganov 2024-12-11 23:13:31 +02:00
235f6e14bf

server : (UI) add tok/s, get rid of completion.js (#10786) Xuan Son Nguyen 2024-12-11 20:52:14 +01:00
1a31d0dc00

Update README.md (#10772) qingy1337 2024-12-11 07:16:32 -08:00
92f77a640f

ci : pin nodejs to 22.11.0 (#10779) Xuan Son Nguyen 2024-12-11 14:59:41 +01:00
484d2f31ae

bug-fix: snprintf prints NULL in place of the last character (#10419) kallewoof 2024-12-11 22:48:04 +09:00
4b4d92b098

docs: fix server documentation formatting (#10776) CentricStorm 2024-12-11 10:47:43 +00:00
43041d2eb3

ggml: load all backends from a user-provided search path (#10699) Gilad S. 2024-12-11 02:47:21 +02:00
b685daf386

vulkan: request round-to-even for fp16 in im2col/rope_head (#10767) Jeff Bolz 2024-12-10 14:23:17 -06:00
dafae66cc2

vulkan: dynamic subgroup size for the remaining k quants (#10745) Eve 2024-12-10 19:33:23 +00:00
ae4b922614

imatrix : Add imatrix to --no-context-shift (#10766) Bartowski 2024-12-10 12:23:50 -05:00
750cb3e246

CUDA: rename macros to avoid conflicts with WinAPI (#10736) Andreas Kieslinger 2024-12-10 18:23:24 +01:00
a86ad841f1

server : add flag to disable the web-ui (#10762) (#10751) Yüg 2024-12-10 17:22:34 +00:00
a05e2afcc2

vulkan: disable spirv-opt for coopmat shaders (#10763) Jeff Bolz 2024-12-10 11:22:20 -06:00
26a8406ba9

CUDA: fix shared memory access condition for mmv (#10740) Johannes Gäßler 2024-12-09 20:07:12 +01:00
c37fb4cf62

Changes to CMakePresets.json to add ninja clang target on windows (#10668) Srihari-mcw 2024-12-09 23:10:19 +05:30
3d98b4cb22

vulkan: fix compile warnings (#10731) Jeff Bolz 2024-12-09 01:24:01 -06:00
1a05004743

cmake : simplify msvc charsets (#10672) Borislav Stanimirov 2024-12-09 09:15:13 +02:00
ce8784bdb1

server : fix format_infill (#10724) Xuan Son Nguyen 2024-12-08 23:04:29 +01:00
e52522b869

server : bring back info of final chunk in stream mode (#10722) Xuan Son Nguyen 2024-12-08 20:38:51 +01:00
06d70147e6

Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (#10723) stduhpf 2024-12-08 19:19:19 +01:00
43ed389a3f

llama : use cmake for swift build (#10525) Diego Devesa 2024-12-08 12:14:54 +01:00
ecc93d0558

vulkan: compile a test shader in cmake to check for coopmat2 support (#10713) Jeff Bolz 2024-12-08 02:05:55 -06:00
62e84d9848

llama : add 128k yarn context for Qwen (#10698) Robert Collins 2024-12-07 16:12:27 -05:00
3573fa8e7b

server : (refactor) no more json in server_task input (#10691) Xuan Son Nguyen 2024-12-07 20:21:09 +01:00
d9c3ba2b77

ggml : disable iq4_nl interleave size 8 (#10709) Georgi Gerganov 2024-12-07 18:38:15 +02:00
ce4a7b8493

server : various fixes (#10704) Georgi Gerganov 2024-12-07 18:02:05 +02:00
19d8762ab6

ggml : refactor online repacking (#10446) Djip007 2024-12-07 13:37:50 +01:00
c2a16c0bdb

server : fix free of spec context and batch (#10651) Georgi Gerganov 2024-12-07 11:52:44 +02:00
3df784b305

Vulkan: VK_KHR_cooperative_matrix support to speed up prompt processing (#10597) 0cc4m 2024-12-07 10:24:15 +01:00
86a1934978

metal : Extend how Llama.cpp locates metal resources (#10676) Robert Ormandi 2024-12-07 01:55:01 -06:00
784a14aa49

convert : add support for Roberta embeddings (#10695) Sukriti Sharma 2024-12-07 00:02:14 -07:00
c5ede3849f

convert : add custom attention mapping Georgi Gerganov 2024-12-06 21:33:15 +02:00
f162d45a21

common : bring back --no-warmup to server (#10686) Xuan Son Nguyen 2024-12-06 13:29:05 +01:00
6c5bc0625f

server : (refactoring) do not rely on JSON internally (#10643) Xuan Son Nguyen 2024-12-06 11:14:32 +01:00
7736837d62

fix(server) : not show alert when DONE is received (#10674) Plamen Minev 2024-12-05 23:36:41 +02:00
c9c6e01dae

vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and flash attention (#10206) Jeff Bolz 2024-12-05 13:15:05 -06:00
6fe6247831

llama : add Minerva 7B model support (#10673) Riccardo Orlando 2024-12-05 19:30:59 +01:00
0cd182ebcc

sync : ggml Georgi Gerganov 2024-12-05 13:27:42 +02:00
a8cbab201d

ggml: add GGML_SET Metal kernel + i32 CPU kernel (ggml/1037) PAB 2024-12-04 09:19:30 +01:00
c2082d93a8

ggml : add GGML_PAD_REFLECT_1D operation (ggml/1034) PAB 2024-12-03 20:20:04 +01:00
d405804be8

py : update outdated copy-paste instructions [no ci] (#10667) Daniel Bevenius 2024-12-05 08:47:55 +01:00
f112d198cd

Update deprecation-warning.cpp (#10619) aryantandon01 2024-12-05 03:49:20 +05:30
1da7b76569

server : fix speculative decoding with context shift (#10641) Georgi Gerganov 2024-12-04 22:38:20 +02:00
59f4db1088

ggml : add predefined list of CPU backend variants to build (#10626) Diego Devesa 2024-12-04 14:45:40 +01:00
2803540814

ggml-cpu : fix HWCAP2_I8MM value (#10646) Diego Devesa 2024-12-04 14:40:44 +01:00
253b7fde91

Fix HF repo commit to clone lora test models (#10649) ltoniazzi 2024-12-04 09:45:48 +00:00
8d0cfd554a

llama: Support MiniCPM-1B (with & w/o longrope) (#10559) JFLFY2255 2024-12-04 17:42:50 +08:00
2759916d86

vulkan: Implement "fast divide" (mul+shift) for unary ops like copy (#10642) Jeff Bolz 2024-12-04 01:28:59 -06:00
40c6d79fb5

SYCL : Move to compile time oneMKL interface backend selection for NVIDIA backend (#10584) Nicolò Scipione 2024-12-04 02:29:20 +01:00
98036d5670

fix typo of README.md (#10605) Wang Ran (汪然) 2024-12-04 09:22:50 +08:00
cd2f37b304

Avoid using __fp16 on ARM with old nvcc (#10616) Frankie Robertson 2024-12-04 02:41:37 +02:00
da6aac91f1

Add docs for creating a static build (#10268) (#10630) Benson Wong 2024-12-03 16:40:36 -08:00
01e6d9bb71

clip : add sycl support (#10574) piDack 2024-12-04 08:26:37 +08:00
cc98896db8

vulkan: optimize and reenable split_k (#10637) Jeff Bolz 2024-12-03 13:29:54 -06:00
91c36c269b

server : (web ui) Various improvements, now use vite as bundler (#10599) Xuan Son Nguyen 2024-12-03 19:38:44 +01:00
1cd3df46bd

scripts : remove amx sync Georgi Gerganov 2024-12-03 19:42:30 +02:00
c505471857

sync : ggml Georgi Gerganov 2024-12-03 19:40:25 +02:00
e9e661bd59

CUDA: remove unnecessary warp reduce in FA (ggml/1032) mahorozte 2024-12-03 21:11:43 +08:00
efb6ae9630

feat: add GGML_UNARY_OP_ARGMAX Metal kernel (ggml/1019) PAB 2024-12-02 19:27:24 +01:00
667d70d170

metal : add GGML_OP_CONV_TRANSPOSE_1D kernels (ggml/1026) PAB 2024-11-28 09:25:06 +01:00
3b4f2e33e2

llama : add missing LLAMA_API for llama_chat_builtin_templates (#10636) Xuan Son Nguyen 2024-12-03 12:54:30 +01:00
82bca2257b

readme : add option, update default value, fix formatting (#10271) Nikolaos Pothitos 2024-12-03 12:50:08 +02:00
0115df2f65

metal : small-batch mat-mul kernels (#10581) Georgi Gerganov 2024-12-03 11:52:33 +02:00
515d4e5372

github : minify link [no ci] (revert) Georgi Gerganov 2024-12-03 11:21:43 +02:00
844e2e1fee

github : minify link [no ci] Georgi Gerganov 2024-12-03 11:20:35 +02:00
70b98fadbc

server : fix default draft model parameters (#10586) Georgi Gerganov 2024-12-03 11:20:00 +02:00