Commit graph

  • 80ccf5d725
    ci : pin dependency to specific version (#11137) Xuan Son Nguyen 2025-01-08 12:07:20 +01:00
  • a3c1232c3f
    arg : option to exclude arguments from specific examples (#11136) Georgi Gerganov 2025-01-08 12:55:36 +02:00
  • 8cef75c743
    llamafile : ppc64le MMA INT8 implementation (#10912) amritahs-ibm 2025-01-08 16:24:19 +05:30
  • 0d52a69e4b
    ci : fix cmake option (#11125) Georgi Gerganov 2025-01-08 11:29:34 +02:00
  • 02f0430141
    Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117) Mathieu Baudier 2025-01-08 09:18:13 +01:00
  • bec2183f2c
    fix: Vulkan shader gen binary path when Cross-compiling (#11096) ag2s20150909 2025-01-08 16:17:29 +08:00
  • 53ff6b9b9f
    GGUF: C++ refactor, backend support, misc fixes (#11030) Johannes Gäßler 2025-01-07 18:01:58 +01:00
  • 017cc5f446
    ggml-backend : only offload from host buffers (fix) (#11124) Diego Devesa 2025-01-07 16:11:57 +01:00
  • a3d50bc022
    ggml-backend : only offload from host buffers (#11120) Diego Devesa 2025-01-07 12:38:05 +01:00
  • a4dd490069
    rpc : code cleanup (#11107) Radoslav Gerganov 2025-01-07 08:37:02 +02:00
  • c0d6f790d0
    SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087) Akarshan Biswas 2025-01-07 11:56:07 +05:30
  • dc7cef9f37
    llama-run : fix context size (#11094) Eric Curtin 2025-01-06 22:45:28 +00:00
  • ecebbd292d
    llama : remove unused headers (#11109) Georgi Gerganov 2025-01-06 17:52:35 +02:00
  • 96be8c3264
    github : add cmd line field to bug report (#11090) Xuan Son Nguyen 2025-01-06 16:34:49 +01:00
  • e6e7c75d94
    server : fix extra BOS in infill endpoint (#11106) Georgi Gerganov 2025-01-06 15:36:08 +02:00
  • 09186fabbe
    llama : remove check flash_attn with lora (#11104) Xuan Son Nguyen 2025-01-06 13:41:12 +01:00
  • 96a1dc27c3
    llama : prevent system info string accumulation across calls (#11101) Asghar Ghorbani 2025-01-06 12:21:46 +01:00
  • 6369f867a4
    llama : rename missed batch params/vars to ubatch (#10059) Daniel Bevenius 2025-01-06 10:28:17 +01:00
  • 47182dd03f
    llama : update llama_model API names (#11063) Georgi Gerganov 2025-01-06 10:55:18 +02:00
  • 3e6e7a6bc2
    tokenize : escape the prompt (#11058) Georgi Gerganov 2025-01-06 10:54:25 +02:00
  • ae2f606bb5
    mmap : fix fileno macro clash (#11076) Georgi Gerganov 2025-01-06 10:52:38 +02:00
  • 727368c60f
    llama : use LLAMA_TOKEN_NULL (#11062) Georgi Gerganov 2025-01-06 10:52:15 +02:00
  • 5047dd3546
    llama : use _impl suffix instead of _internal (#11060) Georgi Gerganov 2025-01-06 10:52:01 +02:00
  • 46e3556e01
    CUDA: add BF16 support (#11093) Johannes Gäßler 2025-01-06 02:33:52 +01:00
  • b56f079e28
    Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (#11074) 0cc4m 2025-01-04 21:09:59 +01:00
  • 9394bbd484
    llama : Add support for DeepSeek V3 (#11049) fairydreaming 2025-01-04 21:06:11 +01:00
  • f922a9c542
    [GGML][RPC] Support for models with non-512-aligned tensors over RPC. (#11047) matt23654 2025-01-04 16:10:30 +00:00
  • 46be942214
    llama : add support for the cohere2 model architecture (#10900) DAN™ 2025-01-04 09:33:31 -05:00
  • 78c6785175
    sync : ggml Georgi Gerganov 2025-01-04 10:54:01 +02:00
  • 5e3b08d606
    ggml : do not install metal source when embed library (ggml/1054) Georgi Gerganov 2025-01-04 10:53:54 +02:00
  • db68c93b57
    ggml : improve inputs log sched_print_assignments (ggml/1053) Daniel Bevenius 2024-12-19 03:50:12 +01:00
  • c31fc8b966
    fix: Vulkan shader gen binary path (#11037) Gilad S. 2025-01-04 10:17:31 +02:00
  • 4b0c638b9a
    common : disable KV cache shifting automatically for unsupported models (#11053) Molly Sophia 2025-01-03 20:13:18 +08:00
  • e7da954ecc
    metal : avoid uint (#11019) Georgi Gerganov 2025-01-03 11:26:14 +02:00
  • f66f582927
    llama : refactor src/llama.cpp (#10902) Georgi Gerganov 2025-01-03 10:18:53 +02:00
  • 2f0ee84b9b
    server: bench: minor fixes (#10765) Pierrick Hymbert 2025-01-02 18:06:12 +01:00
  • 0da5d86026
    server : allow using LoRA adapters per-request (#10994) Xuan Son Nguyen 2025-01-02 15:05:18 +01:00
  • a45433ba20
    readme : add llama-swap to infrastructure section (#11032) Benson Wong 2025-01-01 23:14:54 -08:00
  • 0827b2c1da
    ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027) Srihari-mcw 2024-12-31 19:53:33 +05:30
  • 45095a61bf
    server : clean up built-in template detection (#11026) Xuan Son Nguyen 2024-12-31 15:22:01 +01:00
  • 5896c65232
    server : add OAI compat for /v1/completions (#10974) Xuan Son Nguyen 2024-12-31 12:34:13 +01:00
  • bc7b1f8632
    convert : fix Llama-3_1-Nemotron-51B rope settings (#11008) ymcki 2024-12-31 19:04:48 +08:00
  • 6e1531aca5
    common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013) Peter 2024-12-31 11:46:06 +11:00
  • 716bd6dec3
    vulkan: optimize mul_mat for small values of N (#10991) Jeff Bolz 2024-12-30 11:27:11 -06:00
  • c250ecb315
    android : fix llama_batch free (#11014) ag2s20150909 2024-12-30 20:35:13 +08:00
  • a813badbbd
    vulkan: im2col and matmul optimizations for stable diffusion (#10942) Jeff Bolz 2024-12-29 03:16:34 -06:00
  • fdd2188912
    vulkan: Use push constant offset to handle misaligned descriptors (#10987) Jeff Bolz 2024-12-29 02:35:11 -06:00
  • f865ea149d
    server: added more docs for response_fields field (#10995) Isaac McFadyen 2024-12-28 10:09:19 -05:00
  • 16cdce7b68
    server : fix token duplication when streaming with stop strings (#10997) Alexey Parfenov 2024-12-28 15:08:54 +00:00
  • d79d8f39b4
    vulkan: multi-row k quants (#10846) Eve 2024-12-26 10:54:44 -05:00
  • d283d02bf2
    examples, ggml : fix GCC compiler warnings (#10983) Peter 2024-12-27 00:59:11 +11:00
  • 9ba399dfa7
    server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) Reza Kakhki 2024-12-24 21:33:04 +01:00
  • 2cd43f4900
    ggml : more performance with llamafile tinyblas on x86_64 (#10714) Djip007 2024-12-24 18:54:49 +01:00
  • 09fe2e7613
    server: allow filtering llama server response fields (#10940) NeverLucky 2024-12-24 19:39:49 +03:00
  • 30caac3a68
    llama : the WPM vocabs use the CLS token as BOS (#10930) Georgi Gerganov 2024-12-24 09:44:20 +02:00
  • 60cfa728e2
    ggml : use wstring for backend search paths (#10960) Diego Devesa 2024-12-24 04:05:27 +01:00
  • 3327bb0f8d
    ggml : fix arm enabled features check (#10961) Diego Devesa 2024-12-24 04:05:17 +01:00
  • 32d6ee6385
    ggml : fix const usage in SSE path (#10962) Diego Devesa 2024-12-23 20:25:52 +01:00
  • 14b699ecde
    server : fix missing model id in /model endpoint (#10957) Xuan Son Nguyen 2024-12-23 12:52:25 +01:00
  • 485dc01214
    server : add system_fingerprint to chat/completion (#10917) Xuan Son Nguyen 2024-12-23 12:02:44 +01:00
  • 86bf31cfe6
    rpc-server : add support for the SYCL backend (#10934) Radoslav Gerganov 2024-12-23 10:39:30 +02:00
  • b92a14a841
    llama : support InfiniAI Megrez 3b (#10893) Yun Dou 2024-12-23 08:35:44 +08:00
  • 6f0c9e034b
    llama : support for Llama-3_1-Nemotron-51B (#10669) ymcki 2024-12-23 08:22:33 +08:00
  • dab76c92cc
    llama-run : include temperature option (#10899) Eric Curtin 2024-12-23 00:21:40 +00:00
  • 7024d59e6a
    ggml : fix run-time on FreeBSD in get_executable_path() (#10948) yuri@FreeBSD 2024-12-22 16:20:11 -08:00
  • 7c0e285858
    devops : add docker-multi-stage builds (#10832) Rudi Servo 2024-12-22 21:22:58 -01:00
  • 7ae33a616f
    llama : add Falcon3 support (#10883) Billel Mokeddem 2024-12-23 01:09:58 +03:00
  • ebdee9478c
    vulkan: build fixes for 32b (#10927) Jeff Bolz 2024-12-22 03:44:01 -06:00
  • 5cd85b5e00
    convert : add BertForMaskedLM (#10919) Georgi Gerganov 2024-12-21 10:10:18 +02:00
  • a91a41364b
    vulkan: optimize coopmat2 dequant functions (#10855) Jeff Bolz 2024-12-21 01:04:45 -06:00
  • e34c5af43f
    ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() (#10874) Adrien Gallouët 2024-12-21 00:33:37 +01:00
  • eb5c3dc64b
    SYCL: Migrate away from deprecated ggml_tensor->backend (#10840) Akarshan Biswas 2024-12-20 21:01:28 +05:30
  • 0ca416c91a
    server : (UI) fix copy to clipboard function (#10916) Xuan Son Nguyen 2024-12-20 14:12:06 +01:00
  • 21ae3b9be8
    ggml : add test for SVE and disable when it fails (#10906) Diego Devesa 2024-12-20 13:31:28 +01:00
  • 0a11f8b7b5
    convert : fix RWKV v6 model conversion (#10913) Molly Sophia 2024-12-20 17:44:58 +08:00
  • d408bb9268
    clip : disable GPU support (#10896) Georgi Gerganov 2024-12-19 18:47:15 +02:00
  • 5cab3e4aaa
    llama : minor grammar refactor (#10897) Georgi Gerganov 2024-12-19 17:42:13 +02:00
  • 36319dec5d
    tts : small QoL for easy model fetch (#10903) Georgi Gerganov 2024-12-19 17:35:15 +02:00
  • 57bb2c40cd
    server : fix logprobs, make it OAI-compatible (#10783) Xuan Son Nguyen 2024-12-19 15:40:08 +01:00
  • a3c33b1dce
    ggml: fix arm build with gcc (#10895) Adrien Gallouët 2024-12-19 14:20:41 +01:00
  • 2fffc52b50
    llama : fix Roberta embeddings (#10856) Sukriti Sharma 2024-12-19 06:04:51 -07:00
  • 7585edbdeb
    convert : Add support for Microsoft Phi-4 model (#10817) fairydreaming 2024-12-19 10:37:12 +01:00
  • cd920d0ac3
    tests: disable GGUF test for bad value size (#10886) Johannes Gäßler 2024-12-19 08:53:58 +01:00
  • 7909e8588d
    llama-run : improve progress bar (#10821) Eric Curtin 2024-12-19 02:58:00 +00:00
  • 9177484f58
    ggml : fix arm build (#10890) Diego Devesa 2024-12-18 23:21:42 +01:00
  • 0bf2d10c55
    tts : add OuteTTS support (#10784) Georgi Gerganov 2024-12-18 19:27:21 +02:00
  • 7bbb5acf12
    server: avoid overwriting Authorization header (#10878) Gaetan Bisson 2024-12-18 04:00:07 -10:00
  • 152610eda9
    server : output embeddings for all tokens when pooling = none (#10861) Georgi Gerganov 2024-12-18 13:01:41 +02:00
  • 0e70ba686e
    server : add "tokens" output (#10853) Georgi Gerganov 2024-12-18 11:05:29 +02:00
  • 46828872c3
    server : (embeddings) using same format for "input" and "content" (#10872) Xuan Son Nguyen 2024-12-18 09:55:09 +01:00
  • 6b064c92b4
    docs: Fix HIP (née hipBLAS) in README (#10880) redbeard 2024-12-18 00:35:00 -08:00
  • 4da69d1abd
    Revert "llama : add Falcon3 support (#10864)" (#10876) Diego Devesa 2024-12-18 01:36:46 +01:00
  • d62b532c52
    Use model->gguf_kv for loading the template instead of using the C API. (#10868) DAN™ 2024-12-17 17:24:22 -05:00
  • 081b29bd2a
    tests: add tests for GGUF (#10830) Johannes Gäßler 2024-12-17 19:09:35 +01:00
  • 5437d4aaf5
    sync : ggml Georgi Gerganov 2024-12-17 18:36:02 +02:00
  • 78f766768d
    cmake : fix "amd64" processor string (whisper/2638) Georgi Gerganov 2024-12-17 18:34:32 +02:00
  • 8dd19a4812
    vulkan : fix soft_max.comp division by zero (whisper/2633) gn64 2024-12-16 19:34:38 +09:00
  • 130d0c90bd
    ggml : remove return from ggml_gallocr_allocate_node (ggml/1048) Daniel Bevenius 2024-12-14 03:23:08 +01:00
  • 3919da8e33
    ggml : add check for grad_accs (ggml/1046) Daniel Bevenius 2024-12-13 08:19:38 +01:00
  • 0006f5a74a
    ggml : update ggml_backend_cpu_device_supports_op (#10867) Georgi Gerganov 2024-12-17 18:35:42 +02:00