Commit graph

  • 1ec208083c
    llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644) SAMI 2025-02-05 14:45:40 +07:00
  • 9f4cc8f8d3
    sync: minja (#11641) Olivier Chafik 2025-02-05 01:00:12 +00:00
  • fd08255d0d
    CUDA: non-contiguous (RMS) norm support (#11659) Johannes Gäßler 2025-02-04 22:21:42 +01:00
  • 3ec9fd4b77
    HIP: force max threads per block to be 1024 (#11621) fxzjshm 2025-02-05 02:18:38 +08:00
  • 3962fc1a79
    server : add try..catch to places not covered by set_exception_handler (#11620) Xuan-Son Nguyen 2025-02-04 18:25:42 +01:00
  • 1bef571f6a
    arg : list RPC devices first when using --list-devices (#11655) Radoslav Gerganov 2025-02-04 18:16:20 +02:00
  • db288b60cb
    tool-call: command r7b fix for normal responses (#11608) Olivier Chafik 2025-02-04 15:48:53 +00:00
  • 106045e7bb
    readme : add llm_client Rust crate to readme bindings (#11628) Shelby Jenkins 2025-02-04 05:20:55 -06:00
  • f117d84b48
    swift : fix llama-vocab api usage (#11645) Jhen-Jie Hong 2025-02-04 19:15:24 +08:00
  • 534c46b53c
    metal : use residency set for other platforms (#11648) Jhen-Jie Hong 2025-02-04 19:07:18 +08:00
  • 387a1598ca
    authors : update Georgi Gerganov 2025-02-04 13:04:10 +02:00
  • 7c9e0ca520
    sync : ggml Georgi Gerganov 2025-02-04 12:59:21 +02:00
  • 8f8290ada9
    cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096) Christian Kastner 2025-02-04 00:17:15 +01:00
  • b34aedd558
    ci : do not stale-close roadmap issues Georgi Gerganov 2025-02-04 09:30:42 +02:00
  • cde3833239
    tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616) Olivier Chafik 2025-02-03 23:49:27 +00:00
  • b3451785ac
    server : (webui) revert hacky solution from #11626 (#11634) Xuan-Son Nguyen 2025-02-04 00:10:52 +01:00
  • 1d1e6a90bc
    server : (webui) allow typing and submitting during llm response (#11626) Woof Dog 2025-02-03 22:16:27 +00:00
  • 5598f475be
    server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622) Daniel Bevenius 2025-02-03 16:45:38 +01:00
  • 8ec05832fa
    sync : ggml Georgi Gerganov 2025-02-03 14:57:08 +02:00
  • 21c84b5d2d
    CUDA: fix Volta FlashAttention logic (#11615) Johannes Gäßler 2025-02-03 13:25:56 +01:00
  • d92cb67e37
    server : (webui) Fix Shift+Enter handling (#11609) mashdragon 2025-02-03 09:42:55 +00:00
  • 6eecde3cc8
    HIP: fix flash_attn_stream_k_fixup warning (#11604) Johannes Gäßler 2025-02-02 23:48:29 +01:00
  • 396856b400
    CUDA/HIP: add support for selectable warp size to mmv (#11519) uvos 2025-02-02 22:40:09 +01:00
  • 4d0598e144
    HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (#11601) uvos 2025-02-02 22:08:05 +01:00
  • 90f9b88afb
    nit: more informative crash when grammar sampler fails (#11593) Olivier Chafik 2025-02-02 19:58:34 +00:00
  • 864a0b67a6
    CUDA: use mma PTX instructions for FlashAttention (#11583) Johannes Gäßler 2025-02-02 19:31:09 +01:00
  • 84ec8a58f7
    Name colors (#11573) Eric Curtin 2025-02-02 16:14:48 +01:00
  • bfcce4d693
    tool-call: support Command R7B (+ return tool_plan "thoughts" in API) (#11585) Olivier Chafik 2025-02-02 09:25:38 +00:00
  • 69804487e0
    Fix exotic ci env that lacks ostringstream::str (#11581) Olivier Chafik 2025-02-02 09:10:15 +00:00
  • ff227703d6
    sampling : support for llguidance grammars (#10224) Michał Moskal 2025-02-01 23:55:32 -08:00
  • 0cec062a63
    llama : add support for GLM-Edge and GLM-Edge-V series models (#10573) piDack 2025-02-02 15:48:46 +08:00
  • 53debe6f3c
    ci: use sccache on windows HIP jobs (#11553) Olivier Chafik 2025-02-01 18:22:38 +00:00
  • cfd74c86db
    sync: minja (418a2364b5) (#11574) Olivier Chafik 2025-02-01 12:24:51 +00:00
  • ecef206ccb
    Implement s3:// protocol (#11511) Eric Curtin 2025-02-01 11:30:54 +01:00
  • 5bbc7362cb
    ci: simplify cmake build commands (#11548) Olivier Chafik 2025-02-01 00:01:20 +00:00
  • aa6fb13213
    ci: use sccache on windows instead of ccache (#11545) Olivier Chafik 2025-01-31 17:12:40 +00:00
  • a83f528688
    tool-call: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539) Olivier Chafik 2025-01-31 14:15:25 +00:00
  • b1bcd309fc
    fix stop regression (#11543) Olivier Chafik 2025-01-31 13:48:31 +00:00
  • 5783575c9d
    Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533) Olivier Chafik 2025-01-31 08:24:29 +00:00
  • 4a2b196d03
    server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531) Olivier Chafik 2025-01-31 08:12:40 +00:00
  • 1bd3047a93
    common: Add missing va_end (#11529) Steve Grubb 2025-01-31 00:58:55 -05:00
  • a2df2787b3
    server : update help metrics processing/deferred (#11512) Daniel Bevenius 2025-01-31 06:04:53 +01:00
  • 553f1e46e9
    ci: ccache for all github workflows (#11516) Olivier Chafik 2025-01-30 22:01:06 +00:00
  • 8b576b6c55
    Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) Olivier Chafik 2025-01-30 19:13:58 +00:00
  • 27d135c970
    HIP: require at least HIP 5.5 uvos 2025-01-29 19:36:00 +01:00
  • 6af1ca48cb
    HIP: Prepare reduction operators for wave 64 uvos 2025-01-29 19:12:42 +01:00
  • c300e68ef4
    CUDA/HIP: add warp_size to cuda_device_info uvos 2025-01-29 17:46:23 +01:00
  • 3d804dec76
    sync: minja (#11499) Olivier Chafik 2025-01-30 10:30:27 +00:00
  • ffd0821c57
    vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496) mgroeber9110 2025-01-30 11:10:59 +01:00
  • 4314e56c4f
    server : use lambda instead of std::bind (#11507) Daniel Bevenius 2025-01-30 11:05:00 +01:00
  • 496e5bf46b
    server : (docs) added response format for /apply-template [no ci] (#11503) Isaac McFadyen 2025-01-30 04:11:53 -05:00
  • 7919256c57
    readme : reference examples relative links (#11505) Guspan Tanadi 2025-01-30 12:58:02 +07:00
  • e0449763a4
    server : update json snippets in README.md [no ci] (#11492) Daniel Bevenius 2025-01-30 05:48:14 +01:00
  • eb7cf15a80
    server : add /apply-template endpoint for additional use cases of Minja functionality (#11489) Nigel Bosch 2025-01-29 12:45:44 -06:00
  • 66ee4f297c
    vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360) Rémy Oudompheng 2025-01-29 18:29:39 +01:00
  • e51c47b401
    server : update auto gen files comments [no ci] (#11484) Daniel Bevenius 2025-01-29 16:34:18 +01:00
  • 2711d0215f
    vulkan: Catch pipeline creation failure and print an error message (#11436) Jeff Bolz 2025-01-29 09:26:50 -06:00
  • f0d4b29edf
    Parse https://ollama.com/library/ syntax (#11480) Eric Curtin 2025-01-29 12:23:10 +01:00
  • 815857791d
    sync : ggml Georgi Gerganov 2025-01-29 11:25:29 +02:00
  • 1a0e87d291
    ggml : add option to not print stack on abort (ggml/1081) William Tambellini 2025-01-23 11:59:08 -08:00
  • d2e518e9b4
    ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) issixx 2025-01-17 21:29:08 +09:00
  • b636228c0a
    embedding : enable --no-warmup option (#11475) Daniel Bevenius 2025-01-29 09:38:54 +01:00
  • 325afb370a
    llama: fix missing k_cache store for rwkv6qwen2 (#11445) Molly Sophia 2025-01-29 12:07:21 +08:00
  • 794fe23f29
    cmake: add hints for locating ggml on Windows using Llama find-package (#11466) Emreerdog 2025-01-29 02:22:06 +03:00
  • cf8cc856d7
    server : Fixed wrong function name in llamacpp server unit test (#11473) peidaqi 2025-01-28 16:03:42 -07:00
  • d0c08040b6
    ci : fix build CPU arm64 (#11472) Xuan-Son Nguyen 2025-01-29 00:02:56 +01:00
  • be5ef7963f
    HIP: Suppress transformation warning in softmax.cu uvos 2025-01-28 23:06:32 +01:00
  • cae9fb4361
    HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (#11080) Nikita Sarychev 2025-01-28 07:42:20 -08:00
  • 7fee2889e6
    Add support for pulling via github protocol and http:// (#11465) Eric Curtin 2025-01-28 15:45:41 +01:00
  • d7d1eccacc
    docker: allow installing pip packages system-wide (#11437) Nuno 2025-01-28 15:17:25 +01:00
  • 4bf3119d61
    cmake : don't fail on GGML_CPU=OFF (#11457) someone13574 2025-01-28 09:15:34 -05:00
  • f643120bad
    docker: add perplexity and bench commands to full image (#11438) Nuno 2025-01-28 11:42:32 +01:00
  • 6e84b0ab8e
    SYCL : SOFTMAX F16 mask support and other fixes (#11261) Akarshan Biswas 2025-01-28 15:26:58 +05:30
  • 2b8525d5c8
    Handle missing model in CLI parameters for llama-run (#11399) Michael Engel 2025-01-28 09:32:40 +01:00
  • a4417ddda9
    Add new hf protocol for ollama (#11449) Eric Curtin 2025-01-27 19:36:10 +01:00
  • d6d24cd9ed
    AMD: parse the architecture as supplied by gcnArchName (#11244) Haus1 2025-01-27 08:58:17 -05:00
  • a5203b4465
    llama : minor fixes to speed up llama model loading (#11448) lexasub 2025-01-27 17:42:09 +04:00
  • df984e0147
    llama: refactor llama_decode_impl (#11381) Johannes Gäßler 2025-01-27 12:07:12 +01:00
  • acd38efee3
    metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441) Ihar Hrachyshka 2025-01-27 02:41:59 -05:00
  • caf773f249
    docker : fix ARM build and Vulkan build (#11434) Xuan Son Nguyen 2025-01-26 22:45:32 +01:00
  • 178a7eb952
    metal : use residency sets (#11427) Georgi Gerganov 2025-01-26 20:06:16 +02:00
  • 6f53d8a6b4
    docker: add missing vulkan library to base layer and update to 24.04 (#11422) Nuno 2025-01-26 18:22:43 +01:00
  • 19f65187cb
    cmake: add ggml find package (#11369) bandoti 2025-01-26 12:07:48 -04:00
  • 1d8ee06000
    rpc: fix register position (#11424) Frank Mai 2025-01-26 23:20:34 +08:00
  • 2cc9b8c32c
    readme : update hot topics Georgi Gerganov 2025-01-26 14:30:15 +02:00
  • f35726c2fb
    build: apply MSVC /bigobj option to c/cpp files only (#11423) Jeff Bolz 2025-01-25 20:10:03 -06:00
  • 4a75d19376
    vulkan: compile shaders on-demand (#11406) Jeff Bolz 2025-01-25 15:29:57 -06:00
  • 26771a1491
    HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (#11420) uvos 2025-01-25 21:01:12 +01:00
  • ca6baf76c1
    build: add /bigobj to MSVC build (#11407) Jeff Bolz 2025-01-25 11:26:37 -06:00
  • 6e264a905b
    docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419) Diego Devesa 2025-01-25 17:22:41 +01:00
  • 49b0e3cec4
    server : fix cleaning up stream task (#11418) Xuan Son Nguyen 2025-01-25 16:36:44 +01:00
  • 20a758155b
    docker : fix CPU ARM build (#11403) Diego Devesa 2025-01-25 15:22:29 +01:00
  • 00c24acb2a
    ci : fix line breaks on windows builds (#11409) Georgi Gerganov 2025-01-25 13:36:48 +02:00
  • 466ea66f33
    CANN: Add Ascend CANN build ci (#10217) jiahao su 2025-01-25 07:26:01 +08:00
  • 5f0db9522f
    hip : Add hipGraph and VMM support to ROCM (#11362) uvos 2025-01-25 00:02:23 +01:00
  • c5d9effb49
    CUDA: fix FP16 cuBLAS GEMM (#11396) Johannes Gäßler 2025-01-24 21:02:43 +01:00
  • 9fbadaef4f
    rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356) uvos 2025-01-24 17:50:49 +01:00
  • 9755129c27
    release : pack /lib in the packages (#11392) Georgi Gerganov 2025-01-24 18:41:30 +02:00
  • a07c2c8a52
    docs : Update readme to build targets for local docker build (#11368) Jafar Uruç 2025-01-24 13:30:13 +00:00
  • 8137b4bb2b
    CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380) Johannes Gäßler 2025-01-24 12:38:31 +01:00