Commit graph

  • 3d652bfddf
    readme : update bindings (#12229) Lucas Moura Belo 2025-03-06 16:15:13 -03:00
  • 5220a16d18
    CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (#12222) Johannes Gäßler 2025-03-06 18:45:09 +01:00
  • 3ffbbd5ce1
    HIP: rocWMMA documentation and enabling in workflow builds (#12179) David Huang 2025-03-06 21:14:11 +08:00
  • 42994048a3
    update function-calling.md w/ template override for functionary-small-v3.2 (#12214) Olivier Chafik 2025-03-06 09:03:31 +00:00
  • e9b2f84f14
    llava: add big-endian conversion for image encoder (#12218) Aaron Teo 2025-03-06 16:33:21 +08:00
  • e721c05c93
HIP/CUDA: set the parameter value in maintain_cuda_graph instead of replacing it. (#12209) uvos 2025-03-06 08:20:52 +01:00
  • 57b6abf85a
    android : fix KV cache log message condition (#12212) Han Yin 2025-03-05 22:22:49 -08:00
  • 94bb63e4f0
    opencl : fix buffer alignment (#12197) Henry Linjamäki 2025-03-06 03:33:40 +02:00
  • f79243992c
    opencl : fix ulong kernel args were set from int variables (#12174) Henry Linjamäki 2025-03-06 03:31:14 +02:00
  • ed4ce0dda2
    opencl : fix profile-related errors (#12095) simon886212 2025-03-06 09:30:05 +08:00
  • 07d1572347
    ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154) Rémy O 2025-03-06 02:26:10 +01:00
  • 5e43f104cc
    SYCL: Disable f16 Unary OPs as not supported by the kernels (#12201) Akarshan Biswas 2025-03-05 21:28:23 +05:30
  • 16e4b22c5e
    ggml : fix GGMLMetalClass ODR (#12200) Plamen Minev 2025-03-05 17:16:01 +02:00
  • 074c4fd39d
    ci : add fetch-depth to xcframework upload (#12195) Daniel Bevenius 2025-03-05 14:16:40 +01:00
  • 669912d9a5
    tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) Olivier Chafik 2025-03-05 13:05:13 +00:00
  • fa31c438e0
    ci : fix xcframework artifact tag (#12191) Daniel Bevenius 2025-03-05 10:22:29 +01:00
  • 3ccbfe5a71
ci : remove xcframework upload (#12190) Daniel Bevenius 2025-03-05 08:34:02 +01:00
  • 06a92a193a
    server : fix cache reuse logic (#12161) Clauszy 2025-03-05 15:25:45 +08:00
  • a057897ad4
    llama : add xcframework build script (#11996) Daniel Bevenius 2025-03-05 06:30:31 +01:00
  • 5bbe6a9fe9
    ggml : portability fixes for VS 2017 (#12150) mgroeber9110 2025-03-04 17:53:26 +01:00
  • 20a9b8f5e1
    readme : fix roadmap link (#12185) Georgi Gerganov 2025-03-04 18:42:44 +02:00
  • 56d7a9f812
    main: allow preloading conversation with -p and add -st / --single-turn (#12145) Sigbjørn Skjæret 2025-03-04 17:19:39 +01:00
  • 1a24c4621f
    server: fix deadly typo in response_format.json_schema.schema handling (#12168) Olivier Chafik 2025-03-04 06:24:07 +00:00
  • becade5de7
    HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (#12032) David Huang 2025-03-04 05:10:54 +08:00
  • dfd6b2c0be
    sync : ggml Georgi Gerganov 2025-03-03 17:57:38 +02:00
  • b64d7cc272
    cuda: unary ops as float + de-duplicate (ggml/1130) cmdr2 2025-03-03 20:51:31 +05:30
  • 3d1cf3cf33
    sync : ggml Georgi Gerganov 2025-02-28 12:37:35 +02:00
  • 0cbee131ad
    cuda/vulkan: specify fp32-only support for some operations in supports_op (ggml/1129) cmdr2 2025-02-28 12:36:46 +02:00
  • 8371d44595
    sync : ggml Georgi Gerganov 2025-02-28 09:09:58 +02:00
  • 87abb7e903
    cuda/cpu: Increase support for fp16 unary operations (ggml/1125) cmdr2 2025-02-28 12:34:39 +05:30
  • 6d4c23b81b
    whisper : support GGML_BACKEND_DL (whisper/2843) Diego Devesa 2025-02-27 13:35:07 +01:00
  • 6512a90037
    cmake : fix compile assumptions for power9/etc (whisper/2777) midnight 2025-02-05 04:41:10 -08:00
  • 4512055792
    Told cmake to install ggml-cpp.h as a public header file. (ggml/1126) petterreinholdtsen 2025-02-26 21:44:00 +01:00
  • f54a4ba11e
    Support pure float16 add/sub/mul/div operations in the CUDA (and CPU) backend (ggml/1121) cmdr2 2025-02-25 18:06:34 +05:30
  • aede2074f6
    scripts : sync-ggml-am.sh fix Georgi Gerganov 2025-02-28 09:09:38 +02:00
  • 2679c3b55d
    ci : set GITHUB_ACTION env var for server tests (#12162) Daniel Bevenius 2025-03-03 16:17:36 +01:00
  • c43af9276b
    tts: add speaker file support (#12048) dm4 2025-03-03 21:09:29 +08:00
  • d5c63cd7f9
    test-backend-ops : add option -p to filter by op params (#12155) Diego Devesa 2025-03-03 14:00:46 +01:00
  • 9660ffef58
    ggml : fix kleidiai build (#12159) ag2s20150909 2025-03-03 20:54:08 +08:00
  • c950a1f692
    Adding UTF-8 support to llama.cpp (#12111) Eric Curtin 2025-03-03 12:44:56 +00:00
  • 7b69003af7
    webui : add ?m=... and ?q=... params (#12148) Xuan-Son Nguyen 2025-03-03 11:42:45 +01:00
  • ece9745bb8
    SYCL: Move CPY kernels to a separate file and add few missing kernels (#12133) Akarshan Biswas 2025-03-03 15:37:22 +05:30
  • cc473cac7c
    ggml-backend : keep paths in native string type when possible (#12144) Diego Devesa 2025-03-02 22:11:00 +01:00
  • 14dec0c2f2
    main: use jinja chat template system prompt by default (#12118) Sigbjørn Skjæret 2025-03-02 14:53:48 +01:00
  • 1782cdfed6
    main: update outdated system prompt message (followup to #12131) (#12132) Sigbjørn Skjæret 2025-03-01 15:22:27 +01:00
  • 45a8e76745
    common : add --system-prompt parameter, replace behavior of -p in conversation mode (#12131) Sigbjørn Skjæret 2025-03-01 13:56:45 +01:00
  • 80c41ddd8f
    CUDA: compress mode option and default to size (#12029) Erik Scholz 2025-03-01 12:57:22 +01:00
  • 2cc4a5e44a
    webui : minor typo fixes (#12116) Vivian 2025-03-01 15:45:09 +05:30
  • 06c2b1561d
    convert : fix Norway problem when parsing YAML (#12114) Xuan-Son Nguyen 2025-02-28 17:44:46 +01:00
  • 70680c48e5
    ggml : upgrade init_tensor API to return a ggml_status (#11854) William Tambellini 2025-02-28 05:41:47 -08:00
  • c43a3e7996
    llama : add Phi-4-mini support (supersede #12099) (#12108) Xuan-Son Nguyen 2025-02-28 12:44:11 +01:00
  • 84d5f4bc19
    Update granite vision docs for 3.2 model (#12105) Alex Brooks 2025-02-28 04:31:47 -07:00
  • 438a83926a
    vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (#11595) Rémy O 2025-02-28 09:42:52 +01:00
  • 9c42b1718c
    CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (#12098) Johannes Gäßler 2025-02-28 09:26:43 +01:00
  • 05e6f5aad0
    ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (#12064) Prashant Vithule 2025-02-28 13:06:12 +05:30
  • 673cfef9aa
    CANN: Fix build error with GCC 13 (#11990) hipudding 2025-02-28 15:23:47 +08:00
  • fbeda9002d
    vulkan: matmul dequantization improvements (#12015) Eve 2025-02-28 07:20:08 +00:00
  • 581650b7ca
    vulkan: improve im2col (#11826) Daniele 2025-02-28 06:52:51 +00:00
  • b95c8af37c
    cmake: Fix ggml backend dependencies and installation (#11818) Vladimir Vuksanovic 2025-02-27 08:42:48 +01:00
  • a800ae46da
    llava : add struct for FFI bindgen (#12079) Ting Lou 2025-02-26 22:26:52 +08:00
  • 69050a11be
    Refactor gguf scripts to improve metadata handling (#11909) Sigbjørn Skjæret 2025-02-26 14:04:48 +01:00
  • 3567ee3a94
    gguf-py: enable reading non-native endian files (#12081) Aleksei Nikiforov 2025-02-26 12:39:27 +01:00
  • 53e4db1012
    readme : update infra list (#9096) Kante Yin 2025-02-26 15:49:36 +08:00
  • d7cfe1ffe0
    docs: add docs/function-calling.md to lighten server/README.md's plight (#12069) Olivier Chafik 2025-02-25 18:52:56 +00:00
  • a82c9e7c23
    vulkan: fix assertion when qy_needs_dequant (#12068) Jeff Bolz 2025-02-25 09:30:21 -06:00
  • 401af80b54
    server: handle echo=false on /v1/completions (#12060) rhjdvsgsgks 2025-02-25 11:52:52 +00:00
  • c132239bfb
    add OP sigmoid (#12056) Judd 2025-02-25 19:32:20 +08:00
  • 393fca629e
    ggml-cpu: Fix build with sve (#12059) Molly Sophia 2025-02-25 19:28:22 +08:00
  • 61d4f39dfe
    vulkan: implement more backpropagation operators (#11914) Rémy O 2025-02-25 12:04:45 +01:00
  • 0b52745649
    server: support add_generation_prompt query param (#12062) Olivier Chafik 2025-02-25 10:40:22 +00:00
  • 4d1051a40f
    Add Doc for Converting Granite Vision -> GGUF (#12006) Alex Brooks 2025-02-25 02:46:05 -07:00
  • 3e9a2860e9
    llama : expose llama_model_n_head_kv in the API (#11997) Vitali Lovich 2025-02-25 01:29:33 -08:00
  • 58d07a8043
    metal : copy kernels for quant to F32/F16 conversions (#12017) Gian-Carlo Pascutto 2025-02-25 10:27:58 +01:00
  • 34a846b584
    opencl: fix for small models (#11950) lhez 2025-02-24 13:47:07 -08:00
  • 7a2c913e66
    llava : Add Granite Vision Support (#11794) Alex Brooks 2025-02-24 09:09:51 -07:00
  • 08d5986290
    [SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035) Neo Zhang Jianyu 2025-02-24 22:33:23 +08:00
  • 651adf4b66
    gguf_convert_endian.py: implement byteswapping for q4_k and q6_k (#11349) Aleksei Nikiforov 2025-02-24 12:27:01 +01:00
  • 8303e8b0fb
    SYCL: Fix GGML_SYCL_DEBUG macro (#11995) Akarshan Biswas 2025-02-24 15:48:25 +05:30
  • 7ad0779f5d
run: allow customizing the prompt via env var LLAMA_PROMPT_PREFIX (#12041) Florent BENOIT 2025-02-23 18:15:51 +01:00
  • f777a73e18
    Some llama-run cleanups (#11973) Eric Curtin 2025-02-23 13:14:32 +00:00
  • af7747c95a
    ggml-cpu: Support s390x SIMD Instruction Set (#12019) Aaron Teo 2025-02-23 05:39:24 +08:00
  • a28e0d5eb1
CUDA: add option to compile without FlashAttention (#12025) Johannes Gäßler 2025-02-22 20:44:34 +01:00
  • 36c258ee92
    llava: build clip image from pixels (#11999) Ting Lou 2025-02-22 22:28:28 +08:00
  • f3e64859ed
    ci : fix arm upload artifacts (#12024) Georgi Gerganov 2025-02-22 15:03:00 +02:00
  • 5fa07c2f93
    CUDA: optimize FA for GQA + large batches (#12014) Johannes Gäßler 2025-02-22 12:20:17 +01:00
  • 335eb04a91
    ci : Build on Github-hosted arm64 runners (#12009) Rohanjames1997 2025-02-22 04:48:57 -06:00
  • cf756d6e0a
    server : disable Nagle's algorithm (#12020) Georgi Gerganov 2025-02-22 12:46:31 +02:00
  • d70908421f
    cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (#12000) Gian-Carlo Pascutto 2025-02-22 09:43:24 +01:00
  • de8b5a3624
    llama.swiftui : add "Done" dismiss button to help view (#11998) Daniel Bevenius 2025-02-22 06:33:29 +01:00
  • 51f311e057
    llama : skip loading unused tensors (#12004) Georgi Gerganov 2025-02-21 18:33:18 +02:00
  • 586d5fe6eb
    doc: update contributing guidelines [no ci] (#11969) Johannes Gäßler 2025-02-21 12:51:25 +01:00
  • ecc8e3aeff
    CUDA: correct the lowest Maxwell supported by CUDA 12 (#11984) PureJourney 2025-02-21 19:21:05 +08:00
  • 0b3863ff95
MUSA: support ARM64 and enable dp4a etc. (#11843) Bodhi 2025-02-21 15:46:23 +08:00
  • ee02ad02c5
    clip : fix visual encoders with no CLS (#11982) Alex Brooks 2025-02-20 23:11:03 -07:00
  • c392e5094d
    server (webui): Fix Premature Submission During IME Conversion (#11971) momonga 2025-02-21 03:43:22 +09:00
  • c5d91a7400
    ggml-cpu: Add CPU backend support for KleidiAI library (#11390) Charles Xu 2025-02-20 14:06:51 +01:00
  • 4806498bf1
    ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917) Prashant Vithule 2025-02-20 15:38:32 +05:30
  • 0d559580a0
    run : add --chat-template-file (#11961) Michael Engel 2025-02-20 09:35:11 +01:00
  • d04e7163c8
    doc: add links to ggml examples [no ci] (#11958) Johannes Gäßler 2025-02-19 20:45:17 +01:00
  • d07c621393
    common : add llama.vim preset for Qwen2.5 Coder (#11945) Daniel Bevenius 2025-02-19 12:29:52 +01:00