Commit graph

  • d84c48505f
    llama : fix Baichuan2 13B (#6092) slaren 2024-03-15 22:14:16 +01:00
  • 877b4d0c62
    llama : add support for control vectors (#5970) Theia Vogel 2024-03-15 13:43:02 -07:00
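For context on 877b4d0c62: control vectors steer generation by adding a learned direction, scaled by a user-chosen strength, to the hidden state at selected layers. A minimal C sketch of the core operation (function name and signature are illustrative, not the llama.cpp API):

```c
// Add a steering direction, scaled by `strength`, to one hidden-state
// vector of length n. Applied per layer during inference.
void apply_control_vector(float *hidden, const float *cvec,
                          float strength, int n) {
    for (int i = 0; i < n; i++) {
        hidden[i] += strength * cvec[i];
    }
}
```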
  • 12247f4c69
    llama : add Command-R support (#6033) Andrew Canis 2024-03-15 16:41:22 -04:00
  • 4e9a7f7f7f
    llava : change API to pure C style for Rust FFI bindgen (#6079) Ting Lou 2024-03-15 22:31:05 +08:00
  • 3020327f6c
    cuda : disable unused cudaLaunchHostFunc code (#6078) slaren 2024-03-15 13:24:03 +01:00
  • 46acb36767
    fix set main gpu error (#6073) Neo Zhang Jianyu 2024-03-15 18:53:53 +08:00
  • 131b058409
    make : ggml-metal.o depends on ggml.h Georgi Gerganov 2024-03-15 11:36:50 +02:00
  • 753e36f650
    [SYCL] Fix non-intel device selection (#6042) AidanBeltonS 2024-03-15 09:26:20 +00:00
  • 7ce2c77f88
    gguf : add support for I64 and F64 arrays (#6062) Ondřej Čertík 2024-03-15 02:46:51 -06:00
  • aab606a11f
    llama : add Orion chat template (#6066) Xuan Son Nguyen 2024-03-15 09:44:57 +01:00
  • b0bc9f4a9d
    llama-bench : use random tokens to improve accuracy with mixtral (#6069) slaren 2024-03-15 09:22:24 +01:00
  • 4755afd1cb
    llama : fix integer overflow during quantization (#6063) Georgi Gerganov 2024-03-14 22:58:41 +02:00
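For context on 4755afd1cb: the usual failure mode behind this class of bug is accumulating 32-bit products in a 32-bit integer, which overflows quickly on large tensors; widening to `int64_t` before the multiply avoids it. An illustrative sketch of the pattern, not the actual fix in that commit:

```c
#include <stdint.h>
#include <stddef.h>

// Cast to int64_t before multiplying so the product and the running
// sum cannot overflow a 32-bit accumulator.
int64_t dot_i32(const int32_t *a, const int32_t *b, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int64_t) a[i] * (int64_t) b[i];
    }
    return acc;
}
```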
  • 6e0438da3c
    gguf : fix resource leaks (#6061) Steve Grubb 2024-03-14 14:29:32 -04:00
  • 727107707a
    gguf-py : bump version to 0.8.0 (#6060) Ondřej Čertík 2024-03-14 11:57:31 -06:00
  • 69ff61397d
    llama : support models without vocabulary (#5798) Michael Podvitskiy 2024-03-14 17:21:56 +01:00
  • 044ec4b2a5
    embedding : add EOS token if not present (#899) Georgi Gerganov 2024-03-14 15:14:14 +02:00
  • 77178eedc8
    gguf-py : fix dtype check (#6045) Georgi Gerganov 2024-03-14 13:32:14 +02:00
  • 15a333260a
    readme : improve readme for Llava-1.6 example (#6044) Jian Liao 2024-03-14 04:18:23 -07:00
  • 43241adf22
    server: disable debug release type sanitizer, simplify trigger (#6047) Pierrick Hymbert 2024-03-14 12:15:39 +01:00
  • a44bc969e4
    llama : fix typo Georgi Gerganov 2024-03-14 13:13:06 +02:00
  • 2c4fb69246
    llama : optimize defrag moves + fix fragmentation calculation (#6037) Michael Podvitskiy 2024-03-14 11:56:48 +01:00
  • 3ca23481dd
    gguf-py : add support for I8, I16 and I32 (#6045) Ondřej Čertík 2024-03-14 04:40:14 -06:00
  • 3fe8d7a17f
    ggml : designate enum vals for integer types (#6050) Georgi Gerganov 2024-03-14 12:38:37 +02:00
  • 68265ebfc6
    embedding : print all resulting embeddings (#899) Georgi Gerganov 2024-03-14 12:37:20 +02:00
  • 381da2d9f0
    metal : build metallib + fix embed path (#6015) Georgi Gerganov 2024-03-14 11:55:23 +02:00
  • 0fd6c1f015
    embedding : print cosine similarity (#899) Georgi Gerganov 2024-03-14 10:12:29 +02:00
  • 19885d205e
    readme : update details about running llama in Termux on Android (#6039) Linwei Wang 2024-03-14 02:34:40 +08:00
  • 76a936c893
    readme : update API changes and hot topics Georgi Gerganov 2024-03-13 20:33:56 +02:00
  • 463628372d
    grammar : handle missing "root" node (#6004) Clint Herron 2024-03-13 14:10:40 -04:00
  • f30ea47a87
    llama : add pipeline parallelism support (#6017) slaren 2024-03-13 18:54:21 +01:00
  • d8fd0ccf6a
    test-backend-ops : skip CPU backend by default (#6028) slaren 2024-03-13 14:58:30 +01:00
  • b3d978600f
    Update get version (#6025) AidanBeltonS 2024-03-13 13:17:54 +00:00
  • 99b71c068f
    Server: Use multi-task for embeddings endpoint (#6001) Xuan Son Nguyen 2024-03-13 11:39:11 +01:00
  • 306d34be7a
    ci : remove tidy-review (#6021) slaren 2024-03-12 16:55:19 +01:00
  • 8030da7afe
    ggml : reuse quantum structs across backends (#5943) Georgi Gerganov 2024-03-12 14:27:20 +02:00
  • 184215e783
    ggml : fix UB in IQ2_S and IQ3_S (#6012) Georgi Gerganov 2024-03-12 13:49:55 +02:00
  • 48358b2e5b
    sycl : update IQ1_S kernels (WIP - not working!) (#5995) Georgi Gerganov 2024-03-12 11:15:05 +02:00
  • 5cdb371731
    grammar : fix unnecessarily retained pointer to rules (#6003) gliptic 2024-03-11 20:59:03 +01:00
  • 44ca159faf
    1.5 bit: we can do even better (#5999) Kawrakow 2024-03-11 16:53:15 +01:00
  • 05b06210c9
    llama : more consistent names of count variables (#5994) Georgi Gerganov 2024-03-11 17:49:47 +02:00
  • 83796e62bc
    llama : refactor unicode stuff (#5992) Georgi Gerganov 2024-03-11 17:47:47 +02:00
  • 828defefb6
    Update server docker image URLs (#5997) Jakub N 2024-03-11 14:40:42 +01:00
  • caa106d4e0
    Server: format error to json (#5961) Xuan Son Nguyen 2024-03-11 10:56:41 +01:00
  • 3202361c5b
    ggml, ci : Windows ARM runner and build fixes (#5979) Michael Podvitskiy 2024-03-11 10:28:51 +01:00
  • 332bdfd798
    server : maintain chat completion id for streaming responses (#5988) Minsoo Cheong 2024-03-11 17:09:32 +09:00
  • ecab1c75de
    cmake : fix subdir for LLAMA_METAL_EMBED_LIBRARY (#5985) Gilad S 2024-03-11 10:00:08 +02:00
  • ee35600b90
    llama : fix F16/F32 downcast + improve names (#5980) Georgi Gerganov 2024-03-11 09:56:47 +02:00
  • be858f6205
    Better 1.5 bit quantization (#5971) Kawrakow 2024-03-11 07:51:49 +01:00
  • ef3ced26a3
    [SYCL] Add q3_s and q1_s (#5886) Abhilash Majumder 2024-03-11 10:27:56 +05:30
  • 3814a07392
    [SYCL] Add support for SYCL Nvidia target (#5738) AidanBeltonS 2024-03-11 01:13:57 +00:00
  • bb6d00bbf9
    metal : move mm_id indices to shared mem (#5982) Georgi Gerganov 2024-03-10 23:12:48 +02:00
  • 7ab7b733bb
    android : fix utf8 decoding error (#5935) Dean 2024-03-11 04:03:17 +08:00
  • d9f65c97c3
    readme : update hot topics Georgi Gerganov 2024-03-10 20:58:26 +02:00
  • b838b53ad6
    sync : ggml Georgi Gerganov 2024-03-10 20:10:46 +02:00
  • df4dc3e7cb
    ggml : try fix 32-bit arm compat (whisper/1938) Georgi Gerganov 2024-03-08 23:45:07 +02:00
  • bf47a5eefc
    ggml : remove __constant__ specifier for CUDA tables (#5940) Georgi Gerganov 2024-03-10 20:09:24 +02:00
  • fa8a809a91
    server: ci: windows build and tests (#5968) Pierrick Hymbert 2024-03-10 18:17:47 +01:00
  • bcebd7dbf6
    llama : add support for GritLM (#5959) DAN™ 2024-03-10 11:56:30 -04:00
  • 2960eae847
    grammar : verify parsed state (#5950) Clint Herron 2024-03-10 11:17:43 -04:00
  • c78541479c
    nix: update flake.lock (#5969) Georgi Gerganov 2024-03-10 16:43:08 +02:00
  • 621e86b331
    server: benchmark: chat/completions scenario and other llm servers comparison (#5941) Pierrick Hymbert 2024-03-09 23:41:49 +01:00
  • 77d1ac7e00
    server : print chat template info Georgi Gerganov 2024-03-09 22:04:00 +02:00
  • d894f352bf
    perplexity : support using multiple sequences to allow larger batch sizes (#5946) slaren 2024-03-09 19:55:54 +01:00
  • 098dbaab44
    readme : update hot topics Georgi Gerganov 2024-03-09 18:14:13 +02:00
  • 8380ecfb21
    ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (#5951) Georgi Gerganov 2024-03-09 17:36:20 +02:00
  • 58308a0ecc
    server : fix metrics init (#5964) Georgi Gerganov 2024-03-09 17:34:15 +02:00
  • 5b09797321
    ggml : remove old quantization functions (#5942) Georgi Gerganov 2024-03-09 15:53:59 +02:00
  • 97c09585d6
    server : clarify some items in the readme (#5957) Georgi Gerganov 2024-03-09 15:47:47 +02:00
  • fb215c3832
    server : normalize embeddings (#5956) SeungWon Jeong 2024-03-09 21:27:58 +09:00
  • 2c4f566c88
    tests : gitignore ggml-common.h Georgi Gerganov 2024-03-09 14:17:11 +02:00
  • 0db32beaf0
    server : fix passing prompt as tokens (#5955) Alexey Parfenov 2024-03-09 11:16:53 +00:00
  • 8a3012a4ad
    ggml : add ggml-common.h to deduplicate shared code (#5940) Georgi Gerganov 2024-03-09 12:47:57 +02:00
  • 9674aaf35c
    server : simplify logic for empty prompts (#5953) Georgi Gerganov 2024-03-09 12:34:18 +02:00
  • 950ba1ab84
    Server: reorganize some http logic (#5939) Xuan Son Nguyen 2024-03-09 11:27:53 +01:00
  • e1fa9569ba
    server : add SSL support (#5926) Gabe Goodhart 2024-03-09 02:57:09 -07:00
  • fd72d2d2a5
    server: tests: add truncated prompt tests, better kv cache size (#5933) Pierrick Hymbert 2024-03-09 10:30:04 +01:00
  • c2101a2e90
    llama : support Mamba Selective State Space Models (#5328) compilade 2024-03-08 17:31:00 -05:00
  • 515f7d0d4f
    llama : fix quantization of shared token_embd (#5944) compilade 2024-03-08 10:53:37 -05:00
  • 76e868821a
    server: metrics: add llamacpp:prompt_seconds_total and llamacpp:tokens_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. (#5937) Pierrick Hymbert 2024-03-08 12:25:04 +01:00
  • e457fb3540
    llama : assume tied weights if lm_head/output weights is missing (#5824) Don Mahurin 2024-03-08 02:41:50 -08:00
  • af37fd8b30
    server : fix EOS token detection with disabled cache (#5938) Georgi Gerganov 2024-03-08 12:40:02 +02:00
  • 581ed5c4fe
    log : fix MSVC compile errors (#5643) UEXTM.com 2024-03-08 04:35:04 -05:00
  • 6cdabe6526
    llama-bench : add embeddings option (#5924) Georgi Gerganov 2024-03-07 16:32:38 +02:00
  • 89fb735fcf
    Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918) Neo Zhang Jianyu 2024-03-07 19:14:49 +08:00
  • 55a2a900ff
    server : add /v1/completions endpoint (#5914) Minsoo Cheong 2024-03-07 19:42:39 +09:00
  • 2002bc96bf
    server : refactor (#5882) Georgi Gerganov 2024-03-07 11:41:53 +02:00
  • ceca1aef07
    [SYCL] fix error when set main gpu to non-zero (#5901) Neo Zhang Jianyu 2024-03-07 16:34:31 +08:00
  • e04e04f8fa
    ggml : use SYS_get_cpu if SYS_getcpu is not defined (#5906) Jared Van Bortel 2024-03-06 15:42:23 -05:00
  • e25fb4b18f
    ggml : use uint8x16_t return type for ggml_vqtbl1q_u8 (#5894) bobqianic 2024-03-06 07:35:07 +00:00
  • 1e35d619a6
    convert : remove AWQ remnants (#5768) Georgi Gerganov 2024-03-06 09:12:25 +02:00
  • 8ced9f7e32
    add wait() to make code stable (#5895) Neo Zhang Jianyu 2024-03-06 12:08:32 +08:00
  • 652ca2bded
    compare-llama-bench.py : remove mul_mat_q (#5892) slaren 2024-03-05 22:27:29 +01:00
  • bd836944f8
    quants : use MM256_SET_M128I consistently to fix gcc 7 build (#5889) Jared Van Bortel 2024-03-05 11:56:37 -05:00
  • 3de31677d3
    grammars : blacklists character control set (#5888) ExtReMLapin 2024-03-05 17:33:08 +01:00
  • 82cb31eb93
    Revert "grammars : don't allow to output unescaped new line in string (#5885)" Georgi Gerganov 2024-03-05 15:56:24 +02:00
  • b1a4e994fd
    grammars : don't allow to output unescaped new line in string (#5885) ExtReMLapin 2024-03-05 14:44:29 +01:00
  • 61d1c88e15
    Vulkan Improvements (#5835) 0cc4m 2024-03-05 13:33:42 +01:00
  • 21b0867433
    [SYCL] fix mul_mat fault in CI/unit-test (#5862) Neo Zhang Jianyu 2024-03-05 16:08:35 +08:00
  • 6a87ac3a52
    fix editorconfig check break (#5879) Minsoo Cheong 2024-03-05 15:12:23 +09:00
  • 29eee40474
    fix speculative decoding build on windows (#5874) Jeffrey Quesnelle 2024-03-04 19:23:06 -08:00