ver4a/llama.cpp

Commit graph

afd9909a64

rpc : backend refactoring (#9912) Radoslav Gerganov 2024-10-18 14:33:58 +03:00
87421a23e8

[SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705) Ouadie EL FAROUKI 2024-10-18 06:46:16 +01:00
60ce97c9d8

add amx kernel for gemm (#8998) Ma Mingfei 2024-10-18 13:34:36 +08:00
8901755ba3

server : add n_indent parameter for line indentation requirement (#9929) Georgi Gerganov 2024-10-18 07:32:19 +03:00
6f55bccbb8

llama : rename batch_all to batch (#8881) Daniel Bevenius 2024-10-18 01:41:51 +02:00
17bb928080

readme : remove --memory-f32 references (#9925) Georgi Gerganov 2024-10-17 23:43:05 +03:00
9f45fc1e99

llama : change warning to debug log Georgi Gerganov 2024-10-17 23:26:32 +03:00
99bd4ac28c

llama : infill sampling handle very long tokens (#9924) Georgi Gerganov 2024-10-17 22:32:47 +03:00
3752217ed5

readme : update bindings list (#9918) Tim Wang 2024-10-17 17:57:14 +11:00
f010b77a37

vulkan : add backend registry / device interfaces (#9721) Diego Devesa 2024-10-17 02:46:58 +02:00
2194200278

fix: allocating CPU buffer with size 0 (#9917) Gilad S. 2024-10-17 02:34:22 +03:00
73afe681aa

fix: use vm_allocate to allocate CPU backend buffer on macOS (#9875) Gilad S. 2024-10-17 01:36:51 +03:00
9e04102448

llama : suppress conversion from 'size_t' to 'int' (#9046) Daniel Bevenius 2024-10-16 19:34:28 +02:00
dbf18e4de9

llava : fix typo in error message [no ci] (#9884) Daniel Bevenius 2024-10-16 19:24:05 +02:00
66c2c93082

grammar : fix JSON Schema for string regex with top-level alt. (#9903) Joe Eli McIlvain 2024-10-16 09:03:24 -07:00
10433e8b45

llama : add tensor name for "result_norm" (#9907) Molly Sophia 2024-10-16 18:10:21 +08:00
1f66b699c4

server : fix the disappearance of the end of the text (#9867) Alexey Parfenov 2024-10-16 08:35:53 +00:00
0e41b300ed

sync : ggml Georgi Gerganov 2024-10-16 11:28:14 +03:00
cd60b88bf7

ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) Daniel Bevenius 2024-10-09 16:40:35 +02:00
becfd387f6

[CANN] Fix cann compilation error (#9891) leo-pony 2024-10-16 08:51:46 +08:00
755a9b2bf0

llama : add infill sampler (#9896) Georgi Gerganov 2024-10-15 16:35:33 +03:00
223c25a72f

server : improve infill context reuse (#9894) Georgi Gerganov 2024-10-15 16:28:55 +03:00
fbc98b748e

sampling : add XTC sampler (#9742) MaggotHATE 2024-10-15 15:54:55 +05:00
dcdd535302

server : update preact (#9895) Georgi Gerganov 2024-10-15 12:48:44 +03:00
4c42f93b22

readme : update bindings list (#9889) Michał Tuszyński 2024-10-15 10:20:34 +02:00
a89f75e1b7

server : handle "logprobs" field with false value (#9871) VoidIsVoid 2024-10-14 15:04:36 +08:00
13dca2a54a

Vectorize load instructions in dmmv f16 CUDA kernel (#9816) agray3 2024-10-14 01:49:08 +01:00
d4c19c0f5c

server : accept extra_context for the infill endpoint (#9874) Georgi Gerganov 2024-10-13 21:31:35 +03:00
c7181bd294

server : reuse cached context chunks (#9866) Georgi Gerganov 2024-10-13 18:52:48 +03:00
92be9f1216

flake.lock: Update (#9870) Georgi Gerganov 2024-10-13 06:11:26 +03:00
edc265661c

server : add option to time limit the generation phase (#9865) Georgi Gerganov 2024-10-12 16:14:27 +03:00
1bde94dd02

server : remove self-extend features (#9860) Georgi Gerganov 2024-10-12 16:06:31 +03:00
95c76e8e92

server : remove legacy system_prompt feature (#9857) Georgi Gerganov 2024-10-12 14:51:54 +03:00
11ac9800af

llama : improve infill support and special token detection (#9798) Georgi Gerganov 2024-10-12 08:21:51 +03:00
943d20b411

musa : update doc (#9856) R0CKSTAR 2024-10-12 13:09:53 +08:00
96776405a1

ggml : move more prints to the ggml log system (#9839) Diego Devesa 2024-10-11 15:34:45 +02:00
7eee341bee

common : use common_ prefix for common library functions (#9805) Diego Devesa 2024-10-10 22:57:42 +02:00
0e9f760eb1

rpc : add backend registry / device interfaces (#9812) Diego Devesa 2024-10-10 20:14:55 +02:00
cf8e0a3bb9

musa: add docker image support (#9685) R0CKSTAR 2024-10-11 02:10:37 +08:00
c7499c557c

examples : do not use common library in simple example (#9803) Diego Devesa 2024-10-10 19:50:49 +02:00
c81f3bbb05

cmake : do not build common library by default when standalone (#9804) Diego Devesa 2024-10-09 18:49:52 +02:00
e7022064ab

perplexity : fix integer overflow (#9783) Georgi Gerganov 2024-10-09 17:00:18 +03:00
3dc48fe75a

examples : remove llama.vim Georgi Gerganov 2024-10-09 10:55:42 +03:00
dca1d4b58a

ggml : fix BLAS with unsupported types (#9775) Diego Devesa 2024-10-08 14:21:43 +02:00
458367a906

server : better security control for public deployments (#9776) Xuan Son Nguyen 2024-10-08 13:27:04 +02:00
fa42aa6d89

scripts : fix spelling typo in messages and comments (#9782) standby24x7 2024-10-08 15:19:53 +09:00
6374743747

ggml : add backend registry / device interfaces to BLAS backend (#9752) Diego Devesa 2024-10-07 21:55:08 +02:00
f1af42fa8c

Update building for Android (#9672) Andrew Minh Nguyen 2024-10-07 09:37:31 -07:00
6279dac039

flake.lock: Update (#9753) Georgi Gerganov 2024-10-07 19:35:42 +03:00
d5ac8cf2f2

ggml : add metal backend registry / device (#9713) Georgi Gerganov 2024-10-07 18:27:51 +03:00
96b6912103

metal : single allocation of encode_async block (#9747) Paul Tsochantaris 2024-10-07 13:26:31 +01:00
d5cb86844f

contrib : simplify + minor edits [no ci] Georgi Gerganov 2024-10-06 14:15:27 +03:00
f4b2dcdf49

readme : fix typo [no ci] Georgi Gerganov 2024-10-06 13:49:41 +03:00
b6d6c5289f

sync : llama.cpp Georgi Gerganov 2024-10-06 12:53:28 +03:00
b0915d5b51

vulkan : retry allocation with fallback flags (whisper/2451) SRHMorris 2024-10-06 08:34:20 +01:00
8c475b97b8

rerank : use [SEP] token instead of [BOS] (#9737) Georgi Gerganov 2024-10-05 15:55:04 +03:00
58b16695e1

sync : ggml Georgi Gerganov 2024-10-05 15:53:49 +03:00
905f5485b2

metal : zero-init buffer contexts (whisper/0) Georgi Gerganov 2024-10-05 14:33:54 +03:00
71967c2a6d

Add Llama Assistant (#9744) Viet-Anh NGUYEN (Andrew) 2024-10-05 01:29:35 +07:00
17880771ad

sync : ggml Georgi Gerganov 2024-10-04 18:50:25 +03:00
55951c018d

ggml : fix typo in example usage ggml_gallocr_new (ggml/984) Daniel Bevenius 2024-10-04 15:46:18 +02:00
ff565769f2

ggml : fixes after sync (ggml/983) Diego Devesa 2024-10-04 08:41:40 +02:00
f3fdcfaa79

ci : fine-grant permission (#9710) Xuan Son Nguyen 2024-10-04 11:47:19 +02:00
133c7b46b3

Fixed RNG seed docs (#9723) Daniel Kleine 2024-10-04 10:54:44 +02:00
d5ed2b929d

metal : remove abort (skip) (ggml/0) Georgi Gerganov 2024-10-03 21:18:19 +03:00
1bb8a64ebf

sync : ggml Georgi Gerganov 2024-10-03 21:17:49 +03:00
fabdc3bda3

ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) Johannes Gäßler 2024-10-03 17:29:59 +02:00
eee39bdc96

ggml: refactor cross entropy loss CPU impl. (ggml/976) Johannes Gäßler 2024-10-02 15:32:39 +02:00
5d5ab1e5cc

metal : fix compute pass descriptor autorelease crash (#9718) Jack Mousseau 2024-10-03 11:01:46 -07:00
a7ad553513

ggml-backend : add device description to CPU backend (#9720) Diego Devesa 2024-10-03 17:39:18 +02:00
d6fe7abf04

ggml: unify backend logging mechanism (#9709) bandoti 2024-10-03 12:39:03 -03:00
e3c355ba65

convert : handle tokenizer merges format from transformers 4.45 (#9696) compilade 2024-10-03 10:22:15 -04:00
841713e1e4

rpc : enable vulkan (#9714) Radoslav Gerganov 2024-10-03 13:00:52 +03:00
5639971466

Fixed dequant precision issues in Q4_1 and Q5_1 (#9711) Ouadie EL FAROUKI 2024-10-03 07:50:44 +01:00
c83ad6d01e

ggml-backend : add device and backend reg interfaces (#9707) Diego Devesa 2024-10-03 01:49:47 +02:00
a39ab216aa

llama : reduce compile time and binary size (#9712) Xuan Son Nguyen 2024-10-02 15:49:55 +02:00
f536f4c439

[SYCL] Initial cmake support of SYCL for AMD GPUs (#9658) Alberto Cabrera Pérez 2024-10-02 13:57:18 +01:00
00b7317e63

vulkan : do not use tensor->extra (#9407) Radoslav Gerganov 2024-10-02 13:49:16 +03:00
76b37d1541

gguf-split : improve --split and --merge logic (#9619) Zhenwei Jin 2024-10-02 15:21:57 +08:00
148844fe97

examples : remove benchmark (#9704) Georgi Gerganov 2024-10-02 10:14:44 +03:00
3f1ae2e32c

Update README.md (#9591) Paweł Wodnicki 2024-10-01 12:18:46 -05:00
f1b8c42711

sync : ggml Georgi Gerganov 2024-10-01 16:09:42 +03:00
e98c1c188e

test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974) Johannes Gäßler 2024-09-30 09:55:23 +02:00
cb00020504

vulkan : mul_mat: fix UB with small warps (ggml/952) Salvatore Mesoraca 2024-09-30 09:14:09 +02:00
6c5322481a

ggml : fix ggml_cast (ggml/973) Borislav Stanimirov 2024-09-30 10:11:41 +03:00
7254cdf7e8

ggml: fix gradient allocation logic (ggml/966) Johannes Gäßler 2024-09-29 23:18:02 +02:00
cad341d889

metal : reduce command encoding overhead (#9698) Georgi Gerganov 2024-10-01 16:00:25 +03:00
a90484c6d9

llama : print correct model type for Llama 3.2 1B and 3B Georgi Gerganov 2024-10-01 11:42:01 +03:00
1927378bcc

convert : refactor rope_freqs generation (#9396) compilade 2024-10-01 02:31:36 -04:00
6f1d9d71f4

Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641) serhii-nakon 2024-09-30 21:57:12 +03:00
511636df0c

ci : reduce severity of unused Pyright ignore comments (#9697) compilade 2024-09-30 14:13:16 -04:00
08a43d05b6

py : update transfomers version (#9694) vb 2024-09-30 17:03:47 +02:00
ace4f4be37

flake.lock: Update (#9680) Georgi Gerganov 2024-09-30 17:48:49 +03:00
8277a817f1

console : utf-8 fix for windows stdin (#9690) Ruchira Hasaranga 2024-09-30 13:53:42 +05:30
c919d5db39

ggml : define missing HWCAP flags (#9684) Georgi Gerganov 2024-09-29 21:18:23 +03:00
d0b1d663e4

sync : ggml Georgi Gerganov 2024-09-29 21:16:07 +03:00
aaa4099925

CUDA: remove bad assert (ggml/972) Johannes Gäßler 2024-09-29 19:56:17 +02:00
641002fba8

vulkan : multithread pipeline creation (ggml/963) Jeff Bolz 2024-09-29 11:50:17 -05:00
0de8b203f1

vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (ggml/961) Jeff Bolz 2024-09-27 02:58:01 -05:00
544f409b4b

vulkan : argsort barriers must be under uniform control flow (ggml/951) Salvatore Mesoraca 2024-09-26 08:59:42 +02:00