Commit graph

  • a6803cab94
    flake : add runHook preInstall/postInstall to installPhase so hooks function (#2224) Dave Della Costa 2023-07-14 15:13:38 -04:00
  • 7dabc66f3c
    make : use pkg-config for OpenBLAS (#2222) wzy 2023-07-15 03:05:08 +08:00
  • 7cdd30bf1f
    cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer (#2220) Bach Le 2023-07-15 03:00:58 +08:00
  • e8035f141e
    ggml : fix static_assert with older compilers #2024 (#2218) Evan Miller 2023-07-14 14:55:56 -04:00
  • 7513b7b0a1
    llama : add functions that work directly on model (#2197) Bach Le 2023-07-15 02:55:24 +08:00
  • de8342423d
    build.zig : install config header (#2216) Ali Chraghi 2023-07-14 11:50:58 -07:00
  • c48c525f87
    examples : fixed path typos in embd-input (#2214) Shangning Xu 2023-07-15 02:40:05 +08:00
  • 206e01de11
    cuda : support broadcast add & mul (#2192) Jiahao Li 2023-07-15 02:38:24 +08:00
  • 4304bd3cde
    CUDA: mul_mat_vec_q kernels for k-quants (#2203) Johannes Gäßler 2023-07-14 19:44:08 +02:00
  • 229aab351c
    make : fix combination of LLAMA_METAL and LLAMA_MPI (#2208) James Reynolds 2023-07-14 11:34:40 -06:00
  • 697966680b
    ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope) Georgi Gerganov 2023-07-14 16:36:41 +03:00
  • 27ad57a69b
    Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212) Kawrakow 2023-07-14 12:46:21 +03:00
  • 32c5411631
    Revert "Support using mmap when applying LoRA (#2095)" (#2206) Howard Su 2023-07-13 21:58:25 +08:00
  • ff5d58faec
    Fix compile error on Windows CUDA (#2207) Howard Su 2023-07-13 21:58:09 +08:00
  • b782422a3e
    devops : add missing quotes to bash script (#2193) Bodo Graumann 2023-07-13 15:49:14 +02:00
  • 1cbf561466
    metal : new q4_0 matrix-vector kernel (#2188) Shouzheng Liu 2023-07-12 16:10:55 -04:00
  • 975221e954
    ggml : broadcast mul_mat + conv batch support (#2199) Georgi Gerganov 2023-07-12 20:51:29 +03:00
  • 4523d10d0c
    ggml : add ggml_pool_1d and ggml_pool_2d Georgi Gerganov 2023-07-12 20:27:03 +03:00
  • 680e6f9177
    cuda : add gelu support Georgi Gerganov 2023-07-12 20:26:18 +03:00
  • 4e7464ef88
    FP16 is supported in CM=6.0 (#2177) Howard Su 2023-07-12 20:18:40 +08:00
  • 2b5eb72e10
    Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189) Johannes Gäßler 2023-07-12 10:38:52 +02:00
  • f7d278faf3
    ggml : revert CUDA broadcast changes from #2183 (#2191) Georgi Gerganov 2023-07-12 10:54:19 +03:00
  • 20d7740a9b
    ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183) Georgi Gerganov 2023-07-11 22:53:34 +03:00
  • 5bf2a27718
    ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) Spencer Sutton 2023-07-11 12:31:10 -04:00
  • c9c74b4e3f
    llama : add classifier-free guidance (#2135) Bach Le 2023-07-12 00:18:43 +08:00
  • 3ec7e596b2
    docker : add '--server' option (#2174) Jinwoo Jeong 2023-07-12 01:12:35 +09:00
  • 917831c63a
    readme : fix zig build instructions (#2171) Chad Brewbaker 2023-07-11 11:03:06 -05:00
  • 2347463201
    Support using mmap when applying LoRA (#2095) Howard Su 2023-07-11 22:37:01 +08:00
  • bbef28218f
    Possible solution to allow K-quants on models with n_vocab!=32000 (#2148) LostRuins 2023-07-11 22:01:08 +08:00
  • 5656d10599
    mpi : add support for distributed inference via MPI (#2099) Evan Miller 2023-07-10 11:49:56 -04:00
  • 1d16309969
    llama : remove "first token must be BOS" restriction (#2153) oobabooga 2023-07-09 05:59:53 -03:00
  • db4047ad5c
    main : escape prompt prefix/suffix (#2151) Nigel Bosch 2023-07-09 03:56:18 -05:00
  • 18780e0a5e
    readme : update Termux instructions (#2147) JackJollimore 2023-07-09 05:20:43 -03:00
  • 3bbc1a11f0
    ggml : fix building with Intel MKL but ask for "cblas.h" issue (#2104) (#2115) clyang 2023-07-09 16:12:20 +08:00
  • 2492a53fd0
    readme : add more docs indexes (#2127) rankaiyx 2023-07-09 15:38:42 +08:00
  • 64639555ff
    Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144) Johannes Gäßler 2023-07-08 20:01:44 +02:00
  • 061f5f8d21
    CUDA: add __restrict__ to mul mat vec kernels (#2140) Johannes Gäßler 2023-07-08 00:25:15 +02:00
  • 84525e7962
    docker : add support for CUDA in docker (#1461) dylan 2023-07-07 11:25:25 -07:00
  • a7e20edf22
    ci : switch threads to 1 (#2138) Georgi Gerganov 2023-07-07 21:23:57 +03:00
  • 1d656d6360
    ggml : change ggml_graph_compute() API to not require context (#1999) Qingyou Meng 2023-07-08 00:24:01 +08:00
  • 7242140283
    ggml : remove sched_yield() call in ggml_graph_compute_thread() (#2134) Georgi Gerganov 2023-07-07 18:36:37 +03:00
  • 3e08ae99ce
    convert.py: add mapping for safetensors bf16 (#1598) Aarni Koskela 2023-07-07 16:12:49 +03:00
  • 481f793acc
    Fix opencl by wrap #if-else-endif with \n (#2086) Howard Su 2023-07-07 11:34:18 +08:00
  • dfd9fce6d6
    ggml : fix restrict usage Georgi Gerganov 2023-07-06 19:41:31 +03:00
  • 36680f6e40
    convert : update for baichuan (#2081) Judd 2023-07-07 00:23:49 +08:00
  • a17a2683d8
    alpaca.sh : update model file name (#2074) tslmy 2023-07-06 09:17:50 -07:00
  • 31cfbb1013
    Expose generation timings from server & update completions.js (#2116) Tobias Lütke 2023-07-05 16:51:13 -04:00
  • 983b555e9d
    Update Server Instructions (#2113) Jesse Jojo Johnson 2023-07-05 18:03:19 +00:00
  • ec326d350c
    ggml : fix bug introduced in #1237 Georgi Gerganov 2023-07-05 20:44:11 +03:00
  • 1b6efeab82
    tests : fix test-grad0 Georgi Gerganov 2023-07-05 20:20:05 +03:00
  • 1b107b8550
    ggml : generalize quantize_fns for simpler FP16 handling (#1237) Stephan Walter 2023-07-05 16:13:06 +00:00
  • 8567c76b53
    Update server instructions for web front end (#2103) Jesse Jojo Johnson 2023-07-05 15:13:35 +00:00
  • 924dd22fd3
    Quantized dot products for CUDA mul mat vec (#2067) Johannes Gäßler 2023-07-05 14:19:42 +02:00
  • 051c70dcd5
    llama: Don't double count the sampling time (#2107) Howard Su 2023-07-05 18:31:23 +08:00
  • 9e4475f5cf
    Fixed OpenCL offloading prints (#2082) Johannes Gäßler 2023-07-05 08:58:05 +02:00
  • 7f0e9a775e
    embd-input: Fix input embedding example unsigned int seed (#2105) Nigel Bosch 2023-07-04 18:33:33 -05:00
  • b472f3fca5
    readme : add link web chat PR Georgi Gerganov 2023-07-04 22:25:22 +03:00
  • ed9a54e512
    ggml : sync latest (new ops, macros, refactoring) (#2106) Georgi Gerganov 2023-07-04 21:54:11 +03:00
  • f257fd2550
    Add an API example using server.cpp similar to OAI. (#2009) jwj7140 2023-07-05 03:06:12 +09:00
  • 7ee76e45af
    Simple webchat for server (#1998) Tobias Lütke 2023-07-04 10:05:27 -04:00
  • acc111caf9
    Allow old Make to build server. (#2098) Henri Vasserman 2023-07-04 15:38:04 +03:00
  • 23c7c6fc91
    Update Makefile: clean simple (#2097) ZhouYuChen 2023-07-04 20:15:16 +08:00
  • 698efad5fb
    CI: make the brew update temporarily optional. (#2092) Erik Scholz 2023-07-04 01:50:12 +02:00
  • 14a2cc71f6
    [ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088) Govlzkoy 2023-07-04 07:50:00 +08:00
  • 1cf14ccef1
    fix server crashes (#2076) Henri Vasserman 2023-07-04 00:05:23 +03:00
  • cc45a7feb8
    Fix crash of test-tokenizer-0 under Debug build (#2064) Howard Su 2023-07-04 02:43:55 +08:00
  • 55dbb915cc
    [llama] No need to check file version when loading vocab score (#2079) Howard Su 2023-07-03 19:58:58 +08:00
  • d7d2e6a0f0
    server: add option to output probabilities for completion (#1962) WangHaoranRobin 2023-07-03 05:38:44 +08:00
  • 46088f7231
    ggml : fix build with OpenBLAS (close #2066) Georgi Gerganov 2023-07-02 09:46:46 +03:00
  • 0bc2cdfc87
    Better CUDA synchronization logic (#2057) Johannes Gäßler 2023-07-01 21:49:44 +02:00
  • befb3a3562
    Test-based VRAM scratch size + context adjustment (#2056) Johannes Gäßler 2023-07-01 21:47:26 +02:00
  • b213227067
    cmake : don't force -mcpu=native on aarch64 (#2063) Daniel Drake 2023-07-01 20:31:44 +02:00
  • 2f8cd979ec
    metal : release buffers when freeing metal context (#2062) Aaron Miller 2023-07-01 11:14:59 -07:00
  • 471aab6e4c
    convert : add support of baichuan-7b (#2055) Judd 2023-07-02 01:00:25 +08:00
  • 463f2f4c4f
    llama : fix return value of llama_load_session_file_internal (#2022) Georgi Gerganov 2023-07-01 19:05:09 +03:00
  • cb44dbc7de
    llama : catch llama_load_session_file_internal exceptions (#2022) Rand Xie 2023-07-02 00:02:58 +08:00
  • 79f634a19d
    embd-input : fix returning ptr to temporary Georgi Gerganov 2023-07-01 18:46:00 +03:00
  • 04606a1599
    train : fix compile warning Georgi Gerganov 2023-07-01 18:45:44 +03:00
  • b1ca8f36a9
    ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995) Qingyou Meng 2023-07-01 23:42:43 +08:00
  • b8c8dda75f
    Use unsigned for random seed (#2006) Howard Su 2023-06-29 21:15:15 +08:00
  • 96a712ca1b
    Porting the improved K-Quant CUDA kernels to OpenCL (#1966) LostRuins 2023-06-29 11:56:43 +08:00
  • d3494bb86b
    llama : replacing auto &kv with const auto &kv (#2041) m3ndax 2023-06-28 20:39:08 +02:00
  • 5b351e94d0
    cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028) Salvador E. Tropea 2023-06-28 14:27:31 -03:00
  • 6432aabb6d
    cuda : fix missing const qualifier in casts (#2027) Salvador E. Tropea 2023-06-28 14:26:26 -03:00
  • b922bc351b
    llama : remove shards weight file support (#2000) Howard Su 2023-06-28 10:13:02 -07:00
  • 7f9753fa12
    CUDA GPU acceleration for LoRAs + f16 models (#1970) Johannes Gäßler 2023-06-28 18:35:54 +02:00
  • cfa0750bc9
    llama : support input embeddings directly (#1910) ningshanwutuobang 2023-06-28 23:53:37 +08:00
  • 9d23589d63
    fix pthreads setaffinity usage on android (#2020) Erik Scholz 2023-06-27 19:06:33 +02:00
  • 0be54f75a6
    baby-llama : fix build after ggml_rope change (#2016) Howard Su 2023-06-27 13:07:13 +08:00
  • 181e8d9755
    llama : fix rope usage after ChatGLM change Georgi Gerganov 2023-06-27 00:37:13 +03:00
  • d9779021bd
    ggml : add support for ChatGLM RoPE Georgi Gerganov 2023-06-27 00:06:51 +03:00
  • d38e451578
    readme : add Scala 3 bindings repo (#2010) Roman Parykin 2023-06-26 22:47:59 +03:00
  • eaa6ca5a61
    ggml : increase max tensor name + clean up compiler warnings in train-text (#1988) David Yang 2023-06-27 03:45:32 +08:00
  • aa777abbb7
    readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007) Gustavo Rocha Dias 2023-06-26 16:34:45 -03:00
  • c824d2e368
    ggml : avoid conv 2d kernel round up Georgi Gerganov 2023-06-26 21:03:59 +03:00
  • b853d45601
    ggml : add NUMA support (#1556) zrm 2023-06-26 13:57:59 -04:00
  • 9225baef71
    k-quants : fix indentation Georgi Gerganov 2023-06-26 20:10:52 +03:00
  • a84ab1da8d
    tests : fix quantize perf (#1990) katsu560 2023-06-27 01:47:02 +09:00
  • 5743ca8092
    k-quants : add AVX support to dot functions (#1916) katsu560 2023-06-27 01:46:07 +09:00
  • 412c60e473
    readme : add link to new k-quants for visibility Georgi Gerganov 2023-06-26 19:45:09 +03:00