Commit graph

  • 6769e944c7
    k-quants : support for super-block size of 64 (#2001) Kawrakow 2023-06-26 19:43:07 +03:00
  • cbebf61ca7
    Fix assert when freeing an invalid CUDA pointer (#2005) Howard Su 2023-06-26 23:15:47 +08:00
  • 447ccbe8c3
    readme : add new roadmap + manifesto Georgi Gerganov 2023-06-25 16:08:12 +03:00
  • bd34cdde38
    ggml : sync latest ggml (custom operators) Georgi Gerganov 2023-06-25 14:25:08 +03:00
  • c2a08f87b8
    fix server sampling: apply top-k sampler first (#1977) anon998 2023-06-25 08:48:36 +00:00
  • 66a2555ba6
    readme : add Azure CI discussion link Georgi Gerganov 2023-06-25 09:07:03 +03:00
  • e65ca7e14a
    zig : upgrade build system support (#1981) sjinzh 2023-06-25 13:45:44 +08:00
  • 5ec8dd5a3c
    #1869 Fix null reference errors when training from scratch with CUDA (#1907) Robyn 2023-06-25 04:10:29 +10:00
  • 65bdd52a86
    tests : sync test-grad0 from ggml Georgi Gerganov 2023-06-24 19:40:18 +03:00
  • fdd1860911
    flake : fix ggml-metal.metal path and run nixfmt (#1974) Rowan Hart 2023-06-24 04:07:08 -07:00
  • c943d823c1
    convert : fix invalid params in write_vocab_only (#1975) AN Long 2023-06-24 19:02:06 +08:00
  • f2c754e1c3
    ggml : improve ggml_graph_dump_dot, add ggml_format_name (#1978) slaren 2023-06-24 12:57:18 +02:00
  • 11da1a85cd
    readme : fix whitespaces Georgi Gerganov 2023-06-24 13:38:18 +03:00
  • 235b610d65
    readme : fixed termux instructions (#1973) Alberto 2023-06-24 12:32:13 +02:00
  • b061ba9e2a
    llama : fix top-p sampling to match the canonical definition (#1953) Alex Renda 2023-06-24 03:15:01 -07:00
  • 527b6fba1d
    llama : make model stateless and context stateful (llama_state) (#1797) Didzis Gosko 2023-06-24 11:47:58 +03:00
  • d7b7484f74
    Add OpenLLaMA instructions to the README (#1954) eiery 2023-06-23 04:38:01 -04:00
  • 7487137227
    rework convert.py to read hyper-parameters from config.json (#1958) Erik Scholz 2023-06-22 14:20:47 +02:00
  • bbca06e269
    cmake: revert CUDA arch default to 52, 61 if f16 (#1959) Johannes Gäßler 2023-06-21 23:49:25 +02:00
  • fb98254f99
    Fix typo in README.md (#1961) Rahul Vivek Nair 2023-06-22 03:18:43 +05:30
  • 049aa16b8c
    readme : add link to p1 Georgi Gerganov 2023-06-20 19:05:54 +03:00
  • 2322ec223a
    Fix typo (#1949) Xiake Sun 2023-06-20 05:42:40 -07:00
  • aacdbd4056
    llama : fix params struct alignment (#1936) Ettore Di Giacinto 2023-06-20 03:24:39 +02:00
  • 20568fe60f
    [Fix] Reenable server embedding endpoint (#1937) Henri Vasserman 2023-06-20 01:12:39 +03:00
  • 18b35625c3
    ggml : fix bug in LBFGS optimizer (found by ggml tests) Georgi Gerganov 2023-06-19 20:43:30 +03:00
  • ba4e85a833
    llama : use aligned memory during ggml_init call from loading saved sessions (#1934) l3utterfly 2023-06-19 23:20:06 +08:00
  • 23fc5c219a
    cmake : fix trailing whitespaces Georgi Gerganov 2023-06-19 18:18:34 +03:00
  • cb40dfca69
    llama : only use Q6_K for output weights if tensor size is multiple of 256 (#1932) Kawrakow 2023-06-19 18:17:03 +03:00
  • ca7c3f4da5
    cuda : faster k-quants on older GPUs (#1930) Kawrakow 2023-06-19 18:14:09 +03:00
  • b97ca431db
    ggml : sync latest ggml repo (#1924) Georgi Gerganov 2023-06-19 18:12:33 +03:00
  • 1e3abfcef0
    cmake : fix build shared ggml when CUDA is enabled (#1929) Howard Su 2023-06-19 23:10:37 +08:00
  • 16b9cd1939
    Convert vector to f16 for dequantize mul mat vec (#1913) Johannes Gäßler 2023-06-19 10:23:56 +02:00
  • b24c3049d9
    Added tokens per second to info prints (#1928) Johannes Gäßler 2023-06-18 17:41:26 +02:00
  • 0ede372a51
    Fixed incorrectly applying RMS norm twice (#1925) Johannes Gäßler 2023-06-18 16:07:09 +02:00
  • 8596af4277
    ggml : fix bug in ggml_compute_forward_add_q_f32 (#1918) l3utterfly 2023-06-18 19:19:16 +08:00
  • e1886cf4fe
    readme : update Android build instructions (#1922) Mike 2023-06-18 16:28:26 +08:00
  • 8ab8ba62eb
    llama : prevent usage of k-quants when tensor size is not a multiple of 256 (#1921) Kawrakow 2023-06-18 11:13:43 +03:00
  • 90cc59d6ab
    examples : fix examples/metal (#1920) Kawrakow 2023-06-18 10:52:10 +03:00
  • ce2c7d72e2
    metal : handle buffers larger than device's maxBufferLength (#1826) Georgi Gerganov 2023-06-18 09:09:47 +03:00
  • 57cd69460f
    cmake : add CUDA_ARCHITECTURES to new target ggml_static (#1917) Howard Su 2023-06-18 12:29:47 +08:00
  • b2416493ab
    make : do not print help for simple example Georgi Gerganov 2023-06-17 20:55:03 +03:00
  • 4f9c43e3bd
    minor : warning fixes Georgi Gerganov 2023-06-17 20:24:11 +03:00
  • 2c9380dd2f
    Only one CUDA stream per device for async compute (#1898) Johannes Gäßler 2023-06-17 19:15:02 +02:00
  • 051e1b0e6a
    llama : fix kv_cache n init (close #1903) Georgi Gerganov 2023-06-17 19:30:22 +03:00
  • 86c7571864
    make : update for latest Arch (#1701) DaniAndTheWeb 2023-06-17 18:17:22 +02:00
  • 3d59ec5935
    ggml : fix warnings under MSVC (#1908) Howard Su 2023-06-17 23:46:15 +08:00
  • 0711a5f6dc
    metal : add norm, cpy f16->f16, alibi kernels (#1823) Aaron Miller 2023-06-17 07:37:49 -07:00
  • fc45a81bc6
    exposed modules so that they can be invoked by nix run github:ggerganov/llama.cpp#server etc (#1863) Faez Shakil 2023-06-17 17:13:05 +05:00
  • 794db3e7b9
    Server Example Refactor and Improvements (#1570) Randall Fitzgerald 2023-06-17 07:53:04 -04:00
  • 5ddf7ea1fb
    hooks : setting up flake8 and pre-commit hooks (#1681) Jiří Podivín 2023-06-17 12:32:48 +02:00
  • bac19927c3
    readme : alternative way to build for Android with CLBlast. (#1828) Gustavo Rocha Dias 2023-06-17 06:01:06 -03:00
  • b4c6f46f17
    Allow cmake to build ggml as a library (#1896) Kerfuffle 2023-06-17 01:49:42 -06:00
  • 92f20d9942
    train : get raw text instead of a page with HTML (#1905) David Yang 2023-06-17 14:51:54 +08:00
  • d411968e99
    opencl : support k-quants (#1836) 0cc4m 2023-06-16 20:59:49 +02:00
  • b41b4cad6f
    examples : add "simple" (#1840) SuperUserNameMan 2023-06-16 20:58:09 +02:00
  • 13fe9d2d84
    cmake : add auto detection of BLAS_INCLUDE_DIRS (#1886) Zenix 2023-06-17 03:53:04 +09:00
  • ac3b886953
    llama : fix embd when offloading non-repeating layers (#1891) Johannes Gäßler 2023-06-16 20:25:51 +02:00
  • 5b9ccaf104
    Fixed possible macro redefinition (#1892) FrankHB 2023-06-17 02:25:01 +08:00
  • 9cbf50c041
    build : fix and ignore MSVC warnings (#1889) Borislav Stanimirov 2023-06-16 21:23:53 +03:00
  • 3d01122610
    CUDA : faster k-quant dot kernels (#1862) Kawrakow 2023-06-16 20:08:44 +03:00
  • 602c748863
    gitignore : add several entries specific to Visual Studio (#1888) Borislav Stanimirov 2023-06-16 09:58:11 +03:00
  • a09f9195be
    Fixed CUDA runtime version check (#1879) Johannes Gäßler 2023-06-15 21:49:08 +02:00
  • bed9275617
    cmake : remove whitespaces Georgi Gerganov 2023-06-15 21:56:50 +03:00
  • c36e81da62
    examples : add chat-vicuna.sh (#1854) yangli2 2023-06-15 11:05:53 -07:00
  • 3559433fec
    cmake : set include path for OpenBLAS (#1830) Igor Okulist 2023-06-15 12:51:26 -05:00
  • 69b34a0e80
    swift : Package compile breaks due to ggml-metal.metal (#1831) Frederik Vogel 2023-06-16 02:47:04 +09:00
  • cf267d1c71
    make : add train-text-from-scratch (#1850) daboe01 2023-06-15 19:42:48 +02:00
  • 9dda13e5e1
    readme : server compile flag (#1874) Srinivas Billa 2023-06-15 18:36:38 +01:00
  • 37e257c48e
    make : clean *.so files (#1857) sandyiscool 2023-06-15 23:06:06 +05:30
  • 64cc19b4fe
    Fix the validation of main device (#1872) Howard Su 2023-06-16 01:29:59 +08:00
  • 4bfcc855ab
    metal : parallel command buffer encoding (#1860) Georgi Gerganov 2023-06-15 20:29:48 +03:00
  • 6b8312e797
    Better error when using both LoRA + GPU layers (#1861) Johannes Gäßler 2023-06-15 19:06:46 +02:00
  • 254a7a7a5f
    CUDA full GPU acceleration, KV cache in VRAM (#1827) Johannes Gäßler 2023-06-14 19:47:19 +02:00
  • 9254920265
    baby-llama : fix operator!= (#1821) 0xspringtime 2023-06-13 15:37:54 -04:00
  • e32089b2c2
    train : improved training-from-scratch example (#1652) xaedes 2023-06-13 21:04:40 +02:00
  • 2347e45e7b
    llama : do a warm-up eval at start for better timings (#1824) Georgi Gerganov 2023-06-13 20:20:07 +03:00
  • 74d4cfa343
    Allow "quantizing" to f16 and f32 (#1787) Kerfuffle 2023-06-13 04:23:23 -06:00
  • 74a6d922f1
    Metal implementation for all k_quants (#1807) Kawrakow 2023-06-12 22:39:21 +03:00
  • e4caa8da59
    ci : run when changing only the CUDA sources (#1800) slaren 2023-06-12 19:12:47 +02:00
  • 58970a4c39
    Leverage mmap for offloading tensors to GPU (#1597) Howard Su 2023-06-12 20:44:16 +08:00
  • 8c0a10e64d
    metal : fix failure to load model (#1817) Kawrakow 2023-06-12 14:31:36 +03:00
  • fa84c4b3e8
    Fix issue where interactive mode crashes when input exceeds ctx size (#1789) Kerfuffle 2023-06-11 08:19:17 -06:00
  • 12b063f0ec
    Fixed WSL CUDA's OOM error (#1594) Kyle Liang 2023-06-11 21:20:52 +08:00
  • 31d2b5f4a4
    Update SHA256SUMS with current hashes for models quantized using q4_0 (#1798) Ryan Landay 2023-06-11 17:38:53 +08:00
  • 4de0334f5c
    cmake : fix Metal build (close #1791) Georgi Gerganov 2023-06-10 22:56:53 +03:00
  • 3f1223155a
    k-quants : GCC12 compilation fix (#1792) Artyom Lebedev 2023-06-10 22:51:36 +03:00
  • 303f5809f1
    metal : fix issue with ggml-metal.metal path. Closes #1769 (#1782) Andrei 2023-06-10 10:47:34 -04:00
  • 059e99066d
    doc : fix wrong path to BLIS.md (#1772) Aisuko 2023-06-11 00:08:11 +10:00
  • 17c10acfb4
    ggml : force no_alloc == false when creating opt tensors (close #1699) Georgi Gerganov 2023-06-10 12:06:45 +03:00
  • e9b66ee982
    metal : add Q4_1 implementation (#1785) Kawrakow 2023-06-10 11:28:11 +03:00
  • 4f0154b0ba
    llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691) Kerfuffle 2023-06-10 01:59:17 -06:00
  • ef3171d162
    ggml : workaround for missing _mm256_setr_m128i in GCC < 8 (#1638) Xingchen Song(宋星辰) 2023-06-10 15:49:40 +08:00
  • 555275a693
    make : add SSSE3 compilation use case (#1659) rankaiyx 2023-06-10 14:41:59 +08:00
  • 98ed165574
    OpenCL: Add memory release (#1741) Robert Sung-wook Shin 2023-06-10 01:24:40 +09:00
  • ae9663f188
    Windows nvcc workaround (#1753) Johannes Gäßler 2023-06-09 13:58:15 +02:00
  • b33dee282f
    metal : fix build "tanhf" -> "tanh" Georgi Gerganov 2023-06-09 11:11:04 +03:00
  • 92f44ff7f7
    metal : add GELU implementation (#1770) AT 2023-06-09 04:00:51 -04:00
  • 245fc3c37d
    metal : faster q4_0 (#1775) Kawrakow 2023-06-09 10:39:59 +03:00
  • 72ff5282bf
    metal : add Q2_K implementation (#1762) Kawrakow 2023-06-08 22:28:21 +03:00
  • 0bf7cf1b29
    Revert "ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738)" Georgi Gerganov 2023-06-08 20:48:14 +03:00