llama.cpp

ver4a/llama.cpp

Commit graph

87a6f846d3

Allow setting the rng seed after initialization. (#1184) Ásgeir Bjarni Ingvarsson 2023-04-26 20:08:43 +00:00
ea3ad7eb60

Updating build instructions to include BLAS support (#1183) DaniAndTheWeb 2023-04-26 22:03:03 +02:00
859fee6dfb

quantize : use map to assign quantization type from string (#1191) Pavol Rusnak 2023-04-26 18:43:27 +02:00
4afcc37869

Update SHA256SUMS after quantization change (#1181) Stephan Walter 2023-04-25 21:41:56 +00:00
667c501334

py : cast lora_alpha to int in convert-lora-to-ggml (#1170) ostix360 2023-04-25 23:33:08 +02:00
bb98e77be7

nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py (#981) Pavol Rusnak 2023-04-25 23:19:57 +02:00
7a32fcb3b2

ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179) Georgi Gerganov 2023-04-25 23:40:51 +03:00
dd0eabc049

ggml : use full range for Q4_0 and Q4_2 quantization (#729) unbounded 2023-04-25 19:20:46 +02:00
54bb60e268

ggml : fix bug in ggml_compute_forward_sum_f32 (#1162) xaedes 2023-04-24 23:02:02 +02:00
8a0f8673ba

ggml : export symbols (#1155) Georgi Gerganov 2023-04-24 22:18:25 +03:00
0c5692345d

examples : add save_load_state example (#1150) xaedes 2023-04-24 18:23:31 +02:00
957c8ae21d

llama : increase scratch buffer size for 65B (ref #1152) Georgi Gerganov 2023-04-24 18:47:03 +03:00
9b0a4d4214

examples/main README improvements and some light refactoring (#1131) mgroeber9110 2023-04-24 17:45:32 +02:00
2ec83428de

Fix build for gcc 8 and test in CI (#1154) Stephan Walter 2023-04-24 15:38:26 +00:00
e4cf982e0d

Fix cuda compilation (#1128) slaren 2023-04-24 17:29:58 +02:00
c4fe84fb0d

llama : refactor get / set state + remove redundant kv cache API (#1143) Georgi Gerganov 2023-04-24 07:40:02 +03:00
1d78fecdab

Fix LoRA acronym (#1145) slaren 2023-04-23 23:03:44 +02:00
284685f169

scripts : add helper scripts to synch ggml repo Georgi Gerganov 2023-04-23 19:57:09 +03:00
edce63baa9

Added README.md for main with examples and explanations (#1139) DannyDaemonic 2023-04-23 08:37:02 -07:00
ec9cdb6752

ggml : do not print perf ops that have not been used at all Georgi Gerganov 2023-04-23 18:32:52 +03:00
e4422e299c

ggml : better PERF prints + support "LLAMA_PERF=1 make" Georgi Gerganov 2023-04-23 18:15:39 +03:00
53c8434398

Improve AVX2 for vec_dot_q4_3_q8_0 (#1138) Stephan Walter 2023-04-23 11:01:03 +00:00
c6524f46eb

readme : update gpt4all instructions (#980) Pavol Rusnak 2023-04-23 10:21:26 +02:00
c9e2c26f41

A better packNibbles and mul_sum_i8_pairs_float implementation using AVX512 (#1119) Yishuo Wang 2023-04-23 15:57:05 +08:00
0e018fe008

ggml : fix Q4_3 cuBLAS Georgi Gerganov 2023-04-22 16:31:56 +03:00
857308d1e8

ci : trigger CI for drafts, but not most PR actions (#1125) Stephan Walter 2023-04-22 13:12:29 +00:00
c50b628810

Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122) Stephan Walter 2023-04-22 10:54:13 +00:00
5f939498d5

ggml : unit test for quantization functions (#953) unbounded 2023-04-22 11:10:39 +02:00
36b4f7e064

llama : print timings on ctrl+c exit (#1021) wbpxre150 2023-04-22 16:56:35 +08:00
10f19c1121

llama : have n_batch default to 512 (#1091) eiery 2023-04-22 04:27:05 -04:00
7e312f165c

cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100) Howard Su 2023-04-22 16:18:20 +08:00
872c365a91

ggml : fix AVX build + update to new Q8_0 format Georgi Gerganov 2023-04-22 11:08:12 +03:00
955ef9a5d5

ggml : alternative Q4_3 implementation using modified Q8_0 (#1109) Georgi Gerganov 2023-04-22 10:55:35 +03:00
c5aa5e5777

ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099) Stephan Walter 2023-04-22 07:37:05 +00:00
e9a9cb0c54

examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107) Clint Herron 2023-04-22 02:54:33 -04:00
b6e7f9b09e

llama : add api for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105) xaedes 2023-04-22 08:21:32 +02:00
50cb666b8a

Improve cuBLAS performance by using a memory pool (#1094) slaren 2023-04-21 21:59:17 +02:00
25d7abbd1f

llama : fixed rlimit error message (#888) apaz 2023-04-21 13:48:06 -05:00
018f2279f5

cmake : link threads publicly to ggml (#1042) 源文雨 2023-04-22 02:27:06 +08:00
9411288271

main : evaluate tokens in batches after swapping context (#1014) Alex Klinkhamer 2023-04-21 11:18:09 -07:00
8687c1f258

llama : remember and restore kv cache data pointers (#1104) xaedes 2023-04-21 17:25:21 +02:00
1bfc153e2f

ggml : a faster version for Q4_1 x Q8_0 dot products (#1083) Kawrakow 2023-04-21 17:18:26 +02:00
3d59769c3b

Show perplexity ETA in hours and minutes (#1096) slaren 2023-04-21 14:57:57 +02:00
d40fded93e

llama : fix comment for "output.weight" tensor Georgi Gerganov 2023-04-21 10:23:36 +03:00
2510c1831f

Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088) Stephan Walter 2023-04-20 21:56:44 +00:00
12b5900dbc

ggml : sync ggml (add GPT-NeoX RoPE implementation) Georgi Gerganov 2023-04-20 23:32:59 +03:00
9ff334f3c9

ggml : fix bug in ggml_compute_forward_dup_f32() Georgi Gerganov 2023-04-20 21:58:05 +03:00
2005469ea1

Add Q4_3 support to cuBLAS (#1086) slaren 2023-04-20 20:49:53 +02:00
8a1756abdf

ggml : do not break cuBLAS build (Q4_3 is not yet implemented) Georgi Gerganov 2023-04-20 21:43:50 +03:00
66aab46079

ggml : fix Q4_3 quantization Georgi Gerganov 2023-04-20 20:44:05 +03:00
38de86a711

llama : multi-threaded quantization (#1075) Kawrakow 2023-04-20 19:42:27 +02:00
e0305ead3a

ggml : add Q4_3 quantization (#1082) Georgi Gerganov 2023-04-20 20:35:53 +03:00
6a9661ea5a

ci : remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI (#1074) Ivan Komarov 2023-04-20 17:15:18 +02:00
5addcb120c

fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080) 源文雨 2023-04-20 21:28:43 +08:00
c8c2c52482

AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) Stephan Walter 2023-04-20 06:45:41 +00:00
02d6988121

Improve cuBLAS performance by dequantizing on the GPU (#1065) slaren 2023-04-20 03:14:14 +02:00
834695fe3a

Minor: Readme fixed grammar, spelling, and misc updates (#1071) CRD716 2023-04-19 14:52:14 -05:00
f7d05095b4

Q4_2 quantization with rmse-optimized scale and quants (#1062) Kawrakow 2023-04-19 20:20:14 +02:00
884e7d7a2b

ggml : use 8-bit precision for Q4_1 intermediate results (#1047) Georgi Gerganov 2023-04-19 20:10:08 +03:00
7cd5c4a3e9

readme : add warning about Q4_2 and Q4_3 Georgi Gerganov 2023-04-19 19:07:54 +03:00
f3d4edf504

ggml : Q4 cleanup - remove 4-bit dot product code (#1061) Stephan Walter 2023-04-19 16:06:37 +00:00
8944a13296

Add NVIDIA cuBLAS support (#1044) slaren 2023-04-19 11:22:45 +02:00
6667401238

Multi-threaded ggml_cpy (#1035) slaren 2023-04-19 00:53:24 +02:00
77a73403ca

ggml : add new Q4_2 quantization (ARM only) (#1046) Georgi Gerganov 2023-04-18 23:54:57 +03:00
50a8a2af97

ggml : scratch that - vmlaq_n_f32 is always better Georgi Gerganov 2023-04-18 23:11:23 +03:00
4caebf6d40

gitignore : vdot Georgi Gerganov 2023-04-18 23:00:08 +03:00
dcdd65e296

ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators Georgi Gerganov 2023-04-18 22:59:17 +03:00
5ecff35151

Adding a simple program to measure speed of dot products (#1041) Kawrakow 2023-04-18 21:00:14 +02:00
7faa7460f0

readme : update hot topics about new LoRA functionality Georgi Gerganov 2023-04-18 20:10:26 +03:00
5af8e32238

ci : do not run on drafts Georgi Gerganov 2023-04-17 18:00:10 +03:00
42747220b4

Do not close file after mmap (Windows version) (#1034) Ivan Komarov 2023-04-18 03:15:50 +02:00
e9298af389

readme : add Ruby bindings (#1029) Atsushi Tatsuma 2023-04-18 04:34:35 +09:00
4ad73137a1

add 4_0 to default outfile namestr dict (#1031) Cameron 2023-04-17 11:26:23 -07:00
315a95a4d3

Add LoRA support (#820) slaren 2023-04-17 17:28:55 +02:00
efd05648c8

llama : well-defined static initialization of complex objects (#927) Arik Poznanski 2023-04-17 17:41:53 +03:00
eb17a026fd

quantize-stats : fix bug in --type argument Georgi Gerganov 2023-04-17 17:31:06 +03:00
69b740289f

ggml : avoid using ggml_fp16_to_fp32() and ggml_fp32_to_fp16() in ggml.c Georgi Gerganov 2023-04-17 16:16:23 +03:00
f266259ad9

Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933) Ivan Komarov 2023-04-17 15:10:57 +02:00
47f61aaa5f

Fix: do not close file on mmap (#1017) slaren 2023-04-16 21:27:38 +02:00
3173a62eb9

stdout : vertical align outputs for better readibility Georgi Gerganov 2023-04-16 13:58:48 +03:00
489537e6cf

examples: add missing <ctime> include for time() (#1011) Pavol Rusnak 2023-04-16 12:13:00 +02:00
2d3481c721

Fix msys2 build error and warnings (#1009) nanahi 2023-04-16 17:13:42 +08:00
74f5899df4

convert.py: Fix loading safetensors and ggml format on Windows (#991) comex 2023-04-15 14:53:21 -07:00
2f7c8e014e

Fix potential int8 overflow in non-SIMD vec_dot (#986) Stephan Walter 2023-04-15 18:28:56 +00:00
0ad964631f

Refactor ggml.c for future tensor types (#1001) Stephan Walter 2023-04-15 16:25:38 +00:00
e95b6554b4

ggml : add Q8_0 quantization for intermediate results (#951) Georgi Gerganov 2023-04-15 17:53:22 +03:00
aa485cee33

ggml : use posix_memalign on non-Windows env Georgi Gerganov 2023-04-15 14:25:45 +03:00
c12b14b77f

benchmark : fix result validation in benchmark-q4_0-matmult (#987) Ivan Komarov 2023-04-15 07:51:54 +02:00
106faaf297

cmake : add finding the OpenBLAS header file (#992) katsu560 2023-04-15 14:51:11 +09:00
c85e03d12e

Revert "main : alternative instruct mode (Vicuna support, etc.) (#863)" (#982) Pavol Rusnak 2023-04-14 21:58:43 +02:00
489093548c

py : bump sentencepiece to 0.1.98 to support Python 3.11 (#976) Pavol Rusnak 2023-04-14 21:46:49 +02:00
93265e988a

make : fix dependencies, use auto variables (#983) Stephan Walter 2023-04-14 19:39:48 +00:00
c56b715269

Expose type name from ggml (#970) Pavol Rusnak 2023-04-14 20:05:37 +02:00
f4d277ae17

main : alternative instruct mode (Vicuna support, etc.) (#863) Tomáš Pazdiora 2023-04-14 17:19:17 +02:00
c9a59b70a5

ggml : add unary and binary map operations (#874) Kerfuffle 2023-04-14 08:43:55 -06:00
a32f7acc9f

py : cleanup dependencies (#962) Pavol Rusnak 2023-04-14 15:37:11 +02:00
43ffdefb74

py : fix flake8 and isort nitpicks (#960) Pavol Rusnak 2023-04-14 14:23:21 +02:00
1623a6e9b4

ggml : minor Georgi Gerganov 2023-04-14 13:31:29 +03:00
c14e0d2f23

ggml : always allocate buffers with size multiple of GGML_MEM_ALIGN Georgi Gerganov 2023-04-14 13:31:15 +03:00
723dac55fa

py : new conversion script (#545) comex 2023-04-14 00:03:03 -07:00