Commit graph

  • 905d87b70a
    ggml : GPU-accelerated token generation (#1412) Johannes Gäßler 2023-05-13 15:38:36 +02:00
  • f954edda93
    ggml : implement backward pass for llama + small training-llama-from-scratch example (#1360) xaedes 2023-05-13 14:56:40 +02:00
  • f048af0230
    ggml : sync alibi fix from ggml repo Georgi Gerganov 2023-05-13 11:54:33 +03:00
  • ac0cd259d5
    Adding SSE instructions to ggml_vec_dot_q4_0_q8_0 (#1413) 3ooabkhxtn 2023-05-13 10:43:33 +02:00
  • 0cd22e190a
    llama : fix various warnings Georgi Gerganov 2023-05-13 11:23:15 +03:00
  • 6456a4eb9f
    embedding : remove unused code (#1426) Rinne 2023-05-13 15:24:20 +08:00
  • cdd5350892
    readme : update Q4_0 perplexities Georgi Gerganov 2023-05-13 09:12:44 +03:00
  • 738ace394a
    llama : free ggml context in set / copy state data (close #1425) Georgi Gerganov 2023-05-13 09:08:52 +03:00
  • 699b1ad7fe
    opencl : fix kernels for the new formats (#1422) Henri Vasserman 2023-05-13 09:01:15 +03:00
  • fb62f92433
    llama : fix --mtest option (close #1414) Georgi Gerganov 2023-05-12 21:44:20 +03:00
  • 773ee249fb
    CLI args use - instead of _, backwards compatible (#1416) Johannes Gäßler 2023-05-12 16:34:55 +02:00
  • 553fd4d4b5
    Add clang-tidy reviews to CI (#1407) slaren 2023-05-12 15:40:53 +02:00
  • 089b1c93ba
    readme : add C#/.NET bindings repo (#1409) Rinne 2023-05-12 13:39:40 +08:00
  • b9fd7eee57
    ggml : remove bit shuffling (#1405) Georgi Gerganov 2023-05-12 00:23:08 +03:00
  • b608b55a3e
    prompts : model agnostic DAN (#1304) CRD716 2023-05-11 10:10:19 -05:00
  • cf348a60e0
    main : add option to save full output to session (#1338) Evan Jones 2023-05-10 11:37:14 -04:00
  • e6a46b0ed1
    Locale fix for Windows (#1379) DannyDaemonic 2023-05-09 10:53:28 -07:00
  • 9f8dbc4787
    use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler (#1314) Sami Farin 2023-05-09 15:29:20 +03:00
  • 41654efea8
    Interface improvements and --multiline-input (previously --author-mode) (#1040) DannyDaemonic 2023-05-08 19:45:48 -07:00
  • 56551bc11f
    readme : add notice about upcoming breaking change Georgi Gerganov 2023-05-08 22:52:18 +03:00
  • fe60904eef
    readme : add TOC and Pygmalion instructions (#1359) AlpinDale 2023-05-08 21:03:30 +04:30
  • 003ba2fb43
    llama : fix hparams shadow (#1367) Pavol Rusnak 2023-05-08 16:48:21 +02:00
  • f9a6364912
    llama : require first token to be BOS (#1303) Georgi Gerganov 2023-05-08 17:41:54 +03:00
  • 95078cc554
    convert: add ability to convert safetensors files (#1276) ubik2 2023-05-08 04:54:26 -07:00
  • 1f48b0abcf
    Documented CUDA reproducibility, added warning (#1346) Johannes Gäßler 2023-05-08 02:42:01 +02:00
  • e1295513a4
    CI: add Windows CLBlast and OpenBLAS builds (#1277) Henri Vasserman 2023-05-07 14:20:09 +03:00
  • 1b0fd45465
    ggml : Allow usage of CLBlast alongside Accelerate.framework (#1336) swittk 2023-05-07 10:03:23 +07:00
  • 3924088512
    Remove default arguments from sampling functions (#1343) Jed Fox 2023-05-06 17:01:47 -04:00
  • 173d0e6419
    makefile: automatic Arch Linux detection (#1332) DaniAndTheWeb 2023-05-05 23:57:14 +02:00
  • a3b85b28da
    ci : add cublas to windows release (#1271) Erik Scholz 2023-05-05 22:56:09 +02:00
  • 921dcee00a
    readme: add missing info (#1324) Pavol Rusnak 2023-05-05 16:43:36 +02:00
  • 2d13786e91
    Fix for OpenCL / CLBlast builds on macOS. (#1329) Ionoclast Laboratories 2023-05-05 08:18:21 -04:00

  • a90e96b266
    Convert.py @staticmethod (#1327) Benjamin Lecaillon 2023-05-05 02:17:07 +02:00
  • 94c5652fc0
    quantize: make output filename optional, default to ggml-model-<ftype>.bin (#1301) slaren 2023-05-05 00:58:56 +02:00
  • 34d9f22f44
    Wrap exceptions in std::exception to verbose output on exception. (#1316) Ivan Stepanov 2023-05-04 19:56:27 +03:00
  • d3e8093e9b
    convert: support DT_BF16 tensors (#1309) Ivan Stepanov 2023-05-04 19:54:37 +03:00
  • 360cfe5bec
    readme : add OpenBuddy link (#1321) 44670 2023-05-05 00:33:31 +08:00
  • 2edbdb0f99
    main : add --in-suffix option (#1318) 44670 2023-05-04 23:41:12 +08:00
  • 20fbf2a2a0
    ggml : change immintrin.h to intrin.h for compatibility (#1307) Ron Jailall 2023-05-04 11:05:59 -04:00
  • db1080876a
    Only escape prompts when used with -e (#1311) DannyDaemonic 2023-05-04 05:08:25 -07:00
  • c65a7fbfa9
    Update main's README.md with new features (#1296) DannyDaemonic 2023-05-04 03:02:59 -07:00
  • f647ce040f
    fix #1224 reverse prompt and multi line (#1297) Tomas 2023-05-04 17:02:30 +07:00
  • 799fdc1b5d
    ggml : vectorize Q8_0 quantization Georgi Gerganov 2023-05-03 23:24:20 +03:00
  • 6daa09d879
    examples : read chat prompts from a template file (#1196) khimaros 2023-05-03 10:58:11 -07:00
  • bca9ad938a
    minor : fix whitespaces (#1302) Georgi Gerganov 2023-05-03 20:09:42 +03:00
  • e2a937ca6a
    minor : fix trailing whitespaces Georgi Gerganov 2023-05-03 18:43:23 +03:00
  • b0c71c7b6d
    scripts : platform independent script to verify sha256 checksums (#1203) KASR 2023-05-03 17:31:28 +02:00
  • a8a2efdc81
    examples : various prompt and example fixes (#1298) CRD716 2023-05-03 10:26:47 -05:00
  • e216aa0463
    llama : only copy used KV cache in get / set state (#1272) Evan Jones 2023-05-02 22:26:13 -04:00
  • 2485d7a4d3
    Process escape sequences given in prompts (#1173) DannyDaemonic 2023-05-02 18:46:20 -07:00
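A hypothetical sketch of what "process escape sequences" means in practice: rewriting two-character sequences such as `\n` and `\t` in the prompt string into the characters they denote. The function name and the set of supported escapes here are illustrative assumptions, not the actual implementation.

```c
#include <string.h>

// Rewrite escape sequences in `s` in place (illustrative subset).
void unescape(char *s) {
    char *dst = s;
    while (*s) {
        if (*s == '\\' && s[1]) {
            s++;
            switch (*s) {
                case 'n':  *dst++ = '\n'; break;
                case 't':  *dst++ = '\t'; break;
                case '\\': *dst++ = '\\'; break;
                case '"':  *dst++ = '"';  break;
                default:   *dst++ = '\\'; *dst++ = *s; break; // unknown: keep verbatim
            }
            s++;
        } else {
            *dst++ = *s++;
        }
    }
    *dst = '\0';
}
```

In-place rewriting works because the output can never be longer than the input. Note the later commit db1080876a restricts this processing to prompts passed with `-e`.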
  • 13b0c68ed7
    Handle signals properly on Windows (#1123) DannyDaemonic 2023-05-02 18:01:57 -07:00
  • 55bc5f0900
    Call sh on build-info.sh (#1294) DannyDaemonic 2023-05-02 17:52:35 -07:00
  • 9daff419f6
    fix build-info.h for git submodules (#1289) kuvaus 2023-05-03 03:43:43 +03:00
  • bf4b22ffe4
    fix missing parameters in llama_init_from_gpt_params (#1293) slaren 2023-05-03 01:36:45 +02:00
  • 67c77799e0
    examples : add llama_init_from_gpt_params() common function (#1290) Ron Evans 2023-05-02 22:39:51 +02:00
  • 0e6cbff1b7
    llama : fix compile warnings Georgi Gerganov 2023-05-02 23:09:08 +03:00
  • 5d5817ca60
    ggml : fix 32-bit ARM Georgi Gerganov 2023-05-02 22:14:50 +03:00
  • 8c9be35ff9
    examples : improve vertical alignment of a few variables (#1286) Ron Evans 2023-05-02 19:53:52 +02:00
  • cc0bb7235c
    ggml : fix ppc64le build error and make cmake detect Power processors (#1284) Marvin Gießing 2023-05-02 18:42:16 +02:00
  • 2bb992f034
    llama : allow 0 as a seed number. (#1275) Robert Brisita 2023-05-02 12:23:44 -04:00
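The gist of this fix, sketched under the assumption (not taken from the actual diff) that seeds were previously randomized with a `seed <= 0` check: treating only negative values as "pick a random seed" makes 0 usable as a real, reproducible seed.

```c
#include <time.h>
#include <stdint.h>

// Resolve a user-supplied seed; negative means "randomize".
uint32_t resolve_seed(int32_t seed) {
    if (seed < 0) {                     // was effectively: seed <= 0
        return (uint32_t) time(NULL);   // time-based fallback (illustrative)
    }
    return (uint32_t) seed;
}
```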
  • e2cd506999
    main : switch input_noecho to input_echo to remove negation (#979) Ron Evans 2023-05-02 18:13:26 +02:00
  • 2d099e5193
    ggml: add names to tensors (#1268) slaren 2023-05-02 16:03:00 +02:00
  • f4cef87edf
    Add git-based build information for better issue tracking (#1232) DannyDaemonic 2023-05-01 09:23:47 -07:00
  • 58b367c2d7
    cuBLAS: refactor and optimize f16 mat mul performance (#1259) slaren 2023-05-01 18:11:07 +02:00
  • ea3a0ad6b6
    llama : update stubs for systems without mmap and mlock (#1266) xloem 2023-05-01 08:58:51 -04:00
  • 2bdc09646d
    ggml : fix ggml_used_mem() (#1264) Kerfuffle 2023-05-01 05:56:07 -06:00
  • 70269cae37
    llama : fix session load / save (#1263) Georgi Gerganov 2023-05-01 14:54:59 +03:00
  • b925f1f1b0
    cuBLAS: fall back to pageable memory if pinned alloc fails (#1233) slaren 2023-05-01 13:32:22 +02:00
  • 90b19bd6ee
    llama : let context be const when accessing const data (#1261) Alex Klinkhamer 2023-05-01 00:24:20 -07:00
  • 7ff0dcd320
    ggml : fix UB (int << 31) Georgi Gerganov 2023-04-30 22:28:51 +03:00
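The UB in question is a standard C pitfall worth spelling out: for a 32-bit `int`, `1 << 31` shifts a bit into the sign position, which is undefined behavior for signed operands. Shifting an unsigned value instead is well-defined. A minimal before/after sketch:

```c
#include <stdint.h>

uint32_t high_bit(void) {
    return 1u << 31;      // well-defined: 0x80000000
    // return 1 << 31;    // UB: result not representable in a signed int
}
```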
  • 6f79699286
    build: add armv{6,7,8} support to cmake (#1251) Pavol Rusnak 2023-04-30 20:48:38 +02:00
  • a5d30b1f53
    common : better default number of threads (#934) jon-chuang 2023-04-30 14:41:35 -04:00
  • 76a884920a
    ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225) 0cc4m 2023-04-30 20:34:52 +02:00
  • 6bc4400e67
    ggml : add Q5 WASM SIMD + GGML_FTYPE Georgi Gerganov 2023-04-30 19:07:00 +03:00
  • f0d70f147d
    Various fixes to mat_mul benchmark (#1253) Stephan Walter 2023-04-30 12:32:37 +00:00
  • 3e5aa8a1c4
    ggml : fix labels for GGML_OP_ALIBI Georgi Gerganov 2023-04-30 10:25:46 +03:00
  • c3ca7a5f05
    ggml : fix 32-bit ARM NEON Georgi Gerganov 2023-04-29 21:34:23 +03:00
  • e8c051611a
    ggml : use vzip instead of vuzp for consistency Georgi Gerganov 2023-04-29 21:12:56 +03:00
  • 0b5a935099
    ggml : fix visibility and unused warnings Georgi Gerganov 2023-04-29 19:28:36 +03:00
  • ec728e44d7
    ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) Georgi Gerganov 2023-04-29 18:43:42 +03:00
  • 214b6a3570
    ggml : adjust mul_mat_f16 work memory (#1226) Georgi Gerganov 2023-04-29 18:43:28 +03:00
  • 305eb5afd5
    build : fix reference to old llama_util.h Georgi Gerganov 2023-04-29 13:53:12 +03:00
  • 84ca9c2ecf
    examples : fix save-load-state + rename llama-util.h Georgi Gerganov 2023-04-29 13:48:11 +03:00
  • 334637e43e
    common : change default parameters to pre-#1126 (#1223) Georgi Gerganov 2023-04-29 09:51:06 +03:00
  • dd7eff57d8
    llama : new sampling algorithms (#1126) Ivan Stepanov 2023-04-29 08:34:41 +03:00
  • 7fc50c051a
    cuBLAS: use host pinned memory and dequantize while copying (#1207) slaren 2023-04-29 02:04:18 +02:00
  • b1ee8f59b4
    cuBLAS: non-contiguous tensor support (#1215) Henri Vasserman 2023-04-29 02:31:56 +03:00
  • 36d19a603b
    Remove Q4_3 which is no better than Q5 (#1218) Stephan Walter 2023-04-28 23:10:43 +00:00
  • 7f15c5c477
    readme : update hot topics Georgi Gerganov 2023-04-28 21:32:52 +03:00
  • 55390bcaf2
    ggml : sync ggml (ggml_alibi) Georgi Gerganov 2023-04-28 20:37:43 +03:00
  • 5fba3c016b
    examples : add Jeopardy example (#1168) CRD716 2023-04-28 11:13:33 -05:00
  • 1481a9cf25
    llama : add session file format and saved sessions in main (#1169) Evan Jones 2023-04-28 11:59:37 -04:00
  • 11d902364b
    ggml : add helper debug printf in soft_max Georgi Gerganov 2023-04-28 17:58:44 +03:00
  • 7296c961d9
    ggml : add CLBlast support (#1164) 0cc4m 2023-04-28 16:57:16 +02:00
  • 78ec543733
    Correcting link to w64devkit (#1214) Folko-Ven 2023-04-28 19:22:48 +05:00
  • 92a6e13a31
    Add Manjaro CUDA include and lib dirs to Makefile (#1212) Johannes Gäßler 2023-04-28 15:40:32 +02:00
  • 04aaae1d79
    add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211) Yann Follet 2023-04-28 19:59:48 +08:00
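The "2x faster than scalar" claim is relative to a plain loop like the one below: each Q8_0 block stores a per-block scale and 32 signed 8-bit quants, and the dot product accumulates the int8 products per block, then scales by both blocks' scales. The struct layout here follows ggml's Q8_0 format of that period (a `float` scale plus 32 `int8_t` values), but treat it as an illustrative assumption rather than the exact definition; the AVX2 version replaces the inner loop with `_mm256_maddubs_epi16`-style intrinsics.

```c
#include <stdint.h>

#define QK8_0 32

typedef struct {
    float  d;           // per-block scale
    int8_t qs[QK8_0];   // quantized values
} block_q8_0;

// Scalar reference: sum over blocks of d_x * d_y * dot(int8 quants).
float vec_dot_q8_0_q8_0(int n, const block_q8_0 *x, const block_q8_0 *y) {
    float sum = 0.0f;
    for (int i = 0; i < n / QK8_0; i++) {
        int32_t acc = 0;
        for (int j = 0; j < QK8_0; j++) {
            acc += (int32_t) x[i].qs[j] * (int32_t) y[i].qs[j];
        }
        sum += x[i].d * y[i].d * (float) acc;
    }
    return sum;
}
```

Accumulating in `int32_t` per block is what makes the scalar version correct: 32 products of two int8 values cannot overflow a 32-bit accumulator.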
  • 0b2da20538
    ggml : slightly faster AVX2 implementation for Q5 (#1197) Stephan Walter 2023-04-26 20:26:42 +00:00
  • f9be42add0
    readme : add quantization info Georgi Gerganov 2023-04-26 23:24:42 +03:00
  • 574406dc7e
    ggml : add Q5_0 and Q5_1 quantization (#1187) Georgi Gerganov 2023-04-26 23:14:13 +03:00