Commit graph

  • 905d87b70a
    ggml : GPU-accelerated token generation (#1412) Johannes Gäßler 2023-05-13 15:38:36 +02:00
  • f954edda93
    ggml : implement backward pass for llama + small training-llama-from-scratch example (#1360) xaedes 2023-05-13 14:56:40 +02:00
  • f048af0230
    ggml : sync alibi fix from ggml repo Georgi Gerganov 2023-05-13 11:54:33 +03:00
  • ac0cd259d5
    Adding SSE instructions to ggml_vec_dot_q4_0_q8_0 (#1413) 3ooabkhxtn 2023-05-13 10:43:33 +02:00
  • 0cd22e190a
    llama : fix various warnings Georgi Gerganov 2023-05-13 11:23:15 +03:00
  • 6456a4eb9f
    embedding : remove unused code (#1426) Rinne 2023-05-13 15:24:20 +08:00
  • cdd5350892
    readme : update Q4_0 perplexities Georgi Gerganov 2023-05-13 09:12:44 +03:00
  • 738ace394a
    llama : free ggml context in set / copy state data (close #1425) Georgi Gerganov 2023-05-13 09:08:52 +03:00
  • 699b1ad7fe
    opencl : fix kernels for the new formats (#1422) Henri Vasserman 2023-05-13 09:01:15 +03:00
  • fb62f92433
    llama : fix --mtest option (close #1414) Georgi Gerganov 2023-05-12 21:44:20 +03:00
  • 773ee249fb
    CLI args use - instead of _, backwards compatible (#1416) Johannes Gäßler 2023-05-12 16:34:55 +02:00
  • 553fd4d4b5
    Add clang-tidy reviews to CI (#1407) slaren 2023-05-12 15:40:53 +02:00
  • 089b1c93ba
    readme : add C#/.NET bindings repo (#1409) Rinne 2023-05-12 13:39:40 +08:00
  • b9fd7eee57
    ggml : remove bit shuffling (#1405) Georgi Gerganov 2023-05-12 00:23:08 +03:00
  • b608b55a3e
    prompts : model agnostic DAN (#1304) CRD716 2023-05-11 10:10:19 -05:00
  • cf348a60e0
    main : add option to save full output to session (#1338) Evan Jones 2023-05-10 11:37:14 -04:00
  • e6a46b0ed1
    Locale fix for Windows (#1379) DannyDaemonic 2023-05-09 10:53:28 -07:00
  • 9f8dbc4787
    use pause asm insn in busyloop to run the CPU (13600K) 10 °C cooler (#1314) Sami Farin 2023-05-09 15:29:20 +03:00
  • 41654efea8
    Interface improvements and --multiline-input (previously --author-mode) (#1040) DannyDaemonic 2023-05-08 19:45:48 -07:00
  • 56551bc11f
    readme : add notice about upcoming breaking change Georgi Gerganov 2023-05-08 22:52:18 +03:00
  • fe60904eef
    readme : add TOC and Pygmalion instructions (#1359) AlpinDale 2023-05-08 21:03:30 +04:30
  • 003ba2fb43
    llama : fix hparams shadow (#1367) Pavol Rusnak 2023-05-08 16:48:21 +02:00
  • f9a6364912
    llama : require first token to be BOS (#1303) Georgi Gerganov 2023-05-08 17:41:54 +03:00
  • 95078cc554
    convert: add ability to convert safetensors files (#1276) ubik2 2023-05-08 04:54:26 -07:00
  • 1f48b0abcf
    Documented CUDA reproducibility, added warning (#1346) Johannes Gäßler 2023-05-08 02:42:01 +02:00
  • e1295513a4
    CI: add Windows CLBlast and OpenBLAS builds (#1277) Henri Vasserman 2023-05-07 14:20:09 +03:00
  • 1b0fd45465
    ggml : Allow usage of CLBlast alongside Accelerate.framework (#1336) swittk 2023-05-07 10:03:23 +07:00
  • 3924088512
    Remove default arguments from sampling functions (#1343) Jed Fox 2023-05-06 17:01:47 -04:00
  • 173d0e6419
    makefile: automatic Arch Linux detection (#1332) DaniAndTheWeb 2023-05-05 23:57:14 +02:00
  • a3b85b28da
    ci : add cublas to windows release (#1271) Erik Scholz 2023-05-05 22:56:09 +02:00
  • 921dcee00a
    readme: add missing info (#1324) Pavol Rusnak 2023-05-05 16:43:36 +02:00
  • 2d13786e91
    Fix for OpenCL / CLBlast builds on macOS. (#1329) Ionoclast Laboratories 2023-05-05 08:18:21 -04:00

  • a90e96b266
    Convert.py @staticmethod (#1327) Benjamin Lecaillon 2023-05-05 02:17:07 +02:00
  • 94c5652fc0
    quantize: make output filename optional, default to ggml-model-<ftype>.bin (#1301) slaren 2023-05-05 00:58:56 +02:00
  • 34d9f22f44
    Wrap exceptions in std::exception to verbose output on exception. (#1316) Ivan Stepanov 2023-05-04 19:56:27 +03:00
  • d3e8093e9b
    convert: support DT_BF16 tensors (#1309) Ivan Stepanov 2023-05-04 19:54:37 +03:00
  • 360cfe5bec
    readme : add OpenBuddy link (#1321) 44670 2023-05-05 00:33:31 +08:00
  • 2edbdb0f99
    main : add --in-suffix option (#1318) 44670 2023-05-04 23:41:12 +08:00
  • 20fbf2a2a0
    ggml : change immintrin.h to intrin.h for compatibility (#1307) Ron Jailall 2023-05-04 11:05:59 -04:00
  • db1080876a
    Only escape prompts when used with -e (#1311) DannyDaemonic 2023-05-04 05:08:25 -07:00
  • c65a7fbfa9
    Update main's README.md with new features (#1296) DannyDaemonic 2023-05-04 03:02:59 -07:00
  • f647ce040f
    fix #1224 reverse prompt and multi line (#1297) Tomas 2023-05-04 17:02:30 +07:00
  • 799fdc1b5d
    ggml : vectorize Q8_0 quantization Georgi Gerganov 2023-05-03 23:24:20 +03:00
  • 6daa09d879
    examples : read chat prompts from a template file (#1196) khimaros 2023-05-03 10:58:11 -07:00
  • bca9ad938a
    minor : fix whitespaces (#1302) Georgi Gerganov 2023-05-03 20:09:42 +03:00
  • e2a937ca6a
    minor : fix trailing whitespaces Georgi Gerganov 2023-05-03 18:43:23 +03:00
  • b0c71c7b6d
    scripts : platform independent script to verify sha256 checksums (#1203) KASR 2023-05-03 17:31:28 +02:00
  • a8a2efdc81
    examples : various prompt and example fixes (#1298) CRD716 2023-05-03 10:26:47 -05:00
  • e216aa0463
    llama : only copy used KV cache in get / set state (#1272) Evan Jones 2023-05-02 22:26:13 -04:00
  • 2485d7a4d3
    Process escape sequences given in prompts (#1173) DannyDaemonic 2023-05-02 18:46:20 -07:00
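A hypothetical sketch of what "process escape sequences" means in practice: rewriting two-character sequences such as `\n` and `\t` in the prompt string into the characters they denote. The function name and the set of supported escapes here are illustrative assumptions, not the actual implementation.

```c
#include <string.h>

// Rewrite escape sequences in `s` in place (illustrative subset).
void unescape(char *s) {
    char *dst = s;
    while (*s) {
        if (*s == '\\' && s[1]) {
            s++;
            switch (*s) {
                case 'n':  *dst++ = '\n'; break;
                case 't':  *dst++ = '\t'; break;
                case '\\': *dst++ = '\\'; break;
                case '"':  *dst++ = '"';  break;
                default:   *dst++ = '\\'; *dst++ = *s; break; // unknown: keep verbatim
            }
            s++;
        } else {
            *dst++ = *s++;
        }
    }
    *dst = '\0';
}
```

In-place rewriting works because the output can never be longer than the input. Note the later commit db1080876a restricts this processing to prompts passed with `-e`.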
  • 13b0c68ed7
    Handle signals properly on Windows (#1123) DannyDaemonic 2023-05-02 18:01:57 -07:00
  • 55bc5f0900
    Call sh on build-info.sh (#1294) DannyDaemonic 2023-05-02 17:52:35 -07:00
  • 9daff419f6
    fix build-info.h for git submodules (#1289) kuvaus 2023-05-03 03:43:43 +03:00
  • bf4b22ffe4
    fix missing parameters in llama_init_from_gpt_params (#1293) slaren 2023-05-03 01:36:45 +02:00
  • 67c77799e0
    examples : add llama_init_from_gpt_params() common function (#1290) Ron Evans 2023-05-02 22:39:51 +02:00
  • 0e6cbff1b7
    llama : fix compile warnings Georgi Gerganov 2023-05-02 23:09:08 +03:00
  • 5d5817ca60
    ggml : fix 32-bit ARM Georgi Gerganov 2023-05-02 22:14:50 +03:00
  • 8c9be35ff9
    examples : improve vertical alignment of a few variables (#1286) Ron Evans 2023-05-02 19:53:52 +02:00
  • cc0bb7235c
    ggml : fix ppc64le build error and make cmake detect Power processors (#1284) Marvin Gießing 2023-05-02 18:42:16 +02:00
  • 2bb992f034
    llama : allow 0 as a seed number. (#1275) Robert Brisita 2023-05-02 12:23:44 -04:00
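The gist of this fix, sketched under the assumption (not taken from the actual diff) that seeds were previously randomized with a `seed <= 0` check: treating only negative values as "pick a random seed" makes 0 usable as a real, reproducible seed.

```c
#include <time.h>
#include <stdint.h>

// Resolve a user-supplied seed; negative means "randomize".
uint32_t resolve_seed(int32_t seed) {
    if (seed < 0) {                     // was effectively: seed <= 0
        return (uint32_t) time(NULL);   // time-based fallback (illustrative)
    }
    return (uint32_t) seed;
}
```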
  • e2cd506999
    main : switch input_noecho to input_echo to remove negation (#979) Ron Evans 2023-05-02 18:13:26 +02:00
  • 2d099e5193
    ggml: add names to tensors (#1268) slaren 2023-05-02 16:03:00 +02:00
  • f4cef87edf
    Add git-based build information for better issue tracking (#1232) DannyDaemonic 2023-05-01 09:23:47 -07:00
  • 58b367c2d7
    cuBLAS: refactor and optimize f16 mat mul performance (#1259) slaren 2023-05-01 18:11:07 +02:00
  • ea3a0ad6b6
    llama : update stubs for systems without mmap and mlock (#1266) xloem 2023-05-01 08:58:51 -04:00
  • 2bdc09646d
    ggml : fix ggml_used_mem() (#1264) Kerfuffle 2023-05-01 05:56:07 -06:00
  • 70269cae37
    llama : fix session load / save (#1263) Georgi Gerganov 2023-05-01 14:54:59 +03:00
  • b925f1f1b0
    cuBLAS: fall back to pageable memory if pinned alloc fails (#1233) slaren 2023-05-01 13:32:22 +02:00
  • 90b19bd6ee
    llama : let context be const when accessing const data (#1261) Alex Klinkhamer 2023-05-01 00:24:20 -07:00
  • 7ff0dcd320
    ggml : fix UB (int << 31) Georgi Gerganov 2023-04-30 22:28:51 +03:00
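The UB in question is a standard C pitfall worth spelling out: for a 32-bit `int`, `1 << 31` shifts a bit into the sign position, which is undefined behavior for signed operands. Shifting an unsigned value instead is well-defined. A minimal before/after sketch:

```c
#include <stdint.h>

uint32_t high_bit(void) {
    return 1u << 31;      // well-defined: 0x80000000
    // return 1 << 31;    // UB: result not representable in a signed int
}
```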
  • 6f79699286
    build: add armv{6,7,8} support to cmake (#1251) Pavol Rusnak 2023-04-30 20:48:38 +02:00
  • a5d30b1f53
    common : better default number of threads (#934) jon-chuang 2023-04-30 14:41:35 -04:00
  • 76a884920a
    ggml : add CLBlast q5_0, q5_1, q8_0 dequant kernels (#1225) 0cc4m 2023-04-30 20:34:52 +02:00
  • 6bc4400e67
    ggml : add Q5 WASM SIMD + GGML_FTYPE Georgi Gerganov 2023-04-30 19:07:00 +03:00
  • f0d70f147d
    Various fixes to mat_mul benchmark (#1253) Stephan Walter 2023-04-30 12:32:37 +00:00
  • 3e5aa8a1c4
    ggml : fix labels for GGML_OP_ALIBI Georgi Gerganov 2023-04-30 10:25:46 +03:00
  • c3ca7a5f05
    ggml : fix 32-bit ARM NEON Georgi Gerganov 2023-04-29 21:34:23 +03:00
  • e8c051611a
    ggml : use vzip instead of vuzp for consistency Georgi Gerganov 2023-04-29 21:12:56 +03:00
  • 0b5a935099
    ggml : fix visibility and unused warnings Georgi Gerganov 2023-04-29 19:28:36 +03:00
  • ec728e44d7
    ggml : fix #if for f32_f32 mul_mat (CLBlast) (#1229) Georgi Gerganov 2023-04-29 18:43:42 +03:00
  • 214b6a3570
    ggml : adjust mul_mat_f16 work memory (#1226) Georgi Gerganov 2023-04-29 18:43:28 +03:00
  • 305eb5afd5
    build : fix reference to old llama_util.h Georgi Gerganov 2023-04-29 13:53:12 +03:00
  • 84ca9c2ecf
    examples : fix save-load-state + rename llama-util.h Georgi Gerganov 2023-04-29 13:48:11 +03:00
  • 334637e43e
    common : change default parameters to pre-#1126 (#1223) Georgi Gerganov 2023-04-29 09:51:06 +03:00
  • dd7eff57d8
    llama : new sampling algorithms (#1126) Ivan Stepanov 2023-04-29 08:34:41 +03:00
  • 7fc50c051a
    cuBLAS: use host pinned memory and dequantize while copying (#1207) slaren 2023-04-29 02:04:18 +02:00
  • b1ee8f59b4
    cuBLAS: non-contiguous tensor support (#1215) Henri Vasserman 2023-04-29 02:31:56 +03:00
  • 36d19a603b
    Remove Q4_3 which is no better than Q5 (#1218) Stephan Walter 2023-04-28 23:10:43 +00:00
  • 7f15c5c477
    readme : update hot topics Georgi Gerganov 2023-04-28 21:32:52 +03:00
  • 55390bcaf2
    ggml : sync ggml (ggml_alibi) Georgi Gerganov 2023-04-28 20:37:43 +03:00
  • 5fba3c016b
    examples : add Jeopardy example (#1168) CRD716 2023-04-28 11:13:33 -05:00
  • 1481a9cf25
    llama : add session file format and saved sessions in main (#1169) Evan Jones 2023-04-28 11:59:37 -04:00
  • 11d902364b
    ggml : add helper debug printf in soft_max Georgi Gerganov 2023-04-28 17:58:44 +03:00
  • 7296c961d9
    ggml : add CLBlast support (#1164) 0cc4m 2023-04-28 16:57:16 +02:00
  • 78ec543733
    Correcting link to w64devkit (#1214) Folko-Ven 2023-04-28 19:22:48 +05:00
  • 92a6e13a31
    Add Manjaro CUDA include and lib dirs to Makefile (#1212) Johannes Gäßler 2023-04-28 15:40:32 +02:00
  • 04aaae1d79
    add avx2 for dot_q8_0_q8_0, 2x faster than scalar (#1211) Yann Follet 2023-04-28 19:59:48 +08:00
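The "2x faster than scalar" claim is relative to a plain loop like the one below: each Q8_0 block stores a per-block scale and 32 signed 8-bit quants, and the dot product accumulates the int8 products per block, then scales by both blocks' scales. The struct layout here follows ggml's Q8_0 format of that period (a `float` scale plus 32 `int8_t` values), but treat it as an illustrative assumption rather than the exact definition; the AVX2 version replaces the inner loop with `_mm256_maddubs_epi16`-style intrinsics.

```c
#include <stdint.h>

#define QK8_0 32

typedef struct {
    float  d;           // per-block scale
    int8_t qs[QK8_0];   // quantized values
} block_q8_0;

// Scalar reference: sum over blocks of d_x * d_y * dot(int8 quants).
float vec_dot_q8_0_q8_0(int n, const block_q8_0 *x, const block_q8_0 *y) {
    float sum = 0.0f;
    for (int i = 0; i < n / QK8_0; i++) {
        int32_t acc = 0;
        for (int j = 0; j < QK8_0; j++) {
            acc += (int32_t) x[i].qs[j] * (int32_t) y[i].qs[j];
        }
        sum += x[i].d * y[i].d * (float) acc;
    }
    return sum;
}
```

Accumulating in `int32_t` per block is what makes the scalar version correct: 32 products of two int8 values cannot overflow a 32-bit accumulator.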
  • 0b2da20538
    ggml : slightly faster AVX2 implementation for Q5 (#1197) Stephan Walter 2023-04-26 20:26:42 +00:00
  • f9be42add0
    readme : add quantization info Georgi Gerganov 2023-04-26 23:24:42 +03:00
  • 574406dc7e
    ggml : add Q5_0 and Q5_1 quantization (#1187) Georgi Gerganov 2023-04-26 23:14:13 +03:00