Commit graph

  • 87a6f846d3
    Allow setting the rng seed after initialization. (#1184) Ásgeir Bjarni Ingvarsson 2023-04-26 20:08:43 +00:00
  • ea3ad7eb60
    Updating build instructions to include BLAS support (#1183) DaniAndTheWeb 2023-04-26 22:03:03 +02:00
  • 859fee6dfb
    quantize : use map to assign quantization type from string (#1191) Pavol Rusnak 2023-04-26 18:43:27 +02:00
  • 4afcc37869
    Update SHA256SUMS after quantization change (#1181) Stephan Walter 2023-04-25 21:41:56 +00:00
  • 667c501334
    py : cast lora_alpha to int in convert-lora-to-ggml (#1170) ostix360 2023-04-25 23:33:08 +02:00
  • bb98e77be7
    nix: use convert.py instead of legacy wrapper convert-pth-to-ggml.py (#981) Pavol Rusnak 2023-04-25 23:19:57 +02:00
  • 7a32fcb3b2
    ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179) Georgi Gerganov 2023-04-25 23:40:51 +03:00
  • dd0eabc049
    ggml : use full range for Q4_0 and Q4_2 quantization (#729) unbounded 2023-04-25 19:20:46 +02:00
  • 54bb60e268
    ggml : fix bug in ggml_compute_forward_sum_f32 (#1162) xaedes 2023-04-24 23:02:02 +02:00
  • 8a0f8673ba
    ggml : export symbols (#1155) Georgi Gerganov 2023-04-24 22:18:25 +03:00
  • 0c5692345d
    examples : add save_load_state example (#1150) xaedes 2023-04-24 18:23:31 +02:00
  • 957c8ae21d
    llama : increase scratch buffer size for 65B (ref #1152) Georgi Gerganov 2023-04-24 18:47:03 +03:00
  • 9b0a4d4214
    examples/main README improvements and some light refactoring (#1131) mgroeber9110 2023-04-24 17:45:32 +02:00
  • 2ec83428de
    Fix build for gcc 8 and test in CI (#1154) Stephan Walter 2023-04-24 15:38:26 +00:00
  • e4cf982e0d
    Fix cuda compilation (#1128) slaren 2023-04-24 17:29:58 +02:00
  • c4fe84fb0d
    llama : refactor get / set state + remove redundant kv cache API (#1143) Georgi Gerganov 2023-04-24 07:40:02 +03:00
  • 1d78fecdab
    Fix LoRA acronym (#1145) slaren 2023-04-23 23:03:44 +02:00
  • 284685f169
    scripts : add helper scripts to synch ggml repo Georgi Gerganov 2023-04-23 19:57:09 +03:00
  • edce63baa9
    Added README.md for main with examples and explanations (#1139) DannyDaemonic 2023-04-23 08:37:02 -07:00
  • ec9cdb6752
    ggml : do not print perf ops that have not been used at all Georgi Gerganov 2023-04-23 18:32:52 +03:00
  • e4422e299c
    ggml : better PERF prints + support "LLAMA_PERF=1 make" Georgi Gerganov 2023-04-23 18:15:39 +03:00
  • 53c8434398
    Improve AVX2 for vec_dot_q4_3_q8_0 (#1138) Stephan Walter 2023-04-23 11:01:03 +00:00
  • c6524f46eb
    readme : update gpt4all instructions (#980) Pavol Rusnak 2023-04-23 10:21:26 +02:00
  • c9e2c26f41
    A better packNibbles and mul_sum_i8_pairs_float implementation using AVX512 (#1119) Yishuo Wang 2023-04-23 15:57:05 +08:00
  • 0e018fe008
    ggml : fix Q4_3 cuBLAS Georgi Gerganov 2023-04-22 16:31:56 +03:00
  • 857308d1e8
    ci : trigger CI for drafts, but not most PR actions (#1125) Stephan Walter 2023-04-22 13:12:29 +00:00
  • c50b628810
    Fix CI: ARM NEON, quantization unit tests, editorconfig (#1122) Stephan Walter 2023-04-22 10:54:13 +00:00
  • 5f939498d5
    ggml : unit test for quantization functions (#953) unbounded 2023-04-22 11:10:39 +02:00
  • 36b4f7e064
    llama : print timings on ctrl+c exit (#1021) wbpxre150 2023-04-22 16:56:35 +08:00
  • 10f19c1121
    llama : have n_batch default to 512 (#1091) eiery 2023-04-22 04:27:05 -04:00
  • 7e312f165c
    cmake : fix build under Windows when enable BUILD_SHARED_LIBS (#1100) Howard Su 2023-04-22 16:18:20 +08:00
  • 872c365a91
    ggml : fix AVX build + update to new Q8_0 format Georgi Gerganov 2023-04-22 11:08:12 +03:00
  • 955ef9a5d5
    ggml : alternative Q4_3 implementation using modified Q8_0 (#1109) Georgi Gerganov 2023-04-22 10:55:35 +03:00
  • c5aa5e5777
    ggml : AVX2 optimization for vec_dot_q4_3_q8_0 and refactoring (#1099) Stephan Walter 2023-04-22 07:37:05 +00:00
  • e9a9cb0c54
    examples : Improve Alpaca Default Repeat Penalty: Better Match Alpaca.cpp Experience (#1107) Clint Herron 2023-04-22 02:54:33 -04:00
  • b6e7f9b09e
    llama : add api for getting/setting the complete state: rng, logits, embedding and kv_cache (#1105) xaedes 2023-04-22 08:21:32 +02:00
  • 50cb666b8a
    Improve cuBLAS performance by using a memory pool (#1094) slaren 2023-04-21 21:59:17 +02:00
  • 25d7abbd1f
    llama : fixed rlimit error message (#888) apaz 2023-04-21 13:48:06 -05:00
  • 018f2279f5
    cmake : link threads publicly to ggml (#1042) 源文雨 2023-04-22 02:27:06 +08:00
  • 9411288271
    main : evaluate tokens in batches after swapping context (#1014) Alex Klinkhamer 2023-04-21 11:18:09 -07:00
  • 8687c1f258
    llama : remember and restore kv cache data pointers (#1104) xaedes 2023-04-21 17:25:21 +02:00
  • 1bfc153e2f
    ggml : a faster version for Q4_1 x Q8_0 dot products (#1083) Kawrakow 2023-04-21 17:18:26 +02:00
  • 3d59769c3b
    Show perplexity ETA in hours and minutes (#1096) slaren 2023-04-21 14:57:57 +02:00
  • d40fded93e
    llama : fix comment for "output.weight" tensor Georgi Gerganov 2023-04-21 10:23:36 +03:00
  • 2510c1831f
    Add ggml-model-*.bin checksums for 7B, 13B, 30B, 65B (#1088) Stephan Walter 2023-04-20 21:56:44 +00:00
  • 12b5900dbc
    ggml : sync ggml (add GPT-NeoX RoPE implementation) Georgi Gerganov 2023-04-20 23:32:59 +03:00
  • 9ff334f3c9
    ggml : fix bug in ggml_compute_forward_dup_f32() Georgi Gerganov 2023-04-20 21:58:05 +03:00
  • 2005469ea1
    Add Q4_3 support to cuBLAS (#1086) slaren 2023-04-20 20:49:53 +02:00
  • 8a1756abdf
    ggml : do not break cuBLAS build (Q4_3 is not yet implemented) Georgi Gerganov 2023-04-20 21:43:50 +03:00
  • 66aab46079
    ggml : fix Q4_3 quantization Georgi Gerganov 2023-04-20 20:44:05 +03:00
  • 38de86a711
    llama : multi-threaded quantization (#1075) Kawrakow 2023-04-20 19:42:27 +02:00
  • e0305ead3a
    ggml : add Q4_3 quantization (#1082) Georgi Gerganov 2023-04-20 20:35:53 +03:00
  • 6a9661ea5a
    ci : remove the LLAMA_ACCELERATE matrix dimension from Ubuntu builds in the CI (#1074) Ivan Komarov 2023-04-20 17:15:18 +02:00
  • 5addcb120c
    fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080) 源文雨 2023-04-20 21:28:43 +08:00
  • c8c2c52482
    AVX2 optimization for vec_dot_q4_2_q8_0 (#1068) Stephan Walter 2023-04-20 06:45:41 +00:00
  • 02d6988121
    Improve cuBLAS performance by dequantizing on the GPU (#1065) slaren 2023-04-20 03:14:14 +02:00
  • 834695fe3a
    Minor: Readme fixed grammar, spelling, and misc updates (#1071) CRD716 2023-04-19 14:52:14 -05:00
  • f7d05095b4
    Q4_2 quantization with rmse-optimized scale and quants (#1062) Kawrakow 2023-04-19 20:20:14 +02:00
  • 884e7d7a2b
    ggml : use 8-bit precision for Q4_1 intermediate results (#1047) Georgi Gerganov 2023-04-19 20:10:08 +03:00
  • 7cd5c4a3e9
    readme : add warning about Q4_2 and Q4_3 Georgi Gerganov 2023-04-19 19:07:54 +03:00
  • f3d4edf504
    ggml : Q4 cleanup - remove 4-bit dot product code (#1061) Stephan Walter 2023-04-19 16:06:37 +00:00
  • 8944a13296
    Add NVIDIA cuBLAS support (#1044) slaren 2023-04-19 11:22:45 +02:00
  • 6667401238
    Multi-threaded ggml_cpy (#1035) slaren 2023-04-19 00:53:24 +02:00
  • 77a73403ca
    ggml : add new Q4_2 quantization (ARM only) (#1046) Georgi Gerganov 2023-04-18 23:54:57 +03:00
  • 50a8a2af97
    ggml : scratch that - vmlaq_n_f32 is always better Georgi Gerganov 2023-04-18 23:11:23 +03:00
  • 4caebf6d40
    gitignore : vdot Georgi Gerganov 2023-04-18 23:00:08 +03:00
  • dcdd65e296
    ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators Georgi Gerganov 2023-04-18 22:59:17 +03:00
  • 5ecff35151
    Adding a simple program to measure speed of dot products (#1041) Kawrakow 2023-04-18 21:00:14 +02:00
  • 7faa7460f0
    readme : update hot topics about new LoRA functionality Georgi Gerganov 2023-04-18 20:10:26 +03:00
  • 5af8e32238
    ci : do not run on drafts Georgi Gerganov 2023-04-17 18:00:10 +03:00
  • 42747220b4
    Do not close file after mmap (Windows version) (#1034) Ivan Komarov 2023-04-18 03:15:50 +02:00
  • e9298af389
    readme : add Ruby bindings (#1029) Atsushi Tatsuma 2023-04-18 04:34:35 +09:00
  • 4ad73137a1
    add 4_0 to default outfile namestr dict (#1031) Cameron 2023-04-17 11:26:23 -07:00
  • 315a95a4d3
    Add LoRA support (#820) slaren 2023-04-17 17:28:55 +02:00
  • efd05648c8
    llama : well-defined static initialization of complex objects (#927) Arik Poznanski 2023-04-17 17:41:53 +03:00
  • eb17a026fd
    quantize-stats : fix bug in --type argument Georgi Gerganov 2023-04-17 17:31:06 +03:00
  • 69b740289f
    ggml : avoid using ggml_fp16_to_fp32() and ggml_fp32_to_fp16() in ggml.c Georgi Gerganov 2023-04-17 16:16:23 +03:00
  • f266259ad9
    Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() (#933) Ivan Komarov 2023-04-17 15:10:57 +02:00
  • 47f61aaa5f
    Fix: do not close file on mmap (#1017) slaren 2023-04-16 21:27:38 +02:00
  • 3173a62eb9
    stdout : vertical align outputs for better readibility Georgi Gerganov 2023-04-16 13:58:48 +03:00
  • 489537e6cf
    examples: add missing <ctime> include for time() (#1011) Pavol Rusnak 2023-04-16 12:13:00 +02:00
  • 2d3481c721
    Fix msys2 build error and warnings (#1009) nanahi 2023-04-16 17:13:42 +08:00
  • 74f5899df4
    convert.py: Fix loading safetensors and ggml format on Windows (#991) comex 2023-04-15 14:53:21 -07:00
  • 2f7c8e014e
    Fix potential int8 overflow in non-SIMD vec_dot (#986) Stephan Walter 2023-04-15 18:28:56 +00:00
  • 0ad964631f
    Refactor ggml.c for future tensor types (#1001) Stephan Walter 2023-04-15 16:25:38 +00:00
  • e95b6554b4
    ggml : add Q8_0 quantization for intermediate results (#951) Georgi Gerganov 2023-04-15 17:53:22 +03:00
  • aa485cee33
    ggml : use posix_memalign on non-Windows env Georgi Gerganov 2023-04-15 14:25:45 +03:00
  • c12b14b77f
    benchmark : fix result validation in benchmark-q4_0-matmult (#987) Ivan Komarov 2023-04-15 07:51:54 +02:00
  • 106faaf297
    cmake : add finding the OpenBLAS header file (#992) katsu560 2023-04-15 14:51:11 +09:00
  • c85e03d12e
    Revert "main : alternative instruct mode (Vicuna support, etc.) (#863)" (#982) Pavol Rusnak 2023-04-14 21:58:43 +02:00
  • 489093548c
    py : bump sentencepiece to 0.1.98 to support Python 3.11 (#976) Pavol Rusnak 2023-04-14 21:46:49 +02:00
  • 93265e988a
    make : fix dependencies, use auto variables (#983) Stephan Walter 2023-04-14 19:39:48 +00:00
  • c56b715269
    Expose type name from ggml (#970) Pavol Rusnak 2023-04-14 20:05:37 +02:00
  • f4d277ae17
    main : alternative instruct mode (Vicuna support, etc.) (#863) Tomáš Pazdiora 2023-04-14 17:19:17 +02:00
  • c9a59b70a5
    ggml : add unary and binary map operations (#874) Kerfuffle 2023-04-14 08:43:55 -06:00
  • a32f7acc9f
    py : cleanup dependencies (#962) Pavol Rusnak 2023-04-14 15:37:11 +02:00
  • 43ffdefb74
    py : fix flake8 and isort nitpicks (#960) Pavol Rusnak 2023-04-14 14:23:21 +02:00
  • 1623a6e9b4
    ggml : minor Georgi Gerganov 2023-04-14 13:31:29 +03:00
  • c14e0d2f23
    ggml : always allocate buffers with size multiple of GGML_MEM_ALIGN Georgi Gerganov 2023-04-14 13:31:15 +03:00
  • 723dac55fa
    py : new conversion script (#545) comex 2023-04-14 00:03:03 -07:00