Commit graph

  • 8432d4d9f7
    ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738) le.chang 2023-06-09 00:47:56 +08:00
  • 0f291e1f65
    metal : Q6_K implementation (#1752) Kawrakow 2023-06-08 19:46:22 +03:00
  • 8fc8179919
    Add llama.cpp docker support for non-latin languages (#1673) qingfengfenga 2023-06-08 15:58:53 +08:00
  • b50b570ed9
    ggml : fix fprintf warnings (#1720) Steven Roussey 2023-06-08 00:12:28 -07:00
  • 53aba3f393
    clang-tidy : restore dot file from accidental deletion Georgi Gerganov 2023-06-08 10:09:08 +03:00
  • 4161bdc04d
    metal : add Q4_K implementation (#1733) Kawrakow 2023-06-08 10:08:23 +03:00
  • 0035858273
    k-quants : add missing compile definition to CMakeLists (#1748) johnson442 2023-06-08 08:02:48 +01:00
  • 5c64a0952e
    k-quants : allow to optionally disable at compile time (#1734) Georgi Gerganov 2023-06-07 10:59:52 +03:00
  • 5b57a5b726
    flake : update to support metal on m1/m2 (#1724) jacobi petrucciani 2023-06-07 00:15:31 -04:00
  • 4dc62c545d
    readme : add June roadmap Georgi Gerganov 2023-06-07 07:15:08 +03:00
  • 35a84916fb
    main: add the possibility to open the prompt cache read-only (#1640) Willy Tarreau 2023-06-07 04:10:17 +02:00
  • 2d7bf110ed
    llama : fix vram_scratch var Georgi Gerganov 2023-06-06 22:54:39 +03:00
  • 2a4e41a086
    llama : fix compile warnings Georgi Gerganov 2023-06-06 22:41:53 +03:00
  • 17366df842
    Multi GPU support, CUDA refactor, CUDA scratch buffer (#1703) Johannes Gäßler 2023-06-06 21:33:23 +02:00
  • 44f906e853
    metal : add f16 support Georgi Gerganov 2023-06-06 20:16:57 +03:00
  • d5b111f53d
    Clblast fixes + enhancements to save VRAM and offload more layers (#1675) LostRuins 2023-06-07 01:00:01 +08:00
  • 2d43387daf
    ggml : fix builds, add ggml-quants-k.o (close #1712, close #1710) Georgi Gerganov 2023-06-06 10:18:03 +03:00
  • 7ad7750c5c
    gitignore : add .clang-tidy Georgi Gerganov 2023-06-06 09:55:10 +03:00
  • 7a74dee6b4
    llama : temporary disable Q6_K output quantization (#1711) Georgi Gerganov 2023-06-06 09:39:38 +03:00
  • 590250f7a9
    metal : add checks for buffer size (#1706) Spencer Sutton 2023-06-05 23:28:17 -04:00
  • f4c55d3bd7
    docs : add performance troubleshoot + example benchmark documentation (#1674) Yuval Peled 2023-06-05 23:32:36 +03:00
  • f1465624c2
    readme : fix typo (#1700) Foul-Tarnished 2023-06-05 22:28:37 +02:00
  • c2df36d60d
    llama : consistently catch and throw only exceptions deriving from std::exception (#1599) mgroeber9110 2023-06-05 22:24:29 +02:00
  • 9d0693bce3
    metal : use shared buffers between CPU and GPU (#1696) kiltyj 2023-06-05 13:24:04 -07:00
  • efe0507632
    ggml : fix internal overflow in ggml_time_us on Windows (#1702) grahameth 2023-06-05 22:11:49 +02:00
  • e7fe66e670
    ci : disable auto tidy (#1705) Georgi Gerganov 2023-06-05 23:05:05 +03:00
  • 99009e72f8
    ggml : add SOTA 2,3,4,5,6 bit k-quantizations (#1684) Kawrakow 2023-06-05 22:56:18 +03:00
  • 5220a991a5
    Increase 3B scratch buffers. (#1698) Henri Vasserman 2023-06-05 13:43:08 +03:00
  • d1f563a743
    llama : fix Metal KV cache sync (close #1695) Georgi Gerganov 2023-06-05 10:19:03 +03:00
  • 827f5eda91
    readme : update hot topics Georgi Gerganov 2023-06-04 23:38:19 +03:00
  • ecb217db4f
    llama : Metal inference (#1642) Georgi Gerganov 2023-06-04 23:34:30 +03:00
  • dcb2ed4826
    OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) 0cc4m 2023-06-04 08:12:05 +02:00
  • d8bd0013e8
    Add info about CUDA_VISIBLE_DEVICES (#1682) Henri Vasserman 2023-06-03 16:35:20 +03:00
  • b5c85468a3
    Docker: change to calling convert.py (#1641) Jiří Podivín 2023-06-03 14:11:53 +02:00
  • 136476e898
    Fix prompt cache saving and chat-persistent rollover (#1678) Evan Jones 2023-06-03 07:28:45 -04:00
  • ffb06a345e
    OpenLLaMA 3B support (#1588) Henri Vasserman 2023-05-30 21:24:22 +03:00
  • 7552ac5863
    ggml : sync cgraph import / export API Georgi Gerganov 2023-05-29 19:31:44 +03:00
  • 5d1830b99d
    ggml : fix bug in ggml_alibi Georgi Gerganov 2023-05-29 19:30:49 +03:00
  • 248367605e
    Work around for recalculating logits in cached prompts (Fixes #1585) (#1609) DannyDaemonic 2023-05-29 05:13:40 -07:00
  • 0e730dd23b
    Adding git in container package dependencies (#1621) Jiří Podivín 2023-05-29 06:45:50 +02:00
  • 3b126f654f
    LLAMA_DEBUG adds debug symbols (#1617) Johannes Gäßler 2023-05-28 21:01:02 +02:00
  • 1b78ed2081
    Only show -ngl option when relevant + other doc/arg handling updates (#1625) Kerfuffle 2023-05-28 11:48:57 -06:00
  • 337aea1139
    examples : add --alias option to gpt_params to set use friendly model name (#1614) Vladimir Zorin 2023-05-28 20:14:24 +03:00
  • bb051d9723
    opencl : no need to allocate cl_mem on heap (#1612) Howard Su 2023-05-29 01:13:36 +08:00
  • ca74884f66
    opencl : use strstr to check if fp16 supported (#1611) Howard Su 2023-05-29 01:09:56 +08:00
  • a6704643b6
    ggml : add support for the RISCV architecture (#1616) apcameron 2023-05-27 21:03:25 +01:00
  • 0df7d63e5b
    Include server in releases + other build system cleanups (#1610) Kerfuffle 2023-05-27 11:04:14 -06:00
  • 97c9b77c4f
    Add documentation about CLBlast (#1604) Henri Vasserman 2023-05-27 18:47:55 +03:00
  • 0ecb1bbbeb
    [CI] Fix openblas (#1613) Henri Vasserman 2023-05-27 17:24:06 +03:00
  • 93618031c7
    ggml : add ggml_tensor_overhead() Georgi Gerganov 2023-05-27 16:19:56 +03:00
  • 83c54e6da5
    [CI] CLBlast: Fix directory name (#1606) Henri Vasserman 2023-05-27 15:18:25 +03:00
  • bdbda1b17a
    ggml : sync ggml core (minor additions, e.g. ggml_get_tensor_by_name()) Georgi Gerganov 2023-05-27 12:22:05 +03:00
  • 66874d4fbc
    Some improvements to loading the session with --prompt-cache (#1550) Kerfuffle 2023-05-25 20:18:01 -06:00
  • 1fcdcc28b1
    cuda : performance optimizations (#1530) Johannes Gäßler 2023-05-25 23:07:29 +02:00
  • ac7876ac20
    Update CLBlast to 1.6.0 (#1580) Henri Vasserman 2023-05-24 10:30:09 +03:00
  • c31bbe934b
    readme : add docs for chat-persistent.sh (#1568) Evan Jones 2023-05-24 02:24:01 -04:00
  • 1359b6aba5
    chat-persistent.sh : use bracket expressions in grep (#1564) Senemu 2023-05-24 06:16:22 +00:00
  • 7d873811f3
    Fix handling of "invalid property" when creating OpenCL command queue (#1565) Maarten ter Huurne 2023-05-23 18:01:15 +02:00
  • 2e6cd4b025
    OpenCL Token Generation Acceleration (#1459) 0cc4m 2023-05-22 23:33:24 +02:00
  • 7e4ea5beff
    examples : add server example with REST API (#1443) Steward Garcia 2023-05-21 11:51:18 -06:00
  • 7780e4f479
    make : .PHONY clean (#1553) Stefan Sydow 2023-05-21 16:03:44 +02:00
  • 265db9834e
    ggml : output 3d sizes in ggml_graph_dump_dot() Georgi Gerganov 2023-05-21 11:56:23 +03:00
  • fab49c685e
    ggml : update WASM SIMD Georgi Gerganov 2023-05-20 20:00:41 +03:00
  • b8ee340abe
    feature : support blis and other blas implementation (#1536) Zenix 2023-05-20 23:58:31 +09:00
  • 9ecb30f959
    OpenCL: Fixes for older devices. (#1435) Henri Vasserman 2023-05-20 17:57:39 +03:00
  • 29cf5596fe
    llama : define magic numbers as integer constants (#1518) (#1520) Juuso Alasuutari 2023-05-20 15:58:15 +03:00
  • 3de84b2606
    ggml : add ggml_clamp() (#1539) Georgi Gerganov 2023-05-20 15:34:45 +03:00
  • affc76edfd
    cuda : loading models directly into VRAM, norm calculation on GPU, broadcasting for ggml_mul (#1483) Johannes Gäßler 2023-05-20 14:19:28 +02:00
  • ea600071cb
    Revert "feature : add blis and other BLAS implementation support (#1502)" Georgi Gerganov 2023-05-20 12:03:48 +03:00
  • 07e9ace0f9
    feature : add blis and other BLAS implementation support (#1502) Zenix 2023-05-20 18:02:48 +09:00
  • ec2e10c444
    llama : add llama_init_backend() API (close #1527) Georgi Gerganov 2023-05-20 11:06:11 +03:00
  • d2c59b8ba4
    Fix for mingw (#1462) DannyDaemonic 2023-05-20 00:40:02 -07:00
  • 503db28849
    llama : fix name shadowing and C4146 (#1526) Maxime 2023-05-20 09:22:37 +02:00
  • 8a203f9fa1
    llama : fix compile warnings in llama_set_state_data() Georgi Gerganov 2023-05-20 10:14:31 +03:00
  • 4fd3e29297
    ggml : fix scalar implementation of Q4_1 dot Georgi Gerganov 2023-05-20 10:13:19 +03:00
  • 2d5db48371
    ggml : use F16 instead of F32 in Q4_0, Q4_1, Q8_0 (#1508) Georgi Gerganov 2023-05-19 22:17:18 +03:00
  • 6986c7835a
    tests : add missing header Georgi Gerganov 2023-05-19 21:17:28 +03:00
  • 943e6081cc
    examples : add persistent chat (#1495) Evan Jones 2023-05-19 13:39:51 -04:00
  • 7694b52b9a
    main : make reverse prompt option act as a stop token in non-interactive mode (#1032) Jason McCartney 2023-05-19 10:24:59 -07:00
  • 79e3efb0e9
    readme : adds WizardLM to the list of supported models (#1485) David Kennedy 2023-05-19 13:16:30 -04:00
  • 4b7e245adf
    minor : fix compile warnings Georgi Gerganov 2023-05-19 20:14:51 +03:00
  • 5ea4339273
    make kv_f16 the default for api users (#1517) Erik Scholz 2023-05-18 19:31:01 +02:00
  • ee9654138a
    Fixes #1511 lambda issue for w64devkit (mingw) (#1513) DannyDaemonic 2023-05-18 10:30:40 -07:00
  • dc271c52ed
    Remove unused n_parts parameter (#1509) Stephan Walter 2023-05-17 22:12:01 +00:00
  • c238b5873a
    benchmark-matmul: Print the average of the test results (#1490) rankaiyx 2023-05-17 22:47:58 +08:00
  • 2b2646931b
    convert.py: Support models which are stored in a single pytorch_model.bin (#1469) Tom Jobbins 2023-05-16 23:04:35 +01:00
  • 42627421ec
    ~7% faster Q5_1 AVX2 code (#1477) Ilya Kurdyukov 2023-05-17 01:36:47 +07:00
  • 9560655409
    define default model path once, sync path with readme (#1366) András Salamon 2023-05-16 16:46:34 +01:00
  • 2a5ee023ad
    Add alternate include path for openblas (#1476) sandyiscool 2023-05-16 14:00:15 +05:30
  • 63d20469b8
    fix get_num_physical_cores() (#1436) zrm 2023-05-14 22:25:42 -04:00
  • b5c9295eef
    benchmark-matmul: fix clang-tidy issues, report results in GFLOPS (#1458) slaren 2023-05-14 22:46:00 +02:00
  • eb363627fd
    cuda : deduplicated dequantization code (#1453) Johannes Gäßler 2023-05-14 20:53:23 +02:00
  • 79b2d5b69d
    ggml : alternative fix for race condition bug in non-inplace ggml_compute_forward_diag_mask_f32 (#1454) xaedes 2023-05-14 17:55:02 +02:00
  • 13c351ad72
    ggml : various fixes (#1450) Georgi Gerganov 2023-05-14 18:22:50 +03:00
  • 60f8c361ca
    ggml : add AVX support based on AVX2 code (#1430) katsu560 2023-05-14 19:03:51 +09:00
  • 601a033475
    ggml : add GGML_QNT_VERSION to track quantization format changes Georgi Gerganov 2023-05-14 10:20:19 +03:00
  • 08737ef720
    cuda : fix convert function (#1412) Georgi Gerganov 2023-05-13 17:40:58 +03:00
  • bda4d7c215
    make : fix PERF build with cuBLAS Georgi Gerganov 2023-05-13 17:25:09 +03:00
  • 5a5aeb1e91
    llama : fix unused warning Georgi Gerganov 2023-05-13 16:55:14 +03:00
  • 66841fdb0e
    ggml : multi-thread mul and diag_mask ops (#1428) Georgi Gerganov 2023-05-13 16:48:03 +03:00