Commit graph

  • 1c641e6aac
    build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) Olivier Chafik 2024-06-13 00:41:52 +01:00
  • 963552903f
    CUDA: fix broken oob check for FA vec f32 kernel (#7904) Johannes Gäßler 2024-06-12 17:41:51 +02:00
  • a9cae48003
    tests : add non-cont unary tests (#7857) Georgi Gerganov 2024-06-12 16:00:22 +03:00
  • bfaa676b08
    ggml : improve ggml_is_contiguous logic (#7856) Georgi Gerganov 2024-06-12 15:24:20 +03:00
  • 704a35b183
    server : restore numeric prompts (#7883) Georgi Gerganov 2024-06-12 14:42:29 +03:00
  • dcf752707d
    update intel docker oneapi-basekit to 2024.1.1-devel-ubuntu22.04 (#7894) Meng, Hengyu 2024-06-12 17:05:35 +08:00
  • f2b5764beb
    Fix a typo and add Fedora 40 package to install for Vulkan (#7794) [no ci] Patrice Ferlet 2024-06-12 03:18:16 +02:00
  • 73bac2b11d
    vulkan: select only one device for single gpu with multiple drivers (#7582) k.h.lai 2024-06-12 03:26:05 +08:00
  • ef52d1d16a
    Update Vulkan RoPE implementation (#7818) 0cc4m 2024-06-11 21:20:29 +02:00
  • 14f83526cd
    fix broken link in pr template (#7880) [no ci] Deven Mistry 2024-06-11 12:18:58 -04:00
  • 6fe42d073f
    github: move PR template to .github/ root (#7868) Brian 2024-06-12 00:43:41 +10:00
  • 148995e5e5
    llama-bench: more compact markdown tables (#7879) Johannes Gäßler 2024-06-11 14:45:40 +02:00
  • 4bfe50f741
    tests : check the Python version (#7872) Georgi Gerganov 2024-06-11 10:10:20 +03:00
  • bdcb8f4222
    CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (#7860) Johannes Gäßler 2024-06-11 08:26:07 +02:00
  • c2ce6c47e4
    fix CUDA CI by using a windows-2019 image (#7861) slaren 2024-06-11 07:59:20 +02:00
  • b61eb9644d
    json: refine constraint for whitespace to avoid runaways yet allow pretty print (#7866) Olivier Chafik 2024-06-11 02:22:57 +01:00
  • 396b18dfec
    json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841) Olivier Chafik 2024-06-11 01:00:30 +01:00
  • 864a99e7a0
    cmake : fix CMake requirement for CUDA (#7821) Jared Van Bortel 2024-06-10 18:32:10 -04:00
  • fd5ea0f897
    ci : try win-2019 on server windows test (#7854) slaren 2024-06-10 14:18:41 +02:00
  • c28a83902c
    examples : remove --instruct remnants (#7846) Georgi Gerganov 2024-06-10 15:00:15 +03:00
  • d9da0e4986
    server : improve "prompt" handling (#7847) Georgi Gerganov 2024-06-10 14:59:55 +03:00
  • 1f0dabda8d
    CUDA: use tensor cores for MMQ (#7676) Johannes Gäßler 2024-06-10 11:45:13 +02:00
  • af4ae502dd
    use the correct SYCL context for host USM allocations (#7777) Ben Ashbaugh 2024-06-10 02:21:31 -07:00
  • 10ceba354a
    flake.lock: Update (#7838) Georgi Gerganov 2024-06-10 02:04:50 +03:00
  • e95beeb1fc
    imatrix : handle partial entries (#7833) Georgi Gerganov 2024-06-09 20:19:35 +03:00
  • 57bf62ce7c
    docs: Added initial PR template with directions for doc only changes and squash merges [no ci] (#7700) Nicolás Pérez 2024-06-09 11:24:29 -04:00
  • 3e2ee44315
    server: do not remove whitespace at the start of a completion chunk (#7830) mgroeber9110 2024-06-09 12:50:35 +02:00
  • 42b53d192f
    CUDA: revise q8_1 data layout for mul_mat_q (#7824) Johannes Gäßler 2024-06-09 09:42:25 +02:00
  • 2decf57bc6
    convert-hf : set the model name based on cli arg, if present (#7693) sasha0552 2024-06-09 06:39:25 +00:00
  • 5795b94182
    convert-hf : match model part name prefix and suffix (#7687) compilade 2024-06-08 22:47:25 -04:00
  • ed9f252118
    gguf-py : decouple adding metadata from writing in GGUFWriter (#7827) compilade 2024-06-08 22:34:29 -04:00
  • fe1e3917cf
    Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (#7682)" (#7808) slaren 2024-06-09 01:43:39 +02:00
  • d4d915d351
    url: save -mu downloads to new cache location (#7826) Olivier Chafik 2024-06-08 20:21:08 +01:00
  • 7a16ce7db2
    server : smart slot selection using Longest Common Prefix (#7728) sasha0552 2024-06-08 07:50:31 +00:00
  • da799b4189
    vulkan : reuse parent extra for views (#7806) slaren 2024-06-07 19:47:49 +02:00
  • c00fad71e5
    gguf-split : change binary multi-byte units to decimal (#7803) Christian Zhou-Zheng 2024-06-07 08:56:01 -04:00
  • 27615f5ab2
    cmake : fix BUILD_SHARED_LIBS=ON build (#7784) intelmatt 2024-06-07 05:15:07 -07:00
  • 7027b27d76
    server: update cache_prompt documentation [no ci] (#7745) Johannes Gäßler 2024-06-07 11:15:49 +02:00
  • a5cabd7649
    server : do not get prompt in infill mode (#7286) woodx 2024-06-07 15:09:45 +08:00
  • d5c938cd77
    [SYCL] fix softmax r2r result wrong issue (#7811) pengxin99 2024-06-07 14:28:26 +08:00
  • c9ee7118d5
    check for nans in imatrix and quantize (#7807) slaren 2024-06-07 08:01:29 +02:00
  • ee459f40f6
    server : fix --threads-http arg (#7801) Georgi Gerganov 2024-06-06 19:19:59 +03:00
  • f83351f9a6
    imatrix : migrate to gpt_params (#7771) Georgi Gerganov 2024-06-06 16:30:58 +03:00
  • ad675e1c67
    Added support for . (any character) token in grammar engine. (#6467) Clint Herron 2024-06-06 06:08:52 -07:00
  • a143c04375
    README minor fixes (#7798) [no ci] Mattheus Chediak 2024-06-06 09:17:54 -03:00
  • 55b2d0849d
    grammars: x{min,max} repetition operator (#6640) Olivier Chafik 2024-06-06 10:07:06 +01:00
  • f5d7b268ec
    llama : add jina v2 base code (#7596) Joan Fontanals 2024-06-06 09:22:41 +02:00
  • 2d08b7fbb4
    docker : build only main and server in their images (#7782) slaren 2024-06-06 07:19:49 +02:00
  • d67caea0d6
    docker : add openmp lib (#7780) slaren 2024-06-06 07:17:21 +02:00
  • 7672adeec7
    Fix encoding in python scripts (#7733) Galunid 2024-06-05 19:07:24 +02:00
  • 7d1a378b8f
    CUDA: refactor mmq, dmmv, mmvq (#7716) Johannes Gäßler 2024-06-05 16:53:00 +02:00
  • 2b3389677a
    ggml : refactor rope norm/neox (#7634) Georgi Gerganov 2024-06-05 11:29:20 +03:00
  • 9973e81c5c
    readme : remove -ins (#7759) arch-btw 2024-06-04 23:40:49 -07:00
  • c90dbe026b
    Fix per token attributes bits (#7749) jaime-m-p 2024-06-05 01:26:14 +02:00
  • b90dc566c1
    Allow number of nodes in CUDA graph to change (#7738) agray3 2024-06-04 21:06:49 +01:00
  • 1442677f92
    common : refactor cli arg parsing (#7675) Georgi Gerganov 2024-06-04 21:23:39 +03:00
  • 554c247caf
    ggml : remove OpenCL (#7735) Georgi Gerganov 2024-06-04 21:23:20 +03:00
  • 0cd6bd3483
    llama : remove beam search (#7736) Georgi Gerganov 2024-06-04 21:23:05 +03:00
  • 5ca0944a15
    readme : remove obsolete Zig instructions (#7471) Georgi Gerganov 2024-06-04 19:43:01 +03:00
  • adc9ff3841
    llama-bench : allow using a different printer for stderr with -oe (#7722) slaren 2024-06-04 14:32:42 +02:00
  • 987d743d6b
    Improve hipBLAS support in CMake (#7696) Daniele 2024-06-04 12:09:15 +00:00
  • b226c1227b
    refine .gitignore (#7688) zhouwg 2024-06-04 19:21:26 +08:00
  • 3b38d48609
    Per token attributes (#7685) jaime-m-p 2024-06-04 09:17:17 +02:00
  • 6d1616944d
    ggml : prevent builds with -ffinite-math-only (#7726) Georgi Gerganov 2024-06-04 10:01:09 +03:00
  • bde7cd3cd9
    llama : offload to RPC in addition to other backends (#7640) Radoslav Gerganov 2024-06-03 20:03:26 +03:00
  • a5735e4426
    ggml : use OpenMP as a thread pool (#7606) Masaya, Kato 2024-06-04 00:14:15 +09:00
  • 0b832d53ba
    make: fix debug options not being applied to NVCC (#7714) Johannes Gäßler 2024-06-03 16:28:58 +02:00
  • 3d7ebf6312
    Vulkan Mixture of Experts (MoE) support (#7628) 0cc4m 2024-06-03 10:59:14 +02:00
  • a10cda58d3
    cmake : add pkg-config spec file for llama.cpp (#7702) Andy Tai 2024-06-03 01:06:24 -07:00
  • 6f28a333c1
    llama : MiniCPM support tied embeddings (#7664) zhangkaihuo 2024-06-03 15:49:30 +08:00
  • 549279d804
    llama : avoid double token-to-piece cache (#7654) Georgi Gerganov 2024-06-03 08:34:43 +03:00
  • 9e405b6e2e
    kompute : implement op_getrows_f32 (#6403) woachk 2024-06-03 07:32:16 +02:00
  • 3413ae2193
    fix bug introduced in using calloc (#7701) Dave Airlie 2024-06-03 07:59:54 +10:00
  • 1669810d7c
    flake.lock: Update (#7686) Georgi Gerganov 2024-06-03 00:13:12 +03:00
  • 7c4e5b7eae
    chore : add ignore rule for generated server themes (#7689) Austin 2024-06-02 13:39:08 -04:00
  • 9422c5e34b
    [SYCL] Update rpc-server.cpp to include SYCL backend (#7682) nickp27 2024-06-02 19:13:54 +10:00
  • e141ce624a
    Fix FlashAttention debug test, FP32 assert (#7684) Johannes Gäßler 2024-06-01 23:26:10 +02:00
  • 2e666832e6
    server : new UI (#7633) Yazan Agha-Schrader 2024-06-01 21:31:48 +02:00
  • 2ac95c9d56
    SimpleChat: Simple histogram/repeatMatching driven garbageTrimming, Settings UI, Streaming mode, OpenAi Compat (Model, Authorization Bearer), Save/Restore session, Auto Settings UI (#7548) HanishKVC 2024-06-01 21:50:18 +05:30
  • 750f60c03e
    CUDA: fix Pascal FA, deq. KV to FP16 for batch > 8 (#7681) Johannes Gäßler 2024-06-01 15:47:04 +02:00
  • 9b596417af
    CUDA: quantized KV support for FA vec (#7527) Johannes Gäßler 2024-06-01 08:44:14 +02:00
  • a323ec60af
    server : update js (#7670) Georgi Gerganov 2024-05-31 22:23:04 +03:00
  • 0515ad93f4
    convert-hf : Handle NotImplementedError in convert-hf-to-gguf (#7660) Galunid 2024-05-31 17:42:33 +02:00
  • c8047d538f
    scripts: update compare_llama_bench.py [no ci] (#7673) Johannes Gäßler 2024-05-31 16:26:21 +02:00
  • 30e238b246
    Improve HIP compatibility (#7672) Daniele 2024-05-31 14:00:29 +00:00
  • 16926dff92
    readme : link homebrew discussion Georgi Gerganov 2024-05-31 15:04:58 +03:00
  • 0c27e6f62e
    ggml : fix loongson compile warnings (#7537) Georgi Gerganov 2024-05-31 14:17:10 +03:00
  • 2e32f874e6
    Somehow '**' got lost (#7663) Galunid 2024-05-31 10:24:41 +02:00
  • 1af511fc22
    Add convert.py removal to hot topics (#7662) Galunid 2024-05-31 10:09:20 +02:00
  • 0541f06296
    [no ci] docs: add aikit to readme (#7650) Sertaç Özercan 2024-05-30 16:57:16 -07:00
  • 9022c33646
    Fixed painfully slow single process builds. (#7326) JohnnyB 2024-05-30 21:32:38 +01:00
  • 5921b8f089
    llama : cache llama_token_to_piece (#7587) Georgi Gerganov 2024-05-30 19:01:41 +03:00
  • 5dcdf94676
    Fix conan badge display [no ci] (#7645) Martin Delille 2024-05-30 17:07:39 +02:00
  • 2e2340de17
    Add brew installation instruction to README [no ci] (#7616) Manuel 2024-05-30 16:58:15 +02:00
  • 7846540bd2
    readme : add Conan badge (#7638) Martin Delille 2024-05-30 14:52:50 +02:00
  • e6157f94c8
    github: add contact links to issues and convert question into research [no ci] (#7612) Brian 2024-05-30 21:55:36 +10:00
  • 9c4c9cc83f
    Move convert.py to examples/convert-legacy-llama.py (#7430) Galunid 2024-05-30 13:40:00 +02:00
  • 59b0d07766
    faster avx512 exp implementation (#7551) Chris Elrod 2024-05-30 07:32:55 -04:00
  • d5c05821f3
    ggml : fix loongarch build (O2 issue) (#7636) junchao-loongson 2024-05-30 17:30:10 +08:00
  • 972b555ab9
    README: explain parallel build [no ci] (#7618) Johannes Gäßler 2024-05-30 09:52:39 +02:00
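
A listing in the shape above (abbreviated hash, then subject, author, and date on the next line) can be regenerated from any clone with `git log --pretty`. The sketch below is illustrative, not the exact command used to produce this page: it builds a throwaway demo repository so it runs anywhere, and the format string is an assumption you can adjust. In a real llama.cpp checkout, only the final `git log` line is needed.

```shell
# Sketch: reproduce a commit listing like the one above.
# The demo repo (tmp dir, "example" commit) exists only so the
# snippet is self-contained; the format string is an assumption.
tmp=$(mktemp -d)
git -C "$tmp" init -q
git -C "$tmp" -c user.name="Example" -c user.email="e@example.com" \
    commit -q --allow-empty -m "example: demo commit"
# %h = abbreviated hash, %s = subject, %an = author, %ad = date
git -C "$tmp" log --pretty=format:'%h%n    %s %an %ad' --date=iso
```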