Commit graph

  • 642330ac7c
    llama : add enum for built-in chat templates (#10623) Xuan Son Nguyen 2024-12-02 22:10:19 +01:00
  • 8648c52101
    make : deprecate (#10514) Georgi Gerganov 2024-12-02 21:22:53 +02:00
  • 64ed2091b2
    server: Add "tokens per second" information in the backend (#10548) haopeng 2024-12-02 21:45:54 +08:00
  • 991f8aabee
    SYCL: Fix and switch to GGML_LOG system instead of fprintf (#10579) Akarshan Biswas 2024-12-02 12:34:11 +05:30
  • 4cb003dd8d
    contrib : refresh (#10593) Georgi Gerganov 2024-12-02 08:53:27 +02:00
  • 917786f43d
    Add mistral-v1, mistral-v3, mistral-v3-tekken and mistral-v7 chat template types (#10572) Juk Armstrong 2024-12-01 22:09:49 +00:00
  • 5e1ed95583
    grammars : add English-only grammar (#10612) Georgi Gerganov 2024-12-01 21:37:54 +02:00
  • 5c7a5aa0c3
    ci: add error handling for Python venv creation in run.sh (#10608) Wang Qin 2024-12-01 10:11:42 -08:00
  • 3420909dff
    ggml : automatic selection of best CPU backend (#10606) Diego Devesa 2024-12-01 16:12:41 +01:00
  • 86dc11c5bc
    server : bind to any port when specified (#10590) alek3y 2024-12-01 12:33:12 +01:00
  • 6acce39710
    readme : update the usage section with examples (#10596) Georgi Gerganov 2024-12-01 11:25:17 +02:00
  • 43957ef203
    build: update Makefile comments for C++ version change (#10598) Wang Qin 2024-11-30 19:19:44 -08:00
  • 0c39f44d70
    ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() (#10567) Adrien Gallouët 2024-11-30 18:13:18 +01:00
  • 3e0ba0e604
    readme : remove old badge Georgi Gerganov 2024-11-30 10:09:21 +02:00
  • abadba05be
    readme : refresh (#10587) Georgi Gerganov 2024-11-30 09:47:07 +02:00
  • 0533e7fb38
    vulkan: Dynamic subgroup size support for Q6_K mat_vec (#10536) Eve 2024-11-30 07:00:02 +00:00
  • 7cc2d2c889
    ggml : move AMX to the CPU backend (#10570) Diego Devesa 2024-11-29 21:54:58 +01:00
  • b782e5c7d4
    server : add more test cases (#10569) Xuan Son Nguyen 2024-11-29 21:48:56 +01:00
  • 3a8e9af402
    imatrix : support combine-only (#10492) Robert Collins 2024-11-29 12:21:37 -05:00
  • a3a3048e7a
    cleanup UI link list (#10577) Diego Devesa 2024-11-29 17:45:08 +01:00
  • f0678c5ff4
    ggml : fix I8MM Q4_1 scaling factor conversion (#10562) Georgi Gerganov 2024-11-29 16:25:39 +02:00
  • 4b3242bbea
    ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (#10580) Shupei Fan 2024-11-29 21:49:02 +08:00
  • 0f77aae560
    sycl : offload of get_rows set to 0 (#10432) Alberto Cabrera Pérez 2024-11-29 12:38:45 +00:00
  • 266b8519ee
    sycl : Reroute permuted mul_mats through oneMKL (#10408) Alberto Cabrera Pérez 2024-11-29 09:49:43 +00:00
  • 938f608742
    CANN: RoPE operator optimization (#10563) Chenguang Li 2024-11-29 14:46:55 +08:00
  • f095a649ec
    vulkan: get the first command buffer submitted sooner (#10499) Jeff Bolz 2024-11-29 00:18:02 -06:00
  • 678d7994f4
    llava: return false instead of exit (#10546) Ting Lou 2024-11-29 08:09:46 +08:00
  • dc22344088
    ggml : remove redundant copyright notice + update authors Georgi Gerganov 2024-11-28 20:46:40 +02:00
  • 4c0a95b107
    llama : add missing model types Georgi Gerganov 2024-11-28 20:45:07 +02:00
  • 6c59567689
    server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568) Xuan Son Nguyen 2024-11-28 19:17:49 +01:00
  • 890719311b
    common: fix warning message when no GPU found (#10564) Johannes Gäßler 2024-11-28 18:15:25 +01:00
  • 7281cf13ad
    docs: fix outdated usage of llama-simple (#10565) Random Fly 2024-11-28 23:03:11 +08:00
  • e90688edd0
    ci : fix tag name in cuda and hip releases (#10566) Diego Devesa 2024-11-28 15:58:54 +01:00
  • 76b27d29c2
    ggml : fix row condition for i8mm kernels (#10561) Georgi Gerganov 2024-11-28 14:56:37 +02:00
  • eea986f215
    cmake : fix ARM feature detection (#10543) Georgi Gerganov 2024-11-28 14:56:23 +02:00
  • c202cef168
    ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541) Shupei Fan 2024-11-28 20:52:03 +08:00
  • 2025fa67e9
    kompute : improve backend to pass test_backend_ops (#10542) Sergio López 2024-11-28 12:51:38 +01:00
  • c6bc73951e
    CANN: Update cann.md to display correctly in CLion (#10538) Ruixin Huang 2024-11-28 15:27:11 +08:00
  • 605fa66c50
    CANN: Fix SOC_TYPE compile bug (#10519) leo-pony 2024-11-28 15:25:24 +08:00
  • b7420131bf
    CANN: ROPE operator optimization (#10540) Chenguang Li 2024-11-28 14:24:46 +08:00
  • 9f912511bc
    common : fix duplicated file name with hf_repo and hf_file (#10550) Xuan Son Nguyen 2024-11-27 22:30:52 +01:00
  • 3ad5451f3b
    Add some minimal optimizations for CDNA (#10498) uvos 2024-11-27 17:10:08 +01:00
  • 46c69e0e75
    ci : faster CUDA toolkit installation method and use ccache (#10537) Diego Devesa 2024-11-27 11:03:25 +01:00
  • 9e2301f4a4
    metal : fix group_norm support condition (#0) Georgi Gerganov 2024-11-27 11:22:14 +02:00
  • fee824a1a1
    sync : ggml Georgi Gerganov 2024-11-27 11:10:42 +02:00
  • 9150f8fef9
    Do not include arm_neon.h when compiling CUDA code (ggml/1028) Frankie Robertson 2024-11-26 15:50:26 +02:00
  • c31ed2abfc
    vulkan: define all quant data structures in types.comp (#10440) Jeff Bolz 2024-11-27 01:32:54 -06:00
  • 5b3466bedf
    vulkan: Handle GPUs with less shared memory (#10468) Jeff Bolz 2024-11-27 01:30:27 -06:00
  • 249a7902ec
    vulkan: further optimize q5_k mul_mat_vec (#10479) Jeff Bolz 2024-11-27 01:21:59 -06:00
  • 71a64989a5
    vulkan: skip integer div/mod in get_offsets for batch_idx==0 (#10506) Jeff Bolz 2024-11-27 01:08:54 -06:00
  • 4a57d362e1
    vulkan: optimize Q2_K and Q3_K mul_mat_vec (#10459) Jeff Bolz 2024-11-27 01:00:50 -06:00
  • c9b00a70b0
    ci : fix cuda releases (#10532) Diego Devesa 2024-11-26 22:12:10 +01:00
  • de5097351c
    Add OLMo 2 model in docs (#10530) Shane A 2024-11-26 12:55:29 -08:00
  • 5a349f2809
    ci : remove nix workflows (#10526) Diego Devesa 2024-11-26 21:13:54 +01:00
  • 30ec398321
    llama : disable warnings for 3rd party sha1 dependency (#10527) Diego Devesa 2024-11-26 21:01:47 +01:00
  • be0e350c8b
    Fix HIP flag inconsistency & build docs (#10524) Tristan Druyen 2024-11-26 19:27:28 +01:00
  • 249cd93da3
    mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (#10516) R0CKSTAR 2024-11-27 00:00:41 +08:00
  • 904109ed0d
    vulkan: fix group_norm (#10496) Jeff Bolz 2024-11-26 09:45:05 -06:00
  • 45abe0f74e
    server : replace behave with pytest (#10416) Xuan Son Nguyen 2024-11-26 16:20:18 +01:00
  • 0bbd2262a3
    restore the condition to build & update package when merge (#10507) Neo Zhang Jianyu 2024-11-26 21:43:47 +08:00
  • ab96610b1e
    cmake : enable warnings in llama (#10474) Georgi Gerganov 2024-11-26 14:18:08 +02:00
  • 7db3846a94
    ci : publish the docker images created during scheduled runs (#10515) Diego Devesa 2024-11-26 13:05:20 +01:00
  • c6807b3f28
    ci : add ubuntu cuda build, build with one arch on windows (#10456) Diego Devesa 2024-11-26 13:05:07 +01:00
  • 25669aa92c
    ggml-cpu: cmake add arm64 cpu feature check for macos (#10487) Charles Xu 2024-11-26 12:37:05 +01:00
  • 84e1c33cde
    server : fix parallel speculative decoding (#10513) Georgi Gerganov 2024-11-26 13:36:40 +02:00
  • 811872a59d
    speculative : simplify the implementation (#10504) Georgi Gerganov 2024-11-26 12:29:38 +02:00
  • 9a4b79bcfa
    CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454) Shanshan Shen 2024-11-26 18:08:37 +08:00
  • 7066b4cce2
    CANN: RoPE and CONCAT operator optimization (#10488) Chenguang Li 2024-11-26 17:31:05 +08:00
  • 0eb4e12bee
    vulkan: Fix a vulkan-shaders-gen argument parsing error (#10484) Junil Kim 2024-11-26 10:47:20 +09:00
  • 0cc63754b8
    Introduce llama-run (#10291) Eric Curtin 2024-11-25 16:56:24 -05:00
  • 50d5cecbda
    ci : build docker images only once daily (#10503) Diego Devesa 2024-11-25 22:05:39 +01:00
  • 9fd8c2687f
    server : add more information about error (#10455) Georgi Gerganov 2024-11-25 22:28:27 +02:00
  • 47f931c8f9
    server : enable cache_prompt by default (#10501) Georgi Gerganov 2024-11-25 21:50:07 +02:00
  • 106964e3d2
    metal : enable mat-vec kernels for bs <= 4 (#10491) Georgi Gerganov 2024-11-25 21:49:31 +02:00
  • 80acb7b430
    Rename Olmo1124 to Olmo2 (#10500) Shane A 2024-11-25 10:36:09 -08:00
  • 10bce0450f
    llama : accept a list of devices to use to offload a model (#10497) Diego Devesa 2024-11-25 19:30:06 +01:00
  • 1f922254f0
    Github: update issue templates [no ci] (#10489) Johannes Gäßler 2024-11-25 19:18:37 +01:00
  • a9a678a6b2
    Add download chat feature to server chat (#10481) brucepro 2024-11-25 08:11:55 -08:00
  • 9ca2e67762
    server : add speculative decoding support (#10455) Georgi Gerganov 2024-11-25 16:31:38 +02:00
  • 5931c1f233
    ggml : add support for dynamic loading of backends (#10469) Diego Devesa 2024-11-25 15:13:39 +01:00
  • f6d12e7df8
    tests : fix compile warning Georgi Gerganov 2024-11-25 15:17:32 +02:00
  • b756441104
    metal : minor code formatting Georgi Gerganov 2024-11-25 15:08:04 +02:00
  • 5a8987793f
    [SYCL] Fix building Win package for oneAPI 2025.0 update (#10483) Neo Zhang Jianyu 2024-11-25 17:31:10 +08:00
  • d9d54e498d
    speculative : refactor and add a simpler example (#10362) Georgi Gerganov 2024-11-25 09:58:41 +02:00
  • cce5a90075
    flake.lock: Update (#10470) Georgi Gerganov 2024-11-24 18:03:25 +02:00
  • dc39012cba
    llama : fix op mul check with command-r-plus (#10476) Diego Devesa 2024-11-24 16:10:26 +01:00
  • 9336db462c
    convert : XLMRoberta Type Vocab Size (#10458) Gabe Goodhart 2024-11-24 02:02:34 -07:00
  • 96fa2c5e2d
    fix gguf-py: Conversion error when multiple licenses are configured (#9807) momonga 2024-11-24 09:09:22 +09:00
  • 55ed008b2d
    ggml : do not use ARM features not included in the build (#10457) Diego Devesa 2024-11-23 14:41:12 +01:00
  • 6dfcfef078
    ci: Update oneAPI runtime dll packaging (#10428) 蕭澧邦 2024-11-22 17:44:08 +08:00
  • 599b3e0cd4
    GitHub: ask for more info in issue templates (#10426) Johannes Gäßler 2024-11-22 08:32:40 +01:00
  • c18610b4ee
    CANN: Support Ascend310P to accelerate F32 and F16 Model (#10216) leo-pony 2024-11-22 14:07:20 +08:00
  • a5e47592b6
    cuda : optimize argmax (#10441) Diego Devesa 2024-11-21 18:18:50 +01:00
  • 1bb30bf28c
    llama : handle KV shift for recurrent models (#10402) Georgi Gerganov 2024-11-21 10:22:47 +02:00
  • 87a533be57
    sync : ggml Georgi Gerganov 2024-11-21 09:22:11 +02:00
  • 59b9172822
    ggml/sched : do not skip views in pre-assignments slaren 2024-11-20 13:25:08 +01:00
  • 02e4eaf22f
    ggml-opt: fix data corruption (ggml/1022) Johannes Gäßler 2024-11-20 14:56:04 +01:00
  • 9abe9eeae9
    vulkan: predicate max operation in soft_max shaders (#10437) Jeff Bolz 2024-11-20 13:47:36 -06:00
  • f95caa7954
    cmake: add link dependencies to cmake find pkg (#10433) bandoti 2024-11-20 12:22:19 -04:00
  • fab5d30ff6
    llama : add .clang-format file (#10415) Diego Devesa 2024-11-20 12:57:53 +01:00