Commit graph

  • 8fd4b7fa29
    vulkan: copy iq4_nl LUT into shared memory (#10409) Jeff Bolz 2024-11-20 01:40:18 -06:00
  • 1bacb9f625
    vulkan: further optimize mul_mat_vec using larger loads (#10387) Jeff Bolz 2024-11-20 01:11:00 -06:00
  • ad21c9e1f1
    update rel to 4040 (#10395) Neo Zhang Jianyu 2024-11-20 13:54:25 +08:00
  • 3952a221af
    Fix missing file renames in Makefile due to changes in commit ae8de6d50a (#10413) Anthony Van de Gejuchte 2024-11-19 23:18:17 +01:00
  • 42ae10bbcd
    add cmake rvv support (#10411) haopeng 2024-11-20 04:10:31 +08:00
  • 9fe0fb0626
    sync : ggml Georgi Gerganov 2024-11-19 19:15:50 +02:00
  • 611fabd792
    metal : fix offset integer overflows in im2col (ggml/1015) Plamen Minev 2024-11-18 15:02:27 +02:00
  • 12b0ad953a
    metal : add GGML_UNARY_OP_ELU kernel (ggml/1018) PAB 2024-11-18 10:02:49 +01:00
  • 342397dc7e
    cmake: force MSVC compiler charset to utf-8 (#9989) 蕭澧邦 2024-11-20 01:42:00 +08:00
  • 2a11b6b094
    Add required ggml-base and backend libs to cmake pkg (#10407) bandoti 2024-11-19 12:10:30 -04:00
  • 3ee6382d48
    cuda : fix CUDA_FLAGS not being applied (#10403) Diego Devesa 2024-11-19 14:29:38 +01:00
  • 8e752a777b
    llama : add check for KV cache shifts (#10401) Georgi Gerganov 2024-11-19 13:29:26 +02:00
  • a88ad007de
    llama : add OLMo November 2024 support (#10394) Shane A 2024-11-19 01:04:08 -08:00
  • 2a1507c162
    sycl : Add option to set the SYCL architecture for all targets (#10266) Romain Biessy 2024-11-19 09:02:23 +01:00
  • b3e585988f
    vulkan: Optimize soft_max (#10301) Jeff Bolz 2024-11-19 01:25:17 -06:00
  • 557924f222
    sycl: Revert MUL_MAT_OP support changes (#10385) Alberto Cabrera Pérez 2024-11-19 00:50:04 +00:00
  • d3481e6316
    cuda : only use native when supported by cmake (#10389) Diego Devesa 2024-11-18 18:43:40 +01:00
  • 531cb1c233
    Skip searching root path for cross-compile builds (#10383) bandoti 2024-11-18 11:23:58 -04:00
  • f139d2ea61
    vulkan: remove use of null initializer (#10372) Jeff Bolz 2024-11-18 08:28:42 -06:00
  • 2eb76b2a5e
    flake.lock: Update (#10346) Georgi Gerganov 2024-11-18 16:08:20 +02:00
  • 9b75f03cd2
    Vulkan: Fix device info output format specifiers (#10366) 0cc4m 2024-11-18 11:02:43 +01:00
  • 75207b3a88
    docker: use GGML_NATIVE=OFF (#10368) Johannes Gäßler 2024-11-18 00:21:53 +01:00
  • 76e9e58b78
    CUDA: fix MMV kernel being used for FP16 src1 (#10357) Johannes Gäßler 2024-11-17 23:20:42 +01:00
  • ce2e59ba10
    CMake: fix typo in comment [no ci] (#10360) Johannes Gäßler 2024-11-17 12:59:38 +01:00
  • be5caccef9
    llama : only use default buffer types for the KV cache (#10358) Diego Devesa 2024-11-17 12:25:45 +01:00
  • 20a780c7b6
    gitignore : ignore local run scripts [no ci] Georgi Gerganov 2024-11-17 13:12:22 +02:00
  • cf32a9b93a
    metal : refactor kernel args into structs (#10238) Georgi Gerganov 2024-11-17 11:23:01 +02:00
  • a43178299c
    ggml : fix undefined reference to 'getcpu' (#10354) FirstTimeEZ 2024-11-17 21:39:22 +13:00
  • c3ea58aca4
    CUDA: remove DMMV, consolidate F16 mult mat vec (#10318) Johannes Gäßler 2024-11-17 09:09:55 +01:00
  • 467576b6cc
    CMake: default to -arch=native for CUDA build (#10320) Johannes Gäßler 2024-11-17 09:06:34 +01:00
  • eda7e1d4f5
    ggml : fix possible buffer use after free in sched reserve (#9930) Diego Devesa 2024-11-17 07:31:17 +01:00
  • 24203e9dd7
    ggml : inttypes.h -> cinttypes (#0) Georgi Gerganov 2024-11-16 23:40:39 +02:00
  • 5d9e59979c
    ggml : adapt AMX to tensor->grad removal (#0) Georgi Gerganov 2024-11-16 21:38:01 +02:00
  • a4200cafad
    make : add ggml-opt (#0) Georgi Gerganov 2024-11-16 21:35:31 +02:00
  • 84274a10c3
    tests : remove test-grad0 Georgi Gerganov 2024-11-16 21:34:03 +02:00
  • 68fcb4759c
    ggml : fix compile warnings (#0) Georgi Gerganov 2024-11-16 21:32:41 +02:00
  • 8a43e940ab
    ggml: new optimization interface (ggml/988) Johannes Gäßler 2024-11-16 22:17:59 +02:00
  • 5c9a8b22b1
    scripts : update sync Georgi Gerganov 2024-11-16 22:16:04 +02:00
  • 0fff7fd798
    docs : vulkan build instructions to use git bash mingw64 (#10303) FirstTimeEZ 2024-11-17 12:29:18 +13:00
  • 4e54be0ec6
    llama/ex: remove --logdir argument (#10339) Johannes Gäßler 2024-11-16 23:00:41 +01:00
  • db4cfd5dbc
    llamafile : fix include path (#0) Georgi Gerganov 2024-11-16 17:58:56 +02:00
  • 8ee0d09ae6
    make : auto-determine dependencies (#0) Georgi Gerganov 2024-11-16 17:58:32 +02:00
  • bcdb7a2386
    server: (web UI) Add samplers sequence customization (#10255) MaggotHATE 2024-11-16 18:26:54 +05:00
  • f245cc28d4
    scripts : fix missing key in compare-llama-bench.py (#10332) Georgi Gerganov 2024-11-16 10:32:50 +02:00
  • 772703c8ff
    vulkan: Optimize some mat-vec mul quant shaders (#10296) Jeff Bolz 2024-11-16 00:26:57 -06:00
  • dd3a6ce9f8
    vulkan : add cmake preset debug/release (#10306) FirstTimeEZ 2024-11-16 14:59:33 +13:00
  • 1e58ee1318
    ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324) Dan Johansson 2024-11-16 01:53:37 +01:00
  • 89e4caaaf0
    llama : save number of parameters and the size in llama_model (#10286) FirstTimeEZ 2024-11-16 13:42:13 +13:00
  • 74d73dc85c
    Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314) Srihari-mcw 2024-11-16 02:57:00 +05:30
  • 4047be74da
    scripts: update compare-llama-bench.py (#10319) Johannes Gäßler 2024-11-15 21:19:03 +01:00
  • 883d206fbd
    ggml : fix some build issues slaren 2024-11-15 20:20:54 +01:00
  • 09ecbcb596
    cmake : fix ppc64 check (whisper/0) Georgi Gerganov 2024-11-15 15:35:22 +02:00
  • 3225008973
    ggml : vulkan logs (whisper/2547) thewh1teagle 2024-11-15 15:33:53 +02:00
  • cbf5541a82
    sync : ggml Georgi Gerganov 2024-11-15 15:31:16 +02:00
  • 18429220bd
    AVX BF16 and single scale quant optimizations (#10212) Eve 2024-11-15 11:47:58 +00:00
  • f0204a0ec7
    ci: build test musa with cmake (#10298) R0CKSTAR 2024-11-15 19:47:25 +08:00
  • 57f8355b29
    sycl: Update Intel docker images to use DPC++ 2025.0 (#10305) Romain Biessy 2024-11-15 12:10:45 +01:00
  • 9901068ac7
    server : (web UI) add copy button for code block, fix api key (#10242) Xuan Son Nguyen 2024-11-15 05:48:49 -04:00
  • 231f9360d9
    cann: dockerfile and doc adjustment (#10302) Chenguang Li 2024-11-15 15:09:35 +08:00
  • 4802ad350b
    scripts : fix regex in sync [no ci] Georgi Gerganov 2024-11-15 08:38:43 +02:00
  • 5a54af4d4f
    sycl: Use syclcompat::dp4a (#10267) Romain Biessy 2024-11-15 04:09:12 +01:00
  • 1607a5e5b0
    backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921) Charles Xu 2024-11-15 01:28:50 +01:00
  • ae8de6d50a
    ggml : build backends as libraries (#10256) Diego Devesa 2024-11-14 18:04:35 +01:00
  • 4a8ccb37ad
    CUDA: no -sm row for very small matrices (#10185) Johannes Gäßler 2024-11-14 13:00:15 +01:00
  • 2a82891a85
    speculative : fix out-of-bounds access (#10289) Georgi Gerganov 2024-11-14 11:44:15 +02:00
  • af148c9386
    vulkan: Optimize binary ops (#10270) Jeff Bolz 2024-11-13 23:22:55 -06:00
  • 66798e42fb
    vulkan: Use macros to make the mat mul pipeline creation more concise (#10259) Jeff Bolz 2024-11-13 14:59:47 -06:00
  • fb4a0ec083
    llama : propagate the results of graph_compute (#9525) Michael Podvitskiy 2024-11-13 20:00:35 +02:00
  • 5ea926dad7
    sync : ggml Georgi Gerganov 2024-11-13 18:11:54 +02:00
  • 1ee9eea094
    docs : update bindings list (#10261) Small Grass Forest 2024-11-13 19:17:10 +08:00
  • ff7fb670d0
    server : add missing docs (#10269) Alexey Parfenov 2024-11-13 11:16:30 +00:00
  • 0e712a5acb
    server : fix incorrect res in validate_model_chat_template (#10272) Jhen-Jie Hong 2024-11-13 19:15:23 +08:00
  • a0ec17b32e
    metadata: Detailed Dataset Authorship Metadata (#8875) Brian 2024-11-13 21:10:38 +11:00
  • 2e82ffa4af
    sycl : Fixes to broken builds and test-backend-ops (#10257) Alberto Cabrera Pérez 2024-11-13 09:40:57 +00:00
  • 80dd7ff22f
    vulkan: Optimize contiguous copies (#10254) Jeff Bolz 2024-11-13 00:58:57 -06:00
  • 54ef9cfc72
    vulkan: Throttle the number of shader compiles during the build step. (#10222) Jeff Bolz 2024-11-11 11:13:51 -06:00
  • b0cefea58a
    metal : more precise Q*K in FA vec kernel (#10247) Georgi Gerganov 2024-11-11 08:39:13 +02:00
  • b141e5f6ef
    server : enable KV cache defrag by default (#10233) Georgi Gerganov 2024-11-11 08:38:43 +02:00
  • 4b3a9212b6
    flake.lock: Update (#10243) Georgi Gerganov 2024-11-10 21:45:25 +02:00
  • 505f33274d
    server : (web UI) Add back sampler settings (#10239) MaggotHATE 2024-11-11 00:42:25 +05:00
  • 160687b3ed
    vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10226) Jeff Bolz 2024-11-10 05:37:56 -06:00
  • 6423c65aa8
    metal : reorder write loop in mul mat kernel + style (#10231) Georgi Gerganov 2024-11-09 11:53:13 +02:00
  • 39a334a9aa
    metal : fix build and some more comments (#10229) Georgi Gerganov 2024-11-09 11:53:02 +02:00
  • bb38cdd8ba
    metal : fix F32 accumulation in FA vec kernel (#10232) Georgi Gerganov 2024-11-09 11:52:45 +02:00
  • f018acba22
    llama : fix Qwen model type strings Georgi Gerganov 2024-11-09 11:26:34 +02:00
  • 46323fa9ef
    metal : hide debug messages from normal log Georgi Gerganov 2024-11-09 11:21:49 +02:00
  • 5b359bb1e3
    ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213) SXX 2024-11-09 15:35:46 +08:00
  • e89213492d
    ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156) amritahs-ibm 2024-11-09 12:47:50 +05:30
  • 8fc393f246
    scripts : fix pattern and get n_tokens in one go (#10221) haopeng 2024-11-09 15:06:54 +08:00
  • ec450d3bbf
    metal : opt-in compile flag for BF16 (#10218) Georgi Gerganov 2024-11-08 21:59:46 +02:00
  • 695ad752b2
    metal : improve clarity (minor) (#10171) Georgi Gerganov 2024-11-08 18:37:41 +02:00
  • 841f27abdb
    metal : optimize FA kernels (#10171) Georgi Gerganov 2024-11-08 13:47:22 +02:00
  • d05b3127bd
    swift : exclude ggml-metal-embed.metal (#10211) Jhen-Jie Hong 2024-11-08 17:34:06 +08:00
  • 76c6e7f105
    server : minor UI fix (#10207) Xuan Son Nguyen 2024-11-07 18:44:38 -04:00
  • a71d81cf8c
    server : revamp chat UI with vuejs and daisyui (#10175) Xuan Son Nguyen 2024-11-07 17:31:10 -04:00
  • eec4d71737
    scripts : add amx to sync-ggml.sh [no ci] Georgi Gerganov 2024-11-07 23:11:36 +02:00
  • 3b08828674
    sync : ggml Georgi Gerganov 2024-11-07 23:08:24 +02:00
  • a2c6fd747c
    scripts : sync update Georgi Gerganov 2024-11-07 23:07:55 +02:00
  • 97404c4a03
    ggml : add ggml-cpu.h to the public headers (#10204) Diego Devesa 2024-11-07 18:16:08 +01:00
  • 60e17ce23c
    Remove identical wte/etw logic for jais (#10203) Faisal Zaghloul 2024-11-07 11:46:12 -05:00