Commit graph

  • 80f19b4186
    opencl: split ggml-opencl.cl into multiple files and cleanup (#12886) lhez 2025-04-15 12:26:00 -07:00
  • f8f820cc4d
    metal : add FA-vec kernels for head size 96 (#12952) Georgi Gerganov 2025-04-15 14:45:05 +03:00
  • 54a7272043
    CANN: Add x86 build ci (#12950) hipudding 2025-04-15 19:08:55 +08:00
  • 84778e9770
    CUDA/HIP: Share the same unified memory allocation logic. (#12934) David Huang 2025-04-15 17:20:38 +08:00
  • 510676475f
    SYCL: Add ROPE vision kernel (#12887) Akarshan Biswas 2025-04-15 14:07:42 +05:30
  • daa422881a
    llama : DeepSeek V2/V3 MLA implementation (#12801) Juk Armstrong 2025-04-15 07:49:57 +01:00
  • eccc7a1602
    ggml : Add AVX512 implementation of GEMM - Q4_Kx8 (#12829) Srihari-mcw 2025-04-15 11:52:36 +05:30
  • 0019279bb5
    CANN: Opt ROPE optimization (#12865) Chenguang Li 2025-04-15 10:09:35 +08:00
  • b0c75ac9f9
    CANN: Optimize CANN buffer pool memory management (#12875) Xinpeng Dou 2025-04-15 10:04:24 +08:00
  • d6d2c2ab8c
    Add performance print for gemma3 in example (#12929) Russyyds 2025-04-15 01:18:20 +08:00
  • 75afa0ae31
    SYCL: Fix im2col (#12910) Akarshan Biswas 2025-04-14 17:53:53 +05:30
  • c772d54926
    rpc : use ggml_context_ptr (#12938) Radoslav Gerganov 2025-04-14 13:59:34 +03:00
  • 81c7e64fc2
    disable curl lib check, this action is missed by commit bd3f59f812 (#12761) (#12937) Neo Zhang Jianyu 2025-04-14 18:19:07 +08:00
  • 526739b879
    sync : ggml Georgi Gerganov 2025-04-14 08:52:10 +03:00
  • a25355e264
    cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal when running test-backend-ops with only the CPU backend (ggml/1190) cmdr2 2025-04-11 12:14:19 +05:30
  • e959d32b1c
    ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register (#12773) SXX 2025-04-14 13:47:55 +08:00
  • 307bfa253d
    ggml: disable CUDA graphs for unsupported DUP and CONT node types (#12891) Alan Gray 2025-04-13 22:12:21 +01:00
  • 71e90e8813
    quantize: Handle user-defined quantization levels for additional tensors (#12511) Ed Addario 2025-04-13 19:29:28 +01:00
  • bc091a4dc5
    common : Define cache directory on AIX (#12915) Prajwal B Mehendarkar 2025-04-12 21:03:39 +05:30
  • a4837577aa
    vulkan: use aligned loads for flash attention mask (#12853) Jeff Bolz 2025-04-12 03:44:48 -05:00
  • e59ea539b8
    llava: Fix cpu-only clip image encoding segfault (#12907) Matt Clayton 2025-04-12 01:29:03 -04:00
  • c94085df28
    server : add VSCode's GitHub Copilot Chat support (#12896) Georgi Gerganov 2025-04-11 23:37:41 +03:00
  • e8a62631b3
    rpc : Set cache directory in rpc-server.cpp on FreeBSD (#12903) yuri@FreeBSD 2025-04-11 13:04:14 -07:00
  • b6930ebc42
    tool-call: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates (#12900) Olivier Chafik 2025-04-11 12:47:52 -07:00
  • 68b08f36d0
    common : Define cache directory on FreeBSD (#12892) yuri@FreeBSD 2025-04-11 12:45:44 -07:00
  • 578754b315
    sycl: Support sycl_ext_oneapi_limited_graph (#12873) Ewan Crawford 2025-04-11 15:32:14 +02:00
  • b2034c2b55
    contrib: support modelscope community (#12664) tastelikefeet 2025-04-11 20:01:56 +08:00
  • 06bb53ad9b
    llama-model : add Glm4Model implementation for GLM-4-0414 (#12867) Yuxuan Zhang 2025-04-11 18:10:10 +08:00
  • 0c50923944
    clip : use smart pointer (⚠️ breaking change) (#12869) Xuan-Son Nguyen 2025-04-11 12:09:39 +02:00
  • fccf9cae83
    SYCL: Add fp16 type support to unary op kernels (#12788) Akarshan Biswas 2025-04-11 13:33:50 +05:30
  • ec6c09d0fa
    convert : Llama4 RoPE fix (#12889) Daniel Han 2025-04-11 00:49:09 -07:00
  • 8ac9f5d765
    ci : Replace freediskspace to free_disk_space in docker.yml (#12861) R0CKSTAR 2025-04-11 15:26:17 +08:00
  • 12e9158f25
    xcf : add check for visionos build version (#12854) Daniel Bevenius 2025-04-11 09:24:34 +02:00
  • 5b1f13cb64
    convert : proper tensor name mapping for llama4 (#12870) Xuan-Son Nguyen 2025-04-11 09:23:37 +02:00
  • 8b91d5355a
    llama : correct rms norm for llama 4 (#12882) Xuan-Son Nguyen 2025-04-11 08:49:50 +02:00
  • 0fed24c347
    ggml: fix compilation error s390x (#12848) Aaron Teo 2025-04-11 13:20:07 +08:00
  • 47ba87d0a4
    sync : ggml Georgi Gerganov 2025-04-11 00:08:23 +03:00
  • 1d2b613445
    tests : fix init order (#0) Georgi Gerganov 2025-04-11 00:04:25 +03:00
  • eb420e1148
    sync : ggml Georgi Gerganov 2025-04-10 23:59:16 +03:00
  • cb79c2e7fa
    ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187) cmdr2 2025-04-10 17:53:08 +05:30
  • fe92821ea9
    ggml : add bilinear upscale support (ggml/1185) Diego Devesa 2025-04-09 12:32:13 +02:00
  • 459895c326
    ggml : add more generic custom op, remove deprecated custom ops (ggml/1183) Diego Devesa 2025-04-09 12:31:34 +02:00
  • e4bf72d631
    scripts : fix sync-ggml-am.sh Georgi Gerganov 2025-04-10 23:59:01 +03:00
  • 8b9cc7cdd8
    llava : introduce libmtmd (#12849) Xuan-Son Nguyen 2025-04-10 22:57:16 +02:00
  • 64eda5deb9
    convert : ability to lazy-load safetensors remotely without downloading to disk (#12820) Xuan-Son Nguyen 2025-04-10 17:24:44 +02:00
  • fe5b78c896
    CANN: Support more ops (#12841) Chenguang Li 2025-04-10 08:51:52 +08:00
  • 11d07e1e69
    Fixes #12823 (#12830) Prajwal B Mehendarkar 2025-04-10 04:48:01 +05:30
  • b0091ecc1e
    docker : added all CPU to GPU images (#12749) Rudi Servo 2025-04-09 23:17:12 +00:00
  • 31f7803bc4
    ggml-cpu-impl.h: do not redefine bool on POWER9 (#12856) Piotr Kubaj 2025-04-09 23:00:34 +00:00
  • 2391506ace
    ggml-impl.h: fix build on POWER9 (#12855) Piotr Kubaj 2025-04-09 23:00:25 +00:00
  • d3bd7193ba
    llama : Support Qwen3 and Qwen3MoE (#12828) Bo Zheng 2025-04-09 17:47:36 +08:00
  • d9a63b2f2e
    musa: enable freediskspace for docker image build (#12839) R0CKSTAR 2025-04-09 17:22:30 +08:00
  • 8ed71242f4
    sycl: update documentation to use -no-cnv (#12845) Romain Biessy 2025-04-09 11:22:04 +02:00
  • 381603a775
    ci: detach common from the library (#12827) Plamen Minev 2025-04-09 11:11:11 +03:00
  • 65a69e6e1b
    clip : do not print ftype (#12832) Xuan-Son Nguyen 2025-04-09 10:09:53 +02:00
  • 47277d6d1d
    readme : add rpc backend (#12842) Georgi Gerganov 2025-04-09 10:54:42 +03:00
  • 6e1c4cebdb
    CANN: Support Opt CONV_TRANSPOSE_1D and ELU (#12786) Chenguang Li 2025-04-09 14:04:14 +08:00
  • 0090950f67
    vulkan: In coopmat2 mmq, load q4_k/q5_k scales through shared memory (#12833) Jeff Bolz 2025-04-09 00:25:08 -05:00
  • 7ecd780b1a
    vulkan: Use fp16 for the flash attention P*V multiplication (#12783) Jeff Bolz 2025-04-09 00:12:57 -05:00
  • 7538246e7c
    cuda : add f32 to bf16 copy op (#12806) Sigbjørn Skjæret 2025-04-08 23:21:31 +02:00
  • b32efad2bc
    llava: improve clip_ctx destructor to not memleak load_image_size (#12834) Matt Clayton 2025-04-08 16:01:58 -04:00
  • a19b5cef16
    llama : fix FA when KV cache is not used (i.e. embeddings) (#12825) Georgi Gerganov 2025-04-08 19:54:51 +03:00
  • 78a1ba0a4f
    server : fix thread.join() on exit (#12831) Xuan-Son Nguyen 2025-04-08 18:37:06 +02:00
  • 2dabf759e7
    llava: add more helper functions to check projector types in clip context (#12824) dm4 2025-04-08 21:49:13 +08:00
  • 1d343b4069
    arg : Including limits file on AIX (#12822) Prajwal B Mehendarkar 2025-04-08 18:00:59 +05:30
  • 8ca6e1c3a4
    server : webui : Improve Chat Input with Auto-Sizing Textarea (#12785) characharm 2025-04-08 14:14:59 +05:00
  • 656babd6c2
    Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" (#12812) Neo Zhang Jianyu 2025-04-08 15:03:21 +08:00
  • a226bc7a9a
    gguf-py : support lazy tensor splitting (#12809) compilade 2025-04-08 03:03:07 -04:00
  • 1466621e73
    llama : Support llama 4 text-only (#12791) Xuan-Son Nguyen 2025-04-07 23:06:44 +02:00
  • 82974011f3
    opencl: better identify Adreno GPU (#12760) lhez 2025-04-07 13:22:54 -07:00
  • 4ccea213bc
    hellaswag: display estimated score confidence interval (#12797) stduhpf 2025-04-07 17:47:08 +02:00
  • 1a1ab7e7a4
    cuda : fix HIP and MUSA BF16 (#0) Georgi Gerganov 2025-04-07 13:18:07 +03:00
  • a4e46e28f9
    sync : ggml Georgi Gerganov 2025-04-07 12:32:39 +03:00
  • ff067dbcb9
    ggml : simplify Arm fp16 CPU logic (ggml/1177) Georgi Gerganov 2025-04-07 12:25:15 +03:00
  • 36ca8b3628
    CUDA: don't convert BF16 weights to FP32 (ggml/1174) Sigbjørn Skjæret 2025-04-04 21:05:12 +02:00
  • 995083e4ed
    cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167) cmdr2 2025-04-02 17:46:16 +05:30
  • 518a01480e
    sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor (#12734) zhouwg 2025-04-07 23:22:57 +08:00
  • e391d3ee8d
    ci : no curl on ggml-ci (#12796) Xuan-Son Nguyen 2025-04-07 14:37:28 +02:00
  • bd3f59f812
    cmake : enable curl by default (#12761) Xuan-Son Nguyen 2025-04-07 13:35:19 +02:00
  • 52b3d71f12
    CANN: fix typo in ggml-cann (#12733) zhouwg 2025-04-07 19:34:14 +08:00
  • d0d5b2232b
    CANN: Refactor to reduce duplicate code (#12731) hipudding 2025-04-07 17:10:36 +08:00
  • 916c83bfe7
    musa: fix compilation warnings in mp_22/31 (#12780) R0CKSTAR 2025-04-06 21:23:54 +08:00
  • 0c74b04376
    vulkan: fix NaN issue in flash attention shader (#12776) Jeff Bolz 2025-04-06 04:03:47 -05:00
  • 80b717d493
    vulkan: Use unclamped loads for flash attention mask (#12720) Jeff Bolz 2025-04-06 03:47:13 -05:00
  • 6bf28f0111
    Vulkan: Tune Vulkan mmq int dot shader for performance (#12767) 0cc4m 2025-04-05 18:04:03 +02:00
  • f1e3eb4249
    common : fix includes in arg.cpp and gemma3-cli.cpp (#12766) Sergey Fedorov 2025-04-05 23:46:00 +08:00
  • 0364178ca2
    clip : refactor clip_init, add tests (#12757) Xuan-Son Nguyen 2025-04-05 17:17:40 +02:00
  • c6ff5d2a8d
    common: custom hf endpoint support (#12769) エシュナヴァリシア 2025-04-05 21:31:42 +08:00
  • 7a84777f42
    sync: minja (#12739) Olivier Chafik 2025-04-04 13:16:39 -07:00
  • 3e1d29348b
    kv-cache : simplify + fix warning for recurrent models (#12756) Georgi Gerganov 2025-04-04 21:48:10 +03:00
  • 1be76e4620
    ci: add Linux cross-compile build (#12428) bandoti 2025-04-04 14:05:12 -03:00
  • b772394297
    server : webui : Upgrade daisyui, tailwindcss. (#12735) Nauful Shaikh 2025-04-04 09:09:52 -05:00
  • 23106f94ea
    gguf-split : --merge now respects --dry-run option (#12681) nick huang 2025-04-04 22:09:12 +08:00
  • 94148ba330
    sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution (#12625) Nicolò Scipione 2025-04-04 16:00:46 +02:00
  • 9ac4d611d0
    cmake: fix ggml-shaders-gen compiler paths containing spaces (#12747) Ronny Brendel 2025-04-04 15:12:40 +02:00
  • 348888e0dc
    docs : add XCFramework section to README.md [no ci] (#12746) Daniel Bevenius 2025-04-04 10:24:12 +02:00
  • 74d4f5b041
    vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency (#12630) Jeff Bolz 2025-04-04 00:54:35 -05:00
  • 35e592eb30
    vulkan: set cmake minimum and project name in vulkan-shaders (#12744) Jeff Bolz 2025-04-04 00:53:20 -05:00
  • 7d7b1bafa7
    opencl: update doc for OpenCL (#12702) lhez 2025-04-03 22:18:17 -07:00
  • c262beddf2
    CUDA: Prefer vector flash decoding kernel for Gemma models (#12738) Gaurav Garg 2025-04-03 21:50:29 +05:30