Commit graph

  • 42158ae2e8
    server : fix first message identification (#13634) Dorin-Andrei Geman 2025-05-21 16:07:57 +03:00
  • 797f2ac062
    kv-cache : simplify the interface (#13660) Georgi Gerganov 2025-05-21 15:11:13 +03:00
  • b44890df2e
    model : disable SWA for Phi models (#13676) Georgi Gerganov 2025-05-21 13:09:21 +03:00
  • 33983057d0
    musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647) R0CKSTAR 2025-05-21 09:58:49 +08:00
  • fb1cab201c
    vulkan: fix warnings (#13626) Eve 2025-05-20 21:35:16 +00:00
  • b7a17463ec
    mtmd-helper : bug fix to token batching in mtmd (#13650) l3utterfly 2025-05-21 00:55:30 +08:00
  • be0239693c
    model : fix llama4 graph (#13663) Georgi Gerganov 2025-05-20 19:21:04 +03:00
  • a4090d1174
    llama : remove llama_kv_cache_view API + remove deprecated (#13653) Georgi Gerganov 2025-05-20 16:13:16 +03:00
  • b69f1647f9
    CUDA: skip fully masked-out KV in FA vec kernel (#13584) Johannes Gäßler 2025-05-20 14:45:07 +02:00
  • 759e37b0d8
    tests : avoid github urls due to throttling (#13654) Sigbjørn Skjæret 2025-05-20 12:03:17 +02:00
  • 4245e622e0
    sycl: disable reorder for sycl mulmat (#13536) Svetlozar Georgiev 2025-05-20 10:34:15 +01:00
  • c9c64dee57
    Set GLM4 blk.*.attn_output.weight, kqv_out-* matmul to GGML_PREC_F32 to fix infinity values in output (#13639) 0cc4m 2025-05-20 10:11:56 +02:00
  • c00a2634be
    metal : fix typo in FA kernel comments (#13651) Georgi Gerganov 2025-05-20 10:41:40 +03:00
  • e298d2fbd0
    kv-cache : add SWA support (#13194) Georgi Gerganov 2025-05-20 08:05:46 +03:00
  • f0adb80bf7
    CANN: Update CANN model support (#13162) Xinpeng Dou 2025-05-20 11:43:43 +08:00
  • f7c9429c85
    sycl : Overcoming workaround for mmap() allocation on Windows (#13482) Nicolò Scipione 2025-05-20 02:54:43 +02:00
  • 1dfbf2cf3a
    common : add load_progress_callback (#13617) psocolovsky 2025-05-19 21:17:36 +02:00
  • 8960efd0a6
    Vulkan: Add f32 accumulator support to quantized mul mat to fix GLM4 32B incoherence (#13607) 0cc4m 2025-05-19 17:54:08 +02:00
  • 725f23f1f3
    sycl : backend documentation review (#13544) Alberto Cabrera Pérez 2025-05-19 14:38:20 +01:00
  • 92ecdcc06a
    mtmd : add vision support for llama 4 (#13282) Xuan-Son Nguyen 2025-05-19 13:04:14 +02:00
  • f71f40a284
    ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532) Alberto Cabrera Pérez 2025-05-19 11:46:09 +01:00
  • d30cb5a7fa
    sync : ggml Georgi Gerganov 2025-05-19 12:50:29 +03:00
  • 6c35981a64
    mnist: fix segmentation fault (ggml/1227) Johannes Gäßler 2025-05-19 09:33:35 +02:00
  • 8b5e19aea6
    ggml : fix apple OS check in ggml_print_backtrace (ggml/1229) Diego Devesa 2025-05-18 18:30:13 -07:00
  • 60aea028b5
    ggml : Fix missing backtrace on Linux (ggml/1228) Daniel Tang 2025-05-17 19:06:26 -04:00
  • 9c55e5c5c2
    fix: check model pointer validity before use (#13631) Nick 2025-05-19 18:25:41 +08:00
  • 33d7aed4a8
    CANN: Support MOE Model MUL_MAT_ID (#13042) Chenguang Li 2025-05-19 14:21:17 +08:00
  • 6a2bc8bfb7
    server : added --no-prefill-assistant flag (#13608) Isaac McFadyen 2025-05-17 17:59:48 -04:00
  • e3a7cf6c5b
    cmake: use the current build config for vulkan-shaders-gen (#13595) Gilad S. 2025-05-17 21:26:43 +03:00
  • 518329b2d4
    parallel : add option for non-shared and larger prompts (#13598) Georgi Gerganov 2025-05-17 12:58:55 +03:00
  • 2f5a4e1e09
    vulkan: move common FA code to flash_attn_base.comp (#13556) Jeff Bolz 2025-05-17 16:14:55 +09:00
  • 4f41ee11d6
    vulkan: use scalar FA rather than coopmat2 when N==1 (#13554) Jeff Bolz 2025-05-17 15:35:47 +09:00
  • 3e0be1cace
    llguidance : official v0.7.20 release (no actual changes) [noci] (#13594) Z 2025-05-16 14:56:28 -06:00
  • 6aa892ec2a
    server : do not return error out of context (with ctx shift disabled) (#13577) Xuan-Son Nguyen 2025-05-16 21:50:00 +02:00
  • aea9f8b4e7
    webui : improve accessibility for visually impaired people (#13551) Xuan-Son Nguyen 2025-05-16 21:49:01 +02:00
  • 06c1e4abc1
    readme : add list of dependencies and their license (#13591) Xuan-Son Nguyen 2025-05-16 20:04:18 +02:00
  • 415e40a357
    releases : use arm version of curl for arm releases (#13592) Diego Devesa 2025-05-16 10:36:51 -07:00
  • 654a67794f
    metal : add FA-vec kernel for head size 64 (#13583) Georgi Gerganov 2025-05-16 20:32:58 +03:00
  • 5364ae4ba5
    llama : print hint when loading a model when no backends are loaded (#13589) Diego Devesa 2025-05-16 07:38:07 -07:00
  • 7c07ac244d
    ci : add ppc64el to build-linux-cross (#13575) Sigbjørn Skjæret 2025-05-16 14:54:23 +02:00
  • 0a338ed013
    sycl : fixed compilation warnings (#13582) Łukasz Ślusarczyk 2025-05-16 12:15:29 +02:00
  • bc098c3cf0
    minja: sync (qwen3) (#13573) Olivier Chafik 2025-05-15 23:29:10 +01:00
  • c6a2c9e741
    gguf : use ggml log system (#13571) Diego Devesa 2025-05-15 10:13:11 -07:00
  • 07ad2b6db3
    gguf-py : fix disconnect-before-connect in editor-gui (#13569) Daniel Tang 2025-05-15 12:47:10 -04:00
  • c531edfa34
    convert : fix conversion for llama 4 (#13567) Xuan-Son Nguyen 2025-05-15 17:40:07 +02:00
  • 02cdd2d8b0
    sycl: simplify bin_bcast_kernel (#13383) Atharva Dubey 2025-05-15 16:39:52 +01:00
  • 64bb51cf90
    sycl: reordered Q4_K MMVQ (#13109) Svetlozar Georgiev 2025-05-15 16:35:44 +01:00
  • 9c404ed54c
    sycl: use oneDNN for matrices multiplication (#12972) Łukasz Ślusarczyk 2025-05-15 16:53:41 +02:00
  • 6c8b91500e
    llama-bench : fix -ot with dl backends (#13563) Diego Devesa 2025-05-15 06:46:55 -07:00
  • 3cc1f1f1d2
    webui : handle PDF input (as text or image) + convert pasted long content to file (#13562) Xuan-Son Nguyen 2025-05-15 14:24:50 +02:00
  • c753d7bed0
    server : proper error handling for missing elements in messages array (OpenAI compatible backend) (#13540) Piotr Wilkin (ilintar) 2025-05-15 08:40:58 +02:00
  • b2838049cc
    bench : handle decode errors (#13548) Georgi Gerganov 2025-05-15 05:57:02 +03:00
  • aa48e373f2
    server: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802) Olivier Chafik 2025-05-15 02:39:51 +01:00
  • e3a9421b78
    kv-cache : fix out-of-bounds view during reserve graph (#13547) Georgi Gerganov 2025-05-14 23:15:15 +03:00
  • 5ab5d5fb25
    arm64: optimize q6_k_q8_k kernel with i8mm (#13519) Yibo Cai 2025-05-15 03:53:52 +08:00
  • 3198405e98
    common: add partial regex support (#12808) Olivier Chafik 2025-05-14 19:50:57 +01:00
  • f5170c1d7a
    editorconfig : fix trailing whitespace from #13542 (#13546) Sigbjørn Skjæret 2025-05-14 20:22:49 +02:00
  • 017f10b5fa
    fix: crash when calling llama_state_get_size on a context without a KV cache (#13542) Gilad S. 2025-05-14 19:18:18 +03:00
  • 4696d56749
    CUDA: fix crash on large batch size for quant. MoE (#13537) Johannes Gäßler 2025-05-14 16:41:02 +02:00
  • b7d2672082
    llama : fix quantize with dl backends (#13539) Diego Devesa 2025-05-14 07:12:36 -07:00
  • 6da34fa276
    CUDA: faster Deepseek FA, add Turing support (#13435) Johannes Gäßler 2025-05-14 16:08:20 +02:00
  • 5e7d95e22e
    fix: Move build_inp_pos to the top of the graph section for build_granite (#13538) Gabe Goodhart 2025-05-14 06:53:59 -06:00
  • 053174436f
    server : passthrough the /models endpoint during loading (#13535) Georgi Gerganov 2025-05-14 15:42:10 +03:00
  • 360a9c98e1
    server : fix cache_tokens bug with no cache_prompt (#13533) Xuan-Son Nguyen 2025-05-14 13:35:07 +02:00
  • 09d13d94fb
    cmake: simplify vulkan shader test logic (#13263) bandoti 2025-05-14 07:53:57 -03:00
  • 24e86cae72
    vulkan: KHR_coopmat flash attention (#13506) Jeff Bolz 2025-05-14 18:55:26 +09:00
  • bb1681fbd5
    webui : use fflate for more deterministic gzip compress (#13525) Xuan-Son Nguyen 2025-05-14 10:26:12 +02:00
  • d486dd3e8e
    webui: Allow pasting file from clipboard (#13526) Luca Stefani 2025-05-14 10:07:31 +02:00
  • 21ca987fba
    docs: Update link to ggml-org in multimodal.md (#13513) ddpasa 2025-05-14 09:59:12 +02:00
  • be1d4a13db
    scripts : fix compare-llama-bench.py show parameter (#13514) Sigbjørn Skjæret 2025-05-14 08:41:01 +02:00
  • ab3971f2a0
    vulkan: workaround FA compile failures on macos (#13517) Jeff Bolz 2025-05-14 13:15:50 +09:00
  • e5c834f718
    quantize : improve tensor-type pattern matching (#13033) Ed Addario 2025-05-13 18:12:31 +01:00
  • 71bdbdb587
    clip : clip.h become private API (⚠️ breaking change) (#13510) Xuan-Son Nguyen 2025-05-13 17:07:21 +02:00
  • f0995d28ce
    metal : use FA-vec kernel up to batch size 20 (#13496) Georgi Gerganov 2025-05-13 18:04:39 +03:00
  • c252e0c409
    metal : optimize multi-sequence FA vec kernel (#13493) Georgi Gerganov 2025-05-13 18:04:00 +03:00
  • 4f711afed5
    ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509) Dan Johansson 2025-05-13 17:02:28 +02:00
  • b89d605a91
    batched-bench : fix pp batch contents (#13492) Georgi Gerganov 2025-05-13 18:01:53 +03:00
  • b4726345ac
    mtmd : remove libllava, remove clip-quantize-cli (⚠️ breaking change) (#13460) Xuan-Son Nguyen 2025-05-13 15:33:58 +02:00
  • bf79371120
    scripts : support arbitrary input file formats in compare-llama-bench.py (#13455) Sigbjørn Skjæret 2025-05-13 15:31:12 +02:00
  • d590cd4c24
    model : Granite MoE shared (#13269) Gabe Goodhart 2025-05-13 07:12:01 -06:00
  • 1e2809bc4b
    sync : ggml Georgi Gerganov 2025-05-13 14:01:45 +03:00
  • cf0a43bb64
    llama-bench : add defrag-thold, check for invalid ranges (#13487) Diego Devesa 2025-05-12 15:31:37 -07:00
  • f0d46ef157
    opencl: remove unnecessary assert for add (#13257) lhez 2025-05-12 13:13:49 -07:00
  • de4c07f937
    clip : cap max image size 1024 for qwen vl model (#13478) Xuan-Son Nguyen 2025-05-12 15:06:51 +02:00
  • 10d2af0eaa
    llama/ggml: add LLM training support (#10544) Johannes Gäßler 2025-05-12 14:44:49 +02:00
  • 064cc596ac
    context : fix state io for memory-less contexts (#13470) Georgi Gerganov 2025-05-12 15:12:27 +03:00
  • 91159ee9df
    server : allow content to be null in oaicompat_completion_params_parse (#13477) Anudit Nagar 2025-05-12 18:56:42 +07:00
  • 22cdab343b
    llama-bench : accept ranges for integer parameters (#13410) Diego Devesa 2025-05-12 13:08:22 +02:00
  • a71a4075cd
    ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053) Dan Johansson 2025-05-12 13:06:19 +02:00
  • 95e18884fc
    CUDA: fix misaligned synchronization in FA (#13469) Johannes Gäßler 2025-05-12 10:51:21 +02:00
  • df8491922f
    ggml : add mrope kernel for metal (#13457) Xuan-Son Nguyen 2025-05-12 10:29:13 +02:00
  • 14492144c2
    enable dpcpp nightly builds with libraries (#13406) Atharva Dubey 2025-05-12 06:15:32 +01:00
  • c104023994
    mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459) City 2025-05-12 00:39:06 +02:00
  • 9a390c4829
    tools : fix uninitialized llama_batch in server (#13436) Anthony Umfer 2025-05-11 11:08:26 -04:00
  • 09232370fc
    scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451) Sigbjørn Skjæret 2025-05-11 16:20:39 +02:00
  • 7474e00b34
    CUDA: fix crash with partial offloading of MoE (#13439) Johannes Gäßler 2025-05-11 16:09:33 +02:00
  • 7f323a589f
    Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386) David Huang 2025-05-11 20:18:39 +08:00
  • 3eac209319
    mtmd : support InternVL 3 38B and 78B mmproj (#13443) City 2025-05-11 11:35:52 +02:00
  • a634d75d1b
    mtmd : move helpers to dedicated file (#13442) Xuan-Son Nguyen 2025-05-11 11:34:23 +02:00
  • 62d4250e52
    docs : Fix typo in InternVL3 model name (#13440) Thomas Germer 2025-05-10 22:26:46 +02:00