Commit graph

  • 0208355f42
    CUDA: fix race conditions in FlashAttention kernels (#13438) Johannes Gäßler 2025-05-10 22:22:48 +02:00
  • d2a4ef05c6
    vocab : add ByteDance-Seed/Seed-Coder (#13423) Sigbjørn Skjæret 2025-05-10 22:08:07 +02:00
  • 15e6125a39
    mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434) Xuan-Son Nguyen 2025-05-10 19:57:54 +02:00
  • 3b24d26c22
    server : update docs (#13432) Xuan-Son Nguyen 2025-05-10 18:44:49 +02:00
  • 43dfd741a5
    llguidance : set tokenizer slices to default (#13424) Sigbjørn Skjæret 2025-05-10 17:19:52 +02:00
  • b064a51a4e
    ci: free_disk_space flag enabled for intel variant (#13426) Thammachart Chinvarapon 2025-05-10 21:34:48 +07:00
  • 053367d149
    mtmd : support InternVL 2.5 and 3 (#13422) Xuan-Son Nguyen 2025-05-10 16:26:42 +02:00
  • d8919424f1
    CUDA: fix FlashAttention on Turing (#13415) Johannes Gäßler 2025-05-10 09:16:52 +02:00
  • 7fef11766c
    arg : add env var to control mmproj (#13416) Xuan-Son Nguyen 2025-05-10 08:16:29 +02:00
  • dc1d2adfc0
    vulkan: scalar flash attention implementation (#13324) Jeff Bolz 2025-05-09 23:07:07 -07:00
  • 7c28a74e07
    chore(llguidance): use tagged version that does not break the build (#13413) Helton Reis 2025-05-09 17:15:39 -03:00
  • 33eff40240
    server : vision support via libmtmd (#12898) Xuan-Son Nguyen 2025-05-09 19:29:37 +02:00
  • 17512a94d6
    sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858) Alberto Cabrera Pérez 2025-05-09 16:34:08 +01:00
  • 611aa914ef
    metal : optimize MoE for large batches (#13388) Georgi Gerganov 2025-05-09 15:14:56 +03:00
  • 0cf6725e9f
    CUDA: FA support for Deepseek (Ampere or newer) (#13306) Johannes Gäßler 2025-05-09 13:34:58 +02:00
  • 27ebfcacba
    llama : do not crash if there is no CPU backend (#13395) Diego Devesa 2025-05-09 13:02:07 +02:00
  • 5c86c9ed3e
    CUDA: fix crash on large batch size for MoE models (#13384) Johannes Gäßler 2025-05-09 12:14:04 +02:00
  • efb8b47eda
    imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389) Bartowski 2025-05-09 05:53:58 -04:00
  • 0527771dd8
    llama-run: add support for downloading models from ModelScope (#13370) R0CKSTAR 2025-05-09 17:25:50 +08:00
  • 2189fd3b63
    mtmd : fix batch_view for m-rope (#13397) Xuan-Son Nguyen 2025-05-09 11:18:02 +02:00
  • 3f96aeff39
    llama : one-off chat template fix for Mistral-Small-2503 (#13398) Xuan-Son Nguyen 2025-05-09 11:17:51 +02:00
  • b486ba05bf
    rpc : add rpc_msg_set_tensor_hash_req (#13353) Radoslav Gerganov 2025-05-09 10:31:07 +03:00
  • 02115dcd9a
    vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326) Jeff Bolz 2025-05-09 02:23:41 -05:00
  • d9c4accaff
    server : (webui) rename has_multimodal --> modalities (#13393) Xuan-Son Nguyen 2025-05-09 09:06:37 +02:00
  • 15e03282bb
    ci : limit write permission to only the release step + fixes (#13392) Diego Devesa 2025-05-08 23:45:22 +02:00
  • f05a6d71a0
    mtmd : Expose helper_decode_image_chunk (#13366) Matt Clayton 2025-05-08 14:25:39 -04:00
  • ee01d71e58
    server : (webui) fix a very small misalignment (#13387) Xuan-Son Nguyen 2025-05-08 18:51:45 +02:00
  • 8c83449cb7
    server : (webui) revamp the input area, plus many small UI improvements (#13365) Xuan-Son Nguyen 2025-05-08 15:37:29 +02:00
  • 1a844be132
    convert : support rope_scaling type and rope_type (#13349) Sigbjørn Skjæret 2025-05-08 15:34:29 +02:00
  • 0ccc121354
    mtmd : fix the calculation of n_tokens for smolvlm (#13381) welix 2025-05-08 22:03:53 +09:00
  • 6562e5a4d6
    context : allow cache-less context for embeddings (#13108) Georgi Gerganov 2025-05-08 14:28:33 +03:00
  • 51fb96b1ff
    context : remove logits_all flag (#13284) Georgi Gerganov 2025-05-08 14:26:50 +03:00
  • 70a6991edf
    ci : move release workflow to a separate file (#13362) Diego Devesa 2025-05-08 13:15:28 +02:00
  • f061021206
    llama : print size and type of overridden tensors (#13364) Diego Devesa 2025-05-08 13:15:15 +02:00
  • 8733e0cf6e
    sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343) Alberto Cabrera Pérez 2025-05-08 10:08:01 +01:00
  • 814f795e06
    docker : disable arm64 and intel images (#13356) Diego Devesa 2025-05-07 16:36:33 +02:00
  • d879433824
    sync : ggml Georgi Gerganov 2025-05-07 16:39:36 +03:00
  • 13b0a04597
    whisper: remove MSVC warnings pragmas (whisper/3090) Daniel Bevenius 2025-05-05 13:09:35 +02:00
  • bba9d945c1
    cmake : removed stdc++fs (whisper/3097) Jared Tweed 2025-05-02 02:41:35 -07:00
  • bc4e1128f7
    llama : deci : support ffn-free with attention (#13296) Sigbjørn Skjæret 2025-05-07 12:49:27 +02:00
  • 39e73ae0d6
    common : Add a warning when we can't match samplers from a string or char. (#13330) Ycros 2025-05-07 18:23:28 +10:00
  • 1f73301b63
    cuda : remove nrows_x in mul_mat_q_process_tile (#13325) R0CKSTAR 2025-05-07 15:48:23 +08:00
  • 4773d7a02f
    examples : remove infill (#13283) Georgi Gerganov 2025-05-07 10:28:02 +03:00
  • 6c7fd67b64
    llama : support tie embedding for chatglm models (#13328) piDack 2025-05-07 15:23:11 +08:00
  • 141a908a59
    CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135) Johannes Gäßler 2025-05-06 23:35:51 +02:00
  • 32916a4907
    clip : refactor graph builder (#13321) Xuan-Son Nguyen 2025-05-06 22:40:24 +02:00
  • ffc727203a
    sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345) DocShotgun 2025-05-06 13:36:24 -07:00
  • 91a86a6f35
    sampling : don't consider -infinity values in top_n_sigma (#13344) oobabooga 2025-05-06 15:24:15 -03:00
  • f4ed10b69c
    cmake : remove arm64 msvc presets (#13342) Diego Devesa 2025-05-06 20:15:31 +02:00
  • 1e333d5bba
    SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (#13254) Akarshan Biswas 2025-05-06 20:27:06 +05:30
  • 2f54e348ad
    llama : fix build_ffn without gate (#13336) Xuan-Son Nguyen 2025-05-06 14:25:40 +02:00
  • 2356fb1d53
    CUDA: fix bad asserts for partial offload (#13337) Johannes Gäßler 2025-05-06 13:58:51 +02:00
  • 764b85627b
    convert : qwen2/3moe : set yarn metadata if present (#13331) Sigbjørn Skjæret 2025-05-06 11:12:06 +02:00
  • 15a28ec8c7
    CUDA: fix --split-mode row for MMQ (#13323) Johannes Gäßler 2025-05-06 08:36:46 +02:00
  • a7366faa5b
    gguf-py : avoid requiring pyside6 for other scripts (#13036) compilade 2025-05-05 22:27:31 -04:00
  • 9070365020
    CUDA: fix logic for clearing padding with -ngl 0 (#13320) Johannes Gäßler 2025-05-05 22:32:13 +02:00
  • 233461f812
    sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264) oobabooga 2025-05-05 17:12:19 -03:00
  • b34c859146
    server : Webui - change setText command from parent window to also send the message. (#13309) igardev 2025-05-05 17:03:31 +03:00
  • 9b61acf060
    mtmd : rename llava directory to mtmd (#13311) Xuan-Son Nguyen 2025-05-05 16:02:55 +02:00
  • 5215b91e93
    clip : fix confused naming ffn_up and ffn_down (#13290) Xuan-Son Nguyen 2025-05-05 12:54:44 +02:00
  • ae803bfc3d
    convert : bailingmoe : set yarn metadata if present (#13312) Sigbjørn Skjæret 2025-05-05 12:34:26 +02:00
  • 66645a5285
    SYCL: Disable mul_mat kernels for noncontiguous tensor b (#13308) Akarshan Biswas 2025-05-05 13:39:10 +05:30
  • 27aa259532
    mtmd : add C public API (#13184) Xuan-Son Nguyen 2025-05-04 23:43:42 +02:00
  • 9fdfcdaedd
    rpc : use backend registry, support dl backends (#13304) Diego Devesa 2025-05-04 21:25:43 +02:00
  • 6eb7d25c70
    ggml : activate s390x simd for Q3_K (#13301) Aaron Teo 2025-05-05 01:49:12 +08:00
  • 86bd60d3fe
    llava/mtmd : fixes to fully support dl backends (#13303) Diego Devesa 2025-05-04 17:05:20 +02:00
  • 9f2da5871f
    llama : build windows releases with dl backends (#13220) Diego Devesa 2025-05-04 14:20:49 +02:00
  • 93c4e23905
    CUDA: fix race condition in MMQ stream-k fixup (#13299) Johannes Gäßler 2025-05-04 14:16:39 +02:00
  • 8afbd96818
    CUDA: fix race condition in MMQ ids_dst (#13294) Johannes Gäßler 2025-05-04 13:58:38 +02:00
  • 8ae5ebcf85
    vulkan: Additional type support for unary, binary, and copy (#13266) Jeff Bolz 2025-05-04 00:17:16 -05:00
  • 3e959f0976
    imatrix: fix oob writes if src1 is not contiguous (#13286) Johannes Gäßler 2025-05-04 00:50:37 +02:00
  • 36667c8edc
    clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change) (#13259) Xuan-Son Nguyen 2025-05-03 20:07:54 +02:00
  • 3bf785f3ef
    llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843) ymcki 2025-05-03 23:39:51 +08:00
  • 1d36b3670b
    llama : move end-user examples to tools directory (#13249) Diego Devesa 2025-05-02 20:27:13 +02:00
  • b34443923c
    sync : ggml (#13268) Georgi Gerganov 2025-05-02 20:54:30 +03:00
  • a75cb30dc9
    context : fix reorder logic (#13267) Georgi Gerganov 2025-05-02 20:54:13 +03:00
  • 3f3769ba76
    ggml : Enable MMA for BF16 in llamafile_sgemm (#13148) shalinib-ibm 2025-05-02 22:23:12 +05:30
  • 2f567611c0
    llama-model : support Qwen2 embedding models and pooling_mode_lasttoken (#13245) Jared Van Bortel 2025-05-02 11:42:30 -04:00
  • 7d2123484e
    convert : use correct context length for nomic-embed-text-v2 (#13216) Jared Van Bortel 2025-05-02 11:41:54 -04:00
  • 074e42ab31
    convert : converting mmproj for Qwen2/2.5VL from convert_hf_to_gguf (#13209) Xuan-Son Nguyen 2025-05-02 17:17:15 +02:00
  • c642bc014c
    kv-cache : separate recurrent vs non-recurrent impl (#12799) Georgi Gerganov 2025-05-02 17:48:36 +03:00
  • cb06a3c363
    llama : orion rope type is neox (#13261) Sigbjørn Skjæret 2025-05-02 12:44:24 +02:00
  • 626083faf7
    llama : plamo rope type is neox (#13260) Sigbjørn Skjæret 2025-05-02 12:40:56 +02:00
  • 2af6880178
    llama-chat : reset glmedge chat template (#13253) piDack 2025-05-02 17:06:09 +08:00
  • e84773ab60
    mtmd-cli : fix out_of_range when input image path is empty (#13244) Shakil Ahmed 2025-05-02 14:20:27 +06:00
  • fab647e884
    server : add cache reuse card link to help (#13230) Georgi Gerganov 2025-05-02 09:48:31 +03:00
  • dcf886007d
    convert : explicitly disable trust_remote_code for AutoConfig (#13246) Xuan-Son Nguyen 2025-05-02 08:45:10 +02:00
  • d24d592808
    ci: fix cross-compile sync issues (#12804) bandoti 2025-05-01 19:06:39 -03:00
  • 8efbdadc61
    rpc : avoid uninitialized memory in serialize_tensor (#13210) Justin Santa Barbara 2025-05-01 17:32:11 -04:00
  • f057808ffa
    ggml: Don't assert fail when tensor data changes (#13222) Jesse Gross 2025-05-01 13:46:10 -07:00
  • d7a14c42a1
    build : fix build info on windows (#13239) Diego Devesa 2025-05-01 21:48:08 +02:00
  • b6e4ff69b8
    clip : (minicpmv) Re-enable upscaling of images smaller than the CLIP image size (#13237) Loïc Carrère 2025-05-01 21:32:21 +02:00
  • e0f572c846
    llama-chat : update GLM4 chat template (#13238) matteo 2025-05-01 21:16:38 +02:00
  • 79f26e9e12
    vulkan: Add bfloat16 support (#12554) Jeff Bolz 2025-05-01 13:49:39 -05:00
  • fc727bcdd5
    vulkan: Handle src1 batch dimension in non-contiguous mat-vec-mul shader (#13191) Jeff Bolz 2025-05-01 13:19:31 -05:00
  • b0ecbd434b
    test: non-cont. b in test-backend-ops -o MUL_MAT (#13187) Johannes Gäßler 2025-05-01 20:18:56 +02:00
  • b1dd4d08e8
    sync : ggml Georgi Gerganov 2025-05-01 17:07:13 +03:00
  • 99881f77d8
    whisper : add check that target name exists (whisper/3103) Daniel Bevenius 2025-05-01 10:05:24 +02:00
  • b5769d92b4
    ggml : suppress Windows compiler warnings (whisper/3075) Daniel Bevenius 2025-04-29 15:47:55 +02:00
  • 8936784f7a
    mtmd : add **vision** support for Mistral Small 3.1 (#13231) Xuan-Son Nguyen 2025-05-01 17:05:42 +02:00