Commit graph

  • 5dd5d1ab00
    vocab : use string_view::find() to avoid unnecessary looking up beyond the fragment range (#12706) yumeyao 2025-04-03 23:32:54 +08:00
  • 1c059995e0
    vulkan: Fix missing cmake logic for dot product extension (#12721) Jeff Bolz 2025-04-03 10:08:26 -05:00
  • 2004644b7a
    ci : add env variable in ggml-ci and document the same in SYCL.md (#12736) Atharva Dubey 2025-04-03 13:12:39 +01:00
  • 5f696e88e0
    sync : minja (inclusionAI/Ling) and update tests (#12699) R0CKSTAR 2025-04-03 19:51:35 +08:00
  • 193c3e03a6
    fix MUSA compiler warning (#12704) a3sh 2025-04-03 15:32:55 +08:00
  • 65cfe136a0
    CANN: Support operator SIN COS ARGMAX (#12709) Chenguang Li 2025-04-03 15:18:08 +08:00
  • 3f9da22c2b
    Simplify and improve CUDA graphs through use of indirect copy pointers (#9017) Alan Gray 2025-04-03 02:31:15 +01:00
  • 2a0dc97e56
    CANN: Fix failed test cases (#12708) hipudding 2025-04-03 08:49:51 +08:00
  • 97a20c012b
    opencl: use max_alloc_size in backend ctx instead of querying again (#12705) lhez 2025-04-02 17:01:42 -07:00
  • f01bd02376
    vulkan: Implement split_k for coopmat2 flash attention. (#12627) Jeff Bolz 2025-04-02 14:25:08 -05:00
  • 6f3bd38640
    cmake: remove caching from vulkan coopmat checks (#12719) bandoti 2025-04-02 14:56:26 -03:00
  • be0a0f8cae
    vulkan: Implement grouped query attention in the coopmat2 FA shader (#12559) Jeff Bolz 2025-04-02 12:40:32 -05:00
  • 92e3006bb6
    Vulkan: Fix mmq int dot float cache size (#12722) 0cc4m 2025-04-02 19:12:30 +02:00
  • 833e2b7409
    model : print tensor size during load (#12711) Georgi Gerganov 2025-04-02 16:38:54 +03:00
  • e0e912f49b
    llama : add option to override model tensor buffers (#11397) Diego Devesa 2025-04-02 14:52:01 +02:00
  • a10b36c91a
    llama : refactor kv cache guard (#12695) Georgi Gerganov 2025-04-02 14:32:59 +03:00
  • 83a88bd6af
    vocab : BailingMoE : change possessive quantifiers to greedy (#12677) Sigbjørn Skjæret 2025-04-02 11:21:48 +02:00
  • 42eb248f46
    common : remove json.hpp from common.cpp (#12697) Xuan-Son Nguyen 2025-04-02 09:58:34 +02:00
  • 9bacd6b374
    [CANN] get_rows and dup optimization (#12671) Chenguang Li 2025-04-02 15:22:13 +08:00
  • 267c1399f1
    common : refactor downloading system, handle mmproj with -hf option (#12694) Xuan-Son Nguyen 2025-04-01 23:44:05 +02:00
  • f423981ac8
    opencl : fix memory allocation size (#12649) Junil Kim 2025-04-02 01:54:34 +09:00
  • e39e727e9a
    llama : use LLM_KV_GENERAL_FILE_TYPE instead of gguf_find_key (#12672) jklincn 2025-04-01 20:54:28 +08:00
  • 5936a616e4
    convert : BailingMoE : fix qkv split when head_dim is 0 (#12687) Sigbjørn Skjæret 2025-04-01 14:37:13 +02:00
  • 3fd072a540
    metal : use F32 prec in FA kernels (#12688) Georgi Gerganov 2025-04-01 14:57:19 +03:00
  • a6f32f0b34
    Fix clang warning in gguf_check_reserved_keys (#12686) R0CKSTAR 2025-04-01 19:12:53 +08:00
  • 2bb3597e42
    vulkan: fix build when glslc doesn't support coopmat (#12683) Wagner Bruna 2025-04-01 06:38:07 -03:00
  • 8293970542
    SYCL: Rename oneMKL to oneMath (#12192) Romain Biessy 2025-04-01 10:24:29 +02:00
  • 8bbf26083d
    SYCL: switch to SYCL namespace (#12674) Akarshan Biswas 2025-04-01 13:41:39 +05:30
  • 35782aeedb
    convert : BailingMoE : avoid setting rope_dim to 0 (#12678) Sigbjørn Skjæret 2025-03-31 23:09:48 +02:00
  • c80a7759da
    vocab : add special infill tokens for CodeLlama (#11850) Daniel Bevenius 2025-03-31 18:40:56 +02:00
  • 250d7953e8
    ggml : faster ssm scan (#10558) a3sh 2025-04-01 00:05:13 +08:00
  • 403fbacbbc
    convert : Qwerky : use lora_rank_tokenshift and lora_rank_decay if present (#12667) Sigbjørn Skjæret 2025-03-31 16:36:25 +02:00
  • a8a1f33567
    Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135) 0cc4m 2025-03-31 14:37:01 +02:00
  • 1790e73157
    cmake : fix whitespace (#0) Georgi Gerganov 2025-03-31 15:05:30 +03:00
  • 0114a32da0
    sync : ggml Georgi Gerganov 2025-03-31 14:59:21 +03:00
  • a7724480fd
    cmake: improve Vulkan cooperative matrix support checks (whisper/2966) Sandro Hanea 2025-03-31 12:44:36 +02:00
  • 1a85949067
    llava : proper description fix (#12668) Sigbjørn Skjæret 2025-03-31 11:28:30 +02:00
  • 6c02a032fa
    SYCL: Remove misleading ggml_sycl_op_flatten function (#12387) Akarshan Biswas 2025-03-31 14:55:24 +05:30
  • f52d59d771
    llava : fix clip loading GGUFs with missing description (#12660) Sigbjørn Skjæret 2025-03-31 11:07:07 +02:00
  • 52de2e5949
    tts : remove printfs (#12640) marcoStocchi 2025-03-31 10:20:30 +02:00
  • 2c3f8b850a
    llama : support BailingMoE (Ling) (#12634) Sigbjørn Skjæret 2025-03-30 22:21:03 +02:00
  • 4663bd353c
    metal : use constexpr in FA kernels + fix typedef (#12659) Georgi Gerganov 2025-03-30 22:04:04 +03:00
  • b3de7cac73
    llama : add Trillion 7B model support (#12556) Juyoung Suk 2025-03-31 03:38:33 +09:00
  • 7242dd9675
    llama-chat : Add Yandex instruct model template support (#12621) Sergei Vorobyov 2025-03-30 21:12:03 +03:00
  • 492d7f1ff7
    musa: fix all warnings, re-enable -DLLAMA_FATAL_WARNINGS=ON in ci and update doc (#12611) R0CKSTAR 2025-03-30 16:59:38 +08:00
  • d3f1f0acfb
    sync : ggml Georgi Gerganov 2025-03-29 15:37:54 +02:00
  • 360dc22c00
    cpu : rm unused variable (ggml/1166) Xuan-Son Nguyen 2025-03-29 11:59:56 +01:00
  • a62d7fa7a9
    cpu: de-duplicate some of the operators and refactor (ggml/1144) cmdr2 2025-03-29 11:37:13 +05:30
  • e408d4351a
    ggml : add logging for native build options/vars (whisper/2935) Daniel Bevenius 2025-03-24 09:53:38 +01:00
  • 3891e183c6
    examples : command.wasm updates (whisper/2904) Daniel Bevenius 2025-03-20 07:02:18 +01:00
  • af6ae1efb2
    llama : fix non-causal mask for gemma 3 (#12615) Xuan-Son Nguyen 2025-03-30 00:07:37 +01:00
  • 0bb2919335
    llama : change cpu_buft_list order: ACCEL -> GPU host -> CPU extra -> CPU (#12632) Djip007 2025-03-29 14:07:37 +01:00
  • a69f846351
    cmake : fix ccache conflict (#12522) Jay 2025-03-29 18:04:58 +08:00
  • d07a0d7a79
    CANN : remove clang-format in ggml-cann (#12607) hipudding 2025-03-29 18:03:28 +08:00
  • 3714c3ee1a
    llama : fix incorrect Qwen2Moe ffn_moe_out graph callback (#12631) Sigbjørn Skjæret 2025-03-28 22:13:02 +01:00
  • b4ae50810e
    metal : improve FA + improve MoE (#12612) Georgi Gerganov 2025-03-28 20:21:59 +02:00
  • b86f600723
    vulkan: fix coopmat shader generation when cross-compiling (#12272) Icenowy Zheng 2025-03-29 01:51:06 +08:00
  • dd373dd3bf
    llama: fix error on bad grammar (#12628) Johannes Gäßler 2025-03-28 18:08:52 +01:00
  • 5d01670266
    server : include speculative decoding stats when timings_per_token is enabled (#12603) Benson Wong 2025-03-28 01:05:44 -07:00
  • ef03229ff4
    rpc : update README for cache usage (#12620) Radoslav Gerganov 2025-03-28 09:44:13 +02:00
  • 13731766db
    llamafile : ppc64le GEMV forwarding for FP32. (#12594) amritahs-ibm 2025-03-28 13:13:22 +05:30
  • ab6ab8f809
    rpc : send hash when tensor data is above some fixed threshold (#12496) Radoslav Gerganov 2025-03-28 08:18:04 +02:00
  • 2099a9d5db
    server : Support listening on a unix socket (#12613) Piotr 2025-03-27 23:41:04 +01:00
  • 2969019837
    media : add SVG logo [no ci] (#12616) Georgi Gerganov 2025-03-27 23:09:05 +02:00
  • 5dec47dcd4
    opencl: add multi and vision rope, gelu_quick and im2col (#12600) lhez 2025-03-27 08:08:08 -07:00
  • f125b8dccf
    llama : add PLM GGUF Conversion & Inference Support (#12457) Si1w 2025-03-27 10:49:15 +00:00
  • 953c2a62cf
    model : restore support for T5Encoder (#12590) HighDoping 2025-03-27 18:43:33 +08:00
  • d5c6309d91
    convert : Support Qwen2_5_VLForConditionalGeneration (#12595) Csaba Kecskemeti 2025-03-27 03:11:23 -07:00
  • 029c693fdc
    sync : ggml Georgi Gerganov 2025-03-27 09:36:13 +02:00
  • 771d84371c
    scripts : update sync + fix cmake merge Georgi Gerganov 2025-03-27 09:22:30 +02:00
  • df0665a483
    sync : ggml Georgi Gerganov 2025-03-27 09:01:21 +02:00
  • 0306aad1ca
    cmake : sync/merge PowerPC build commands (#0) Georgi Gerganov 2025-03-27 09:00:57 +02:00
  • c7b43ab608
    llamafile : ppc64le MMA implementation for Q4_0. (#12489) amritahs-ibm 2025-03-27 12:21:47 +05:30
  • 24feaec057
    ggml : riscv: add 128-bit RVV support (#12530) xctan 2025-03-27 14:38:34 +08:00
  • f28bc4c286
    llama : make loras compatible with repacking (#12593) Georgi Gerganov 2025-03-27 08:24:10 +02:00
  • f17a3bb4e8
    SYCL: implement memset ggml backend buffer interface (#12580) Akarshan Biswas 2025-03-27 07:16:00 +05:30
  • bd40678df7
    HIP: Add support for RDNA4 targets (#12372) Slobodan Josic 2025-03-26 23:46:30 +01:00
  • b3298fa47a
    metal : refactor mat-vec code (#12569) Georgi Gerganov 2025-03-26 21:38:38 +02:00
  • 2447ad8a98
    upgrade to llguidance 0.7.10 (#12576) Michał Moskal 2025-03-26 11:06:09 -07:00
  • 02082f1519
    clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566) Ivy233 2025-03-26 22:06:04 +08:00
  • df4d20cd53
    convert : fix squeeze for ssm_conv tensors (#12573) Georgi Gerganov 2025-03-26 14:21:05 +02:00
  • 5ed38b6852
    ggml : fix MUL_MAT_ID repack with Q8_K (#12544) Georgi Gerganov 2025-03-26 13:02:00 +02:00
  • fd7855f8f5
    doc: [MUSA] minor changes (#12583) R0CKSTAR 2025-03-26 15:09:48 +08:00
  • 53af4dba42
    convert: fix Mistral3/Gemma3 model hparams init (#12571) Sigbjørn Skjæret 2025-03-25 23:03:10 +01:00
  • ef19c71769
    run: de-duplicate fmt and format functions and optimize (#11596) Eric Curtin 2025-03-25 17:46:11 +00:00
  • 053b3f9aae
    ggml-cpu : update KleidiAI to v1.5.0 (#12568) Dan Johansson 2025-03-25 12:10:18 +01:00
  • e2f560175a
    SYCL: disable Q4_0 reorder optimization (#12560) Akarshan Biswas 2025-03-25 16:10:18 +05:30
  • 36ee06dd2d
    docs : add build instructions for KleidiAI (#12563) Dan Johansson 2025-03-25 10:35:20 +01:00
  • 3cd3a39532
    ci: [MUSA] add CI and update doc (#12562) R0CKSTAR 2025-03-25 15:45:08 +08:00
  • 2d77d88e70
    context : fix worst-case reserve outputs (#12545) Georgi Gerganov 2025-03-25 09:19:23 +02:00
  • c95fa362b3
    ci: [SYCL] ggml-ci Use main GPU and enable sysman (#12547) Akarshan Biswas 2025-03-24 23:05:38 +05:30
  • 2b65ae3029
    opencl: simplify kernel embedding logic in cmakefile (#12503) lhez 2025-03-24 09:20:47 -07:00
  • 48d7021c61
    CI: fix SYCL build (#12546) Akarshan Biswas 2025-03-24 18:28:32 +05:30
  • 3361e2deba
    docs: update: improve the Fedoa CUDA guide (#12536) Tei Home 2025-03-24 19:02:26 +08:00
  • 00d53800e0
    llama-vocab : add SuperBPE pre-tokenizer (#12532) compilade 2025-03-24 06:47:24 -04:00
  • 7ea75035b6
    CUDA: Fix clang warnings (#12540) R0CKSTAR 2025-03-24 18:28:34 +08:00
  • c54f6b7988
    mmap : skip resource limit checks on AIX (#12541) Prajwal B Mehendarkar 2025-03-24 15:47:10 +05:30
  • 9b169a4d4e
    vulkan: fix mul_mat_vec failure in backend tests (#12529) Jeff Bolz 2025-03-24 01:56:17 -05:00
  • 77f9c6bbe5
    server : Add verbose output to OAI compatible chat endpoint. (#12246) Marius Gerdes 2025-03-23 19:30:26 +01:00
  • 18b663d8e4
    install : add macports (#12518) Lars Sonchocky-Helldorf 2025-03-23 09:21:48 +01:00