Commit graph

  • db38704f01
    convert : fix rwkv bos/eos token (#13844) Sigbjørn Skjæret 2025-05-30 14:50:43 +02:00
  • 07e4351ce6
    convert : allow partial update to the chkhsh pre-tokenizer list (#13847) Xuan-Son Nguyen 2025-05-30 12:24:37 +02:00
  • 291f2b6913
    llama : add support for DistilBert (#13907) Đinh Trọng Huy 2025-05-30 18:56:02 +09:00
  • 2c90da4c7e
    llama : use llm_build_granite for minicpm (#13911) zhangkaihuo 2025-05-30 16:31:48 +08:00
  • ec9e0301fe
    cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890) Christian Kastner 2025-05-30 01:28:54 +02:00
  • e83ba3e460
    llama : add support for jina-reranker-v2 (#13900) Sigbjørn Skjæret 2025-05-29 21:42:31 +02:00
  • 2b131621e6
    gguf-py : add support for sub_type (in arrays) in GGUFWriter add_key_value method (#13561) Sigbjørn Skjæret 2025-05-29 15:36:05 +02:00
  • 54a2c7a8cd
    arm64: optimize q4_k_q8_k kernel with i8mm (#13886) Yibo Cai 2025-05-29 19:39:20 +08:00
  • 21fcc21ad5
    cmake: Factor out CPU architecture detection (#13883) Christian Kastner 2025-05-29 12:50:25 +02:00
  • dd8ba93416
    ggml: aarch64: Implement SVE F32 kernels for Mamba Sequential Scan Algorithm (#13882) Vineel Abhinav 2025-05-29 14:48:43 +05:30
  • 66c92061f5
    tests : remove json.hpp from a test (#13880) Georgi Gerganov 2025-05-29 12:17:16 +03:00
  • 5ca82fc1d7
    convert : workaround for AutoConfig dummy labels (#13881) Sigbjørn Skjæret 2025-05-29 10:00:57 +02:00
  • 6385b843a8
    llama : add RobertaForSequenceClassification reranker support (#13875) Sigbjørn Skjæret 2025-05-29 08:15:01 +02:00
  • 1b8fb8152d
    ggml: aarch64: Implement SVE F32 kernels for vector functions (#13843) Vineel Abhinav 2025-05-29 11:31:33 +05:30
  • 53ae30640e
    gguf-py : fix SafetensorRemote return on undefined size (< 0) (#13841) Beinsezii 2025-05-28 14:50:20 -07:00
  • 763d06edb7
    llama : fix KV shift for qwen2vl (#13870) Xuan-Son Nguyen 2025-05-28 22:35:31 +02:00
  • 10961339b2
    mtmd : move helpers to dedicated library (⚠️ breaking change) (#13866) Xuan-Son Nguyen 2025-05-28 22:35:22 +02:00
  • d98f2a35fc
    ci: disable LLAMA_CURL for Linux cross-builds (#13871) bandoti 2025-05-28 15:46:47 -03:00
  • e0e3aa231d
    llama : add support for BertForSequenceClassification reranker (#13858) Đinh Trọng Huy 2025-05-29 02:01:58 +09:00
  • aa6dff05be
    convert: small addition to support LlamaModel (#13838) Đinh Trọng Huy 2025-05-28 23:34:18 +09:00
  • c962ae3382
    server: effectively remove 'image_url'/'input_audio' json-object for 'llama_params' in multimodal-model-mode (#13853) Sky 2025-05-28 22:33:54 +08:00
  • a3938fb53d
    convert : fix qwen omni conversion (#13859) Xuan-Son Nguyen 2025-05-28 16:12:35 +02:00
  • f7873fc698
    tests : change umlaut test (#11600) Alex Fanthome 2025-05-28 14:49:28 +01:00
  • a68247439b
    CUDA: fix FA tg at long context for CC >= 8.9 (#13852) Johannes Gäßler 2025-05-28 13:33:37 +02:00
  • 26b79b6cb3
    convert : fix tensor naming conflict for llama 4 vision (#13836) Xuan-Son Nguyen 2025-05-28 10:05:54 +02:00
  • 1e8659e65a
    CANN: Add SOC TYPE printing in cmake configuration (#13837) leo-pony 2025-05-28 11:54:20 +08:00
  • a3c30846e4
    opencl: add new ops - argsort, div, sub, addrows, sigmoid, group_norm (#13787) lhez 2025-05-27 12:56:08 -07:00
  • 1701d4c54f
    opencl: mark mul_mat f32f32 as supporting non-contiguous tensors (#13790) lhez 2025-05-27 12:53:14 -07:00
  • bef8176387
    vulkan: use timestamp queries for GGML_VULKAN_PERF (#13817) Jeff Bolz 2025-05-27 11:39:07 -05:00
  • 34b7c0439e
    cmake : add llama-cparams.cpp to build (#13832) Georgi Gerganov 2025-05-27 19:08:44 +03:00
  • f3101a8cc6
    SYCL: add gelu_erf kernel (#13749) Akarshan Biswas 2025-05-27 20:52:59 +05:30
  • 1c49c70d07
    sync : ggml Georgi Gerganov 2025-05-27 18:04:38 +03:00
  • a8ea03d8ad
    ggml : add ggml_repeat_4d (#13824) Xuan-Son Nguyen 2025-05-27 15:53:55 +02:00
  • 05f6ac6283
    ggml : riscv: add xtheadvector support (#13720) xctan 2025-05-27 21:21:36 +08:00
  • bc583e3c63
    mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#13784) Xuan-Son Nguyen 2025-05-27 14:06:10 +02:00
  • 72b090da2c
    docs: remove link for llama-cli function calling (#13810) bandoti 2025-05-27 08:52:40 -03:00
  • 7fe03e7446
    ggml-cpu: x86 feature detection is specific to x86 (#13811) Christian Kastner 2025-05-27 13:18:39 +02:00
  • 952f3953c1
    ggml : allow CUDA graphs when using pipeline parallelism (#13814) Diego Devesa 2025-05-27 04:05:18 -07:00
  • 81713121ee
    kv-cells : track min/max used cells and per-sequence positions (#13808) Georgi Gerganov 2025-05-27 13:49:41 +03:00
  • f9cd68398b
    sampling : make sure samplers return at least 1 token (#13822) Georgi Gerganov 2025-05-27 12:07:52 +03:00
  • 4f81b33e32
    llama : validate seq id batch input (#13809) Georgi Gerganov 2025-05-27 09:40:59 +03:00
  • cdf94a1802
    server: --offline mode (#13804) Olivier Chafik 2025-05-26 14:34:27 -07:00
  • a26c4cc11e
    scripts : add option to compare commits in Debug (#13806) Georgi Gerganov 2025-05-26 22:24:01 +03:00
  • 4265a87b59
    cuda : avoid cuGetErrorString (#13791) Georgi Gerganov 2025-05-26 22:14:52 +03:00
  • 6f180b915c
    SYCL: Add non contiguous support in RMS_NORM and NORM kernels (#13611) Akarshan Biswas 2025-05-26 21:10:36 +05:30
  • 03f582ae8f
    server: fix streaming crashes (#13786) Olivier Chafik 2025-05-26 08:03:57 -07:00
  • 88c125f2ac
    examples/training: Fix file name in README (#13803) standby24x7 2025-05-26 23:55:24 +09:00
  • d74e94c1b3
    server: fix format of streamed tool call deltas (diff name, fix id location) (#13800) Olivier Chafik 2025-05-26 06:56:49 -07:00
  • f13847cfb5
    server: fix regression on streamed non-chat completion w/ stops (#13785) Olivier Chafik 2025-05-26 06:16:37 -07:00
  • 79c137f776
    examples : allow extracting embeddings from decoder contexts (#13797) Georgi Gerganov 2025-05-26 14:03:54 +03:00
  • 22229314fc
    llama : clarify deprecation message (#13794) Georgi Gerganov 2025-05-26 12:57:50 +03:00
  • 9012eb9b45
    sycl: Add more debug prints (#13640) Romain Biessy 2025-05-26 10:28:53 +02:00
  • fef693dc6b
    vulkan: mark IM2COL as supporting non-contig (#13783) Jeff Bolz 2025-05-25 23:02:07 -05:00
  • 2d38b6e400
    CANN: Add the basic supports of Flash Attention kernel (#13627) Bizhao Shi 2025-05-26 10:20:18 +08:00
  • e121edc432
    server: add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771) Olivier Chafik 2025-05-26 00:30:51 +01:00
  • 2f099b510f
    webui : bump max upload file size to 500MB (#13779) Xuan-Son Nguyen 2025-05-25 19:02:18 +02:00
  • aa50ba462f
    tests : improve UGM tokenizer test coverage (#13773) Sigbjørn Skjæret 2025-05-25 16:22:29 +02:00
  • de2ef53a4b
    kv-cache : rework kv_cell (#13706) Georgi Gerganov 2025-05-25 16:34:36 +03:00
  • c508256db2
    rpc : Fix build on OpenBSD (#13541) Percy Piper 2025-05-25 13:35:53 +01:00
  • 40aaa8a403
    mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760) Xuan-Son Nguyen 2025-05-25 14:06:32 +02:00
  • a08c1d2845
    docs : add Moondream2 pre-quantized link (#13745) ddpasa 2025-05-25 14:04:49 +02:00
  • d785f9c1fd
    server: fix/test add_generation_prompt (#13770) Olivier Chafik 2025-05-25 10:45:49 +01:00
  • 4032ca4066
    llama : add support for Qwen3 MoE tied word embeddings (#13768) Piotr Jasiukajtis 2025-05-25 10:29:43 +02:00
  • 515fdbf7ed
    SYCL: revert "sycl: simplify bin_bcast_kernel (#13383)" (#13752) Akarshan Biswas 2025-05-25 12:38:37 +05:30
  • f5cd27b71d
    server: streaming of tool calls and thoughts when --jinja is on (#12379) Olivier Chafik 2025-05-25 01:48:08 +01:00
  • a2d02d5793
    releases : bundle llvm omp library in windows release (#13763) Diego Devesa 2025-05-24 15:55:16 -07:00
  • 17fc817b58
    releases : enable openmp in windows cpu backend build (#13756) Diego Devesa 2025-05-24 13:27:03 -07:00
  • 2bd1b30f69
    ggml-cpu : set openmp wait time if not set (#13758) Diego Devesa 2025-05-24 13:26:47 -07:00
  • 259469c4b5
    Move GLM4 f32 attention fix to the correct function (#13750) 0cc4m 2025-05-24 16:49:12 +02:00
  • 4c32832c59
    ggml : add ggml_gelu_erf() CUDA kernel (#13719) Xuan-Son Nguyen 2025-05-24 13:06:47 +02:00
  • c3a2624339
    vocab : fix ugm tokenizer precision (#13743) Sigbjørn Skjæret 2025-05-24 12:29:09 +02:00
  • ffd0eae60b
    CUDA: fix race condition in FA vector kernels (#13742) Johannes Gäßler 2025-05-24 11:46:19 +02:00
  • b775345d78
    ci : enable winget package updates (#13734) Diego Devesa 2025-05-23 13:14:00 -07:00
  • a70a8a69c2
    ci : add winget package updater (#13732) Diego Devesa 2025-05-23 13:09:38 -07:00
  • d13d0f6135
    hparams : initialize arrays (#13728) Georgi Gerganov 2025-05-23 20:16:13 +03:00
  • 8a2afb7520
    llama : allow custom list of swa_layers (#13726) Xuan-Son Nguyen 2025-05-23 17:07:04 +02:00
  • 9ecf3e66a3
    server : support audio input (#13714) Xuan-Son Nguyen 2025-05-23 11:03:47 +02:00
  • faaaff5f94
    CANN: Support MUL_MAT_ID for q8_0 and q4_0 (#13705) Chenguang Li 2025-05-23 16:47:53 +08:00
  • e16c4731c7
    ggml : fix the order of ggml_unary_op (#13718) Xuan-Son Nguyen 2025-05-23 08:12:48 +02:00
  • 1dcd01960c
    vulkan: support CPY from any type to itself (#13695) Jeff Bolz 2025-05-23 00:45:02 -04:00
  • c10ed6cbcc
    vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (#13696) Jeff Bolz 2025-05-23 00:33:45 -04:00
  • a127ff1780
    use LOG_WARN to replace std::cerr (#13657) Judd 2025-05-23 12:33:08 +08:00
  • 3079e9ac8e
    release : fix windows hip release (#13707) Diego Devesa 2025-05-22 15:21:37 -07:00
  • 8a1d206f1d
    tts : fix n_ubatch + make WavTokenizer cache-less (#13713) Georgi Gerganov 2025-05-22 22:21:07 +03:00
  • 797990c4bc
    mtmd : add ultravox audio input (#13623) Xuan-Son Nguyen 2025-05-22 20:42:48 +02:00
  • ab86335760
    common: Include torch package for s390x (#13699) Aaron Teo 2025-05-23 02:31:29 +08:00
  • cc74d5be99
    server : pad small embedding batches (#13692) Georgi Gerganov 2025-05-22 16:33:39 +03:00
  • 5be24af73d
    gguf-py : correct charsmap parameter typing (#13701) Sigbjørn Skjæret 2025-05-22 14:25:05 +02:00
  • d394a9aedc
    sycl : Remove waits from function calls (#13702) Nicolò Scipione 2025-05-22 13:54:43 +02:00
  • 6b56a64690
    SYCL: Avoid using with SYCL-Graph for unsupported nodes (#13587) Ewan Crawford 2025-05-22 09:24:09 +01:00
  • a4e8912dfd
    opencl: Add support for multiple devices (#12622) Henry Linjamäki 2025-05-22 02:21:45 +03:00
  • edbf42edfd
    opencl: fix couple crashes (#12795) Henry Linjamäki 2025-05-21 23:21:17 +03:00
  • d643bb2c79
    releases : build CPU backend separately (windows) (#13642) Diego Devesa 2025-05-21 13:09:57 -07:00
  • 8e186ef0e7
    hparams : support models for which all layers use SWA (#13682) Georgi Gerganov 2025-05-21 20:00:49 +03:00
  • 5fbfe384d4
    server : improve error reporting (#13680) Georgi Gerganov 2025-05-21 19:46:56 +03:00
  • c76532e7ba
    convert : add qwen2vl support for unsloth merges (#13686) antichristHater 2025-05-21 19:40:35 +03:00
  • 2aa777d86d
    examples : switch retrieval to llama_encode (#13685) Sigbjørn Skjæret 2025-05-21 16:57:38 +02:00
  • eb0f5c28d3
    gguf-py : display the invalid gguf type (#13687) Emmanuel Ferdman 2025-05-21 17:33:54 +03:00
  • cf4cb59e64
    ggml : add ggml_gelu_erf() (#13667) Xuan-Son Nguyen 2025-05-21 16:26:33 +02:00
  • 0d5c742161
    server : Add the endpoints /api/tags and /api/chat (#13659) Robin Davidsson 2025-05-21 15:15:27 +02:00