Commit graph

  • ccc4c16970
    Update rocm build master ver4a 2025-05-10 01:01:02 +02:00
  • 3a9457df96
    vulkan: update windows SDK in CI (#14334) Jeff Bolz 2025-06-23 03:19:24 -05:00
  • fa4a9f2a1c
    quantize : handle user-defined pruning of whole layers (blocks) (#13037) Ed Addario 2025-06-22 22:16:26 +01:00
  • 238005c2dc
    gguf-py : fix SpecialVocab parsing when post_processor is null (#14330) Sigbjørn Skjæret 2025-06-22 19:46:17 +02:00
  • 66aba7aca9
    run : avoid double tokenization (#14327) Ruikai Peng 2025-06-23 01:28:06 +08:00
  • f1f5e82df6
    examples : fix is_first logic for tokenization (#14329) Georgi Gerganov 2025-06-22 20:10:07 +03:00
  • af3373f1ad
    HIP: enable vec fattn on RDNA4 (#14323) uvos 2025-06-22 16:51:23 +02:00
  • 5d5c066de8
    mtmd : fix Pixtral OOM with large images by capping image_size to 1024 (#14326) yuiseki 2025-06-22 21:44:57 +09:00
  • 40bfa04c95
    common : use std::string_view now that we target c++17 (#14319) Sigbjørn Skjæret 2025-06-22 07:37:43 +02:00
  • aa064b2eb7
    CUDA: add mean operation (#14313) Aman Gupta 2025-06-22 12:39:54 +08:00
  • aa0ef5c578
    gguf-py : fix Qwen3-Embedding eos token (#14314) Sigbjørn Skjæret 2025-06-21 18:12:05 +02:00
  • bb16041cae
    Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792) Markus Tavenrath 2025-06-21 08:17:12 +02:00
  • 58cba76a9a
    gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312) Sigbjørn Skjæret 2025-06-21 07:33:21 +02:00
  • 67ae5312e2
    metal : fix thread-safety (#14300) Georgi Gerganov 2025-06-21 08:04:18 +03:00
  • 692e3cdd0a
    memory : rename interface to llama_memory_context_i (#14296) Georgi Gerganov 2025-06-21 08:03:46 +03:00
  • b23fa0b3f4
    convert : fix Llama 4 conversion (#14311) Daniel Han 2025-06-20 21:32:01 -07:00
  • 06cbedfca1
    sync : ggml Georgi Gerganov 2025-06-20 20:50:24 +03:00
  • b7147673f2
    Add ggml_roll (ggml/1274) Acly 2025-06-18 13:34:50 +02:00
  • d860dd99a4
    docs : fix the link to llama.h (#14293) David Chiu 2025-06-21 01:43:35 +08:00
  • c959f462a0
    CUDA: add conv_2d_transpose (#14287) Aman Gupta 2025-06-20 22:48:24 +08:00
  • 22015b2092
    lint : remove trailing whitespace (#14304) Sigbjørn Skjæret 2025-06-20 16:37:44 +02:00
  • dd6e6d0b6a
    vocab : prevent tokenizer overflow (#14301) Ruikai Peng 2025-06-20 22:13:06 +08:00
  • 8308f98c7f
    sycl: add usage of enqueue_functions extension (#14244) Nicolò Scipione 2025-06-20 15:07:21 +02:00
  • 6369be0735
    Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286) Christian Kastner 2025-06-20 12:17:32 +00:00
  • 88fc854b4b
    llama : improve sep token handling (#14272) Sigbjørn Skjæret 2025-06-20 14:04:09 +02:00
  • e28c1b93fd
    cuda : synchronize graph capture and cublas handle destruction (#14288) Diego Devesa 2025-06-20 04:57:36 -07:00
  • d27b3ca175
    ggml : fix repack work size for mul_mat_id (#14292) Georgi Gerganov 2025-06-20 11:19:15 +03:00
  • 9230dbe2c7
    ggml: Update KleidiAI to v1.9.0 (#14277) Charles Xu 2025-06-20 09:51:01 +02:00
  • 812939a9e9
    model : more uniform output id handling (#14275) Georgi Gerganov 2025-06-20 10:50:27 +03:00
  • 4c9fdfbe15
    ubatch : new splitting logic (#14217) Georgi Gerganov 2025-06-20 10:14:14 +03:00
  • 9eaa51e7f0
    CUDA: add conv_2d_dw (#14265) Aman Gupta 2025-06-20 09:50:24 +08:00
  • 8f71d0f3e8
    ggml-cpu : remove unnecessary arm feature detection (#14281) Diego Devesa 2025-06-19 12:24:14 -07:00
  • 381174bbda
    gguf-py : make sentencepiece optional (#14200) Alex Trotta 2025-06-19 09:56:12 -04:00
  • d67341dc18
    server : add server parameters for draft model cache type (#13782) aa956 2025-06-19 16:01:03 +03:00
  • 456af35eb7
    build : suppress gcc15 compile warnings (#14261) fanyang 2025-06-19 20:49:48 +08:00
  • 600e3e9b50
    sycl: Cleanup codepaths in Get Rows in sycl backend (#14215) Anton Mitkov 2025-06-19 11:40:21 +01:00
  • fffcce535e
    llama-bench : add --no-warmup flag (#14224) (#14270) bashayer hijji 2025-06-19 13:24:12 +03:00
  • 5fc7856815
    convert : fix remote option in Windows (#14100) pqnet 2025-06-19 12:21:40 +02:00
  • faed5a5f5d
    llamafile : support s390x SIMD instruction set (#14273) Aaron Teo 2025-06-19 17:48:54 +08:00
  • 10bb545c5b
    Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (#14249) 0cc4m 2025-06-19 09:15:42 +02:00
  • edc4a29eff
    memory : Hybrid recurrent cache (#13979) Gabe Goodhart 2025-06-19 00:08:14 -05:00
  • ed3290ab34
    metal : add mean kernel (#14267) Georgi Gerganov 2025-06-19 08:05:21 +03:00
  • 8d94713654
    docs: add s390x build documentation (#14264) Aaron Teo 2025-06-19 01:10:26 +08:00
  • 50d2227953
    ggml-cpu: reduce asm calls for hsum (#14037) Aaron Teo 2025-06-19 01:10:08 +08:00
  • 6231c5cd6d
    ggml-cpu: fix uncaught underscore terminators (#14023) Aaron Teo 2025-06-19 01:06:49 +08:00
  • ef035803eb
    ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258) Charles Xu 2025-06-18 13:40:07 +02:00
  • 413977de32
    mtmd : refactor llava-uhd preprocessing logic (#14247) Xuan-Son Nguyen 2025-06-18 10:43:57 +02:00
  • 95402553a5
    llama-chat : fix multiple system message for gemma, orion (#14246) Xuan-Son Nguyen 2025-06-18 09:58:43 +02:00
  • 3865cff4f5
    convert : fix null head_dim AutoConfig regression (#14248) Sigbjørn Skjæret 2025-06-18 09:52:07 +02:00
  • d03172cc79
    sync : ggml Georgi Gerganov 2025-06-18 09:58:23 +03:00
  • dd8e59f443
    ggml : disable warnings for tests when using MSVC (ggml/1273) Daniel Bevenius 2025-06-13 15:06:42 +02:00
  • bbe98d2784
    ggml : remove unused ggml_context_container (ggml/1272) Daniel Bevenius 2025-06-13 09:05:44 +02:00
  • c2056ed6d4
    examples : include examples in msvc disable warn (ggml/1270) Daniel Bevenius 2025-06-12 12:27:09 +02:00
  • c46503014d
    cmake: remove shader-gen step-targets from ggml-vulkan (#14226) bandoti 2025-06-17 17:33:25 -03:00
  • 860a9e4eef
    ggml-cpu : remove the weak alias trick (#14221) xctan 2025-06-17 17:58:32 +08:00
  • fe9d60e74a
    musa: fix build warning (unused variable) (#14231) R0CKSTAR 2025-06-17 17:48:08 +08:00
  • e434e69183
    common : suggest --jinja when autodetection fails (#14222) Sigbjørn Skjæret 2025-06-16 21:58:42 +02:00
  • 89fea80d29
    server : fix incorrect usage of llama_get_embeddings() (#14225) Georgi Gerganov 2025-06-16 22:33:27 +03:00
  • 6adc3c3ebc
    llama : add thread safety test (#14035) Diego Devesa 2025-06-16 08:11:43 -07:00
  • 0dbcabde8c
    cmake: clean up external project logic for vulkan-shaders-gen (#14179) bandoti 2025-06-16 10:32:13 -03:00
  • ad590be98c
    model : add NeoBERT (#14164) Đinh Trọng Huy 2025-06-16 21:53:41 +09:00
  • 7d6d91babf
    HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202) uvos 2025-06-16 13:47:38 +02:00
  • d3e64b9f49
    llama : rework embeddings logic (#14208) Georgi Gerganov 2025-06-16 14:14:00 +03:00
  • 3ba0d843c6
    ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206) Charles Xu 2025-06-16 11:47:57 +02:00
  • 0bf49eb668
    convert : remove arcee change in convert_hf_to_gguf_update.py (#14207) Bartowski 2025-06-16 09:16:06 +01:00
  • 4ad243677b
    gguf-py : allow key override when adding value to GGUFWriter (#14194) Đinh Trọng Huy 2025-06-16 16:20:59 +09:00
  • c89c2d1ab9
    vulkan: mutex around vkQueueSubmit (#14127) Jeff Bolz 2025-06-16 00:21:08 -06:00
  • 3555b3004b
    ggml-cpu : rework weak alias on apple targets (#14146) xctan 2025-06-16 13:54:15 +08:00
  • d7da8dc83a
    model : Add support for Arcee AI's upcoming AFM model (#14185) Bartowski 2025-06-16 00:04:06 +01:00
  • cd355eda7d
    server : When listening on a unix domain socket don't print http:// and port (#14180) Eric Curtin 2025-06-15 23:36:22 +02:00
  • 30e5b01de2
    quantize : change int to unsigned int for KV overrides (#14197) Ed Addario 2025-06-15 17:53:45 +01:00
  • e54b394082
    CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196) uvos 2025-06-15 17:30:13 +02:00
  • 2c2caa4443
    HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183) uvos 2025-06-15 15:45:27 +02:00
  • 5fce5f948d
    kv-cache : fix use-after-move of defrag info (#14189) Georgi Gerganov 2025-06-15 10:52:11 +03:00
  • 9ae4143bc6
    model : add dots.llm1 architecture support (#14044) (#14118) Mikko Juola 2025-06-15 00:52:06 -07:00
  • c311ac664d
    cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) Georgi Gerganov 2025-06-15 10:08:58 +03:00
  • b9912ac570
    batch : auto-gen positions + verify multi-sequence input (#14177) Georgi Gerganov 2025-06-15 09:18:37 +03:00
  • 00ba772610
    docs : remove WIP since PR has been merged (#13912) Pepijn de Vos 2025-06-15 08:06:37 +02:00
  • 3cb203c89f
    llama-chat : Do not throw when tool parsing fails (#14012) Piotr 2025-06-14 18:25:15 +02:00
  • 2e42be42bd
    compare-llama-bench: add option to plot (#14169) Aman Gupta 2025-06-14 16:34:20 +08:00
  • fb85a288d7
    vocab : fix build (#14175) Georgi Gerganov 2025-06-13 20:03:05 +03:00
  • 40643edb86
    sycl: fix docker image (#14144) Svetlozar Georgiev 2025-06-13 17:32:56 +01:00
  • 3cfbbdb44e
    Merge commit from fork Guy Goldenberg 2025-06-13 19:20:25 +03:00
  • 80709b70a2
    batch : add LLAMA_BATCH_DEBUG environment variable (#14172) Georgi Gerganov 2025-06-13 18:35:00 +03:00
  • 26ff3685bf
    docs : Update multimodal.md (#14122) ddpasa 2025-06-13 15:17:53 +02:00
  • 60c666347b
    batch : rework llama_batch_allocr (#14153) Georgi Gerganov 2025-06-13 13:47:55 +03:00
  • b7cc7745e3
    readme : remove survey link (#14168) Georgi Gerganov 2025-06-13 11:55:44 +03:00
  • cc8d081879
    cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) Christian Kastner 2025-06-13 08:38:52 +00:00
  • d714dadb57
    pooling : make cls_b and cls_out_b optional (#14165) Đinh Trọng Huy 2025-06-13 17:34:08 +09:00
  • ffad043973
    server : fix SWA condition for full context reprocess (#14163) Georgi Gerganov 2025-06-13 11:18:25 +03:00
  • 0889eba570
    sycl: Adding additional cpy dbg print output (#14034) Anton Mitkov 2025-06-13 08:51:39 +01:00
  • c61285e739
    SYCL: Bump oneMath commit (#14152) Ewan Crawford 2025-06-13 08:45:37 +01:00
  • 09cf2c7c65
    cmake : Improve build-info.cpp generation (#14156) Christian Kastner 2025-06-13 06:51:34 +00:00
  • c33fe8b8c4
    vocab : prevent heap overflow when vocab is too small (#14145) Georgi Gerganov 2025-06-13 08:03:54 +03:00
  • ed52f3668e
    sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125) Anton Mitkov 2025-06-12 14:15:11 +01:00
  • a681b4ba83
    readme : remove project status link (#14149) Georgi Gerganov 2025-06-12 14:43:09 +03:00
  • 7d516443dd
    server : re-enable SWA speculative decoding (#14131) Georgi Gerganov 2025-06-12 11:51:38 +03:00
  • f6e1a7aa87
    context : simplify output counting logic during decode (#14142) Georgi Gerganov 2025-06-12 11:50:01 +03:00
  • c3ee46fab4
    batch : remove logits_all flag (#14141) Georgi Gerganov 2025-06-12 11:49:26 +03:00
  • e2c0b6e46a
    cmake : handle whitespaces in path during metal build (#14126) Georgi Gerganov 2025-06-12 10:14:24 +03:00