Commit graph

  • fbdfefe74e
    llama : gemma3 : use output tensor if it exists in model weight (#12506) Xuan-Son Nguyen 2025-03-22 23:28:19 +01:00
  • ba932dfb50
    ggml : fix quantized cpy op (#12310) Georgi Gerganov 2025-03-22 16:23:26 +02:00
  • fac63a3d78
    musa: refine compute capability (#12493) R0CKSTAR 2025-03-22 17:11:37 +08:00
  • eddfb43850
    vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505) Jeff Bolz 2025-03-22 03:40:11 -05:00
  • 4375415b4a
    Vulkan: RTE rounding for cpy to quant (#12480) stduhpf 2025-03-21 20:34:50 +01:00
  • 30c42ef5cb
    vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (#12472) Eve 2025-03-21 19:27:47 +00:00
  • af04481e6b
    model : do not repack if a GPU device is present (#12498) Georgi Gerganov 2025-03-21 16:14:29 +02:00
  • 960e726077
    chore : cleanup llama_model_loader::TENSOR_ usage (#12492) Sigbjørn Skjæret 2025-03-21 10:21:36 +01:00
  • ea1518e839
    llama-tts : avoid crashes related to bad model file paths (#12482) marcoStocchi 2025-03-21 10:12:45 +01:00
  • 1aa87ee53d
    [SYCL] Fix build on Windows when ccache enabled (#9954) (#9976) 蕭澧邦 2025-03-21 14:58:47 +08:00
  • 9ffcc9e374
    sycl: cleanup oneDNN related code (#12097) Svetlozar Georgiev 2025-03-21 02:15:56 +00:00
  • e04643063b
    webui : Prevent rerendering on textarea input (#12299) Woof Dog 2025-03-20 14:57:43 +00:00
  • dbb3a4739e
    llama : make Qwen2MoE QKV bias optional (#12477) Sigbjørn Skjæret 2025-03-20 12:49:59 +01:00
  • 3d82dbcbce
    ggml : block interleaving support for Q4_K quantization for x86 AVX2 architecture (#12332) Srihari-mcw 2025-03-20 17:05:34 +05:30
  • 732b5fbf5e
    convert : avoid calls to tokenizer.added_tokens_decoder (#12473) Bartowski 2025-03-20 02:36:37 -04:00
  • 568013d0cd
    context : clear sets containing encoder output sequence ids before storing new values (#12470) fairydreaming 2025-03-19 21:01:57 +01:00
  • 517b5ddbf0
    CUDA: Improve flash decoding kernel GPU occupancy for BS=1 case (#12183) Gaurav Garg 2025-03-20 01:22:06 +05:30
  • a9b59288e2
    vulkan: optimize iq1 coopmat2 dequant functions (#12427) Jeff Bolz 2025-03-19 13:56:23 -05:00
  • 0fd8487b14
    Fix visionOS build and add CI (#12415) Guus Waals 2025-03-19 10:15:23 +00:00
  • 108e53c2f1
    llama : add support for GPT2, Bloom and CodeShell tied word embeddings (#12456) Sigbjørn Skjæret 2025-03-19 09:08:49 +01:00
  • a686171ea7
    convert : Support chat_template.json (#12460) Sigbjørn Skjæret 2025-03-19 08:58:13 +01:00
  • c446b2edd2
    vulkan: Submit once enough matmul work has been recorded (#12406) Jeff Bolz 2025-03-19 02:26:26 -05:00
  • d84635b1b0
    opencl: improve profiling (#12442) lhez 2025-03-18 12:54:55 -07:00
  • 75422e8bc4
    graph : normalize Q, K, V shapes + sync cross attention (#12449) Georgi Gerganov 2025-03-18 21:35:19 +02:00
  • bb115d2bf7
    musa: override warp_size of musa device to 32 (#12445) R0CKSTAR 2025-03-19 02:28:26 +08:00
  • 29fff308c7
    llama : support converting Mistral Small text-only (#12450) Xuan-Son Nguyen 2025-03-18 19:16:19 +01:00
  • c6af2161b2
    speculative : fix seg fault in certain cases (#12454) Georgi Gerganov 2025-03-18 19:35:11 +02:00
  • 99aa304fb9
    llama : add support for EXAONE tied word embeddings (#12451) Xuan-Son Nguyen 2025-03-18 17:24:33 +01:00
  • 8551c44d84
    context : always use non-causal attention for encoder graphs (#12447) Georgi Gerganov 2025-03-18 13:05:49 +02:00
  • 35cae5ba05
    SYCL: using graphs is configurable by environment variable and compile option (#12371) Łukasz Ślusarczyk 2025-03-18 11:16:31 +01:00
  • 810e0af3f5
    server : fix warmup draft cache type (#12446) Georgi Gerganov 2025-03-18 12:05:42 +02:00
  • eba92d64c3
    cmake : fix PowerPC build (#12241) Prajwal B Mehendarkar 2025-03-18 15:07:33 +05:30
  • d9a14523bb
    ggml : add SVE support for q6_K_q8_K (#12361) fj-y-saito 2025-03-18 17:14:39 +09:00
  • fd123cfead
    Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues (#12434) 0cc4m 2025-03-18 07:21:40 +01:00
  • a53f7f7b88
    fixed compilation warnings in ggml-sycl (#12424) Łukasz Ślusarczyk 2025-03-18 01:51:25 +01:00
  • 7dfad387e3
    llama: Add support for RWKV v7 architecture (#12412) Molly Sophia 2025-03-18 07:27:50 +08:00
  • 60c902926c
    docs : bring llama-cli conversation/template docs up-to-date (#12426) Sigbjørn Skjæret 2025-03-17 21:14:32 +01:00
  • b1b132efcb
    cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394) Gaurav Garg 2025-03-17 23:55:13 +05:30
  • 01e8f2138b
    ggml-vulkan: remove unused find_program(glslc) (#12416) Guus Waals 2025-03-18 00:35:43 +08:00
  • 484a8ab513
    vulkan: Add N/2 and N/4 optimized paths in coopmat2 shader (#12312) Jeff Bolz 2025-03-17 09:26:18 -05:00
  • cf2270e4d3
    vulkan: subgroup size tuning (#12087) Daniele 2025-03-17 12:42:33 +01:00
  • f07690c930
    vulkan: use fp32 in coopmat2 q4_k dequant function (#12309) Jeff Bolz 2025-03-17 04:43:35 -05:00
  • 891c63956d
    vulkan: Pad N dimension of B matrix for coopmat2 perf, to avoid bounds checking (#12273) Jeff Bolz 2025-03-17 04:41:59 -05:00
  • 2f21123c1d
    vulkan: Adjust coopmat2 tile sizes and selection heuristic (#12258) Jeff Bolz 2025-03-17 04:35:00 -05:00
  • 374101fd74
    cmake : enable building llama.cpp using system libggml (#12321) Christian Kastner 2025-03-17 10:05:23 +01:00
  • b3c9a65673
    SYCL: set extras only on GGML_TYPE_Q4_0 (#12366) Akarshan Biswas 2025-03-17 07:15:12 +05:30
  • 8ba95dca20
    llama : fix OLMo-2-0325-32B-Instruct K-norm size (#12400) Sigbjørn Skjæret 2025-03-16 18:46:36 +01:00
  • dc079cfdff
    context : fix init of n_outputs (#12397) Georgi Gerganov 2025-03-16 19:29:36 +02:00
  • 7b61bcc87c
    ci : add --symlinks to xcframework zip command (#12409) Daniel Bevenius 2025-03-16 18:22:05 +01:00
  • f4c3dd5daa
    llama-tts : add '-o' option (#12398) marcoStocchi 2025-03-15 17:23:11 +01:00
  • 3d35d87b41
    SYCL: Delete redundant plus sign and space (#12391) aubreyli 2025-03-15 22:49:03 +08:00
  • b19bd064c0
    SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (#12399) fairydreaming 2025-03-15 15:19:30 +01:00
  • 92a391327e
    [CANN]MUL_MAT optimization (#12382) Chenguang Li 2025-03-15 09:31:08 +08:00
  • 9f2250ba72
    Add CLI arg to llama-run to adjust the number of threads used (#12370) Eric Curtin 2025-03-14 16:41:20 +00:00
  • 774973b8f3
    main : add -sysf / --system-prompt-file (#12249) (#12250) Sigbjørn Skjæret 2025-03-14 16:57:05 +01:00
  • 8fcb563613
    Load all MoE experts during warmup (#11571) fairydreaming 2025-03-14 13:47:05 +01:00
  • add2a3aa5a
    server: fix "--grammar-file" parameter (#12285) Victor 2025-03-14 11:21:17 +01:00
  • c522ce4143
    graph : simplify attn input build for unified KV cache (#12381) Georgi Gerganov 2025-03-14 10:47:44 +02:00
  • 081bee8c64
    hparams : add SWA rope parameters (#12374) Georgi Gerganov 2025-03-14 09:03:24 +02:00
  • 84d5475541
    llama : fix Gemma3 SWA KV cache shift (#12373) Georgi Gerganov 2025-03-13 19:08:07 +02:00
  • be7c303410
    arg : no n_predict = -2 for examples except for main and infill (#12364) Xuan-Son Nguyen 2025-03-13 12:34:54 +01:00
  • e0dbec0bc6
    llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) Georgi Gerganov 2025-03-13 12:35:44 +02:00
  • 2048b5913d
    server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338) Ishaan Gandhi 2025-03-13 06:10:05 -04:00
  • f08f4b3187
    Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301) Oscar Barenys 2025-03-12 20:06:58 +01:00
  • 80a02aa858
    llama.swiftui : fix xcframework dir in README [no ci] (#12353) Daniel Bevenius 2025-03-12 13:45:32 +01:00
  • 363f8c5d67
    sycl : variable sg_size support for mmvq kernels (#12336) Alberto Cabrera Pérez 2025-03-12 09:57:32 +00:00
  • 34c961b181
    CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315) uvos 2025-03-12 10:14:11 +01:00
  • 7841fc723e
    llama : Add Gemma 3 support (+ experimental vision capability) (#12343) Xuan-Son Nguyen 2025-03-12 09:30:24 +01:00
  • bf69cfe62f
    vulkan: fix bug in coopmat1 mul_mat_id (#12316) Jeff Bolz 2025-03-12 00:59:19 -05:00
  • 10f2e81809
    CUDA/HIP: refractor mmqv to unify the calculation of nwarps and rows per block between host and device code. (#12177) uvos 2025-03-11 20:16:03 +01:00
  • ba7654380a
    ggml-backend : fix backend search path (#12330) jklincn 2025-03-11 21:25:17 +08:00
  • 6ab2e4765a
    metal : Cache the Metal library at the device context level (#12265) BB-fat 2025-03-11 19:45:02 +08:00
  • 96e1280839
    clip : bring back GPU support (#12322) Xuan-Son Nguyen 2025-03-11 09:20:16 +01:00
  • 2c9f833d17
    mat vec double buffer (#12188) Eve 2025-03-10 19:28:11 +00:00
  • 251364549f
    musa: support new arch mp_31 and update doc (#12296) R0CKSTAR 2025-03-11 01:18:25 +08:00
  • 8acdacb3ea
    opencl: use OpenCL C standard supported by the device (#12221) Henry Linjamäki 2025-03-10 18:57:00 +02:00
  • 89b2b56e86
    readme: added Sidekick to available UIs (#12311) John Bean 2025-03-10 22:13:09 +08:00
  • e128a1bf5b
    tests : fix test-quantize-fns to init the CPU backend (#12306) Georgi Gerganov 2025-03-10 14:07:15 +02:00
  • 6ef79a67ca
    common : refactor '-o' option (#12278) marcoStocchi 2025-03-10 12:34:13 +01:00
  • 4e39a3c332
    server: extract <think> tags from qwq outputs (#12297) Olivier Chafik 2025-03-10 10:59:03 +00:00
  • be421fc429
    tool-call: ensure there's always a non-empty tool call id (#12292) Olivier Chafik 2025-03-10 09:45:29 +00:00
  • 87c2630546
    allow missing content in message if tool_calls provided (#12293) Olivier Chafik 2025-03-10 09:45:07 +00:00
  • 2b3a25c212
    sampler: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291) Olivier Chafik 2025-03-10 09:44:42 +00:00
  • 8352cdc87b
    llava : fix bug in minicpm-v code (#11513) tc-mb 2025-03-10 16:33:24 +08:00
  • 1e2f78a004
    server : add speculative decoding presets for FIM (#12287) Georgi Gerganov 2025-03-09 19:08:20 +02:00
  • 0fd7ca7a21
    authors : update (#12271) Georgi Gerganov 2025-03-08 18:26:00 +02:00
  • 6fefc05a7a
    ggml-backend : make path_str compatible with C++20 (#12269) Jason C.H 2025-03-09 00:02:39 +08:00
  • 7ab364390f
    server : infill gen ends on new line (#12254) Georgi Gerganov 2025-03-07 20:54:30 +02:00
  • 7c7f3b7f43
    ggml : skip intermediate .air file when compiling .metallib (#12247) Daniel Bevenius 2025-03-07 14:15:27 +01:00
  • 102ac1891d
    sync : ggml Georgi Gerganov 2025-03-07 14:00:27 +02:00
  • d6ae2fa061
    ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118) vmobilis 2025-03-07 11:11:40 +03:00
  • 68d0027f3d
    ggml-cpu: faster AVX2 variant for IQ1_M (#12216) Rémy O 2025-03-07 12:54:22 +01:00
  • ea002810a2
    ci : fix save-load test invocations (#12245) Georgi Gerganov 2025-03-07 12:19:31 +02:00
  • 8fad3c7a7c
    server : Log original chat template parsing error (#12233) Sigbjørn Skjæret 2025-03-07 11:15:33 +01:00
  • 7cf64f6bee
    sync: minja - support QwQ-32B (#12235) Olivier Chafik 2025-03-07 09:33:37 +00:00
  • 5e2d57b2b2
    metal : simplify kernel arguments using a struct (#3229) (#12194) BB-fat 2025-03-07 15:35:57 +08:00
  • f1648e91cf
    HIP: fix rocWMMA build flags under Windows (#12230) David Huang 2025-03-07 15:06:08 +08:00
  • d6c95b0740
    metal : fix default.metallib build (#12224) Daniel Bevenius 2025-03-07 06:23:16 +01:00
  • d76a86d967
    opencl: Noncontiguous norm, rms_norm, disable fp16 for some ops (#12217) lhez 2025-03-06 16:20:35 -08:00
  • 776f9e59cc
    cmake : fix undefined reference errors for std::filesystem in ggml (#12092) (#12094) xiaofei 2025-03-07 06:58:25 +08:00