Commit graph

  • 9596506965
    kv-cache : fix split_equal handling in unified implementation (#14130) Georgi Gerganov 2025-06-12 10:02:15 +03:00
  • a20b2b05bc
    context : round n_tokens to next multiple of n_seqs when reserving (#14140) compilade 2025-06-12 02:56:04 -04:00
  • 2e89f76b7a
    common: fix issue with regex_escape routine on windows (#14133) bandoti 2025-06-11 17:19:44 -03:00
  • 532802f938
    Implement GGML_CPU_ALL_VARIANTS for ARM (#14080) Christian Kastner 2025-06-11 19:07:44 +00:00
  • d4e0d95cf5
    chore : clean up relative source dir paths (#14128) Sigbjørn Skjæret 2025-06-11 19:04:23 +02:00
  • cc66a7f78f
    tests : add test-tokenizers-repo (#14017) Sigbjørn Skjæret 2025-06-11 17:16:32 +02:00
  • bd248d4dc7
    vulkan: Better thread-safety for command pools/buffers (#14116) Jeff Bolz 2025-06-11 09:48:52 -05:00
  • 7781e5fe99
    webui: Wrap long numbers instead of infinite horizontal scroll (#14062) Aman 2025-06-11 22:42:25 +08:00
  • 89a184fa71
    kv-cache : relax SWA masking condition (#14119) Georgi Gerganov 2025-06-11 16:48:45 +03:00
  • 2baf07727f
    server : pass default --keep argument (#14120) Taylor 2025-06-11 06:43:43 -04:00
  • 7ae2932116
    kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121) Georgi Gerganov 2025-06-11 12:52:45 +03:00
  • 1f7d50b293
    vulkan: Track descriptor pools/sets per-context (#14109) Jeff Bolz 2025-06-11 00:19:25 -05:00
  • 4c763c8d1b
    opencl: add mul_mv_id_q4_0_f32_8x_flat (#14003) lhez 2025-06-10 16:55:58 -07:00
  • dad5c44398
    kv-cache : avoid modifying recurrent cells when setting inputs (#13834) compilade 2025-06-10 18:20:14 -04:00
  • 55f6b9fa65
    convert : fix duplicate key DeepSeek-R1 conversion error (#14103) Sigbjørn Skjæret 2025-06-10 23:29:52 +02:00
  • 3678b838bb
    llama : support GEGLU for jina-bert-v2 (#14090) Sigbjørn Skjæret 2025-06-10 18:02:08 +02:00
  • 652b70e667
    vulkan: force device 0 in CI (#14106) Jeff Bolz 2025-06-10 10:53:47 -05:00
  • 3a12db23b6
    Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104) Juk Armstrong 2025-06-10 16:48:07 +01:00
  • ae92c1855b
    sync : ggml Georgi Gerganov 2025-06-10 17:37:45 +03:00
  • b7ce1ad1e3
    ggml : fix weak alias win32 (whisper/0) Georgi Gerganov 2025-06-10 11:34:10 +03:00
  • 97340b4c99
    Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (#14099) 0cc4m 2025-06-10 14:01:33 +02:00
  • 2bb0467043
    rpc : nicer error messages for RPC server crash (#14076) Isaac McFadyen 2025-06-10 02:41:01 -04:00
  • b8e2194efc
    sync : ggml Georgi Gerganov 2025-06-10 09:20:51 +03:00
  • 1a3b5e80f7
    Add in-build ggml::ggml ALIAS library (ggml/1260) Kai Pastor 2025-06-03 12:33:28 +02:00
  • 1f63e75f3b
    metal : use less stack memory in FA kernel (#14088) Georgi Gerganov 2025-06-09 23:05:02 +03:00
  • 40cbf571c9
    kv-cache : fix shift and defrag logic (#14081) Georgi Gerganov 2025-06-09 23:04:35 +03:00
  • 7f4fbe5183
    llama : allow building all tests on windows when not using shared libs (#13980) Diego Devesa 2025-06-09 11:03:09 -07:00
  • f470bc36be
    ggml-cpu : split arch-specific implementations (#13892) xctan 2025-06-09 22:47:13 +08:00
  • 8f47e25f56
    cuda : fix device sync on buffer clear (#14033) Diego Devesa 2025-06-09 07:36:26 -07:00
  • 201b31dc2e
    graph : fix geglu (#14077) Georgi Gerganov 2025-06-09 17:17:31 +03:00
  • e21d2d4ae2
    CANN: Simplify the environment variable setting(#13104) Xinpeng Dou 2025-06-09 19:47:39 +08:00
  • dc0623fddb
    webui: fix sidebar being covered by main content (#14082) R0CKSTAR 2025-06-09 18:01:17 +08:00
  • 87d34b381d
    server : fix LRU check (#14079) Georgi Gerganov 2025-06-09 12:57:58 +03:00
  • b460d16ae8
    sycl: Add reorder to Q6_K mmvq implementation (#13885) Nicolò Scipione 2025-06-09 11:47:07 +02:00
  • 91a8ee6a6f
    add geglu activation function (#14074) Đinh Trọng Huy 2025-06-09 13:15:31 +09:00
  • 056eb74534
    CANN: Enable labeler for Ascend NPU (#13914) Yuanhao Ji 2025-06-09 11:20:06 +08:00
  • 247e5c6e44
    cuda : fix buffer type check with integrated GPUs (#14069) Diego Devesa 2025-06-08 11:39:56 -07:00
  • 5787b5da57
    ci: add LoongArch cross-compile build (#13944) 吴小白 2025-06-07 21:39:11 +08:00
  • 228f34c9ce
    SYCL: Implement few same quantized type copy kernels (#13739) Akarshan Biswas 2025-06-07 18:58:20 +05:30
  • 0974ad7a7c
    llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050) Sigbjørn Skjæret 2025-06-07 14:13:12 +02:00
  • 745aa5319b
    llama : deprecate llama_kv_self_ API (#14030) Georgi Gerganov 2025-06-06 14:11:15 +03:00
  • 487a5e0401
    context : fix SWA-related warning for multiple sequences (#14045) Georgi Gerganov 2025-06-06 13:29:18 +03:00
  • d17a809ef0
    llama : support multiple classifier outputs and labels (#13940) Sigbjørn Skjæret 2025-06-06 09:03:25 +02:00
  • 1caae7fc6c
    gguf-py : add add_classifier_output_labels method to writer (#14031) Sigbjørn Skjæret 2025-06-05 17:42:31 +02:00
  • 669c13e0f6
    vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001) Masato Nakasaka 2025-06-05 23:00:29 +09:00
  • 146b88e8b3
    ci: fix CUDA build failure on autodl cloud machines (#14005) pockers21 2025-06-05 06:25:29 -07:00
  • 7f37b6cf1e
    memory : migrate from llama_kv_cache to more generic llama_memory (#14006) Georgi Gerganov 2025-06-05 15:29:22 +03:00
  • 3a077146a4
    llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) Diego Devesa 2025-06-05 02:57:42 -07:00
  • d01d112abb
    readme : add badge (#13938) Olexandr88 2025-06-05 10:50:55 +03:00
  • 9f47fa5792
    vocab : warn about missing mask token (#14022) Sigbjørn Skjæret 2025-06-05 09:29:18 +02:00
  • 9e31bec4fd
    context : fix pos_min initialization upon error decode (#14008) Georgi Gerganov 2025-06-05 09:06:29 +03:00
  • 5a8ae3053c
    vulkan: automatically deduce size of push constants (#13936) Jeff Bolz 2025-06-05 00:17:58 -05:00
  • 0d3984424f
    ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813) Ervin Áron Tasnádi 2025-06-04 22:02:00 +02:00
  • 3e63a58ef7
    kv-cache : refactor the update/defrag mechanism (#13988) Georgi Gerganov 2025-06-04 18:58:20 +03:00
  • 2589ad3704
    ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997) Diego Devesa 2025-06-04 06:37:40 -07:00
  • 482548716f
    releases : use dl backend for linux release, remove arm64 linux release (#13996) Diego Devesa 2025-06-04 04:15:54 -07:00
  • 3ac67535c8
    llama-graph : use ggml_repeat_4d (#13998) Xuan-Son Nguyen 2025-06-04 10:11:26 +02:00
  • 0b4be4c435
    CUDA: fix FTZ in FA for Gemma 3 (#13991) Johannes Gäßler 2025-06-04 08:57:05 +02:00
  • e0e806f52e
    kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985) Georgi Gerganov 2025-06-04 09:50:32 +03:00
  • 7e00e60ef8
    vulkan: fix warnings in perf logger querypool code (#13937) Jeff Bolz 2025-06-03 13:30:22 -05:00
  • ea1431b0fa
    docs : add "Quick start" section for new users (#13862) Xuan-Son Nguyen 2025-06-03 13:09:36 +02:00
  • 71e74a3ac9
    opencl: add backend_synchronize (#13939) lhez 2025-06-02 16:54:58 -07:00
  • bfb1e012a0
    OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840) rmatif 2025-06-02 23:53:36 +00:00
  • 3637576288
    server : disable speculative decoding for SWA models (#13970) Georgi Gerganov 2025-06-02 21:34:40 +03:00
  • ea394d7ab1
    metal : use F32 accumulators in FA kernels (#13975) Georgi Gerganov 2025-06-02 21:33:40 +03:00
  • 5582c49c39
    gemma : more consistent attention scaling for v2 and v3 (#13951) Georgi Gerganov 2025-06-02 20:54:26 +03:00
  • c9bbc77931
    server: update deepseek reasoning format (pass reasoning_content as diffs) (#13933) Olivier Chafik 2025-06-02 10:15:44 -07:00
  • bfd322796c
    mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961) Xuan-Son Nguyen 2025-06-02 16:29:28 +02:00
  • 093e3f1feb
    cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966) shalinib-ibm 2025-06-02 17:48:36 +05:30
  • 663445b0de
    sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826) Atharva Dubey 2025-06-02 10:12:20 +01:00
  • 7675c555a1
    gguf: fix failure on version == 0 (#13956) Johannes Gäßler 2025-06-01 18:08:05 +02:00
  • 5e1c3aed40
    convert : fix nomic-bert-moe mask token (#13757) Sigbjørn Skjæret 2025-06-01 18:07:21 +02:00
  • c496fe0b1d
    convert : fix vocab padding code for bert models (#13954) Sigbjørn Skjæret 2025-06-01 17:23:11 +02:00
  • e57bb87ced
    ggml: check if non-native endian model is being loaded (#13943) Aaron Teo 2025-06-01 22:53:57 +08:00
  • f3a4b1659c
    sync : ggml Georgi Gerganov 2025-06-01 12:23:14 +03:00
  • 108009f5c7
    vulkan : Remove unexpected ; (ggml/1253) Kai Pastor 2025-05-31 12:49:55 +02:00
  • d337252acf
    cmake : Fix broken CMake error messages (ggml/1252) Kai Pastor 2025-05-31 12:39:19 +02:00
  • af6f91db47
    ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247) Radoslav Gerganov 2025-05-30 09:11:09 +03:00
  • a7b8d35f78
    sync : whisper.cpp (ggml/1250) Georgi Gerganov 2025-05-29 13:29:50 +03:00
  • 6eba72b71c
    ggml : install dynamic backends (ggml/1240) Radoslav Gerganov 2025-05-29 08:34:46 +03:00
  • fedf034a98
    ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) Daniel Tang 2025-05-27 20:58:46 -04:00
  • 8726392d3d
    readme : update bindings (#13950) ddh0 2025-06-01 03:44:30 -05:00
  • c04621711a
    parallel : fix n_junk == 0 (#13952) Georgi Gerganov 2025-06-01 11:42:16 +03:00
  • 0fc16b42e8
    kv-cache : split implementation in separate sources (#13920) Georgi Gerganov 2025-06-01 11:39:27 +03:00
  • 053b1539c0
    threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995) Max Krasnyansky 2025-05-31 15:39:19 -07:00
  • b3a89c3d9e
    docs : Note about necessity of having libcurl installed for standard build. (#13945) Jiří Podivín 2025-05-31 18:58:35 +02:00
  • e15898d1c7
    server: allow unclosed thinking tags (#13931) Olivier Chafik 2025-05-31 08:26:10 -07:00
  • 803f8baf4f
    llama : deprecate explicit kv_self defrag/update calls (#13921) Georgi Gerganov 2025-05-31 15:58:33 +03:00
  • 3600cc2886
    llama : use n_swa + n_ubatch cells for SWA cache (#13833) Georgi Gerganov 2025-05-31 15:57:44 +03:00
  • c7e0a2054b
    webui : Replace alert and confirm with custom modals. (#13711) igardev 2025-05-31 12:56:08 +03:00
  • 3f55f781f1
    llama : auto-batch preparation (#13845) Georgi Gerganov 2025-05-31 12:55:57 +03:00
  • 51fa76f172
    mtmd : drop _shared from libmtmd name, merge helpers into libmtmd (⚠️ breaking change) (#13917) Xuan-Son Nguyen 2025-05-31 10:14:29 +02:00
  • 12d0188c0d
    kv-cache : refactor + add llama_memory_state_i (#13746) Georgi Gerganov 2025-05-31 10:24:04 +03:00
  • eb3949938e
    CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) (#13895) Shawn yang 2025-05-31 14:48:04 +08:00
  • e562eece7c
    CUDA: fix typo in FlashAttention code (#13926) Johannes Gäßler 2025-05-30 21:22:03 +02:00
  • b47ab7b8e9
    sched : avoid changing cur_copy when a graph is already allocated (#13922) Diego Devesa 2025-05-30 09:56:19 -07:00
  • dd665cc9d4
    parallel : increase the variability of the prompt lengths (#13927) Georgi Gerganov 2025-05-30 19:38:07 +03:00
  • df0c0c7d02
    cuda : prevent using split buffers with 3d/4d matrices (#13919) Diego Devesa 2025-05-30 07:37:18 -07:00
  • b49a8ff96b
    SYCL: Add mrope kernel (#13755) Akarshan Biswas 2025-05-30 19:40:57 +05:30
  • 53f925074d
    sync : vendor (#13901) Georgi Gerganov 2025-05-30 16:25:45 +03:00