Commit graph

  • 1ec208083c
    llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644) SAMI 2025-02-05 14:45:40 +07:00
  • 9f4cc8f8d3
    sync: minja (#11641) Olivier Chafik 2025-02-05 01:00:12 +00:00
  • fd08255d0d
    CUDA: non-contiguous (RMS) norm support (#11659) Johannes Gäßler 2025-02-04 22:21:42 +01:00
  • 3ec9fd4b77
    HIP: force max threads per block to be 1024 (#11621) fxzjshm 2025-02-05 02:18:38 +08:00
  • 3962fc1a79
    server : add try..catch to places not covered by set_exception_handler (#11620) Xuan-Son Nguyen 2025-02-04 18:25:42 +01:00
  • 1bef571f6a
    arg : list RPC devices first when using --list-devices (#11655) Radoslav Gerganov 2025-02-04 18:16:20 +02:00
  • db288b60cb
    tool-call: command r7b fix for normal responses (#11608) Olivier Chafik 2025-02-04 15:48:53 +00:00
  • 106045e7bb
    readme : add llm_client Rust crate to readme bindings (#11628) Shelby Jenkins 2025-02-04 05:20:55 -06:00
  • f117d84b48
    swift : fix llama-vocab api usage (#11645) Jhen-Jie Hong 2025-02-04 19:15:24 +08:00
  • 534c46b53c
    metal : use residency set for other platforms (#11648) Jhen-Jie Hong 2025-02-04 19:07:18 +08:00
  • 387a1598ca
    authors : update Georgi Gerganov 2025-02-04 13:04:10 +02:00
  • 7c9e0ca520
    sync : ggml Georgi Gerganov 2025-02-04 12:59:21 +02:00
  • 8f8290ada9
    cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096) Christian Kastner 2025-02-04 00:17:15 +01:00
  • b34aedd558
    ci : do not stale-close roadmap issues Georgi Gerganov 2025-02-04 09:30:42 +02:00
  • cde3833239
    tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616) Olivier Chafik 2025-02-03 23:49:27 +00:00
  • b3451785ac
    server : (webui) revert hacky solution from #11626 (#11634) Xuan-Son Nguyen 2025-02-04 00:10:52 +01:00
  • 1d1e6a90bc
    server : (webui) allow typing and submitting during llm response (#11626) Woof Dog 2025-02-03 22:16:27 +00:00
  • 5598f475be
    server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622) Daniel Bevenius 2025-02-03 16:45:38 +01:00
  • 8ec05832fa
    sync : ggml Georgi Gerganov 2025-02-03 14:57:08 +02:00
  • 21c84b5d2d
    CUDA: fix Volta FlashAttention logic (#11615) Johannes Gäßler 2025-02-03 13:25:56 +01:00
  • d92cb67e37
    server : (webui) Fix Shift+Enter handling (#11609) mashdragon 2025-02-03 09:42:55 +00:00
  • 6eecde3cc8
    HIP: fix flash_attn_stream_k_fixup warning (#11604) Johannes Gäßler 2025-02-02 23:48:29 +01:00
  • 396856b400
    CUDA/HIP: add support for selectable warp size to mmv (#11519) uvos 2025-02-02 22:40:09 +01:00
  • 4d0598e144
    HIP: add GGML_CUDA_CC_IS_* for AMD families, as increasing cc architectures for AMD GPUs are not supersets of each other (#11601) uvos 2025-02-02 22:08:05 +01:00
  • 90f9b88afb
    nit: more informative crash when grammar sampler fails (#11593) Olivier Chafik 2025-02-02 19:58:34 +00:00
  • 864a0b67a6
    CUDA: use mma PTX instructions for FlashAttention (#11583) Johannes Gäßler 2025-02-02 19:31:09 +01:00
  • 84ec8a58f7
    Name colors (#11573) Eric Curtin 2025-02-02 16:14:48 +01:00
  • bfcce4d693
    tool-call: support Command R7B (+ return tool_plan "thoughts" in API) (#11585) Olivier Chafik 2025-02-02 09:25:38 +00:00
  • 69804487e0
    Fix exotic ci env that lacks ostringstream::str (#11581) Olivier Chafik 2025-02-02 09:10:15 +00:00
  • ff227703d6
    sampling : support for llguidance grammars (#10224) Michał Moskal 2025-02-01 23:55:32 -08:00
  • 0cec062a63
    llama : add support for GLM-Edge and GLM-Edge-V series models (#10573) piDack 2025-02-02 15:48:46 +08:00
  • 53debe6f3c
    ci: use sccache on windows HIP jobs (#11553) Olivier Chafik 2025-02-01 18:22:38 +00:00
  • cfd74c86db
    sync: minja (418a2364b5) (#11574) Olivier Chafik 2025-02-01 12:24:51 +00:00
  • ecef206ccb
    Implement s3:// protocol (#11511) Eric Curtin 2025-02-01 11:30:54 +01:00
  • 5bbc7362cb
    ci: simplify cmake build commands (#11548) Olivier Chafik 2025-02-01 00:01:20 +00:00
  • aa6fb13213
    ci: use sccache on windows instead of ccache (#11545) Olivier Chafik 2025-01-31 17:12:40 +00:00
  • a83f528688
    tool-call: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539) Olivier Chafik 2025-01-31 14:15:25 +00:00
  • b1bcd309fc
    fix stop regression (#11543) Olivier Chafik 2025-01-31 13:48:31 +00:00
  • 5783575c9d
    Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533) Olivier Chafik 2025-01-31 08:24:29 +00:00
  • 4a2b196d03
    server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531) Olivier Chafik 2025-01-31 08:12:40 +00:00
  • 1bd3047a93
    common: Add missing va_end (#11529) Steve Grubb 2025-01-31 00:58:55 -05:00
  • a2df2787b3
    server : update help metrics processing/deferred (#11512) Daniel Bevenius 2025-01-31 06:04:53 +01:00
  • 553f1e46e9
    ci: ccache for all github workflows (#11516) Olivier Chafik 2025-01-30 22:01:06 +00:00
  • 8b576b6c55
    Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) Olivier Chafik 2025-01-30 19:13:58 +00:00
  • 27d135c970
    HIP: require at least HIP 5.5 uvos 2025-01-29 19:36:00 +01:00
  • 6af1ca48cb
    HIP: Prepare reduction operators for wave 64 uvos 2025-01-29 19:12:42 +01:00
  • c300e68ef4
    CUDA/HIP: add warp_size to cuda_device_info uvos 2025-01-29 17:46:23 +01:00
  • 3d804dec76
    sync: minja (#11499) Olivier Chafik 2025-01-30 10:30:27 +00:00
  • ffd0821c57
    vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496) mgroeber9110 2025-01-30 11:10:59 +01:00
  • 4314e56c4f
    server : use lambda instead of std::bind (#11507) Daniel Bevenius 2025-01-30 11:05:00 +01:00
  • 496e5bf46b
    server : (docs) added response format for /apply-template [no ci] (#11503) Isaac McFadyen 2025-01-30 04:11:53 -05:00
  • 7919256c57
    readme : reference examples relative links (#11505) Guspan Tanadi 2025-01-30 12:58:02 +07:00
  • e0449763a4
    server : update json snippets in README.md [no ci] (#11492) Daniel Bevenius 2025-01-30 05:48:14 +01:00
  • eb7cf15a80
    server : add /apply-template endpoint for additional use cases of Minja functionality (#11489) Nigel Bosch 2025-01-29 12:45:44 -06:00
  • 66ee4f297c
    vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360) Rémy Oudompheng 2025-01-29 18:29:39 +01:00
  • e51c47b401
    server : update auto gen files comments [no ci] (#11484) Daniel Bevenius 2025-01-29 16:34:18 +01:00
  • 2711d0215f
    vulkan: Catch pipeline creation failure and print an error message (#11436) Jeff Bolz 2025-01-29 09:26:50 -06:00
  • f0d4b29edf
    Parse https://ollama.com/library/ syntax (#11480) Eric Curtin 2025-01-29 12:23:10 +01:00
  • 815857791d
    sync : ggml Georgi Gerganov 2025-01-29 11:25:29 +02:00
  • 1a0e87d291
    ggml : add option to not print stack on abort (ggml/1081) William Tambellini 2025-01-23 11:59:08 -08:00
  • d2e518e9b4
    ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) issixx 2025-01-17 21:29:08 +09:00
  • b636228c0a
    embedding : enable --no-warmup option (#11475) Daniel Bevenius 2025-01-29 09:38:54 +01:00
  • 325afb370a
    llama: fix missing k_cache store for rwkv6qwen2 (#11445) Molly Sophia 2025-01-29 12:07:21 +08:00
  • 794fe23f29
    cmake: add hints for locating ggml on Windows using Llama find-package (#11466) Emreerdog 2025-01-29 02:22:06 +03:00
  • cf8cc856d7
    server : Fixed wrong function name in llamacpp server unit test (#11473) peidaqi 2025-01-28 16:03:42 -07:00
  • d0c08040b6
    ci : fix build CPU arm64 (#11472) Xuan-Son Nguyen 2025-01-29 00:02:56 +01:00
  • be5ef7963f
    HIP: Suppress transformation warning in softmax.cu uvos 2025-01-28 23:06:32 +01:00
  • cae9fb4361
    HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (#11080) Nikita Sarychev 2025-01-28 07:42:20 -08:00
  • 7fee2889e6
    Add support for pulling via github protocol and http:// (#11465) Eric Curtin 2025-01-28 15:45:41 +01:00
  • d7d1eccacc
    docker: allow installing pip packages system-wide (#11437) Nuno 2025-01-28 15:17:25 +01:00
  • 4bf3119d61
    cmake : don't fail on GGML_CPU=OFF (#11457) someone13574 2025-01-28 09:15:34 -05:00
  • f643120bad
    docker: add perplexity and bench commands to full image (#11438) Nuno 2025-01-28 11:42:32 +01:00
  • 6e84b0ab8e
    SYCL : SOFTMAX F16 mask support and other fixes (#11261) Akarshan Biswas 2025-01-28 15:26:58 +05:30
  • 2b8525d5c8
    Handle missing model in CLI parameters for llama-run (#11399) Michael Engel 2025-01-28 09:32:40 +01:00
  • a4417ddda9
    Add new hf protocol for ollama (#11449) Eric Curtin 2025-01-27 19:36:10 +01:00
  • d6d24cd9ed
    AMD: parse the architecture as supplied by gcnArchName (#11244) Haus1 2025-01-27 08:58:17 -05:00
  • a5203b4465
    llama : minor fixes to speed up llama model loading (#11448) lexasub 2025-01-27 17:42:09 +04:00
  • df984e0147
    llama: refactor llama_decode_impl (#11381) Johannes Gäßler 2025-01-27 12:07:12 +01:00
  • acd38efee3
    metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441) Ihar Hrachyshka 2025-01-27 02:41:59 -05:00
  • caf773f249
    docker : fix ARM build and Vulkan build (#11434) Xuan Son Nguyen 2025-01-26 22:45:32 +01:00
  • 178a7eb952
    metal : use residency sets (#11427) Georgi Gerganov 2025-01-26 20:06:16 +02:00
  • 6f53d8a6b4
    docker: add missing vulkan library to base layer and update to 24.04 (#11422) Nuno 2025-01-26 18:22:43 +01:00
  • 19f65187cb
    cmake: add ggml find package (#11369) bandoti 2025-01-26 12:07:48 -04:00
  • 1d8ee06000
    rpc: fix register position (#11424) Frank Mai 2025-01-26 23:20:34 +08:00
  • 2cc9b8c32c
    readme : update hot topics Georgi Gerganov 2025-01-26 14:30:15 +02:00
  • f35726c2fb
    build: apply MSVC /bigobj option to c/cpp files only (#11423) Jeff Bolz 2025-01-25 20:10:03 -06:00
  • 4a75d19376
    vulkan: compile shaders on-demand (#11406) Jeff Bolz 2025-01-25 15:29:57 -06:00
  • 26771a1491
    HIP: disable VMM on HIP as it seems that it doesn't work in some configurations (#11420) uvos 2025-01-25 21:01:12 +01:00
  • ca6baf76c1
    build: add /bigobj to MSVC build (#11407) Jeff Bolz 2025-01-25 11:26:37 -06:00
  • 6e264a905b
    docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419) Diego Devesa 2025-01-25 17:22:41 +01:00
  • 49b0e3cec4
    server : fix cleaning up stream task (#11418) Xuan Son Nguyen 2025-01-25 16:36:44 +01:00
  • 20a758155b
    docker : fix CPU ARM build (#11403) Diego Devesa 2025-01-25 15:22:29 +01:00
  • 00c24acb2a
    ci : fix line breaks on windows builds (#11409) Georgi Gerganov 2025-01-25 13:36:48 +02:00
  • 466ea66f33
    CANN: Add Ascend CANN build ci (#10217) jiahao su 2025-01-25 07:26:01 +08:00
  • 5f0db9522f
    hip : Add hipGraph and VMM support to ROCM (#11362) uvos 2025-01-25 00:02:23 +01:00
  • c5d9effb49
    CUDA: fix FP16 cuBLAS GEMM (#11396) Johannes Gäßler 2025-01-24 21:02:43 +01:00
  • 9fbadaef4f
    rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356) uvos 2025-01-24 17:50:49 +01:00
  • 9755129c27
    release : pack /lib in the packages (#11392) Georgi Gerganov 2025-01-24 18:41:30 +02:00
  • a07c2c8a52
    docs : Update readme to build targets for local docker build (#11368) Jafar Uruç 2025-01-24 13:30:13 +00:00
  • 8137b4bb2b
    CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380) Johannes Gäßler 2025-01-24 12:38:31 +01:00