Commit graph

  • 1af6945eb0
    cmake : avoid -march=native when reproducible build is wanted (#11366) Bernhard M. Wiedemann 2025-01-24 12:21:35 +01:00
  • 01f37edf1a
    Update llama-run README.md (#11386) Eric Curtin 2025-01-24 09:39:24 +00:00
  • c07e87f38b
    server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364) stduhpf 2025-01-24 09:02:38 +01:00
  • 564804b79b
    tests: fix some mul_mat test gaps (#11375) Jeff Bolz 2025-01-23 14:51:24 -06:00
  • 05f63cc9ee
    Update documentation (#11373) Eric Curtin 2025-01-23 20:04:31 +00:00
  • f7fb43cd0b
    Add -ngl (#11372) Eric Curtin 2025-01-23 16:16:18 +00:00
  • 5845661640
    server : add more clean up when cancel_tasks is called (#11340) Xuan Son Nguyen 2025-01-23 13:56:05 +01:00
  • f211d1dc10
    Treat hf.co/ prefix the same as hf:// (#11350) Eric Curtin 2025-01-23 10:38:20 +00:00
  • 955a6c2d91
    Vulkan-run-test: fix mmq_wg_denoms (#11343) amd-dwang 2025-01-23 15:14:28 +08:00
  • 1971adf55e
    vulkan: sort shaders for more deterministic binary (#11315) Jeff Bolz 2025-01-23 01:07:50 -06:00
  • 5245729e33
    vulkan: fix diag_mask_inf (#11323) Jeff Bolz 2025-01-23 01:01:17 -06:00
  • 6152129d05
    main : update README documentation for batch size (#11353) Diego Devesa 2025-01-22 19:22:20 +01:00
  • 16d3df7ab0
    readme : add plugin links (#11355) Georgi Gerganov 2025-01-22 19:44:26 +02:00
  • 12c2bdf2de
    server : fix draft context not being released (#11354) Diego Devesa 2025-01-22 17:44:40 +01:00
  • c64d2becb1
    minja: sync at 0f5f7f2b37 (#11352) Olivier Chafik 2025-01-22 16:16:27 +00:00
  • 96f4053934
    Adding logprobs to /v1/completions (#11344) Jiří Podivín 2025-01-22 12:51:32 +01:00
  • a94f3b2727
    common: utils to split / join / repeat strings (from json converter) (#11342) Olivier Chafik 2025-01-22 09:51:44 +00:00
  • 3e3357fd77
    llava : support Minicpm-omni (#11289) tc-mb 2025-01-22 15:35:48 +08:00
  • 6171c9d258
    Add Jinja template support (#11016) Olivier Chafik 2025-01-21 13:18:51 +00:00
  • e28245f35f
    export-lora : fix tok_embd tensor (#11330) Xuan Son Nguyen 2025-01-21 14:07:12 +01:00
  • 6da5bec81c
    rpc : better caching of the base buffer pointer (#11331) Radoslav Gerganov 2025-01-21 15:06:41 +02:00
  • 2e2f8f093c
    linenoise.cpp refactoring (#11301) Eric Curtin 2025-01-21 09:32:35 +00:00
  • 2139667ec4
    metal : fix out-of-bounds write (#11314) Georgi Gerganov 2025-01-21 08:48:13 +02:00
  • 80d0d6b4b7
    common : add -hfd option for the draft model (#11318) Georgi Gerganov 2025-01-20 22:29:43 +02:00
  • aea8ddd516
    vulkan: fix coopmat2 validation failures (#11284) Jeff Bolz 2025-01-20 10:38:32 -06:00
  • 9f7add1cde
    examples : fix add_special conditions (#11311) Georgi Gerganov 2025-01-20 16:36:08 +02:00
  • 90d987b105
    mmap: add include for cerrno (#11296) Christopher Nielsen 2025-01-20 09:02:43 -05:00
  • a4251edd6f
    cmake: fix shell command quoting in build-info script (#11309) Michael Podvitskiy 2025-01-20 15:02:15 +01:00
  • ec7f3ac9ab
    llama : add support for Deepseek-R1-Qwen distill model (#11310) Xuan Son Nguyen 2025-01-20 14:35:07 +01:00
  • ef6dada60c
    cont : fix whitespaces (#11305) Georgi Gerganov 2025-01-20 09:29:32 +02:00
  • ae3c1db2f9
    llama : re-add LLM_ARCH_PHIMOE (#11305) Kyle Bruene 2025-01-20 01:21:01 -06:00
  • 92bc493917
    tests : increase timeout when sanitizers are enabled (#11300) Georgi Gerganov 2025-01-19 20:22:30 +02:00
  • b9daaffe02
    simple-chat : fix BOS being added to each message (#11278) Georgi Gerganov 2025-01-19 18:12:09 +02:00
  • 99487b57d4
    SYCL: Introducing memory host pool (#11251) Nicolò Scipione 2025-01-19 14:33:34 +01:00
  • a1649cc13f
    Adding linenoise.cpp to llama-run (#11252) Eric Curtin 2025-01-18 14:42:31 +00:00
  • 4dd34ff831
    cmake : add sanitizer flags for llama.cpp (#11279) Georgi Gerganov 2025-01-18 16:18:15 +02:00
  • f30f099228
    server : implement cancellable request (#11285) Xuan Son Nguyen 2025-01-18 14:12:05 +01:00
  • f26c874179
    scripts : restore hf.sh (#11288) Georgi Gerganov 2025-01-18 13:18:32 +02:00
  • 6390a998bf
    tts : add guide tokens support (#11186) LostRuins Concedo 2025-01-18 18:20:57 +08:00
  • 44e18ef939
    vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281) Jeff Bolz 2025-01-18 02:26:50 -06:00
  • 3edfa7d375
    llama.android: add field formatChat to control whether to parse special tokens when send message (#11270) codezjx 2025-01-17 20:57:56 +08:00
  • 667d72846c
    rpc : early register backend devices (#11262) Radoslav Gerganov 2025-01-17 10:57:09 +02:00
  • a133566d34
    vocab : fix double-eos check (#11273) Georgi Gerganov 2025-01-17 09:28:00 +02:00
  • 960ec65273
    llama : fix deprecation message: vocabable -> vocab (#11269) David Renshaw 2025-01-17 02:12:01 -05:00
  • 7a689c415e
    README : added kalavai to infrastructure list (#11216) musoles 2025-01-17 00:10:49 +00:00
  • bd38ddea01
    vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166) Jeff Bolz 2025-01-16 15:47:10 -06:00
  • 466300fe14
    vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206) Jeff Bolz 2025-01-16 15:23:49 -06:00
  • 206bc53422
    vulkan: optimize coopmat2 q2_k dequant function (#11130) Jeff Bolz 2025-01-16 15:16:39 -06:00
  • 4dbc8b9cb7
    llama : add internlm3 support (#11233) RunningLeon 2025-01-17 02:10:38 +08:00
  • 9c8dcefe17
    CUDA: backwards pass for misc. ops, add tests (#11257) Johannes Gäßler 2025-01-16 16:43:38 +01:00
  • 681149ced2
    llama : add llama_model_load_from_splits (#11255) Xuan Son Nguyen 2025-01-16 13:54:08 +01:00
  • c67cc9837d
    ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227) fj-y-saito 2025-01-16 18:11:49 +09:00
  • adc5dd92e8
    vulkan: scale caching for k quants + misc fixes (#11081) Eve 2025-01-15 19:50:13 +00:00
  • f11cfdfd7f
    ci : use -no-cnv in gguf-split tests (#11254) Georgi Gerganov 2025-01-15 18:28:35 +02:00
  • 1d8504338e
    fix: ggml: fix vulkan-shaders-gen build (#10448) Junil Kim 2025-01-15 22:17:42 +09:00
  • 432df2d5f9
    RoPE: fix back, CUDA support for back + noncont. (#11240) Johannes Gäßler 2025-01-15 12:51:37 +01:00
  • 0ccd7f3eb2
    examples : add embd_to_audio to tts-outetts.py [no ci] (#11235) Daniel Bevenius 2025-01-15 05:44:38 +01:00
  • f446c2cf6a
    SYCL: Add gated linear attention kernel (#11175) Akarshan Biswas 2025-01-15 08:50:17 +05:30
  • b4d92a59a2
    ci : add -no-cnv for tests (#11238) Xuan Son Nguyen 2025-01-14 15:42:23 +01:00
  • bbf3e55e35
    vocab : add dummy tokens for "no_vocab" type (#11231) Georgi Gerganov 2025-01-14 12:54:58 +02:00
  • c5bf0d1bd7
    server : Improve code snippets direction between RTL text (#11221) ebraminio 2025-01-14 14:09:33 +03:30
  • 091592d758
    Refactor test-chat-template.cpp (#11224) Olivier Chafik 2025-01-14 10:16:41 +00:00
  • 44d1e796d0
    sync : ggml Georgi Gerganov 2025-01-14 10:39:42 +02:00
  • a4f3f5d8e6
    scripts : sync gguf (cont) Georgi Gerganov 2025-01-14 09:40:15 +02:00
  • 48e1ae0e61
    scripts : sync gguf Georgi Gerganov 2025-01-14 09:36:58 +02:00
  • d00a80e89d
    scripts : sync opencl Georgi Gerganov 2025-01-14 09:19:58 +02:00
  • 504af20ee4
    server : (UI) Improve messages bubble shape in RTL (#11220) ebraminio 2025-01-13 22:53:31 +03:30
  • 84a44815f7
    cli : auto activate conversation mode if chat template is available (#11214) Xuan Son Nguyen 2025-01-13 20:18:12 +01:00
  • 39509fb082
    cuda : CUDA Graph Compute Function Refactor (precursor for performance improvements) (#11042) Andreas Kieslinger 2025-01-13 16:45:53 +01:00
  • a29f0870d4
    contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:59:26 +02:00
  • 437e05f714
    server : (UI) Support for RTL text as models input or output (#11208) ebraminio 2025-01-13 17:16:39 +03:30
  • ca001f6656
    contrib : add naming guidelines (cont) (#11177) Georgi Gerganov 2025-01-13 15:08:44 +02:00
  • 00b4c3da62
    common : support tag-based --hf-repo like on ollama (#11195) Xuan Son Nguyen 2025-01-13 13:56:23 +01:00
  • 7426a26b24
    contrib : add naming guidelines (#11177) Georgi Gerganov 2025-01-13 14:46:36 +02:00
  • 8f70fc3d1b
    llama : remove 'd' from bad special token log (#11212) Daniel Bevenius 2025-01-13 13:38:20 +01:00
  • 1244cdcf14
    ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL (#11211) Radoslav Gerganov 2025-01-13 13:31:41 +02:00
  • 924518e2e5
    Reset color before we exit (#11205) Eric Curtin 2025-01-12 18:23:10 +00:00
  • 9a483999a6
    llama : fix chat template gguf key (#11201) Xuan Son Nguyen 2025-01-12 13:45:14 +01:00
  • 08f10f69c3
    llama : remove notion of CLS token (#11064) Georgi Gerganov 2025-01-12 12:15:53 +02:00
  • afa8a9ec9b
    llama : add llama_vocab, functions -> methods, naming (#11110) Georgi Gerganov 2025-01-12 11:32:42 +02:00
  • c05e8c9934
    gguf-py: fixed local detection of gguf package (#11180) Vinesh Janarthanan 2025-01-11 03:42:31 -06:00
  • 2739a71e4b
    convert : sort print supported models [no ci] (#11179) Daniel Bevenius 2025-01-11 05:50:33 +01:00
  • ba8a1f9c5b
    examples : add README.md to tts example [no ci] (#11155) Daniel Bevenius 2025-01-10 13:16:16 +01:00
  • ff3fcabc72
    convert : add --print-supported-models option (#11172) Daniel Bevenius 2025-01-10 11:30:53 +01:00
  • c3f9d25706
    Vulkan: Fix float16 use on devices without float16 support + fix subgroup_size_control validation error (#11161) 0cc4m 2025-01-10 06:39:33 +01:00
  • ee7136c6d1
    llama: add support for QRWKV6 model architecture (#11001) Molly Sophia 2025-01-10 09:58:08 +08:00
  • c6860cc734
    SYCL: Refactor ggml_sycl_compute_forward (#11121) Akarshan Biswas 2025-01-10 05:43:03 +05:30
  • 1204f97270
    doc: add cuda guide for fedora (#11135) Tei Home 2025-01-09 19:32:06 +08:00
  • 8eceb888d7
    server : add tooltips to settings and themes btn (#11154) Daniel Bevenius 2025-01-09 11:28:29 +01:00
  • f8feb4b01a
    model: Add support for PhiMoE arch (#11003) Pierrick Hymbert 2025-01-09 11:21:41 +01:00
  • be0e950c91
    media : remove old img [no ci] Georgi Gerganov 2025-01-09 11:15:15 +02:00
  • d9feae1c06
    llama-chat : add phi 4 template (#11148) Xuan Son Nguyen 2025-01-09 10:07:33 +01:00
  • 8d59d91171
    fix: add missing msg in static_assert (#11143) hydai 2025-01-09 04:03:28 +08:00
  • 8a1d9c25fa
    gguf-py : move scripts directory (#11116) Vinesh Janarthanan 2025-01-08 12:54:58 -06:00
  • 1bf839b1e8
    Enhance user input handling for llama-run (#11138) Eric Curtin 2025-01-08 18:47:05 +00:00
  • f7cd13301c
    ci : use actions from ggml-org (#11140) Xuan Son Nguyen 2025-01-08 16:09:20 +01:00
  • 4d2b3d8804
    lora : improve compat with mergekit-extract-lora (#11131) Xuan Son Nguyen 2025-01-08 15:59:53 +01:00
  • c07d437bbd
    llama : avoid hardcoded QK_K (#11061) Georgi Gerganov 2025-01-08 16:19:36 +02:00
  • 99a3755a3c
    sync : ggml Georgi Gerganov 2025-01-08 13:40:30 +02:00
  • c792dcf488
    ggml : allow loading backend with env variable (ggml/1059) Radoslav Gerganov 2025-01-05 09:50:37 +02:00