Commit graph

  • 80ccf5d725
    ci : pin dependency to specific version (#11137) Xuan Son Nguyen 2025-01-08 12:07:20 +01:00
  • a3c1232c3f
    arg : option to exclude arguments from specific examples (#11136) Georgi Gerganov 2025-01-08 12:55:36 +02:00
  • 8cef75c743
    llamafile : ppc64le MMA INT8 implementation (#10912) amritahs-ibm 2025-01-08 16:24:19 +05:30
  • 0d52a69e4b
    ci : fix cmake option (#11125) Georgi Gerganov 2025-01-08 11:29:34 +02:00
  • 02f0430141
    Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117) Mathieu Baudier 2025-01-08 09:18:13 +01:00
  • bec2183f2c
    fix: Vulkan shader gen binary path when Cross-compiling (#11096) ag2s20150909 2025-01-08 16:17:29 +08:00
  • 53ff6b9b9f
    GGUF: C++ refactor, backend support, misc fixes (#11030) Johannes Gäßler 2025-01-07 18:01:58 +01:00
  • 017cc5f446
    ggml-backend : only offload from host buffers (fix) (#11124) Diego Devesa 2025-01-07 16:11:57 +01:00
  • a3d50bc022
    ggml-backend : only offload from host buffers (#11120) Diego Devesa 2025-01-07 12:38:05 +01:00
  • a4dd490069
    rpc : code cleanup (#11107) Radoslav Gerganov 2025-01-07 08:37:02 +02:00
  • c0d6f790d0
    SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087) Akarshan Biswas 2025-01-07 11:56:07 +05:30
  • dc7cef9f37
    llama-run : fix context size (#11094) Eric Curtin 2025-01-06 22:45:28 +00:00
  • ecebbd292d
    llama : remove unused headers (#11109) Georgi Gerganov 2025-01-06 17:52:35 +02:00
  • 96be8c3264
    github : add cmd line field to bug report (#11090) Xuan Son Nguyen 2025-01-06 16:34:49 +01:00
  • e6e7c75d94
    server : fix extra BOS in infill endpoint (#11106) Georgi Gerganov 2025-01-06 15:36:08 +02:00
  • 09186fabbe
    llama : remove check flash_attn with lora (#11104) Xuan Son Nguyen 2025-01-06 13:41:12 +01:00
  • 96a1dc27c3
    llama : prevent system info string accumulation across calls (#11101) Asghar Ghorbani 2025-01-06 12:21:46 +01:00
  • 6369f867a4
    llama : rename missed batch params/vars to ubatch (#10059) Daniel Bevenius 2025-01-06 10:28:17 +01:00
  • 47182dd03f
    llama : update llama_model API names (#11063) Georgi Gerganov 2025-01-06 10:55:18 +02:00
  • 3e6e7a6bc2
    tokenize : escape the prompt (#11058) Georgi Gerganov 2025-01-06 10:54:25 +02:00
  • ae2f606bb5
    mmap : fix fileno macro clash (#11076) Georgi Gerganov 2025-01-06 10:52:38 +02:00
  • 727368c60f
    llama : use LLAMA_TOKEN_NULL (#11062) Georgi Gerganov 2025-01-06 10:52:15 +02:00
  • 5047dd3546
    llama : use _impl suffix instead of _internal (#11060) Georgi Gerganov 2025-01-06 10:52:01 +02:00
  • 46e3556e01
    CUDA: add BF16 support (#11093) Johannes Gäßler 2025-01-06 02:33:52 +01:00
  • b56f079e28
    Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (#11074) 0cc4m 2025-01-04 21:09:59 +01:00
  • 9394bbd484
    llama : Add support for DeepSeek V3 (#11049) fairydreaming 2025-01-04 21:06:11 +01:00
  • f922a9c542
    [GGML][RPC] Support for models with non-512-aligned tensors over RPC. (#11047) matt23654 2025-01-04 16:10:30 +00:00
  • 46be942214
    llama : add support for the cohere2 model architecture (#10900) DAN™ 2025-01-04 09:33:31 -05:00
  • 78c6785175
    sync : ggml Georgi Gerganov 2025-01-04 10:54:01 +02:00
  • 5e3b08d606
    ggml : do not install metal source when embed library (ggml/1054) Georgi Gerganov 2025-01-04 10:53:54 +02:00
  • db68c93b57
    ggml : improve inputs log sched_print_assignments (ggml/1053) Daniel Bevenius 2024-12-19 03:50:12 +01:00
  • c31fc8b966
    fix: Vulkan shader gen binary path (#11037) Gilad S. 2025-01-04 10:17:31 +02:00
  • 4b0c638b9a
    common : disable KV cache shifting automatically for unsupported models (#11053) Molly Sophia 2025-01-03 20:13:18 +08:00
  • e7da954ecc
    metal : avoid uint (#11019) Georgi Gerganov 2025-01-03 11:26:14 +02:00
  • f66f582927
    llama : refactor src/llama.cpp (#10902) Georgi Gerganov 2025-01-03 10:18:53 +02:00
  • 2f0ee84b9b
    server: bench: minor fixes (#10765) Pierrick Hymbert 2025-01-02 18:06:12 +01:00
  • 0da5d86026
    server : allow using LoRA adapters per-request (#10994) Xuan Son Nguyen 2025-01-02 15:05:18 +01:00
  • a45433ba20
    readme : add llama-swap to infrastructure section (#11032) Benson Wong 2025-01-01 23:14:54 -08:00
  • 0827b2c1da
    ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027) Srihari-mcw 2024-12-31 19:53:33 +05:30
  • 45095a61bf
    server : clean up built-in template detection (#11026) Xuan Son Nguyen 2024-12-31 15:22:01 +01:00
  • 5896c65232
    server : add OAI compat for /v1/completions (#10974) Xuan Son Nguyen 2024-12-31 12:34:13 +01:00
  • bc7b1f8632
    convert : fix Llama-3_1-Nemotron-51B rope settings (#11008) ymcki 2024-12-31 19:04:48 +08:00
  • 6e1531aca5
    common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON (#11013) Peter 2024-12-31 11:46:06 +11:00
  • 716bd6dec3
    vulkan: optimize mul_mat for small values of N (#10991) Jeff Bolz 2024-12-30 11:27:11 -06:00
  • c250ecb315
    android : fix llama_batch free (#11014) ag2s20150909 2024-12-30 20:35:13 +08:00
  • a813badbbd
    vulkan: im2col and matmul optimizations for stable diffusion (#10942) Jeff Bolz 2024-12-29 03:16:34 -06:00
  • fdd2188912
    vulkan: Use push constant offset to handle misaligned descriptors (#10987) Jeff Bolz 2024-12-29 02:35:11 -06:00
  • f865ea149d
    server: added more docs for response_fields field (#10995) Isaac McFadyen 2024-12-28 10:09:19 -05:00
  • 16cdce7b68
    server : fix token duplication when streaming with stop strings (#10997) Alexey Parfenov 2024-12-28 15:08:54 +00:00
  • d79d8f39b4
    vulkan: multi-row k quants (#10846) Eve 2024-12-26 10:54:44 -05:00
  • d283d02bf2
    examples, ggml : fix GCC compiler warnings (#10983) Peter 2024-12-27 00:59:11 +11:00
  • 9ba399dfa7
    server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) Reza Kakhki 2024-12-24 21:33:04 +01:00
  • 2cd43f4900
    ggml : more performance with llamafile tinyblas on x86_64 (#10714) Djip007 2024-12-24 18:54:49 +01:00
  • 09fe2e7613
    server: allow filtering llama server response fields (#10940) NeverLucky 2024-12-24 19:39:49 +03:00
  • 30caac3a68
    llama : the WPM vocabs use the CLS token as BOS (#10930) Georgi Gerganov 2024-12-24 09:44:20 +02:00
  • 60cfa728e2
    ggml : use wstring for backend search paths (#10960) Diego Devesa 2024-12-24 04:05:27 +01:00
  • 3327bb0f8d
    ggml : fix arm enabled features check (#10961) Diego Devesa 2024-12-24 04:05:17 +01:00
  • 32d6ee6385
    ggml : fix const usage in SSE path (#10962) Diego Devesa 2024-12-23 20:25:52 +01:00
  • 14b699ecde
    server : fix missing model id in /model endpoint (#10957) Xuan Son Nguyen 2024-12-23 12:52:25 +01:00
  • 485dc01214
    server : add system_fingerprint to chat/completion (#10917) Xuan Son Nguyen 2024-12-23 12:02:44 +01:00
  • 86bf31cfe6
    rpc-server : add support for the SYCL backend (#10934) Radoslav Gerganov 2024-12-23 10:39:30 +02:00
  • b92a14a841
    llama : support InfiniAI Megrez 3b (#10893) Yun Dou 2024-12-23 08:35:44 +08:00
  • 6f0c9e034b
    llama : support for Llama-3_1-Nemotron-51B (#10669) ymcki 2024-12-23 08:22:33 +08:00
  • dab76c92cc
    llama-run : include temperature option (#10899) Eric Curtin 2024-12-23 00:21:40 +00:00
  • 7024d59e6a
    ggml : fix run-time on FreeBSD in get_executable_path() (#10948) yuri@FreeBSD 2024-12-22 16:20:11 -08:00
  • 7c0e285858
    devops : add docker-multi-stage builds (#10832) Rudi Servo 2024-12-22 21:22:58 -01:00
  • 7ae33a616f
    llama : add Falcon3 support (#10883) Billel Mokeddem 2024-12-23 01:09:58 +03:00
  • ebdee9478c
    vulkan: build fixes for 32b (#10927) Jeff Bolz 2024-12-22 03:44:01 -06:00
  • 5cd85b5e00
    convert : add BertForMaskedLM (#10919) Georgi Gerganov 2024-12-21 10:10:18 +02:00
  • a91a41364b
    vulkan: optimize coopmat2 dequant functions (#10855) Jeff Bolz 2024-12-21 01:04:45 -06:00
  • e34c5af43f
    ggml-cpu: replace NEON asm with intrinsics in ggml_gemv_q4_0_4x8_q8_0() (#10874) Adrien Gallouët 2024-12-21 00:33:37 +01:00
  • eb5c3dc64b
    SYCL: Migrate away from deprecated ggml_tensor->backend (#10840) Akarshan Biswas 2024-12-20 21:01:28 +05:30
  • 0ca416c91a
    server : (UI) fix copy to clipboard function (#10916) Xuan Son Nguyen 2024-12-20 14:12:06 +01:00
  • 21ae3b9be8
    ggml : add test for SVE and disable when it fails (#10906) Diego Devesa 2024-12-20 13:31:28 +01:00
  • 0a11f8b7b5
    convert : fix RWKV v6 model conversion (#10913) Molly Sophia 2024-12-20 17:44:58 +08:00
  • d408bb9268
    clip : disable GPU support (#10896) Georgi Gerganov 2024-12-19 18:47:15 +02:00
  • 5cab3e4aaa
    llama : minor grammar refactor (#10897) Georgi Gerganov 2024-12-19 17:42:13 +02:00
  • 36319dec5d
    tts : small QoL for easy model fetch (#10903) Georgi Gerganov 2024-12-19 17:35:15 +02:00
  • 57bb2c40cd
    server : fix logprobs, make it OAI-compatible (#10783) Xuan Son Nguyen 2024-12-19 15:40:08 +01:00
  • a3c33b1dce
    ggml: fix arm build with gcc (#10895) Adrien Gallouët 2024-12-19 14:20:41 +01:00
  • 2fffc52b50
    llama : fix Roberta embeddings (#10856) Sukriti Sharma 2024-12-19 06:04:51 -07:00
  • 7585edbdeb
    convert : Add support for Microsoft Phi-4 model (#10817) fairydreaming 2024-12-19 10:37:12 +01:00
  • cd920d0ac3
    tests: disable GGUF test for bad value size (#10886) Johannes Gäßler 2024-12-19 08:53:58 +01:00
  • 7909e8588d
    llama-run : improve progress bar (#10821) Eric Curtin 2024-12-19 02:58:00 +00:00
  • 9177484f58
    ggml : fix arm build (#10890) Diego Devesa 2024-12-18 23:21:42 +01:00
  • 0bf2d10c55
    tts : add OuteTTS support (#10784) Georgi Gerganov 2024-12-18 19:27:21 +02:00
  • 7bbb5acf12
    server: avoid overwriting Authorization header (#10878) Gaetan Bisson 2024-12-18 04:00:07 -10:00
  • 152610eda9
    server : output embeddings for all tokens when pooling = none (#10861) Georgi Gerganov 2024-12-18 13:01:41 +02:00
  • 0e70ba686e
    server : add "tokens" output (#10853) Georgi Gerganov 2024-12-18 11:05:29 +02:00
  • 46828872c3
    server : (embeddings) using same format for "input" and "content" (#10872) Xuan Son Nguyen 2024-12-18 09:55:09 +01:00
  • 6b064c92b4
    docs: Fix HIP (née hipBLAS) in README (#10880) redbeard 2024-12-18 00:35:00 -08:00
  • 4da69d1abd
    Revert "llama : add Falcon3 support (#10864)" (#10876) Diego Devesa 2024-12-18 01:36:46 +01:00
  • d62b532c52
    Use model->gguf_kv for loading the template instead of using the C API. (#10868) DAN™ 2024-12-17 17:24:22 -05:00
  • 081b29bd2a
    tests: add tests for GGUF (#10830) Johannes Gäßler 2024-12-17 19:09:35 +01:00
  • 5437d4aaf5
    sync : ggml Georgi Gerganov 2024-12-17 18:36:02 +02:00
  • 78f766768d
    cmake : fix "amd64" processor string (whisper/2638) Georgi Gerganov 2024-12-17 18:34:32 +02:00
  • 8dd19a4812
    vulkan : fix soft_max.comp division by zero (whisper/2633) gn64 2024-12-16 19:34:38 +09:00
  • 130d0c90bd
    ggml : remove return from ggml_gallocr_allocate_node (ggml/1048) Daniel Bevenius 2024-12-14 03:23:08 +01:00
  • 3919da8e33
    ggml : add check for grad_accs (ggml/1046) Daniel Bevenius 2024-12-13 08:19:38 +01:00
  • 0006f5a74a
    ggml : update ggml_backend_cpu_device_supports_op (#10867) Georgi Gerganov 2024-12-17 18:35:42 +02:00