Commit graph

  • c837981bba
    py : add Phi-1.5/Phi-2 tokenizer (#9361) daminho 2024-09-12 20:28:20 +09:00
  • 3c26a1644d
    ci : bump actions/checkout to v4 (#9377) Trivikram Kamat 2024-09-12 04:27:45 -07:00
  • ff76e18516
    cmake : fixed the order of linking libraries for llama-quantize (#9450) Michael Podvitskiy 2024-09-12 13:27:14 +02:00
  • 39f852f440
    py : add special tokens in hf_converter for RWKV v6 (#9428) Molly Sophia 2024-09-12 19:25:16 +08:00
  • 2b00fa7997
    riscv : modify Makefile and add a RISCV_VECT to print log info (#9442) Ahmad Tameem 2024-09-12 16:24:31 +05:00
  • d6a04f872d
    ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408) Georgi Gerganov 2024-09-12 14:23:49 +03:00
  • c9c8575a1a
    enhance run script to be easy to change the parameters (#9448) Neo Zhang Jianyu 2024-09-12 17:44:17 +08:00
  • df4b7945ae
    cann: Fix error when running a non-exist op (#9424) Xinpeng Dou 2024-09-12 09:02:35 +08:00
  • 449ccfb6f5
    Add Jais to list of supported models (#9439) Faisal Zaghloul 2024-09-11 20:29:53 -04:00
  • 1b28061400
    llama : skip token bounds check when evaluating embeddings (#9437) slaren 2024-09-11 17:52:13 +02:00
  • 8db003a19d
    py : support converting local models (#7547) Pavel Zloi 2024-09-11 15:29:51 +03:00
  • 0996c5597f
    llava : correct args for minicpmv-cli (#9429) Xuan Son Nguyen 2024-09-11 12:59:13 +02:00
  • 5bb2c5dbd2
    files : remove accidentally added lora_test submodule (#9430) Xuan Son Nguyen 2024-09-11 12:02:09 +02:00
  • 67155ab7f5
    feat: Implements retrying logic for downloading models using --model-url flag (#9255) Farbod Bijary 2024-09-11 12:52:37 +03:30
  • 5af118efda
    CUDA: fix --split-mode row race condition (#9413) Johannes Gäßler 2024-09-11 10:22:40 +02:00
  • d2b496bff4
    batched-bench : remove unused code (#9305) Georgi Gerganov 2024-09-11 10:03:54 +03:00
  • b34e023480
    musa: remove Clang builtins mapping (#9421) R0CKSTAR 2024-09-11 09:46:55 +08:00
  • 51b6038636
    sycl : update support conditions (#9394) Alberto Cabrera Pérez 2024-09-11 01:53:42 +01:00
  • cb9c933eb2
    flake.lock: Update (#9360) Georgi Gerganov 2024-09-11 01:46:59 +03:00
  • 6cd4e03444
    arg : bring back missing ifdef (#9411) Xuan Son Nguyen 2024-09-10 22:41:29 +02:00
  • 8d300bd35f
    enable --special arg for llama-server (#9419) matteo 2024-09-10 22:40:59 +02:00
  • 49006c67b4
    llama : move random seed generation to the samplers (#9398) slaren 2024-09-10 18:04:25 +02:00
  • 00ba2ff781
    metal : fix compile warning with GGML_METAL_NDEBUG (#0) Georgi Gerganov 2024-09-10 10:17:03 +03:00
  • 83008b7cfe
    llama : update llm_build_copy_mask_state comment [no ci] (#9385) Daniel Bevenius 2024-09-10 09:03:21 +02:00
  • 0b4ac75772
    RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387) Molly Sophia 2024-09-10 15:02:30 +08:00
  • fb3f249815
    make : do not run llama-gen-docs when building (#9399) slaren 2024-09-10 08:23:33 +02:00
  • bfe76d4a17
    common : move arg parser code to arg.cpp (#9388) Xuan Son Nguyen 2024-09-09 23:36:09 +02:00
  • 293bebe077
    rpc : fix segfault with nkvo (#9389) Radoslav Gerganov 2024-09-09 18:40:10 +03:00
  • 5fac4d5764
    ggml : vector length agnostic SVE support (#9290) Prashant Vithule 2024-09-09 21:07:18 +05:30
  • 5fb5e24811
    llama : minor sampling refactor (2) (#9386) slaren 2024-09-09 17:10:46 +02:00
  • 38ca6f644b
    readme : update hot topics Georgi Gerganov 2024-09-09 15:51:37 +03:00
  • 8e6e2fbe14
    CUDA: fix variable name conflict for Windows build (#9382) Johannes Gäßler 2024-09-09 14:22:53 +02:00
  • 5ed087573e
    readme : add LLMUnity to UI projects (#9381) Antonis Makropoulos 2024-09-09 14:21:38 +03:00
  • 54f376d0b9
    rpc : update README [no ci] (#9320) Radoslav Gerganov 2024-09-09 11:04:39 +03:00
  • b2e89a3274
    Arm AArch64: Documentation updates (#9321) Dan Johansson 2024-09-09 09:02:45 +02:00
  • daa9623ab0
    Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118) Markus Tavenrath 2024-09-08 21:43:48 +02:00
  • e079bffb66
    cuda : fix FA Q src index (1 -> 0) (#9374) Georgi Gerganov 2024-09-08 22:01:02 +03:00
  • 3f7ccfd649
    common : bring back missing args, add env var duplication check (#9375) Xuan Son Nguyen 2024-09-08 18:08:55 +02:00
  • a249843d89
    common : restore --n-gpu-layers (#9371) slaren 2024-09-08 16:44:42 +02:00
  • 19f4a7b296
    llama : refactor samplers internal implementation (#9370) slaren 2024-09-08 15:52:07 +02:00
  • 2a358fb0c4
    [SYCL] add check malloc result on device (#9346) Neo Zhang Jianyu 2024-09-08 19:05:29 +08:00
  • eae597182c
    llama : sanitize tokens in the upper bound (#9359) slaren 2024-09-08 12:41:51 +02:00
  • 00b02bb249
    imatrix : fix arg parser for imatrix (#9366) Xuan Son Nguyen 2024-09-08 12:12:17 +02:00
  • a876861455
    metal : update support condition for im2col + fix warning (#0) Georgi Gerganov 2024-09-08 09:57:57 +03:00
  • 385decbd63
    sync : ggml Georgi Gerganov 2024-09-08 09:38:56 +03:00
  • 60a3107ccd
    scripts : option to increase git patch context Georgi Gerganov 2024-09-08 09:38:42 +03:00
  • 406c1a32a1
    vulkan: add dryrun support to sin and cos ops (ggml/947) Salvatore Mesoraca 2024-09-06 14:34:25 +02:00
  • 9cb9260861
    vulkan: correctly report support for OP_CONT (ggml/946) Salvatore Mesoraca 2024-09-06 14:34:07 +02:00
  • 202084d31d
    tests: add gradient tests for all backends (ggml/932) Johannes Gäßler 2024-09-03 17:21:46 +02:00
  • dbbebcab33
    ggml: fix ggml_graph_cpy undefined behavior (ggml/943) Johannes Gäßler 2024-08-31 14:35:42 +02:00
  • ba1cf846ed
    cann : fix doxy (ggml/0) Georgi Gerganov 2024-08-28 18:45:01 +03:00
  • d2d3200b38
    cann : add Ascend NPU support (whisper/2336) Mengqing Cao 2024-08-09 20:21:56 +08:00
  • 51d964a4ef
    cuda : mark BF16 CONT as unsupported Georgi Gerganov 2024-08-28 17:08:03 +03:00
  • efe6a83e30
    ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934) Salvatore Mesoraca 2024-08-28 10:23:02 +02:00
  • fbb7fcffbc
    llama : set attrs of mislabelled EOT/EOM tokens (#9348) Kevin Gibbons 2024-09-07 22:51:00 -07:00
  • a5b5d9a101
    llama.android : fix build (#9350) Georgi Gerganov 2024-09-08 00:33:50 +03:00
  • f12295b8a9
    llama : fix empty ring buffer push (#9358) Georgi Gerganov 2024-09-08 00:33:33 +03:00
  • faf69d4237
    llama : sanitize invalid tokens (#9357) Georgi Gerganov 2024-09-08 00:33:13 +03:00
  • e536426ded
    llamafile : disable sgemm for batch-size 1 (#9330) Eve 2024-09-07 19:02:26 +00:00
  • 1b9ae5189c
    common : refactor arg parser (#9308) Xuan Son Nguyen 2024-09-07 20:43:51 +02:00
  • e32d0816ed
    ggml : always check bounds on get_rows operations (#9354) slaren 2024-09-07 20:23:07 +02:00
  • df270ef745
    llama : refactor sampling v2 (#9294) Georgi Gerganov 2024-09-07 15:16:19 +03:00
  • 947538acb8
    ggml : fix missing cpu_set_t on emscripten (#9336) Xuan Son Nguyen 2024-09-07 12:01:34 +02:00
  • 6c89eb0b47
    ci : disable rocm image creation (#9340) slaren 2024-09-07 09:48:54 +02:00
  • 9b2c24c099
    server : simplify state machine for slot (#9283) Xuan Son Nguyen 2024-09-06 23:21:29 +02:00
  • 134bc38ecf
    llama-bench : log benchmark progress (#9287) Aarni Koskela 2024-09-07 00:03:01 +03:00
  • 815b1fb20a
    batched-bench : add --output-format jsonl option (#9293) Aarni Koskela 2024-09-06 18:59:58 +03:00
  • 409dc4f8bb
    ggml : fix build break for the vulkan-debug (#9265) Changyeon Kim 2024-09-06 21:54:50 +09:00
  • 4a1411b4f1
    server : fix missing lock (#9334) Xuan Son Nguyen 2024-09-06 14:06:04 +02:00
  • 8ebe8ddebd
    Improve Vulkan shader build system (#9239) Markus Tavenrath 2024-09-06 08:56:17 +02:00
  • 9bc6db28d0
    ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) compilade 2024-09-05 21:48:47 -04:00
  • 32b2ec88bc
    Update build.yml (#9184) awatuna 2024-09-06 06:34:36 +08:00
  • 1031771faa
    CMake fix: host for msvc compiler can only be x86 or x64 (#8624) Michael Podvitskiy 2024-09-06 00:14:12 +02:00
  • 4db04784f9
    cuda : fix defrag with quantized KV (#9319) slaren 2024-09-05 11:13:11 +02:00
  • bdf314f38a
    llama-bench : fix NUL terminators in CPU name (#9313) slaren 2024-09-05 02:19:39 +02:00
  • 581c305186
    ggml : AVX2 support for Q4_0_8_8 (#8713) Srihari-mcw 2024-09-04 22:21:22 +05:30
  • 5910ea9427
    [SYCL] Fix DMMV dequantization (#9279) Ouadie EL FAROUKI 2024-09-04 16:26:33 +01:00
  • c8671ae282
    Fix broken links in docker.md (#9306) 杨朱 · Kiki 2024-09-04 19:45:28 +08:00
  • 82e3b03c11
    rpc : make RPC servers come first in the device list (#9296) Radoslav Gerganov 2024-09-04 11:08:32 +03:00
  • 9379d3cc17
    readme : rename result_format to response_format (#9300) Pascal Patry 2024-09-04 02:45:40 -04:00
  • 7605ae7daf
    flake.lock: Update (#9261) Georgi Gerganov 2024-09-04 02:36:43 +03:00
  • 8962422b1c
    llama-bench : add JSONL (NDJSON) output mode (#9288) Aarni Koskela 2024-09-03 20:58:54 +03:00
  • b69a480af4
    readme : refactor API section + remove old hot topics Georgi Gerganov 2024-09-03 10:00:36 +03:00
  • 48baa61ecc
    server : test script : add timeout for all requests (#9282) Xuan Son Nguyen 2024-09-02 22:08:38 +02:00
  • f1485161e5
    src: make tail invalid when kv cell is intersection for mamba (#9249) Zhenwei Jin 2024-09-03 01:53:23 +08:00
  • 048de848ee
    docker : fix missing binaries in full-cuda image (#9278) slaren 2024-09-02 18:11:13 +02:00
  • f771d064a9
    ggml : add pthread includes on FreeBSD (#9258) yuri@FreeBSD 2024-09-02 08:25:30 -07:00
  • 6e7d133a5f
    server : refactor multitask handling (#9274) Xuan Son Nguyen 2024-09-02 17:11:51 +02:00
  • b60074f1c2
    llama-cli : remove duplicated log message (#9275) Guoliang Hua 2024-09-02 20:36:43 +08:00
  • 9c1ba55733
    build(nix): Package gguf-py (#5664) Tushar 2024-09-02 16:51:01 +05:30
  • c6d4cb4655
    llama : minor style Georgi Gerganov 2024-09-02 11:52:04 +03:00
  • 8f1d81a0b6
    llama : support RWKV v6 models (#8980) Molly Sophia 2024-09-01 22:38:17 +08:00
  • a47667cff4
    nix: fix CUDA build - replace deprecated autoAddOpenGLRunpathHook Echo Nolan 2024-08-22 17:19:14 -04:00
  • ea5d7478b1
    sgemm : improved Q4_0 and Q8_0 performance via 4xN and Mx4 gemm (#8908) Srihari-mcw 2024-08-31 13:50:35 +05:30
  • 49271efbaf
    llama : fix typo in xcda_array_view comment [no ci] (#9132) Daniel Bevenius 2024-08-31 09:50:22 +02:00
  • 0ab30f8d82
    llama : fix llama_split_mode enum values in main_gpu document (#9057) Sutou Kouhei 2024-08-31 03:08:10 +09:00
  • cddae4884c
    Correct typo run_llama2.sh > run-llama2.sh (#9149) 蕭澧邦 2024-08-30 20:10:01 +08:00
  • 7ea8d80d53
    llava : the function "clip" should be int (#9237) tc-mb 2024-08-30 13:21:57 +08:00
  • 42c76d1358
    Threadpool: take 2 (#8672) Faisal Zaghloul 2024-08-29 19:20:53 -04:00
  • 9f7d4bcf5c
    server : fix crash when error handler dumps invalid utf-8 json (#9195) Jan Boon 2024-08-27 18:28:06 +08:00