Commit graph

  • c837981bba
    py : add Phi-1.5/Phi-2 tokenizer (#9361) daminho 2024-09-12 20:28:20 +09:00
  • 3c26a1644d
    ci : bump actions/checkout to v4 (#9377) Trivikram Kamat 2024-09-12 04:27:45 -07:00
  • ff76e18516
    cmake : fixed the order of linking libraries for llama-quantize (#9450) Michael Podvitskiy 2024-09-12 13:27:14 +02:00
  • 39f852f440
    py : add special tokens in hf_converter for RWKV v6 (#9428) Molly Sophia 2024-09-12 19:25:16 +08:00
  • 2b00fa7997
    riscv : modify Makefile and add a RISCV_VECT to print log info (#9442) Ahmad Tameem 2024-09-12 16:24:31 +05:00
  • d6a04f872d
    ggml : hide ggml_object, ggml_cgraph, ggml_hash_set (#9408) Georgi Gerganov 2024-09-12 14:23:49 +03:00
  • c9c8575a1a
    enhance run script to be easy to change the parameters (#9448) Neo Zhang Jianyu 2024-09-12 17:44:17 +08:00
  • df4b7945ae
    cann: Fix error when running a non-exist op (#9424) Xinpeng Dou 2024-09-12 09:02:35 +08:00
  • 449ccfb6f5
    Add Jais to list of supported models (#9439) Faisal Zaghloul 2024-09-11 20:29:53 -04:00
  • 1b28061400
    llama : skip token bounds check when evaluating embeddings (#9437) slaren 2024-09-11 17:52:13 +02:00
  • 8db003a19d
    py : support converting local models (#7547) Pavel Zloi 2024-09-11 15:29:51 +03:00
  • 0996c5597f
    llava : correct args for minicpmv-cli (#9429) Xuan Son Nguyen 2024-09-11 12:59:13 +02:00
  • 5bb2c5dbd2
    files : remove accidentally added lora_test submodule (#9430) Xuan Son Nguyen 2024-09-11 12:02:09 +02:00
  • 67155ab7f5
    feat: Implements retrying logic for downloading models using --model-url flag (#9255) Farbod Bijary 2024-09-11 12:52:37 +03:30
  • 5af118efda
    CUDA: fix --split-mode row race condition (#9413) Johannes Gäßler 2024-09-11 10:22:40 +02:00
  • d2b496bff4
    batched-bench : remove unused code (#9305) Georgi Gerganov 2024-09-11 10:03:54 +03:00
  • b34e023480
    musa: remove Clang builtins mapping (#9421) R0CKSTAR 2024-09-11 09:46:55 +08:00
  • 51b6038636
    sycl : update support conditions (#9394) Alberto Cabrera Pérez 2024-09-11 01:53:42 +01:00
  • cb9c933eb2
    flake.lock: Update (#9360) Georgi Gerganov 2024-09-11 01:46:59 +03:00
  • 6cd4e03444
    arg : bring back missing ifdef (#9411) Xuan Son Nguyen 2024-09-10 22:41:29 +02:00
  • 8d300bd35f
    enable --special arg for llama-server (#9419) matteo 2024-09-10 22:40:59 +02:00
  • 49006c67b4
    llama : move random seed generation to the samplers (#9398) slaren 2024-09-10 18:04:25 +02:00
  • 00ba2ff781
    metal : fix compile warning with GGML_METAL_NDEBUG (#0) Georgi Gerganov 2024-09-10 10:17:03 +03:00
  • 83008b7cfe
    llama : update llm_build_copy_mask_state comment [no ci] (#9385) Daniel Bevenius 2024-09-10 09:03:21 +02:00
  • 0b4ac75772
    RWKV v6: Add time_mix_decay_w1/w2 in quant exclusion list (#9387) Molly Sophia 2024-09-10 15:02:30 +08:00
  • fb3f249815
    make : do not run llama-gen-docs when building (#9399) slaren 2024-09-10 08:23:33 +02:00
  • bfe76d4a17
    common : move arg parser code to arg.cpp (#9388) Xuan Son Nguyen 2024-09-09 23:36:09 +02:00
  • 293bebe077
    rpc : fix segfault with nkvo (#9389) Radoslav Gerganov 2024-09-09 18:40:10 +03:00
  • 5fac4d5764
    ggml : vector length agnostic SVE support (#9290) Prashant Vithule 2024-09-09 21:07:18 +05:30
  • 5fb5e24811
    llama : minor sampling refactor (2) (#9386) slaren 2024-09-09 17:10:46 +02:00
  • 38ca6f644b
    readme : update hot topics Georgi Gerganov 2024-09-09 15:51:37 +03:00
  • 8e6e2fbe14
    CUDA: fix variable name conflict for Windows build (#9382) Johannes Gäßler 2024-09-09 14:22:53 +02:00
  • 5ed087573e
    readme : add LLMUnity to UI projects (#9381) Antonis Makropoulos 2024-09-09 14:21:38 +03:00
  • 54f376d0b9
    rpc : update README [no ci] (#9320) Radoslav Gerganov 2024-09-09 11:04:39 +03:00
  • b2e89a3274
    Arm AArch64: Documentation updates (#9321) Dan Johansson 2024-09-09 09:02:45 +02:00
  • daa9623ab0
    Overlap cmdbuffer creation and cmdbuffer execution in Vulkan backend by submitting smaller cmdbuffers early. (#9118) Markus Tavenrath 2024-09-08 21:43:48 +02:00
  • e079bffb66
    cuda : fix FA Q src index (1 -> 0) (#9374) Georgi Gerganov 2024-09-08 22:01:02 +03:00
  • 3f7ccfd649
    common : bring back missing args, add env var duplication check (#9375) Xuan Son Nguyen 2024-09-08 18:08:55 +02:00
  • a249843d89
    common : restore --n-gpu-layers (#9371) slaren 2024-09-08 16:44:42 +02:00
  • 19f4a7b296
    llama : refactor samplers internal implementation (#9370) slaren 2024-09-08 15:52:07 +02:00
  • 2a358fb0c4
    [SYCL] add check malloc result on device (#9346) Neo Zhang Jianyu 2024-09-08 19:05:29 +08:00
  • eae597182c
    llama : sanitize tokens in the upper bound (#9359) slaren 2024-09-08 12:41:51 +02:00
  • 00b02bb249
    imatrix : fix arg parser for imatrix (#9366) Xuan Son Nguyen 2024-09-08 12:12:17 +02:00
  • a876861455
    metal : update support condition for im2col + fix warning (#0) Georgi Gerganov 2024-09-08 09:57:57 +03:00
  • 385decbd63
    sync : ggml Georgi Gerganov 2024-09-08 09:38:56 +03:00
  • 60a3107ccd
    scripts : option to increase git patch context Georgi Gerganov 2024-09-08 09:38:42 +03:00
  • 406c1a32a1
    vulkan: add dryrun support to sin and cos ops (ggml/947) Salvatore Mesoraca 2024-09-06 14:34:25 +02:00
  • 9cb9260861
    vulkan: correctly report support for OP_CONT (ggml/946) Salvatore Mesoraca 2024-09-06 14:34:07 +02:00
  • 202084d31d
    tests: add gradient tests for all backends (ggml/932) Johannes Gäßler 2024-09-03 17:21:46 +02:00
  • dbbebcab33
    ggml: fix ggml_graph_cpy undefined behavior (ggml/943) Johannes Gäßler 2024-08-31 14:35:42 +02:00
  • ba1cf846ed
    cann : fix doxy (ggml/0) Georgi Gerganov 2024-08-28 18:45:01 +03:00
  • d2d3200b38
    cann : add Ascend NPU support (whisper/2336) Mengqing Cao 2024-08-09 20:21:56 +08:00
  • 51d964a4ef
    cuda : mark BF16 CONT as unsupported Georgi Gerganov 2024-08-28 17:08:03 +03:00
  • efe6a83e30
    ggml : fix cont with transposed tensors when one dimension is 1 (ggml/934) Salvatore Mesoraca 2024-08-28 10:23:02 +02:00
  • fbb7fcffbc
    llama : set attrs of mislabelled EOT/EOM tokens (#9348) Kevin Gibbons 2024-09-07 22:51:00 -07:00
  • a5b5d9a101
    llama.android : fix build (#9350) Georgi Gerganov 2024-09-08 00:33:50 +03:00
  • f12295b8a9
    llama : fix empty ring buffer push (#9358) Georgi Gerganov 2024-09-08 00:33:33 +03:00
  • faf69d4237
    llama : sanitize invalid tokens (#9357) Georgi Gerganov 2024-09-08 00:33:13 +03:00
  • e536426ded
    llamafile : disable sgemm for batch-size 1 (#9330) Eve 2024-09-07 19:02:26 +00:00
  • 1b9ae5189c
    common : refactor arg parser (#9308) Xuan Son Nguyen 2024-09-07 20:43:51 +02:00
  • e32d0816ed
    ggml : always check bounds on get_rows operations (#9354) slaren 2024-09-07 20:23:07 +02:00
  • df270ef745
    llama : refactor sampling v2 (#9294) Georgi Gerganov 2024-09-07 15:16:19 +03:00
  • 947538acb8
    ggml : fix missing cpu_set_t on emscripten (#9336) Xuan Son Nguyen 2024-09-07 12:01:34 +02:00
  • 6c89eb0b47
    ci : disable rocm image creation (#9340) slaren 2024-09-07 09:48:54 +02:00
  • 9b2c24c099
    server : simplify state machine for slot (#9283) Xuan Son Nguyen 2024-09-06 23:21:29 +02:00
  • 134bc38ecf
    llama-bench : log benchmark progress (#9287) Aarni Koskela 2024-09-07 00:03:01 +03:00
  • 815b1fb20a
    batched-bench : add --output-format jsonl option (#9293) Aarni Koskela 2024-09-06 18:59:58 +03:00
  • 409dc4f8bb
    ggml : fix build break for the vulkan-debug (#9265) Changyeon Kim 2024-09-06 21:54:50 +09:00
  • 4a1411b4f1
    server : fix missing lock (#9334) Xuan Son Nguyen 2024-09-06 14:06:04 +02:00
  • 8ebe8ddebd
    Improve Vulkan shader build system (#9239) Markus Tavenrath 2024-09-06 08:56:17 +02:00
  • 9bc6db28d0
    ggml-quants : ternary packing for TriLMs and BitNet b1.58 (#8151) compilade 2024-09-05 21:48:47 -04:00
  • 32b2ec88bc
    Update build.yml (#9184) awatuna 2024-09-06 06:34:36 +08:00
  • 1031771faa
    CMake fix: host for msvc compiler can only be x86 or x64 (#8624) Michael Podvitskiy 2024-09-06 00:14:12 +02:00
  • 4db04784f9
    cuda : fix defrag with quantized KV (#9319) slaren 2024-09-05 11:13:11 +02:00
  • bdf314f38a
    llama-bench : fix NUL terminators in CPU name (#9313) slaren 2024-09-05 02:19:39 +02:00
  • 581c305186
    ggml : AVX2 support for Q4_0_8_8 (#8713) Srihari-mcw 2024-09-04 22:21:22 +05:30
  • 5910ea9427
    [SYCL] Fix DMMV dequantization (#9279) Ouadie EL FAROUKI 2024-09-04 16:26:33 +01:00
  • c8671ae282
    Fix broken links in docker.md (#9306) 杨朱 · Kiki 2024-09-04 19:45:28 +08:00
  • 82e3b03c11
    rpc : make RPC servers come first in the device list (#9296) Radoslav Gerganov 2024-09-04 11:08:32 +03:00
  • 9379d3cc17
    readme : rename result_format to response_format (#9300) Pascal Patry 2024-09-04 02:45:40 -04:00
  • 7605ae7daf
    flake.lock: Update (#9261) Georgi Gerganov 2024-09-04 02:36:43 +03:00
  • 8962422b1c
    llama-bench : add JSONL (NDJSON) output mode (#9288) Aarni Koskela 2024-09-03 20:58:54 +03:00
  • b69a480af4
    readme : refactor API section + remove old hot topics Georgi Gerganov 2024-09-03 10:00:36 +03:00
  • 48baa61ecc
    server : test script : add timeout for all requests (#9282) Xuan Son Nguyen 2024-09-02 22:08:38 +02:00
  • f1485161e5
    src: make tail invalid when kv cell is intersection for mamba (#9249) Zhenwei Jin 2024-09-03 01:53:23 +08:00
  • 048de848ee
    docker : fix missing binaries in full-cuda image (#9278) slaren 2024-09-02 18:11:13 +02:00
  • f771d064a9
    ggml : add pthread includes on FreeBSD (#9258) yuri@FreeBSD 2024-09-02 08:25:30 -07:00
  • 6e7d133a5f
    server : refactor multitask handling (#9274) Xuan Son Nguyen 2024-09-02 17:11:51 +02:00
  • b60074f1c2
    llama-cli : remove duplicated log message (#9275) Guoliang Hua 2024-09-02 20:36:43 +08:00
  • 9c1ba55733
    build(nix): Package gguf-py (#5664) Tushar 2024-09-02 16:51:01 +05:30
  • c6d4cb4655
    llama : minor style Georgi Gerganov 2024-09-02 11:52:04 +03:00
  • 8f1d81a0b6
    llama : support RWKV v6 models (#8980) Molly Sophia 2024-09-01 22:38:17 +08:00
  • a47667cff4
    nix: fix CUDA build - replace deprecated autoAddOpenGLRunpathHook Echo Nolan 2024-08-22 17:19:14 -04:00
  • ea5d7478b1
    sgemm : improved Q4_0 and Q8_0 performance via 4xN and Mx4 gemm (#8908) Srihari-mcw 2024-08-31 13:50:35 +05:30
  • 49271efbaf
    llama : fix typo in xcda_array_view comment [no ci] (#9132) Daniel Bevenius 2024-08-31 09:50:22 +02:00
  • 0ab30f8d82
    llama : fix llama_split_mode enum values in main_gpu document (#9057) Sutou Kouhei 2024-08-31 03:08:10 +09:00
  • cddae4884c
    Correct typo run_llama2.sh > run-llama2.sh (#9149) 蕭澧邦 2024-08-30 20:10:01 +08:00
  • 7ea8d80d53
    llava : the function "clip" should be int (#9237) tc-mb 2024-08-30 13:21:57 +08:00
  • 42c76d1358
    Threadpool: take 2 (#8672) Faisal Zaghloul 2024-08-29 19:20:53 -04:00
  • 9f7d4bcf5c
    server : fix crash when error handler dumps invalid utf-8 json (#9195) Jan Boon 2024-08-27 18:28:06 +08:00