Commit graph

  • 5107e8cea3
    DRY: Fixes clone functionality (#10192) wwoodsTM 2024-11-07 08:20:25 -07:00
  • 2319126a70
    fix q4_0_8_8 format for corrupted tokens issue (#10198) snadampal 2024-11-07 02:02:08 -06:00
  • 3bcd40b3c5
    Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) Zhiyuan Li 2024-11-07 18:19:10 +11:00
  • 5c333e0140
    metal : add BF16 support (#8439) Georgi Gerganov 2024-11-06 19:53:51 +02:00
  • b11f9ba9b8
    server : remove hack for extra parallel slot (#10187) Georgi Gerganov 2024-11-06 13:29:01 +02:00
  • 94d8cb8be1
    metal : fix from ptr buffer name (#10189) Diego Devesa 2024-11-06 12:10:07 +01:00
  • 1dc04b2dee
    ggml : adjust is_first_call init value (#10193) Georgi Gerganov 2024-11-06 11:20:10 +02:00
  • a1eaf6a960
    metal : add quantized FA support (#10149) Georgi Gerganov 2024-11-06 10:24:23 +02:00
  • b8deef0ec0
    llama : add <|tool_call|> formatting to Granite template (#10177) Gabe Goodhart 2024-11-05 05:23:04 -07:00
  • a9e8a9a030
    ggml : fix arch check in bf16_to_fp32 (#10164) Diego Devesa 2024-11-04 23:17:01 +01:00
  • 3407364776
    Q6_K AVX improvements (#10118) Eve 2024-11-04 22:06:31 +00:00
  • d5a409e57f
    ggml : fix gelu tables initialization (#10172) Diego Devesa 2024-11-04 20:06:58 +01:00
  • 401558b7ba
    ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167) Diego Devesa 2024-11-04 17:34:08 +01:00
  • 9e0ecfb697
    server : clarify /slots endpoint, add is_processing (#10162) Xuan Son Nguyen 2024-11-04 16:33:29 +01:00
  • 6a066b9978
    fix build break on arm64 linux (#10166) snadampal 2024-11-04 09:08:33 -06:00
  • ea02c753eb
    cuda : clear error after changing peer access (#10153) Diego Devesa 2024-11-04 13:10:23 +01:00
  • 05697f670b
    metal : simplify f16 and f32 dequant kernels (#0) Georgi Gerganov 2024-11-04 13:49:34 +02:00
  • f8e58135cf
    metal : move dequantize templates to beginning of MSL source (#0) Georgi Gerganov 2024-11-04 13:43:32 +02:00
  • 329ed914c9
    CANN: adjust backend registry refactor. (#10158) leo-pony 2024-11-04 19:08:22 +08:00
  • ce027adfb3
    sync : ggml Georgi Gerganov 2024-11-04 10:33:37 +02:00
  • 284e5b0275
    cmake : make it possible linking ggml as external lib (ggml/1003) Yuri Khrustalev 2024-11-02 05:09:12 -04:00
  • e2292aaa17
    metal : fix minor string leaks (ggml/1004) Plamen Minev 2024-11-01 16:55:10 +02:00
  • 9f40989351
    ggml : move CPU backend to a separate file (#10144) Diego Devesa 2024-11-03 19:34:08 +01:00
  • 08828a6d7d
    metal : minor fixup in FA kernel (#10143) Georgi Gerganov 2024-11-03 15:18:40 +02:00
  • 1839f69130
    flake.lock: Update (#10146) Georgi Gerganov 2024-11-03 15:14:15 +02:00
  • 9830b6923b
    Add apple arm to presets (#10134) Christian Köhnenkamp 2024-11-02 23:35:31 +01:00
  • 42cadc74bd
    server : fix slot selection by lru (#10126) sasha0552 2024-11-02 16:34:56 +00:00
  • 45950415ed
    server : fix endpoint checks (#10135) Georgi Gerganov 2024-11-02 18:34:00 +02:00
  • 1926d6e39d
    llama : adjust default context size + print warnings (#10136) Georgi Gerganov 2024-11-02 15:18:56 +02:00
  • b634f8a26f
    simple-chat : only add bos on first prompt (#10129) Diego Devesa 2024-11-02 13:08:53 +01:00
  • 7554aa4655
    convert-lora : make --base optional (#10110) Xuan Son Nguyen 2024-11-02 12:53:17 +01:00
  • a6744e43e8
    llama : add simple-chat example (#10124) Diego Devesa 2024-11-01 23:50:59 +01:00
  • e991e3127f
    llama : use smart pointers for ggml resources (#10117) Diego Devesa 2024-11-01 23:48:26 +01:00
  • 418f5eef26
    vulkan : improve ggml_vk_create_buffer error handling (#9898) Shupei Fan 2024-11-02 02:33:14 +08:00
  • ba6f62eb79
    readme : update hot topics Georgi Gerganov 2024-11-01 17:31:51 +02:00
  • d865d1478c
    server : fix smart selection of available slot (#10120) sasha0552 2024-11-01 13:33:14 +00:00
  • 1804adb0cf
    ggml : remove ggml_scratch (#10121) Georgi Gerganov 2024-11-01 12:58:45 +02:00
  • 815fe72adc
    sync : ggml Georgi Gerganov 2024-11-01 10:28:24 +02:00
  • f221d56220
    ggml : alloc ggml_contexts on the heap (whisper/2525) Georgi Gerganov 2024-11-01 10:23:05 +02:00
  • e597e50794
    build: fix build error in Windows env with OneAPI setup (#10107) Zhenwei Jin 2024-11-01 11:09:59 +08:00
  • 85679d37f3
    llama : improve output buffer type selection (#10098) Diego Devesa 2024-11-01 00:49:53 +01:00
  • 1e9f94994e
    quantize : fix --keep-split (#10114) Diego Devesa 2024-11-01 00:45:34 +01:00
  • c02e5ab2a6
    llama : fix buffer checks for mamba and rwk (#10111) Diego Devesa 2024-10-31 22:54:23 +01:00
  • ab3d71f97f
    loader: refactor tensor weights storage (#9935) Zhenwei Jin 2024-11-01 02:50:39 +08:00
  • 0a683e8088
    server : include scheme when printing URL (#10106) Kevin Gibbons 2024-10-31 06:02:35 -07:00
  • dea5e86051
    ggml : check tensor name lengths in gguf files (#10100) Diego Devesa 2024-10-31 11:40:59 +01:00
  • 1329c0a75e
    kompute: add mul_mat_q4_k shader (#10097) Sergio López 2024-10-31 10:09:52 +01:00
  • 61408e7fad
    kompute: add backend registry / device interfaces (#10045) Sergio López 2024-10-30 17:01:52 +01:00
  • b9e02e8184
    ggml : fix memory leaks when loading invalid gguf files (#10094) Diego Devesa 2024-10-30 14:51:21 +01:00
  • 6763f713bb
    readme : more lora detail in main example readme (#10064) Rich Dougherty 2024-10-31 01:22:39 +13:00
  • 79a2bc042d
    convert : more detailed convert lora usage docs (#10065) Rich Dougherty 2024-10-31 01:22:21 +13:00
  • fc83a9e584
    ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029) xctan 2024-10-30 15:00:40 +08:00
  • c5b0f4b5d9
    llama : refactor model loader with backend registry (#10026) Diego Devesa 2024-10-30 02:01:23 +01:00
  • 8f275a7c45
    ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763) Changyeon Kim 2024-10-29 17:52:56 +09:00
  • 8d8ff71536
    llama : remove Tail-Free sampling (#10071) Georgi Gerganov 2024-10-29 10:42:05 +02:00
  • 61715d5cc8
    llama : Add IBM granite template (#10013) arch-btw 2024-10-28 10:45:33 -07:00
  • 07028f9d74
    flake.lock: Update (#10063) Georgi Gerganov 2024-10-28 17:41:24 +02:00
  • 524afeec9d
    musa: workaround for Guilty Lockup in cleaning src0 (#10042) R0CKSTAR 2024-10-28 17:02:48 +08:00
  • 8125e6cbfc
    server : don't overfill the batch during infill (#10018) Georgi Gerganov 2024-10-28 08:49:32 +02:00
  • 8841ce3f43
    llama : switch KQ multiplication to F32 precision by default (#10015) Georgi Gerganov 2024-10-27 20:59:58 +02:00
  • cc2983d375
    sync : ggml Georgi Gerganov 2024-10-26 10:34:08 +03:00
  • 8c60a8a462
    increase cuda_cpy block size (ggml/996) bssrdf 2024-10-23 14:34:00 -04:00
  • 9e4a2563ea
    scripts : fix amx sync [no ci] Georgi Gerganov 2024-10-26 10:33:31 +03:00
  • 668750357e
    metal : support permuted matrix multiplications (#10033) Georgi Gerganov 2024-10-25 22:26:15 +03:00
  • ff252ea48e
    llama : add DRY sampler (#9702) wwoodsTM 2024-10-25 10:07:34 -06:00
  • d80fb71f8b
    llama: string_split fix (#10022) Michael Podvitskiy 2024-10-25 17:57:54 +02:00
  • 2f8bd2b901
    llamafile : extend sgemm.cpp support for Q5_0 models (#10010) Srihari-mcw 2024-10-25 12:57:41 +05:30
  • bc5ba007b2
    server : check that the prompt fits in the slot's context (#10030) Georgi Gerganov 2024-10-25 10:13:46 +03:00
  • 958367bf53
    server : refactor slot input data, move tokenizer to HTTP thread (#10023) Xuan Son Nguyen 2024-10-24 21:51:22 +02:00
  • 40f2555797
    ci : fix cmake flags for SYCL Georgi Gerganov 2024-10-24 21:23:33 +03:00
  • 167a515651
    CUDA: fix insufficient buffer clearing for MMQ (#10032) Johannes Gäßler 2024-10-24 14:40:23 +02:00
  • c39665f589
    CUDA: fix MMQ for non-contiguous src0, add tests (#10021) Johannes Gäßler 2024-10-24 11:09:36 +02:00
  • 0a1c750c80
    server : samplers accept the prompt correctly (#10019) wwoodsTM 2024-10-23 13:27:51 -06:00
  • 190a37d797
    sync : ggml Georgi Gerganov 2024-10-23 17:23:55 +03:00
  • 2d3aba9ee8
    llama.vim : bump generation time limit to 3s [no ci] Georgi Gerganov 2024-10-23 17:16:56 +03:00
  • 80273a306d
    CUDA: fix 1D im2col, add tests (ggml/993) Johannes Gäßler 2024-10-18 09:24:44 +02:00
  • c19af0acb1
    ggml : remove redundant set of contexts used field (ggml/978) Daniel Bevenius 2024-10-16 20:10:01 +02:00
  • ac113a0fee
    llama.vim : add classic vim support (#9995) Michael Coppola 2024-10-23 07:09:26 -04:00
  • 4c9388fb96
    metal : add POOL2D and fix IM2COL (#9943) Jun Hee Yoo 2024-10-23 19:33:45 +09:00
  • 873279b159
    flake.lock: Update github-actions[bot] 2024-10-20 00:22:59 +00:00
  • c8c07d658a
    llama : fix empty batch causing llama_batch_allocr to crash (#9966) Xuan Son Nguyen 2024-10-22 16:59:02 +02:00
  • 19d900a756
    llama : rename batch to ubatch (#9950) Daniel Bevenius 2024-10-22 15:31:06 +02:00
  • 11d47057a5
    Rwkv chat template fix (#10001) Molly Sophia 2024-10-22 21:22:26 +08:00
  • c421ac072d
    lora : warn user if new token is added in the adapter (#9948) Xuan Son Nguyen 2024-10-22 13:08:41 +02:00
  • 4ff7fe1fb3
    llama : add chat template for RWKV-World + fix EOT (#9968) Molly Sophia 2024-10-22 18:33:37 +08:00
  • 6b8447352d
    [CANN] Adapt to dynamically loadable backends mechanism (#9970) leo-pony 2024-10-22 16:16:01 +08:00
  • 674804a996
    arg : fix typo in embeddings argument help [no ci] (#9994) Daniel Bevenius 2024-10-22 09:40:02 +02:00
  • e94a138d64
    llama.vim : fix info text display [no ci] (#9787) Georgi Gerganov 2024-10-22 00:35:25 +03:00
  • e01c67affe
    llama.vim : move info to the right of screen [no ci] (#9787) Georgi Gerganov 2024-10-21 22:52:22 +03:00
  • 994cfb1acb
    readme : update UI list (#9972) Asghar Ghorbani 2024-10-21 20:20:59 +02:00
  • 94008cc760
    arg : fix attention non-causal arg value hint (#9985) Daniel Bevenius 2024-10-21 20:12:52 +02:00
  • dbd5f2f573
    llama.vim : plugin for Neovim (#9787) Georgi Gerganov 2024-10-21 20:25:02 +03:00
  • f594bc80ba
    ggml : add asserts for type conversion in fattn kernels (#9971) Georgi Gerganov 2024-10-21 16:20:46 +03:00
  • d5ebd79c76
    rpc : pack only RPC structs (#9959) Radoslav Gerganov 2024-10-21 13:35:40 +03:00
  • 55e47786e3
    llama : default sampling changes + greedy update (#9897) Georgi Gerganov 2024-10-21 09:46:40 +03:00
  • bc21975084
    speculative : fix handling of some input params (#9963) Georgi Gerganov 2024-10-21 09:37:12 +03:00
  • 1db8c84fc6
    fix mul_mat_vec_q and *_vec_q error (#9939) Neo Zhang Jianyu 2024-10-21 14:26:09 +08:00
  • 45f097645e
    readme : update bindings list (#9951) Loïc Carrère 2024-10-20 18:25:41 +02:00
  • 7cab2083c7
    readme : update infra list (#9942) icppWorld 2024-10-20 12:01:34 -04:00
  • cda0e4b648
    llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745) Xuan Son Nguyen 2024-10-18 23:18:01 +02:00