Commit graph

  • 1d1ccce676
    flake.lock: Update (#9162) Georgi Gerganov 2024-08-29 07:28:14 +03:00
  • 9fe94ccac9
    docker : build images only once (#9225) slaren 2024-08-28 17:28:00 +02:00
  • 66b039a501
    docker : update CUDA images (#9213) slaren 2024-08-28 13:20:36 +02:00
  • 20f1789dfb
    vulkan : fix build (#0) Georgi Gerganov 2024-08-27 22:10:58 +03:00
  • 231cff5f6f
    sync : ggml Georgi Gerganov 2024-08-27 22:01:45 +03:00
  • 3246fe84d7
    Fix minicpm example directory (#9111) Xie Yanbo 2024-08-27 20:33:08 +08:00
  • 78eb487bb0
    llama : fix qs.n_attention_wv for DeepSeek-V2 (#9156) compilade 2024-08-27 06:09:23 -04:00
  • a77feb5d71
    server : add some missing env variables (#9116) Xuan Son Nguyen 2024-08-27 11:07:01 +02:00
  • 2e59d61c1b
    llama : fix ChatGLM4 wrong shape (#9194) CausalLM 2024-08-27 14:58:22 +08:00
  • 75e1dbbaab
    llama : fix llama3.1 rope_freqs not respecting custom head_dim (#9141) Carsten Kragelund Jørgensen 2024-08-27 08:53:40 +02:00
  • ad76569f8e
    common : Update stb_image.h to latest version (#9161) arch-btw 2024-08-26 22:58:50 -07:00
  • 7d787ed96c
    ggml : do not crash when quantizing q4_x_x with an imatrix (#9192) slaren 2024-08-26 19:44:43 +02:00
  • 06658ad7c3
    metal : separate scale and mask from QKT in FA kernel (#9189) Georgi Gerganov 2024-08-26 18:31:02 +03:00
  • fc18425b6a
    ggml : add SSM Metal kernels (#8546) Georgi Gerganov 2024-08-26 17:55:36 +03:00
  • 879275ac98
    tests : fix compile warnings for unreachable code (#9185) Georgi Gerganov 2024-08-26 16:30:25 +03:00
  • 7a3df798fc
    ci : add VULKAN support to ggml-ci (#9055) Georgi Gerganov 2024-08-26 12:19:39 +03:00
  • e5edb210cd
    server : update deps (#9183) Georgi Gerganov 2024-08-26 12:16:57 +03:00
  • 0c41e03ceb
    metal : gemma2 flash attention support (#9159) slaren 2024-08-26 11:08:59 +02:00
  • f12ceaca0c
    ggml-ci : try to improve build time (#9160) slaren 2024-08-26 11:03:30 +02:00
  • 436787f170
    llama : fix time complexity of string replacement (#9163) Justine Tunney 2024-08-25 23:09:53 -07:00
  • 93bc3839f9
    common: fixed not working find argument --n-gpu-layers-draft (#9175) Herman Semenov 2024-08-25 22:54:37 +00:00
  • f91fc5639b
    CUDA: fix Gemma 2 numerical issues for FA (#9166) Johannes Gäßler 2024-08-25 22:11:48 +02:00
  • e11bd856d5
    CPU/CUDA: Gemma 2 FlashAttention support (#8542) Johannes Gäßler 2024-08-24 21:34:59 +02:00
  • 8f824ffe8e
    quantize : fix typo in usage help of quantize.cpp (#9145) João Dinis Ferreira 2024-08-24 07:22:45 +01:00
  • 3ba780e2a8
    lora : fix llama conversion script with ROPE_FREQS (#9117) Xuan Son Nguyen 2024-08-23 12:58:53 +02:00
  • a07c32ea54
    llama : use F32 precision in GLM4 attention and no FA (#9130) piDack 2024-08-23 15:27:17 +08:00
  • 11b84eb457
    [SYCL] Add a space to supress a cmake warning (#9133) Akarshan Biswas 2024-08-22 19:39:47 +05:30
  • 1731d4238f
    [SYCL] Add oneDNN primitive support (#9091) luoyu-intel 2024-08-22 12:50:10 +08:00
  • a1631e53f6
    llama : simplify Mamba with advanced batch splits (#8526) compilade 2024-08-21 17:58:11 -04:00
  • fc54ef0d1c
    server : support reading arguments from environment variables (#9105) Xuan Son Nguyen 2024-08-21 11:04:34 +02:00
  • b40eb84895
    llama : support for falcon-mamba architecture (#9074) Younes Belkada 2024-08-21 12:06:36 +04:00
  • f63f603c87
    llava : zero-initialize clip_ctx structure fields with aggregate initialization (ggml/908) fairydreaming 2024-08-21 09:45:49 +02:00
  • 8455340b87
    llama : std::move llm_bigram_bpe from work_queue (#9062) Daniel Bevenius 2024-08-21 09:32:58 +02:00
  • 2f3c1466ff
    llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. (#8984) Changyeon Kim 2024-08-21 04:00:00 +09:00
  • 50addec9a5
    [SYCL] fallback mmvq (#9088) Meng, Hengyu 2024-08-20 23:50:17 +08:00
  • 4f8d19ff17
    [SYCL] Fix SYCL im2col and convert Overflow with Large Dims (#9052) zhentaoyu 2024-08-20 23:06:51 +08:00
  • 90db8146d5
    tests : add missing comma in grammar integration tests (#9099) fairydreaming 2024-08-20 11:09:55 +02:00
  • cfac111e2b
    cann: add doc for cann backend (#8867) wangshuai09 2024-08-19 16:46:38 +08:00
  • 1b6ff90ff8
    rpc : print error message when failed to connect endpoint (#9042) Radoslav Gerganov 2024-08-19 10:11:45 +03:00
  • 18eaf29f4c
    rpc : prevent crashes on invalid input (#9040) Radoslav Gerganov 2024-08-19 10:10:21 +03:00
  • 554b049068
    flake.lock: Update (#9068) Georgi Gerganov 2024-08-18 17:43:32 +03:00
  • 2339a0be1c
    tests : add integration test for lora adapters (#8957) ltoniazzi 2024-08-18 10:58:04 +01:00
  • 2fb9267887
    Fix incorrect use of ctx_split for bias tensors (#9063) Yoshi Suhara 2024-08-17 06:34:21 -07:00
  • 8b3befc0e2
    server : refactor middleware and /health endpoint (#9056) Xuan Son Nguyen 2024-08-16 17:19:05 +02:00
  • d565bb2fd5
    llava : support MiniCPM-V-2.6 (#8967) tc-mb 2024-08-16 21:34:41 +08:00
  • ee2984bdaf
    py : fix wrong input type for raw_dtype in ggml to gguf scripts (#8928) Farbod Bijary 2024-08-16 14:06:30 +03:30
  • c8ddce8560
    Fix inference example lacks required parameters (#9035) Aisuko 2024-08-16 19:08:59 +10:00
  • 23fd453544
    gguf-py : bump version from 0.9.1 to 0.10.0 (#9051) compilade 2024-08-16 02:36:11 -04:00
  • c679e0cb5c
    llama : add EXAONE model support (#9025) Minsoo Cheong 2024-08-16 15:35:18 +09:00
  • fb487bb567
    common : add support for cpu_get_num_physical_cores() on Windows (#8771) Liu Jia 2024-08-16 14:23:12 +08:00
  • 2a24c8caa6
    Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922) Yoshi Suhara 2024-08-15 19:23:33 -07:00
  • e3f6fd56b1
    ggml : dynamic ggml_sched_max_splits based on graph_size (#9047) Nico Bosshard 2024-08-16 04:22:55 +02:00
  • 4b9afbbe90
    retrieval : fix memory leak in retrieval query handling (#8955) gtygo 2024-08-15 15:40:12 +08:00
  • 37501d9c79
    server : fix duplicated n_predict key in the generation_settings (#8994) Riceball LEE 2024-08-15 15:28:05 +08:00
  • 4af8420afb
    common : remove duplicate function llama_should_add_bos_token (#8778) Zhenwei Jin 2024-08-15 15:23:23 +08:00
  • 6bda7ce6c3
    llama : add pre-tokenizer regexes for BLOOM and gpt3-finnish (#8850) Esko Toivonen 2024-08-15 10:17:12 +03:00
  • d5492f0525
    ci : disable bench workflow (#9010) Georgi Gerganov 2024-08-15 10:11:11 +03:00
  • 234b30676a
    server : init stop and error fields of the result struct (#9026) Jiří Podivín 2024-08-15 08:21:57 +02:00
  • 5fd89a70ea
    Vulkan Optimizations and Fixes (#8959) 0cc4m 2024-08-14 18:32:53 +02:00
  • 98a532d474
    server : fix segfault on long system prompt (#8987) compilade 2024-08-14 02:51:02 -04:00
  • 43bdd3ce18
    cmake : remove unused option GGML_CURL (#9011) Georgi Gerganov 2024-08-14 09:14:49 +03:00
  • 06943a69f6
    ggml : move rope type enum to ggml.h (#8949) Daniel Bevenius 2024-08-13 21:13:15 +02:00
  • 828d6ff7d7
    export-lora : throw error if lora is quantized (#9002) Xuan Son Nguyen 2024-08-13 11:41:14 +02:00
  • fc4ca27b25
    ci : fix github workflow vulnerable to script injection (#9008) Diogo Teles Sant'Anna 2024-08-12 13:28:23 -03:00
  • 1f67436c5e
    ci : enable RPC in all of the released builds (#9006) Radoslav Gerganov 2024-08-12 19:17:03 +03:00
  • 0fd93cdef5
    llama : model-based max number of graph nodes calculation (#8970) Nico Bosshard 2024-08-12 17:13:59 +02:00
  • 84eb2f4fad
    docs: introduce gpustack and gguf-parser (#8873) Frank Mai 2024-08-12 20:45:50 +08:00
  • 1262e7ed13
    grammar-parser : fix possible null-deref (#9004) DavidKorczynski 2024-08-12 13:36:41 +01:00
  • df5478fbea
    ggml: fix div-by-zero (#9003) DavidKorczynski 2024-08-12 13:21:41 +01:00
  • 2589292cde
    Fix a spelling mistake (#9001) Liu Jia 2024-08-12 17:46:03 +08:00
  • d3ae0ee8d7
    py : fix requirements check '==' -> '~=' (#8982) Georgi Gerganov 2024-08-12 11:02:01 +03:00
  • 5ef07e25ac
    server : handle models with missing EOS token (#8997) Georgi Gerganov 2024-08-12 10:21:50 +03:00
  • 4134999e01
    gguf-py : Numpy dequantization for most types (#8939) compilade 2024-08-11 14:45:41 -04:00
  • 8cd1bcfd3f
    flake.lock: Update (#8979) Georgi Gerganov 2024-08-11 16:58:58 +03:00
  • a21c6fd450
    update guide (#8909) Neo Zhang 2024-08-11 16:37:43 +08:00
  • 33309f661a
    llama : check all graph nodes when searching for result_embd_pooled (#8956) fairydreaming 2024-08-11 10:35:26 +02:00
  • 7c5bfd57f8
    Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (#8943) Markus Tavenrath 2024-08-11 10:09:09 +02:00
  • 6e02327e8b
    metal : fix uninitialized abort_callback (#8968) slaren 2024-08-10 15:42:10 +02:00
  • 7eb23840ed
    llama : default n_swa for phi-3 (#8931) Xuan Son Nguyen 2024-08-10 13:04:40 +02:00
  • 7c3f55c100
    Add support for encoder-only T5 models (#8900) fairydreaming 2024-08-10 11:43:26 +02:00
  • 911b437f22
    gguf-py : fix double call to add_architecture() (#8952) Matteo Mortari 2024-08-10 07:58:49 +02:00
  • b72942fac9
    Merge commit from fork Georgi Gerganov 2024-08-09 23:03:21 +03:00
  • 6afd1a99dc
    llama : add support for lora adapters in T5 model (#8938) fairydreaming 2024-08-09 18:53:09 +02:00
  • 272e3bd95e
    make : fix llava obj file race (#8946) Georgi Gerganov 2024-08-09 18:24:30 +03:00
  • 45a55b91aa
    llama : better replace_all (cont) (#8926) Georgi Gerganov 2024-08-09 18:23:52 +03:00
  • 3071c0a5f2
    llava : support MiniCPM-V-2.5 (#7599) tc-mb 2024-08-09 18:33:53 +08:00
  • 4305b57c80
    sync : ggml Georgi Gerganov 2024-08-09 10:03:48 +03:00
  • 70c0ea3560
    whisper : use vulkan as gpu backend when available (whisper/2302) Matt Stephenson 2024-07-16 03:21:09 -04:00
  • 5b2c04f492
    embedding : add --pooling option to README.md [no ci] (#8934) Daniel Bevenius 2024-08-09 08:33:30 +02:00
  • 6f6496bb09
    llama : fix typo in llama_tensor_get_type comment [no ci] (#8937) Daniel Bevenius 2024-08-09 08:32:23 +02:00
  • daef3ab233
    server : add one level list nesting for embeddings (#8936) Mathieu Geli 2024-08-09 08:32:02 +02:00
  • 345a686d82
    llama : reduce useless copies when saving session (#8916) compilade 2024-08-08 23:54:00 -04:00
  • 3a14e00366
    gguf-py : simplify support for quant types (#8838) compilade 2024-08-08 13:33:09 -04:00
  • afd27f01fe
    scripts : sync cann files (#0) Georgi Gerganov 2024-08-08 14:56:52 +03:00
  • 366d486c16
    scripts : fix sync filenames (#0) Georgi Gerganov 2024-08-08 14:40:12 +03:00
  • e44a561ab0
    sync : ggml Georgi Gerganov 2024-08-08 13:19:47 +03:00
  • f93d49ab1e
    ggml : ignore more msvc warnings (ggml/906) Borislav Stanimirov 2024-08-07 10:00:56 +03:00
  • 5b33ea1ee7
    metal : fix struct name (ggml/912) Georgi Gerganov 2024-08-07 09:57:00 +03:00
  • 85fca8deb6
    metal : add abort callback (ggml/905) Conrad Kramer 2024-08-07 02:55:49 -04:00
  • ebd541a570
    make : clean llamafile objects (#8923) Pablo Duboue 2024-08-08 04:44:51 -04:00