llama.cpp

ver4a/llama.cpp

Commit graph

8fd4b7fa29 vulkan: copy iq4_nl LUT into shared memory (#10409) Jeff Bolz 2024-11-20 01:40:18 -06:00
1bacb9f625 vulkan: further optimize mul_mat_vec using larger loads (#10387) Jeff Bolz 2024-11-20 01:11:00 -06:00
ad21c9e1f1 update rel to 4040 (#10395) Neo Zhang Jianyu 2024-11-20 13:54:25 +08:00
3952a221af Fix missing file renames in Makefile due to changes in commit ae8de6d50a (#10413) Anthony Van de Gejuchte 2024-11-19 23:18:17 +01:00
42ae10bbcd add cmake rvv support (#10411) haopeng 2024-11-20 04:10:31 +08:00
9fe0fb0626 sync : ggml Georgi Gerganov 2024-11-19 19:15:50 +02:00
611fabd792 metal : fix offset integer overflows in im2col (ggml/1015) Plamen Minev 2024-11-18 15:02:27 +02:00
12b0ad953a metal : add GGML_UNARY_OP_ELU kernel (ggml/1018) PAB 2024-11-18 10:02:49 +01:00
342397dc7e cmake: force MSVC compiler charset to utf-8 (#9989) 蕭澧邦 2024-11-20 01:42:00 +08:00
2a11b6b094 Add required ggml-base and backend libs to cmake pkg (#10407) bandoti 2024-11-19 12:10:30 -04:00
3ee6382d48 cuda : fix CUDA_FLAGS not being applied (#10403) Diego Devesa 2024-11-19 14:29:38 +01:00
8e752a777b llama : add check for KV cache shifts (#10401) Georgi Gerganov 2024-11-19 13:29:26 +02:00
a88ad007de llama : add OLMo November 2024 support (#10394) Shane A 2024-11-19 01:04:08 -08:00
2a1507c162 sycl : Add option to set the SYCL architecture for all targets (#10266) Romain Biessy 2024-11-19 09:02:23 +01:00
b3e585988f vulkan: Optimize soft_max (#10301) Jeff Bolz 2024-11-19 01:25:17 -06:00
557924f222 sycl: Revert MUL_MAT_OP support changes (#10385) Alberto Cabrera Pérez 2024-11-19 00:50:04 +00:00
d3481e6316 cuda : only use native when supported by cmake (#10389) Diego Devesa 2024-11-18 18:43:40 +01:00
531cb1c233 Skip searching root path for cross-compile builds (#10383) bandoti 2024-11-18 11:23:58 -04:00
f139d2ea61 vulkan: remove use of null initializer (#10372) Jeff Bolz 2024-11-18 08:28:42 -06:00
2eb76b2a5e flake.lock: Update (#10346) Georgi Gerganov 2024-11-18 16:08:20 +02:00
9b75f03cd2 Vulkan: Fix device info output format specifiers (#10366) 0cc4m 2024-11-18 11:02:43 +01:00
75207b3a88 docker: use GGML_NATIVE=OFF (#10368) Johannes Gäßler 2024-11-18 00:21:53 +01:00
76e9e58b78 CUDA: fix MMV kernel being used for FP16 src1 (#10357) Johannes Gäßler 2024-11-17 23:20:42 +01:00
ce2e59ba10 CMake: fix typo in comment [no ci] (#10360) Johannes Gäßler 2024-11-17 12:59:38 +01:00
be5caccef9 llama : only use default buffer types for the KV cache (#10358) Diego Devesa 2024-11-17 12:25:45 +01:00
20a780c7b6 gitignore : ignore local run scripts [no ci] Georgi Gerganov 2024-11-17 13:12:22 +02:00
cf32a9b93a metal : refactor kernel args into structs (#10238) Georgi Gerganov 2024-11-17 11:23:01 +02:00
a43178299c ggml : fix undefined reference to 'getcpu' (#10354) FirstTimeEZ 2024-11-17 21:39:22 +13:00
c3ea58aca4 CUDA: remove DMMV, consolidate F16 mult mat vec (#10318) Johannes Gäßler 2024-11-17 09:09:55 +01:00
467576b6cc CMake: default to -arch=native for CUDA build (#10320) Johannes Gäßler 2024-11-17 09:06:34 +01:00
eda7e1d4f5 ggml : fix possible buffer use after free in sched reserve (#9930) Diego Devesa 2024-11-17 07:31:17 +01:00
24203e9dd7 ggml : inttypes.h -> cinttypes (#0) Georgi Gerganov 2024-11-16 23:40:39 +02:00
5d9e59979c ggml : adapt AMX to tensor->grad removal (#0) Georgi Gerganov 2024-11-16 21:38:01 +02:00
a4200cafad make : add ggml-opt (#0) Georgi Gerganov 2024-11-16 21:35:31 +02:00
84274a10c3 tests : remove test-grad0 Georgi Gerganov 2024-11-16 21:34:03 +02:00
68fcb4759c ggml : fix compile warnings (#0) Georgi Gerganov 2024-11-16 21:32:41 +02:00
8a43e940ab ggml: new optimization interface (ggml/988) Johannes Gäßler 2024-11-16 22:17:59 +02:00
5c9a8b22b1 scripts : update sync Georgi Gerganov 2024-11-16 22:16:04 +02:00
0fff7fd798 docs : vulkan build instructions to use git bash mingw64 (#10303) FirstTimeEZ 2024-11-17 12:29:18 +13:00
4e54be0ec6 llama/ex: remove --logdir argument (#10339) Johannes Gäßler 2024-11-16 23:00:41 +01:00
db4cfd5dbc llamafile : fix include path (#0) Georgi Gerganov 2024-11-16 17:58:56 +02:00
8ee0d09ae6 make : auto-determine dependencies (#0) Georgi Gerganov 2024-11-16 17:58:32 +02:00
bcdb7a2386 server: (web UI) Add samplers sequence customization (#10255) MaggotHATE 2024-11-16 18:26:54 +05:00
f245cc28d4 scripts : fix missing key in compare-llama-bench.py (#10332) Georgi Gerganov 2024-11-16 10:32:50 +02:00
772703c8ff vulkan: Optimize some mat-vec mul quant shaders (#10296) Jeff Bolz 2024-11-16 00:26:57 -06:00
dd3a6ce9f8 vulkan : add cmake preset debug/release (#10306) FirstTimeEZ 2024-11-16 14:59:33 +13:00
1e58ee1318 ggml : optimize Q4_0 into Q4_0_X_Y repack (#10324) Dan Johansson 2024-11-16 01:53:37 +01:00
89e4caaaf0 llama : save number of parameters and the size in llama_model (#10286) FirstTimeEZ 2024-11-16 13:42:13 +13:00
74d73dc85c Make updates to fix issues with clang-cl builds while using AVX512 flags (#10314) Srihari-mcw 2024-11-16 02:57:00 +05:30
4047be74da scripts: update compare-llama-bench.py (#10319) Johannes Gäßler 2024-11-15 21:19:03 +01:00
883d206fbd ggml : fix some build issues slaren 2024-11-15 20:20:54 +01:00
09ecbcb596 cmake : fix ppc64 check (whisper/0) Georgi Gerganov 2024-11-15 15:35:22 +02:00
3225008973 ggml : vulkan logs (whisper/2547) thewh1teagle 2024-11-15 15:33:53 +02:00
cbf5541a82 sync : ggml Georgi Gerganov 2024-11-15 15:31:16 +02:00
18429220bd AVX BF16 and single scale quant optimizations (#10212) Eve 2024-11-15 11:47:58 +00:00
f0204a0ec7 ci: build test musa with cmake (#10298) R0CKSTAR 2024-11-15 19:47:25 +08:00
57f8355b29 sycl: Update Intel docker images to use DPC++ 2025.0 (#10305) Romain Biessy 2024-11-15 12:10:45 +01:00
9901068ac7 server : (web UI) add copy button for code block, fix api key (#10242) Xuan Son Nguyen 2024-11-15 05:48:49 -04:00
231f9360d9 cann: dockerfile and doc adjustment (#10302) Chenguang Li 2024-11-15 15:09:35 +08:00
4802ad350b scripts : fix regex in sync [no ci] Georgi Gerganov 2024-11-15 08:38:43 +02:00
5a54af4d4f sycl: Use syclcompat::dp4a (#10267) Romain Biessy 2024-11-15 04:09:12 +01:00
1607a5e5b0 backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921) Charles Xu 2024-11-15 01:28:50 +01:00
ae8de6d50a ggml : build backends as libraries (#10256) Diego Devesa 2024-11-14 18:04:35 +01:00
4a8ccb37ad CUDA: no -sm row for very small matrices (#10185) Johannes Gäßler 2024-11-14 13:00:15 +01:00
2a82891a85 speculative : fix out-of-bounds access (#10289) Georgi Gerganov 2024-11-14 11:44:15 +02:00
af148c9386 vulkan: Optimize binary ops (#10270) Jeff Bolz 2024-11-13 23:22:55 -06:00
66798e42fb vulkan: Use macros to make the mat mul pipeline creation more concise (#10259) Jeff Bolz 2024-11-13 14:59:47 -06:00
fb4a0ec083 llama : propagate the results of graph_compute (#9525) Michael Podvitskiy 2024-11-13 20:00:35 +02:00
5ea926dad7 sync : ggml Georgi Gerganov 2024-11-13 18:11:54 +02:00
1ee9eea094 docs : update bindings list (#10261) Small Grass Forest 2024-11-13 19:17:10 +08:00
ff7fb670d0 server : add missing docs (#10269) Alexey Parfenov 2024-11-13 11:16:30 +00:00
0e712a5acb server : fix incorrect res in validate_model_chat_template (#10272) Jhen-Jie Hong 2024-11-13 19:15:23 +08:00
a0ec17b32e metadata: Detailed Dataset Authorship Metadata (#8875) Brian 2024-11-13 21:10:38 +11:00
2e82ffa4af sycl : Fixes to broken builds and test-backend-ops (#10257) Alberto Cabrera Pérez 2024-11-13 09:40:57 +00:00
80dd7ff22f vulkan: Optimize contiguous copies (#10254) Jeff Bolz 2024-11-13 00:58:57 -06:00
54ef9cfc72 vulkan: Throttle the number of shader compiles during the build step. (#10222) Jeff Bolz 2024-11-11 11:13:51 -06:00
b0cefea58a metal : more precise Q*K in FA vec kernel (#10247) Georgi Gerganov 2024-11-11 08:39:13 +02:00
b141e5f6ef server : enable KV cache defrag by default (#10233) Georgi Gerganov 2024-11-11 08:38:43 +02:00
4b3a9212b6 flake.lock: Update (#10243) Georgi Gerganov 2024-11-10 21:45:25 +02:00
505f33274d server : (web UI) Add back sampler settings (#10239) MaggotHATE 2024-11-11 00:42:25 +05:00
160687b3ed vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10226) Jeff Bolz 2024-11-10 05:37:56 -06:00
6423c65aa8 metal : reorder write loop in mul mat kernel + style (#10231) Georgi Gerganov 2024-11-09 11:53:13 +02:00
39a334a9aa metal : fix build and some more comments (#10229) Georgi Gerganov 2024-11-09 11:53:02 +02:00
bb38cdd8ba metal : fix F32 accumulation in FA vec kernel (#10232) Georgi Gerganov 2024-11-09 11:52:45 +02:00
f018acba22 llama : fix Qwen model type strings Georgi Gerganov 2024-11-09 11:26:34 +02:00
46323fa9ef metal : hide debug messages from normal log Georgi Gerganov 2024-11-09 11:21:49 +02:00
5b359bb1e3 ggml: fix zero division in ‘dne’ calculation in CUDA COUNT_EQUAL operator when ‘ne’ is small (#10213) SXX 2024-11-09 15:35:46 +08:00
e89213492d ggml : optimize llamafile cpu matrix multiplication for ppc64le (#10156) amritahs-ibm 2024-11-09 12:47:50 +05:30
8fc393f246 scripts : fix pattern and get n_tokens in one go (#10221) haopeng 2024-11-09 15:06:54 +08:00
ec450d3bbf metal : opt-in compile flag for BF16 (#10218) Georgi Gerganov 2024-11-08 21:59:46 +02:00
695ad752b2 metal : improve clarity (minor) (#10171) Georgi Gerganov 2024-11-08 18:37:41 +02:00
841f27abdb metal : optimize FA kernels (#10171) Georgi Gerganov 2024-11-08 13:47:22 +02:00
d05b3127bd swift : exclude ggml-metal-embed.metal (#10211) Jhen-Jie Hong 2024-11-08 17:34:06 +08:00
76c6e7f105 server : minor UI fix (#10207) Xuan Son Nguyen 2024-11-07 18:44:38 -04:00
a71d81cf8c server : revamp chat UI with vuejs and daisyui (#10175) Xuan Son Nguyen 2024-11-07 17:31:10 -04:00
eec4d71737 scripts : add amx to sync-ggml.sh [no ci] Georgi Gerganov 2024-11-07 23:11:36 +02:00
3b08828674 sync : ggml Georgi Gerganov 2024-11-07 23:08:24 +02:00
a2c6fd747c scripts : sync update Georgi Gerganov 2024-11-07 23:07:55 +02:00
97404c4a03 ggml : add ggml-cpu.h to the public headers (#10204) Diego Devesa 2024-11-07 18:16:08 +01:00
60e17ce23c Remove identical wte/etw logic for jais (#10203) Faisal Zaghloul 2024-11-07 11:46:12 -05:00