llama.cpp

ver4a/llama.cpp

Commit graph

128dcbd3c9

add --no-mmap in llama-bench (#5257) Neo Zhang Jianyu 2024-02-02 03:48:53 +08:00
4d0924a890

Vulkan Phi Fix for AMD Proprietary Drivers (#5260) 0cc4m 2024-02-01 19:25:24 +01:00
8ca511cade

cuda : fix LLAMA_CUDA_F16 (#5262) slaren 2024-02-01 18:30:17 +01:00
d71ac90985

make : generate .a library for static linking (#5205) Ali Nehzat 2024-02-02 02:18:53 +11:00
ce32060198

llama : support InternLM2 (#5184) Guoteng 2024-02-01 17:19:51 +08:00
1cfb5372cf

Fix broken Vulkan Cmake (properly) (#5230) Eve 2024-01-31 19:21:55 +00:00
d3bac7d584

llama : reorder build_orion() at correct place (#5118) Georgi Gerganov 2024-01-31 18:47:10 +02:00
5cb04dbc16

llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) Georgi Gerganov 2024-01-31 17:30:17 +02:00
efb7bdbbd0

metal : add im2col F32 dst support (#5132) Georgi Gerganov 2024-01-31 15:35:41 +02:00
15606309a0

llava : add MobileVLM support (#5132) JidongZhang-THU 2024-01-31 21:10:15 +08:00
b2b9f025e7

format license text, restore apache license by legal suggestion (#5233) Neo Zhang Jianyu 2024-01-31 21:04:46 +08:00
dabcc5b471

ggml : limit n_threads to the max n_tasks (#5238) slaren 2024-01-31 13:43:03 +01:00
f8e9140cb4

Vulkan Fixes (#5223) 0cc4m 2024-01-31 11:44:19 +01:00
d62520eb2c

Fix typos of IQ2_XXS and IQ3_XXS in llama.cpp (#5231) Yiming Cui 2024-01-31 11:04:21 +08:00
01684139c3

support SYCL backend windows build (#5208) Neo Zhang Jianyu 2024-01-31 10:38:07 +08:00
e8dc55d006

kompute : llama-bench support and ggml_cpu_has_kompute() (#5226) Jared Van Bortel 2024-01-30 19:04:37 -05:00
e0085fdf7c

Revert "server : change deps.sh xxd files to string literals (#5221)" Georgi Gerganov 2024-01-30 21:19:26 +02:00
e6f291d158

server : fix context shift (#5195) Georgi Gerganov 2024-01-30 20:17:30 +02:00
4003be0e5f

server : change deps.sh xxd files to string literals (#5221) JohnnyB 2024-01-30 12:15:05 -06:00
fea4fd4ba7

ggml : fix IQ3_XXS on Metal (#5219) Kawrakow 2024-01-30 19:15:28 +02:00
8f8ddfcfad

sync : ggml (#0) Georgi Gerganov 2024-01-30 16:21:57 +02:00
6fb50ebbf0

gguf : fix comparison (ggml/715) Georgi Gerganov 2024-01-29 21:08:18 +02:00
625a699b54

ggml_cuda_cpy support for 4d tensors and float16->float32 upcasting (ggml/686) John Balis 2024-01-29 06:37:33 -06:00
a4b07c057a

gguf : add input validation, prevent integer overflows (ggml/709) Georgi Gerganov 2024-01-29 14:00:10 +02:00
549a1e6cd5

ci : fix yolo URLs + fix metal capture (ggml/712) Georgi Gerganov 2024-01-29 13:29:46 +02:00
5f14ee0b0c

metal : add debug capture backend function (ggml/694) Jack Mousseau 2024-01-29 01:22:23 -08:00
8e14e3ddb3

Faster AVX2 dot product for IQ2_XS (#5187) Kawrakow 2024-01-30 15:15:07 +02:00
f4d7e54974

SOTA 3-bit quants (#5196) Kawrakow 2024-01-30 15:14:12 +02:00
2256f36b79

Vulkan Windows APU Memory Handling (#5199) 0cc4m 2024-01-30 13:59:30 +01:00
7359016c7c

quantize : fix typo (#5211) Vladimir Malyutin 2024-01-30 17:57:07 +07:00
813416991a

main : allow empty --prompt-cache file (#5176) divinity76 2024-01-30 10:18:02 +01:00
5589921ef8

readme : minor (#5204) Romain Neutron 2024-01-30 10:16:38 +01:00
49f44b5c55

readme : update hot topics Georgi Gerganov 2024-01-30 11:14:44 +02:00
6685cc41c2

server : improve README (#5209) Wu Jian Ping 2024-01-30 17:11:46 +08:00
ceebbb5b21

ggml alloc: Fix for null dereference on alloc failure (#5200) Paul Tsochantaris 2024-01-29 22:19:29 +00:00
6daa69ee81

kompute : fix fallback to CPU (#5201) Jared Van Bortel 2024-01-29 17:11:27 -05:00
fbf1ddec69

Nomic Vulkan backend (#4456) Jared Van Bortel 2024-01-29 15:50:50 -05:00
2aed77eb06

fix typo "RLIMIT_MLOCK" (#5175) divinity76 2024-01-29 15:45:41 +01:00
c82d18e863

server : embeddings compatibility for OpenAI (#5190) Wu Jian Ping 2024-01-29 21:48:10 +08:00
14fef85e2d

py : fix except (#5194) Georgi Gerganov 2024-01-29 15:35:54 +02:00
e76627bcce

py : improve BPE tokenizer support (#5189) Sang-Kil Park 2024-01-29 18:24:19 +09:00
fbe7dfa53c

ggml : add max buffer sizes to opencl and metal backends (#5181) slaren 2024-01-29 09:05:13 +01:00
172ac82629

cmake : fix Vulkan build (#5182) Eve 2024-01-29 08:04:47 +00:00
d2f650cb5b

metal : free metal objects (#5161) Paul Tsochantaris 2024-01-28 19:50:16 +00:00
35dec26cc2

sync : ggml Georgi Gerganov 2024-01-28 19:48:05 +02:00
d460510c72

ggml : minor type fix (int64_t -> size_t) Georgi Gerganov 2024-01-28 18:44:58 +02:00
2307523d32

ggml : add Vulkan backend (#2059) 0cc4m 2024-01-28 18:03:59 +01:00
0f648573dd

ggml : add unified SYCL backend for Intel GPUs (#2690) Abhilash Majumder 2024-01-28 21:26:23 +05:30
b764b8f1d0

flake.lock: Update (#5162) Georgi Gerganov 2024-01-28 16:54:54 +02:00
9241c3a2ac

Apply min_p to unsorted tokens (#5115) Johannes Gäßler 2024-01-28 09:59:49 +01:00
b2b2bf988c

Tests for min_p, sampling queue (#5147) Johannes Gäßler 2024-01-28 09:35:14 +01:00
af4980bfed

readme : add link to rust bindings (#5148) Marcus Dunn 2024-01-28 00:30:44 -08:00
f2e69d28c0

llama : add support for Orion-14B (#5118) sharpHL 2024-01-28 16:00:30 +08:00
39baaf55a1

docker : add server-first container images (#5157) Kyle Mistele 2024-01-28 01:55:31 -06:00
6db2b41a76

llava : support for Yi-VL and fix for mobileVLM (#5093) John 2024-01-27 16:09:18 +01:00
753eafed0e

sync : ggml Georgi Gerganov 2024-01-27 16:59:20 +02:00
e976423005

ggml : check ggml_add src1 type (ggml/708) Judd 2024-01-26 21:04:01 +08:00
35a2ee9143

Remove unused data and add fixes (#5154) Michael Klimenko 2024-01-27 15:25:55 +01:00
ec903c0341

server : add self-extend support (#5104) Maximilian Winter 2024-01-27 14:38:05 +01:00
a1d6df129b

Add OpenCL add kernel (#5151) 0cc4m 2024-01-26 23:07:32 +01:00
bbe7c56c99

cmake : pass CPU architecture flags to nvcc (#5146) Jared Van Bortel 2024-01-26 15:34:06 -05:00
62fead3ea0

cuda : fix tensor size calculation for non-split buffer (#5145) slaren 2024-01-26 18:59:43 +01:00
15b4538ff2

ggml-alloc : add 10% margin to the buffer sizes (#5149) slaren 2024-01-26 18:18:26 +01:00
7032f4f634

ggml : update softmax n_task calculation (#5126) snadampal 2024-01-26 11:17:59 -06:00
5f1925a8ce

scripts : move run-with-preset.py from root to scripts folder Georgi Gerganov 2024-01-26 17:09:44 +02:00
3b7c914de2

tests : gitignore test-c.o Georgi Gerganov 2024-01-26 14:48:15 +02:00
48c857aa10

server : refactored the task processing logic (#5065) Xuan Son Nguyen 2024-01-26 13:42:20 +01:00
413e7b0559

ci : add model tests + script wrapper (#4586) crasm 2024-01-26 07:18:00 -05:00
6dd3c28c9c

metal : remove unused n_buffers and buffers (#5129) Paul Tsochantaris 2024-01-26 12:16:07 +00:00
38b431de23

gguf : fix "general.alignment" type in gguf_reader.py (#5136) Riceball LEE 2024-01-26 17:10:28 +08:00
aad0b01d73

readme : update hot topics Georgi Gerganov 2024-01-26 10:52:33 +02:00
1182cf4d4f

Another bucket sort (#5109) Kawrakow 2024-01-26 09:14:39 +02:00
fe54033b69

readme : add MobileVLM 1.7B/3B to the supported models list (#5107) XiaotaoChen 2024-01-26 04:14:32 +08:00
5eaf9964fc

llama : dynamic temperature sampling (#4972) l3utterfly 2024-01-26 05:06:22 +09:00
d292f4f204

examples : make pydantic scripts pass mypy and support py3.8 (#5099) Jared Van Bortel 2024-01-25 14:51:24 -05:00
256d1bb0dd

android : use release cmake build type by default (#5123) Valentin Konovalov 2024-01-25 12:05:51 -05:00
faa3526a1e

Fix Q3_K_XS for MoE models (#5113) Kawrakow 2024-01-25 17:58:53 +02:00
ddc5a5033f

metal : show compile log messages Georgi Gerganov 2024-01-25 11:26:17 +02:00
cd4fddb29f

cuda : fix 2-bit quants on amd hip (#5105) Engininja2 2024-01-24 16:18:15 -06:00
c9b316c78f

nix-shell: use addToSearchPath Michael Hueschen 2024-01-22 16:44:10 -07:00
bf63d695b8

nix: add cc to devShell LD_LIBRARY_PATH Michael Hueschen 2024-01-22 03:17:05 -07:00
1387ea2117

llama : pre-allocate input tensors in a separate buffer (#5100) slaren 2024-01-24 12:48:14 +01:00
26d607608d

metal : disable support for MUL_MAT F32 x F16 Georgi Gerganov 2024-01-23 15:50:56 +02:00
44879ee885

Additional KL-divergence statistics (#5081) Kawrakow 2024-01-23 15:17:20 +02:00
9ecdd12e95

CUDA: more info when no device code (#5088) Johannes Gäßler 2024-01-23 13:31:56 +01:00
89758723c7

minor : clean-up some warnings and style (#5094) Georgi Gerganov 2024-01-23 14:12:57 +02:00
2bed4aa3f3

devops : add intel oneapi dockerfile (#5068) Xuan Son Nguyen 2024-01-23 08:11:39 +01:00
125d03a503

llama.vim : added api key support (#5090) Michael Coppola 2024-01-23 01:51:27 -05:00
011e8ec577

llama : fix not enough space in buffer with Qwen (#5086) slaren 2024-01-22 23:42:41 +01:00
6f9939d119

KL-divergence (#5076) Kawrakow 2024-01-22 16:10:14 +02:00
780e24a22e

ggml : parallelize FP32 conversion when using BLAS (#5045) Reinforce-II 2024-01-22 21:15:08 +08:00
3ce7e8f8e7

llava : MobileVLM support (#4954) XiaotaoChen 2024-01-22 21:09:35 +08:00
b2d80e105a

flake.nix: add a comment about flakes vs nix Someone Serge 2024-01-21 03:41:37 +00:00
28603cd283

nix: add a comment on the many nixpkgs-with-cuda instances Someone Serge 2024-01-21 03:29:38 +00:00
5e97ec91ae

nix: add a comment about makeScope Someone Serge 2024-01-21 03:15:13 +00:00
7251870780

nix: refactor the cleanSource rules Someone Serge 2024-01-13 17:45:01 +00:00
fe8b3c0d4b

workflows: nix-ci: drop the redundant "paths" filter Someone Serge 2024-01-13 17:38:32 +00:00
f4dd059259

workflows: nix-build-aarch64: rate limit Someone Serge 2024-01-13 17:16:54 +00:00
f7276f7500

workflows: nix-ci: rebuild on flake.lock updates Someone Serge 2024-01-13 17:10:19 +00:00
15bceec2d7

imatrix : keep intermediate imatrix results (#5077) Kawrakow 2024-01-22 14:18:43 +02:00