Commit graph

  • edd1ab7bc3 flake.lock: update Someone Serge 2023-12-31 17:42:22 +00:00
  • 198ed7ebfc flake.nix: suggest the binary caches Someone Serge 2023-12-30 18:25:25 +00:00
  • d836174731 workflows: nix-ci: add a qemu job for jetsons Someone Serge 2023-12-30 18:01:07 +00:00
  • 06f2a5d190 workflows: nix-flakestry: drop tag filters Someone Serge 2023-12-30 17:36:08 +00:00
  • c5239944ba workflows: weekly nix flake update Someone Serge 2023-12-30 16:38:36 +00:00
  • 1e9ae54cf2 workflows: nix-ci: add a job for eval Someone Serge 2023-12-30 17:19:11 +00:00
  • 7adedecbe3 workflows: nix-ci: init; build flake outputs Someone Serge 2023-12-26 19:17:26 +00:00
  • 356ea17e0f flake.nix: expose checks Someone Serge 2023-12-29 16:21:50 +00:00
  • a5c088d8c6 flake.nix: rocm not yet supported on aarch64, so hide the output Someone Serge 2023-12-26 23:34:40 +00:00
  • 1e3900ebac flake.nix: expose full scope in legacyPackages Someone Serge 2023-12-29 16:15:37 +00:00
  • e39106c055 ggml : add ggml_vdotq_s32 alias (#4715) Georgi Gerganov 2023-12-31 11:43:31 +02:00
  • 9fbda719de clip : refactor + bug fixes (#4696) Georgi Gerganov 2023-12-30 23:24:42 +02:00
  • 39d8bc71ed CUDA: fixed tensor cores not being used on RDNA3 (#4697) Johannes Gäßler 2023-12-30 13:52:01 +01:00
  • 24a447e20a ggml : add ggml_cpu_has_avx_vnni() (#4589) automaticcat 2023-12-30 15:07:48 +07:00
  • a20f3c7465 CUDA: fix tensor core logic for Pascal and HIP (#4682) Johannes Gäßler 2023-12-29 23:12:53 +01:00
  • 0235b9b571 clip : use ggml_backend_buffer_is_host (#4205) Georgi Gerganov 2023-12-29 18:53:34 +02:00
  • ce18d727a4 clip : enable gpu backend (#4205) Steward Garcia 2023-12-29 11:52:15 -05:00
  • 91bb39cec7 cuda: fix vmm oom issue on NVIDIA AGX Orin (#4687) hydai 2023-12-30 00:31:19 +08:00
  • 04ac0607e9 python : add check-requirements.sh and GitHub workflow (#4585) crasm 2023-12-29 09:50:29 -05:00
  • 68eccbdc5b flake.nix : rewrite (#4605) Philip Taron 2023-12-29 06:42:26 -08:00
  • 97bbca6e85 cmake : fix ld warning duplicate libraries libllama.a (#4671) Cuong Trinh Manh 2023-12-29 21:39:15 +07:00
  • 4af4801566 llava-cli : refactor to use sampling library (#4669) Justine Tunney 2023-12-29 06:38:38 -08:00
  • db49ff8ed7 server : replace sleep with condition variables (#4673) Justine Tunney 2023-12-29 06:24:12 -08:00
  • 60f55e888c server : fix OpenAI server sampling w.r.t. penalty. (#4675) SakuraUmi 2023-12-29 22:22:44 +08:00
  • b93edd22f5 server : allow generating multimodal embeddings (#4681) Karthik Sethuraman 2023-12-29 06:22:10 -08:00
  • 82d6eab224 main-cmake-pkg : fix build issue (#4665) andrijdavid 2023-12-29 15:18:20 +01:00
  • afd997ab60 llama.swiftui : fix infinite loop, output timings, buff UI (#4674) Peter Sugihara 2023-12-29 05:58:56 -08:00
  • c8255f8a6b scripts : print list of sync commits Georgi Gerganov 2023-12-29 15:12:35 +02:00
  • 441f51dca0 ci : build with CLBlast + ggml-opencl use GGML_API (whisper/1576) Tamotsu Takahashi 2023-12-29 19:23:27 +09:00
  • 38b3de4658 sync : ggml Georgi Gerganov 2023-12-29 14:56:41 +02:00
  • afc8c19291 ggml : fix some mul mat cases + add tests for src1 F16 (ggml/669) bssrdf 2023-12-29 03:32:31 -05:00
  • ca38b8d334 scripts : do not sync commits from this repo Georgi Gerganov 2023-12-29 14:41:36 +02:00
  • 65e5f6dadb Fix OpenAI server sampling w.r.t. temp and seed (#4668) Justine Tunney 2023-12-28 11:20:00 -08:00
  • ea5497df5d gpt2 : Add gpt2 architecture integration (#4555) manikbhandari 2023-12-28 09:03:57 -05:00
  • f6793491b5 llama : add AWQ for llama, llama2, mpt, and mistral models (#4593) Nam D. Tran 2023-12-27 22:39:45 +07:00
  • 879b690a9e finetune : fix output formatting in print_params (#4653) Daniel Bevenius 2023-12-27 15:16:55 +01:00
  • b47879b0dd scripts : add sync-ggml-am.sh Georgi Gerganov 2023-12-27 11:15:31 +02:00
  • 951010fa53 ggml : fix dot product for ARM (#4630) Georgi Gerganov 2023-12-27 11:02:13 +02:00
  • f56d6077d0 Add byte token type when tokenizer.model does not exist (#4641) wonjun Jang 2023-12-27 17:37:25 +09:00
  • dc68f0054c cuda : fix vmm pool with multi GPU (#4620) slaren 2023-12-26 21:23:59 +01:00
  • de8e496437 Update comment for AdamW implementation reference. (#4604) WillCorticesAI 2023-12-26 05:42:08 -05:00
  • 77465dad48 Fix new CUDA10 compilation errors (#4635) FantasyGmm 2023-12-26 18:38:36 +08:00
  • a206137f92 Adding Emeltal reference to UI list (#4629) Paul Tsochantaris 2023-12-25 16:09:53 +00:00
  • b9f47952ff simplify bug issue template (#4623) slaren 2023-12-24 21:01:12 +01:00
  • 753be377b6 llama : add PLaMo model (#3557) Shintarou Okada 2023-12-24 22:35:49 +09:00
  • 5bf3953d7e cuda : improve cuda pool efficiency using virtual memory (#4606) slaren 2023-12-24 14:34:22 +01:00
  • 708e179e85 fallback to CPU buffer if host buffer alloc fails (#4610) slaren 2023-12-23 16:10:51 +01:00
  • 925e5584a0 ci(docker): fix tags in "Build and push docker image (tagged)" (#4603) Samuel Maynard 2023-12-23 11:35:55 +02:00
  • 6123979952 server : allow specifying a custom prompt for penalty calculation (#3727) Alexey Parfenov 2023-12-23 09:31:49 +00:00
  • b9ec82d262 grammar : check the full vocab only if necessary (opt) (#4306) kalomaze 2023-12-23 03:27:07 -06:00
  • e0a4002273 CUDA: fixed row rounding for 0 tensor splits (#4594) Johannes Gäßler 2023-12-23 09:16:33 +01:00
  • 7082d24cec lookup : add prompt lookup decoding example (#4484) LeonEricsson 2023-12-22 17:05:56 +01:00
  • ba66175132 sync : ggml (fix im2col) (#4591) Georgi Gerganov 2023-12-22 17:53:43 +02:00
  • a55876955b cuda : fix jetson compile error (#4560) FantasyGmm 2023-12-22 23:11:12 +08:00
  • 6724ef1657 Fix CudaMemcpy direction (#4599) Henrik Forstén 2023-12-22 15:34:05 +02:00
  • 48b7ff193e llama : fix platforms without mmap (#4578) slaren 2023-12-22 12:12:53 +01:00
  • 48b24b170e ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203) Herman Semenov 2023-12-22 09:26:49 +00:00
  • 28cb35a0ec make : add LLAMA_HIP_UMA option (#4587) Michael Kesper 2023-12-22 09:03:25 +01:00
  • f31b984898 ci : tag docker image with build number (#4584) rhuddleston 2023-12-21 23:56:34 -07:00
  • 2bb98279c5 readme : add zig bindings (#4581) Deins 2023-12-22 08:49:54 +02:00
  • 0137ef88ea ggml : extend enum ggml_log_level with GGML_LOG_LEVEL_DEBUG (#4579) bobqianic 2023-12-22 06:47:01 +00:00
  • c7e9701f86 llama : add ability to cancel model loading (#4462) crasm 2023-12-22 01:19:36 -05:00
  • afefa319f1 ggml : change ggml_scale to take a float instead of tensor (#4573) Georgi Gerganov 2023-12-21 23:20:49 +02:00
  • 769a7bc85e gguf-py : fix broken link Georgi Gerganov 2023-12-21 23:20:36 +02:00
  • 32259b2dad gguf : simplify example dependencies Georgi Gerganov 2023-12-21 23:07:58 +02:00
  • 4a5f9d629e ci : add jlumbroso/free-disk-space to docker workflow (#4150) Samuel Maynard 2023-12-21 22:36:26 +02:00
  • d232aca5a7 llama : initial ggml-backend integration (#4520) slaren 2023-12-21 21:07:46 +01:00
  • 31f27758fa llama : allow getting n_batch from llama_context in c api (#4540) Marcus Dunn 2023-12-21 11:57:48 -08:00
  • 56fa50819f metal : fix ggml_metal_log vargs (#4373) Finn Voorhees 2023-12-21 14:55:02 -05:00
  • 0f630fbc92 cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449) Erik Garrison 2023-12-21 13:45:32 -06:00
  • 562cf222b5 ggml-cuda: Fix HIP build by adding define for __trap (#4569) arlo-phoenix 2023-12-21 20:13:25 +01:00
  • 8fe03ffdda common : remove incorrect --model-draft default (#4568) Jared Van Bortel 2023-12-21 12:55:34 -05:00
  • 9154494808 CUDA: mul_mat_id always on GPU for batches >= 32 (#4553) Johannes Gäßler 2023-12-21 18:42:59 +01:00
  • c083718c89 readme : update coding guidelines Georgi Gerganov 2023-12-21 19:27:14 +02:00
  • 880e352277 py : open merges file as 'utf-8' (#4566) howlger 2023-12-21 18:07:34 +01:00
  • 66f35a2f48 cuda : better error message for ggml_get_rows (#4561) bobqianic 2023-12-21 17:06:44 +00:00
  • 1398823922 cuda : replace asserts in wrong architecture checks with __trap (#4556) slaren 2023-12-21 18:02:30 +01:00
  • d3223afdad llama : disable per-tensor info prints on model load (#4562) Johannes Gäßler 2023-12-21 17:34:17 +01:00
  • 1d7a1912ce Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (#4554) LoganDark 2023-12-21 01:59:27 -08:00
  • 799fc22689 CUDA: Faster Mixtral prompt processing (#4538) Johannes Gäßler 2023-12-20 15:41:22 +01:00
  • 328b83de23 ggml : fixed check for _MSC_VER (#4535) Eric Sommerlade 2023-12-19 16:17:01 +00:00
  • a7aee47b98 ggml-cuda: Fix HIP build (#4528) arlo-phoenix 2023-12-18 22:33:45 +01:00
  • 0e18b2e7d0 llama.swiftui : add tinyllama 1.1B F16 Georgi Gerganov 2023-12-18 20:17:43 +02:00
  • 6ff39b129d llama.swiftui : add more models Georgi Gerganov 2023-12-18 20:05:12 +02:00
  • b9e74f9bca llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490) Ebey Abraham 2023-12-18 17:27:47 +00:00
  • 3c04bf6da8 llama : fix try_override for bool_value which always returns true (#4519) hankcs 2023-12-18 05:14:58 -08:00
  • 2994f0c5a2 decode : fix logits_valid for legacy API (#4516) Jared Van Bortel 2023-12-17 19:39:02 -05:00
  • b1306c4394 readme : update hot topics Georgi Gerganov 2023-12-17 20:16:23 +02:00
  • 800a489e4a llama.swiftui : add bench functionality (#4483) Georgi Gerganov 2023-12-17 19:38:41 +02:00
  • f7f468a97d gguf-py : fail fast on nonsensical special token IDs (#4489) Jared Van Bortel 2023-12-17 10:45:46 -05:00
  • 919c40660f build : Check the ROCm installation location (#4485) Matheus Gabriel Alves Silva 2023-12-17 12:23:33 -03:00
  • 45668633fd finetune : keep allocs alive until all allocations are done (#4486) slaren 2023-12-17 16:05:56 +01:00
  • 0ffc92d2d2 server : disable llm logs if SERVER_VERBOSE is off (#3792) olexiyb 2023-12-17 17:02:16 +02:00
  • 8edd2b40fd server : fix grammar being ignored (#4494) AdithyanI 2023-12-17 15:57:56 +01:00
  • eb16dae7e7 server : fix possible ambiguity in content type charset (#4501) Alexey Parfenov 2023-12-17 14:56:09 +00:00
  • 62bd52b7bf server : allow requests larger than 8K (#4500) mzcu 2023-12-17 15:54:37 +01:00
  • 5daa5f54fd Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506) Bach Le 2023-12-17 18:57:33 +08:00
  • c6c4fc081c lora : add support for non-llama models (#3333) slaren 2023-12-16 18:58:46 +01:00
  • 8a5be3bd58 llama : sanity checks for access to logits (#4274) Jared Van Bortel 2023-12-15 22:16:15 -05:00
  • 88ae8952b6 server : add optional API Key Authentication example (#4441) ShadovvBeast 2023-12-15 13:49:01 +02:00