Commit graph

  • 1d41d6f7c2
    nix: static build (#5814) hutli 2024-03-05 02:33:08 +01:00
  • 29ae62d2ae
    llama : fix embeddings (#5796) Georgi Gerganov 2024-03-04 22:31:20 +02:00
  • e0843afe1b
    flake : fix Georgi Gerganov 2024-03-04 21:50:50 +02:00
  • a1c6d96ed8
    ggml : fix unknown status (#0) Georgi Gerganov 2024-03-04 20:53:27 +02:00
  • efd8533ef8
    sync : ggml Georgi Gerganov 2024-03-04 11:06:39 +02:00
  • 9fa2627347
    ggml : introduce ggml_status (ggml/750) Michael Podvitskiy 2024-03-04 10:05:42 +01:00
  • fe52be11e3
    cmake : handle cases where git index is not found in .git (#5844) Dane Madsen 2024-03-05 05:26:55 +11:00
  • 6d341ab6c5
    speculative : implement stochastic speculative sampling (#5625) Minsoo Cheong 2024-03-05 03:24:00 +09:00
  • 4ffcdce2ff
    add alias for chat template (#5858) Xuan Son Nguyen 2024-03-04 12:22:08 +01:00
  • a0fc62661f
    sync : ggml Georgi Gerganov 2024-03-04 10:40:04 +02:00
  • 7d43c585dc
    add some new ops, fix some operators and add batch operations to certain operators. (ggml/747) leejet 2024-03-03 20:23:52 +08:00
  • 82f3e668ad
    common : use LLAMA_DEFAULT_SEED (#5855) DAN™ 2024-03-04 03:08:19 -05:00
  • 5a51cc1bb4
    main : support special tokens as reverse/anti prompt (#5847) DAN™ 2024-03-04 02:57:20 -05:00
  • 67be2ce101
    cuda : fix data race in soft max (#5853) slaren 2024-03-03 14:26:18 +01:00
  • 231ae28f07
    readme : add API changes section Georgi Gerganov 2024-03-03 12:44:03 +02:00
  • 475df1d6cf
    llama : allow for user specified embedding pooling type (#5849) Douglas Hanley 2024-03-03 04:40:27 -06:00
  • 87c2e8b279
    gguf-dump : support i-quants (#5841) Nindaleth 2024-03-03 09:43:42 +01:00
  • de9692a7d2
    llama : fix llama_copy_state_data with fragmented KV cache (#5840) compilade 2024-03-03 03:41:55 -05:00
  • e6029348e8
    ci : schedule slow server tests only on Release or on demand (#5839) Pierrick Hymbert 2024-03-03 09:35:23 +01:00
  • 8ef969afce
    server : init http requests thread pool with --parallel if set (#5836) Pierrick Hymbert 2024-03-03 08:48:36 +01:00
  • fa974646e1
    flake.lock: Update (#5842) Georgi Gerganov 2024-03-03 06:11:31 +02:00
  • 9731134296
    server: tests: passkey challenge / self-extend with context shift demo (#5832) Pierrick Hymbert 2024-03-02 22:00:14 +01:00
  • 4a6e2d6142
    llama : add abort_callback to interrupt computation (#5409) Michael Podvitskiy 2024-03-02 20:52:25 +01:00
  • 494c870326
    ggml : fix IQ3_S AVX implementation (#5834) Georgi Gerganov 2024-03-02 20:00:49 +02:00
  • 4d4d2366fc
    convert : automatically fall back to HfVocab if tokenizer.model doesn't exist (#5821) Jared Van Bortel 2024-03-02 12:27:26 -05:00
  • c7a0ad8ec9
    convert-hf : make model class definitions self-contained (#5825) Jared Van Bortel 2024-03-02 12:21:47 -05:00
  • bbde6eb256
    ggml : IQ3_S improvements (#5829) Kawrakow 2024-03-02 17:00:51 +02:00
  • ef2cd694c4
    scripts : add pod-llama.sh Georgi Gerganov 2024-03-02 16:54:08 +02:00
  • 6c32d8c7ad
    llama : refactor internal quantization functions (#5830) Xuan Son Nguyen 2024-03-02 15:19:09 +01:00
  • 802da0091b
    llama : fix segfault from unknown model arch name (#5820) compilade 2024-03-02 08:42:56 -05:00
  • 715641391d
    Support multiple GPUs (split mode) on SYCL backend (#5806) Neo Zhang Jianyu 2024-03-02 19:49:30 +08:00
  • 9bf297a02b
    workflows : remove nocleanup arg for check-requirements.sh (#5826) crasm 2024-03-02 00:11:06 -05:00
  • cb5e8f7fc4
    build(nix): Introduce flake.formatter for nix fmt (#5687) Tushar 2024-03-02 04:48:26 +05:30
  • da3b9ba2b7
    convert-hf-to-gguf : require einops for InternLM2ForCausalLM (#5792) nold 2024-03-01 22:51:12 +01:00
  • c29af7e225
    llama : add StarCoder2 support (#5795) Sourab Mangrulkar 2024-03-02 01:00:46 +05:30
  • 38d16b1426
    server : remove api_like_OAI.py proxy script (#5808) Georgi Gerganov 2024-03-01 20:00:58 +02:00
  • c2224f003b
    ggml-vulkan: fix VULKAN_CHECK_RESULTS flag, which was previously broken (#5813) ddpasa 2024-03-01 18:00:00 +01:00
  • e743386728
    gemma : fix bfloat16 -> float16 conversion issue (#5810) kunal-vaishnavi 2024-03-01 06:08:08 -08:00
  • f49a535686
    common : fix flag --logits-all to --all-logits (#5805) Miwa / Ensan 2024-03-01 22:48:56 +09:00
  • 3ab8b3a92e
    llama : cleanup unused mmq flags (#5772) Pierrick Hymbert 2024-03-01 12:39:06 +01:00
  • 9600d59e01
    unicode : switch to multimap based nfd_map (#5799) Douglas Hanley 2024-03-01 03:15:36 -06:00
  • 5cb02b4a01
    server: allow to override threads server pool with --threads-http (#5794) Pierrick Hymbert 2024-03-01 10:08:08 +01:00
  • 6ea0f010ff
    ci : add Ubuntu 22 Vulkan CI run (#5789) Eve 2024-03-01 08:54:53 +00:00
  • f105471ef6
    server : fix newlines in help (#5785) Georgi Gerganov 2024-03-01 09:59:43 +02:00
  • 38d1521608
    [SYCL] Use batched mul_mat pathway (#5591) AidanBeltonS 2024-03-01 07:36:47 +00:00
  • 052051d8ae
    Server: normalize naming (#5779) Xuan Son Nguyen 2024-02-29 21:42:11 +01:00
  • d5ab29757e
    llama : constified llama_set_state_data's src (#5774) Marcus Dunn 2024-02-29 00:17:23 -08:00
  • 87c91c0766
    ci : reduce 3b ppl chunks to 1 to avoid timeout (#5771) Georgi Gerganov 2024-02-28 21:44:21 +02:00
  • 317709b2a8
    make portability_enumeration_ext apple only (#5757) Eve 2024-02-28 19:33:37 +00:00
  • 08c5ee87e4
    llama : remove deprecated API (#5770) Georgi Gerganov 2024-02-28 18:43:38 +02:00
  • 78aacf3634
    awq-py : remove (#5768) Georgi Gerganov 2024-02-28 17:36:53 +02:00
  • 8c0e8f4e73
    sync : ggml Georgi Gerganov 2024-02-28 11:17:32 +02:00
  • 2774b0c974
    add google magika inference example (ggml/748) slaren 2024-02-25 20:41:35 +01:00
  • 5f70671856
    Introduce backend GUIDs (ggml/743) UEXTM.com 2024-02-24 11:27:36 -05:00
  • a693bea1e6
    server : hit Ctrl+C twice to exit (#5734) Xuan Son Nguyen 2024-02-28 09:55:37 +01:00
  • adcb12a9ba
    llama : fix non-quantization of expert gating tensors (#5754) compilade 2024-02-28 03:52:56 -05:00
  • 177628bfd8
    llama : improve BERT tokenization (#5740) Douglas Hanley 2024-02-28 02:51:11 -06:00
  • 6c4416868d
    readme : add link to LLaVA 1.6 models (#5758) Daniel Bevenius 2024-02-28 09:39:39 +01:00
  • efc72253f7
    server : add "/chat/completions" alias for "/v1/..." (#5722) Jorge A 2024-02-28 01:39:15 -07:00
  • 7c4263d426
    ggml : make i-quants work with super-blocks of 64 (CPU,Metal) (#5760) Kawrakow 2024-02-28 10:37:02 +02:00
  • cb49e0f8c9
    Attempt to fix android build (#5752) Kawrakow 2024-02-27 19:16:49 +02:00
  • 0becb22ac0
    IQ4_XS: a 4.25 bpw quantization (#5747) Kawrakow 2024-02-27 16:34:24 +02:00
  • c24a2a6e60
    cuda : replace remaining shfl_xor with calls to warp_reduce functions (#5744) Engininja2 2024-02-27 07:22:45 -06:00
  • 1f30b7a9f1
    ggml-quants : fix avx2 iq1_s vec_dot when compiled with gcc (#5742) Engininja2 2024-02-27 06:50:18 -06:00
  • 9d533a77d0
    llama : fix defrag bugs + add parameter (#5735) Georgi Gerganov 2024-02-27 14:35:51 +02:00
  • cbbd1efa06
    Makefile: use variables for cublas (#5689) le.chang 2024-02-27 10:03:06 +08:00
  • b11a93df41
    fix server hangs on empty prompt (#5733) Xuan Son Nguyen 2024-02-26 23:15:48 +01:00
  • a33e6a0d2a
    Adding IQ2_S and IQ2_M to complete coverage of the 2-3 bit quantization range (#5721) Kawrakow 2024-02-26 18:28:38 +02:00
  • 47bb7b48c7
    CUDA: fix DEBUG_CUDA_MALLOC (#5729) Johannes Gäßler 2024-02-26 15:36:38 +01:00
  • c4d7f81786
    readme : update ui list (#5731) Artem 2024-02-26 17:15:28 +03:00
  • e849078c6e
    [SYCL] Add support for soft_max ALiBi (#5639) AidanBeltonS 2024-02-26 14:02:11 +00:00
  • 67fd33132f
    unicode : reuse iterator (#5726) Georgi Gerganov 2024-02-26 14:02:12 +02:00
  • 4804215cb8
    server: CI fix trailing space (#5728) Pierrick Hymbert 2024-02-26 11:41:34 +01:00
  • 8a533f0d90
    server: CI tests reduce build matrix (#5725) Pierrick Hymbert 2024-02-26 09:56:10 +01:00
  • 269de86ba0
    llama : fix Gemma rope type (#5691) Georgi Gerganov 2024-02-26 08:30:17 +02:00
  • c393733988
    flake.lock: Update github-actions[bot] 2024-02-25 00:17:11 +00:00
  • e3965cf35a
    server: tests - slow inference causes timeout on the CI (#5715) Pierrick Hymbert 2024-02-25 22:48:33 +01:00
  • 8b350356b2
    server: docs - refresh and tease a little bit more the http server (#5718) Pierrick Hymbert 2024-02-25 21:46:29 +01:00
  • bf08e00643
    llama : refactor k-shift implementation + KV defragmentation (#5691) Georgi Gerganov 2024-02-25 22:12:24 +02:00
  • f7625019c5
    server : fix crash when system prompt is bigger than batch size (#5714) compilade 2024-02-25 13:43:50 -05:00
  • abbabc5e51
    ggml-quants : provide ggml_vqtbl1q_u8 for 64bit compatibility (#5711) Radosław Gryta 2024-02-25 19:43:00 +01:00
  • f1a98c5254
    make : fix nvcc version is empty (#5713) kwin1412 2024-02-26 00:46:49 +08:00
  • 7d548a1827
    readme : add Msty to UI list (#5618) Ashok Gelal 2024-02-25 10:57:34 -05:00
  • 930b178026
    server: logs - unified format and --log-format option (#5700) Pierrick Hymbert 2024-02-25 13:50:32 +01:00
  • d52d7819b8
    server: concurrency fix + monitoring - add /metrics prometheus compatible endpoint (#5708) Pierrick Hymbert 2024-02-25 13:49:43 +01:00
  • 1289408817
    cmake : fix compilation for Android armeabi-v7a (#5702) Radosław Gryta 2024-02-25 11:53:11 +01:00
  • ab336a9d5e
    code : normalize enum names (#5697) Georgi Gerganov 2024-02-25 12:09:09 +02:00
  • 69917dfa55
    py : fix StableLM conversion after config.json changes (#5703) Anas Ahouzi 2024-02-25 10:54:04 +01:00
  • 9e359a4f47
    server: continue to update other slots on embedding concurrent request (#5699) Pierrick Hymbert 2024-02-24 19:16:04 +01:00
  • 4c4cb30736
    IQ3_S: a much better alternative to Q3_K (#5676) Kawrakow 2024-02-24 16:23:52 +02:00
  • 525213d2f5
    server: init functional tests (#5566) Pierrick Hymbert 2024-02-24 12:28:55 +01:00
  • fd43d66f46
    server : add KV cache quantization options (#5684) AlpinDale 2024-02-23 19:31:54 +00:00
  • 54fbcd2ce6
    convert : fix missing ftype for gemma (#5690) Jared Van Bortel 2024-02-23 13:39:14 -05:00
  • 15499eb942
    mpt : do not duplicate token_embd.weight on disk (#5670) Jared Van Bortel 2024-02-22 17:05:23 -05:00
  • 96633eeca1
    gemma : use more bits for the token_embd.weight tensor (#5650) Georgi Gerganov 2024-02-22 23:23:46 +02:00
  • 847eedbdb2
    py : add Gemma conversion from HF models (#5647) Georgi Gerganov 2024-02-22 23:22:48 +02:00
  • 7e4f339c40
    ggml : always define ggml_fp16_t as uint16_t (#5666) Georgi Gerganov 2024-02-22 23:21:39 +02:00
  • 334f76fa38
    sync : ggml Georgi Gerganov 2024-02-22 23:21:05 +02:00
  • efd56b1c21
    ggml : 32-bit arm compat (whisper/1891) Georgi Gerganov 2024-02-22 18:31:40 +02:00
  • 201294ae17
    nix: init singularity and docker images (#5056) Someone 2024-02-22 19:44:10 +00:00