Commit graph

  • 9e4e077ec5
    ci: server: fix python installation (#6922) Pierrick Hymbert 2024-04-26 11:11:51 +02:00
  • 83b72cb086
    Merge pull request from GHSA-p5mv-gjc5-mwqv Georgi Gerganov 2024-04-26 10:41:53 +03:00
  • d4a9afc100
    ci: server: fix python installation (#6918) Pierrick Hymbert 2024-04-26 09:27:49 +02:00
  • 7d641c26ac
    ci: fix concurrency for pull_request_target (#6917) Pierrick Hymbert 2024-04-26 09:26:59 +02:00
  • 5790c8dac1
    bench: server add stop word for PHI-2 (#6916) Pierrick Hymbert 2024-04-26 09:26:16 +02:00
  • 46e12c4692
    llava : add support for moondream vision language model (#6899) vik 2024-04-25 12:38:31 -07:00
  • dba497e0c1
    cmake : restore LLAMA_LLAMAFILE_DEFAULT Georgi Gerganov 2024-04-25 21:31:17 +03:00
  • fa0b4ad252
    cmake : remove obsolete ANDROID check Georgi Gerganov 2024-04-25 18:59:51 +03:00
  • d6e1d44f16
    llama : synchronize before get/set session data (#6911) slaren 2024-04-25 17:59:03 +02:00
  • 853d06ffe2
    ci : tmp disable slow tests Georgi Gerganov 2024-04-25 17:06:27 +03:00
  • 3fe0596c18
    readme : update model list (#6908) BarfingLemurs 2024-04-25 09:52:28 -04:00
  • 0ead1f1072
    llama : check that all the tensor data is in the model file (#6885) slaren 2024-04-25 15:23:47 +02:00
  • 51543729ff
    ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (#6906) Georgi Gerganov 2024-04-25 15:48:25 +03:00
  • 4ab99d8d47
    clip : rename lerp function to avoid conflict (#6894) Daniel Bevenius 2024-04-25 14:38:14 +02:00
  • 54770413c4
    ggml : fix MIN / MAX macros (#6904) Georgi Gerganov 2024-04-25 15:12:28 +03:00
  • aa750c1ede
    tests : minor bash stuff (#6902) Georgi Gerganov 2024-04-25 14:27:20 +03:00
  • 1966eb2615
    quantize : add '--keep-split' to quantize model into shards (#6688) jiez 2024-04-25 18:29:35 +08:00
  • 784e11dea1
    README: add graphic for matrix multiplication (#6881) Johannes Gäßler 2024-04-24 21:29:13 +02:00
  • b4e4b8a935
    llama : add llama_get_pooling_type function (#6862) Douglas Hanley 2024-04-24 08:10:07 -05:00
  • 3fe847b574
    server : do not apply Markdown formatting in code sections (#6850) mgroeber9110 2024-04-24 12:54:24 +02:00
  • 37246b1031
    common : revert showing control tokens by default for server (#6860) Kyle Mistele 2024-04-24 05:15:29 -05:00
  • 28103f4832
    Server: fix seed for multiple slots (#6835) Johannes Gäßler 2024-04-24 11:08:36 +02:00
  • c0d1b3e03e
    ggml : move 32-bit arm compat in ggml-impl.h (#6865) Georgi Gerganov 2024-04-24 12:00:07 +03:00
  • abd3314064
    llama : add phi 3 chat template (#6857) Tristan Druyen 2024-04-24 10:52:37 +02:00
  • 3fec68be4e
    convert : add support of codeqwen due to tokenizer (#6707) Junyang Lin 2024-04-24 15:16:21 +08:00
  • c8297c6af5
    llama : add phi3 support (#6852) liuwei-git 2024-04-24 15:00:37 +08:00
  • 4e96a812b3
    [SYCL] Windows default build instructions without -DLLAMA_SYCL_F16 flag activated (#6767) Anas Ahouzi 2024-04-23 02:53:18 +02:00
  • 192090bae4
    llamafile : improve sgemm.cpp (#6796) Justine Tunney 2024-04-22 15:00:36 -04:00
  • e931888d50
    ggml : fix calloc argument ordering. (#6820) Dave Airlie 2024-04-23 00:05:06 +10:00
  • 8960fe86ae
    llama : fix typo in <|im_end|> token text (#6745) Georgi Gerganov 2024-04-22 15:41:11 +03:00
  • c0956b09ba
    ci: fix job are cancelling each other (#6781) Pierrick Hymbert 2024-04-22 13:22:54 +02:00
  • e9b4a1bf68
    flake.lock: Update github-actions[bot] 2024-04-21 00:17:47 +00:00
  • 5cf5e7d490
    build: generate hex dump of server assets during build (#6661) Olivier Chafik 2024-04-21 18:48:53 +01:00
  • 40f74e4d73
    llama : add option to render special/control tokens (#6807) Georgi Gerganov 2024-04-21 18:36:45 +03:00
  • b9cc76d87e
    ggml : fix ggml_backend_cpu_supports_op() for CPY (#0) Georgi Gerganov 2024-04-21 16:47:57 +03:00
  • 7dbdba5690
    llama : add llama-3 chat template (#6751) Wouter 2024-04-21 15:03:39 +02:00
  • c1386c936e
    gguf-py : add IQ1_M to GGML_QUANT_SIZES (#6761) pmysl 2024-04-21 14:49:30 +02:00
  • e8d35f47cb
    doc : add link to falcon (#6789) Jan Boon 2024-04-21 20:35:40 +08:00
  • 2cca09d509
    readme : add Fedora instructions (#6783) Mohammadreza Hendiani 2024-04-21 16:02:05 +03:30
  • 89b0bf0d5d
    llava : use logger in llava-cli (#6797) Justine Tunney 2024-04-21 08:19:04 -04:00
  • b97bc3966e
    llama : support Llama 3 HF conversion (#6745) Pedro Cuenca 2024-04-21 13:50:41 +02:00
  • b8109bc013
    doc : server tests require llama to be built with curl enabled (#6788) Jan Boon 2024-04-21 00:29:50 +08:00
  • aed82f6837
    common : try to fix Android CI (#6780) Georgi Gerganov 2024-04-20 13:27:12 +03:00
  • 0e4802b2ec
    ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (#6748) loonerin 2024-04-19 13:03:35 -04:00
  • 637e9a86c2
    server: static: upstream upgrade (#6765) Pierrick Hymbert 2024-04-19 13:19:01 +02:00
  • 9958c81b79
    Implement the OLMo architecture (#6741) nopperl 2024-04-19 09:35:54 +00:00
  • 8b1b1f4982
    train : add general name (#6752) Austin 2024-04-19 03:16:45 -04:00
  • bca40e9814
    fix wrong parameter in cmd in readme-sycl.md (#6755) Neo Zhang 2024-04-19 09:16:31 +08:00
  • 0d56246f4b
    ggml : group all experts in a single ggml_mul_mat_id (#6505) slaren 2024-04-18 15:18:48 +02:00
  • 03c0946d73
    convert : support models with multiple chat templates (#6588) Sigbjørn Skjæret 2024-04-18 13:49:01 +02:00
  • e11b2e6e1e
    Qwen2 : assume tied weights if lm_head/output weights is missing (#6738) Ren Xuancheng 2024-04-18 19:38:04 +08:00
  • c71bfd736e
    llama : fix compatibility with old 2 expert models (#6735) slaren 2024-04-18 09:04:47 +02:00
  • 3b8f1ec4b1
    llamafile : tmp disable + build sgemm.o when needed (#6716) Georgi Gerganov 2024-04-17 23:58:26 +03:00
  • 8dd1ec8b3f
    readme : add UI (#6724) Yaroslav 2024-04-17 14:47:50 +02:00
  • facb8b56f8
    convert : fix autoawq gemma (#6704) Zheng.Deng 2024-04-17 04:51:07 +08:00
  • 532c1737a1
    llama : make general.name optional (#6709) Georgi Gerganov 2024-04-16 23:50:38 +03:00
  • 666867b799
    ggml : fix llamafile sgemm wdata offsets (#6710) Georgi Gerganov 2024-04-16 23:50:22 +03:00
  • 8cc91dc63c
    ggml : add llamafile sgemm (#6414) Justine Tunney 2024-04-16 14:55:30 -04:00
  • dbceec87c0
    llama : add StableLM2 12B (#6635) Ashish 2024-04-16 08:48:35 -07:00
  • f4dea7da18
    llama : add qwen2moe (#6074) Shijie 2024-04-16 23:40:48 +08:00
  • 8a56075b07
    gritlm : add --outdir option to hf.sh script (#6699) Daniel Bevenius 2024-04-16 08:34:06 +02:00
  • 58227ffdeb
    perplexity : require positive --ctx-size arg (#6695) Georgi Gerganov 2024-04-16 09:28:33 +03:00
  • 4fbd8098e6
    gguf : add special tokens metadata for FIM/Infill (#6689) Daniel Bevenius 2024-04-16 08:13:13 +02:00
  • 7593639ce3
    main: add --json-schema / -j flag (#6659) Olivier Chafik 2024-04-15 18:35:21 +01:00
  • 132f55795e
    llama : fix restoring the number of outputs from state files (#6687) compilade 2024-04-15 08:56:55 -04:00
  • 3272896d79
    server : revert "minor layout improvements" (#6684) Pierrick Hymbert 2024-04-15 14:18:47 +02:00
  • 7fc16a2c32
    swift : linux support (#6590) Steven Prichard 2024-04-15 05:14:46 -05:00
  • 17e98d4c96
    fix mul_mat_id() for new input, make the ut pass (#6682) Neo Zhang Jianyu 2024-04-15 17:12:26 +08:00
  • 1958f7e06c
    llama : add missing kv clear in llama_beam_search (#6664) David Renshaw 2024-04-14 15:24:15 -04:00
  • 04fbc5f23e
    Add Command R chat template (#6650) Chao Jiang 2024-04-15 00:16:34 +08:00
  • f184dd9208
    flake.lock: Update (#6669) Georgi Gerganov 2024-04-14 16:55:30 +03:00
  • 422c2aff1c
    Added support for GGML_OP_CLAMP in Metal (#6662) Dave 2024-04-14 07:14:19 -04:00
  • 8800226d65
    Fix --split-max-size (#6655) Sigbjørn Skjæret 2024-04-14 13:12:59 +02:00
  • e689fc4e91
    [bug fix] convert github repository_owner to lowercase (#6673) Jaemin Son 2024-04-14 20:12:36 +09:00
  • a4ec34e1cd
    convert : enable the --use-temp-file cli flag (#6645) James A Capozzoli 2024-04-14 04:40:18 -04:00
  • de17e3f745
    fix memcpy() crash, add missed cmd in guide, fix softmax (#6622) Neo Zhang Jianyu 2024-04-14 10:42:29 +08:00
  • b5e7285baf
    CUDA: fix matrix multiplication logic for tests (#6667) Johannes Gäßler 2024-04-14 00:21:55 +02:00
  • 4bd0f93e4a
    model: support arch DbrxForCausalLM (#6515) Pierrick Hymbert 2024-04-13 11:33:52 +02:00
  • ab9a3240a9
    JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) Olivier Chafik 2024-04-12 19:43:38 +01:00
  • fbbc030ba9
    metal : unify mul_mv_id kernels (#6556) slaren 2024-04-12 18:13:20 +02:00
  • 4cc120c744
    infill : add download instructions for model (#6626) Daniel Bevenius 2024-04-12 14:11:46 +02:00
  • 24ee66ed0d
    server : coherent log output for KV cache full (#6637) Pierrick Hymbert 2024-04-12 13:49:21 +02:00
  • 91c736015b
    llama : add gguf_remove_key + remove split meta during quantize (#6591) jiez 2024-04-12 18:45:06 +08:00
  • 5c4d767ac0
    chore: Fix markdown warnings (#6625) Rene Leonhardt 2024-04-12 10:52:36 +02:00
  • ef21ce4ccb
    imatrix : remove invalid assert (#6632) Georgi Gerganov 2024-04-12 11:49:58 +03:00
  • dee7f8d692
    Correct free memory and total memory. (#6630) MasterYi1024 2024-04-12 16:28:12 +08:00
  • 81da18e71c
    eval-callback: use ggml_op_desc to pretty print unary operator name (#6631) Pierrick Hymbert 2024-04-12 10:26:47 +02:00
  • 9ed2737acc
    ci : disable Metal for macOS-latest-cmake-x64 (#6628) Georgi Gerganov 2024-04-12 11:15:05 +03:00
  • 04a5ac211e
    Optimization: eliminate addition of redundant stacks when advancing grammar. (#6616) Clint Herron 2024-04-11 21:44:50 -04:00
  • f7001ccc5a
    As suggested by @slaren, disabling Metal for test to fix CI build on OSX from #6576 (#6619) Clint Herron 2024-04-11 17:44:48 -04:00
  • a474f50ebb
    Refactor Error Handling for CUDA (#6575) Nikolas 2024-04-11 21:56:29 +02:00
  • cbaadc9294
    grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609) Olivier Chafik 2024-04-11 19:47:34 +01:00
  • 1bbdaf6ecd
    ci: download artifacts to release directory (#6612) Hugo Roussel 2024-04-11 19:52:21 +02:00
  • f4183afe6a
    scripts : add --outdir option to hf.sh (#6600) Daniel Bevenius 2024-04-11 15:22:47 +02:00
  • b804b1ef77
    eval-callback: Example how to use eval callback for debugging (#6576) Pierrick Hymbert 2024-04-11 14:51:07 +02:00
  • 8228b66dbc
    gguf : add option to not check tensor data (#6582) Daniel Bevenius 2024-04-10 20:16:48 +02:00
  • b3a96f27f0
    minor layout improvements (#6572) Ralph Soika 2024-04-10 19:18:25 +02:00
  • 4f407a0a35
    llama : add model types for mixtral (#6589) slaren 2024-04-10 17:24:14 +02:00
  • 65c64dc36f
    convert.py : add consolidated.safetensors for mixtral 8x22b (#6587) slaren 2024-04-10 15:23:12 +02:00
  • 67fac4b95f
    docs : how to add a model (#6565) Pierrick Hymbert 2024-04-10 08:58:48 +02:00