llama.cpp

ver4a/llama.cpp

Commit graph

0f07cacb05

ggml : fix q4_1 dot product types Georgi Gerganov 2023-04-14 09:45:42 +03:00
c5d70f5c9e

ggml : optimize rope function to avoid call powf in the tight loop (#807) Howard Su 2023-04-14 14:24:52 +08:00
be87b6ed20

perplexity : add support for batch size to --perplexity (#407) Gary Linscott 2023-04-13 14:50:42 -07:00
0e07e6a839

common : remove unnecessary includes (#947) CRD716 2023-04-13 10:39:25 -05:00
a3a2a0eda8

ggml : add GGML_DEFAULT_N_THREADS Georgi Gerganov 2023-04-13 18:36:40 +03:00
d990e3fffc

ggml : speed-up ggml_vec_dot_q4_1() ARM_NEON + 32-bit ARM support (#900) Georgi Gerganov 2023-04-13 18:32:36 +03:00
9190e8eac8

llama : merge llama_internal.h into llama.h Georgi Gerganov 2023-04-13 18:04:45 +03:00
c85980acd0

gitignore : benchmark Georgi Gerganov 2023-04-13 18:01:22 +03:00
6232f2d7fd

ggml : optimize non-SIMD Q4_0 vector dot product (#703) Stephan Walter 2023-04-13 14:59:50 +00:00
6c248707f5

ggml : introduce GGML_ALIGNED_MALLOC/GGML_ALIGNED_FREE macros (#884) Pavol Rusnak 2023-04-13 16:08:32 +02:00
8cda5c981d

fix whitespace (#944) CRD716 2023-04-13 09:03:57 -05:00
ec29272175

readme : remove python 3.10 warning (#929) CRD716 2023-04-13 08:59:53 -05:00
7e941b95eb

readme : llama node binding (#911) Genkagaku.GPT 2023-04-13 21:54:27 +08:00
c729ff730a

flake.nix: add all binaries from bin (#848) Pavol Rusnak 2023-04-13 15:49:05 +02:00
4579af95e8

zig : update build.zig (#872) Judd 2023-04-13 21:43:22 +08:00
8c3ffc2f04

ggml : update cblas_sgemm columns var to be more reasonable (#838) Vladimir 2023-04-13 15:24:30 +02:00
107980d970

examples : add -n to alpaca and gpt4all scripts (#706) niansa/tuxifan 2023-04-13 15:03:39 +02:00
585d91a156

cmake : add explicit F16C option (x86) (#576) anzz1 2023-04-13 15:48:21 +03:00
95ea26f6e9

benchmark : add tool for timing q4_0 matrix multiplication (#653) SebastianApel 2023-04-13 14:46:23 +02:00
82d146df9b

do not force the prompt file to end with a new line (#908) Pavol Rusnak 2023-04-13 11:33:16 +02:00
e7f6997f89

Don't crash on ftype (formerly f16) == 4 (#917) Stephan Walter 2023-04-12 15:06:16 +00:00
f76cb3a34d

readme : change "GPU support" link to discussion Georgi Gerganov 2023-04-12 14:48:57 +03:00
782438070f

readme : update hot topics with link to "GPU support" issue Georgi Gerganov 2023-04-12 14:31:12 +03:00
4dbbd40750

readme: link to sha256sums file (#902) Nicolai Weitkemper 2023-04-12 08:46:20 +02:00
8b679987cd

Fix whitespace, add .editorconfig, add GitHub workflow (#883) Pavol Rusnak 2023-04-11 21:45:44 +02:00
3e6e70d8e8

Add enum llama_ftype, sync ggml_type to model files (#709) Stephan Walter 2023-04-11 15:03:51 +00:00
2663d2c678

Windows fixes (#890) comex 2023-04-11 06:19:54 -07:00
a0caa34b16

Add BAIR's Koala to supported models (#877) qouoq 2023-04-11 04:41:53 +08:00
461ba9e66e

ggml : fix WASM build Georgi Gerganov 2023-04-10 23:20:01 +03:00
c3ac702e5e

ggml : add ggml_cont() + optimize ggml_cpy() for contiguous dst Georgi Gerganov 2023-04-10 22:40:28 +03:00
9d634ef452

ggml : remove trailing whitespaces Georgi Gerganov 2023-04-10 19:32:45 +03:00
d9a239c410

Simplify to include lower-case windows.h always, fix compile on mingw32 (#747) Marco Matthies 2023-04-10 19:57:59 +02:00
684da25926

ggml : fix quantize_row_q4_1() ARM_NEON (close #876) Georgi Gerganov 2023-04-10 19:29:48 +03:00
180b693a47

Print model version. comex 2023-04-08 13:08:21 -07:00
f963b63afa

Rewrite loading code to try to satisfy everyone: comex 2023-04-08 12:24:37 -07:00
aaf3b23deb

fix for windows utf-8 input (#840) Tomáš Pazdiora 2023-04-08 17:49:39 +02:00
f2d1c47294

cmake should link openblas properly with -lopenblas like how it's done in the makefile (#839) eiery 2023-04-08 07:15:17 -04:00
317fb12fbd

Add new binaries to flake.nix (#847) lon 2023-04-08 07:04:23 -03:00
62cfc54f77

Add quantize-stats command for testing quantization (#728) unbounded 2023-04-08 00:09:18 +02:00
698f7b5d63

make : add libllama.so target for llama-cpp-python (#797) bhubbb 2023-04-08 02:11:58 +10:00
c1950c3431

zig : don't link examples/common.cpp for non-example (#814) iacore 2023-04-07 16:05:29 +00:00
4953e9007f

llama : always sort logits before nucleus sampling (#812) Ivan Stepanov 2023-04-07 19:02:12 +03:00
cc9cee8e9e

Do not crash when it has nothing to say. (#796) Sergey Alirzaev 2023-04-06 17:59:11 +02:00
d2beca95dc

Make docker instructions more explicit (#785) Pavol Rusnak 2023-04-06 08:56:58 +02:00
eeaa7b0492

ggml : multi-thread ggml_rope() (~3-4 times faster on M1) (#781) Georgi Gerganov 2023-04-05 22:11:03 +03:00
986b6ce9f9

ggml, llama : avoid heavy V transpose + improvements (#775) Georgi Gerganov 2023-04-05 22:07:33 +03:00
3416298929

Update README.md Georgi Gerganov 2023-04-05 19:54:30 +03:00
5a8c4f6240

llama : define non-positive top_k; top_k range check (#779) Ivan Stepanov 2023-04-05 19:20:05 +03:00
ff05d05c96

miku.sh : add executable bit (#780) at8u 2023-04-05 15:59:13 +00:00
62b3e81aae

media : add logos and banners Georgi Gerganov 2023-04-05 18:58:06 +03:00
8d10406d6e

readme : change logo + add bindings + add uis + add wiki Georgi Gerganov 2023-04-05 18:56:20 +03:00
ed1c214e66

zig : add build.zig (#773) iacore 2023-04-05 15:06:02 +00:00
0c44427df1

make : missing host optimizations in CXXFLAGS (#763) Ivan Stepanov 2023-04-05 17:38:37 +03:00
594cc95fab

readme : update with CMake and windows example (#748) Adithya Balaji 2023-04-05 16:36:12 +02:00
88ed5761b8

examples : add Miku.sh (#724) at8u 2023-04-05 14:32:42 +00:00
58c438cf7d

Add Accelerate/BLAS when using Swift (#765) Andrew Duffy 2023-04-05 11:44:24 +01:00
53dbba7695

Windows: reactive sigint handler after each Ctrl-C (#736) mgroeber9110 2023-04-03 18:00:55 +02:00
437e77855a

10+% performance improvement of ggml_vec_dot_q4_0 on AVX2 (#654) SebastianApel 2023-04-03 09:52:28 +02:00
cd7fa95690

Define non-positive temperature behavior (#720) Ivan Stepanov 2023-04-03 03:19:04 +03:00
a0c0516416

Remove torch GPU dependencies from the Docker.full image (#665) bsilvereagle 2023-04-02 15:13:03 -07:00
d8d4e865cd

Add a missing step to the gpt4all instructions (#690) Thatcher Chamberlin 2023-04-02 06:48:57 -04:00
e986f94829

Added api for getting/setting the kv_cache (#685) Christian Falch 2023-04-02 12:23:04 +02:00
c0bb1d3ce2

ggml : change ne to int64_t (#626) Marian Cepok 2023-04-02 12:21:31 +02:00
6e7801d08d

examples : add gpt4all script (#658) Leonardo Neumann 2023-04-02 04:56:20 -03:00
81040f10aa

llama : do not allocate KV cache for "vocab_only == true" (#682) Stephan Walter 2023-04-02 07:18:53 +00:00
c4f89d8d73

make : use -march=native -mtune=native on x86 (#609) Fabian 2023-04-02 09:17:05 +02:00
5b70e7de4c

fix default params for examples/main (#697) Murilo Santana 2023-04-01 23:41:12 -03:00
a717cba844

py: huggingface -> Hugging Face (#686) Ikko Eltociear Ashimine 2023-04-02 01:38:18 +09:00
d0a7f742e7

readme: replace termux links with homepage, play store is deprecated (#680) rimoliga 2023-04-01 11:57:30 -03:00
0d054e292e

Show error message when -f fails Slaren 2023-03-31 20:03:48 +02:00
3525899277

Enable -std= for cmake builds, fix warnings (#598) Stephan Walter 2023-03-31 19:19:16 +00:00
1d08882afa

Optimize AVX2 ggml_vec_dot_q4_0 (#642) slaren 2023-03-31 17:55:52 +02:00
02c5b27e91

Add AVX acceleration (#617) perserk 2023-03-31 16:55:44 +05:00
cbef542879

py : cleanup the code Pavol Rusnak 2023-03-29 21:31:24 +02:00
9733104be5

drop quantize.py (now that models are using a single file) Pavol Rusnak 2023-03-31 00:52:06 +02:00
3df890aef4

readme : update supported models Georgi Gerganov 2023-03-30 22:31:54 +03:00
ee0c40dd6d

Introduce GGML migration tool for new file format Justine Tunney 2023-03-30 05:42:56 -07:00
6f23ba5ee2

Ensure --mlock works properly with mmap() support Justine Tunney 2023-03-30 01:53:36 -07:00
78ca9838ee

Make loading weights 10-100x faster Justine Tunney 2023-03-29 13:51:37 -07:00
a017390358

Initial windows support (untested) Slaren 2023-03-29 22:22:36 +02:00
ac184d5147

Always initialize mm_addr and mm_length in llama_model Slaren 2023-03-29 08:53:14 +02:00
276e5b7811

Unmap the file in llama_free Slaren 2023-03-29 08:31:26 +02:00
d68c5dc435

Make mmap_file static Slaren 2023-03-29 06:18:18 +02:00
64bde3ffd4

Fix ggml_init_params in quantize Slaren 2023-03-29 05:38:57 +02:00
c03ae8dca1

Add mmap support for model files Slaren 2023-03-29 02:03:43 +02:00
3bcc129ba8

cmake : properly invoke CTest (#629) Stephan Walter 2023-03-30 17:56:59 +00:00
a4755cf288

Remove unused variable (#607) Casey Primozic 2023-03-30 10:53:35 -07:00
1f0414feec

make : fix darwin f16c flags check (#615) david raistrick 2023-03-30 13:34:45 -04:00
77efdf5a50

ggml : fix NEON signs (close #620, #622) Georgi Gerganov 2023-03-30 20:27:32 +03:00
ed3c680bcd

Fix GGML_F32Cx8_STORE in AVX without F16C path (#619) slaren 2023-03-30 11:16:30 +02:00
9cbc404ba6

ci : re-enable AVX512 testing (Windows-MSVC) (#584) anzz1 2023-03-29 23:44:39 +03:00
b51c717d5c

ggml : init time on first ggml_init() call Georgi Gerganov 2023-03-29 22:15:34 +03:00
0ba76c1e73

llama : fix compile warnings when reading the vocab Georgi Gerganov 2023-03-29 22:13:12 +03:00
cea1c85948

ggml : add ARM_NEON dequantize_row_q4_1() Georgi Gerganov 2023-03-29 22:10:01 +03:00
f202ada131

ggml : add ARM_NEON quantize_row_q4_1() Georgi Gerganov 2023-03-29 22:03:02 +03:00
3b44d30d9b

ggml : add ARM_NEON ggml_vec_dot_q4_1() Georgi Gerganov 2023-03-29 21:47:33 +03:00
61cbfff5c9

rename convert_ggml_to_pth.py -> convert-ggml-to-pth.py (#600) Pavol Rusnak 2023-03-29 20:09:25 +02:00
d9ad104440

Create chat-13B.bat (#592) Thérence 2023-03-29 19:21:09 +02:00
b467702b87

readme : fix typos Georgi Gerganov 2023-03-29 19:38:31 +03:00
516d88e75c

readme : add GPT4All instructions (close #588) Georgi Gerganov 2023-03-29 19:37:20 +03:00