llama.cpp

ver4a/llama.cpp

Commit graph

53635c081c

py : add GPT4All conversion script Georgi Gerganov 2023-03-29 19:29:26 +03:00
41318d708e

llama : use the same threshold for OpenBLAS and ggml thread limiting (#577) Maël Kerbiriou 2023-03-29 18:10:07 +02:00
a6956b25a1

add example of re-act pattern (#583) Tobias Lütke 2023-03-29 17:10:24 +02:00
83df5639eb

Fix GCC warning about binary literal (#595) anzz1 2023-03-29 16:20:07 +03:00
a5c42c4b13

Fix typo in llama.h (#593) anzz1 2023-03-29 16:19:29 +03:00
5a5f8b1501

Enable Fused-Multiply-Add (FMA) and F16C/CVT16 vector extensions on MSVC (#375) anzz1 2023-03-28 22:44:29 +03:00
f1217055ea

CI: fix subdirectory path globbing (#546) anzz1 2023-03-28 22:43:25 +03:00
7f4c5c6651

llama : fix linkage with mingw (#551) anzz1 2023-03-28 21:23:09 +03:00
2a98bc18ea

ggml : add AVX2 implementation of quantize_row_q4_1 (#515) slaren 2023-03-28 20:06:03 +02:00
d0aaff571c

py : add temporary script to convert old ggml files to newer version (#539) thement 2023-03-28 19:55:42 +02:00
d0330fd783

py : add capability to convert from ggml back to torch or hf format for further consumption/training/finetuning (#403) Tai Duc Nguyen 2023-03-28 13:51:29 -04:00
99c5b27654

ggml : refactor quantized processing functions (#509) Stephan Walter 2023-03-28 17:13:01 +00:00
692ce3164e

py : removed unused model variable and verified that the code functions correctly with vocab_only setting. Also confirmed that the code works as expected after running with reduced memory usage due to deletion of no-longer-needed variable. (#547) DooWoong Lee (David) 2023-03-29 02:02:34 +09:00
96f9c0506f

ci : make ctest verbose, hopefully we see what is wrong with the sanitizer Georgi Gerganov 2023-03-28 20:01:09 +03:00
d502bc7c9d

tests : free llama context at the end of the test Georgi Gerganov 2023-03-28 19:51:55 +03:00
436e561931

all : be more strict about converting float to double (#458) Stephan Walter 2023-03-28 16:48:20 +00:00
20e1e84884

deploy : add a Package.swift for SwiftPM support (#393) Jed Fox 2023-03-28 11:39:01 -05:00
c1f885067c

ggml : introduce structs for the q4 data blocks (#356) Stephan Walter 2023-03-28 15:56:03 +00:00
e0670260fb

gitignore : add "embedding" Georgi Gerganov 2023-03-28 18:34:35 +03:00
28ba975aea

Check the existence of f16_model_path_base in quantize.py (#574) dotpy314 2023-03-28 23:06:28 +08:00
a6bdc47cba

Fix usage of F16C intrinsics in AVX code (#563) slaren 2023-03-28 16:26:55 +02:00
7b8dbcb78b

main.cpp fixes, refactoring (#571) anzz1 2023-03-28 17:09:55 +03:00
4b8efff0e3

Add embedding example to Makefile (#540) RJ Adriaansen 2023-03-28 08:11:09 +02:00
7e5395575a

Fix missing ggml link in cmake for examples/* on w64-mingw32 (#542) Marco Matthies 2023-03-27 06:55:26 +02:00
34c1072e49

ci: add debug build to sanitizer build matrix (#527) Erik Scholz 2023-03-26 17:48:40 +02:00
939ad2d3a5

Fix undefined variables in debug build, remove unused variables (#531) Stephan Walter 2023-03-26 15:34:02 +00:00
8c2ec5e21d

Add support for linux/arm64 platform during Docker Builds (#514) Juan Calderon-Perez 2023-03-26 10:48:42 -04:00
b391579db9

Update README and comments for standalone perplexity tool (#525) Stephan Walter 2023-03-26 13:14:01 +00:00
7a87d31f4f

[main] fix infinite generation (-n == -1) (#523) anzz1 2023-03-26 16:06:10 +03:00
348d6926ee

Add logo to README.md Georgi Gerganov 2023-03-26 10:20:49 +03:00
33e35b8fe8

Exit from interactive mode if input stream is bad (#491) Harald Fernengel 2023-03-26 07:25:46 +02:00
19726169b3

CI: Run other sanitizer builds even if one fails (#511) anzz1 2023-03-26 00:13:28 +02:00
f732695cd5

Clarify console output in convert-pth-to-ggml.py (#512) jp-x-g 2023-03-25 14:53:55 -07:00
2f7bf7dd7c

CMake / CI additions (#497) anzz1 2023-03-25 23:38:11 +02:00
34ab526843

(Windows) Set console to UTF-8 on init (#420) anzz1 2023-03-25 22:29:22 +02:00
c2b25b6912

Fix colors enabling on WIN32 Georgi Gerganov 2023-03-25 21:53:39 +02:00
79b2b266db

If n_predict == -1, generate forever Georgi Gerganov 2023-03-25 21:51:41 +02:00
e2d490dafd

Infinite generation via context swapping (#71) Georgi Gerganov 2023-03-25 21:36:22 +02:00
03f7e33560

Cleanup STL headers + fix embedding examples + minor stuff Georgi Gerganov 2023-03-25 20:51:14 +02:00
55ad42af84

Move chat scripts into "./examples" Georgi Gerganov 2023-03-25 20:36:52 +02:00
459e93cce0

Add AVX2 implementation of dequantize_row_q4_1 (#505) slaren 2023-03-25 19:31:48 +01:00
a316a425d0

Overhaul the examples structure Georgi Gerganov 2023-03-25 20:26:40 +02:00
ecbe466a36

Retire the ggml_mul_mat() branch for transposed src0 (#500) Georgi Gerganov 2023-03-25 19:47:21 +02:00
502a400192

Disable prompt verbosity by default and add option to enable (#480) Georgi Gerganov 2023-03-25 17:16:50 +02:00
09aecbf628

Add AVX2 implementation of dequantize_row_q4_0 (#467) slaren 2023-03-25 16:06:49 +01:00
4640eff23d

Don't interfere with BLAS for large prompts by running only 1 thread Georgi Gerganov 2023-03-25 17:03:10 +02:00
ab77d76312

Add longer DAN prompt for testing big batch numbers Georgi Gerganov 2023-03-25 16:47:59 +02:00
29b7baab67

Add timings for the prompt evaluation (#478) slaren 2023-03-25 15:34:23 +01:00
4a7129acd2

Remove obsolete information from README Georgi Gerganov 2023-03-25 16:30:32 +02:00
6b6dbc8910

Remove obsolete assert and fix compiler warning Georgi Gerganov 2023-03-25 16:22:05 +02:00
2a2e63ce05

Fix nasty bug in ggml_compute_forward_mul_mat_f32() and reenable BLAS Georgi Gerganov 2023-03-25 16:09:54 +02:00
e899bf54b2

bounds checking for input prefix (#492) anzz1 2023-03-25 14:42:09 +02:00
fbd4d38c64

feat: '--in-prefix STRING' option (#426) anzz1 2023-03-25 14:03:19 +02:00
58e6c9f36f

Add support for file load progress reporting callbacks (#434) Jed Fox 2023-03-25 01:26:28 -04:00
36d07532ef

Add missing struct annotation (#483) Doomsdayrs 2023-03-25 01:21:24 -04:00
6f1ee4b640

Fix crash for 65B model with pre-allocated memory (#485) Chris Kuehl 2023-03-24 23:38:14 -05:00
8520fc310e

Disable BLAS altogether - the bug is not just for quantized mat mul Georgi Gerganov 2023-03-24 23:47:06 +02:00
b3f460e941

Disable BLAS branch in mul_mat - seems there is a bug Georgi Gerganov 2023-03-24 23:39:17 +02:00
04c6f5ed6f

Immediately start processing the prompt before user input has been provided (#476) Georgi Gerganov 2023-03-24 23:17:58 +02:00
7a9b6c3a8b

Reduce memory usage and allocate enough memory for largest context (#473) Georgi Gerganov 2023-03-24 23:17:37 +02:00
31572d9665

Temporary bump the memory buffer size - hopefully fix issues from 483bab2e Georgi Gerganov 2023-03-24 18:23:56 +02:00
f4f5362edb

Update README.md (#444) Gary Mulder 2023-03-24 15:23:09 +00:00
863f65e2e3

fix instruct mode (#445) rabidcopy 2023-03-24 10:22:39 -05:00
afd220d9c6

Properly free llama_context on failure Georgi Gerganov 2023-03-24 17:21:01 +02:00
481044d50c

additional optimizations for POWER9 (#454) Cameron Kaiser 2023-03-24 08:19:26 -07:00
563cdc391d

Support calling mlock() on loaded model data on Linux and macOS (#453) comex 2023-03-24 08:19:05 -07:00
8d4a855c24

Add embedding mode with arg flag. Currently working (#282) Luciano 2023-03-24 08:05:13 -07:00
b6b268d441

Add link to Roadmap discussion Georgi Gerganov 2023-03-24 09:13:35 +02:00
3cd8dde0d1

Revert "Fix memory allocation issues and seg faults" Georgi Gerganov 2023-03-24 06:22:28 +02:00
4870e455b3

Fix memory allocation issues and seg faults Georgi Gerganov 2023-03-24 00:11:53 +02:00
483bab2e3d

Avoid the transposed X branch in the Z = X * Y matrix multiplication (#439) Georgi Gerganov 2023-03-23 23:22:01 +02:00
404e1da38e

Fix quantize script not finding models in parent directory (#428) Jed Fox 2023-03-23 16:42:52 -04:00
4cc053b6d5

Remove obsolete command from Docker script Georgi Gerganov 2023-03-23 22:39:44 +02:00
0ba5a3a9a5

Obsolete Georgi Gerganov 2023-03-23 22:32:02 +02:00
2e17dfd80a

Replace EOS with newline to prevent context/memory being flushed by EOS in interactive mode (#333) rabidcopy 2023-03-23 15:22:47 -05:00
20a1a4e09c

Fix GPTQ converter (#423) Timmy Knight 2023-03-23 10:18:13 -10:00
ad072fc5ad

Generate library with CMake (#430) nusu-github 2023-03-24 05:16:48 +09:00
ea10d3ded2

Command line args bounds checking (#424) anzz1 2023-03-23 19:54:28 +02:00
a18c19259a

Fix Nix build Ben Siraphob 2023-03-22 00:37:02 -05:00
a50e39c6fe

Revert "Delete SHA256SUMS for now" (#429) Stephan Walter 2023-03-23 14:15:48 +00:00
a140219e81

Fix Makefile echo escape codes (by removing them). (#418) Kerfuffle 2023-03-23 05:41:32 -06:00
8a3e5ef801

Move model section from issue template to README.md (#421) Gary Mulder 2023-03-23 11:30:40 +00:00
8eea5ae0e5

Delete SHA256SUMS for now (#416) anzz1 2023-03-23 12:26:19 +02:00
93208cfb92

Adjust repetition penalty .. Georgi Gerganov 2023-03-23 10:46:58 +02:00
03ace14cfd

Add link to recent podcast about whisper.cpp and llama.cpp Georgi Gerganov 2023-03-23 09:48:51 +02:00
e4412b45e3

CI: CMake: Separate build and test steps (#376) anzz1 2023-03-23 04:20:34 +02:00
f7dc43bc0d

Fix instruct mode broken by PR #354 (#409) tjohnman 2023-03-23 01:30:23 +01:00
ee8a788786

Update issue template so people will use it (#404) Gary Mulder 2023-03-22 19:06:18 +00:00
69c92298a9

Deduplicate q4 quantization functions (#383) Stephan Walter 2023-03-22 17:29:06 +00:00
97940520e8

fix: add POSIX functionality for Linux compilation (#51) Valentyn Bezshapkin 2023-03-22 18:20:25 +01:00
305ba6f0e6

Don't force immediate interactive without -i (#354) tjohnman 2023-03-22 18:16:35 +01:00
4122dffff9

cmake: make llama an actual library (#392) Erik Scholz 2023-03-22 17:37:10 +01:00
56e659a0b2

fix perplexity after c-api refactor (#390) Erik Scholz 2023-03-22 17:09:38 +01:00
40ea807a97

Add details on perplexity to README.md (#395) Gary Linscott 2023-03-22 08:53:54 -07:00
d5850c53ca

Add missing header for memcpy (#386) Yusuf Kağan Hanoğlu 2023-03-22 11:55:45 +03:00
ae44e23ee3

When seed <= 0 - use the clock to generate one Georgi Gerganov 2023-03-22 07:47:15 +02:00
928480ef5b

Init llama_context_params properly from CLI (#370) Georgi Gerganov 2023-03-22 07:45:00 +02:00
56817b1f88

Remove temporary notice and update hot topics Georgi Gerganov 2023-03-22 07:34:02 +02:00
f5a77a629b

Introduce C-style API (#370) Georgi Gerganov 2023-03-22 07:32:36 +02:00
da0e9fe90c

Add SHA256SUMS file and instructions to README how to obtain and verify the downloads Gary Mulder 2023-03-20 20:14:06 +00:00