Commit graph

  • 6769e944c7
    k-quants : support for super-block size of 64 (#2001) Kawrakow 2023-06-26 19:43:07 +03:00
  • cbebf61ca7
    Fix assert when freeing an invalid CUDA pointer (#2005) Howard Su 2023-06-26 23:15:47 +08:00
  • 447ccbe8c3
    readme : add new roadmap + manifesto Georgi Gerganov 2023-06-25 16:08:12 +03:00
  • bd34cdde38
    ggml : sync latest ggml (custom operators) Georgi Gerganov 2023-06-25 14:25:08 +03:00
  • c2a08f87b8
    fix server sampling: apply top-k sampler first (#1977) anon998 2023-06-25 08:48:36 +00:00
  • 66a2555ba6
    readme : add Azure CI discussion link Georgi Gerganov 2023-06-25 09:07:03 +03:00
  • e65ca7e14a
    zig : upgrade build system support (#1981) sjinzh 2023-06-25 13:45:44 +08:00
  • 5ec8dd5a3c
    #1869 Fix null reference errors when training from scratch with CUDA (#1907) Robyn 2023-06-25 04:10:29 +10:00
  • 65bdd52a86
    tests : sync test-grad0 from ggml Georgi Gerganov 2023-06-24 19:40:18 +03:00
  • fdd1860911
    flake : fix ggml-metal.metal path and run nixfmt (#1974) Rowan Hart 2023-06-24 04:07:08 -07:00
  • c943d823c1
    convert : fix invalid params in write_vocab_only (#1975) AN Long 2023-06-24 19:02:06 +08:00
  • f2c754e1c3
    ggml : improve ggml_graph_dump_dot, add ggml_format_name (#1978) slaren 2023-06-24 12:57:18 +02:00
  • 11da1a85cd
    readme : fix whitespaces Georgi Gerganov 2023-06-24 13:38:18 +03:00
  • 235b610d65
    readme : fixed termux instructions (#1973) Alberto 2023-06-24 12:32:13 +02:00
  • b061ba9e2a
    llama : fix top-p sampling to match the canonical definition (#1953) Alex Renda 2023-06-24 03:15:01 -07:00
  • 527b6fba1d
    llama : make model stateless and context stateful (llama_state) (#1797) Didzis Gosko 2023-06-24 11:47:58 +03:00
  • d7b7484f74
    Add OpenLLaMA instructions to the README (#1954) eiery 2023-06-23 04:38:01 -04:00
  • 7487137227
    rework convert.py to read hyper-parameters from config.json (#1958) Erik Scholz 2023-06-22 14:20:47 +02:00
  • bbca06e269
    cmake: revert CUDA arch default to 52, 61 if f16 (#1959) Johannes Gäßler 2023-06-21 23:49:25 +02:00
  • fb98254f99
    Fix typo in README.md (#1961) Rahul Vivek Nair 2023-06-22 03:18:43 +05:30
  • 049aa16b8c
    readme : add link to p1 Georgi Gerganov 2023-06-20 19:05:54 +03:00
  • 2322ec223a
    Fix typo (#1949) Xiake Sun 2023-06-20 05:42:40 -07:00
  • aacdbd4056
    llama : fix params struct alignment (#1936) Ettore Di Giacinto 2023-06-20 03:24:39 +02:00
  • 20568fe60f
    [Fix] Reenable server embedding endpoint (#1937) Henri Vasserman 2023-06-20 01:12:39 +03:00
  • 18b35625c3
    ggml : fix bug in LBFGS optimizer (found by ggml tests) Georgi Gerganov 2023-06-19 20:43:30 +03:00
  • ba4e85a833
    llama : use aligned memory during ggml_init call from loading saved sessions (#1934) l3utterfly 2023-06-19 23:20:06 +08:00
  • 23fc5c219a
    cmake : fix trailing whitespaces Georgi Gerganov 2023-06-19 18:18:34 +03:00
  • cb40dfca69
    llama : only use Q6_K for output weights if tensor size is multiple of 256 (#1932) Kawrakow 2023-06-19 18:17:03 +03:00
  • ca7c3f4da5
    cuda : faster k-quants on older GPUs (#1930) Kawrakow 2023-06-19 18:14:09 +03:00
  • b97ca431db
    ggml : sync latest ggml repo (#1924) Georgi Gerganov 2023-06-19 18:12:33 +03:00
  • 1e3abfcef0
    cmake : fix build shared ggml when CUDA is enabled (#1929) Howard Su 2023-06-19 23:10:37 +08:00
  • 16b9cd1939
    Convert vector to f16 for dequantize mul mat vec (#1913) Johannes Gäßler 2023-06-19 10:23:56 +02:00
  • b24c3049d9
    Added tokens per second to info prints (#1928) Johannes Gäßler 2023-06-18 17:41:26 +02:00
  • 0ede372a51
    Fixed incorrectly applying RMS norm twice (#1925) Johannes Gäßler 2023-06-18 16:07:09 +02:00
  • 8596af4277
    ggml : fix bug in ggml_compute_forward_add_q_f32 (#1918) l3utterfly 2023-06-18 19:19:16 +08:00
  • e1886cf4fe
    readme : update Android build instructions (#1922) Mike 2023-06-18 16:28:26 +08:00
  • 8ab8ba62eb
    llama : prevent usage of k-quants when tensor size is not a multiple of 256 (#1921) Kawrakow 2023-06-18 11:13:43 +03:00
  • 90cc59d6ab
    examples : fix examples/metal (#1920) Kawrakow 2023-06-18 10:52:10 +03:00
  • ce2c7d72e2
    metal : handle buffers larger than device's maxBufferLength (#1826) Georgi Gerganov 2023-06-18 09:09:47 +03:00
  • 57cd69460f
    cmake : add CUDA_ARCHITECTURES to new target ggml_static (#1917) Howard Su 2023-06-18 12:29:47 +08:00
  • b2416493ab
    make : do not print help for simple example Georgi Gerganov 2023-06-17 20:55:03 +03:00
  • 4f9c43e3bd
    minor : warning fixes Georgi Gerganov 2023-06-17 20:24:11 +03:00
  • 2c9380dd2f
    Only one CUDA stream per device for async compute (#1898) Johannes Gäßler 2023-06-17 19:15:02 +02:00
  • 051e1b0e6a
    llama : fix kv_cache n init (close #1903) Georgi Gerganov 2023-06-17 19:30:22 +03:00
  • 86c7571864
    make : update for latest Arch (#1701) DaniAndTheWeb 2023-06-17 18:17:22 +02:00
  • 3d59ec5935
    ggml : fix warnings under MSVC (#1908) Howard Su 2023-06-17 23:46:15 +08:00
  • 0711a5f6dc
    metal : add norm, cpy f16->f16, alibi kernels (#1823) Aaron Miller 2023-06-17 07:37:49 -07:00
  • fc45a81bc6
    exposed modules so that they can be invoked by nix run github:ggerganov/llama.cpp#server etc (#1863) Faez Shakil 2023-06-17 17:13:05 +05:00
  • 794db3e7b9
    Server Example Refactor and Improvements (#1570) Randall Fitzgerald 2023-06-17 07:53:04 -04:00
  • 5ddf7ea1fb
    hooks : setting up flake8 and pre-commit hooks (#1681) Jiří Podivín 2023-06-17 12:32:48 +02:00
  • bac19927c3
    readme : alternative way to build for Android with CLBlast. (#1828) Gustavo Rocha Dias 2023-06-17 06:01:06 -03:00
  • b4c6f46f17
    Allow cmake to build ggml as a library (#1896) Kerfuffle 2023-06-17 01:49:42 -06:00
  • 92f20d9942
    train : get raw text instead of a page with HTML (#1905) David Yang 2023-06-17 14:51:54 +08:00
  • d411968e99
    opencl : support k-quants (#1836) 0cc4m 2023-06-16 20:59:49 +02:00
  • b41b4cad6f
    examples : add "simple" (#1840) SuperUserNameMan 2023-06-16 20:58:09 +02:00
  • 13fe9d2d84
    cmake : add auto detection of BLAS_INCLUDE_DIRS (#1886) Zenix 2023-06-17 03:53:04 +09:00
  • ac3b886953
    llama : fix embd when offloading non-repeating layers (#1891) Johannes Gäßler 2023-06-16 20:25:51 +02:00
  • 5b9ccaf104
    Fixed possible macro redefinition (#1892) FrankHB 2023-06-17 02:25:01 +08:00
  • 9cbf50c041
    build : fix and ignore MSVC warnings (#1889) Borislav Stanimirov 2023-06-16 21:23:53 +03:00
  • 3d01122610
    CUDA : faster k-quant dot kernels (#1862) Kawrakow 2023-06-16 20:08:44 +03:00
  • 602c748863
    gitignore : add several entries specific to Visual Studio (#1888) Borislav Stanimirov 2023-06-16 09:58:11 +03:00
  • a09f9195be
    Fixed CUDA runtime version check (#1879) Johannes Gäßler 2023-06-15 21:49:08 +02:00
  • bed9275617
    cmake : remove whitespaces Georgi Gerganov 2023-06-15 21:56:50 +03:00
  • c36e81da62
    examples : add chat-vicuna.sh (#1854) yangli2 2023-06-15 11:05:53 -07:00
  • 3559433fec
    cmake : set include path for OpenBLAS (#1830) Igor Okulist 2023-06-15 12:51:26 -05:00
  • 69b34a0e80
    swift : Package compile breaks due to ggml-metal.metal (#1831) Frederik Vogel 2023-06-16 02:47:04 +09:00
  • cf267d1c71
    make : add train-text-from-scratch (#1850) daboe01 2023-06-15 19:42:48 +02:00
  • 9dda13e5e1
    readme : server compile flag (#1874) Srinivas Billa 2023-06-15 18:36:38 +01:00
  • 37e257c48e
    make : clean *.so files (#1857) sandyiscool 2023-06-15 23:06:06 +05:30
  • 64cc19b4fe
    Fix the validation of main device (#1872) Howard Su 2023-06-16 01:29:59 +08:00
  • 4bfcc855ab
    metal : parallel command buffer encoding (#1860) Georgi Gerganov 2023-06-15 20:29:48 +03:00
  • 6b8312e797
    Better error when using both LoRA + GPU layers (#1861) Johannes Gäßler 2023-06-15 19:06:46 +02:00
  • 254a7a7a5f
    CUDA full GPU acceleration, KV cache in VRAM (#1827) Johannes Gäßler 2023-06-14 19:47:19 +02:00
  • 9254920265
    baby-llama : fix operator!= (#1821) 0xspringtime 2023-06-13 15:37:54 -04:00
  • e32089b2c2
    train : improved training-from-scratch example (#1652) xaedes 2023-06-13 21:04:40 +02:00
  • 2347e45e7b
    llama : do a warm-up eval at start for better timings (#1824) Georgi Gerganov 2023-06-13 20:20:07 +03:00
  • 74d4cfa343
    Allow "quantizing" to f16 and f32 (#1787) Kerfuffle 2023-06-13 04:23:23 -06:00
  • 74a6d922f1
    Metal implementation for all k_quants (#1807) Kawrakow 2023-06-12 22:39:21 +03:00
  • e4caa8da59
    ci : run when changing only the CUDA sources (#1800) slaren 2023-06-12 19:12:47 +02:00
  • 58970a4c39
    Leverage mmap for offloading tensors to GPU (#1597) Howard Su 2023-06-12 20:44:16 +08:00
  • 8c0a10e64d
    metal : fix failure to load model (#1817) Kawrakow 2023-06-12 14:31:36 +03:00
  • fa84c4b3e8
    Fix issue where interactive mode crashes when input exceeds ctx size (#1789) Kerfuffle 2023-06-11 08:19:17 -06:00
  • 12b063f0ec
    Fixed WSL CUDA's OOM error (#1594) Kyle Liang 2023-06-11 21:20:52 +08:00
  • 31d2b5f4a4
    Update SHA256SUMS with current hashes for models quantized using q4_0 (#1798) Ryan Landay 2023-06-11 17:38:53 +08:00
  • 4de0334f5c
    cmake : fix Metal build (close #1791) Georgi Gerganov 2023-06-10 22:56:53 +03:00
  • 3f1223155a
    k-quants : GCC12 compilation fix (#1792) Artyom Lebedev 2023-06-10 22:51:36 +03:00
  • 303f5809f1
    metal : fix issue with ggml-metal.metal path. Closes #1769 (#1782) Andrei 2023-06-10 10:47:34 -04:00
  • 059e99066d
    doc : fix wrong path to BLIS.md (#1772) Aisuko 2023-06-11 00:08:11 +10:00
  • 17c10acfb4
    ggml : force no_alloc == false when creating opt tensors (close #1699) Georgi Gerganov 2023-06-10 12:06:45 +03:00
  • e9b66ee982
    metal : add Q4_1 implementation (#1785) Kawrakow 2023-06-10 11:28:11 +03:00
  • 4f0154b0ba
    llama : support requantizing models instead of only allowing quantization from 16/32bit (#1691) Kerfuffle 2023-06-10 01:59:17 -06:00
  • ef3171d162
    ggml : workaround for missing _mm256_setr_m128i in GCC < 8 (#1638) Xingchen Song(宋星辰) 2023-06-10 15:49:40 +08:00
  • 555275a693
    make : add SSSE3 compilation use case (#1659) rankaiyx 2023-06-10 14:41:59 +08:00
  • 98ed165574
    OpenCL: Add memory release (#1741) Robert Sung-wook Shin 2023-06-10 01:24:40 +09:00
  • ae9663f188
    Windows nvcc workaround (#1753) Johannes Gäßler 2023-06-09 13:58:15 +02:00
  • b33dee282f
    metal : fix build "tanhf" -> "tanh" Georgi Gerganov 2023-06-09 11:11:04 +03:00
  • 92f44ff7f7
    metal : add GELU implementation (#1770) AT 2023-06-09 04:00:51 -04:00
  • 245fc3c37d
    metal : faster q4_0 (#1775) Kawrakow 2023-06-09 10:39:59 +03:00
  • 72ff5282bf
    metal : add Q2_K implementation (#1762) Kawrakow 2023-06-08 22:28:21 +03:00
  • 0bf7cf1b29
    Revert "ggml : load data into int8x16x4_t using vld4q_s8 on arm64 (#1738)" Georgi Gerganov 2023-06-08 20:48:14 +03:00