Commit graph

  • a6803cab94
    flake : add runHook preInstall/postInstall to installPhase so hooks function (#2224) Dave Della Costa 2023-07-14 15:13:38 -04:00
  • 7dabc66f3c
    make : use pkg-config for OpenBLAS (#2222) wzy 2023-07-15 03:05:08 +08:00
  • 7cdd30bf1f
    cuda : allocate all temporary ggml_tensor_extra_gpu from a fixed-size buffer (#2220) Bach Le 2023-07-15 03:00:58 +08:00
  • e8035f141e
    ggml : fix static_assert with older compilers #2024 (#2218) Evan Miller 2023-07-14 14:55:56 -04:00
  • 7513b7b0a1
    llama : add functions that work directly on model (#2197) Bach Le 2023-07-15 02:55:24 +08:00
  • de8342423d
    build.zig : install config header (#2216) Ali Chraghi 2023-07-14 11:50:58 -07:00
  • c48c525f87
    examples : fixed path typos in embd-input (#2214) Shangning Xu 2023-07-15 02:40:05 +08:00
  • 206e01de11
    cuda : support broadcast add & mul (#2192) Jiahao Li 2023-07-15 02:38:24 +08:00
  • 4304bd3cde
    CUDA: mul_mat_vec_q kernels for k-quants (#2203) Johannes Gäßler 2023-07-14 19:44:08 +02:00
  • 229aab351c
    make : fix combination of LLAMA_METAL and LLAMA_MPI (#2208) James Reynolds 2023-07-14 11:34:40 -06:00
  • 697966680b
    ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope) Georgi Gerganov 2023-07-14 16:36:41 +03:00
  • 27ad57a69b
    Metal: faster Q4_0 and Q4_1 matrix x vector kernels (#2212) Kawrakow 2023-07-14 12:46:21 +03:00
  • 32c5411631
    Revert "Support using mmap when applying LoRA (#2095)" (#2206) Howard Su 2023-07-13 21:58:25 +08:00
  • ff5d58faec
    Fix compile error on Windows CUDA (#2207) Howard Su 2023-07-13 21:58:09 +08:00
  • b782422a3e
    devops : add missing quotes to bash script (#2193) Bodo Graumann 2023-07-13 15:49:14 +02:00
  • 1cbf561466
    metal : new q4_0 matrix-vector kernel (#2188) Shouzheng Liu 2023-07-12 16:10:55 -04:00
  • 975221e954
    ggml : broadcast mul_mat + conv batch support (#2199) Georgi Gerganov 2023-07-12 20:51:29 +03:00
  • 4523d10d0c
    ggml : add ggml_pool_1d and ggml_pool_2d Georgi Gerganov 2023-07-12 20:27:03 +03:00
  • 680e6f9177
    cuda : add gelu support Georgi Gerganov 2023-07-12 20:26:18 +03:00
  • 4e7464ef88
    FP16 is supported in CM=6.0 (#2177) Howard Su 2023-07-12 20:18:40 +08:00
  • 2b5eb72e10
    Fixed __dp4a compute capability: 6.0 -> 6.1 (#2189) Johannes Gäßler 2023-07-12 10:38:52 +02:00
  • f7d278faf3
    ggml : revert CUDA broadcast changes from #2183 (#2191) Georgi Gerganov 2023-07-12 10:54:19 +03:00
  • 20d7740a9b
    ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183) Georgi Gerganov 2023-07-11 22:53:34 +03:00
  • 5bf2a27718
    ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178) Spencer Sutton 2023-07-11 12:31:10 -04:00
  • c9c74b4e3f
    llama : add classifier-free guidance (#2135) Bach Le 2023-07-12 00:18:43 +08:00
  • 3ec7e596b2
    docker : add '--server' option (#2174) Jinwoo Jeong 2023-07-12 01:12:35 +09:00
  • 917831c63a
    readme : fix zig build instructions (#2171) Chad Brewbaker 2023-07-11 11:03:06 -05:00
  • 2347463201
    Support using mmap when applying LoRA (#2095) Howard Su 2023-07-11 22:37:01 +08:00
  • bbef28218f
    Possible solution to allow K-quants on models with n_vocab!=32000 (#2148) LostRuins 2023-07-11 22:01:08 +08:00
  • 5656d10599
    mpi : add support for distributed inference via MPI (#2099) Evan Miller 2023-07-10 11:49:56 -04:00
  • 1d16309969
    llama : remove "first token must be BOS" restriction (#2153) oobabooga 2023-07-09 05:59:53 -03:00
  • db4047ad5c
    main : escape prompt prefix/suffix (#2151) Nigel Bosch 2023-07-09 03:56:18 -05:00
  • 18780e0a5e
    readme : update Termux instructions (#2147) JackJollimore 2023-07-09 05:20:43 -03:00
  • 3bbc1a11f0
    ggml : fix building with Intel MKL but ask for "cblas.h" issue (#2104) (#2115) clyang 2023-07-09 16:12:20 +08:00
  • 2492a53fd0
    readme : add more docs indexes (#2127) rankaiyx 2023-07-09 15:38:42 +08:00
  • 64639555ff
    Fixed OpenLLaMA 3b CUDA mul_mat_vec_q (#2144) Johannes Gäßler 2023-07-08 20:01:44 +02:00
  • 061f5f8d21
    CUDA: add __restrict__ to mul mat vec kernels (#2140) Johannes Gäßler 2023-07-08 00:25:15 +02:00
  • 84525e7962
    docker : add support for CUDA in docker (#1461) dylan 2023-07-07 11:25:25 -07:00
  • a7e20edf22
    ci : switch threads to 1 (#2138) Georgi Gerganov 2023-07-07 21:23:57 +03:00
  • 1d656d6360
    ggml : change ggml_graph_compute() API to not require context (#1999) Qingyou Meng 2023-07-08 00:24:01 +08:00
  • 7242140283
    ggml : remove sched_yield() call in ggml_graph_compute_thread() (#2134) Georgi Gerganov 2023-07-07 18:36:37 +03:00
  • 3e08ae99ce
    convert.py: add mapping for safetensors bf16 (#1598) Aarni Koskela 2023-07-07 16:12:49 +03:00
  • 481f793acc
    Fix opencl by wrap #if-else-endif with \n (#2086) Howard Su 2023-07-07 11:34:18 +08:00
  • dfd9fce6d6
    ggml : fix restrict usage Georgi Gerganov 2023-07-06 19:41:31 +03:00
  • 36680f6e40
    convert : update for baichuan (#2081) Judd 2023-07-07 00:23:49 +08:00
  • a17a2683d8
    alpaca.sh : update model file name (#2074) tslmy 2023-07-06 09:17:50 -07:00
  • 31cfbb1013
    Expose generation timings from server & update completions.js (#2116) Tobias Lütke 2023-07-05 16:51:13 -04:00
  • 983b555e9d
    Update Server Instructions (#2113) Jesse Jojo Johnson 2023-07-05 18:03:19 +00:00
  • ec326d350c
    ggml : fix bug introduced in #1237 Georgi Gerganov 2023-07-05 20:44:11 +03:00
  • 1b6efeab82
    tests : fix test-grad0 Georgi Gerganov 2023-07-05 20:20:05 +03:00
  • 1b107b8550
    ggml : generalize quantize_fns for simpler FP16 handling (#1237) Stephan Walter 2023-07-05 16:13:06 +00:00
  • 8567c76b53
    Update server instructions for web front end (#2103) Jesse Jojo Johnson 2023-07-05 15:13:35 +00:00
  • 924dd22fd3
    Quantized dot products for CUDA mul mat vec (#2067) Johannes Gäßler 2023-07-05 14:19:42 +02:00
  • 051c70dcd5
    llama: Don't double count the sampling time (#2107) Howard Su 2023-07-05 18:31:23 +08:00
  • 9e4475f5cf
    Fixed OpenCL offloading prints (#2082) Johannes Gäßler 2023-07-05 08:58:05 +02:00
  • 7f0e9a775e
    embd-input: Fix input embedding example unsigned int seed (#2105) Nigel Bosch 2023-07-04 18:33:33 -05:00
  • b472f3fca5
    readme : add link web chat PR Georgi Gerganov 2023-07-04 22:25:22 +03:00
  • ed9a54e512
    ggml : sync latest (new ops, macros, refactoring) (#2106) Georgi Gerganov 2023-07-04 21:54:11 +03:00
  • f257fd2550
    Add an API example using server.cpp similar to OAI. (#2009) jwj7140 2023-07-05 03:06:12 +09:00
  • 7ee76e45af
    Simple webchat for server (#1998) Tobias Lütke 2023-07-04 10:05:27 -04:00
  • acc111caf9
    Allow old Make to build server. (#2098) Henri Vasserman 2023-07-04 15:38:04 +03:00
  • 23c7c6fc91
    Update Makefile: clean simple (#2097) ZhouYuChen 2023-07-04 20:15:16 +08:00
  • 698efad5fb
    CI: make the brew update temporarily optional. (#2092) Erik Scholz 2023-07-04 01:50:12 +02:00
  • 14a2cc71f6
    [ggml] fix index for ne03 value in ggml_cl_mul_f32 (#2088) Govlzkoy 2023-07-04 07:50:00 +08:00
  • 1cf14ccef1
    fix server crashes (#2076) Henri Vasserman 2023-07-04 00:05:23 +03:00
  • cc45a7feb8
    Fix crash of test-tokenizer-0 under Debug build (#2064) Howard Su 2023-07-04 02:43:55 +08:00
  • 55dbb915cc
    [llama] No need to check file version when loading vocab score (#2079) Howard Su 2023-07-03 19:58:58 +08:00
  • d7d2e6a0f0
    server: add option to output probabilities for completion (#1962) WangHaoranRobin 2023-07-03 05:38:44 +08:00
  • 46088f7231
    ggml : fix build with OpenBLAS (close #2066) Georgi Gerganov 2023-07-02 09:46:46 +03:00
  • 0bc2cdfc87
    Better CUDA synchronization logic (#2057) Johannes Gäßler 2023-07-01 21:49:44 +02:00
  • befb3a3562
    Test-based VRAM scratch size + context adjustment (#2056) Johannes Gäßler 2023-07-01 21:47:26 +02:00
  • b213227067
    cmake : don't force -mcpu=native on aarch64 (#2063) Daniel Drake 2023-07-01 20:31:44 +02:00
  • 2f8cd979ec
    metal : release buffers when freeing metal context (#2062) Aaron Miller 2023-07-01 11:14:59 -07:00
  • 471aab6e4c
    convert : add support of baichuan-7b (#2055) Judd 2023-07-02 01:00:25 +08:00
  • 463f2f4c4f
    llama : fix return value of llama_load_session_file_internal (#2022) Georgi Gerganov 2023-07-01 19:05:09 +03:00
  • cb44dbc7de
    llama : catch llama_load_session_file_internal exceptions (#2022) Rand Xie 2023-07-02 00:02:58 +08:00
  • 79f634a19d
    embd-input : fix returning ptr to temporary Georgi Gerganov 2023-07-01 18:46:00 +03:00
  • 04606a1599
    train : fix compile warning Georgi Gerganov 2023-07-01 18:45:44 +03:00
  • b1ca8f36a9
    ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995) Qingyou Meng 2023-07-01 23:42:43 +08:00
  • b8c8dda75f
    Use unsigned for random seed (#2006) Howard Su 2023-06-29 21:15:15 +08:00
  • 96a712ca1b
    Porting the improved K-Quant CUDA kernels to OpenCL (#1966) LostRuins 2023-06-29 11:56:43 +08:00
  • d3494bb86b
    llama : replacing auto &kv with const auto &kv (#2041) m3ndax 2023-06-28 20:39:08 +02:00
  • 5b351e94d0
    cuda : remove nchannels_x argument from mul_mat_vec_nc_f16_f32 (#2028) Salvador E. Tropea 2023-06-28 14:27:31 -03:00
  • 6432aabb6d
    cuda : fix missing const qualifier in casts (#2027) Salvador E. Tropea 2023-06-28 14:26:26 -03:00
  • b922bc351b
    llama : remove shards weight file support (#2000) Howard Su 2023-06-28 10:13:02 -07:00
  • 7f9753fa12
    CUDA GPU acceleration for LoRAs + f16 models (#1970) Johannes Gäßler 2023-06-28 18:35:54 +02:00
  • cfa0750bc9
    llama : support input embeddings directly (#1910) ningshanwutuobang 2023-06-28 23:53:37 +08:00
  • 9d23589d63
    fix pthreads setaffinity usage on android (#2020) Erik Scholz 2023-06-27 19:06:33 +02:00
  • 0be54f75a6
    baby-llama : fix build after ggml_rope change (#2016) Howard Su 2023-06-27 13:07:13 +08:00
  • 181e8d9755
    llama : fix rope usage after ChatGLM change Georgi Gerganov 2023-06-27 00:37:13 +03:00
  • d9779021bd
    ggml : add support for ChatGLM RoPE Georgi Gerganov 2023-06-27 00:06:51 +03:00
  • d38e451578
    readme : add Scala 3 bindings repo (#2010) Roman Parykin 2023-06-26 22:47:59 +03:00
  • eaa6ca5a61
    ggml : increase max tensor name + clean up compiler warnings in train-text (#1988) David Yang 2023-06-27 03:45:32 +08:00
  • aa777abbb7
    readme : LD_LIBRARY_PATH complement for some Android devices when building with CLBlast inside Termux (#2007) Gustavo Rocha Dias 2023-06-26 16:34:45 -03:00
  • c824d2e368
    ggml : avoid conv 2d kernel round up Georgi Gerganov 2023-06-26 21:03:59 +03:00
  • b853d45601
    ggml : add NUMA support (#1556) zrm 2023-06-26 13:57:59 -04:00
  • 9225baef71
    k-quants : fix indentation Georgi Gerganov 2023-06-26 20:10:52 +03:00
  • a84ab1da8d
    tests : fix quantize perf (#1990) katsu560 2023-06-27 01:47:02 +09:00
  • 5743ca8092
    k-quants : add AVX support to dot functions (#1916) katsu560 2023-06-27 01:46:07 +09:00
  • 412c60e473
    readme : add link to new k-quants for visibility Georgi Gerganov 2023-06-26 19:45:09 +03:00