Commit graph

  • c6f1491da0
    metal : fix bug in soft_max kernels (out-of-bounds access) (#3194) Georgi Gerganov 2023-09-15 20:17:24 +03:00
  • e3d87a6c36
    convert : make ftype optional in simple scripts (#3185) Cebtenzzre 2023-09-15 12:29:02 -04:00
  • 8c00b7a6ff
    sync : ggml (Metal F32 support + reduce ggml-alloc size) (#3192) Georgi Gerganov 2023-09-15 19:06:03 +03:00
  • 7e50d34be6
    cmake : fix building shared libs for clang (rocm) on windows (#3176) Engininja2 2023-09-15 06:24:30 -06:00
  • 235f7c193b
    flake : use pkg-config instead of pkgconfig (#3188) Evgeny Kurnevsky 2023-09-15 10:10:22 +02:00
  • a51b687657
    metal : relax conditions on fast matrix multiplication kernel (#3168) Georgi Gerganov 2023-09-15 11:09:24 +03:00
  • 76164fe2e6
    cmake : fix llama.h location when built outside of root directory (#3179) Andrei 2023-09-15 04:07:40 -04:00
  • c2ab6fe661
    ci : Cloud-V for RISC-V builds (#3160) Ali Tariq 2023-09-15 13:06:56 +05:00
  • 2d770505a8
    llama : remove mtest (#3177) Roland 2023-09-15 03:28:45 -04:00
  • 98311c4277
    llama : make quantize example up to 2.7x faster (#3115) Cebtenzzre 2023-09-14 21:09:53 -04:00
  • feea179e9f
    flake : allow $out/include to already exist (#3175) jneem 2023-09-14 13:54:47 -05:00
  • 769266a543
    cmake : compile ggml-rocm with -fpic when building shared library (#3158) Andrei 2023-09-14 13:38:16 -04:00
  • cf8238e7f4
    flake : include llama.h in nix output (#3159) Asbjørn Olling 2023-09-14 19:25:00 +02:00
  • 4b8560e72a
    make : fix clang++ detection, move some definitions to CPPFLAGS (#3155) Cebtenzzre 2023-09-14 13:22:47 -04:00
  • 83a53b753a
    CI: add FreeBSD & simplify CUDA windows (#3053) Alon 2023-09-14 20:21:25 +03:00
  • 5c872dbca2
    falcon : use stated vocab size (#2914) akawrykow 2023-09-14 10:19:42 -07:00
  • 990a5e226a
    cmake : add relocatable Llama package (#2960) bandoti 2023-09-14 14:04:40 -03:00
  • 980ab41afb
    docker : add gpu image CI builds (#3103) dylan 2023-09-14 09:47:00 -07:00
  • e394084166
    gguf-py : support identity operation in TensorNameMap (#3095) Kerfuffle 2023-09-14 10:32:26 -06:00
  • 4c8643dd6e
    feature : support Baichuan serial models (#3009) jameswu2014 2023-09-15 00:32:10 +08:00
  • 35f73049af
    speculative : add heuristic algorithm (#3006) Leng Yue 2023-09-14 09:14:44 -07:00
  • 71ca2fad7d
    whisper : tokenizer fix + re-enable tokenizer test for LLaMa (#3096) goerch 2023-09-13 15:19:44 +02:00
  • 1b6c650d16
    cmake : add a compiler flag check for FP16 format (#3086) Tristan Ross 2023-09-13 06:08:52 -07:00
  • 0a5eebb45d
    CUDA: mul_mat_q RDNA2 tunings (#2910) Johannes Gäßler 2023-09-13 11:20:24 +02:00
  • 84e723653c
    speculative: add --n-gpu-layers-draft option (#3063) FK 2023-09-13 08:50:46 +02:00
  • b52b29ab9d
    arm64 support for windows (#3007) Eric Sommerlade 2023-09-13 02:54:20 +01:00
  • 4f7cd6ba9c
    CUDA: fix LoRAs (#3130) Johannes Gäßler 2023-09-13 00:15:33 +02:00
  • 89e89599fd
    CUDA: fix mul_mat_q not used for output tensor (#3127) Johannes Gäßler 2023-09-11 22:58:41 +02:00
  • d54a4027a6
    CUDA: lower GPU latency + fix Windows performance (#3110) Johannes Gäßler 2023-09-11 19:55:51 +02:00
  • 1b0d09259e
    cmake : support build for iOS/tvOS (#3116) Jhen-Jie Hong 2023-09-11 19:49:06 +08:00
  • 8a4ca9af56
    CUDA: add device number to error messages (#3112) Johannes Gäßler 2023-09-11 13:00:24 +02:00
  • f31b6f4e2d
    metal : PP speedup (#3084) Kawrakow 2023-09-11 09:30:11 +02:00
  • 6eeb4d9083
    convert: remove most of the n_mult usage in convert.py (#3098) Erik Scholz 2023-09-10 17:06:53 +02:00
  • 21ac3a1503
    metal : support for Swift (#3078) kchro3 2023-09-09 02:12:10 -07:00
  • 4fd5477955
    metal : support build for iOS/tvOS (#3089) Jhen-Jie Hong 2023-09-09 16:46:04 +08:00
  • ec2a24fedf
    flake : add train-text-from-scratch to flake.nix (#3042) takov751 2023-09-08 17:06:26 +01:00
  • 7d99aca759
    readme : fix typo (#3043) Ikko Eltociear Ashimine 2023-09-09 01:04:32 +09:00
  • ba7ffbb251
    metal : Q3_K speedup (#2995) Kawrakow 2023-09-08 18:01:04 +02:00
  • e64f5b5578
    examples : make n_ctx warning work again (#3066) Cebtenzzre 2023-09-08 11:43:35 -04:00
  • 94f10b91ed
    readme : update hot topics Georgi Gerganov 2023-09-08 18:18:04 +03:00
  • b3e9852e47
    sync : ggml (CUDA GLM RoPE + POSIX) (#3082) Georgi Gerganov 2023-09-08 17:58:07 +03:00
  • cb6c44c5e0
    build : do not use _GNU_SOURCE gratuitously (#2035) Przemysław Pawełczyk 2023-09-08 14:09:21 +02:00
  • a21baeb122
    docker : add git to full-cuda.Dockerfile main-cuda.Dockerfile (#3044) hongbo.mo 2023-09-08 18:57:55 +08:00
  • 6ff712a6d1
    Update deprecated GGML TheBloke links to GGUF (#3079) Yui 2023-09-08 12:32:55 +02:00
  • ebc96086af
    ggml-alloc : correctly check mmap return value for errors (#3075) slaren 2023-09-08 04:04:56 +02:00
  • 7f412dab9c
    enable CPU HBM (#2603) Kunshang Ji 2023-09-08 09:46:56 +08:00
  • 6336d834ec
    convert : fix F32 ftype not being saved (#3048) Cebtenzzre 2023-09-07 14:27:42 -04:00
  • 00d62adb79
    fix some warnings from gcc and clang-tidy (#3038) Cebtenzzre 2023-09-07 13:22:29 -04:00
  • 4fa2cc1750
    make : improve test target (#3031) Cebtenzzre 2023-09-07 10:15:01 -04:00
  • 5ffab089a5
    make : fix CPPFLAGS (#3035) Cebtenzzre 2023-09-07 10:13:50 -04:00
  • 15b67a66c2
    llama-bench : use two tokens in the warmup run for prompt evals (#3059) slaren 2023-09-07 15:52:34 +02:00
  • be8c9c245b
    metal : parallel RoPE on Metal (#3024) Kawrakow 2023-09-07 15:45:01 +02:00
  • be6beeb8d7
    metal : correct fix of kernel_norm (#3060) Kawrakow 2023-09-07 15:42:42 +02:00
  • c4f496648c
    metal : fix kernel_norm (fixes Falcon on Metal) (#3057) Georgi Gerganov 2023-09-07 15:49:09 +03:00
  • fec2fb19e4
    ggml : posixify madvise and pagesize (#3037) Przemysław Pawełczyk 2023-09-07 10:15:06 +02:00
  • 178b1850eb
    k-quants : fix zero-weight guard in Q6_K (ref #3040) Georgi Gerganov 2023-09-06 12:40:57 +03:00
  • ea2c85d5d2
    convert-llama-ggml-to-gguf: Try to handle files older than GGJTv3 (#3023) Kerfuffle 2023-09-06 02:49:11 -06:00
  • 9912b9efc8
    build : add LLAMA_METAL_NDEBUG flag (#3033) Cebtenzzre 2023-09-05 18:21:10 -04:00
  • 9e2023156e
    make : use new flag variables for recent changes (#3019) Cebtenzzre 2023-09-05 15:12:00 -04:00
  • de2fe892af
    examples : replace fprintf to stdout with printf (#3017) Cebtenzzre 2023-09-05 15:10:27 -04:00
  • c9c3220c48
    convert: fix convert.py not working with int filename_stem (#3028) Erik Scholz 2023-09-05 19:41:00 +02:00
  • d59bd97065
    Guard against all weights in a super-block being zero (#3010) Kawrakow 2023-09-05 09:55:33 +02:00
  • 35938ee3b0
    llama : update logic for number of threads when using BLAS Georgi Gerganov 2023-09-05 10:46:39 +03:00
  • 921772104b
    speculative : add grammar support (#2991) Georgi Gerganov 2023-09-05 08:46:17 +03:00
  • 2ba85c8609
    py : minor Georgi Gerganov 2023-09-04 22:50:50 +03:00
  • e36ecdccc8
    build : on Mac OS enable Metal by default (#2901) Georgi Gerganov 2023-09-04 22:26:24 +03:00
  • bd33e5ab92
    ggml-opencl : store GPU buffer in ggml_tensor::extra (#2994) slaren 2023-09-04 14:59:52 +02:00
  • 3103568144
    llama-bench : make cpp file non-executable (#2999) Cebtenzzre 2023-09-04 06:40:18 -04:00
  • 5b8530d88c
    make : add speculative example (#3003) Leng Yue 2023-09-04 03:39:57 -07:00
  • e4386f417f
    server : add a subtle loading animation to the edit box (#2466) Aarni Koskela 2023-09-04 10:28:55 +02:00
  • 35195689cd
    2x faster (rms) norm cuda kernels (3.7% e2e improvement) (#2985) Jiahao Li 2023-09-04 14:53:30 +08:00
  • cf9b08485c
    ggml-alloc : use virtual memory for measurement (#2973) slaren 2023-09-03 20:34:09 +02:00
  • 47068e5170
    speculative : PoC for speeding-up inference via speculative sampling (#2926) Georgi Gerganov 2023-09-03 15:12:08 +03:00
  • 8f429fa511
    perplexity : fix ETA by warming up the model with an empty run Georgi Gerganov 2023-09-03 13:42:56 +03:00
  • 6519e9c99c
    gguf(python): Fix special vocab handling when id < 0 (#2984) Kerfuffle 2023-09-03 04:38:43 -06:00
  • b7f2aa9e51
    metal : restore 363f0bf and fix reduce in F16_F32 kernels (#2986) Georgi Gerganov 2023-09-03 13:23:33 +03:00
  • 73a12a6344
    cov : disable comment in PRs (#2989) Alon 2023-09-03 13:19:01 +03:00
  • 3730134776
    llama : fix bpe tokenize from byte (#2889) opparco 2023-09-03 19:18:09 +09:00
  • d9151e6f57
    metal : revert 6af0bab until we fix it Georgi Gerganov 2023-09-03 12:40:56 +03:00
  • afc43d5f82
    cov : add Code Coverage and codecov.io integration (#2928) Alon 2023-09-03 11:48:49 +03:00
  • 6460f758db
    opencl : fix a bug in ggml_cl_pool_malloc() for ggml_cl_mul_mat_f32() (#2955) Wentai Zhang 2023-09-03 16:46:44 +08:00
  • ca82cf7bac
    metal : more optimizations (#2959) Kawrakow 2023-09-03 11:06:22 +03:00
  • 6a31a3bd98
    swift : add support for k-quants (#2983) kchro3 2023-09-02 23:21:05 -07:00
  • cff7b0bf07
    convert.py : BPE fixes (#2938) Kerfuffle 2023-09-02 23:52:13 -06:00
  • 340af42f09
    docs : add catai to README.md (#2967) Ido S 2023-09-03 08:50:51 +03:00
  • c42f0ec6b3
    examples : fix gpt-neox (#2943) momonga 2023-09-03 14:36:28 +09:00
  • 2753415afd
    swift : add missing c file to Package.swift (#2978) kchro3 2023-09-02 22:27:25 -07:00
  • bc054af97a
    make : support overriding CFLAGS/CXXFLAGS/CPPFLAGS/LDFLAGS (#2886) Cebtenzzre 2023-09-03 01:26:59 -04:00
  • 3358c381f6
    logging: Fix creating empty file even when disabled (#2966) Kerfuffle 2023-09-02 11:53:55 -06:00
  • 52315a4216
    readme : update clblast instructions (#2903) bandoti 2023-09-02 09:53:18 -03:00
  • 8b56b4f2c3
    metal : show all Metal device instances in the system (#2952) Karsten Weiss 2023-09-02 14:29:09 +02:00
  • 21f3d1be86
    k-quants : fix build on armv7 (android only) (#2920) Jhen-Jie Hong 2023-09-02 20:23:45 +08:00
  • 571083f508
    server : avoid antiprompt in probabilities of final response (#2849) Jhen-Jie Hong 2023-09-02 08:31:46 +08:00
  • f04d002844
    cuda : vsubss4 for older versions of ROCm/clang (#2942) Engininja2 2023-09-01 15:33:19 -06:00
  • 69fdbb9abc
    readme : quick start command fix (#2908) ZHAOKAI WANG 2023-09-01 22:06:44 +08:00
  • 5d6f19f16b
    Allow quantize to only copy tensors, some other improvements (#2931) Kerfuffle 2023-09-01 08:02:48 -06:00
  • 0d58936686
    llama2c : rename function Georgi Gerganov 2023-09-01 17:00:40 +03:00
  • 6c9c23429b
    make : use unaligned vector moves on MinGW (#2945) Cebtenzzre 2023-09-01 09:53:14 -04:00
  • ee8654bcd0
    minor : add const qualifiers (#2853) m3ndax 2023-09-01 15:47:27 +02:00
  • 49bb9cbe0f
    docs : add java-llama.cpp to README.md (#2935) Konstantin Herud 2023-09-01 15:36:14 +02:00