Commit graph

  • d69d777c02
    ggml : quantization refactoring (#3833) Georgi Gerganov 2023-10-29 18:32:28 +02:00
  • ff3bad83e2
    flake : update flake.lock for newer transformers version + provide extra dev shell (#3797) Erik Scholz 2023-10-28 16:41:07 +02:00
  • 82a6646e02
    metal : try cwd for ggml-metal.metal if bundle lookup fails (#3793) Aarni Koskela 2023-10-28 15:43:01 +03:00
  • ba231e8a6d
    issues : change label from bug to bug-unconfirmed (#3748) Georgi Gerganov 2023-10-28 15:25:33 +03:00
  • 8a2f2fea29
    convert : ignore tokens if their IDs are within [0, vocab_size) (#3831) Georgi Gerganov 2023-10-28 15:25:15 +03:00
  • bd6d9e2059
    llama : allow quantizing k-quants to fall back when tensor size incompatible (#3747) Kerfuffle 2023-10-28 05:54:24 -06:00
  • ee1a0ec9cb
    llama : add option for greedy sampling with probs (#3813) Georgi Gerganov 2023-10-28 14:23:11 +03:00
  • 177461104b
    common : print that one line of the syntax help *also* to standard output (#3823) Henk Poley 2023-10-28 12:16:33 +02:00
  • fdee152e4e
    starcoder : add GPU offloading (#3827) Georgi Gerganov 2023-10-28 12:06:08 +03:00
  • 41aee4df82
    speculative : ensure draft and target model vocab matches (#3812) Kerfuffle 2023-10-27 15:40:07 -06:00
  • 6d459cbfbe
    llama : correctly report GGUFv3 format (#3818) cebtenzzre 2023-10-27 17:33:53 -04:00
  • c8d6a1f34a
    simple : fix batch handling (#3803) Thibault Terrasson 2023-10-27 16:37:41 +02:00
  • 2f9ec7e271
    cuda : improve text-generation and batched decoding performance (#3776) Georgi Gerganov 2023-10-27 17:01:23 +03:00
  • 34b2a5e1ee
    server : do not release slot on image input (#3798) Georgi Gerganov 2023-10-26 22:53:37 +03:00
  • 6961c4bd0b
    batched-bench : print params at start Georgi Gerganov 2023-10-25 10:26:27 +03:00
  • cc44877486
    log : disable pid in log filenames Georgi Gerganov 2023-10-25 10:09:16 +03:00
  • ad93962657
    server : add parameter -tb N, --threads-batch N (#3584) (#3768) cebtenzzre 2023-10-24 16:10:43 -04:00
  • 1717521cdb
    server : do not block system prompt update (#3767) Georgi Gerganov 2023-10-24 23:08:20 +03:00
  • b2f7e04bd3
    sync : ggml (conv ops + cuda MSVC fixes) (#3765) Georgi Gerganov 2023-10-24 21:51:20 +03:00
  • abd21fc99f
    cmake : add missed dependencies (#3763) John Smith 2023-10-25 01:48:45 +08:00
  • 2b4ea35e56
    cuda : add batched cuBLAS GEMM for faster attention (#3749) Georgi Gerganov 2023-10-24 16:48:37 +03:00
  • daab3d7f45
    Add more tokenizer tests (#3742) Galunid 2023-10-24 09:17:17 +02:00
  • 469c9addef
    metal : handle ggml_scale for n%4 != 0 (close #3754) Georgi Gerganov 2023-10-24 09:46:50 +03:00
  • e3932593d4
    Revert "make : add optional CUDA_NATIVE_ARCH (#2482)" Georgi Gerganov 2023-10-23 23:46:05 +03:00
  • 9d02956443
    issues : separate bug and enhancement template + no default title (#3748) M. Yusuf Sarıgöz 2023-10-23 22:57:16 +03:00
  • 69a6735087
    Update special token handling in conversion scripts for gpt2 derived tokenizers (#3746) Galunid 2023-10-23 21:46:00 +02:00
  • 5be6c803fa
    llama : remove token functions with context args in favor of model (#3720) Marcus Dunn 2023-10-23 12:40:03 -07:00
  • 6336701c93
    Fix baichuan convert script not detecting model (#3739) Galunid 2023-10-23 17:47:03 +02:00
  • 96981f37b1
    make : add optional CUDA_NATIVE_ARCH (#2482) Alex 2023-10-22 15:56:53 -04:00
  • 438c2ca830
    server : parallel decoding and multimodal (#3677) Georgi Gerganov 2023-10-22 22:53:08 +03:00
  • 9e70cc0322
    Add test for MPT tokenization (#3728) goerch 2023-10-22 21:21:42 +02:00
  • 5a42a5f8e8
    readme : remove unsupported node.js library (#3703) Ian Scrivener 2023-10-23 05:16:43 +11:00
  • a5e7dbd614
    llama : validate special token ids are in range when loading GGUF model (#3635) Kerfuffle 2023-10-22 12:14:56 -06:00
  • d3956aea53
    main : escape prompt for cfg_negative_prompt and consecutive inputs in main with interactive (#3623) vvhg1 2023-10-22 20:09:51 +02:00
  • 22c69a2794
    batched : add len CLI argument Georgi Gerganov 2023-10-22 08:37:20 +03:00
  • 465219b914
    CLBlast: Add outer loops over src0 for broadcasting in mulmat shibe2 2023-10-12 16:01:23 +04:00
  • d1031cf49c
    sampling : refactor init to use llama_sampling_params (#3696) Georgi Gerganov 2023-10-20 21:07:23 +03:00
  • 8cf19d60dc
    gguf : support big endian platform (#3552) Qin Yue Chen 2023-10-20 06:19:40 -05:00
  • a0edf73bda
    server : fix uninitialized sampling context (close #3685) Georgi Gerganov 2023-10-20 13:06:10 +03:00
  • f439e506e8
    ggml : fix rope + llama minor optimizations (#3560) Herman Semenov 2023-10-20 10:02:12 +00:00
  • e78f3ef24a
    convert : restore compat with old Falcon models (#3680) cebtenzzre 2023-10-20 01:32:08 -04:00
  • f3b25e4043
    multimodal : add BakLLaVA conversion support (#3682) M. Yusuf Sarıgöz 2023-10-19 19:40:41 +03:00
  • 60abea9798
    llava : avoid segfault in case of non-existent mmproj file (#3674) M. Yusuf Sarıgöz 2023-10-19 16:59:11 +03:00
  • 004797f6ac
    readme : update hot topics Georgi Gerganov 2023-10-18 21:44:43 +03:00
  • 4e82b2ea3f
    speculative : bug fixes Georgi Gerganov 2023-10-18 18:49:40 +03:00
  • 0e89203b51
    speculative : add tree-based sampling example (#3624) Georgi Gerganov 2023-10-18 16:21:57 +03:00
  • c67fe68e41
    metal : implement q5_0 and q5_1 kernels (#3648) Jhen-Jie Hong 2023-10-18 07:21:48 -05:00
  • 1117d06607
    opencl : fix element-wise multiplication (#3656) shibe2 2023-10-18 16:09:22 +04:00
  • cb33f43a2a
    fix embeddings when using CUDA (#3657) slaren 2023-10-17 22:24:50 +02:00
  • e1675d133c
    llama : avoid fprintf in favor of LLAMA_LOG (#3538) Georgi Gerganov 2023-10-17 22:34:26 +03:00
  • 8402566a7c
    readme : update hot-topics & models, detail windows release in usage (#3615) BarfingLemurs 2023-10-17 14:13:21 -04:00
  • 40e5ce054f
    CLBlast: Fix temporary buffer size for f16 conversion (wsize) shibe2 2023-10-11 21:30:06 +04:00
  • a5e8c1d8c7
    train-text-from-scratch : fix assert failure in ggml-alloc (#3618) slaren 2023-10-17 19:00:58 +02:00
  • e74c705e15
    editorconfig : remove trailing spaces Georgi Gerganov 2023-10-17 19:52:53 +03:00
  • 3ad1e3f1a1
    server : documentation of JSON return value of /completion endpoint (#3632) coezbek 2023-10-17 18:51:02 +02:00
  • 1142013da4
    save-load-state : fix example + add ci test (#3655) Georgi Gerganov 2023-10-17 19:12:46 +03:00
  • 5fe268a4d9
    readme : add Aquila2 links (#3610) ldwang 2023-10-17 23:52:33 +08:00
  • 1a159553f9
    tokenizer : special token handling (#3538) staviq 2023-10-17 17:11:01 +02:00
  • 281ef73c25
    k-quants : fix quantization ranges (#3646) Georgi Gerganov 2023-10-17 09:19:28 +03:00
  • 940efa95fe
    llava : fix tokenization to not add bos between image embeddings and user prompt (#3645) Georgi Gerganov 2023-10-16 23:58:00 +03:00
  • 11bff29045
    MPT : support GQA for replit-code-v1.5 (#3627) cebtenzzre 2023-10-15 02:32:06 -04:00
  • 11dc1091f6
    Honor -ngl option for Cuda offloading in llava (#3621) M. Yusuf Sarıgöz 2023-10-14 13:52:44 +03:00
  • 2a4bcbacea
    llama : remove n_threads from llama_decode_internal (#3614) Daniel Bevenius 2023-10-13 12:33:16 +02:00
  • 424b6381c4
    ggml : add context enumeration functions (#3605) slaren 2023-10-13 12:23:10 +02:00
  • 1e0e873c37
    CLBlast: Fix matrix-vector multiplication (#3544) shibe2 2023-10-12 23:59:47 +04:00
  • 370359e5ba
    examples: support LLaVA v1.5 (multimodal model) (#3436) M. Yusuf Sarıgöz 2023-10-12 18:23:18 +03:00
  • 9e24cc6e2e
    docs : fix typo GOMP_CPU_AFFINITY (#3597) uint256_t 2023-10-12 22:36:16 +09:00
  • d28e572c02
    cmake : fix add_compile_options on macOS Georgi Gerganov 2023-10-12 14:31:05 +03:00
  • f3040beaab
    typo : it is --n-gpu-layers not --gpu-layers (#3592) Ian Scrivener 2023-10-12 22:10:50 +11:00
  • 1a8c8795d6
    ci : check if there is enough VRAM (#3596) Georgi Gerganov 2023-10-12 13:44:56 +03:00
  • b016596d90
    server : add completion mode (no chat) (#3582) Aarni Koskela 2023-10-12 15:51:53 +09:00
  • 6b3ae4da92
    prompts : add mnemonics.txt Georgi Gerganov 2023-10-12 09:35:19 +03:00
  • 57dd55e2c7
    server : fix kv cache management (#3588) Georgi Gerganov 2023-10-12 09:29:04 +03:00
  • b8fe4b5cc9
    main : fix session loading bug (#3400) Georgi Gerganov 2023-10-11 23:55:08 +03:00
  • a8bdd65525
    server : add parameter -tb N, --threads-batch N (#3584) Michael Coppola 2023-10-11 15:42:22 -04:00
  • 70c29da118
    common : fix mirostat state when using multiple sequences (#3543) Kerfuffle 2023-10-11 13:35:46 -06:00
  • 8c70a5ff25
    batched : add bench tool (#3545) Georgi Gerganov 2023-10-11 21:25:33 +03:00
  • 24ba3d829e
    examples : add batched.swift + improve CI for swift (#3562) Zane Shannon 2023-10-11 04:14:05 -07:00
  • 9f6ede19f3
    Add MPT model to supported models in README.md (#3574) Galunid 2023-10-11 01:02:49 +02:00
  • 233fc1c69f
    Minor improvements in GPT2 tokenizer (#3567) goerch 2023-10-10 18:59:52 +02:00
  • c5b49360d0
    readme : add bloom (#3570) Xingchen Song(宋星辰) 2023-10-11 00:28:50 +08:00
  • 02d2875def
    llm : add bloom models (#3553) Xingchen Song(宋星辰) 2023-10-10 22:48:21 +08:00
  • 0aa6595ae0
    swift : improvements and fixes (#3564) Jhen-Jie Hong 2023-10-10 06:31:13 -05:00
  • f5f9121de1
    llm : add MPT support (#3417) Jan Ploski 2023-10-10 09:50:23 +02:00
  • 11ea5c7d96
    infill : fix tokenization (#3508) vvhg1 2023-10-10 09:31:21 +02:00
  • 95bd60a0a6
    ggml-alloc : fix assert in debug builds (#3555) slaren 2023-10-09 14:44:58 +02:00
  • fcca0a7004
    refact : fix convert script + zero out KV cache to avoid nans (#3523) Georgi Gerganov 2023-10-09 14:32:17 +03:00
  • dcc09d2596
    metal : do not use mul_mm kernels when ne00 < 64 (#3542) Georgi Gerganov 2023-10-09 14:28:27 +03:00
  • db3abcc114
    sync : ggml (ggml-backend) (#3548) Georgi Gerganov 2023-10-08 20:19:14 +03:00
  • eee42c670e
    ci : add Zig CI/CD and fix build (#2996) Matheus C. França 2023-10-08 10:59:20 -03:00
  • 8e6716a102
    api_like_OAI.py : compat with Microsoft Guidance (#2746) Ryder Wishart 2023-10-08 03:55:58 -07:00
  • 9c38d181d4
    api_like_OAI.py : simplify function (#2796) arcrank 2023-10-08 06:52:57 -04:00
  • a1202a31ed
    k-quants : fix comments about block sizing (#3499) Johannes Rudolph 2023-10-08 12:21:19 +02:00
  • 94e502dfb7
    ci : enable on obj-c changes + fix metal build (#3540) Georgi Gerganov 2023-10-08 11:24:50 +03:00
  • 7d8b24932f
    zig : fix build by introducing train.cpp (#3539) Luo Tian 2023-10-08 16:24:01 +08:00
  • b0ec5218c3
    metal : support MTLGPUFamily < Apple7, formatting, style (#3524) Georgi Gerganov 2023-10-08 10:01:53 +03:00
  • 63d3b06a43
    llama : fix missing break in Persimmon arch case statements (#3535) Kerfuffle 2023-10-07 23:22:17 -06:00
  • a16e89cec8
    Fix trying to strip newline from empty prompt and cfg prompt file content (#3534) Kerfuffle 2023-10-07 15:31:41 -06:00
  • 4d03833211
    gguf.py : fix CI for publishing GGUF package (#3532) M. Yusuf Sarıgöz 2023-10-07 22:14:10 +03:00
  • c47066d833
    py : change version of numpy requirement to 1.24.4 (#3515) Tom C 2023-10-07 02:56:15 -07:00