Commit graph

  • 936c79b227
    server : relay error messages (#4131) · SoftwareRenderer · 2023-11-19 11:54:10 -05:00
  • 262005ad9d
    common : comma should be semicolon (#4137) · kchro3 · 2023-11-19 08:52:57 -08:00
  • 35985acffa
    gitignore : tokenize · Georgi Gerganov · 2023-11-19 18:50:49 +02:00
  • e937066420
    gguf-py : export chat templates (#4125) · slaren · 2023-11-19 11:10:52 +01:00
  • 28a2e6e7d4
    tokenize example: Respect normal add BOS token behavior (#4126) · Kerfuffle · 2023-11-18 14:48:17 -07:00
  • 0b5c3b0457
    scripts : Remove missed baichuan convert script (#4127) · Galunid · 2023-11-18 21:08:33 +01:00
  • 2923f17f6f
    Clean up ggml-cuda.cu warnings when compiling with clang (for ROCM) (#4124) · Kerfuffle · 2023-11-18 08:11:18 -07:00
  • bbecf3f415
    llama : increase max nodes (#4115) · slaren · 2023-11-17 20:39:11 +01:00
  • 8e9361089d
    build : support ppc64le build for make and CMake (#3963) · Roger Meier · 2023-11-17 17:11:23 +01:00
  • 5ad387e994
    tokenize : fix trailing whitespace · Georgi Gerganov · 2023-11-17 18:01:38 +02:00
  • 2fa02b4b3d
    examples : add tokenize (#4039) · zakkor · 2023-11-17 17:36:44 +02:00
  • 2ab0707acb
    convert : use 'model' value if it exists. This allows karpathy/tinyllamas to load (#4089) · Don Mahurin · 2023-11-17 07:32:34 -08:00
  • 11173c92d6
    py : Falcon HF compatibility (#4104) · John · 2023-11-17 16:24:30 +01:00
  • 9e87ef60e1
    common : improve yaml log escaping (#4080) · Jannis Schönleber · 2023-11-17 16:24:07 +01:00
  • c7cce1246e
    llava : fix compilation warning that fread return value is not used (#4069) · Huawei Lin · 2023-11-17 10:22:56 -05:00
  • f7d5e97542
    py : remove superfluous import statements (#4076) · Jiří Podivín · 2023-11-17 16:20:53 +01:00
  • ba4cf5c0bf
    train : move number of gpu layers argument parsing to common/train.cpp (#4074) · Jiří Podivín · 2023-11-17 16:19:16 +01:00
  • e85bb1a8e7
    llama : add functions to get the model's metadata (#4013) · slaren · 2023-11-17 16:17:37 +01:00
  • 3e916a07ac
    finetune : speed-up ggml_compute_forward_out_prod_f32 via BLAS (#4079) · gwjr · 2023-11-17 14:48:19 +00:00
  • 947f64f163
    finetune : zero the loraB initial vectors (#4082) · Andrew Godfrey · 2023-11-17 02:23:11 -08:00
  • b83e149ec6
    cuda : get_row_rounding F32 (#4095) · Andrew Godfrey · 2023-11-17 00:01:15 -08:00
  • 4f447a4833
    llama : fix data units (#4101) · Georgi Gerganov · 2023-11-17 10:00:15 +02:00
  • 91f6499393
    Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) · Kerfuffle · 2023-11-16 19:14:37 -07:00
  • 8da46278e1
    gguf : fix potential infinite loops while parsing (#4100) · texmex76 · 2023-11-16 16:01:48 +01:00
  • a6fc554e26
    llama : restore prefix space in llama tokenizer (#4081) · Jared Van Bortel · 2023-11-15 11:34:47 -05:00
  • 1cf2850d52
    ggml-cuda : increase max graph size (#4084) · slaren · 2023-11-15 13:58:13 +01:00
  • 6bb4908a17
    Fix MacOS Sonoma model quantization (#4052) · Michael Potter · 2023-11-14 09:34:41 -08:00
  • 36eed0c42c
    stablelm : StableLM support (#3586) · Galunid · 2023-11-14 11:17:12 +01:00
  • b46d12f86d
    convert.py: also look for plain model.safetensors (#4043) · afrideva · 2023-11-13 17:03:40 -08:00
  • bd90eca237
    llava : fix regression for square images in #3613 (#4056) · M. Yusuf Sarıgöz · 2023-11-13 18:20:52 +03:00
  • 3d68f364f1
    ggml : sync (im2col, GPU conv, 32-bit arm compat) (#4060) · Georgi Gerganov · 2023-11-13 16:55:52 +02:00
  • c049b37d7b
    readme : update hot topics · Georgi Gerganov · 2023-11-13 14:18:08 +02:00
  • 4760e7cc0b
    sync : ggml (backend v2) (#3912) · Georgi Gerganov · 2023-11-13 14:16:23 +02:00
  • bb50a792ec
    Add ReLU and SQR CUDA ops to (partially) fix Persimmon offloading (#4041) · Kerfuffle · 2023-11-13 01:58:15 -07:00
  • 21fd874c8d
    gguf-py: gguf_writer: Use bytearray to build metadata (#4051) · Kerfuffle · 2023-11-12 16:39:37 -07:00
  • 532dd74e38
    Fix some documentation typos/grammar mistakes (#4032) · Richard Kiss · 2023-11-11 22:04:58 -08:00
  • e86fc56f75
    Fix gguf-convert-endian script (#4037) · M. Yusuf Sarıgöz · 2023-11-11 18:35:31 +03:00
  • d96ca7ded7
    server : fix crash when prompt exceeds context size (#3996) · Alexey Parfenov · 2023-11-11 05:48:21 +00:00
  • 34b0a08207
    gguf-py: Refactor and allow reading/modifying existing GGUF files (#3981) · Kerfuffle · 2023-11-10 22:04:50 -07:00
  • 4a4fd3eefa
    server : allow continue edit on completion mode (#3950) · Jhen-Jie Hong · 2023-11-11 06:49:33 +08:00
  • df9d1293de
    Unbreak persimmon after #3837 (#4010) · Galunid · 2023-11-10 14:24:54 +01:00
  • a75fa576ab
    scripts: Generalize convert scripts (#3838) · Galunid · 2023-11-09 11:09:29 +01:00
  • 57ad015dc3
    server : add min_p param (#3877) · Mihai · 2023-11-09 04:00:34 +02:00
  • 875fb42871
    ggml-alloc : fix backend assignments of views (#3982) · slaren · 2023-11-08 13:15:14 +01:00
  • 0a7c980b6f
    gguf : track writer state, free unneeded tensors, cleanup (#3871) · Jared Van Bortel · 2023-11-07 12:43:04 -05:00
  • 413503d4b9
    make : do not add linker flags when compiling static llava lib (#3977) · Georgi Gerganov · 2023-11-07 19:25:32 +02:00
  • e9c1cecb9d
    ggml : fix backward rope after YaRN (#3974) · xaedes · 2023-11-07 09:04:51 +01:00
  • 54b4df8886
    Use params when loading models in llava-cli (#3976) · Matthew Tejo · 2023-11-06 23:43:59 -08:00
  • 46876d2a2c
    cuda : supports running on CPU for GGML_USE_CUBLAS=ON build (#3946) · Meng Zhang · 2023-11-06 22:49:08 -08:00
  • 381efbf480
    llava : expose as a shared library for downstream projects (#3613) · Damian Stewart · 2023-11-06 22:36:23 +01:00
  • 2833a6f63c
    ggml-cuda : fix f16 mul mat (#3961) · slaren · 2023-11-05 18:45:16 +01:00
  • d9ccce2e33
    Allow common process_escapes to handle \x sequences (#3928) · Kerfuffle · 2023-11-05 10:06:06 -07:00
  • bb60fd0bf6
    server : fix typo for --alias shortcut from -m to -a (#3958) · Thái Hoàng Tâm · 2023-11-05 23:15:27 +07:00
  • 132d25b8a6
    cuda : fix disabling device with --tensor-split 1,0 (#3951) · Jared Van Bortel · 2023-11-05 10:08:57 -05:00
  • 3d48f42efc
    llama : mark LLM_ARCH_STARCODER as full offload supported (#3945) · Meng Zhang · 2023-11-05 04:40:08 -08:00
  • c41ea36eaa
    cmake : MSVC instruction detection (fixed up #809) (#3923) · Eve · 2023-11-05 08:03:09 +00:00
  • a7fac013cf
    ci : use intel sde when ci cpu doesn't support avx512 (#3949) · Eve · 2023-11-05 07:46:44 +00:00
  • 48ade94538
    cuda : revert CUDA pool stuff (#3944) · slaren · 2023-11-05 08:12:13 +01:00
  • f28af0d81a
    gguf-py: Support 01.AI Yi models (#3943) · Kerfuffle · 2023-11-04 16:20:34 -06:00
  • d9b33fe95b
    metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) · Peter Sugihara · 2023-11-03 12:18:18 -07:00
  • 5ba3746171
    ggml-metal: fix yarn rope (#3937) · Xiao-Yong Jin · 2023-11-03 13:00:31 -05:00
  • abb77e7319
    ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) · slaren · 2023-11-03 12:13:09 +01:00
  • 8f961abdc4
    speculative : change default p_accept to 0.5 + CLI args (#3919) · Georgi Gerganov · 2023-11-03 09:41:17 +02:00
  • 05816027d6
    common : YAYF (yet another YARN fix) (#3925) · Georgi Gerganov · 2023-11-03 09:24:00 +02:00
  • 3fdbe6b66b
    llama : change yarn_ext_factor placeholder to -1 (#3922) · cebtenzzre · 2023-11-03 02:31:58 -04:00
  • 629f917cd6
    cuda : add ROCM aliases for CUDA pool stuff (#3918) · Kerfuffle · 2023-11-02 13:58:22 -06:00
  • 51b2fc11f7
    cmake : fix relative path to git submodule index (#3915) · Andrei · 2023-11-02 15:40:31 -04:00
  • 224e7d5b14
    readme : add notice about #3912 · Georgi Gerganov · 2023-11-02 20:44:12 +02:00
  • c7743fe1c1
    cuda : fix const ptrs warning causing ROCm build issues (#3913) · Georgi Gerganov · 2023-11-02 20:32:11 +02:00
  • d6069051de
    cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903) · Oleksii Maryshchenko · 2023-11-02 18:10:39 +01:00
  • 4ff1046d75
    gguf : print error for GGUFv1 files (#3908) · Georgi Gerganov · 2023-11-02 16:22:30 +02:00
  • 21958bb393
    cmake : disable LLAMA_NATIVE by default (#3906) · slaren · 2023-11-02 13:10:33 +01:00
  • 2756c4fbff
    gguf : remove special-case code for GGUFv1 (#3901) · Georgi Gerganov · 2023-11-02 11:20:21 +02:00
  • 1efae9b7dc
    llm : prevent from 1-D tensors being GPU split (#3697) · Georgi Gerganov · 2023-11-02 09:54:18 +02:00
  • b12fa0d1c1
    build : link against build info instead of compiling against it (#3879) · cebtenzzre · 2023-11-02 02:50:16 -04:00
  • 4d719a6d4e
    cuda : check if this fixes Pascal card regression (#3882) · Georgi Gerganov · 2023-11-02 08:35:10 +02:00
  • 183b3fac6c
    metal : fix build errors and kernel sig after #2268 (#3898) · Georgi Gerganov · 2023-11-02 08:33:37 +02:00
  • 2fffa0d61f
    cuda : fix RoPE after #2268 (#3897) · cebtenzzre · 2023-11-02 01:49:44 -04:00
  • 0eb332a10f
    llama : fix llama_context_default_params after #2268 (#3893) · cebtenzzre · 2023-11-01 19:29:14 -04:00
  • d02e98cde0
    ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891) · slaren · 2023-11-01 23:10:09 +01:00
  • 898aeca90a
    llama : implement YaRN RoPE scaling (#2268) · cebtenzzre · 2023-11-01 18:04:33 -04:00
  • c43c2da8af
    llm : fix llm_build_kqv taking unused tensor (benign, #3837) · Georgi Gerganov · 2023-11-01 23:08:30 +02:00
  • 523e49b111
    llm : fix falcon norm after refactoring (#3837) · Georgi Gerganov · 2023-11-01 23:00:50 +02:00
  • e16b9fa4ba
    metal : multi-simd softmax (#3710) · Georgi Gerganov · 2023-11-01 21:25:00 +02:00
  • ff8f9a88da
    common : minor (#3715) · Georgi Gerganov · 2023-11-01 21:15:55 +02:00
  • 50337961a6
    llm : add llm_build_context (#3881) · Georgi Gerganov · 2023-11-01 20:11:02 +02:00
  • 0e40806c1c
    common : allow caller to handle help/argument exceptions (#3715) · bandoti · 2023-11-01 14:42:01 -03:00
  • a2758d08e4
    log : make generating separate log files optional (#3787) · staviq · 2023-11-01 15:18:27 +01:00
  • e75dfdd31b
    sampling : null grammar field after reset (#3885) · l3utterfly · 2023-11-01 21:40:43 +08:00
  • 9a3b4f6c86
    ggml : fix UNUSED macro (#3762) · Georgi Gerganov · 2023-11-01 13:50:45 +02:00
  • 73bdcb395e
    finetune : add -ngl parameter (#3762) · Andrew Godfrey · 2023-11-01 04:49:04 -07:00
  • f0e209324a
    scripts : add server-llm.sh (#3868) · Georgi Gerganov · 2023-11-01 11:29:07 +02:00
  • ca190bca8e
    server : re-enable completion and embedded at the same time (#3876) · Adrian Hesketh · 2023-11-01 09:28:28 +00:00
  • 71e3718abd
    llama : refactor graph build code (#3837) · Georgi Gerganov · 2023-11-01 08:04:02 +02:00
  • 238657db23
    samplers : Min-P sampler implementation [alternative to Top P/Top K] (#3841) · kalomaze · 2023-10-31 14:44:49 -05:00
  • 07178c98e1
    flake.nix: fix for rocm 5.7 (#3853) · Tungsten842 · 2023-10-31 18:24:03 +01:00
  • 207b51900e
    ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861) · Georgi Gerganov · 2023-10-30 19:19:15 +02:00
  • 6e08281e58
    Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843) · Kerfuffle · 2023-10-29 11:31:40 -06:00
  • 2046eb4345
    make : remove unnecessary dependency on build-info.h (#3842) · cebtenzzre · 2023-10-29 12:33:47 -04:00
  • 71a09da301
    llama : fix kv shift bug (#3835) · Georgi Gerganov · 2023-10-29 18:32:51 +02:00