Commit graph

  • ee4725a686
    ggml : group mul_mat_id rows by matrix (cpu only) (#4480) slaren 2023-12-15 12:45:50 +01:00
  • 6744dbe924
    ggml : use ggml_row_size where possible (#4472) slaren 2023-12-14 20:05:21 +01:00
  • cafcd4f895
    ggml : remove n_dims from ggml_tensor (#4469) slaren 2023-12-14 16:52:08 +01:00
  • c50e400163
    py : add protobuf dependency (#4466) wonjun Jang 2023-12-14 21:44:49 +09:00
  • 20a68a7030
    ggml : add ggml_row_size() (fixes llama out of space) (#4461) LostRuins 2023-12-14 20:13:33 +08:00
  • 55e87c3749
    ggml : fix OpenCL broadcast requirement for ggml_mul (close #4453) Georgi Gerganov 2023-12-14 10:35:29 +02:00
  • 873637afc7
    convert : support loading vocab from fast tokenizer config (#3633) wonjun Jang 2023-12-14 17:09:34 +09:00
  • 0353a18401
    readme : update supported model list (#4457) BarfingLemurs 2023-12-14 02:38:49 -05:00
  • 948ff137ec
    server : fix handling of characters that span multiple tokens when streaming (#4446) shibe2 2023-12-13 23:57:15 +04:00
  • 4d98d9a656
    sync : ggml (SD ops, tests, kernels) (#4444) Georgi Gerganov 2023-12-13 21:54:54 +02:00
  • 70f806b821
    build : detect host compiler and cuda compiler separately (#4414) Jared Van Bortel 2023-12-13 12:10:10 -05:00
  • 9fb13f9584
    common : add --version option to show build info in CLI (#4433) Siwen Yu 2023-12-13 20:50:14 +08:00
  • 113f9942fc
    readme : update hot topics Georgi Gerganov 2023-12-13 14:05:38 +02:00
  • 799a1cb13b
    llama : add Mixtral support (#4406) slaren 2023-12-13 13:04:25 +01:00
  • fecac45658
    server : tweak default sampling parameters (#4367) kalomaze 2023-12-12 04:12:35 -06:00
  • 9494d7c477
    english : use `typos` to fix comments and logs (#4354) Richard Kiss 2023-12-12 01:53:36 -08:00
  • 6138963fb2
    build : target Windows 8 for standard mingw-w64 (#4405) Jared Van Bortel 2023-12-12 04:27:26 -05:00
  • 6391817cd1
    llama : document logits_all deprecation (#4418) crasm 2023-12-12 04:25:57 -05:00
  • d9d4cfef64
    server : fix local model name in server (#4420) Vladimir Zorin 2023-12-12 11:25:29 +02:00
  • 41a11aaf99
    ggml : increased GGML_MAX_PARAMS to allow finetuning of 70b models (#4424) Taikono-Himazin 2023-12-12 18:24:32 +09:00
  • 8a7b2fa528
    Update README.md (#4388) Yueh-Po Peng 2023-12-11 06:27:38 +08:00
  • e18f7345a3
    grammar : revert the replacement of llama_token_to_piece with id_to_token (#4396) Xiang (Kevin) Li 2023-12-09 16:29:27 -05:00
  • fe680e3d10
    sync : ggml (new ops, tests, backend, etc.) (#4359) Georgi Gerganov 2023-12-07 22:26:54 +02:00
  • bcc0eb4591
    llama : per-layer KV cache + quantum K cache (#4309) Georgi Gerganov 2023-12-07 13:03:17 +02:00
  • 81bc9214a3
    train : fix #4227 (double free in examples/train-text-from-scratch/train-text-from-scratch.cpp) (#4351) Hongyu Ouyang 2023-12-07 02:25:22 -08:00
  • 05cd6e5036
    server : recognize cache_prompt parameter in OAI API (#4347) Georgi Gerganov 2023-12-06 20:21:59 +02:00
  • caa9249217
    common : fix compile warning Georgi Gerganov 2023-12-06 10:41:03 +02:00
  • da5eaef1f3
    speculative : support --color (#4343) stduhpf 2023-12-06 09:08:17 +01:00
  • 5f6e0c0dff
    grammar : pre-computed pieces + reserve mem + less string copies (#4330) Marcus Dunn 2023-12-05 10:55:12 -10:00
  • 5aa365d88f
    llama : allow overriding GGUF metadata when loading model (#4092) Kerfuffle 2023-12-05 10:19:18 -07:00
  • 52c8bc3cf3
    sampling : custom samplers order (#4285) MaggotHATE 2023-12-05 15:05:51 +05:00
  • e4b76bbe31
    swift : revert compiler checks for swift package (#4332) kchro3 2023-12-04 23:29:46 -08:00
  • 23b5e12eb5
    simple : update error message for KV cache check (#4324) Daniel Bevenius 2023-12-04 17:04:21 +01:00
  • d208995c6d
    swift : fix concatenation method to avoid invalid UTF8 stringification (#4325) Miwa / Ensan 2023-12-05 01:03:49 +09:00
  • 5c9f90cba1
    swift : fix prompt tokenization logic (#4321) Miwa / Ensan 2023-12-04 22:43:45 +09:00
  • 4fa44e84ad
    grammar-parser : fix typo (#4318) Ikko Eltociear Ashimine 2023-12-04 16:57:35 +09:00
  • fbbc42827b
    ggml : reuse ggml_get_n_tasks() in ggml_graph_plan() (#4308) Georgi Gerganov 2023-12-03 15:56:35 +02:00
  • adf3de4f69
    ggml : fix soft max out-of-bounds access (#4307) Georgi Gerganov 2023-12-03 15:56:22 +02:00
  • 33e171d1e9
    server : fix OpenAI API stop field to be optional (#4299) Ed Lee 2023-12-03 01:10:43 -08:00
  • 6949b50df5
    py : add grammar to oai like api (#4294) Rickard Edén 2023-12-03 10:03:25 +01:00
  • d7b800b8bc
    llama : pad KV cache size (#4280) Georgi Gerganov 2023-12-03 10:58:16 +02:00
  • 5a7d3125e7
    llama : avoid using "optional" keyword (#4283) Georgi Gerganov 2023-12-01 20:39:12 +02:00
  • d5a1cbde60
    llama : support optional tensors (#4283) Georgi Gerganov 2023-12-01 20:35:03 +02:00
  • b220222a64
    swift : fix token_to_piece implementation (#4278) Miwa / Ensan 2023-12-02 03:19:45 +09:00
  • 511f52c334
    build : enable libstdc++ assertions for debug builds (#4275) Jared Van Bortel 2023-12-01 13:18:35 -05:00
  • 03562f3a86
    llama : support attention bias on LLaMA architecture (#4283) CausalLM 2023-12-02 02:17:06 +08:00
  • 37c746d687
    llama : add Qwen support (#4281) Shijie 2023-12-02 02:16:31 +08:00
  • 880f57973b
    llama : fix integer overflow during quantization (#4284) Georgi Gerganov 2023-12-01 18:42:11 +02:00
  • 8d6d9f033b
    py : add requirements file for convert-hf-to-gguf.py (#4277) Daniel Bevenius 2023-12-01 10:41:56 +01:00
  • ef47ec18da
    ggml : add ggml_soft_max_ext (#4256) Georgi Gerganov 2023-12-01 10:51:24 +02:00
  • 1d144112c0
    server : add --log-disable to disable logging to file (#4260) Ziad Ben Hadj-Alouane 2023-11-30 17:25:49 -05:00
  • f43f09366d
    server : add single-client multi-prompt support (#4232) Ziad Ben Hadj-Alouane 2023-11-30 17:25:04 -05:00
  • d2809a3ba2
    make : fix Apple clang determination bug (#4272) WillCorticesAI 2023-11-30 17:23:44 -05:00
  • 15f5d96037
    build : fix build info generation and cleanup Makefile (#3920) Jared Van Bortel 2023-11-30 17:23:08 -05:00
  • 33c9892af5
    llava : ShareGPT4V compatibility (vision encoder only loading) (#4172) John 2023-11-30 23:11:14 +01:00
  • 8efa0f6ebe
    main : pass LOG_TEE callback to llama.cpp log (#4033) Andrew Godfrey 2023-11-30 13:56:19 -08:00
  • 524907aa76
    readme : fix (#4135) vodkaslime 2023-12-01 05:49:21 +08:00
  • 3bd2c7ce1b
    docker : add finetune option (#4211) Juraj Bednar 2023-11-30 22:46:01 +01:00
  • bde629bb53
    batched.swift : update README.md (#4214) Miwa / Ensan 2023-12-01 06:45:17 +09:00
  • f7f9e06212
    cmake : fix the metal file folder path (#4217) Li Tan 2023-11-30 13:44:11 -08:00
  • 74daabae69
    readme : fix typo (#4253) Dawid Wysocki 2023-11-30 22:43:32 +01:00
  • b18c66ca6e
    llama : fix alignment of general.name in print meta (#4254) Daniel Bevenius 2023-11-30 22:43:08 +01:00
  • f4d973cecb
    convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#4258) slaren 2023-11-30 22:42:23 +01:00
  • 954e22858c
    llama : fix typical sampling (#4261) tarcey 2023-11-30 22:40:23 +01:00
  • e2bd725f4b
    py : fix oai proxy (#3972) rhjdvsgsgks 2023-11-30 20:50:40 +00:00
  • 1f5cd83275
    examples : add readme files Georgi Gerganov 2023-11-29 11:00:17 +02:00
  • 4fea3420ee
    readme : add FreeChat (#4248) Peter Sugihara 2023-11-28 23:16:34 -08:00
  • 64e64aa255
    ggml : restore abort() in GGML_ASSERT (#4242) Jared Van Bortel 2023-11-28 04:51:11 -05:00
  • 8406b0924b
    ggml : re-enable BLAS for CPU when src0 != F32 + remove redundant full offload checks in llama.cpp (#4240) Georgi Gerganov 2023-11-28 10:32:03 +02:00
  • b38a16dfcf
    cmake : fix issue with version info not getting baked into LlamaConfig.cmake (#3970) bandoti 2023-11-27 15:25:42 -04:00
  • 0dab8cd7cc
    readme : add Amica to UI list (#4230) Kasumi 2023-11-28 01:39:42 +08:00
  • bb03290c17
    examples : iOS example with swift ui (#4159) Bailey Chittle 2023-11-27 09:56:52 -05:00
  • f3b269813f
    ggml : fix -Warray-bounds warning with gcc (#4231) Jared Van Bortel 2023-11-26 22:58:43 -05:00
  • 3e73d31d9c
    lookahead : support -n -1 infinite generation Georgi Gerganov 2023-11-26 21:51:46 +02:00
  • 9656026b53
    readme : update hot topics Georgi Gerganov 2023-11-26 20:42:51 +02:00
  • 922754a8d6
    lookahead : add example for lookahead decoding (#4207) Georgi Gerganov 2023-11-26 20:33:07 +02:00
  • 22da05536f
    metal : fix yarn (#4220) Xiao-Yong Jin 2023-11-26 02:30:02 -06:00
  • 1ddb52ec38
    scripts : Use mmap in torch load (#4202) Galunid 2023-11-25 22:45:02 +01:00
  • f837c3a992
    llama : grammar reserve space in decode_utf8 (#4210) Marcus Dunn 2023-11-25 08:58:23 -08:00
  • 3014b5415d
    Update docs for yarn_ext_factor <0.0 as unspecified instead of NaN (#4189) crasm 2023-11-25 10:47:07 -05:00
  • 04814e718e
    readme : update hot topics Georgi Gerganov 2023-11-25 12:02:13 +02:00
  • af19d35734
    server : OAI API compatibility (#4198) Georgi Gerganov 2023-11-25 11:29:06 +02:00
  • e9c13ff781
    llama : set metal log callback correctly (#4204) slaren 2023-11-24 18:10:01 +01:00
  • 8a052c131e
    ggml-cuda : support stablelm rope (#4156) slaren 2023-11-24 18:04:31 +01:00
  • 189d68446e
    convert : fix tensors using grad in some models (#4173) Galunid 2023-11-24 15:02:49 +01:00
  • 2568a4bf54
    main.swift : fix eos checking (#4197) eastriver 2023-11-24 18:25:10 +09:00
  • b35f3d0def
    readme : use PATH for Windows ROCm (#4195) Aaryaman Vasishta 2023-11-24 16:52:39 +09:00
  • 55978ce09b
    Fix incorrect format strings and uninitialized variables. (#4133) Haohui Mai 2023-11-23 13:56:53 -08:00
  • 6b0a7420d0
    llama : KV cache view API + better KV cache management (#4170) Georgi Gerganov 2023-11-23 19:07:56 +02:00
  • d103d935c0
    readme : update hot topics Georgi Gerganov 2023-11-23 13:51:22 +02:00
  • 9d5949f04b
    examples : fix typo in parallel example doc comment (#4181) Daniel Bevenius 2023-11-23 12:34:20 +01:00
  • ff8238f71d
    docs : add llama-star arch idea Georgi Gerganov 2023-11-23 11:35:04 +02:00
  • 8e672efe63
    stablelm : simplify + speedup generation (#4153) Galunid 2023-11-21 16:22:30 +01:00
  • 0b871f1a04
    finetune - update readme to mention llama support only (#4148) Galunid 2023-11-20 19:30:00 +01:00
  • dfc7cd48b1
    readme : update ROCm Windows instructions (#4122) Aaryaman Vasishta 2023-11-21 00:02:46 +09:00
  • 881800d1f0
    main : Add ChatML functionality to main example (#4046) Seb C 2023-11-21 00:26:59 +10:30
  • f23c0359a3
    ci : add flake8 to github actions (python linting) (#4129) Galunid 2023-11-20 11:35:47 +01:00
  • 40a34fe8d0
    speculative : fix prompt tokenization in speculative example (#4025) Branden Butler 2023-11-20 03:50:04 -06:00
  • dae06c06e5
    Revert "finetune : add --n-gpu-layers flag info to --help (#4128)" Georgi Gerganov 2023-11-19 19:16:07 +02:00
  • 05e8301e45
    finetune : add --n-gpu-layers flag info to --help (#4128) Clark Saben 2023-11-19 11:56:38 -05:00