Commit graph

  • 29122d32ac
    readme : fix ROCm link (#6579) Artem Zinnatullin 2024-04-10 00:49:12 -06:00
  • b231b37b09
    readme : update UI list (#6560) sjxx 2024-04-10 14:34:00 +08:00
  • ba5e134e07
    readme: fix typo in amdgpu target name (#6573) Jiří Sejkora 2024-04-10 00:23:02 +02:00
  • 1b67731e18
    BERT tokenizer fixes (#6498) Jared Van Bortel 2024-04-09 13:44:08 -04:00
  • c4a3a4ff47
    sync : ggml Georgi Gerganov 2024-04-09 20:29:06 +03:00
  • 400d5d722d
    server : detect search query to start webchat (#6554) Ed Lee 2024-04-09 01:31:47 -07:00
  • 5dc9dd7152
    llama : add Command R Plus support (#6491) Carolinabanana 2024-04-09 09:16:13 +01:00
  • e11a8999b5
    license : update copyright notice + add AUTHORS (#6405) Georgi Gerganov 2024-04-09 09:23:19 +03:00
  • cc4a95426d
    llama : fix attention layer count sanity check (#6550) Georgi Gerganov 2024-04-08 22:25:49 +03:00
  • cecd8d3c98
    Comment explaining a decision (#6531) kunnis 2024-04-08 10:44:19 -05:00
  • b73e564b16
    quantize : fix precedence of cli args (#6541) Georgi Gerganov 2024-04-08 16:23:01 +03:00
  • e3c337d87c
    llama : support negative ith in llama_get_ API (#6519) Rick G 2024-04-08 06:02:30 -07:00
  • beea6e1b16
    llama : save and restore kv cache for single seq id (#6341) Jan Boon 2024-04-08 20:43:30 +08:00
  • 87fb5b4234
    remove row=1 cond (#6532) Abhilash Majumder 2024-04-08 13:56:01 +05:30
  • d752327c33
    Adding KodiBot to UI list (#6535) Firat 2024-04-08 00:48:29 -07:00
  • 855f54402e
    Change Windows AMD example to release build to make inference much faster. (#6525) Mark Fairbairn 2024-04-07 19:52:19 +01:00
  • b909236c0b
    flake.lock: Update (#6517) Georgi Gerganov 2024-04-07 21:25:30 +03:00
  • e0717e751e
    Add GritLM as supported models. (#6513) DAN™ 2024-04-07 13:33:59 -04:00
  • c37247796b
    sync : ggml Georgi Gerganov 2024-04-07 17:05:51 +03:00
  • f77261a7c5
    ggml: bypass code incompatible with CUDA < 11.1 (whisper/2020) Slava Primenko 2024-04-04 14:49:24 +02:00
  • 43e8995e75
    scripts : sync ggml-cuda folder Georgi Gerganov 2024-04-07 16:08:12 +03:00
  • 9472bce308
    Run make to build the project (#6457) limitedAtonement 2024-04-07 07:05:40 -04:00
  • d4f220a5cc
    support/fix OPs GGML_TYPE_IQ4_NL, GGML_TYPE_IQ4_XS, GGML_TYPE_IQ3_XXS, GGML_TYPE_IQ3_S, GGML_TYPE_IQ2_XXS, GGML_TYPE_IQ2_XS, GGML_TYPE_IQ2_S, GGML_TYPE_IQ1_S, GGML_TYPE_IQ1_M (#6521) Neo Zhang Jianyu 2024-04-07 10:55:59 +08:00
  • 54ea0698fb
    sync : ggml Georgi Gerganov 2024-04-06 17:43:15 +03:00
  • b66aec675c
    backend : fix typo in scheduler documentation (ggml/781) Daniel Bevenius 2024-04-03 22:57:20 +02:00
  • 57dd02c44b
    Tests: Added integration tests for GBNF parser (#6472) Clint Herron 2024-04-06 10:31:33 -04:00
  • 75cd4c7729
    ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495) Pierrick Hymbert 2024-04-06 05:40:47 +02:00
  • a8bd14d557
    gguf.py : add licence and version to gguf writer (#6504) Brian 2024-04-06 05:41:38 +11:00
  • d0f5deebf8
    readme : update UI list (#6503) Hoang Nguyen 2024-04-05 11:39:43 -07:00
  • 87e21bbacd
    bench : make n_batch and n_ubatch configurable in Batched bench (#6500) Ting Sun 2024-04-06 01:34:53 +07:00
  • 1b496a745c
    [SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464) Ouadie EL FAROUKI 2024-04-05 14:35:06 +01:00
  • a307375c02
    readme : add Dot to UI list (#6487) alexpinel 2024-04-04 18:22:50 +01:00
  • b660a5729e
    readme : fix typo (#6481) Jun Jie 2024-04-05 01:16:37 +08:00
  • 0a1d889e27
    server: add cURL support to server Dockerfiles (#6474) Ed Lepedus 2024-04-04 17:31:22 +01:00
  • 7dda1b727e
    ci: exempt master branch workflows from getting cancelled (#6486) Minsoo Cheong 2024-04-05 01:30:53 +09:00
  • c666ba26c3
    build CI: Name artifacts (#6482) Ewout ter Hoeven 2024-04-04 17:08:55 +02:00
  • 2e66913e5f
    server: allow penalizing repetition of newlines on server webpage (#6431) Shakhar Dasgupta 2024-04-04 11:03:00 -04:00
  • 8120efee1d
    ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478) Pierrick Hymbert 2024-04-04 16:59:04 +02:00
  • a74401f0e5
    Correct README link (#6458) limitedAtonement 2024-04-04 10:30:02 -04:00
  • 7a2c92637a
    ci: bench: add more ftype, fix triggers and bot comment (#6466) Pierrick Hymbert 2024-04-04 11:57:58 +02:00
  • 4bcd6b959c
    common: remove duplicate check for curl (#6471) Daniel Bevenius 2024-04-04 09:49:21 +02:00
  • 9b84ae1806
    examples : add GBNF validator program (#5948) Clint Herron 2024-04-04 03:44:28 -04:00
  • 4399f13fb9
    server : remove obsolete --memory-f32 option Georgi Gerganov 2024-04-04 09:34:58 +03:00
  • 1a43c7254e
    server : add option to disable KV offload (#6468) Xiao-Yong Jin 2024-04-04 01:33:48 -05:00
  • 72d73af651
    convert : fix for lint error complaining of bare except (#6470) Clint Herron 2024-04-04 02:32:53 -04:00
  • 5fb1574c81
    A few small fixes to server's README docs (#6428) Fattire 2024-04-03 13:22:57 -07:00
  • 60cdf40cc3
    server : handle exception on wrong type in request (#6452) JH23X 2024-04-03 20:09:52 +02:00
  • bb43cf7e9d
    llama : add SEA-LION support (#6448) bryanSwk 2024-04-04 02:05:10 +08:00
  • 9f62c0173d
    ci : update checkout, setup-python and upload-artifact to latest (#6456) Ewout ter Hoeven 2024-04-03 20:01:13 +02:00
  • 5d4f12e462
    server: add cURL support to server.Dockerfile (#6461) Ed Lepedus 2024-04-03 18:56:37 +01:00
  • 154d4ee39c
    readme : add feature-rich rust bindings (#6465) Francisco Melo 2024-04-03 18:53:37 +01:00
  • e69945d953
    security : create policy (#6354) Joyce 2024-04-03 14:48:07 -03:00
  • db214fa578
    Missing tokenizer.model error during gguf conversion (#6443) Abhishek Gopinath K 2024-04-03 21:12:52 +05:30
  • 1ff4d9f3d6
    Add OpenChat, Alpaca, Vicuna chat templates (#6397) kaizau 2024-04-03 23:24:31 +08:00
  • 076b08649e
    readme : update hot topics Georgi Gerganov 2024-04-03 16:11:15 +03:00
  • 08a0c02060
    ggml : mul_mat_id use the same tensor for all the experts (#6387) slaren 2024-04-03 15:07:05 +02:00
  • 52604860f9
    [SYCL] Disable iqx on windows as WA (#6435) Meng, Hengyu 2024-04-03 10:34:40 +08:00
  • f87f7b8986
    flake.lock: Update (#6402) Georgi Gerganov 2024-04-01 19:05:57 +03:00
  • 33a5244806
    compare-llama-bench.py: fix long hexsha args (#6424) Johannes Gäßler 2024-04-01 13:30:43 +02:00
  • 226e819371
    ci: server: verify deps are coherent with the commit (#6409) Pierrick Hymbert 2024-04-01 12:36:40 +02:00
  • c50a82ce0f
    readme : update hot topics Georgi Gerganov 2024-03-31 11:56:30 +03:00
  • 37e7854c10
    ci: bench: fix Resource not accessible by integration on PR event (#6393) Pierrick Hymbert 2024-03-30 11:36:07 +01:00
  • c342d070c6
    Fedora build update (#6388) Mohammadreza Hendiani 2024-03-30 01:29:56 +03:30
  • f7fc5f6c6f
    split: allow --split-max-size option (#6343) Xuan Son Nguyen 2024-03-29 22:34:44 +01:00
  • ba0c7c70ab
    Vulkan k-quant mmq and ggml-backend offload functionality (#6155) 0cc4m 2024-03-29 17:29:21 +01:00
  • d48ccf3ad4
    sync : ggml (#6351) Georgi Gerganov 2024-03-29 17:45:46 +02:00
  • 069574775c
    [Model] Add support for xverse (#6301) hxer7963 2024-03-29 21:37:03 +08:00
  • cfde806eb9
    ci : fix BGE wget (#6383) Georgi Gerganov 2024-03-29 14:34:28 +02:00
  • b910287954
    readme : add project (#6356) zhouwg 2024-03-29 15:33:46 +08:00
  • 8093987090
    cmake : add explicit metal version options (#6370) Matt Clayton 2024-03-29 03:27:42 -04:00
  • 057400a3fd
    llama : remove redundant reshape in build_kv_store (#6369) Daniel Bevenius 2024-03-29 08:23:22 +01:00
  • b75c38166c
    convert : allow conversion of Mistral HF models (#6144) Pedro Cuenca 2024-03-29 08:15:00 +01:00
  • bfe7dafc9c
    readme : add notice for UI list Georgi Gerganov 2024-03-28 22:56:03 +02:00
  • 5106ef482c
    [SYCL] Revisited & updated SYCL build documentation (#6141) Ouadie EL FAROUKI 2024-03-28 16:01:47 +00:00
  • be55134a53
    convert : refactor vocab selection logic (#6355) Jared Van Bortel 2024-03-28 11:44:36 -04:00
  • 66ba560256
    llava : fix MobileVLM (#6364) Ziang Wu 2024-03-28 22:33:10 +08:00
  • 0308f5e3d7
    llama : fix command-r inference when omitting outputs (#6367) compilade 2024-03-28 08:05:54 -04:00
  • 28cb9a09c4
    ci: bench: fix master not schedule, fix commit status failed on external repo (#6365) Pierrick Hymbert 2024-03-28 11:27:56 +01:00
  • cfc4d75df6
    doc: fix outdated default value of batch size (#6336) Ting Sun 2024-03-28 16:51:06 +08:00
  • 6902cb7f2e
    server : stop gracefully on SIGTERM (#6348) Eric Zhang 2024-03-28 16:50:48 +08:00
  • d2d8f38996
    nix: removed unnessesary indentation hutli 2024-03-27 19:17:30 +01:00
  • d39b308eaf
    nix: moved blas availability check to package inputs so it is still overridable hutli 2024-03-27 19:14:28 +01:00
  • c873976649
    using blas.meta.available to check host platform hutli 2024-03-27 18:10:08 +01:00
  • dbb03e2b9c
    only using explicit blas if hostPlatform is allowed hutli 2024-03-27 17:25:05 +01:00
  • e9f17dc3bf
    nix: .#windows: proper cross-compilation set-up Someone Serge 2024-03-26 16:22:42 +00:00
  • 22a462cc1f
    nix: package: don't introduce the dependency on python Someone Serge 2024-03-26 16:22:07 +00:00
  • f6a0f5c642
    nix: .#widnows: init hutli 2024-02-15 14:25:04 +01:00
  • d0e2f6416b
    doc: fix typo in MobileVLM-README.md (#6181) Ziang Wu 2024-03-28 12:03:30 +08:00
  • 25f4a613c4
    [SYCL] fix set main gpu crash (#6339) Neo Zhang Jianyu 2024-03-28 08:55:24 +08:00
  • a016026a3a
    server: continuous performance monitoring and PR comment (#6283) Pierrick Hymbert 2024-03-27 20:26:49 +01:00
  • 53c7ec53d5
    nix: ci: dont test cuda and rocm (for now) Someone Serge 2024-03-27 16:17:46 +00:00
  • e5b89a441a
    ggml : fix bounds checking of zero size views (#6347) slaren 2024-03-27 15:07:50 +01:00
  • 3a0345970e
    make : whitespace Georgi Gerganov 2024-03-27 15:02:49 +02:00
  • 1e13987fba
    embedding : show full embedding for single prompt (#6342) howlger 2024-03-27 12:15:44 +01:00
  • e82f9e2b83
    [SYCL] Fix batched impl for NVidia GPU (#6164) AidanBeltonS 2024-03-27 08:16:40 +00:00
  • cbc8343619
    Make IQ1_M work for QK_K = 64 (#6327) Kawrakow 2024-03-27 08:44:27 +01:00
  • e562b9714b
    common : change --no-penalize-nl to --penalize-nl (#6334) Sigbjørn Skjæret 2024-03-27 08:23:10 +01:00
  • 2ab4f00d25
    llama2c : open file as binary (#6332) Georgi Gerganov 2024-03-27 09:16:02 +02:00
  • 1740d6dd4e
    readme : add php api bindings (#6326) Mateusz Charytoniuk 2024-03-27 08:08:59 +01:00
  • 0642b22cd1
    server: public: use relative routes for static files (#6325) Eric Zhang 2024-03-27 13:55:29 +08:00