Commit graph

  • 5a9e2f60ba
    py : minor fixes (#5668) Georgi Gerganov 2024-02-22 20:13:25 +02:00
  • 373ee3fbba
    Add Gemma chat template (#5665) Xuan Son Nguyen 2024-02-22 19:10:21 +01:00
  • 4cb4d8b22d
    workflows: nix: hardcode cachix ids, build unconditionally (#5663) Someone 2024-02-22 16:32:09 +00:00
  • 3a03541ced
    minor : fix trailing whitespace (#5638) Georgi Gerganov 2024-02-22 13:54:03 +02:00
  • 56d03d92be
    readme : update hot topics Georgi Gerganov 2024-02-22 10:35:54 +02:00
  • a46f50747b
    server : fallback to chatml, add AlphaMonarch chat template (#5628) Xuan Son Nguyen 2024-02-22 09:33:24 +01:00
  • c5688c6250
    server : clarify some params in the docs (#5640) Alexey Parfenov 2024-02-22 08:27:32 +00:00
  • 4ef245a92a
    mpt : add optional bias tensors (#5638) Dat Quoc Nguyen 2024-02-22 18:15:13 +10:00
  • 973053d8b0
    llama : fix loading models with shared tok_embd and output (#5651) slaren 2024-02-22 00:42:09 +01:00
  • 7c8bcc11dc
    Add docs for llama_chat_apply_template (#5645) Xuan Son Nguyen 2024-02-22 00:31:00 +01:00
  • 7fe4678b02
    llama : fix session save/load with quantized KV (#5649) slaren 2024-02-21 22:52:39 +01:00
  • ba2135ccae
    gemma : allow offloading the output tensor (#5646) slaren 2024-02-21 22:18:23 +01:00
  • 89febfed93
    examples : do not assume BOS when shifting context (#5622) Jared Van Bortel 2024-02-21 10:33:54 -05:00
  • 5022cf242d
    sync : ggml Georgi Gerganov 2024-02-21 16:52:39 +02:00
  • 1ecea255eb
    server: health: fix race condition on slots data using tasks queue (#5634) Pierrick Hymbert 2024-02-21 15:47:48 +01:00
  • a00a35cef9
    readme : add LocalAI to the available UIs (#5629) Ettore Di Giacinto 2024-02-21 15:39:10 +01:00
  • eccd7a26dd
    sync : ggml (#5633) Georgi Gerganov 2024-02-21 16:17:10 +02:00
  • c14f72db9c
    readme : update hot topics Georgi Gerganov 2024-02-21 15:39:54 +02:00
  • cc6cac08e3
    llava : add --skip-unknown to 1.6 convert.py (#5632) Daniel Bevenius 2024-02-21 14:36:57 +01:00
  • 580111d42b
    llama : add gemma model (#5631) postmasters 2024-02-21 05:08:22 -08:00
  • 88c46cbdac
    [SYCL] add context name (#5624) Meng, Hengyu 2024-02-21 17:52:06 +08:00
  • a14679cc30
    IQ4_NL: 4-bit non-linear quants with blocks of 32 (#5590) Kawrakow 2024-02-21 11:39:52 +02:00
  • 6560bed3f0
    server : support llava 1.6 (#5553) CJ Pais 2024-02-20 11:07:22 -08:00
  • 06bf2cf8c4
    make : fix debug build with CUDA (#5616) slaren 2024-02-20 20:06:17 +01:00
  • 4ed8e4fbef
    llava : add explicit instructions for llava-1.6 (#5611) Daniel Bevenius 2024-02-20 18:30:27 +01:00
  • 9c405c9f9a
    Server: use llama_chat_apply_template (#5593) Xuan Son Nguyen 2024-02-20 15:58:27 +01:00
  • 5207b3fbc5
    readme : update UI list (#5605) Dane Madsen 2024-02-20 21:00:23 +11:00
  • 8dbbd75754
    metal : add build system support for embedded metal library (#5604) Haoxiang Fei 2024-02-19 22:58:36 -11:00
  • c0a8c6db37
    server : health endpoint configurable failure on no slot (#5594) Pierrick Hymbert 2024-02-20 08:48:19 +01:00
  • b9111bd209
    Update ggml_sycl_op_mul_mat_vec_q (#5502) AidanBeltonS 2024-02-20 07:01:25 +00:00
  • 633782b8d9
    nix: now that we can do so, allow MacOS to build Vulkan binaries Mathijs de Bruin 2024-02-13 20:28:02 +00:00
  • 22f83f0c38
    Enable Vulkan MacOS CI 0cc4m 2024-02-10 22:18:33 +01:00
  • bb9dcd560a
    Refactor validation and enumeration platform checks into functions to clean up ggml_vk_instance_init() 0cc4m 2024-02-14 20:57:17 +01:00
  • f50db6ae0b
    Add check for VK_KHR_portability_enumeration for MoltenVK support 0cc4m 2024-02-10 22:14:52 +01:00
  • d8c054517d
    Add preprocessor checks for Apple devices. Mathijs de Bruin 2024-02-06 14:39:22 +00:00
  • 42f664a382
    Resolve ErrorIncompatibleDriver with Vulkan on MacOS. Mathijs de Bruin 2024-02-03 18:00:11 +00:00
  • 5dde540897
    Allow for Vulkan build with Accelerate. Mathijs de Bruin 2024-02-03 17:56:46 +00:00
  • 40c3a6c1e1
    cuda : ignore peer access already enabled errors (#5597) slaren 2024-02-19 23:40:26 +01:00
  • f24ed14ee0
    make : pass CPPFLAGS directly to nvcc, not via -Xcompiler (#5598) Jared Van Bortel 2024-02-19 15:54:12 -05:00
  • 9d679f0fcc
    examples : support minItems/maxItems in JSON grammar converter (#5039) nopperl 2024-02-19 14:14:07 +00:00
  • 1387cf60f7
    llava : remove extra cont (#5587) Georgi Gerganov 2024-02-19 15:23:17 +02:00
  • 6fd413791a
    llava : replace ggml_cpy with ggml_cont slaren 2024-02-19 14:02:36 +01:00
  • 337c9cbd52
    sync : ggml Georgi Gerganov 2024-02-19 14:54:21 +02:00
  • a3145bdc30
    ggml-alloc : apply ggml/731 Georgi Gerganov 2024-02-19 14:53:48 +02:00
  • 890559ab28
    metal : option to embed MSL source into compiled binary (whisper/1842) Didzis Gosko 2024-02-11 16:41:41 +02:00
  • d0e3ce51f4
    ci : enable -Werror for CUDA builds (#5579) Georgi Gerganov 2024-02-19 14:45:41 +02:00
  • 68a6b98b3c
    make : fix CUDA build (#5580) Georgi Gerganov 2024-02-19 13:41:51 +02:00
  • 70d45af0ef
    readme : fix typo in README-sycl.md (#5353) valiray 2024-02-19 02:37:10 -08:00
  • 13e2c771aa
    cmake : remove obsolete sycl compile flags (#5581) Abhilash Majumder 2024-02-19 14:45:18 +05:30
  • f53119cec4
    minor : fix trailing whitespace (#5538) Georgi Gerganov 2024-02-19 10:34:10 +02:00
  • 7084755396
    llava : avoid changing the original BakLLaVA model (#5577) Daniel Bevenius 2024-02-19 09:31:59 +01:00
  • 4480542b22
    baby-llama : allocate graphs in ggml_context (#5573) NawafAlansari 2024-02-19 03:25:38 -05:00
  • 11b12de39b
    llama : add llama_chat_apply_template() (#5538) Xuan Son Nguyen 2024-02-19 09:23:37 +01:00
  • 3a9cb4ca64
    cuda, metal : fix nans in soft_max (#5574) slaren 2024-02-19 09:04:45 +01:00
  • 769a716e30
    readme : update (#5572) Mirko185 2024-02-19 08:39:31 +01:00
  • f0d1fafc02
    ggml : android and old glibc NUMA incompatibility bugfixes (#5557) bmwl 2024-02-18 23:38:32 -08:00
  • a0c2dad9d4
    build : pass all warning flags to nvcc via -Xcompiler (#5570) Jared Van Bortel 2024-02-18 16:21:52 -05:00
  • 14278f55d2
    ggml : restore vec dot stride arg names (#5453) Georgi Gerganov 2024-02-18 22:58:57 +02:00
  • b1de96824b
    ci : fix wikitext url + compile warnings (#5569) Georgi Gerganov 2024-02-18 22:39:30 +02:00
  • 7ad554f90e
    metal : fix unused warnings (#0) Georgi Gerganov 2024-02-18 21:39:58 +02:00
  • 5ee99c32f5
    common, server : surface min_keep as its own parameter (#5567) Robey Holderith 2024-02-18 11:11:16 -08:00
  • c145f8a132
    server : slots monitoring endpoint (#5550) Pierrick Hymbert 2024-02-18 18:39:57 +01:00
  • 689a091bbe
    sampling : do not set min_keep to n_probs (#5564) Georgi Gerganov 2024-02-18 19:38:06 +02:00
  • f3f28c5395
    cmake : fix GGML_USE_SYCL typo (#5555) Georgi Gerganov 2024-02-18 19:17:00 +02:00
  • e75c6279d1
    server : enhanced health endpoint (#5548) Pierrick Hymbert 2024-02-18 17:31:28 +01:00
  • 36376abe05
    server : document --n-predict option and cap it to the max value (#5549) Pierrick Hymbert 2024-02-18 17:30:09 +01:00
  • 66c1968f7a
    server : graceful server shutdown (#5244) Daniel Hiltgen 2024-02-18 08:23:16 -08:00
  • 1dcc3fde00
    common : fix ub (#5530) Georgi Gerganov 2024-02-18 18:21:52 +02:00
  • 5d3de51f97
    ggml, common, examples, tests : fixed type arguments in printf (#5528) Herman Semenov 2024-02-18 16:20:12 +00:00
  • fc0c8d286a
    llava : update surgery script to not remove tensors (#5536) Daniel Bevenius 2024-02-18 17:19:23 +01:00
  • bd2d4e393b
    1.5 bit quantization (#5453) Kawrakow 2024-02-18 18:16:55 +02:00
  • c8e0d7efeb
    flake.lock: Update github-actions[bot] 2024-02-18 00:17:07 +00:00
  • 8f1be0d42f
    ggml : add ALiBi support for ggml_soft_max_ext (#5488) Georgi Gerganov 2024-02-17 23:04:16 +02:00
  • 6e4e973b26
    ci : add an option to fail on compile warning (#3952) Ananta Bastola 2024-02-17 16:03:14 -05:00
  • d250c9d61d
    gitignore : update for CLion IDE (#5544) clibdev 2024-02-17 18:28:37 +02:00
  • 5bf2b94dd4
    cmake : fix VULKAN and ROCm builds (#5525) Georgi Gerganov 2024-02-16 19:05:56 +02:00
  • d2819d5577
    scripts : add helpers script for bench comparing commits (#5521) Georgi Gerganov 2024-02-16 15:14:40 +02:00
  • 4cb0727698
    llava : removed excess free(NULL) operation (#5531) Herman Semenov 2024-02-16 12:43:23 +00:00
  • 65085c713e
    llama : minor fix to returned int value (#5529) Herman Semenov 2024-02-16 11:45:48 +00:00
  • 6dcc02d244
    server : add "samplers" param to control the samplers order (#5494) Alexey Parfenov 2024-02-16 11:33:25 +00:00
  • 5f5808ca7b
    server : fix system prompt cli (#5516) Rőczey Barnabás 2024-02-16 11:00:56 +01:00
  • f486f6e1e5
    ggml : add numa options (#5377) bmwl 2024-02-16 01:31:07 -08:00
  • 60ed04cf82
    llava : fix clip-model-is-vision flag in README.md (#5509) Daniel Bevenius 2024-02-16 10:24:39 +01:00
  • 594845aab1
    ci : fix BERT model download and convert Georgi Gerganov 2024-02-16 09:57:55 +02:00
  • 4524290e87
    Use correct type of pooling for embedding models (#5500) Douglas Hanley 2024-02-15 11:21:49 -06:00
  • c06e45d729
    clip : fix wrong loop condition Georgi Gerganov 2024-02-15 18:49:08 +02:00
  • 9060a1e9df
    cuda : print message when initialization fails (#5512) slaren 2024-02-15 16:49:01 +01:00
  • 9350a1cf21
    scripts : add hf.sh helper script (#5501) Georgi Gerganov 2024-02-15 15:41:15 +02:00
  • 73122473ff
    fix(gguf-py): special tokens are no longer skipped when add_<token>_token is set to false (#5487) Michaël de Vries 2024-02-15 14:14:37 +01:00
  • 0d4177126b
    llava : fix memory management bug (#5491) Elbios 2024-02-15 09:01:57 +01:00
  • 7930a8a6e8
    llava : hotfix for llava-1.6 image number (#5495) John 2024-02-15 08:59:18 +01:00
  • 704359e299
    vulkan: Find optimal memory type but with fallback (#5381) Neuman Vong 2024-02-15 17:11:15 +11:00
  • 594fca3fef
    readme : fix typo (#5490) Rune 2024-02-14 16:15:49 +01:00
  • ccbb277f46
    llava : update README.md (#5489) John 2024-02-14 15:49:42 +01:00
  • 8084d55440
    cmake : ARM intrinsics detection for MSVC (#5401) Michael Podvitskiy 2024-02-14 11:49:01 +03:00
  • aa23412989
    llava : support v1.6 (#5267) John 2024-02-14 08:38:35 +01:00
  • f5ca054855
    Early return for zero size calls to get_tensor. (#5482) AT 2024-02-13 15:44:25 -06:00
  • 6c00a06692
    gguf : add python reader example (#5216) John 2024-02-13 18:56:38 +01:00
  • ea9c8e1143
    llama : add support for Nomic Embed (#5468) Jared Van Bortel 2024-02-13 12:03:53 -05:00
  • c4e6dd59e4
    llama : allow raw byte in SPM vocabs; don't crash on nl 404 (#5478) Aarni Koskela 2024-02-13 18:18:16 +02:00
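
Notes on selected commits

Several entries above concern the new chat-template API: 11b12de39b adds llama_chat_apply_template(), 7c8bcc11dc documents it, 9c405c9f9a switches the server over to it, and 373ee3fbba / a46f50747b add the Gemma and AlphaMonarch templates. A minimal usage sketch in C++, assuming the llama.h signature as of these commits (the helper name format_chat is ours):

    #include <cstdint>
    #include <string>
    #include <vector>
    #include "llama.h"

    // Render a conversation with the model's own chat template.
    static std::string format_chat(const llama_model * model,
                                   const std::vector<llama_chat_message> & msgs) {
        std::vector<char> buf(1024);
        // tmpl == nullptr selects the template stored in the model's GGUF
        // metadata; add_ass == true appends the assistant prompt prefix.
        int32_t n = llama_chat_apply_template(model, /*tmpl=*/nullptr,
                                              msgs.data(), msgs.size(),
                                              /*add_ass=*/true,
                                              buf.data(), (int32_t) buf.size());
        if (n > (int32_t) buf.size()) { // buffer too small: resize and retry
            buf.resize(n);
            n = llama_chat_apply_template(model, nullptr, msgs.data(), msgs.size(),
                                          true, buf.data(), (int32_t) buf.size());
        }
        return n < 0 ? std::string() : std::string(buf.data(), n);
    }

The call returns the number of bytes the formatted prompt needs, so callers resize and retry when the first buffer is too small; per a46f50747b, the server falls back to ChatML when a model's template is unrecognized.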
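
Commit a14679cc30 (IQ4_NL) is also worth a note: per its title, it packs 32 weights per block as 4-bit indices into a fixed non-linear codebook, plus a per-block fp16 scale (18 bytes per block, i.e. 4.5 bits per weight). A rough C++ sketch of that layout and its dequantization; the struct, field names, nibble order, and table values here are illustrative, not copied from ggml:

    #include <cstdint>

    // One block: 2-byte fp16 scale + 16 bytes of packed nibbles = 18 bytes
    // covering 32 weights (4.5 bits per weight).
    struct block_iq4_nl_sketch {
        uint16_t d;       // per-block scale, raw fp16 bits
        uint8_t  qs[16];  // 32 x 4-bit codebook indices, two per byte
    };

    // Illustrative non-linear codebook: step sizes widen away from zero,
    // giving small weights finer resolution than a uniform 4-bit grid.
    static const int8_t kvalues_sketch[16] = {
        -127, -104, -83, -65, -49, -35, -22, -10, 1, 13, 25, 38, 53, 69, 89, 113,
    };

    // 'scale' stands in for converting b.d from fp16 to float.
    static void dequantize_iq4_nl_sketch(const block_iq4_nl_sketch & b,
                                         float scale, float out[32]) {
        for (int i = 0; i < 16; ++i) {
            out[i]      = scale * kvalues_sketch[b.qs[i] & 0x0F]; // low nibbles
            out[i + 16] = scale * kvalues_sketch[b.qs[i] >> 4];   // high nibbles
        }
    }

The non-linear spacing is the point of the format: the 16 available levels are spent where weight values actually cluster rather than on a uniform grid.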