Commit graph

  • 037259be68
    llama : make load error reporting more granular (#5477) Aarni Koskela 2024-02-13 15:24:50 +02:00
  • 263978904c
    finetune : rename feed-forward tensors (w1/w2/w3) (#4839) Daniel Bevenius 2024-02-13 14:15:42 +01:00
  • cf45252a7c
    tests : multi-thread the tokenizer tests (#5474) Georgi Gerganov 2024-02-13 15:14:22 +02:00
  • 03bf161eb6
    llama : support batched embeddings (#5466) Douglas Hanley 2024-02-13 06:06:58 -06:00
  • ad014bba97
    make: add error message for bad CUDA version (#5444) Johannes Gäßler 2024-02-13 12:38:37 +01:00
  • 49cc1f7d67
    bert : add tests + fix quantization (#5475) Georgi Gerganov 2024-02-13 13:01:29 +02:00
  • 99b8b43d7b
    tests : disable moe test (#5473) Georgi Gerganov 2024-02-13 11:20:24 +02:00
  • 895407f31b
    ggml-quants : fix compiler warnings (shadow variable) (#5472) Kawrakow 2024-02-13 09:07:57 +02:00
  • 099afc6274
    llama : fix quantization when tensors are missing (#5423) Georgi Gerganov 2024-02-12 20:14:39 +02:00
  • df334a1125
    swift : package no longer uses ggml dependency (#5465) Georgi Gerganov 2024-02-12 19:54:29 +02:00
  • dbd8828eb0
    py : fix persimmon n_rot conversion (#5460) Lee 2024-02-13 01:29:57 +08:00
  • 43fe07c1a4
    ggml-sycl: Replace 3d ops with macro (#5458) Abhilash Majumder 2024-02-12 20:22:05 +05:30
  • 4a46d2b792
    llava : remove prog parameter from ArgumentParser (#5457) Daniel Bevenius 2024-02-12 09:38:44 +01:00
  • 3b169441df
    sync : ggml (#5452) Georgi Gerganov 2024-02-12 09:16:06 +02:00
  • 3bdc4cd0f5
    CUDA: mul_mat_vec_q tiling, refactor mul mat logic (#5434) Johannes Gäßler 2024-02-11 19:08:39 +01:00
  • 2891c8aa9a
    Add support for BERT embedding models (#5423) Douglas Hanley 2024-02-11 10:21:38 -06:00
  • 97a336507e
    flake.lock: Update github-actions[bot] 2024-02-11 00:17:31 +00:00
  • c88c74f967
    vulkan: only use M-sized matmul on Apple GPUs (#5412) Sergio López 2024-02-11 15:12:00 +01:00
  • a803333a4e
    common : use enums for sampler types (#5418) Alexey Parfenov 2024-02-11 13:43:31 +00:00
  • 684780141a
    server : allow specifying tokens as strings in logit_bias (#5003) Alexey Parfenov 2024-02-11 13:38:14 +00:00
  • 85910c5b30
    main : print timing on ctrl+C in non-interactive mode (#3873) Georgi Gerganov 2024-02-11 15:35:50 +02:00
  • 139b62a839
    common : fix compile warning Georgi Gerganov 2024-02-11 15:33:43 +02:00
  • 0f2411f154
    ggml : fix compile warnings (unused vars) (#4966) Georgi Gerganov 2024-02-11 15:33:01 +02:00
  • a07d0fee1f
    ggml : add mmla kernels for quantized GEMM (#4966) snadampal 2024-02-11 07:22:33 -06:00
  • e4640d8fdf
    lookup: add print for drafting performance (#5450) Johannes Gäßler 2024-02-11 12:44:51 +01:00
  • 907e08c110
    server : add llama2 chat template (#5425) Xuan Son Nguyen 2024-02-11 11:16:22 +01:00
  • f026f8120f
    metal : use autoreleasepool to avoid memory leaks (#5437) Ian Bull 2024-02-10 02:53:28 -08:00
  • cd9aea63b5
    scripts : update sync scripts with new backends Georgi Gerganov 2024-02-10 09:53:05 +02:00
  • 43b65f5eb8
    sync : ggml Georgi Gerganov 2024-02-10 09:30:36 +02:00
  • 4633d93af0
    ggml : add abort_callback for cpu backend (ggml/725) Michael Podvitskiy 2024-02-09 10:42:27 +01:00
  • 4b7b38bef5
    vulkan: Set limit for task concurrency (#5427) Neuman Vong 2024-02-10 05:30:19 +11:00
  • e00d2a62dd
    llava : add requirements.txt and update README.md (#5428) Daniel Bevenius 2024-02-09 14:00:59 +01:00
  • 7c777fcd5d
    server : fix prompt caching for repeated prompts (#5420) Riley Stewart 2024-02-09 02:49:49 -08:00
  • e5ca3937c6
    llama : do not cap thread count when MoE on CPU (#5419) Paul Tsochantaris 2024-02-09 10:48:06 +00:00
  • e4124c2477
    readme : add JavaScript/Wasm repo (#5415) Marko Tasic 2024-02-09 11:17:00 +01:00
  • b2f87cb64d
    ggml : fix error C2078: too many initializers for MSVC ARM64 (#5404) Michael Podvitskiy 2024-02-09 10:56:43 +01:00
  • 44fbe34360
    Fix Vulkan crash on APUs with very little device memory (#5424) 0cc4m 2024-02-09 06:52:33 +01:00
  • 8e6a9d2de0
    CUDA: more warps for mmvq on NVIDIA (#5394) Johannes Gäßler 2024-02-08 21:56:40 +01:00
  • 41f308f58e
    llama : do not print "offloading layers" message in CPU-only builds (#5416) slaren 2024-02-08 21:33:03 +01:00
  • 6e99f2a04f
    Fix f16_sycl cpy call from Arc (#5411) Abhilash Majumder 2024-02-08 22:39:10 +05:30
  • ff4ff05c5f
    llava : add missing .py, and fix paths in README.md (#5414) Daniel Bevenius 2024-02-08 15:20:03 +01:00
  • b7b74cef36
    fix trailing whitespace (#5407) Johannes Gäßler 2024-02-08 11:36:54 +01:00
  • 4aa43fab56
    llama : fix MiniCPM (#5392) runfuture 2024-02-08 18:36:19 +08:00
  • a6e514a85f
    llava: fix typo/formatting in README.md (#5405) Daniel Bevenius 2024-02-08 09:58:19 +01:00
  • 26d4efd11e
    sampling: fix top_k <= 0 (#5388) Johannes Gäßler 2024-02-08 09:46:30 +01:00
  • 8504d2d0da
    tests : .gitignore obj files Georgi Gerganov 2024-02-08 09:46:47 +02:00
  • c4fbb6717c
    CMAKE_OSX_ARCHITECTURES for macOS cross compilation (#5393) Michael Podvitskiy 2024-02-07 22:39:23 +01:00
  • 8c933b70c2
    fix typo in readme (#5399) Ebey Abraham 2024-02-07 21:11:30 +00:00
  • b906596bb7
    Add Ava to the list of llama.cpp UIs (#4362) Kamil Tomšík 2024-02-07 19:44:52 +01:00
  • aa7ab99be2
    CUDA: fixed mmvq kernel for bs 2,3,4 and -sm row (#5386) Johannes Gäßler 2024-02-07 12:40:26 +01:00
  • 10afa6f1d1
    [SYCL] update install guide for building with make via w64devkit (#5297) Neo Zhang Jianyu 2024-02-07 18:16:55 +08:00
  • 0ef46da632
    llava-cli : always tokenize special tokens (#5382) Xiao-Yong Jin 2024-02-07 02:17:25 -06:00
  • ee1628bdfe
    Basic Vulkan Multi-GPU implementation (#5321) 0cc4m 2024-02-07 07:54:50 +01:00
  • ed0bf32290
    readme : modernize (#5379) Eve 2024-02-07 06:21:30 +00:00
  • 9a697d842b
    readme : update ui list (#5354) Ben Williams 2024-02-06 22:16:48 -08:00
  • 316c7faf77
    llama : add MiniCPM support (#5346) runfuture 2024-02-07 14:15:56 +08:00
  • f3e2b4fa3f
    server : update /props with "total_slots" value (#5373) Justin Parker 2024-02-07 01:15:19 -05:00
  • f68664ac24
    convert : fix TypeError on GPT-2 vocab.json (#5288) Sang-Kil Park 2024-02-07 13:28:00 +09:00
  • 213d1439fa
    server : remove model.json endpoint (#5371) Alexey Parfenov 2024-02-06 18:08:38 +00:00
  • 17c97fb062
    CUDA: mul_mat_vec_q max. batch size 8 -> 4 (#5370) Johannes Gäßler 2024-02-06 18:43:06 +01:00
  • b08f22c882
    Update README.md (#5366) Kawrakow 2024-02-06 19:00:16 +02:00
  • f57fadc009
    Slight quantization improvement for Q4_K and Q5_K (#5361) Kawrakow 2024-02-06 17:28:02 +02:00
  • 2e9c0bd6b3
    readme : add phi, orion 14b, internlm2, and yi-VL to readme (#5362) BarfingLemurs 2024-02-06 09:06:48 -05:00
  • 2c516611f1
    CUDA: mul_mat_vec_q for batch sizes > 1 (#5351) Johannes Gäßler 2024-02-06 14:44:06 +01:00
  • 8a79c591de
    server : include total "num_slots" in props endpoint (#5349) Justin Parker 2024-02-06 04:20:59 -05:00
  • 31e7903221
    server : add dynatemp_range and dynatemp_exponent (#5352) Michael Coppola 2024-02-06 04:20:00 -05:00
  • 4ffc7a17d4
    server : various fixes for the prompt field in /completion (#5300) Niall Coates 2024-02-06 08:16:23 +00:00
  • 906cff55c2
    py : handle byte tokens in get_token_type (#5341) Georgi Gerganov 2024-02-06 07:47:22 +02:00
  • 098f6d737b
    make: Use ccache for faster compilation (#5318) Johannes Gäßler 2024-02-05 19:33:00 +01:00
  • 78b00dda6c
    README: updated introduction (#5343) Johannes Gäßler 2024-02-05 15:55:10 +01:00
  • c6b395535a
    ggml : make use of ggml-quants.h possible in C++ code (#5338) Kawrakow 2024-02-05 14:09:47 +02:00
  • abb61944a5
    ggml : avoid duplicating function calls using MIN/MAX macros (#5325) Dr. Tom Murphy VII Ph.D 2024-02-05 06:13:57 -05:00
  • 89503dcb5f
    iq3_xxs: guards for the no-imatrix situation (#5334) Kawrakow 2024-02-05 12:32:27 +02:00
  • 7e1ae372f3
    py : fix internlm2-hf convert to gguf (#5305) Guoteng 2024-02-05 17:04:06 +08:00
  • 6fdfa2ecc6
    iq2_xxs: tune quantization (#5320) Kawrakow 2024-02-05 10:46:06 +02:00
  • a2d60c9158
    server : allow retrieving default generation settings for completion (#5307) Alexey Parfenov 2024-02-05 08:10:22 +00:00
  • e6f8177532
    common : add dynamic temperature parameters to main example cli (#5295) l3utterfly 2024-02-05 17:00:47 +09:00
  • 30679d438d
    scripts : fix typos, cleanup (#5303) Georgi Gerganov 2024-02-05 09:48:03 +02:00
  • 4be04c8965
    scripts : add non-interactive server-llm.sh (#5303) Нияз Гарифзянов 2024-02-05 10:43:57 +03:00
  • 5d55b0cd82
    readme : add CodeShell models to the supported models list (#5330) chiranko 2024-02-05 15:41:38 +08:00
  • 4833ac209d
    [SYCL] Fix cpy with dims of 3 (#5289) AidanBeltonS 2024-02-05 07:08:24 +00:00
  • 9392ebd49e
    flake.lock: Update github-actions[bot] 2024-02-04 00:17:24 +00:00
  • 5ed26e1fc9
    Adding some imatrix tools (#5302) Kawrakow 2024-02-04 10:39:58 +02:00
  • 277fad30c6
    cmake : use set() for LLAMA_WIN_VER (#5298) Welby Seely 2024-02-03 23:18:51 -05:00
  • 3c0d25c475
    make: add nvcc info print (#5310) Johannes Gäßler 2024-02-03 20:15:13 +01:00
  • 3cc5ed353c
    make: fix nvcc optimization flags for host code (#5309) Johannes Gäßler 2024-02-03 20:14:59 +01:00
  • 60ecf099ed
    add Vulkan support to Nix flake Martin Schwaighofer 2024-01-28 12:59:43 +01:00
  • e920ed393d
    Vulkan Intel Fixes, Optimizations and Debugging Flags (#5301) 0cc4m 2024-02-03 18:15:00 +01:00
  • 52bb63c708
    refactor : switch to emplace_back to avoid extra object (#5291) Michael Klimenko 2024-02-03 12:23:37 +01:00
  • 1ec3332ade
    YaRN : store rope scaling type as int32_t in memory (#5285) Jared Van Bortel 2024-02-03 06:22:06 -05:00
  • 6a66c5071a
    readme : add tenere in the ui tools list (#5284) BADR 2024-02-03 12:20:26 +01:00
  • a305dba8ff
    Fix im2col with fp32 (#5286) AidanBeltonS 2024-02-03 08:11:37 +00:00
  • 191221178f
    perplexity : fix KL divergence calculations on Windows (#5273) kalomaze 2024-02-02 08:15:30 -06:00
  • e437b37fd0
    scripts : parse wtype in server-llm.sh (#5167) Georgi Gerganov 2024-02-02 14:23:40 +02:00
  • 2d40085c26
    py : add check for '.attn.masked_bias' layers to GPT2model (#5281) Mirror Azure 2024-02-02 14:39:09 +03:00
  • b05102fe8c
    Tidy ggml-sycl (#5261) AidanBeltonS 2024-02-02 08:39:48 +00:00
  • 6b91b1e0a9
    docker : add build for SYCL, Vulkan + update readme (#5228) Xuan Son Nguyen 2024-02-02 08:56:31 +01:00
  • e805f0fa99
    [SYCL] get MAX_MEM_ALLOC from device property (#5270) Meng, Hengyu 2024-02-02 15:54:14 +08:00
  • af3ba5d946
    [SYCL] update guide of SYCL backend (#5254) Neo Zhang Jianyu 2024-02-02 15:53:27 +08:00
  • e1e721094d
    llama : fix memory leak in llama_batch_free (#5252) Ian Bull 2024-02-01 23:20:13 -08:00