Commit graph

  • d84c48505f
    llama : fix Baichuan2 13B (#6092) slaren 2024-03-15 22:14:16 +01:00
  • 877b4d0c62
    llama : add support for control vectors (#5970) Theia Vogel 2024-03-15 13:43:02 -07:00
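For context on 877b4d0c62: control vectors steer generation by adding a learned direction, scaled by a user-chosen strength, to the hidden state at selected layers. A minimal C sketch of the core operation (function name and signature are illustrative, not the llama.cpp API):

```c
// Add a steering direction, scaled by `strength`, to one hidden-state
// vector of length n. Applied per layer during inference.
void apply_control_vector(float *hidden, const float *cvec,
                          float strength, int n) {
    for (int i = 0; i < n; i++) {
        hidden[i] += strength * cvec[i];
    }
}
```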
  • 12247f4c69
    llama : add Command-R support (#6033) Andrew Canis 2024-03-15 16:41:22 -04:00
  • 4e9a7f7f7f
    llava : change API to pure C style for Rust FFI bindgen (#6079) Ting Lou 2024-03-15 22:31:05 +08:00
  • 3020327f6c
    cuda : disable unused cudaLaunchHostFunc code (#6078) slaren 2024-03-15 13:24:03 +01:00
  • 46acb36767
    fix set main gpu error (#6073) Neo Zhang Jianyu 2024-03-15 18:53:53 +08:00
  • 131b058409
    make : ggml-metal.o depends on ggml.h Georgi Gerganov 2024-03-15 11:36:50 +02:00
  • 753e36f650
    [SYCL] Fix non-intel device selection (#6042) AidanBeltonS 2024-03-15 09:26:20 +00:00
  • 7ce2c77f88
    gguf : add support for I64 and F64 arrays (#6062) Ondřej Čertík 2024-03-15 02:46:51 -06:00
  • aab606a11f
    llama : add Orion chat template (#6066) Xuan Son Nguyen 2024-03-15 09:44:57 +01:00
  • b0bc9f4a9d
    llama-bench : use random tokens to improve accuracy with mixtral (#6069) slaren 2024-03-15 09:22:24 +01:00
  • 4755afd1cb
    llama : fix integer overflow during quantization (#6063) Georgi Gerganov 2024-03-14 22:58:41 +02:00
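For context on 4755afd1cb: the usual failure mode behind this class of bug is accumulating 32-bit products in a 32-bit integer, which overflows quickly on large tensors; widening to `int64_t` before the multiply avoids it. An illustrative sketch of the pattern, not the actual fix in that commit:

```c
#include <stdint.h>
#include <stddef.h>

// Cast to int64_t before multiplying so the product and the running
// sum cannot overflow a 32-bit accumulator.
int64_t dot_i32(const int32_t *a, const int32_t *b, size_t n) {
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int64_t) a[i] * (int64_t) b[i];
    }
    return acc;
}
```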
  • 6e0438da3c
    gguf : fix resource leaks (#6061) Steve Grubb 2024-03-14 14:29:32 -04:00
  • 727107707a
    gguf-py : bump version to 0.8.0 (#6060) Ondřej Čertík 2024-03-14 11:57:31 -06:00
  • 69ff61397d
    llama : support models without vocabulary (#5798) Michael Podvitskiy 2024-03-14 17:21:56 +01:00
  • 044ec4b2a5
    embedding : add EOS token if not present (#899) Georgi Gerganov 2024-03-14 15:14:14 +02:00
  • 77178eedc8
    gguf-py : fix dtype check (#6045) Georgi Gerganov 2024-03-14 13:32:14 +02:00
  • 15a333260a
    readme : improve readme for Llava-1.6 example (#6044) Jian Liao 2024-03-14 04:18:23 -07:00
  • 43241adf22
    server: disable debug release type sanitizer, simplify trigger (#6047) Pierrick Hymbert 2024-03-14 12:15:39 +01:00
  • a44bc969e4
    llama : fix typo Georgi Gerganov 2024-03-14 13:13:06 +02:00
  • 2c4fb69246
    llama : optimize defrag moves + fix fragmentation calculation (#6037) Michael Podvitskiy 2024-03-14 11:56:48 +01:00
  • 3ca23481dd
    gguf-py : add support for I8, I16 and I32 (#6045) Ondřej Čertík 2024-03-14 04:40:14 -06:00
  • 3fe8d7a17f
    ggml : designate enum vals for integer types (#6050) Georgi Gerganov 2024-03-14 12:38:37 +02:00
  • 68265ebfc6
    embedding : print all resulting embeddings (#899) Georgi Gerganov 2024-03-14 12:37:20 +02:00
  • 381da2d9f0
    metal : build metallib + fix embed path (#6015) Georgi Gerganov 2024-03-14 11:55:23 +02:00
  • 0fd6c1f015
    embedding : print cosine similarity (#899) Georgi Gerganov 2024-03-14 10:12:29 +02:00
  • 19885d205e
    readme : update details about running llama in Termux on Android (#6039) Linwei Wang 2024-03-14 02:34:40 +08:00
  • 76a936c893
    readme : update API changes and hot topics Georgi Gerganov 2024-03-13 20:33:56 +02:00
  • 463628372d
    grammar : handle missing "root" node (#6004) Clint Herron 2024-03-13 14:10:40 -04:00
  • f30ea47a87
    llama : add pipeline parallelism support (#6017) slaren 2024-03-13 18:54:21 +01:00
  • d8fd0ccf6a
    test-backend-ops : skip CPU backend by default (#6028) slaren 2024-03-13 14:58:30 +01:00
  • b3d978600f
    Update get version (#6025) AidanBeltonS 2024-03-13 13:17:54 +00:00
  • 99b71c068f
    Server: Use multi-task for embeddings endpoint (#6001) Xuan Son Nguyen 2024-03-13 11:39:11 +01:00
  • 306d34be7a
    ci : remove tidy-review (#6021) slaren 2024-03-12 16:55:19 +01:00
  • 8030da7afe
    ggml : reuse quantum structs across backends (#5943) Georgi Gerganov 2024-03-12 14:27:20 +02:00
  • 184215e783
    ggml : fix UB in IQ2_S and IQ3_S (#6012) Georgi Gerganov 2024-03-12 13:49:55 +02:00
  • 48358b2e5b
    sycl : update IQ1_S kernels (WIP - not working!) (#5995) Georgi Gerganov 2024-03-12 11:15:05 +02:00
  • 5cdb371731
    grammar : fix unnecessarily retained pointer to rules (#6003) gliptic 2024-03-11 20:59:03 +01:00
  • 44ca159faf
    1.5 bit: we can do even better (#5999) Kawrakow 2024-03-11 16:53:15 +01:00
  • 05b06210c9
    llama : more consistent names of count variables (#5994) Georgi Gerganov 2024-03-11 17:49:47 +02:00
  • 83796e62bc
    llama : refactor unicode stuff (#5992) Georgi Gerganov 2024-03-11 17:47:47 +02:00
  • 828defefb6
    Update server docker image URLs (#5997) Jakub N 2024-03-11 14:40:42 +01:00
  • caa106d4e0
    Server: format error to json (#5961) Xuan Son Nguyen 2024-03-11 10:56:41 +01:00
  • 3202361c5b
    ggml, ci : Windows ARM runner and build fixes (#5979) Michael Podvitskiy 2024-03-11 10:28:51 +01:00
  • 332bdfd798
    server : maintain chat completion id for streaming responses (#5988) Minsoo Cheong 2024-03-11 17:09:32 +09:00
  • ecab1c75de
    cmake : fix subdir for LLAMA_METAL_EMBED_LIBRARY (#5985) Gilad S 2024-03-11 10:00:08 +02:00
  • ee35600b90
    llama : fix F16/F32 downcast + improve names (#5980) Georgi Gerganov 2024-03-11 09:56:47 +02:00
  • be858f6205
    Better 1.5 bit quantization (#5971) Kawrakow 2024-03-11 07:51:49 +01:00
  • ef3ced26a3
    [SYCL] Add q3_s and q1_s (#5886) Abhilash Majumder 2024-03-11 10:27:56 +05:30
  • 3814a07392
    [SYCL] Add support for SYCL Nvidia target (#5738) AidanBeltonS 2024-03-11 01:13:57 +00:00
  • bb6d00bbf9
    metal : move mm_id indices to shared mem (#5982) Georgi Gerganov 2024-03-10 23:12:48 +02:00
  • 7ab7b733bb
    android : fix utf8 decoding error (#5935) Dean 2024-03-11 04:03:17 +08:00
  • d9f65c97c3
    readme : update hot topics Georgi Gerganov 2024-03-10 20:58:26 +02:00
  • b838b53ad6
    sync : ggml Georgi Gerganov 2024-03-10 20:10:46 +02:00
  • df4dc3e7cb
    ggml : try fix 32-bit arm compat (whisper/1938) Georgi Gerganov 2024-03-08 23:45:07 +02:00
  • bf47a5eefc
    ggml : remove __constant__ specifier for CUDA tables (#5940) Georgi Gerganov 2024-03-10 20:09:24 +02:00
  • fa8a809a91
    server: ci: windows build and tests (#5968) Pierrick Hymbert 2024-03-10 18:17:47 +01:00
  • bcebd7dbf6
    llama : add support for GritLM (#5959) DAN™ 2024-03-10 11:56:30 -04:00
  • 2960eae847
    grammar : verify parsed state (#5950) Clint Herron 2024-03-10 11:17:43 -04:00
  • c78541479c
    nix: update flake.lock (#5969) Georgi Gerganov 2024-03-10 16:43:08 +02:00
  • 621e86b331
    server: benchmark: chat/completions scenario and other llm servers comparison (#5941) Pierrick Hymbert 2024-03-09 23:41:49 +01:00
  • 77d1ac7e00
    server : print chat template info Georgi Gerganov 2024-03-09 22:04:00 +02:00
  • d894f352bf
    perplexity : support using multiple sequences to allow larger batch sizes (#5946) slaren 2024-03-09 19:55:54 +01:00
  • 098dbaab44
    readme : update hot topics Georgi Gerganov 2024-03-09 18:14:13 +02:00
  • 8380ecfb21
    ggml : fix unnecessary f32 -> f16 -> f32 casts (mmla) (#5951) Georgi Gerganov 2024-03-09 17:36:20 +02:00
  • 58308a0ecc
    server : fix metrics init (#5964) Georgi Gerganov 2024-03-09 17:34:15 +02:00
  • 5b09797321
    ggml : remove old quantization functions (#5942) Georgi Gerganov 2024-03-09 15:53:59 +02:00
  • 97c09585d6
    server : clarify some items in the readme (#5957) Georgi Gerganov 2024-03-09 15:47:47 +02:00
  • fb215c3832
    server : normalize embeddings (#5956) SeungWon Jeong 2024-03-09 21:27:58 +09:00
  • 2c4f566c88
    tests : gitignore ggml-common.h Georgi Gerganov 2024-03-09 14:17:11 +02:00
  • 0db32beaf0
    server : fix passing prompt as tokens (#5955) Alexey Parfenov 2024-03-09 11:16:53 +00:00
  • 8a3012a4ad
    ggml : add ggml-common.h to deduplicate shared code (#5940) Georgi Gerganov 2024-03-09 12:47:57 +02:00
  • 9674aaf35c
    server : simplify logic for empty prompts (#5953) Georgi Gerganov 2024-03-09 12:34:18 +02:00
  • 950ba1ab84
    Server: reorganize some http logic (#5939) Xuan Son Nguyen 2024-03-09 11:27:53 +01:00
  • e1fa9569ba
    server : add SSL support (#5926) Gabe Goodhart 2024-03-09 02:57:09 -07:00
  • fd72d2d2a5
    server: tests: add truncated prompt tests, better kv cache size (#5933) Pierrick Hymbert 2024-03-09 10:30:04 +01:00
  • c2101a2e90
    llama : support Mamba Selective State Space Models (#5328) compilade 2024-03-08 17:31:00 -05:00
  • 515f7d0d4f
    llama : fix quantization of shared token_embd (#5944) compilade 2024-03-08 10:53:37 -05:00
  • 76e868821a
    server: metrics: add llamacpp:prompt_seconds_total and llamacpp:tokens_predicted_seconds_total, reset bucket only on /metrics. Fix values cast to int. Add Process-Start-Time-Unix header. (#5937) Pierrick Hymbert 2024-03-08 12:25:04 +01:00
  • e457fb3540
    llama : assume tied weights if lm_head/output weights is missing (#5824) Don Mahurin 2024-03-08 02:41:50 -08:00
  • af37fd8b30
    server : fix EOS token detection with disabled cache (#5938) Georgi Gerganov 2024-03-08 12:40:02 +02:00
  • 581ed5c4fe
    log : fix MSVC compile errors (#5643) UEXTM.com 2024-03-08 04:35:04 -05:00
  • 6cdabe6526
    llama-bench : add embeddings option (#5924) Georgi Gerganov 2024-03-07 16:32:38 +02:00
  • 89fb735fcf
    Revert "[SYCL] fix error when set main gpu to non-zero (#5901)" (#5918) Neo Zhang Jianyu 2024-03-07 19:14:49 +08:00
  • 55a2a900ff
    server : add /v1/completions endpoint (#5914) Minsoo Cheong 2024-03-07 19:42:39 +09:00
  • 2002bc96bf
    server : refactor (#5882) Georgi Gerganov 2024-03-07 11:41:53 +02:00
  • ceca1aef07
    [SYCL] fix error when set main gpu to non-zero (#5901) Neo Zhang Jianyu 2024-03-07 16:34:31 +08:00
  • e04e04f8fa
    ggml : use SYS_get_cpu if SYS_getcpu is not defined (#5906) Jared Van Bortel 2024-03-06 15:42:23 -05:00
  • e25fb4b18f
    ggml : use uint8x16_t return type for ggml_vqtbl1q_u8 (#5894) bobqianic 2024-03-06 07:35:07 +00:00
  • 1e35d619a6
    convert : remove AWQ remnants (#5768) Georgi Gerganov 2024-03-06 09:12:25 +02:00
  • 8ced9f7e32
    add wait() to make code stable (#5895) Neo Zhang Jianyu 2024-03-06 12:08:32 +08:00
  • 652ca2bded
    compare-llama-bench.py : remove mul_mat_q (#5892) slaren 2024-03-05 22:27:29 +01:00
  • bd836944f8
    quants : use MM256_SET_M128I consistently to fix gcc 7 build (#5889) Jared Van Bortel 2024-03-05 11:56:37 -05:00
  • 3de31677d3
    grammars : blacklists character control set (#5888) ExtReMLapin 2024-03-05 17:33:08 +01:00
  • 82cb31eb93
    Revert "grammars : don't allow to output unescaped new line in string (#5885)" Georgi Gerganov 2024-03-05 15:56:24 +02:00
  • b1a4e994fd
    grammars : don't allow to output unescaped new line in string (#5885) ExtReMLapin 2024-03-05 14:44:29 +01:00
  • 61d1c88e15
    Vulkan Improvements (#5835) 0cc4m 2024-03-05 13:33:42 +01:00
  • 21b0867433
    [SYCL] fix mul_mat fault in CI/unit-test (#5862) Neo Zhang Jianyu 2024-03-05 16:08:35 +08:00
  • 6a87ac3a52
    fix editorconfig check break (#5879) Minsoo Cheong 2024-03-05 15:12:23 +09:00
  • 29eee40474
    fix speculative decoding build on windows (#5874) Jeffrey Quesnelle 2024-03-04 19:23:06 -08:00