Commit graph

  • 18e437665c
    metal : fix flash attention kernel requirements (#7169) Georgi Gerganov 2024-05-10 18:20:10 +03:00
  • 8c660242d7
    convert : print "ignore_merges" field Georgi Gerganov 2024-05-10 17:53:04 +03:00
  • 25c6e82e7a
    llama : use n_vocab to differentiate between mistral 7B and llama3 8B (#7200) slaren 2024-05-10 14:28:01 +02:00
  • 4e3880978f
    Fix memory bug in grammar parser (#7194) Justine Tunney 2024-05-10 07:01:08 -04:00
  • f89fe2732c
    Main+: optionally allow special tokens from user in interactive mode (#7097) HanishKVC 2024-05-10 15:51:58 +05:30
  • d11afd6652
    llava : fix moondream support (#7163) Andrei 2024-05-10 02:41:10 -04:00
  • 8c570c9496
    Minor arithmetic improvement to mmvq wrapper kernel (#7172) Ouadie EL FAROUKI 2024-05-10 01:32:15 +01:00
  • eaf4bd8b39
    eval-callback : fix conversion to float (#7184) slaren 2024-05-10 01:04:12 +02:00
  • befddd0f15
    Vulkan Bugfixes and Improvements (#7084) 0cc4m 2024-05-09 20:39:54 +02:00
  • d46dbc76f8
    readme : add scheduled server workflow status badge Georgi Gerganov 2024-05-09 16:40:42 +03:00
  • 0961d86604
    readme : add app (#6371) l3utterfly 2024-05-09 22:32:40 +09:00
  • 43248e5594
    llama3 custom regex split (#6965) jaime-m-p 2024-05-09 15:30:44 +02:00
  • a743d76a01
    CUDA: generalize FP16 fattn vec kernel (#7061) Johannes Gäßler 2024-05-09 14:32:02 +02:00
  • f31ec120bc
    Add warning if token is invalid (#7173) Galunid 2024-05-09 14:13:05 +02:00
  • fd9f92b154
    llama : update llama_timings.n_p_eval setting (#7160) Daniel Bevenius 2024-05-09 13:03:29 +02:00
  • 22842164bc
    gguf-py : add special token modification capability (#7166) Sigbjørn Skjæret 2024-05-09 12:56:00 +02:00
  • 4734524882
    opencl : alignment size converted from bits to bytes (#7090) Albert Jin 2024-05-09 17:34:37 +08:00
  • 07cd41d096
    TypoFix (#7162) Ahmet Zeer 2024-05-09 11:16:45 +03:00
  • 4426e2987b
    cmake : fix typo (#7151) Jared Van Bortel 2024-05-08 19:55:32 -04:00
  • f98eb31c51
    convert-hf : save memory with lazy evaluation (#7075) compilade 2024-05-08 18:16:38 -04:00
  • bc4bba364f
    Introduction of CUDA Graphs to LLama.cpp (#6766) agray3 2024-05-08 21:55:49 +01:00
  • c12452c7ae
    JSON: [key] -> .at(key), assert() -> GGML_ASSERT (#7143) Johannes Gäßler 2024-05-08 21:53:08 +02:00
  • 9da243b36a
    Revert "llava : add support for moondream vision language model (#6899)" Georgi Gerganov 2024-05-08 22:14:39 +03:00
  • bd1871fa2b
    server : add themes + favicon (#6848) JohnnyB 2024-05-08 20:12:06 +01:00
  • 26458af1d6
    metal : use vm_allocate instead of posix_memalign on macOS (#7078) Gilad S 2024-05-08 22:08:10 +03:00
  • 83330d8cd6
    main : add --conversation / -cnv flag (#7108) Dawid Potocki 2024-05-09 02:32:32 +12:00
  • 465263d0cf
    sgemm : AVX Q4_0 and Q8_0 (#6891) Eve 2024-05-08 14:29:23 +00:00
  • 911b3900dd
    server : add_special option for tokenize endpoint (#7059) Johan 2024-05-08 14:27:58 +02:00
  • ad211edef5
    convert.py : --vocab-only generates false but valid params (#7027) 20kdc 2024-05-08 13:22:32 +01:00
  • 229ffff872
    llama : add BPE pre-tokenization for Qwen2 (#7114) Ren Xuancheng 2024-05-08 20:06:43 +08:00
  • 1fd9c1741d
    clean up json_value & server_log (#7142) Xuan Son Nguyen 2024-05-08 13:24:14 +02:00
  • 4cd621c26d
    convert : add BPE pre-tokenization for DBRX (#7132) DAN™ 2024-05-08 06:43:23 -04:00
  • 7e0b6a7b3b
    py : also print the normalizers Georgi Gerganov 2024-05-08 12:47:07 +03:00
  • acdce3cdef
    compare-llama-bench.py: add missing basicConfig (#7138) Brian 2024-05-08 18:54:39 +10:00
  • 3855416027
    ggml : introduce bfloat16 support (#6412) Justine Tunney 2024-05-08 02:30:09 -04:00
  • c0e6fbf8c3
    metal : fix unused warning Georgi Gerganov 2024-05-08 09:14:50 +03:00
  • c780e75305
    Further tidy on Android instructions README.md (#7077) Jeximo 2024-05-07 21:26:43 -03:00
  • 48b2f9c1fc
    Fixed save_imatrix to match old behaviour for MoE (#7099) jukofyork 2024-05-08 01:24:16 +01:00
  • af0a5b6163
    server: fix incorrectly reported token probabilities (#7125) Johannes Gäßler 2024-05-07 23:07:58 +02:00
  • b6aa670203
    Fix OLMo HF to GGUF conversion (#6910) nopperl 2024-05-07 19:39:43 +00:00
  • 260b7c6529
    server : update readme with undocumented options (#7013) Kyle Mistele 2024-05-07 13:44:29 -05:00
  • 53d6c52e22
    readme : update hot topics Georgi Gerganov 2024-05-07 21:43:13 +03:00
  • 3af34c1d1b
    main : update log text (EOS to EOG) (#7104) RhinoDevel 2024-05-07 19:51:31 +02:00
  • 04976db7a8
    docs: fix typos (#7124) omahs 2024-05-07 17:20:33 +02:00
  • 947d3ad27d
    ci : add GG_BUILD_EXTRA_TESTS_0 env (#7098) Georgi Gerganov 2024-05-07 11:08:49 +03:00
  • 858f6b73f6
    Add an option to build without CUDA VMM (#7067) William Tambellini 2024-05-06 11:12:14 -07:00
  • b3a995b416
    flake.lock: Update (#7079) Georgi Gerganov 2024-05-06 18:36:06 +03:00
  • bcdee0daa7
    minor : fix trailing whitespace Georgi Gerganov 2024-05-06 09:31:30 +03:00
  • 628b299106
    Adding support for the --numa argument for llama-bench. (#7080) kunnis 2024-05-05 07:17:47 -05:00
  • 8f8acc8683
    Disable benchmark on forked repo (#7034) Sigbjørn Skjæret 2024-05-05 13:38:55 +02:00
  • ca36326020
    readme : add note that LLaMA 3 is not supported with convert.py (#7065) Lyle Dean 2024-05-05 06:21:46 +01:00
  • 889bdd7686
    command-r : add BPE pre-tokenization (#7063) DAN™ 2024-05-05 01:19:30 -04:00
  • 6fbd432211
    py : logging and flake8 suppression refactoring (#7081) Brian 2024-05-05 15:07:48 +10:00
  • 842500144e
    gguf-split: add --no-tensor-first-split (#7072) Xuan Son Nguyen 2024-05-04 18:56:22 +02:00
  • cf768b7e71
    Tidy Android Instructions README.md (#7016) Jeximo 2024-05-04 13:10:15 -03:00
  • fcd84a0f5a
    Fix Linux /sys cpu path to guess number of cores (#7064) viric 2024-05-04 15:26:53 +02:00
  • 03fb8a002d
    If first token generated from the server is the stop word the server will crash (#7038) maor-ps 2024-05-04 12:06:40 +03:00
  • 92139b90af
    tests : add test-tokenizer-0.sh + fix some tokenizers (#7036) Georgi Gerganov 2024-05-04 08:32:32 +03:00
  • a2ac89d6ef
    convert.py : add python logging instead of print() (#6511) Brian 2024-05-04 05:36:41 +10:00
  • 433def286e
    llama : rename ctx to user_data in progress_callback (#7045) Daniel Bevenius 2024-05-03 15:24:30 +02:00
  • 60325fa56f
    Remove .attention from skipped tensors to match more accurately (#7051) Bartowski 2024-05-02 19:49:09 -04:00
  • 6ecf3189e0
    chore: fix typo in llama.cpp (#7032) alwqx 2024-05-02 23:56:41 +08:00
  • b0d943de17
    Update LOG_IMPL and LOG_TEE_IMPL (#7029) Andrew Downing 2024-05-01 17:31:30 -04:00
  • 8d608a81b7
    main : fix off by one error for context shift (#6921) l3utterfly 2024-05-02 04:27:41 +09:00
  • 3ea0d36000
    Server: add tests for batch size, different seeds (#6950) Johannes Gäßler 2024-05-01 17:52:55 +02:00
  • 1613ef8d8e
    CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) Johannes Gäßler 2024-05-01 14:46:37 +02:00
  • c4ec9c0d3d
    ci : exempt confirmed bugs from being tagged as stale (#7014) slaren 2024-05-01 07:13:59 +02:00
  • a8f9b07631
    perplexity: more statistics, added documentation (#6936) Johannes Gäßler 2024-04-30 23:36:27 +02:00
  • f364eb6fb5
    switch to using localizedDescription (#7010) Kevin Gibbons 2024-04-30 08:14:02 -07:00
  • 77e15bec62
    metal : remove deprecated error code (#7008) Georgi Gerganov 2024-04-30 15:52:21 +03:00
  • a68a1e7ed0
    metal : log more info on error (#6987) Kevin Gibbons 2024-04-30 02:34:50 -07:00
  • 9c67c2773d
    ggml : add Flash Attention (#5021) Georgi Gerganov 2024-04-30 12:16:08 +03:00
  • 952d03dbea
    convert : use utf8 encoding (#7000) Georgi Gerganov 2024-04-30 11:05:25 +03:00
  • 8843a98c2b
    Improve usability of --model-url & related flags (#6930) Olivier Chafik 2024-04-30 00:52:50 +01:00
  • b8c1476e44
    Extending grammar integration tests (#6644) Clint Herron 2024-04-29 14:40:14 -04:00
  • 5539e6fdd1
    main : fix typo in comment in main.cpp (#6985) Daniel Bevenius 2024-04-29 19:56:59 +02:00
  • b8a7a5a90f
    build(cmake): simplify instructions (cmake -B build && cmake --build build ...) (#6964) Olivier Chafik 2024-04-29 17:02:45 +01:00
  • d2c898f746
    ci : tmp disable gguf-split (#6983) Georgi Gerganov 2024-04-29 18:36:39 +03:00
  • 544f1f10ad
    ggml : fix __MSC_VER -> _MSC_VER (#6977) Georgi Gerganov 2024-04-29 17:55:02 +03:00
  • ffe666572f
    llava-cli : multiple images (#6969) cpumaxx 2024-04-29 07:34:24 -07:00
  • 24affa7db3
    readme : update hot topics Georgi Gerganov 2024-04-29 17:06:19 +03:00
  • f4ab2a4147
    llama : fix BPE pre-tokenization (#6920) Georgi Gerganov 2024-04-29 16:58:41 +03:00
  • 3f167476b1
    sampling : use std::random_device{}() for default random seed (#6962) David Renshaw 2024-04-29 09:35:45 -04:00
  • 3055a41805
    convert : fix conversion of some BERT embedding models (#6937) Christian Zhou-Zheng 2024-04-29 09:34:41 -04:00
  • 577277ffd2
    make : change GNU make default CXX from g++ to c++ (#6966) Przemysław Pawełczyk 2024-04-29 15:08:20 +02:00
  • ca7f29f568
    ci : add building in MSYS2 environments (Windows) (#6967) Przemysław Pawełczyk 2024-04-29 14:59:47 +02:00
  • c4f708a93f
    llama : fix typo LAMMAFILE -> LLAMAFILE (#6974) Johannes Gäßler 2024-04-29 14:36:22 +02:00
  • e00b4a8f81
    Fix more int overflow during quant (PPL/CUDA). (#6563) DAN™ 2024-04-28 18:38:44 -04:00
  • 7bb36ccf91
    gguf : enforce that tensor names are unique (#6905) Xuan Son Nguyen 2024-04-28 17:36:18 +02:00
  • ce023f6f2f
    add device version in device list (#6959) Neo Zhang 2024-04-28 22:40:31 +08:00
  • 6e472f58e4
    flake.lock: Update github-actions[bot] 2024-04-28 00:18:27 +00:00
  • 4dba7e8114
    Replace "alternative" boolean operator in conditional compilation directive (#6949) mgroeber9110 2024-04-27 21:02:06 +02:00
  • b7368332e2
    ci: server: tests python env on github container ubuntu latest / fix n_predict (#6935) Pierrick Hymbert 2024-04-27 17:50:48 +02:00
  • 928e0b7013
    Reset schedule earlier to allow overlap with ggml graph computation on device (#6933) agray3 2024-04-26 19:08:30 +01:00
  • 0c4d489e29
    quantize: add imatrix and dataset metadata in GGUF (#6658) Pierrick Hymbert 2024-04-26 20:06:33 +02:00
  • 017e6999b5
    add basic tensor data validation function (#6884) slaren 2024-04-26 18:39:58 +02:00
  • e2764cd7ca
    gguf : fix mismatch between alloc and free functions (#6929) slaren 2024-04-26 17:07:42 +02:00
  • 4b1c3c98b4
    llamafile : use 64-bit integers in sgemm (#6928) Justine Tunney 2024-04-26 10:05:33 -04:00
  • bbe3c6e761
    ci: server: fix python installation (#6925) Pierrick Hymbert 2024-04-26 12:27:25 +02:00
  • 7f5ff558ee
    server: stop generation at n_ctx_train if n_predict is not set (#6638) Pierrick Hymbert 2024-04-26 12:15:30 +02:00