Commit graph

  • 470939d483
    common : preallocate sampling token data vector (#8363) Kevin Wang 2024-07-08 03:26:53 -04:00
  • 6f0dbf6ab0
    infill : assert prefix/suffix tokens + remove old space logic (#8351) Georgi Gerganov 2024-07-08 09:34:35 +03:00
  • ffd00797d8
    common : avoid unnecessary logits fetch (#8358) Kevin Wang 2024-07-08 02:31:55 -04:00
  • 04ce3a8b19
    readme : add supported glm models (#8360) toyer 2024-07-08 13:57:19 +08:00
  • 3fd62a6b1c
    py : type-check all Python scripts with Pyright (#8341) compilade 2024-07-07 15:04:39 -04:00
  • a8db2a9ce6
    Update llama-cli documentation (#8315) Denis Spasyuk 2024-07-07 09:08:28 -06:00
  • 4090ea5501
    ci : add checks for cmake,make and ctest in ci/run.sh (#8200) Alex Tuddenham 2024-07-07 15:59:14 +01:00
  • f1948f1e10
    readme : update bindings list (#8222) Andy Tai 2024-07-07 06:21:37 -07:00
  • f7cab35ef9
    gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#8048) Brian 2024-07-07 22:58:43 +10:00
  • 905942abdb
    llama : support glm3 and glm4 (#8031) toyer 2024-07-07 20:52:10 +08:00
  • b5040086d4
    llama : fix n_rot default (#8348) Georgi Gerganov 2024-07-07 14:59:02 +03:00
  • d39130a398
    py : use cpu-only torch in requirements.txt (#8335) compilade 2024-07-07 07:23:38 -04:00
  • b81ba1f96b
    finetune: Rename command name in README.md (#8343) standby24x7 2024-07-07 19:38:02 +09:00
  • 210eb9ed0a
    finetune: Rename an old command name in finetune.sh (#8344) standby24x7 2024-07-07 19:37:47 +09:00
  • cb4d86c4d7
    server: Retrieve prompt template in /props (#8337) Bjarke Viksøe 2024-07-07 11:10:38 +02:00
  • 86e7299ef5
    added support for Authorization Bearer tokens when downloading model (#8307) Derrick T. Woolworth 2024-07-06 15:32:04 -05:00
  • 60d83a0149
    update main readme (#8333) Xuan Son Nguyen 2024-07-06 19:01:23 +02:00
  • 87e25a1d1b
    llama : add early return for empty range (#8327) Daniel Bevenius 2024-07-06 09:22:16 +02:00
  • 213701b51a
    Detokenizer fixes (#8039) jaime-m-p 2024-07-05 19:01:35 +02:00
  • be20e7f49d
    Reorganize documentation pages (#8325) Xuan Son Nguyen 2024-07-05 18:08:32 +02:00
  • 7ed03b8974
    llama : fix compile warning (#8304) Georgi Gerganov 2024-07-05 17:32:09 +03:00
  • 1d894a790e
    cmake : add GGML_BUILD and GGML_SHARED macro definitions (#8281) Natsu 2024-07-05 22:29:35 +08:00
  • 1f3e1b66e2
    Enabled more data types for oneMKL gemm_batch (#8236) Ouadie EL FAROUKI 2024-07-05 13:23:25 +01:00
  • 148ec970b6
    convert : remove AWQ remnants (#8320) Georgi Gerganov 2024-07-05 10:15:36 +03:00
  • 2cccbaa008
    llama : minor indentation during tensor loading (#8304) Georgi Gerganov 2024-07-05 10:15:24 +03:00
  • 8e558309dc
    CUDA: MMQ support for iq4_nl, iq4_xs (#8278) Johannes Gäßler 2024-07-05 09:06:31 +02:00
  • 0a423800ff
    CUDA: revert part of the RDNA1 optimizations (#8309) Daniele 2024-07-05 07:06:09 +00:00
  • d12f781074
    llama : streamline embeddings from "non-embedding" models (#8087) Douglas Hanley 2024-07-05 02:05:56 -05:00
  • bcefa03bc0
    CUDA: fix MMQ stream-k rounding if ne00 % 128 != 0 (#8311) Johannes Gäßler 2024-07-05 09:05:34 +02:00
  • 5a7447c569
    readme : fix minor typos [no ci] (#8314) Pieter Ouwerkerk 2024-07-05 02:58:41 -04:00
  • 61ecafa390
    passkey : add short intro to README.md [no-ci] (#8317) Daniel Bevenius 2024-07-05 08:14:24 +02:00
  • aa5898dc53
    llama : prefer n_ over num_ prefix (#8308) Georgi Gerganov 2024-07-05 09:10:03 +03:00
  • 6c05752c50
    contributing : update guidelines (#8316) Georgi Gerganov 2024-07-05 09:09:47 +03:00
  • a9554e20b6
    [SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266) luoyu-intel 2024-07-05 05:06:13 +00:00
  • e235b267a2
    py : switch to snake_case (#8305) Georgi Gerganov 2024-07-05 07:53:33 +03:00
  • f09b7cb609
    rm get_work_group_size() by local cache for performance (#8286) Neo Zhang Jianyu 2024-07-05 10:32:29 +08:00
  • a38b884c6c
    cli: add EOT when user hit Ctrl+C (#8296) Xuan Son Nguyen 2024-07-04 20:55:03 +02:00
  • d7fd29fff1
    llama : add OpenELM support (#7359) Icecream95 2024-07-05 05:14:21 +12:00
  • 6f63d646c1
    tokenize : add --show-count (token) option (#8299) Daniel Bevenius 2024-07-04 18:38:58 +02:00
  • 51d2ebadbb
    build: Export hf-to-gguf as snakecase ditsuke 2024-07-04 20:54:35 +05:30
  • 1e920018d3
    doc: Add context for why we add an explicit pytorch source ditsuke 2024-07-03 01:02:56 +05:30
  • 01a5f06550
    chore: Remove rebase artifacts ditsuke 2024-07-02 15:48:13 +05:30
  • 07786a61a2
    chore: Fixup requirements and build ditsuke 2024-07-02 15:35:43 +05:30
  • de14e2ea2b
    chore: ignore all __pychache__ ditsuke 2024-07-02 15:18:13 +05:30
  • 821922916f
    fix: Update script paths in CI scripts ditsuke 2024-03-10 23:21:46 +05:30
  • b1c3f26e5e
    fix: Actually include scripts in build ditsuke 2024-02-29 01:47:15 +05:30
  • b0a46993df
    build(python): Package scripts with pip-0517 compliance ditsuke 2024-02-27 12:01:02 +05:30
  • 807b0c49ff
    Inference support for T5 and FLAN-T5 model families (#5763) fairydreaming 2024-07-04 15:46:11 +02:00
  • f8c4c0738d
    tests : add _CRT_SECURE_NO_WARNINGS for WIN32 (#8231) Daniel Bevenius 2024-07-04 12:53:42 +02:00
  • 402d6feffa
    llama : suppress unref var in Windows MSVC (#8150) Daniel Bevenius 2024-07-04 12:50:57 +02:00
  • 20fc3804bf
    convert : fix gemma v1 tokenizer convert (#8248) Georgi Gerganov 2024-07-04 10:41:03 +03:00
  • f619024764
    [SYCL] Remove unneeded semicolons (#8280) AidanBeltonS 2024-07-04 02:07:19 +01:00
  • d23287f122
    Define and optimize RDNA1 (#8085) Daniele 2024-07-03 23:02:58 +00:00
  • 5f2d4e60e2
    ppl : fix n_seq_max for perplexity (#8277) slaren 2024-07-03 19:33:31 +02:00
  • 916248af1f
    fix phi 3 conversion (#8262) Xuan Son Nguyen 2024-07-03 16:01:54 +02:00
  • f8d6a23804
    fix typo (#8267) Judd 2024-07-03 20:40:16 +08:00
  • fadde67135
    Dequant improvements rebase (#8255) AidanBeltonS 2024-07-03 02:55:34 +01:00
  • a27152b602
    fix: add missing short command line argument -mli for multiline-input (#8261) MistApproach 2024-07-02 22:56:46 +02:00
  • 3e2618bc7b
    Adding step to clean target to remove legacy binary names to reduce upgrade / migration confusion arising from #7809. (#8257) Clint Herron 2024-07-02 13:19:56 -04:00
  • 07a3fc0608
    Removes multiple newlines at the end of files that is breaking the editorconfig step of CI. (#8258) Clint Herron 2024-07-02 12:18:10 -04:00
  • 968967376d
    Add JAIS model(s) (#8118) Faisal Zaghloul 2024-07-02 10:36:00 -04:00
  • 023b8807e1
    convert-hf : print output file name when completed (#8181) Daniel Bevenius 2024-07-02 08:40:49 +02:00
  • 0e0590adab
    cuda : update supports_op for matrix multiplication (#8245) slaren 2024-07-02 08:39:38 +02:00
  • a9f3b10215
    [SYCL] Fix win build conflict of math library (#8230) luoyu-intel 2024-07-02 04:50:07 +00:00
  • d08c20edde
    [SYCL] Fix the sub group size of Intel (#8106) luoyu-intel 2024-07-02 02:16:00 +00:00
  • 5fac350b9c
    Fix gemma2 tokenizer convert (#8244) Xuan Son Nguyen 2024-07-02 01:07:23 +02:00
  • cb5fad4c6c
    CUDA: refactor and optimize IQ MMVQ (#8215) Johannes Gäßler 2024-07-01 20:39:06 +02:00
  • dae57a1ebc
    readme: add Paddler to the list of projects (#8239) Mateusz Charytoniuk 2024-07-01 19:13:22 +02:00
  • 49122a873f
    gemma2: add sliding window mask (#8227) Xuan Son Nguyen 2024-07-01 18:48:34 +02:00
  • 0ddeff1023
    readme : update tool list (#8209) Roni 2024-07-01 14:48:16 +02:00
  • 3840b6f593
    nix : enable curl (#8043) Michael Francis 2024-07-01 07:47:04 -04:00
  • 257f8e41e2
    nix : remove OpenCL remnants (#8235) Georgi Gerganov 2024-07-01 14:46:18 +03:00
  • 694c59cb42
    Document BERT support. (#8205) iacore 2024-07-01 11:40:58 +00:00
  • 197fe6c1d7
    [SYCL] Update SYCL-Rope op and Refactor (#8157) zhentaoyu 2024-07-01 19:39:06 +08:00
  • d0a7145ba9
    flake.lock: Update (#8218) Georgi Gerganov 2024-07-01 02:09:34 +03:00
  • 9ef0780062
    Fix new line issue with chat template, disable template when in-prefix/suffix is set (#8203) Xuan Son Nguyen 2024-06-30 20:27:13 +02:00
  • 1c5eba6f8e
    llama: Add attention and final logit soft-capping, update scaling factor to Gemma2 (#8197) Andrei 2024-06-29 20:44:08 -07:00
  • 72272b83a3
    fix code typo in llama-cli (#8198) Xuan Son Nguyen 2024-06-29 00:14:20 +02:00
  • 8748d8ac6f
    json: attempt to skip slow tests when running under emulator (#8189) Olivier Chafik 2024-06-28 18:02:05 +01:00
  • 26a39bbd6b
    Add MiniCPM, Deepseek V2 chat template + clean up llama_chat_apply_template_internal (#8172) Xuan Son Nguyen 2024-06-28 15:11:44 +02:00
  • 38373cfbab
    Add SPM infill support (#8016) Sigbjørn Skjæret 2024-06-28 12:53:43 +02:00
  • b851b3fba0
    cmake : allow user to override default options (#8178) slaren 2024-06-28 12:37:45 +02:00
  • 139cc621e9
    json: restore default additionalProperties to false, fix some pattern escapes (#8180) Olivier Chafik 2024-06-28 09:26:45 +01:00
  • e57dc62057
    llama: Add support for Gemma2ForCausalLM (#8156) pculliton 2024-06-28 00:00:43 -04:00
  • a27aa50ab7
    Add missing items in makefile (#8177) Xuan Son Nguyen 2024-06-28 02:19:11 +02:00
  • cb0b06a8a6
    json: update grammars/README w/ examples & note about additionalProperties (#8132) Olivier Chafik 2024-06-27 22:08:42 +01:00
  • 558f44bf83
    CI: fix release build (Ubuntu+Mac) (#8170) loonerin 2024-06-27 15:01:23 -04:00
  • 8172ee9da9
    cmake : fix deprecated option names not working (#8171) slaren 2024-06-27 20:04:39 +02:00
  • 16791b8f0b
    Add chatml fallback for cpp llama_chat_apply_template (#8160) Xuan Son Nguyen 2024-06-27 18:14:19 +02:00
  • ab3679112d
    flake.lock: Update (#8071) Georgi Gerganov 2024-06-27 18:37:29 +03:00
  • 97877eb10b
    Control vector loading fixes (#8137) jukofyork 2024-06-27 15:48:07 +01:00
  • 387952651a
    Delete examples/llama.android/llama/CMakeLists.txt (#8165) Raj Hammeer Singh Hada 2024-06-27 20:09:29 +05:30
  • 6030c61281
    Add Qwen2MoE 57B-A14B model identifier (#8158) Sigbjørn Skjæret 2024-06-27 16:27:41 +02:00
  • 85a267daaa
    CUDA: fix MMQ stream-k for --split-mode row (#8167) Johannes Gäßler 2024-06-27 16:26:05 +02:00
  • f675b20a3b
    Added support for Viking pre-tokenizer (#8135) kustaaya 2024-06-27 11:58:54 +03:00
  • 911e35bb8b
    llama : fix CodeLlama FIM token checks (#8144) Sigbjørn Skjæret 2024-06-27 09:46:41 +02:00
  • ac146628e4
    Fix llama-android.cpp for error - "common/common.h not found" (#8145) Raj Hammeer Singh Hada 2024-06-27 07:27:57 +05:30
  • 9b31a40c6d
    clip : suppress unused variable warnings (#8105) Daniel Bevenius 2024-06-27 01:50:09 +02:00
  • c70d117c37
    scripts : fix filename sync Georgi Gerganov 2024-06-26 23:25:22 +03:00
  • ae5d0f4b89
    ci : publish new docker images only when the files change (#8142) slaren 2024-06-26 21:59:28 +02:00