Commit graph

  • 063d99ad11
    [SYCL] fix scratch size of softmax (#8642) luoyu-intel 2024-07-23 07:43:28 +00:00
  • 081fe431aa
    llama : fix codeshell support (#8599) Keke Han 2024-07-23 00:43:43 +08:00
  • d94c6e0ccb
    llama : add support for SmolLm pre-tokenizer (#8609) Jason Stillerman 2024-07-22 10:43:01 -04:00
  • 566daa5a5b
    *.py: Stylistic adjustments for python (#8233) Jiří Podivín 2024-07-22 15:44:53 +02:00
  • 6f11a83e4e
    llama : allow overrides for tokenizer flags (#8614) Georgi Gerganov 2024-07-22 13:33:22 +03:00
  • e093dd2382
    tests : re-enable tokenizer tests (#8611) Georgi Gerganov 2024-07-22 13:32:49 +03:00
  • 50e05353e8
    llama : add Mistral Nemo inference support (#8604) Douglas Hanley 2024-07-22 03:06:17 -05:00
  • 628154492a
    server : update doc to clarify n_keep when there is bos token (#8619) Jan Boon 2024-07-22 16:02:09 +08:00
  • 04bab6b7da
    ggml: fix compile error for RISC-V (#8623) Mark Zhuang 2024-07-22 15:56:45 +08:00
  • b7c11d36e6
    examples: fix android example cannot be generated continuously (#8621) devojony 2024-07-22 14:54:42 +08:00
  • 45f2c19cc5
    flake.lock: Update (#8610) Georgi Gerganov 2024-07-21 16:45:10 +03:00
  • 22f281aa16
    examples : Rewrite pydantic_models_to_grammar_examples.py (#8493) M-A 2024-07-20 22:09:17 -04:00
  • 328884f421
    gguf-py : fix some metadata name extraction edge cases (#8591) compilade 2024-07-20 21:58:49 -04:00
  • c69c63039c
    convert_hf : fix Gemma v1 conversion (#8597) compilade 2024-07-20 21:53:01 -04:00
  • 69c487f4ed
    CUDA: MMQ code deduplication + iquant support (#8495) Johannes Gäßler 2024-07-20 22:25:26 +02:00
  • 07283b1a90
    gguf : handle null name during init (#8587) Georgi Gerganov 2024-07-20 17:15:42 +03:00
  • 940362224d
    llama : add support for Tekken pre-tokenizer (#8579) Michael Coppola 2024-07-20 09:43:51 -04:00
  • 69b9945b44
    llama.swiftui: fix end of generation bug (#8268) Huifeng Ou 2024-07-20 09:09:37 -04:00
  • c3776cacab
    gguf_dump.py: fix markddown kv array print (#8588) Brian 2024-07-20 17:35:25 +10:00
  • 87e397d00b
    ggml : fix quant dot product with odd number of blocks (#8549) slaren 2024-07-19 17:17:27 +02:00
  • 57b1d4f9eb
    convert-*.py: remove add_name from ChatGLMModel class (#8590) Brian 2024-07-20 00:04:38 +10:00
  • d197545530
    llama : bump max layers from 256 to 512 (#8530) Georgi Gerganov 2024-07-19 16:50:47 +03:00
  • be0cfb4175
    readme : fix server badge Georgi Gerganov 2024-07-19 14:34:55 +03:00
  • b57eb9ca4f
    ggml : add friendlier error message to fopen errors (#8575) Clint Herron 2024-07-19 07:05:45 -04:00
  • f299aa98ec
    fix: typo of chatglm4 chat tmpl (#8586) Frank Mai 2024-07-19 17:44:41 +08:00
  • 3d0e4367d9
    convert-*.py: add general.name kv override (#8571) Brian 2024-07-19 17:51:51 +10:00
  • a15ef8f8a0
    CUDA: fix partial offloading for ne0 % 256 != 0 (#8572) Johannes Gäßler 2024-07-18 23:48:47 +02:00
  • 705b7ecf60
    cmake : install all ggml public headers (#8480) 65a 2024-07-18 07:47:12 -07:00
  • 0d2c7321e9
    server: use relative routes for static files in new UI (#8552) Eric Zhang 2024-07-18 18:43:49 +08:00
  • 672a6f1018
    convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499) Brian 2024-07-18 20:40:15 +10:00
  • 3807c3de04
    server : respect --special cli arg (#8553) RunningLeon 2024-07-18 16:06:22 +08:00
  • e02b597be3
    lookup: fibonacci hashing, fix crashes (#8548) Johannes Gäßler 2024-07-17 23:35:44 +02:00
  • b3283448ce
    build : Fix docker build warnings (#8535) (#8537) Al Mochkin 2024-07-17 20:21:55 +02:00
  • 30f80ca0bc
    CONTRIBUTING.md : remove mention of noci (#8541) Brian 2024-07-18 00:57:06 +10:00
  • 1bdd8ae19f
    [CANN] Add Ascend NPU backend (#6035) hipudding 2024-07-17 19:23:50 +08:00
  • da3913d8f9
    batched: fix n_predict parameter (#8527) Masaya, Kato 2024-07-17 16:34:28 +09:00
  • d65a8361fe
    llama : disable context-shift for DeepSeek v2 (#8501) Georgi Gerganov 2024-07-17 10:32:59 +03:00
  • 5e116e8dd5
    make/cmake: add missing force MMQ/cuBLAS for HIP (#8515) Johannes Gäßler 2024-07-16 21:20:59 +02:00
  • 1666f92dcd
    gguf-hash : update clib.json to point to original xxhash repo (#8491) Brian 2024-07-16 17:14:16 +10:00
  • 37b12f92ab
    export-lora : handle help argument (#8497) Steve Bonds 2024-07-16 00:04:45 -07:00
  • 0efec57787
    llama : valign + remove unused ftype (#8502) Georgi Gerganov 2024-07-16 10:00:30 +03:00
  • 7acfd4e8d5
    convert_hf : faster lazy safetensors (#8482) compilade 2024-07-15 23:13:10 -04:00
  • 97bdd26eee
    Refactor lora adapter support (#8332) Xuan Son Nguyen 2024-07-15 20:50:47 +02:00
  • 4db8f60fe7
    fix ci (#8494) Xuan Son Nguyen 2024-07-15 19:23:10 +02:00
  • 8fac431b06
    ggml : suppress unknown pragma 'GCC' on windows (#8460) Daniel Bevenius 2024-07-15 14:48:17 +02:00
  • f17f39ff9c
    server: update README.md with llama-server --help output [no ci] (#8472) M-A 2024-07-15 08:04:56 -04:00
  • 9104bc20ed
    common : add --no-cont-batching arg (#6358) Georgi Gerganov 2024-07-15 14:54:58 +03:00
  • fc690b018e
    docs: fix links in development docs [no ci] (#8481) NikolaiLyssogor 2024-07-15 04:46:39 -07:00
  • 16bdfa42ac
    [SYCL] add concat through dim 1/2 (#8483) Meng, Hengyu 2024-07-15 19:32:15 +08:00
  • 3dfda05956
    llama : de-duplicate deepseek2 norm Georgi Gerganov 2024-07-15 14:10:39 +03:00
  • bda62d7999
    Vulkan MMQ Fix (#8479) 0cc4m 2024-07-15 09:38:52 +02:00
  • 090fca7a07
    pydantic : replace uses of __annotations__ with get_type_hints (#8474) compilade 2024-07-14 19:51:21 -04:00
  • aaab2419ea
    flake.lock: Update (#8475) Georgi Gerganov 2024-07-14 18:54:02 +03:00
  • 73cf442e7b
    llama : fix Gemma-2 Query scaling factors (#8473) Georgi Gerganov 2024-07-14 14:05:09 +03:00
  • e236528e76
    gguf_hash.py: Add sha256 (#8470) Brian 2024-07-14 16:47:14 +10:00
  • fa79495bb4
    llama : fix pre-tokenization of non-special added tokens (#8228) compilade 2024-07-13 23:35:10 -04:00
  • 17eb6aa8a9
    vulkan : cmake integration (#8119) bandoti 2024-07-13 13:12:39 -03:00
  • c917b67f06
    metal : template-ify some of the kernels (#8447) Georgi Gerganov 2024-07-13 18:32:33 +03:00
  • 4e24cffd8c
    server : handle content array in chat API (#8449) Georgi Gerganov 2024-07-12 14:48:15 +03:00
  • 6af51c0d96
    main : print error on empty input (#8456) Georgi Gerganov 2024-07-12 14:48:04 +03:00
  • f53226245f
    llama : suppress unary minus operator warning (#8448) Daniel Bevenius 2024-07-12 11:05:21 +02:00
  • c3ebcfa148
    server : ensure batches are either all embed or all completion (#8420) Douglas Hanley 2024-07-12 03:14:12 -05:00
  • 8a4441ea1a
    docker : fix filename for convert-hf-to-gguf.py in tools.sh (#8441) Armen Kaleshian 2024-07-12 04:08:19 -04:00
  • 5aefbce27a
    convert : remove fsep token from GPTRefactForCausalLM (#8237) Jiří Podivín 2024-07-12 10:06:33 +02:00
  • 71c1121d11
    examples : sprintf -> snprintf (#8434) Georgi Gerganov 2024-07-12 10:46:14 +03:00
  • 370b1f7e7a
    ggml : minor naming changes (#8433) Georgi Gerganov 2024-07-12 10:46:02 +03:00
  • b549a1bbef
    [SYCL] fix the mul_mat_id ut issues (#8427) Chen Xi 2024-07-12 00:52:04 +00:00
  • 368645698a
    ggml : add NVPL BLAS support (#8329) (#8425) Nicholai Tukanov 2024-07-11 11:49:15 -05:00
  • b078c619aa
    cuda : suppress 'noreturn' warn in no_device_code (#8414) Daniel Bevenius 2024-07-11 17:53:42 +02:00
  • 808aba3916
    CUDA: optimize and refactor MMQ (#8416) Johannes Gäßler 2024-07-11 16:47:47 +02:00
  • a977c11544
    gitignore : deprecated binaries Georgi Gerganov 2024-07-11 11:20:40 +03:00
  • 9a55ffe6fb
    tokenize : add --no-parse-special option (#8423) compilade 2024-07-11 03:41:48 -04:00
  • 7a221b672e
    llama : use F32 precision in Qwen2 attention and no FA (#8412) Georgi Gerganov 2024-07-11 10:21:30 +03:00
  • 278d0e1846
    Initialize default slot sampling parameters from the global context. (#8418) Clint Herron 2024-07-10 20:08:17 -04:00
  • dd07a123b7
    Name Migration: Build the deprecation-warning 'main' binary every time (#8404) Clint Herron 2024-07-10 12:35:18 -04:00
  • f4444d992c
    [SYCL] Use multi_ptr to clean up deprecated warnings (#8256) AidanBeltonS 2024-07-10 16:10:49 +01:00
  • 6b2a849d1f
    ggml : move sgemm sources to llamafile subfolder (#8394) Georgi Gerganov 2024-07-10 15:23:29 +03:00
  • 0f1a39f343
    ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780) Dibakar Gope 2024-07-10 07:14:51 -05:00
  • 83321c6958
    gguf-py rel pipeline (#8410) M. Yusuf Sarıgöz 2024-07-10 15:12:35 +03:00
  • cc61948b1f
    llama : C++20 compatibility for u8 strings (#8408) Borislav Stanimirov 2024-07-10 14:45:44 +03:00
  • 7a80710d93
    msvc : silence codecvt c++17 deprecation warnings (#8395) Borislav Stanimirov 2024-07-10 14:40:53 +03:00
  • a8be1e6f59
    llama : add assert about missing llama_encode() call (#8400) fairydreaming 2024-07-10 13:38:58 +02:00
  • e4dd31ff89
    py : fix converter for internlm2 (#8321) RunningLeon 2024-07-10 19:26:40 +08:00
  • 8f0fad42b9
    py : fix extra space in convert_hf_to_gguf.py (#8407) laik 2024-07-10 19:19:10 +08:00
  • a59f8fdc85
    Server: Enable setting default sampling parameters via command-line (#8402) Clint Herron 2024-07-09 18:26:40 -04:00
  • fd560fe680
    Update README.md to fix broken link to docs (#8399) Andy Salerno 2024-07-09 11:58:44 -07:00
  • e500d6135a
    Deprecation warning to assist with migration to new binary names (#8283) Clint Herron 2024-07-09 11:54:43 -04:00
  • a03e8dd99d
    make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392) Johannes Gäßler 2024-07-09 17:11:07 +02:00
  • 5b0b8d8cfb
    sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372) Alberto Cabrera Pérez 2024-07-09 15:03:15 +01:00
  • 9925ca4087
    cmake : allow external ggml (#8370) Borislav Stanimirov 2024-07-09 11:38:00 +03:00
  • 9beb2dda03
    readme : fix typo [no ci] (#8389) daghanerdonmez 2024-07-09 09:16:00 +03:00
  • 7d0e23d72e
    gguf-py : do not use internal numpy types (#7472) compilade 2024-07-09 01:04:49 -04:00
  • 7fdb6f73e3
    flake.lock: Update (#8342) Georgi Gerganov 2024-07-09 01:36:38 +03:00
  • a130eccef4
    labeler : updated sycl to match docs and code refactor (#8373) Alberto Cabrera Pérez 2024-07-08 21:35:17 +01:00
  • c4dd11d1d3
    readme : fix web link error [no ci] (#8347) b4b4o 2024-07-08 22:19:24 +08:00
  • 2ec846d558
    sycl : fix powf call in device code (#8368) Alberto Cabrera Pérez 2024-07-08 14:22:41 +01:00
  • 3f2d538b81
    scripts : fix sync for sycl Georgi Gerganov 2024-07-08 13:51:31 +03:00
  • 2ee44c9a18
    sync : ggml Georgi Gerganov 2024-07-08 10:39:50 +03:00
  • 6847d54c4f
    tests : fix whitespace (#0) Georgi Gerganov 2024-07-08 10:39:36 +03:00
  • fde13b3bb9
    feat: cuda implementation for ggml_conv_transpose_1d (ggml/854) John Balis 2024-07-02 11:09:52 -05:00