Commit graph

  • 6084bfb261
    ggml : fix GGML_MAX_N_THREADS + improve formatting (ggml/969) Georgi Gerganov 2024-09-24 13:23:59 +03:00
  • faac0bae26
    common : ensure llama_batch size does not exceed max size (#9668) matiaslin 2024-09-29 05:25:00 -07:00
  • f99d3f8367
    py : add model class for Chameleon conversion (#9683) nopperl 2024-09-29 12:02:06 +00:00
  • 589b48d41e
    contrib : add Resources section (#9675) Georgi Gerganov 2024-09-29 14:38:18 +03:00
  • f4d2b8846a
    llama : add reranking support (#9510) Georgi Gerganov 2024-09-28 17:42:03 +03:00
  • 1b2f992cd2
    test-backend-ops : use flops for some performance tests (#9657) slaren 2024-09-28 14:32:46 +02:00
  • 739842703e
    llama : add comment about thread-safety [no ci] (#9449) Georgi Gerganov 2024-09-28 15:13:21 +03:00
  • 6102037bbb
    vocab : refactor tokenizer to reduce init overhead (#9449) Zhenwei Jin 2024-09-28 20:10:58 +08:00
  • 9a913110cf
    llama : add support for Chameleon (#8543) nopperl 2024-09-28 12:08:43 +00:00
  • 43bcdd9703
    readme : add tool (#9655) Aarni Koskela 2024-09-28 15:07:14 +03:00
  • 6a0f779484
    ggml : add run-time detection of neon, i8mm and sve (#9331) Dan Johansson 2024-09-28 14:06:16 +02:00
  • 89f9944981
    Enable use of the rebar feature to upload buffers to the device. (#9251) Markus Tavenrath 2024-09-28 12:05:05 +02:00
  • b5de3b74a5
    readme : update hot topics Georgi Gerganov 2024-09-27 20:57:51 +03:00
  • 44f59b4301
    cmake : add option for common library (#9661) Borislav Stanimirov 2024-09-27 10:42:06 +03:00
  • 95bc82fbc0
    [SYCL] add missed dll file in package (#9577) Neo Zhang Jianyu 2024-09-26 17:38:31 +08:00
  • 7691654c68
    mtgpu: enable VMM (#9597) R0CKSTAR 2024-09-26 09:27:40 +08:00
  • ea9c32be71
    ci : fix docker build number and tag name (#9638) Xuan Son Nguyen 2024-09-25 17:26:01 +02:00
  • 1e43630218
    ggml : remove assert for AArch64 GEMV and GEMM Q4 kernels (#9217) Charles Xu 2024-09-25 15:12:20 +02:00
  • afbbfaa537
    server : add more env vars, improve gen-docs (#9635) Xuan Son Nguyen 2024-09-25 14:05:13 +02:00
  • 3d6bf6919f
    llama : add IBM Granite MoE architecture (#9438) Gabe Goodhart 2024-09-25 01:06:52 -06:00
  • 904837e0cb
    cann: fix crash when llama-bench is running on multiple cann devices (#9627) Dou Xinpeng 2024-09-25 11:30:38 +08:00
  • 70392f1f81
    ggml : add AVX512DQ requirement for AVX512 builds (#9622) Eric Zhang 2024-09-24 16:03:21 +08:00
  • bb5f819975
    sync : ggml Georgi Gerganov 2024-09-24 11:01:18 +03:00
  • c038931615
    examples : adapt to ggml.h changes (ggml/0) Georgi Gerganov 2024-09-20 21:50:16 +03:00
  • 31ac5834fe
    llama : keep track of all EOG tokens in the vocab (#9609) Georgi Gerganov 2024-09-24 10:16:06 +03:00
  • cea1486ecf
    log : add CONT level for continuing previous log entry (#9610) Georgi Gerganov 2024-09-24 10:15:35 +03:00
  • 0aa15011e3
    server : add newline after chat example (#9616) StrangeBytesDev 2024-09-23 23:04:39 -07:00
  • b0f27361f3
    sampling : avoid expensive softmax during greedy sampling (#9605) Georgi Gerganov 2024-09-24 09:03:17 +03:00
  • c087b6f11d
    threads: fix msvc build without openmp (#9615) Max Krasnyansky 2024-09-23 21:18:48 -07:00
  • 116efee0ee
    cuda: add q8_0->f32 cpy operation (#9571) Ivan 2024-09-24 03:14:24 +03:00
  • 0b3bf966f4
    server : add --no-context-shift option (#9607) Xuan Son Nguyen 2024-09-23 22:23:54 +02:00
  • f0c7b5edf8
    threads: improve ggml_barrier scaling with large number of threads (#9598) Max Krasnyansky 2024-09-23 11:42:43 -07:00
  • 1d48e98e4f
    readme : add programmable prompt engine language CLI (#9599) Riceball LEE 2024-09-23 23:58:17 +08:00
  • f3979df762
    flake.lock: Update (#9586) Georgi Gerganov 2024-09-23 18:43:40 +03:00
  • 1e7b9299c6
    ggml : AVX512 gemm for Q4_0_8_8 (#9532) Srihari-mcw 2024-09-23 19:36:38 +05:30
  • 37f8c7b4c9
    perplexity : remove extra new lines after chunks (#9596) Georgi Gerganov 2024-09-23 11:28:02 +03:00
  • bf9c1013ac
    metal : use F32 prec for K*Q in vec FA (#9595) Georgi Gerganov 2024-09-23 11:27:47 +03:00
  • e62e9789cd
    Revert "[SYCL] fallback mmvq (#9088)" (#9579) Akarshan Biswas 2024-09-23 08:58:06 +05:30
  • c35e586ea5
    musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (#9526) R0CKSTAR 2024-09-22 22:55:49 +08:00
  • 912c331d3d
    Fix merge error in #9454 (#9589) Molly Sophia 2024-09-22 21:26:50 +08:00
  • a5b57b08ce
    CUDA: enable Gemma FA for HIP/Pascal (#9581) Johannes Gäßler 2024-09-22 09:34:52 +02:00
  • ecd5d6b65b
    llama: remove redundant loop when constructing ubatch (#9574) Shankar 2024-09-21 19:30:34 -07:00
  • 2a63caaa69
    RWKV v6: RWKV_WKV op CUDA implementation (#9454) Molly Sophia 2024-09-22 10:29:12 +08:00
  • d09770cae7
    ggml-alloc : fix list of allocated tensors with GGML_ALLOCATOR_DEBUG (#9573) slaren 2024-09-21 14:24:23 +02:00
  • 41f477879f
    Update CUDA graph on scale change plus clear nodes/params (#9550) agray3 2024-09-21 01:41:07 +01:00
  • e948a7da7a
    CI: Provide prebuilt windows binary for hip (#9467) Huang Qi 2024-09-21 08:39:41 +08:00
  • 63351143b2
    quantize : improve type name parsing (#9570) slaren 2024-09-20 20:55:36 +02:00
  • d13edb17ed
    ggml : fix builds (#0) Georgi Gerganov 2024-09-20 20:12:52 +03:00
  • 27609c49b9
    ggml : fix trailing whitespace (#0) Georgi Gerganov 2024-09-20 19:13:02 +03:00
  • 4301535326
    sync : ggml Georgi Gerganov 2024-09-20 19:06:59 +03:00
  • 424c5d00a9
    ggml/examples: add backend support for numerical optimization (ggml/949) Johannes Gäßler 2024-09-20 19:04:44 +03:00
  • a6809c6a2e
    examples : add null threadpool args where needed (ggml/0) Georgi Gerganov 2024-09-08 11:10:43 +03:00
  • 5cb12f6839
    CUDA: fix sum.cu compilation for CUDA < 11.7 (#9562) Johannes Gäßler 2024-09-20 18:35:35 +02:00
  • d39e26741f
    examples : flush log upon ctrl+c (#9559) Georgi Gerganov 2024-09-20 11:46:56 +03:00
  • 722ec1eb51
    perplexity : do not escape input data by default (#9548) Sigbjørn Skjæret 2024-09-20 08:38:10 +02:00
  • 6026da52d6
    server : clean-up completed tasks from waiting list (#9531) Georgi Gerganov 2024-09-19 12:44:53 +03:00
  • eca0fab44e
    imatrix : disable prompt escape by default (#9543) Sigbjørn Skjæret 2024-09-19 09:58:14 +02:00
  • 64c6af3195
    ggml : fix n_threads_cur initialization with one thread (#9538) slaren 2024-09-18 19:13:08 +02:00
  • 0d2f22e45c
    scripts : verify py deps at the start of compare (#9520) Georgi Gerganov 2024-09-18 18:34:32 +03:00
  • 6443ddd985
    llama : use reserve/emplace_back in sampler_sample (#9534) Daniel Bevenius 2024-09-18 13:42:36 +02:00
  • 8a308354f6
    server : match OAI structured output response (#9527) Vinesh Janarthanan 2024-09-18 01:50:34 -05:00
  • f799155ab8
    server : fix OpenSSL build (remove obsolete LOG_INFO) (#9529) Eric Zhang 2024-09-18 14:28:20 +08:00
  • faf67b3de4
    [SYCL] set context default value to avoid memory issue, update guide (#9476) Neo Zhang Jianyu 2024-09-18 08:30:31 +08:00
  • 7be099fa81
    llama-bench: correct argument parsing error message (#9524) Michael Podvitskiy 2024-09-17 22:41:38 +02:00
  • 8b836ae731
    arg : add env variable for parallel (#9513) Bert Wagner 2024-09-17 09:35:38 -04:00
  • 8344ef58f8
    llama : fix n_vocab init for 'no_vocab' case (#9511) Michael Podvitskiy 2024-09-17 12:18:22 +02:00
  • 0226613853
    threadpool : skip polling for unused threads (#9461) Max Krasnyansky 2024-09-17 01:19:46 -07:00
  • 503147a9f9
    unicode : add <algorithm> (#9508) Yuri Khrustalev 2024-09-17 02:51:15 -04:00
  • 0d2ec43833
    llama : support IBM Granite architecture (#9412) Gabe Goodhart 2024-09-17 00:44:58 -06:00
  • 37f3a3810e
    llama : add llama_n_head() (#9512) Michael Podvitskiy 2024-09-17 08:23:30 +02:00
  • 23e0d70bac
    ggml : move common CPU backend impl to new header (#9509) slaren 2024-09-16 16:22:07 +02:00
  • acb2c32c33
    llama : rename n_embed to n_embd in rwkv6_time_mix (#9504) Daniel Bevenius 2024-09-16 13:07:13 +02:00
  • a6a3a5c531
    ggml : link MATH_LIBRARY not by its full path (#9339) Michael Podvitskiy 2024-09-16 13:06:50 +02:00
  • d54c21df7e
    convert : identify missing model files (#9397) compilade 2024-09-16 03:30:22 -04:00
  • 19514d632e
    cmake : do not hide GGML options + rename option (#9465) Georgi Gerganov 2024-09-16 10:27:50 +03:00
  • 5c3d0f1824
    ggml : IQ4_NL sgemm + Q4_0 AVX optimization (#9422) Eve 2024-09-16 06:48:24 +00:00
  • 0aadac10c7
    llama : support OLMoE (#9462) Shane A 2024-09-15 23:47:37 -07:00
  • 95ca85168b
    llama : support MiniCPM3 (#9322) CarryFun 2024-09-16 14:45:20 +08:00
  • 441b72b91f
    main : option to disable context shift (#9484) Vinesh Janarthanan 2024-09-16 01:20:01 -05:00
  • c4965a64f7
    metal : handle zero-sized allocs (#9466) Georgi Gerganov 2024-09-16 09:05:56 +03:00
  • 90a2fff0e7
    flake.lock: Update (#9488) Georgi Gerganov 2024-09-16 05:14:23 +03:00
  • 6262d13e0b
    common : reimplement logging (#9418) Georgi Gerganov 2024-09-15 20:46:12 +03:00
  • e6deac31f7
    gguf-split : add basic checks (#9499) slaren 2024-09-15 19:02:27 +02:00
  • 6988da94a2
    cmake : correct order of sycl flags (#9497) Michael Podvitskiy 2024-09-15 18:55:52 +02:00
  • 3c7989fd29
    py : add "LLaMAForCausalLM" conversion support (#9485) Csaba Kecskemeti 2024-09-15 00:48:25 -07:00
  • d6b37c881f
    readme : update tools list (#9475) OSecret 2024-09-15 10:36:53 +03:00
  • 7596487beb
    cmake : try to fix sycl+intel build (#9487) Michael Podvitskiy 2024-09-15 09:06:38 +02:00
  • 822b6322de
    ggml : ggml_type_name return "NONE" for invalid values (#9458) Yuri Khrustalev 2024-09-14 05:54:37 -04:00
  • dcdcee3a74
    server: add data: [DONE] to /chat/completions stream response (#9459) VoidIsVoid 2024-09-14 17:36:44 +08:00
  • 1f4111e540
    cmake : use list(APPEND ...) instead of set() + dedup linker (#9463) Georgi Gerganov 2024-09-14 10:55:05 +03:00
  • befaf1197f
    llama : make cell_id const in inp_s_mask block (#9470) Daniel Bevenius 2024-09-14 09:50:12 +02:00
  • feff4aa846
    server : add loading html page while model is loading (#9468) Xuan Son Nguyen 2024-09-13 14:23:11 +02:00
  • 0abc6a2c25
    llama : llama_perf + option to disable timings during decode (#9355) Georgi Gerganov 2024-09-13 09:53:38 +03:00
  • bd35cb0ae3
    feat: remove a sampler from a chain (#9445) Gilad S. 2024-09-13 04:54:49 +03:00
  • 78203641fe
    server : Add option to return token pieces in /tokenize endpoint (#9108) Mathijs Henquet 2024-09-12 22:30:11 +02:00
  • e6b7801bd1
    cann: Add host buffer type for Ascend NPU (#9406) Dou Xinpeng 2024-09-12 19:46:43 +08:00
  • e665744317
    llava : fix the script error in MobileVLM README (#9054) fengerhu1 2024-09-12 19:34:22 +08:00
  • d4c3c10fad
    lora : raise error if lm_head is ignored (#9103) Xuan Son Nguyen 2024-09-12 13:33:57 +02:00
  • 2a825116b6
    cmake : fix for builds without GGML_CDEF_PUBLIC (#9338) Michael Podvitskiy 2024-09-12 13:30:01 +02:00
  • 4dc4f5f14a
    ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329) Huang Qi 2024-09-12 19:28:43 +08:00