Commit graph

  • 5107e8cea3
    DRY: Fixes clone functionality (#10192) wwoodsTM 2024-11-07 08:20:25 -07:00
  • 2319126a70
    fix q4_0_8_8 format for corrupted tokens issue (#10198) snadampal 2024-11-07 02:02:08 -06:00
  • 3bcd40b3c5
    Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133) Zhiyuan Li 2024-11-07 18:19:10 +11:00
  • 5c333e0140
    metal : add BF16 support (#8439) Georgi Gerganov 2024-11-06 19:53:51 +02:00
  • b11f9ba9b8
    server : remove hack for extra parallel slot (#10187) Georgi Gerganov 2024-11-06 13:29:01 +02:00
  • 94d8cb8be1
    metal : fix from ptr buffer name (#10189) Diego Devesa 2024-11-06 12:10:07 +01:00
  • 1dc04b2dee
    ggml : adjust is_first_call init value (#10193) Georgi Gerganov 2024-11-06 11:20:10 +02:00
  • a1eaf6a960
    metal : add quantized FA support (#10149) Georgi Gerganov 2024-11-06 10:24:23 +02:00
  • b8deef0ec0
    llama : add <|tool_call|> formatting to Granite template (#10177) Gabe Goodhart 2024-11-05 05:23:04 -07:00
  • a9e8a9a030
    ggml : fix arch check in bf16_to_fp32 (#10164) Diego Devesa 2024-11-04 23:17:01 +01:00
  • 3407364776
    Q6_K AVX improvements (#10118) Eve 2024-11-04 22:06:31 +00:00
  • d5a409e57f
    ggml : fix gelu tables initialization (#10172) Diego Devesa 2024-11-04 20:06:58 +01:00
  • 401558b7ba
    ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167) Diego Devesa 2024-11-04 17:34:08 +01:00
  • 9e0ecfb697
    server : clarify /slots endpoint, add is_processing (#10162) Xuan Son Nguyen 2024-11-04 16:33:29 +01:00
  • 6a066b9978
    fix build break on arm64 linux (#10166) snadampal 2024-11-04 09:08:33 -06:00
  • ea02c753eb
    cuda : clear error after changing peer access (#10153) Diego Devesa 2024-11-04 13:10:23 +01:00
  • 05697f670b
    metal : simplify f16 and f32 dequant kernels (#0) Georgi Gerganov 2024-11-04 13:49:34 +02:00
  • f8e58135cf
    metal : move dequantize templates to beginning of MSL source (#0) Georgi Gerganov 2024-11-04 13:43:32 +02:00
  • 329ed914c9
    CANN: adjust backend registry refactor. (#10158) leo-pony 2024-11-04 19:08:22 +08:00
  • ce027adfb3
    sync : ggml Georgi Gerganov 2024-11-04 10:33:37 +02:00
  • 284e5b0275
    cmake : make it possible linking ggml as external lib (ggml/1003) Yuri Khrustalev 2024-11-02 05:09:12 -04:00
  • e2292aaa17
    metal : fix minor string leaks (ggml/1004) Plamen Minev 2024-11-01 16:55:10 +02:00
  • 9f40989351
    ggml : move CPU backend to a separate file (#10144) Diego Devesa 2024-11-03 19:34:08 +01:00
  • 08828a6d7d
    metal : minor fixup in FA kernel (#10143) Georgi Gerganov 2024-11-03 15:18:40 +02:00
  • 1839f69130
    flake.lock: Update (#10146) Georgi Gerganov 2024-11-03 15:14:15 +02:00
  • 9830b6923b
    Add apple arm to presets (#10134) Christian Köhnenkamp 2024-11-02 23:35:31 +01:00
  • 42cadc74bd
    server : fix slot selection by lru (#10126) sasha0552 2024-11-02 16:34:56 +00:00
  • 45950415ed
    server : fix endpoint checks (#10135) Georgi Gerganov 2024-11-02 18:34:00 +02:00
  • 1926d6e39d
    llama : adjust default context size + print warnings (#10136) Georgi Gerganov 2024-11-02 15:18:56 +02:00
  • b634f8a26f
    simple-chat : only add bos on first prompt (#10129) Diego Devesa 2024-11-02 13:08:53 +01:00
  • 7554aa4655
    convert-lora : make --base optional (#10110) Xuan Son Nguyen 2024-11-02 12:53:17 +01:00
  • a6744e43e8
    llama : add simple-chat example (#10124) Diego Devesa 2024-11-01 23:50:59 +01:00
  • e991e3127f
    llama : use smart pointers for ggml resources (#10117) Diego Devesa 2024-11-01 23:48:26 +01:00
  • 418f5eef26
    vulkan : improve ggml_vk_create_buffer error handling (#9898) Shupei Fan 2024-11-02 02:33:14 +08:00
  • ba6f62eb79
    readme : update hot topics Georgi Gerganov 2024-11-01 17:31:51 +02:00
  • d865d1478c
    server : fix smart selection of available slot (#10120) sasha0552 2024-11-01 13:33:14 +00:00
  • 1804adb0cf
    ggml : remove ggml_scratch (#10121) Georgi Gerganov 2024-11-01 12:58:45 +02:00
  • 815fe72adc
    sync : ggml Georgi Gerganov 2024-11-01 10:28:24 +02:00
  • f221d56220
    ggml : alloc ggml_contexts on the heap (whisper/2525) Georgi Gerganov 2024-11-01 10:23:05 +02:00
  • e597e50794
    build: fix build error in Windows env with OneAPI setup (#10107) Zhenwei Jin 2024-11-01 11:09:59 +08:00
  • 85679d37f3
    llama : improve output buffer type selection (#10098) Diego Devesa 2024-11-01 00:49:53 +01:00
  • 1e9f94994e
    quantize : fix --keep-split (#10114) Diego Devesa 2024-11-01 00:45:34 +01:00
  • c02e5ab2a6
    llama : fix buffer checks for mamba and rwk (#10111) Diego Devesa 2024-10-31 22:54:23 +01:00
  • ab3d71f97f
    loader: refactor tensor weights storage (#9935) Zhenwei Jin 2024-11-01 02:50:39 +08:00
  • 0a683e8088
    server : include scheme when printing URL (#10106) Kevin Gibbons 2024-10-31 06:02:35 -07:00
  • dea5e86051
    ggml : check tensor name lengths in gguf files (#10100) Diego Devesa 2024-10-31 11:40:59 +01:00
  • 1329c0a75e
    kompute: add mul_mat_q4_k shader (#10097) Sergio López 2024-10-31 10:09:52 +01:00
  • 61408e7fad
    kompute: add backend registry / device interfaces (#10045) Sergio López 2024-10-30 17:01:52 +01:00
  • b9e02e8184
    ggml : fix memory leaks when loading invalid gguf files (#10094) Diego Devesa 2024-10-30 14:51:21 +01:00
  • 6763f713bb
    readme : more lora detail in main example readme (#10064) Rich Dougherty 2024-10-31 01:22:39 +13:00
  • 79a2bc042d
    convert : more detailed convert lora usage docs (#10065) Rich Dougherty 2024-10-31 01:22:21 +13:00
  • fc83a9e584
    ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029) xctan 2024-10-30 15:00:40 +08:00
  • c5b0f4b5d9
    llama : refactor model loader with backend registry (#10026) Diego Devesa 2024-10-30 02:01:23 +01:00
  • 8f275a7c45
    ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763) Changyeon Kim 2024-10-29 17:52:56 +09:00
  • 8d8ff71536
    llama : remove Tail-Free sampling (#10071) Georgi Gerganov 2024-10-29 10:42:05 +02:00
  • 61715d5cc8
    llama : Add IBM granite template (#10013) arch-btw 2024-10-28 10:45:33 -07:00
  • 07028f9d74
    flake.lock: Update (#10063) Georgi Gerganov 2024-10-28 17:41:24 +02:00
  • 524afeec9d
    musa: workaround for Guilty Lockup in cleaning src0 (#10042) R0CKSTAR 2024-10-28 17:02:48 +08:00
  • 8125e6cbfc
    server : don't overfill the batch during infill (#10018) Georgi Gerganov 2024-10-28 08:49:32 +02:00
  • 8841ce3f43
    llama : switch KQ multiplication to F32 precision by default (#10015) Georgi Gerganov 2024-10-27 20:59:58 +02:00
  • cc2983d375
    sync : ggml Georgi Gerganov 2024-10-26 10:34:08 +03:00
  • 8c60a8a462
    increase cuda_cpy block size (ggml/996) bssrdf 2024-10-23 14:34:00 -04:00
  • 9e4a2563ea
    scripts : fix amx sync [no ci] Georgi Gerganov 2024-10-26 10:33:31 +03:00
  • 668750357e
    metal : support permuted matrix multiplications (#10033) Georgi Gerganov 2024-10-25 22:26:15 +03:00
  • ff252ea48e
    llama : add DRY sampler (#9702) wwoodsTM 2024-10-25 10:07:34 -06:00
  • d80fb71f8b
    llama: string_split fix (#10022) Michael Podvitskiy 2024-10-25 17:57:54 +02:00
  • 2f8bd2b901
    llamafile : extend sgemm.cpp support for Q5_0 models (#10010) Srihari-mcw 2024-10-25 12:57:41 +05:30
  • bc5ba007b2
    server : check that the prompt fits in the slot's context (#10030) Georgi Gerganov 2024-10-25 10:13:46 +03:00
  • 958367bf53
    server : refactor slot input data, move tokenizer to HTTP thread (#10023) Xuan Son Nguyen 2024-10-24 21:51:22 +02:00
  • 40f2555797
    ci : fix cmake flags for SYCL Georgi Gerganov 2024-10-24 21:23:33 +03:00
  • 167a515651
    CUDA: fix insufficient buffer clearing for MMQ (#10032) Johannes Gäßler 2024-10-24 14:40:23 +02:00
  • c39665f589
    CUDA: fix MMQ for non-contiguous src0, add tests (#10021) Johannes Gäßler 2024-10-24 11:09:36 +02:00
  • 0a1c750c80
    server : samplers accept the prompt correctly (#10019) wwoodsTM 2024-10-23 13:27:51 -06:00
  • 190a37d797
    sync : ggml Georgi Gerganov 2024-10-23 17:23:55 +03:00
  • 2d3aba9ee8
    llama.vim : bump generation time limit to 3s [no ci] Georgi Gerganov 2024-10-23 17:16:56 +03:00
  • 80273a306d
    CUDA: fix 1D im2col, add tests (ggml/993) Johannes Gäßler 2024-10-18 09:24:44 +02:00
  • c19af0acb1
    ggml : remove redundant set of contexts used field (ggml/978) Daniel Bevenius 2024-10-16 20:10:01 +02:00
  • ac113a0fee
    llama.vim : add classic vim support (#9995) Michael Coppola 2024-10-23 07:09:26 -04:00
  • 4c9388fb96
    metal : add POOL2D and fix IM2COL (#9943) Jun Hee Yoo 2024-10-23 19:33:45 +09:00
  • 873279b159
    flake.lock: Update github-actions[bot] 2024-10-20 00:22:59 +00:00
  • c8c07d658a
    llama : fix empty batch causing llama_batch_allocr to crash (#9966) Xuan Son Nguyen 2024-10-22 16:59:02 +02:00
  • 19d900a756
    llama : rename batch to ubatch (#9950) Daniel Bevenius 2024-10-22 15:31:06 +02:00
  • 11d47057a5
    Rwkv chat template fix (#10001) Molly Sophia 2024-10-22 21:22:26 +08:00
  • c421ac072d
    lora : warn user if new token is added in the adapter (#9948) Xuan Son Nguyen 2024-10-22 13:08:41 +02:00
  • 4ff7fe1fb3
    llama : add chat template for RWKV-World + fix EOT (#9968) Molly Sophia 2024-10-22 18:33:37 +08:00
  • 6b8447352d
    [CANN] Adapt to dynamically loadable backends mechanism (#9970) leo-pony 2024-10-22 16:16:01 +08:00
  • 674804a996
    arg : fix typo in embeddings argument help [no ci] (#9994) Daniel Bevenius 2024-10-22 09:40:02 +02:00
  • e94a138d64
    llama.vim : fix info text display [no ci] (#9787) Georgi Gerganov 2024-10-22 00:35:25 +03:00
  • e01c67affe
    llama.vim : move info to the right of screen [no ci] (#9787) Georgi Gerganov 2024-10-21 22:52:22 +03:00
  • 994cfb1acb
    readme : update UI list (#9972) Asghar Ghorbani 2024-10-21 20:20:59 +02:00
  • 94008cc760
    arg : fix attention non-causal arg value hint (#9985) Daniel Bevenius 2024-10-21 20:12:52 +02:00
  • dbd5f2f573
    llama.vim : plugin for Neovim (#9787) Georgi Gerganov 2024-10-21 20:25:02 +03:00
  • f594bc80ba
    ggml : add asserts for type conversion in fattn kernels (#9971) Georgi Gerganov 2024-10-21 16:20:46 +03:00
  • d5ebd79c76
    rpc : pack only RPC structs (#9959) Radoslav Gerganov 2024-10-21 13:35:40 +03:00
  • 55e47786e3
    llama : default sampling changes + greedy update (#9897) Georgi Gerganov 2024-10-21 09:46:40 +03:00
  • bc21975084
    speculative : fix handling of some input params (#9963) Georgi Gerganov 2024-10-21 09:37:12 +03:00
  • 1db8c84fc6
    fix mul_mat_vec_q and *_vec_q error (#9939) Neo Zhang Jianyu 2024-10-21 14:26:09 +08:00
  • 45f097645e
    readme : update bindings list (#9951) Loïc Carrère 2024-10-20 18:25:41 +02:00
  • 7cab2083c7
    readme : update infra list (#9942) icppWorld 2024-10-20 12:01:34 -04:00
  • cda0e4b648
    llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745) Xuan Son Nguyen 2024-10-18 23:18:01 +02:00