Commit graph

  • 15fa07a5c5
    make : use C compiler to build metal embed object (#8899) slaren 2024-08-07 18:24:05 +02:00
  • be55695eff
    ggml-backend : fix async copy from CPU (#8897) slaren 2024-08-07 13:29:02 +02:00
  • 0478174d59
    [SYCL] Updated SYCL device filtering (#8901) Ouadie EL FAROUKI 2024-08-07 11:25:36 +01:00
  • a8dbc6f753
    CUDA/HIP: fix tests/test-backend-ops (#8896) Johannes Gäßler 2024-08-07 09:07:52 +02:00
  • 506122d854
    llama-bench : add support for getting cpu info on Windows (#8824) Zhenwei Jin 2024-08-07 09:01:06 +08:00
  • 725e3d9437
    quantize : update usage comment in quantize.cpp (#8889) Daniel Bevenius 2024-08-07 01:43:00 +02:00
  • 31958546c3
    typo correction (#8891) Nexes the Old 2024-08-07 01:41:54 +02:00
  • 1e6f6554aa
    server : add lora hotswap endpoint (WIP) (#8857) Xuan Son Nguyen 2024-08-06 17:33:39 +02:00
  • 641f5dd2a6
    CUDA: fix padding logic for FP16/FP32 (#8884) Johannes Gäßler 2024-08-06 17:13:55 +02:00
  • 5f4dcb1e60
    simple : update name of executable to llama-simple (#8885) Daniel Bevenius 2024-08-06 16:44:35 +02:00
  • db20f50cf4
    cmake : Link vulkan-shaders-gen with pthreads (#8835) Jaeden Amero 2024-08-06 17:21:47 +04:00
  • efda90c93a
    [Vulkan] Fix compilation of vulkan-shaders-gen on w64devkit after e31a4f6 (#8880) MaggotHATE 2024-08-06 16:32:03 +05:00
  • 0bf16de07b
    contributing : add note about write access Georgi Gerganov 2024-08-06 11:48:01 +03:00
  • 2d5dd7bb3f
    ggml : add epsilon as a parameter for group_norm (#8818) Molly Sophia 2024-08-06 15:26:46 +08:00
  • cdd1889de6
    convert : add support for XLMRoberta embedding models (#8658) Douglas Hanley 2024-08-06 02:20:54 -05:00
  • c21a896405
    [CANN]: Fix ggml_backend_cann_buffer_get_tensor (#8871) Mengqing Cao 2024-08-06 12:42:42 +08:00
  • d4ff847153
    [SYCL] correct cmd name (#8877) Neo Zhang 2024-08-06 09:09:12 +08:00
  • 0a4ce78681
    common : Changed tuple to struct (TODO fix) (#8823) Liu Jia 2024-08-06 00:14:10 +08:00
  • bc0f887e15
    cann: fix buffer_num error and slow runtime speed (#8865) wangshuai09 2024-08-05 21:10:37 +08:00
  • b42978e7e4
    readme : add ramalama to the available UIs (#8811) Eric Curtin 2024-08-05 13:45:01 +01:00
  • b9dfc25ca3
    ggml : fix overflows in elu function (#8866) Justine Tunney 2024-08-05 05:43:40 -07:00
  • 1ef14b3007
    py: Add more authorship metadata from model card (#8810) Brian 2024-08-05 21:15:28 +10:00
  • d3f0c7166a
    Stop generation when the <|eom_id|> token is encountered - needed for Llama 3.1 tool call support (#8858) fairydreaming 2024-08-05 09:38:01 +02:00
  • e31a4f6797
    cmake: fix paths for vulkan shaders compilation on Windows (#8573) stduhpf 2024-08-05 08:18:27 +02:00
  • 400ae6f65f
    readme : update model list (#8851) BarfingLemurs 2024-08-05 01:54:10 -04:00
  • f1ea5146d7
    llama : better replace_all (#8852) Georgi Gerganov 2024-08-05 08:53:39 +03:00
  • 064cdc265f
    vulkan : fix Quantized Mat-Vec Mul on AMD GPUs for ncols < 64 (#8855) 0cc4m 2024-08-05 07:52:55 +02:00
  • 5587e57a76
    sync : ggml Georgi Gerganov 2024-08-04 19:13:25 +03:00
  • a3738b2fa7
    vulkan : implement Stable Diffusion operators (ggml/904) 0cc4m 2024-08-04 17:28:08 +02:00
  • 655858ace0
    ggml : move c parameter comment to ggml_rope_ext (ggml/901) Daniel Bevenius 2024-07-29 15:06:06 +02:00
  • c02b0a8a4d
    cann: support q4_0 model (#8822) wangshuai09 2024-08-05 12:22:30 +08:00
  • 0d6fb52be0
    Install curl in runtime layer (#8693) Brandon Squizzato 2024-08-04 14:17:16 -04:00
  • 978ba3d83d
    Server: Don't ignore llama.cpp params (#8754) ardfork 2024-08-04 18:16:23 +00:00
  • ecf6b7f23e
    batched-bench : handle empty -npl (#8839) Brian Cunnie 2024-08-04 03:55:03 -07:00
  • 01aae2b497
    baby-llama : remove duplicate vector include Daniel Bevenius 2024-08-03 15:07:47 +02:00
  • 4b77ea95f5
    flake.lock: Update (#8847) Georgi Gerganov 2024-08-04 05:53:20 +03:00
  • 76614f352e
    ggml : reading the runtime sve config of the cpu (#8709) jdomke 2024-08-04 01:34:41 +09:00
  • b72c20b85c
    Fix conversion of unnormalized BF16->BF16 weights (#7843) Sigbjørn Skjæret 2024-08-02 21:11:39 +02:00
  • e09a800f9a
    cann: Fix ggml_cann_im2col for 1D im2col (#8819) Mengqing Cao 2024-08-02 16:50:53 +08:00
  • 0fbbd88458
    [SYCL] Fixing wrong VDR iq4nl value (#8812) Ouadie EL FAROUKI 2024-08-02 01:55:17 +01:00
  • afbb4c1322
    ggml-cuda: Adding support for unified memory (#8035) matteo 2024-08-01 23:28:28 +02:00
  • b7a08fd5e0
    Build: Only include execinfo.h on linux systems that support it (#8783) Alex O'Connell 2024-08-01 12:53:46 -04:00
  • 7a11eb3a26
    cuda : fix dmmv cols requirement to 2*GGML_CUDA_DMMV_X (#8800) slaren 2024-08-01 15:26:22 +02:00
  • c8a0090922
    cann: support q8_0 for Ascend backend (#8805) wangshuai09 2024-08-01 10:39:05 +08:00
  • afbbcf3c04
    server : update llama-server embedding flag documentation (#8779) Igor Okulist 2024-07-31 18:59:09 -05:00
  • ed9d2854c9
    Build: Fix potential race condition (#8781) Clint Herron 2024-07-31 15:51:06 -04:00
  • 398ede5efe
    Adding Gemma 2 2B configs (#8784) pculliton 2024-07-31 11:12:10 -04:00
  • 44d28ddd5c
    cmake : fix use of external ggml (#8787) Borislav Stanimirov 2024-07-31 16:40:08 +03:00
  • 268c566006
    nix: cuda: rely on propagatedBuildInputs (#8772) Someone 2024-07-30 23:35:30 +03:00
  • 7e72aa74fd
    py: add_array() will not add to kv store if value is an empty array (#8774) Brian 2024-07-31 00:57:03 +10:00
  • 7c27a19b2e
    add Android implementation of ggml_print_backtrace_symbols (#8751) l3utterfly 2024-07-30 23:40:18 +09:00
  • 140074bb86
    flake.lock: Update (#8729) Georgi Gerganov 2024-07-30 15:58:57 +03:00
  • 6e2b6000e5
    cann: update cmake (#8765) wangshuai09 2024-07-30 18:37:35 +08:00
  • c887d8b017
    [SYCL] Add TIMESTEP_EMBEDDING OP (#8707) zhentaoyu 2024-07-30 14:56:51 +08:00
  • 75af08c475
    ggml: bugfix: fix handling of inactive elements in RISC-V vector code (#8748) CarterLi999 2024-07-30 00:38:34 +08:00
  • 439b3fc75a
    cuda : organize vendor-specific headers into vendors directory (#8746) R0CKSTAR 2024-07-29 20:56:12 +08:00
  • 0832de7236
    [SYCL] add conv support (#8688) Meng, Hengyu 2024-07-29 10:50:27 +08:00
  • 6eeaeba126
    cmake: use 1 more thread for non-ggml in CI (#8740) Johannes Gäßler 2024-07-28 22:32:44 +02:00
  • 4730faca61
    chore : Fix vulkan related compiler warnings, add help text, improve CLI options (#8477) Austin 2024-07-28 03:52:42 -04:00
  • 4c676c85e5
    llama : refactor session file management (#8699) compilade 2024-07-28 00:42:05 -04:00
  • e54c35e4fb
    feat: Support Moore Threads GPU (#8383) R0CKSTAR 2024-07-28 07:41:25 +08:00
  • 5e2727fe03
    scripts : sync vulkan-shaders (#0) Georgi Gerganov 2024-07-27 18:08:31 +03:00
  • 56f20aa25d
    scripts : sync ggml-aarch64 sources Georgi Gerganov 2024-07-27 17:19:35 +03:00
  • 345c8c0c87
    ggml : add missing semicolon (#0) Georgi Gerganov 2024-07-27 15:57:09 +03:00
  • ae7985cd7b
    sync : ggml Georgi Gerganov 2024-07-27 15:53:48 +03:00
  • a05ca93697
    ggml : loop tiling optimizations for scalar path (ggml/898) Mahesh Madhav 2024-07-25 00:54:08 -07:00
  • 9f77d899b7
    ggml: add support for float16 input tensors in pooling operations (ggml/895) Ivan Filipov 2024-07-22 14:32:02 +03:00
  • 203b7f1531
    vulkan : initialize vk_buffer_struct members to VK_NULL_HANDLE (ggml/893) Tony Wasserka 2024-07-20 20:49:44 +02:00
  • d2b851bfa1
    cmake : only enable GGML_NATIVE and x86 flags if not crosscompiling (ggml/885) Borislav Stanimirov 2024-07-12 17:24:20 +03:00
  • c12b6e8ee7
    ggml : remove unnecessary UNUSED macro call (ggml/880) Daniel Bevenius 2024-07-08 12:03:42 +02:00
  • b5e95468b1
    llama : add support for llama 3.1 rope scaling factors (#8676) Jeffrey Morgan 2024-07-27 05:03:45 -07:00
  • 92090eca21
    llama : add function for model-based max number of graph nodes (#8622) Georgi Gerganov 2024-07-27 14:59:29 +03:00
  • 9d03d085dd
    common : add --no-warmup option for main/llama-cli (#8712) Daniel Bevenius 2024-07-27 12:45:02 +02:00
  • bfb4c74981
    cann: Fix Multi-NPU execution error (#8710) wangshuai09 2024-07-27 16:36:44 +08:00
  • 2b1f616b20
    ggml : reduce hash table reset cost (#8698) slaren 2024-07-27 04:41:55 +02:00
  • 01245f5b16
    llama : fix order of parameters (#8706) Judd 2024-07-26 16:38:12 +08:00
  • 01aec4a631
    server : add Speech Recognition & Synthesis to UI (#8679) Yaiko 2024-07-25 18:10:16 -04:00
  • 41cd47caab
    examples : export-lora : fix issue with quantized base models (#8687) Xuan Son Nguyen 2024-07-25 23:49:39 +02:00
  • 49ce0ab6d4
    ggml: handle ggml_init failure to fix NULL pointer deref (#8692) DavidKorczynski 2024-07-25 22:23:05 +01:00
  • 4226a8d10e
    llama : fix build + fix fabs compile warnings (#8683) Georgi Gerganov 2024-07-25 19:57:31 +03:00
  • bf5a81df37
    ggml : fix build on Windows with Snapdragon X (#8531) Andreas (Andi) Kunar 2024-07-25 18:01:00 +02:00
  • 88954f7fbd
    tests : fix printfs (#8068) Georgi Gerganov 2024-07-25 18:57:44 +03:00
  • ed67bcb24f
    [SYCL] fix multi-gpu issue on sycl (#8554) Chen Xi 2024-07-25 11:45:18 +00:00
  • eddcb5238b
    ggml : add and use ggml_cpu_has_llamafile() (#8664) Georgi Gerganov 2024-07-25 12:37:42 +03:00
  • be6d7c0791
    examples : remove finetune and train-text-from-scratch (#8669) Xuan Son Nguyen 2024-07-25 10:39:04 +02:00
  • 4b0eff3df5
    docs : Quantum -> Quantized (#8666) Ujjawal Panchal 2024-07-25 13:43:27 +05:30
  • 8a4bad50a8
    llama: use sliding window for phi3 (#8627) Fan Shupei 2024-07-25 15:21:09 +08:00
  • 68504f0970
    readme : update games list (#8673) MorganRO8 2024-07-24 12:48:00 -04:00
  • f19bf99c01
    Build Llama SYCL Intel with static libs (#8668) Joe Todd 2024-07-24 14:36:00 +01:00
  • 3a7ac5300a
    readme : update UI list [no ci] (#8505) Thorsten Sommer 2024-07-24 14:52:30 +02:00
  • 96952e7181
    llama : fix llama_chat_format_single for mistral (#8657) Xuan Son Nguyen 2024-07-24 13:48:46 +02:00
  • 79167d9e49
    Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS (#8667) Joe Todd 2024-07-24 11:55:26 +01:00
  • b115105f05
    add llama_lora_adapter_clear (#8653) Xuan Son Nguyen 2024-07-24 11:25:19 +02:00
  • de280085e7
    examples : Fix llama-export-lora example (#8607) Xuan Son Nguyen 2024-07-23 23:48:37 +02:00
  • b841d07408
    server : fix URL.parse in the UI (#8646) Vali Malinoiu 2024-07-23 17:37:42 +03:00
  • 64cf50a0ed
    sycl : Add support for non-release DPC++ & oneMKL (#8644) Joe Todd 2024-07-23 14:58:37 +01:00
  • 938943cdbf
    llama : move vocab, grammar and sampling into separate files (#8508) Georgi Gerganov 2024-07-23 13:10:17 +03:00
  • 751fcfc6c3
    Vulkan IQ4_NL Support (#8613) 0cc4m 2024-07-23 10:56:49 +02:00
  • 46e47417aa
    Allow all RDNA2 archs to use sdot4 intrinsic (#8629) Jeroen Mostert 2024-07-23 10:50:40 +02:00
  • e7e6487ba0
    contrib : clarify PR squashing + module names (#8630) Georgi Gerganov 2024-07-23 11:28:38 +03:00