Commit graph

  • 1d1ccce676
    flake.lock: Update (#9162) Georgi Gerganov 2024-08-29 07:28:14 +03:00
  • 9fe94ccac9
    docker : build images only once (#9225) slaren 2024-08-28 17:28:00 +02:00
  • 66b039a501
    docker : update CUDA images (#9213) slaren 2024-08-28 13:20:36 +02:00
  • 20f1789dfb
    vulkan : fix build (#0) Georgi Gerganov 2024-08-27 22:10:58 +03:00
  • 231cff5f6f
    sync : ggml Georgi Gerganov 2024-08-27 22:01:45 +03:00
  • 3246fe84d7
    Fix minicpm example directory (#9111) Xie Yanbo 2024-08-27 20:33:08 +08:00
  • 78eb487bb0
    llama : fix qs.n_attention_wv for DeepSeek-V2 (#9156) compilade 2024-08-27 06:09:23 -04:00
  • a77feb5d71
    server : add some missing env variables (#9116) Xuan Son Nguyen 2024-08-27 11:07:01 +02:00
  • 2e59d61c1b
    llama : fix ChatGLM4 wrong shape (#9194) CausalLM 2024-08-27 14:58:22 +08:00
  • 75e1dbbaab
    llama : fix llama3.1 rope_freqs not respecting custom head_dim (#9141) Carsten Kragelund Jørgensen 2024-08-27 08:53:40 +02:00
  • ad76569f8e
    common : Update stb_image.h to latest version (#9161) arch-btw 2024-08-26 22:58:50 -07:00
  • 7d787ed96c
    ggml : do not crash when quantizing q4_x_x with an imatrix (#9192) slaren 2024-08-26 19:44:43 +02:00
  • 06658ad7c3
    metal : separate scale and mask from QKT in FA kernel (#9189) Georgi Gerganov 2024-08-26 18:31:02 +03:00
  • fc18425b6a
    ggml : add SSM Metal kernels (#8546) Georgi Gerganov 2024-08-26 17:55:36 +03:00
  • 879275ac98
    tests : fix compile warnings for unreachable code (#9185) Georgi Gerganov 2024-08-26 16:30:25 +03:00
  • 7a3df798fc
    ci : add VULKAN support to ggml-ci (#9055) Georgi Gerganov 2024-08-26 12:19:39 +03:00
  • e5edb210cd
    server : update deps (#9183) Georgi Gerganov 2024-08-26 12:16:57 +03:00
  • 0c41e03ceb
    metal : gemma2 flash attention support (#9159) slaren 2024-08-26 11:08:59 +02:00
  • f12ceaca0c
    ggml-ci : try to improve build time (#9160) slaren 2024-08-26 11:03:30 +02:00
  • 436787f170
    llama : fix time complexity of string replacement (#9163) Justine Tunney 2024-08-25 23:09:53 -07:00
  • 93bc3839f9
    common: fixed not working find argument --n-gpu-layers-draft (#9175) Herman Semenov 2024-08-25 22:54:37 +00:00
  • f91fc5639b
    CUDA: fix Gemma 2 numerical issues for FA (#9166) Johannes Gäßler 2024-08-25 22:11:48 +02:00
  • e11bd856d5
    CPU/CUDA: Gemma 2 FlashAttention support (#8542) Johannes Gäßler 2024-08-24 21:34:59 +02:00
  • 8f824ffe8e
    quantize : fix typo in usage help of quantize.cpp (#9145) João Dinis Ferreira 2024-08-24 07:22:45 +01:00
  • 3ba780e2a8
    lora : fix llama conversion script with ROPE_FREQS (#9117) Xuan Son Nguyen 2024-08-23 12:58:53 +02:00
  • a07c32ea54
    llama : use F32 precision in GLM4 attention and no FA (#9130) piDack 2024-08-23 15:27:17 +08:00
  • 11b84eb457
    [SYCL] Add a space to supress a cmake warning (#9133) Akarshan Biswas 2024-08-22 19:39:47 +05:30
  • 1731d4238f
    [SYCL] Add oneDNN primitive support (#9091) luoyu-intel 2024-08-22 12:50:10 +08:00
  • a1631e53f6
    llama : simplify Mamba with advanced batch splits (#8526) compilade 2024-08-21 17:58:11 -04:00
  • fc54ef0d1c
    server : support reading arguments from environment variables (#9105) Xuan Son Nguyen 2024-08-21 11:04:34 +02:00
  • b40eb84895
    llama : support for falcon-mamba architecture (#9074) Younes Belkada 2024-08-21 12:06:36 +04:00
  • f63f603c87
    llava : zero-initialize clip_ctx structure fields with aggregate initialization (ggml/908) fairydreaming 2024-08-21 09:45:49 +02:00
  • 8455340b87
    llama : std::move llm_bigram_bpe from work_queue (#9062) Daniel Bevenius 2024-08-21 09:32:58 +02:00
  • 2f3c1466ff
    llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. (#8984) Changyeon Kim 2024-08-21 04:00:00 +09:00
  • 50addec9a5
    [SYCL] fallback mmvq (#9088) Meng, Hengyu 2024-08-20 23:50:17 +08:00
  • 4f8d19ff17
    [SYCL] Fix SYCL im2col and convert Overflow with Large Dims (#9052) zhentaoyu 2024-08-20 23:06:51 +08:00
  • 90db8146d5
    tests : add missing comma in grammar integration tests (#9099) fairydreaming 2024-08-20 11:09:55 +02:00
  • cfac111e2b
    cann: add doc for cann backend (#8867) wangshuai09 2024-08-19 16:46:38 +08:00
  • 1b6ff90ff8
    rpc : print error message when failed to connect endpoint (#9042) Radoslav Gerganov 2024-08-19 10:11:45 +03:00
  • 18eaf29f4c
    rpc : prevent crashes on invalid input (#9040) Radoslav Gerganov 2024-08-19 10:10:21 +03:00
  • 554b049068
    flake.lock: Update (#9068) Georgi Gerganov 2024-08-18 17:43:32 +03:00
  • 2339a0be1c
    tests : add integration test for lora adapters (#8957) ltoniazzi 2024-08-18 10:58:04 +01:00
  • 2fb9267887
    Fix incorrect use of ctx_split for bias tensors (#9063) Yoshi Suhara 2024-08-17 06:34:21 -07:00
  • 8b3befc0e2
    server : refactor middleware and /health endpoint (#9056) Xuan Son Nguyen 2024-08-16 17:19:05 +02:00
  • d565bb2fd5
    llava : support MiniCPM-V-2.6 (#8967) tc-mb 2024-08-16 21:34:41 +08:00
  • ee2984bdaf
    py : fix wrong input type for raw_dtype in ggml to gguf scripts (#8928) Farbod Bijary 2024-08-16 14:06:30 +03:30
  • c8ddce8560
    Fix inference example lacks required parameters (#9035) Aisuko 2024-08-16 19:08:59 +10:00
  • 23fd453544
    gguf-py : bump version from 0.9.1 to 0.10.0 (#9051) compilade 2024-08-16 02:36:11 -04:00
  • c679e0cb5c
    llama : add EXAONE model support (#9025) Minsoo Cheong 2024-08-16 15:35:18 +09:00
  • fb487bb567
    common : add support for cpu_get_num_physical_cores() on Windows (#8771) Liu Jia 2024-08-16 14:23:12 +08:00
  • 2a24c8caa6
    Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922) Yoshi Suhara 2024-08-15 19:23:33 -07:00
  • e3f6fd56b1
    ggml : dynamic ggml_sched_max_splits based on graph_size (#9047) Nico Bosshard 2024-08-16 04:22:55 +02:00
  • 4b9afbbe90
    retrieval : fix memory leak in retrieval query handling (#8955) gtygo 2024-08-15 15:40:12 +08:00
  • 37501d9c79
    server : fix duplicated n_predict key in the generation_settings (#8994) Riceball LEE 2024-08-15 15:28:05 +08:00
  • 4af8420afb
    common : remove duplicate function llama_should_add_bos_token (#8778) Zhenwei Jin 2024-08-15 15:23:23 +08:00
  • 6bda7ce6c3
    llama : add pre-tokenizer regexes for BLOOM and gpt3-finnish (#8850) Esko Toivonen 2024-08-15 10:17:12 +03:00
  • d5492f0525
    ci : disable bench workflow (#9010) Georgi Gerganov 2024-08-15 10:11:11 +03:00
  • 234b30676a
    server : init stop and error fields of the result struct (#9026) Jiří Podivín 2024-08-15 08:21:57 +02:00
  • 5fd89a70ea
    Vulkan Optimizations and Fixes (#8959) 0cc4m 2024-08-14 18:32:53 +02:00
  • 98a532d474
    server : fix segfault on long system prompt (#8987) compilade 2024-08-14 02:51:02 -04:00
  • 43bdd3ce18
    cmake : remove unused option GGML_CURL (#9011) Georgi Gerganov 2024-08-14 09:14:49 +03:00
  • 06943a69f6
    ggml : move rope type enum to ggml.h (#8949) Daniel Bevenius 2024-08-13 21:13:15 +02:00
  • 828d6ff7d7
    export-lora : throw error if lora is quantized (#9002) Xuan Son Nguyen 2024-08-13 11:41:14 +02:00
  • fc4ca27b25
    ci : fix github workflow vulnerable to script injection (#9008) Diogo Teles Sant'Anna 2024-08-12 13:28:23 -03:00
  • 1f67436c5e
    ci : enable RPC in all of the released builds (#9006) Radoslav Gerganov 2024-08-12 19:17:03 +03:00
  • 0fd93cdef5
    llama : model-based max number of graph nodes calculation (#8970) Nico Bosshard 2024-08-12 17:13:59 +02:00
  • 84eb2f4fad
    docs: introduce gpustack and gguf-parser (#8873) Frank Mai 2024-08-12 20:45:50 +08:00
  • 1262e7ed13
    grammar-parser : fix possible null-deref (#9004) DavidKorczynski 2024-08-12 13:36:41 +01:00
  • df5478fbea
    ggml: fix div-by-zero (#9003) DavidKorczynski 2024-08-12 13:21:41 +01:00
  • 2589292cde
    Fix a spelling mistake (#9001) Liu Jia 2024-08-12 17:46:03 +08:00
  • d3ae0ee8d7
    py : fix requirements check '==' -> '~=' (#8982) Georgi Gerganov 2024-08-12 11:02:01 +03:00
  • 5ef07e25ac
    server : handle models with missing EOS token (#8997) Georgi Gerganov 2024-08-12 10:21:50 +03:00
  • 4134999e01
    gguf-py : Numpy dequantization for most types (#8939) compilade 2024-08-11 14:45:41 -04:00
  • 8cd1bcfd3f
    flake.lock: Update (#8979) Georgi Gerganov 2024-08-11 16:58:58 +03:00
  • a21c6fd450
    update guide (#8909) Neo Zhang 2024-08-11 16:37:43 +08:00
  • 33309f661a
    llama : check all graph nodes when searching for result_embd_pooled (#8956) fairydreaming 2024-08-11 10:35:26 +02:00
  • 7c5bfd57f8
    Optimize Vulkan backend for better CPU performance and less GPU synchronization overhead. (#8943) Markus Tavenrath 2024-08-11 10:09:09 +02:00
  • 6e02327e8b
    metal : fix uninitialized abort_callback (#8968) slaren 2024-08-10 15:42:10 +02:00
  • 7eb23840ed
    llama : default n_swa for phi-3 (#8931) Xuan Son Nguyen 2024-08-10 13:04:40 +02:00
  • 7c3f55c100
    Add support for encoder-only T5 models (#8900) fairydreaming 2024-08-10 11:43:26 +02:00
  • 911b437f22
    gguf-py : fix double call to add_architecture() (#8952) Matteo Mortari 2024-08-10 07:58:49 +02:00
  • b72942fac9
    Merge commit from fork Georgi Gerganov 2024-08-09 23:03:21 +03:00
  • 6afd1a99dc
    llama : add support for lora adapters in T5 model (#8938) fairydreaming 2024-08-09 18:53:09 +02:00
  • 272e3bd95e
    make : fix llava obj file race (#8946) Georgi Gerganov 2024-08-09 18:24:30 +03:00
  • 45a55b91aa
    llama : better replace_all (cont) (#8926) Georgi Gerganov 2024-08-09 18:23:52 +03:00
  • 3071c0a5f2
    llava : support MiniCPM-V-2.5 (#7599) tc-mb 2024-08-09 18:33:53 +08:00
  • 4305b57c80
    sync : ggml Georgi Gerganov 2024-08-09 10:03:48 +03:00
  • 70c0ea3560
    whisper : use vulkan as gpu backend when available (whisper/2302) Matt Stephenson 2024-07-16 03:21:09 -04:00
  • 5b2c04f492
    embedding : add --pooling option to README.md [no ci] (#8934) Daniel Bevenius 2024-08-09 08:33:30 +02:00
  • 6f6496bb09
    llama : fix typo in llama_tensor_get_type comment [no ci] (#8937) Daniel Bevenius 2024-08-09 08:32:23 +02:00
  • daef3ab233
    server : add one level list nesting for embeddings (#8936) Mathieu Geli 2024-08-09 08:32:02 +02:00
  • 345a686d82
    llama : reduce useless copies when saving session (#8916) compilade 2024-08-08 23:54:00 -04:00
  • 3a14e00366
    gguf-py : simplify support for quant types (#8838) compilade 2024-08-08 13:33:09 -04:00
  • afd27f01fe
    scripts : sync cann files (#0) Georgi Gerganov 2024-08-08 14:56:52 +03:00
  • 366d486c16
    scripts : fix sync filenames (#0) Georgi Gerganov 2024-08-08 14:40:12 +03:00
  • e44a561ab0
    sync : ggml Georgi Gerganov 2024-08-08 13:19:47 +03:00
  • f93d49ab1e
    ggml : ignore more msvc warnings (ggml/906) Borislav Stanimirov 2024-08-07 10:00:56 +03:00
  • 5b33ea1ee7
    metal : fix struct name (ggml/912) Georgi Gerganov 2024-08-07 09:57:00 +03:00
  • 85fca8deb6
    metal : add abort callback (ggml/905) Conrad Kramer 2024-08-07 02:55:49 -04:00
  • ebd541a570
    make : clean llamafile objects (#8923) Pablo Duboue 2024-08-08 04:44:51 -04:00