Commit graph

  • a4f569e8a3
    [SYCL] fix no file in win rel (#6314) Neo Zhang Jianyu 2024-03-27 09:47:06 +08:00
  • 32c8486e1f
    wpm : portable unicode tolower (#6305) Jared Van Bortel 2024-03-26 17:46:21 -04:00
  • 557410b8f0
    llama : greatly reduce output buffer memory usage (#6122) compilade 2024-03-26 10:46:41 -04:00
  • 55c1b2a3bb
    IQ1_M: 1.75 bpw quantization (#6302) Kawrakow 2024-03-26 15:21:27 +01:00
  • e097633f63
    convert-hf : fix exception in sentencepiece with added tokens (#6320) Pedro Cuenca 2024-03-26 13:32:19 +01:00
  • d25b1c31b0
    quantize : be able to override metadata by key (#6321) Kawrakow 2024-03-26 13:09:30 +01:00
  • deb7240100
    embedding : adjust n_ubatch value (#6296) Minsoo Cheong 2024-03-26 18:11:46 +09:00
  • 3d032ece8e
    server : add n_discard parameter (#6300) Jan Boon 2024-03-26 16:47:43 +08:00
  • e190f1fca6
    nix: make xcrun visible in Nix sandbox for precompiling Metal shaders (#6118) Joseph Stahl 2024-03-25 20:51:46 -04:00
  • 280345968d
    cuda : rename build flag to LLAMA_CUDA (#6299) slaren 2024-03-26 01:16:01 +01:00
  • b06c16ef9f
    nix: fix blas support (#6281) Christian Kögler 2024-03-25 18:52:45 +01:00
  • 1f2fd4e727
    tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303) Kawrakow 2024-03-25 18:33:15 +01:00
  • 43139cc528
    flake.lock: Update (#6266) Georgi Gerganov 2024-03-25 17:22:27 +02:00
  • 2f34b865b6
    cuda : fix LLAMA_CUDA_F16 build (#6298) slaren 2024-03-25 15:43:22 +01:00
  • ae1f211ce2
    cuda : refactor into multiple files (#6269) slaren 2024-03-25 13:50:23 +01:00
  • ad3a0505e3
    Server: clean up OAI params parsing function (#6284) Xuan Son Nguyen 2024-03-25 09:42:17 +01:00
  • 95ad616cdd
    [SYCL] fix SYCL backend build on windows is break by LOG() error (#6290) Neo Zhang Jianyu 2024-03-25 15:52:41 +08:00
  • 64e7b47c69
    examples : add "retrieval" (#6193) Minsoo Cheong 2024-03-25 16:38:22 +09:00
  • 7733f0c760
    ggml : support AVX512VNNI (#6280) Justine Tunney 2024-03-25 01:39:56 -04:00
  • a32b77c4b2
    Fix heap corruption from wmode out-of-bound writes on windows (#6272) Rick G 2024-03-24 14:45:56 -07:00
  • a0e584defd
    imatrix : fix wname for mul_mat_id ops (#6271) Georgi Gerganov 2024-03-24 16:18:45 +02:00
  • 7aed0ffe68
    Fixed lookup compilation issues on Windows (#6273) Johannes Gäßler 2024-03-24 14:21:17 +01:00
  • ea279d5609
    ci : close inactive issue, increase operations per run (#6270) Pierrick Hymbert 2024-03-24 09:57:06 +01:00
  • 586e7bc561
    sampling : deduplicated code for probability distribution access (#6240) Minsoo Cheong 2024-03-24 17:54:07 +09:00
  • ddf6568510
    [SYCL] offload op (#6217) Meng, Hengyu 2024-03-24 12:04:25 +08:00
  • d03224ac98
    Support build win release for SYCL (#6241) Neo Zhang Jianyu 2024-03-24 09:44:01 +08:00
  • 94d1b3b411
    use _wfopen instead of fopen on Windows (#6248) Jared Van Bortel 2024-03-23 18:48:02 -04:00
  • 95562175f8
    gitignore : gguf-split Georgi Gerganov 2024-03-23 21:35:23 +02:00
  • f482bb2e49
    common: llama_load_model_from_url split support (#6192) Pierrick Hymbert 2024-03-23 18:07:00 +01:00
  • 1997577d5e
    server: docs: --threads and --threads, --ubatch-size, --log-disable (#6254) Pierrick Hymbert 2024-03-23 18:00:38 +01:00
  • 476b0251b2
    llama : add grok-1 support (#6204) Julius Arkenberg 2024-03-23 17:41:53 +01:00
  • 21cad01b6e
    split: add gguf-split in the make build target (#6262) Pierrick Hymbert 2024-03-23 17:18:13 +01:00
  • 1b26aebe4d
    server: flush stdout after logging in both text and json layout (#6253) Pierrick Hymbert 2024-03-23 13:18:45 +01:00
  • 50ccaf5eac
    lookup: complement data from context with general text statistics (#5479) Johannes Gäßler 2024-03-23 01:24:36 +01:00
  • 56a00f0a2f
    common : default --hf-file to --model (#6234) Georgi Gerganov 2024-03-22 21:10:39 +02:00
  • 92397d87a4
    convert-llama2c-to-ggml : enable conversion of GQA models (#6237) fraxy-v 2024-03-22 20:49:06 +02:00
  • 1d0331c12a
    quantize: options for output and token embedding tensors qtype (#6239) Kawrakow 2024-03-22 19:47:14 +01:00
  • dba1af6129
    llama_model_loader: support multiple split/shard GGUFs (#6187) Pierrick Hymbert 2024-03-22 19:00:01 +01:00
  • ee804f6223
    ci: apply concurrency limit for github workflows (#6243) Minsoo Cheong 2024-03-23 02:15:06 +09:00
  • 80bd33bc2c
    common : add HF arg helpers (#6234) Georgi Gerganov 2024-03-22 15:33:38 +02:00
  • e80f06d2a1
    llama : correction of the attn.v.weight quantization for IQ3_XS (#6209) Nexesenex 2024-03-22 14:32:02 +01:00
  • f77a8ffd3b
    tests : conditional python & node json schema tests (#6207) Olivier Chafik 2024-03-22 13:09:07 +00:00
  • 72114edf06
    json-schema-to-grammar : fix order of props + non-str const/enum (#6232) Olivier Chafik 2024-03-22 13:07:44 +00:00
  • 2f0e81e053
    cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208) slaren 2024-03-22 14:05:31 +01:00
  • 29ab270e65
    readme : add RecurseChat to the list of UIs (#6219) Xiaoyi Chen 2024-03-22 04:29:49 -07:00
  • 6b8bb3a31d
    server : fix n_keep always showing as 0 in response (#6211) Jan Boon 2024-03-22 19:12:05 +08:00
  • 68e210b354
    server : enable continuous batching by default (#6231) Georgi Gerganov 2024-03-22 13:08:28 +02:00
  • b3e94f26ba
    metal : proper assert for mat-mat memory alignment (#6225) Georgi Gerganov 2024-03-22 11:35:53 +02:00
  • b2075fd6a5
    ci : add CURL flag for the mac builds (#6214) Vaibhav Srivastav 2024-03-22 08:53:43 +01:00
  • 95d576b48e
    metal : pad n_ctx by 32 (#6177) Georgi Gerganov 2024-03-22 09:36:03 +02:00
  • 59c17f02de
    add blog link (#6222) Neo Zhang Jianyu 2024-03-22 15:19:37 +08:00
  • fa046eafbc
    Fix params underscore convert to dash. (#6203) DAN™ 2024-03-21 21:32:42 -04:00
  • be07a03217
    server : update readme doc from slot_id to id_slot (#6213) Jan Boon 2024-03-22 06:41:24 +08:00
  • d0a71233fb
    cuda : disable host register by default (#6206) slaren 2024-03-21 19:54:28 +01:00
  • f372c49ccd
    Corrected typo to wrong file (#6199) semidark 2024-03-21 11:52:35 -06:00
  • 924ce1dce7
    tests : disable system() calls (#6198) Georgi Gerganov 2024-03-21 16:20:05 +02:00
  • 03a8f8fafe
    cuda : fix LLAMA_CUDA_F16 build (#6197) slaren 2024-03-21 13:59:53 +01:00
  • cfd3be76e3
    ggml : same IQ4_NL quantization for CPU/CUDA/Metal (#6196) Kawrakow 2024-03-21 13:59:38 +01:00
  • 5b7b0ac8df
    json-schema-to-grammar improvements (+ added to server) (#5978) Olivier Chafik 2024-03-21 11:50:43 +00:00
  • 1943c01981
    ci : fix indentation error (#6195) Vaibhav Srivastav 2024-03-21 10:30:40 +01:00
  • 5e43ba8742
    build : add mac pre-build binaries (#6182) Vaibhav Srivastav 2024-03-21 10:13:12 +01:00
  • 76aa30a263
    Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183) Kawrakow 2024-03-21 08:27:57 +01:00
  • c5b8595e3f
    Add nvidia and amd backends (#6157) AidanBeltonS 2024-03-21 06:10:52 +00:00
  • 42e21c6882
    cuda : fix conflict with std::swap (#6186) slaren 2024-03-21 01:47:46 +01:00
  • 1c51f98adc
    cuda : print the returned error when CUDA initialization fails (#6185) slaren 2024-03-20 21:03:26 +01:00
  • f9c7ba3447
    llava : update MobileVLM-README.md (#6180) Ziang Wu 2024-03-20 23:29:51 +08:00
  • 272935b281
    llava : add MobileVLM_V2 backup (#6175) Ziang Wu 2024-03-20 23:02:32 +08:00
  • ccf58aa3ec
    cuda : refactor to remove global resources (#6170) slaren 2024-03-20 14:42:59 +01:00
  • 91f8ad167d
    Server: version bump for httplib and json (#6169) Xuan Son Nguyen 2024-03-20 13:30:36 +01:00
  • 6b7e76d28c
    gitignore : ignore curl-related files Georgi Gerganov 2024-03-20 14:17:34 +02:00
  • bc0baab2ea
    server : allow to override -ngl in tests (#6170) Georgi Gerganov 2024-03-20 14:14:32 +02:00
  • d795988d9e
    Revert "llava : add a MobileVLM_V2-1.7B backup (#6152)" Georgi Gerganov 2024-03-20 13:29:49 +02:00
  • f8c4e745e1
    llava : add a MobileVLM_V2-1.7B backup (#6152) Ziang Wu 2024-03-20 19:20:37 +08:00
  • 47cc7a7bf9
    Server: Handle n_keep parameter in the request (#6174) Karthick 2024-03-20 16:32:34 +05:30
  • bd60d82d0c
    server tests : more pythonic process management; fix bare except: (#6146) Jared Van Bortel 2024-03-20 01:33:49 -04:00
  • 6c0b287748
    update readme sycl for new update (#6151) Neo Zhang Jianyu 2024-03-20 11:21:41 +08:00
  • d26e8b669d
    increase igpu cluster limit (#6159) Abhilash Majumder 2024-03-20 08:28:49 +05:30
  • d8b009a945
    Remove undeed header file. (#6158) DAN™ 2024-03-19 12:16:09 -04:00
  • d0d5de42e5
    gguf-split: split and merge gguf per batch of tensors (#6135) Pierrick Hymbert 2024-03-19 12:05:44 +01:00
  • b80cf3b2d1
    common : disable repeat penalties by default (#6127) Georgi Gerganov 2024-03-19 10:21:54 +02:00
  • 970a48060a
    ci : exempt some labels from being tagged as stale (#6140) slaren 2024-03-19 09:06:54 +01:00
  • 4c28b82529
    common : print usage on '-h' and '--help' (#6145) DAN™ 2024-03-19 01:59:36 -04:00
  • 2d15886bb0
    flake.lock: Update github-actions[bot] 2024-03-17 06:37:44 +00:00
  • d199ca79f2
    mpt : implement backwards compatiblity with duped output tensor (#6139) Jared Van Bortel 2024-03-18 12:49:02 -04:00
  • 104f5e0fc1
    clip : fix memory leak (#6138) Felix 2024-03-18 16:40:22 +01:00
  • 5e1b7f94a0
    backend : set max split inputs to GGML_MAX_SRC (#6137) slaren 2024-03-18 16:33:44 +01:00
  • ac9ee6a4ad
    ci : disable stale issue messages (#6126) Georgi Gerganov 2024-03-18 13:45:38 +02:00
  • 4f6d1337ca
    ci : temporary disable sanitizer builds (#6128) Georgi Gerganov 2024-03-18 13:45:27 +02:00
  • 2bf8d0f7c4
    backend : offload large batches to GPU (#6083) slaren 2024-03-18 11:03:04 +01:00
  • 496bc79bc2
    common : tidy-up argument parsing (#6105) DAN™ 2024-03-18 04:27:44 -04:00
  • 9b03719ad7
    convert : add support for CamembertModel architecture (#6119) Thérence 2024-03-18 09:17:00 +01:00
  • 3a6efdd03c
    convert : use f32 outtype for bf16 tensors (#6106) Romain D 2024-03-18 09:04:41 +01:00
  • d01b3c4c32
    common: llama_load_model_from_url using --model-url (#6098) Pierrick Hymbert 2024-03-17 19:12:37 +01:00
  • cd776c37c9
    ci : close all stale issues at once (#6115) Georgi Gerganov 2024-03-17 19:51:57 +02:00
  • dc0f612548
    ggml:fix finding transfer queue family index error (#6094) GainLee 2024-03-18 01:12:22 +08:00
  • c47cf414ef
    ggml : add AVX512F SIMD (#6088) AmirAli Mirian 2024-03-16 11:52:02 -04:00
  • b5f4ae09c3
    gritlm : add initial README.md (#6086) Daniel Bevenius 2024-03-16 16:46:29 +01:00
  • dfbfdd60f9
    readme : add wllama as a wasm binding (#6100) Xuan Son Nguyen 2024-03-16 16:42:08 +01:00
  • 15961ec04d
    common : refactor nested if causing error C1061 on MSVC (#6101) DAN™ 2024-03-16 11:39:15 -04:00
  • a56d09a440
    ci : close inactive issue with workflow (#6053) Pierrick Hymbert 2024-03-16 13:20:53 +01:00