Commit graph

  • afd9909a64
    rpc : backend refactoring (#9912) Radoslav Gerganov 2024-10-18 14:33:58 +03:00
  • 87421a23e8
    [SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705) Ouadie EL FAROUKI 2024-10-18 06:46:16 +01:00
  • 60ce97c9d8
    add amx kernel for gemm (#8998) Ma Mingfei 2024-10-18 13:34:36 +08:00
  • 8901755ba3
    server : add n_indent parameter for line indentation requirement (#9929) Georgi Gerganov 2024-10-18 07:32:19 +03:00
  • 6f55bccbb8
    llama : rename batch_all to batch (#8881) Daniel Bevenius 2024-10-18 01:41:51 +02:00
  • 17bb928080
    readme : remove --memory-f32 references (#9925) Georgi Gerganov 2024-10-17 23:43:05 +03:00
  • 9f45fc1e99
    llama : change warning to debug log Georgi Gerganov 2024-10-17 23:26:32 +03:00
  • 99bd4ac28c
    llama : infill sampling handle very long tokens (#9924) Georgi Gerganov 2024-10-17 22:32:47 +03:00
  • 3752217ed5
    readme : update bindings list (#9918) Tim Wang 2024-10-17 17:57:14 +11:00
  • f010b77a37
    vulkan : add backend registry / device interfaces (#9721) Diego Devesa 2024-10-17 02:46:58 +02:00
  • 2194200278
    fix: allocating CPU buffer with size 0 (#9917) Gilad S. 2024-10-17 02:34:22 +03:00
  • 73afe681aa
    fix: use vm_allocate to allocate CPU backend buffer on macOS (#9875) Gilad S. 2024-10-17 01:36:51 +03:00
  • 9e04102448
    llama : suppress conversion from 'size_t' to 'int' (#9046) Daniel Bevenius 2024-10-16 19:34:28 +02:00
  • dbf18e4de9
    llava : fix typo in error message [no ci] (#9884) Daniel Bevenius 2024-10-16 19:24:05 +02:00
  • 66c2c93082
    grammar : fix JSON Schema for string regex with top-level alt. (#9903) Joe Eli McIlvain 2024-10-16 09:03:24 -07:00
  • 10433e8b45
    llama : add tensor name for "result_norm" (#9907) Molly Sophia 2024-10-16 18:10:21 +08:00
  • 1f66b699c4
    server : fix the disappearance of the end of the text (#9867) Alexey Parfenov 2024-10-16 08:35:53 +00:00
  • 0e41b300ed
    sync : ggml Georgi Gerganov 2024-10-16 11:28:14 +03:00
  • cd60b88bf7
    ggml-alloc : remove buffer_id from leaf_alloc (ggml/987) Daniel Bevenius 2024-10-09 16:40:35 +02:00
  • becfd387f6
    [CANN] Fix cann compilation error (#9891) leo-pony 2024-10-16 08:51:46 +08:00
  • 755a9b2bf0
    llama : add infill sampler (#9896) Georgi Gerganov 2024-10-15 16:35:33 +03:00
  • 223c25a72f
    server : improve infill context reuse (#9894) Georgi Gerganov 2024-10-15 16:28:55 +03:00
  • fbc98b748e
    sampling : add XTC sampler (#9742) MaggotHATE 2024-10-15 15:54:55 +05:00
  • dcdd535302
    server : update preact (#9895) Georgi Gerganov 2024-10-15 12:48:44 +03:00
  • 4c42f93b22
    readme : update bindings list (#9889) Michał Tuszyński 2024-10-15 10:20:34 +02:00
  • a89f75e1b7
    server : handle "logprobs" field with false value (#9871) VoidIsVoid 2024-10-14 15:04:36 +08:00
  • 13dca2a54a
    Vectorize load instructions in dmmv f16 CUDA kernel (#9816) agray3 2024-10-14 01:49:08 +01:00
  • d4c19c0f5c
    server : accept extra_context for the infill endpoint (#9874) Georgi Gerganov 2024-10-13 21:31:35 +03:00
  • c7181bd294
    server : reuse cached context chunks (#9866) Georgi Gerganov 2024-10-13 18:52:48 +03:00
  • 92be9f1216
    flake.lock: Update (#9870) Georgi Gerganov 2024-10-13 06:11:26 +03:00
  • edc265661c
    server : add option to time limit the generation phase (#9865) Georgi Gerganov 2024-10-12 16:14:27 +03:00
  • 1bde94dd02
    server : remove self-extend features (#9860) Georgi Gerganov 2024-10-12 16:06:31 +03:00
  • 95c76e8e92
    server : remove legacy system_prompt feature (#9857) Georgi Gerganov 2024-10-12 14:51:54 +03:00
  • 11ac9800af
    llama : improve infill support and special token detection (#9798) Georgi Gerganov 2024-10-12 08:21:51 +03:00
  • 943d20b411
    musa : update doc (#9856) R0CKSTAR 2024-10-12 13:09:53 +08:00
  • 96776405a1
    ggml : move more prints to the ggml log system (#9839) Diego Devesa 2024-10-11 15:34:45 +02:00
  • 7eee341bee
    common : use common_ prefix for common library functions (#9805) Diego Devesa 2024-10-10 22:57:42 +02:00
  • 0e9f760eb1
    rpc : add backend registry / device interfaces (#9812) Diego Devesa 2024-10-10 20:14:55 +02:00
  • cf8e0a3bb9
    musa: add docker image support (#9685) R0CKSTAR 2024-10-11 02:10:37 +08:00
  • c7499c557c
    examples : do not use common library in simple example (#9803) Diego Devesa 2024-10-10 19:50:49 +02:00
  • c81f3bbb05
    cmake : do not build common library by default when standalone (#9804) Diego Devesa 2024-10-09 18:49:52 +02:00
  • e7022064ab
    perplexity : fix integer overflow (#9783) Georgi Gerganov 2024-10-09 17:00:18 +03:00
  • 3dc48fe75a
    examples : remove llama.vim Georgi Gerganov 2024-10-09 10:55:42 +03:00
  • dca1d4b58a
    ggml : fix BLAS with unsupported types (#9775) Diego Devesa 2024-10-08 14:21:43 +02:00
  • 458367a906
    server : better security control for public deployments (#9776) Xuan Son Nguyen 2024-10-08 13:27:04 +02:00
  • fa42aa6d89
    scripts : fix spelling typo in messages and comments (#9782) standby24x7 2024-10-08 15:19:53 +09:00
  • 6374743747
    ggml : add backend registry / device interfaces to BLAS backend (#9752) Diego Devesa 2024-10-07 21:55:08 +02:00
  • f1af42fa8c
    Update building for Android (#9672) Andrew Minh Nguyen 2024-10-07 09:37:31 -07:00
  • 6279dac039
    flake.lock: Update (#9753) Georgi Gerganov 2024-10-07 19:35:42 +03:00
  • d5ac8cf2f2
    ggml : add metal backend registry / device (#9713) Georgi Gerganov 2024-10-07 18:27:51 +03:00
  • 96b6912103
    metal : single allocation of encode_async block (#9747) Paul Tsochantaris 2024-10-07 13:26:31 +01:00
  • d5cb86844f
    contrib : simplify + minor edits [no ci] Georgi Gerganov 2024-10-06 14:15:27 +03:00
  • f4b2dcdf49
    readme : fix typo [no ci] Georgi Gerganov 2024-10-06 13:49:41 +03:00
  • b6d6c5289f
    sync : llama.cpp Georgi Gerganov 2024-10-06 12:53:28 +03:00
  • b0915d5b51
    vulkan : retry allocation with fallback flags (whisper/2451) SRHMorris 2024-10-06 08:34:20 +01:00
  • 8c475b97b8
    rerank : use [SEP] token instead of [BOS] (#9737) Georgi Gerganov 2024-10-05 15:55:04 +03:00
  • 58b16695e1
    sync : ggml Georgi Gerganov 2024-10-05 15:53:49 +03:00
  • 905f5485b2
    metal : zero-init buffer contexts (whisper/0) Georgi Gerganov 2024-10-05 14:33:54 +03:00
  • 71967c2a6d
    Add Llama Assistant (#9744) Viet-Anh NGUYEN (Andrew) 2024-10-05 01:29:35 +07:00
  • 17880771ad
    sync : ggml Georgi Gerganov 2024-10-04 18:50:25 +03:00
  • 55951c018d
    ggml : fix typo in example usage ggml_gallocr_new (ggml/984) Daniel Bevenius 2024-10-04 15:46:18 +02:00
  • ff565769f2
    ggml : fixes after sync (ggml/983) Diego Devesa 2024-10-04 08:41:40 +02:00
  • f3fdcfaa79
    ci : fine-grant permission (#9710) Xuan Son Nguyen 2024-10-04 11:47:19 +02:00
  • 133c7b46b3
    Fixed RNG seed docs (#9723) Daniel Kleine 2024-10-04 10:54:44 +02:00
  • d5ed2b929d
    metal : remove abort (skip) (ggml/0) Georgi Gerganov 2024-10-03 21:18:19 +03:00
  • 1bb8a64ebf
    sync : ggml Georgi Gerganov 2024-10-03 21:17:49 +03:00
  • fabdc3bda3
    ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980) Johannes Gäßler 2024-10-03 17:29:59 +02:00
  • eee39bdc96
    ggml: refactor cross entropy loss CPU impl. (ggml/976) Johannes Gäßler 2024-10-02 15:32:39 +02:00
  • 5d5ab1e5cc
    metal : fix compute pass descriptor autorelease crash (#9718) Jack Mousseau 2024-10-03 11:01:46 -07:00
  • a7ad553513
    ggml-backend : add device description to CPU backend (#9720) Diego Devesa 2024-10-03 17:39:18 +02:00
  • d6fe7abf04
    ggml: unify backend logging mechanism (#9709) bandoti 2024-10-03 12:39:03 -03:00
  • e3c355ba65
    convert : handle tokenizer merges format from transformers 4.45 (#9696) compilade 2024-10-03 10:22:15 -04:00
  • 841713e1e4
    rpc : enable vulkan (#9714) Radoslav Gerganov 2024-10-03 13:00:52 +03:00
  • 5639971466
    Fixed dequant precision issues in Q4_1 and Q5_1 (#9711) Ouadie EL FAROUKI 2024-10-03 07:50:44 +01:00
  • c83ad6d01e
    ggml-backend : add device and backend reg interfaces (#9707) Diego Devesa 2024-10-03 01:49:47 +02:00
  • a39ab216aa
    llama : reduce compile time and binary size (#9712) Xuan Son Nguyen 2024-10-02 15:49:55 +02:00
  • f536f4c439
    [SYCL] Initial cmake support of SYCL for AMD GPUs (#9658) Alberto Cabrera Pérez 2024-10-02 13:57:18 +01:00
  • 00b7317e63
    vulkan : do not use tensor->extra (#9407) Radoslav Gerganov 2024-10-02 13:49:16 +03:00
  • 76b37d1541
    gguf-split : improve --split and --merge logic (#9619) Zhenwei Jin 2024-10-02 15:21:57 +08:00
  • 148844fe97
    examples : remove benchmark (#9704) Georgi Gerganov 2024-10-02 10:14:44 +03:00
  • 3f1ae2e32c
    Update README.md (#9591) Paweł Wodnicki 2024-10-01 12:18:46 -05:00
  • f1b8c42711
    sync : ggml Georgi Gerganov 2024-10-01 16:09:42 +03:00
  • e98c1c188e
    test: fix OPT_STEP_ADAMW for test-backend-ops (ggml/974) Johannes Gäßler 2024-09-30 09:55:23 +02:00
  • cb00020504
    vulkan : mul_mat: fix UB with small warps (ggml/952) Salvatore Mesoraca 2024-09-30 09:14:09 +02:00
  • 6c5322481a
    ggml : fix ggml_cast (ggml/973) Borislav Stanimirov 2024-09-30 10:11:41 +03:00
  • 7254cdf7e8
    ggml: fix gradient allocation logic (ggml/966) Johannes Gäßler 2024-09-29 23:18:02 +02:00
  • cad341d889
    metal : reduce command encoding overhead (#9698) Georgi Gerganov 2024-10-01 16:00:25 +03:00
  • a90484c6d9
    llama : print correct model type for Llama 3.2 1B and 3B Georgi Gerganov 2024-10-01 11:42:01 +03:00
  • 1927378bcc
    convert : refactor rope_freqs generation (#9396) compilade 2024-10-01 02:31:36 -04:00
  • 6f1d9d71f4
    Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641) serhii-nakon 2024-09-30 21:57:12 +03:00
  • 511636df0c
    ci : reduce severity of unused Pyright ignore comments (#9697) compilade 2024-09-30 14:13:16 -04:00
  • 08a43d05b6
    py : update transfomers version (#9694) vb 2024-09-30 17:03:47 +02:00
  • ace4f4be37
    flake.lock: Update (#9680) Georgi Gerganov 2024-09-30 17:48:49 +03:00
  • 8277a817f1
    console : utf-8 fix for windows stdin (#9690) Ruchira Hasaranga 2024-09-30 13:53:42 +05:30
  • c919d5db39
    ggml : define missing HWCAP flags (#9684) Georgi Gerganov 2024-09-29 21:18:23 +03:00
  • d0b1d663e4
    sync : ggml Georgi Gerganov 2024-09-29 21:16:07 +03:00
  • aaa4099925
    CUDA: remove bad assert (ggml/972) Johannes Gäßler 2024-09-29 19:56:17 +02:00
  • 641002fba8
    vulkan : multithread pipeline creation (ggml/963) Jeff Bolz 2024-09-29 11:50:17 -05:00
  • 0de8b203f1
    vulkan : fix build for GGML_VULKAN_RUN_TESTS, add TFLOPS to log (ggml/961) Jeff Bolz 2024-09-27 02:58:01 -05:00
  • 544f409b4b
    vulkan : argsort barriers must be under uniform control flow (ggml/951) Salvatore Mesoraca 2024-09-26 08:59:42 +02:00