Commit graph

  • 128dcbd3c9
    add --no-mmap in llama-bench (#5257) Neo Zhang Jianyu 2024-02-02 03:48:53 +08:00
  • 4d0924a890
    Vulkan Phi Fix for AMD Proprietary Drivers (#5260) 0cc4m 2024-02-01 19:25:24 +01:00
  • 8ca511cade
    cuda : fix LLAMA_CUDA_F16 (#5262) slaren 2024-02-01 18:30:17 +01:00
  • d71ac90985
    make : generate .a library for static linking (#5205) Ali Nehzat 2024-02-02 02:18:53 +11:00
  • ce32060198
    llama : support InternLM2 (#5184) Guoteng 2024-02-01 17:19:51 +08:00
  • 1cfb5372cf
    Fix broken Vulkan Cmake (properly) (#5230) Eve 2024-01-31 19:21:55 +00:00
  • d3bac7d584
    llama : reorder build_orion() at correct place (#5118) Georgi Gerganov 2024-01-31 18:47:10 +02:00
  • 5cb04dbc16
    llama : remove LLAMA_MAX_DEVICES and LLAMA_SUPPORTS_GPU_OFFLOAD (#5240) Georgi Gerganov 2024-01-31 17:30:17 +02:00
  • efb7bdbbd0
    metal : add im2col F32 dst support (#5132) Georgi Gerganov 2024-01-31 15:35:41 +02:00
  • 15606309a0
    llava : add MobileVLM support (#5132) JidongZhang-THU 2024-01-31 21:10:15 +08:00
  • b2b9f025e7
    format license text, restore apache license by legal suggestion (#5233) Neo Zhang Jianyu 2024-01-31 21:04:46 +08:00
  • dabcc5b471
    ggml : limit n_threads to the max n_tasks (#5238) slaren 2024-01-31 13:43:03 +01:00
  • f8e9140cb4
    Vulkan Fixes (#5223) 0cc4m 2024-01-31 11:44:19 +01:00
  • d62520eb2c
    Fix typos of IQ2_XXS and IQ3_XXS in llama.cpp (#5231) Yiming Cui 2024-01-31 11:04:21 +08:00
  • 01684139c3
    support SYCL backend windows build (#5208) Neo Zhang Jianyu 2024-01-31 10:38:07 +08:00
  • e8dc55d006
    kompute : llama-bench support and ggml_cpu_has_kompute() (#5226) Jared Van Bortel 2024-01-30 19:04:37 -05:00
  • e0085fdf7c
    Revert "server : change deps.sh xxd files to string literals (#5221)" Georgi Gerganov 2024-01-30 21:19:26 +02:00
  • e6f291d158
    server : fix context shift (#5195) Georgi Gerganov 2024-01-30 20:17:30 +02:00
  • 4003be0e5f
    server : change deps.sh xxd files to string literals (#5221) JohnnyB 2024-01-30 12:15:05 -06:00
  • fea4fd4ba7
    ggml : fix IQ3_XXS on Metal (#5219) Kawrakow 2024-01-30 19:15:28 +02:00
  • 8f8ddfcfad
    sync : ggml (#0) Georgi Gerganov 2024-01-30 16:21:57 +02:00
  • 6fb50ebbf0
    gguf : fix comparison (ggml/715) Georgi Gerganov 2024-01-29 21:08:18 +02:00
  • 625a699b54
    ggml_cuda_cpy support for 4d tensors and float16->float32 upcasting (ggml/686) John Balis 2024-01-29 06:37:33 -06:00
  • a4b07c057a
    gguf : add input validation, prevent integer overflows (ggml/709) Georgi Gerganov 2024-01-29 14:00:10 +02:00
  • 549a1e6cd5
    ci : fix yolo URLs + fix metal capture (ggml/712) Georgi Gerganov 2024-01-29 13:29:46 +02:00
  • 5f14ee0b0c
    metal : add debug capture backend function (ggml/694) Jack Mousseau 2024-01-29 01:22:23 -08:00
  • 8e14e3ddb3
    Faster AVX2 dot product for IQ2_XS (#5187) Kawrakow 2024-01-30 15:15:07 +02:00
  • f4d7e54974
    SOTA 3-bit quants (#5196) Kawrakow 2024-01-30 15:14:12 +02:00
  • 2256f36b79
    Vulkan Windows APU Memory Handling (#5199) 0cc4m 2024-01-30 13:59:30 +01:00
  • 7359016c7c
    quantize : fix typo (#5211) Vladimir Malyutin 2024-01-30 17:57:07 +07:00
  • 813416991a
    main : allow empty --prompt-cache file (#5176) divinity76 2024-01-30 10:18:02 +01:00
  • 5589921ef8
    readme : minor (#5204) Romain Neutron 2024-01-30 10:16:38 +01:00
  • 49f44b5c55
    readme : update hot topics Georgi Gerganov 2024-01-30 11:14:44 +02:00
  • 6685cc41c2
    server : improve README (#5209) Wu Jian Ping 2024-01-30 17:11:46 +08:00
  • ceebbb5b21
    ggml alloc: Fix for null dereference on alloc failure (#5200) Paul Tsochantaris 2024-01-29 22:19:29 +00:00
  • 6daa69ee81
    kompute : fix fallback to CPU (#5201) Jared Van Bortel 2024-01-29 17:11:27 -05:00
  • fbf1ddec69
    Nomic Vulkan backend (#4456) Jared Van Bortel 2024-01-29 15:50:50 -05:00
  • 2aed77eb06
    fix typo "RLIMIT_MLOCK" (#5175) divinity76 2024-01-29 15:45:41 +01:00
  • c82d18e863
    server : embeddings compatibility for OpenAI (#5190) Wu Jian Ping 2024-01-29 21:48:10 +08:00
  • 14fef85e2d
    py : fix except (#5194) Georgi Gerganov 2024-01-29 15:35:54 +02:00
  • e76627bcce
    py : improve BPE tokenizer support (#5189) Sang-Kil Park 2024-01-29 18:24:19 +09:00
  • fbe7dfa53c
    ggml : add max buffer sizes to opencl and metal backends (#5181) slaren 2024-01-29 09:05:13 +01:00
  • 172ac82629
    cmake : fix Vulkan build (#5182) Eve 2024-01-29 08:04:47 +00:00
  • d2f650cb5b
    metal : free metal objects (#5161) Paul Tsochantaris 2024-01-28 19:50:16 +00:00
  • 35dec26cc2
    sync : ggml Georgi Gerganov 2024-01-28 19:48:05 +02:00
  • d460510c72
    ggml : minor type fix (int64_t -> size_t) Georgi Gerganov 2024-01-28 18:44:58 +02:00
  • 2307523d32
    ggml : add Vulkan backend (#2059) 0cc4m 2024-01-28 18:03:59 +01:00
  • 0f648573dd
    ggml : add unified SYCL backend for Intel GPUs (#2690) Abhilash Majumder 2024-01-28 21:26:23 +05:30
  • b764b8f1d0
    flake.lock: Update (#5162) Georgi Gerganov 2024-01-28 16:54:54 +02:00
  • 9241c3a2ac
    Apply min_p to unsorted tokens (#5115) Johannes Gäßler 2024-01-28 09:59:49 +01:00
  • b2b2bf988c
    Tests for min_p, sampling queue (#5147) Johannes Gäßler 2024-01-28 09:35:14 +01:00
  • af4980bfed
    readme : add link to rust bindings (#5148) Marcus Dunn 2024-01-28 00:30:44 -08:00
  • f2e69d28c0
    llama : add support for Orion-14B (#5118) sharpHL 2024-01-28 16:00:30 +08:00
  • 39baaf55a1
    docker : add server-first container images (#5157) Kyle Mistele 2024-01-28 01:55:31 -06:00
  • 6db2b41a76
    llava : support for Yi-VL and fix for mobileVLM (#5093) John 2024-01-27 16:09:18 +01:00
  • 753eafed0e
    sync : ggml Georgi Gerganov 2024-01-27 16:59:20 +02:00
  • e976423005
    ggml : check ggml_add src1 type (ggml/708) Judd 2024-01-26 21:04:01 +08:00
  • 35a2ee9143
    Remove unused data and add fixes (#5154) Michael Klimenko 2024-01-27 15:25:55 +01:00
  • ec903c0341
    server : add self-extend support (#5104) Maximilian Winter 2024-01-27 14:38:05 +01:00
  • a1d6df129b
    Add OpenCL add kernel (#5151) 0cc4m 2024-01-26 23:07:32 +01:00
  • bbe7c56c99
    cmake : pass CPU architecture flags to nvcc (#5146) Jared Van Bortel 2024-01-26 15:34:06 -05:00
  • 62fead3ea0
    cuda : fix tensor size calculation for non-split buffer (#5145) slaren 2024-01-26 18:59:43 +01:00
  • 15b4538ff2
    ggml-alloc : add 10% margin to the buffer sizes (#5149) slaren 2024-01-26 18:18:26 +01:00
  • 7032f4f634
    ggml : update softmax n_task calculation (#5126) snadampal 2024-01-26 11:17:59 -06:00
  • 5f1925a8ce
    scripts : move run-with-preset.py from root to scripts folder Georgi Gerganov 2024-01-26 17:09:44 +02:00
  • 3b7c914de2
    tests : gitignore test-c.o Georgi Gerganov 2024-01-26 14:48:15 +02:00
  • 48c857aa10
    server : refactored the task processing logic (#5065) Xuan Son Nguyen 2024-01-26 13:42:20 +01:00
  • 413e7b0559
    ci : add model tests + script wrapper (#4586) crasm 2024-01-26 07:18:00 -05:00
  • 6dd3c28c9c
    metal : remove unused n_buffers and buffers (#5129) Paul Tsochantaris 2024-01-26 12:16:07 +00:00
  • 38b431de23
    gguf : fix "general.alignment" type in gguf_reader.py (#5136) Riceball LEE 2024-01-26 17:10:28 +08:00
  • aad0b01d73
    readme : update hot topics Georgi Gerganov 2024-01-26 10:52:33 +02:00
  • 1182cf4d4f
    Another bucket sort (#5109) Kawrakow 2024-01-26 09:14:39 +02:00
  • fe54033b69
    readme : add MobileVLM 1.7B/3B to the supported models list (#5107) XiaotaoChen 2024-01-26 04:14:32 +08:00
  • 5eaf9964fc
    llama : dynamic temperature sampling (#4972) l3utterfly 2024-01-26 05:06:22 +09:00
  • d292f4f204
    examples : make pydantic scripts pass mypy and support py3.8 (#5099) Jared Van Bortel 2024-01-25 14:51:24 -05:00
  • 256d1bb0dd
    android : use release cmake build type by default (#5123) Valentin Konovalov 2024-01-25 12:05:51 -05:00
  • faa3526a1e
    Fix Q3_K_XS for MoE models (#5113) Kawrakow 2024-01-25 17:58:53 +02:00
  • ddc5a5033f
    metal : show compile log messages Georgi Gerganov 2024-01-25 11:26:17 +02:00
  • cd4fddb29f
    cuda : fix 2-bit quants on amd hip (#5105) Engininja2 2024-01-24 16:18:15 -06:00
  • c9b316c78f
    nix-shell: use addToSearchPath Michael Hueschen 2024-01-22 16:44:10 -07:00
  • bf63d695b8
    nix: add cc to devShell LD_LIBRARY_PATH Michael Hueschen 2024-01-22 03:17:05 -07:00
  • 1387ea2117
    llama : pre-allocate input tensors in a separate buffer (#5100) slaren 2024-01-24 12:48:14 +01:00
  • 26d607608d
    metal : disable support for MUL_MAT F32 x F16 Georgi Gerganov 2024-01-23 15:50:56 +02:00
  • 44879ee885
    Additional KL-divergence statistics (#5081) Kawrakow 2024-01-23 15:17:20 +02:00
  • 9ecdd12e95
    CUDA: more info when no device code (#5088) Johannes Gäßler 2024-01-23 13:31:56 +01:00
  • 89758723c7
    minor : clean-up some warnings and style (#5094) Georgi Gerganov 2024-01-23 14:12:57 +02:00
  • 2bed4aa3f3
    devops : add intel oneapi dockerfile (#5068) Xuan Son Nguyen 2024-01-23 08:11:39 +01:00
  • 125d03a503
    llama.vim : added api key support (#5090) Michael Coppola 2024-01-23 01:51:27 -05:00
  • 011e8ec577
    llama : fix not enough space in buffer with Qwen (#5086) slaren 2024-01-22 23:42:41 +01:00
  • 6f9939d119
    KL-divergence (#5076) Kawrakow 2024-01-22 16:10:14 +02:00
  • 780e24a22e
    ggml : parallelize FP32 conversion when using BLAS (#5045) Reinforce-II 2024-01-22 21:15:08 +08:00
  • 3ce7e8f8e7
    llava : MobileVLM support (#4954) XiaotaoChen 2024-01-22 21:09:35 +08:00
  • b2d80e105a
    flake.nix: add a comment about flakes vs nix Someone Serge 2024-01-21 03:41:37 +00:00
  • 28603cd283
    nix: add a comment on the many nixpkgs-with-cuda instances Someone Serge 2024-01-21 03:29:38 +00:00
  • 5e97ec91ae
    nix: add a comment about makeScope Someone Serge 2024-01-21 03:15:13 +00:00
  • 7251870780
    nix: refactor the cleanSource rules Someone Serge 2024-01-13 17:45:01 +00:00
  • fe8b3c0d4b
    workflows: nix-ci: drop the redundant "paths" filter Someone Serge 2024-01-13 17:38:32 +00:00
  • f4dd059259
    workflows: nix-build-aarch64: rate limit Someone Serge 2024-01-13 17:16:54 +00:00
  • f7276f7500
    workflows: nix-ci: rebuild on flake.lock updates Someone Serge 2024-01-13 17:10:19 +00:00
  • 15bceec2d7
    imatrix : keep intermediate imatrix results (#5077) Kawrakow 2024-01-22 14:18:43 +02:00