Commit graph

  • edd1ab7bc3 flake.lock: update Someone Serge 2023-12-31 17:42:22 +00:00
  • 198ed7ebfc flake.nix: suggest the binary caches Someone Serge 2023-12-30 18:25:25 +00:00
  • d836174731 workflows: nix-ci: add a qemu job for jetsons Someone Serge 2023-12-30 18:01:07 +00:00
  • 06f2a5d190 workflows: nix-flakestry: drop tag filters Someone Serge 2023-12-30 17:36:08 +00:00
  • c5239944ba workflows: weekly nix flake update Someone Serge 2023-12-30 16:38:36 +00:00
  • 1e9ae54cf2 workflows: nix-ci: add a job for eval Someone Serge 2023-12-30 17:19:11 +00:00
  • 7adedecbe3 workflows: nix-ci: init; build flake outputs Someone Serge 2023-12-26 19:17:26 +00:00
  • 356ea17e0f flake.nix: expose checks Someone Serge 2023-12-29 16:21:50 +00:00
  • a5c088d8c6 flake.nix: rocm not yet supported on aarch64, so hide the output Someone Serge 2023-12-26 23:34:40 +00:00
  • 1e3900ebac flake.nix: expose full scope in legacyPackages Someone Serge 2023-12-29 16:15:37 +00:00
  • e39106c055 ggml : add ggml_vdotq_s32 alias (#4715) Georgi Gerganov 2023-12-31 11:43:31 +02:00
  • 9fbda719de clip : refactor + bug fixes (#4696) Georgi Gerganov 2023-12-30 23:24:42 +02:00
  • 39d8bc71ed CUDA: fixed tensor cores not being used on RDNA3 (#4697) Johannes Gäßler 2023-12-30 13:52:01 +01:00
  • 24a447e20a ggml : add ggml_cpu_has_avx_vnni() (#4589) automaticcat 2023-12-30 15:07:48 +07:00
  • a20f3c7465 CUDA: fix tensor core logic for Pascal and HIP (#4682) Johannes Gäßler 2023-12-29 23:12:53 +01:00
  • 0235b9b571 clip : use ggml_backend_buffer_is_host (#4205) Georgi Gerganov 2023-12-29 18:53:34 +02:00
  • ce18d727a4 clip : enable gpu backend (#4205) Steward Garcia 2023-12-29 11:52:15 -05:00
  • 91bb39cec7 cuda: fix vmm oom issue on NVIDIA AGX Orin (#4687) hydai 2023-12-30 00:31:19 +08:00
  • 04ac0607e9 python : add check-requirements.sh and GitHub workflow (#4585) crasm 2023-12-29 09:50:29 -05:00
  • 68eccbdc5b flake.nix : rewrite (#4605) Philip Taron 2023-12-29 06:42:26 -08:00
  • 97bbca6e85 cmake : fix ld warning duplicate libraries libllama.a (#4671) Cuong Trinh Manh 2023-12-29 21:39:15 +07:00
  • 4af4801566 llava-cli : refactor to use sampling library (#4669) Justine Tunney 2023-12-29 06:38:38 -08:00
  • db49ff8ed7 server : replace sleep with condition variables (#4673) Justine Tunney 2023-12-29 06:24:12 -08:00
  • 60f55e888c server : fix OpenAI server sampling w.r.t. penalty. (#4675) SakuraUmi 2023-12-29 22:22:44 +08:00
  • b93edd22f5 server : allow generating multimodal embeddings (#4681) Karthik Sethuraman 2023-12-29 06:22:10 -08:00
  • 82d6eab224 main-cmake-pkg : fix build issue (#4665) andrijdavid 2023-12-29 15:18:20 +01:00
  • afd997ab60 llama.swiftui : fix infinite loop, output timings, buff UI (#4674) Peter Sugihara 2023-12-29 05:58:56 -08:00
  • c8255f8a6b scripts : print list of sync commits Georgi Gerganov 2023-12-29 15:12:35 +02:00
  • 441f51dca0 ci : build with CLBlast + ggml-opencl use GGML_API (whisper/1576) Tamotsu Takahashi 2023-12-29 19:23:27 +09:00
  • 38b3de4658 sync : ggml Georgi Gerganov 2023-12-29 14:56:41 +02:00
  • afc8c19291 ggml : fix some mul mat cases + add tests for src1 F16 (ggml/669) bssrdf 2023-12-29 03:32:31 -05:00
  • ca38b8d334 scripts : do not sync commits from this repo Georgi Gerganov 2023-12-29 14:41:36 +02:00
  • 65e5f6dadb Fix OpenAI server sampling w.r.t. temp and seed (#4668) Justine Tunney 2023-12-28 11:20:00 -08:00
  • ea5497df5d gpt2 : Add gpt2 architecture integration (#4555) manikbhandari 2023-12-28 09:03:57 -05:00
  • f6793491b5 llama : add AWQ for llama, llama2, mpt, and mistral models (#4593) Nam D. Tran 2023-12-27 22:39:45 +07:00
  • 879b690a9e finetune : fix output formatting in print_params (#4653) Daniel Bevenius 2023-12-27 15:16:55 +01:00
  • b47879b0dd scripts : add sync-ggml-am.sh Georgi Gerganov 2023-12-27 11:15:31 +02:00
  • 951010fa53 ggml : fix dot product for ARM (#4630) Georgi Gerganov 2023-12-27 11:02:13 +02:00
  • f56d6077d0 Add byte token type when tokenizer.model does not exist (#4641) wonjun Jang 2023-12-27 17:37:25 +09:00
  • dc68f0054c cuda : fix vmm pool with multi GPU (#4620) slaren 2023-12-26 21:23:59 +01:00
  • de8e496437 Update comment for AdamW implementation reference. (#4604) WillCorticesAI 2023-12-26 05:42:08 -05:00
  • 77465dad48 Fix new CUDA10 compilation errors (#4635) FantasyGmm 2023-12-26 18:38:36 +08:00
  • a206137f92 Adding Emeltal reference to UI list (#4629) Paul Tsochantaris 2023-12-25 16:09:53 +00:00
  • b9f47952ff simplify bug issue template (#4623) slaren 2023-12-24 21:01:12 +01:00
  • 753be377b6 llama : add PLaMo model (#3557) Shintarou Okada 2023-12-24 22:35:49 +09:00
  • 5bf3953d7e cuda : improve cuda pool efficiency using virtual memory (#4606) slaren 2023-12-24 14:34:22 +01:00
  • 708e179e85 fallback to CPU buffer if host buffer alloc fails (#4610) slaren 2023-12-23 16:10:51 +01:00
  • 925e5584a0 ci(docker): fix tags in "Build and push docker image (tagged)" (#4603) Samuel Maynard 2023-12-23 11:35:55 +02:00
  • 6123979952 server : allow specifying a custom prompt for penalty calculation (#3727) Alexey Parfenov 2023-12-23 09:31:49 +00:00
  • b9ec82d262 grammar : check the full vocab only if necessary (opt) (#4306) kalomaze 2023-12-23 03:27:07 -06:00
  • e0a4002273 CUDA: fixed row rounding for 0 tensor splits (#4594) Johannes Gäßler 2023-12-23 09:16:33 +01:00
  • 7082d24cec lookup : add prompt lookup decoding example (#4484) LeonEricsson 2023-12-22 17:05:56 +01:00
  • ba66175132 sync : ggml (fix im2col) (#4591) Georgi Gerganov 2023-12-22 17:53:43 +02:00
  • a55876955b cuda : fix jetson compile error (#4560) FantasyGmm 2023-12-22 23:11:12 +08:00
  • 6724ef1657 Fix CudaMemcpy direction (#4599) Henrik Forstén 2023-12-22 15:34:05 +02:00
  • 48b7ff193e llama : fix platforms without mmap (#4578) slaren 2023-12-22 12:12:53 +01:00
  • 48b24b170e ggml : add comment about backward GGML_OP_DIAG_MASK_INF (#4203) Herman Semenov 2023-12-22 09:26:49 +00:00
  • 28cb35a0ec make : add LLAMA_HIP_UMA option (#4587) Michael Kesper 2023-12-22 09:03:25 +01:00
  • f31b984898 ci : tag docker image with build number (#4584) rhuddleston 2023-12-21 23:56:34 -07:00
  • 2bb98279c5 readme : add zig bindings (#4581) Deins 2023-12-22 08:49:54 +02:00
  • 0137ef88ea ggml : extend enum ggml_log_level with GGML_LOG_LEVEL_DEBUG (#4579) bobqianic 2023-12-22 06:47:01 +00:00
  • c7e9701f86 llama : add ability to cancel model loading (#4462) crasm 2023-12-22 01:19:36 -05:00
  • afefa319f1 ggml : change ggml_scale to take a float instead of tensor (#4573) Georgi Gerganov 2023-12-21 23:20:49 +02:00
  • 769a7bc85e gguf-py : fix broken link Georgi Gerganov 2023-12-21 23:20:36 +02:00
  • 32259b2dad gguf : simplify example dependencies Georgi Gerganov 2023-12-21 23:07:58 +02:00
  • 4a5f9d629e ci : add jlumbroso/free-disk-space to docker workflow (#4150) Samuel Maynard 2023-12-21 22:36:26 +02:00
  • d232aca5a7 llama : initial ggml-backend integration (#4520) slaren 2023-12-21 21:07:46 +01:00
  • 31f27758fa llama : allow getting n_batch from llama_context in c api (#4540) Marcus Dunn 2023-12-21 11:57:48 -08:00
  • 56fa50819f metal : fix ggml_metal_log vargs (#4373) Finn Voorhees 2023-12-21 14:55:02 -05:00
  • 0f630fbc92 cuda : ROCm AMD Unified Memory Architecture (UMA) handling (#4449) Erik Garrison 2023-12-21 13:45:32 -06:00
  • 562cf222b5 ggml-cuda: Fix HIP build by adding define for __trap (#4569) arlo-phoenix 2023-12-21 20:13:25 +01:00
  • 8fe03ffdda common : remove incorrect --model-draft default (#4568) Jared Van Bortel 2023-12-21 12:55:34 -05:00
  • 9154494808 CUDA: mul_mat_id always on GPU for batches >= 32 (#4553) Johannes Gäßler 2023-12-21 18:42:59 +01:00
  • c083718c89 readme : update coding guidelines Georgi Gerganov 2023-12-21 19:27:14 +02:00
  • 880e352277 py : open merges file as 'utf-8' (#4566) howlger 2023-12-21 18:07:34 +01:00
  • 66f35a2f48 cuda : better error message for ggml_get_rows (#4561) bobqianic 2023-12-21 17:06:44 +00:00
  • 1398823922 cuda : replace asserts in wrong architecture checks with __trap (#4556) slaren 2023-12-21 18:02:30 +01:00
  • d3223afdad llama : disable per-tensor info prints on model load (#4562) Johannes Gäßler 2023-12-21 17:34:17 +01:00
  • 1d7a1912ce Fix access violation in ggml_cuda_free_data if tensor->extra is NULL (#4554) LoganDark 2023-12-21 01:59:27 -08:00
  • 799fc22689 CUDA: Faster Mixtral prompt processing (#4538) Johannes Gäßler 2023-12-20 15:41:22 +01:00
  • 328b83de23 ggml : fixed check for _MSC_VER (#4535) Eric Sommerlade 2023-12-19 16:17:01 +00:00
  • a7aee47b98 ggml-cuda: Fix HIP build (#4528) arlo-phoenix 2023-12-18 22:33:45 +01:00
  • 0e18b2e7d0 llama.swiftui : add tinyllama 1.1B F16 Georgi Gerganov 2023-12-18 20:17:43 +02:00
  • 6ff39b129d llama.swiftui : add more models Georgi Gerganov 2023-12-18 20:05:12 +02:00
  • b9e74f9bca llama : add phi-2 + fix NeoX rope + ggml_mul_mat_set_prec (#4490) Ebey Abraham 2023-12-18 17:27:47 +00:00
  • 3c04bf6da8 llama : fix try_override for bool_value which always returns true (#4519) hankcs 2023-12-18 05:14:58 -08:00
  • 2994f0c5a2 decode : fix logits_valid for legacy API (#4516) Jared Van Bortel 2023-12-17 19:39:02 -05:00
  • b1306c4394 readme : update hot topics Georgi Gerganov 2023-12-17 20:16:23 +02:00
  • 800a489e4a llama.swiftui : add bench functionality (#4483) Georgi Gerganov 2023-12-17 19:38:41 +02:00
  • f7f468a97d gguf-py : fail fast on nonsensical special token IDs (#4489) Jared Van Bortel 2023-12-17 10:45:46 -05:00
  • 919c40660f build : Check the ROCm installation location (#4485) Matheus Gabriel Alves Silva 2023-12-17 12:23:33 -03:00
  • 45668633fd finetune : keep allocs alive until all allocations are done (#4486) slaren 2023-12-17 16:05:56 +01:00
  • 0ffc92d2d2 server : disable llm logs if SERVER_VERBOSE is off (#3792) olexiyb 2023-12-17 17:02:16 +02:00
  • 8edd2b40fd server : fix grammar being ignored (#4494) AdithyanI 2023-12-17 15:57:56 +01:00
  • eb16dae7e7 server : fix possible ambiguity in content type charset (#4501) Alexey Parfenov 2023-12-17 14:56:09 +00:00
  • 62bd52b7bf server : allow requests larger than 8K (#4500) mzcu 2023-12-17 15:54:37 +01:00
  • 5daa5f54fd Link to cublas dynamically on Windows even with LLAMA_STATIC (#4506) Bach Le 2023-12-17 18:57:33 +08:00
  • c6c4fc081c lora : add support for non-llama models (#3333) slaren 2023-12-16 18:58:46 +01:00
  • 8a5be3bd58 llama : sanity checks for access to logits (#4274) Jared Van Bortel 2023-12-15 22:16:15 -05:00
  • 88ae8952b6 server : add optional API Key Authentication example (#4441) ShadovvBeast 2023-12-15 13:49:01 +02:00