Commit graph

  • 584d674be6
    llama : remove redundant assert for StableLM (#4901) Georgi Gerganov 2024-01-12 20:54:12 +02:00
  • 930f907d3e
    export-lora : use LLAMA_FILE_MAGIC_GGLA (#4894) Daniel Bevenius 2024-01-12 18:54:53 +01:00
  • e790eef21c
    llama.swiftui : update models layout (#4826) Zay 2024-01-12 05:48:00 -07:00
  • 5537d9d36b
    gitignore : imatrix Georgi Gerganov 2024-01-12 14:33:21 +02:00
  • 1b280c9fff
    CUDA: fix softmax compile for old CUDA versions (#4862) Johannes Gäßler 2024-01-12 12:30:41 +01:00
  • 3cabe80630
    llama : fix typo "imp_embd" -> "inp_embd" Georgi Gerganov 2024-01-12 13:10:19 +02:00
  • 4315a94366
    common : streamline the formatting of help (#4890) howlger 2024-01-12 12:05:32 +01:00
  • 2d00741e12
    py : fix lint (#4889) Georgi Gerganov 2024-01-12 13:03:38 +02:00
  • f445c0e68c
    llama : fix llm_build_k_shift to use correct n_rot (#4889) Georgi Gerganov 2024-01-12 13:01:56 +02:00
  • 326b418b59
    Importance Matrix calculation (#4861) Kawrakow 2024-01-12 06:59:57 +01:00
  • 1d118386fe
    server : fix infill when prompt is empty (#4833) Georgi Gerganov 2024-01-11 23:23:49 +02:00
  • 7edefbd79c
    main : better name for variable n_print (#4874) Georgi Gerganov 2024-01-11 22:46:26 +02:00
  • 3ca63b4538
    main : disable token count by default (#4874) Georgi Gerganov 2024-01-11 22:43:05 +02:00
  • b037787548
    swift : track ggml release branch (#4867) Georgi Gerganov 2024-01-11 21:58:28 +02:00
  • 469e75d0a3
    llama : restore intended k-quants mixes for MoE models (#4872) Kawrakow 2024-01-11 20:43:15 +01:00
  • 49662cbed3
    ggml : SOTA 2-bit quants (add IQ2_XS) (#4856) Kawrakow 2024-01-11 20:39:39 +01:00
  • 3ba5b8ca8e
    swift : pin ggml commit + remove ggml.h from spm-headers (#4878) Georgi Gerganov 2024-01-11 21:31:31 +02:00
  • 4330bd83fe
    server : implement credentialed CORS (#4514) Laura 2024-01-11 19:02:48 +01:00
  • 27379455c3
    server : support for multiple api keys (#4864) Michael Coppola 2024-01-11 12:51:17 -05:00
  • eab6795006
    server : add LOG_INFO when model is successfully loaded (#4881) Behnam M 2024-01-11 12:41:39 -05:00
  • d8d90aa343
    ci: nix-flake-update: new token with pr permissions (#4879) Someone 2024-01-11 17:22:34 +00:00
  • 43f76bf1c3
    main : print total token count and tokens consumed so far (#4874) pudepiedj 2024-01-11 16:14:52 +00:00
  • 2f043328e3
    server : fix typo in model name (#4876) Isaac McFadyen 2024-01-11 09:33:26 -05:00
  • 2a7c94db5f
    metal : put encoder debug group behind a define (#4873) Paul Tsochantaris 2024-01-11 14:31:52 +00:00
  • 64802ec00d
    sync : ggml Georgi Gerganov 2024-01-11 09:39:08 +02:00
  • 3267c2abc7
    metal : fix deprecation warning (ggml/690) Georgi Gerganov 2024-01-11 09:34:59 +02:00
  • f85a973aa1
    ggml : remove ggml_cpy_inplace and ggml_cont_inplace (ggml/693) Timothy Cronin 2024-01-11 02:27:48 -05:00
  • 5362e43962
    metal : wrap each operation in debug group (ggml/690) Jack Mousseau 2024-01-10 06:19:19 -08:00
  • e739de7909
    ggml : change GGML_MAX_NAME at compile time (ggml/682) leejet 2024-01-10 21:13:42 +08:00
  • c910e3c28a
    Fix execlp call (ggml/689) Halalaluyafail3 2024-01-09 11:16:37 -05:00
  • f34432ca1e
    fix : cuda order of synchronization when setting a buffer (ggml/679) Erik Scholz 2024-01-05 16:00:00 +01:00
  • 7a9f75c38b
    server : update readme to document the new /health endpoint (#4866) Behnam M 2024-01-11 02:12:05 -05:00
  • 5c1980d8d4
    server : fix build + rename enums (#4870) Georgi Gerganov 2024-01-11 09:10:34 +02:00
  • cd108e641d
    server : add a /health endpoint (#4860) Behnam M 2024-01-10 14:56:05 -05:00
  • 57d016ba2d
    llama : add additional suffixes for model params (#4834) Brian 2024-01-11 01:09:53 +11:00
  • 329ff61569
    llama : recognize 1B phi models (#4847) Austin 2024-01-10 08:39:09 -05:00
  • d34633d8db
    clip : support more quantization types (#4846) John 2024-01-10 14:37:09 +01:00
  • 4f56458d34
    Python script to compare commits with llama-bench (#4844) Johannes Gäßler 2024-01-10 01:04:33 +01:00
  • 6efb8eb30e
    convert.py : fix vanilla LLaMA model conversion (#4818) Austin 2024-01-09 13:46:46 -05:00
  • 36e5a08b20
    llava-cli : don't crash if --image flag is invalid (#4835) Justine Tunney 2024-01-09 09:59:14 -08:00
  • 4dccb38d9a
    metal : improve dequantize precision to match CPU (#4836) Georgi Gerganov 2024-01-09 19:37:08 +02:00
  • 9a818f7c42
    scripts : improve get-pg.sh (#4838) Georgi Gerganov 2024-01-09 19:20:45 +02:00
  • 18adb4e9bb
    readme : add 3rd party collama reference to UI list (#4840) iohub 2024-01-10 00:45:54 +08:00
  • d9653894df
    scripts : script to get Paul Graham essays in txt format (#4838) Georgi Gerganov 2024-01-09 16:23:05 +02:00
  • 128de3585b
    server : update readme about token probs (#4777) Behnam M 2024-01-09 05:02:05 -05:00
  • 8c58330318
    server : add api-key flag to documentation (#4832) Zsapi 2024-01-09 10:12:43 +01:00
  • 18c2e1752c
    ggml : fix vld1q_s8_x4 32-bit compat (#4828) Georgi Gerganov 2024-01-09 10:42:06 +02:00
  • 8f900abfc0
    CUDA: faster softmax via shared memory + fp16 math (#4742) Johannes Gäßler 2024-01-09 08:58:55 +01:00
  • 1fc2f265ff
    common : fix the short form of --grp-attn-w, not -gat (#4825) howlger 2024-01-08 20:05:53 +01:00
  • a9a8c5de3d
    readme : add link to SOTA models Georgi Gerganov 2024-01-08 20:25:17 +02:00
  • dd5ae06405
    SOTA 2-bit quants (#4773) Kawrakow 2024-01-08 16:02:32 +01:00
  • 668b31fc7d
    swift : exclude ggml-metal.metal from the package (#4822) Georgi Gerganov 2024-01-08 16:40:51 +02:00
  • 42ea63c5a3
    llama.swiftui : update readme Georgi Gerganov 2024-01-08 15:57:36 +02:00
  • 52531fdff8
    main : add self-extend support (#4815) Georgi Gerganov 2024-01-08 11:18:32 +02:00
  • b0034d93ce
    examples : add passkey test (#3856) Georgi Gerganov 2024-01-08 11:14:04 +02:00
  • b7e7982953
    readme : add lgrammel/modelfusion JS/TS client for llama.cpp (#4814) Lars Grammel 2024-01-07 21:24:11 +01:00
  • 226460cc0d
    llama-bench : add no-kv-offload parameter (#4812) slaren 2024-01-07 17:59:01 +01:00
  • d5a410e855
    CUDA: fixed redundant value dequantization (#4809) Johannes Gäßler 2024-01-07 17:24:08 +01:00
  • 9dede37d81
    llama : remove unused vars (#4796) Georgi Gerganov 2024-01-07 14:29:36 +02:00
  • 3c36213df8
    llama : remove redundant GQA check (#4796) Georgi Gerganov 2024-01-07 11:21:53 +02:00
  • 72d8407b36
    llama.swiftui : use llama.cpp as SPM package (#4804) Alex Azarov 2024-01-07 09:20:50 +01:00
  • d117d4dc5d
    llama : print tensor meta for debugging Georgi Gerganov 2024-01-07 09:50:31 +02:00
  • 3418c03ecc
    llama.swiftui : add visionOS target (#4805) Alex Azarov 2024-01-07 08:46:55 +01:00
  • 63ee677efd
    ggml : use __builtin_amdgcn_sudot4 in __dp4a for gfx11 (#4787) Konstantin Zhuravlyov 2024-01-07 01:52:42 -05:00
  • 67984921a7
    server : fix n_predict check (#4798) Georgi Gerganov 2024-01-07 08:45:26 +02:00
  • c75ca5d96f
    llama.swiftui : use correct pointer for llama_token_eos (#4797) Daniel Illescas Romero 2024-01-06 16:12:59 +01:00
  • 96e80dabc6
    examples : improve base-translate.sh script (#4783) Georgi Gerganov 2024-01-06 11:40:24 +02:00
  • eec22a1c63
    cmake : check for openblas64 (#4134) a-n-n-a-l-e-e 2024-01-05 08:04:40 -08:00
  • be36bb946a
    flake.nix : fix typo (#4700) Ikko Eltociear Ashimine 2024-01-06 01:02:44 +09:00
  • 91d38876df
    metal : switch back to default.metallib (ggml/681) Georgi Gerganov 2024-01-05 16:30:52 +02:00
  • d061bf9405
    ggml : fix q2_k bpw in comments (ggml/680) Georgi Gerganov 2024-01-05 15:36:04 +02:00
  • 1bf681f90e
    ggml : add error handling to graph_compute (whisper/1714) Finn Voorhees 2024-01-03 08:39:43 -05:00
  • c1d7cb28d3
    ggml : do not sched_yield when calling BLAS (#4761) Georgi Gerganov 2024-01-05 15:18:21 +02:00
  • 3681f22443
    examples : add few-shot translation example (#4783) Georgi Gerganov 2024-01-05 15:11:10 +02:00
  • b3a7c20b5c
    finetune : remove unused includes (#4756) Daniel Bevenius 2024-01-04 20:45:37 +01:00
  • 012cf349ae
    server : send token probs for "stream == false" (#4714) Georgi Gerganov 2024-01-04 19:56:33 +02:00
  • a91928014f
    Print backend name on test-backend-ops failure (#4751) Johannes Gäßler 2024-01-04 09:43:23 +01:00
  • 3c0b585561
    llama.swiftui : support loading custom model from file picker (#4767) singularity 2024-01-04 16:22:38 +08:00
  • e5804313a1
    server : fix options in README.md (#4765) Michael Coppola 2024-01-04 03:17:09 -05:00
  • dc891b7f7a
    ggml : include stdlib.h before intrin.h (#4736) Georgi Gerganov 2024-01-04 10:12:26 +02:00
  • 46cea79e1f
    llama.swiftui : fix build of ggml.metallib (#4754) singularity 2024-01-04 15:58:16 +08:00
  • cb1e2818e0
    train : fix typo in overlapping-samples help msg (#4758) Daniel Bevenius 2024-01-03 18:53:40 +01:00
  • ece9a45e8f
    swift : update Package.swift to use ggml as dependency (#4691) Ashraful Islam 2024-01-03 11:30:02 -06:00
  • 7bed7eba35
    cuda : simplify expression Georgi Gerganov 2024-01-03 14:18:46 +02:00
  • d55356d3ba
    cuda : mark I16 and I32 ops as unsupported Georgi Gerganov 2024-01-03 13:01:44 +02:00
  • 75e3fd8581
    sync : ggml Georgi Gerganov 2024-01-03 11:37:44 +02:00
  • 289313716f
    metal : add kernel_get_rows_i32 Georgi Gerganov 2024-01-03 11:35:46 +02:00
  • ab62fc3e55
    scripts : fix sync order + metal sed Georgi Gerganov 2024-01-03 11:25:54 +02:00
  • 5f66ebca9c
    ggml : extend ggml_get_rows, ggml_repeat, ggml_concat (ggml/639) Guillaume Wenzek 2023-12-29 18:07:03 +01:00
  • f2eb19bd8b
    server : throw an error when slot unavailable (#4741) Justin Parker 2024-01-03 03:43:19 -05:00
  • f3f62f0d83
    metal : optimize ggml_mul_mat_id (faster Mixtral PP) (#4725) Georgi Gerganov 2024-01-02 21:07:47 +02:00
  • 0ef3ca2ac6
    server : add token counts to html footer (#4738) Phil H 2024-01-02 15:48:49 +00:00
  • 540938f890
    llama : llama_model_desc print number of experts Georgi Gerganov 2024-01-02 16:26:45 +02:00
  • 0040d42eeb
    llama : replace all API facing int's with int32_t (#4577) Marcus Dunn 2024-01-02 06:15:16 -08:00
  • 83e633c27e
    llama : differentiate the KV dims in the attention (#4657) postmasters 2024-01-02 03:51:28 -08:00
  • 32866c5edd
    editorconfig : fix whitespace and indentation #4710 Georgi Gerganov 2024-01-02 13:28:15 +02:00
  • 5d7002d437
    server : add --override-kv parameter (#4710) minarchist 2024-01-02 04:38:15 -06:00
  • 26f3071d71
    py : re-enable mmap in convert hf (#4732) Nam D. Tran 2024-01-02 16:23:38 +07:00
  • 775ac8712a
    finetune: fix typo in README.md (#4733) Daniel Bevenius 2024-01-02 10:16:55 +01:00
  • 58ba655af0
    metal : enable shader debugging (cmake option) (#4705) Georgi Gerganov 2024-01-02 10:57:44 +02:00