Commit graph

  • d6bd4d46dd
    llama : support StableLM 2 1.6B (#5052) compilade 2024-01-22 06:21:52 -05:00
  • 152d9d05e0
    finetune : print sample-start/include-sample-start (#5072) Daniel Bevenius 2024-01-22 12:11:01 +01:00
  • 66d575c45c
    llama : add Q3_K_XS (#5060) Kawrakow 2024-01-22 12:43:33 +02:00
  • 57744932c6
    ci : fix Windows CI by updating Intel SDE version (#5053) bobqianic 2024-01-22 08:55:05 +00:00
  • 3466c6ebcf
    llama : add more qwen2 models (#5071) Shijie 2024-01-22 15:33:19 +08:00
  • 504dc37be8
    Revert LLAMA_NATIVE to OFF in flake.nix (#5066) iSma 2024-01-21 22:37:13 +01:00
  • 05490fad7f
    add safetensors support to convert-lora-to-ggml.py (#5062) kuronekosaiko 2024-01-22 00:28:14 +08:00
  • 6c5629d4d2
    add #include <string> to unicode.h (#5051) bobqianic 2024-01-21 15:17:35 +00:00
  • 7dcbe39d36
    Add ability to evaluate multiple choice tasks (#5047) Kawrakow 2024-01-21 14:42:44 +02:00
  • 726c0fa9a2
    Slightly faster imatrix (#5050) Kawrakow 2024-01-21 08:01:20 +02:00
  • 942c0107a7
    flake.lock: Update (#5054) Georgi Gerganov 2024-01-21 05:17:27 +02:00
  • b43ebde3b0
    convert : partially revert PR #4818 (#5041) Jared Van Bortel 2024-01-20 18:14:18 -05:00
  • 97c1549808
    perplexity : fix MSVC build after #5020 (#5043) Jared Van Bortel 2024-01-20 10:08:08 -05:00
  • 6df465a91d
    llama : run all KQV ops on the CPU with no KV offload (#5049) slaren 2024-01-20 16:05:49 +01:00
  • 77bc1bbd05
    cmake : add support for ccache (#5002) Herman Semenov 2024-01-20 08:11:31 +00:00
  • 48e2b13372
    Add a dart/flutter binding to README.md (#4882) adel boussaken 2024-01-20 09:05:43 +01:00
  • cca894f16a
    cuda : fix compile error in jetson platform (#4975) Kylin 2024-01-20 15:01:46 +08:00
  • 381ee19572
    finetune : fix ggml_allocr lifetimes (tmp workaround) (#5033) Uzo Nweke 2024-01-19 13:20:50 -05:00
  • a5cacb22b2
    imatrix : add README.md Georgi Gerganov 2024-01-19 15:24:47 +02:00
  • 9b75cb2b3c
    llama : support upcoming Qwen2 (#5037) Shijie 2024-01-19 19:53:13 +08:00
  • de9a147df1
    py : fix flake8 lint Georgi Gerganov 2024-01-19 13:52:22 +02:00
  • 7051aacfac
    winogrande: evaluate log-probs in parallel (#5036) Kawrakow 2024-01-19 11:39:11 +02:00
  • 2b3b999cac
    llama : add CodeShell support (#5016) chiranko 2024-01-19 17:07:27 +08:00
  • 993fba8180
    perplexity: avoid unnecessary allocations and logit copies (#5035) Kawrakow 2024-01-19 11:02:39 +02:00
  • 8b20858e5e
    perplexity : faster Winogrande via batching (#5024) Georgi Gerganov 2024-01-19 10:45:06 +02:00
  • 57e2a7a52a
    llama : fix falcon arch for tied output embeddings (#4978) John 2024-01-18 23:12:15 +01:00
  • 9b6ea4263a
    cmake : add ggml public headers (#5011) Georgi Gerganov 2024-01-18 23:36:07 +02:00
  • 821f0a271e
    server : defer tasks when "slot unavailable" (#5018) Xuan Son Nguyen 2024-01-18 21:33:05 +01:00
  • 96d7f56d29
    llama : fix mlock with no-mmap with Metal (#5025) slaren 2024-01-18 21:12:15 +01:00
  • 2d5419d08a
    imatrix : fix assert for src0 non-cont check Georgi Gerganov 2024-01-18 21:45:51 +02:00
  • d391ae9b49
    perplexity : fix winogrande N tasks option Georgi Gerganov 2024-01-18 20:49:00 +02:00
  • e9240cdfa0
    scripts : add get-winogrande.sh Georgi Gerganov 2024-01-18 20:45:39 +02:00
  • b46757735d
    convert.py : fix llama/llama2 conversion due to vocab_size=-1 (#5019) David Sommers 2024-01-18 12:20:59 -05:00
  • 3e945cc1e9
    HellaSwag: speed up by parallelizing log-prob evaluation (#5020) Kawrakow 2024-01-18 19:18:21 +02:00
  • ad19812cda
    perplexity : faster HellaSwag via batching (#5017) Georgi Gerganov 2024-01-18 15:33:01 +02:00
  • 682986a08e
    Add Winogrande evaluation (#5015) Kawrakow 2024-01-18 13:46:27 +02:00
  • dcad445d0c
    scripts : add helper script to get hellaswag data in txt format Georgi Gerganov 2024-01-18 11:44:49 +02:00
  • 1e605f4102
    metal : fix memory leak, dangling pointer and unused autorel (#5007) Paul Tsochantaris 2024-01-18 08:47:24 +00:00
  • 6b6916b215
    sync : ggml Georgi Gerganov 2024-01-17 20:54:50 +02:00
  • 38566680cd
    ggml : add IQ2 to test-backend-ops + refactoring (#4990) Georgi Gerganov 2024-01-17 18:54:56 +02:00
  • ba69bbc84c
    imatrix : offload to GPU support (#4957) Georgi Gerganov 2024-01-17 18:46:30 +02:00
  • 44a1a4a41a
    backend : add eval callback (#4935) Georgi Gerganov 2024-01-17 18:39:41 +02:00
  • c918fe8dca
    metal : create autorelease pool during library build (#4970) Georgi Gerganov 2024-01-17 18:38:39 +02:00
  • 0f83e727af
    py : fix whitespace Georgi Gerganov 2024-01-17 18:37:36 +02:00
  • 4f4bf35f46
    py : fix missing added_tokens_dict for SPM and BPE vocabs (#4971) Georgi Gerganov 2024-01-17 15:45:03 +02:00
  • 2b3a665d39
    llama : use Q4_K for attn_v for Q2_K_S when n_gqa >= 4 (#4996) Kawrakow 2024-01-17 12:36:37 +02:00
  • 7563293665
    metal : remove unnecessary nil check (#4986) Paul Tsochantaris 2024-01-17 08:07:24 +00:00
  • f46c0c1b0e
    llama : fix copy/paste error in llama_sampling_params comment (#4994) David Renshaw 2024-01-17 02:17:50 -05:00
  • 5c99960901
    py : remove unnecessary hasattr (#4903) Georgi Gerganov 2024-01-16 20:59:31 +02:00
  • bee938da74
    nix: remove nixConfig from flake.nix (#4984) Philip Taron 2024-01-16 09:56:21 -08:00
  • cec8a48470
    finetune : add training data file to log message (#4979) Daniel Bevenius 2024-01-16 18:54:24 +01:00
  • 334a835a1c
    ggml : importance matrix support for legacy quants (#4969) Kawrakow 2024-01-16 19:51:26 +02:00
  • 4feb4b33ee
    examples : add complete parallel function calling example (#4974) Maximilian Winter 2024-01-16 18:41:42 +01:00
  • 959ef0c0df
    perplexity : fix kv cache handling for hellaswag (#4981) Georgi Gerganov 2024-01-16 19:34:54 +02:00
  • c37b3474e6
    flake.lock: update flake-parts, flake-parts/nixpkgs-lib, and nixpkgs (#4920) Georgi Gerganov 2024-01-16 19:13:54 +02:00
  • 158f8c9e21
    metal : localized logic in ggml_metal_graph_compute (#4924) Paul Tsochantaris 2024-01-16 17:05:19 +00:00
  • 862f5e41ab
    android : introduce starter project example (#4926) Neuman Vong 2024-01-17 00:47:34 +11:00
  • 3a48d558a6
    metal : replace loop of dispatch_async with dispatch_apply (#4934) Alex Azarov 2024-01-16 14:41:27 +01:00
  • 7c8d3abd1a
    metal : log recommendedMaxWorkingSetSize on iOS 16+ (#4936) Alex Azarov 2024-01-16 14:33:02 +01:00
  • 122ed4840c
    examples : fix and improve docs for the grammar generator (#4909) Maximilian Winter 2024-01-16 13:10:48 +01:00
  • a0b3ac8c48
    ggml : introduce GGML_CALL function annotation (#4850) Justine Tunney 2024-01-16 03:16:33 -08:00
  • d75c232e1d
    finetune : use LLAMA_FILE_MAGIC_GGLA (#4961) Daniel Bevenius 2024-01-16 12:14:19 +01:00
  • e0324285a5
    speculative : threading options (#4959) stduhpf 2024-01-16 12:04:32 +01:00
  • 3e5ca7931c
    pass cpu-architecture arguments only to host code (C;C++) (#4943) ngc92 2024-01-15 20:40:48 +02:00
  • 4483396751
    llama : apply classifier-free guidance to logits directly (#4951) David Friehs 2024-01-15 14:06:52 +01:00
  • d9aa4ffa6e
    awq-py : fix typo in awq-py/README.md (#4947) Victor Z. Peng 2024-01-15 04:41:46 -08:00
  • ddb008d845
    cuda : fix dequantize kernel names (#4938) Georgi Gerganov 2024-01-15 13:27:00 +02:00
  • 2faaef3979
    llama : check for 256 divisibility for IQ2_XS, IQ2_XXS (#4950) Kawrakow 2024-01-15 10:09:38 +02:00
  • 4a3156de2f
    CUDA: faster dequantize kernels for Q4_0 and Q4_1 (#4938) Kawrakow 2024-01-15 07:48:06 +02:00
  • a836c8f534
    llama : fix missing quotes (#4937) David Pflug 2024-01-14 10:46:00 -05:00
  • 467a882fd2
    Add ability to use importance matrix for all k-quants (#4930) Kawrakow 2024-01-14 16:21:12 +02:00
  • bb0c139247
    llama : check LLAMA_TRACE env for extra logging (#4929) Georgi Gerganov 2024-01-14 13:26:53 +02:00
  • 9408cfdad6
    scripts : sync-ggml-am.sh option to skip commits Georgi Gerganov 2024-01-14 11:08:09 +02:00
  • 03c5267490
    llama : use LLAMA_LOG_ macros for logging Georgi Gerganov 2024-01-14 11:03:19 +02:00
  • a128c38de8
    Fix ffn_down quantization mix for MoE models (#4927) Kawrakow 2024-01-14 10:53:39 +02:00
  • 5f5fe1bd60
    metal : correctly set SIMD support flags on iOS (#4923) Alex Azarov 2024-01-14 09:44:39 +01:00
  • ac32902a87
    llama : support WinXP build with MinGW 8.1.0 (#3419) Karthik Kumar Viswanathan 2024-01-14 00:41:44 -08:00
  • 147b17ac94
    2-bit quantizations (#4897) Kawrakow 2024-01-14 09:45:56 +02:00
  • 807179ec58
    Make Q3_K_S be the same as old Q3_K_L for Mixtral-8x7B (#4906) Kawrakow 2024-01-14 09:44:30 +02:00
  • 76484fbfd3
    sync : ggml Georgi Gerganov 2024-01-14 00:14:46 +02:00
  • c71d608ce7
    ggml: cache sin/cos for RoPE (#4908) Johannes Gäßler 2024-01-13 21:41:37 +01:00
  • 4be5ef556d
    metal : remove old API (#4919) Georgi Gerganov 2024-01-13 20:45:45 +02:00
  • 0ea069b87b
    server : fix prompt caching with system prompt (#4914) Georgi Gerganov 2024-01-13 19:31:26 +02:00
  • f172de03f1
    llama : fix detokenization of non-special added-tokens (#4916) Georgi Gerganov 2024-01-13 18:47:38 +02:00
  • 2d57de5255
    metal : disable log for loaded kernels (#4794) Georgi Gerganov 2024-01-13 18:46:37 +02:00
  • df845cc982
    llama : minimize size used for state save/load (#4820) David Friehs 2024-01-13 17:29:43 +01:00
  • 6b48ed0893
    workflows: unbreak nix-build-aarch64, and split it out (#4915) Someone 2024-01-13 16:29:16 +00:00
  • 722d33f34e
    main : add parameter --no-display-prompt (#4541) Yann Follet 2024-01-14 00:09:08 +08:00
  • c30b1ef39a
    gguf : fix potential infinite for-loop (#4600) texmex76 2024-01-13 17:06:20 +01:00
  • b38b5e93ae
    metal : refactor kernel loading code (#4794) Georgi Gerganov 2024-01-13 18:03:45 +02:00
  • 7dc78764e2
    compare-llama-bench: tweak output format (#4910) Johannes Gäßler 2024-01-13 15:52:53 +01:00
  • 356327feb3
    server : fix deadlock that occurs in multi-prompt scenarios (#4905) Ziad Ben Hadj-Alouane 2024-01-13 09:20:46 -05:00
  • ee8243adaa
    server : fix crash with multimodal models without BOS token (#4904) makomk 2024-01-13 14:16:11 +00:00
  • 15ebe59210
    convert : update phi-2 to latest HF repo (#4903) Georgi Gerganov 2024-01-13 13:44:37 +02:00
  • de473f5f8e
    sync : ggml Georgi Gerganov 2024-01-12 22:02:43 +02:00
  • f238461236
    ggml : fix 32-bit ARM compat for IQ2_XS (whisper/1758) Georgi Gerganov 2024-01-12 14:02:30 +02:00
  • fa5c1fb44a
    backend_sched : fix assignments slaren 2024-01-12 20:38:34 +01:00
  • 52ee4540c0
    examples : add pydantic models to GBNF grammar generator (#4883) Maximilian Winter 2024-01-12 20:46:45 +01:00
  • 3fe81781e3
    CUDA: faster q8_0 -> f16 dequantization (#4895) Johannes Gäßler 2024-01-12 20:38:54 +01:00
  • e7e4df031b
    llama : ggml-backend integration (#4766) slaren 2024-01-12 20:07:38 +01:00