Commit graph

  • 9e4e077ec5
    ci: server: fix python installation (#6922) Pierrick Hymbert 2024-04-26 11:11:51 +02:00
  • 83b72cb086
    Merge pull request from GHSA-p5mv-gjc5-mwqv Georgi Gerganov 2024-04-26 10:41:53 +03:00
  • d4a9afc100
    ci: server: fix python installation (#6918) Pierrick Hymbert 2024-04-26 09:27:49 +02:00
  • 7d641c26ac
    ci: fix concurrency for pull_request_target (#6917) Pierrick Hymbert 2024-04-26 09:26:59 +02:00
  • 5790c8dac1
    bench: server add stop word for PHI-2 (#6916) Pierrick Hymbert 2024-04-26 09:26:16 +02:00
  • 46e12c4692
    llava : add support for moondream vision language model (#6899) vik 2024-04-25 12:38:31 -07:00
  • dba497e0c1
    cmake : restore LLAMA_LLAMAFILE_DEFAULT Georgi Gerganov 2024-04-25 21:31:17 +03:00
  • fa0b4ad252
    cmake : remove obsolete ANDROID check Georgi Gerganov 2024-04-25 18:59:51 +03:00
  • d6e1d44f16
    llama : synchronize before get/set session data (#6911) slaren 2024-04-25 17:59:03 +02:00
  • 853d06ffe2
    ci : tmp disable slow tests Georgi Gerganov 2024-04-25 17:06:27 +03:00
  • 3fe0596c18
    readme : update model list (#6908) BarfingLemurs 2024-04-25 09:52:28 -04:00
  • 0ead1f1072
    llama : check that all the tensor data is in the model file (#6885) slaren 2024-04-25 15:23:47 +02:00
  • 51543729ff
    ggml : fix redefinition of vaddvq_f32 for 32-bit ARM (#6906) Georgi Gerganov 2024-04-25 15:48:25 +03:00
  • 4ab99d8d47
    clip : rename lerp function to avoid conflict (#6894) Daniel Bevenius 2024-04-25 14:38:14 +02:00
  • 54770413c4
    ggml : fix MIN / MAX macros (#6904) Georgi Gerganov 2024-04-25 15:12:28 +03:00
  • aa750c1ede
    tests : minor bash stuff (#6902) Georgi Gerganov 2024-04-25 14:27:20 +03:00
  • 1966eb2615
    quantize : add '--keep-split' to quantize model into shards (#6688) jiez 2024-04-25 18:29:35 +08:00
  • 784e11dea1
    README: add graphic for matrix multiplication (#6881) Johannes Gäßler 2024-04-24 21:29:13 +02:00
  • b4e4b8a935
    llama : add llama_get_pooling_type function (#6862) Douglas Hanley 2024-04-24 08:10:07 -05:00
  • 3fe847b574
    server : do not apply Markdown formatting in code sections (#6850) mgroeber9110 2024-04-24 12:54:24 +02:00
  • 37246b1031
    common : revert showing control tokens by default for server (#6860) Kyle Mistele 2024-04-24 05:15:29 -05:00
  • 28103f4832
    Server: fix seed for multiple slots (#6835) Johannes Gäßler 2024-04-24 11:08:36 +02:00
  • c0d1b3e03e
    ggml : move 32-bit arm compat in ggml-impl.h (#6865) Georgi Gerganov 2024-04-24 12:00:07 +03:00
  • abd3314064
    llama : add phi 3 chat template (#6857) Tristan Druyen 2024-04-24 10:52:37 +02:00
  • 3fec68be4e
    convert : add support of codeqwen due to tokenizer (#6707) Junyang Lin 2024-04-24 15:16:21 +08:00
  • c8297c6af5
    llama : add phi3 support (#6852) liuwei-git 2024-04-24 15:00:37 +08:00
  • 4e96a812b3
    [SYCL] Windows default build instructions without -DLLAMA_SYCL_F16 flag activated (#6767) Anas Ahouzi 2024-04-23 02:53:18 +02:00
  • 192090bae4
    llamafile : improve sgemm.cpp (#6796) Justine Tunney 2024-04-22 15:00:36 -04:00
  • e931888d50
    ggml : fix calloc argument ordering. (#6820) Dave Airlie 2024-04-23 00:05:06 +10:00
  • 8960fe86ae
    llama : fix typo in <|im_end|> token text (#6745) Georgi Gerganov 2024-04-22 15:41:11 +03:00
  • c0956b09ba
    ci: fix job are cancelling each other (#6781) Pierrick Hymbert 2024-04-22 13:22:54 +02:00
  • e9b4a1bf68
    flake.lock: Update github-actions[bot] 2024-04-21 00:17:47 +00:00
  • 5cf5e7d490
    build: generate hex dump of server assets during build (#6661) Olivier Chafik 2024-04-21 18:48:53 +01:00
  • 40f74e4d73
    llama : add option to render special/control tokens (#6807) Georgi Gerganov 2024-04-21 18:36:45 +03:00
  • b9cc76d87e
    ggml : fix ggml_backend_cpu_supports_op() for CPY (#0) Georgi Gerganov 2024-04-21 16:47:57 +03:00
  • 7dbdba5690
    llama : add llama-3 chat template (#6751) Wouter 2024-04-21 15:03:39 +02:00
  • c1386c936e
    gguf-py : add IQ1_M to GGML_QUANT_SIZES (#6761) pmysl 2024-04-21 14:49:30 +02:00
  • e8d35f47cb
    doc : add link to falcon (#6789) Jan Boon 2024-04-21 20:35:40 +08:00
  • 2cca09d509
    readme : add Fedora instructions (#6783) Mohammadreza Hendiani 2024-04-21 16:02:05 +03:30
  • 89b0bf0d5d
    llava : use logger in llava-cli (#6797) Justine Tunney 2024-04-21 08:19:04 -04:00
  • b97bc3966e
    llama : support Llama 3 HF conversion (#6745) Pedro Cuenca 2024-04-21 13:50:41 +02:00
  • b8109bc013
    doc : server tests require llama to be built with curl enabled (#6788) Jan Boon 2024-04-21 00:29:50 +08:00
  • aed82f6837
    common : try to fix Android CI (#6780) Georgi Gerganov 2024-04-20 13:27:12 +03:00
  • 0e4802b2ec
    ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (#6748) loonerin 2024-04-19 13:03:35 -04:00
  • 637e9a86c2
    server: static: upstream upgrade (#6765) Pierrick Hymbert 2024-04-19 13:19:01 +02:00
  • 9958c81b79
    Implement the OLMo architecture (#6741) nopperl 2024-04-19 09:35:54 +00:00
  • 8b1b1f4982
    train : add general name (#6752) Austin 2024-04-19 03:16:45 -04:00
  • bca40e9814
    fix wrong parameter in cmd in readme-sycl.md (#6755) Neo Zhang 2024-04-19 09:16:31 +08:00
  • 0d56246f4b
    ggml : group all experts in a single ggml_mul_mat_id (#6505) slaren 2024-04-18 15:18:48 +02:00
  • 03c0946d73
    convert : support models with multiple chat templates (#6588) Sigbjørn Skjæret 2024-04-18 13:49:01 +02:00
  • e11b2e6e1e
    Qwen2 : assume tied weights if lm_head/output weights is missing (#6738) Ren Xuancheng 2024-04-18 19:38:04 +08:00
  • c71bfd736e
    llama : fix compatibility with old 2 expert models (#6735) slaren 2024-04-18 09:04:47 +02:00
  • 3b8f1ec4b1
    llamafile : tmp disable + build sgemm.o when needed (#6716) Georgi Gerganov 2024-04-17 23:58:26 +03:00
  • 8dd1ec8b3f
    readme : add UI (#6724) Yaroslav 2024-04-17 14:47:50 +02:00
  • facb8b56f8
    convert : fix autoawq gemma (#6704) Zheng.Deng 2024-04-17 04:51:07 +08:00
  • 532c1737a1
    llama : make general.name optional (#6709) Georgi Gerganov 2024-04-16 23:50:38 +03:00
  • 666867b799
    ggml : fix llamafile sgemm wdata offsets (#6710) Georgi Gerganov 2024-04-16 23:50:22 +03:00
  • 8cc91dc63c
    ggml : add llamafile sgemm (#6414) Justine Tunney 2024-04-16 14:55:30 -04:00
  • dbceec87c0
    llama : add StableLM2 12B (#6635) Ashish 2024-04-16 08:48:35 -07:00
  • f4dea7da18
    llama : add qwen2moe (#6074) Shijie 2024-04-16 23:40:48 +08:00
  • 8a56075b07
    gritlm : add --outdir option to hf.sh script (#6699) Daniel Bevenius 2024-04-16 08:34:06 +02:00
  • 58227ffdeb
    perplexity : require positive --ctx-size arg (#6695) Georgi Gerganov 2024-04-16 09:28:33 +03:00
  • 4fbd8098e6
    gguf : add special tokens metadata for FIM/Infill (#6689) Daniel Bevenius 2024-04-16 08:13:13 +02:00
  • 7593639ce3
    main: add --json-schema / -j flag (#6659) Olivier Chafik 2024-04-15 18:35:21 +01:00
  • 132f55795e
    llama : fix restoring the number of outputs from state files (#6687) compilade 2024-04-15 08:56:55 -04:00
  • 3272896d79
    server : revert "minor layout improvements" (#6684) Pierrick Hymbert 2024-04-15 14:18:47 +02:00
  • 7fc16a2c32
    swift : linux support (#6590) Steven Prichard 2024-04-15 05:14:46 -05:00
  • 17e98d4c96
    fix mul_mat_id() for new input, make the ut pass (#6682) Neo Zhang Jianyu 2024-04-15 17:12:26 +08:00
  • 1958f7e06c
    llama : add missing kv clear in llama_beam_search (#6664) David Renshaw 2024-04-14 15:24:15 -04:00
  • 04fbc5f23e
    Add Command R chat template (#6650) Chao Jiang 2024-04-15 00:16:34 +08:00
  • f184dd9208
    flake.lock: Update (#6669) Georgi Gerganov 2024-04-14 16:55:30 +03:00
  • 422c2aff1c
    Added support for GGML_OP_CLAMP in Metal (#6662) Dave 2024-04-14 07:14:19 -04:00
  • 8800226d65
    Fix --split-max-size (#6655) Sigbjørn Skjæret 2024-04-14 13:12:59 +02:00
  • e689fc4e91
    [bug fix] convert github repository_owner to lowercase (#6673) Jaemin Son 2024-04-14 20:12:36 +09:00
  • a4ec34e1cd
    convert : enable the --use-temp-file cli flag (#6645) James A Capozzoli 2024-04-14 04:40:18 -04:00
  • de17e3f745
    fix memcpy() crash, add missed cmd in guide, fix softmax (#6622) Neo Zhang Jianyu 2024-04-14 10:42:29 +08:00
  • b5e7285baf
    CUDA: fix matrix multiplication logic for tests (#6667) Johannes Gäßler 2024-04-14 00:21:55 +02:00
  • 4bd0f93e4a
    model: support arch DbrxForCausalLM (#6515) Pierrick Hymbert 2024-04-13 11:33:52 +02:00
  • ab9a3240a9
    JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) Olivier Chafik 2024-04-12 19:43:38 +01:00
  • fbbc030ba9
    metal : unify mul_mv_id kernels (#6556) slaren 2024-04-12 18:13:20 +02:00
  • 4cc120c744
    infill : add download instructions for model (#6626) Daniel Bevenius 2024-04-12 14:11:46 +02:00
  • 24ee66ed0d
    server : coherent log output for KV cache full (#6637) Pierrick Hymbert 2024-04-12 13:49:21 +02:00
  • 91c736015b
    llama : add gguf_remove_key + remove split meta during quantize (#6591) jiez 2024-04-12 18:45:06 +08:00
  • 5c4d767ac0
    chore: Fix markdown warnings (#6625) Rene Leonhardt 2024-04-12 10:52:36 +02:00
  • ef21ce4ccb
    imatrix : remove invalid assert (#6632) Georgi Gerganov 2024-04-12 11:49:58 +03:00
  • dee7f8d692
    Correct free memory and total memory. (#6630) MasterYi1024 2024-04-12 16:28:12 +08:00
  • 81da18e71c
    eval-callback: use ggml_op_desc to pretty print unary operator name (#6631) Pierrick Hymbert 2024-04-12 10:26:47 +02:00
  • 9ed2737acc
    ci : disable Metal for macOS-latest-cmake-x64 (#6628) Georgi Gerganov 2024-04-12 11:15:05 +03:00
  • 04a5ac211e
    Optimization: eliminate addition of redundant stacks when advancing grammar. (#6616) Clint Herron 2024-04-11 21:44:50 -04:00
  • f7001ccc5a
    As suggested by @slaren, disabling Metal for test to fix CI build on OSX from #6576 (#6619) Clint Herron 2024-04-11 17:44:48 -04:00
  • a474f50ebb
    Refactor Error Handling for CUDA (#6575) Nikolas 2024-04-11 21:56:29 +02:00
  • cbaadc9294
    grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609) Olivier Chafik 2024-04-11 19:47:34 +01:00
  • 1bbdaf6ecd
    ci: download artifacts to release directory (#6612) Hugo Roussel 2024-04-11 19:52:21 +02:00
  • f4183afe6a
    scripts : add --outdir option to hf.sh (#6600) Daniel Bevenius 2024-04-11 15:22:47 +02:00
  • b804b1ef77
    eval-callback: Example how to use eval callback for debugging (#6576) Pierrick Hymbert 2024-04-11 14:51:07 +02:00
  • 8228b66dbc
    gguf : add option to not check tensor data (#6582) Daniel Bevenius 2024-04-10 20:16:48 +02:00
  • b3a96f27f0
    minor layout improvements (#6572) Ralph Soika 2024-04-10 19:18:25 +02:00
  • 4f407a0a35
    llama : add model types for mixtral (#6589) slaren 2024-04-10 17:24:14 +02:00
  • 65c64dc36f
    convert.py : add consolidated.safetensors for mixtral 8x22b (#6587) slaren 2024-04-10 15:23:12 +02:00
  • 67fac4b95f
    docs : how to add a model (#6565) Pierrick Hymbert 2024-04-10 08:58:48 +02:00