Commit graph

  • 79da24b58c
    readme : update hot topics (Georgi Gerganov, 2023-08-23 23:41:16 +03:00)
  • cf658adc83
    llm : add Falcon support (#2717) (Georgi Gerganov, 2023-08-23 23:08:04 +03:00)
  • a192860cfe
    minor : fix trailing whitespace (Georgi Gerganov, 2023-08-23 22:37:39 +03:00)
  • 95385241a9
    examples : restore the functionality to import llama2.c models (#2685) (Olivier Chafik, 2023-08-23 20:33:05 +01:00)
  • 335acd2ffd
    fix convert-lora-to-ggml.py (#2738) (slaren, 2023-08-23 16:46:54 +02:00)
  • 5290c38e6e
    main : insert bos if no tokens (#2727) (klosax, 2023-08-23 16:46:03 +02:00)
  • cc34dbda96
    gitignore : fix for Windows (#2729) (akawrykow, 2023-08-23 07:31:34 -07:00)
  • 7c2227a197
    chmod : make scripts executable (#2675) (Cebtenzzre, 2023-08-23 10:29:09 -04:00)
  • f19dca04ea
    devops : RPM Specs (#2723) (JohnnyB, 2023-08-23 15:28:22 +01:00)
  • 8207214b6a
    Fix values shown in the quantize tool help (#2735) (Kawrakow, 2023-08-23 12:57:12 +03:00)
  • 62959e740e
    Strided perplexity (#2714) (Kawrakow, 2023-08-23 12:56:42 +03:00)
  • 7f7ddd5002
    Fix ggml to gguf conversion on Windows (#2733) (IgnacioFDM, 2023-08-23 06:31:09 -03:00)
  • b8ad1b66b2
    server : allow JSON array in prompt or content for direct token input (#2306) (Xiao-Yong Jin, 2023-08-23 02:12:12 -05:00)
  • f5fe98d11b
    docs : add grammar docs (#2701) (Evan Jones, 2023-08-22 21:01:57 -04:00)
  • 777f42ba18
    Improve handling of special tokens in GGML to GGUF converter (#2725) (Kerfuffle, 2023-08-22 17:39:39 -06:00)
  • 46ef5b5fcf
    llama : fix whitespace escaping in tokenizer (#2724) (goerch, 2023-08-22 23:10:42 +02:00)
  • c63bb1d16a
    CUDA: use mul_mat_q kernels by default (#2683) (Johannes Gäßler, 2023-08-22 22:47:05 +02:00)
  • 3b6cfe7c92
    convert.py : clarify error message (#2718) (Alex Petenchea, 2023-08-22 21:58:16 +03:00)
  • 800c9635b4
    Fix CUDA softmax by subtracting max value before exp (#2665) (Jiahao Li, 2023-08-23 02:27:06 +08:00)
  • deb7dfca4b
    gguf : add ftype meta info to the model (#2710) (Georgi Gerganov, 2023-08-22 20:05:59 +03:00)
  • bac66994cf
    Quantization improvements for k_quants (#2707) (Kawrakow, 2023-08-22 19:14:09 +03:00)
  • 519c981f8b
    embedding : evaluate prompt in batches (#2713) (slaren, 2023-08-22 16:03:12 +02:00)
  • 1123f7fbdf
    ggml-cuda : use graph allocator (#2684) (slaren, 2023-08-22 15:25:19 +02:00)
  • ef3f333d37
    ggml : sync latest (SAM + SD operators, CUDA alibi) (#2709) (Georgi Gerganov, 2023-08-22 14:22:08 +03:00)
  • 8e4364f2af
    llama-bench : minor fixes (#2695) (slaren, 2023-08-22 09:56:03 +02:00)
  • 1e3bc523d8
    ggml : support CUDA's half type for aarch64 (#1455) (#2670) (Kylin, 2023-08-22 15:14:23 +08:00)
  • 14b1d7e6f7
    metal : add missing barriers for mul-mat (#2699) (Shouzheng Liu, 2023-08-22 02:18:40 -04:00)
  • 226255b44e
    server : fall back to default if client param is null (#2688) (Jhen-Jie Hong, 2023-08-22 08:32:00 +08:00)
  • 930523c8e1
    Fix convert-llama-ggmlv3-to-gguf.py vocab conversion (#2698) (Kerfuffle, 2023-08-21 18:01:34 -06:00)
  • c8dba409e6
    py : remove obsolete script (Georgi Gerganov, 2023-08-21 23:40:22 +03:00)
  • 6381d4e110
    gguf : new file format with flexible metadata (beta) (#2398) (Georgi Gerganov, 2023-08-21 23:07:43 +03:00)
  • dadbed99e6
    metal : fix synchronization in new matrix multiplication kernel (#2686) (Shouzheng Liu, 2023-08-21 06:59:29 -04:00)
  • cb1c0727bd
    HellaSwag: split token evaluation into batches if needed (#2681) (Kawrakow, 2023-08-21 11:11:31 +03:00)
  • 9e232f0234
    ggml : move all type info to ggml_type_traits (#2663) (slaren, 2023-08-20 22:17:53 +02:00)
  • 5e9ff54a67
    More efficient HellaSwag implementation (#2677) (Kawrakow, 2023-08-20 16:44:46 +03:00)
  • 1f0bccb279
    server : better default prompt (#2646) (Georgi Gerganov, 2023-08-19 00:45:36 +03:00)
  • f63564adfa
    server : update xxd usage for compatibility with older versions (#2649) (Jhen-Jie Hong, 2023-08-19 05:41:32 +08:00)
  • 2d8b76a110
    Add link to Clojure bindings to README (#2659) (Adrian, 2023-08-18 12:39:22 -07:00)
  • 7af633aec3
    readme : incoming BREAKING CHANGE (Georgi Gerganov, 2023-08-18 17:48:31 +03:00)
  • 097e121e2f
    llama : add benchmark example (#2626) (slaren, 2023-08-18 12:44:58 +02:00)
  • eaf98c2649
    readme : add link to Rust bindings (#2656) (mdrokz, 2023-08-18 15:47:58 +05:30)
  • e9b12c332e
    perplexity : more meaningful ETA number - 2 decimal points (Georgi Gerganov, 2023-08-18 12:48:55 +03:00)
  • 604b8bdfa6
    Fix Unicode in grammars (fixes #2501) (#2553) (Evan Jones, 2023-08-17 19:54:44 -04:00)
  • 10151bee2e
    server : support for saving templates in browser LocalStorage (#2486) (staviq, 2023-08-17 23:34:01 +00:00)
  • 0992a7b8b1
    README: fix LLAMA_CUDA_MMV_Y documentation (#2647) (Johannes Gäßler, 2023-08-17 23:57:59 +02:00)
  • 6ddeefad9b
    [Zig] Fix Zig build and improvements (#2554) (Henri Vasserman, 2023-08-17 23:11:18 +03:00)
  • 8dae7ce684
    Add --cfg-negative-prompt-file option for examples (#2591) (Kerfuffle, 2023-08-17 07:29:44 -06:00)
  • a73ccf1aa3
    llama : replace (permute + reshape + view_1d) with (view_3d) (#2538) (Georgi Gerganov, 2023-08-17 10:47:09 +03:00)
  • 7cf54e1f74
    tests : add simple llama grammar tests (#2618) (drbh, 2023-08-17 03:41:01 -04:00)
  • a872a2b28e
    ggml-alloc : fix discrepancy between measure & eval (#2639) (Shouzheng Liu, 2023-08-17 03:35:53 -04:00)
  • 0919a0f73d
    cmake : install ggml-meta.metal if LLAMA_METAL (#2449) (Kolen Cheung, 2023-08-16 21:09:49 +01:00)
  • ed53db86c3
    metal : print error when loading pipeline state (#2564) (Jhen-Jie Hong, 2023-08-17 04:09:03 +08:00)
  • fc8ef549e5
    metal : enable ggml-alloc (#2627) (Shouzheng Liu, 2023-08-16 16:08:28 -04:00)
  • bf83bff674
    metal : matrix-matrix multiplication kernel (#2615) (Shouzheng Liu, 2023-08-16 16:07:04 -04:00)
  • b5ffb2849d
    scripts : add helper script to get wikitext (Georgi Gerganov, 2023-08-15 10:04:58 +03:00)
  • 3ebb00935f
    server : add missing /json-schema-to-grammar.mjs (#2616) (Jhen-Jie Hong, 2023-08-15 06:14:14 +08:00)
  • d783f7982e
    metal : return null instead of exit(1) (#2573) (Jhen-Jie Hong, 2023-08-14 21:37:39 +08:00)
  • d75561df20
    server : add --numa support (#2524) (Cheng Shao, 2023-08-14 15:36:42 +02:00)
  • 348acf188c
    llama : add missing enum keyword in function signatures (#2610) (Kamil Tomšík, 2023-08-14 15:35:16 +02:00)
  • 1cd06fa25e
    CUDA: launch_bounds, small q4_K, q5_K mmq refactor (#2596) (Johannes Gäßler, 2023-08-14 10:41:22 +02:00)
  • 2feb8934eb
    server : fix default grammar by using an empty string in the UI (#2604) (Jhen-Jie Hong, 2023-08-14 16:20:17 +08:00)
  • 5517d6e692
    server : implement json-schema-to-grammar.mjs & add grammar param in the UI (#2588) (Jhen-Jie Hong, 2023-08-14 15:16:54 +08:00)
  • f31b539714
    Enhance compatibility with Windows 7 and below (#2592) (vxiiduu, 2023-08-14 13:59:16 +10:00)
  • ee77efea2a
    test : add simple grammar parsing tests (#2594) (drbh, 2023-08-13 10:00:48 -04:00)
  • f64d44a9b9
    CUDA: Fixed OpenLLaMA 3b mmq, reduced compile time (#2590) (Johannes Gäßler, 2023-08-13 00:24:45 +02:00)
  • b19edd54d5
    Adding support for llama2.c models (#2559) (byte-6174, 2023-08-11 19:17:25 -04:00)
  • 53dc399472
    server: fixed wrong variable name in timing JSON (#2579) (Equim, 2023-08-12 06:35:14 +08:00)
  • 9ca4abed89
    Handle ENABLE_VIRTUAL_TERMINAL_PROCESSING more gracefully on earlier versions of Windows (DannyDaemonic, 2023-08-10 13:11:36 -07:00)
  • e59fcb2bc1
    Add --n-predict -2 for stopping generation on full context (#2565) (Christian Demsar, 2023-08-10 10:28:27 -04:00)
  • 1638757767
    Fix grammar-based sampling issue in server (#2566) (Martin Krasser, 2023-08-10 12:16:38 +02:00)
  • 916a9acdd0
    ggml-alloc: Don't try to re-use buffers of external tensors (#2562) (Sam Spilsbury, 2023-08-09 23:47:42 +03:00)
  • ea04a4ca19
    add log_callback to llama_context_params for custom logging (#2234) (grahameth, 2023-08-09 22:46:40 +02:00)
  • 25d43e0eb5
    CUDA: tuned mul_mat_q kernels (#2546) (Johannes Gäßler, 2023-08-09 09:42:34 +02:00)
  • f5bfea0580
    Allow passing grammar to completion endpoint (#2532) (Martin Krasser, 2023-08-08 15:29:19 +02:00)
  • acfc5478ff
    CUDA: tighter VRAM scratch size for 65b/70b (#2551) (Johannes Gäßler, 2023-08-08 14:38:16 +02:00)
  • 7ed8d1fe7f
    llm.vim : multiline autocompletion, get rid of "^@" (#2543) (chaihahaha, 2023-08-08 20:07:02 +08:00)
  • e7f94d6fdc
    vim : bring back simple llm.vim example (Georgi Gerganov, 2023-08-08 15:05:30 +03:00)
  • 2d7baaf50f
    vim : streaming and more (#2495) (AustinMroz, 2023-08-08 06:44:48 -05:00)
  • f3c3b4b167
    Add --rope-scale parameter (#2544) (klosax, 2023-08-07 19:07:19 +02:00)
  • 93356bdb7a
    ggml : mul mat tweaks (#2372) (Georgi Gerganov, 2023-08-07 14:25:58 +03:00)
  • 60baff7c85
    ggml : pad result of ggml_nbytes() (Georgi Gerganov, 2023-08-07 14:24:42 +03:00)
  • 9082b5dfbf
    ggml : change params pointer (style change) (#2539) (Georgi Gerganov, 2023-08-07 13:55:18 +03:00)
  • 99d29c0094
    ggml : sync (custom ops) (#2537) (Georgi Gerganov, 2023-08-07 13:20:09 +03:00)
  • 3d9a551816
    Fixed mmap prefetch for GPU offloading (#2529) (Johannes Gäßler, 2023-08-07 10:09:40 +02:00)
  • f6f9896ac3
    metal : fix out-of-bounds access + inc concurrency nodes (#2416) (Georgi Gerganov, 2023-08-07 10:52:57 +03:00)
  • 34a14b28ff
    [Makefile] Move ARM CFLAGS before compilation (#2536) (GiviMAD, 2023-08-06 23:21:46 -07:00)
  • 7297128db8
    [Zig] Rewrite build for Zig 0.11 (#2514) (Henri Vasserman, 2023-08-07 08:35:53 +03:00)
  • 86c3219895
    console : fix issue related to Windows 11 PowerShell console mode persistence (#2521) (DannyDaemonic, 2023-08-05 23:49:34 -07:00)
  • 2e8265ae17
    convert.py : add missing abstract methods for quantized data (#2491) (Keiichi Tabata, 2023-08-06 15:34:05 +09:00)
  • f514d1b306
    CUDA: faster k-quant mul_mat_q kernels (#2525) (Johannes Gäßler, 2023-08-05 18:20:44 +02:00)
  • 332311234a
    fix Firefox autoscroll (#2519) (Jonas Wunderlich, 2023-08-04 20:16:11 +00:00)
  • 182af739c4
    server: regenerate completion.js.hpp (#2515) (Cebtenzzre, 2023-08-04 15:00:57 -04:00)
  • 4329d1acb0
    CUDA: use min compute capability of GPUs actually used (#2506) (Cebtenzzre, 2023-08-04 11:35:22 -04:00)
  • 02f9d96a86
    CUDA: check if event is NULL before cudaStreamWaitEvent (#2505) (Cebtenzzre, 2023-08-04 11:34:32 -04:00)
  • 3498588e0f
    Add --simple-io option for subprocesses and break out console.h and cpp (#1558) (DannyDaemonic, 2023-08-04 08:20:12 -07:00)
  • 5f631c2679
    Fixing race condition in server and partial stream handling in frontend (#2391) (Stephen Nichols, 2023-08-04 06:37:24 -05:00)
  • 415e99fec2
    Stream save llama context data to file instead of allocating entire buffer upfront (#2488) (l3utterfly, 2023-08-04 19:29:52 +08:00)
  • ff966e7ca6
    build : fix several cast and printf warnings (#2499) (Borislav Stanimirov, 2023-08-04 13:07:21 +03:00)
  • 8183159cf3
    examples : generate JSON according to schema (#1887) (Evan Jones, 2023-08-02 22:05:44 -04:00)
  • 468ea24fb4
    CUDA: faster non k-quant mul_mat_q kernels (#2483) (Johannes Gäßler, 2023-08-02 18:04:04 +02:00)
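
Notes on a few of the commits above follow.

The GGUF format introduced in 6381d4e110 (#2398) replaced the earlier GGML/GGJT containers with a file that starts with a small fixed header (the magic bytes "GGUF", a format version, and the tensor and metadata key-value counts), followed by the metadata and tensor descriptions. The sketch below reads just that header; it is not the llama.cpp loader, and it assumes the later revision of the format with little-endian 64-bit counts (early GGUF versions used 32-bit counts), so check the spec for the version you actually have:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Minimal GGUF header reader (sketch only, not the llama.cpp loader).
// Assumes the revised GGUF layout with 64-bit counts.
int main(int argc, char ** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }
    std::FILE * f = std::fopen(argv[1], "rb");
    if (!f) { std::perror("fopen"); return 1; }

    char     magic[4];  // the bytes 'G','G','U','F'
    uint32_t version;   // format version
    uint64_t n_tensors; // number of tensor descriptions
    uint64_t n_kv;      // number of metadata key-value pairs

    const bool ok =
        std::fread(magic,      sizeof(magic),     1, f) == 1 &&
        std::fread(&version,   sizeof(version),   1, f) == 1 &&
        std::fread(&n_tensors, sizeof(n_tensors), 1, f) == 1 &&
        std::fread(&n_kv,      sizeof(n_kv),      1, f) == 1;
    std::fclose(f);

    if (!ok || std::memcmp(magic, "GGUF", 4) != 0) {
        std::fprintf(stderr, "not a readable GGUF file\n");
        return 1;
    }
    std::printf("GGUF v%u: %llu tensors, %llu metadata keys\n",
                (unsigned) version,
                (unsigned long long) n_tensors,
                (unsigned long long) n_kv);
    return 0;
}
```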
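
The CUDA softmax fix (800c9635b4, #2665) applies the standard numerical-stability identity: subtracting the row maximum before exponentiating leaves the softmax unchanged, because the common factor exp(-max) cancels in the normalization, while keeping every exponent at or below zero so exp() cannot overflow for large logits. A plain C++ illustration of the identity, not the actual CUDA kernel:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax:
// softmax(x)_i = exp(x_i - max(x)) / sum_j exp(x_j - max(x)).
// Assumes a non-empty input.
std::vector<float> softmax(const std::vector<float> & logits) {
    const float max_val = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_val); // exponent <= 0, no overflow
        sum += probs[i];
    }
    for (float & p : probs) {
        p /= sum;
    }
    return probs;
}
```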
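
Several entries touch the perplexity and HellaSwag tooling (62959e740e, cb1c0727bd, 5e9ff54a67). Perplexity here is the exponential of the mean negative log-likelihood of the evaluated tokens, ppl = exp(-(1/N) * sum_i log p(x_i | x_1..x_{i-1})); lower is better, and a model that guesses uniformly over a vocabulary of size V scores exactly V. A minimal sketch, assuming you already have natural-log per-token probabilities (the function name is illustrative, not from the codebase):

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// ppl = exp(-mean(log p)).
// Assumes a non-empty vector of natural-log token probabilities.
double perplexity(const std::vector<double> & token_logprobs) {
    const double sum = std::accumulate(token_logprobs.begin(),
                                       token_logprobs.end(), 0.0);
    return std::exp(-sum / (double) token_logprobs.size());
}
```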