llama.cpp/examples
Latest commit: Ivy233 02082f1519 (2025-03-26 15:06:04 +01:00)
clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566)

* [Fix] When clip-quantize-cli is compiled and run in a CUDA environment, ggml_fp16_to_fp32 reports an error while trying to access video memory, so quantization has to run on the CPU backend. After the fix, quantization automatically runs on the CPU backend and is no longer bound to CUDA (sketched below).

* [Fix] Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.
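The first fix above amounts to loading the CLIP model with GPU offload disabled before quantizing, so that ggml_fp16_to_fp32 only ever reads host memory. A minimal sketch of that idea, assuming the clip_init/clip_context_params API the commit message references (the field names use_gpu and verbosity are assumptions, not a verbatim copy of the patch):

```cpp
#include "clip.h"  // clip_init, clip_context_params, clip_ctx (llava example headers)
#include "ggml.h"  // GGML_LOG_LEVEL_ERROR

// Sketch: inside clip_model_quantize, load the model on the CPU backend only.
// Previously the model was loaded via clip_model_load, which could select the
// CUDA backend and place tensor data in video memory, where ggml_fp16_to_fp32
// cannot read it during quantization.
static clip_ctx * load_for_quantize(const char * fname_inp) {
    clip_context_params params;
    params.use_gpu   = false;                 // assumption: forces the CPU backend
    params.verbosity = GGML_LOG_LEVEL_ERROR;  // assumption: log-level field

    // With the GPU disabled, all tensor data stays in host memory, so the
    // FP16 -> FP32 conversions performed while re-quantizing never touch
    // device pointers, regardless of whether the binary was built with CUDA.
    return clip_init(fname_inp, params);
}
```

The second bullet then keeps clip_model_load's public signature unchanged and simply routes the quantizer through clip_init instead.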
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| batched | llama : add llama_vocab, functions -> methods, naming (#11110) | 2025-01-12 11:32:42 +02:00 |
| batched-bench | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| batched.swift | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| convert-llama2c-to-ggml | llama : add llama_vocab, functions -> methods, naming (#11110) | 2025-01-12 11:32:42 +02:00 |
| cvector-generator | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| deprecation-warning | Update deprecation-warning.cpp (#10619) | 2024-12-04 23:19:20 +01:00 |
| embedding | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| eval-callback | llama : add llama_vocab, functions -> methods, naming (#11110) | 2025-01-12 11:32:42 +02:00 |
| export-lora | common : refactor '-o' option (#12278) | 2025-03-10 13:34:13 +02:00 |
| gbnf-validator | Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) | 2025-01-30 19:13:58 +00:00 |
| gen-docs | ggml : move AMX to the CPU backend (#10570) | 2024-11-29 21:54:58 +01:00 |
| gguf | GGUF: C++ refactor, backend support, misc fixes (#11030) | 2025-01-07 18:01:58 +01:00 |
| gguf-hash | GGUF: C++ refactor, backend support, misc fixes (#11030) | 2025-01-07 18:01:58 +01:00 |
| gguf-split | ci : use -no-cnv in gguf-split tests (#11254) | 2025-01-15 18:28:35 +02:00 |
| gritlm | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| imatrix | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| infill | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| jeopardy | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| llama-bench | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| llama.android | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| llama.swiftui | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| llava | clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566) | 2025-03-26 15:06:04 +01:00 |
| lookahead | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| lookup | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| main | docs : bring llama-cli conversation/template docs up-to-date (#12426) | 2025-03-17 21:14:32 +01:00 |
| parallel | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| passkey | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| perplexity | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| quantize | ggml : portability fixes for VS 2017 (#12150) | 2025-03-04 18:53:26 +02:00 |
| quantize-stats | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| retrieval | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| rpc | rpc-server : add support for the SYCL backend (#10934) | 2024-12-23 10:39:30 +02:00 |
| run | run: de-duplicate fmt and format functions and optimize (#11596) | 2025-03-25 18:46:11 +01:00 |
| save-load-state | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| server | server : Add verbose output to OAI compatible chat endpoint. (#12246) | 2025-03-23 19:30:26 +01:00 |
| simple | llama : add llama_vocab, functions -> methods, naming (#11110) | 2025-01-12 11:32:42 +02:00 |
| simple-chat | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| simple-cmake-pkg | repo : update links to new url (#11886) | 2025-02-15 16:40:57 +02:00 |
| speculative | speculative : fix seg fault in certain cases (#12454) | 2025-03-18 19:35:11 +02:00 |
| speculative-simple | llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) | 2025-03-13 12:35:44 +02:00 |
| sycl | [SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035) | 2025-02-24 22:33:23 +08:00 |
| tokenize | llama : add llama_vocab, functions -> methods, naming (#11110) | 2025-01-12 11:32:42 +02:00 |
| tts | llama-tts : avoid crashes related to bad model file paths (#12482) | 2025-03-21 11:12:45 +02:00 |
| chat-13B.bat | Create chat-13B.bat (#592) | 2023-03-29 20:21:09 +03:00 |
| chat-13B.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| chat-persistent.sh | scripts : fix pattern and get n_tokens in one go (#10221) | 2024-11-09 09:06:54 +02:00 |
| chat-vicuna.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| chat.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| CMakeLists.txt | tts : add OuteTTS support (#10784) | 2024-12-18 19:27:21 +02:00 |
| convert_legacy_llama.py | metadata: Detailed Dataset Authorship Metadata (#8875) | 2024-11-13 21:10:38 +11:00 |
| json_schema_pydantic_example.py | py : type-check all Python scripts with Pyright (#8341) | 2024-07-07 15:04:39 -04:00 |
| json_schema_to_grammar.py | tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) | 2025-03-05 13:05:13 +00:00 |
| llama.vim | repo : update links to new url (#11886) | 2025-02-15 16:40:57 +02:00 |
| llm.vim | llm.vim : stop generation at multiple linebreaks, bind to <F2> (#2879) | 2023-08-30 09:50:55 +03:00 |
| Miku.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| pydantic_models_to_grammar.py | pydantic : replace uses of __annotations__ with get_type_hints (#8474) | 2024-07-14 19:51:21 -04:00 |
| pydantic_models_to_grammar_examples.py | repo : update links to new url (#11886) | 2025-02-15 16:40:57 +02:00 |
| reason-act.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| regex_to_grammar.py | py : switch to snake_case (#8305) | 2024-07-05 07:53:33 +03:00 |
| server-llama2-13B.sh | build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809) | 2024-06-13 00:41:52 +01:00 |
| server_embd.py | py : type-check all Python scripts with Pyright (#8341) | 2024-07-07 15:04:39 -04:00 |
| ts-type-to-grammar.sh | JSON schema conversion: faster repetitions, min/maxLength for strings, cap number length (#6555) | 2024-04-12 19:43:38 +01:00 |