llama.cpp


Max Krasnyansky 053b1539c0 threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995)		2025-05-31 15:39:19 -07:00

* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling

  We talked about adding LOW priority for GGML threads in the original threadpool PR. It might be useful for some cases to avoid contention.

  Latest Windows ARM64 releases started parking (offlining) the CPU cores more aggressively, which results in suboptimal performance with n_threads > 4. To deal with that we now disable Power Throttling for our threads for the NORMAL and higher priorities.

* threading: disable SetThreadInfo() calls for older Windows versions

* Update tools/llama-bench/llama-bench.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>
cmake	llama : reorganize source code + improve CMake (#8006)	2024-06-26 18:33:02 +03:00
arg.cpp	threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995)	2025-05-31 15:39:19 -07:00
arg.h	common : add common_remote_get_content (#13123)	2025-04-26 22:58:12 +02:00
base64.hpp	llava : expose as a shared library for downstream projects (#3613)	2023-11-07 00:36:23 +03:00
build-info.cpp.in	build : link against build info instead of compiling against it (#3879)	2023-11-02 08:50:16 +02:00
chat-parser.cpp	server: allow unclosed thinking tags (#13931)	2025-05-31 08:26:10 -07:00
chat-parser.h	sync : vendor (#13901)	2025-05-30 16:25:45 +03:00
chat.cpp	sync : vendor (#13901)	2025-05-30 16:25:45 +03:00
chat.h	server: fix streaming crashes (#13786)	2025-05-26 16:03:57 +01:00
CMakeLists.txt	sync : vendor (#13901)	2025-05-30 16:25:45 +03:00
common.cpp	threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995)	2025-05-31 15:39:19 -07:00
common.h	server: --offline mode (#13804)	2025-05-26 22:34:27 +01:00
console.cpp	console : utf-8 fix for windows stdin (#9690)	2024-09-30 11:23:42 +03:00
console.h	gguf : new file format with flexible meta data (beta) (#2398)	2023-08-21 23:07:43 +03:00
json-partial.cpp	sync : vendor (#13901)	2025-05-30 16:25:45 +03:00
json-partial.h	sync : vendor (#13901)	2025-05-30 16:25:45 +03:00
json-schema-to-grammar.cpp	sync : vendor (#13901)	2025-05-30 16:25:45 +03:00
json-schema-to-grammar.h	sync : vendor (#13901)	2025-05-30 16:25:45 +03:00
llguidance.cpp	llguidance : set tokenizer slices to default (#13424)	2025-05-10 17:19:52 +02:00
log.cpp	Fix: Compile failure due to Microsoft STL breaking change (#11836)	2025-02-12 21:36:11 +01:00
log.h	cleanup: fix compile warnings associated with gnu_printf (#11811)	2025-02-12 10:06:53 -04:00
ngram-cache.cpp	ggml : portability fixes for VS 2017 (#12150)	2025-03-04 18:53:26 +02:00
ngram-cache.h	llama : use LLAMA_TOKEN_NULL (#11062)	2025-01-06 10:52:15 +02:00
regex-partial.cpp	`common`: add partial regex support (#12808)	2025-05-14 19:50:57 +01:00
regex-partial.h	`common`: add partial regex support (#12808)	2025-05-14 19:50:57 +01:00
sampling.cpp	`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)	2025-05-25 01:48:08 +01:00
sampling.h	sampling : support for llguidance grammars (#10224)	2025-02-02 09:55:32 +02:00
speculative.cpp	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)	2025-03-13 12:35:44 +02:00
speculative.h	speculative : update default params (#11954)	2025-02-19 13:29:42 +02:00