llama.cpp


Max Krasnyansky 053b1539c0 threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995)	2025-05-31 15:39:19 -07:00

* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling

  We talked about adding LOW priority for GGML threads in the original threadpool PR. It might be useful for some cases to avoid contention.

  Latest Windows ARM64 releases started parking (offlining) the CPU cores more aggressively, which results in suboptimal performance with n_threads > 4. To deal with that we now disable Power Throttling for our threads for the NORMAL and higher priorities.

  Co-authored-by: Diego Devesa <slarengh@gmail.com>

* threading: disable SetThreadInfo() calls for older Windows versions

* Update tools/llama-bench/llama-bench.cpp

  Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
ggml-blas	ggml : add support for dynamic loading of backends (#10469)	2024-11-25 15:13:39 +01:00
ggml-cann	CANN: Add SOC TYPE printing in cmake configuration (#13837)	2025-05-28 11:54:20 +08:00
ggml-cpu	threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995)	2025-05-31 15:39:19 -07:00
ggml-cuda	CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) (#13895)	2025-05-31 08:48:04 +02:00
ggml-hip	CUDA/HIP: Share the same unified memory allocation logic. (#12934)	2025-04-15 11:20:38 +02:00
ggml-kompute	llama : add Qwen2VL support + multimodal RoPE (#10361)	2024-12-14 14:43:46 +02:00
ggml-metal	ggml : add ggml_gelu_erf() (#13667)	2025-05-21 16:26:33 +02:00
ggml-musa	musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647)	2025-05-21 09:58:49 +08:00
ggml-opencl	opencl: add new ops - `argsort`, `div`, `sub`, `addrows`, `sigmoid`, `group_norm` (#13787)	2025-05-27 12:56:08 -07:00
ggml-rpc	rpc : add rpc_msg_set_tensor_hash_req (#13353)	2025-05-09 10:31:07 +03:00
ggml-sycl	SYCL: Add mrope kernel (#13755)	2025-05-30 19:40:57 +05:30
ggml-vulkan	vulkan: use timestamp queries for GGML_VULKAN_PERF (#13817)	2025-05-27 18:39:07 +02:00
CMakeLists.txt	cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890)	2025-05-30 01:28:54 +02:00
ggml-alloc.c	ggml: Don't assert fail when tensor data changes (#13222)	2025-05-01 22:46:10 +02:00
ggml-backend-impl.h	ggml : upgrade init_tensor API to return a ggml_status (#11854)	2025-02-28 14:41:47 +01:00
ggml-backend-reg.cpp	ggml-backend : fix backend search path (#12330)	2025-03-11 14:25:17 +01:00
ggml-backend.cpp	sched : avoid changing cur_copy when a graph is already allocated (#13922)	2025-05-30 18:56:19 +02:00
ggml-common.h	musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (#12611)	2025-03-30 10:59:38 +02:00
ggml-impl.h	ggml : riscv: add xtheadvector support (#13720)	2025-05-27 16:21:36 +03:00
ggml-opt.cpp	mnist: fix segmentation fault (ggml/1227)	2025-05-19 13:29:56 +03:00
ggml-quants.c	whisper: remove MSVC warnings pragmas (whisper/3090)	2025-05-07 17:28:36 +03:00
ggml-quants.h	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
ggml-threading.cpp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
ggml-threading.h	remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)	2024-12-12 19:02:49 +01:00
ggml.c	ggml : add ggml_repeat_4d (#13824)	2025-05-27 15:53:55 +02:00
gguf.cpp	gguf : use ggml log system (#13571)	2025-05-15 19:13:11 +02:00