llama.cpp/ggml
Radoslav Gerganov 553a5c3a9f
rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943)
RPC_CMD_SET_TENSOR always returns an empty response and we send this 4
times per token. We can improve TG speed if we don't wait for this empty
response.

The performance impact of this change depends on the network latency.
2025-04-25 10:08:08 +03:00
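The commit message ties the gain to network latency. A rough way to see why is the back-of-the-envelope model below. It is only a sketch under stated assumptions, not code from ggml: `RpcCostModel`, `blocking_wait_ms`, and `tokens_per_second` are hypothetical names, and the model assumes that each awaited empty response costs about one network round trip and that nothing else changes.

```cpp
#include <cassert>

// Hypothetical model, not part of ggml or its RPC backend.
struct RpcCostModel {
    int    set_tensor_calls_per_token; // 4 per token, per the commit message
    double round_trip_ms;              // assumed network round-trip time
};

// Time per token spent blocked on empty RPC_CMD_SET_TENSOR responses
// when the client waits for each one (assumption: one round trip each).
inline double blocking_wait_ms(const RpcCostModel & m) {
    assert(m.set_tensor_calls_per_token >= 0 && m.round_trip_ms >= 0.0);
    return m.set_tensor_calls_per_token * m.round_trip_ms;
}

// Token-generation rate for a given per-token compute time and wait overhead.
inline double tokens_per_second(double compute_ms, double wait_ms) {
    assert(compute_ms + wait_ms > 0.0);
    return 1000.0 / (compute_ms + wait_ms);
}
```

With purely illustrative inputs (not measurements), `blocking_wait_ms({4, 0.5})` gives 2 ms of waiting per token. Passing that value as `wait_ms`, and then 0 for the no-wait case, to `tokens_per_second` shows how the saving grows with the round-trip time and shrinks toward nothing on a low-latency link.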
cmake | scripts : update sync + fix cmake merge | 2025-03-27 10:09:29 +02:00
include | rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943) | 2025-04-25 10:08:08 +03:00
src | rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943) | 2025-04-25 10:08:08 +03:00
.gitignore | vulkan : cmake integration (#8119) | 2024-07-13 18:12:39 +02:00
CMakeLists.txt | ggml : add SSE 4.2 and x64 base variant for CPUs without AVX (#12871) | 2025-04-21 18:13:51 +02:00