rpc : update README for cache usage (#12620)
parent 13731766db
commit ef03229ff4
1 changed file with 11 additions and 0 deletions
@@ -72,3 +72,14 @@ $ bin/llama-cli -m ../models/tinyllama-1b/ggml-model-f16.gguf -p "Hello, my name
This way you can offload model layers to both local and remote devices.
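As a sketch of what that looks like in practice (the `host:port` values below are placeholders, not real endpoints), the `--rpc` flag of `llama-cli` accepts a comma-separated list of remote `rpc-server` instances to offload to alongside the local backend:

```bash
# Sketch only: -ngl 99 offloads all model layers; with --rpc they are
# distributed across the listed remote rpc-server instances and any
# local GPU backend. The two addresses are placeholders.
$ bin/llama-cli -m ../models/tinyllama-1b/ggml-model-f16.gguf \
    -p "Hello, my name is" -n 64 -ngl 99 \
    --rpc 192.168.88.10:50052,192.168.88.11:50052
```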
### Local cache
The RPC server can use a local cache to store large tensors and avoid transferring them over the network.
This can speed up model loading significantly, especially when using large models.
To enable the cache, use the `-c` option:
```bash
$ bin/rpc-server -c
```
By default, the cache is stored in the `$HOME/.cache/llama.cpp/rpc` directory and can be controlled via the `LLAMA_CACHE` environment variable.
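For example, to keep the cache on a different filesystem you could point `LLAMA_CACHE` at another directory when starting the server (the path below is only an illustration):

```bash
# Illustration: store the RPC tensor cache under a custom directory.
# /mnt/ssd/llama-cache is a placeholder path, not a required location.
$ LLAMA_CACHE=/mnt/ssd/llama-cache bin/rpc-server -c
```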