Diego Devesa
|
1d36b3670b
|
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
|
2025-05-02 20:27:13 +02:00 |
|
Xuan Son Nguyen
|
0da5d86026
|
server : allow using LoRA adapters per-request (#10994)
* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* lora_base
* remove redundant check
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
2025-01-02 15:05:18 +01:00 |
|
Georgi Gerganov
|
1da7b76569
|
server : fix speculative decoding with context shift (#10641)
* server : fix speculative decoding with context shift
ggml-ci
* server : take into account speculative limits
ggml-ci
* server : add tests
|
2024-12-04 22:38:20 +02:00 |
|
Xuan Son Nguyen
|
b782e5c7d4
|
server : add more test cases (#10569)
* server : add split model test
* add test speculative
* add invalid cases
|
2024-11-29 21:48:56 +01:00 |
|