llama.cpp/examples/server/tests/unit
Olivier Chafik c7f460ab88
server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607)
* extract & return thoughts in reasoning_content field (unless --reasoning-format none) for DeepSeek R1 & Command R7B

* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template

* tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out

* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability

* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

2025-02-13 10:05:16 +00:00
..
test_basic.py server : add flag to disable the web-ui (#10762) (#10751) 2024-12-10 18:22:34 +01:00
test_chat_completion.py tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos (#11616) 2025-02-03 23:49:27 +00:00
test_completion.py server : Fixed wrong function name in llamacpp server unit test (#11473) 2025-01-29 00:03:42 +01:00
test_ctx_shift.py server : replace behave with pytest (#10416) 2024-11-26 16:20:18 +01:00
test_embedding.py server : add support for "encoding_format": "base64" to the */embeddings endpoints (#10967) 2024-12-24 21:33:04 +01:00
test_infill.py server : fix extra BOS in infill endpoint (#11106) 2025-01-06 15:36:08 +02:00
test_lora.py server : allow using LoRA adapters per-request (#10994) 2025-01-02 15:05:18 +01:00
test_rerank.py server : fill usage info in embeddings and rerank responses (#10852) 2024-12-17 18:00:24 +02:00
test_security.py server : replace behave with pytest (#10416) 2024-11-26 16:20:18 +01:00
test_slot_save.py server : replace behave with pytest (#10416) 2024-11-26 16:20:18 +01:00
test_speculative.py server : allow using LoRA adapters per-request (#10994) 2025-01-02 15:05:18 +01:00
test_tokenize.py server : replace behave with pytest (#10416) 2024-11-26 16:20:18 +01:00
test_tool_call.py server: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607) 2025-02-13 10:05:16 +00:00