* server : refactor slot input data, move tokenizer to HTTP thread (sketched below)
* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere
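The first commit above carries the core of the refactor: prompt tokenization moves out of the inference slot and into the HTTP thread, so slots receive ready-made token lists instead of raw text. Below is a minimal standalone C++ sketch of that pattern; `server_task`, `tokenize_prompt`, and the byte-level "tokenizer" are illustrative assumptions, not the real server API, and the `llama_tokens` alias stands in for the one the PR adopts everywhere.

```cpp
// Sketch only: illustrates tokenizing on the HTTP thread and handing tokens
// to the slot, under assumed names that merely echo the PR's vocabulary.
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using llama_token  = std::int32_t;
using llama_tokens = std::vector<llama_token>;  // alias used throughout after the refactor

// Hypothetical stand-in for the real tokenizer call.
static llama_tokens tokenize_prompt(const std::string & prompt) {
    llama_tokens tokens;
    for (unsigned char c : prompt) {
        tokens.push_back(static_cast<llama_token>(c));
    }
    return tokens;
}

// Task handed from the HTTP thread to an inference slot: it carries tokens,
// not raw text, so the slot never needs to touch the tokenizer.
struct server_task {
    llama_tokens prompt_tokens;
};

// HTTP-thread side: tokenize and validate *before* dispatching to a slot.
static server_task make_task(const std::string & prompt) {
    server_task task;
    task.prompt_tokens = tokenize_prompt(prompt);
    // The prompt_tokens.empty() check (second commit) also lives on this path.
    if (task.prompt_tokens.empty()) {
        throw std::invalid_argument("prompt must not be empty");
    }
    return task;
}
```

One plausible benefit of this layout, consistent with moving the `prompt_tokens.empty()` check, is that malformed requests can be rejected on the HTTP thread before an inference slot is ever occupied.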
The server test directory contains:

* steps/
* ctx_shift.feature
* embeddings.feature
* environment.py
* infill.feature
* issues.feature
* lora.feature
* parallel.feature
* passkey.feature
* rerank.feature
* results.feature
* security.feature
* server.feature
* slotsave.feature
* wrong_usages.feature