context : allow cache-less context for embeddings (#13108)

* context : allow cache-less context for embeddings ggml-ci * context : enable reranking with encode() ggml-ci * context : encode() clears embd_seq ggml-ci * examples : use llama_encode() when appropriate ggml-ci * models : nomic bert moe does not require KV cache * llama : update comments for llama_decode/llama_encode ggml-ci * context : update warning log [no ci]
2025-05-08 14:28:33 +03:00 · 2025-05-08 14:28:33 +03:00 · 6562e5a4d6
commit 6562e5a4d6
parent 51fb96b1ff
5 changed files with 47 additions and 23 deletions
--- a/src/llama-model.cpp
+++ b/src/llama-model.cpp
@ -12852,6 +12852,13 @@ llama_memory_i * llama_model::create_memory(const llama_memory_params & params,
    llama_memory_i * res;

    switch (arch) {
+        case LLM_ARCH_BERT:
+        case LLM_ARCH_JINA_BERT_V2:
+        case LLM_ARCH_NOMIC_BERT:
+        case LLM_ARCH_NOMIC_BERT_MOE:
+            {
+                res = nullptr;
+            } break;
        case LLM_ARCH_MAMBA:
        case LLM_ARCH_RWKV6:
        case LLM_ARCH_RWKV6QWEN2: