server : vision support via libmtmd (#12898)

* server : (experimental) vision support via libmtmd
* mtmd : add more api around mtmd_image_tokens
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-defined ID (fixed)
* abstract out the batch management
* small fix
* refactor logic adding tokens to batch
* implement hashing image
* use FNV hash, now hash bitmap instead of file data
* allow decoding image embedding to be split into batches
* rm whitespace
* disable some features when mtmd is on
* fix --no-mmproj-offload
* mtmd_context_params no timings
* refactor server_inp to server_tokens
* fix the failing test case
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* improve server_input struct
* clip : fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
* fix detokenize
* add const to various places
* add warning about breaking changes
* add c api
* helper: use mtmd_image_tokens_get_n_pos
* fix ctx_shift
* fix name shadowing
* more strict condition
* support remote image_url
* remote image_url log
* add CI test
* do not log base64
* add "has_multimodal" to /props
* remove dangling image
* speculative: use slot.cache_tokens.insert
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* rm can_be_detokenized
* on prompt processing done, assert cache_tokens.size
* handle_completions_impl returns void
* adapt the new web ui
* update docs and hot topics
* rm assert
* small fix (2)

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
commit 33eff40240
parent 17512a94d6
Date: 2025-05-09 19:29:37 +02:00
10 changed files with 774 additions and 101 deletions
--- a/tools/mtmd/README.md
+++ b/tools/mtmd/README.md
@@ -16,38 +16,7 @@ The naming and structure related to multimodal support have evolved, which might

 ## Pre-quantized models

-These are ready-to-use models, most of them come with `Q4_K_M` quantization by default:
-
-```sh
-# Gemma 3
-llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF
-llama-mtmd-cli -hf ggml-org/gemma-3-12b-it-GGUF
-llama-mtmd-cli -hf ggml-org/gemma-3-27b-it-GGUF
-
-# SmolVLM
-llama-mtmd-cli -hf ggml-org/SmolVLM-Instruct-GGUF
-llama-mtmd-cli -hf ggml-org/SmolVLM-256M-Instruct-GGUF
-llama-mtmd-cli -hf ggml-org/SmolVLM-500M-Instruct-GGUF
-llama-mtmd-cli -hf ggml-org/SmolVLM2-2.2B-Instruct-GGUF
-llama-mtmd-cli -hf ggml-org/SmolVLM2-256M-Video-Instruct-GGUF
-llama-mtmd-cli -hf ggml-org/SmolVLM2-500M-Video-Instruct-GGUF
-
-# Pixtral 12B
-llama-mtmd-cli -hf ggml-org/pixtral-12b-GGUF
-
-# Qwen 2 VL
-llama-mtmd-cli -hf ggml-org/Qwen2-VL-2B-Instruct-GGUF
-llama-mtmd-cli -hf ggml-org/Qwen2-VL-7B-Instruct-GGUF
-
-# Qwen 2.5 VL
-llama-mtmd-cli -hf ggml-org/Qwen2.5-VL-3B-Instruct-GGUF
-llama-mtmd-cli -hf ggml-org/Qwen2.5-VL-7B-Instruct-GGUF
-llama-mtmd-cli -hf ggml-org/Qwen2.5-VL-32B-Instruct-GGUF
-llama-mtmd-cli -hf ggml-org/Qwen2.5-VL-72B-Instruct-GGUF
-
-# Mistral Small 3.1 24B (IQ2_M quantization)
-llama-mtmd-cli -hf ggml-org/Mistral-Small-3.1-24B-Instruct-2503-GGUF
-```
+See the list of pre-quantized models [here](../../docs/multimodal.md)

 ## How it works and what is `mmproj`?