convert : converting mmproj for Qwen2/2.5VL from convert_hf_to_gguf (#13209)
* wip
* qwen2.5vl ok
* vision: fix models missing "text_config"
* add test
* fix test repo name
* fix 32B model
* Revert "fix 32B model"

  This reverts commit 651752f1ae25fe8a01c1e57c18cf2eca80b2774e.
* clarify about 32B
* rm qwen surgery script
* update llava/readme
* move V_ENC_EMBD_PATCH handling to Qwen2VLVisionModel
parent c642bc014c
commit 074e42ab31
7 changed files with 132 additions and 233 deletions
@@ -35,6 +35,16 @@ llama-mtmd-cli -hf ggml-org/SmolVLM2-500M-Video-Instruct-GGUF
 # Pixtral 12B
 llama-mtmd-cli -hf ggml-org/pixtral-12b-GGUF
 
+# Qwen 2 VL
+llama-mtmd-cli -hf ggml-org/Qwen2-VL-2B-Instruct-GGUF
+llama-mtmd-cli -hf ggml-org/Qwen2-VL-7B-Instruct-GGUF
+
+# Qwen 2.5 VL
+llama-mtmd-cli -hf ggml-org/Qwen2.5-VL-3B-Instruct-GGUF
+llama-mtmd-cli -hf ggml-org/Qwen2.5-VL-7B-Instruct-GGUF
+llama-mtmd-cli -hf ggml-org/Qwen2.5-VL-32B-Instruct-GGUF
+llama-mtmd-cli -hf ggml-org/Qwen2.5-VL-72B-Instruct-GGUF
+
 # Mistral Small 3.1 24B (IQ2_M quantization)
 llama-mtmd-cli -hf ggml-org/Mistral-Small-3.1-24B-Instruct-2503-GGUF --chat-template mistral-v7
 ```
@@ -60,7 +70,17 @@ Built upon `clip.cpp` (similar to `llava.cpp`), `libmtmd` offers several advanta
 
 ## How to obtain `mmproj`
 
-Multimodal projector (`mmproj`) files are specific to each model architecture. Please refer to the relevant guide for instructions on how to obtain or create them:
+Multimodal projector (`mmproj`) files are specific to each model architecture.
+
+For the following models, you can use `convert_hf_to_gguf.py`with `--mmproj` flag to get the `mmproj` file:
+- [Gemma 3](https://huggingface.co/collections/google/gemma-3-release-67c6c6f89c4f76621268bb6d) - Note: 1B variant does not have vision support
+- SmolVLM (from [HuggingFaceTB](https://huggingface.co/HuggingFaceTB))
+- SmolVLM2 (from [HuggingFaceTB](https://huggingface.co/HuggingFaceTB))
+- [Pixtral 12B](https://huggingface.co/mistral-community/pixtral-12b) - only works with `transformers`-compatible checkpoint
+- Qwen 2 VL and Qwen 2.5 VL (from [Qwen](https://huggingface.co/Qwen))
+- [Mistral Small 3.1 24B](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503)
+
+For older models, please refer to the relevant guide for instructions on how to obtain or create them:
 
 - [LLaVA](../../docs/multimodal/llava.md)
 - [MobileVLM](../../docs/multimodal/MobileVLM.md)
@@ -70,10 +90,3 @@ Multimodal projector (`mmproj`) files are specific to each model architecture. P
 - [MiniCPM-o 2.6](../../docs/multimodal/minicpmo2.6.md)
 - [IBM Granite Vision](../../docs/multimodal/granitevision.md)
 - [Google Gemma 3](../../docs/multimodal/gemma3.md)
-
-For the following models, you can use `convert_hf_to_gguf.py`with `--mmproj` flag to get the `mmproj` file:
-- [Gemma 3](https://huggingface.co/collections/google/gemma-3-release-67c6c6f89c4f76621268bb6d) - Note: 1B variant does not have vision support
-- SmolVLM (from [HuggingFaceTB](https://huggingface.co/HuggingFaceTB))
-- SmolVLM2 (from [HuggingFaceTB](https://huggingface.co/HuggingFaceTB))
-- [Pixtral 12B](https://huggingface.co/mistral-community/pixtral-12b) - only works with `transformers`-compatible checkpoint
-- [Mistral Small 3.1 24B](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503)
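
Note: for the models this commit adds to the `--mmproj` list (Qwen 2 VL and Qwen 2.5 VL), the conversion flow might look roughly like the sketch below. It is an illustration under stated assumptions, not part of the commit: the model directory, output file names, image path and prompt are placeholders, and the small `run` helper is hypothetical. The helper only prints each command unless `DRY_RUN=0` is set, so the script is safe to try without a downloaded checkpoint.

```shell
#!/bin/sh
# Sketch only: all paths and file names are placeholders, not taken from the commit.
MODEL_DIR="${MODEL_DIR:-./Qwen2.5-VL-3B-Instruct}"   # assumed local Hugging Face checkpoint
NAME=$(basename "$MODEL_DIR")
TEXT_FILE="./$NAME-f16.gguf"                          # language model output (assumed name)
MMPROJ_FILE="./mmproj-$NAME-f16.gguf"                 # projector output (assumed name)

# Hypothetical helper: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# 1. Convert the language model as usual.
run python convert_hf_to_gguf.py "$MODEL_DIR" --outtype f16 --outfile "$TEXT_FILE"

# 2. Run the converter again with --mmproj to write the multimodal projector.
run python convert_hf_to_gguf.py "$MODEL_DIR" --outtype f16 --mmproj --outfile "$MMPROJ_FILE"

# 3. Load both files with the multimodal CLI.
run llama-mtmd-cli -m "$TEXT_FILE" --mmproj "$MMPROJ_FILE" --image ./example.jpg -p "Describe this image."
```

Running it with `DRY_RUN=0 MODEL_DIR=/path/to/checkpoint` from the repository root would execute the three commands instead of printing them. The pre-converted `ggml-org` repositories in the first hunk avoid these steps entirely via `-hf`.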