Commit graph

157 commits

Sigbjørn Skjæret
88fc854b4b
llama : improve sep token handling (#14272) 2025-06-20 14:04:09 +02:00
pqnet
5fc7856815
convert : fix remote option in Windows (#14100) 2025-06-19 12:21:40 +02:00
Sigbjørn Skjæret
3865cff4f5
convert : fix null head_dim AutoConfig regression (#14248) 2025-06-18 09:52:07 +02:00
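For context, a hedged sketch of the fallback such a fix needs (hparam key names follow HF configs; the actual change lives in convert_hf_to_gguf.py). AutoConfig can report head_dim as an explicit null rather than omitting the key, so a plain .get() default is not enough:

```python
def resolve_head_dim(hparams: dict) -> int:
    # AutoConfig may carry an explicit head_dim = None instead of omitting
    # the key, so .get("head_dim", default) alone does not cover it
    head_dim = hparams.get("head_dim")
    if head_dim is None:
        head_dim = hparams["hidden_size"] // hparams["num_attention_heads"]
    return head_dim
```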
Đinh Trọng Huy
ad590be98c
model : add NeoBERT (#14164)
* convert neobert model to gguf

* add inference graph

* fix flake8 lint

* followed reviewer suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* follow reviewers' suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* override NeoBERT feed-forward length

---------

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 14:53:41 +02:00
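The "override NeoBERT feed-forward length" step follows a common conversion pattern: write the FFN width the runtime graph expects rather than the raw config value. A minimal, hypothetical sketch (the derivation of the value here is assumed, not taken from the actual commit):

```python
import gguf
from convert_hf_to_gguf import TextModel  # base class per this repo's converter

class NeoBERTSketch(TextModel):
    model_arch = gguf.MODEL_ARCH.BERT  # illustrative; the PR adds its own arch

    def set_gguf_parameters(self):
        super().set_gguf_parameters()
        # assumed key name; explicitly override the GGUF feed-forward length
        self.gguf_writer.add_feed_forward_length(self.hparams["intermediate_size"])
```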
Bartowski
d7da8dc83a
model : Add support for Arcee AI's upcoming AFM model (#14185)
* Add Arcee AFM support

* Add draft update code

* Fix linter and update URL, may still not be final

* Update src/llama-model.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* Remove accidental blank line

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-06-16 01:04:06 +02:00
Mikko Juola
9ae4143bc6
model : add dots.llm1 architecture support (#14044) (#14118)
Adds:

* Dots1Model to convert_hf_to_gguf.py

* Computation graph code to llama-model.cpp

* Chat template to llama-chat.cpp to detect this model's template.

---

The architecture is called "dots.llm1" (I shortened it to dots1 or DOTS1
in the code).

As of this commit, the only models that follow this architecture are
"dots.llm1.inst" and "dots.llm1.base":

* https://huggingface.co/rednote-hilab/dots.llm1.inst

* https://huggingface.co/rednote-hilab/dots.llm1.base

The model architecture is a combination of Qwen and Deepseek parts, as
seen here:

ffe12627b4/src/transformers/models/dots1/modular_dots1.py
2025-06-15 09:52:06 +02:00
Sigbjørn Skjæret
55f6b9fa65
convert : fix duplicate key DeepSeek-R1 conversion error (#14103) 2025-06-10 23:29:52 +02:00
Sigbjørn Skjæret
3678b838bb
llama : support GEGLU for jina-bert-v2 (#14090) 2025-06-10 18:02:08 +02:00
Sigbjørn Skjæret
1caae7fc6c
gguf-py : add add_classifier_output_labels method to writer (#14031)
* add add_classifier_output_labels

* use add_classifier_output_labels
2025-06-05 17:42:31 +02:00
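A hedged usage sketch of the new writer method (file name, arch, and labels are placeholders):

```python
from gguf import GGUFWriter

writer = GGUFWriter("classifier.gguf", arch="bert")
# store human-readable class names in GGUF metadata so downstream code can
# report labels instead of bare logit indices
writer.add_classifier_output_labels(["negative", "positive"])
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.close()
```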
Sigbjørn Skjæret
5e1c3aed40
convert : fix nomic-bert-moe mask token (#13757) 2025-06-01 18:07:21 +02:00
Sigbjørn Skjæret
c496fe0b1d
convert : fix vocab padding code for bert models (#13954) 2025-06-01 17:23:11 +02:00
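For context, vocab padding in BERT-style conversion fills the gap between the tokenizer's token count and the embedding matrix; a minimal sketch (placeholder-token format assumed):

```python
def pad_vocab(tokens: list[str], n_vocab: int) -> list[str]:
    # GGUF stores exactly n_vocab tokens; keep token ids aligned with the
    # embedding rows by appending distinct placeholder tokens
    return tokens + [f"[PAD{i}]" for i in range(len(tokens), n_vocab)]
```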
Sigbjørn Skjæret
db38704f01
convert : fix rwkv bos/eos token (#13844) 2025-05-30 14:50:43 +02:00
Xuan-Son Nguyen
07e4351ce6
convert : allow partial update to the chkhsh pre-tokenizer list (#13847)
* convert : allow partial update to the chkhsh pre-tokenizer list

* code style

* update tokenizer out

* rm inp/out files for models not having gguf

* fixed hash for glm

* skip nomic-bert-moe test

* Update convert_hf_to_gguf_update.py

* fix minerva-7b hash

* rm redundant import
2025-05-30 12:24:37 +02:00
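For context, the chkhsh list this commit updates identifies a model's pre-tokenizer by hashing the token ids produced for a fixed probe string; roughly (the probe text and model id here are illustrative):

```python
import hashlib
from transformers import AutoTokenizer

PROBE = "Hello world \n\n 3.14 \u00fc test"  # illustrative, not the upstream probe

tok = AutoTokenizer.from_pretrained("org/model")  # hypothetical repo id
chkhsh = hashlib.sha256(str(tok.encode(PROBE)).encode()).hexdigest()
# the digest is matched against the known-hash table in convert_hf_to_gguf.py
print(chkhsh)
```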
Đinh Trọng Huy
291f2b6913
llama : add support for DistilBert (#13907)
* add distilbert

* small fixes

* add note for LLM_ARCH_DISTIL_BERT

* Use MODEL_ARCH.BERT for DistilBert

---------

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
2025-05-30 11:56:02 +02:00
Sigbjørn Skjæret
e83ba3e460
llama : add support for jina-reranker-v2 (#13900) 2025-05-29 21:42:31 +02:00
Sigbjørn Skjæret
5ca82fc1d7
convert : workaround for AutoConfig dummy labels (#13881) 2025-05-29 10:00:57 +02:00
Sigbjørn Skjæret
6385b843a8
llama : add RobertaForSequenceClassification reranker support (#13875) 2025-05-29 08:15:01 +02:00
Đinh Trọng Huy
e0e3aa231d
llama : add support for BertForSequenceClassification reranker (#13858)
* convert: add support for BertForSequenceClassification

* add support for reranking using BertForSequenceClassification

* merge checks of eos and sep

* fix lint

---------

Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
2025-05-28 19:01:58 +02:00
Đinh Trọng Huy
aa6dff05be
convert: small addition to support LlamaModel (#13838)
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
2025-05-28 16:34:18 +02:00
Xuan-Son Nguyen
a3938fb53d
convert : fix qwen omni conversion (#13859)
* convert : fix qwen omni conversion

* fix typo
2025-05-28 16:12:35 +02:00
Xuan-Son Nguyen
26b79b6cb3
convert : fix tensor naming conflict for llama 4 vision (#13836)
* convert : fix tensor naming conflict for llama 4 vision

* add comment
2025-05-28 10:05:54 +02:00
Xuan-Son Nguyen
bc583e3c63
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#13784)
* mtmd : allow multiple modalities at the same time

* refactor mtmd tokenizer

* fix compile

* ok, missing SinusoidsPositionEmbedding

* first working version

* fix style

* stricter validation of n_embd

* refactor if..else to switch

* fix regression

* add test for 3B

* update docs

* fix tokenizing with add_special

* add more tests

* fix test case "huge"

* rm redundant code

* set_position_mrope_1d rm n_tokens
2025-05-27 14:06:10 +02:00
Xuan-Son Nguyen
40aaa8a403
mtmd : add support for Qwen2-Audio and SeaLLM-Audio (#13760)
* mtmd : add Qwen2-Audio support

* small clean up

* update discussion link

* clarify mtmd_get_output_embd

* clarification in multimodal.md

* fix ultravox bug

* ggml_cont
2025-05-25 14:06:32 +02:00
Xuan-Son Nguyen
797990c4bc
mtmd : add ultravox audio input (#13623)
* convert ok, load ok

* warmup ok

* test

* still does not work?

* fix padding

* temporary give up

* fix merge conflict

* build_ultravox()

* rm test

* fix merge conflict

* add necessary mtmd APIs

* first working version (only 4s of audio)

* will this monster compile?

* fix compile

* please compile

* fPIC

* fix windows

* various fixes

* clean up audio_helpers

* fix conversion

* add some debug stuff

* long audio input ok

* adapt the api

* add --audio arg

* final touch UX

* add miniaudio to readme

* fix typo

* refactor kv metadata

* mtmd_default_marker()
2025-05-22 20:42:48 +02:00
antichristHater
c76532e7ba
convert : add qwen2vl support for unsloth merges (#13686) 2025-05-21 18:40:35 +02:00
Xuan-Son Nguyen
92ecdcc06a
mtmd : add vision support for llama 4 (#13282)
* wip llama 4 conversion

* rm redundant __init__

* fix conversion

* fix conversion

* test impl

* try this

* reshape patch_embeddings_0

* fix view

* rm ffn_post_norm

* cgraph ok

* f32 for pos embd

* add image marker tokens

* Llama4UnfoldConvolution

* correct pixel shuffle (see the sketch after this entry)

* fix merge conflicts

* correct

* add debug_graph

* logits matched, but it still perceives the image incorrectly

* fix style

* add image_grid_pinpoints

* handle llama 4 preprocessing

* rm load_image_size

* rm unused line

* fix

* small fix 2

* add test & docs

* fix llava-1.6 test

* test: add notion of huge models

* add comment

* add warn about degraded quality
2025-05-19 13:04:14 +02:00
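The "correct pixel shuffle" step above refers to the space-to-depth rearrangement vision projectors use to cut the image-token count; a minimal sketch assuming a (batch, h, w, c) feature layout:

```python
import torch

def pixel_shuffle(x: torch.Tensor, r: int = 2) -> torch.Tensor:
    # merge each r x r block of vision features into one token carrying
    # r*r times the channels; h and w must be divisible by r
    b, h, w, c = x.shape
    x = x.reshape(b, h // r, r, w // r, r, c)
    x = x.permute(0, 1, 3, 2, 4, 5)  # (b, h/r, w/r, r, r, c)
    return x.reshape(b, h // r, w // r, c * r * r)
```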
Xuan-Son Nguyen
c531edfa34
convert : fix conversion for llama 4 (#13567) 2025-05-15 17:40:07 +02:00
Gabe Goodhart
d590cd4c24
model : Granite MoE shared (#13269)
* feat: Add GGUF conversion for granitemoeshared

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: hparam and arch plumbing for granitemoeshared

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Split MoE fused tensors for shared experts in conversion

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* feat: First WIP cut at model arch in cpp

The hparam and architecture plumbing should be correct, but the
implementation of the shared experts seems to still be broken.

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Cleaner (maybe more correct?) splitting for gate/up

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Fix the input to the shared experts

I had misread the reference implementation: the shared experts take their
input from _before_ the standard MoE layer, but I was feeding them the MoE
output (see the sketch after this entry).

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Avoid architecture-specific checks for Granite MoE Shared

This is a cleaner way that will allow more flexibility in architecture
strings going forward.

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* refactor: Split granite architectures out of llm_build_llama

This helps de-clutter the llama-family graph construction and allows
granite to diverge further (in preparation for Granite 4).

NOTE: I removed the granite scale factors from llm_build_deci because they
appear to only be there as copy-paste from llm_build_llama. The HF config
does not seem to set those values:
https://huggingface.co/Deci/DeciLM-7B/blob/main/config.json

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Fix compiler warning about uninitialized inp_pos

This should not have been reachable, but it warns on some compilers

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Consolidate GraniteMoEShared into GraniteMoE for conversion

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Consolidate GraniteMoEShared into GraniteMoE on the c++ side

Branch: GraniteMoEShared

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-05-13 15:12:01 +02:00
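A sketch of the corrected dataflow from the "Fix the input to the shared experts" commit above (the module wiring and additive merge are assumptions; the real graph is in llama-model.cpp):

```python
import torch
import torch.nn.functional as F

def moe_shared_block(x, routed_moe, w_gate, w_up, w_down):
    # both branches see the same layer input x, in parallel
    routed = routed_moe(x)                        # standard top-k expert mix
    shared = w_down(F.silu(w_gate(x)) * w_up(x))  # SwiGLU shared expert on x
    return routed + shared                        # not routed output fed to shared
```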
Sigbjørn Skjæret
d2a4ef05c6
vocab : add ByteDance-Seed/Seed-Coder (#13423) 2025-05-10 22:08:07 +02:00
Xuan-Son Nguyen
053367d149
mtmd : support InternVL 2.5 and 3 (#13422)
* convert : internvl support

* InternVL3-1B working

* fix regression

* rm mobilevlm from test

* fix conversion

* add test for internvl

* add to list of pre-quant

* restore boi/eoi check

* add clarifying comment for norm eps
2025-05-10 16:26:42 +02:00
Sigbjørn Skjæret
1a844be132
convert : support rope_scaling type and rope_type (#13349) 2025-05-08 15:34:29 +02:00
Xuan-Son Nguyen
32916a4907
clip : refactor graph builder (#13321)
* mtmd : refactor graph builder

* fix qwen2vl

* clean up siglip cgraph

* pixtral migrated

* move minicpmv to a dedicated build function

* move max_feature_layer to build_llava

* use build_attn for minicpm resampler

* fix windows build

* add comment for batch_size

* also support tinygemma3 test model

* qwen2vl does not use RMS norm

* fix qwen2vl norm (2)
2025-05-06 22:40:24 +02:00
Sigbjørn Skjæret
764b85627b
convert : qwen2/3moe : set yarn metadata if present (#13331)
* set yarn metadata if present

* add comment about enabling YaRN

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
2025-05-06 11:12:06 +02:00
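A hedged sketch of what "set yarn metadata if present" typically means in the converter (HF config key names assumed; the exact code may differ):

```python
import gguf

def set_yarn_metadata(writer: gguf.GGUFWriter, hparams: dict) -> None:
    rope_scaling = hparams.get("rope_scaling") or {}
    # newer configs spell the field "rope_type", older ones "type"
    if rope_scaling.get("rope_type", rope_scaling.get("type")) == "yarn":
        writer.add_rope_scaling_type(gguf.RopeScalingType.YARN)
        writer.add_rope_scaling_factor(rope_scaling["factor"])
        writer.add_rope_scaling_orig_ctx_len(
            rope_scaling["original_max_position_embeddings"])
```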
Xuan-Son Nguyen
5215b91e93
clip : fix confused naming ffn_up and ffn_down (#13290)
* clip : fix confused naming ffn_up and ffn_down

* rm ffn_i/o/g naming

* rename n_embd, n_ff

* small fix

* no check n_ff
2025-05-05 12:54:44 +02:00
Sigbjørn Skjæret
ae803bfc3d
convert : bailingmoe : set yarn metadata if present (#13312) 2025-05-05 12:34:26 +02:00
ymcki
3bf785f3ef
llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843) 2025-05-03 17:39:51 +02:00
Jared Van Bortel
2f567611c0
llama-model : support Qwen2 embedding models and pooling_mode_lasttoken (#13245) 2025-05-02 11:42:30 -04:00
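pooling_mode_lasttoken takes each sequence's embedding from the hidden state of its final non-padding token; a minimal reference sketch:

```python
import torch

def pool_last_token(hidden: torch.Tensor, attn_mask: torch.Tensor) -> torch.Tensor:
    # hidden: (batch, seq, dim); attn_mask: (batch, seq) with 1 on real tokens
    last = attn_mask.sum(dim=1) - 1  # index of the last real token per row
    return hidden[torch.arange(hidden.size(0)), last]
```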
Jared Van Bortel
7d2123484e
convert : use correct context length for nomic-embed-text-v2 (#13216) 2025-05-02 11:41:54 -04:00
Xuan-Son Nguyen
074e42ab31
convert : converting mmproj for Qwen2/2.5VL from convert_hf_to_gguf (#13209)
* wip

* qwen2.5vl ok

* vision: fix models missing "text_config"

* add test

* fix test repo name

* fix 32B model

* Revert "fix 32B model"

This reverts commit 651752f1ae25fe8a01c1e57c18cf2eca80b2774e.

* clarify about 32B

* rm qwen surgery script

* update llava/readme

* move V_ENC_EMBD_PATCH handling to Qwen2VLVisionModel
2025-05-02 17:17:15 +02:00
Xuan-Son Nguyen
dcf886007d
convert : explicitly disable trust_remote_code for AutoConfig (#13246) 2025-05-02 08:45:10 +02:00
Xuan-Son Nguyen
8936784f7a
mtmd : add **vision** support for Mistral Small 3.1 (#13231)
* convert ok

* load ok, missing patch merger

* ah sheet it works

* update llava/readme

* add test

* fix test
2025-05-01 17:05:42 +02:00
Xuan-Son Nguyen
3e168bede4
convert : improve model arch handling (#13122)
* convert : improve model arch handling

* use AutoConfig

* rm trust_remote_code

* Update convert_hf_to_gguf.py

* fix self.block_count for vision

* fix NomicBertModel
2025-04-30 16:56:24 +02:00
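Together with the "explicitly disable trust_remote_code" commit above, the hparam-loading pattern these changes converge on looks roughly like:

```python
from transformers import AutoConfig

# read hparams from config.json without executing any code shipped in the
# model repository ("path/to/model" is a placeholder)
cfg = AutoConfig.from_pretrained("path/to/model", trust_remote_code=False)
hparams = cfg.to_dict()
```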
Xuan-Son Nguyen
07c2e2f76c
convert : correct typo image_mean --> image_std (#13208) 2025-04-30 13:06:15 +02:00
AT
5f5e39e1ba
model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466)
* Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture

- Adds MoE-based embedding model supporting multilingual embeddings.
- Selects architecture variant based on hyperparameter detection (MoE layers).
- Removes unnecessary subclass initialization checks for clarity.

https://www.nomic.ai/blog/posts/nomic-embed-text-v2

Co-authored-by: Jared Van Bortel <jared@nomic.ai>

* fix tokenizer

* don't rename this tensor

---------

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2025-04-28 22:52:15 +03:00
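A hedged sketch of the "selects architecture variant based on hyperparameter detection" bullet above (config key and enum names assumed):

```python
import gguf

def pick_nomic_arch(hparams: dict):
    # the presence of MoE layers in the config selects the MoE graph variant
    if hparams.get("moe_every_n_layers", 0) > 0:  # assumed key name
        return gguf.MODEL_ARCH.NOMIC_BERT_MOE
    return gguf.MODEL_ARCH.NOMIC_BERT
```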
matteo
ced44be342
llama-chat : fix wrong template in GLM4-0414 (#13140)
* fix wrong template in GLM4-0414

* fix spaces

* no bos token since it is already in the template

* moved the chatglm4 check to higher priority

* restored template for old GLM models

* moved the GLM4 template check to the correct place with the correct check
2025-04-27 21:57:32 +02:00
HimariO
ca2bb89eac
clip : Add Qwen2.5VL support (#12402)
* implement vision model architecture, gguf converter

* handle window attention inputs

* add debug utils

* fix a few incorrect tensor memory layouts

* move position id remap out of ggml to avoid int32 cuda operations

* cleaning up

* ignore transformers Qwen2_5_xxx type check

* remove rarely used `qwen2vl-cli` debug functions

* remove commented-out code blocks

* fix attn weight scaling after rebase

* add `PROJECTOR_TYPE_QWEN2_5_VL`

* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`

* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`

* remove `attn_window_size` from gguf

* fix model conversion

* clean up

* fix merging problem

* add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-27 10:10:34 +02:00
Xuan-Son Nguyen
ecda2ec4b3
mtmd : Support Pixtral 12B (#13065)
* add pixtral text model (vision is wip)

* cgraph ok, just missing 2D RoPE

* fix bad rebase

* first working version

* fix problem with img_break token

* support dynamic image size

* update docs

* update test script
2025-04-23 20:21:59 +02:00
piDack
eb1776b15a
convert : Append mult-eos, half-rope, bos to GLM4-0414 and Z (#13021)
* append mult-eos, half-rope, bos to GLM4-0414

* remove unset var
2025-04-23 16:59:14 +02:00
Xuan-Son Nguyen
dc39a5e7a8
mtmd : support SmolVLM (version 1 and 2) (#13050)
* mtmd : support SmolVLM (version 1 and 2)

* correct chat template

* fix n_patches

* scale_factor is an int

* add more models to test
2025-04-22 16:24:54 +02:00
Xuan-Son Nguyen
2016f07bd1
convert : experimental support for --mmproj flag (#13023)
* convert : experimental support for `--mmproj` flag

* fix bad ctrl+f replace

* fix style

* split into subclasses TextModel and VisionModel

* rename Model --> ModelBase

* small fix

* correct CLIP_VISION arch name (because existing GGUFs already use it)

* Apply suggestions from code review

Co-authored-by: compilade <git@compilade.net>

* fix Mistral3Model

* fix typo

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
2025-04-20 23:29:36 +02:00
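Given the TextModel/VisionModel split and the Model --> ModelBase rename described in this entry, registering a new text-only conversion looks roughly like this (the architecture string is hypothetical):

```python
import gguf
from convert_hf_to_gguf import ModelBase, TextModel

@ModelBase.register("FooForCausalLM")  # hypothetical HF architecture name
class FooModel(TextModel):
    model_arch = gguf.MODEL_ARCH.LLAMA  # reuse an existing graph where it fits
```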