mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) (#13784)

* mtmd : allow multiple modalities at the same time

* refactor mtmd tokenizer

* fix compile

* ok, missing SinusoidsPositionEmbedding

* first working version

* fix style

* stricter validation of n_embd

* refactor if..else to switch

* fix regression

* add test for 3B

* update docs

* fix tokenizing with add_special

* add more tests

* fix test case "huge"

* rm redundant code

* set_position_mrope_1d : rm n_tokens
Xuan-Son Nguyen 2025-05-27 14:06:10 +02:00 committed by GitHub
parent 72b090da2c
commit bc583e3c63
12 changed files with 1148 additions and 744 deletions