Commit graph

1458 commits

Author SHA1 Message Date
LostRuins Concedo
59e991c23c
Fixes Qwen2.5VL segfault during inference after https://github.com/ggml-org/llama.cpp/pull/12402, as the has_qwen2vl_merger migration was incomplete (#13133) 2025-04-27 12:43:37 +02:00
HimariO
ca2bb89eac
clip : Add Qwen2.5VL support (#12402)
* implement vision model architecture, gguf converter

* handle window attention inputs

* add debug utils

* fix a few incorrect tensor memory layouts

* move position id remap out of ggml to avoid int32 cuda operations

* cleaning up

* ignore transformers Qwen2_5_xxx type check

* remove rarely used `qwen2vl-cli` debug functions

* remove commented-out code blocks

* fix attn weight scaling after rebase

* add `PROJECTOR_TYPE_QWEN2_5_VL`

* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`

* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`

* remove `attn_window_size` from gguf

* fix model conversion

* clean up

* fix merging problem

* add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-27 10:10:34 +02:00
Xuan-Son Nguyen
4753791e70
clip : improve projector naming (#13118)
* clip : improve projector naming

* no more kv has_llava_projector

* rm unused kv

* rm more unused
2025-04-26 22:39:47 +02:00
frob
d5fe4e81bd
grammar : handle maxItems == 0 in JSON schema (#13117)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-26 10:10:20 +02:00
Xuan-Son Nguyen
edb18b6e8f
clip : fix pixtral on some GPU backends (#13097)
* clip : fix pixtral on some GPU backends

* refactor inp_raw set

* rm outdated comment

* fix dynamic size

* add TODO
2025-04-25 14:31:42 +02:00
Xuan-Son Nguyen
13be08daf9
clip : remove boi/eoi embeddings for GLM-edge model (#13081) 2025-04-24 22:17:04 +02:00
Georgi Gerganov
226251ed56
embeddings : fix batch sizes (#13076)
ggml-ci
2025-04-24 22:29:22 +03:00
Georgi Gerganov
13b4548877
cmake : do not include ./src as public for libllama (#13062)
* cmake : do not include ./src as public for libllama

ggml-ci

* cmake : rework tests

ggml-ci

* llguidance : remove unicode include

ggml-ci

* cmake : make c++17 private

ggml-ci
2025-04-24 16:00:10 +03:00
Xuan-Son Nguyen
7c727fbe39
arg : add --no-mmproj-offload (#13093)
* arg : add --no-mmproj-offload

* Update common/arg.cpp
2025-04-24 14:04:14 +02:00
Xuan-Son Nguyen
80982e815e
arg : clean up handling --mmproj with -hf (#13082)
* arg : clean up handling --mmproj with -hf

* rm change about no_mmproj

* Revert "rm change about no_mmproj"

This reverts commit 2cac8e0efb629d66c612f137e75d562f94bb9e6c.

* handle no_mmproj explicitly

* skip download mmproj on examples not using it
2025-04-24 12:14:13 +02:00
pl752
5630406959
llama-mtmd-cli: SIGINT rework in mtmd vision example (#13080)
* SIGINT rework in mtmd vision example

* Applied suggestions on mtmd-cli PR

* Forgot to invert one of the conditions

* Update examples/llava/mtmd-cli.cpp

* Removed redundant exit check

---------

Co-authored-by: pl752 <maximpl752@gmail.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-04-23 23:32:35 +02:00
Xuan-Son Nguyen
ecda2ec4b3
mtmd : Support Pixtral 12B (#13065)
* add pixtral text model (vision is wip)

* cgraph ok, just missing 2D RoPE

* fix bad rebase

* first working version

* fix problem with img_break token

* support dynamic image size

* update docs

* update test script
2025-04-23 20:21:59 +02:00
Radoslav Gerganov
2cca6c01e4
rpc : add command line option for number of threads for the CPU backend (#13060)
closes #13051
2025-04-23 10:32:49 +03:00
Xuan-Son Nguyen
dc39a5e7a8
mtmd : support SmolVLM (version 1 and 2) (#13050)
* mtmd : support SmolVLM (version 1 and 2)

* correct chat template

* fix n_patches

* scale_factor is an int

* add more models to test
2025-04-22 16:24:54 +02:00
Xuan-Son Nguyen
243453533e
llava : update documentations (#13055)
* llava : update documentations

* fix typo
2025-04-22 10:37:00 +02:00
Xuan-Son Nguyen
84a9bf2fc2
mtmd : merge llava, gemma3 and minicpmv CLI into single llama-mtmd-cli (#13012)
* mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli`

* support for minicpmv

* remove cpp files of llava and minicpmv

* update hot topics

* mtmd : add not supported msg for qwen2vl

* Update examples/llava/mtmd.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-21 15:32:58 +02:00
Xuan-Son Nguyen
2016f07bd1
convert : experimental support for --mmproj flag (#13023)
* convert : experimental support for `--mmproj` flag

* fix bad ctrl+f replace

* fix style

* split into subclasses TextModel and VisionModel

* rename Model --> ModelBase

* small fix

* correct CLIP_VISION arch name (because existing GGUF already use it)

* Apply suggestions from code review

Co-authored-by: compilade <git@compilade.net>

* fix Mistral3Model

* fix typo

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
2025-04-20 23:29:36 +02:00
Jeffrey Morgan
6602304814
llava: fix errors in clip.h on certain compilers (#13030) 2025-04-20 12:15:41 +02:00
Xuan-Son Nguyen
37b9f0d29d
clip : refactor, add image_manipulation and llava_uhd classes (#13011)
* clip : refactor, add `image_manipulation` and `llava_uhd`

* refactor llava-1.6 preprocessing

* simplify logic for llava-1.5

* missing include
2025-04-19 09:15:45 +02:00
Daniel Tang
6408210082
main : Fix Ctrl+D/newline handling (#12951)
This restores the behavior from #491. This does not affect Ctrl+D's ability to
terminate --multiline-input lines (#1040).

This also actually implements #587: "If the user wants the text to end in a
newline, this should be accomplished by explicitly adding a newline by using
\ followed by return, then returning control by pressing return again."

Fixes #12949
2025-04-18 22:02:55 +02:00
Xuan-Son Nguyen
35370ba945
server : use std::move whenever possible (#12936)
* server : use std::move whenever possible

* use r-value ref

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* make task creation scoped

* restore std::move

* fix task_id not set correctly

* apply changes from suggestion

Co-authored-by: ggerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-18 19:58:12 +02:00
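For context, `std::move` matters in the commit above because server tasks can carry large payloads; a generic illustration of the copy-vs-move distinction (`server_task` here is a stand-in, not the server's actual type):

```cpp
// Generic illustration of the copy-vs-move distinction; `server_task` is a
// stand-in type, not the server's actual one.
#include <string>
#include <utility>
#include <vector>

struct server_task {
    int         id;
    std::string prompt; // potentially large payload
};

int main() {
    std::vector<server_task> queue;
    server_task task{42, std::string(1 << 20, 'x')}; // ~1 MiB prompt
    queue.push_back(std::move(task)); // moves the string buffer instead of copying it
}
```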
Xuan-Son Nguyen
b9154ecff9
mtmd : add methods to access mtmd_image_tokens (#12906)
* mtmd : add more api around mtmd_image_tokens

* mtmd : ability to calc image hash

* shared_ptr for mtmd_image_tokens

* move hash to user-defined ID (fixed)

* fix prompt_modified

* rm redundant data member
2025-04-18 10:04:51 +02:00
Radoslav Gerganov
2db9ba1464
rpc : add RPC_CMD_HELLO (#12955)
Add RPC_CMD_HELLO for getting the version of the protocol implemented by
the server. It follows the semantic versioning rules at https://semver.org

Hopefully this brings a better user experience when we make breaking
changes at the protocol level and avoids issues like #12465
2025-04-18 10:13:42 +03:00
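For context on the handshake above: under semver, only the major version gates compatibility. A minimal sketch of that check (illustrative names, not the RPC backend's actual code):

```cpp
// Illustrative sketch of the semver gate such a handshake enables; the real
// RPC_CMD_HELLO handling lives in the RPC backend, and these names are not its API.
#include <cstdint>

struct rpc_version {
    uint8_t major_version, minor_version, patch_version;
};

// Under semver, a breaking protocol change bumps the major version, so client
// and server only need to agree on that component; minor/patch changes stay
// backward compatible.
static bool rpc_version_compatible(const rpc_version & client, const rpc_version & server) {
    return client.major_version == server.major_version;
}
```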
Russyyds
d6d2c2ab8c
Add performance print for gemma3 in example (#12929) 2025-04-14 19:18:20 +02:00
Neo Zhang Jianyu
81c7e64fc2
disable curl lib check; this action was missed by commit bd3f59f812 (#12761) (#12937) 2025-04-14 18:19:07 +08:00
Ed Addario
71e90e8813
quantize: Handle user-defined quantization levels for additional tensors (#12511)
* Add llama_model_quantize_params parameters

* Add new quantize parameters parsing and validation

* Update usage

* Add new parameters defaults

* Add new quantization parameters logic

* Minor refactoring as per the contributors' coding guidelines

* Update descriptions to match existing style

* Implement general --tensor-type instead of tensor-specific command option

* Fix implied type bug

* Restore missing #includes

* Add regex capability for tensor selection

* Refactor function name and update ALLOWED_TENSOR_TYPE

* Add missing #include

* Handle edge case when tensor name is cls.output

* Minor logging improvement
2025-04-13 21:29:28 +03:00
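The `--tensor-type` option above selects tensors by regex. A hypothetical sketch of that matching logic (names are illustrative, not the quantize tool's actual code):

```cpp
// Hypothetical sketch of regex-based tensor selection for --tensor-type
// overrides; names here are illustrative, not the quantize tool's actual code.
#include <regex>
#include <string>
#include <vector>

struct tensor_type_override {
    std::string pattern;     // user-supplied regex, e.g. "attn_v" or "blk\\.[0-9]+\\.ffn_down"
    int         target_type; // ggml quantization type to apply to matching tensors
};

// Return the override type for a tensor name, or -1 to keep the default type.
static int resolve_tensor_type(const std::string & name,
                               const std::vector<tensor_type_override> & overrides) {
    for (const auto & o : overrides) {
        if (std::regex_search(name, std::regex(o.pattern))) {
            return o.target_type;
        }
    }
    return -1;
}
```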
Prajwal B Mehendarkar
bc091a4dc5
common : Define cache directory on AIX (#12915) 2025-04-12 17:33:39 +02:00
Matt Clayton
e59ea539b8
llava: Fix cpu-only clip image encoding segfault (#12907)
* llava: Fix cpu-only clip image encoding

* clip : no smart ptr for ggml_backend_t

* Fix for backend_ptr push_back

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-12 07:29:03 +02:00
Georgi Gerganov
c94085df28
server : add VSCode's Github Copilot Chat support (#12896)
* server : add VSCode's Github Copilot Chat support

* cont : update handler name
2025-04-11 23:37:41 +03:00
yuri@FreeBSD
e8a62631b3
rpc : Set cache directory in rpc-server.cpp on FreeBSD (#12903) 2025-04-11 22:04:14 +02:00
tastelikefeet
b2034c2b55
contrib: support modelscope community (#12664)
* support download from modelscope

* support login

* remove comments

* add arguments

* fix code

* fix win32

* test passed

* fix readme

* revert readme

* change to MODEL_ENDPOINT

* revert tail line

* fix readme

* refactor model endpoint

* remove blank line

* fix header

* fix as comments

* update comment

* update readme

---------

Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc.com>
2025-04-11 14:01:56 +02:00
Xuan-Son Nguyen
0c50923944
clip : use smart pointer (⚠️ breaking change) (#12869)
* clip : use smart pointers

* fix warmup

* add forward declaration

* missing include

* fix include (2)

* composite

* simplify batch ptr

* fix conflict
2025-04-11 12:09:39 +02:00
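The breaking change above swaps raw pointers for smart pointers. A minimal sketch of that pattern for C-style ggml handles, assuming only the public `ggml_init`/`ggml_free` API (the alias name is illustrative):

```cpp
// Minimal sketch of the smart-pointer pattern for C-style ggml handles;
// ggml_init/ggml_free are the real ggml API, the alias name is illustrative.
#include "ggml.h"
#include <memory>

struct ggml_context_deleter {
    void operator()(ggml_context * ctx) const { ggml_free(ctx); }
};
using ggml_context_ptr = std::unique_ptr<ggml_context, ggml_context_deleter>;

ggml_context_ptr make_ctx(size_t mem_size) {
    ggml_init_params params = {
        /*.mem_size   =*/ mem_size,
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ false,
    };
    return ggml_context_ptr(ggml_init(params)); // ggml_free runs automatically on scope exit
}
```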
Xuan-Son Nguyen
8b9cc7cdd8
llava : introduce libmtmd (#12849)
* wip llava2

* migrated gemma3 to llava2

* add timings

* correct pre/postfix

* fix missing include

* fix compilation unused var warn

* update llava2_tokenize

* change name llava2 --> mtmd

* improve api

* refine helpers

* Update examples/llava/mtmd.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-10 22:57:16 +02:00
Plamen Minev
381603a775
ci: detach common from the library (#12827)
* fix: detach common from the library

* fix: building chat test template
2025-04-09 10:11:11 +02:00
Xuan-Son Nguyen
65a69e6e1b
clip : do not print ftype (#12832) 2025-04-09 10:09:53 +02:00
Matt Clayton
b32efad2bc
llava: improve clip_ctx destructor to not memleak load_image_size (#12834) 2025-04-08 22:01:58 +02:00
Georgi Gerganov
a19b5cef16
llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)
* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci
2025-04-08 19:54:51 +03:00
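A sketch of the cast described in the second bullet above, assuming ggml's public `ggml_cast` (the surrounding function is illustrative, not the actual graph code):

```cpp
// Sketch: with no KV cache, K/V arrive as F32 graph tensors and are cast to
// F16 before flash attention. ggml_cast is the real ggml API; the wrapper
// function is illustrative.
#include "ggml.h"

static void cast_kv_to_f16(ggml_context * ctx, ggml_tensor ** k, ggml_tensor ** v) {
    *k = ggml_cast(ctx, *k, GGML_TYPE_F16);
    *v = ggml_cast(ctx, *v, GGML_TYPE_F16);
}
```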
Xuan-Son Nguyen
78a1ba0a4f
server : fix thread.join() on exit (#12831) 2025-04-08 18:37:06 +02:00
dm4
2dabf759e7
llava: add more helper functions to check projector types in clip context (#12824)
Signed-off-by: dm4 <sunrisedm4@gmail.com>
2025-04-08 15:49:13 +02:00
characharm
8ca6e1c3a4
server : webui : Improve Chat Input with Auto-Sizing Textarea (#12785)
* Update ChatScreen.tsx

* useAutosizeTextarea.ts

Add `useAutosizeTextarea` to encapsulate the logic.

* Implement responsive auto-sizing chat textarea

Replaces the manual textarea resizing with an automatic height adjustment based on content.

- `useChatTextarea` hook to manage textarea state and auto-sizing logic via refs, preserving the optimization
- Textarea now grows vertically up to a maximum height (`lg:max-h-48`) on large screens (lg breakpoint and up).
- Disables auto-sizing and enables manual vertical resizing (`resize-vertical`) on smaller screens for better mobile usability.
- Aligns the "Send" button to the bottom of the textarea (`items-end`) for consistent positioning during resize.

* update compressed index.html.gz after npm run build

* refactor: replace OptimizedTextareaValue with AutosizeTextareaApi in VSCode context hook

* chore: normalize line endings to LF

* refactor: AutosizeTextareaApi -> chatTextareaApi

* refactor: Rename interface to PascalCase

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-08 11:14:59 +02:00
stduhpf
4ccea213bc
hellaswag: display estimated score confidence interval (#12797) 2025-04-07 18:47:08 +03:00
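For context, the HellaSwag score over n tasks is a sample proportion, so a normal-approximation interval is the natural estimate; a sketch of that formula (the implementation's exact method may differ):

```latex
\hat{p} \pm z_{\alpha/2}\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}},
\qquad z_{0.025} \approx 1.96 \ \text{for a 95\% interval}
```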
Xuan-Son Nguyen
bd3f59f812
cmake : enable curl by default (#12761)
* cmake : enable curl by default

* no curl if no examples

* fix build

* fix build-linux-cross

* add windows-setup-curl

* fix

* shell

* fix path

* fix windows-latest-cmake*

* run: include_directories

* LLAMA_RUN_EXTRA_LIBS

* sycl: no llama_curl

* no test-arg-parser on windows

* clarification

* try riscv64 / arm64

* windows: include libcurl inside release binary

* add msg

* fix mac / ios / android build

* will this fix xcode?

* try clearing the cache

* add bunch of licenses

* revert clear cache

* fix xcode

* fix xcode (2)

* fix typo
2025-04-07 13:35:19 +02:00
Sergey Fedorov
f1e3eb4249
common : fix includes in arg.cpp and gemma3-cli.cpp (#12766)
* arg.cpp: add a missing include

* gemma3-cli.cpp: fix cinttypes include
2025-04-05 17:46:00 +02:00
Xuan-Son Nguyen
0364178ca2
clip : refactor clip_init, add tests (#12757)
* refactor clip_init

* fix loading file

* fix style

* test ok

* better test with report

* add missing headers

* clarify

* add KEY_MM_PATCH_MERGE_TYPE

* remove bool has_* pattern

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update examples/llava/clip.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* use ggml_soft_max_ext

* refactor logging system

* add minicpm-v-o 2.6 for testing

* use nullptr everywhere

* fix Yi-VL model

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-05 17:17:40 +02:00
Nauful Shaikh
b772394297
server : webui : Upgrade daisyui, tailwindcss. (#12735)
* Upgrade daisyui, tailwindcss.

* Switch to all themes.

* Revert a change.

* Update formatting.

* Install packages before npm build.

* Revert "Install packages before npm build."

This reverts commit 336c5147e614e60993162794ba9d9d4629a916f8.

* Add index.html.gz

* run build

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-04 16:09:52 +02:00
nick huang
23106f94ea
gguf-split : --merge now respects --dry-run option (#12681)
* gguf-split now respects dry-run option

* removing trailing space
2025-04-04 16:09:12 +02:00
Georgi Gerganov
a10b36c91a
llama : refactor kv cache guard (#12695)
* llama : refactor kv cache guard

ggml-ci

* cont : fix comment [no ci]

* llama : fix kv_cache restore logic

ggml-ci

* context : simplify kv cache updates

ggml-ci

* cont : better name [no ci]

* llama : fix llama_decode return code when could not find KV slot

ggml-ci

* context : change log err -> warn [no ci]

* kv-cache : add comment + warning
2025-04-02 14:32:59 +03:00
Xuan-Son Nguyen
42eb248f46
common : remove json.hpp from common.cpp (#12697)
* common : remove json.hpp from common.cpp

* fix comment
2025-04-02 09:58:34 +02:00
Xuan-Son Nguyen
267c1399f1
common : refactor downloading system, handle mmproj with -hf option (#12694)
* (wip) refactor downloading system [no ci]

* fix all examples

* fix mmproj with -hf

* gemma3: update readme

* only handle mmproj in llava example

* fix multi-shard download

* windows: fix problem with std::min and std::max

* fix 2
2025-04-01 23:44:05 +02:00
Sigbjørn Skjæret
1a85949067
llava : proper description fix (#12668) 2025-03-31 11:28:30 +02:00