Commit graph

31 commits

Author SHA1 Message Date
Anthony Umfer
9a390c4829
tools : fix uninitialized llama_batch in server (#13436)
* add constructor to initialize server_context::batch, preventing destructor's call to llama_batch_free from causing an invalid free()

* Update tools/server/server.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

* use C++11 initializer syntax

* switch from Copy-list-initialization to Direct-list-initialization

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-11 17:08:26 +02:00
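The fix above follows a general C++ pattern, shown here as a small sketch. It uses hypothetical stand-ins (demo_batch, demo_batch_free, demo_context), not the real llama_batch API. The rule is that a member freed in a destructor must start out zeroed. Direct-list-initialization (`T x {};`) value-initializes an aggregate, so its pointers start as nullptr, and std::free(nullptr) is a no-op.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for a C-style batch struct (NOT the real llama_batch).
struct demo_batch {
    int32_t   n_tokens;
    int32_t * token;   // heap buffer owned by the batch
    float   * logits;  // heap buffer owned by the batch
};

// Hypothetical free function. std::free(nullptr) does nothing, so freeing a
// zero-initialized batch is safe; freeing indeterminate pointers is undefined.
inline void demo_batch_free(demo_batch b) {
    std::free(b.token);
    std::free(b.logits);
}

struct demo_context {
    // Direct-list-initialization value-initializes every member (0 / nullptr),
    // so the destructor is safe even if the batch was never allocated.
    demo_batch batch {};

    demo_context() = default;
    // Copying would free the same buffers twice, so forbid it.
    demo_context(const demo_context &) = delete;
    demo_context & operator=(const demo_context &) = delete;

    ~demo_context() { demo_batch_free(batch); }
};
```

For a plain aggregate like demo_batch, the copy-list form (`demo_batch batch = {};`) zeroes the members in the same way. The two forms differ only when explicit constructors are involved.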
David Huang
7f323a589f
Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386) 2025-05-11 14:18:39 +02:00
City
3eac209319
mtmd : support InternVL 3 38B and 78B mmproj (#13443)
* Support InternVL 3 38B and 78B mmproj

* Swap norms in clip.cpp

* Group variables together
2025-05-11 11:35:52 +02:00
Xuan-Son Nguyen
a634d75d1b
mtmd : move helpers to dedicated file (#13442)
* mtmd : move helpers to dedicated file

* fix windows build

* rm redundant include
2025-05-11 11:34:23 +02:00
Xuan-Son Nguyen
15e6125a39
mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434)
* mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl

* fix typo
2025-05-10 19:57:54 +02:00
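One way to enforce a hard resolution cap is to scale the image down until its pixel count fits a budget, keeping the aspect ratio, and then snap each side to the model's grid. The sketch below assumes that approach. clamp_image_size, the pixel budget and the alignment are hypothetical and illustrative. They are not the values or code used for qwen2vl / qwen2.5vl.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical helper: shrink (w, h) so that w * h <= max_pixels while keeping
// the aspect ratio, then round each side down to a multiple of `align`
// (for example a patch grid). Each side is kept at >= align, so a budget
// smaller than align * align can still be exceeded.
inline std::pair<int, int> clamp_image_size(int w, int h, long long max_pixels, int align) {
    assert(w > 0 && h > 0 && align > 0);
    if ((long long) w * h > max_pixels) {
        const double scale = std::sqrt((double) max_pixels / ((double) w * h));
        w = (int) std::floor(w * scale);
        h = (int) std::floor(h * scale);
    }
    // Round down to the alignment, but never below one aligned unit.
    w = std::max(align, (w / align) * align);
    h = std::max(align, (h / align) * align);
    return {w, h};
}
```

With a 1,000,000-pixel budget and 28-pixel alignment, a 4000x3000 input shrinks to 1148x840. A 280x140 input passes through unchanged.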
Xuan-Son Nguyen
3b24d26c22
server : update docs (#13432) 2025-05-10 18:44:49 +02:00
Xuan-Son Nguyen
053367d149
mtmd : support InternVL 2.5 and 3 (#13422)
* convert : internvl support

* InternVL3-1B working

* fix regression

* rm mobilevlm from test

* fix conversion

* add test for internvl

* add to list of pre-quant

* restore boi/eoi check

* add clarifying comment for norm eps
2025-05-10 16:26:42 +02:00
Xuan-Son Nguyen
33eff40240
server : vision support via libmtmd (#12898)
* server : (experimental) vision support via libmtmd

* mtmd : add more api around mtmd_image_tokens

* mtmd : add more api around mtmd_image_tokens

* mtmd : ability to calc image hash

* shared_ptr for mtmd_image_tokens

* move hash to user-defined ID (fixed)

* abstract out the batch management

* small fix

* refactor logic adding tokens to batch

* implement hashing image

* use FNV hash, now hash bitmap instead of file data

* allow decoding image embedding to be split into batches

* rm whitespace

* disable some features when mtmd is on

* fix --no-mmproj-offload

* mtmd_context_params no timings

* refactor server_inp to server_tokens

* fix the failing test case

* init

* wip

* working version

* add mtmd::bitmaps

* add test target

* rm redundant define

* test: mtmd_input_chunks_free

* rm outdated comment

* fix merging issue

* explicitly create mtmd::input_chunks

* mtmd_input_chunk_copy

* add clone()

* improve server_input struct

* clip : fix confused naming ffn_up and ffn_down

* rm ffn_i/o/g naming

* rename n_embd, n_ff

* small fix

* no check n_ff

* fix detokenize

* add const to various places

* add warning about breaking changes

* add c api

* helper: use mtmd_image_tokens_get_n_pos

* fix ctx_shift

* fix name shadowing

* more strict condition

* support remote image_url

* remote image_url log

* add CI test

* do not log base64

* add "has_multimodal" to /props

* remove dangling image

* speculative: use slot.cache_tokens.insert

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* rm can_be_detokenized

* on prompt processing done, assert cache_tokens.size

* handle_completions_impl returns void

* adapt the new web ui

* update docs and hot topics

* rm assert

* small fix (2)

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-05-09 19:29:37 +02:00
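The hashing step listed above ("use FNV hash, now hash bitmap instead of file data") can be sketched as follows. The sketch assumes 64-bit FNV-1a. The FNV variant, hash width and ID format actually used by mtmd are not stated here, and fnv1a_64 and bitmap_id are hypothetical names. Hashing decoded pixels means that two files which decode to identical pixels get the same ID, which lets a cache recognise a repeated image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
inline uint64_t fnv1a_64(const unsigned char * data, size_t n) {
    uint64_t hash = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
    for (size_t i = 0; i < n; ++i) {
        hash ^= data[i];
        hash *= 0x100000001b3ULL;           // FNV-1a 64-bit prime
    }
    return hash;
}

// Hypothetical: derive an ID string from decoded pixel bytes (e.g. RGB),
// not from the encoded file contents.
inline std::string bitmap_id(const std::vector<unsigned char> & pixels) {
    return std::to_string(fnv1a_64(pixels.data(), pixels.size()));
}
```

FNV is fast and simple but not collision-resistant, which is acceptable for a cache key that is not security-sensitive.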
Diego Devesa
27ebfcacba
llama : do not crash if there is no CPU backend (#13395)
* llama : do not crash if there is no CPU backend

* add checks to examples
2025-05-09 13:02:07 +02:00
Bartowski
efb8b47eda
imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389)
* Add --parse-special for enabling parsing of special tokens in imatrix calculation

* whitespace
2025-05-09 11:53:58 +02:00
R0CKSTAR
0527771dd8
llama-run: add support for downloading models from ModelScope (#13370)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-05-09 10:25:50 +01:00
Xuan-Son Nguyen
2189fd3b63
mtmd : fix batch_view for m-rope (#13397)
* mtmd : fix batch_view for m-rope

* nits : fix comment
2025-05-09 11:18:02 +02:00
Xuan-Son Nguyen
3f96aeff39
llama : one-off chat template fix for Mistral-Small-2503 (#13398)
* llama : one-off chat template fix for Mistral-Small-2503

* update readme

* add mistral-v7-tekken
2025-05-09 11:17:51 +02:00
Xuan-Son Nguyen
d9c4accaff
server : (webui) rename has_multimodal --> modalities (#13393)
* server : (webui) rename has_multimodal --> modalities

* allow converting SVG to PNG

* less complicated code
2025-05-09 09:06:37 +02:00
Matt Clayton
f05a6d71a0
mtmd : Expose helper_decode_image_chunk (#13366)
* mtmd: Expose helper_decode_image, output_embd_copy, image_tokens_copy/free

* Slim down

* Cleanups
2025-05-08 20:25:39 +02:00
Xuan-Son Nguyen
ee01d71e58
server : (webui) fix a very small misalignment (#13387)
* server : (webui) fix a very small misalignment

* restore font-bold
2025-05-08 18:51:45 +02:00
Xuan-Son Nguyen
8c83449cb7
server : (webui) revamp the input area, plus many small UI improvements (#13365)
* rework the input area

* process selected file

* change all icons to heroicons

* fix thought process collapse

* move conversation more menu to sidebar

* sun icon --> moon icon

* rm default system message

* stricter upload file check, only allow image if server has mtmd

* build it

* add renaming

* better autoscroll

* build

* add conversation group

* fix scroll

* extra context first, then user input in the end

* fix <hr> tag

* clean up a bit

* build

* add mb-3 for <pre>

* throttle adjustTextareaHeight to make it less laggy

* (nits) missing padding in sidebar

* rm stray console log
2025-05-08 15:37:29 +02:00
welix
0ccc121354
mtmd : fix the calculation of n_tokens for smolvlm (#13381)
Co-authored-by: Taichi Nishimura <Taichi.A.Nishimura@sony.com>
2025-05-08 15:03:53 +02:00
Georgi Gerganov
6562e5a4d6
context : allow cache-less context for embeddings (#13108)
* context : allow cache-less context for embeddings

ggml-ci

* context : enable reranking with encode()

ggml-ci

* context : encode() clears embd_seq

ggml-ci

* examples : use llama_encode() when appropriate

ggml-ci

* models : nomic bert moe does not require KV cache

* llama : update comments for llama_decode/llama_encode

ggml-ci

* context : update warning log [no ci]
2025-05-08 14:28:33 +03:00
Georgi Gerganov
51fb96b1ff
context : remove logits_all flag (#13284)
* context : remove logits_all flag

ggml-ci

* llama : remove logits_all flag + reorder llama_context_params

ggml-ci
2025-05-08 14:26:50 +03:00
Xuan-Son Nguyen
32916a4907
clip : refactor graph builder (#13321)
* mtmd : refactor graph builder

* fix qwen2vl

* clean up siglip cgraph

* pixtral migrated

* move minicpmv to a dedicated build function

* move max_feature_layer to build_llava

* use build_attn for minicpm resampler

* fix windows build

* add comment for batch_size

* also support tinygemma3 test model

* qwen2vl does not use RMS norm

* fix qwen2vl norm (2)
2025-05-06 22:40:24 +02:00
oobabooga
233461f812
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264)
* sampling: add Top-nσ sampler to `llama-server` and sampler ordering

* revert: sampler ordering

* revert: VS' crappy auto-formatting

* revert: VS' crappy auto-formatting pt.2

* revert: my crappy eyesight...

* sampling: add XTC to Top-nσ sampler chain

* sampling: add Dyna. Temp. to Top-nσ sampler chain

* sampling: actually remove Top-nσ from sampler (oops)

* Integrate top_n_sigma into main sampler chain

* Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA

* Formatting

* Lint

* Exit early in the sampler if nsigma < 0

---------

Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
2025-05-05 22:12:19 +02:00
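Top-nσ keeps only the tokens whose logit lies within n standard deviations of the largest logit. The following minimal C++ sketch of that rule assumes a plain vector of finite logits. top_n_sigma is a hypothetical function, not llama.cpp's sampler implementation. A negative n is treated as "disabled" to mirror the early exit described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Mask every logit below (max - n * sigma) to -inf, where sigma is the
// population standard deviation of the logits. Assumes all inputs are finite.
inline void top_n_sigma(std::vector<float> & logits, float n) {
    if (n < 0.0f || logits.empty()) {
        return;  // disabled: leave logits unchanged
    }
    const float max_logit = *std::max_element(logits.begin(), logits.end());

    double mean = 0.0;
    for (float l : logits) mean += l;
    mean /= logits.size();

    double var = 0.0;
    for (float l : logits) var += (l - mean) * (l - mean);
    const double sigma = std::sqrt(var / logits.size());

    const double threshold = max_logit - n * sigma;
    for (float & l : logits) {
        if (l < threshold) l = -std::numeric_limits<float>::infinity();
    }
}
```

For logits {0, 1, 2, 10}, sigma is about 3.96. With n = 1, only the token with logit 10 survives. With n = 3, all four tokens are kept.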
igardev
b34c859146
server : Webui - change setText command from parent window to also send the message. (#13309)
* setText command from parent window for llama-vscode now sends the message automatically.

* Upgrade packages versions to fix vulnerabilities with "npm audit fix" command.

* Fix code formatting.

* Add index.html.gz changes.

* Revert "Upgrade packages versions to fix vulnerabilities with "npm audit fix" command."

This reverts commit 67687b7fda8a293724ba92ea30bb151677406bc8.

* easier approach

* add setTimeout

---------

Co-authored-by: igardev <ivailo.gardev@akros.ch>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-05 16:03:31 +02:00
Xuan-Son Nguyen
9b61acf060
mtmd : rename llava directory to mtmd (#13311)
* mv llava to mtmd

* change ref everywhere
2025-05-05 16:02:55 +02:00
Xuan-Son Nguyen
5215b91e93
clip : fix confused naming ffn_up and ffn_down (#13290)
* clip : fix confused naming ffn_up and ffn_down

* rm ffn_i/o/g naming

* rename n_embd, n_ff

* small fix

* no check n_ff
2025-05-05 12:54:44 +02:00
Xuan-Son Nguyen
27aa259532
mtmd : add C public API (#13184)
* init

* wip

* working version

* add mtmd::bitmaps

* add test target

* rm redundant define

* test: mtmd_input_chunks_free

* rm outdated comment

* fix merging issue

* explicitly create mtmd::input_chunks

* mtmd_input_chunk_copy

* add clone()

* add const to various places

* add warning about breaking changes

* helper: use mtmd_image_tokens_get_n_pos
2025-05-04 23:43:42 +02:00
Diego Devesa
9fdfcdaedd
rpc : use backend registry, support dl backends (#13304) 2025-05-04 21:25:43 +02:00
Diego Devesa
86bd60d3fe
llava/mtmd : fixes to fully support dl backends (#13303) 2025-05-04 17:05:20 +02:00
Johannes Gäßler
3e959f0976
imatrix: fix oob writes if src1 is not contiguous (#13286) 2025-05-04 00:50:37 +02:00
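A non-contiguous input is unsafe when code assumes that rows are packed back to back. The usual remedy is to index through explicit byte strides. The sketch below illustrates that idea with a hypothetical view2d struct modelled on ggml's ne/nb (sizes/strides) convention and a hypothetical accumulate_sq function. It is not the actual imatrix patch, which may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 2-D view: ne0 elements per row, ne1 rows, nb0/nb1 byte strides.
// A non-contiguous view has nb1 != ne0 * sizeof(float) (e.g. a column slice).
struct view2d {
    const char * data;
    int64_t ne0, ne1;
    size_t  nb0, nb1;
};

// Accumulate per-column sums of squares into `out` (exactly ne0 entries).
// Every element address is computed from nb0/nb1, never from r * ne0 + c,
// so a strided view cannot push reads or writes past the intended range.
inline void accumulate_sq(const view2d & v, std::vector<float> & out) {
    assert((int64_t) out.size() == v.ne0);
    for (int64_t r = 0; r < v.ne1; ++r) {
        const char * row = v.data + r * v.nb1;
        for (int64_t c = 0; c < v.ne0; ++c) {
            const float x = *reinterpret_cast<const float *>(row + c * v.nb0);
            out[c] += x * x;
        }
    }
}
```

For example, a view of the first two columns of a packed 3x4 float matrix has ne0 = 2, ne1 = 3, nb0 = 4 bytes and nb1 = 16 bytes. The row stride stays that of the parent matrix.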
Xuan-Son Nguyen
36667c8edc
clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change) (#13259) 2025-05-03 20:07:54 +02:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00