Georgi Gerganov
ae92c1855b
sync : ggml
ggml-ci
2025-06-10 18:39:33 +03:00
Georgi Gerganov
b8e2194efc
sync : ggml
ggml-ci
2025-06-10 09:21:56 +03:00
Georgi Gerganov
f3a4b1659c
sync : ggml
ggml-ci
2025-06-01 13:43:57 +03:00
Georgi Gerganov
53f925074d
sync : vendor (#13901)
* sync : vendor
ggml-ci
* cont : fix httplib version
ggml-ci
* cont : fix lint
* cont : fix lint
* vendor : move to common folder /vendor
ggml-ci
* cont : fix lint
* cont : move httplib to /vendor + use json_fwd.hpp
ggml-ci
* cont : fix server build
ggml-ci
* cont : add missing headers
ggml-ci
* cont : header clean-up
ggml-ci
2025-05-30 16:25:45 +03:00
Georgi Gerganov
1c49c70d07
sync : ggml
2025-05-27 18:05:33 +03:00
Georgi Gerganov
a26c4cc11e
scripts : add option to compare commits in Debug (#13806)
* scripts : add option to compare commits in Debug
* cont : reuse existing CMAKE_OPTS
2025-05-26 22:24:01 +03:00
Olivier Chafik
f5cd27b71d
server : streaming of tool calls and thoughts when --jinja is on (#12379)
* add common_json w/ support for truncated json healing
* add common_chat_msg_diff
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
2025-05-25 01:48:08 +01:00
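The truncated-JSON healing mentioned in the first bullet of the entry above is what lets the server parse a tool-call payload that has been cut off mid-stream. The real implementation is the C++ `common_json` helper; the following is only a minimal Python sketch of the idea (function name and details are illustrative, not the llama.cpp API):

```python
import json

def heal_truncated_json(text: str):
    """Best-effort parse of possibly truncated JSON: close any
    string and containers that were left open, then parse."""
    stack = []          # closers for currently open containers
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()
    healed = text
    if in_string:
        healed += '"'          # close a string cut off mid-value
    healed += "".join(reversed(stack))  # close open arrays/objects
    return json.loads(healed)
```

For example, a tool-call payload truncated mid-argument such as `{"name": "get_weather", "arguments": {"city": "Par` heals to a complete object; inputs truncated in places that cannot be repaired this way (e.g. right after a `:`) would still raise, which a streaming parser has to tolerate until more tokens arrive.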
Georgi Gerganov
d30cb5a7fa
sync : ggml
ggml-ci
2025-05-19 13:29:56 +03:00
Sigbjørn Skjæret
be1d4a13db
scripts : fix compare-llama-bench.py show parameter (#13514)
2025-05-14 08:41:01 +02:00
Sigbjørn Skjæret
bf79371120
scripts : support arbitrary input file formats in compare-llama-bench.py (#13455)
2025-05-13 15:31:12 +02:00
Georgi Gerganov
1e2809bc4b
sync : ggml
2025-05-13 14:02:28 +03:00
Sigbjørn Skjæret
09232370fc
scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451)
2025-05-11 16:20:39 +02:00
Georgi Gerganov
d879433824
sync : ggml
ggml-ci
2025-05-07 17:28:36 +03:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
Georgi Gerganov
b34443923c
sync : ggml (#13268)
* vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204)
* vulkan : add kernels for depthwise 2d convolution (OP_CONV_2D_DW)
* review: remove src_x/y < 0 checks; add performance tests
* sync : ggml
ggml-ci
* vulkan : fix lint (#0)
---------
Co-authored-by: Acly <aclysia@gmail.com>
2025-05-02 20:54:30 +03:00
Georgi Gerganov
b1dd4d08e8
sync : ggml
ggml-ci
2025-05-01 20:15:34 +03:00
Georgi Gerganov
8d33d740c3
sync : ggml
2025-05-01 10:00:39 +03:00
Johannes Gäßler
19e899ce21
scripts: n_depth for compare-llama-bench [no ci] (#13201)
2025-04-29 23:32:04 +02:00
Georgi Gerganov
63b4911494
sync : ggml
ggml-ci
2025-04-24 17:32:47 +03:00
Georgi Gerganov
526739b879
sync : ggml
ggml-ci
2025-04-14 09:26:15 +03:00
Georgi Gerganov
47ba87d0a4
sync : ggml
2025-04-11 00:17:47 +03:00
Georgi Gerganov
eb420e1148
sync : ggml
ggml-ci
2025-04-11 00:17:47 +03:00
Georgi Gerganov
e4bf72d631
scripts : fix sync-ggml-am.sh
2025-04-11 00:17:47 +03:00
Georgi Gerganov
a4e46e28f9
sync : ggml
ggml-ci
2025-04-07 18:44:17 +03:00
Georgi Gerganov
0114a32da0
sync : ggml
ggml-ci
2025-03-31 15:07:32 +03:00
Georgi Gerganov
d3f1f0acfb
sync : ggml
ggml-ci
2025-03-30 08:33:31 +03:00
Georgi Gerganov
029c693fdc
sync : ggml
ggml-ci
2025-03-27 10:09:29 +02:00
Georgi Gerganov
771d84371c
scripts : update sync + fix cmake merge
ggml-ci
2025-03-27 10:09:29 +02:00
Georgi Gerganov
df0665a483
sync : ggml
ggml-ci
2025-03-27 09:04:38 +02:00
Georgi Gerganov
102ac1891d
sync : ggml
ggml-ci
2025-03-07 14:49:44 +02:00
Olivier Chafik
669912d9a5
tool-call : fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
* sampler: turn lazy grammar trigger words to regexes
* add scripts/tool_bench.sh & .py
* constrain llama json output regardless of function name if matches at beginning
* update relaxed newline space rule in grammar tests
* support add_generation_prompt query parameter (useful for /apply_template)
* Update src/llama-grammar.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
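The "trigger patterns for lazy grammars" in the entry above refer to keeping the grammar constraint dormant while the model produces free-form text, and activating it only once the output matches a trigger regex (e.g. a tool-call opening tag). A rough Python sketch of such a gate, with illustrative names rather than the actual llama.cpp sampler API:

```python
import re

class LazyGrammarGate:
    """Keeps a grammar constraint inactive until one of the trigger
    patterns matches the text generated so far."""

    def __init__(self, trigger_patterns):
        self.patterns = [re.compile(p) for p in trigger_patterns]
        self.active = False
        self.buffer = ""

    def feed(self, token_text: str) -> bool:
        """Accumulate decoded text; return True once the grammar
        should start constraining the sampler."""
        if not self.active:
            self.buffer += token_text
            # search the whole buffer so a trigger split across
            # several tokens is still detected
            if any(p.search(self.buffer) for p in self.patterns):
                self.active = True
        return self.active
```

Matching against the accumulated buffer (rather than token by token) is what makes regex triggers more robust than the earlier fixed trigger words, since a tag like `<tool_call>` can be emitted across multiple tokens.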
Daniel Bevenius
a057897ad4
llama : add xcframework build script (#11996)
* llama : add xcframework build script
This commit adds a script to build an XCFramework for the Apple
iOS, macOS, visionOS, and tvOS platforms.
The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```
Refs: https://github.com/ggml-org/llama.cpp/issues/10747
* examples : remove llama.cpp (source dir ref) from project.pbxproj
This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.
* ci : updated build.yml to use build-xcframework.sh
* ci : add xcframework build to github releases
This commit adds the ability to create a GitHub release with the
xcframework build artifact.
* scripts : add apple app validation scripts
This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.
The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.
* llama : remove Package.swift
This commit removes the Package.swift file, as we are now building an
XCFramework for the project.
* llama : remove Sources and spm-headers directories
* llama : use TargetConditionals.h for visionOS/tvOS
2025-03-05 06:30:31 +01:00
Georgi Gerganov
dfd6b2c0be
sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
Georgi Gerganov
3d1cf3cf33
sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
Georgi Gerganov
8371d44595
sync : ggml
ggml-ci
2025-03-03 18:18:11 +02:00
Georgi Gerganov
aede2074f6
scripts : sync-ggml-am.sh fix
2025-03-03 18:18:11 +02:00
MoonRide303
5137da7b8c
scripts: corrected encoding when getting chat template (#11866) (#11907)
Signed-off-by: MoonRide303 <moonride303@gmail.com>
2025-02-18 10:30:16 +01:00
Johannes Gäßler
6dde178248
scripts: fix compare-llama-bench commit hash logic (#11891)
2025-02-15 20:23:22 +01:00
Georgi Gerganov
68ff663a04
repo : update links to new url (#11886)
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-15 16:40:57 +02:00
Olivier Chafik
c7f460ab88
server : fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless --reasoning-format none (#11607)
* extract & return thoughts in reasoning_content field (unless --reasoning-format) for DeepSeek R1 & Command R7B
* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template
* tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out
* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability
* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 10:05:16 +00:00
Georgi Gerganov
0fb77f821f
sync : ggml
2025-02-12 21:46:02 +02:00
Georgi Gerganov
8a59053f63
sync : ggml
2025-02-06 21:23:03 +02:00
Georgi Gerganov
7c9e0ca520
sync : ggml
2025-02-04 12:59:21 +02:00
Georgi Gerganov
8ec05832fa
sync : ggml
2025-02-03 14:57:08 +02:00
Olivier Chafik
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Georgi Gerganov
815857791d
sync : ggml
2025-01-29 11:25:29 +02:00
Olivier Chafik
6171c9d258
Add Jinja template support (#11016)
* Copy minja from 58f0ca6dd7
* Add --jinja and --chat-template-file flags
* Add missing <optional> include
* Avoid print in get_hf_chat_template.py
* No designated initializers yet
* Try and work around msvc++ non-macro max resolution quirk
* Update test_chat_completion.py
* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
* Refactor test-chat-template
* Test templates w/ minja
* Fix deprecation
* Add --jinja to llama-run
* Update common_chat_format_example to use minja template wrapper
* Test chat_template in e2e test
* Update utils.py
* Update test_chat_completion.py
* Update run.cpp
* Update arg.cpp
* Refactor common_chat_* functions to accept minja template + use_jinja option
* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
* Revert LLAMA_CHATML_TEMPLATE refactor
* Normalize newlines in test-chat-templates for windows tests
* Forward decl minja::chat_template to avoid eager json dep
* Flush stdout in chat template before potential crash
* Fix copy elision warning
* Rm unused optional include
* Add missing optional include to server.cpp
* Disable jinja test that has a cryptic windows failure
* minja: fix vigogne (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626
* Update minja to https://github.com/google/minja/pull/25
* Update minja from https://github.com/google/minja/pull/27
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
Georgi Gerganov
f26c874179
scripts : restore hf.sh (#11288)
ggml-ci
2025-01-18 13:18:32 +02:00
Georgi Gerganov
f11cfdfd7f
ci : use -no-cnv in gguf-split tests (#11254)
* ci : use -no-cnv in gguf-split tests
ggml-ci
* ci : use -no-cnv in requantize tests
ggml-ci
* scripts : fix [no ci]
2025-01-15 18:28:35 +02:00
Georgi Gerganov
44d1e796d0
sync : ggml
2025-01-14 10:39:42 +02:00