* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0 GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* llama : allow building all tests on windows when not using shared libraries
* add static windows build to ci
* tests : enable debug logs for test-chat
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* musa: fix build warning (unused parameter)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: upgrade MUSA SDK version to rc4.0.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Update ggml/src/ggml-cuda/cpy.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* vulkan: scalar flash attention implementation
* vulkan: always use fp32 for scalar flash attention
* vulkan: use vector loads in scalar flash attention shader
* vulkan: remove PV matrix, helps with register usage
* vulkan: reduce register usage in scalar FA, but perf may be slightly worse
* vulkan: load each Q value once. optimize O reduction. more tuning
* vulkan: support q4_0/q8_0 KV in scalar FA
* CI: increase timeout to accommodate newly-supported tests
* vulkan: for scalar FA, select between 1 and 8 rows
* vulkan: avoid using Float16 capability in scalar FA
* [CANN] Support ELU and CONV_TRANSPOSE_1D
* [CANN]Modification review comments
* [CANN]Modification review comments
* [CANN]name adjustment
* [CANN]remove lambda used in template
* [CANN]Use std::func instead of template
* [CANN]Modify the code according to the review comments
---------
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* ci: add visionOS build workflow
Add a new GitHub Actions workflow for building on visionOS with CMake and Xcode.
* ggml: Define _DARWIN_C_SOURCE for visionOS to fix missing u_xxx typedefs
* ci: remove define hacks for u_xxx system types
---------
Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
This commit adds the --symlinks option to the zip command used to create
the xcframework zip file. This is necessary to create symlinks in the
zip file. Without this option, the Versions symlink is stored as a
regular directory entry in the zip file, rather than as a symlink in the
zip which causes the followig error in xcode:
```console
Couldn't resolve framework symlink for '/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current': readlink(/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current): Invalid argument (22)
```
Refs: https://github.com/ggml-org/llama.cpp/pull/11996#issuecomment-2727026377
This commit adds the fetch-depth: 0 option to the checkout action in the
build.yml workflow file (0 meaning that it fetches the complete
history). The default value is 1 when not specified which only fetches
the latest commit.
This is necessary to ensure that `git rev-list --count HEAD` counts the
total number of commits in the history. Currently because the default is
being used the name of the xcframework artifact is always
llama-b1-xcframework.
The commit add the name parameter to the upload-artifact action to
ensure that the artifact is uploaded with the correct name.
The motivation for this is that currently the uploaded xcframework
is named as llama-b1-xcframework.zip. With this change the name of this
artifact should contain the build number like the other artifacts.
* ci : remove xframework upload
This commit removes the upload of the xframework zip file as an
artifact.
The motivation for this change is that the xframework zip file is
currently being uploaded as part of strategy and will therefore be
attempted to be uploaded multiple times and will fail the build.
The uploading should be moved to somewhere else in the build to avoid
this.
* ci : add xcframework upload to macos-latest job
* llama : add xcframework build script
This commit adds a script to build an XCFramework for Apple
ios, macos, visionos, and tvos platforms.
The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```
Refs: https://github.com/ggml-org/llama.cpp/issues/10747
* examples : remove llama.cpp (source dir ref) from project.pbxproj
This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.
* ci : updated build.yml to use build-xcframework.sh
* ci : add xcframework build to github releases
This commit adds the ability to create a GitHub release with the
xcframework build artifact.
* scripts : add apple app validation scripts
This commit adds scripts that can validate the iOS, macOS, tvOS, and
VisionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.
The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.
* llama : remove Package.swift
This commit removes the Package.swift file, as we are now building an
XCFramework for the project.
* llama : remove Sources and spm-headers directories
* llama : use TargetConditionals.h for visionOS/tvOS
* vulkan: initial support for IQ1_S and IQ1_M quantizations
* vulkan: define MMV kernels for IQ1 quantizations
* devops: increase timeout of Vulkan tests again
* vulkan: simplify ifdef for init_iq_shmem
* musa: Update MUSA SDK version to rc3.1.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: Remove workaround in PR #10042
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* initial porting of previous LLG patch
* update for new APIs
* build: integrate llguidance as an external project
* use '%llguidance' as marker to enable llg lark syntax
* add some docs
* clarify docs
* code style fixes
* remove llguidance.h from .gitignore
* fix tests when llg is enabled
* pass vocab not model to llama_sampler_init_llg()
* copy test-grammar-integration.cpp to test-llguidance.cpp
* clang fmt
* fix ref-count bug
* build and run test
* gbnf -> lark syntax
* conditionally include llguidance test based on LLAMA_LLGUIDANCE flag
* rename llguidance test file to test-grammar-llguidance.cpp
* add gh action for llg test
* align tests with LLG grammar syntax and JSON Schema spec
* llama_tokenizer() in fact requires valid utf8
* update llg
* format file
* add $LLGUIDANCE_LOG_LEVEL support
* fix whitespace
* fix warning
* include <cmath> for INFINITY
* add final newline
* fail llama_sampler_init_llg() at runtime
* Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes
* simplify #includes
* improve doc string for LLAMA_LLGUIDANCE
* typo in merge
* bump llguidance to 0.6.12
* vulkan: initial support for IQ3_S
* vulkan: initial support for IQ3_XXS
* vulkan: initial support for IQ2_XXS
* vulkan: initial support for IQ2_XS
* vulkan: optimize Q3_K by removing branches
* vulkan: implement dequantize variants for coopmat2
* vulkan: initial support for IQ2_S
* vulkan: vertically realign code
* port failing dequant callbacks from mul_mm
* Fix array length mismatches
* vulkan: avoid using workgroup size before it is referenced
* tests: increase timeout for Vulkan llvmpipe backend
---------
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
* release : pack /lib and /include in the packages
* cmake : put libs in /bin
* TMP : push artifacts
* Revert "TMP : push artifacts"
This reverts commit 4decf2c4dfc5cdf5d96ea44c03c8f9801ab41262.
* ci : fix HIP cmake compiler options to be on first line
* ci : restore the original HIP commands
* ci : change ubuntu build from latest to 20.04
* ci : try to fix macos build rpaths
* ci : remove obsolete MacOS build
* TMP : push artifacts
* ci : change back to ubuntu latest
* ci : macos set build rpath to "@loader_path"
* ci : fix typo
* ci : change ubuntu package to 22.04
* Revert "TMP : push artifacts"
This reverts commit 537b09e70ffc604c414ee78acf3acb4c940ec597.
This is a fork of linenoise that is C++17 compatible. I intend on
adding it to llama-run so we can do things like traverse prompt
history via the up and down arrows:
https://github.com/ericcurtin/linenoise.cpp
Signed-off-by: Eric Curtin <ecurtin@redhat.com>