llama.cpp/ggml/src
Dan Johansson a71a4075cd
ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053)
* ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel

Signed-off-by: Dan Johansson <dan.johansson@arm.com>

* * code review fixes

Signed-off-by: Dan Johansson <dan.johansson@arm.com>

* * adds a comment that clarifies barrier usage

Signed-off-by: Dan Johansson <dan.johansson@arm.com>

---------

Signed-off-by: Dan Johansson <dan.johansson@arm.com>
Co-authored-by: Charles Xu <charles.xu@arm.com>
2025-05-12 13:06:19 +02:00
ggml-blas ggml : add support for dynamic loading of backends (#10469) 2024-11-25 15:13:39 +01:00
ggml-cann CANN: Add support for async operator submission (#12864) 2025-04-17 20:34:16 +08:00
ggml-cpu ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053) 2025-05-12 13:06:19 +02:00
ggml-cuda CUDA: fix misaligned synchronization in FA (#13469) 2025-05-12 10:51:21 +02:00
ggml-hip CUDA/HIP: Share the same unified memory allocation logic. (#12934) 2025-04-15 11:20:38 +02:00
ggml-kompute llama : add Qwen2VL support + multimodal RoPE (#10361) 2024-12-14 14:43:46 +02:00
ggml-metal ggml : add mrope kernel for metal (#13457) 2025-05-12 10:29:13 +02:00
ggml-musa cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394) 2025-03-17 20:25:13 +02:00
ggml-opencl opencl: fix incorrect local_size index in profiling log (#12868) 2025-04-16 14:25:57 -07:00
ggml-rpc rpc : add rpc_msg_set_tensor_hash_req (#13353) 2025-05-09 10:31:07 +03:00
ggml-sycl enable dpcpp nightly builds with libraries (#13406) 2025-05-12 13:15:32 +08:00
ggml-vulkan vulkan: scalar flash attention implementation (#13324) 2025-05-10 08:07:07 +02:00
CMakeLists.txt cmake : removed stdc++fs (whisper/3097) 2025-05-07 17:28:36 +03:00
ggml-alloc.c ggml: Don't assert fail when tensor data changes (#13222) 2025-05-01 22:46:10 +02:00
ggml-backend-impl.h ggml : upgrade init_tensor API to return a ggml_status (#11854) 2025-02-28 14:41:47 +01:00
ggml-backend-reg.cpp ggml-backend : fix backend search path (#12330) 2025-03-11 14:25:17 +01:00
ggml-backend.cpp Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386) 2025-05-11 14:18:39 +02:00
ggml-common.h musa: fix all warnings, re-enable -DLLAMA_FATAL_WARNINGS=ON in ci and update doc (#12611) 2025-03-30 10:59:38 +02:00
ggml-impl.h ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187) 2025-04-11 00:17:47 +03:00
ggml-opt.cpp ggml-opt: fix data corruption (ggml/1022) 2024-11-21 09:22:02 +02:00
ggml-quants.c whisper: remove MSVC warnings pragmas (whisper/3090) 2025-05-07 17:28:36 +03:00
ggml-quants.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.h remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) 2024-12-12 19:02:49 +01:00
ggml.c metal : optimize MoE for large batches (#13388) 2025-05-09 15:14:56 +03:00
gguf.cpp Fix clang warning in gguf_check_reserved_keys (#12686) 2025-04-01 13:12:53 +02:00