llama.cpp/ggml
Alberto Cabrera Pérez 17512a94d6
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858)
* sycl : Implemented reorder Q4_0 mmvq

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

* sycl : Fixed mmvq being called when reorder is disabled

* sycl : Improved comments in the quants header

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

* Use static_assert

* safe_div -> ceil_div

* Clarify qi comment

* change the reorder tensor from init to execute OP

* dbg

* Undo changes to test-backend-ops

* Refactor changes on top of q4_0 reorder fix

* Missing Reverts

* Refactored opt_for_reorder logic to simplify code path

* Explicit inlining and unroll

* Renamed mul_mat_algo enum for consistency

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
Co-authored-by: romain.biessy <romain.biessy@codeplay.com>
2025-05-09 16:34:08 +01:00
..
cmake scripts : update sync + fix cmake merge 2025-03-27 10:09:29 +02:00
include CUDA: fix bad asserts for partial offload (#13337) 2025-05-06 13:58:51 +02:00
src sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858) 2025-05-09 16:34:08 +01:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt whisper: remove MSVC warnings pragmas (whisper/3090) 2025-05-07 17:28:36 +03:00