[SYCL] Optimize mul_mat for Q4_0 on Intel GPU (#12035)

* opt performance by reorder for Intel GPU * detect hw type and save opt feature, and print opt feature * correct name * support optimize graph once when compute graph, record the opt status in tensor->extra, make CI passed * add env variable GGML_SYCL_DISABLE_OPT for debug * use syclex::architecture replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT * add performance data * mv getrows functions to separeted files * fix global variables --------- Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2025-02-24 22:33:23 +08:00 · 2025-02-24 22:33:23 +08:00 · 08d5986290
commit 08d5986290
parent 651adf4b66
14 changed files with 803 additions and 266 deletions
--- a/examples/sycl/run-llama2.sh
+++ b/examples/sycl/run-llama2.sh
@ -3,7 +3,7 @@
 #  MIT license
 #  Copyright (C) 2024 Intel Corporation
 #  SPDX-License-Identifier: MIT
-
+export ONEAPI_DEVICE_SELECTOR="level_zero:0"
 source /opt/intel/oneapi/setvars.sh

 #export GGML_SYCL_DEBUG=1
@ -13,7 +13,7 @@ source /opt/intel/oneapi/setvars.sh
 INPUT_PROMPT="Building a website can be done in 10 simple steps:\nStep 1:"
 MODEL_FILE=models/llama-2-7b.Q4_0.gguf
 NGL=33
-CONEXT=8192
+CONEXT=4096

 if [ $# -gt 0 ]; then
    GGML_SYCL_DEVICE=$1