* vulkan: initial support for IQ3_S
* vulkan: initial support for IQ3_XXS
* vulkan: initial support for IQ2_XXS
* vulkan: initial support for IQ2_XS
* vulkan: optimize Q3_K by removing branches
* vulkan: implement dequantize variants for coopmat2
* vulkan: initial support for IQ2_S
* vulkan: vertically realign code
* port failing dequant callbacks from mul_mm
* Fix array length mismatches
* vulkan: avoid using workgroup size before it is referenced
* tests: increase timeout for Vulkan llvmpipe backend
---------
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
* double the number of rows per workgroup
* Update ggml-vulkan.cpp
* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats
* only increase the number of rows for amd and subgroup size 64
* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested
* use subgroup min and max to check for gcn (requires https://github.com/ggerganov/llama.cpp/pull/10721)
* manual merge ggml-vulkan.cpp
* set min and max subgroup size in any case
* Also double the number of rows for Intel GPUs