CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (#13014)

* CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID

* fix logic for RoPE support, CUDA graphs
Johannes Gäßler 2025-04-22 21:27:40 +02:00 committed by GitHub
parent dc39a5e7a8
commit 658987cfc9
9 changed files with 548 additions and 426 deletions

@@ -1,3 +1,5 @@
#pragma once
#include "common.cuh"
#include <cstdint>