CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (#13014)
* CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
* fix logic for RoPE support, CUDA graphs
parent dc39a5e7a8
commit 658987cfc9
9 changed files with 548 additions and 426 deletions
@@ -1,3 +1,5 @@
#pragma once

#include "common.cuh"
#include <cstdint>