CUDA: FA support for Deepseek (Ampere or newer) (#13306)
* CUDA: FA support for Deepseek (Ampere or newer)
* do loop unrolling via C++ template
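
The second bullet mentions loop unrolling via a C++ template. The diff itself is not shown here, so the following is only a generic host-side sketch of that technique, not the code from this commit. The names `unroll` and `dot` are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not taken from the commit): calls f(0), f(1), ..., f(N-1).
// The recursion is resolved at compile time, so no runtime loop counter remains;
// whether each call is inlined is still up to the optimizer.
template <int N>
struct unroll {
    template <typename Func>
    static inline void run(const Func & f) {
        unroll<N - 1>::run(f); // expand iterations 0 .. N-2 first
        f(N - 1);              // then the last iteration
    }
};

// Base case: zero iterations, nothing to do.
template <>
struct unroll<0> {
    template <typename Func>
    static inline void run(const Func &) {}
};

// Example use: dot product over a compile-time length N.
template <int N>
inline float dot(const float * a, const float * b) {
    float sum = 0.0f;
    unroll<N>::run([&](int i) { sum += a[i] * b[i]; });
    return sum;
}
```

Compared with a compiler-specific unroll pragma, this form does not depend on the compiler honoring a hint, at the cost of more template instantiations. In device code the member function would additionally need a `__device__` qualifier, which is omitted here so the sketch compiles with a plain C++ compiler.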
parent 27ebfcacba
commit 0cf6725e9f
33 changed files with 852 additions and 547 deletions