CUDA: FA support for Deepseek (Ampere or newer) (#13306)

* CUDA: FA support for Deepseek (Ampere or newer)

* do loop unrolling via C++ template
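
The diff itself is not shown here, so the following is only a generic sketch of what "loop unrolling via C++ template" can look like, not the code from this commit. The names `Unroll` and `dot_unrolled` are hypothetical. The trip count is a template parameter, so the recursion is expanded at compile time, and the compiler can inline it into straight-line code instead of depending on an unrolling pragma.

```cpp
// Hypothetical illustration only; not taken from the commit.

// Unroll<I, N>::run(f) calls f(I), f(I + 1), ..., f(N - 1).
// The recursion is resolved at compile time because I and N are template parameters.
template <int I, int N>
struct Unroll {
    template <typename F>
    static void run(F & f) {
        f(I);                       // loop body for iteration I
        Unroll<I + 1, N>::run(f);   // instantiate the next iteration
    }
};

// Base case: I == N, nothing left to do.
template <int N>
struct Unroll<N, N> {
    template <typename F>
    static void run(F &) {}
};

// Example use: dot product over a compile-time known length N.
template <int N>
float dot_unrolled(const float * a, const float * b) {
    float acc = 0.0f;
    auto body = [&](int i) { acc += a[i] * b[i]; };
    Unroll<0, N>::run(body);
    return acc;
}
```

For example, `dot_unrolled<4>(a, b)` instantiates four calls to the loop body with no runtime loop counter. In CUDA device code the same pattern would additionally need `__device__` qualifiers, which are omitted in this host-only sketch.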
Johannes Gäßler 2025-05-09 13:34:58 +02:00 committed by GitHub
parent 27ebfcacba
commit 0cf6725e9f
33 changed files with 852 additions and 547 deletions
