CUDA: optimize FA for GQA + large batches (#12014)
parent 335eb04a91
commit 5fa07c2f93
32 changed files with 940 additions and 411 deletions
File diff suppressed because it is too large