* Add attention and final logit softcapping.

* fix

* Add custom add_ functions

* Disable flash attention for Gemma2

* Update src/llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Add default value for attention and final logit softcap value

* Add custom kq scaling from Gemma2Attention

* Remove custom pre-attention scaling and use computed value instead.

---------

Co-authored-by: slaren <slarengh@gmail.com>
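For context, the softcapping referenced above bounds a logit to the open interval (-cap, cap) by passing it through a scaled tanh before the attention softmax and before final sampling. Below is a minimal standalone C++ sketch, assuming the standard Gemma 2 formulation; `soft_cap` is an illustrative helper (not llama.cpp's API, which builds this into the ggml graph), and the 50.0/30.0 defaults are the attention and final softcap values from Gemma 2's published config:

```cpp
#include <cmath>
#include <cstdio>

// Squash an unbounded logit into (-cap, cap) via a scaled tanh.
// Hypothetical helper for illustration only.
static float soft_cap(float x, float cap) {
    return cap * std::tanh(x / cap);
}

int main() {
    // Assumed Gemma 2 defaults: 50.0 for attention logits,
    // 30.0 for the final output logits.
    const float attn_cap  = 50.0f;
    const float final_cap = 30.0f;

    float kq = 120.0f; // an unbounded attention (KQ) logit
    std::printf("attn:  %.2f -> %.4f\n", kq, soft_cap(kq, attn_cap));

    float logit = -45.0f; // an unbounded final logit
    std::printf("final: %.2f -> %.4f\n", logit, soft_cap(logit, final_cap));
    return 0;
}
```

Because tanh saturates, large logits are compressed toward +/-cap while small logits pass through nearly unchanged, which keeps the softmax well-conditioned without flash attention (hence the fallback above).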
Files:

* CMakeLists.txt
* llama.cpp
* unicode-data.cpp
* unicode-data.h
* unicode.cpp
* unicode.h