ggml : use 8-bit precision for Q4_1 intermediate results (#1047)

* ggml : use 8-bit precision for Q4_1 intermediate results (ARM)

* ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmlaq_n_f32

56 ms/token with Q4_1!

* ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051)

* gitignore : ignore ppl-*.txt files
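
For readers unfamiliar with the kernel named above, here is a minimal scalar sketch of what a Q4_1 x Q8_0 dot product with 8-bit intermediate operands could look like. It is not the code from this commit. The struct names, field layout, block size of 32, and nibble order (low nibble first) are assumptions chosen for illustration. The sketch assumes each Q4_1 value decodes as `d * q + m` and each Q8_0 value as `d * q`, so the per-block sums can stay in integers and the float scales are applied once per block.

```c
#include <stddef.h>
#include <stdint.h>

#define QK 32 /* elements per block (assumed for this sketch) */

/* Hypothetical block layouts, for illustration only. */
typedef struct {
    float   d;          /* scale */
    float   m;          /* minimum (offset) */
    uint8_t qs[QK / 2]; /* two 4-bit quants per byte */
} blk_q4_1;

typedef struct {
    float  d;           /* scale */
    int8_t qs[QK];      /* 8-bit quants */
} blk_q8_0;

/*
 * Scalar reference: sum over blocks of
 *   y.d * (x.d * sum(q4_i * q8_i) + x.m * sum(q8_i))
 * which equals sum((x.d * q4_i + x.m) * (y.d * q8_i)).
 */
float dot_q4_1_q8_0_ref(size_t nb, const blk_q4_1 *x, const blk_q8_0 *y) {
    float sum = 0.0f;
    for (size_t b = 0; b < nb; b++) {
        int32_t sum_qq = 0; /* integer sum of q4 * q8 products */
        int32_t sum_q8 = 0; /* integer sum of q8 values, scaled by m later */
        for (int j = 0; j < QK / 2; j++) {
            const int q_lo = x[b].qs[j] & 0x0F; /* element 2*j (assumed order) */
            const int q_hi = x[b].qs[j] >> 4;   /* element 2*j + 1 */
            const int y0 = y[b].qs[2 * j];
            const int y1 = y[b].qs[2 * j + 1];
            sum_qq += q_lo * y0 + q_hi * y1;
            sum_q8 += y0 + y1;
        }
        sum += y[b].d * (x[b].d * (float)sum_qq + x[b].m * (float)sum_q8);
    }
    return sum;
}
```

As a quick check of the arithmetic: one block with `x.d = 0.5`, `x.m = 1.0`, every byte of `x.qs` set to `0x21`, `y.d = 2.0`, and every `y.qs` set to 1 gives `2.0 * (0.5 * 48 + 1.0 * 32) = 112`. A SIMD version would vectorize the inner loop and could fold the per-block scaling into a multiply-accumulate.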

---------

Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
Georgi Gerganov 2023-04-19 20:10:08 +03:00 committed by GitHub
parent 7cd5c4a3e9
commit 884e7d7a2b
2 changed files with 192 additions and 194 deletions

.gitignore

@@ -1,11 +1,15 @@
*.o
*.a
.DS_Store
.build/
.cache/
.direnv/
.envrc
.swiftpm
.venv
.vs/
.vscode/
.DS_Store
.build/
build/
build-em/
build-debug/
@@ -30,12 +34,9 @@ models/*
arm_neon.h
compile_commands.json
.envrc
.direnv/
.venv
__pycache__
.swiftpm
zig-out/
zig-cache/
ppl-*.txt