llama.cpp/ggml/src/ggml-cpu
amritahs-ibm c7b43ab608
llamafile : ppc64le MMA implementation for Q4_0. (#12489)
This change upstreams llamafile's CPU matrix
multiplication kernels for the ppc64le ISA using MMA
builtins. This patch handles matrix multiplication
between the quantised datatypes block_q4_0 and
block_q8_0.
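
For orientation, here is a scalar sketch of the per-block dot product that a Q4_0 x Q8_0 kernel has to compute. It is not the MMA code from this patch. The struct names, the helper name dot_q4_0_q8_0 and the use of a plain float scale are assumptions made for illustration (ggml stores the scale as fp16). The 32-element block size and the nibble layout (low nibbles hold the first 16 values, high nibbles the last 16, each offset by 8) follow ggml's reference layout.

```c
#include <stdint.h>
#include <stddef.h>

#define QK 32  /* elements per block, matching ggml's QK4_0 / QK8_0 */

/* Simplified stand-ins for block_q4_0 / block_q8_0 (hypothetical names).
   Assumption: the scale d is a plain float here; ggml uses fp16. */
typedef struct { float d; uint8_t qs[QK / 2]; } q4_0_block; /* two 4-bit values per byte */
typedef struct { float d; int8_t  qs[QK];     } q8_0_block; /* one 8-bit value per byte  */

/* Scalar reference: dot product of nblocks Q4_0 blocks with nblocks Q8_0 blocks. */
static float dot_q4_0_q8_0(const q4_0_block *x, const q8_0_block *y, size_t nblocks) {
    float sum = 0.0f;
    for (size_t b = 0; b < nblocks; ++b) {
        int32_t isum = 0;
        for (int j = 0; j < QK / 2; ++j) {
            /* Low nibble is element j, high nibble is element j + 16;
               both are stored with an offset of 8. */
            int lo = (x[b].qs[j] & 0x0F) - 8;
            int hi = (x[b].qs[j] >> 4) - 8;
            isum += lo * y[b].qs[j] + hi * y[b].qs[j + QK / 2];
        }
        /* Apply both block scales once per block, after the integer sum. */
        sum += (float)isum * x[b].d * y[b].d;
    }
    return sum;
}
```

A vectorised kernel computes the same integer sums for many row/column pairs at once and applies the scales afterwards. The sketch only pins down the arithmetic.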

This change results in a 5% - 50% improvement
in total speed (i.e. all tokens / total time) across
various batch sizes.

The patch is tested with the Meta-Llama-3-8B,
Mistral-7B and Llama-2-7B-chat-hf models on an
IBM POWER10 machine.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
2025-03-27 08:51:47 +02:00
amx ggml : upgrade init_tensor API to return a ggml_status (#11854) 2025-02-28 14:41:47 +01:00
cmake ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
kleidiai ggml-cpu : update KleidiAI to v1.5.0 (#12568) 2025-03-25 13:10:18 +02:00
llamafile llamafile : ppc64le MMA implementation for Q4_0. (#12489) 2025-03-27 08:51:47 +02:00
CMakeLists.txt ggml : riscv: add 128-bit RVV support (#12530) 2025-03-27 08:38:34 +02:00
cpu-feats-x86.cpp ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154) 2025-03-06 02:26:10 +01:00
ggml-cpu-aarch64.cpp ggml : fix MUL_MAT_ID repack with Q8_K (#12544) 2025-03-26 13:02:00 +02:00
ggml-cpu-aarch64.h ggml : refactor online repacking (#10446) 2024-12-07 14:37:50 +02:00
ggml-cpu-hbm.cpp ggml : refactor online repacking (#10446) 2024-12-07 14:37:50 +02:00
ggml-cpu-hbm.h ggml : refactor online repacking (#10446) 2024-12-07 14:37:50 +02:00
ggml-cpu-impl.h ggml-cpu: Support s390x SIMD Instruction Set (#12019) 2025-02-22 21:39:24 +00:00
ggml-cpu-quants.c ggml : riscv: add 128-bit RVV support (#12530) 2025-03-27 08:38:34 +02:00
ggml-cpu-quants.h ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-cpu-traits.cpp ggml : refactor online repacking (#10446) 2024-12-07 14:37:50 +02:00
ggml-cpu-traits.h ggml : refactor online repacking (#10446) 2024-12-07 14:37:50 +02:00
ggml-cpu.c ggml : fix quantized cpy op (#12310) 2025-03-22 16:23:26 +02:00
ggml-cpu.cpp ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154) 2025-03-06 02:26:10 +01:00