llama.cpp

amritahs-ibm 13731766db llamafile : ppc64le GEMV forwarding for FP32. (#12594) This patch enables the use of MMA when one of the matrix dimensions (i.e., either M or N) is 1. This is useful for token generation, where N < 2. The concept of 'GEMV forwarding' applies here: when one of the matrices has a single row or column, its elements are broadcast instead of being prepacked by the packing routine. This change yields a 5% - 15% improvement in total speed (i.e., all tokens / total time) across various batch sizes, compared with the corresponding dot product implementation. The patch was tested with FP32 models of Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf on an IBM POWER10 machine. Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>		2025-03-28 09:43:22 +02:00
..
amx	ggml : upgrade init_tensor API to return a ggml_status (#11854)	2025-02-28 14:41:47 +01:00
cmake	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
kleidiai	ggml-cpu : update KleidiAI to v1.5.0 (#12568)	2025-03-25 13:10:18 +02:00
llamafile	llamafile : ppc64le GEMV forwarding for FP32. (#12594)	2025-03-28 09:43:22 +02:00
CMakeLists.txt	cmake : sync/merge PowerPC build commands (#0)	2025-03-27 09:04:38 +02:00
cpu-feats-x86.cpp	ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154)	2025-03-06 02:26:10 +01:00
ggml-cpu-aarch64.cpp	ggml : fix MUL_MAT_ID repack with Q8_K (#12544)	2025-03-26 13:02:00 +02:00
ggml-cpu-aarch64.h	ggml : refactor online repacking (#10446)	2024-12-07 14:37:50 +02:00
ggml-cpu-hbm.cpp	ggml : refactor online repacking (#10446)	2024-12-07 14:37:50 +02:00
ggml-cpu-hbm.h	ggml : refactor online repacking (#10446)	2024-12-07 14:37:50 +02:00
ggml-cpu-impl.h	ggml-cpu: Support s390x SIMD Instruction Set (#12019)	2025-02-22 21:39:24 +00:00
ggml-cpu-quants.c	ggml : riscv: add 128-bit RVV support (#12530)	2025-03-27 08:38:34 +02:00
ggml-cpu-quants.h	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
ggml-cpu-traits.cpp	ggml : refactor online repacking (#10446)	2024-12-07 14:37:50 +02:00
ggml-cpu-traits.h	ggml : refactor online repacking (#10446)	2024-12-07 14:37:50 +02:00
ggml-cpu.c	ggml : fix quantized cpy op (#12310)	2025-03-22 16:23:26 +02:00
ggml-cpu.cpp	ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions (#12154)	2025-03-06 02:26:10 +01:00
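
The latest commit message describes "GEMV forwarding": when one operand has a single row or column, its elements are broadcast rather than prepacked. The sketch below illustrates that idea in plain scalar C++. It is not the llamafile code and does not use POWER10 MMA intrinsics; the function name `gemv_broadcast`, the block height `kRowBlock`, and the row-major layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block height; stands in for the rows covered by one accumulator tile.
constexpr std::size_t kRowBlock = 4;

// Illustrative only: computes y = A * x, where A is M x K (row-major) and x is
// the single column of the second operand (the N == 1 case).
// x is never copied into a packed tile layout. Each x[k] is read once per row
// block and reused ("broadcast") against every row in that block.
std::vector<float> gemv_broadcast(const std::vector<float>& A,
                                  const std::vector<float>& x,
                                  std::size_t M, std::size_t K) {
    assert(A.size() == M * K && x.size() == K);
    std::vector<float> y(M, 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += kRowBlock) {
        // The last block may cover fewer than kRowBlock rows.
        const std::size_t rows = (M - i0 < kRowBlock) ? (M - i0) : kRowBlock;
        float acc[kRowBlock] = {0.0f, 0.0f, 0.0f, 0.0f};
        for (std::size_t k = 0; k < K; ++k) {
            const float xk = x[k];  // broadcast value, no packed copy of x
            for (std::size_t r = 0; r < rows; ++r) {
                acc[r] += A[(i0 + r) * K + k] * xk;
            }
        }
        for (std::size_t r = 0; r < rows; ++r) {
            y[i0 + r] = acc[r];
        }
    }
    return y;
}
```

In a real vectorized kernel the inner loop over `r` would presumably be a single vector or MMA operation with `xk` splatted across lanes; the point of the sketch is only that the single-column operand skips the packing step.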