llama.cpp

Author	SHA1	Message	Date
Acly	c6e8cc28c1	ggml : Depthwise 2D convolution (ggml/1152) * ggml-cpu : kernels for faster depthwise 2D convolution * fix compile: remove static after moving to ops.cpp * add dilation for depthwise_conv_2d * review: rename to ggml_conv_2d_dw_direct, remove redundant struct keywords, pass by ref, whitespace * review: rename depthwise_conv_2d -> conv_2d_dw everywhere	2025-04-24 17:32:47 +03:00
Diego Devesa	fe92821ea9	ggml : add bilinear upscale support (ggml/1185)	2025-04-11 00:17:47 +03:00
Diego Devesa	459895c326	ggml : add more generic custom op, remove deprecated custom ops (ggml/1183) * ggml : add more generic ggml_custom op * ggml : remove deprecated custom ops	2025-04-11 00:17:47 +03:00
Diego Devesa	e0e912f49b	llama : add option to override model tensor buffers (#11397 ) * llama : add option to override tensor buffers * ggml : fix possible underflow in ggml_nbytes	2025-04-02 14:52:01 +02:00
Georgi Gerganov	b4ae50810e	metal : improve FA + improve MoE (#12612 ) * ggml : FA with different K, V head sizes (CPU) ggml-ci * metal : add FA with HS=192 * metal : extend FA to support different K and V head sizes ggml-ci * metal : add FA vector kernels for heads K 192 and V 128 ggml-ci * ggml : restrict op on other backends to equal head sizes ggml-ci * metal : optimize FA-vec kernel ggml-ci * metal : FA remove mq registers * metal : improve MoE mul_mat_id condition ggml-ci * metal : fix comments + remove unnecessary addition ggml-ci * metal : avoid too much shared memory usage with mul_mat_id ggml-ci	2025-03-28 20:21:59 +02:00
Molly Sophia	7dfad387e3	llama: Add support for RWKV v7 architecture (#12412 ) * ggml: Add op l2_norm Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * ggml: Add op rwkv_wkv7 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: Add support for RWKV7 and ARWKV7 models Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: fix inference with RWKV6Qwen2 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: add more (a)rwkv7 variants in size Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Apply code-format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * fix MUSA build Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * llama: fix shape error with rwkv using llama-parallel Signed-off-by: Molly Sophia <mollysophia379@gmail.com> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com>	2025-03-18 07:27:50 +08:00
vmobilis	d6ae2fa061	ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118) * ggml_compute_forward_concat() for arbitrary tensor type * Check that tensors' type match * ggml-cpu.c: check type of source tensors * ggml-cpu.c: move tensor type check to ggml_compute_forward_concat() * ggml.c: check concatenated tensor type * Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c ..., as it was moved to ggml.c.	2025-03-07 14:49:44 +02:00
mgroeber9110	5bbe6a9fe9	ggml : portability fixes for VS 2017 (#12150 ) * Add include files for std::min/max and std::toupper/tolower * win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined * Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode * win32: only use __restrict in MSVC if C11/C17 support is not enabled --------- Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>	2025-03-04 18:53:26 +02:00
Aaron Teo	af7747c95a	ggml-cpu: Support s390x SIMD Instruction Set (#12019 ) * ggml: add s390x ARCH_FLAGS for compilation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add SIMD for s390x using vector intrinsics SIMD is activated for: * ggml_vec_dot_f32 * ggml_vec_dot_f16 * ggml_vec_mad_f32 * ggml_vec_mad_f16 * ggml_vec_mad_f32_unroll * ggml_vec_scale_f32 * ggml_vec_scale_f16 SIMD is NOT activated for: * ggml_vec_dot_f16_unroll (pending bugfix) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix missing escape character in GGML_F32x4_REDUCE Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add temporary patch for GGML_F32_ARR and GGML_F16_ARR Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix s390x GGML_F32x4_REDUCE Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: full SIMD activation for F32,F16 s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add option to disable s390x VXE/VXE2 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: change vecintrin.h include to ggml-cpu-impl * add __VXE__ and __VXE2__ macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * cmake: add s390x target detection for VX/VXE/VXE2 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: move s390x vector intrinsics to ggml-cpu-impl.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x Q8_0 SIMD Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: correct documentation for Q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x reduce code complexity Q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x bugfix typo Q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activated for Q4_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x inline vec_reve Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for Q4_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add VXE backend feature Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: remove test.py Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for quantize_row_q8_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for quantize_row_q8_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for iq4_xs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: bugfix iq4_xs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for iq4_nl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add float, double, and long vector data type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: clean up iq4_xs SIMD Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix improper use of restrict keyword Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: update warning message for ggml_vec_tbl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: untested implementation of ggml_vec_dot_iq2_xxs_q8_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: update ggml_vec_dot_q4_1_q8_1 to use typedefs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: switch to restrict for iq4_nl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: slight dot product speed improvement for q4_1_q8_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for q6_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add missing `_t` to ggml_int8x16x4_t Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix missing `_t` for ggml_vec_xl_s8x4 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix more missing `_t` Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add unroll and prefetch to Q8_0 increase of 3.86% for prompt processing and 32.22% for token generation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: patch Q8_0 to use proper vector sizes Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: optimise Q8_0 dot prod compute kernel further Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: add unroll and prefetch to Q4_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: refactor Q6_K variable naming for readability Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q6_K typos Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for Q5_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix wrong charx16_t naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml: Q5_K y0 wrong signness Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q5_K invalid uchar type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q5_K invalid uchar type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: s390x SIMD activation for Q4_K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: fix Q4_K invalid vector intrinsics Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: simplify ggml_padd_s16 compute kernel Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: correct ggml-cpu vxe wording Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: change ggml_aligned_malloc alignment to 256 256 is the cache line size for s390x platforms Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: resolve pr merge via cherry-pick `225bbbf` Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml : fix LoongArch compile error with 128-bit SIMD (#11701) * ggml: resolve pr merge via cherry-pick 4571953 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: cmake remove fork when determining s390x machine type thank you @ericcurtin Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Jinyang He <hejinyang@loongson.cn> Co-authored-by: junchao-zhao <68935141+junchao-loongson@users.noreply.github.com>	2025-02-22 21:39:24 +00:00
Maxim Evtush	7b891bdc86	fix: typos in documentation files (#11791 ) * Update ggml.c * Update arg.cpp * Update speculative.h	2025-02-10 23:21:31 +01:00
William Tambellini	1a0e87d291	ggml : add option to not print stack on abort (ggml/1081) * Add option to not print stack on abort Add option/envvar to disable stack printing on abort. Also link some unittests with Threads to fix link errors on ubuntu/g++11. * Update ggml/src/ggml.c --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-29 11:24:53 +02:00
Johannes Gäßler	8137b4bb2b	CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380 )	2025-01-24 12:38:31 +01:00
Johannes Gäßler	9c8dcefe17	CUDA: backwards pass for misc. ops, add tests (#11257 ) * CUDA: backwards pass for misc. ops, add tests * remove restrict from pointers	2025-01-16 16:43:38 +01:00
Johannes Gäßler	432df2d5f9	RoPE: fix back, CUDA support for back + noncont. (#11240 ) * RoPE: fix back, CUDA support for back + noncont. * fix comments reg. non-cont. RoPE support [no-ci]	2025-01-15 12:51:37 +01:00
Molly Sophia	ee7136c6d1	llama: add support for QRWKV6 model architecture (#11001 ) llama: add support for QRWKV6 model architecture (#11001) * WIP: Add support for RWKV6Qwen2 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV: Some graph simplification Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Add support for RWKV6Qwen2 with cpu and cuda GLA Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix some typos Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * code format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix wkv test & add gla test Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix cuda warning Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update README.md Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update ggml/src/ggml-cuda/gla.cu Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix fused lerp weights loading with RWKV6 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * better sanity check skipping for QRWKV6 in llama-quant thanks @compilade Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: compilade <git@compilade.net> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: compilade <git@compilade.net>	2025-01-10 09:58:08 +08:00
Johannes Gäßler	53ff6b9b9f	GGUF: C++ refactor, backend support, misc fixes (#11030 ) * GGUF: C++ refactor, backend support, misc fixes remove ggml_tensor.backend update CODEOWNERS [no ci] remove gguf_get_data from API revise GGUF API data types	2025-01-07 18:01:58 +01:00
Georgi Gerganov	0bf2d10c55	tts : add OuteTTS support (#10784 ) * server : add "tokens" output ggml-ci * server : output embeddings for all tokens when pooling = none ggml-ci * server : be explicit about the pooling type in the tests ggml-ci * server : do not normalize embeddings when there is no pooling ggml-ci * llama : add OuteTTS support (wip) * wip * extract features * first conv * group norm * resnet conv * resnet * attn * pos net * layer norm * convnext * head * hann window * fix n_embd + remove llama.cpp hacks * compute hann window * fft * spectrum processing * clean-up * tts : receive input text and generate codes * clip : fix new conv name * tts : minor fix * tts : add header + minor fixes ggml-ci * tts : add matchematical constant ggml-ci * tts : fix sampling + cut initial noise * tts : fixes * tts : update default samplers ggml-ci * tts : text pre-processing * tts : outetts-voc -> wavtokenizer-dec * tts : remove hardcoded constants ggml-ci * tts : fix tensor shapes * llama : refactor wavtokenizer tensors ggml-ci * cont ggml-ci * cont [no ci] * llama : update WavTokenizer to non-causal attn * llama : handle no-vocab detokenization * tts : add Python example for OuteTTS (wip) * tts : extend python example to generate spectrogram ggml-ci * server : fix rebase artifacts * tts : enable "return_tokens" in Python example ggml-ci * tts : minor fixes * common : support HF download for vocoder	2024-12-18 19:27:21 +02:00
Johannes Gäßler	081b29bd2a	tests: add tests for GGUF (#10830 )	2024-12-17 19:09:35 +01:00
Daniel Bevenius	3919da8e33	ggml : add check for grad_accs (ggml/1046) * ggml : add check for grad_accs This commit adds a check for grad_accs in ggml_graph_get_grad and ggml_graph_get_grad_acc functions. This is necessary to avoid segfaults when grad_accs is not initialized. The motivation for this change is that I find it nice to be able to print out a computation graph using ggml_graph_print but this function segfaults when grad_accs is not initialized: ```console (gdb) p g1 $2 = (ggml_cgraph ) 0x7ffff66004b0 (gdb) p g1 $3 = {size = 2048, n_nodes = 1, n_leafs = 2, nodes = 0x7ffff6600500, grads = 0x0, grad_accs = 0x0, leafs = 0x7ffff6604500, visited_hash_set = {size = 4099, used = 0x7ffff6610518, keys = 0x7ffff6608500}, order = GGML_CGRAPH_EVAL_ORDER_LEFT_TO_RIGHT} (gdb) p ggml_graph_print(g1) === GRAPH === n_nodes = 1 Program received signal SIGSEGV, Segmentation fault. 0x0000555555579775 in ggml_graph_get_grad (cgraph=0x7ffff66004b0,node=0x7ffff6600340) at /ggml/ggml/src/ggml.c:5990 5990 return igrad != GGML_HASHSET_FULL && ggml_bitset_get(cgraph->visited_hash_set.used, igrad) ? cgraph->grads[igrad] : NULL; ``` * squash! ggml : add check for grad_accs Fix the check in ggml_graph_get_grad. The check was incorrectly using cgraph->grad_accs instead of cgraph->grads.	2024-12-17 18:35:48 +02:00
HimariO	ba1cb19cdd	llama : add Qwen2VL support + multimodal RoPE (#10361 ) * Barebone Qwen2VL LLM convertor * Add Qwen2VL cli entrypoint * [WIP] add qwen2vl arch * Verify m-rope output * Add vl-rope/2d-rope support for qwen2vl ViT * update qwen2vl cli tool * update 5D tensor op workaround * [WIP] qwen2vl vision model * make batch and clip utils compatible with qwen2vl * [WIP] create inference workflow, gguf convert script but fix * correcting vision-rope behavior, add the missing last layer back to ViT * add arg parser to qwen2vl_surgery * replace variable size array with vector * cuda-gdb cmake preset * add fp32 mrope, vision rope kernel * add fp16 support for qwen2vl and m-rope * add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION` * fix rope op mode switching, out dated func args * update `llama_hparams` * update to keep up stream changes * resolve linter, test errors * add makefile entry, update speical image padding token * add mrope unit test, fix few compiler warnings * rename `mrope` related function, params * minor updates on debug util, bug fixs * add `m-rope` testcase to `test-backend-ops` * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix traililng whitespce * store `llama_hparams.rope_sections` with fixed size array * update position id tensor size check in GGML_OP_ROPE * minor updates * update `ggml_backend__supports_op` of unsupported backends remote old `rope_section` compare operator --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-14 14:43:46 +02:00
Djip007	19d8762ab6	ggml : refactor online repacking (#10446 ) * rename ggml-cpu-aarch64.c to .cpp * reformat extra cpu backend. - clean Q4_0_N_M and IQ4_0_N_M - remove from "file" tensor type - allow only with dynamic repack - extract cpu extra bufts and convert to C++ - hbm - "aarch64" - more generic use of extra buffer - generalise extra_supports_op - new API for "cpu-accel": - amx - aarch64 * clang-format * Clean Q4_0_N_M ref Enable restrict on C++ * add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack * added/corrected control on tensor size for Q4 repacking. * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add debug logs on repacks. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-07 14:37:50 +02:00
PAB	c2082d93a8	ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034) * ggml_pad_reflect_1d defined in header * implemented on CPU * called the forward pass * impl Metal kernel * added Metal kernel * added OP_PAD_REFLECT_1D in test-backend-ops.cpp * add test-pad-reflect-1d test case * test case support multiple backend	2024-12-05 13:27:31 +02:00
Shupei Fan	c202cef168	ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541 ) * ggml-cpu: support IQ4_NL_4_4 by runtime repack * ggml-cpu: add __ARM_FEATURE_DOTPROD guard	2024-11-28 13:52:03 +01:00
Diego Devesa	5931c1f233	ggml : add support for dynamic loading of backends (#10469 ) * ggml : add support for dynamic loading of backends --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-25 15:13:39 +01:00
Diego Devesa	a5e47592b6	cuda : optimize argmax (#10441 ) * cuda : optimize argmax * remove unused parameter ggml-ci * fixup : use full warps ggml-ci * Apply suggestions from code review Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * fix ub * ggml : check ne00 <= INT32_MAX in argmax and argsort --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-11-21 18:18:50 +01:00
Johannes Gäßler	02e4eaf22f	ggml-opt: fix data corruption (ggml/1022)	2024-11-21 09:22:02 +02:00
Georgi Gerganov	68fcb4759c	ggml : fix compile warnings (#0 ) ggml-ci	2024-11-17 08:30:29 +02:00
Johannes Gäßler	8a43e940ab	ggml: new optimization interface (ggml/988)	2024-11-17 08:30:29 +02:00
slaren	883d206fbd	ggml : fix some build issues	2024-11-15 21:45:32 +02:00
Diego Devesa	ae8de6d50a	ggml : build backends as libraries (#10256 ) * ggml : build backends as libraries --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>	2024-11-14 18:04:35 +01:00
Georgi Gerganov	841f27abdb	metal : optimize FA kernels (#10171 ) * ggml : add ggml_flash_attn_ext_get_prec * metal : use F16 precision in FA kernels ggml-ci * metal : minor clean-up * metal : compile-guard bf16 FA kernels ggml-ci * build : remove obsolete compile flag [no ci] * metal : prevent int overflows [no ci] * cuda : disable BF16 FA ggml-ci * metal : fix BF16 requirement for FA kernels ggml-ci * make : clean-up [no ci]	2024-11-08 13:47:22 +02:00
Zhiyuan Li	3bcd40b3c5	Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133 ) * rwkv6: rename to wkv6 * rwkv6: support avx2 avx512 armv8 armv9 * rwkv6: update cuda file name * rwkv6: rename params * wkv on sycl * sycl: add some ops * sycl: Enhance OP support judgment * wkv6: drop armv9 and tranfer to GGML style ggml-ci * sync : ggml * update the function to use appropriate types * fix define error * Update ggml/src/ggml-cpu.c * add appropriate asserts * move element-wise functions outside * put the declaration outside the loop * rewrite to be more inline with the common pattern for distributing threads * use recommended way GGML_TENSOR_LOCALS --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Meng, Hengyu <airdldl@163.com>	2024-11-07 15:19:10 +08:00
Georgi Gerganov	1dc04b2dee	ggml : adjust is_first_call init value (#10193 ) ggml-ci	2024-11-06 11:20:10 +02:00
Diego Devesa	a9e8a9a030	ggml : fix arch check in bf16_to_fp32 (#10164 )	2024-11-04 23:17:01 +01:00
Diego Devesa	401558b7ba	ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167 )	2024-11-04 17:34:08 +01:00
Diego Devesa	9f40989351	ggml : move CPU backend to a separate file (#10144 )	2024-11-03 19:34:08 +01:00
Georgi Gerganov	1804adb0cf	ggml : remove ggml_scratch (#10121 ) ggml-ci	2024-11-01 12:58:45 +02:00
Georgi Gerganov	f221d56220	ggml : alloc ggml_contexts on the heap (whisper/2525)	2024-11-01 10:24:50 +02:00
Diego Devesa	c02e5ab2a6	llama : fix buffer checks for mamba and rwk (#10111 ) * llama : fix buffer checks for mamba and rwk * llama : fix missing worst case flag during reserve * cuda : fix supports_op for norm * disable sched SET_CAUSE	2024-10-31 22:54:23 +01:00
Diego Devesa	dea5e86051	ggml : check tensor name lengths in gguf files (#10100 )	2024-10-31 11:40:59 +01:00
Diego Devesa	b9e02e8184	ggml : fix memory leaks when loading invalid gguf files (#10094 ) * ggml : fix gguf string leak when reading kv pairs fails * ggml : avoid crashing with GGML_ABORT when the KV has an invalid type * ggml : avoid crashing on failed memory allocations when loading a gguf file	2024-10-30 14:51:21 +01:00
Diego Devesa	c5b0f4b5d9	llama : refactor model loader with backend registry (#10026 )	2024-10-30 02:01:23 +01:00
Johannes Gäßler	c39665f589	CUDA: fix MMQ for non-contiguous src0, add tests (#10021 ) * CUDA: fix MMQ for non-contiguous src0, add tests * revise test code	2024-10-24 11:09:36 +02:00
Daniel Bevenius	c19af0acb1	ggml : remove redundant set of contexts used field (ggml/978) This commit removes the setting of the `used` field of the contexts in the global state (g_state) in `ggml_init`. The motivation for this change is that I believe that this additional initialization might not be required after the changes in Commit 45fc4fed0b9fb5b1af4a8525cbebb95e11208732 ("sync : latest changes from whisper.cpp"), which changed the initialization of the contexts field from `{ 0 }` to `{ { 0 } }`: ```console g_state = (struct ggml_state) { - /.contexts =/ { 0 }, + /.contexts =/ { { 0 } }, }; ``` My understanding is that the `{0}` initialization might not have zero-initialized all the nested fields in every array element because of compiler differences, and might have been the reason for having the explicit setting of the `used` fields to false.	2024-10-23 16:50:02 +03:00
Georgi Gerganov	f594bc80ba	ggml : add asserts for type conversion in fattn kernels (#9971 ) ggml-ci	2024-10-21 16:20:46 +03:00
Ma Mingfei	60ce97c9d8	add amx kernel for gemm (#8998 ) add intel amx isa detection add vnni kernel for gemv cases add vnni and amx kernel support for block_q8_0 code cleanup fix packing B issue enable openmp fine tune amx kernel switch to aten parallel pattern add error message for nested parallelism code cleanup add f16 support in ggml-amx add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS update CMakeList update README fix some compilation warning fix compiler warning when amx is not enabled minor change ggml-ci move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp ggml-ci update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16 ggml-ci add amx as an ggml-backend update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h minor change update CMakeLists.txt minor change apply weight prepacking in set_tensor method in ggml-backend fix compile error ggml-ci minor change ggml-ci update CMakeLists.txt ggml-ci add march dependency minor change ggml-ci change ggml_backend_buffer_is_host to return false for amx backend ggml-ci fix supports_op use device reg for AMX backend ggml-ci minor change ggml-ci minor change fix rebase set .buffer_from_host_ptr to be false for AMX backend	2024-10-18 13:34:36 +08:00
Gilad S.	73afe681aa	fix: use `vm_allocate` to allocate CPU backend buffer on macOS (#9875 ) * fix: use `vm_allocate` to allocate CPU backend buffer on macOS * fix: switch to `posix_memalign` to keep existing `free()` usages work * feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS * style: formatting * fix: move const outside of `#ifndef` * style: formatting * fix: unused var * fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h` * fix: unused var * fix: page align to `GGUF_DEFAULT_ALIGNMENT` * fix: page align to `TENSOR_ALIGNMENT` * fix: convert `TENSOR_ALIGNMENT` to a macro * fix: increase page size to `32` on iOS * fix: iOS page size * fix: `hbw_posix_memalign` alignment	2024-10-17 00:36:51 +02:00
Diego Devesa	dca1d4b58a	ggml : fix BLAS with unsupported types (#9775 ) * ggml : do not use BLAS with types without to_float * ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies * ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits it's not really internal if everybody uses it	2024-10-08 14:21:43 +02:00
Johannes Gäßler	fabdc3bda3	ggml/ex: calculate accuracy in graph, adapt MNIST (ggml/980)	2024-10-03 21:17:26 +03:00
Johannes Gäßler	eee39bdc96	ggml: refactor cross entropy loss CPU impl. (ggml/976)	2024-10-03 21:17:26 +03:00

1 2 3

106 commits