llama.cpp

Author	SHA1	Message	Date
hipudding	7a395f67a7	CANN: Add support for async operator submission (#12864) Submit operators using asynchronous threads to improve performance. Use the environment variable GGML_CANN_ASYNC_MODE to control whether asynchronous submission is enabled. It is disabled by default. Testing shows a 10%–20% performance improvement in scenarios with small parameter sizes, especially in quantized models.	2025-04-17 20:34:16 +08:00
Shanshan Shen	9a4b79bcfa	CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454) * improve inferencing performance for ascend npu. Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com> * some modification after review * some modifications after review * restore some modifications * restore some modifications --------- Co-authored-by: shanshan shen <shanshanshen333@gmail.com> Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com>	2024-11-26 18:08:37 +08:00
Dou Xinpeng	904837e0cb	cann: fix crash when llama-bench is running on multiple cann devices (#9627)	2024-09-25 11:30:38 +08:00
hipudding	1bdd8ae19f	[CANN] Add Ascend NPU backend (#6035) * [CANN] Add Ascend NPU backend Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. CANN (Compute Architecture of Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI. Co-authored-by: wangshuai09 <391746016@qq.com> * delete trailing whitespaces * Modify the code based on review comment * Rename LLAMA_CANN to GGML_CANN * Make ggml-common.h private * add ggml_cann prefix for acl funcs * Add logging for CANN backend * Delete Trailing whitespace --------- Co-authored-by: wangshuai09 <391746016@qq.com>	2024-07-17 14:23:50 +03:00

4 commits