Chenguang Li
65cfe136a0
CANN: Support operator SIN COS ARGMAX ( #12709 )
...
* [CANN]support sin cos argmax
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN]codestyle adjustment
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN]Remove redundant code
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
---------
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2025-04-03 15:18:08 +08:00
hipudding
2a0dc97e56
CANN: Fix failed test cases ( #12708 )
...
* CANN: Fix memory waste in aclnn_tensor
* CANN: fix backend ops fail
* CANN: fix acl_tensor memory alloc.
* CANN: format
* CANN: remove trailing whitespace
2025-04-03 08:49:51 +08:00
Chenguang Li
9bacd6b374
[CANN] get_rows and dup optimization ( #12671 )
...
* [CANN]get_rows and dup optimization.
Co-authored-by: hipudding <huafengchun@gmail.com>
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN]GET_ROWS and CPY/DUP optimization
Co-authored-by: hipudding <huafengchun@gmail.com>
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN]code style adjustment
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN]code style adjustment
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN]code style adjustment
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
* [CANN]code style adjustment
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
---------
Signed-off-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
Co-authored-by: hipudding <huafengchun@gmail.com>
2025-04-02 15:22:13 +08:00
Chenguang Li
92a391327e
[CANN]MUL_MAT optimization ( #12382 )
2025-03-15 09:31:08 +08:00
William Tambellini
70680c48e5
ggml : upgrade init_tensor API to return a ggml_status ( #11854 )
...
* Upgrade init_tensor API to return a ggml_status
To prepare for an 'abort-free' ggml
(ggml not to abort on OOMs but return a OOM status),
as agreeed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status.
* misc fixes
---------
Co-authored-by: slaren <slarengh@gmail.com>
2025-02-28 14:41:47 +01:00
HimariO
ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE ( #10361 )
...
* Barebone Qwen2VL LLM convertor
* Add Qwen2VL cli entrypoint
* [WIP] add qwen2vl arch
* Verify m-rope output
* Add vl-rope/2d-rope support for qwen2vl ViT
* update qwen2vl cli tool
* update 5D tensor op workaround
* [WIP] qwen2vl vision model
* make batch and clip utils compatible with qwen2vl
* [WIP] create inference workflow, gguf convert script but fix
* correcting vision-rope behavior, add the missing last layer back to ViT
* add arg parser to qwen2vl_surgery
* replace variable size array with vector
* cuda-gdb cmake preset
* add fp32 mrope, vision rope kernel
* add fp16 support for qwen2vl and m-rope
* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
* fix rope op mode switching, out dated func args
* update `llama_hparams`
* update to keep up stream changes
* resolve linter, test errors
* add makefile entry, update speical image padding token
* add mrope unit test, fix few compiler warnings
* rename `mrope` related function, params
* minor updates on debug util, bug fixs
* add `m-rope` testcase to `test-backend-ops`
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix traililng whitespce
* store `llama_hparams.rope_sections` with fixed size array
* update position id tensor size check in GGML_OP_ROPE
* minor updates
* update `ggml_backend_*_supports_op` of unsupported backends
* remote old `rope_section` compare operator
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-14 14:43:46 +02:00
Djip007
19d8762ab6
ggml : refactor online repacking ( #10446 )
...
* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
- clean Q4_0_N_M and IQ4_0_N_M
- remove from "file" tensor type
- allow only with dynamic repack
- extract cpu extra bufts and convert to C++
- hbm
- "aarch64"
- more generic use of extra buffer
- generalise extra_supports_op
- new API for "cpu-accel":
- amx
- aarch64
* clang-format
* Clean Q4_0_N_M ref
Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking.
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add debug logs on repacks.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-07 14:37:50 +02:00
Chenguang Li
938f608742
CANN: RoPE operator optimization ( #10563 )
...
* [cann] RoPE operator optimization
* [CANN]Code Formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-29 14:46:55 +08:00
Chenguang Li
b7420131bf
CANN: ROPE operator optimization ( #10540 )
...
* [cann] ROPE operator optimization
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-28 14:24:46 +08:00
Shanshan Shen
9a4b79bcfa
CANN: Improve the Inferencing Performance for Ascend NPU Device ( #10454 )
...
* improve inferencing performance for ascend npu.
Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com>
* some modification after review
* some modifications after review
* restore some modifications
* restore some modifications
---------
Co-authored-by: shanshan shen <shanshanshen333@gmail.com>
Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com>
2024-11-26 18:08:37 +08:00
Chenguang Li
7066b4cce2
CANN: RoPE and CANCAT operator optimization ( #10488 )
...
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-26 17:31:05 +08:00
Diego Devesa
5931c1f233
ggml : add support for dynamic loading of backends ( #10469 )
...
* ggml : add support for dynamic loading of backends
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-25 15:13:39 +01:00
Diego Devesa
ae8de6d50a
ggml : build backends as libraries ( #10256 )
...
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00