Commit graph

  • 13c9a3319b
    arg : remove CURLINFO_EFFECTIVE_METHOD (#13228) Xuan-Son Nguyen 2025-05-01 10:23:25 +02:00
  • a70183eb00
    llama-model : fix the reported size class for nomic-embed-text-v2-moe (#13223) Jared Van Bortel 2025-05-01 03:09:41 -04:00
  • 8d33d740c3
    sync : ggml Georgi Gerganov 2025-05-01 09:59:02 +03:00
  • 4254bb4951
    ggml : fix ggml_gallocr_ptr type (ggml/1205) Diego Devesa 2025-04-30 15:20:40 +02:00
  • 9998540149
    cuda : fix unused variable compile warning (whisper/0) Georgi Gerganov 2025-04-24 18:59:06 +03:00
  • e1e8e0991f
    CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199) Johannes Gäßler 2025-04-30 23:12:59 +02:00
  • 6f67cf1f48
    arg : -hf do not fail if url mismatch (#13219) Xuan-Son Nguyen 2025-04-30 22:29:15 +02:00
  • 16a457facd
    fix typo: n_ctx_pre_seq -> n_ctx_per_seq (#13221) ddh0 2025-04-30 15:28:43 -05:00
  • 3e168bede4
    convert : improve model arch handling (#13122) Xuan-Son Nguyen 2025-04-30 16:56:24 +02:00
  • ceda28ef8e
    llava : remove duplicate include (#13207) Tatsuya Tanaka 2025-04-30 22:25:20 +09:00
  • 3b127c7385
    common : add -jf / --json-schema-file flag (#12011) Olivier Chafik 2025-04-30 13:52:35 +01:00
  • e5007a5edf
    vulkan: use uint array index to avoid glslang bug (#13193) Jeff Bolz 2025-04-30 07:38:37 -05:00
  • 416313773b
    ggml : fix ppc64le build (#13176) shalinib-ibm 2025-04-30 16:47:08 +05:30
  • 07c2e2f76c
    convert : correct typo image_mean --> image_std (#13208) Xuan-Son Nguyen 2025-04-30 13:06:15 +02:00
  • 44cd8d91ff
    feat(ggml-cpu): enable z17 compile (#13182) Aaron Teo 2025-04-30 17:47:35 +08:00
  • 5933e6fdc9
    arg : allow using -hf offline (#13202) Xuan-Son Nguyen 2025-04-30 10:46:32 +02:00
  • da84c04d8f
    docker : do not build tests (#13204) Xuan-Son Nguyen 2025-04-30 10:44:07 +02:00
  • a0f7016d17
    rpc : fix cache directory initialization (#13188) xiaofei 2025-04-30 14:29:22 +08:00
  • 19e899ce21
    scripts: n_depth for compare-llama-bench [no ci] (#13201) Johannes Gäßler 2025-04-29 23:32:04 +02:00
  • e2e1ddb93a
    server : Prefilling assistant message in openai compatible API (#13174) matteo 2025-04-29 20:33:10 +02:00
  • d9d398f84f
    sampling : when top-k <= 0 -> noop (#13173) Georgi Gerganov 2025-04-29 20:22:57 +03:00
  • 5a63980117
    llama-bench: fixed size of fields to correctly map to values (#13183) Alberto Cabrera Pérez 2025-04-29 16:24:36 +01:00
  • cdf76586b2
    CUDA: fix non-cont. inputs for batched mat mul (#13155) Johannes Gäßler 2025-04-29 16:00:27 +02:00
  • 7d3af70b08
    llama : llm_type order by size (#13177) Sigbjørn Skjæret 2025-04-29 13:25:53 +02:00
  • 00e3e5a194
    mtmd : add qwen2vl and qwen2.5vl (#13141) Xuan-Son Nguyen 2025-04-29 11:47:04 +02:00
  • e98b3692be
    llama : set qwen3 model type sizes (#13175) Sigbjørn Skjæret 2025-04-29 11:00:31 +02:00
  • b6ce7430b7
    llama-graph : fix text position for mrope (#13159) Xuan-Son Nguyen 2025-04-29 08:45:49 +02:00
  • 5f5e39e1ba
    model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architecture (#12466) AT 2025-04-28 15:52:15 -04:00
  • eaea325324
    clip : fix model size display (#13153) Xuan-Son Nguyen 2025-04-28 21:23:19 +02:00
  • 43ddab6eee
    fix(rpc): Improve input validation and error handling (#13069) Ville Vesilehto 2025-04-28 21:00:20 +03:00
  • 1831f538f7
    llama-bench: add -d depth arg (#13096) Vishal Agarwal 2025-04-28 20:20:39 +05:30
  • 4e87962e34
    mtmd : fix glm-edge redundant token count (#13139) Xuan-Son Nguyen 2025-04-28 16:12:56 +02:00
  • fb0471d175
    context : do not clear output buffer on reserve (#13152) pockers21 2025-04-28 06:45:40 -07:00
  • d2b2031e5f
    llama : (mrope) allow using normal 1D position for text token (#13138) Xuan-Son Nguyen 2025-04-28 14:20:56 +02:00
  • 5fa9e63be8
    clip : refactor set input for cgraph + fix qwen2.5vl input (#13136) Xuan-Son Nguyen 2025-04-28 12:18:59 +02:00
  • a4c340f974
    SYCL: Add all missing unary kernels (#13074) Akarshan Biswas 2025-04-28 15:03:25 +05:30
  • d0a417f3c7
    readme : update hot topics (#13150) Georgi Gerganov 2025-04-28 12:10:18 +03:00
  • 43f2b07193
    common : fix noreturn compile warning (#13151) Georgi Gerganov 2025-04-28 11:57:19 +03:00
  • e5d6c2554e
    llama-chat : fix typo GML --> GLM (#13143) Xuan-Son Nguyen 2025-04-28 10:11:58 +02:00
  • f0dd6a1926
    musa: fix typo in cc control (#13144) R0CKSTAR 2025-04-28 15:33:28 +08:00
  • 69699be48a
    CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137) Johannes Gäßler 2025-04-28 09:29:26 +02:00
  • 85f36e5e71
    arg : fix unused variable (#13142) Xuan-Son Nguyen 2025-04-28 07:16:59 +02:00
  • c0a97b762e
    llama-bench : Add --override-tensors arg (#12922) 4onen 2025-04-27 14:48:26 -07:00
  • ced44be342
    llama-chat : fix wrong template in GLM4-0414 (#13140) matteo 2025-04-27 21:57:32 +02:00
  • e291450b76
    musa: fix build warning (#13129) R0CKSTAR 2025-04-27 19:22:49 +08:00
  • 59e991c23c
    Fixes Qwen2.5VL segfault during inference with https://github.com/ggml-org/llama.cpp/pull/12402 as has_qwen2vl_merger migration was incomplete (#13133) LostRuins Concedo 2025-04-27 18:43:37 +08:00
  • ca2bb89eac
    clip : Add Qwen2.5VL support (#12402) HimariO 2025-04-27 16:10:34 +08:00
  • 2d451c8059
    common : add common_remote_get_content (#13123) Xuan-Son Nguyen 2025-04-26 22:58:12 +02:00
  • 4753791e70
    clip : improve projector naming (#13118) Xuan-Son Nguyen 2025-04-26 22:39:47 +02:00
  • 77d5e9a76a
    ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107) SXX 2025-04-26 22:05:31 +08:00
  • d5fe4e81bd
    grammar : handle maxItems == 0 in JSON schema (#13117) frob 2025-04-26 10:10:20 +02:00
  • 295354ea68
    llama : fix K-shift with quantized K and BLAS backend (#13113) Diego Devesa 2025-04-25 19:40:11 +02:00
  • 558a764713
    Force FP32 compute in GLM4 FFN Down (#13101) City 2025-04-25 14:38:34 +02:00
  • edb18b6e8f
    clip : fix pixtral on some GPU backends (#13097) Xuan-Son Nguyen 2025-04-25 14:31:42 +02:00
  • 514c45608f
    change the reorder tensor from init to execute OP (#13003) Neo Zhang Jianyu 2025-04-25 17:37:51 +08:00
  • 553a5c3a9f
    rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943) Radoslav Gerganov 2025-04-25 10:08:08 +03:00
  • 13be08daf9
    clip : remove boi/eoi embeddings for GLM-edge model (#13081) Xuan-Son Nguyen 2025-04-24 22:17:04 +02:00
  • 226251ed56
    embeddings : fix batch sizes (#13076) Georgi Gerganov 2025-04-24 22:29:22 +03:00
  • 87616f0680
    ggml : fix trailing whitespaces (#0) Georgi Gerganov 2025-04-24 17:22:27 +03:00
  • 63b4911494
    sync : ggml Georgi Gerganov 2025-04-24 16:47:43 +03:00
  • c6e8cc28c1
    ggml : Depthwise 2D convolution (ggml/1152) Acly 2025-04-17 14:16:45 +02:00
  • b10d8bfdb1
    CUDA: use switch statements in constexpr functions (#13095) Johannes Gäßler 2025-04-24 15:57:10 +02:00
  • 13b4548877
    cmake : do not include ./src as public for libllama (#13062) Georgi Gerganov 2025-04-24 16:00:10 +03:00
  • 572b3141d3
    clang-tidy : disable warning about missing math parenthesis (#13091) Georgi Gerganov 2025-04-24 15:44:05 +03:00
  • 7c727fbe39
    arg : add --no-mmproj-offload (#13093) Xuan-Son Nguyen 2025-04-24 14:04:14 +02:00
  • 80982e815e
    arg : clean up handling --mmproj with -hf (#13082) Xuan-Son Nguyen 2025-04-24 12:14:13 +02:00
  • 7604a7d6b8
    metal : fix floating-point range of attention scores in FA kernels (#13090) Georgi Gerganov 2025-04-24 10:38:30 +03:00
  • b3b6d862cf
    vulkan: matmul gcn tuning (#13016) Eve 2025-04-24 07:18:33 +00:00
  • 5630406959
    llama-mtmd-cli: Sigint rework in mtmd vision example (#13080) pl752 2025-04-24 02:32:35 +05:00
  • ecda2ec4b3
    mtmd : Support Pixtral 12B (#13065) Xuan-Son Nguyen 2025-04-23 20:21:59 +02:00
  • eb1776b15a
    convert : Append mult-eos,half-rope,bos to GLM4-0414 and Z (#13021) piDack 2025-04-23 22:59:14 +08:00
  • 2cca6c01e4
    rpc : add command line option for number of threads for the CPU backend (#13060) Radoslav Gerganov 2025-04-23 10:32:49 +03:00
  • 658987cfc9
    CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (#13014) Johannes Gäßler 2025-04-22 21:27:40 +02:00
  • dc39a5e7a8
    mtmd : support SmolVLM (version 1 and 2) (#13050) Xuan-Son Nguyen 2025-04-22 16:24:54 +02:00
  • ab47dec3d3
    security : add note about RPC and server functionality (#13061) Georgi Gerganov 2025-04-22 16:16:10 +03:00
  • 7b53389c24
    metal : add memory pool for temp allocs (#12850) Georgi Gerganov 2025-04-22 16:15:51 +03:00
  • 243453533e
    llava : update documentations (#13055) Xuan-Son Nguyen 2025-04-22 10:37:00 +02:00
  • 1d735c0b4f
    ggml : add SSE 4.2 and x64 base variant for CPUs without AVX (#12871) Diego Devesa 2025-04-21 18:13:51 +02:00
  • 5368ddda7a
    SYCL: Add non-contiguous support in ROPE (#12993) Akarshan Biswas 2025-04-21 19:13:30 +05:30
  • 84a9bf2fc2
    mtmd : merge llava, gemma3 and minicpmv CLI into single llama-mtmd-cli (#13012) Xuan-Son Nguyen 2025-04-21 15:32:58 +02:00
  • 2016f07bd1
    convert : experimental support for --mmproj flag (#13023) Xuan-Son Nguyen 2025-04-20 23:29:36 +02:00
  • 6602304814
    llava: fix errors in clip.h on certain compilers (#13030) Jeffrey Morgan 2025-04-20 03:15:41 -07:00
  • 66168204be
    vulkan: support noncontiguous rms_norm (#13031) Jeff Bolz 2025-04-20 03:50:02 -05:00
  • 4ba9d711ba
    metal: add neg operator (#13029) Jeffrey Morgan 2025-04-19 22:28:40 -07:00
  • 00137157fc
    Disable CI cross-compile builds (#13022) bandoti 2025-04-19 13:05:03 -03:00
  • fb28f4f80e
    gguf-py : fix upload python package workflow (#13020) Sigbjørn Skjæret 2025-04-19 16:26:38 +02:00
  • 37b9f0d29d
    clip : refactor, add image_manipulation and llava_uhd classes (#13011) Xuan-Son Nguyen 2025-04-19 09:15:45 +02:00
  • 6408210082
    main : Fix Ctrl+D/newline handling (#12951) Daniel Tang 2025-04-18 16:02:55 -04:00
  • aff9d107b0
    gguf-py : GGUF Editor GUI - Python + Qt6 (#12930) Chris Thompson 2025-04-18 12:30:41 -06:00
  • 35370ba945
    server : use std::move whenever possible (#12936) Xuan-Son Nguyen 2025-04-18 19:58:12 +02:00
  • 8d66005763
    SYCL: Refactor and enable FP16 in binary broadcast OPs (#12975) Akarshan Biswas 2025-04-18 19:27:56 +05:30
  • b9154ecff9
    mtmd : add methods to access mtmd_image_tokens (#12906) Xuan-Son Nguyen 2025-04-18 10:04:51 +02:00
  • 2db9ba1464
    rpc : add RPC_CMD_HELLO (#12955) Radoslav Gerganov 2025-04-18 10:13:42 +03:00
  • 2f74c354c0
    graph : make FA compatible with MLA + add initial Metal kernels (#12953) Georgi Gerganov 2025-04-17 18:16:36 +03:00
  • 207c22ec2d
    ggml: Re-enable CUDA graphs in presence of CONT and DUP nodes (#12970) Alan Gray 2025-04-17 14:19:42 +01:00
  • 7a395f67a7
    CANN: Add support for async operator submission (#12864) hipudding 2025-04-17 20:34:16 +08:00
  • 971f245b3b
    llama : recognize IBM Granite 3.3 FIM tokens (#12988) Mikko Juola 2025-04-17 01:37:05 -07:00
  • 12b17501e6
    opencl: fix incorrect local_size index in profiling log (#12868) kimminsu 2025-04-17 06:25:57 +09:00
  • 015022bb53
    vulkan: enable coopmat2 FA gqa and split_k optimizations more often (#12931) Jeff Bolz 2025-04-16 13:37:25 -05:00
  • b43d89e311
    CANN: Add 310P operator support check (#12962) Chenguang Li 2025-04-16 16:21:05 +08:00