graph : simplify attn input build for unified KV cache #12381

ggerganov · 2025-03-14T07:06:24Z

llm_graph_context::build_attn_inp_kv_unified used to take two redundant arguments: whether the attention is causal and whether SWA is enabled. Attention is always causal when using a KV cache, and whether SWA is enabled can be deduced from the hparams.
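To illustrate the shape of the change, here is a minimal, self-contained sketch. It is not the actual llama.cpp code: `hparams_sketch`, `attn_inp_sketch`, and both `build_attn_inp_*` functions are hypothetical stand-ins, and the rule "SWA is enabled when `n_swa > 0`" is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the model hyperparameters (only the field used here).
struct hparams_sketch {
    uint32_t n_swa = 0; // sliding-window size; assumed: 0 means SWA is disabled
};

// Hypothetical stand-in for the attention input that the builder produces.
struct attn_inp_sketch {
    bool causal;
    bool swa;
};

// Before: the caller had to pass both flags, even though neither carried
// information that was not already available to the builder.
attn_inp_sketch build_attn_inp_old(const hparams_sketch & hparams, bool causal, bool swa) {
    (void) hparams; // unused in the old form of this sketch
    return { causal, swa };
}

// After: no flags. Causal is fixed to true (a KV cache implies causal
// attention), and SWA is derived from the hyperparameters.
attn_inp_sketch build_attn_inp_new(const hparams_sketch & hparams) {
    return { true, hparams.n_swa > 0 };
}

// Usage: the call site no longer has to keep the flags in sync with hparams.
void example_usage() {
    hparams_sketch hp;
    hp.n_swa = 4096;
    const attn_inp_sketch inp = build_attn_inp_new(hp);
    assert(inp.causal && inp.swa);
}
```

Dropping the parameters removes a class of call-site mistakes, such as passing a `swa` value that disagrees with the hparams.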

ggml-ci

graph : simplify attn input build for unified KV cache


342944c

ggml-ci

ggerganov mentioned this pull request Mar 14, 2025

Eval bug: Segmentation fault from latest git 84d547554123a62e9ac77107cb20e4f6cc503af4 #12380

Closed

ggerganov merged commit c522ce4 into master Mar 14, 2025
53 of 54 checks passed

ggerganov deleted the gg/graph-simplify-attn-inp branch March 14, 2025 08:47

jpohhhh pushed a commit to Telosnex/llama.cpp that referenced this pull request Mar 14, 2025

graph : simplify attn input build for unified KV cache (ggml-org#12381)

d1e130d

ggml-ci
