Make kv_cache read operation perform a single gather #1027

Merged: 1 commit, Mar 4, 2025

rsuderman
Contributor

Slices fuse efficiently with the attention / GQA kernels, so it is more computationally efficient to perform a single gather and slice its result than to issue multiple gathers, at least until gather fusion improves.
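As a rough illustration of the idea, the sketch below compares two ways of reading one layer's K and V blocks from a paged cache. The cache layout `[page_count, layer_count, 2 (K/V), block_seq, heads, head_dim]`, the constants, and the function names are all hypothetical and are not taken from this PR's code; NumPy stands in for the real tensor library. The multi-gather version fetches K and V with separate index lists, while the single-gather version does one indexed load per `(page, layer)` and derives K and V with plain slices, which is the pattern the description says fuses well with the attention kernels.

```python
import numpy as np

# Hypothetical cache layout (an assumption, not the project's actual layout):
# [page_count, layer_count, 2 (K/V), block_seq, heads, head_dim]
PAGE_COUNT, LAYERS, BLOCK_SEQ, HEADS, HEAD_DIM = 16, 4, 8, 2, 4


def make_cache(seed: int = 0) -> np.ndarray:
    """Build a random cache tensor with the hypothetical layout above."""
    rng = np.random.default_rng(seed)
    shape = (PAGE_COUNT, LAYERS, 2, BLOCK_SEQ, HEADS, HEAD_DIM)
    return rng.standard_normal(shape).astype(np.float32)


def read_multi_gather(cache: np.ndarray, page_ids: np.ndarray, layer: int):
    """Two gathers: K and V are each fetched with their own index list."""
    # Flatten (page, layer, kv) into one axis; row = page * LAYERS * 2 + layer * 2 + kv.
    flat = cache.reshape(PAGE_COUNT * LAYERS * 2, BLOCK_SEQ, HEADS, HEAD_DIM)
    base = page_ids * (LAYERS * 2) + layer * 2
    k = np.take(flat, base + 0, axis=0)
    v = np.take(flat, base + 1, axis=0)
    return k, v


def read_single_gather(cache: np.ndarray, page_ids: np.ndarray, layer: int):
    """One gather over (page, layer) rows, then cheap slices for K and V."""
    # Flatten (page, layer) into one axis; row = page * LAYERS + layer.
    flat = cache.reshape(PAGE_COUNT * LAYERS, 2, BLOCK_SEQ, HEADS, HEAD_DIM)
    rows = np.take(flat, page_ids * LAYERS + layer, axis=0)  # [n, 2, BLOCK_SEQ, HEADS, HEAD_DIM]
    k = rows[:, 0]
    v = rows[:, 1]
    return k, v
```

Both functions return the same tensors for any page list, e.g. `read_single_gather(make_cache(), np.array([3, 0, 7]), layer=2)`; the difference is only in how many indexed loads are emitted before the attention computation consumes K and V.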

Signed-off-by: Rob Suderman <rob.suderman@gmail.com>
rsuderman requested a review from dan-garvey on March 4, 2025 01:09
rsuderman merged commit 8f29c7a into nod-ai:main on Mar 4, 2025
37 of 38 checks passed

2 participants