[Kernel] Initial commit containing new Triton kernels for multi lora serving. #5025
Conversation
@Yard1 Any idea why the import fails? It works locally, but the kernel is in a new folder, so maybe that path has to be added somewhere?
@FurtherAI add …
@Yard1 Is there a way to rerun the tests without an empty commit?
@FurtherAI Making an empty commit with …
@FurtherAI Does this allow for larger vocabulary sizes? For example, NeMo-12B has a vocab size of 131072.
@tensimixt Yeah, I think it does. It shouldn't have any issues with different sizes. I'll test it at some point.
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!
SGMV Triton Kernels
New Triton kernels for multi-LoRA computation. These (should) handle any shape and data type, apply LoRAs stored in a paged format, compute at the actual LoRA rank, and speed up grouped LoRA requests (especially prefill).
The PR contains the kernels, tests for the kernels, and benchmarks. A follow-up PR will add the ability to use these kernels in vLLM.
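To make the intended semantics concrete, here is a minimal NumPy reference (not the Triton kernels from this PR) of an SGMV-style grouped LoRA apply: tokens are grouped into contiguous segments that share a LoRA, and each segment is shrunk to that LoRA's actual rank before being expanded back out. All names and shapes below are illustrative assumptions, not the PR's actual API.

```python
import numpy as np

def sgmv_reference(x, lora_A, lora_B, seg_starts, lora_ids, ranks):
    """Reference semantics of a grouped (SGMV-style) LoRA apply.

    x:          (total_tokens, hidden) input activations.
    lora_A[l]:  (hidden, r_l) shrink weights for LoRA l.
    lora_B[l]:  (r_l, out) expand weights for LoRA l.
    seg_starts: segment boundaries; tokens in rows
                seg_starts[s]:seg_starts[s+1] all use lora_ids[s].
    ranks[l]:   actual rank of LoRA l (computation stops at this rank).
    """
    out_dim = lora_B[0].shape[1]
    y = np.zeros((x.shape[0], out_dim), dtype=x.dtype)
    for s in range(len(lora_ids)):
        lo, hi = seg_starts[s], seg_starts[s + 1]
        l = lora_ids[s]
        r = ranks[l]
        # Shrink to the actual LoRA rank, then expand back to out_dim.
        v = x[lo:hi] @ lora_A[l][:, :r]
        y[lo:hi] = v @ lora_B[l][:r, :]
    return y
```

A real Triton implementation would tile these per-segment matmuls and read the LoRA weights from paged storage, but the per-token result should match this loop.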
ping @Yard1