CUDA error when running Llama 3 and Llama 3.1 #651
Comments
Hi @joshpopelka20, can you please run with
Adding that didn't give me any additional output, but RUST_BACKTRACE=1 did:
Also, this error occurs during inference.
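(Aside: enabling the backtrace just means setting the standard Rust environment variable on the same invocation; a minimal sketch, reusing the command from the bug report below:)

```bash
# RUST_BACKTRACE=1 is standard Rust: it makes the runtime print a stack
# trace when the process panics. The cargo invocation is copied verbatim
# from the bug report at the bottom of this issue.
RUST_BACKTRACE=1 CUDA_NVCC_FLAGS="-fPIC" cargo run --release \
  --features "cuda flash-attn cudnn" -- \
  --token-source "literal:---" -n "0:8;1:8;2:8;3:8" -i \
  plain -m meta-llama/Meta-Llama-3.1-8B-Instruct -a llama
```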
Ah ok, can you run with
Ok thanks! So, you are just starting a chat interaction when this occurs?
Exactly. It's my first prompt to test it.
That 9th block is the first layer on the 2nd GPU; it also happens when I put fewer layers on the 1st GPU. So, for whatever reason, the paged attention on the 2nd GPU is causing a problem. After fixing the NVCC flag bug last week, the code worked fine, so it must be a recent change that is causing this issue. It's something in paged_attention, because when I set --no-paged-attn, I don't get the error.
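(Aside: the workaround described above can be applied to the same command; a sketch, assuming --no-paged-attn is a server-level option that belongs before the plain subcommand:)

```bash
# Workaround sketch: the reporter's command with paged attention disabled.
# Assumption: --no-paged-attn is a global flag, so it is placed before
# the `plain` subcommand rather than after it.
CUDA_NVCC_FLAGS="-fPIC" cargo run --release --features "cuda flash-attn cudnn" -- \
  --token-source "literal:---" -n "0:8;1:8;2:8;3:8" -i --no-paged-attn \
  plain -m meta-llama/Meta-Llama-3.1-8B-Instruct -a llama
```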
@joshpopelka20 this should be fixed in #656!
Still erroring, though it's different this time:
Oops, I violated an invariant! Should be fixed now in #658.
It's no problem. It works now. I'll close the issue. Thanks!
Same here. Tried the latest commit from today, using Llama3_1_8bInstruct.
Describe the bug
When running this command:
CUDA_NVCC_FLAGS="-fPIC" cargo run --release --features "cuda flash-attn cudnn" -- --token-source "literal:---" -n "0:8;1:8;2:8;3:8" -i plain -m meta-llama/Meta-Llama-3.1-8B-Instruct -a llama
I'm getting this error:
I ran it for both Llama 3 and Llama 3.1 and got the same error for both.
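(Aside: a hedged reading of the -n device map, inferred from the discussion above rather than from documentation: each GPU_ID:NUM_LAYERS pair assigns a contiguous run of layers, which is consistent with the "9th block" being the first layer on the 2nd GPU. A small sketch that prints the assumed assignment:)

```bash
# Prints the layer ranges implied by "0:8;1:8;2:8;3:8" under the assumed
# GPU_ID:NUM_LAYERS reading (an 8B Llama has 32 transformer layers).
map="0:8;1:8;2:8;3:8"; start=0
for pair in ${map//;/ }; do
  gpu=${pair%%:*}; n=${pair##*:}
  echo "GPU $gpu: layers $start-$((start + n - 1))"
  start=$((start + n))
done
```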
Latest commit or version
77d6bf9
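(Aside: to reproduce at exactly this revision, a local checkout can be pinned to the hash above; standard git, nothing project-specific:)

```bash
# Pin a local clone of the project to the revision given in the report.
git fetch origin
git checkout 77d6bf9
```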