regression: output is nonsense with latest commit and CUDA support enabled #7451
I can reproduce on 201cc11.
Same for me on this thread.
Yes, I also get gibberish output; an older version like b2953 works normally with the same GGUF file. The latest CUDA build somehow generates gibberish, while Vulkan works fine.
I can confirm: after #7225, generation is completely broken. I checked it on CPU, a 4090, and a P40, with different models. I tried b2961, tried FORCE_MMQ, with and without FA. Nothing works. It's sad that we don't have proper autotests.
Check if #7452 fixes the issue.
Looks good on my end after cherry-picking #7452 into master.
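For anyone wanting to try the fix before it is merged, a PR head can be fetched and cherry-picked locally. The sketch below is self-contained: it builds a throwaway "upstream" repository and fakes the `refs/pull/7452/head` ref that GitHub would normally expose, so the repo layout, file names, and commit messages are made up for the demo; only the `git fetch origin pull/<N>/head` / `git cherry-pick FETCH_HEAD` mechanics are the real workflow.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Throwaway stand-in for the upstream repo (contents are illustrative).
git init -q upstream
cd upstream
git config user.email you@example.com
git config user.name tester
echo base > f.txt; git add f.txt; git commit -qm base
base_branch=$(git symbolic-ref --short HEAD)

# A "PR" branch with the fix, exposed the way GitHub exposes PR heads.
git checkout -qb fix
echo fixed > f.txt; git commit -qam "fix: restore correct output"
git update-ref refs/pull/7452/head HEAD
git checkout -q "$base_branch"
cd ..

# Local clone: fetch the PR head and apply its commit onto our branch.
git clone -q upstream local
cd local
git config user.email you@example.com
git config user.name tester
git fetch -q origin pull/7452/head
git cherry-pick FETCH_HEAD
cat f.txt
```

On a real checkout you would run the fetch and cherry-pick against the llama.cpp remote instead of the throwaway repo, then rebuild and retest.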
I believe this issue is still present in the latest releases. I've gone back to ecab1c7 for now and it works as before; I really just need the new /health endpoint.
On 201cc11, I get gibberish output when sampling from Llama-3-8B quantized with Q5_K_M (same behavior with Q8_0, F16, F32, and Q4_K_M). This happens when llama.cpp is built with CUDA support, but not without it. I'm building with Nix. Here's an example output:
It starts talking about Annapolis, Maryland for some reason, instead of fabric. Other seeds also produce nonsense, either gibberish or a nonsensical change of topic. In contrast, the CPU-only build is fine:
It's repeating itself, but at least it makes sense. 6369bf0 (the previous commit) is fine for both CUDA and CPU: