Phi3.5V Server API Error: Forward step expected a PagedAttention input metadata. #756
Comments
Same thing happens with Phi-3.5-MoE. It looks like the default scheduler path can reach the NormalPipeline, which requires(?) paged attention metadata. Likely something similar is happening for the vision model.
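That failure mode can be illustrated with a minimal sketch (hypothetical names; the real NormalPipeline in mistral.rs is Rust code and far more involved): a forward step that insists on paged-attention metadata fails when the scheduler was never configured to provide it.

```python
# Hypothetical, simplified sketch of the failure mode described above;
# the real NormalPipeline in mistral.rs is Rust and far more involved.
class PagedAttentionInputMetadata:
    """Stand-in for the real paged-attention metadata struct."""

class NormalPipeline:
    def forward(self, metadata):
        # Mirrors the error in this issue: the forward step requires
        # metadata, but a scheduler not configured for PagedAttention
        # never provides it.
        if metadata is None:
            raise RuntimeError(
                "Forward step expected a PagedAttention input metadata."
            )
        return "ok"

pipeline = NormalPipeline()
result = pipeline.forward(PagedAttentionInputMetadata())  # succeeds

try:
    pipeline.forward(None)  # reproduces the reported error
    failed = False
except RuntimeError:
    failed = True
```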
Ah, so the quantized pipelines can handle the lack of paged attention metadata. I'm not clear on why the Python API is routing through a different part of the code, though.
Potential confounder: I might be misreading things, but it looks like PagedAttention is disabled for non-CUDA targets (including Metal); see `mistralrs-core/src/utils/mod.rs`, lines 225 to 233 at commit 366f9f0.
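The gating those lines suggest could be sketched like this (hypothetical helper name; the actual check in `mistralrs-core/src/utils/mod.rs` may differ in detail):

```python
# Hypothetical sketch of CUDA-only gating for PagedAttention; the actual
# logic in mistralrs-core/src/utils/mod.rs may differ in detail.
def paged_attn_supported(device: str) -> bool:
    # Only CUDA targets get PagedAttention; Metal (Apple Silicon) and
    # CPU fall through to the non-paged path.
    return device == "cuda"

# On a Metal target the scheduler would therefore build no paged
# metadata, while some pipeline path still expects it -- the discrepancy.
metadata = object() if paged_attn_supported("metal") else None
```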
Hi @JCRPaquin @ytnvj2, thank you for the details and all the investigation! Indeed, you are correct: the issue arises when there is a discrepancy there. This bug was caused by a regression from #753, and I just merged #759, which should fix it. Can you please confirm this works?
@EricLBuehler thanks for the quick response! I'll try the fix in a few hours.

@EricLBuehler Tried out the fix and it works for me now. Thank you for the quick fix. Appreciate it.
Hi,
I am running the Phi-3.5 vision model with the following command on an Apple M2 MacBook:

```shell
cargo run --release --features metal -- --port 1234 vision-plain -m microsoft/Phi-3.5-vision-instruct -a phi3v
```

Everything loads fine, but when I query the server I get this error:
```
mistralrs_core::engine: prompt step - Model failed with error: Msg("Forward step expected a PagedAttention input metadata. This was not provided, please ensure that the scheduler config is correctly configured for PagedAttention.")
```
On the other hand, if I load the model using the Python API everything works fine, but I am not sure how to enable ISQ in Python:
```python
from mistralrs import Runner, Which, ChatCompletionRequest, VisionArchitecture

runner = Runner(
    which=Which.VisionPlain(
        model_id="microsoft/Phi-3.5-vision-instruct",
        arch=VisionArchitecture.Phi3V,
    ),
)
```
Any idea what might be causing this and how to fix this?
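On the ISQ sub-question: the mistralrs Python bindings are believed to accept an `in_situ_quant` argument on `Runner`, mirroring the CLI's ISQ support. Both the kwarg name and the `"Q4K"` value below are assumptions; verify them against the current bindings.

```python
# Assumed API: the `in_situ_quant` kwarg on Runner and the "Q4K" value
# are guesses based on the CLI's ISQ support; verify against the bindings.
isq_kwargs = {"in_situ_quant": "Q4K"}

try:
    from mistralrs import Runner, Which, VisionArchitecture

    runner = Runner(
        which=Which.VisionPlain(
            model_id="microsoft/Phi-3.5-vision-instruct",
            arch=VisionArchitecture.Phi3V,
        ),
        **isq_kwargs,
    )
except ImportError:
    runner = None  # mistralrs not installed in this environment
```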