Does Modelopt-v0.23.2 not support Qwen2.5 series LLM models? #142

Open
white-wolf-tech opened this issue Feb 27, 2025 · 4 comments

@white-wolf-tech

I quantized the Qwen2.5-3B model with the int8_sq algorithm. When I use the checkpoint_convert.py script that ships with the TensorRT-LLM library (i.e., their own int8_sq implementation, without the ModelOpt library), the compiled engine works normally with the tritonserver tensorrtllm-backend.

However, when I quantize with the ModelOpt library and the same algorithm, the compiled engine cannot be used by the tritonserver tensorrtllm-backend. Is this because the current version does not support this model, or could something else be wrong?
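
For reference, the ModelOpt path I am describing looks roughly like the sketch below. The model path, calibration prompts, decoder_type, and the exact call signatures here are illustrative assumptions based on the ModelOpt docs, not my exact script:

```python
# Sketch of the ModelOpt-based int8_sq (INT8 SmoothQuant) flow, illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_tensorrt_llm_checkpoint

model_dir = "Qwen/Qwen2.5-3B"  # placeholder model path
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(model_dir)


def calibrate(model):
    # Run a few batches through the model so SmoothQuant can collect activation
    # statistics; a real run should use a proper calibration dataset.
    prompts = ["Hello, how are you?", "What is the capital of France?"]
    for p in prompts:
        inputs = tokenizer(p, return_tensors="pt").to(model.device)
        with torch.no_grad():
            model(**inputs)


# int8_sq corresponds to the INT8 SmoothQuant config in ModelOpt terminology.
model = mtq.quantize(model, mtq.INT8_SMOOTHQUANT_CFG, forward_loop=calibrate)

# Export a TensorRT-LLM checkpoint, which is then built into an engine with trtllm-build.
export_tensorrt_llm_checkpoint(
    model,
    decoder_type="qwen",
    dtype=torch.float16,
    export_dir="qwen2.5-3b-int8sq-ckpt",
)
```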

@kevalmorabia97
Collaborator

What error do you see when using ModelOpt's quantized checkpoint with tritonserver?
Note that TensorRT-LLM also uses the ModelOpt library for quantization under the hood.

@white-wolf-tech
Author

The detailed situation is described here:
NVIDIA/TensorRT-LLM#2810

The symptom is that, with the same algorithm, the output is normal when using the conversion script that comes with TensorRT-LLM. However, after compiling with ModelOpt, every output token is 1023, and the decoded output looks like this:

"xx.Componentlocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklocklock"

@cjluo-nv
Collaborator

cjluo-nv commented Feb 27, 2025

Have you also tried the llm_ptq examples in this repo?

@white-wolf-tech
Author

Have you also tried the llm_ptq examples in this repo?

The result is the same.
