Hi, I'm trying to get only the newly generated tokens. It looks like the model currently returns all token ids, including the input.
I tried to update the postprocessing model to take `REQUEST_INPUT_LEN` from the preprocessing model; however, I'm receiving this error: `E1101 04:46:29.381834 3651 model_repository_manager.cc:563] Invalid argument: in ensemble ensemble, step of model 'ensemble' receives inputs originated from different decoupled models`
This appears to be because the postprocessing model tries to take inputs from both the decoupled model `tensorrt_llm` and the non-decoupled model `preprocessing`. The ensemble loads if I set `model_transaction_policy.decoupled` of the `tensorrt_llm` model to false.
I could also do some tokenization outside of the model ensemble, but that would duplicate work. Any suggestions?
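For reference, the trimming itself is trivial once the input length is available; the hard part is only the ensemble wiring described above. A minimal sketch of what a modified postprocessing step (or a client-side workaround) would do, assuming you have the full output ids and the per-sequence input length (`output_ids` and `input_lengths` here are illustrative names, not the actual Triton tensor names):

```python
# Hypothetical sketch: strip the prompt tokens from the returned ids,
# keeping only the newly generated tokens for each sequence in the batch.

def strip_input_tokens(output_ids, input_lengths):
    """Return only the newly generated token ids for each sequence.

    output_ids: list of token-id lists (prompt + generation), one per sequence
    input_lengths: list of prompt lengths, one per sequence
    """
    return [seq[n:] for seq, n in zip(output_ids, input_lengths)]

# Example: one sequence whose first 3 ids are the prompt.
new_tokens = strip_input_tokens([[101, 7592, 2088, 42, 43, 44]], [3])
print(new_tokens)  # [[42, 43, 44]]
```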
@yunfeng-scale thanks for reporting this. Several customers have reported similar requests. We already have an internal MR to support this, which is under review. After it is done, it will be released accordingly, and the release announcement will mention it.
Hi @yunfeng-scale , we pushed an update to the main branch for both TensorRT-LLM and TensorRT-LLM backend, including the feature to only get newly generated tokens.
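As a pointer for anyone landing here: recent versions of the TensorRT-LLM backend expose an `exclude_input_in_output` parameter in the `tensorrt_llm` model's `config.pbtxt` that controls this behavior. The exact snippet below is a sketch; check your backend version's sample config for the authoritative spelling and default:

```
parameters: {
  key: "exclude_input_in_output"
  value: {
    string_value: "true"
  }
}
```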
Closing, please feel free to re-open if you have any questions, thanks.
Note that the team does development in internal repos and syncs the changes to GitHub periodically, so there is no dedicated PR on GitHub that fixes this issue.
If you are still seeing the issue, please feel free to ask and we will reopen the issue. Thanks.