System Info
CPU architecture: x86_64
GPU type: NVIDIA A100-SXM4-40GB
CUDA Version: 12.7
Driver Version: 565.57.01
Who can help?
Hey, @byshiue!
I saw you responding to other encoder-model-related issues, so I hope you might be the right person for this question.
Thank you!
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
Hello!
I am trying to use TensorRT-LLM/examples/bert to convert a RoBERTa model (FacebookAI/roberta-base) to a TRT-LLM engine.
I am using the v0.16.0 tag and following the instructions in TensorRT-LLM/examples/bert/README.md.
First, as a sanity check, I verify that a converted BERT model (google-bert/bert-base-uncased) passes the run.py Hugging Face comparison test (with intermediate checks), using the commands below.
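Roughly the following, per TensorRT-LLM/examples/bert/README.md (a sketch: the convert/build flag names are assumptions, only the run.py invocation is exactly as used below):

```bash
# Sketch of the build steps (flag names approximate -- see examples/bert/README.md):
python convert_checkpoint.py --model BertModel \
    --model_dir google-bert/bert-base-uncased \
    --output_dir ckpts/bert-base-uncased \
    --dtype float16
trtllm-build --checkpoint_dir ckpts/bert-base-uncased \
    --output_dir engines/bert-base-uncased
CUDA_VISIBLE_DEVICES=0 python run.py \
    --engine_dir engines/bert-base-uncased/ \
    --hf_model_dir=google-bert/bert-base-uncased \
    --run_hf_test --debug
```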
This results in both the final hidden outputs and the intermediate layer outputs passing the torch.allclose checks against the Hugging Face model outputs (as implemented in run.py).
Next, I try the same comparison with a RobertaModel:
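Presumably the same steps with the RoBERTa class and checkpoint swapped in (again a sketch; flags approximate):

```bash
# Same sketch with RobertaModel (flags approximate):
python convert_checkpoint.py --model RobertaModel \
    --model_dir FacebookAI/roberta-base \
    --output_dir ckpts/roberta-base \
    --dtype float16
trtllm-build --checkpoint_dir ckpts/roberta-base \
    --output_dir engines/roberta-base
CUDA_VISIBLE_DEVICES=0 python run.py \
    --engine_dir engines/roberta-base/ \
    --hf_model_dir=FacebookAI/roberta-base \
    --run_hf_test --debug
```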
Even though the final check passes (RobertaModel result is all close to HF reference!), the intermediate layer outputs do not match, starting from the 4th layer at the default tolerance of 1e-2 (or from the 0th encoder layer at a tighter tolerance of 1e-3). Here is the output of the default script:
Embedding are all close
BertEncoderLayer_0_output is close: True
BertEncoderLayer_1_output is close: True
BertEncoderLayer_2_output is close: True
BertEncoderLayer_3_output is close: True
BertEncoderLayer_4_output is close: False
BertEncoderLayer_5_output is close: True
BertEncoderLayer_6_output is close: False
BertEncoderLayer_7_output is close: False
BertEncoderLayer_8_output is close: False
BertEncoderLayer_9_output is close: False
BertEncoderLayer_10_output is close: False
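For context, the per-layer check is presumably just torch.allclose at a fixed absolute tolerance; a minimal sketch of the assumed semantics (not run.py's actual code):

```python
import torch

# Minimal sketch of the assumed check (not run.py's actual implementation):
# a layer output counts as "close" if every element is within atol of the
# Hugging Face reference.
def layer_is_close(trt_out: torch.Tensor, hf_out: torch.Tensor,
                   atol: float = 1e-2) -> bool:
    return torch.allclose(trt_out.float(), hf_out.float(), atol=atol)
```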
When I try a fine-tuned roberta-base checkpoint, both the intermediate and the final checks fail.
Expected behavior
Running CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/run.py --engine_dir engines/roberta-base/ --hf_model_dir=FacebookAI/roberta-base --run_hf_test --debug should produce:
Embedding are all close
BertEncoderLayer_0_output is close: True
BertEncoderLayer_1_output is close: True
BertEncoderLayer_2_output is close: True
BertEncoderLayer_3_output is close: True
BertEncoderLayer_4_output is close: True
BertEncoderLayer_5_output is close: True
BertEncoderLayer_6_output is close: True
BertEncoderLayer_7_output is close: True
BertEncoderLayer_8_output is close: True
BertEncoderLayer_9_output is close: True
BertEncoderLayer_10_output is close: True
as well as the outputs of the final layer:
RobertaModel result is all close to HF reference!
actual behavior
Embedding are all close
BertEncoderLayer_0_output is close: True
BertEncoderLayer_1_output is close: True
BertEncoderLayer_2_output is close: True
BertEncoderLayer_3_output is close: True
BertEncoderLayer_4_output is close: False
BertEncoderLayer_5_output is close: True
BertEncoderLayer_6_output is close: False
BertEncoderLayer_7_output is close: False
BertEncoderLayer_8_output is close: False
BertEncoderLayer_9_output is close: False
BertEncoderLayer_10_output is close: False
additional notes
Since the embedding outputs match in all cases, I would expect the difference to be in the encoder layers; however, based on the Hugging Face implementations (modeling_bert.py, modeling_roberta.py), those should be exactly the same for BERT and RoBERTa.
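To localize the drift, one option is to dump the Hugging Face per-layer hidden states and compare each against the corresponding BertEncoderLayer_i_output from run.py --debug with an explicit error magnitude instead of a fixed-tolerance pass/fail. A hypothetical helper (not part of run.py):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical debugging helper (not part of run.py): collect the HF
# per-layer hidden states for a fixed input so each one can be diffed
# against the corresponding TRT-LLM intermediate output.
model = AutoModel.from_pretrained("FacebookAI/roberta-base").eval()
tok = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; hidden_states[i + 1] is layer i.
for i, hf_layer in enumerate(out.hidden_states[1:]):
    print(f"layer {i}: HF output norm = {hf_layer.norm():.4f}")
    # trt_layer = torch.load(f"trt_layer_{i}.pt")  # hypothetical dump from run.py --debug
    # print(f"layer {i}: max abs diff = {(hf_layer - trt_layer).abs().max():.6f}")
```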
I would appreciate any help or pointers on how to debug this and make the RobertaModel work with TensorRT-LLM.
Hey @symphonylyh, @juney-nvidia, I noticed you were both involved in the original BERT + RoBERTa integration (#778).
Tagging you as well, in case it might be relevant. Thanks!