
RoBERTa model conversion does not pass the huggingface test #2829

Open
arinaruck opened this issue Feb 26, 2025 · 1 comment
Labels
bug Something isn't working
System Info

CPU architecture: x86_64
GPU type: NVIDIA A100-SXM4-40GB
CUDA Version: 12.7
Driver Version: 565.57.01

Who can help?

Hey, @byshiue!
I saw you responding to other Encoder Model related issues, so I hope you might be the right person for this question.

Thank you!

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Hello!
I am trying to use the TensorRT-LLM/examples/bert scripts to convert a RoBERTa model (FacebookAI/roberta-base) to a TRT-LLM engine.
I am using the v0.16.0 tag and following the instructions in TensorRT-LLM/examples/bert/README.md.

  • First, as a sanity check, I verify that converting a BERT model (google-bert/bert-base-uncased) passes the run.py huggingface comparison test (with intermediate checks), using the following commands:
CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/convert_checkpoint.py --model=BertModel  --model_dir=google-bert/bert-base-uncased   --output_dir=trt_checkpoints/bert-base

CUDA_VISIBLE_DEVICES=0 trtllm-build --checkpoint_dir trt_checkpoints/bert-base/ --output_dir engines/bert-base --remove_input_padding=disable --max_batch_size=128 --max_seq_len=512  --bert_attention_plugin=disable --context_fmha=disable --enable_debug_output

CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/run.py --engine_dir engines/bert-base/  --hf_model_dir=google-bert/bert-base-uncased --run_hf_test  --debug

This results in both the final hidden outputs and the intermediate layer outputs passing the torch.allclose checks against the huggingface model outputs (as implemented in run.py).
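For reference, the per-layer check amounts to an elementwise tolerance comparison. Here is a minimal numpy sketch of that kind of check (run.py itself operates on torch tensors with torch.allclose; the tensors and helper below are hypothetical stand-ins):

```python
import numpy as np

def is_close(trt_out, hf_out, atol=1e-2):
    """Report whether two layer outputs agree within an absolute tolerance,
    along with the maximum elementwise deviation."""
    diff = np.abs(trt_out - hf_out)
    return bool(np.allclose(trt_out, hf_out, atol=atol)), float(diff.max())

# Hypothetical outputs standing in for a TRT-LLM layer and its HF reference.
hf = np.ones((2, 4, 8), dtype=np.float32)
trt = hf + 5e-3  # small numerical drift

ok_default, max_diff = is_close(trt, hf, atol=1e-2)  # drift within 1e-2 -> passes
ok_strict, _ = is_close(trt, hf, atol=1e-3)          # drift beyond 1e-3 -> fails
print(ok_default, ok_strict, max_diff)
```

This mirrors how the same layer can pass at the default tolerance of 1e-2 but fail at 1e-3.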

  • Next, I try to perform the same comparison with a RobertaModel:
CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/convert_checkpoint.py --model=RobertaModel  --model_dir=FacebookAI/roberta-base --output_dir=trt_checkpoints/roberta-base 

CUDA_VISIBLE_DEVICES=0 trtllm-build --checkpoint_dir trt_checkpoints/roberta-base/ --output_dir engines/roberta-base --remove_input_padding=disable --max_batch_size=128 --max_seq_len=512  --bert_attention_plugin=disable --context_fmha=disable --enable_debug_output

CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/run.py --engine_dir engines/roberta-base/  --hf_model_dir=FacebookAI/roberta-base --run_hf_test  --debug

Even though the final check passes (RobertaModel result is all close to HF reference!), I observe that the intermediate layer outputs do not match, starting from the 4th layer at the default tolerance of 1e-2 (or from the 0th encoder layer at a stricter tolerance of 1e-3). Here is the output of the default script:

 Embedding are all close                                                                                                       
 BertEncoderLayer_0_output is close: True                                                                                      
 BertEncoderLayer_1_output is close: True                                                                                      
 BertEncoderLayer_2_output is close: True                                                                                      
 BertEncoderLayer_3_output is close: True                                                                                      
 BertEncoderLayer_4_output is close: False                                                                                     
 BertEncoderLayer_5_output is close: True                                                                                      
 BertEncoderLayer_6_output is close: False                                                                                     
 BertEncoderLayer_7_output is close: False                                                                                     
 BertEncoderLayer_8_output is close: False                                                                                     
 BertEncoderLayer_9_output is close: False                                                                                     
 BertEncoderLayer_10_output is close: False     
  • When I use a fine-tuned roberta-base checkpoint, both the intermediate and the final checks fail.
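To localize where the drift begins, the per-layer results above can be reduced to the index of the first divergent layer. A small sketch with hypothetical per-layer outputs (drift injected from layer 4 onward, mimicking the observed pattern):

```python
import numpy as np

def first_divergent_layer(trt_layers, hf_layers, atol=1e-2):
    """Return the index of the first layer whose output drifts past atol, else None."""
    for i, (t, h) in enumerate(zip(trt_layers, hf_layers)):
        if not np.allclose(t, h, atol=atol):
            return i
    return None

# Hypothetical per-layer outputs: small noise in layers 0-3, larger drift from layer 4.
rng = np.random.default_rng(0)
hf_layers = [rng.standard_normal((2, 8)).astype(np.float32) for _ in range(12)]
trt_layers = [h + (2e-2 if i >= 4 else 1e-4) for i, h in enumerate(hf_layers)]

print(first_divergent_layer(trt_layers, hf_layers))  # → 4
```

Tracking this index across tolerances (and across checkpoints) helps separate accumulating numerical error from a genuine conversion bug.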

Expected behavior

The result of
CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/run.py --engine_dir engines/roberta-base/ --hf_model_dir=FacebookAI/roberta-base --run_hf_test --debug should be:

 Embedding are all close                                                                                                       
 BertEncoderLayer_0_output is close: True                                                                                      
 BertEncoderLayer_1_output is close: True                                                                                      
 BertEncoderLayer_2_output is close: True                                                                                      
 BertEncoderLayer_3_output is close: True                                                                                      
 BertEncoderLayer_4_output is close: True                                                                                   
 BertEncoderLayer_5_output is close: True                                                                                      
 BertEncoderLayer_6_output is close: True                                                                                     
 BertEncoderLayer_7_output is close: True                                                                                     
 BertEncoderLayer_8_output is close: True                                                                                   
 BertEncoderLayer_9_output is close: True                                                                                     
 BertEncoderLayer_10_output is close: True     

as well as the final check passing:

RobertaModel result is all close to HF reference!

actual behavior

 Embedding are all close                                                                                                       
 BertEncoderLayer_0_output is close: True                                                                                      
 BertEncoderLayer_1_output is close: True                                                                                      
 BertEncoderLayer_2_output is close: True                                                                                      
 BertEncoderLayer_3_output is close: True                                                                                      
 BertEncoderLayer_4_output is close: False                                                                                     
 BertEncoderLayer_5_output is close: True                                                                                      
 BertEncoderLayer_6_output is close: False                                                                                     
 BertEncoderLayer_7_output is close: False                                                                                     
 BertEncoderLayer_8_output is close: False                                                                                     
 BertEncoderLayer_9_output is close: False                                                                                     
 BertEncoderLayer_10_output is close: False     

additional notes

Since the embedding outputs match in all cases, I would expect the difference to lie in the encoder layers; however, based on the huggingface implementations, those should be exactly the same for BERT and RoBERTa (modeling_bert.py, modeling_roberta.py).
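One way to rule out a weight-mapping problem in convert_checkpoint.py is to diff the two checkpoints' state dicts before comparing activations. A minimal sketch of such a diff, using plain dicts of numpy arrays with hypothetical keys (not the repo's actual naming or shapes):

```python
import numpy as np

def diff_state_dicts(a, b):
    """Summarize key and shape mismatches between two weight dicts."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    shape_mismatch = sorted(
        k for k in set(a) & set(b) if a[k].shape != b[k].shape
    )
    return {"only_a": only_a, "only_b": only_b, "shape_mismatch": shape_mismatch}

# Hypothetical miniature checkpoints standing in for the HF and converted weights.
hf = {"embeddings.word_embeddings.weight": np.zeros((50265, 768)),
      "encoder.layer.0.attention.self.query.weight": np.zeros((768, 768))}
trt = {"embeddings.word_embeddings.weight": np.zeros((50265, 768)),
       "encoder.layer.0.attention.self.query.weight": np.zeros((768, 769))}

print(diff_state_dicts(hf, trt))
```

If the diff is empty, the divergence is more likely numerical (precision, fused kernels) than a wrong or missing weight mapping.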

I would appreciate any help or pointers on how to debug this and make the RobertaModel work with TensorRT-LLM.

@arinaruck arinaruck added the bug Something isn't working label Feb 26, 2025
@arinaruck (Author)
Hey @symphonylyh, @juney-nvidia, I noticed you were both involved in the original BERT + RoBERTa integration (#778).
Tagging you as well, in case it might be relevant. Thanks!
