
RoBERTa model conversion does not pass the huggingface test #2829

Open
arinaruck opened this issue Feb 26, 2025 · 1 comment
Labels
bug Something isn't working
System Info

CPU architecture: x86_64
GPU type: NVIDIA A100-SXM4-40GB
CUDA Version: 12.7
Driver Version: 565.57.01

Who can help?

Hey, @byshiue!
I saw you responding to other Encoder Model related issues, so I hope you might be the right person for this question.

Thank you!

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Hello!
I am trying to use the TensorRT-LLM/examples/bert scripts to convert a RoBERTa model (FacebookAI/roberta-base) to a TRT-LLM engine.
I am using the v0.16.0 tag and following the instructions in TensorRT-LLM/examples/bert/README.md.

  • First, as a sanity check, I verify that converting a BERT model (google-bert/bert-base-uncased) passes the run.py huggingface comparison test (with intermediate checks), using the following commands:
CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/convert_checkpoint.py --model=BertModel  --model_dir=google-bert/bert-base-uncased   --output_dir=trt_checkpoints/bert-base

CUDA_VISIBLE_DEVICES=0 trtllm-build --checkpoint_dir trt_checkpoints/bert-base/ --output_dir engines/bert-base --remove_input_padding=disable --max_batch_size=128 --max_seq_len=512  --bert_attention_plugin=disable --context_fmha=disable --enable_debug_output

CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/run.py --engine_dir engines/bert-base/  --hf_model_dir=google-bert/bert-base-uncased --run_hf_test  --debug

This results in both the final hidden outputs and the intermediate layer outputs passing the torch.allclose checks against the huggingface model outputs (as implemented in run.py).
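For reference, the per-layer check amounts to an elementwise tolerance comparison. Here is a minimal numpy sketch of that kind of check (run.py itself operates on torch tensors with torch.allclose; the tensors and helper below are hypothetical stand-ins):

```python
import numpy as np

def is_close(trt_out, hf_out, atol=1e-2):
    """Report whether two layer outputs agree within an absolute tolerance,
    along with the maximum elementwise deviation."""
    diff = np.abs(trt_out - hf_out)
    return bool(np.allclose(trt_out, hf_out, atol=atol)), float(diff.max())

# Hypothetical outputs standing in for a TRT-LLM layer and its HF reference.
hf = np.ones((2, 4, 8), dtype=np.float32)
trt = hf + 5e-3  # small numerical drift

ok_default, max_diff = is_close(trt, hf, atol=1e-2)  # drift within 1e-2 -> passes
ok_strict, _ = is_close(trt, hf, atol=1e-3)          # drift beyond 1e-3 -> fails
print(ok_default, ok_strict, max_diff)
```

This mirrors how the same layer can pass at the default tolerance of 1e-2 but fail at 1e-3.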

  • Next, I try to perform the same comparison with a RobertaModel:
CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/convert_checkpoint.py --model=RobertaModel  --model_dir=FacebookAI/roberta-base --output_dir=trt_checkpoints/roberta-base 

CUDA_VISIBLE_DEVICES=0 trtllm-build --checkpoint_dir trt_checkpoints/roberta-base/ --output_dir engines/roberta-base --remove_input_padding=disable --max_batch_size=128 --max_seq_len=512  --bert_attention_plugin=disable --context_fmha=disable --enable_debug_output

CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/run.py --engine_dir engines/roberta-base/  --hf_model_dir=FacebookAI/roberta-base --run_hf_test  --debug

Even though the final check passes (RobertaModel result is all close to HF reference!), I observe that the intermediate layer outputs do not match, starting from the 4th layer at the default tolerance of 1e-2 (or from the 0th encoder layer at a stricter tolerance of 1e-3). Here is the output of the default script:

 Embedding are all close                                                                                                       
 BertEncoderLayer_0_output is close: True                                                                                      
 BertEncoderLayer_1_output is close: True                                                                                      
 BertEncoderLayer_2_output is close: True                                                                                      
 BertEncoderLayer_3_output is close: True                                                                                      
 BertEncoderLayer_4_output is close: False                                                                                     
 BertEncoderLayer_5_output is close: True                                                                                      
 BertEncoderLayer_6_output is close: False                                                                                     
 BertEncoderLayer_7_output is close: False                                                                                     
 BertEncoderLayer_8_output is close: False                                                                                     
 BertEncoderLayer_9_output is close: False                                                                                     
 BertEncoderLayer_10_output is close: False     
  • When I use a fine-tuned roberta-base checkpoint, both the intermediate and the final checks fail.
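To localize where the drift begins, the per-layer results above can be reduced to the index of the first divergent layer. A small sketch with hypothetical per-layer outputs (drift injected from layer 4 onward, mimicking the observed pattern):

```python
import numpy as np

def first_divergent_layer(trt_layers, hf_layers, atol=1e-2):
    """Return the index of the first layer whose output drifts past atol, else None."""
    for i, (t, h) in enumerate(zip(trt_layers, hf_layers)):
        if not np.allclose(t, h, atol=atol):
            return i
    return None

# Hypothetical per-layer outputs: small noise in layers 0-3, larger drift from layer 4.
rng = np.random.default_rng(0)
hf_layers = [rng.standard_normal((2, 8)).astype(np.float32) for _ in range(12)]
trt_layers = [h + (2e-2 if i >= 4 else 1e-4) for i, h in enumerate(hf_layers)]

print(first_divergent_layer(trt_layers, hf_layers))  # → 4
```

Tracking this index across tolerances (and across checkpoints) helps separate accumulating numerical error from a genuine conversion bug.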

Expected behavior

The result of
CUDA_VISIBLE_DEVICES=0 python /code/tensorrt_llm/examples/bert/run.py --engine_dir engines/roberta-base/ --hf_model_dir=FacebookAI/roberta-base --run_hf_test --debug should be:

 Embedding are all close                                                                                                       
 BertEncoderLayer_0_output is close: True                                                                                      
 BertEncoderLayer_1_output is close: True                                                                                      
 BertEncoderLayer_2_output is close: True                                                                                      
 BertEncoderLayer_3_output is close: True                                                                                      
 BertEncoderLayer_4_output is close: True                                                                                   
 BertEncoderLayer_5_output is close: True                                                                                      
 BertEncoderLayer_6_output is close: True                                                                                     
 BertEncoderLayer_7_output is close: True                                                                                     
 BertEncoderLayer_8_output is close: True                                                                                   
 BertEncoderLayer_9_output is close: True                                                                                     
 BertEncoderLayer_10_output is close: True     

as well as the final check passing:

RobertaModel result is all close to HF reference!

actual behavior

 Embedding are all close                                                                                                       
 BertEncoderLayer_0_output is close: True                                                                                      
 BertEncoderLayer_1_output is close: True                                                                                      
 BertEncoderLayer_2_output is close: True                                                                                      
 BertEncoderLayer_3_output is close: True                                                                                      
 BertEncoderLayer_4_output is close: False                                                                                     
 BertEncoderLayer_5_output is close: True                                                                                      
 BertEncoderLayer_6_output is close: False                                                                                     
 BertEncoderLayer_7_output is close: False                                                                                     
 BertEncoderLayer_8_output is close: False                                                                                     
 BertEncoderLayer_9_output is close: False                                                                                     
 BertEncoderLayer_10_output is close: False     

additional notes

Since the embedding outputs match in all cases, I would expect the difference to lie in the encoder layers; however, based on the huggingface implementations, those should be exactly the same for BERT and RoBERTa (modeling_bert.py, modeling_roberta.py).
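One way to rule out a weight-mapping problem in convert_checkpoint.py is to diff the two checkpoints' state dicts before comparing activations. A minimal sketch of such a diff, using plain dicts of numpy arrays with hypothetical keys (not the repo's actual naming or shapes):

```python
import numpy as np

def diff_state_dicts(a, b):
    """Summarize key and shape mismatches between two weight dicts."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    shape_mismatch = sorted(
        k for k in set(a) & set(b) if a[k].shape != b[k].shape
    )
    return {"only_a": only_a, "only_b": only_b, "shape_mismatch": shape_mismatch}

# Hypothetical miniature checkpoints standing in for the HF and converted weights.
hf = {"embeddings.word_embeddings.weight": np.zeros((50265, 768)),
      "encoder.layer.0.attention.self.query.weight": np.zeros((768, 768))}
trt = {"embeddings.word_embeddings.weight": np.zeros((50265, 768)),
       "encoder.layer.0.attention.self.query.weight": np.zeros((768, 769))}

print(diff_state_dicts(hf, trt))
```

If the diff is empty, the divergence is more likely numerical (precision, fused kernels) than a wrong or missing weight mapping.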

I would appreciate any help or pointers on how to debug this and make the RobertaModel work with TensorRT-LLM.

@arinaruck arinaruck added the bug Something isn't working label Feb 26, 2025
@arinaruck (Author)
Hey @symphonylyh, @juney-nvidia, I noticed you were both involved in the original BERT + RoBERTa integration (#778).
Tagging you as well, in case it might be relevant. Thanks!
