Segmentation fault with pipeline parallelism and gather_all_token_logits
#1284
Labels
bug, gather_all_token_logits
System Info
Who can help?
@byshiue
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Based on the Falcon examples, I added the use of pipeline parallelism and gather_all_token_logits:
```shell
python convert_checkpoint.py --model_dir ./falcon/7b-instruct --dtype bfloat16 \
    --output_dir ./falcon/7b-instruct/trt_ckpt/bf16/2-gpu/ --pp_size 2

trtllm-build --checkpoint_dir ./falcon/7b-instruct/trt_ckpt/bf16/2-gpu/ \
    --gemm_plugin bfloat16 --remove_input_padding enable \
    --gpt_attention_plugin bfloat16 \
    --output_dir ./falcon/7b-instruct/trt_engines/bf16/2-gpu/ \
    --gather_all_token_logits

python ../summarize.py --test_trt_llm --hf_model_dir ./falcon/7b-instruct \
    --engine_dir ./falcon/7b-instruct/trt_engines/bf16/2-gpu/
```
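For context, here is a toy sketch (plain NumPy, not the TensorRT-LLM API) of what --gather_all_token_logits changes in the output: without it, only the logits of the last position are returned; with it, logits for every token position are gathered into one [seq_len, vocab] tensor.

```python
import numpy as np

# Toy illustration of logit gathering; shapes and names are hypothetical.
rng = np.random.default_rng(0)
seq_len, vocab_size = 5, 8

# Pretend these are the per-position logits produced during a forward pass.
per_step_logits = [rng.standard_normal(vocab_size) for _ in range(seq_len)]

last_only = per_step_logits[-1]                 # default: last token only
all_tokens = np.stack(per_step_logits, axis=0)  # with gather_all_token_logits

assert last_only.shape == (vocab_size,)
assert all_tokens.shape == (seq_len, vocab_size)
```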
Expected behavior
Produces results similar to running without pipeline parallelism and without gather_all_token_logits.
Actual behavior
Crashes with the following stack trace:
If I add --use_py_session, I get the following error:
Additional notes
We hit this error in several of our own tasks that combine logit gathering with pipeline parallelism, and were able to reproduce it from the official examples. For simplicity, this issue description is based on that reproduction.
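To illustrate why these two features interact at all, here is a toy sketch (pure NumPy, hypothetical layer split) of pipeline parallelism: with pp_size 2 the transformer layers are divided across two stages, and only the last stage holds the lm_head, so "gathering all token logits" requires the last stage to ship a full [tokens, vocab] tensor back to the first rank.

```python
import numpy as np

# Hypothetical two-stage pipeline; weights and sizes are made up.
rng = np.random.default_rng(1)
tokens, hidden, vocab = 3, 4, 8
x = rng.standard_normal((tokens, hidden))

stage0 = [rng.standard_normal((hidden, hidden)) for _ in range(2)]  # rank 0
stage1 = [rng.standard_normal((hidden, hidden)) for _ in range(2)]  # rank 1
lm_head = rng.standard_normal((hidden, vocab))  # lives on the last stage only

h = x
for w in stage0:        # pipeline stage 0
    h = np.tanh(h @ w)
for w in stage1:        # pipeline stage 1
    h = np.tanh(h @ w)
logits = h @ lm_head    # only the last stage can compute this

# Gathering all token logits means transferring this whole tensor across
# ranks, a path that is only exercised when pp_size > 1 and logit
# gathering are enabled together.
assert logits.shape == (tokens, vocab)
```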