Segmentation fault with pipeline parallelism and gather_all_token_logits
#1284
Labels
bug, gather_all_token_logits
System Info
Who can help?
@byshiue
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Based on the Falcon examples, I added the use of pipeline parallelism and gather_all_token_logits:
```shell
python convert_checkpoint.py --model_dir ./falcon/7b-instruct --dtype bfloat16 \
    --output_dir ./falcon/7b-instruct/trt_ckpt/bf16/2-gpu/ --pp_size 2

trtllm-build --checkpoint_dir ./falcon/7b-instruct/trt_ckpt/bf16/2-gpu/ \
    --gemm_plugin bfloat16 --remove_input_padding enable \
    --gpt_attention_plugin bfloat16 \
    --output_dir ./falcon/7b-instruct/trt_engines/bf16/2-gpu/ \
    --gather_all_token_logits

python ../summarize.py --test_trt_llm --hf_model_dir ./falcon/7b-instruct \
    --engine_dir ./falcon/7b-instruct/trt_engines/bf16/2-gpu/
```
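For context, here is a toy sketch (plain NumPy, not the TensorRT-LLM API) of what --gather_all_token_logits changes in the output: without it, only the logits of the last position are returned; with it, logits for every token position are gathered into one [seq_len, vocab] tensor.

```python
import numpy as np

# Toy illustration of logit gathering; shapes and names are hypothetical.
rng = np.random.default_rng(0)
seq_len, vocab_size = 5, 8

# Pretend these are the per-position logits produced during a forward pass.
per_step_logits = [rng.standard_normal(vocab_size) for _ in range(seq_len)]

last_only = per_step_logits[-1]                 # default: last token only
all_tokens = np.stack(per_step_logits, axis=0)  # with gather_all_token_logits

assert last_only.shape == (vocab_size,)
assert all_tokens.shape == (seq_len, vocab_size)
```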
Expected behavior
Produces results similar to running without pipeline parallelism and without gather_all_token_logits.
Actual behavior
Crashes with the following stack trace:
If I add --use_py_session, I get the following error:
Additional notes
We hit this error in several of our own tasks that combine logit gathering with pipeline parallelism, and were able to reproduce it from the official examples. For simplicity, this issue description is based on that reproduction.
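To illustrate why these two features interact at all, here is a toy sketch (pure NumPy, hypothetical layer split) of pipeline parallelism: with pp_size 2 the transformer layers are divided across two stages, and only the last stage holds the lm_head, so "gathering all token logits" requires the last stage to ship a full [tokens, vocab] tensor back to the first rank.

```python
import numpy as np

# Hypothetical two-stage pipeline; weights and sizes are made up.
rng = np.random.default_rng(1)
tokens, hidden, vocab = 3, 4, 8
x = rng.standard_normal((tokens, hidden))

stage0 = [rng.standard_normal((hidden, hidden)) for _ in range(2)]  # rank 0
stage1 = [rng.standard_normal((hidden, hidden)) for _ in range(2)]  # rank 1
lm_head = rng.standard_normal((hidden, vocab))  # lives on the last stage only

h = x
for w in stage0:        # pipeline stage 0
    h = np.tanh(h @ w)
for w in stage1:        # pipeline stage 1
    h = np.tanh(h @ w)
logits = h @ lm_head    # only the last stage can compute this

# Gathering all token logits means transferring this whole tensor across
# ranks, a path that is only exercised when pp_size > 1 and logit
# gathering are enabled together.
assert logits.shape == (tokens, vocab)
```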