Falcon-40B build is failing due to what looks like a memory leak: I monitored CPU memory usage, it climbs to a peak, and then the build fails. I am on an AWS g5.48xlarge EC2 instance with 768 GB of RAM.
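For scale, here is a back-of-envelope estimate of the host-RAM peak one would expect under `parallel_build`. The parameter count, the fp16 byte width, and the ~2× transient factor for transformers' default (non-`low_cpu_mem_usage`) loading path are assumptions, not values taken from the log; only `world_size = 4` comes from the build arguments below.

```python
# Rough host-RAM estimate for the parallel Falcon-40B build.
# All constants except world_size are assumptions, not logged values.
n_params = 40e9        # Falcon-40B parameter count, approximate
bytes_per_param = 2    # float16 weights
world_size = 4         # from the build arguments in the log

weights_gb = n_params * bytes_per_param / 1e9    # ~80 GB per full copy
# transformers' default from_pretrained path can transiently hold
# roughly 2x the weights (checkpoint tensors + initialized model).
peak_per_rank_gb = 2 * weights_gb                # ~160 GB per rank
peak_total_gb = world_size * peak_per_rank_gb    # ~640 GB concurrently

print(f"estimated concurrent peak: ~{peak_total_gb:.0f} GB of 768 GB")
```

That is roughly 640 GB before the TensorRT builder allocates anything of its own, so a 768 GB host running out during the checkpoint-loading phase is plausible. Build log: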
[11/01/2023-04:48:30] [TRT-LLM] [I] ========================================= Build Arguments ==========================================
[11/01/2023-04:48:30] [TRT-LLM] [I] - world_size....................: 4
[11/01/2023-04:48:30] [TRT-LLM] [I] - tp_size.......................: 4
[11/01/2023-04:48:30] [TRT-LLM] [I] - pp_size.......................: 1
[11/01/2023-04:48:30] [TRT-LLM] [I] - model_dir.....................: /workspace/falcon/tiiuae-falcon-40b
[11/01/2023-04:48:30] [TRT-LLM] [I] - dtype.........................: float16
[11/01/2023-04:48:30] [TRT-LLM] [I] - timing_cache..................: model.cache
[11/01/2023-04:48:30] [TRT-LLM] [I] - log_level.....................: info
[11/01/2023-04:48:30] [TRT-LLM] [I] - vocab_size....................: 65024
[11/01/2023-04:48:30] [TRT-LLM] [I] - n_layer.......................: 60
[11/01/2023-04:48:30] [TRT-LLM] [I] - n_positions...................: 2048
[11/01/2023-04:48:30] [TRT-LLM] [I] - n_embd........................: 8192
[11/01/2023-04:48:30] [TRT-LLM] [I] - n_head........................: 128
[11/01/2023-04:48:30] [TRT-LLM] [I] - n_kv_head.....................: 8
[11/01/2023-04:48:30] [TRT-LLM] [I] - mlp_hidden_size...............: None
[11/01/2023-04:48:30] [TRT-LLM] [I] - max_batch_size................: 128
[11/01/2023-04:48:30] [TRT-LLM] [I] - max_input_len.................: 512
[11/01/2023-04:48:30] [TRT-LLM] [I] - max_output_len................: 256
[11/01/2023-04:48:30] [TRT-LLM] [I] - max_beam_width................: 1
[11/01/2023-04:48:30] [TRT-LLM] [I] - use_gpt_attention_plugin......: float16
[11/01/2023-04:48:30] [TRT-LLM] [I] - bias..........................: False
[11/01/2023-04:48:30] [TRT-LLM] [I] - parallel_attention............: True
[11/01/2023-04:48:30] [TRT-LLM] [I] - new_decoder_architecture......: True
[11/01/2023-04:48:30] [TRT-LLM] [I] - alibi.........................: False
[11/01/2023-04:48:30] [TRT-LLM] [I] - logits_dtype..................: float32
[11/01/2023-04:48:30] [TRT-LLM] [I] - use_gemm_plugin...............: float16
[11/01/2023-04:48:30] [TRT-LLM] [I] - use_layernorm_plugin..........: float16
[11/01/2023-04:48:30] [TRT-LLM] [I] - parallel_build................: True
[11/01/2023-04:48:30] [TRT-LLM] [I] - enable_context_fmha...........: True
[11/01/2023-04:48:30] [TRT-LLM] [I] - enable_context_fmha_fp32_acc..: False
[11/01/2023-04:48:30] [TRT-LLM] [I] - visualize.....................: False
[11/01/2023-04:48:30] [TRT-LLM] [I] - load_by_shard.................: False
[11/01/2023-04:48:30] [TRT-LLM] [I] - enable_debug_output...........: True
[11/01/2023-04:48:30] [TRT-LLM] [I] - gpus_per_node.................: 8
[11/01/2023-04:48:30] [TRT-LLM] [I] - builder_opt...................: None
[11/01/2023-04:48:30] [TRT-LLM] [I] - output_dir....................: /tmp/tensorrtllm/falcon40b/1
[11/01/2023-04:48:30] [TRT-LLM] [I] - remove_input_padding..........: True
[11/01/2023-04:48:30] [TRT-LLM] [I] - strongly_typed................: True
[11/01/2023-04:48:30] [TRT-LLM] [I] - enable_fp8....................: False
[11/01/2023-04:48:30] [TRT-LLM] [I] - quantized_fp8_model_path......: None
[11/01/2023-04:48:30] [TRT-LLM] [I] - fp8_kv_cache..................: False
[11/01/2023-04:48:30] [TRT-LLM] [I] - use_inflight_batching.........: True
[11/01/2023-04:48:30] [TRT-LLM] [I] - paged_kv_cache................: True
[11/01/2023-04:48:30] [TRT-LLM] [I] - tokens_per_block..............: 64
[11/01/2023-04:48:30] [TRT-LLM] [I] - max_num_tokens................: None
[11/01/2023-04:48:30] [TRT-LLM] [I] - use_custom_all_reduce.........: False
[11/01/2023-04:48:30] [TRT-LLM] [I] - quant_mode....................: 0
[11/01/2023-04:48:30] [TRT-LLM] [I] ====================================================================================================
[11/01/2023-04:48:30] [TRT-LLM] [W] Parallelly build TensorRT engines. Please make sure that all of the 4 GPUs are totally free.
[11/01/2023-04:48:44]
WARNING: You are currently loading Falcon using legacy code contained in the model repository. Falcon has now been fully ported into the Hugging Face transformers library. For the most up-to-date and high-performance version of the Falcon model code, please update to the latest version of transformers and then load the model without the trust_remote_code=True argument.
[the same warning is printed once by each of the 4 spawned ranks; three repetitions omitted]
Loading checkpoint shards: 0%| | 0/9 [00:00<?, ?it/s] [printed once per rank; four in total]
Traceback (most recent call last):
File "/workspace/trt-llm/tensorrt_llm/examples/falcon/build.py", line 563, in <module>
mp.spawn(build, nprocs=args.world_size, args=(args, ))
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 246, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 202, in start_processes
while not context.join():
File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 145, in join
raise ProcessExitedException(
torch.multiprocessing.spawn.ProcessExitedException: process 2 terminated with signal SIGKILL
/usr/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
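`process 2 terminated with signal SIGKILL`, together with the leaked-semaphore warning, is the usual signature of the kernel's OOM killer reaping a process rather than a leak inside TensorRT-LLM; `dmesg` on the host should show a matching `oom-kill` record if so. Two mitigations are visible in the build arguments themselves: drop `--parallel_build` so only one rank's copy of the weights is resident at a time, or pass `--load_by_shard` (logged as `False` above) so each rank reads only the checkpoint shards it needs. The sketch below is illustrative only, showing the memory-frugal Hugging Face loading pattern on its own; `build.py` controls the actual load, so this is the idea behind the fix, not a drop-in patch.

```python
# Illustrative only: loading the checkpoint without the ~2x transient
# peak. build.py performs the real load; this is not a drop-in patch.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/workspace/falcon/tiiuae-falcon-40b",  # model_dir from the log
    torch_dtype=torch.float16,   # keep weights in fp16, not fp32
    low_cpu_mem_usage=True,      # stream shards instead of a 2x peak
)
```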