We would like to see the output generated with NCCL_DEBUG=INFO. You might want to add NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH,TUNING,NET to the mix for some additional info that might help.
Is this reproducible with the latest NCCL version (2.25.1)?
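For anyone collecting the requested output, a minimal sketch of one way to capture it follows. It assumes the nccl-tests binary sits at ./build/all_reduce_perf as in the report; the log file name and the 300-second timeout are illustrative choices, not something NCCL requires.

```shell
# Verbose NCCL logging with the subsystems requested in the comment above.
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH,TUNING,NET

# Assumed location of the nccl-tests binary (taken from the report); adjust as needed.
TEST_BIN=./build/all_reduce_perf
# Hypothetical log file name, chosen only for this example.
LOG_FILE=nccl_debug.log

if [ -x "$TEST_BIN" ]; then
  # The run is reported to hang, so cap it with timeout (300 s is an arbitrary limit).
  timeout 300 "$TEST_BIN" -b 8 -e 128M -f 2 -g 2 2>&1 | tee "$LOG_FILE"
else
  echo "all_reduce_perf not found at $TEST_BIN" >&2
fi
```

The resulting log file can then be attached to the issue in full instead of pasting excerpts.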
Settings that lead to error 3:
(myenv) exx@sn4622123662:~/Desktop/fine-tune$ printenv | grep ^NCCL
NCCL_LEGACY_CUDA_REGISTER=0
NCCL_P2P_DISABLE=1
NCCL_SOCKET_IFNAME=eth0
NCCL_P2P_LEVEL=NVL
NCCL_DEBUG=INFO
NCCL_SET_STACK_SIZE=1
NCCL_CUMEM_ENABLE=0
Removing all settings leads to error 2. Disabling SHM or disabling P2P, each on its own, also reproduces error 2.
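A rough sketch of how the "no settings" baseline and the one-setting-at-a-time runs could be reproduced from a single shell. The helper name clear_nccl_env is made up for this example, and the commented commands assume the same all_reduce_perf invocation used elsewhere in this report.

```shell
# Hypothetical helper: unset every NCCL_* variable in the current shell
# so the next run starts from NCCL's defaults.
clear_nccl_env() {
  for name in $(env | grep '^NCCL_' | cut -d= -f1); do
    unset "$name"
  done
}

clear_nccl_env

# Keep logging on, then toggle exactly one setting per run, for example:
export NCCL_DEBUG=INFO
# NCCL_SHM_DISABLE=1 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2
# NCCL_P2P_DISABLE=1 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2
```

Prefixing the variable on the command line limits it to that one run, so the settings do not leak into the next test.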
GPUs run well on their own
NCCL test without MPI fails and hangs.
./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2