CalledProcessError: Command '['hostname -I']' died with <Signals.SIGSEGV: 11>.
#2837
Comments
Hi @saforem2 - sorry for the late reply. This looks to be because the permissions of the Also would you mind making a PR with the suggested change?
Yeah, no worries. Honestly this was actually a (seemingly?) intermittent issue (that I haven't seen in a while, come to think of it), so I never bothered to pin it down. I guess they both achieve the same thing, though maybe the method using but yeah, happy to submit a PR if you think this would be preferred.
Actually, I'm seeing different results on my machine using the two approaches: hostname is returning the IPv6 address, while the socket method seems to be returning my machine name. To avoid breaking other things, perhaps we leave it as it is, given that it's an issue you haven't seen in a while?
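For anyone wanting to reproduce the comparison above, here is a minimal sketch of the two approaches side by side. The exact snippets are not shown in this thread, so the function names are hypothetical and the use of `socket.gethostname()` / `socket.gethostbyname()` is an assumption about what "the socket method" refers to. It is not the DeepSpeed implementation.

```python
import socket
import subprocess


def addr_from_hostname_cmd():
    """Return the first address printed by `hostname -I`, or None on failure."""
    try:
        out = subprocess.check_output(["hostname", "-I"], stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, OSError):
        # CalledProcessError covers non-zero exits and deaths by signal
        # (e.g. SIGSEGV); OSError covers a missing `hostname` binary.
        return None
    fields = out.decode("utf-8").split()
    return fields[0] if fields else None


def addr_from_socket():
    """Return what the resolver maps the local hostname to, or None on failure.

    Assumption: this stands in for "the socket method" mentioned in the thread.
    """
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


if __name__ == "__main__":
    print("hostname -I        :", addr_from_hostname_cmd())
    print("socket.gethostname :", socket.gethostname())
    print("gethostbyname      :", addr_from_socket())
```

Running this on each node shows whether the two methods agree. They can differ because `hostname -I` lists the addresses configured on the host's interfaces, while the socket call depends on how the machine name resolves (for example via `/etc/hosts`).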
yeah sounds good, happy to close this then
Thanks for reporting the bug, and hopefully whatever was causing it remains fixed!
Not sure of the cause, but when trying to run multi-node training (launching with mpich), I'm getting the following error:
The error originates from deepspeed/comm/comm.py: https://github.com/microsoft/DeepSpeed/blob/46784cb58edf7bbe9b6bbec95212de7b81e55b01/deepspeed/comm/comm.py#L676
An easy fix would be replacing the
with