[20.03] nvidia-docker: failure to find libnvidia-ml.so on LD_LIBRARY_PATH #83713
Comments
This was merged in 19.09 370d3af but reverted because applications had issues. We fixed those during 20.03 development so it is still disabled.
I'm guessing you also tried adding it to PATH? (Which is really weird, considering it's a lib.)
Have you found any temporary workaround for this? I've had to revert to 19.09, which is unfortunate.
Not yet. I've tried starting the daemon directly with LD_LIBRARY_PATH and LD_PRELOAD pointing to libnvidia-ml.so. I'm wondering if this would be fixed by bumping the version of nvidia-container-toolkit/runc; I haven't had time yet to try updating or patching the packages.
I could reproduce this issue on my machine on 20.03, as could @tomberek.
It looks like nvidia-docker is still using nvidia-docker2 rather than the newer nvidia-container-toolkit. That may warrant upgrading the Nix nvidia-docker ecosystem: the newer version seems simpler (it doesn't involve replacing Docker's runc), and nvidia-container-toolkit is also compatible with Podman (a rootless Docker alternative by Red Hat).
Well this is odd. First error makes sense, as there indeed is a mismatch:
Adding some symlinks manually works, but it is not pretty.
Another approach is to override LD_CONFIG_PATH and mount in the host's drivers:
This seems to have something to do with ldconfig not liking patchelf'd libraries. It also seems others have addressed the problem before (#51733), but those fixes no longer work.
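For anyone trying to narrow down the ldconfig angle, here is a rough Python sketch (not something taken from this thread) that parses the text printed by `ldconfig -p` and reports which cache entries match a library name. The helper names `parse_ldconfig_cache`, `cache_paths_for` and `read_ldconfig_cache` are made up for illustration, and the sketch assumes the usual glibc output shape of `name (flags) => path`. It only shows what the cache contains; it does not prove or disprove the patchelf theory.

```python
import re
import subprocess
from typing import List, Tuple

# One cache entry in `ldconfig -p` output is assumed to look like:
#   <tab>libfoo.so.1 (libc6,x86-64) => /some/dir/libfoo.so.1
_ENTRY = re.compile(r"^\s*(\S+)\s+\(([^)]*)\)\s+=>\s+(\S.*)$")


def parse_ldconfig_cache(text: str) -> List[Tuple[str, str]]:
    """Return (library name, resolved path) pairs found in `ldconfig -p` text."""
    entries = []
    for line in text.splitlines():
        match = _ENTRY.match(line)
        if match:  # the "N libs found in cache ..." header does not match
            entries.append((match.group(1), match.group(3).strip()))
    return entries


def cache_paths_for(text: str, prefix: str = "libnvidia-ml.so") -> List[str]:
    """Return cached paths for libraries whose name starts with `prefix`."""
    return [path for name, path in parse_ldconfig_cache(text)
            if name.startswith(prefix)]


def read_ldconfig_cache() -> str:
    """Run `ldconfig -p`; raises FileNotFoundError if ldconfig is not on PATH."""
    result = subprocess.run(["ldconfig", "-p"], capture_output=True,
                            text=True, check=False)
    return result.stdout
```

Calling `cache_paths_for(read_ldconfig_cache())` inside the environment where the driver lookup fails should return an empty list if the cache has no entry for libnvidia-ml.so.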
Please try this libnvc-container upgrade as a workaround: averelld@f295c70
@averelld: your patch works.
@averelld Did you want to be in charge of the PR? Otherwise I can handle it. I still think there should be a discussion about whether we should bother keeping nvidia-docker if we move to nvidia-container-toolkit (I don't see the point personally), but this at least closes out the issue.
Nice. I'm also in favor of not keeping the legacy wrapper.
Describe the bug
On 19.09, the OpenGL/graphics libraries populate LD_LIBRARY_PATH. When the appropriate configuration is set to replicate this behavior on 20.03, nvidia-docker fails to initialize CUDA.
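As a quick sanity check on whether LD_LIBRARY_PATH really exposes the library, something like the following Python sketch may help. It is not from the original report: `find_on_ld_library_path` is a made-up helper that lists files whose names start with `libnvidia-ml.so` in each directory on the path. It inspects only the environment it runs in, so it says nothing about what the Docker daemon or the container runtime sees.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_on_ld_library_path(prefix: str = "libnvidia-ml.so",
                            ld_library_path: Optional[str] = None) -> List[str]:
    """List files named `prefix*` in the directories of LD_LIBRARY_PATH.

    `ld_library_path` defaults to the current process environment; pass a
    string explicitly to check a value copied from another environment.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    matches = []
    for entry in ld_library_path.split(":"):
        if not entry:
            continue  # empty entries are skipped in this sketch
        directory = Path(entry)
        if not directory.is_dir():
            continue  # ignore entries that do not exist
        for candidate in sorted(directory.iterdir()):
            if candidate.name.startswith(prefix):
                matches.append(str(candidate))
    return matches


if __name__ == "__main__":
    hits = find_on_ld_library_path()
    if hits:
        print("\n".join(hits))
    else:
        print("libnvidia-ml.so not found on LD_LIBRARY_PATH")
```

Running it once in a 19.09 shell and once in a 20.03 shell would show whether the two environments differ in what they expose.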
Steps to replicate
Pertinent section of configuration.nix