
v0.4.0+ breaks compatibility with Nvidia Deep Learning Containers #892

Closed
tasercake opened this issue Oct 18, 2022 · 4 comments · Fixed by #919
Labels
bug Something isn't working

Comments

@tasercake
Contributor

tasercake commented Oct 18, 2022

Describe the bug

Nvidia's deep learning containers are a popular way to run machine learning workloads on top of Docker.

With diffusers 0.4.0+, I'm unable to import diffusers inside this container because torch.backends.mps doesn't exist.

The culprit appears to be this line in src/diffusers/utils/testing_utils.py:

if is_torch_higher_equal_than_1_12:
    torch_device = "mps" if torch.backends.mps.is_available() else torch_device

The torch installation in the container doesn't include the MPS backend, so importing diffusers raises the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/diffusers/__init__.py", line 1, in <module>
    from .utils import (
  File "/opt/conda/lib/python3.8/site-packages/diffusers/utils/__init__.py", line 43, in <module>
    from .testing_utils import floats_tensor, load_image, parse_flag_from_env, slow, torch_device
  File "/opt/conda/lib/python3.8/site-packages/diffusers/utils/testing_utils.py", line 23, in <module>
    torch_device = "mps" if torch.backends.mps.is_available() else torch_device
AttributeError: module 'torch.backends' has no attribute 'mps'

Reproduction

(assuming you have Docker installed & configured)

Start the deep learning container

docker run -it --rm --platform linux/amd64 nvcr.io/nvidia/pytorch:22.04-py3 bash

Inside the container:

# Install diffusers (>=0.4.0)
pip install diffusers==0.4.0

# Import diffusers from python
python -c 'import diffusers'

Workaround

Adding another condition to the MPS check seems to fix at least the import issue for me:

if is_torch_higher_equal_than_1_12:
    torch_device = "mps" if (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()) else torch_device

Happy to contribute this as a PR if appropriate.

System Info

I've only tested this with version 22.04 of the deep learning container from Nvidia because it's the latest one that comes with torch==1.12.0.

Output from running diffusers-cli env inside the container:

  • diffusers version: 0.4.0 (also tested with 0.5.1)
  • Platform: Linux-4.19.121-linuxkit-x86_64-with-glibc2.10
  • Python version: 3.8.13
  • PyTorch version (GPU?): 1.12.0a0+bd13bc6 (False)
  • Huggingface_hub version: 0.10.1
  • Transformers version: 4.23.1
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No
@tasercake tasercake added the bug Something isn't working label Oct 18, 2022
@keturn
Contributor

keturn commented Oct 18, 2022

Odd. Why would the PyTorch API define a torch.backends.mps.is_built function if torch.backends.mps wasn't supposed to be available even on non-MPS builds?

#849 suggests diffusers is PyTorch 1.13 compatible. Does it work if you use the newest version of the NGC container instead?

@tasercake
Contributor Author

tasercake commented Oct 18, 2022

Odd indeed. Tested this on nvcr.io/nvidia/pytorch:22.09-py3, which ships with PyTorch 1.13, and I'm able to import diffusers just fine since torch.backends.mps is present.

Interestingly, the v1.12 docs for torch.backends don't mention mps in the header section (but the 1.13 docs do). Not sure if this was just an oversight.

@pcuenca
Member

pcuenca commented Oct 18, 2022

As far as I know, the mps backend was added in PyTorch 1.12: https://pytorch.org/docs/1.12/notes/mps.html. This looks like an issue with that container.

Feel free to open a PR to make it a bit more robust :)

@tasercake
Contributor Author

tasercake commented Oct 18, 2022

Ran a few tests:

# Pull image & run container
docker run -it --platform linux/amd64 nvcr.io/nvidia/pytorch:<version>-py3 bash

# Inside container, install & import diffusers
pip install diffusers
python -c 'import diffusers'

My results:

Nvidia container version | PyTorch version  | Did it work?
22.09                    | 1.13.0a0+d0d6b1f | Yes
22.06                    | 1.13.0a0+340c412 |
22.05                    | 1.12.0a0+8a1a93a |
22.04                    | 1.12.0a0+bd13bc6 | No

Will open a PR with the above workaround shortly.
