Failed to execute cutlass gemm: internal error on Turing. #167

Open
Ph0rk0z opened this issue Dec 8, 2024 · 2 comments

Comments

Ph0rk0z commented Dec 8, 2024

When you enable CUDA graphs on Turing, the compile fails with "Failed to execute cutlass gemm: internal error". It succeeds on a 3090, but that card doesn't need the speedup. Is there something incompatible with Turing? I built from source.

Ph0rk0z commented Jan 6, 2025

I found out that compiling the C extension for multiple architectures causes import errors.
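
For anyone hitting the same import errors, one thing to try is building for a single architecture only. The sketch below is an assumption on my part, not something confirmed for this project: it relies on the build honoring `TORCH_CUDA_ARCH_LIST`, the environment variable that PyTorch's extension builder reads, and `single_arch_list` is a hypothetical helper name.

```python
import os


def single_arch_list(capability):
    """Format a (major, minor) compute capability as a one-entry arch list.

    Hypothetical helper for illustration; it is not part of stable-fast.
    """
    major, minor = capability
    return f"{major}.{minor}"


# Turing cards (e.g. the RTX 20 series) report compute capability 7.5.
# Set the variable before starting the source build so that only this
# one architecture is targeted.
os.environ["TORCH_CUDA_ARCH_LIST"] = single_arch_list((7, 5))
```

On a machine with the GPU installed, the tuple can come from `torch.cuda.get_device_capability()` instead of being hard-coded.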

Ph0rk0z commented Jan 6, 2025

The source of the error is this function signature:

def _modify_model(
    m,
    enable_cnn_optimization=True,
    enable_fused_linear_geglu=True,
    prefer_lowp_gemm=True,
    enable_triton=False,
    enable_triton_reshape=False,
    enable_triton_layer_norm=False,
    memory_format=None,
):

inside: https://github.com/chengzeyi/stable-fast/blob/main/src/sfast/compilers/diffusion_pipeline_compiler.py

enable_fused_linear_geglu=True,

The fused GEGLU path is always enabled, even on an unsupported architecture. I changed the default to False and off I go.
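
Rather than hard-coding False, the flag could be chosen from the GPU's compute capability. This is only a rough sketch: the (8, 0) cutoff is my assumption, based on the kernel working on a 3090 (capability 8.6) and failing on Turing (7.5), and both function names are hypothetical, not part of stable-fast.

```python
# Assumed minimum compute capability for the fused linear GEGLU kernel.
# Nothing in this thread confirms the exact cutoff; (8, 0) is a guess that
# separates Turing (7.5) from Ampere cards such as the 3090 (8.6).
ASSUMED_MIN_CAPABILITY = (8, 0)


def supports_fused_geglu(capability):
    """Return True if a (major, minor) capability meets the assumed minimum."""
    return tuple(capability) >= ASSUMED_MIN_CAPABILITY


def fused_geglu_flag_for_current_device():
    """Pick a value for enable_fused_linear_geglu on the current device.

    Falls back to False when torch or a CUDA device is unavailable.
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    return supports_fused_geglu(torch.cuda.get_device_capability())
```

With this, `supports_fused_geglu((7, 5))` is False and `supports_fused_geglu((8, 6))` is True, so the result could be passed as `enable_fused_linear_geglu` instead of editing the default by hand.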

I do wonder why SD.Next is faster than ComfyUI, though. Maybe it's due to diffusers?
