Failed to execute cutlass gemm: internal error on Turing. #167

Ph0rk0z · 2024-12-08T17:16:35Z

When you enable CUDA graphs on Turing, the compile fails. It succeeds on a 3090, but that card doesn't need the speedup. Is there something incompatible? I built from source.

Ph0rk0z · 2025-01-06T13:28:27Z

I found out that compiling the C extension for multiple architectures causes import errors.
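One possible workaround, sketched under assumptions: if stable-fast's build goes through PyTorch's `torch.utils.cpp_extension` (as most PyTorch C++/CUDA extensions do), the `TORCH_CUDA_ARCH_LIST` environment variable restricts compilation to the listed architectures, so the extension is built only for the card you actually run on. The helpers `arch_list` and `local_arch_list` below are hypothetical. They just format `(major, minor)` compute capabilities into that variable's `major.minor` form.

```python
# Hypothetical helpers: format GPU compute capabilities for TORCH_CUDA_ARCH_LIST,
# the environment variable PyTorch's cpp_extension build reads to pick target archs.


def arch_list(capabilities):
    """Turn (major, minor) tuples into a TORCH_CUDA_ARCH_LIST string.

    Duplicates are dropped and entries sorted, e.g. [(8, 6), (7, 5)] -> "7.5;8.6".
    """
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


def local_arch_list():
    """Build the string for the GPUs in this machine (requires PyTorch with CUDA)."""
    import torch  # imported lazily so arch_list() works without PyTorch installed

    caps = [torch.cuda.get_device_capability(i) for i in range(torch.cuda.device_count())]
    return arch_list(caps)


if __name__ == "__main__":
    # A single Turing card (compute capability 7.5) yields "7.5".
    print(arch_list([(7, 5)]))
```

For a single Turing card the value is `7.5`, so the build would be run as `TORCH_CUDA_ARCH_LIST=7.5 pip install .` from the source checkout. Whether this avoids the import errors is not confirmed in this thread.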

Ph0rk0z · 2025-01-06T19:22:47Z

The source of the error is this function:

```python
def _modify_model(
    m,
    enable_cnn_optimization=True,
    enable_fused_linear_geglu=True,
    prefer_lowp_gemm=True,
    enable_triton=False,
    enable_triton_reshape=False,
    enable_triton_layer_norm=False,
    memory_format=None,
):
    ...  # body omitted
```

in https://github.com/chengzeyi/stable-fast/blob/main/src/sfast/compilers/diffusion_pipeline_compiler.py

The culprit is `enable_fused_linear_geglu=True`.

The fused GEGLU path is always enabled, even on unsupported architectures. I changed the default to `False` and now it works.
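Instead of flipping the default for every GPU, the flag could be chosen per device. This is a minimal sketch that assumes, based only on this report, that the fused kernel fails on Turing (compute capability 7.5) and works on an RTX 3090 (8.6). It therefore treats 8.0 and newer as supported, though the real cutoff may differ. `supports_fused_linear_geglu` is a hypothetical helper. `torch.cuda.get_device_capability()` is the standard PyTorch call that returns `(major, minor)`.

```python
def supports_fused_linear_geglu(capability):
    """Guess whether to enable stable-fast's fused linear+GEGLU path on a GPU.

    ASSUMPTION: the cutoff comes only from this report (fails on Turing, 7.5;
    works on an RTX 3090, 8.6), so compute capability 8.0+ is treated as OK.
    """
    major, minor = capability
    return (major, minor) >= (8, 0)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability()
        # The result would be passed as enable_fused_linear_geglu=... to _modify_model.
        print(cap, "fused GEGLU:", supports_fused_linear_geglu(cap))
```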

I do wonder why SDnext is faster than Comfy, though. Maybe it's due to diffusers?

Ph0rk0z mentioned this issue Jan 15, 2025

SDXL support? welltop-cn/ComfyUI-TeaCache#22

Open
