[Bug] ControlNet execution fails with CUDNN_STATUS_NOT_SUPPORTED error due to CUDNN tensor contiguity issue in convolutions #169
Comments
Additional Information

I have attempted to ensure input and output contiguity, but the error still occurs. Strangely, it only occurs with some input sizes and not others: one shape triggers the error while another works fine.
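For context on why the failure can be shape-dependent: cuDNN's fast convolution paths generally expect C-contiguous (row-major) input, and whether a given view ends up contiguous depends on its shape and strides. Below is a minimal, framework-free sketch of the stride check; it illustrates the idea only and is not PyTorch's actual implementation.

```python
def contiguous_strides(shape):
    # Row-major (C-order) strides in elements: the last dimension varies fastest.
    strides = [0] * len(shape)
    acc = 1
    for i in range(len(shape) - 1, -1, -1):
        strides[i] = acc
        acc *= shape[i]
    return tuple(strides)


def is_c_contiguous(shape, strides):
    # A tensor is C-contiguous exactly when its strides match the row-major layout.
    return tuple(strides) == contiguous_strides(shape)


# An NCHW tensor of shape (1, 3, 64, 64) stored row-major:
print(contiguous_strides((1, 3, 64, 64)))  # (12288, 4096, 64, 1)

# The same storage viewed channels-last has permuted strides and fails the check:
print(is_c_contiguous((1, 3, 64, 64), (12288, 1, 192, 3)))  # False
```

A permutation or slice that happens to keep row-major strides for one shape can break them for another, which would match the observed "some sizes fail, some don't" behavior.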
Here is my code to enforce contiguity:

```python
import torch


def enforce_contiguous_hook(layer_name: str):
    """
    Creates hooks that enforce contiguous tensors for both inputs and outputs
    of Conv2d layers.

    Args:
        layer_name (str): Name of the layer being hooked

    Returns:
        pre_hook_fn: Forward pre-hook function for inputs
        hook_fn: Forward hook function for outputs
    """

    def pre_hook_fn(module, inputs):
        inputs = list(inputs)
        for idx, input_tensor in enumerate(inputs):
            if (
                isinstance(input_tensor, torch.Tensor)
                and not input_tensor.is_contiguous()
            ):
                print(f"🔄 Making input contiguous in {layer_name}")
                inputs[idx] = input_tensor.contiguous()
        return tuple(inputs)

    def hook_fn(module, inputs, outputs):
        if isinstance(outputs, torch.Tensor):
            if not outputs.is_contiguous():
                print(f"🔄 Making output contiguous in {layer_name}")
                outputs = outputs.contiguous()
        else:
            # Handle case where outputs is a tuple
            outputs = list(outputs)
            for idx, output_tensor in enumerate(outputs):
                if (
                    isinstance(output_tensor, torch.Tensor)
                    and not output_tensor.is_contiguous()
                ):
                    print(f"🔄 Making output {idx} contiguous in {layer_name}")
                    outputs[idx] = output_tensor.contiguous()
            outputs = tuple(outputs)
        return outputs

    return pre_hook_fn, hook_fn


def register_contiguous_enforcement(model):
    """
    Register hooks that enforce contiguous tensors for Conv2d layers.

    Args:
        model (nn.Module): PyTorch model to enforce contiguity
    """
    for name, module in model.named_modules():
        pre_hook_fn, hook_fn = enforce_contiguous_hook(name)
        module.register_forward_pre_hook(pre_hook_fn)
        module.register_forward_hook(hook_fn)
```
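The hook code above relies on a specific contract: a forward pre-hook may return a replacement tuple of inputs (or `None` to keep them unchanged). Here is a minimal, PyTorch-free sketch of that mechanism; `FakeTensor` and `MiniModule` are illustrative stand-ins, not real torch classes.

```python
class FakeTensor:
    """Tiny stand-in for a tensor that only tracks contiguity (illustrative only)."""

    def __init__(self, contiguous=True):
        self._contiguous = contiguous

    def is_contiguous(self):
        return self._contiguous

    def contiguous(self):
        return FakeTensor(contiguous=True)


class MiniModule:
    """Minimal module emulating register_forward_pre_hook semantics:
    a pre-hook may return a replacement tuple of inputs (or None to keep them)."""

    def __init__(self):
        self._pre_hooks = []

    def register_forward_pre_hook(self, fn):
        self._pre_hooks.append(fn)

    def forward(self, x):
        return x  # identity; a real Conv2d would convolve here

    def __call__(self, *inputs):
        for fn in self._pre_hooks:
            result = fn(self, inputs)
            if result is not None:
                inputs = result
        return self.forward(*inputs)


def make_contiguous_pre_hook(layer_name):
    def pre_hook(module, inputs):
        # Replace any non-contiguous input with a contiguous copy.
        return tuple(
            t.contiguous() if not t.is_contiguous() else t for t in inputs
        )

    return pre_hook


m = MiniModule()
m.register_forward_pre_hook(make_contiguous_pre_hook("conv1"))
out = m(FakeTensor(contiguous=False))
print(out.is_contiguous())  # True: the pre-hook swapped in a contiguous copy
```

Note that even with such hooks in place, operations inside a compiled or fused region can still produce non-contiguous intermediates that the hooks never see, which may be why the enforcement above did not fix the cuDNN error.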
@Kinyugo Can you try setting `enable_cnn_optimization = False` when compiling the controlnet?
@chengzeyi Thanks for the fix; compilation works with `enable_cnn_optimization = False`. Do you have an idea of how much performance this costs? The controlnet usually has large image inputs.
It should not be very large, I guess. Besides, my current focus is on newer models, so if you are interested you could look at my other projects, such as ParaAttention or Comfy-WaveSpeed.
Thank you, I will take a look at those projects. Here is the updated code. Let me know if you would like to add the option to skip cnn optimizations for the controlnet; I could add an option to the config and open a PR with the changes.

```python
import copy

import torch

from sfast.compilers.diffusion_pipeline_compiler import (
    _build_lazy_trace,
    compile_unet,
    compile_vae,
    make_dynamic_graphed_callable,
)


def sfast_compile(m, config):
    # attribute `device` is not generally available
    device = (
        m.device
        if hasattr(m, "device")
        else torch.device("cuda" if torch.cuda.is_available() else "cpu")
    )
    enable_cuda_graph = config.enable_cuda_graph and device.type == "cuda"

    m.unet = compile_unet(m.unet, config)
    if hasattr(m, "controlnet"):
        controlnet_config = copy.deepcopy(config)
        controlnet_config.enable_cnn_optimization = False
        m.controlnet = compile_unet(m.controlnet, controlnet_config)
    m.vae = compile_vae(m.vae, config)

    if config.enable_jit:
        lazy_trace_ = _build_lazy_trace(config)
        if getattr(m, "text_encoder", None) is not None:
            m.text_encoder.forward = lazy_trace_(m.text_encoder.forward)
        # for SDXL
        if getattr(m, "text_encoder_2", None) is not None:
            m.text_encoder_2.forward = lazy_trace_(m.text_encoder_2.forward)
        # for SVD
        if getattr(m, "image_encoder", None) is not None:
            m.image_encoder.forward = lazy_trace_(m.image_encoder.forward)
        if config.trace_scheduler:
            m.scheduler.scale_model_input = lazy_trace_(m.scheduler.scale_model_input)
            m.scheduler.step = lazy_trace_(m.scheduler.step)

    if enable_cuda_graph:
        if getattr(m, "text_encoder", None) is not None:
            m.text_encoder.forward = make_dynamic_graphed_callable(
                m.text_encoder.forward
            )
        if getattr(m, "text_encoder_2", None) is not None:
            m.text_encoder_2.forward = make_dynamic_graphed_callable(
                m.text_encoder_2.forward
            )
        if getattr(m, "image_encoder", None) is not None:
            m.image_encoder.forward = make_dynamic_graphed_callable(
                m.image_encoder.forward
            )

    if hasattr(m, "image_processor"):
        from sfast.libs.diffusers.image_processor import patch_image_prcessor

        patch_image_prcessor(m.image_processor)

    return m
```
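One detail worth noting in the code above: the config is copied before `enable_cnn_optimization` is disabled, so the UNet and VAE keep the original settings. A minimal sketch of that pattern with a hypothetical `Config` dataclass (a stand-in, not sfast's actual `CompilationConfig`):

```python
import copy
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for a compiler config; field names mirror the thread.
    enable_cnn_optimization: bool = True
    enable_cuda_graph: bool = True


base = Config()

# Copy first, then flip the flag only for the ControlNet branch.
controlnet_config = copy.deepcopy(base)
controlnet_config.enable_cnn_optimization = False

print(base.enable_cnn_optimization)               # True: original untouched
print(controlnet_config.enable_cnn_optimization)  # False
```

`copy.deepcopy` is the safe default here: if the config ever holds nested option objects, mutating the copy still cannot leak back into the settings used for the other modules.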
Summary

The stable_fast library fails to run the ControlNet module in a Stable Diffusion Image-to-Image pipeline. The error occurs specifically during CUDNN convolution operations.

Steps to Reproduce

Compile the pipeline with `do_compile_controlnet=True`.

Error

Investigation Findings

Applying `.contiguous()` to ControlNet inputs did not resolve the issue. Here is the code I used to selectively compile different parts of the pipeline:

Questions

Environment Details