
Unloading multiple loras: norms do not return to their original values #10745

Open
christopher5106 opened this issue Feb 7, 2025 · 25 comments

@christopher5106

When unloading after loading multiple LoRAs on a Flux pipeline, I believe the norm layers are not restored here.

Shouldn't we have:

        if len(transformer_norm_state_dict) > 0:
            original_norm_layers_state_dict = self._load_norm_into_transformer(
                transformer_norm_state_dict,
                transformer=transformer,
                discard_original_layers=False,
            )
            if not hasattr(transformer, "_transformer_norm_layers"):
                transformer._transformer_norm_layers = original_norm_layers_state_dict
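
For context, here is a minimal repro sketch of the scenario, assuming two hypothetical local LoRA files that both ship norm layer params:

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)

# keep a copy of one norm param from the base transformer for comparison
norm_key = next(k for k in pipe.transformer.state_dict() if "norm" in k and k.endswith("weight"))
original_norm = pipe.transformer.state_dict()[norm_key].clone()

# hypothetical local files; both are assumed to carry norm layer params
pipe.load_lora_weights("lora1_with_norms.safetensors", adapter_name="lora1")
pipe.load_lora_weights("lora2_with_norms.safetensors", adapter_name="lora2")

pipe.unload_lora_weights()

# per this report, the norms can come back as lora1's values instead of the base model's,
# so this comparison can fail
restored_norm = pipe.transformer.state_dict()[norm_key]
torch.testing.assert_close(restored_norm, original_norm)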
@christopher5106 christopher5106 changed the title Loading multiple loras: norms do not return to their original values Unloading multiple loras: norms do not return to their original values Feb 7, 2025
@sayakpaul
Member

Should it not already take care of it?

transformer.load_state_dict(transformer._transformer_norm_layers, strict=False)

What am I missing?

Additionally, the following test does ensure its effectiveness:

def test_lora_unload_with_parameter_expanded_shapes(self):

@christopher5106
Author

Ah, is it possible to call load_lora_weights() multiple times on a pipeline to load multiple sets of weights? Does it unload in between to restore the original weights?

@sayakpaul
Member

If you don’t call unload_lora_weights() it won’t be called automatically.

@christopher5106
Author

So in the case of multiple calls to load_lora_weights(), the attribute _transformer_norm_layers becomes overwritten by the norms of the previously loaded LoRA?

@sayakpaul
Member

If you're loading a Control LoRA and want to keep it alongside others, the norm layer values that came with the Control LoRA will remain as they are. Otherwise, the Control LoRA won't be fully effective.

Or am I misinterpreting the core use case here?

@christopher5106
Author

From what I understand, you are making the assumption that only Control LoRAs have trained norms, right?

What I see in my code is:

for local_weights_cache, adapter_name in loras:
    pipe.load_lora_weights(local_weights_cache, adapter_name=adapter_name)

That means if two LoRAs come with trained norm layers, we lose the original weights. I believe we should make it generic, because we might forget this assumption.

@sayakpaul
Member

From what I understand, you are making the assumption that only Control LoRAs have trained norms, right?

Well, Control LoRA is a bit of a special case in that its state dicts have the exact same norm params as their non-LoRA variants. More specifically, these norm layer params differ from the base Flux.1 Dev model and were taken from the non-LoRA control variants of Flux (so for the Depth Control LoRA, that would be https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev and for the Canny Control LoRA, that would be https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev/). The norm params are not LoRA params.

But on the other hand, it's totally possible to also target the norm layers for applying LoRA (which is something we see often in the community).

So, I don't think there's a need to change anything here.

@christopher5106
Author

But my question is about the following:

I load one LoRA with norm layers, let's say lora1_norms:

  • it saves the original layer weights (let's say original_norms) in _transformer_norm_layers

I load a second LoRA with norm layers, let's say lora2_norms:

  • it saves the overwritten layer weights in _transformer_norm_layers

So now in _transformer_norm_layers I have lora1_norms.

When I call unload_lora_weights(), does it revert to original_norms or to lora1_norms? What did I miss?

@sayakpaul
Member

When I call unload_lora_weights(), does it revert to original_norms or to lora1_norms? What did I miss?

It unloads all the LoRA-overwritten params, including lora1 and lora2.

More notes:
https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#note-about-unloadloraweights-when-using-flux-loras

@christopher5106
Author

christopher5106 commented Feb 15, 2025

Sorry, I don't see what I'm missing.

If I call load_lora_weights twice with two different LoRAs, lora1 and lora2, that both have norm layers, then _load_norm_into_transformer() is called twice and returns the overwritten layers. But it takes these values from the current transformer state dict:

overwritten_layers_state_dict[key] = transformer_state_dict[key].clone()

and if these norms have already been overwritten by lora1, then when you call unload_lora_weights(), it will restore lora1's norms with

transformer.load_state_dict(transformer._transformer_norm_layers, strict=False)

but not the original norms of the model.
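
To make the bookkeeping concrete, here is a simplified sketch of the behavior described above (not the exact diffusers source; the signature is simplified):

def _load_norm_into_transformer(norm_state_dict, transformer, discard_original_layers=False):
    # Simplified sketch: the "original" values that get saved are whatever is
    # currently in the transformer, which may already be a previously loaded
    # LoRA's norm values rather than the base model's.
    transformer_state_dict = transformer.state_dict()
    overwritten_layers_state_dict = {}
    if not discard_original_layers:
        for key in norm_state_dict:
            overwritten_layers_state_dict[key] = transformer_state_dict[key].clone()
    # overwrite the norms with the values shipped in the LoRA state dict
    transformer.load_state_dict(norm_state_dict, strict=False)
    return overwritten_layers_state_dict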

@sayakpaul
Member

Yeah, you're right! But it's also not very common for LoRA state dicts to ship norm layer params. We can prioritize that as and when such a LoRA comes in. For now, we can add a comment to make ourselves aware of it.

@christopher5106
Author

As you want ;)
But you see my point now :-)
I believe it's a wider problem than just the norm layers, since load_state_dict() and state_dict() can load arbitrary tensors into arbitrary layers of our models.

@sayakpaul
Member

But it's also weird for LoRA state dicts to carry arbitrary non-LoRA params. So, there's that.

@christopher5106
Author

BFL could start a new trend where LoRAs with bias and norm params become a way to get better LoRAs, no?

@sayakpaul
Member

And we support that already. My point was about putting arbitrary keys into a LoRA state dict. As and when those things come, we can support them. We cannot speculate about those.

@christopher5106
Author

christopher5106 commented Feb 15, 2025

Some companies run diffusers code in production, and there is now an easy way to attack them: submit two LoRAs, the first with norms that are either infinite or close to zero, and the second also with norms. If these norms are never unloaded, their production will output either black-and-white images or images full of artifacts, no?

@christopher5106
Author

Let me summarize what I understand:

Case 1: A user adds the BFL Canny LoRA and the BFL Depth LoRA to the same pipe: unloading does not restore the original norm layers of Flux dev.

Case 2: An attacker creates a LoRA with norm layers and submits it to different diffusers-based platforms that heavily load/unload LoRAs: unloading does not restore the model's norm layers for Flux.

Is that correct?

@a-r-r-o-w
Member

What you mention is indeed a problem.

We can:

  • either raise an error if a second LoRA with norms is loaded, mentioning that loading only one is supported
  • or only keep the original copy of the norms from the base model and revert to it when unload_lora_weights is called; if any further LoRAs with norms are loaded, we raise a warning and ignore them (a sketch follows this list)
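
A minimal sketch of the second option, reusing the names from the snippet in the issue description (transformer_norm_state_dict, _load_norm_into_transformer) and assuming a module-level logger; this is not the actual diffusers implementation:

    if len(transformer_norm_state_dict) > 0:
        if getattr(transformer, "_transformer_norm_layers", None) is None:
            # first LoRA that ships norm params: load them and remember the base
            # model's values so unload_lora_weights() can revert to them
            transformer._transformer_norm_layers = self._load_norm_into_transformer(
                transformer_norm_state_dict,
                transformer=transformer,
                discard_original_layers=False,
            )
        else:
            # norm params from an earlier LoRA are already in place: warn and ignore the new ones
            logger.warning(
                "Norm layer params were already loaded from a previous LoRA; "
                "ignoring the norm params shipped with this one."
            )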

Anything apart from this is probably too much effort for something not yet done that commonly in the community. Folks that use diffusers in production this way should be mindful of the implications of allowing arbitrary state dicts to be loaded and develop their own countermeasures for such scenarios. WDYT?

@christopher5106
Author

christopher5106 commented Feb 19, 2025

It's only a few lines to add so that already stored norm values are not overridden:

            if hasattr(transformer, "_transformer_norm_layers"):
                for key in original_norm_layers_state_dict:
                    if key not in transformer._transformer_norm_layers:
                        transformer._transformer_norm_layers[key] = original_norm_layers_state_dict[key]
            else:
                transformer._transformer_norm_layers = original_norm_layers_state_dict

Other codebases do this; for example, here only the norm layers that have not already been stored are cloned for later restoration.

@christopher5106
Author

There is another thing that's a bit surprising to me:

        # Flux Control LoRAs also have norm keys
        has_norm_keys = any(
            norm_key in key for key in state_dict.keys() for norm_key in self._control_lora_supported_norm_keys
        )
        if not (has_lora_keys or has_norm_keys):
            raise ValueError("Invalid LoRA checkpoint.")

Shouldn't it simply be if not has_lora_keys? Why is it allowed to load a state dict made of norm keys but no LoRA keys?

Thanks for your clarification.

The less we allow loading layers this way, by overriding other layers, the better.

@christopher5106
Author

Lastly, I'm wondering why only weights for text_encoder_1 are loaded. Anyway, we haven't had any LoRAs submitted with trained text encoder weights so far.

@sayakpaul
Member

Because we don't have T5-trained LoRAs yet.

I have a PR open to warn users if there are unused keys in the state dict: #10187.

We recently added support to load Flux community LoRAs that have text encoder (CLIP) too: #10810.

@christopher5106
Author

christopher5106 commented Feb 21, 2025

Let me share an extract of one LoRA we got:

"diffusion_model.double_blocks.0.img_attn.proj.diff_b","[3072]","torch.float16"
"diffusion_model.double_blocks.0.img_attn.proj.lora_down.weight","[32, 3072]","torch.float16"
"diffusion_model.double_blocks.0.img_attn.proj.lora_up.weight","[3072, 32]","torch.float16"
"diffusion_model.double_blocks.0.img_attn.qkv.diff_b","[9216]","torch.float16""diffusion_model.double_blocks.0.img_attn.qkv.lora_down.weight","[32, 3072]","torch.float16"
"diffusion_model.double_blocks.0.img_attn.qkv.lora_up.weight","[9216, 32]","torch.float16""diffusion_model.double_blocks.0.img_mlp.0.diff_b","[12288]","torch.float16"
"diffusion_model.double_blocks.0.img_mlp.0.lora_down.weight","[32, 3072]","torch.float16""diffusion_model.double_blocks.0.img_mlp.0.lora_up.weight","[12288, 32]","torch.float16"
"diffusion_model.double_blocks.0.img_mlp.2.diff_b","[3072]","torch.float16""diffusion_model.double_blocks.0.img_mlp.2.lora_down.weight","[32, 12288]","torch.float16"
"diffusion_model.double_blocks.0.img_mlp.2.lora_up.weight","[3072, 32]","torch.float16""diffusion_model.double_blocks.0.img_mod.lin.diff_b","[18432]","torch.float16"
"diffusion_model.double_blocks.0.img_mod.lin.lora_down.weight","[32, 3072]","torch.float16""diffusion_model.double_blocks.0.img_mod.lin.lora_up.weight","[18432, 32]","torch.float16"
...
"diffusion_model.single_blocks.0.modulation.lin.diff_b","[9216]","torch.float16""diffusion_model.single_blocks.0.modulation.lin.lora_down.weight","[32, 3072]","torch.float16""diffusion_model.single_blocks.0.modulation.lin.lora_up.weight","[9216, 32]","torch.float16"
"diffusion_model.single_blocks.1.linear1.diff_b","[21504]","torch.float16"
...
"diffusion_model.vector_in.in_layer.diff_b","[3072]","torch.float16"
"diffusion_model.vector_in.in_layer.lora_down.weight","[32, 768]","torch.float16"
"diffusion_model.vector_in.in_layer.lora_up.weight","[3072, 32]","torch.float16"
"diffusion_model.vector_in.out_layer.diff_b","[3072]","torch.float16"
"diffusion_model.vector_in.out_layer.lora_down.weight","[32, 3072]","torch.float16"
"diffusion_model.vector_in.out_layer.lora_up.weight","[3072, 32]","torch.float16""text_encoders.clip_l.transformer.text_model.embeddings.position_embedding.lora_down.weight","[32, 768]","torch.float16"
"text_encoders.clip_l.transformer.text_model.embeddings.position_embedding.lora_up.weight","[77, 32]","torch.float16"
"text_encoders.clip_l.transformer.text_model.encoder.layers.0.layer_norm1.diff","[768]","torch.float16"
"text_encoders.clip_l.transformer.text_model.encoder.layers.0.layer_norm1.diff_b","[768]","torch.float16"
"text_encoders.clip_l.transformer.text_model.encoder.layers.0.layer_norm2.diff","[768]","torch.float16"
...
"text_encoders.t5xxl.transformer.encoder.block.0.layer.0.layer_norm.diff","[4096]","torch.float16"
"text_encoders.t5xxl.transformer.encoder.block.0.layer.1.DenseReluDense.wi_0.lora_down.weight","[32, 4096]","torch.float16"
"text_encoders.t5xxl.transformer.encoder.block.0.layer.1.DenseReluDense.wi_0.lora_up.weight","[10240, 32]","torch.float16"
"text_encoders.t5xxl.transformer.encoder.block.0.layer.1.DenseReluDense.wi_1.lora_down.weight","[32, 4096]","torch.float16"
"text_encoders.t5xxl.transformer.encoder.block.0.layer.1.DenseReluDense.wi_1.lora_up.weight","[10240, 32]","torch.float16"
"text_encoders.t5xxl.transformer.encoder.block.0.layer.1.DenseReluDense.wo.lora_down.weight","[32, 10240]","torch.float16"
"text_encoders.t5xxl.transformer.encoder.block.0.layer.1.DenseReluDense.wo.lora_up.weight","[4096, 32]","torch.float16"
"text_encoders.t5xxl.transformer.encoder.block.0.layer.1.layer_norm.diff","[4096]","torch.float16"

Looks like users train Flux with T5, or is it another model?
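
For reference, an extract like the one above can be produced with a short script along these lines (the file path here is a hypothetical placeholder):

from safetensors import safe_open

# print key, shape, and dtype for every tensor stored in the LoRA file
with safe_open("community_flux_lora.safetensors", framework="pt") as f:
    for key in f.keys():
        tensor = f.get_tensor(key)
        print(f'"{key}","{list(tensor.shape)}","{tensor.dtype}"')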

@sayakpaul
Member

Nice, it's the first time I've seen a T5-trained LoRA. Is it Flux? By the way, we are deviating from the original thread, so let's please move this discussion to a new one (as a feature request) and I will add support.

@christopher5106
Author

#10862 yes
