Add Minimal Implementation of Masked Weight Loss #236
Conversation
How is the mask used by the script? Is it separate files in the image folder, one for each image with a face? Does pure white mean full weight for learning, black mean don't look at it, and 50% grey mean look at it but not as much as the face area? Could you share the test dataset for testing and studying? This is an interesting concept. More work, but I think the results are well worth it. |
Here is the dataset I showed above. The masks are saved as the alpha channel of the PNGs, which, as long as you don't pre-multiply, keeps the RGB channels intact and prevents yet another instance of find-the-matching-file like exists with captions and .npz files. I made these masks in GIMP, but I've also experimented with depth maps on my private dataset. You are correct: fully white areas experience normal loss, and black areas are completely removed from the loss. The gray in these masks is 75% I believe; I will run again in the next day with 50% and 25% minimums. Feel free to check my math, and some things like the *8 in the reshape are straight from @cloneofsimo, and I don't know exactly how that operates. |
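For readers following along, here is a minimal sketch of reading such an alpha-channel mask; the file name and variable names are illustrative, not from the PR:

```python
# Minimal sketch: read an RGBA PNG, keep the color channels untouched,
# and turn the alpha channel into a 0-1 weight mask.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("example.png").convert("RGBA"))  # "example.png" is hypothetical
rgb = img[..., :3]                                            # RGB stays intact (no pre-multiply)
mask = img[..., 3].astype(np.float32) / 255.0                 # alpha -> weights in [0, 1]
```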
Preliminary results of a crude test look impressive. Here is a graph showing the average loss of the non-masked training vs the masked training: masked training is the one in orange. The non-masked training is not yet completed... but the samples from the masked training are pretty good. I have never seen a loss this low... and the results are still ok... Usually a loss this low means a super fried model... but not this time. |
And here is a quick comparison: With masked face: Without masked face: Same seed, same prompt. This is shocking. The masked output is much more like the dataset... very interesting PR. The proportions of the non-masked model are all over the place. The masked one appears to have learned much better. I guess masking must help the trainer focus attention only on important aspects of the subject and not spend time learning blurry backgrounds, etc. I used Photoshop to create the mask and isolated the subject as part of the masking. Here is an example source and masked image: I applied a 50% grey mask over the body and hair, no mask over the face, and black over the background on each of the 16 source images. |
This is to be expected, as the tensors are being modified before the mean squared error step. The errors in the larger blacked-out backgrounds are not being counted at all. My back-of-the-envelope math says you can't account for the variation by a single factor; you'd have to run the MSE twice in order to get a number comparable to the unmasked training. This complicates tracking losses against unmasked runs, but similar runs of masked datasets have similar losses in my experimentation. |
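As described in the PR text below, the mask is multiplied into both tensors right before the MSE step; a minimal sketch of that step, with shapes and names assumed for illustration:

```python
import torch
import torch.nn.functional as F

def masked_loss(noise_pred, target, mask):
    # noise_pred, target: (batch, 4, 64, 64) latent-space tensors
    # mask: (batch, 1, 64, 64) weights in [0, 1], broadcast over channels.
    # Black (0) regions contribute nothing to the error, so the reported
    # loss is not directly comparable to an unmasked run.
    return F.mse_loss(noise_pred * mask, target * mask, reduction="mean")
```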
This is part of the reason I ran a Gaussian blur on my alpha layers (though perhaps a bloom might be more effective, to only push higher values out rather than meld evenly like a blur). My thought is that allowing some training at the interface between subject and background might help with those edge details. |
Double check that your alpha channel was not pre-multiplied with the color channels. The first three channels should be exactly the same as the original image. If that was at fault, it makes the case for side loading the masks, to prevent people from doing the same. Side loading would also allow the use of 16-bit grayscale depth masks natively, i.e. the outputs from https://github.com/thygate/stable-diffusion-webui-depthmap-script |
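A small sketch of loading such a 16-bit grayscale depth map as a weight mask, assuming OpenCV is available; the file name is illustrative:

```python
# Read a 16-bit PNG depth map without converting it to 8-bit,
# then scale it to [0, 1] for use as a loss weight mask.
import cv2
import numpy as np

depth = cv2.imread("example-depth.png", cv2.IMREAD_UNCHANGED)  # uint16 for 16-bit PNGs
mask = depth.astype(np.float32) / 65535.0                      # depth values -> [0, 1] weights
```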
I am using Photoshop... so it can be a bit difficult... perhaps the issue is that I tried to apply a Gaussian blur to the mask but it got applied to the whole image? Anyhow, I redid a mask of just the subject with no blur and it turned out pretty good. In fact, it appears to have captured body proportions better as well. |
The other test that could be run is whether or not EXTRA attention is beneficial. DreamArtist used a non-linear scale from 0-5 instead of the linear scale here of 0-1: https://github.com/7eu7d7/DreamArtist-sd-webui-extension#attention-mask To test a linear scale from 0-2, line 521 could be changed so the mask is scaled to that range, along the lines of the sketch below. |
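The thread does not show the original code at that line, so the following is only a hypothetical sketch of such a change, assuming the raw 8-bit alpha values are being converted to weights:

```python
# Hypothetical: map the 0-255 alpha mask to a 0-2 weight range instead of 0-1,
# so fully white regions receive double the normal loss weight.
mask = alpha_mask.float() / 255.0 * 2.0  # `alpha_mask` is an assumed variable name
```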
I just switched to sideloaded masks to avoid confusion with the alpha layers. This is not backward compatible with alpha layer masks. Also added a |
Thank you for this PR! This is quite interesting. I think masked loss is an advanced feature and it is out of scope for the repo, but the minimal implementation is good. I also think it might be an idea to use separate images for the masks, because the alpha channel is a bit difficult to manage. I am working on another big PR, so I will review in the near future. There might be some things to consider in how to implement it (such as dataset features like cropping etc.) |
@kohya-ss I enjoy the projects and sharing what I come up with for my own training, and I do like that you keep a very stable codebase, so I am by no means offended if I am outside of your intended scope. May I ask why you pass all of the |
Oops, didn't mean to close it, fat thumbs, but I will leave the fork as is, so it's available for review. |
Thank you! There is no major reason for not passing |
I don't envy you for having to do it, but I look forward to it being done at some point. :) |
@AI-Casanova I tried the new .mask feature and it worked very well. I was not able to get anywhere with the Photoshop alpha channel yesterday for a model, but using depth-map-generated images named .mask has produced great results. Too bad this feature is not getting integrated at the moment. Hopefully, once the large change is implemented, @kohya-ss might consider integrating it. |
In a way, using a depth map mask is like a ControlNet for learning... It guides SD to learn what is important, maximising the learning potential of each image. |
That was my hope, to reduce reliance on captioning backgrounds. It is my (perhaps mistaken) feeling that the longer the caption, the more likely it is for the identity I'm trying to train to bleed out into the rest of the caption. |
I think you are right. I wonder if ControlNet depth_leres could be used dynamically at dataset image load time to generate a depth mask for training... using ControlNet as a tool to focus training vs manually creating the masks beforehand. I guess the generated masks could be saved for reuse in other runs... |
Are you talking about this? https://github.com/aim-uofa/AdelaiDepth/tree/main/LeReS I just had a thought as well: we are reshaping the mask down to 64x64 at the point of noise. I followed cloneofsimo in using 'nearest', but wouldn't we want a max function there? I'd think we'd want to train the parts of the noise that include our important subject as strongly as we ask for the subject, even if that raises the surroundings some. |
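A sketch of the max-based downsampling idea, under the assumption that the mask starts at the image resolution (e.g. 512x512) and must land on the 64x64 latent grid; names and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def downsample_mask_max(mask, latent_size=64):
    # mask: (batch, 1, 512, 512) weights in [0, 1]
    kernel = mask.shape[-1] // latent_size            # e.g. 512 // 64 = 8
    # Max pooling keeps the subject's full weight in any latent cell it touches,
    # unlike 'nearest', which can drop thin subject regions entirely.
    return F.max_pool2d(mask, kernel_size=kernel)     # -> (batch, 1, 64, 64)
```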
@AI-Casanova will this PR still be worked on? I don't know if this feature was added in another commit or not, or how to use it. |
@flesnuk this PR needs to be reworked from scratch because of the major changes that have happened in the meantime, but it is something that I should probably work on, as it showed some promise. |
Any movement on this? |
@TingTingin Honestly, it has quite fallen off my radar in favor of other projects. @kohya-ss Would you be interested if I took another crack at this? I'd likely put the mask loading as a TOML config option to select a folder and then search for matching names. It occurs to me that it might be quite possible to use mediapipe to auto-mask faces, with the face set at one value and everything else at another. This would be potentially useful both to train faces, and to ignore them when training a costume etc. |
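A rough sketch of what that mediapipe-based auto-masking could look like; the weight defaults and helper name are assumptions, not anything implemented in this PR:

```python
import cv2
import mediapipe as mp
import numpy as np

def make_face_mask(image_path, face_weight=1.0, other_weight=0.25):
    """Return a grayscale mask with detected face regions at face_weight
    and everything else at other_weight (both hypothetical defaults)."""
    image = cv2.imread(image_path)
    h, w = image.shape[:2]
    mask = np.full((h, w), int(other_weight * 255), dtype=np.uint8)
    with mp.solutions.face_detection.FaceDetection(model_selection=1) as detector:
        results = detector.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    for det in results.detections or []:
        box = det.location_data.relative_bounding_box   # relative [0, 1] coordinates
        x0, y0 = max(int(box.xmin * w), 0), max(int(box.ymin * h), 0)
        x1, y1 = int((box.xmin + box.width) * w), int((box.ymin + box.height) * h)
        mask[y0:y1, x0:x1] = int(face_weight * 255)
    return mask
```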
Wouldn't including the mask in the same folder be the most sensible, since that's how the other settings have worked thus far? It would be interesting, though, if the ability to use some auto-generated mask was added; people could potentially extend it with different segmenters later. |
There's a potential file naming issue with using the same folder. You could rename the files to *.mask or something, but IMO people might be more comfortable to have /dataset/ABCD.png and /dataset/mask/ABCD.png, and I believe it would make the code cleaner as well. |
Could potentially pass a mask folder argument with the default being the dataset/mask folder |
Good idea. For every folder, if masked loss is enabled, assume folder/mask/* unless specified otherwise. |
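A small hypothetical helper showing that folder convention (none of this is code from the repository):

```python
import os

def find_mask_path(image_path, mask_dir=None):
    # Default to <dataset>/mask/<name>.png unless a mask folder is given explicitly.
    name, _ = os.path.splitext(os.path.basename(image_path))
    folder = mask_dir or os.path.join(os.path.dirname(image_path), "mask")
    candidate = os.path.join(folder, name + ".png")
    return candidate if os.path.exists(candidate) else None
```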
@AI-Casanova In addition, I think that face detection and automatic mask creation is a complex task and requires different dependencies, so it might be better as a separate repository. |
I noticed there was a ControlNet branch but never popped in; that sounds perfect. I understand the point about auto-masking, I'll take some time to think about what that would best look like. As to the extra input and data loader changes, it's somewhat tangential to another proposal I had for you. With the inclusion of sliced VAE, and thus the ability to load larger images, it might now be possible to add data augmentations like random crop, random zoom, affine transforms etc. to cached latents by loading them larger and slicing in the data loader. This would have to be replicated for potential masks, but that isn't an insurmountable problem. |
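A sketch of that augmentation idea, assuming latents are cached slightly larger than the training size and the mask is kept at 1/8 of the image resolution so the same crop applies to both; all names are illustrative:

```python
import torch

def random_crop_latent_and_mask(latent, mask, crop=64):
    # latent: (4, H, W) cached larger than `crop`; mask: (1, H, W) at the same scale
    _, h, w = latent.shape
    top = torch.randint(0, h - crop + 1, (1,)).item()
    left = torch.randint(0, w - crop + 1, (1,)).item()
    return (latent[:, top:top + crop, left:left + crop],
            mask[:, top:top + crop, left:left + crop])
```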
Certainly, it would be attractive if augmentation could be performed on cached latents. However, there seems to be a subtle difference between latents retrieved from a cropped image and cropped latents of the entire image. This is a rather annoying problem (which is why the VAE's simple tiling produces checkerboard patterns). |
@AI-Casanova I've submitted a slightly reworked version of your PR; it should work with the current main branch. I've changed the mask loading logic to look for masks in a separate folder. I've also tweaked the MSE loss calculation: mask values are now normalized using the mask mean value. I believe this makes the magnitude of the calculated loss less sensitive to the amount of non-black pixels in the mask - my testing agrees with this. |
Just for the sake of clarification, the mask logic as is makes it so that white pixels have the most attention, grey the second most, and black the lowest? |
@recris that's amazing! I'll pull as soon as life lets me and give it a try. |
The relative attention of each pixel is still the same; grayscale values are mapped to the [0, 1] weight range. As an example, let's say you have a training set with 2 black-and-white masks, one with a very large white area (A) and the other with a small white area (B). Without re-scaling, the computed MSE loss will be proportional to the amount of white pixels in the overall image, meaning the calculated loss in A will be (statistically) greater than the loss in B, and this may skew the training. By re-scaling we eliminate this proportionality effect. |
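One plausible reading of the mean-based normalization described here, as a sketch rather than the exact code from the reworked PR:

```python
import torch

def normalized_masked_loss(noise_pred, target, mask, eps=1e-6):
    # mask: (batch, 1, h, w) weights in [0, 1], broadcast over the latent channels
    per_pixel = (noise_pred - target) ** 2 * mask
    # Dividing by the mask mean keeps the loss magnitude roughly independent of
    # how much of each image is masked out (the A-vs-B example above).
    return per_pixel.mean() / (mask.mean() + eps)
```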
I worry a bit about that rescaling. In my use case, I'd mask faces at 1, bodies at 0.5, and backgrounds at 0.25 perhaps. I would think that any face should be backpropagated with respect to the same per-pixel loss, disregarding the reported loss, as I don't find loss graphs to be valuable. |
I've done a few runs with re-scaling and haven't found negative effects yet. Keep in mind that the relative pixel weights stay the same, since everything is scaled by the same factor. The biggest improvement I found with this whole change is that certain details from the training data, like backgrounds, stopped leaking into the generated images. This was especially evident when the model was over-fitted. Previously I had to be smarter when choosing and labeling the images, but now I can get the same quality with less effort put into the training set. |
Out of curiosity, are you fully zeroing your backgrounds, or leaving a bit for context? |
My backgrounds are fully black, but I am still providing background descriptions in the captions (not sure if it matters) |
Because creating human subject masks is very tedious, I've come up with a small script to automate this process: https://github.com/recris/subject-masker It uses a combination of face detection, parsing and recognition models, plus an instance segmentation model, to generate subject masks. It supports providing distinct weight values for face, hair, body and background. Overall the generated masks are pretty good, with the occasional mask that needs some additional cleaning in GIMP. |
All the things I was going to do """someday""" this boss already did. |
This is my first ever Pull Request, so bear with me, please.
@cloneofsimo instituted weighted loss here: cloneofsimo/lora#96 based on a facial recognition mask.
Following his work somewhat (I am no programmer) I came up with this implementation. It takes the alpha channel of a PNG and converts it to a weight mask between 0-1. This is multiplied into `noise_pred` and `target` immediately before the loss is calculated.
I was unsure how you'd like to handle passing `args.masked_loss` to `train_util.py`, so my PR loads masks from every image and stores them, regardless of whether the files have an alpha layer.
The new argument `--masked_loss` toggles the multiplication of the mask into `noise_pred` and `target`.
I have only tested this on `train_network.py` dreambooth, as I do not have any other datasets prepared, but I do not believe that the method I implemented will impact the other methods.