
Add Minimal Implementation of Masked Weight Loss #236

Closed

Conversation

@AI-Casanova (Contributor)

This is my first ever Pull Request, so bear with me, please.

@cloneofsimo introduced weighted loss here: cloneofsimo/lora#96, based on a facial-recognition mask.

Following his work somewhat (I am no programmer), I came up with this implementation. It takes the alpha channel of a PNG and converts it to a weight mask between 0 and 1. This is multiplied with noise_pred and target immediately before the loss is calculated.

I was unsure how you'd like to handle passing args.masked_loss to train_util.py so my PR loads masks from every image and stores them, regardless of whether the files have an alpha layer.

The new argument --masked_loss toggles multiplying the mask into noise_pred and target.
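Roughly, the core of it looks like this (a minimal sketch with illustrative names and shapes, not the exact diff):

```python
import torch.nn.functional as F

def masked_mse_loss(noise_pred, target, mask):
    # mask: float tensor in [0, 1], broadcastable to the latent tensors
    # (e.g. [B, 1, H/8, W/8]); 1 = full weight, 0 = excluded from the loss.
    return F.mse_loss(noise_pred * mask, target * mask, reduction="mean")
```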

I have only tested this on train_network.py dreambooth, as I do not have any other datasets prepared, but I do not believe that the method I implemented will impact the other methods.

@AI-Casanova (Contributor Author)

Quick and dirty example of --masked_loss vs. normal loss. All hyperparameters were the same between training runs, including a fixed seed.

AI-generated dataset and manual masks:

[Image: dataset samples]

[Image: comparison grid (first row unmasked, second row masked)]

@bmaltais (Contributor) commented Feb 27, 2023

How is the mask used by the script? Is it separate files in the image folder, one for each image with a face? Does pure white mean full weight for learning, black mean don't look at it, and 50% grey mean look at it, but not as much as the face area?

Could you share the test dataset for testing and studying?

This is an interesting concept. It's more work, but I think the results are well worth it.

@AI-Casanova (Contributor Author)

Here is the dataset I showed above
https://drive.google.com/drive/folders/1-3wc_zbzJiQ3NkshIL0M3fMtNf6DBykx

The masks are saved as the alpha channel of the PNGs, which, as long as you don't pre-multiply, leaves the RGB channels intact. It also prevents yet another instance of "find the matching file" like we already have with captions and .npz files.

I made these masks in Gimp, but I've also experimented with depth maps on my private dataset.

You are correct: full white areas experience normal loss, and black areas are completely removed from the loss.

The gray in these masks is 75%, I believe; I will run again in the next day with 50% and 25% minimums.

Feel free to check my math; some things, like the *8 in the reshape, are straight from @cloneofsimo, and I don't know exactly how they operate.
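For anyone who wants to sanity-check it, the alpha-to-mask path is roughly this (a sketch with made-up file names; the actual code in the commit differs):

```python
import numpy as np
from PIL import Image

img = Image.open("dataset/ABCD.png").convert("RGBA")
alpha = np.array(img)[:, :, 3].astype(np.float32) / 255.0  # [H, W] weights in [0, 1]

# The latents are 1/8 of the pixel resolution (hence the 8 factor): a 512x512
# mask has to end up 64x64 before it can multiply noise_pred and target.
lat_w, lat_h = img.width // 8, img.height // 8
mask = np.array(
    Image.fromarray((alpha * 255).astype(np.uint8)).resize((lat_w, lat_h), Image.NEAREST),
    dtype=np.float32,
) / 255.0
```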

@bmaltais (Contributor) commented Feb 27, 2023

Preliminary results of a crude test look impressive. Here is a graph showing the loss average of the non-masked training vs. the masked training:

[Image: loss-average graph]

Masked training is the one in orange. The non-masked training is not yet completed... but the samples from the masked training are pretty good.

I have never seen a loss this low... and the results are still OK... Usually a loss this low means a super-fried model... but not this time.

@bmaltais (Contributor) commented Feb 27, 2023

And here is a quick comparison:

With masked face:

[Image: sample grid, masked training]

Without masked face:

[Image: sample grid, unmasked training]

Same seed, same prompt. This is shocking. The masked output is much more like the dataset... very interesting PR. The proportions of the non-masked model are all over the place. The masked one appears to have learned much better. I guess masking must help the trainer focus attention only on important aspects of the subject and not spend time learning blurry backgrounds, etc.

I used Photoshop to create the mask and isolated the subject as part of the masking.

Here is an example source and masked image:

[Images: source image and masked version]

I applied a 50% grey mask over the body and hair, no mask over the face, and black over the background on each of the 16 source images.

@bmaltais (Contributor) commented Feb 27, 2023

And one last comp before going to bed.

Masked:

[Image: sample grid, masked]

Not Masked:

[Image: sample grid, not masked]

Hair appears more defined on the non-masked... but the prompt was calling for a flowery dress... and the masked model complied... where the non-masked was much more rigid... more experimentation needed... but interesting...

@AI-Casanova (Contributor Author)

> I have never seen a loss this low... and the results are still OK... Usually a loss this low means a super-fried model... but not this time.

This is to be expected, as the tensors are being modified before the mean squared error step. The errors in the larger blacked-out backgrounds are not being counted at all.

My back-of-the-envelope math says you can't account for the variation with a single factor; you'd have to run the MSE twice to get a number comparable to the unmasked training.

This complicates tracking losses against unmasked runs, but in my experimentation, similar runs on masked datasets have similar losses.
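A toy example of the effect (illustrative numbers only):

```python
import torch
import torch.nn.functional as F

err = torch.ones(1, 4, 64, 64)          # pretend every element has error 1.0
mask = torch.zeros(1, 1, 64, 64)
mask[:, :, 16:48, 16:48] = 1.0          # only the central quarter is subject

unmasked = F.mse_loss(err, torch.zeros_like(err))       # 1.00
# The zeroed background still sits in the denominator of the mean, so the
# reported number drops even though the subject error is unchanged.
masked = F.mse_loss(err * mask, torch.zeros_like(err))  # 0.25
print(unmasked.item(), masked.item())
```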

@AI-Casanova (Contributor Author)

> Hair appears more defined on the non-masked...

This is part of the reason I ran a Gaussian blur on my alpha layers (though perhaps a bloom might be more effective, to only push higher values out rather than meld evenly like a blur).

My thought is that allowing some training at the interface between subject and background might help with those edge details.
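Something like this, if anyone wants to try it on their own masks (a sketch; the file names and radius are arbitrary):

```python
from PIL import Image, ImageFilter

mask = Image.open("ABCD_mask.png").convert("L")
# Soften the subject/background boundary so edge details like hair still get
# some loss weight instead of being cut off abruptly.
soft = mask.filter(ImageFilter.GaussianBlur(radius=8))
soft.save("ABCD_mask_soft.png")
```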

@bmaltais (Contributor)

Did another test with another subject. Somehow for this one the masked areas are showing up as part of the resulting model... but I only did half the training epochs... Perhaps the masking is too strong for that model.

[Image: sample grid]

@AI-Casanova (Contributor Author) commented Feb 27, 2023

> the masked areas are showing up

@bmaltais

Double check that your alpha channel was not pre-multiplied with the color channels. The first three channels should be exactly the same as in the original image.
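A quick way to check, assuming you still have the original image around (file names are just examples):

```python
import numpy as np
from PIL import Image

rgb = np.array(Image.open("ABCD_original.png").convert("RGB"), dtype=np.int16)
rgba = np.array(Image.open("ABCD_with_alpha.png").convert("RGBA"), dtype=np.int16)

# If the alpha was premultiplied on export, colors under transparent areas get
# darkened, and this difference will be large instead of 0.
print(np.abs(rgba[:, :, :3] - rgb).max())
```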

If that was at fault, it makes the case for side-loading the masks, to prevent people from doing the same. Side-loading would also allow the use of 16-bit grayscale depth masks natively, i.e. the outputs from https://github.com/thygate/stable-diffusion-webui-depthmap-script

@AI-Casanova (Contributor Author)

For my purposes, the alpha channel was sufficient with these options in GIMP ("save color values" and 8-bit RGBA being the most important):
[Image: GIMP PNG export options]

But I can see how that would be a problem for users who aren't familiar with the PNG file format.

I'll try to get a sideloader working this evening.

@bmaltais (Contributor)

I am using Photoshop... so it can be a bit difficult... perhaps the issue is that I tried to apply a Gaussian blur to the mask, but perhaps it applied to the whole image?

Anyhow, I redid a mask of just the subject with no blur and it turned out pretty good. In fact, it appears to have captured the body proportions better as well.

[Image: sample grid]

@AI-Casanova (Contributor Author)

The other test that could be run is whether or not EXTRA attention is beneficial.

DreamArtist used a non-linear scale from 0-5 instead of the linear 0-1 scale here:

https://github.com/7eu7d7/DreamArtist-sd-webui-extension#attention-mask

To test a linear scale from 0-2, line 521 could be changed to `info.mask = mask / 128` and line 528 to `info.mask_flipped = mask[::-1] / 128`.

@AI-Casanova (Contributor Author)

I just switched to side-loaded masks to avoid confusion with the alpha layers.
train_util.py now searches for .mask files in the same directory, with pretty much the same logic as .txt caption files.
These files can be any of the file types accepted by the trainer, and this has been tested on 16-bit PNGs from depth maps as well.

This is not backward compatible with the alpha-layer masks.

Also added a `self.mask_max_attention` stub for scaling beyond 0-1.
Setting it to 1.25 would make losses in white areas 25% stronger.
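The lookup is basically the same idea as caption files; roughly (a simplified sketch, not the exact code in the branch):

```python
import os
import numpy as np
from PIL import Image

def load_mask_for(image_path):
    # dataset/ABCD.png -> dataset/ABCD.mask, mirroring the .txt caption logic
    mask_path = os.path.splitext(image_path)[0] + ".mask"
    if not os.path.exists(mask_path):
        return None  # no mask: this image trains with normal, unmasked loss
    # PIL sniffs the format from the file header, not the extension, so .mask
    # can hold any supported image type, including PNG depth maps.
    mask = Image.open(mask_path).convert("L")
    return np.array(mask, dtype=np.float32) / 255.0  # grayscale -> [0, 1] weight
```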

@kohya-ss (Owner)

Thank you for this PR! This is quite interesting. I think masked loss is an advanced feature and it is out of the scope of this repo, but the minimal implementation is good.

I also think it might be a good idea to use separate images for the masks, because the alpha channel is a bit difficult to manage.

I am working on another big PR, so I will review this in the near future. There might be some things to consider about how to implement it (such as dataset features like cropping, etc.).

@AI-Casanova (Contributor Author)

@kohya-ss I enjoy the project and sharing what I come up with for my own training, and I do like that you keep a very stable codebase, so I am by no means offended if I am outside of your intended scope.

May I ask why you pass all of the args to `FineTuningDataset` and `DreamBoothDataset` as positional arguments instead of passing `args` itself? I just noticed it when trying to send extra arguments through for the dataset handling I was doing, and had to add more positionals.

@AI-Casanova (Contributor Author)

Oops, I didn't mean to close it (fat thumbs), but I will leave the fork as-is so it's available for review.

@kohya-ss (Owner)

Thank you!

There is no major reason for not passing `args`. It is because the definition of `args` was not shared across scripts before. It will need to be refactored in the near future 😅

@AI-Casanova (Contributor Author)

I don't envy you for having to do it, but I look forward to it being done at some point. :)

@bmaltais (Contributor) commented Feb 28, 2023

@AI-Casanova I tried the new .mask feature and it worked very well. I was not able to get anywhere with the Photoshop alpha channel yesterday for a model, but using depth-map-generated images named .mask has produced great results. Too bad this feature isn't getting integrated at the moment. Hopefully once the large change is implemented @kohya-ss might consider integrating it.

@bmaltais (Contributor) commented Feb 28, 2023

In a way, using a depth-map mask is like a ControlNet for learning... It guides SD to learn what is important, maximising the learning potential of each image.

@AI-Casanova (Contributor Author)

> It guides SD to learn what is important, maximising the learning potential of each image.

That was my hope, to reduce reliance on captioning backgrounds. It is my (perhaps mistaken) feeling that the longer the caption, the more likely it is for the identity I'm trying to train to bleed out into the rest of the caption.

@bmaltais (Contributor) commented Mar 1, 2023

> > It guides SD to learn what is important, maximising the learning potential of each image.
>
> That was my hope, to reduce reliance on captioning backgrounds. It is my (perhaps mistaken) feeling that the longer the caption, the more likely it is for the identity I'm trying to train to bleed out into the rest of the caption.

I think you are right. I wonder if convnet depth_leres could be dynamically used at dataset image load time to generate a depth mask for training... using convnet as a tool to focus training vs. manually creating the masks beforehand. I guess the convnet masks could be saved for reuse in other runs...

@AI-Casanova (Contributor Author)

> convnet depth_leres

Are you talking about this? https://github.com/aim-uofa/AdelaiDepth/tree/main/LeReS

I just had a thought as well: we are reshaping the mask down to 64x64 at the point of noise. I followed cloneofsimo in using 'nearest', but wouldn't we want a max function there? I'd think we'd want to train the parts of the noise that include our important subject as strongly as we ask for the subject, even if that raises the surroundings some.
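What I mean, in rough code (a sketch, not what's currently in the branch):

```python
import torch
import torch.nn.functional as F

mask = torch.rand(1, 1, 512, 512)  # stand-in pixel-space mask in [0, 1]

# 'nearest' can drop thin subject regions when shrinking to latent resolution;
# max pooling keeps any 8x8 cell that touches a strongly weighted pixel.
mask_nearest = F.interpolate(mask, scale_factor=1 / 8, mode="nearest")
mask_max = F.max_pool2d(mask, kernel_size=8, stride=8)
print(mask_nearest.shape, mask_max.shape)  # both torch.Size([1, 1, 64, 64])
```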

@flesnuk commented Apr 16, 2023

@AI-Casanova will this PR still be worked on? I don't know whether this feature was added in another commit, or how to use it.
I saw that the original LoRA repo has something similar implemented but still doesn't clarify how to use it:
cloneofsimo/lora#96
Can you change the PR status to Open if it is intended to be merged sometime, so there isn't any confusion?

@AI-Casanova (Contributor Author)

@flesnuk this PR needs to be reworked from scratch because of the major changes that have happened in the meantime, but it is something that I should probably work on, as it showed some promise.

@TingTingin (Contributor)

Any movement on this?

@AI-Casanova (Contributor Author)

@TingTingin Honestly it's quite fallen off my radar, in favor of other projects.

@kohya-ss Would you be interested if I took another crack at this? I'd likely make the mask loading a TOML config option to select a folder and then search for matching names.

It occurs to me that it might be quite possible to use mediapipe to auto-mask faces, with face set at one value and everything else at another. This would be potentially useful both to train faces, and to ignore them when training a costume etc.
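Roughly what I have in mind (a sketch using MediaPipe face detection; the weight values, paths, and box-only masking are placeholders):

```python
import cv2
import mediapipe as mp
import numpy as np

image = cv2.imread("dataset/ABCD.png")
h, w = image.shape[:2]

mask = np.full((h, w), 64, dtype=np.uint8)  # everything else at a low weight
with mp.solutions.face_detection.FaceDetection(model_selection=1) as fd:
    results = fd.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    for det in results.detections or []:
        box = det.location_data.relative_bounding_box
        x, y = max(int(box.xmin * w), 0), max(int(box.ymin * h), 0)
        mask[y:y + int(box.height * h), x:x + int(box.width * w)] = 255  # face at full weight

cv2.imwrite("dataset/mask/ABCD.png", mask)
```

For the costume case the values would simply be flipped, so the face trains weakly and everything else at full weight.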

@TingTingin (Contributor)

Wouldn't including the mask in the same folder be the most sensible, since that's how the other settings have worked thus far? It would be interesting, though, if the ability to use some auto-generated mask were added; people could potentially extend it with different segmenters later.

@AI-Casanova (Contributor Author)

There's a potential file-naming issue with using the same folder. You could rename the files to *.mask or something, but IMO people might be more comfortable having /dataset/ABCD.png and /dataset/mask/ABCD.png, and I believe it would make the code cleaner as well.

@TingTingin (Contributor)

Could potentially pass a mask folder argument with the default being the dataset/mask folder

@AI-Casanova (Contributor Author)

Good idea. For every folder, if masked loss is enabled, assume folder/mask/* unless specified otherwise.
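So, something along these lines (a sketch of the lookup only; names are illustrative):

```python
import os

def resolve_mask_path(image_path, mask_dir=None):
    # Default: <image folder>/mask/<same base name>, overridable per dataset.
    folder, name = os.path.split(image_path)
    base = os.path.splitext(name)[0]
    mask_dir = mask_dir or os.path.join(folder, "mask")
    for ext in (".png", ".jpg", ".webp"):
        candidate = os.path.join(mask_dir, base + ext)
        if os.path.exists(candidate):
            return candidate
    return None  # fall back to unmasked loss for this image
```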

@kohya-ss (Owner) commented Jun 6, 2023

@AI-Casanova
Thanks for your proposal! Currently ddPn08 is working on training ControlNet, which requires one additional image for each image in the dataset. Since that is fundamentally the same as the mask images, I am thinking of modifying it to handle the mask images after that PR is merged.

In addition, I think face detection and automatic mask creation is a complex task and requires different dependencies, so it might be better as a separate repository.

@AI-Casanova (Contributor Author)

I noticed there was a ControlNet branch but never popped in; that sounds perfect.

I understand the point about auto-masking, I'll take some time to think about what that would best look like.

As to the extra input and data loader changes, it's somewhat tangential to another proposal I had for you. With the inclusion of sliced VAE, and thus the ability to load larger images, it might be possible now to add data augmentations like random crop, random zoom, affine transforms etc to cached latents by loading them larger, and slicing in the data loader. This would have to be replicated for potential masks, but that isn't an insurmountable problem.

@kohya-ss (Owner) commented Jun 6, 2023

The control_net branch is obsolete (I implemented ControlNet as LoRA, but unfortunately it had less strength to control); ddPn08 is working on this PR: #551.

Certainly, it would be attractive if augmentation could be performed on cached latents. However, there seems to be a subtle difference between latents retrieved from a cropped image, and cropped latents of the entire image. This is a rather annoying problem (which is why VAE's simple tiling produces checkerboard patterns).

@AI-Casanova (Contributor Author)

I understand your concern about cropping latents, but a quick empirical test doesn't seem to show any worrying distortions at the cropped edges.
Center crop: `latents = latents[:, :, 2:latents.shape[-2] - 2, 2:latents.shape[-1] - 2]`
Corner crop: `latents = latents[:, :, :latents.shape[-2] - 4, :latents.shape[-1] - 4]`
Random crop: `rc = random.randint(0, 4)` followed by `latents = latents[:, :, rc:latents.shape[-2] - (4 - rc), rc:latents.shape[-1] - (4 - rc)]`
[Image: crop-method comparison grid]

Or are you worried about a tiled VAE encoding adding the seams to the base latent?

@AI-Casanova (Contributor Author)

Added one more crop method, this time loading through the VAE at 1.5x size (i.e. 768 and 1152 instead of 512 and 768) and then cropping to the original size.

I have a hunch that the issues with the VAE are on the decoder side, not the encoder, but I'm not sure how to prove that, other than that I'm empirically not seeing any distortions, aside from the fact that I've cropped the face in training, and thus in my results.
(The slight color shift towards natural skin tone in the last column is probably max_norm, which I forgot to disable.)
[Image: comparison grid]

@recris commented Jun 14, 2023

@AI-Casanova I've submitted a slightly reworked version of your PR; it should work with the current main branch.

I've changed the mask loading logic to look for masks in a mask sub-folder, as previously suggested.

I've also tweaked the MSE loss calculation: mask values are now normalized using the mask mean value. I believe this makes the magnitude of the calculated loss less sensitive to the amount of non-black pixels in the mask - my testing agrees with this.
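Roughly, the change amounts to this (a simplified sketch of the normalization, not the exact diff):

```python
def normalized_masked_mse(noise_pred, target, mask, eps=1e-6):
    # Weight the per-element squared error by mask / mean(mask), so the average
    # weight is ~1 regardless of how much of the image is masked out.
    weight = mask / (mask.mean() + eps)
    return (weight * (noise_pred - target) ** 2).mean()
```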

@TingTingin (Contributor)

Just for the sake of clarification: the mask logic as-is makes it so that pixels which are white have the most attention, grey the second most, and black the lowest?

@AI-Casanova (Contributor Author)

@recris that's amazing! I'll pull as soon as life lets me and give it a try.

@recris commented Jun 14, 2023

The relative attention of each pixel is still the same; grayscale values are mapped to the [0, 1] weight range.

As an example, let's say you have a training set with two black-and-white masks: one has a very large white area (A) and the other has a small white area (B). Without re-scaling, the computed MSE loss will be proportional to the amount of white pixels in the overall image, meaning the calculated loss for A will be (statistically) greater than the loss for B, and this may skew the training. By re-scaling we eliminate this proportionality effect.
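In numbers (a toy illustration):

```python
import torch

err = torch.ones(64, 64)                              # pretend uniform per-pixel error
mask_a = torch.zeros(64, 64); mask_a[:, :48] = 1.0    # large subject: 75% white
mask_b = torch.zeros(64, 64); mask_b[:, :16] = 1.0    # small subject: 25% white

raw_a = (mask_a * err ** 2).mean()   # 0.75 -> tracks the white area, not the error
raw_b = (mask_b * err ** 2).mean()   # 0.25
print(raw_a / mask_a.mean(), raw_b / mask_b.mean())   # both 1.0 after re-scaling
```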

@AI-Casanova (Contributor Author)

I worry a bit about that rescaling. In my use case, I'd mask faces at 1, bodies at .5, and backgrounds at .25 perhaps.

I would think that any face should be backpropagated with respect to the same per-pixel loss, disregarding the reported loss, as I don't find loss graphs to be valuable.

@recris commented Jun 14, 2023

I've done a few runs with re-scaling and haven't found negative effects yet. Keep in mind that the relative pixel weights stay the same since everything is scaled by the same factor.

The biggest improvement I found with this whole change is that certain details from the training data, like backgrounds, stopped leaking into the generated images. This was especially evident when the model was over-fitted. Previously I had to be smarter when choosing and labeling the images, but now I can get the same quality with less effort put into the training set.

@AI-Casanova (Contributor Author)

Out of curiosity, are you fully zeroing your backgrounds, or leaving a bit for context?

@recris commented Jun 14, 2023

My backgrounds are fully black, but I am still providing background descriptions in the captions (not sure if it matters)

@recris commented Jun 14, 2023

Because creating human-subject masks is very tedious, I've come up with a small script to automate this process: https://github.com/recris/subject-masker

It uses a combination of face detection, parsing, and recognition models, plus an instance segmentation model, to generate subject masks. It supports providing distinct weight values for the face, hair, body, and background.

Overall the generated masks are pretty good, with the occasional mask that needs some additional cleaning in GIMP.

@AI-Casanova (Contributor Author)

All the things I was going to do """someday""" this boss already did.

@AI-Casanova deleted the masked_loss branch June 21, 2023 02:04