[SD3] Token limit in training #8557
After rereading the SD3 paper, I notice that they indeed trained with a 77-token limit, so the training script is correct. However, it is strange that SD3 sticks with 77 tokens while T5 supports up to 512 tokens.
In the paper they trained on 77 tokens, but they reported that they moved to finetuning on the full T5 context (512 tokens) after the paper was released.
Hi, I'm taking notes on everyone's concerns about the token limit. However, as always, we need to keep the code as clean and simple as possible, so until we or the community can verify that a higher token limit actually improves training, we have to wait. This is one of the main reasons we didn't add long prompt weighting as core functionality for previous models. It would help us a lot if the community could also test this and post their findings here or in a discussion. Initially, to test this you only need to change the token limit used when tokenizing the prompt for T5, as in the sketch below.
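For reference, a minimal sketch of what raising that limit could look like when encoding prompts with T5 during training. The model path, subfolder names, and the helper function are illustrative and not taken verbatim from the training script:

```python
# Hypothetical sketch of T5 prompt encoding for SD3 training.
# Raising max_sequence_length (e.g. to 256 or 512) is the change under discussion;
# T5 itself accepts far longer sequences than CLIP's 77 tokens.
import torch
from transformers import T5TokenizerFast, T5EncoderModel

model_id = "stabilityai/stable-diffusion-3-medium-diffusers"
tokenizer_3 = T5TokenizerFast.from_pretrained(model_id, subfolder="tokenizer_3")
text_encoder_3 = T5EncoderModel.from_pretrained(model_id, subfolder="text_encoder_3")

def encode_t5_prompt(prompt: str, max_sequence_length: int = 77) -> torch.Tensor:
    text_inputs = tokenizer_3(
        prompt,
        padding="max_length",
        max_length=max_sequence_length,  # the limit being discussed
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        prompt_embeds = text_encoder_3(text_inputs.input_ids)[0]
    return prompt_embeds
```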
However, I don't see it as being that simple, because we still have the 77-token limit for the CLIP models, so the prompt could get truncated in the wrong place for those and make the training worse. As I see it, to add this we should also add the possibility to train with a different prompt for each text encoder (see the sketch after this paragraph). Finally, we need to wait and see whether training the T5 encoder even matters: without any of the techniques for lowering VRAM consumption, the only ones capable of running it are 3090/4090 owners, which are a minority. Hope this helps with your concerns.
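A rough sketch of what per-encoder prompts could look like, assuming the dataset were extended with separate `clip_prompt` and `t5_prompt` fields (these names, and the tokenizer variables, are hypothetical):

```python
# Hypothetical: tokenize a different prompt per text encoder, so the CLIP
# encoders keep their hard 77-token limit while T5 receives a longer prompt.
clip_inputs = clip_tokenizer(
    clip_prompt,              # shortened prompt intended for the CLIP encoders
    padding="max_length",
    max_length=77,            # CLIP's positional-embedding limit
    truncation=True,
    return_tensors="pt",
)
t5_inputs = tokenizer_3(
    t5_prompt,                # the full, longer prompt intended for T5
    padding="max_length",
    max_length=512,           # only meaningful if the model is trained/finetuned for it
    truncation=True,
    return_tensors="pt",
)
```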
It should at least be a command-line argument for the training script.
Hi, I'm curious where they state that they fine-tuned on 512 tokens; I'm not aware of it. Any details on that?
Thank you for the detailed reply, I understand your concerns. FYI, PixArt-Sigma claimed benefits from longer token lengths in training. Granted, it only uses T5 as its text encoder, which is less complicated than SD3, which has multiple text encoders with different token limits.
There isn't any official correspondence; I just heard from someone at SAI that this released model was trained on the long T5 token context. There are other peculiarities with it, for example the QK RMSNorm appears to be missing despite being shown to stabilize the model in the paper.
Describe the bug
I'm aware there is PR #8506 for the token limit in inference; however, the token limit in training is not handled. I suspect the token limit in training will affect performance (a sketch of the inference-side argument follows below).
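For contrast, a sketch of the inference-side control, assuming the `max_sequence_length` argument exposed by the SD3 pipeline in diffusers; there is no equivalent knob on the training side:

```python
# Inference: the SD3 pipeline accepts a max_sequence_length for the T5 branch.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "a long, detailed prompt ...",
    max_sequence_length=256,  # T5 token limit at inference time
).images[0]
```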
Reproduction
no
Logs
No response
System Info
no
Who can help?
@yiyixuxu @asomoza