training problems.. #18
Comments
Please check the GPU requirements for running the code here: #4
Thank you, Professor.
What are your current training settings?
Thank you for your attention. I am currently using a single A100 80GB. As shown in the attached image, memory usage gradually increases and appears to converge, but it keeps rising slowly until it eventually results in an OOM error. Have you also encountered this issue? I also have a second question.

My third question is about the text processing in the code. I noticed there is an implementation that truncates text exceeding 26 tokens, but it does not seem to be executed. Since the transformer processes only a single batch at a time, does this mean that variable-length text is fed into the model directly, without truncation? If a larger batch size were possible, would additional logic have been needed to adjust text lengths?

I know this is a long question, but your responses have been incredibly helpful for our research. Thank you!
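(For illustration only: a minimal sketch of how a maximum text length such as 26 tokens is typically enforced when batching variable-length sequences. The function and parameter names below, e.g. `collate_text`, `max_text_len`, `pad_id`, are assumptions, not the repository's actual API.)

```python
# Illustrative sketch only -- not the repository's code. Assumes token IDs have
# already been produced by some tokenizer; names here are made up.
import torch

def collate_text(token_id_lists, max_text_len=26, pad_id=0):
    """Truncate each sequence to max_text_len, then right-pad to a common length."""
    truncated = [ids[:max_text_len] for ids in token_id_lists]
    longest = max(len(ids) for ids in truncated)
    batch = torch.full((len(truncated), longest), pad_id, dtype=torch.long)
    mask = torch.zeros((len(truncated), longest), dtype=torch.bool)
    for i, ids in enumerate(truncated):
        batch[i, : len(ids)] = torch.tensor(ids, dtype=torch.long)
        mask[i, : len(ids)] = True  # marks real tokens vs. padding
    return batch, mask

# With batch size 1 there is nothing to pad, so an unpadded variable-length
# sequence can be fed to the transformer directly, matching the behaviour
# described above.
batch, mask = collate_text([[5, 9, 2], [7, 1, 4, 8, 3]])
print(batch.shape, mask.shape)  # torch.Size([2, 5]) torch.Size([2, 5])
```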
Thank you for your interest in our work.
Thank you very much.
I'm trying to train your model from scratch using the provided dataset (no transfer learning).
The backbone is frozen, but I'm running into OOM errors and really long training times. I'm using a single A100 (80GB).
Could you share the setup you used for training? Specifically, which GPU you used, how much memory it required, and how long training took? Also, do certain datasets take much longer to train than others?
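(Illustrative note: when GPU memory grows slowly until an OOM, a common culprit in PyTorch training loops is accumulating the loss tensor itself for logging, which keeps every iteration's computation graph alive. The sketch below is a generic example under that assumption, not this repository's trainer.)

```python
# Generic PyTorch sketch, not this repository's training loop. Assumes `model`
# returns a scalar loss for a batch.
import torch

def train_epoch(model, loader, optimizer, device="cuda"):
    running_loss = 0.0
    for batch in loader:
        optimizer.zero_grad(set_to_none=True)
        loss = model(batch.to(device))
        loss.backward()
        optimizer.step()
        # .item() detaches the value from the autograd graph; accumulating the
        # `loss` tensor directly would retain each iteration's graph and make
        # GPU memory creep upward over the epoch.
        running_loss += loss.item()
    # Optional: log allocation to check whether usage truly plateaus.
    print(f"allocated: {torch.cuda.memory_allocated(device) / 2**30:.2f} GiB")
    return running_loss / len(loader)
```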