
Multi-GPU nn.DataParallel error after saving the model at every epoch #12

Open
alanli2000tw opened this issue Sep 15, 2023 · 0 comments

Comments

@alanli2000tw

Due to memory constraints, I'm trying to use 2 GPUs to train the model with batch size 8, wrapping the model with PyTorch's nn.DataParallel.
I'm getting the error "Caught RuntimeError in replica 0 on device 0" after saving the model during each training epoch. I'm quite confused and still learning PyTorch.

Do you have any possible solution? Training with batch size 4 does not quite match the original paper's evaluation score, so I want to check whether it's possible to reach the same accuracy as the original paper using batch size 8.

Thank you.
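For context, a common source of replica errors around checkpointing is saving or reloading the DataParallel wrapper itself instead of the underlying module, so the checkpoint keys carry a "module." prefix and a later load puts tensors on the wrong device. The repo's actual model and training loop aren't shown here, so this is only a minimal sketch of the usual pattern, using a stand-in nn.Linear model:

```python
import torch
import torch.nn as nn

# Stand-in for the repo's real network (hypothetical; the actual
# architecture is not shown in this issue).
model = nn.Linear(4, 2)

# Wrap for multi-GPU training. With no visible GPUs, DataParallel
# simply runs the underlying module on a single device.
parallel = nn.DataParallel(model)

# Save the *unwrapped* module's weights. Saving parallel.state_dict()
# instead would prefix every key with "module.", which breaks loading
# into a plain (non-DataParallel) model later.
torch.save(parallel.module.state_dict(), "checkpoint.pt")

# Reloading into a fresh, un-wrapped model then works directly;
# the fresh model can be re-wrapped in DataParallel afterwards.
fresh = nn.Linear(4, 2)
fresh.load_state_dict(torch.load("checkpoint.pt"))
```

This doesn't change the effective batch size: with nn.DataParallel, batch size 8 is split across the two GPUs (4 per replica), so results should be comparable to single-GPU batch size 8.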
