
Multi-GPU nn.DataParallel error after saving the model at every epoch #12

Open
alanli2000tw opened this issue Sep 15, 2023 · 0 comments

Comments

@alanli2000tw

Due to memory constraints, I'm trying to use 2 GPUs to train the model with batch size 8, wrapping the model with PyTorch's nn.DataParallel.
I'm getting the error "Caught RuntimeError in replica 0 on device 0" after saving the model during each training epoch. I'm quite confused and still learning PyTorch.

Do you have any possible solution? Training with batch size 4 does not quite match the original paper's evaluation score, so I want to check whether it's possible to reach the same accuracy as the original paper using batch size 8.

Thank you.
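For context, a common source of replica errors around checkpointing is saving or reloading the DataParallel wrapper itself instead of the underlying module, so the checkpoint keys carry a "module." prefix and a later load puts tensors on the wrong device. The repo's actual model and training loop aren't shown here, so this is only a minimal sketch of the usual pattern, using a stand-in nn.Linear model:

```python
import torch
import torch.nn as nn

# Stand-in for the repo's real network (hypothetical; the actual
# architecture is not shown in this issue).
model = nn.Linear(4, 2)

# Wrap for multi-GPU training. With no visible GPUs, DataParallel
# simply runs the underlying module on a single device.
parallel = nn.DataParallel(model)

# Save the *unwrapped* module's weights. Saving parallel.state_dict()
# instead would prefix every key with "module.", which breaks loading
# into a plain (non-DataParallel) model later.
torch.save(parallel.module.state_dict(), "checkpoint.pt")

# Reloading into a fresh, un-wrapped model then works directly;
# the fresh model can be re-wrapped in DataParallel afterwards.
fresh = nn.Linear(4, 2)
fresh.load_state_dict(torch.load("checkpoint.pt"))
```

This doesn't change the effective batch size: with nn.DataParallel, batch size 8 is split across the two GPUs (4 per replica), so results should be comparable to single-GPU batch size 8.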
