Can you please fix multi-GPU training for train_db.py and fine_tune.py in the same manner it was done for train_network.py in the recent commit? For now it is not working as intended. I try to launch
```
accelerate launch --num_cpu_threads_per_process 1 train_db.py
```
on my two GPUs with the same accelerate environment I use for train_network.py (#247). But it turns out that when train_db.py (and fine_tune.py) is launched with accelerate, it consumes several times more GPU memory per GPU (or fails with a CUDA out-of-memory error), which makes multi-GPU training impractical. I think this is a matter of great importance, because the quality of the trained model greatly depends on the effective batch size. Thank you.
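For context, the usual way multi-GPU training is wired up with Hugging Face accelerate is to create a single `Accelerator` and pass the model, optimizer and dataloader through `accelerator.prepare()`, which wraps the model in DDP and shards the dataloader across processes. The sketch below only illustrates that pattern; it is not the actual code of train_db.py or the fix applied to train_network.py, and the model and dataset are placeholders.

```python
# Minimal sketch of the accelerate multi-GPU pattern (placeholder model/data,
# not the actual train_db.py code). Launched with, e.g.:
#   accelerate launch --multi_gpu --num_processes 2 train_sketch.py
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# Placeholder model and data standing in for the U-Net and the image dataset.
model = torch.nn.Linear(64, 64)
dataset = TensorDataset(torch.randn(256, 64), torch.randn(256, 64))
dataloader = DataLoader(dataset, batch_size=8, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# prepare() wraps the model in DDP and shards the dataloader per process,
# so each GPU holds one model replica and sees its own slice of the data.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch, target in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(batch), target)
    accelerator.backward(loss)  # handles gradient sync across GPUs
    optimizer.step()
```

With this setup the effective batch size is the per-GPU batch size × number of GPUs (× gradient accumulation steps), which is why getting the multi-GPU path to fit in memory matters so much here.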