PoC: Rewrite fine_tune.py as train_native.py #1950

Open
wants to merge 1 commit into base: sd3

Conversation

@6DammK9 commented Feb 23, 2025

Referring to Issue #1947 and PR #1359.

  • After code inspection, the base (SD1) script fine_tune.py can be merged with the concepts from train_network.py and becomes train_native.py.
  • Some network-exclusive features (--skip_until_initial_step, --validation_split) have been added (a rough sketch of their behaviour follows this list).
  • Tweaked --mem_eff_attn and --xformers handling to apply more aggressive checking (probably still VAE only?).
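
As a rough, hypothetical sketch of what those two ported behaviours do (the names dataset, train_step, and args here are illustrative assumptions, not the actual train_native.py code):

import random

def split_dataset(dataset, validation_split, seed=42):
    # --validation_split: hold out a fraction of the items for validation.
    items = list(dataset)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * validation_split)
    return items[n_val:], items[:n_val]  # (train items, validation items)

def train(dataloader, args, train_step):
    global_step = 0
    for epoch in range(args.max_train_epochs):
        for batch in dataloader:
            global_step += 1
            # --skip_until_initial_step: fast-forward through the data pipeline
            # without training until the recorded --initial_step is reached.
            if args.skip_until_initial_step and global_step < args.initial_step:
                continue
            train_step(batch)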

Tested with SDXL using this CLI command (note: many features are enabled):

accelerate launch sdxl_train.py \
  --pretrained_model_name_or_path="/run/media/user/PM863a/stable-diffusion-webui/models/Stable-diffusion/x215c-AstolfoMix-24101101-6e545a3.safetensors" \
  --in_json "/run/media/user/Intel P4510 3/just_astolfo/test_lat_v3.json" \
  --train_data_dir="/run/media/user/Intel P4510 3/just_astolfo/test" \
  --output_dir="/run/media/user/Intel P4510 3/astolfo_xl/just_astolfo/model_out" \
  --log_with=tensorboard --logging_dir="/run/media/user/Intel P4510 3/astolfo_xl/just_astolfo/tensorboard" --log_prefix=just_astolfo_25022301_ \
  --seed=25022301 --save_model_as=safetensors --caption_extension=".txt" --enable_wildcard \
  --use_8bit_adam \
  --learning_rate=1e-6 --train_text_encoder --learning_rate_te1=1e-5 --learning_rate_te2=1e-5 \
  --max_train_epochs=4 \
  --xformers --mem_eff_attn --torch_compile --dynamo_backend=inductor --gradient_checkpointing \
  --deepspeed --gradient_accumulation_steps=4 --max_grad_norm=0 \
  --train_batch_size=1 --full_bf16 --mixed_precision=bf16 --save_precision=fp16 \
  --enable_bucket --cache_latents \
  --save_every_n_epochs=2 \
  --skip_until_initial_step --initial_step=1 --initial_epoch=1

And the following accelerate config:

accelerate config
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
In which compute environment are you running?
This machine                                                                                                                                                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Which type of machine are you using?                                                                                                                                  
multi-GPU                                                                                                                                                             
How many different machines will you use (use more than 1 for multi-node training)? [1]: 1                                                                            
Should distributed operations be checked while running for errors? This can avoid timeout issues but will be slower. [yes/NO]: NO                                     
Do you wish to optimize your script with torch dynamo?[yes/NO]:yes                                                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Which dynamo backend would you like to use?                                                                                                                           
inductor                                                                                                                                                              
Do you want to customize the defaults sent to torch.compile? [yes/NO]: NO                                                                                             
Do you want to use DeepSpeed? [yes/NO]: yes                                                                                                                           
Do you want to specify a json file to a DeepSpeed config? [yes/NO]: NO                                                                                                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
What should be your DeepSpeed's ZeRO optimization stage?                                                                                                              
2                                                                                                                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Where to offload optimizer states?                                                                                                                                    
none                                                                                                                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Where to offload parameters?                                                                                                                                          
none                                                                                                                                                                  
How many gradient accumulation steps you're passing in your script? [1]: 4                                                                                            
Do you want to use gradient clipping? [yes/NO]: NO                                                                                                                    
Do you want to enable `deepspeed.zero.Init` when using ZeRO Stage-3 for constructing massive models? [yes/NO]: NO                                                     
Do you want to enable Mixture-of-Experts training (MoE)? [yes/NO]: NO
How many GPU(s) should be used for distributed training? [1]:4
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Do you wish to use FP16 or BF16 (mixed precision)?
bf16                                                                                                                                                                  
accelerate configuration saved at /home/user/.cache/huggingface/accelerate/default_config.yaml 
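
For reference, the saved default_config.yaml should look roughly like the following (an approximate sketch built from the answers above; key names and defaults vary between accelerate versions):

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  gradient_accumulation_steps: 4
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_config:
  dynamo_backend: INDUCTOR
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
use_cpu: false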

(A bit off topic) It runs at around 15.5 s/it (4 cards x 4 accumulation steps) on 4x RTX 3090 24GB (X299 DARK, 10980XE, P4510 4TB).
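For scale, with --train_batch_size=1, 4 GPUs, and --gradient_accumulation_steps=4, one optimizer step covers 1 x 4 x 4 = 16 images (assuming the s/it figure above is measured per optimizer step).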
