
UserWarning: TypedStorage is deprecated #756

Open

bsalberto77 opened this issue Aug 13, 2023 · 7 comments

@bsalberto77

I'm trying to train a LoRA based on SDXL, and when I start the training it always gives me the same error (I have tried several different configurations). I have a 12 GB NVIDIA 3060 and 32 GB of RAM on the PC.

Thanks in advance!

08:04:47-745672 INFO Start training LoRA Standard ...
08:04:47-746674 INFO Checking for duplicate image filenames in training data directory...
08:04:47-747674 INFO Valid image folder names found in: C:/Users/Usuario/Documents/IA/Kohya/Supercute\img
08:04:47-749676 INFO Valid image folder names found in: C:/Users/Usuario/Documents/IA/Kohya/Supercute\reg
08:04:47-750678 INFO Folder 40_supercute style: 10 images found
08:04:47-751678 INFO Folder 40_supercute style: 400 steps
08:04:47-752678 WARNING Regularisation images are used... Will double the number of steps required...
08:04:47-753679 INFO Total steps: 400
08:04:47-754680 INFO Train batch size: 5
08:04:47-754680 INFO Gradient accumulation steps: 1
08:04:47-755682 INFO Epoch: 10
08:04:47-756682 INFO Regularization factor: 2
08:04:47-757684 INFO max_train_steps (400 / 5 / 1 * 10 * 2) = 1600
08:04:47-758684 INFO stop_text_encoder_training = 0
08:04:47-759685 INFO lr_warmup_steps = 0
08:04:47-760686 INFO Saving training config to C:/Users/Usuario/Documents/IA/Kohya/Supercute\model\XL_supercute_style_20230813-080447.json...
08:04:47-763690 INFO accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="C:/Users/Usuario/Documents/IA/SD
XL/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors" --train_data_dir="C:/Users/Usuario/Documents/IA/Kohya/Supercute\img"
--reg_data_dir="C:/Users/Usuario/Documents/IA/Kohya/Supercute\reg" --resolution="1024,1024" --output_dir="C:/Users/Usuario/Documents/IA/Kohya/Supercute\model"
--logging_dir="C:/Users/Usuario/Documents/IA/Kohya/Supercute\log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=0.0009 --unet_lr=0.0009 --network_dim=256
--output_name="XL_supercute_style" --lr_scheduler_num_cycles="10" --no_half_vae --learning_rate="0.0009" --lr_scheduler="constant" --train_batch_size="5" --max_train_steps="1600" --save_every_n_epochs="1"
--mixed_precision="bf16" --save_precision="bf16" --caption_extension=".txt" --cache_latents --cache_latents_to_disk --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False
warmup_init=False --max_data_loader_n_workers="0" --bucket_reso_steps=64 --gradient_checkpointing --xformers --bucket_no_upscale --noise_offset=0.0
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
prepare tokenizers
Using DreamBooth method.
prepare images.
found directory C:\Users\Usuario\Documents\IA\Kohya\Supercute\img\40_supercute style contains 10 image files
found directory C:\Users\Usuario\Documents\IA\Kohya\Supercute\reg\1_style contains 1000 image files
No caption file found for 1000 images. Training will continue without captions for these images. If class token exists, it will be used.
C:\Users\Usuario\Documents\IA\Kohya\Supercute\reg\1_style\style_0001.jpg
C:\Users\Usuario\Documents\IA\Kohya\Supercute\reg\1_style\style_0002.jpg
C:\Users\Usuario\Documents\IA\Kohya\Supercute\reg\1_style\style_0003.jpg
C:\Users\Usuario\Documents\IA\Kohya\Supercute\reg\1_style\style_0004.jpg
C:\Users\Usuario\Documents\IA\Kohya\Supercute\reg\1_style\style_0005.jpg
C:\Users\Usuario\Documents\IA\Kohya\Supercute\reg\1_style\style_0006.jpg... and 995 more
400 train images with repeating.
1000 reg images.
some of the reg images are not used (there are more regularization images than needed, so some will be skipped)
[Dataset 0]
batch_size: 5
resolution: (1024, 1024)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 2048
bucket_reso_steps: 64
bucket_no_upscale: True

[Subset 0 of Dataset 0]
image_dir: "C:\Users\Usuario\Documents\IA\Kohya\Supercute\img\40_supercute style"
image_count: 10
num_repeats: 40
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: supercute style
caption_extension: .txt

[Subset 1 of Dataset 0]
image_dir: "C:\Users\Usuario\Documents\IA\Kohya\Supercute\reg\1_style"
image_count: 1000
num_repeats: 1
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: True
class_tokens: style
caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 410/410 [00:00<00:00, 1357.17it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically
number of images per bucket (including repeats)
bucket 0: resolution (1024, 1024), count: 800
mean ar error (without repeats): 0.0
Warning: SDXL has been trained with noise_offset=0.0357
noise_offset is set to 0.0
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: C:/Users/Usuario/Documents/IA/SD XL/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors
building U-Net
loading U-Net from checkpoint
U-Net:
building text encoders
loading text encoders from checkpoint
text encoder 1:
text encoder 2:
building VAE
loading VAE from checkpoint
VAE:
Enable xformers for U-Net
import network module: networks.lora
[Dataset 0]
caching latents.
checking cache validity...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 410/410 [00:00<00:00, 420.47it/s]
caching latents...
0it [00:00, ?it/s]
create LoRA network. base dim (rank): 256, alpha: 1.0
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder 1:
create LoRA for Text Encoder 2:
create LoRA for Text Encoder: 264 modules.
create LoRA for U-Net: 722 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use Adafactor optimizer | {'scale_parameter': False, 'relative_step': False, 'warmup_init': False}
because max_grad_norm is set, clip_grad_norm is enabled; consider setting it to 0 to disable it
the constant_with_warmup scheduler may be a better choice
running training
num train images * repeats: 400
num reg images: 1000
num batches per epoch: 160
num epochs: 10
batch size per device: 5
gradient accumulation steps: 1
total optimization steps: 1600
steps: 0%| | 0/1600 [00:00<?, ?it/s]
epoch 1/10
C:\Users\Usuario\Documents\IA\Kohya\kohya_ss\venv\lib\site-packages\xformers\ops\fmha\flash.py:339: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
and inp.query.storage().data_ptr() == inp.key.storage().data_ptr()

@DarkAlchy

A warning that has been appearing for a couple of months now, maybe even a few, and it's nothing to worry about until TypedStorage actually gets deprecated and removed. It is just telling Kohya to wake up and change this before it becomes obsolete.
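Until it's fixed upstream, the message can be hidden with a standard `warnings` filter. A minimal sketch, assuming you only want to keep the notice out of the training logs (where exactly you place it, e.g. the top of the launcher script, is up to you):

```python
import warnings

# Hide the TypedStorage deprecation notice raised from xformers/PyTorch.
# This only suppresses the message; it does not change training behavior.
warnings.filterwarnings(
    "ignore",
    message="TypedStorage is deprecated",
    category=UserWarning,
)
```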

@kohya-ss (Owner)

The warning seems to come from xformers, and I still get it with 0.0.20. I hope a future version of xformers will handle it.
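For reference, the deprecated API and its replacement return the same pointer today, so the eventual fix in xformers' `flash.py` would be mechanical: call `untyped_storage()` instead of `storage()`. A minimal sketch in plain PyTorch (not the actual xformers patch):

```python
import torch

t = torch.zeros(4)

# Deprecated: Tensor.storage() returns a TypedStorage and triggers the warning.
old_ptr = t.storage().data_ptr()

# Forward-compatible replacement suggested by the warning text.
new_ptr = t.untyped_storage().data_ptr()

assert old_ptr == new_ptr  # both views expose the same underlying allocation
```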

@DarkAlchy

> The warning seems to come from xformers, and I still get it with 0.0.20. I hope a future version of xformers will handle it.

I have all but given up on xformers these days, as I found SDP to be about as good and less of a headache on my 4090.
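For anyone unfamiliar: "SDP" here means PyTorch 2.x's built-in `torch.nn.functional.scaled_dot_product_attention`, which dispatches to a fused (flash or memory-efficient) kernel on its own, so no separate xformers install is needed; recent sd-scripts versions expose it via an `--sdpa` flag in place of `--xformers`. A minimal sketch of the underlying call, with toy shapes chosen only for illustration:

```python
import torch
import torch.nn.functional as F

# Toy attention inputs: (batch, heads, sequence_length, head_dim).
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)

# PyTorch picks the fastest available backend (flash / mem-efficient / math).
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```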

@etha302
etha302 commented Aug 23, 2023

> I have all but given up on xformers these days, as I found SDP to be about as good and less of a headache on my 4090.

Hi, could you possibly explain in greater detail how you achieved decent speeds in kohya with your 4090? I only use the GUI, so I'm not used to command-line-only training, and I'm on Windows for now.

@DarkAlchy

> Hi, could you possibly explain in greater detail how you achieved decent speeds in kohya with your 4090?

I can't, because in the tests I did on a 4090 the results were abysmal. See bmaltais/kohya_ss#961 (comment) and all of my follow-ups there; I did the testing, and something is seriously wrong.

@etha302
etha302 commented Aug 23, 2023

> I can't, because in the tests I did on a 4090 the results were abysmal. See bmaltais/kohya_ss#961 (comment) and all of my follow-ups there; I did the testing, and something is seriously wrong.

Ok! Yeah, the best I can get on my 4090 is 1.35 s/it (about 0.74 it/s) at 19 GB of VRAM usage, while on my friend's 3090 I got around 1 it/s. That's SDXL LoRA training, so I don't know what's going on; in this case I'd be better off with the much cheaper 3090.

@DarkAlchy

> Ok! Yeah, the best I can get on my 4090 is 1.35 s/it (about 0.74 it/s) at 19 GB of VRAM usage, while on my friend's 3090 I got around 1 it/s.

Yeah, my credit card bill came today, to be paid off on Sept 6, so I feel you. Honestly, I feel as if I was suckered and stolen from. If you look at my trials, 2.1 training on Windows is terrible (that test was 2.1, and XL is even worse); I was slower than a T4. I no longer train on Windows: my friend has a 3090, and no matter what I tried his settings always gave me OOM, so I went over to Linux, changed only the paths, and it worked without a hiccup. I did end up using about a gig more VRAM than he did.

My point is that this testing showed the SDXL version of the scripts is broken even on 2.1, while going back to the pre-SDXL scripts recovered some speed. Number one, the Kohya XL scripts have an undeniable issue, which I showed. Number two, NVIDIA says there will be some fixes in a future driver release. In other words, as owners of Ada-based cards (and it is all Ada-based cards, even the $6k RTX 6000 Ada), we got shafted, and after all the stunts Jensen has pulled I wouldn't put it past him to protect his Hopper sales by LHR-ing the cards for AI work. Unlike the old LHR for crypto, this would live in the driver code, not the hardware, so once that code is removed we would get our speed back; we should be 57% faster than a 3090, not slower.
