Code for initializing CausalConv3d from pretrained Conv2D #168

Sutongtong233 · 2024-03-29T17:13:43Z

Hi, I noticed that CausalVideoVAE.md mentions a special initialization (tail initialization) for training CausalConv3d. I'm interested in this trick and would be sincerely grateful if you could share the specific initialization code.

vivym · 2024-03-29T21:03:08Z

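```python
import torch

w = vae_2d_ckpt["state_dict"][key_2d]         # pretrained conv2d weight: (out_ch, in_ch, kH, kW)
new_w = torch.zeros(shape_3d, dtype=w.dtype)  # shape_3d = (out_ch, in_ch, kT, kH, kW)
new_w[:, :, -1, :, :] = w                     # tail: only the last (current-frame) temporal tap is non-zero
```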

https://github.com/vivym/OmniGen/blob/main/scripts/inflate_conv_for_video_vae.py
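vivym's snippet inflates one kernel at a time. A minimal sketch of applying the same tail initialization to a whole checkpoint might look like the following. It assumes the 2D and 3D models use identical parameter names and that the 2D weights are passed as a plain state dict (not nested under a `"state_dict"` key); `inflate_state_dict` and the toy two-layer models are hypothetical and not taken from the linked script.

```python
import torch
import torch.nn as nn


def inflate_state_dict(sd2d: dict, model3d: nn.Module) -> dict:
    """Tail-initialize every 5-D conv weight in model3d from the matching 4-D weight in sd2d.

    Assumption: both models use identical parameter names. Tensors whose shapes
    already agree (biases, norms) are copied unchanged.
    """
    sd3d = model3d.state_dict()
    for key, target in sd3d.items():
        src = sd2d.get(key)
        if src is None:
            continue  # parameter exists only in the 3D model; keep its default init
        if target.dim() == 5 and src.dim() == 4:
            new_w = torch.zeros_like(target)  # (out_ch, in_ch, kT, kH, kW), all zeros
            new_w[:, :, -1] = src             # tail initialization
            sd3d[key] = new_w
        elif target.shape == src.shape:
            sd3d[key] = src.clone()
        else:
            raise ValueError(f"shape mismatch for {key}: {tuple(src.shape)} vs {tuple(target.shape)}")
    return sd3d


# Toy models (hypothetical) with matching parameter names "0.weight", "0.bias", ...
model2d = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.SiLU(), nn.Conv2d(8, 3, 3, padding=1))
model3d = nn.Sequential(
    nn.Conv3d(3, 8, 3, padding=(0, 1, 1)), nn.SiLU(), nn.Conv3d(8, 3, 3, padding=(0, 1, 1))
)
model3d.load_state_dict(inflate_state_dict(model2d.state_dict(), model3d))

w = model3d[0].weight
print(torch.equal(w[:, :, -1], model2d[0].weight), w[:, :, :-1].abs().sum().item())  # True 0.0
```

A 3D model whose modules are named differently would need a key-mapping step before this loop.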

Birdylx · 2024-03-30T11:53:28Z

@vivym Thanks! I have another question about the temporal upsampling at this line: https://github.com/vivym/OmniGen/blob/4f0bf7d7f7dcb6b1b79b50c90153f7477151e139/src/omni_gen/models/video_vae/upsamplers.py#L87. It isn't a 2x upsample; the output always has an odd number of frames.

vivym · 2024-03-30T12:21:12Z

@Birdylx It is indeed an odd number of frames. You can refer to the paper https://arxiv.org/abs/2310.05737
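One way to get this behaviour, as a sketch and not the code at the linked line: keep the first frame once and duplicate every later frame, so T frames become 2T - 1, which is always odd. `causal_temporal_upsample` is a hypothetical name, and nearest-neighbour repetition is an assumption; a learned upsampler would replace the `repeat_interleave` call.

```python
import torch


def causal_temporal_upsample(x: torch.Tensor) -> torch.Tensor:
    """Nearest-neighbour 2x temporal upsampling that keeps the first frame single.

    x: (B, C, T, H, W) -> (B, C, 2T - 1, H, W). Frame 0 is kept once and every
    later frame is repeated twice, so the output length is always odd.
    """
    first, rest = x[:, :, :1], x[:, :, 1:]
    return torch.cat([first, rest.repeat_interleave(2, dim=2)], dim=2)


latents = torch.randn(1, 4, 5, 8, 8)  # 5 latent frames
up1 = causal_temporal_upsample(latents)
up2 = causal_temporal_upsample(up1)
print(up1.shape[2], up2.shape[2])  # 9 17
```

Applied twice, 5 latent frames become 17 = 1 + 4 * 4 frames: a 4x temporal factor on everything after the first frame, which stays a standalone, image-like frame.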

Birdylx · 2024-03-30T12:25:55Z

@vivym Thanks for your quick reply! I will read the paper for more details.

Birdylx · 2024-03-30T13:08:06Z

@vivym Do you train the full model, or freeze it and train only the temporal blocks?

Sutongtong233 · 2024-03-30T16:11:34Z

Thanks:) I will have a try.

Sutongtong233 · 2024-04-02T16:44:37Z

It works. Thanks a lot!

Sutongtong233 · 2024-04-08T16:40:53Z

I see that your latest doc mentions "Despite the VAE in Diffusion training being frozen". Does that mean you've found that freezing the 2D VAE weights (the "tail" of the causal Conv3d) performs better?
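If you want to try freezing only the inflated 2D kernel (the "tail" tap) while training the remaining temporal taps of a causal Conv3d, a gradient hook is one option. This is a hypothetical sketch, not a method confirmed in this thread; `freeze_tail_tap` is an illustrative name. The demo uses plain SGD; with decoupled weight decay (e.g. AdamW) the tail would still shrink unless that parameter's weight decay is set to 0.

```python
import torch
import torch.nn as nn


def freeze_tail_tap(conv3d: nn.Conv3d):
    """Zero the gradient of the last temporal tap (the inflated 2D kernel).

    Only the earlier temporal taps (and the bias) receive gradients. Returns the
    hook handle so the freeze can be undone with handle.remove().
    """
    def hook(grad: torch.Tensor) -> torch.Tensor:
        grad = grad.clone()
        grad[:, :, -1] = 0
        return grad

    return conv3d.weight.register_hook(hook)


torch.manual_seed(0)
conv3d = nn.Conv3d(4, 4, kernel_size=3, padding=(0, 1, 1))
with torch.no_grad():
    conv3d.weight.zero_()
    conv3d.weight[:, :, -1] = torch.randn(4, 4, 3, 3)  # stands in for a pretrained 2D kernel
tail_before = conv3d.weight[:, :, -1].clone()

handle = freeze_tail_tap(conv3d)
opt = torch.optim.SGD(conv3d.parameters(), lr=0.1)  # plain SGD: no momentum, no weight decay
loss = conv3d(torch.randn(2, 4, 5, 8, 8)).pow(2).mean()
loss.backward()
opt.step()

print(torch.equal(conv3d.weight[:, :, -1], tail_before))  # True: 2D kernel unchanged
print(conv3d.weight[:, :, :-1].abs().sum().item() > 0)    # True: temporal taps were updated
```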

Sutongtong233 · 2024-04-09T07:51:04Z

> @vivym Do you train the full model? or freeze the model, just train the temporal block?

I've tried training the full model: motion blurring is alleviated, but single-frame reconstruction degrades.

Catpp01 · 2024-10-07T15:08:05Z

```python
w = vae_2d_ckpt["state_dict"][key_2d]         # conv2d weight: (out_ch, in_ch, kH, kW)
new_w = torch.zeros(shape_3d, dtype=w.dtype)  # shape_3d = (out_ch, in_ch, kT, kH, kW)
new_w[:, :, -1, :, :] = w                     # tail initialization
# center:  new_w[:, :, kT // 2, :, :] = w
# average: new_w[:] = w.unsqueeze(2) / kT     # every temporal tap gets w / kT
```
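To see how the three schemes above behave under causal padding, the sketch below inflates one random 3x3 kernel each way and compares the result against per-frame 2D convolution. Assumptions: temporal padding replicates the first frame `kT - 1` times on the left, spatial padding is 1, and `kT = 3`; `inflate_weight` and `causal_conv3d` are hypothetical helpers.

```python
import torch
import torch.nn.functional as F


def inflate_weight(w2d: torch.Tensor, kt: int, mode: str) -> torch.Tensor:
    """(out, in, kH, kW) -> (out, in, kT, kH, kW) using the tail, center or average scheme."""
    w3d = torch.zeros(w2d.shape[0], w2d.shape[1], kt, *w2d.shape[2:], dtype=w2d.dtype)
    if mode == "tail":
        w3d[:, :, -1] = w2d
    elif mode == "center":
        w3d[:, :, kt // 2] = w2d
    elif mode == "average":
        w3d[:] = w2d.unsqueeze(2) / kt  # broadcast the 2D kernel over all kT taps
    else:
        raise ValueError(mode)
    return w3d


def causal_conv3d(x: torch.Tensor, w3d: torch.Tensor) -> torch.Tensor:
    """Left-pad time with copies of frame 0; spatial padding 1 (assumes 3x3 kernels)."""
    kt = w3d.shape[2]
    x = torch.cat([x[:, :, :1].repeat(1, 1, kt - 1, 1, 1), x], dim=2)
    return F.conv3d(x, w3d, padding=(0, 1, 1))


torch.manual_seed(0)
w2d = torch.randn(8, 4, 3, 3)
video = torch.randn(1, 4, 6, 16, 16)  # 6 distinct frames
per_frame = torch.stack([F.conv2d(video[:, :, t], w2d, padding=1) for t in range(6)], dim=2)

kt = 3
tail = causal_conv3d(video, inflate_weight(w2d, kt, "tail"))
center = causal_conv3d(video, inflate_weight(w2d, kt, "center"))
average = causal_conv3d(video, inflate_weight(w2d, kt, "average"))

# Average init equals the 2D response to the mean of frames t-2..t (checked for t >= 2).
expected = torch.stack(
    [F.conv2d(video[:, :, t - 2:t + 1].mean(dim=2), w2d, padding=1) for t in range(2, 6)], dim=2
)
print(torch.allclose(tail, per_frame, atol=1e-4))                         # True
print(torch.allclose(center[:, :, 1:], per_frame[:, :, :-1], atol=1e-4))  # True: one-frame delay
print(torch.allclose(average[:, :, 2:], expected, atol=1e-4))             # True
```

Under these assumptions, tail initialization reproduces the 2D model frame by frame, center initialization reproduces it with a one-frame delay (for `kT = 3`), and average initialization returns the 2D response to a running mean of the last `kT` frames, which blurs moving content. That may be one reason tail initialization is the natural starting point for a causal Conv3d.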

qqingzheng mentioned this issue on Mar 31, 2024 in PR #145, "[feat] add casual vqvae ✨" (merged)

Sutongtong233 closed this as completed Apr 2, 2024

Sutongtong233 reopened this Apr 8, 2024

qqingzheng mentioned this issue on Apr 16, 2024 in #246, "temporal embedding design in VideoVAE" (closed)

LinB203 mentioned this issue on Jul 9, 2024 in #331, "Tile inflation code" (open)
