[PPDiffusers] ppdiffuser LDM weight to original LDM weight script (#3809)

* Convert PPDiffusers LDM weights to original LDM weights
* typo
* update args

Co-authored-by: gongenlei <gongel@qq.com>
PaddlePaddle · Nov 19, 2022 · cd78a9a
1 parent de3509c
commit cd78a9a

Showing 6 changed files with 395 additions and 14 deletions.
diff --git a/ppdiffusers/examples/text_to_image_laion400m/README.md b/ppdiffusers/examples/text_to_image_laion400m/README.md
@@ -2,18 +2,23 @@
 
 This tutorial walks through training a 32-layer **Latent Diffusion Model** (switching between the `Chinese` and `English` tokenizers is supported).
 
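+___Note___:
+___The official 32-layer `CompVis/ldm-text2im-large-256` Latent Diffusion Model uses a vae, not a vqvae! The Hugging Face team mistakenly named the folder vqvae when designing the directory layout. To stay consistent with the Hugging Face team, we use the vqvae folder name as well!___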
+
 ## 1 Running locally
 ### 1.1 Install dependencies
 
 Before running the training code, install the following training dependencies.
 
+___Note___:
+___The code in this section currently requires the develop branch of paddlenlp and the develop branch of ppdiffusers to run correctly!___
+
 ```bash
 # Install the develop build of paddle for cuda11.2 and python 3.7, at commit b96a21df4e7a42b2445104426e2be407534705e6.
 wget https://paddlenlp.bj.bcebos.com/models/community/CompVis/paddlepaddle_gpu-0.0.0.post112-cp37-cp37m-linux_x86_64.whl
 pip install paddlepaddle_gpu-0.0.0.post112-cp37-cp37m-linux_x86_64.whl
-# Install the pinned versions of paddlenlp and ppdiffusers.
-pip install paddlenlp==2.4.2 ppdiffusers==0.6.2
-pip install -U visualdl fastcore Pillow
+# Note: training in this section currently requires the develop branches of paddlenlp and ppdiffusers.
+pip install -U paddlenlp ppdiffusers visualdl fastcore Pillow
 ```
 
 ### 1.2 Prepare the data
@@ -239,7 +244,7 @@ python generate_pipelines.py \
 ```shell
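 ├── ldm_pipelines  # the output directory we specified
     ├── model_index.json # model index file
-    ├── vqvae # vae weights folder
+    ├── vqvae # vae weights folder! This is actually a vae model; the folder name is kept consistent with HF!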
         ├── model_state.pdparams
         ├── config.json
     ├── bert # ldmbert weights folder

diff --git a/ppdiffusers/examples/text_to_image_laion400m/ldm/ldm_args.py b/ppdiffusers/examples/text_to_image_laion400m/ldm/ldm_args.py
@@ -61,17 +61,15 @@ class DataArguments:
     """
     Arguments pertaining to what data we are going to input our model for training.
     """
-    file_list: Optional[str] = field(
-        default="./data/filelist/train.filelist.list",
-        metadata={"help": "The name of the file_list."})
-    resolution: Optional[str] = field(
+    file_list: str = field(default="./data/filelist/train.filelist.list",
+                           metadata={"help": "The name of the file_list."})
+    resolution: int = field(
         default=256,
         metadata={
             "help":
             "The resolution for input images, all the images in the train/validation dataset will be resized to this resolution."
         })
-    num_records: Optional[str] = field(default=10000000,
-                                       metadata={"help": "num_records"})
+    num_records: int = field(default=10000000, metadata={"help": "num_records"})
     buffer_size: int = field(
         default=100,
         metadata={"help": "Buffer size"},
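
Reviewer note on the annotation change above: the following is a minimal sketch, not the project's parser, of why `resolution: int` matters when dataclass fields drive command-line parsing. It assumes the real argument parser derives each option's converter from the field annotation, as the hypothetical `parse_into` helper does here with plain `argparse`; the two dataclasses are simplified stand-ins for the old and new `DataArguments`.

```python
import argparse
from dataclasses import dataclass, field, fields


@dataclass
class OldDataArguments:
    # Stand-in for the old annotation: a string type with an integer default.
    resolution: str = field(default=256)


@dataclass
class NewDataArguments:
    # Stand-in for the new annotation: the type matches the default.
    resolution: int = field(default=256)


def parse_into(cls, argv):
    """Hypothetical helper: build an argparse parser from dataclass fields."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        # The field annotation is used as the argparse converter.
        parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    return cls(**vars(parser.parse_args(argv)))


old = parse_into(OldDataArguments, ["--resolution", "512"])
new = parse_into(NewDataArguments, ["--resolution", "512"])

print(type(old.resolution).__name__)  # the value stays a string
print(type(new.resolution).__name__)  # the value is converted to an integer
```

Under this assumption, the old annotation only works as long as the default is used; any value passed on the command line arrives as a string and can fail later in image-resizing code that expects an integer.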

diff --git a/ppdiffusers/examples/text_to_image_laion400m/ldm/model.py b/ppdiffusers/examples/text_to_image_laion400m/ldm/model.py
@@ -49,7 +49,7 @@ def __init__(self, model_args):
 
         # init vae
         vae_name_or_path = model_args.vae_name_or_path if model_args.pretrained_model_name_or_path is None else os.path.join(
-            model_args.pretrained_model_name_or_path, "vae")
+            model_args.pretrained_model_name_or_path, "vqvae")
         self.vae = AutoencoderKL.from_pretrained(vae_name_or_path)
         freeze_params(self.vae.parameters())
         logger.info("Freeze vae parameters!")
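
Reviewer note on the path change above: a small sketch of the lookup rule this hunk implements, written as a standalone function so it can be checked without Paddle installed. The function name `resolve_vae_path` and its `subfolder` parameter are hypothetical and exist only for illustration; the `"vqvae"` folder name comes from the diff.

```python
import os


def resolve_vae_path(pretrained_model_name_or_path, vae_name_or_path=None,
                     subfolder="vqvae"):
    """Mirror the conditional in the hunk: prefer the pipeline subfolder."""
    if pretrained_model_name_or_path is None:
        # No full pipeline given: fall back to the standalone vae path.
        return vae_name_or_path
    # A full pipeline directory stores the vae under the HF-style folder name.
    return os.path.join(pretrained_model_name_or_path, subfolder)


print(resolve_vae_path("./ldm_pipelines"))
print(resolve_vae_path(None, vae_name_or_path="./my_vae"))
```

With the previous `"vae"` subfolder, a directory produced by `generate_pipelines.py` would not contain the expected path, since that script writes the weights under `vqvae`.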

diff --git a/ppdiffusers/examples/text_to_image_laion400m/scripts/README.md b/ppdiffusers/examples/text_to_image_laion400m/scripts/README.md
@@ -1,6 +1,10 @@
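-# Converting original PyTorch LDM weights to PPDiffusers weights
+# LDM weight conversion scripts
+This directory contains two scripts:
+- **convert_orig_ldm_ckpt_to_ppdiffusers.py**: converts original PyTorch LDM weights to PPDiffusers LDM weights.
+- **convert_ppdiffusers_to_orig_ldm_ckpt.py**: converts PPDiffusers LDM weights to original LDM weights.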
 
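-## 1. Convert the weights
+## 1. Converting original PyTorch LDM weights to PPDiffusers LDM weights
+### 1.1 Convert the weights
 Assume the original weights `"ldm_1p4b_init0.ckpt"` are already available.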
 ```bash
 python convert_orig_ldm_ckpt_to_ppdiffusers.py \
@@ -9,7 +13,7 @@ python convert_orig_ldm_ckpt_to_ppdiffusers.py \
     --original_config_file text2img_L32H1280_unet800M.yaml
 ```
 
-## 2. Inference
+### 1.2 Inference
 ```python
 import paddle
 from ppdiffusers import LDMTextToImagePipeline
@@ -19,3 +23,38 @@ prompt = "a blue tshirt"
 image = pipe(prompt, guidance_scale=7.5)[0][0]
 image.save("demo.jpg")
 ```
+
+## 2. Converting PPDiffusers LDM weights to original LDM weights
+### 2.1 Convert the weights
+Assume the `ldm_pipelines` directory has already been generated with `../generate_pipelines.py`.
+```shell
+├── ldm_pipelines  # the output directory we specified
+    ├── model_index.json # model index file
+    ├── vqvae # vae weights folder! This is actually a vae model; the folder name is kept consistent with HF!
+        ├── model_state.pdparams
+        ├── config.json
+    ├── bert # ldmbert weights folder
+        ├── model_config.json
+        ├── model_state.pdparams
+    ├── unet # unet weights folder
+        ├── model_state.pdparams
+        ├── config.json
+    ├── scheduler # ddim scheduler folder
+        ├── scheduler_config.json
+    ├── tokenizer # bert tokenizer folder
+        ├── tokenizer_config.json
+        ├── special_tokens_map.json
+        ├── vocab.txt
+```
+
+```bash
+python convert_ppdiffusers_to_orig_ldm_ckpt.py \
+    --model_name_or_path ./ldm_pipelines \
+    --dump_path ldm_19w.ckpt
+```
+
+### 2.2 Inference
+Generate images with the `CompVis` [original txt2img.py](https://github.com/CompVis/latent-diffusion/blob/main/scripts/txt2img.py) script.
+```shell
+python ./txt2img.py --prompt "a blue t shirt" --ddim_eta 0.0 --n_samples 1 --n_iter 1 --scale 7.5  --ddim_steps 50
+```
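
Reviewer note on section 2.1 of the new README: before running the conversion script, it may help to confirm that the pipeline directory matches the tree listed there. The sketch below is an optional pre-flight check, not part of this commit; the helper name `missing_pipeline_files` is hypothetical, and the file list is copied from the README tree.

```python
from pathlib import Path

# Expected layout, copied from the directory tree in the README.
EXPECTED_FILES = [
    "model_index.json",
    "vqvae/model_state.pdparams",
    "vqvae/config.json",
    "bert/model_config.json",
    "bert/model_state.pdparams",
    "unet/model_state.pdparams",
    "unet/config.json",
    "scheduler/scheduler_config.json",
    "tokenizer/tokenizer_config.json",
    "tokenizer/special_tokens_map.json",
    "tokenizer/vocab.txt",
]


def missing_pipeline_files(root):
    """Return the expected files that are absent under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_pipeline_files("./ldm_pipelines")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("ldm_pipelines layout looks complete.")
```

An empty result only means the files exist; it says nothing about whether the weights inside them are valid.
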
diff --git a/ppdiffusers/examples/text_to_image_laion400m/scripts/convert_orig_ldm_ckpt_to_ppdiffusers.py b/ppdiffusers/examples/text_to_image_laion400m/scripts/convert_orig_ldm_ckpt_to_ppdiffusers.py
@@ -1,5 +1,6 @@
 # Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
 # Copyright 2022 The HuggingFace Inc. team.
+#
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at