# StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework

<!-- [](https://h-y1heng.github.io/StableMoFusion-page/)
[](https://steve-zeyu-zhang.github.io/MotionMamba/)
[](https://arxiv.org/abs/2403.07487) [](https://huggingface.co/papers/2403.07487) -->


The official PyTorch implementation of the paper [**"StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework"**](https://arxiv.org/abs/).

<!-- StableMoFusion is a diffusion-based text-to-motion generation framework, -->

<!-- ## News -->

## Getting Started

This code was tested on an `NVIDIA A100` GPU and requires:

* Anaconda3 or Miniconda3

a. Create a conda virtual environment and activate it.

```shell
conda create -n stablemofusion python=3.8 -y
conda activate stablemofusion
```

b. Install PyTorch 1.10.0 following the [official instructions](https://pytorch.org/).
```shell
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
```

**Important:** Make sure that your compilation CUDA version and runtime CUDA version match.
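
To quickly confirm the match, here is an optional sanity check (not part of the repository) that prints the CUDA toolkit version PyTorch was compiled against and whether a GPU is usable at runtime:

```python
# Optional check for the PyTorch 1.10.0 + cudatoolkit 11.3 setup described above.
import torch

print(torch.__version__)          # expected: 1.10.0
print(torch.version.cuda)         # CUDA version PyTorch was compiled with, expected: 11.3
print(torch.cuda.is_available())  # True if the runtime driver and GPU are usable
```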

c. Install the remaining requirements:

```shell
pip install -r requirements.txt
```

d. Install ffmpeg for visualization:
```shell
conda install ffmpeg x264=20131218 -c conda-forge
```

e. Modify the `LayerNorm` module in CLIP for fp16 inference:
```python
# Patch miniconda3/envs/stablemofusion/lib/python3.8/site-packages/clip/model.py
class LayerNorm(nn.LayerNorm):
    """Subclass torch's LayerNorm to handle fp16."""

    def forward(self, x: torch.Tensor):
        if self.weight.dtype == torch.float32:
            # Weights are fp32: cast the input up, normalize, then cast back.
            orig_type = x.dtype
            ret = super().forward(x.type(torch.float32))
            return ret.type(orig_type)
        else:
            # Weights are already fp16: run LayerNorm directly in half precision.
            return super().forward(x)
```

## Quick Start
1. Download the pre-trained models from [Google Drive](https://drive.google.com/drive/folders/1o3h0DHEz5gKG-9cTdl3lUEwjwW51Ay81?usp=sharing), put them into `./checkpoints/`, and arrange them in the following file structure:
```text
StableMoFusion
├── checkpoints
│   ├── kit
│   │   └── kit_condunet1d_batch64
│   │       ├── meta
│   │       │   ├── mean.npy
│   │       │   └── std.npy
│   │       ├── model
│   │       │   └── latest.tar
│   │       └── opt.txt
│   ├── t2m
│   │   └── t2m_condunet1d_batch64
│   │       ├── meta
│   │       │   ├── mean.npy
│   │       │   └── std.npy
│   │       ├── model
│   │       │   └── latest.tar
│   │       └── opt.txt
│   └── footskate
│       ├── underpressure_pretrained.tar
│       └── t2m_pretrained.tar
```
2. Download the [UnderPressure code](https://github.com/InterDigitalInc/UnderPressure) and put it into `./UnderPressure/` like:
```
StableMoFusion
├── UnderPressure
│   ├── dataset
│   │   ├── S1
│   │   ├── S2
│   │   └── ...
│   ├── anim.py
│   ├── data.py
│   ├── demo.py
│   └── ...
```
3. Update the import paths within `./UnderPressure/*.py`.
To ensure that the modules within `./UnderPressure/` can be imported and used seamlessly via `python -m`, you need to update the import paths in the Python files located in `./UnderPressure/*.py`. For example:
* Replace `import util` with `from UnderPressure import util` in UnderPressure/anim.py
* Replace `import anim, metrics, models, util` with `from UnderPressure import anim, metrics, models, util` in UnderPressure/demo.py
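
To verify that the updated imports resolve from the repository root, an optional, purely illustrative check is:

```python
# Run from the StableMoFusion repository root after patching the import paths.
from UnderPressure import anim, metrics, models, util  # noqa: F401

print("UnderPressure modules imported successfully")
```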
4. Run `demo.py` or `scripts/generate.py`:
```shell
# Generate from a single prompt
# e.g. generate a 4-second waving motion. The unit of `--motion_length` is seconds.
python -m scripts.generate --text_prompt "a person waves with his right hand." --motion_length 4 --footskate_cleanup

# Generate from your own text file
# e.g. generate 5 motions from different prompts in a .txt file, setting each motion's frame length separately via another .txt file. The unit of `--input_lens` is frames.
python -m scripts.generate --footskate_cleanup --input_text ./assets/prompts.txt --input_lens ./assets/motion_lens.txt
# e.g. generate 5 motions from different prompts in a .txt file with the same motion length.
python -m scripts.generate --footskate_cleanup --input_text ./assets/prompts.txt --motion_length 4

# Generate from test-set prompts
# e.g. randomly select 10 prompts from the test set and generate their motions
python -m scripts.generate --num_samples 10
```

**You may also define:**

* `--device`: GPU id.
* `--diffuser_name`: sampler type in Diffusers (e.g. `ddpm`, `ddim`, `dpmsolver`); see [./config/diffuser_params.yaml](config/diffuser_params.yaml) for the related settings, and the sketch after this list for how these names typically map to schedulers.
* `--num_inference_steps`: number of iterative denoising steps during inference.
* `--seed`: random seed, e.g. to sample different prompts from the test set.
* `--motion_length`: motion duration in seconds.
* `--opt_path`: path to the options file (`opt.txt`) of the model to load.
* `--footskate_cleanup`: use the footskate cleanup module in the diffusion framework.
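
As a rough orientation, here is a minimal sketch of how the `--diffuser_name` options could map onto Hugging Face `diffusers` scheduler classes. This is an assumption for illustration only; the actual classes and constructor arguments used by the framework are defined in `config/diffuser_params.yaml`.

```python
# Illustrative sketch (not the repository's code): mapping sampler names to diffusers schedulers.
from diffusers import DDIMScheduler, DDPMScheduler, DPMSolverMultistepScheduler

SCHEDULER_CLASSES = {
    "ddpm": DDPMScheduler,
    "ddim": DDIMScheduler,
    "dpmsolver": DPMSolverMultistepScheduler,
}

def build_scheduler(diffuser_name: str, num_inference_steps: int):
    # Constructor arguments here are illustrative defaults; the real values
    # come from config/diffuser_params.yaml.
    scheduler = SCHEDULER_CLASSES[diffuser_name](num_train_timesteps=1000)
    scheduler.set_timesteps(num_inference_steps)
    return scheduler
```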

**You will get:**

* `output_dir/joints_npy/xx.npy` - the xyz pose sequence of the generated motion
* `output_dir/xx.mp4` - a visual animation of the generated motion

The output directory is located in the checkpoint directory, e.g. `checkpoints/t2m/t2m_condunet1d_batch64/samples_t2m_condunet1d_batch64_50173_seed0_a_person_waves_with_his_right_hand/`.
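
If you want to post-process the joints yourself, the following minimal sketch inspects one of the generated `.npy` files. It assumes the HumanML3D setting (a 22-joint skeleton at 20 fps); the file name is illustrative.

```python
# Minimal sketch for inspecting a generated motion (assumes HumanML3D: 22 joints, 20 fps).
import numpy as np

joints = np.load("output_dir/joints_npy/00.npy")  # illustrative file name
print(joints.shape)            # expected: (num_frames, 22, 3) xyz joint positions
print(joints.shape[0] / 20.0)  # approximate duration in seconds at 20 fps
```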

The visual animation will look something like this:


## Train and Evaluation

### 1. Download datasets

**HumanML3D** - Follow the instructions in [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git),
then copy the resulting dataset to our repository:

```shell
cp -r ../HumanML3D/HumanML3D ./data/HumanML3D
```

**KIT** - Download from [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git) (no processing needed this time) and place the result in `./data/KIT-ML`

### 2. Download pretrained weights for evaluation
We use the same evaluation protocol as [this repo](https://github.com/EricGuo5513/text-to-motion). You should download the pretrained weights of the contrastive models for [t2m](https://drive.google.com/file/d/1DSaKqWX2HlwBtVH5l7DdW96jeYUIXsOP/view) and [kit](https://drive.google.com/file/d/1tX79xk0fflp07EZ660Xz1RAFE33iEyJR/view) to calculate FID and precisions. To dynamically estimate the length of the target motion, `length_est_bigru` and the [GloVe data](https://drive.google.com/drive/folders/1qxHtwffhfI4qMwptNW6KJEDuT6bduqO7?usp=sharing) are required.

Unzip all files and arrange them in the following file structure:

```text
StableMoFusion
└── data
    ├── glove
    │   ├── our_vab_data.npy
    │   ├── our_vab_idx.pkl
    │   └── our_vab_words.pkl
    ├── pretrained_models
    │   ├── kit
    │   │   └── text_mot_match
    │   │       └── model
    │   │           └── finest.tar
    │   └── t2m
    │       ├── text_mot_match
    │       │   └── model
    │       │       └── finest.tar
    │       └── length_est_bigru
    │           └── model
    │               └── finest.tar
    ├── HumanML3D
    │   ├── new_joint_vecs
    │   │   └── ...
    │   ├── new_joints
    │   │   └── ...
    │   ├── texts
    │   │   └── ...
    │   ├── Mean.npy
    │   ├── Std.npy
    │   ├── test.txt
    │   ├── train_val.txt
    │   ├── train.txt
    │   └── val.txt
    └── KIT-ML
        ├── new_joint_vecs
        │   └── ...
        ├── new_joints
        │   └── ...
        ├── texts
        │   └── ...
        ├── Mean.npy
        ├── Std.npy
        ├── test.txt
        ├── train_val.txt
        ├── train.txt
        └── val.txt
```

### 3. Train CondUnet1D Model
<details>
  <summary><b>HumanML3D</b></summary>

```shell
accelerate launch --config_file 1gpu.yaml --gpu_ids 0 -m scripts.train --name t2m_condunet1d --model-ema --dataset_name t2m
```
</details>

<details>
  <summary><b>KIT-ML</b></summary>

```shell
accelerate launch --config_file 1gpu.yaml --gpu_ids 0 -m scripts.train --name kit_condunet1d --model-ema --dataset_name kit
```
</details>

You may also define a different `--config_file` for training on multiple GPUs.

### 4. Evaluate

<details>
  <summary><b>HumanML3D</b></summary>

```shell
python -m scripts.evaluation --opt_path ./checkpoints/t2m/t2m_condunet1d_batch64/opt.txt
```

The evaluation results will be saved in `./checkpoints/t2m/t2m_condunet1d_batch64/eval`
</details>

<details>
  <summary><b>KIT-ML</b></summary>

```shell
python -m scripts.evaluation --opt_path ./checkpoints/kit/kit_condunet1d_batch64/opt.txt
```

The evaluation results will be saved in `./checkpoints/kit/kit_condunet1d_batch64/eval`
</details>

### Train your own vGRF model for footskate cleanup
Download [SMPL+H](http://mano.is.tue.mpg.de) to the folder `./data/smplh` and run `train_UnderPressure_model.py`:
```shell
python -m scripts.train_UnderPressure_model --dataset_name t2m
```

## Acknowledgments

This code stands on the shoulders of giants. We want to thank the following projects that our code is based on:

[text-to-motion](https://github.com/EricGuo5513/text-to-motion), [MDM](https://github.com/GuyTevet/motion-diffusion-model), [MotionDiffuse](https://github.com/mingyuan-zhang/MotionDiffuse), [GMD](https://github.com/korrawe/guided-motion-diffusion).

## License
This code is distributed under an [MIT LICENSE](LICENSE).

Note that our code depends on other libraries, including CLIP, Diffusers, SMPL-X, PyTorch3D, etc., and uses datasets that each have their own respective licenses that must also be followed.