
Commit d81d6b2: initial submit (0 parents)

File tree: 126 files changed, +7344 -0 lines
1gpu.yaml (+16)

@@ -0,0 +1,16 @@
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: 'NO'
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: no
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
main_process_port: 21000
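This is the single-GPU Accelerate launch configuration referenced by the training commands in the README (e.g. `accelerate launch --config_file 1gpu.yaml --gpu_ids 0 -m scripts.train ...`); `main_process_port` fixes the port used for process rendezvous.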

LICENSE (+21)

@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Yiheng Huang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md (+265)

@@ -0,0 +1,265 @@
# StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework

<!-- [![Project Page](https://img.shields.io/badge/Project_Page-<xxx>-<COLOR>.svg)](https://h-y1heng.github.io/StableMoFusion-page/)
[![Website](https://img.shields.io/badge/Website-Demo-fedcba?style=flat-square)](https://steve-zeyu-zhang.github.io/MotionMamba/)
[![arXiv](https://img.shields.io/badge/arXiv-2403.07487-b31b1b?style=flat-square&logo=arxiv)](https://arxiv.org/abs/2403.07487) [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-555555?style=flat-square)](https://huggingface.co/papers/2403.07487) -->

The official PyTorch implementation of the paper [**"StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework"**](https://arxiv.org/abs/).

<!-- StableMoFusion is a diffusion-based text-to-motion generation framework, -->

<!-- ## News -->
## Getting Started

This code was tested on an `NVIDIA A100` GPU and requires:

* Anaconda3 or Miniconda3

a. Create a conda virtual environment and activate it.

```shell
conda create -n stablemofusion python=3.8 -y
conda activate stablemofusion
```

b. Install PyTorch 1.10.0 following the [official instructions](https://pytorch.org/).

```shell
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
```

**Important:** Make sure that your compilation CUDA version and runtime CUDA version match.

c. Install the other requirements.

```shell
pip install -r requirements.txt
```

d. Install ffmpeg for visualization.

```shell
conda install ffmpeg x264=20131218 -c conda-forge
```

e. Modify the `LayerNorm` module in CLIP for fp16 inference.

```python
# miniconda3/envs/stablemofusion/lib/python3.8/site-packages/clip/model.py
class LayerNorm(nn.LayerNorm):
    """Subclass torch's LayerNorm to handle fp16."""

    def forward(self, x: torch.Tensor):
        if self.weight.dtype == torch.float32:
            # fp32 weights: upcast the (possibly fp16) input, normalize, cast back.
            orig_type = x.dtype
            ret = super().forward(x.type(torch.float32))
            return ret.type(orig_type)
        else:
            # fp16 weights: normalize directly in half precision.
            return super().forward(x)
```
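The only change relative to the stock CLIP `LayerNorm` is the `dtype` check: when the CLIP weights have been converted to fp16 for inference, the original unconditional upcast of the input to fp32 would mix fp32 activations with fp16 weights inside the normalization, so the patched version normalizes directly in half precision in that case.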
## Quick Start

1. Download the pre-trained models from [Google Drive](https://drive.google.com/drive/folders/1o3h0DHEz5gKG-9cTdl3lUEwjwW51Ay81?usp=sharing), put them into `./checkpoints/`, and arrange them in the following file structure:

```text
StableMoFusion
├── checkpoints
│   ├── kit
│   │   └── kit_condunet1d_batch64
│   │       ├── meta
│   │       │   ├── mean.npy
│   │       │   └── std.npy
│   │       ├── model
│   │       │   └── latest.tar
│   │       └── opt.txt
│   ├── t2m
│   │   └── t2m_condunet1d_batch64
│   │       ├── meta
│   │       │   ├── mean.npy
│   │       │   └── std.npy
│   │       ├── model
│   │       │   └── latest.tar
│   │       └── opt.txt
│   └── footskate
│       ├── underpressure_pretrained.tar
│       └── t2m_pretrained.tar
```
2. Download the [UnderPressure code](https://github.com/InterDigitalInc/UnderPressure) and put it into `./UnderPressure/` like:

```text
StableMoFusion
├── UnderPressure
│   ├── dataset
│   │   ├── S1
│   │   ├── S2
│   │   └── ...
│   ├── anim.py
│   ├── data.py
│   ├── demo.py
│   └── ...
```

3. Update the import paths within `./UnderPressure/*.py`.

   To ensure the modules in `./UnderPressure/` can be imported and used seamlessly via `python -m`, update the import paths in the Python files under `./UnderPressure/`. For example:
   * Replace `import util` with `from UnderPressure import util` in `UnderPressure/anim.py`.
   * Replace `import anim, metrics, models, util` with `from UnderPressure import anim, metrics, models, util` in `UnderPressure/demo.py`.

4. Run `demo.py` or `scripts/generate.py`:

```shell
# Generate from a single prompt
# e.g. generate a 4-second wave motion. The unit of `--motion_length` is seconds.
python -m scripts.generate --text_prompt "a person waves with his right hand." --motion_length 4 --footskate_cleanup

# Generate from your own text file
# e.g. generate 5 motions from the prompts in a .txt file, with each motion's frame length set by a second .txt file. The unit of `--input_lens` is frames.
python -m scripts.generate --footskate_cleanup --input_text ./assets/prompts.txt --input_lens ./assets/motion_lens.txt
# e.g. generate 5 motions from the prompts in a .txt file, all with the same motion length.
python -m scripts.generate --footskate_cleanup --input_text ./assets/prompts.txt --motion_length 4

# Generate from test-set prompts
# e.g. randomly select 10 prompts from the test set and generate motions
python -m scripts.generate --num_samples 10
```
**You may also define:**

* `--device`: GPU id.
* `--diffuser_name`: sampler type in Diffusers (e.g. 'ddpm', 'ddim', 'dpmsolver'); for the related settings see [./config/diffuser_params.yaml](config/diffuser_params.yaml).
* `--num_inference_steps`: number of iterative denoising steps during inference.
* `--seed`: random seed, e.g. to sample different prompts.
* `--motion_length`: motion length in seconds.
* `--opt_path`: the `opt.txt` path of the model to load.
* `--footskate_cleanup`: use the footskate-cleanup module in the diffusion framework.

**You will get:**

* `output_dir/joints_npy/xx.npy`: the xyz pose sequence of the generated motion (see the loading sketch below).
* `output_dir/xx.mp4`: a visual animation of the generated motion.

The output dir is located in the checkpoint dir, e.g. `checkpoints/t2m/t2m_condunet1d_batch64/samples_t2m_condunet1d_batch64_50173_seed0_a_person_waves_with_his_right_hand/`.
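As a minimal sketch of how a generated `joints_npy` file can be inspected (the path and the exact array shape below are assumptions; HumanML3D motions use 22 joints):

```python
# Minimal inspection sketch: the path below is a placeholder for one of your generated samples.
import numpy as np

npy_path = "checkpoints/t2m/t2m_condunet1d_batch64/<sample_dir>/joints_npy/00.npy"
joints = np.load(npy_path)

# Assumed layout: (num_frames, num_joints, 3) xyz joint positions,
# e.g. 22 joints per frame for HumanML3D.
print(joints.shape)
```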
The visual animation will look something like this:

![example](./assets/wave.gif)
## Train and Evaluation

### 1. Download datasets

**HumanML3D** - Follow the instructions in [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git), then copy the resulting dataset into our repository:

```shell
cp -r ../HumanML3D/HumanML3D ./data/HumanML3D
```

**KIT** - Download from [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git) (no processing needed this time) and place the result in `./data/KIT-ML`.

### 2. Download pretrained weights for evaluation

We use the same evaluation protocol as [this repo](https://github.com/EricGuo5513/text-to-motion). You should download the pretrained weights of the contrastive models for [t2m](https://drive.google.com/file/d/1DSaKqWX2HlwBtVH5l7DdW96jeYUIXsOP/view) and [kit](https://drive.google.com/file/d/1tX79xk0fflp07EZ660Xz1RAFE33iEyJR/view) to calculate FID and R-precision. To dynamically estimate the length of the target motion, `length_est_bigru` and the [GloVe data](https://drive.google.com/drive/folders/1qxHtwffhfI4qMwptNW6KJEDuT6bduqO7?usp=sharing) are also required.

Unzip all files and arrange them in the following file structure:
```text
StableMoFusion
└── data
    ├── glove
    │   ├── our_vab_data.npy
    │   ├── our_vab_idx.pkl
    │   └── our_vab_words.pkl
    ├── pretrained_models
    │   ├── kit
    │   │   └── text_mot_match
    │   │       └── model
    │   │           └── finest.tar
    │   └── t2m
    │       ├── text_mot_match
    │       │   └── model
    │       │       └── finest.tar
    │       └── length_est_bigru
    │           └── model
    │               └── finest.tar
    ├── HumanML3D
    │   ├── new_joint_vecs
    │   │   └── ...
    │   ├── new_joints
    │   │   └── ...
    │   ├── texts
    │   │   └── ...
    │   ├── Mean.npy
    │   ├── Std.npy
    │   ├── test.txt
    │   ├── train_val.txt
    │   ├── train.txt
    │   └── val.txt
    └── KIT-ML
        ├── new_joint_vecs
        │   └── ...
        ├── new_joints
        │   └── ...
        ├── texts
        │   └── ...
        ├── Mean.npy
        ├── Std.npy
        ├── test.txt
        ├── train_val.txt
        ├── train.txt
        └── val.txt
```
### 3. Train the CondUnet1D Model

<details>
<summary><b>HumanML3D</b></summary>

```shell
accelerate launch --config_file 1gpu.yaml --gpu_ids 0 -m scripts.train --name t2m_condunet1d --model-ema --dataset_name t2m
```
</details>

<details>
<summary><b>KIT-ML</b></summary>

```shell
accelerate launch --config_file 1gpu.yaml --gpu_ids 0 -m scripts.train --name kit_condunet1d --model-ema --dataset_name kit
```
</details>

You may also pass a different `--config_file` to train on multiple GPUs.

### 4. Evaluate

<details>
<summary><b>HumanML3D</b></summary>

```shell
python -m scripts.evaluation --opt_path ./checkpoints/t2m/t2m_condunet1d_batch64/opt.txt
```
The evaluation results will be saved in `./checkpoints/t2m/t2m_condunet1d_batch64/eval`.
</details>

<details>
<summary><b>KIT-ML</b></summary>

```shell
python -m scripts.evaluation --opt_path ./checkpoints/kit/kit_condunet1d_batch64/opt.txt
```
The evaluation results will be saved in `./checkpoints/kit/kit_condunet1d_batch64/eval`.
</details>

### Train your own vGRF model for footskate cleanup

Download [SMPL+H](http://mano.is.tue.mpg.de) to the folder `./data/smplh` and run `train_UnderPressure_model.py`:

```shell
python -m scripts.train_UnderPressure_model --dataset_name t2m
```

## Acknowledgments

This code stands on the shoulders of giants. We want to thank the following projects that our code builds on:

[text-to-motion](https://github.com/EricGuo5513/text-to-motion), [MDM](https://github.com/GuyTevet/motion-diffusion-model), [MotionDiffuse](https://github.com/mingyuan-zhang/MotionDiffuse), [GMD](https://github.com/korrawe/guided-motion-diffusion).

## License

This code is distributed under an [MIT LICENSE](LICENSE).

Note that our code depends on other libraries, including CLIP, Diffusers, SMPL-X, PyTorch3D, ..., and uses datasets that each have their own respective licenses that must also be followed.

assets/motion_lens.txt (+5)

@@ -0,0 +1,5 @@
196
103
112
196
196
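These appear to be per-prompt motion lengths in frames, pairing line-by-line with `assets/prompts.txt` when generation is run with `--input_text ./assets/prompts.txt --input_lens ./assets/motion_lens.txt` (see the README's generation examples).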

assets/prompts.txt (+5)

@@ -0,0 +1,5 @@
a person crawls on the ground from east to west then goes back
he walks forward and then turns around fast and walks back
the person is striking a tennis ball unenthusiastically
a person stands up from laying, walks in a clockwise circle, and lays down again.
the person is dancing the waltz.

assets/wave.gif (689 KB, binary)

config/diffuser_params.yaml (+27)

@@ -0,0 +1,27 @@
dpmsolver:
  scheduler_class: DPMSolverMultistepScheduler
  additional_params:
    algorithm_type: sde-dpmsolver++
    use_karras_sigmas: true

ddpm:
  scheduler_class: DDPMScheduler
  additional_params:
    variance_type: fixed_small
    clip_sample: false

ddim:
  scheduler_class: DDIMScheduler
  additional_params:
    clip_sample: false

deis:
  scheduler_class: DEISMultistepScheduler
  additional_params:
    num_train_timesteps: 1000
    # use_karras_sigmas: true

pndm:
  scheduler_class: PNDMScheduler
  additional_params:
    num_train_timesteps: 1000
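Each top-level key here corresponds to a `--diffuser_name` value and maps to a Diffusers scheduler class plus its constructor arguments. A minimal sketch of how such an entry could be instantiated (the helper below is a hypothetical illustration, not the repo's actual loader):

```python
# Illustrative sketch only: `build_scheduler` is a hypothetical helper, not part of this repo.
import yaml
import diffusers

def build_scheduler(config_path, name):
    """Instantiate a diffusers scheduler from one entry of diffuser_params.yaml."""
    with open(config_path) as f:
        cfg = yaml.safe_load(f)[name]                  # e.g. the 'dpmsolver' block
    scheduler_cls = getattr(diffusers, cfg["scheduler_class"])
    extra = cfg.get("additional_params") or {}         # algorithm_type, clip_sample, ...
    return scheduler_cls(**extra)

# Example: the sampler selected by `--diffuser_name dpmsolver`
scheduler = build_scheduler("config/diffuser_params.yaml", "dpmsolver")
```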

config/evaluator.yaml (+14)

@@ -0,0 +1,14 @@
unit_length: 4
max_text_len: 20
text_enc_mod: bigru
estimator_mod: bigru
dim_text_hidden: 512
dim_att_vec: 512
dim_z: 128
dim_movement_enc_hidden: 512
dim_movement_dec_hidden: 512
dim_movement_latent: 512
dim_word: 300
dim_pos_ohot: 15
dim_motion_hidden: 1024
dim_coemb_hidden: 512
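These appear to be the architecture hyperparameters of the pretrained evaluators used for metrics (the contrastive `text_mot_match` model and the `length_est_bigru` length estimator referenced in the README); they follow the evaluation protocol of the text-to-motion repository and should match the downloaded `finest.tar` checkpoints.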

datasets/__init__.py (+22)

@@ -0,0 +1,22 @@
from os.path import join as pjoin

from .t2m_dataset import HumanML3D, KIT

__all__ = ['HumanML3D', 'KIT', 'get_dataset']


def get_dataset(opt, split='train', mode='train', accelerator=None):
    # Dataset factory: dispatch on the dataset name given in the options.
    if opt.dataset_name == 't2m':
        dataset = HumanML3D(opt, split, mode, accelerator)
    elif opt.dataset_name == 'kit':
        dataset = KIT(opt, split, mode, accelerator)
    else:
        raise KeyError('Dataset does not exist')

    if accelerator:
        accelerator.print('Completed loading %s dataset' % opt.dataset_name)
    else:
        print('Completed loading %s dataset' % opt.dataset_name)

    return dataset
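`get_dataset` is the dataset factory used by the training and evaluation scripts: it dispatches on `opt.dataset_name` ('t2m' for HumanML3D, 'kit' for KIT-ML) and logs through the Accelerate `accelerator` when one is provided. The remaining dataset settings (data paths, normalization statistics, etc.) are read from `opt`, which is presumably populated by the repo's option parsing (see the `opt.txt` files shipped with the checkpoints).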
(Two binary files not shown: 674 Bytes and 6.64 KB.)
