Refer to the official RT-X repo for the source datasets.
The pre-training datasets are listed in RoLD/configs/tasks/rt-X_data_cfg.yaml.
- Convert the TFDS datasets to h5 files with convert_tfds_to_h5.py (a minimal conversion sketch follows this list). Converting large datasets takes massive disk space, up to 8 TB for kuka.
- Visualize the datasets from the processed h5 files with check_data.ipynb.
- Extract raw images with move_h5_image_to_png.py (a quick inspection/export sketch is shown below).
- Extract image and language features for efficient policy-model training with extract_image_features.py and extract_language_features.py (sketched below). We use R3M and CLIP, and both are easy to customize.
- Normalize actions according to the dataset statistics for unified training with normalize_actoins.py and rt-x_data_cfg.yaml (sketched below).
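
To make the conversion step concrete, here is a minimal sketch of what a tfds-to-h5 conversion can look like for one RLDS-style dataset. The builder directory, output path, and the observation/action/instruction keys are illustrative assumptions; convert_tfds_to_h5.py is the script that actually handles the per-dataset differences.

```python
# Minimal tfds -> h5 conversion sketch. The builder directory, output path,
# and feature keys are assumptions; convert_tfds_to_h5.py is the real script.
import h5py
import numpy as np
import tensorflow_datasets as tfds

BUILDER_DIR = "/path/to/tensorflow_datasets/bridge/0.1.0"  # hypothetical local tfds copy
OUT_PATH = "bridge.h5"                                     # hypothetical output file

builder = tfds.builder_from_directory(builder_dir=BUILDER_DIR)
ds = builder.as_dataset(split="train")

with h5py.File(OUT_PATH, "w") as f:
    for ep_idx, episode in enumerate(tfds.as_numpy(ds)):
        images, actions, instructions = [], [], []
        for step in episode["steps"]:  # nested RLDS steps become a numpy generator
            images.append(step["observation"]["image"])                # assumed key
            actions.append(step["action"])                             # assumed flat action vector
            instructions.append(
                step["observation"].get("natural_language_instruction", b"")  # assumed key
            )
        grp = f.create_group(f"episode_{ep_idx}")
        grp.create_dataset("images", data=np.stack(images), compression="gzip")
        grp.create_dataset("actions", data=np.stack(actions))
        grp.create_dataset("language", data=np.array(instructions))
```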
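
In the same spirit, a converted file can be sanity-checked and a frame exported to PNG as below (check_data.ipynb and move_h5_image_to_png.py do this more thoroughly); the group and dataset names follow the assumed layout of the sketch above.

```python
# Quick sanity check of a converted h5 file plus a PNG export of one frame,
# following the assumed layout from the conversion sketch above.
import h5py
import matplotlib.pyplot as plt

with h5py.File("bridge.h5", "r") as f:
    print("episodes:", len(f))
    ep = f["episode_0"]
    print("images:", ep["images"].shape, "actions:", ep["actions"].shape)
    print("instruction:", ep["language"][0].decode())
    plt.imsave("episode_0_frame_0.png", ep["images"][0])  # assumes uint8 HxWx3 frames
```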
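
The feature-extraction step can be pictured roughly as follows, assuming the same h5 layout and that the optional CLIP and R3M installs from the environment section were done. The encoder variants ("ViT-B/32", "resnet50") and output dataset names are assumptions; the real extract_image_features.py and extract_language_features.py may batch and store features differently.

```python
# Rough feature-extraction sketch with CLIP (language) and R3M (images).
# Encoder variants, h5 layout, and output dataset names are assumptions.
import clip
import h5py
import torch
import torchvision.transforms as T
from PIL import Image
from r3m import load_r3m

device = "cuda" if torch.cuda.is_available() else "cpu"

clip_model, _ = clip.load("ViT-B/32", device=device)  # language encoder
r3m_model = load_r3m("resnet50").to(device).eval()    # image encoder
to_r3m = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

with h5py.File("bridge.h5", "r+") as f, torch.no_grad():
    for ep in f.values():
        # one CLIP text embedding per episode (assumes a single instruction)
        tokens = clip.tokenize([ep["language"][0].decode()]).to(device)
        lang_feat = clip_model.encode_text(tokens).float().cpu().numpy()

        # one R3M embedding per frame; R3M expects pixel values in [0, 255]
        # (whole episode in one batch for brevity; chunk long episodes in practice)
        frames = torch.stack([to_r3m(Image.fromarray(img)) for img in ep["images"][:]])
        img_feat = r3m_model(frames.to(device) * 255.0).cpu().numpy()

        ep.create_dataset("clip_language_features", data=lang_feat)
        ep.create_dataset("r3m_image_features", data=img_feat)
```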
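
Finally, action normalization gathers per-dimension statistics over a dataset and rescales actions into a common range. The min/max scaling to [-1, 1] below is only an assumed scheme for illustration; the statistics and scheme actually used come from normalize_actoins.py and rt-x_data_cfg.yaml.

```python
# Action-normalization sketch: per-dimension min/max scaling to [-1, 1].
# The real statistics and scheme come from normalize_actoins.py and
# rt-x_data_cfg.yaml; this scaling choice is an assumption.
import h5py
import numpy as np

with h5py.File("bridge.h5", "r+") as f:
    # first pass: gather per-dimension action statistics over the whole file
    all_actions = np.concatenate([ep["actions"][:] for ep in f.values()], axis=0)
    a_min, a_max = all_actions.min(axis=0), all_actions.max(axis=0)
    scale = np.where(a_max - a_min > 1e-8, a_max - a_min, 1.0)  # guard constant dims

    # second pass: rescale each episode's actions into [-1, 1]
    for ep in f.values():
        normalized = 2.0 * (ep["actions"][:] - a_min) / scale - 1.0
        ep.create_dataset("normalized_actions", data=normalized.astype(np.float32))
```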
In your Python environment:
- Install basic libraries:
  pip install tensorflow tensorflow-datasets
  conda install h5py yaml jupyter tqdm omegaconf gdown matplotlib
- Install PyTorch (the version is not strictly restricted), e.g.:
  conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
  pip install lightning transformers diffusers
- Optional: install CLIP and R3M (required for the feature-extraction step):
  pip install git+https://github.com/openai/CLIP.git
  pip install git+https://github.com/facebookresearch/r3m.git

To create the environment from scratch:
conda create -n rold python=3.10
conda activate rold
source rold_env.sh
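
A quick way to confirm the environment is usable (clip and r3m will only import if the optional step above was done):

```python
# Quick import check for the core dependencies installed above.
import h5py, torch, tensorflow_datasets, lightning, transformers, diffusers
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```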