OpenX/RLDS to LerobotDataset v2 #747

Tavish9 · 2025-02-18T14:38:37Z

What this does

This PR adds functionality for converting datasets from the OpenX/RLDS format to the LeRobot dataset v2.0 format.

Title	Label
OpenX / RLDS → LeRobot v2.0	(🗃️ Dataset)

How it was tested

Examples:

    python examples/port_datasets/openx_rlds.py \
        --raw-dir /path/to/bridge_orig/1.0.0 \
        --local-dir /path/to/local_dir \
        --repo-id your_id \
        --use-videos \
        --push-to-hub

Datasets Availability

The converted datasets are now available on Hugging Face 🤗.

Minimal Code Repo

The conversion code is now available at openx2lerobot. You can just install lerobot and openx2lerobot, and easily convert your datasets.

Cadene · 2025-02-19T12:48:02Z

Beautiful. Let me try it ;)

Cadene · 2025-02-19T17:59:10Z

Data looks good
https://huggingface.co/spaces/lerobot/visualize_dataset?dataset=cadene%2Fdroid&episode=1

But for Droid it takes 7 days to process the 92,233 episodes. Thus I am updating this code to handle parallelization over nodes.
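As a rough back-of-envelope check, the figures above can be turned into a per-episode cost and an idealized multi-node estimate. This is only a sketch: it assumes the 7 days is wall-clock time on a single node and that episodes split evenly across nodes with no coordination overhead, neither of which is stated here. The node counts are hypothetical.

```python
# Back-of-envelope estimate using the figures quoted in the comment above.
TOTAL_EPISODES = 92_233
SINGLE_NODE_DAYS = 7  # assumed to be single-node wall-clock time

SECONDS_PER_DAY = 24 * 60 * 60
seconds_per_episode = SINGLE_NODE_DAYS * SECONDS_PER_DAY / TOTAL_EPISODES


def ideal_hours(num_nodes: int) -> float:
    """Wall-clock hours if the work splits perfectly across nodes (an idealization)."""
    return SINGLE_NODE_DAYS * 24 / num_nodes


print(f"{seconds_per_episode:.2f} s per episode")
for n in (1, 8, 64):  # hypothetical node counts
    print(f"{n:>3} nodes -> {ideal_hours(n):.1f} h")
```

Real speedups will be lower once I/O contention and the final aggregation step are included.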

Tavish9 · 2025-02-20T03:02:38Z

Hi @Cadene, many thanks for trying it hands-on.

I think the parallelization across nodes should be implemented on top of LeRobotDataset's functionality. As far as I know, tfds currently does not support multi-node reading, but we can specify which episodes each rank reads. Another issue is that LeRobotDataset's add_frame method is designed for single-node use.
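To illustrate what "specify which episodes each rank reads" could look like, here is a minimal sketch that assigns each rank a contiguous, near-equal slice of episode indices. The helper names `episodes_for_rank` and `split_for_rank` are hypothetical and not part of this PR; the split string relies on the TFDS slicing syntax (`"train[start:stop]"`) and assumes the dataset exposes a `train` split.

```python
def episodes_for_rank(num_episodes: int, rank: int, world_size: int) -> range:
    """Return the contiguous episode indices assigned to one rank.

    The first `num_episodes % world_size` ranks get one extra episode so that
    every episode is covered exactly once.
    """
    if world_size <= 0 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world_size {world_size}")
    base, extra = divmod(num_episodes, world_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def split_for_rank(split: str, num_episodes: int, rank: int, world_size: int) -> str:
    """Build a TFDS-style slice string such as "train[4:7]" (split name is an assumption)."""
    r = episodes_for_rank(num_episodes, rank, world_size)
    return f"{split}[{r.start}:{r.stop}]"


for rank in range(3):
    print(rank, split_for_rank("train", 10, rank, 3))
```

Each rank would then pass its own split string to the reader, so no two ranks process the same episode.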

A potential solution would be to add an identity key, such as “episode_id”, to the episode_buffer.
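A minimal sketch of that idea, assuming frames may arrive interleaved from several episodes: the buffer is keyed by `episode_id`, so each episode is collected independently and flushed on its own. `KeyedEpisodeBuffer` is a hypothetical stand-in and does not reflect the actual LeRobotDataset internals or the real `add_frame` signature.

```python
from collections import defaultdict
from typing import Any


class KeyedEpisodeBuffer:
    """Hypothetical buffer keyed by episode_id; NOT the LeRobotDataset API."""

    def __init__(self) -> None:
        self._frames: dict[int, list[dict[str, Any]]] = defaultdict(list)

    def add_frame(self, episode_id: int, frame: dict[str, Any]) -> None:
        # Frames from different episodes never mix because of the key.
        self._frames[episode_id].append(frame)

    def pop_episode(self, episode_id: int) -> list[dict[str, Any]]:
        # Remove and return all frames of one finished episode.
        return self._frames.pop(episode_id)


buffer = KeyedEpisodeBuffer()
buffer.add_frame(7, {"t": 0})
buffer.add_frame(3, {"t": 0})
buffer.add_frame(7, {"t": 1})
print(buffer.pop_episode(7))
```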

Cadene · 2025-02-20T23:09:15Z

#758

I am thinking of using datatrove to parallelize over Slurm and create n LeRobotDataset instances, one per shard.
Then I will aggregate them with a new function that I will write tomorrow.

https://github.com/huggingface/lerobot/pull/758/files#diff-3bab29f41f975edaae832d8234d23b2032963427b989151c057735f7b842a5b5
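The aggregation function is not shown in this thread. As a rough illustration of the re-indexing such a step needs, the sketch below merges per-shard episode metadata into one globally indexed list. It assumes each shard numbers its episodes from 0 and stores a per-episode frame count under `length`; the function name, the dictionary keys, and `frame_start` are all hypothetical, and real aggregation would also have to move data and video files.

```python
from typing import Any


def aggregate_shards(shards: list[list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Merge per-shard episode metadata into one globally indexed list.

    Assumes each shard numbers its episodes 0..len(shard)-1 and records a
    per-episode frame count under "length" (both hypothetical conventions).
    """
    merged: list[dict[str, Any]] = []
    episode_offset = 0
    frame_offset = 0
    for shard in shards:
        for episode in sorted(shard, key=lambda ep: ep["episode_index"]):
            merged.append(
                {
                    **episode,
                    # Shift the local index by the number of episodes already merged.
                    "episode_index": episode["episode_index"] + episode_offset,
                    # Global index of this episode's first frame.
                    "frame_start": frame_offset,
                }
            )
            frame_offset += episode["length"]
        episode_offset += len(shard)
    return merged


shard_a = [{"episode_index": 0, "length": 10}, {"episode_index": 1, "length": 5}]
shard_b = [{"episode_index": 0, "length": 7}]
for episode in aggregate_shards([shard_a, shard_b]):
    print(episode)
```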


support openx/rlds to lerobot

02bc4e0

imstevenpmwork added the enhancement and dataset labels Mar 4, 2025

imstevenpmwork and others added 2 commits March 4, 2025 17:48

Merge branch 'main' into main

5f2a476

[pre-commit.ci] auto fixes from pre-commit.com hooks

f1c50ea

for more information, see https://pre-commit.ci

Tavish9 closed this by deleting the head repository Mar 7, 2025
