LerobotDataset pushable to HF from any folder #563

Raziel90 · 2024-12-09T00:59:41Z

The path to card_template.md is now resolved relative to the location of the installed lerobot package instead of the current working directory.

In the previous version the path was hard-coded relative to the current folder: ./lerobot/common/datasets/card_template.md. This breaks LeRobotDataset.push_to_hub() when it is launched from outside the package folder.

In the current version the path is resolved relative to the package itself: importlib.resources.path("lerobot.common.datasets", "card_template.md"). This allows create_lerobot_dataset_card to be called from any folder, as long as LeRobot is installed.
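The difference between the two lookups can be reproduced without lerobot. The sketch below is only an illustration: `demo_pkg` is a hypothetical throwaway package created in a temporary directory, and it uses `importlib.resources.files`, the newer sibling of the `importlib.resources.path` call used in this PR (assumes Python 3.9+).

```python
import importlib
import importlib.resources
import os
import sys
import tempfile
from pathlib import Path


def read_packaged_template(package: str, filename: str) -> str:
    """Read a data file shipped inside an importable package, whatever the cwd is."""
    resource = importlib.resources.files(package).joinpath(filename)
    return resource.read_text(encoding="utf-8")


# Build a throwaway package (hypothetical name) that ships a template file.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("", encoding="utf-8")
(pkg_dir / "card_template.md").write_text("# {dataset_name}\n", encoding="utf-8")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Move to an unrelated working directory, as a user of the installed package would.
os.chdir(tempfile.mkdtemp())

# The cwd-relative lookup no longer finds the file...
relative_exists = Path("./demo_pkg/card_template.md").exists()
# ...while the package-relative lookup still does.
template_text = read_packaged_template("demo_pkg", "card_template.md")
print(relative_exists, repr(template_text))
```

The package-relative lookup depends only on the package being importable, which is the property the fix relies on.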

On branch fix--dataset_push_to_hub
Changes to be committed:
modified: lerobot/common/datasets/utils.py

What this does


| Fixes #561 | (🐛 Bug) |

How it was tested

Ran the dataset creation from both inside and outside the lerobot folder.
It worked in both cases: https://huggingface.co/datasets/ccop/aloha_stationary_replay_test_v3.
The script used to create the dataset will be the subject of another PR once it is refined. It converts a single-episode aloha_hd5 dataset into a LeRobot Dataset V2. A draft snippet is provided below.
The tests ran with pytest without problems (although I noticed that the suite currently has no test for push_to_hub).

How to checkout & try? (for the reviewer)


from pathlib import Path

import cv2
import h5py
import torch

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Local folder holding the raw Aloha HDF5 recordings (adjust to your setup).
data_path = Path('/home/ccop/code/aloha_data')

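def get_features(hdf5_file):
    """Build the feature spec for LeRobotDataset.create from the datasets in an HDF5 file."""
    topics = []
    features = {}
    # Collect the full path of every h5py.Dataset; groups are skipped.
    hdf5_file.visititems(lambda name, obj: topics.append(name) if isinstance(obj, h5py.Dataset) else None)
    for topic in topics:
        if 'images' in topic.split('/'):
            # Images are stored encoded: decode the first frame to get its (C, H, W) shape.
            features[topic.replace('/', '.')] = {
                'dtype': "image",
                'shape': cv2.imdecode(hdf5_file[topic][0], 1).transpose(2, 0, 1).shape,
                'names': None
            }
        elif 'compress_len' in topic.split('/'):
            # Skip the compress_len entries; they are not exported as features.
            continue
        else:
            # Any other dataset keeps the dtype and shape of a single frame.
            features[topic.replace('/', '.')] = {
                'dtype': str(hdf5_file[topic][0].dtype),
                'shape': hdf5_file[topic][0].shape,
                'names': None
            }

    return features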

if __name__ == '__main__':
    episode_path = data_path.absolute() / 'aloha_stationary_replay_test/episode_0.hdf5'

    with h5py.File(episode_path, 'r') as file:
        # List the top-level groups
        print("Keys: %s" % list(file.keys()))
        features = get_features(file)
        # Read the frame count from the dataset shape without loading the images
        n_frames = file['observations/images/cam_high'].shape[0]
        print(n_frames)

    dataset = LeRobotDataset.create(
        repo_id='ccop/aloha_stationary_replay_test_v3',
        fps=50,
        robot_type="aloha-stationary",
        features=features,
        image_writer_threads=4,
    )
    with h5py.File(episode_path, 'r') as file:
        # Convert every frame of the episode into a LeRobotDataset frame
        for frame_idx in range(n_frames):
            frame = {}
            for feature in features:
                # Feature names use '.', HDF5 paths use '/'
                raw = file[feature.replace('.', '/')][frame_idx]
                if 'images' in feature.split('.'):
                    # Decode the stored image and move channels first: (H, W, C) -> (C, H, W)
                    frame[feature] = torch.from_numpy(cv2.imdecode(raw, 1).transpose(2, 0, 1))
                else:
                    frame[feature] = torch.from_numpy(raw)

            dataset.add_frame(frame)
    print('save episode!')
    dataset.save_episode(task='move_cube')
    dataset.consolidate()
    dataset.push_to_hub()
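The naming convention the script relies on (HDF5 paths with '/' mapped to feature names with '.', image topics detected by an 'images' path component, 'compress_len' entries skipped) can be checked in isolation. This is a minimal pure-Python sketch; apart from observations/images/cam_high, the topic names are hypothetical examples rather than the actual contents of the recording.

```python
def topic_to_feature(topic: str) -> str:
    """Map an HDF5 path to a feature name: 'observations/qpos' -> 'observations.qpos'."""
    return topic.replace("/", ".")


def feature_to_topic(feature: str) -> str:
    """Inverse mapping, used when reading frames back from the HDF5 file."""
    return feature.replace(".", "/")


def classify_topic(topic: str) -> str:
    """Mirror the branch order of get_features: images first, then compress_len, then the rest."""
    parts = topic.split("/")
    if "images" in parts:
        return "image"
    if "compress_len" in parts:
        return "skip"
    return "array"


# Hypothetical topic names, except for the camera topic that appears in the script.
topics = ["action", "observations/qpos", "observations/images/cam_high", "compress_len"]

plan = {
    topic_to_feature(topic): classify_topic(topic)
    for topic in topics
    if classify_topic(topic) != "skip"
}
print(plan)
```

One caveat of this convention: the round trip only works while no dataset name contains a '.' itself, since feature_to_topic would then split that name into two path components.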


aliberts

Awesome, thank you @Raziel90!
We indeed haven't focused yet on packaging and releasing our code; this will come after refactoring, but this is a welcome fix for people already using LeRobot as a dependency.

Side notes on your conversion script:

The task argument for save_episode is supposed to be a prompt in natural language describing your task. I'll try to make this appear more clearly in the code/docs.

- dataset.save_episode(task='move_cube')
+ dataset.save_episode(task='Move the cube to this spot.')

I would suggest using the "video" mode for storing the images in your dataset, as it would really benefit from it given their size (480x848).

if 'images' in topic.split('/'):
    features[topic.replace('/', '.')] = {
-       'dtype': "image",
+       'dtype': "video",
        'shape': cv2.imdecode(hdf5_file[topic][0], 1).transpose(2, 0, 1).shape,
        'names': None
    }
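To make the size argument concrete, here is a rough sketch that switches image features to "video" and estimates the uncompressed data rate of one camera. It assumes 3-channel uint8 frames (what cv2.imdecode with flag 1 returns for 8-bit images), takes 480 as the height and 848 as the width, and uses the fps=50 from the script; the helper names and the "action" entry are hypothetical. The figure is the raw pixel volume, not the on-disk size of either storage mode.

```python
import copy


def prefer_video(features: dict) -> dict:
    """Return a copy of the feature spec with every 'image' dtype replaced by 'video'."""
    updated = copy.deepcopy(features)
    for spec in updated.values():
        if spec["dtype"] == "image":
            spec["dtype"] = "video"
    return updated


def raw_bytes_per_second(shape: tuple, fps: int, bytes_per_value: int = 1) -> int:
    """Uncompressed data rate of one stream: values per frame x bytes per value x fps."""
    values_per_frame = 1
    for dim in shape:
        values_per_frame *= dim
    return values_per_frame * bytes_per_value * fps


features = {
    "observations.images.cam_high": {"dtype": "image", "shape": (3, 480, 848), "names": None},
    # Hypothetical non-image feature, left untouched by prefer_video.
    "action": {"dtype": "float32", "shape": (14,), "names": None},
}

video_features = prefer_video(features)
rate = raw_bytes_per_second((3, 480, 848), fps=50)
print(video_features["observations.images.cam_high"]["dtype"], rate)
```

Under these assumptions a single camera produces about 1.2 MB of raw pixels per frame and roughly 61 MB per second.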


Cadene requested review from aliberts and Cadene December 9, 2024 01:27

aliberts approved these changes Dec 9, 2024


aliberts merged commit 44f9b21 into huggingface:main Dec 9, 2024
5 checks passed

helper2424 pushed a commit to helper2424/lerobot that referenced this pull request Dec 17, 2024

LerobotDataset pushable to HF from any folder (huggingface#563)

67f4d7e

michel-aractingi pushed a commit that referenced this pull request Jan 22, 2025

LerobotDataset pushable to HF from any folder (#563)

00dadca

chrisheninger pushed a commit to chrisheninger/lerobot that referenced this pull request Jan 26, 2025

LerobotDataset pushable to HF from any folder (huggingface#563)

8b7ef75

menhguin pushed a commit to menhguin/lerobot that referenced this pull request Feb 9, 2025

LerobotDataset pushable to HF from any folder (huggingface#563)

c7d8f3f

JIy3AHKO pushed a commit to vertix/lerobot that referenced this pull request Feb 27, 2025

LerobotDataset pushable to HF from any folder (huggingface#563)

690f043
