Strange Issue: When recording dataset, when I use 'resume', the reset environment does not happen (it does happen when I don't use resume) #638

PradeepKadubandi · 2025-01-14T20:32:11Z

System Info

- `lerobot` version: 0.1.0
- Platform: Linux-6.8.0-51-generic-x86_64-with-glibc2.35
- Python version: 3.10.15
- Huggingface_hub version: 0.26.3
- Dataset version: 2.20.0
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.3.1+cu121 (True)
- Cuda version: 12010
- Using GPU in script?: No

Information

One of the scripts in the examples/ folder of LeRobot
My own task or dataset (give details below)

Reproduction

I am observing something strange. I read the code in control_robot.py to look for the root cause, but I am not yet sure what is happening.

I first used this command:

python lerobot/scripts/control_robot.py record \
  --robot-path lerobot/configs/robot/koch.yaml \
  --fps 30 \
  --repo-id pkaduban/debug_record \
  --single-task debug-record-task \
  --tags tutorial \
  --warmup-time-s 5 \
  --episode-time-s 120 \
  --reset-time-s 120 \
  --num-episodes 3 \
  --resume 0 \
  --local-files-only 1 \
  --push-to-hub 0 | tee debug_record_`date -Is`.log

This behaves as expected: it lets me record an episode, and when I hit the 'right arrow' after I am done with it, it starts resetting the environment. The first attached log is the simplified log file for this command.

Then I continued collection using this command:

python lerobot/scripts/control_robot.py record \
  --robot-path lerobot/configs/robot/koch.yaml \
  --fps 30 \
  --repo-id pkaduban/debug_record \
  --single-task debug-record-task \
  --tags tutorial \
  --warmup-time-s 5 \
  --episode-time-s 120 \
  --reset-time-s 120 \
  --num-episodes 3 \
  --resume 1 \
  --local-files-only 1 \
  --push-to-hub 0 | tee debug_record_`date -Is`.log

This behaves unexpectedly. It lets me record an episode, but when I hit the 'right arrow' after I am done with the episode, it does not reset the environment and goes directly to recording the next episode. I'd expect the environment reset to be respected in this setting too. The second attached log is the log file for this invocation.

To generate the logs, I changed the log level of the control info statement from info to debug to reduce verbosity. I also redirected the program's info-level log messages to stdout instead of stderr, which is the current default. I have no other code changes in the repository. The output of git diff is attached in case it is needed.

debug_record_2025-01-14T12:12:42-08:00.log
debug_record_2025-01-14T12:14:25-08:00.log
git_diff.txt
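For readers who want to reproduce the logging setup described above, here is a minimal sketch of sending info-level log messages to stdout so that a `| tee` pipeline captures them. It is not the change contained in git_diff.txt; the function name `init_stdout_logging`, the logger name, and the message texts are hypothetical and chosen only for illustration.

```python
import logging
import sys


def init_stdout_logging(level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to stdout (hypothetical helper, not lerobot's)."""
    logger = logging.getLogger("record_debug")  # hypothetical logger name
    logger.setLevel(level)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(sys.stdout)  # StreamHandler defaults to stderr
    handler.setFormatter(logging.Formatter("%(levelname)s %(asctime)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep messages off the root logger's stderr handler
    return logger


logger = init_stdout_logging()
logger.info("Recording episode 0")        # written to stdout, visible to tee
logger.debug("per-frame control timing")  # suppressed at INFO level
```

Demoting a noisy message from info to debug, as described above, keeps it out of the log as long as the logger level stays at INFO.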

Expected behavior

I'd expect the environment reset to happen between episodes when resuming too, just as it does without resume.


PradeepKadubandi · 2025-01-14T20:37:20Z

Hmm, I found the root cause after I created the issue. I think the problem is this condition on line 303 of control_robot.py:

dataset.num_episodes < num_episodes - 1

When resuming a dataset recording, dataset.num_episodes presumably already counts the episodes from the earlier session, so this condition is likely to be False and the reset is skipped. I can perhaps create a PR a little later.

When resuming a dataset creation, the reset environment code is skipped and this simple code change fixes the issue.
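To make the failure mode concrete, here is a small self-contained simulation of the condition above. It is a sketch, not the code in control_robot.py or in the merged PR: the function `reset_flags` and its parameters are hypothetical, `total` stands in for dataset.num_episodes, and it assumes the check runs before the current episode is added to the dataset. Comparing against a per-session counter instead of the dataset total is shown as one possible fix.

```python
def reset_flags(existing_episodes: int, num_episodes: int, use_session_count: bool) -> list[bool]:
    """For each episode recorded in this session, report whether a reset would follow it."""
    flags = []
    total = existing_episodes  # stands in for dataset.num_episodes (assumption)
    recorded = 0               # episodes recorded in the current session only
    while recorded < num_episodes:
        counter = recorded if use_session_count else total
        # Skip the reset only after the last episode of this session.
        flags.append(counter < num_episodes - 1)
        total += 1
        recorded += 1
    return flags


# Fresh dataset, --num-episodes 3: reset after episodes 1 and 2, not after the last.
print(reset_flags(0, 3, use_session_count=False))  # [True, True, False]
# Resumed dataset that already holds 3 episodes: the reset never happens.
print(reset_flags(3, 3, use_session_count=False))  # [False, False, False]
# Same resumed dataset, but counting only this session's episodes.
print(reset_flags(3, 3, use_session_count=True))   # [True, True, False]
```

With the dataset total as the counter, any resumed run that starts with at least num_episodes - 1 existing episodes never resets, which matches the behavior in the second log.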

PradeepKadubandi · 2025-01-15T14:02:30Z

Closing the issue now!


PradeepKadubandi added a commit to PradeepKadubandi/lerobot that referenced this issue Jan 14, 2025

Fix for the issue huggingface#638

6735d0a

When resuming a dataset creation, the reset environment code is skipped and this simple code change fixes the issue.

PradeepKadubandi mentioned this issue Jan 14, 2025

Fix for the issue https://github.com/huggingface/lerobot/issues/638 #639

Merged

Cadene pushed a commit that referenced this issue Jan 15, 2025

Fix for the issue #638 (#639)

380b836

PradeepKadubandi closed this as completed Jan 15, 2025

chrisheninger pushed a commit to chrisheninger/lerobot that referenced this issue Jan 26, 2025

Fix for the issue huggingface#638 (huggingface#639)

810908e

michel-aractingi pushed a commit that referenced this issue Feb 3, 2025

Fix for the issue #638 (#639)

068efce

menhguin pushed a commit to menhguin/lerobot that referenced this issue Feb 9, 2025

Fix for the issue huggingface#638 (huggingface#639)

68d0bc9

JIy3AHKO pushed a commit to vertix/lerobot that referenced this issue Feb 27, 2025

Fix for the issue huggingface#638 (huggingface#639)

3570db7
