
[WIP] Fix SAC and port HIL SERL #644

Open · wants to merge 70 commits into base: user/michel-aractingi/2024-11-27-port-hil-serl

Conversation

@AdilZouitine (Member) commented Jan 17, 2025

What this does

⚠️ This PR is not ready to be merged.

We evaluate the actor-learner architecture on ManiSkill.

  • Implements the actor-learner process:

    1. An actor machine interacts with the environment and sends data to a learner machine.
    2. The learner updates its weights using this data and sends the updated weights back to the actor.
  • Increases learning speed by 50% by using a shared encoder for the ensemble of critics (see the sketch after this list).

    • Previously, each critic made a separate forward pass through the encoder, duplicating work.
    • Now, the observation is passed through the encoder only once, and the resulting representation is fed to the critic heads.
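
A rough sketch of the shared-encoder idea (illustrative only; the class and parameter names below are invented and may differ from the PR's actual SAC implementation): the observation is encoded once and the resulting features are reused by every critic head.

```python
import torch
import torch.nn as nn


class SharedEncoderCritic(nn.Module):
    """Toy ensemble critic: one encoder forward pass shared by all critic heads."""

    def __init__(self, encoder: nn.Module, feature_dim: int, action_dim: int, num_critics: int = 2):
        super().__init__()
        self.encoder = encoder  # e.g. a CNN mapping image observations to feature vectors
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(feature_dim + action_dim, 256),
                nn.ReLU(),
                nn.Linear(256, 1),
            )
            for _ in range(num_critics)
        )

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        features = self.encoder(obs)  # single encoder pass, shared by the whole ensemble
        x = torch.cat([features, action], dim=-1)
        # Stack per-head Q-values into shape (num_critics, batch_size)
        return torch.stack([head(x).squeeze(-1) for head in self.heads])
```

Since the encoder dominates the cost with image observations, collapsing one encoder pass per critic into a single shared pass is where the reported speedup comes from.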

How it was tested

  • We trained an agent on ManiSkill using this actor-learner architecture.

How to check out & try it (for the reviewer) 😃

  • Install ManiSkill.

Examples:

python lerobot/scripts/server/actor_server.py policy=sac_maniskill env=maniskill_example device=cuda wandb.enable=True

python lerobot/scripts/server/learner_server.py policy=sac_maniskill env=maniskill_example device=cuda wandb.enable=True

mishig25 and others added 27 commits February 3, 2025 15:04
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
…604)

Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
Co-authored-by: Philip Fung <no@one>
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
… the protobuf message types to split training into two processes, acting and learning. The actor rolls out the policy and collects interaction data while the learner receives the data, trains the policy, and sends the updated parameters to the actor. The two scripts are run simultaneously.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Ran an experiment with the pushcube env from ManiSkill. The learning seems to work.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
… the policy loop and optimization loop.

- Optimized critic design that improves the performance of the learner loop by a factor of 2
- Cleaned the code and fixed style issues

- Completed the config with an actor_learner_config field that contains the host IP and port elements necessary for the actor-learner servers.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
…ac_maniskill.yaml` that are necessary to run the lerobot implementation of sac with the maniskill baselines.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
… side

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
…policy state dict, optimizers state, optimization step and interaction step

Added functions for converting the replay buffer to and from LeRobotDataset. When we want to save the replay buffer, we first convert it to LeRobotDataset format and save it locally, and vice versa.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
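
A minimal sketch of the round-trip idea, using hypothetical helper names and a simplified frame layout (the real conversion goes through LeRobotDataset and the buffer's own storage format):

```python
import torch


def buffer_to_frames(states, actions, rewards, dones):
    """Flatten buffer tensors into a list of per-step frame dicts (simplified layout)."""
    frames, episode_index = [], 0
    for i in range(len(actions)):
        frames.append({
            "observation.state": states[i],
            "action": actions[i],
            "next.reward": rewards[i],
            "next.done": dones[i],
            "episode_index": episode_index,
        })
        if dones[i]:
            episode_index += 1
    return frames


def frames_to_buffer(frames):
    """Rebuild stacked tensors from the per-step frames."""
    states = torch.stack([f["observation.state"] for f in frames])
    actions = torch.stack([f["action"] for f in frames])
    rewards = torch.stack([f["next.reward"] for f in frames])
    dones = torch.stack([f["next.done"] for f in frames])
    return states, actions, rewards, dones
```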
michel-aractingi and others added 22 commits February 12, 2025 19:25
…ion rather than absolute actions

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
…through time

Added an `s` keyboard command to force success in case the reward classifier fails

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Added a pretrained vision model to the policy

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Added action masking at the level of the intervention actions and the offline dataset

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
…he policy

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
… instead of a gpu device and send the batches to the gpu.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
…d parameters to a json file in the meta of the dataset

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
… dataset actions. (scale by inverse delta)

Co-authored-by: Adil Zouitine <adizouitinegm@gmail.com>
Co-authored-by: Michel Aractingi <michel.aractingi@gmail.com>
- Modify logger to support multiple custom step keys
- Update logging method to handle custom step keys more flexibly

- Enhance logging of optimization step and frequency
Co-authored-by: michel-aractingi  <michel.aractingi@gmail.com>
- Uncomment and start the param_push_thread
- Restore thread joining for param_push_thread
… learner to the actor -- pass only the actor to `update_policy_parameters` and remove `strict=False`

- Fixed a major issue with the normalization of the actions in the critic's `forward` function -- removed the `torch.no_grad` decorator from the normalization function in `normalize.py`
- Fixed a performance issue and boosted the optimization frequency by setting the storage device to the same device used for learning.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
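
To illustrate why the `torch.no_grad` wrapper was a problem (a simplified sketch, not the PR's actual normalization code): in SAC the actor loss has to backpropagate through the critic and through any action normalization applied before it, so detaching that step silently cuts the policy gradient.

```python
import torch

action = torch.randn(8, 4, requires_grad=True)  # stand-in for an action sampled from the policy
mean, std = torch.zeros(4), torch.ones(4)

# Broken: wrapping the normalization in no_grad detaches it from the graph.
with torch.no_grad():
    normalized_detached = (action - mean) / std
print(normalized_detached.requires_grad)  # False -> no gradient can reach the policy

# Fixed: plain tensor ops keep the computation graph intact.
normalized = (action - mean) / std
print(normalized.requires_grad)  # True -> critic(normalized) can backprop into the policy
```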
…upport

- Introduced Ensemble and CriticHead classes for more efficient critic network handling
- Added support for multiple camera inputs in observation encoder
- Optimized image encoding by batching image processing
- Updated configuration for ManiSkill environment with reduced image size and action scaling
- Compiled critic networks for improved performance
- Simplified normalization and ensemble handling in critic networks
Co-authored-by: michel-aractingi <michel.aractingi@gmail.com>
…r to limit the number of forward passes through the pretrained encoder when it's frozen.

Added tensordict dependencies
Updated the version of torch and torchvision

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
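
A rough sketch of the freezing idea (illustrative only; the actual logic in the PR may differ): once the pretrained encoder is frozen, its outputs are fixed, so features can be computed without building a graph, and a single encoding per batch can be shared by the actor and every critic head instead of re-encoding the same observation several times.

```python
import torch
import torch.nn as nn


class FrozenEncoderWrapper(nn.Module):
    """Illustrative wrapper that disables gradient tracking for a frozen pretrained encoder."""

    def __init__(self, encoder: nn.Module, freeze: bool = True):
        super().__init__()
        self.encoder = encoder
        self.freeze = freeze
        if freeze:
            for p in self.encoder.parameters():
                p.requires_grad_(False)
            self.encoder.eval()

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        if self.freeze:
            with torch.no_grad():  # no graph is built for the frozen backbone
                return self.encoder(obs)
        return self.encoder(obs)
```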
…n and dataset handling

- Reduced image size in ManiSkill environment configuration from 128 to 64
- Added support for truncation in replay buffer and actor server
- Updated SAC policy configuration to use a specific dataset and modify vision encoder settings
- Improved dataset conversion process with progress tracking and task naming
- Added flexibility for joint action space masking in learner server
… efficiency

- Replaced list-based memory storage with pre-allocated tensor storage
- Optimized sampling process with direct tensor indexing
- Added support for DrQ image augmentation during sampling for offline dataset
- Improved dataset conversion with more robust episode handling
- Enhanced buffer initialization and state tracking
- Added comprehensive testing for buffer conversion and sampling
- Specify storage device for replay buffer to optimize memory management
- Introduce `optimize_memory` parameter to reduce memory usage in replay buffer
- Implement simplified memory optimization by not storing duplicate next_states
- Update learner server and buffer initialization to use memory optimization by default
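
A compact sketch of the two storage ideas above, pre-allocated tensor storage and not duplicating next_states (names and shapes are illustrative, not the PR's actual `ReplayBuffer`):

```python
import torch


class TensorReplayBuffer:
    """Toy ring buffer: pre-allocated tensors; next_state is read from index + 1."""

    def __init__(self, capacity: int, state_dim: int, action_dim: int, storage_device: str = "cpu"):
        self.capacity = capacity
        self.states = torch.empty(capacity, state_dim, device=storage_device)
        self.actions = torch.empty(capacity, action_dim, device=storage_device)
        self.rewards = torch.empty(capacity, device=storage_device)
        self.dones = torch.empty(capacity, dtype=torch.bool, device=storage_device)
        self.pos, self.size = 0, 0

    def add(self, state, action, reward, done):
        i = self.pos
        self.states[i], self.actions[i] = state, action
        self.rewards[i], self.dones[i] = reward, done
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size: int, device: str = "cuda"):
        # Direct tensor indexing instead of stacking Python lists.
        idx = torch.randint(0, self.size - 1, (batch_size,))
        # Memory optimization: next_state is simply the following entry, so it is never stored twice.
        # (Episode boundaries and ring wrap-around need extra care; omitted for brevity.)
        next_idx = idx + 1
        return {
            "state": self.states[idx].to(device),
            "action": self.actions[idx].to(device),
            "reward": self.rewards[idx].to(device),
            "done": self.dones[idx].to(device),
            "next_state": self.states[next_idx].to(device),
        }
```

Keeping `storage_device` on the GPU avoids host-to-device copies at sampling time, which is the speed/memory trade-off discussed in the review comments below.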
capacity=cfg.training.online_buffer_capacity,
device=device,
state_keys=cfg.policy.input_shapes.keys(),
storage_device=device,

A reviewer commented on the snippet above: Doesn't this trigger GPU OOMs?

Suggested change:
- storage_device=device,
+ storage_device="cpu",

AdilZouitine (Member, Author) replied:

I will handle it cleanly in the config. The issue is that if we move the storage to the CPU, the optimization speed is divided by 2.

> Doesn't this trigger GPU OOMs?

I reduced the online buffer size and it is fine. With the default size, yes, it will trigger OOMs.

AdilZouitine and others added 7 commits March 4, 2025 13:22
- Introduce `storage_device` parameter in SAC configuration and training settings
- Update learner server to use configurable storage device for replay buffer
- Reduce online buffer capacity in ManiSkill configuration
- Modify replay buffer initialization to support custom storage device
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…ign with train.py (#715)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
… SAC policy

- Removed `@torch.no_grad` decorator from Unnormalize forward method

- Added TODO comment for optimizing next action prediction in SAC policy
- Minor formatting adjustment in NaN assertion for log standard deviation
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
- Implement `_save_pretrained` method to handle TensorDict state saving
- Add `_from_pretrained` class method for loading SAC policy from files
- Create utility function `find_and_copy_params` to handle parameter copying