
[WIP] Fix SAC and port HIL SERL #644

Open · wants to merge 70 commits into base: user/michel-aractingi/2024-11-27-port-hil-serl

Conversation

@AdilZouitine (Member) commented Jan 17, 2025

What this does

⚠️ This PR is not ready to be merged.

We evaluate the actor-learner architecture on ManiSkill.

  • Implements the actor-learner process:

    1. An actor machine interacts with the environment and sends data to a learner machine.
    2. The learner updates its weights using this data and sends the updated weights back to the actor.
  • Increases learning speed by 50% by using a shared encoder for the ensemble of critics (see the sketch after this list).

    • Previously, each critic made a separate forward pass through the encoder, duplicating work.
    • Now, the observation is passed through the encoder only once, and the resulting representation is fed to the critic heads.
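
A rough sketch of the shared-encoder idea (illustrative only; the class and parameter names below are invented and may differ from the PR's actual SAC implementation): the observation is encoded once and the resulting features are reused by every critic head.

```python
import torch
import torch.nn as nn


class SharedEncoderCritic(nn.Module):
    """Toy ensemble critic: one encoder forward pass shared by all critic heads."""

    def __init__(self, encoder: nn.Module, feature_dim: int, action_dim: int, num_critics: int = 2):
        super().__init__()
        self.encoder = encoder  # e.g. a CNN mapping image observations to feature vectors
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(feature_dim + action_dim, 256),
                nn.ReLU(),
                nn.Linear(256, 1),
            )
            for _ in range(num_critics)
        )

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        features = self.encoder(obs)  # single encoder pass, shared by the whole ensemble
        x = torch.cat([features, action], dim=-1)
        # Stack per-head Q-values into shape (num_critics, batch_size)
        return torch.stack([head(x).squeeze(-1) for head in self.heads])
```

Since the encoder dominates the cost with image observations, collapsing one encoder pass per critic into a single shared pass is where the reported speedup comes from.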

How it was tested

  • We trained an agent on ManiSkill using this actor-learner architecture.

How to check out & try it (for the reviewer) 😃

  • Install ManiSkill.

Examples:

python lerobot/scripts/server/actor_server.py policy=sac_maniskill env=maniskill_example device=cuda wandb.enable=True

python lerobot/scripts/server/learner_server.py policy=sac_maniskill env=maniskill_example device=cuda wandb.enable=True

mishig25 and others added 27 commits February 3, 2025 15:04
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
…604)

Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
Co-authored-by: Philip Fung <no@one>
Co-authored-by: Simon Alibert <75076266+aliberts@users.noreply.github.com>
… the protobuf message types to split training into two processes, acting and learning. The actor rolls out the policy and collects interaction data while the learner receives the data, trains the policy, and sends the updated parameters to the actor. The two scripts are run simultaneously.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Ran an experiment with the pushcube env from ManiSkill. The learning seems to work.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
… the policy loop and optimization loop.

- Optimized critic design that improves the performance of the learner loop by a factor of 2
- Cleaned the code and fixed style issues

- Completed the config with an actor_learner_config field that contains the host IP and port elements necessary for the actor-learner servers.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
…ac_maniskill.yaml` that are necessary to run the lerobot implementation of sac with the maniskill baselines.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
… side

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
…policy state dict, optimizers state, optimization step and interaction step

Added functions for converting the replay buffer to and from LeRobotDataset. When we want to save the replay buffer, we first convert it to LeRobotDataset format and save it locally, and vice versa.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
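
A minimal sketch of the round-trip idea, using hypothetical helper names and a simplified frame layout (the real conversion goes through LeRobotDataset and the buffer's own storage format):

```python
import torch


def buffer_to_frames(states, actions, rewards, dones):
    """Flatten buffer tensors into a list of per-step frame dicts (simplified layout)."""
    frames, episode_index = [], 0
    for i in range(len(actions)):
        frames.append({
            "observation.state": states[i],
            "action": actions[i],
            "next.reward": rewards[i],
            "next.done": dones[i],
            "episode_index": episode_index,
        })
        if dones[i]:
            episode_index += 1
    return frames


def frames_to_buffer(frames):
    """Rebuild stacked tensors from the per-step frames."""
    states = torch.stack([f["observation.state"] for f in frames])
    actions = torch.stack([f["action"] for f in frames])
    rewards = torch.stack([f["next.reward"] for f in frames])
    dones = torch.stack([f["next.done"] for f in frames])
    return states, actions, rewards, dones
```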
michel-aractingi and others added 22 commits February 12, 2025 19:25
…ion rather than absolute actions

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
…through time

Added an `s` keyboard command to force success in case the reward classifier fails

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Added a pretrained vision model to the policy

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Added action masking at the level of the intervention actions and the offline dataset

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
…he policy

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
… instead of a gpu device and send the batches to the gpu.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
…d parameters to a json file in the meta of the dataset

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
… dataset actions. (scale by inverse delta)

Co-authored-by: Adil Zouitine <adizouitinegm@gmail.com>
Co-authored-by: Michel Aractingi <michel.aractingi@gmail.com>
- Modify logger to support multiple custom step keys
- Update logging method to handle custom step keys more flexibly

- Enhance logging of optimization step and frequency
Co-authored-by: michel-aractingi  <michel.aractingi@gmail.com>
- Uncomment and start the param_push_thread
- Restore thread joining for param_push_thread
… learner to the actor -- pass only the actor to `update_policy_parameters` and remove `strict=False`

- Fixed a major issue with the normalization of the actions in the critic's `forward` function -- removed the `torch.no_grad` decorator from the normalization function in `normalize.py`
- Fixed a performance issue and boosted the optimization frequency by setting the storage device to the same device used for learning.

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
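
To illustrate why the `torch.no_grad` wrapper was a problem (a simplified sketch, not the PR's actual normalization code): in SAC the actor loss has to backpropagate through the critic and through any action normalization applied before it, so detaching that step silently cuts the policy gradient.

```python
import torch

action = torch.randn(8, 4, requires_grad=True)  # stand-in for an action sampled from the policy
mean, std = torch.zeros(4), torch.ones(4)

# Broken: wrapping the normalization in no_grad detaches it from the graph.
with torch.no_grad():
    normalized_detached = (action - mean) / std
print(normalized_detached.requires_grad)  # False -> no gradient can reach the policy

# Fixed: plain tensor ops keep the computation graph intact.
normalized = (action - mean) / std
print(normalized.requires_grad)  # True -> critic(normalized) can backprop into the policy
```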
…upport

- Introduced Ensemble and CriticHead classes for more efficient critic network handling
- Added support for multiple camera inputs in observation encoder
- Optimized image encoding by batching image processing
- Updated configuration for ManiSkill environment with reduced image size and action scaling
- Compiled critic networks for improved performance
- Simplified normalization and ensemble handling in critic networks
Co-authored-by: michel-aractingi <michel.aractingi@gmail.com>
…r to limit the number of forward passes through the pretrained encoder when it's frozen.

Added tensordict dependencies
Updated the version of torch and torchvision

Co-authored-by: Adil Zouitine <adilzouitinegm@gmail.com>
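
A rough sketch of the freezing idea (illustrative only; the actual logic in the PR may differ): once the pretrained encoder is frozen, its outputs are fixed, so features can be computed without building a graph, and a single encoding per batch can be shared by the actor and every critic head instead of re-encoding the same observation several times.

```python
import torch
import torch.nn as nn


class FrozenEncoderWrapper(nn.Module):
    """Illustrative wrapper that disables gradient tracking for a frozen pretrained encoder."""

    def __init__(self, encoder: nn.Module, freeze: bool = True):
        super().__init__()
        self.encoder = encoder
        self.freeze = freeze
        if freeze:
            for p in self.encoder.parameters():
                p.requires_grad_(False)
            self.encoder.eval()

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        if self.freeze:
            with torch.no_grad():  # no graph is built for the frozen backbone
                return self.encoder(obs)
        return self.encoder(obs)
```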
…n and dataset handling

- Reduced image size in ManiSkill environment configuration from 128 to 64
- Added support for truncation in replay buffer and actor server
- Updated SAC policy configuration to use a specific dataset and modify vision encoder settings
- Improved dataset conversion process with progress tracking and task naming
- Added flexibility for joint action space masking in learner server
… efficiency

- Replaced list-based memory storage with pre-allocated tensor storage
- Optimized sampling process with direct tensor indexing
- Added support for DrQ image augmentation during sampling for offline dataset
- Improved dataset conversion with more robust episode handling
- Enhanced buffer initialization and state tracking
- Added comprehensive testing for buffer conversion and sampling
- Specify storage device for replay buffer to optimize memory management
- Introduce `optimize_memory` parameter to reduce memory usage in replay buffer
- Implement simplified memory optimization by not storing duplicate next_states
- Update learner server and buffer initialization to use memory optimization by default
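
A compact sketch of the two storage ideas above, pre-allocated tensor storage and not duplicating next_states (names and shapes are illustrative, not the PR's actual `ReplayBuffer`):

```python
import torch


class TensorReplayBuffer:
    """Toy ring buffer: pre-allocated tensors; next_state is read from index + 1."""

    def __init__(self, capacity: int, state_dim: int, action_dim: int, storage_device: str = "cpu"):
        self.capacity = capacity
        self.states = torch.empty(capacity, state_dim, device=storage_device)
        self.actions = torch.empty(capacity, action_dim, device=storage_device)
        self.rewards = torch.empty(capacity, device=storage_device)
        self.dones = torch.empty(capacity, dtype=torch.bool, device=storage_device)
        self.pos, self.size = 0, 0

    def add(self, state, action, reward, done):
        i = self.pos
        self.states[i], self.actions[i] = state, action
        self.rewards[i], self.dones[i] = reward, done
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size: int, device: str = "cuda"):
        # Direct tensor indexing instead of stacking Python lists.
        idx = torch.randint(0, self.size - 1, (batch_size,))
        # Memory optimization: next_state is simply the following entry, so it is never stored twice.
        # (Episode boundaries and ring wrap-around need extra care; omitted for brevity.)
        next_idx = idx + 1
        return {
            "state": self.states[idx].to(device),
            "action": self.actions[idx].to(device),
            "reward": self.rewards[idx].to(device),
            "done": self.dones[idx].to(device),
            "next_state": self.states[next_idx].to(device),
        }
```

Keeping `storage_device` on the GPU avoids host-to-device copies at sampling time, which is the speed/memory trade-off discussed in the review comments below.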
capacity=cfg.training.online_buffer_capacity,
device=device,
state_keys=cfg.policy.input_shapes.keys(),
storage_device=device,

A reviewer commented on the snippet above: Doesn't this trigger GPU OOMs?

Suggested change:
- storage_device=device,
+ storage_device="cpu",

AdilZouitine (Member, Author) replied:

I will handle it cleanly in the config. The issue is that if we move the storage to the CPU, the optimization speed is divided by 2.

> Doesn't this trigger GPU OOMs?

I reduced the online buffer size and it is fine. With the default size, yes, it will trigger OOMs.

AdilZouitine and others added 7 commits March 4, 2025 13:22
- Introduce `storage_device` parameter in SAC configuration and training settings
- Update learner server to use configurable storage device for replay buffer
- Reduce online buffer capacity in ManiSkill configuration
- Modify replay buffer initialization to support custom storage device
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…ign with train.py (#715)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
… SAC policy

- Removed `@torch.no_grad` decorator from Unnormalize forward method

- Added TODO comment for optimizing next action prediction in SAC policy
- Minor formatting adjustment in NaN assertion for log standard deviation
Co-authored-by: Yoel Chornton <yoel.chornton@gmail.com>
- Implement `_save_pretrained` method to handle TensorDict state saving
- Add `_from_pretrained` class method for loading SAC policy from files
- Create utility function `find_and_copy_params` to handle parameter copying