Add torchcodec cpu #798

Open
wants to merge 13 commits into
base: main


@jadechoghari jadechoghari commented Mar 3, 2025

What this does

This PR replaces torchvision CPU decoding with torchcodec CPU decoding.
It also adds a decode_video_frames function that wraps multiple backends, so callers no longer call decode_video_frames_BACKENDNAME separately. This simplifies call sites and makes it easier to add more decoders later on!

The decoder is selected by the dataset.video_backend key and defaults to torchcodec.
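
For readers who want a picture of the wrapper described above, here is a minimal sketch of a backend dispatcher. It is not the code from this PR: the function signature, the `_BACKENDS` registry, the `"pyav"` key, and the placeholder decoders (which return strings instead of frame tensors) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (video_path, timestamps_in_seconds) -> decoded frames.
Decoder = Callable[[str, List[float]], List[str]]


def decode_video_frames_torchcodec(video_path: str, timestamps: List[float]) -> List[str]:
    # Placeholder: a real implementation would decode frames with torchcodec.
    return [f"torchcodec:{video_path}@{ts:.2f}" for ts in timestamps]


def decode_video_frames_pyav(video_path: str, timestamps: List[float]) -> List[str]:
    # Placeholder: a real implementation would decode frames with PyAV.
    return [f"pyav:{video_path}@{ts:.2f}" for ts in timestamps]


# Registry of available backends (names are assumptions). Adding a decoder
# later means adding one entry here, without touching any call site.
_BACKENDS: Dict[str, Decoder] = {
    "torchcodec": decode_video_frames_torchcodec,
    "pyav": decode_video_frames_pyav,
}


def decode_video_frames(
    video_path: str,
    timestamps: List[float],
    backend: str = "torchcodec",  # mirrors the default described in this PR
) -> List[str]:
    """Dispatch to the decoder selected by `backend`."""
    try:
        decoder = _BACKENDS[backend]
    except KeyError:
        raise ValueError(f"Unsupported video backend: {backend!r}") from None
    return decoder(video_path, timestamps)
```

With this shape, the value read from dataset.video_backend can be passed straight through as `backend`, and an unknown value fails early with a clear error.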

How it was tested

Tested and benchmarked the decoders on different datasets and policies.

How to checkout & try? (for the reviewer)

Run the training script with a dataset that contains videos to decode.
Example:

python lerobot/scripts/train.py \
    --output_dir=outputs/train/act_aloha_insertion \
    --policy.type=act \
    --dataset.repo_id=lerobot/aloha_sim_insertion_human \
    --env.type=aloha \
    --env.task=AlohaInsertion-v0

Benchmarks

Ran one benchmark on the lerobot/aloha_sim_insertion_human_image dataset.
Comparison: PyAV vs TorchCodec (CPU)

| Metric | PyAV | TorchCodec-CPU |
| --- | --- | --- |
| Video to Images Load Time Ratio | 1.87 | 1.25 |
| Avg MSE | 5.14e-05 | 4.88e-05 |
| Avg PSNR | 43.17 | 43.37 |
| Avg SSIM | 0.995 | 0.995 |
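
As a reading aid for the MSE and PSNR rows, the sketch below shows how these two metrics are commonly computed between a decoded frame and its reference image. It is not the benchmark script used here. It assumes pixel values scaled to [0, 1] and frames flattened to plain lists; the reported numbers look consistent with that scaling, but the PR does not state it.

```python
import math
from typing import Sequence


def mse(decoded: Sequence[float], reference: Sequence[float]) -> float:
    """Mean squared error between two flattened frames of equal length."""
    if len(decoded) != len(reference) or len(decoded) == 0:
        raise ValueError("frames must be non-empty and have the same length")
    return sum((d - r) ** 2 for d, r in zip(decoded, reference)) / len(decoded)


def psnr(mse_value: float, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB.

    `max_value` is the largest possible pixel value: 1.0 for the [0, 1]
    scaling assumed here, 255 for 8-bit frames.
    """
    if mse_value == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse_value)
```

Under these assumptions, an MSE of 5.14e-05 corresponds to roughly 42.9 dB, close to the reported 43.17. A small gap is expected if PSNR is averaged per frame, since the mean of per-frame PSNR values is at least the PSNR of the mean MSE.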

What's left

Remove or suppress the noisy libdav1d logs. There is currently no environment variable to disable them, but they will be turned off in the next version of torchcodec.

PR is in a good state ✅

@jadechoghari jadechoghari marked this pull request as draft March 3, 2025 06:49
@jadechoghari jadechoghari marked this pull request as ready for review March 3, 2025 07:32
@Cadene Cadene self-requested a review March 4, 2025 08:31
Collaborator

@Cadene Cadene left a comment

Really nice work Jade! Thanks :)

Let's wait for the next version of torchcodec then!

In the meantime, could you try reproducing the results on pusht and aloha transfer cube, and add the commands you used and the success rates to the README?

Thanks!

jadechoghari and others added 5 commits March 4, 2025 13:27
Co-authored-by: Remi <re.cadene@gmail.com>
@imstevenpmwork imstevenpmwork added the enhancement and performance labels Mar 4, 2025
@jadechoghari
Author

TorchCodec consistently outperforms PyAV across all tested datasets and video codecs (encoders): it achieves lower MSE (better accuracy), higher PSNR (better quality), and higher SSIM (better perceptual similarity). This trend holds across libsvtav1, libx264, and libx265, which makes torchcodec the better choice for both efficiency and quality. To reproduce the full results, check this link
