Add torchcodec cpu #798
base: main
for more information, see https://pre-commit.ci
Really nice work Jade! Thanks :)
Let's wait for the next version of torchcodec then!
In the meantime, could you try reproducing the results on pusht and aloha transfer cube, and add the commands you used and the success rates to the README?
Thanks!
Co-authored-by: Remi <re.cadene@gmail.com>
Torchcodec consistently outperforms pyav across all datasets and video codecs (encoders): it achieves lower MSE (better accuracy), higher PSNR (better quality), and higher SSIM (better perceptual similarity). This trend is evident across …
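For reference, MSE and PSNR are tied together by PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value, so a lower MSE always means a higher PSNR. The sketch below assumes frames flattened into sequences of pixel values normalized to [0, 1]; it is an illustration, not the benchmark code used in this PR. SSIM is left out because it needs local windowed statistics rather than a single per-pixel error.

```python
import math
from typing import Sequence


def mse(reference: Sequence[float], decoded: Sequence[float]) -> float:
    """Mean squared error between two equally sized flattened frames."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    return sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)


def psnr(reference: Sequence[float], decoded: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    err = mse(reference, decoded)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value**2 / err)
```

Passing `max_value=255.0` would apply the same formula to 8-bit pixel values instead of normalized ones.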
What this does
This PR replaces torchvision CPU decoding with torchcodec CPU decoding.
Also added a `decode_video_frames` function that wraps multiple backends, instead of calling `decode_video_frames_BACKENDNAME` separately. This makes it more efficient and allows us to add more decoders later on! The decoder is chosen by the `dataset.video_backend` key and defaults to torchcodec.
How it was tested
Tested and benchmarked the decoders on different datasets and policies.
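The `decode_video_frames` dispatch described above can be checked in isolation with stub backends, so no video needs to be decoded. This is a minimal sketch: the `(video_path, timestamps, tolerance_s, backend)` signature, the registry dict, and the stub decoders are hypothetical stand-ins, not the PR's actual code. Real backends would call torchcodec or PyAV.

```python
from typing import Callable, Dict, Optional, Sequence

DEFAULT_BACKEND = "torchcodec"  # mirrors the default described in the PR


# Hypothetical stub backends: they only report which backend handled the call.
def _decode_torchcodec(video_path: str, timestamps: Sequence[float], tolerance_s: float) -> dict:
    return {"backend": "torchcodec", "path": video_path, "n_frames": len(timestamps)}


def _decode_pyav(video_path: str, timestamps: Sequence[float], tolerance_s: float) -> dict:
    return {"backend": "pyav", "path": video_path, "n_frames": len(timestamps)}


# Registry keyed by the value a config key like `dataset.video_backend` would hold.
_DECODERS: Dict[str, Callable[..., dict]] = {
    "torchcodec": _decode_torchcodec,
    "pyav": _decode_pyav,
}


def decode_video_frames(
    video_path: str,
    timestamps: Sequence[float],
    tolerance_s: float,
    backend: Optional[str] = None,
) -> dict:
    """Route the call to the selected backend, falling back to the default."""
    name = backend or DEFAULT_BACKEND
    try:
        decoder = _DECODERS[name]
    except KeyError:
        raise ValueError(
            f"Unsupported video backend {name!r}; expected one of {sorted(_DECODERS)}"
        ) from None
    return decoder(video_path, timestamps, tolerance_s)
```

Adding a new decoder then only means adding one entry to the registry; call sites stay unchanged.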
How to checkout & try? (for the reviewer)
Just run the training script with a dataset containing videos to decode.
Example:
Benchmarks
Ran one benchmark on the `lerobot/aloha_sim_insertion_human_image` dataset.
Comparison: PyAV vs TorchCodec (CPU)
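The comparison table itself did not survive here. As a hedged sketch of how a CPU timing comparison like this could be structured, the harness below times each decoder and reports the median wall-clock time after one warm-up call. The decoders are passed as zero-argument callables (hypothetical, e.g. closures that decode a fixed set of timestamps with PyAV or torchcodec); this is not the benchmark script used for the results above.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_decoders(decoders: Dict[str, Callable[[], object]], n_repeats: int = 5) -> Dict[str, float]:
    """Return the median decode time in seconds for each named decoder."""
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    results: Dict[str, float] = {}
    for name, decode in decoders.items():
        decode()  # warm-up call, excluded from timing (file caches, lazy init)
        timings = []
        for _ in range(n_repeats):
            start = time.perf_counter()
            decode()
            timings.append(time.perf_counter() - start)
        # The median is less sensitive than the mean to one-off scheduling hiccups.
        results[name] = statistics.median(timings)
    return results
```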
What's left
Remove/suppress `libdav1d` logs (they're noisy): there's no environment variable to disable them for now, but they will be deactivated in the next version of torchcodec.
PR is in a good state ✅
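Until torchcodec silences the `libdav1d` logs listed under What's left, one generic workaround is to redirect the process-level stderr file descriptor around the decode call. This is an assumption, not something torchcodec documents. It only helps if the native library writes to file descriptor 2 directly, which Python's `sys.stderr` cannot intercept. It also hides every stderr message in the process (including real errors and output from other threads) while active, so it should wrap as little code as possible.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_c_stderr():
    """Temporarily point file descriptor 2 at os.devnull, then restore it."""
    sys.stderr.flush()  # push out pending Python-level output first
    saved_fd = os.dup(2)  # keep a handle on the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)  # native writes to fd 2 now go to devnull
        yield
    finally:
        sys.stderr.flush()
        os.dup2(saved_fd, 2)  # restore the original stderr
        os.close(devnull_fd)
        os.close(saved_fd)
```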