Official Repository of Panacea.
[Paper] Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen¹*†, Yucheng Zhao²*, Yingfei Liu²*, Fan Jia², Yanhui Wang¹, Chong Luo¹, Chi Zhang³, Tiancai Wang²‡, Xiaoyan Sun¹‡, Xiangyu Zhang²
¹University of Science and Technology of China, ²MEGVII Technology, ³Mach Drive
*Equal contribution, †This work was done during an internship at MEGVII, ‡Corresponding authors.
[Paper] Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen¹*†, Yucheng Zhao²*, Yingfei Liu²*, Binyuan Huang⁴*, Fan Jia², Yanhui Wang¹, Chi Zhang³, Tiancai Wang²‡, Xiaoyan Sun¹‡, Xiangyu Zhang²
¹University of Science and Technology of China, ²MEGVII Technology, ³Mach Drive, ⁴Wuhan University
*Equal contribution, †This work was done during an internship at MEGVII, ‡Corresponding authors.
[WebPage] https://panacea-ad.github.io/
- Aug. 15th, 2024: We release an enhanced version of Panacea, named Panacea+, which has improved performance and comprehensive validation on multiple datasets and tasks. For more details, please refer to the Panacea+ paper.
- Aug. 15th, 2024: We release the checkpoint and inference scripts for stage 2 of Panacea+; you can use them to generate multi-view video samples based on BEV layout sequences.
- Apr. 18th, 2024: We release our Gen-nuScenes dataset generated by Panacea. Please check the `metrics/` folder to use it.
- Apr. 18th, 2024: We release the BEV-perception evaluation code based on StreamPETR. Please check the `metrics/` folder and follow `metrics/README.md` for detailed evaluation.
Please follow our documentation step by step.

1. Follow the instructions in Environment Setup.
2. Prepare the real dataset following the instructions in Data Preparation. Remember to put the dataset under the path `data/nuscenes`.
3. Download the second-stage weights from `panaceaplus_40k_deepspeed.ckpt` and put the file in the `checkpoints/` folder.
4. Run the following command to perform stage-2 inference on the whole training/validation set of nuScenes:
   - `--split`: specifies the `train` or `val` set.
   - `--use_last_frame true`: uses the last frame as the conditional image.

```shell
python -m torch.distributed.launch --nproc_per_node=8 --master_port=1238 inference.py \
    --base configs/inference_nuscenes.yaml \
    --ckpt checkpoints/panaceaplus_40k_deepspeed.ckpt \
    --split train --use_last_frame true --name EXP_NAME --bs 1
```
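To run on the validation set instead, set `--split val`. Replace the placeholder `EXP_NAME` with your own experiment name.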

Overview of Panacea. (a). The diffusion training process of Panacea, enabled by a diffusion encoder and decoder with the decomposed 4D attention module. (b). The decomposed 4D attention module comprises three components: intra-view attention for spatial processing within individual views, cross-view attention to engage with adjacent views, and cross-frame attention for temporal processing. (c). Controllable module for the integration of diverse signals. The image conditions are derived from a frozen VAE encoder and combined with diffused noises. The text prompts are processed through a frozen CLIP encoder, while BEV sequences are handled via ControlNet. (d). The details of BEV layout sequences, including projected bounding boxes, object depths, road maps and camera pose.
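Below is a minimal PyTorch sketch of the decomposed 4D attention idea, for illustration only. The tensor layout `(B, T, V, N, C)` (batch, frames, views, tokens per view, channels), the module names, and the use of full cross-view attention (the paper attends to adjacent views) are assumptions, not the repository's implementation.

```python
import torch
import torch.nn as nn

class Decomposed4DAttention(nn.Module):
    """Illustrative sketch: intra-view, cross-view, then cross-frame attention.

    Input/output shape: (B, T, V, N, C) = (batch, frames, views, tokens, channels).
    """
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.intra_view = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_view = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_frame = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, V, N, C = x.shape

        # Intra-view: spatial self-attention over the N tokens of each (frame, view).
        h = x.reshape(B * T * V, N, C)
        h = h + self.intra_view(h, h, h, need_weights=False)[0]

        # Cross-view: each spatial token attends across the V camera views
        # (full attention here; the paper restricts this to adjacent views).
        h = h.reshape(B, T, V, N, C).permute(0, 1, 3, 2, 4).reshape(B * T * N, V, C)
        h = h + self.cross_view(h, h, h, need_weights=False)[0]

        # Cross-frame: temporal self-attention across the T frames.
        h = h.reshape(B, T, N, V, C).permute(0, 2, 3, 1, 4).reshape(B * N * V, T, C)
        h = h + self.cross_frame(h, h, h, need_weights=False)[0]

        # Restore the original (B, T, V, N, C) layout.
        return h.reshape(B, N, V, T, C).permute(0, 3, 2, 1, 4)
```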

The two-stage inference pipeline of Panacea. Its two-stage process begins by creating multi-view images with BEV layouts, followed by using these images, along with subsequent BEV layouts, to facilitate the generation of following frames.
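As a rough schematic of this loop (with hypothetical names `stage1_model`, `stage2_model`, `cond_image`, `layouts`, and `clip_len`; this is not the repository's actual API):

```python
def generate_video(bev_layouts, stage1_model, stage2_model, clip_len=8):
    """Illustrative two-stage loop; bev_layouts is a list of per-frame conditions."""
    # Stage 1: synthesize the first multi-view frame from its BEV layout.
    frames = [stage1_model(bev_layouts[0])]
    # Stage 2: extend autoregressively; each call is conditioned on the most
    # recently generated frame plus the next span of BEV layouts.
    t = 1
    while t < len(bev_layouts):
        clip = stage2_model(cond_image=frames[-1],
                            layouts=bev_layouts[t:t + clip_len])
        frames.extend(clip)
        t += clip_len
    return frames
```

This mirrors the `--use_last_frame true` option above, where the last generated frame serves as the conditional image for the next clip.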
Controllable multi-view video generation. Panacea is able to generate realistic, controllable videos with good temporal and view consistency.
Video generation with variable attribute controls, such as weather, time, and scene. This allows Panacea to simulate a variety of rare driving scenarios, including extreme weather conditions such as rain and snow, thereby greatly enhancing the diversity of the data.
(a). Panoramic video generation based on BEV (Bird's-Eye-View) layout sequences facilitates the construction of a synthetic video dataset, which enhances perception tasks. (b). Producing panoramic videos with conditional images and BEV layouts can effectively elevate image-only datasets to video datasets, thus enabling the advancement of video-based perception techniques.
```
@inproceedings{wen2024panacea,
  title={Panacea: Panoramic and controllable video generation for autonomous driving},
  author={Wen, Yuqing and Zhao, Yucheng and Liu, Yingfei and Jia, Fan and Wang, Yanhui and Luo, Chong and Zhang, Chi and Wang, Tiancai and Sun, Xiaoyan and Zhang, Xiangyu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={6902--6912},
  year={2024}
}
```
```
@misc{wen2024panaceapanoramiccontrollablevideo,
  title={Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving},
  author={Yuqing Wen and Yucheng Zhao and Yingfei Liu and Binyuan Huang and Fan Jia and Yanhui Wang and Chi Zhang and Tiancai Wang and Xiaoyan Sun and Xiangyu Zhang},
  year={2024},
  eprint={2408.07605},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2408.07605},
}
```
This code builds on Stability-AI, ControlNet, and StreamPETR. Many thanks to these projects for open-sourcing!