Awesome-CVPR2025-AIGC

A Collection of Papers and Codes for CVPR2025 AIGC

A curated summary of this year's CVPR papers and code related to AIGC, listed below.

Please feel free to star, fork or PR if helpful~

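Related collections

Please cite the source when referencing or reposting.

CVPR 2025 official website: https://cvpr.thecvf.com/Conferences/2025

CVPR accepted paper list:

CVPR complete paper archive:

Conference dates: June 11, 2025 to June 15, 2025

Paper acceptance announcement date: February 27, 2025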

【Contents】

1. Image Generation (Image Generation/Image Synthesis)

CacheQuant: Comprehensively Accelerated Diffusion Models

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

DiC: Rethinking Conv3x3 Designs in Diffusion Models

DreamText: High Fidelity Scene Text Synthesis

Finding Local Diffusion Schrödinger Bridge using Kolmogorov-Arnold Network

Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models

Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification

Parallelized Autoregressive Visual Generation

PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Rectified Diffusion Guidance for Conditional Generation

SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

2. Image Editing

AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

Attention Distillation: A Unified Approach to Visual Characteristics Transfer

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

EmoEdit: Evoking Emotions through Image Manipulation

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

Recognition-Synergistic Scene Text Editing

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

3. Video Generation (Video Generation/Video Synthesis)

ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation

WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model

4. Video Editing

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

FADE: Frequency-Aware Diffusion Model Factorization for Video Editing

Generative Inbetweening through Frame-wise Conditions-Driven Video Generation

X-Dyna: Expressive Dynamic Human Image Animation

5. 3D Generation (3D Generation/3D Synthesis)

Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation

Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation

LT3SD: Latent Trees for 3D Scene Diffusion

Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture

You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale

6. 3D Editing

DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters

FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video

Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters

7. Multi-Modal Large Language Models

Automated Generation of Challenging Multiple Choice Questions for Vision Language Model Evaluation

RAP-MLLM: Retrieval-Augmented Personalization for Multimodal Large Language Model

SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

8. Other Tasks (Others)

Continuous and Locomotive Crowd Behavior Generation

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Continuously updated~