This repository provides AV (audio-video) synchronization for talking-head videos. It uses a face-detection model (S3FD) and a lipsync model. Thanks to the authors of these great works!
Papers:
Git:
- Synchronizes videos with their audio tracks automatically.
- Synced videos can be used for video-generation tasks (face reenactment, lip generation, ...).
conda create -n "video-processing" python=3.7
source activate video-processing
git clone https://github.com/jovis-gnn/video-processing.git
cd video-processing
pip install -r requirements.txt
mkdir tmp
Download the pretrained S3FD and lipsync model checkpoints from this Link, and put them in the directories shown below.
# for S3FD
video-processing/s3fd/weights/
# for lipsync
video-processing/tmp/model_weight/
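A quick sanity check before running can save a confusing failure later. Below is a minimal sketch that verifies the checkpoint files are in place; the `.pth` filenames are assumptions (use whatever names the downloaded checkpoints actually have):

```python
from pathlib import Path

def missing_checkpoints(repo_root, expected):
    """Return the expected checkpoint paths that are not present on disk."""
    root = Path(repo_root)
    return [str(root / rel) for rel in expected if not (root / rel).is_file()]

# Directory layout from the README; the .pth filenames are hypothetical.
EXPECTED = [
    "s3fd/weights/s3fd.pth",         # S3FD face-detection weights (assumed name)
    "tmp/model_weight/syncnet.pth",  # lipsync model weights (assumed name)
]

if __name__ == "__main__":
    for path in missing_checkpoints(".", EXPECTED):
        print("missing checkpoint:", path)
```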
❗ It assumes that exactly one person appears in every video frame.
💡 It can process a single video (for a demo) or multiple videos (for dataset preprocessing).
- Input data
video-processing/test/test_video.mp4
- run
cd video-processing
python main.py \
--data_path test/test_video.mp4 \
--single_video \
--del_orig \
--check_after_sync
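Conceptually, SyncNet-style lipsync models estimate the AV offset by scoring how well audio and visual features match at each candidate shift and picking the best one. The toy sketch below illustrates that search with a plain dot-product score on precomputed per-frame features; the actual model in this repository uses learned embeddings and its own distance metric:

```python
def find_av_offset(video_feats, audio_feats, max_shift=10):
    """Estimate the audio-video offset (in frames) by exhaustive search:
    the shift maximizing total feature similarity is taken as the sync
    offset. A positive result means the audio lags the video by that
    many frames. Features are lists of equal-length float vectors."""
    def score(shift):
        total = 0.0
        for i, v in enumerate(video_feats):
            j = i + shift
            if 0 <= j < len(audio_feats):
                # similarity between video frame i and audio frame i + shift
                total += sum(a * b for a, b in zip(v, audio_feats[j]))
        return total
    return max(range(-max_shift, max_shift + 1), key=score)
```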
- Input data
video-processing/test/videos/video_names_0/video_0.mp4
video-processing/test/videos/video_names_0/video_1.mp4
...
video-processing/test/videos/video_names_k/video_k.mp4
- run
cd video-processing
python main.py \
--data_path test/videos \
--ds_name custom \
--del_orig \
--check_after_sync
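For the batch mode, the inputs are grouped as `test/videos/<name>/<clip>.mp4`. A small sketch of enumerating that layout (how `main.py` itself walks the tree is an assumption):

```python
from pathlib import Path

def collect_videos(data_path):
    """Collect .mp4 files one level below data_path, grouped by their
    subdirectory, matching the layout test/videos/<name>/<clip>.mp4."""
    root = Path(data_path)
    return {d.name: sorted(p.name for p in d.glob("*.mp4"))
            for d in sorted(root.iterdir()) if d.is_dir()}
```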
Processed results will be saved next to the input directories with the suffix '_prep'. Each result contains:
- Original audio & video
- Synced audio & video
- Synced video frames
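Once an offset is known, one common way to apply it is ffmpeg's `-itsoffset` flag, which shifts the timestamps of one input. The sketch below only builds the command; whether this repository resyncs via ffmpeg or re-muxes frames itself is an assumption:

```python
def ffmpeg_shift_cmd(video_in, offset_sec, video_out):
    """Build (but do not run) an ffmpeg command that delays the audio
    track of video_in by offset_sec seconds relative to its video track.
    Illustration only; the repository's own resync step may differ."""
    return [
        "ffmpeg", "-y",
        "-i", video_in,                 # input 0: source of the video stream
        "-itsoffset", str(offset_sec),  # shift timestamps of the next input
        "-i", video_in,                 # input 1: source of the (shifted) audio
        "-map", "0:v", "-map", "1:a",   # video from input 0, audio from input 1
        "-c", "copy",                   # no re-encoding
        video_out,
    ]
```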