πŸ”₯ [ICLR 2025] FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models


FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, Jian Zhang

School of Electronic and Computer Engineering, Peking University



πŸ’‘ We also have other Copyright Protection projects that may interest you ✨.

EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection [CVPR 2024]
Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, Jian Zhang

V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection [ACM MM 2024]
Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang

GS-Hider: Hiding Messages into 3D Gaussian Splatting [NeurIPS 2024]
Xuanyu Zhang, Jiarui Meng, Runyi Li, Zhipei Xu, Yongbing Zhang, Jian Zhang

πŸ“° News

  • [2025.02.14] πŸ€— We are progressively open-sourcing all code & pre-trained model weights. Welcome to watch πŸ‘€ this repository for the latest updates.
  • [2025.01.23] πŸŽ‰πŸŽ‰πŸŽ‰ Our FakeShield has been accepted at ICLR 2025!
  • [2024.10.03] πŸ”₯ We have released FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models. We present the explainable IFDL task, constructing the MMTD-Set dataset and the FakeShield framework. Check out the paper. The code and dataset are coming soon.

FakeShield Overview

FakeShield is a novel multi-modal framework designed for explainable image forgery detection and localization (IFDL). Unlike traditional black-box IFDL methods, FakeShield integrates multi-modal large language models (MLLMs) to analyze manipulated images, generate tampered region masks, and provide human-understandable explanations based on pixel-level artifacts and semantic inconsistencies. To improve generalization across diverse forgery types, FakeShield introduces domain tags, which guide the model to recognize different manipulation techniques effectively. Additionally, we construct MMTD-Set, a richly annotated dataset containing multi-modal descriptions of manipulated images, fostering better interpretability. Through extensive experiments, FakeShield demonstrates superior performance in detecting and localizing various forgeries, including copy-move, splicing, removal, DeepFake, and AI-generated manipulations.


πŸ† Contributions

  • FakeShield Introduction. We introduce FakeShield, a multi-modal framework for explainable image forgery detection and localization, which is the first to leverage MLLMs for the IFDL task. We also propose the Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and the Multi-modal Forgery Localization Module (MFLM) to improve the generalization and robustness of the models.

  • Novel Explainable-IFDL Task. We propose the first explainable image forgery detection and localization (e-IFDL) task, addressing the opacity of traditional IFDL methods by providing both pixel-level and semantic-level explanations.

  • MMTD-Set Dataset Construction. We create the MMTD-Set by enriching existing IFDL datasets using GPT-4o, generating high-quality β€œimage-mask-description” triplets for enhanced multimodal learning.

πŸ› οΈ Requirements and Installation

Ensure your environment meets the following requirements:

  • Python == 3.9
  • PyTorch == 1.13.0
  • CUDA Version == 11.6

Installation

  1. Clone the repository:
    git clone https://github.com/zhipeixu/FakeShield.git
    cd FakeShield
  2. Install dependencies:
    apt update && apt install git
    pip install -r requirements.txt
    
    ## Install MMCV
    git clone https://github.com/open-mmlab/mmcv
    cd mmcv
    git checkout v1.4.7
    MMCV_WITH_OPS=1 pip install -e .
  3. Install DTE-FDM:
    cd ../DTE-FDM
    pip install -e .
    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation

πŸ€– Prepare Model

  1. Download FakeShield weights from Hugging Face

    The model weights consist of three parts: DTE-FDM, MFLM, and DTG. For convenience, we have packaged them together and uploaded them to the Hugging Face repository.

    We recommend using huggingface_hub to download the weights:

    pip install huggingface_hub
    huggingface-cli download --resume-download zhipeixu/fakeshield-v1-22b --local-dir weight/
  2. Download pretrained SAM weight

    MFLM relies on the pre-trained SAM weights. You can use wget to download the sam_vit_h_4b8939.pth checkpoint:

    wget https://huggingface.co/ybelkada/segment-anything/resolve/main/checkpoints/sam_vit_h_4b8939.pth -P weight/
  3. Ensure the weights are placed correctly

    Organize your weight/ folder as follows:

     FakeShield/
     β”œβ”€β”€ weight/
     β”‚   β”œβ”€β”€ fakeshield-v1-22b/
     β”‚   β”‚   β”œβ”€β”€ DTE-FDM/
     β”‚   β”‚   β”œβ”€β”€ MFLM/
     β”‚   β”‚   β”œβ”€β”€ DTG.pth
     β”‚   β”œβ”€β”€ sam_vit_h_4b8939.pth
    

πŸš€ Quick Start

CLI Demo

You can quickly run the demo script by executing:

bash scripts/cli_demo.sh

The cli_demo.sh script allows customization through the following environment variables:

  • WEIGHT_PATH: Path to the FakeShield weight directory (default: ./weight/fakeshield-v1-22b)
  • IMAGE_PATH: Path to the input image (default: ./playground/image/Sp_D_CRN_A_ani0043_ani0041_0373.jpg)
  • DTE_FDM_OUTPUT: Path for saving the DTE-FDM output (default: ./playground/DTE-FDM_output.jsonl)
  • MFLM_OUTPUT: Path for saving the MFLM output (default: ./playground/MFLM_output.jsonl)

Modify these variables to suit different use cases.
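For example, the defaults can be overridden inline when invoking the script (the paths below are illustrative, not required values):

```shell
# Run the CLI demo with custom paths (illustrative values).
WEIGHT_PATH=./weight/fakeshield-v1-22b \
IMAGE_PATH=./playground/image/Sp_D_CRN_A_ani0043_ani0041_0373.jpg \
DTE_FDM_OUTPUT=./playground/DTE-FDM_output.jsonl \
MFLM_OUTPUT=./playground/MFLM_output.jsonl \
bash scripts/cli_demo.sh
```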

πŸ‹οΈβ€β™‚οΈ Train

Training Data Preparation

The training dataset consists of four types of data:

  1. PhotoShop Manipulation Dataset: CASIAv2, Fantastic Reality
  2. DeepFake Manipulation Dataset: FFHQ, FaceAPP
  3. AIGC-Editing Manipulation Dataset: SD_inpaint Dataset (Coming soon)
  4. MMTD-Set Dataset: MMTD-Set (Coming soon)

Validation Data Preparation

The validation dataset consists of four types of data:

  1. PhotoShop Manipulation Dataset: CASIA1+, IMD2020, Columbia, coverage, NIST16, DSO, Korus
  2. DeepFake Manipulation Dataset: FFHQ, FaceAPP
  3. AIGC-Editing Manipulation Dataset: SD_inpaint Dataset (Coming soon)
  4. MMTD-Set Dataset: MMTD-Set (Coming soon)

Download them from the above links and organize them as follows:

dataset/
β”œβ”€β”€ photoshop/                # PhotoShop Manipulation Dataset
β”‚   β”œβ”€β”€ CASIAv2_Tp/           # CASIAv2 Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ CASIAv2_Au/           # CASIAv2 Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ FR_Tp/                # Fantastic Reality Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ FR_Au/                # Fantastic Reality Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ CASIAv1+_Tp/          # CASIAv1+ Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ CASIAv1+_Au/          # CASIAv1+ Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ IMD2020_Tp/           # IMD2020 Tampered Images
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ IMD2020_Au/           # IMD2020 Authentic Images
β”‚   β”‚   └── image/
β”‚   β”œβ”€β”€ Columbia/             # Columbia Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ coverage/             # Coverage Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ NIST16/               # NIST16 Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ DSO/                  # DSO Dataset
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   └── Korus/                # Korus Dataset
β”‚       β”œβ”€β”€ image/
β”‚       └── mask/
β”‚
β”œβ”€β”€ deepfake/                 # DeepFake Manipulation Dataset
β”‚   β”œβ”€β”€ FaceAPP_Train/        # FaceAPP Training Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ FaceAPP_Val/          # FaceAPP Validation Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ FFHQ_Train/           # FFHQ Training Data
β”‚   β”‚   └── image/
β”‚   └── FFHQ_Val/             # FFHQ Validation Data
β”‚       └── image/
β”‚
β”œβ”€β”€ aigc/                     # AIGC Editing Manipulation Dataset
β”‚   β”œβ”€β”€ SD_inpaint_Train/     # Stable Diffusion Inpainting Training Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ SD_inpaint_Val/       # Stable Diffusion Inpainting Validation Data
β”‚   β”‚   β”œβ”€β”€ image/
β”‚   β”‚   └── mask/
β”‚   β”œβ”€β”€ COCO2017_Train/       # COCO2017 Training Data
β”‚   β”‚   └── image/
β”‚   └── COCO2017_Val/         # COCO2017 Validation Data
β”‚       └── image/
β”‚
└── MMTD_Set/                 # Multi-Modal Tamper Description Dataset
    └── MMTD-Set-34k.json     # JSON Training File
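After downloading, a quick shell loop can confirm that the top-level folders match the tree above (a minimal sketch; `DATASET_ROOT` is an assumed name for your local dataset root):

```shell
# Sanity-check the expected top-level dataset layout.
# DATASET_ROOT is an assumption; point it at your own dataset directory.
DATASET_ROOT=./dataset
for d in photoshop deepfake aigc MMTD_Set; do
    if [ -d "$DATASET_ROOT/$d" ]; then
        echo "ok:      $DATASET_ROOT/$d"
    else
        echo "missing: $DATASET_ROOT/$d"
    fi
done
```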

LoRA Finetune DTE-FDM

You can fine-tune DTE-FDM using LoRA with the following script:

bash ./scripts/DTE-FDM/finetune_lora.sh

The script allows customization through the following environment variables:

  • OUTPUT_DIR: Directory for saving training output
  • DATA_PATH: Path to the training dataset (JSON format)
  • WEIGHT_PATH: Path to the pre-trained weights

Modify these variables as needed to adapt the training process to different datasets and setups.
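As with the demo script, these variables can be set inline; the output directory and weight path below are illustrative, and the data path points at the MMTD-Set JSON file described in the dataset layout:

```shell
# LoRA fine-tuning of DTE-FDM with custom paths (illustrative values).
OUTPUT_DIR=./output/DTE-FDM-lora \
DATA_PATH=./dataset/MMTD_Set/MMTD-Set-34k.json \
WEIGHT_PATH=./weight/fakeshield-v1-22b/DTE-FDM \
bash ./scripts/DTE-FDM/finetune_lora.sh
```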

LoRA Finetune MFLM

You can fine-tune MFLM using LoRA with the following script:

bash ./scripts/MFLM/finetune_lora.sh

The script allows customization through the following environment variables:

  • OUTPUT_DIR: Directory for saving training output
  • DATA_PATH: Path to the training dataset
  • WEIGHT_PATH: Path to the pre-trained weights
  • TRAIN_DATA_CHOICE: Selects the training dataset
  • VAL_DATA_CHOICE: Selects the validation dataset

Modify these variables as needed to adapt the training process to different datasets and setups.
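These variables can likewise be set inline (the paths below are illustrative; TRAIN_DATA_CHOICE and VAL_DATA_CHOICE are omitted here because their accepted values are defined by the script itself):

```shell
# LoRA fine-tuning of MFLM with custom paths (illustrative values).
OUTPUT_DIR=./output/MFLM-lora \
DATA_PATH=./dataset \
WEIGHT_PATH=./weight/fakeshield-v1-22b/MFLM \
bash ./scripts/MFLM/finetune_lora.sh
```

TRAIN_DATA_CHOICE and VAL_DATA_CHOICE can be prefixed the same way when you need a non-default dataset split.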

πŸ› οΈ Test

You can test FakeShield using the following script:

bash ./scripts/test.sh

The script allows customization through the following environment variables:

  • WEIGHT_PATH: Path to the directory containing the FakeShield model weights.
  • QUESTION_PATH: Path to the test dataset in JSONL format. This file can be generated using ./playground/eval_jsonl.py.
  • DTE_FDM_OUTPUT: Path for saving the output of the DTE-FDM model.
  • MFLM_OUTPUT: Path for saving the output of the MFLM model.

Modify these variables as needed to adapt the evaluation process to different datasets and setups.
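A typical invocation sets all four variables inline (the paths are illustrative; in particular, the QUESTION_PATH filename is a placeholder for a JSONL file you generate with ./playground/eval_jsonl.py):

```shell
# Evaluate FakeShield on a JSONL question file (illustrative paths).
WEIGHT_PATH=./weight/fakeshield-v1-22b \
QUESTION_PATH=./playground/test_questions.jsonl \
DTE_FDM_OUTPUT=./playground/DTE-FDM_output.jsonl \
MFLM_OUTPUT=./playground/MFLM_output.jsonl \
bash ./scripts/test.sh
```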

πŸ“œ Citation

    @inproceedings{xu2024fakeshield,
      title={FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models},
      author={Xu, Zhipei and Zhang, Xuanyu and Li, Runyi and Tang, Zecheng and Huang, Qing and Zhang, Jian},
      booktitle={International Conference on Learning Representations},
      year={2025}
    }

πŸ™ Acknowledgement

We are thankful to LLaVA, groundingLMM, and LISA for releasing their models and code as open-source contributions.
