
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, Jian Zhang
School of Electronic and Computer Engineering, Peking University
We also have other Copyright Protection projects that may interest you.
EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection [CVPR 2024]
Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, Jian Zhang
V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection [ACM MM 2024]
Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang
GS-Hider: Hiding Messages into 3D Gaussian Splatting [NeurIPS 2024]
Xuanyu Zhang, Jiarui Meng, Runyi Li, Zhipei Xu, Yongbing Zhang, Jian Zhang
- [2025.02.14] We are progressively open-sourcing all code and pre-trained model weights. Watch this repository for the latest updates.
- [2025.01.23] Our FakeShield has been accepted at ICLR 2025!
- [2024.10.03] We have released FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models. We present the explainable IFDL task, construct the MMTD-Set dataset, and build the FakeShield framework. Check out the paper; the code and dataset are coming soon.
FakeShield is a novel multi-modal framework designed for explainable image forgery detection and localization (IFDL). Unlike traditional black-box IFDL methods, FakeShield integrates multi-modal large language models (MLLMs) to analyze manipulated images, generate tampered region masks, and provide human-understandable explanations based on pixel-level artifacts and semantic inconsistencies. To improve generalization across diverse forgery types, FakeShield introduces domain tags, which guide the model to recognize different manipulation techniques effectively. Additionally, we construct MMTD-Set, a richly annotated dataset containing multi-modal descriptions of manipulated images, fostering better interpretability. Through extensive experiments, FakeShield demonstrates superior performance in detecting and localizing various forgeries, including copy-move, splicing, removal, DeepFake, and AI-generated manipulations.
- FakeShield Introduction. We introduce FakeShield, a multi-modal framework for explainable image forgery detection and localization, and the first to leverage MLLMs for the IFDL task. We also propose the Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and the Multi-modal Forgery Localization Module (MFLM) to improve the generalization and robustness of the models.
- Novel Explainable-IFDL Task. We propose the first explainable image forgery detection and localization (e-IFDL) task, addressing the opacity of traditional IFDL methods by providing both pixel-level and semantic-level explanations.
- MMTD-Set Dataset Construction. We create the MMTD-Set by enriching existing IFDL datasets using GPT-4o, generating high-quality "image-mask-description" triplets for enhanced multimodal learning.
Ensure your environment meets the following requirements:
- Python == 3.9
- PyTorch == 1.13.0
- CUDA Version == 11.6
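A quick way to confirm the environment matches these pins is a short check. This is a minimal sketch; the `torch` checks are left commented out because PyTorch is only available after installing the dependencies:

```python
import sys

# Required interpreter version, per the requirements above.
REQUIRED_PYTHON = (3, 9)

def python_matches(required=REQUIRED_PYTHON):
    """Return True if the running interpreter is exactly the required major.minor."""
    return sys.version_info[:2] == tuple(required)

if __name__ == "__main__":
    if python_matches():
        print("Python OK")
    else:
        print(f"Python {sys.version_info[:2]} does not match {REQUIRED_PYTHON}")
    # After `pip install -r requirements.txt`, you can also verify:
    # import torch
    # assert torch.__version__.startswith("1.13"), torch.__version__
    # assert torch.version.cuda == "11.6", torch.version.cuda
```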
- Clone the repository:
```shell
git clone https://github.com/zhipeixu/FakeShield.git
cd FakeShield
```
- Install dependencies:
```shell
apt update && apt install git
pip install -r requirements.txt

## Install MMCV
git clone https://github.com/open-mmlab/mmcv
cd mmcv
git checkout v1.4.7
MMCV_WITH_OPS=1 pip install -e .
```
- Install DTE-FDM:
```shell
cd ../DTE-FDM
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
- Download FakeShield weights from Hugging Face

  The model weights consist of three parts: `DTE-FDM`, `MFLM`, and `DTG`. For convenience, we have packaged them together and uploaded them to the Hugging Face repository. We recommend using `huggingface_hub` to download the weights:

  ```shell
  pip install huggingface_hub
  huggingface-cli download --resume-download zhipeixu/fakeshield-v1-22b --local-dir weight/
  ```
- Download the pretrained SAM weight

  MFLM uses the SAM pre-trained weights. You can use `wget` to download the `sam_vit_h_4b8939.pth` model:

  ```shell
  wget https://huggingface.co/ybelkada/segment-anything/resolve/main/checkpoints/sam_vit_h_4b8939.pth -P weight/
  ```
- Ensure the weights are placed correctly

  Organize your `weight/` folder as follows:

  ```
  FakeShield/
  ├── weight/
  │   ├── fakeshield-v1-22b/
  │   │   ├── DTE-FDM/
  │   │   ├── MFLM/
  │   │   └── DTG.pth
  │   └── sam_vit_h_4b8939.pth
  ```
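Before running anything, you can sanity-check this layout with a short script. This is a minimal sketch that simply tests the paths shown in the tree above:

```python
import os

# Expected contents of weight/, following the tree above.
EXPECTED_WEIGHTS = [
    "weight/fakeshield-v1-22b/DTE-FDM",
    "weight/fakeshield-v1-22b/MFLM",
    "weight/fakeshield-v1-22b/DTG.pth",
    "weight/sam_vit_h_4b8939.pth",
]

def missing_weights(root="."):
    """Return the expected weight paths that do not exist under root."""
    return [p for p in EXPECTED_WEIGHTS
            if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All weights in place.")
```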
You can quickly run the demo script by executing:
```shell
bash scripts/cli_demo.sh
```
The `cli_demo.sh` script allows customization through the following environment variables:

- `WEIGHT_PATH`: Path to the FakeShield weight directory (default: `./weight/fakeshield-v1-22b`)
- `IMAGE_PATH`: Path to the input image (default: `./playground/image/Sp_D_CRN_A_ani0043_ani0041_0373.jpg`)
- `DTE_FDM_OUTPUT`: Path for saving the DTE-FDM output (default: `./playground/DTE-FDM_output.jsonl`)
- `MFLM_OUTPUT`: Path for saving the MFLM output (default: `./playground/DTE-FDM_output.jsonl`)
Modify these variables to suit different use cases.
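Both outputs are JSONL files, one JSON object per line. The exact record schema is defined by the repository; as an illustration, assuming each record carries hypothetical keys such as `image` and `outputs`, the files can be read with a small helper:

```python
import json

def read_jsonl(path):
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Hypothetical usage; the keys "image" and "outputs" are assumptions,
# not the repository's documented schema:
# for rec in read_jsonl("./playground/DTE-FDM_output.jsonl"):
#     print(rec.get("image"), "->", rec.get("outputs"))
```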
The training dataset consists of the following data:
- PhotoShop Manipulation Dataset: CASIAv2, Fantastic Reality
- DeepFake Manipulation Dataset: FFHQ, FaceAPP
- AIGC-Editing Manipulation Dataset: SD_inpaint Dataset (Coming soon)
- MMTD-Set Dataset: MMTD-Set (Coming soon)
The validation dataset consists of the following data:
- PhotoShop Manipulation Dataset: CASIA1+, IMD2020, Columbia, coverage, NIST16, DSO, Korus
- DeepFake Manipulation Dataset: FFHQ, FaceAPP
- AIGC-Editing Manipulation Dataset: SD_inpaint Dataset (Coming soon)
- MMTD-Set Dataset: MMTD-Set (Coming soon)
Download them from the above links and organize them as follows:
```
dataset/
├── photoshop/                 # PhotoShop Manipulation Dataset
│   ├── CASIAv2_Tp/            # CASIAv2 Tampered Images
│   │   ├── image/
│   │   └── mask/
│   ├── CASIAv2_Au/            # CASIAv2 Authentic Images
│   │   └── image/
│   ├── FR_Tp/                 # Fantastic Reality Tampered Images
│   │   ├── image/
│   │   └── mask/
│   ├── FR_Au/                 # Fantastic Reality Authentic Images
│   │   └── image/
│   ├── CASIAv1+_Tp/           # CASIAv1+ Tampered Images
│   │   ├── image/
│   │   └── mask/
│   ├── CASIAv1+_Au/           # CASIAv1+ Authentic Images
│   │   └── image/
│   ├── IMD2020_Tp/            # IMD2020 Tampered Images
│   │   ├── image/
│   │   └── mask/
│   ├── IMD2020_Au/            # IMD2020 Authentic Images
│   │   └── image/
│   ├── Columbia/              # Columbia Dataset
│   │   ├── image/
│   │   └── mask/
│   ├── coverage/              # Coverage Dataset
│   │   ├── image/
│   │   └── mask/
│   ├── NIST16/                # NIST16 Dataset
│   │   ├── image/
│   │   └── mask/
│   ├── DSO/                   # DSO Dataset
│   │   ├── image/
│   │   └── mask/
│   └── Korus/                 # Korus Dataset
│       ├── image/
│       └── mask/
│
├── deepfake/                  # DeepFake Manipulation Dataset
│   ├── FaceAPP_Train/         # FaceAPP Training Data
│   │   ├── image/
│   │   └── mask/
│   ├── FaceAPP_Val/           # FaceAPP Validation Data
│   │   ├── image/
│   │   └── mask/
│   ├── FFHQ_Train/            # FFHQ Training Data
│   │   └── image/
│   └── FFHQ_Val/              # FFHQ Validation Data
│       └── image/
│
├── aigc/                      # AIGC Editing Manipulation Dataset
│   ├── SD_inpaint_Train/      # Stable Diffusion Inpainting Training Data
│   │   ├── image/
│   │   └── mask/
│   ├── SD_inpaint_Val/        # Stable Diffusion Inpainting Validation Data
│   │   ├── image/
│   │   └── mask/
│   ├── COCO2017_Train/        # COCO2017 Training Data
│   │   └── image/
│   └── COCO2017_Val/          # COCO2017 Validation Data
│       └── image/
│
└── MMTD_Set/                  # Multi-Modal Tamper Description Dataset
    └── MMTD-Set-34k.json      # JSON Training File
```
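Once the data is organized, a quick consistency check can catch missing masks. The helper below is a minimal sketch under the assumption (matching the tree above) that each tampered split has sibling `image/` and `mask/` folders whose files share filename stems:

```python
import os

def unmatched_images(split_dir):
    """Return image filename stems under split_dir/image that have no
    file with the same stem under split_dir/mask."""
    def stems(sub):
        folder = os.path.join(split_dir, sub)
        return {os.path.splitext(name)[0] for name in os.listdir(folder)}
    return sorted(stems("image") - stems("mask"))

# Hypothetical usage:
# print(unmatched_images("dataset/photoshop/CASIAv2_Tp"))
```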
You can fine-tune DTE-FDM using LoRA with the following script:
```shell
bash ./scripts/DTE-FDM/finetune_lora.sh
```
The script allows customization through the following environment variables:
- `OUTPUT_DIR`: Directory for saving the training output
- `DATA_PATH`: Path to the training dataset (JSON format)
- `WEIGHT_PATH`: Path to the pre-trained weights
Modify these variables as needed to adapt the training process to different datasets and setups.
You can fine-tune MFLM using LoRA with the following script:
```shell
bash ./scripts/MFLM/finetune_lora.sh
```
The script allows customization through the following environment variables:
- `OUTPUT_DIR`: Directory for saving the training output
- `DATA_PATH`: Path to the training dataset
- `WEIGHT_PATH`: Path to the pre-trained weights
- `TRAIN_DATA_CHOICE`: Selects the training dataset
- `VAL_DATA_CHOICE`: Selects the validation dataset
Modify these variables as needed to adapt the training process to different datasets and setups.
You can test FakeShield using the following script:
```shell
bash ./scripts/test.sh
```
The script allows customization through the following environment variables:
- `WEIGHT_PATH`: Path to the directory containing the FakeShield model weights
- `QUESTION_PATH`: Path to the test dataset in JSONL format; this file can be generated using `./playground/eval_jsonl.py`
- `DTE_FDM_OUTPUT`: Path for saving the output of the DTE-FDM model
- `MFLM_OUTPUT`: Path for saving the output of the MFLM model
Modify these variables as needed to adapt the evaluation process to different datasets and setups.
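`QUESTION_PATH` is normally produced by `./playground/eval_jsonl.py`. If you need to assemble such a file yourself, the format is one JSON object per line; the field names in the sample below (`image`, `text`) are illustrative assumptions rather than the script's documented schema:

```python
import json

def write_jsonl(records, path):
    """Write an iterable of dicts as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Illustrative records; the real schema comes from ./playground/eval_jsonl.py.
samples = [
    {"image": "dataset/photoshop/CASIAv1+_Tp/image/example.jpg",
     "text": "Is this image tampered?"},
]
```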
```bibtex
@inproceedings{xu2024fakeshield,
  title={FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models},
  author={Xu, Zhipei and Zhang, Xuanyu and Li, Runyi and Tang, Zecheng and Huang, Qing and Zhang, Jian},
  booktitle={International Conference on Learning Representations},
  year={2025}
}
```
We are thankful to LLaVA, groundingLMM, and LISA for releasing their models and code as open-source contributions.