
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
Zhipei Xu, Xuanyu Zhang, Runyi Li, Zecheng Tang, Qing Huang, Jian Zhang
School of Electronic and Computer Engineering, Peking University
We also have other Copyright Protection projects that may interest you.
EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection [CVPR 2024]
Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, Jian Zhang
V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection [ACM MM 2024]
Xuanyu Zhang, Youmin Xu, Runyi Li, Jiwen Yu, Weiqi Li, Zhipei Xu, Jian Zhang
GS-Hider: Hiding Messages into 3D Gaussian Splatting [NeurIPS 2024]
Xuanyu Zhang, Jiarui Meng, Runyi Li, Zhipei Xu, Yongbing Zhang, Jian Zhang
- [2025.02.14] We are progressively open-sourcing all code and pre-trained model weights. Watch this repository for the latest updates.
- [2025.01.23] Our FakeShield has been accepted at ICLR 2025!
- [2024.10.03] We have released FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models. We present the explainable IFDL task, construct the MMTD-Set dataset, and build the FakeShield framework. Check out the paper; the code and dataset are coming soon.
FakeShield is a novel multi-modal framework designed for explainable image forgery detection and localization (IFDL). Unlike traditional black-box IFDL methods, FakeShield integrates multi-modal large language models (MLLMs) to analyze manipulated images, generate tampered region masks, and provide human-understandable explanations based on pixel-level artifacts and semantic inconsistencies. To improve generalization across diverse forgery types, FakeShield introduces domain tags, which guide the model to recognize different manipulation techniques effectively. Additionally, we construct MMTD-Set, a richly annotated dataset containing multi-modal descriptions of manipulated images, fostering better interpretability. Through extensive experiments, FakeShield demonstrates superior performance in detecting and localizing various forgeries, including copy-move, splicing, removal, DeepFake, and AI-generated manipulations.
- FakeShield Introduction. We introduce FakeShield, a multi-modal framework for explainable image forgery detection and localization, and the first to leverage MLLMs for the IFDL task. We also propose the Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and the Multi-modal Forgery Localization Module (MFLM) to improve the generalization and robustness of the models.
- Novel Explainable-IFDL Task. We propose the first explainable image forgery detection and localization (e-IFDL) task, addressing the opacity of traditional IFDL methods by providing both pixel-level and semantic-level explanations.
- MMTD-Set Dataset Construction. We create the MMTD-Set by enriching existing IFDL datasets using GPT-4o, generating high-quality "image-mask-description" triplets for enhanced multimodal learning.
Ensure your environment meets the following requirements:
- Python == 3.9
- PyTorch == 1.13.0
- CUDA Version == 11.6
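A quick way to confirm the environment matches these pins is a short check. This is a minimal sketch; the `torch` checks are left commented out because PyTorch is only available after installing the dependencies:

```python
import sys

# Required interpreter version, per the requirements above.
REQUIRED_PYTHON = (3, 9)

def python_matches(required=REQUIRED_PYTHON):
    """Return True if the running interpreter is exactly the required major.minor."""
    return sys.version_info[:2] == tuple(required)

if __name__ == "__main__":
    if python_matches():
        print("Python OK")
    else:
        print(f"Python {sys.version_info[:2]} does not match {REQUIRED_PYTHON}")
    # After `pip install -r requirements.txt`, you can also verify:
    # import torch
    # assert torch.__version__.startswith("1.13"), torch.__version__
    # assert torch.version.cuda == "11.6", torch.version.cuda
```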
- Clone the repository:
```shell
git clone https://github.com/zhipeixu/FakeShield.git
cd FakeShield
```
- Install dependencies:
```shell
apt update && apt install git
pip install -r requirements.txt

## Install MMCV
git clone https://github.com/open-mmlab/mmcv
cd mmcv
git checkout v1.4.7
MMCV_WITH_OPS=1 pip install -e .
```
- Install DTE-FDM:
```shell
cd ../DTE-FDM
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
- Download FakeShield weights from Hugging Face

  The model weights consist of three parts: `DTE-FDM`, `MFLM`, and `DTG`. For convenience, we have packaged them together and uploaded them to the Hugging Face repository. We recommend using `huggingface_hub` to download the weights:

  ```shell
  pip install huggingface_hub
  huggingface-cli download --resume-download zhipeixu/fakeshield-v1-22b --local-dir weight/
  ```
- Download the pretrained SAM weight

  MFLM uses the SAM pre-trained weights. You can use `wget` to download the `sam_vit_h_4b8939.pth` model:

  ```shell
  wget https://huggingface.co/ybelkada/segment-anything/resolve/main/checkpoints/sam_vit_h_4b8939.pth -P weight/
  ```
- Ensure the weights are placed correctly

  Organize your `weight/` folder as follows:

  ```
  FakeShield/
  ├── weight/
  │   ├── fakeshield-v1-22b/
  │   │   ├── DTE-FDM/
  │   │   ├── MFLM/
  │   │   └── DTG.pth
  │   └── sam_vit_h_4b8939.pth
  ```
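Before running anything, you can sanity-check this layout with a short script. This is a minimal sketch that simply tests the paths shown in the tree above:

```python
import os

# Expected contents of weight/, following the tree above.
EXPECTED_WEIGHTS = [
    "weight/fakeshield-v1-22b/DTE-FDM",
    "weight/fakeshield-v1-22b/MFLM",
    "weight/fakeshield-v1-22b/DTG.pth",
    "weight/sam_vit_h_4b8939.pth",
]

def missing_weights(root="."):
    """Return the expected weight paths that do not exist under root."""
    return [p for p in EXPECTED_WEIGHTS
            if not os.path.exists(os.path.join(root, p))]

if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All weights in place.")
```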
You can quickly run the demo script by executing:
```shell
bash scripts/cli_demo.sh
```
The `cli_demo.sh` script allows customization through the following environment variables:

- `WEIGHT_PATH`: Path to the FakeShield weight directory (default: `./weight/fakeshield-v1-22b`)
- `IMAGE_PATH`: Path to the input image (default: `./playground/image/Sp_D_CRN_A_ani0043_ani0041_0373.jpg`)
- `DTE_FDM_OUTPUT`: Path for saving the DTE-FDM output (default: `./playground/DTE-FDM_output.jsonl`)
- `MFLM_OUTPUT`: Path for saving the MFLM output (default: `./playground/DTE-FDM_output.jsonl`)
Modify these variables to suit different use cases.
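Both outputs are JSONL files, one JSON object per line. The exact record schema is defined by the repository; as an illustration, assuming each record carries hypothetical keys such as `image` and `outputs`, the files can be read with a small helper:

```python
import json

def read_jsonl(path):
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Hypothetical usage; the keys "image" and "outputs" are assumptions,
# not the repository's documented schema:
# for rec in read_jsonl("./playground/DTE-FDM_output.jsonl"):
#     print(rec.get("image"), "->", rec.get("outputs"))
```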
The training dataset consists of the following data:
- PhotoShop Manipulation Dataset: CASIAv2, Fantastic Reality
- DeepFake Manipulation Dataset: FFHQ, FaceAPP
- AIGC-Editing Manipulation Dataset: SD_inpaint Dataset (Coming soon)
- MMTD-Set Dataset: MMTD-Set (Coming soon)
The validation dataset consists of the following data:
- PhotoShop Manipulation Dataset: CASIA1+, IMD2020, Columbia, coverage, NIST16, DSO, Korus
- DeepFake Manipulation Dataset: FFHQ, FaceAPP
- AIGC-Editing Manipulation Dataset: SD_inpaint Dataset (Coming soon)
- MMTD-Set Dataset: MMTD-Set (Coming soon)
Download them from the above links and organize them as follows:
```
dataset/
├── photoshop/                 # PhotoShop Manipulation Dataset
│   ├── CASIAv2_Tp/            # CASIAv2 Tampered Images
│   │   ├── image/
│   │   └── mask/
│   ├── CASIAv2_Au/            # CASIAv2 Authentic Images
│   │   └── image/
│   ├── FR_Tp/                 # Fantastic Reality Tampered Images
│   │   ├── image/
│   │   └── mask/
│   ├── FR_Au/                 # Fantastic Reality Authentic Images
│   │   └── image/
│   ├── CASIAv1+_Tp/           # CASIAv1+ Tampered Images
│   │   ├── image/
│   │   └── mask/
│   ├── CASIAv1+_Au/           # CASIAv1+ Authentic Images
│   │   └── image/
│   ├── IMD2020_Tp/            # IMD2020 Tampered Images
│   │   ├── image/
│   │   └── mask/
│   ├── IMD2020_Au/            # IMD2020 Authentic Images
│   │   └── image/
│   ├── Columbia/              # Columbia Dataset
│   │   ├── image/
│   │   └── mask/
│   ├── coverage/              # Coverage Dataset
│   │   ├── image/
│   │   └── mask/
│   ├── NIST16/                # NIST16 Dataset
│   │   ├── image/
│   │   └── mask/
│   ├── DSO/                   # DSO Dataset
│   │   ├── image/
│   │   └── mask/
│   └── Korus/                 # Korus Dataset
│       ├── image/
│       └── mask/
│
├── deepfake/                  # DeepFake Manipulation Dataset
│   ├── FaceAPP_Train/         # FaceAPP Training Data
│   │   ├── image/
│   │   └── mask/
│   ├── FaceAPP_Val/           # FaceAPP Validation Data
│   │   ├── image/
│   │   └── mask/
│   ├── FFHQ_Train/            # FFHQ Training Data
│   │   └── image/
│   └── FFHQ_Val/              # FFHQ Validation Data
│       └── image/
│
├── aigc/                      # AIGC Editing Manipulation Dataset
│   ├── SD_inpaint_Train/      # Stable Diffusion Inpainting Training Data
│   │   ├── image/
│   │   └── mask/
│   ├── SD_inpaint_Val/        # Stable Diffusion Inpainting Validation Data
│   │   ├── image/
│   │   └── mask/
│   ├── COCO2017_Train/        # COCO2017 Training Data
│   │   └── image/
│   └── COCO2017_Val/          # COCO2017 Validation Data
│       └── image/
│
└── MMTD_Set/                  # Multi-Modal Tamper Description Dataset
    └── MMTD-Set-34k.json      # JSON Training File
```
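Once the data is organized, a quick consistency check can catch missing masks. The helper below is a minimal sketch under the assumption (matching the tree above) that each tampered split has sibling `image/` and `mask/` folders whose files share filename stems:

```python
import os

def unmatched_images(split_dir):
    """Return image filename stems under split_dir/image that have no
    file with the same stem under split_dir/mask."""
    def stems(sub):
        folder = os.path.join(split_dir, sub)
        return {os.path.splitext(name)[0] for name in os.listdir(folder)}
    return sorted(stems("image") - stems("mask"))

# Hypothetical usage:
# print(unmatched_images("dataset/photoshop/CASIAv2_Tp"))
```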
You can fine-tune DTE-FDM using LoRA with the following script:
```shell
bash ./scripts/DTE-FDM/finetune_lora.sh
```
The script allows customization through the following environment variables:
- `OUTPUT_DIR`: Directory for saving the training output
- `DATA_PATH`: Path to the training dataset (JSON format)
- `WEIGHT_PATH`: Path to the pre-trained weights
Modify these variables as needed to adapt the training process to different datasets and setups.
You can fine-tune MFLM using LoRA with the following script:
```shell
bash ./scripts/MFLM/finetune_lora.sh
```
The script allows customization through the following environment variables:
- `OUTPUT_DIR`: Directory for saving the training output
- `DATA_PATH`: Path to the training dataset
- `WEIGHT_PATH`: Path to the pre-trained weights
- `TRAIN_DATA_CHOICE`: Selects the training dataset
- `VAL_DATA_CHOICE`: Selects the validation dataset
Modify these variables as needed to adapt the training process to different datasets and setups.
You can test FakeShield using the following script:
```shell
bash ./scripts/test.sh
```
The script allows customization through the following environment variables:
- `WEIGHT_PATH`: Path to the directory containing the FakeShield model weights
- `QUESTION_PATH`: Path to the test dataset in JSONL format; this file can be generated using `./playground/eval_jsonl.py`
- `DTE_FDM_OUTPUT`: Path for saving the output of the DTE-FDM model
- `MFLM_OUTPUT`: Path for saving the output of the MFLM model
Modify these variables as needed to adapt the evaluation process to different datasets and setups.
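`QUESTION_PATH` is normally produced by `./playground/eval_jsonl.py`. If you need to assemble such a file yourself, the format is one JSON object per line; the field names in the sample below (`image`, `text`) are illustrative assumptions rather than the script's documented schema:

```python
import json

def write_jsonl(records, path):
    """Write an iterable of dicts as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Illustrative records; the real schema comes from ./playground/eval_jsonl.py.
samples = [
    {"image": "dataset/photoshop/CASIAv1+_Tp/image/example.jpg",
     "text": "Is this image tampered?"},
]
```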
```bibtex
@inproceedings{xu2024fakeshield,
  title={FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models},
  author={Xu, Zhipei and Zhang, Xuanyu and Li, Runyi and Tang, Zecheng and Huang, Qing and Zhang, Jian},
  booktitle={International Conference on Learning Representations},
  year={2025}
}
```
We are thankful to LLaVA, groundingLMM, and LISA for releasing their models and code as open-source contributions.