CROWD: Certified Robustness via Weight Distribution for Smoothed Classifiers against Backdoor Attack
Siqi Sun, Procheta Sen and Wenjie Ruan*
*Corresponding Author
Our work has been accepted to Findings of the Association for Computational Linguistics: EMNLP 2024
We propose a certified robustness verification with theoretical guarantees for NLP classifiers against backdoor attacks.
Language models are vulnerable to clandestinely modified data and manipulation by attackers. Despite considerable research dedicated to enhancing robustness against adversarial attacks, the realm of provable robustness for backdoor attacks remains relatively unexplored. In this paper, we initiate a pioneering investigation into the certified robustness of NLP models against backdoor triggers. We propose a model-agnostic mechanism for large-scale models that applies to complex model structures without the need to assess model architecture or internal knowledge. More importantly, we build on recent advances in randomized smoothing theory and propose a novel weight-based distribution algorithm to enable semantic similarity and provide theoretical robustness guarantees. Experimentally, we demonstrate the efficacy of our approach across a diverse range of datasets and tasks, highlighting its utility in mitigating backdoor triggers. Our results show strong performance in terms of certified accuracy, scalability, and semantic preservation.
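CROWD builds on randomized smoothing: the base classifier is queried on many randomly perturbed copies of an input, and the smoothed prediction is the majority vote over those queries, which is what the robustness certificate is stated over. The sketch below illustrates only this generic majority-vote step, not the weight-based word-substitution distribution defined in the paper; base_classifier, synonyms, and perturb are hypothetical placeholders.

# Minimal sketch of randomized-smoothing prediction. This is NOT CROWD's
# exact weight-based distribution (see the paper); it only shows the
# generic majority-vote mechanism. `base_classifier` and `synonyms`
# are hypothetical placeholders.
import random
from collections import Counter

def perturb(tokens, synonyms, rate=0.3):
    # Randomly replace a fraction of tokens with a synonym, when one exists.
    out = []
    for tok in tokens:
        if tok in synonyms and random.random() < rate:
            out.append(random.choice(synonyms[tok]))
        else:
            out.append(tok)
    return out

def smoothed_predict(base_classifier, tokens, synonyms, n_samples=100):
    # Majority vote over predictions on randomly perturbed copies of the input.
    votes = Counter(
        base_classifier(perturb(tokens, synonyms)) for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]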
Backdoor attacks are generated with OpenBackdoor, an open-source toolkit for textual backdoor attack and defense that enables easy implementation, evaluation, and extension of both attack and defense models; more details can be found in the OpenBackdoor repository. The code block below shows an example of launching a built-in backdoor attack.
# Attack BERT on SST-2 with BadNet
import openbackdoor as ob
from openbackdoor import load_dataset
# choose BERT as victim model
victim = ob.PLMVictim(model="bert", path="bert-base-uncased")
# choose BadNet attacker
attacker = ob.Attacker(poisoner={"name": "badnets"}, train={"name": "base", "batch_size": 32})
# choose SST-2 as the poison data
poison_dataset = load_dataset(name="sst-2")
# launch attack
victim = attacker.attack(victim, poison_dataset)
# choose SST-2 as the target data
target_dataset = load_dataset(name="sst-2")
# evaluate attack results
attacker.eval(victim, target_dataset)
Run the CROWD pipeline with the following scripts (the comments describe the intended role of each step, inferred from the script names):
python data_process.py   # prepare and poison the datasets
python train_crowd.py    # train the classifier used for smoothing
python eval_crowd.py     # evaluate certified robustness
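For intuition about the evaluation step: in standard randomized smoothing, a prediction is certified by checking a one-sided lower confidence bound (e.g., Clopper-Pearson) on the probability of the top class under the smoothing distribution. The snippet below is a minimal sketch of that generic check only; it is not necessarily what eval_crowd.py implements, whose exact certificate follows the paper.

# Rough illustration of the standard randomized-smoothing certification
# criterion (one-sided lower confidence bound on the top-class probability).
# This is an assumption-based sketch, not CROWD's exact certificate.
from statsmodels.stats.proportion import proportion_confint

def certify(top_class_votes, total_samples, alpha=0.05, threshold=0.5):
    # Clopper-Pearson ("beta") interval; doubling alpha gives a one-sided bound.
    lower, _ = proportion_confint(
        top_class_votes, total_samples, alpha=2 * alpha, method="beta"
    )
    # The prediction is certified if the lower bound exceeds the threshold.
    return lower > threshold, lower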
If you find our method useful, please cite our paper:
@inproceedings{sun-etal-2024-crowd,
title = "{CROWD}: Certified Robustness via Weight Distribution for Smoothed Classifiers against Backdoor Attack",
author = "Sun, Siqi and
Sen, Procheta and
Ruan, Wenjie",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.993",
doi = "10.18653/v1/2024.findings-emnlp.993",
pages = "17056--17070",
}