shauli-ravfogel/lm-counterfactuals

This repository contains the code for the paper "Gumbel Counterfactual Generation from Language Models". In this work, we conceptualize LMs as Generalized Causal Models (GCMs), enabling us to generate true counterfactual strings from a given input string. By leveraging the Gumbel-Max trick, we separate the deterministic computations of the LM’s forward pass from the inherent randomness of the sampling process. This allows us to use hindsight sampling to identify the noise responsible for generating a specific string and reuse the same noise when generating a counterfactual string from the model, post-intervention.
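The hindsight-sampling step described above can be illustrated for a single next-token distribution. The following is a minimal sketch, not the repository's implementation: it uses NumPy only, the function names `hindsight_gumbel_noise` and `counterfactual_token` are hypothetical, and the logits are made-up numbers. It assumes the standard conditional Gumbel construction, in which the maximum is sampled first and every other perturbed logit is truncated to lie below it.

```python
import numpy as np


def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


def hindsight_gumbel_noise(logits, observed, rng):
    """Sample Gumbel noise u such that argmax(logits + u) == observed.

    Hypothetical helper for illustration; `logits` are the original
    model's scores for one position and `observed` is the token index
    that was actually generated.
    """
    logits = np.asarray(logits, dtype=float)
    # The maximum of the perturbed logits follows Gumbel(logsumexp(logits)).
    top = rng.gumbel(loc=logsumexp(logits))
    # Draw unconstrained perturbed logits, then truncate each below `top`.
    free = rng.gumbel(loc=logits)
    perturbed = -np.logaddexp(-top, -free)
    # The observed token receives the maximum itself.
    perturbed[observed] = top
    return perturbed - logits


def counterfactual_token(cf_logits, noise):
    """Reuse the inferred noise with the post-intervention logits."""
    return int(np.argmax(np.asarray(cf_logits, dtype=float) + noise))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    orig_logits = np.array([2.0, 0.5, -1.0])  # made-up original logits
    cf_logits = np.array([0.0, 0.5, 3.0])     # made-up post-intervention logits
    noise = hindsight_gumbel_noise(orig_logits, observed=1, rng=rng)
    # With the original logits, the same noise reproduces the observed token.
    print(counterfactual_token(orig_logits, noise))
    # With the intervened logits, the same noise yields the counterfactual token.
    print(counterfactual_token(cf_logits, noise))
```

For a full string, the same idea would be applied position by position: infer the noise for each observed token under the original model, then decode autoregressively from the intervened model with that noise held fixed.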

To set up the environment:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# download models
gdown --folder https://drive.google.com/drive/folders/11PE8DxVqbfpsLhqLop71CqL6KsdeTnFf

Then run run.py to regenerate the counterfactuals for the Wikipedia and Bios datasets. The notebook example.ipynb contains a minimal example of generating a counterfactual string from an original string.

The directory counterfactuals contains the counterfactual sentences we generated from the Wikipedia and Bios datasets, using several models and intervention techniques.
