This repository implements the ALFA framework for improving large language models’ ability to ask high-quality follow-up questions in clinical reasoning scenarios. It includes code for:
- Data processing (preparing real-world interactions from r/AskDocs)
- Counterfactual data generation (synthesizing diverse question variations with specific attributes)
- Preference-based optimization via DPO/PPO/RLHF
- Evaluation on both single-turn question quality and an interactive clinical reasoning benchmark (MediQ-AskDocs)
Below is an overview of the repository structure and pointers on how to run various components. For detailed technical explanations and design decisions, please see the associated paper and supplementary documentation.
- Repository Structure
- Installation & Environment
- Data Preparation
- Counterfactual Generation
- Preference Modeling & Training
- MediQ Evaluation & Benchmarking
- Ranking & Human Evaluation
- How to Run
- Citation & Acknowledgments
- data/: Holds raw and processed data for training and evaluation (r/AskDocs data, prompts, ID lists).
  - ids/: Files listing specific train/test/eval question IDs.
  - mediq_eval/: Data for the interactive MediQ experiments, including conversation files.
  - prompts/: Prompt templates and references for LLM data generation.
- src/: Source code for data processing, counterfactual generation, training, and evaluation.
  - counterfactual_generation/: Scripts to synthesize "enhanced" or "corrupted" question variants for attributes such as clarity, relevance, and answerability.
  - data_loader/: Scripts to create preference training files, supervised fine-tuning (SFT) data, test splits, etc.
  - mediq_eval/: Code for running interactive clinical QA with the MediQ framework.
  - rank_eval/: Tools for pairwise ranking of generated questions (LLM-based or human annotators).
  - training/: RLHF code (OpenRLHF) and pipelines for DPO/PPO and reward modeling.
  - sample_configs/: Example YAML config files for each training/fine-tuning stage.
  - scripts/: Stand-alone scripts to run various tasks (DPO, PPO, SFT, merging model weights, or generating questions in batch).
To reproduce the paper's results, follow each step below in order; to use only the ALFA framework and its evaluation, skip ahead to Step 3.
- Clone the repo:

```bash
git clone https://github.com/stellalisy/alfa.git
cd alfa
```

- Set up the conda environment:

```bash
conda env create -f environment.yml
conda activate alfa
```

- Directory permissions: ensure you have appropriate read/write permissions for the data and model checkpoint directories.
- Prepare Raw Data: Place your original r/AskDocs data in data/. The code in src/data_loader/ expects certain file naming conventions.
- Generate Train/Test Splits: Use scripts like create_sft_files.py or create_test_files.py to generate the final .jsonl files for each split (see the sketch after this list for how the ID lists relate to the splits).
- Additional Metadata: If you have labels or specialized contexts, put them in data/ids/.
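The split scripts essentially filter the processed data down to the question IDs listed in data/ids/. A minimal sketch of that idea, where the file names and the "id" field are illustrative assumptions rather than the repo's actual conventions:

```python
# Illustrative sketch only: file names and the "id" field are assumptions,
# not the exact conventions expected by src/data_loader/.
import json

with open("data/ids/test_ids.txt") as f:          # hypothetical ID list
    test_ids = {line.strip() for line in f}

with open("data/askdocs_processed.jsonl") as f, open("data/test.jsonl", "w") as out:
    for line in f:
        record = json.loads(line)
        if record.get("id") in test_ids:          # keep only records in the test split
            out.write(line)
```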
Scripts in src/counterfactual_generation/ use an LLM to rewrite questions with different attributes.
- generate.py: Main script for attribute-based rewriting.
- verifier_filter.py: Uses an LLM-based judge to confirm whether the generated rewrites match the intended direction.
```bash
cd src/counterfactual_generation
python generate.py --config path_to_generation_config.yaml
python verifier_filter.py --config path_to_verification_config.yaml
```
The output typically contains enhanced, original, and corrupted question versions in JSON.
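One natural way to use this output is to pair each enhanced variant against its corrupted counterpart as a preference example for the training stage below. A rough sketch, where the input file name and the keys context, enhanced, and corrupted are assumptions about the output schema rather than the scripts' exact format:

```python
# Hypothetical sketch: turn counterfactual variants into preference pairs.
# The keys "context", "enhanced", and "corrupted" are assumed names, not
# necessarily the fields produced by generate.py / verifier_filter.py.
import json

with open("counterfactuals.json") as f:
    records = json.load(f)

with open("preference_pairs.jsonl", "w") as out:
    for rec in records:
        pair = {
            "prompt": rec["context"],        # patient post / conversation context
            "chosen": rec["enhanced"],       # attribute-improved question
            "rejected": rec["corrupted"],    # attribute-degraded question
        }
        out.write(json.dumps(pair) + "\n")
```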
Train a reward model to score question pairs as "better" or "worse."
```bash
python scripts/launch_rm_with_yaml.py --config sample_configs/sample_config_rm.yaml
```
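Once trained, the reward model assigns a scalar score to a context/question pair, so a better question should receive a higher score than a worse one. A minimal scoring sketch, assuming the checkpoint can be loaded as a Hugging Face sequence-classification model with a single output (the actual OpenRLHF checkpoint layout may differ):

```python
# Assumption-heavy sketch: the checkpoint path is a placeholder, and loading via
# AutoModelForSequenceClassification may not match the OpenRLHF export format.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "path/to/reward_model"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=1)
model.eval()

def score(context: str, question: str) -> float:
    inputs = tokenizer(context, question, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

context = "Patient reports chest pain when climbing stairs..."
print(score(context, "Does the pain radiate to your arm or jaw?"))
print(score(context, "Do you like sports?"))
```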
Optimize the question-asking policy with DPO (Direct Preference Optimization) or PPO. DPO trains directly on the preference pairs without a separate reward model, while PPO is reinforcement-learning-based and optimizes against the trained reward model.
```bash
# DPO
python scripts/launch_dpo_with_yaml.py --config sample_configs/sample_config_dpo.yaml

# PPO
python scripts/launch_ppo_with_yaml.py --config sample_configs/sample_config_ppo.yaml
```
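For intuition, DPO only needs the log-probabilities of the chosen and rejected responses under the policy and under a frozen reference model. A self-contained sketch of the loss (conceptual, not the OpenRLHF implementation):

```python
# Conceptual DPO loss, not the OpenRLHF implementation.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Each argument is a tensor of summed per-token log-probs for a batch."""
    chosen_margin = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_margin = beta * (policy_rejected_logp - ref_rejected_logp)
    # Push the policy to prefer the chosen question over the rejected one.
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.8]),
                torch.tensor([-13.0]), torch.tensor([-14.9]))
print(loss)
```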
If you want standard SFT on real or synthetic data:
```bash
python scripts/launch_sft_with_yaml.py --config sample_configs/sample_config_sft.yaml
```
MediQ is in src/mediq_eval/. It simulates doctor-patient interactions with an LLM question generator.
- Data Conversion: Use scripts like generate_questions_post.py to convert QA files to MediQ format.
- Run the Simulator:

```bash
cd src/mediq_eval
python evaluate.py --model_checkpoint path/to/aligned_model
```
This measures question quality and final diagnostic accuracy.
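Conceptually, each episode alternates between the question-asking (expert) model and a patient simulator that answers from the hidden case record, until the expert commits to a diagnosis. A simplified, self-contained sketch of that control flow; the function names and the "DIAGNOSIS:" convention are placeholders, not the mediq_eval API:

```python
# Simplified conceptual loop, not the src/mediq_eval implementation.
from typing import Callable

def interactive_episode(
    initial_presentation: str,
    ask: Callable[[list[str]], str],     # expert: history -> follow-up question or "DIAGNOSIS: ..."
    answer: Callable[[str], str],        # patient simulator: question -> answer from the case record
    max_turns: int = 10,
) -> str:
    history = [initial_presentation]
    for _ in range(max_turns):
        turn = ask(history)
        if turn.startswith("DIAGNOSIS:"):
            return turn                  # final answer, scored for diagnostic accuracy
        history.append(turn)
        history.append(answer(turn))
    return ask(history)                  # force a diagnosis at the turn limit

# Toy stand-ins to show the control flow only.
episode = interactive_episode(
    "45-year-old with chest pain on exertion",
    ask=lambda h: "DIAGNOSIS: stable angina" if len(h) >= 3 else "Does the pain radiate to your arm?",
    answer=lambda q: "Yes, into the left arm.",
)
print(episode)
```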
- rank_eval/rank_eval.py: Ranks pairs of questions automatically with GPT-4 or a local LLM.
- annotators/: Tools for collecting human preferences.
```bash
cd rank_eval
python run_rank_eval.py --config sample_config.yaml
```
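As a rough illustration of what an LLM-based pairwise judge does: present both candidate questions alongside the shared context and ask which is better. The prompt wording, the OpenAI client, and the model name below are illustrative assumptions, not the repo's exact setup:

```python
# Illustrative pairwise judge, not the repo's rank_eval implementation.
# Assumes an OpenAI-compatible client; a local LLM can be swapped in instead.
from openai import OpenAI

client = OpenAI()

def judge(context: str, question_a: str, question_b: str) -> str:
    prompt = (
        f"Patient post:\n{context}\n\n"
        f"Follow-up question A: {question_a}\n"
        f"Follow-up question B: {question_b}\n\n"
        "Which question is more useful for clinical reasoning? Answer 'A' or 'B'."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()
```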
example_run.sh files are provided in the training, mediq_eval, and rank_eval directories.
If you use this code or MediQ-AskDocs in your work, please cite our paper:
```bibtex
@misc{li2025aligningllmsaskgood,
  title={Aligning LLMs to Ask Good Questions: A Case Study in Clinical Reasoning},
  author={Shuyue Stella Li and Jimin Mun and Faeze Brahman and Jonathan S. Ilgen and Yulia Tsvetkov and Maarten Sap},
  year={2025},
  eprint={2502.14860},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.14860},
}
```
- Thanks to the r/AskDocs community for their publicly shared Q&A data.
- This project uses code from [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF).
- See the paper for more technical details.
This work is licensed under a Creative Commons Attribution 4.0 International License.