CFPO

Welcome! This repository provides the official implementation of our paper:
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization

Pipeline of CFPO
CFPO iteratively optimizes prompt content and format in a two-stage process that combines case diagnosis, Monte Carlo sampling, and dynamic format exploration, where format exploration pairs scoring-based format selection with LLM-guided format generation.
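
To make the two-stage loop concrete, here is a minimal, illustrative Python sketch of one optimization round. It is not the repository's implementation; every helper (diagnose_cases, mutate_content, sample_content, select_format, mutate_format, evaluate) is a hypothetical placeholder for the step named in its comment.

def optimize_round(beam, format_pool, minibatch, valid_set, beam_size=4):
    # One CFPO-style round: expand the beam with content and format candidates,
    # then keep the top-scoring prompts on the validation split.
    candidates = list(beam)
    for prompt in beam:
        # Stage 1: content optimization
        feedback = diagnose_cases(prompt, minibatch)        # case diagnosis on a minibatch
        candidates += mutate_content(prompt, feedback)      # feedback-guided content edits
        candidates += sample_content(prompt)                # Monte Carlo content variations
        # Stage 2: format optimization
        chosen_format = select_format(format_pool)          # scoring-based format selection
        candidates += mutate_format(prompt, chosen_format)  # re-render with the new format
    candidates.sort(key=lambda p: evaluate(p, valid_set), reverse=True)
    return candidates[:beam_size]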


Key Contributions

  • Integrated Optimization: Jointly optimizes prompt content (via case diagnosis and Monte Carlo variations) and format (via UCT-based selection and LLM-guided generation).
  • Format Optimization Strategy: CFPO dynamically explores formats, iteratively generating new ones and evaluating format instances through a scoring system to select the best option.
  • Performance: CFPO consistently delivers measurable improvements in LLM performance.

Methodology

Intro

Teaser figure
Different models exhibit distinct format preferences, and no single format consistently outperforms the others across all content. CFPO therefore employs distinct optimization strategies tailored to the different search spaces of content and format.


Structured Prompt Template

Structured prompt example
In CFPO, prompts are decomposed into:

  • Content Components: Task Instruction, Task Detail, Output Format, and Few-shot Examples.
  • Format Components: the prompt renderer (overall prompt structure) and the query format (how examples and queries are presented); a minimal sketch follows the list.
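
To illustrate the decomposition, here is a minimal, self-contained Python sketch. The component names mirror the list above, but the concrete text, the query_format function, and the prompt_renderer function are assumptions made for demonstration, not the repository's built-in templates.

content = {
    "task_instruction": "Solve the grade-school math word problem.",
    "task_detail": "Reason step by step before giving the final answer.",
    "output_format": "End your response with 'Answer: <number>'.",
    "examples": [("Tom has 3 apples and buys 2 more. How many does he have?", "Answer: 5")],
}

def query_format(question, answer=""):
    # Query format: how each example and the final query are presented
    return f"Q: {question}\nA: {answer}".rstrip()

def prompt_renderer(content, query):
    # Prompt renderer: the overall structure joining the content components
    shots = "\n\n".join(query_format(q, a) for q, a in content["examples"])
    sections = [
        content["task_instruction"],
        content["task_detail"],
        content["output_format"],
        shots,
        query_format(query),
    ]
    return "\n\n".join(sections)

print(prompt_renderer(content, "A train travels 60 km in 1.5 hours. What is its speed?"))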

Format Optimization

Built-in format pool
CFPO explores formats through:

  1. Initial Pool: Predefined formats.
  2. UCT Algorithm: Balances exploration of new formats and exploitation of high-scoring ones (see the sketch after this list).
  3. LLM-Guided Generation: Expands the format pool using LLMs to propose novel variations.
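
The following self-contained sketch shows how a UCT rule can pick the next format to try (our illustration, not the repository's code): each format keeps a trial count and a cumulative evaluation score, untried formats are explored first, and otherwise the average score is combined with an exploration bonus.

import math

def select_format_uct(stats, c=1.4):
    """stats maps format name -> (num_trials, total_score); returns the next format to try."""
    total_trials = sum(n for n, _ in stats.values())
    def uct(format_name):
        n, score_sum = stats[format_name]
        if n == 0:
            return float("inf")                              # always try unexplored formats first
        exploit = score_sum / n                              # average score so far
        explore = c * math.sqrt(math.log(total_trials) / n)  # bonus for rarely tried formats
        return exploit + explore
    return max(stats, key=uct)

# Hypothetical pool: "markdown_qa" has the best average, but "xml_tags" is under-explored.
pool = {"markdown_qa": (10, 7.4), "json_fields": (8, 5.1), "xml_tags": (2, 1.5)}
print(select_format_uct(pool))  # prints "xml_tags"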

Usage

Installation

git clone https://github.com/HenryLau7/CFPO.git

Quick Start

bash scripts/*.sh

Key arguments:

--task #TASK NAME# \
--output_marker #LOG MARKER# \
--train_size #TRAINING SIZE# \
--minibatch_size #MINIBATCH SIZE FOR DIAGNOSIS# \
--valid_size #VALIDATION SIZE# \
--test_size #TEST SIZE# \
--controller #SCHEDULER# \
--opt_llm #OPTIMIZATION LLM# \
--eval_llm #EVALUATION LLM# \
--vllm_pth #LOCAL VLLM PATH# \
--init_temperature #INITIAL TEMPERATURE# \
--rounds #NUMBER OF ROUNDS# \
--beam_size #BEAM SIZE TO MAINTAIN# \
--num_return #NUMBER OF OPTIMIZED PROMPTS TO RETURN# \
--num_feedbacks #NUMBER OF PROMPTS GENERATED BY DIAGNOSIS# \
--errors_per_feedback #NUMBER OF INCORRECT SAMPLES PER DIAGNOSIS# \
--correct_per_feedback #NUMBER OF CORRECT SAMPLES PER DIAGNOSIS# \
--apply_per_feedback #NUMBER OF SEARCHED PROMPTS PER FEEDBACK# \
--num_random 1 #NUMBER OF PROMPTS GENERATED BY MONTE CARLO SAMPLING# \
--num_format 1 #NUMBER OF PROMPTS GENERATED BY FORMAT MUTATION# \
--select_method #FORMAT SELECTION METHOD# \
--gpu_id 0 #GPU DEVICE ID#

Citation

If you find the code useful, please cite our paper as follows:

@misc{liu2025cfpo,
      title={Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization}, 
      author={Yuanye Liu and Jiahang Xu and Li Lyna Zhang and Qi Chen and Xuan Feng and Yang Chen and Zhongxin Guo and Yuqing Yang and Cheng Peng},
      year={2025},
      eprint={2502.04295},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.04295}, 
}
