IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation

Introduction

Item Discrimination (ID) theory, which is widely used in educational assessment, measures the ability of individual test items to differentiate between high and low performers. Our data synthesis framework prioritizes both breadth and specificity. It can generate prompts that comprehensively evaluate the capabilities of LLMs while revealing meaningful performance differences between models, allowing for effective discrimination of their relative strengths and weaknesses across various tasks and domains.
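To make the idea concrete, the sketch below (our illustration, not code from this repository) computes the classical discrimination index D = p_upper − p_lower for each item, here treating models as the examinees and evaluation prompts as the items; the group fraction and the toy scores are assumptions.

```python
import numpy as np

def discrimination_index(scores: np.ndarray, group_frac: float = 0.27) -> np.ndarray:
    """scores: (n_models, n_items) matrix of per-item scores in [0, 1].
    Returns the discrimination index D for each item."""
    totals = scores.sum(axis=1)                      # overall ability proxy per model
    order = np.argsort(totals)                       # weakest -> strongest
    k = max(1, int(round(group_frac * len(order))))  # size of each comparison group
    low, high = order[:k], order[-k:]
    # D close to 1: the item separates strong models from weak ones well;
    # D near 0 (or negative): the item barely discriminates.
    return scores[high].mean(axis=0) - scores[low].mean(axis=0)

# Toy example: 4 models x 3 prompts; the last prompt discriminates best.
scores = np.array([[1.0, 0.9, 0.2],
                   [0.9, 0.8, 0.1],
                   [0.8, 0.9, 0.9],
                   [0.7, 0.8, 1.0]])
print(discrimination_index(scores, group_frac=0.5))  # -> [-0.2  0.   0.8]
```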

Background

Research and applications in Natural Language Processing (NLP) have made significant progress, with large language models (LLMs) performing exceptionally well across a wide range of tasks, and data serves as the foundation for model performance. Model evaluation plays a crucial role in the development of LLMs: it guides iterative improvements during training, enables the selection of the best model variants, and facilitates their deployment in real-world applications. However, constructing such evaluation datasets often requires substantial human effort and time, and poor data quality is a frequent problem. It is therefore worthwhile to explore automated methods for generating evaluation datasets.

Pipeline

The pipeline is illustrated in the following figure:

[Figure: Pipeline Overview]

First, we handcraft a batch of seed data and divide it into a math category and a general-text category. Next, we generate a new batch of data through an "instruction gradient". For instructions in the general-text category, we generate responses with an LLM and then derive new instructions through a "response gradient", i.e., we propose new questions based on each response. For problems in the math category, we verify them with a CoT check and apply self-correction according to the CoT check's feedback. A rough sketch of this flow is given below.
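The following sketch only mirrors the structure of the pipeline described above; it assumes a generic `call_llm(prompt) -> str` helper, and the prompt templates, function names, and the pass/fail heuristic are illustrative, not the repository's actual implementation.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM API here")  # hypothetical helper

def instruction_gradient(seed_instruction: str) -> str:
    # Rewrite a seed instruction into a new, more discriminative prompt.
    return call_llm(f"Rewrite this instruction into a more discriminative one:\n{seed_instruction}")

def response_gradient(instruction: str) -> str:
    # General-text branch: answer the instruction with an LLM, then
    # propose a new question grounded in that response.
    response = call_llm(instruction)
    return call_llm(f"Based on the following response, propose a new question:\n{response}")

def cot_check_and_self_correct(problem: str) -> str:
    # Math branch: verify the problem via a chain-of-thought check and
    # revise it using the checker's feedback when the check flags an issue.
    feedback = call_llm(f"Solve step by step and report any flaw in this problem:\n{problem}")
    if "flaw" in feedback.lower():  # crude stand-in for a real pass/fail signal
        problem = call_llm(f"Revise the problem according to this feedback:\n{feedback}\n\nProblem:\n{problem}")
    return problem

def generate(seed: dict) -> str:
    new_prompt = instruction_gradient(seed["instruction"])
    if seed["category"] == "math":
        return cot_check_and_self_correct(new_prompt)
    return response_gradient(new_prompt)
```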

Citation

If you use the theories or methods from our work, or make use of our data, please feel free to cite us:

@misc{lin2024idgenitemdiscriminationinduced,
      title={IDGen: Item Discrimination Induced Prompt Generation for LLM Evaluation}, 
      author={Fan Lin and Shuyi Xie and Yong Dai and Wenlin Yao and Tianjiao Lang and Zishan Xu and Zhichao Hu and Xiao Xiao and Yuhong Liu and Yu Zhang},
      year={2024},
      eprint={2409.18892},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.18892}, 
}
