Add new paper: #51

wyzh0912 · 2025-02-23T10:50:32Z

The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It

2025-02-17

arXiv

Consistency head

Innovation: The paper investigates how LLMs detect arithmetic errors by identifying the computational subgraphs (circuits) responsible for error detection in arithmetic tasks. It finds a structural dissociation between arithmetic computation and validation, and suggests that this separation contributes to the models' difficulty in detecting errors.
Tasks: The study takes a mechanistic approach, using edge attribution patching to identify the circuits in LLMs responsible for detecting arithmetic errors. It generates controlled arithmetic prompts, some correct and some containing intentional errors, to examine how different parts of the model contribute to error detection.
Significant Result: Error detection circuits are structurally similar across models and are primarily governed by attention heads, termed consistency heads, that attend to the surface-level alignment of numerical values. The study also shows that integrating latent activations from higher layers into lower layers can improve error detection, effectively closing the validation gap.
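
The controlled prompts described under Tasks could be generated along these lines. This is a minimal sketch and not the paper's code: the prompt template, the operand range, the error offsets, and the name `make_prompt_pair` are all assumptions made for illustration.

```python
import random


def make_prompt_pair(rng, low=1, high=9):
    """Build one correct prompt and one erroneous prompt for the same operands.

    Hypothetical helper: the template and ranges are illustrative assumptions,
    not taken from the paper.
    """
    a = rng.randint(low, high)
    b = rng.randint(low, high)
    correct_sum = a + b
    # Shift the result by a small nonzero offset so the erroneous prompt
    # differs from the correct one only in the stated result.
    wrong_sum = correct_sum + rng.choice([-2, -1, 1, 2])
    template = "{a} + {b} = {c}. The above calculation is"
    return {
        "operands": (a, b),
        "correct": template.format(a=a, b=b, c=correct_sum),
        "erroneous": template.format(a=a, b=b, c=wrong_sum),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the generated pairs are reproducible
    for _ in range(3):
        pair = make_prompt_pair(rng)
        print(pair["correct"])
        print(pair["erroneous"])
```

Pairs like these differ only in the stated result, which is what lets a patching method attribute the difference in model behaviour to specific components.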
