Combining argumentation theory, specifically Toulmin's argument schema and the notion of critical questions, with the test-time compute paradigm enables LLMs to achieve higher performance on logical and mathematical tasks. In particular, the probing action of the critical questions lets the model adjust its reasoning plan, effectively correcting itself when an assumption or reasoning step is wrong. The resulting approach, denoted Critical-Questions-of-Thought (CQoT), consists of a pipeline implemented here as a Python script.
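To make the idea concrete, the plan, probe, and revise loop described above might be sketched as follows. This is an illustration under stated assumptions, not the repository's script: the function `cqot_answer`, the `llm` callable, the three placeholder critical questions, the pass threshold `min_passed`, and the `max_rounds` limit are all hypothetical names and values chosen for the example.

```python
from typing import Callable, List, Optional

# Hypothetical critical questions, loosely inspired by Toulmin's schema.
# They are placeholders, not the questions used in the paper.
CRITICAL_QUESTIONS: List[str] = [
    "Is every premise stated explicitly and supported by the query?",
    "Does each reasoning step follow from the steps before it?",
    "Does the conclusion actually follow from the premises?",
]


def cqot_answer(
    query: str,
    llm: Callable[[str], str],
    questions: List[str] = CRITICAL_QUESTIONS,
    min_passed: Optional[int] = None,
    max_rounds: int = 3,
) -> str:
    """Draft a plan, probe it with critical questions, revise, then answer."""
    if min_passed is None:
        min_passed = len(questions)  # assumption: every question must pass

    plan = llm(f"Draft a step-by-step reasoning plan for this query:\n{query}")

    for _ in range(max_rounds):
        # Probe the current plan with each critical question.
        failed = []
        for question in questions:
            verdict = llm(
                f"Plan:\n{plan}\n\nCritical question: {question}\n"
                "Answer PASS or FAIL."
            )
            if not verdict.strip().upper().startswith("PASS"):
                failed.append(question)

        if len(questions) - len(failed) >= min_passed:
            break  # the plan survived the probing

        # Otherwise ask the model to repair the plan and probe again.
        plan = llm(
            "Revise the plan so that it addresses the failed critical questions.\n"
            f"Query:\n{query}\n\nPlan:\n{plan}\n\nFailed questions:\n"
            + "\n".join(failed)
        )

    # After the loop, the latest plan is used even if max_rounds ran out.
    return llm(
        f"Follow this plan to answer the query.\nPlan:\n{plan}\n\nQuery:\n{query}"
    )
```

Here `llm` can be any function that maps a prompt string to a completion string, so the same loop works with a proprietary API client, a local model, or a stub used for testing.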
We share the results achieved by the CQoT method as detailed in our paper.
The colour-coded spreadsheet CQoT_Evals.xlsx presents the scores obtained by 5 LLMs, both proprietary and open source, on 40 challenging questions drawn from the MT-Bench Reasoning and Math benchmarks. Each model has been tested in its baseline form, as well as with CoT and with CQoT. Scores, assigned by an LLM judge (GPT-4o), range from 1 to 10 and reflect the performance of each model on the specific query. Low-graded responses (1-4) are displayed with a red background, mid-range replies (5-7) in yellow, and good answers (8-10) in green.
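The colour bands can be expressed as a small lookup, which may be handy when post-processing the scores outside the spreadsheet. The helper name `score_colour` is hypothetical and is not part of the repository; only the three bands come from the description above.

```python
def score_colour(score: int) -> str:
    """Map a judge score (1-10) to the colour band used in the spreadsheet."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    if score <= 4:
        return "red"     # low-graded responses (1-4)
    if score <= 7:
        return "yellow"  # mid-range replies (5-7)
    return "green"       # good answers (8-10)
```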
The table below previews the outcome of the experiments we ran to evaluate CQoT, as the change in score of each model with CQoT relative to its standard (baseline) and CoT results.
Models + CQoT | MT-Bench (Reasoning), Standard | MT-Bench (Reasoning), CoT | MT-Bench (Math), Standard | MT-Bench (Math), CoT |
---|---|---|---|---|
Claude Sonnet 3.5 | +4.06% | +4.68% | +5.95% | +0% |
GPT-4.0 | +1.81% | +3.04% | +1.05% | +7.26% |
Gemini 1.5-pro-001 | +5.33% | +7.88% | +10.29% | +9.04% |
Llama 3.1-70b-Instruct | +4.35% | +1.12% | +6.70% | +2.14% |
Nemotron-51b-Instruct | +8.15% | -5.19% | +4.57% | +7.02% |
Average | +4.74% | +4.48% | +5.71% | +5.09% |
If you find our paper or pipeline useful, please consider referencing it:
@misc{castagna2024criticalquestionsofthoughtsteeringllmreasoning,
title={Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying},
author={Federico Castagna and Isabel Sassoon and Simon Parsons},
year={2024},
eprint={2412.15177},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2412.15177},
}