The purpose of OvO-R1 is to explore how end-to-end reinforcement learning and various reward functions influence the reasoning capabilities of different base models (Qwen2.5-1.5B / Qwen2.5-1.5B-Math / Qwen2.5-1.5B-Instruct).
- RL training of models at the Qwen2.5-1.5B / Qwen2.5-1.5B-Math / Qwen2.5-1.5B-Instruct scale
- We use a 0.75k-sample dataset for a fast training loop; more experiments on larger datasets are coming soon
- We release wandb logs for comparing different base models trained with GRPO
- We are exploring the impact of various reward functions on these models; a sketch of typical reward functions is shown below
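
Below is a minimal sketch of two R1-style reward functions (a format reward and an accuracy reward) of the kind typically combined in GRPO training. The function names, the `<think>/<answer>` template, and the `answer` dataset column are illustrative assumptions, not necessarily the exact rewards implemented in `src/ovo_r1/grpo.py`.

```python
# Illustrative sketch of two R1-style reward functions for GRPO.
# Assumptions: completions arrive as plain strings and the dataset provides
# an "answer" column with the reference solution.
import re

def format_reward(completions, **kwargs):
    """1.0 if the completion follows <think>...</think><answer>...</answer>, else 0.0."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return [1.0 if re.match(pattern, c, re.DOTALL) else 0.0 for c in completions]

def accuracy_reward(completions, answer, **kwargs):
    """1.0 if the text inside <answer>...</answer> matches the reference answer, else 0.0."""
    rewards = []
    for completion, reference in zip(completions, answer):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        predicted = match.group(1).strip() if match else ""
        rewards.append(1.0 if predicted == str(reference).strip() else 0.0)
    return rewards
```

Swapping or re-weighting such functions is how the impact of different reward designs can be compared across the three base models.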
```bash
conda create -n ovo_r1 python=3.11
conda activate ovo_r1
```

and

```bash
pip install -r requirements.txt
```
| Model | OvO-R1 | OvO-R1-Math | OvO-R1-Instruct |
|---|---|---|---|
| Base Model | Qwen2.5-1.5B | Qwen2.5-1.5B-Math | Qwen2.5-1.5B-Instruct |
| Dataset_mini | X-R1-750 | X-R1-750 | X-R1-750 |
| Dataset_middle | - | - | - |
| Dataset_large | - | - | - |
| Config: recipes | OvO_R1_config.yaml | OvO_R1_math_config.yaml | OvO_R1_instruct_config.yaml |
| num_generations | 8 | 8 | 8 |
| max_completion_length | 1024 | 1024 | 1024 |
| num_train_epochs | 3 | 3 | 3 |
To train the three models, run the corresponding command:
```bash
# OvO-R1 (Qwen2.5-1.5B base)
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/zero3.yaml --num_processes=3 \
    src/ovo_r1/grpo.py --config recipes/OvO_R1_config.yaml > ./output/ovo_r1.log

# OvO-R1-Math (Qwen2.5-1.5B-Math base)
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/zero3.yaml --num_processes=3 \
    src/ovo_r1/grpo.py --config recipes/OvO_R1_math_config.yaml > ./output/ovo_r1_math.log

# OvO-R1-Instruct (Qwen2.5-1.5B-Instruct base)
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/zero3.yaml --num_processes=3 \
    src/ovo_r1/grpo.py --config recipes/OvO_R1_instruct_config.yaml > ./output/ovo_r1_instruct.log
```
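
For reference, a minimal, hedged sketch of a GRPO training loop with trl's `GRPOTrainer` is shown below, using the hyperparameters from the table above. The dataset path, output directory, and reward functions (taken from the sketch earlier in this README) are illustrative assumptions; the actual training logic lives in `src/ovo_r1/grpo.py` and is driven by the YAML recipes.

```python
# Hedged sketch of a minimal GRPO training loop with trl (illustrative only;
# the repo's grpo.py reads these settings from the recipes/*.yaml files).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder dataset id: substitute the actual X-R1-750 dataset path.
dataset = load_dataset("path/to/X-R1-750", split="train")

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B",                     # or the Math / Instruct variants
    reward_funcs=[format_reward, accuracy_reward], # e.g. the reward functions sketched above
    args=GRPOConfig(
        output_dir="output/ovo_r1",                # assumption: actual path set by the recipe
        num_generations=8,                         # completions sampled per prompt (group size)
        max_completion_length=1024,                # max tokens generated per completion
        num_train_epochs=3,
    ),
    train_dataset=dataset,
)
trainer.train()
```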
Our emails are xuzhaoli2001@gmail.com and xuchenli1030@gmail.com.
Any discussions and suggestions are welcome!