This repository hosts the official PyTorch implementation of the method from the paper "Process-Supervised LLM Recommenders via Flow-guided Tuning" (Flower).
To install the project, follow these steps:
- Clone this repository and change into its directory.
- Create a conda environment and activate it.
conda create --name Flower python=3.9 -y
conda activate Flower
- Install the required packages with pip.
pip install -r requirements.txt
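Optionally, verify that PyTorch is installed and can see a GPU before continuing (a quick sanity check, not a required step):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"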
Due to GitHub's file size limits, the training-set files are not included in this repository; all other files are provided. The following steps describe our data processing procedure, using the Video Games dataset as an example.
- Download the dataset
wget https://datarepo.eng.ucsd.edu/mcauley_group/data/amazon_v2/categoryFiles/Video_Games.json.gz
wget https://datarepo.eng.ucsd.edu/mcauley_group/data/amazon_v2/metaFiles2/meta_Video_Games.json.gz
- Unzip
gunzip Video_Games.json.gz
gunzip meta_Video_Games.json.gz
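Both files are line-delimited JSON with one record per line. A quick way to inspect them (the field names below follow the Amazon v2 dumps and may differ in your copy):
```python
import json

# Peek at the first review and the first metadata record.
# reviewerID/asin/unixReviewTime and asin/title are the usual Amazon v2
# field names; adjust if your files differ.
with open("Video_Games.json") as f:
    review = json.loads(f.readline())
print(review.get("reviewerID"), review.get("asin"), review.get("unixReviewTime"))

with open("meta_Video_Games.json") as f:
    meta = json.loads(f.readline())
print(meta.get("asin"), meta.get("title"))
```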
cd ./with_history
- Process for BIGRec and IFairLRS
python ./code/process.py --category "Video_Games"
- Process for SASRec
bash to_SASRec.sh
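For context, SASRec-style trainers conventionally read chronologically ordered (user, item) pairs, one interaction per line. The sketch below illustrates that format under the assumption that interactions are available as (user, item, timestamp) tuples; to_SASRec.sh performs the repository's actual conversion.
```python
# Illustrative only: write interactions in the common "user item"
# per-line SASRec format, sorted by timestamp within each user.
interactions = [(1, 10, 3), (1, 7, 1), (2, 10, 5)]  # (user, item, timestamp)

with open("sasrec_data.txt", "w") as f:
    for user, item, _ in sorted(interactions, key=lambda x: (x[0], x[2])):
        f.write(f"{user} {item}\n")
```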
- Process for Flower
Run the code in process.ipynb.
To reproduce the results in RQ1, follow these steps:
- Data processing
Run the code in ./without_history/gfn/process.ipynb.
- Baseline
cd ./without_history/base_line
bash ./shell/train_sft_100.sh
bash ./shell/train_sft_1500.sh
bash ./shell/ppo.sh
bash ./shell/dpo.sh
- Flower (replace X below with the index of the GPU to use)
cd ./without_history/gfn
CUDA_VISIBLE_DEVICES=X python train.py task=movie_all_param_1.5B_100 device=gpu > movie_1.5B_0.00001_0.05.out &
CUDA_VISIBLE_DEVICES=X python train.py task=movie_all_param_3B_1500 device=gpu > movie_3B_0.00001_0.4.out &
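Both jobs run in the background and write their logs to the .out files named in the commands (the numeric suffixes appear to encode hyperparameters such as the learning rate). Progress can be followed with:
tail -f movie_1.5B_0.00001_0.05.out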
To reproduce the results in RQ2, follow these steps:
cd ./with_history
- Train SASRec
bash run_SASRec.sh
- Train and evaluate BIGRec (evaluation grounds the generated text to real catalog items; see the sketch after this list)
bash run_sft.sh
bash evaluate_sft.sh
- Train Flower
bash run_sft-gfn_logp_div_s.sh
- Train IFairLRS
bash item_side_reweight.sh
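For orientation: BIGRec-style evaluation maps the model's free-form generations back onto real catalog items by nearest-neighbor search in an embedding space. The following is a minimal sketch of that grounding step, not the exact code behind evaluate_sft.sh:
```python
import numpy as np

def ground_to_catalog(gen_emb: np.ndarray, item_embs: np.ndarray) -> np.ndarray:
    """Rank catalog items by L2 distance to a generated embedding.

    Illustrative sketch of BIGRec-style grounding; the repository's
    evaluation scripts implement the actual procedure.
    """
    dists = np.linalg.norm(item_embs - gen_emb, axis=1)  # distance to every item
    return np.argsort(dists)  # item indices, nearest first
```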
To reproduce the results in RQ3, follow these steps:
cd ./with_history/dpo
- Data processing
Run the code in dpo_dataset.ipynb.
As above, the training-set files exceed GitHub's size limits and are not included in the repository; all other files are provided.
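For reference, DPO-style methods train on preference pairs. The record below is purely illustrative; the field names are hypothetical and the actual schema built by dpo_dataset.ipynb may differ:
```python
# Hypothetical preference pair; field names are illustrative, not
# necessarily the schema produced by dpo_dataset.ipynb.
example_pair = {
    "prompt": "The user has played: <history>. Recommend the next game:",
    "chosen": "title of the ground-truth next item",
    "rejected": "title of a negatively sampled item",
}
```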
- BIGRec as a Reference Policy
bash dmpo.sh
bash sdpo.sh
bash ppo.sh
bash rosedpo.sh
- Flower as a Reference Policy
bash dmpo_gfn.sh
bash sdpo_gfn.sh
bash ppo_gfn.sh
bash rosedpo_gfn.sh
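In these runs the reference policy is the frozen model that the objective is anchored to: the DPO-style variants (dmpo, sdpo, rosedpo) reward the policy for widening the chosen/rejected log-probability margin relative to the reference, while ppo.sh uses it for a KL penalty. A minimal sketch of the standard DPO loss, which the variant scripts modify:
```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards are log-prob ratios against the frozen reference
    # policy (BIGRec or Flower here); beta scales the strength of the
    # implicit KL regularization toward the reference.
    chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen - rejected).mean()
```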
To reproduce the results in RQ4, follow these steps:
cd ./with_history
- Effects of Reward Setting
bash run_sft-gfn_logp_div_s.sh
bash run_sft-gfn_logp_add_logs.sh
bash run_sft-gfn_logp.sh
- Impact of Supervision Granularity
bash run_sft-gfn_logp_n.sh
- Performance Varying 𝜆
bash run_sft-gfn_logp_div_s_lambda.sh