This repository hosts the official PyTorch implementation of the method from the paper "Process-Supervised LLM Recommenders via Flow-guided Tuning" (Flower).
To install the project, follow these steps:
- Clone this repository and change into its directory.
- Create a conda environment and activate it.
conda create --name Flower python=3.9 -y
conda activate Flower
- Install the required packages with pip.
pip install -r requirements.txt
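Optionally, verify that PyTorch is installed and can see a GPU before continuing (a quick sanity check, not a required step):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"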
Due to GitHub's file size limits, the training-set files are not included in this repository; all other files are provided. The following steps describe our data processing procedure, using the Video Games dataset as an example.
- Download the dataset
wget https://datarepo.eng.ucsd.edu/mcauley_group/data/amazon_v2/categoryFiles/Video_Games.json.gz
wget https://datarepo.eng.ucsd.edu/mcauley_group/data/amazon_v2/metaFiles2/meta_Video_Games.json.gz
- Unzip
gunzip Video_Games.json.gz
gunzip meta_Video_Games.json.gz
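Both files are line-delimited JSON with one record per line. A quick way to inspect them (the field names below follow the Amazon v2 dumps and may differ in your copy):
```python
import json

# Peek at the first review and the first metadata record.
# reviewerID/asin/unixReviewTime and asin/title are the usual Amazon v2
# field names; adjust if your files differ.
with open("Video_Games.json") as f:
    review = json.loads(f.readline())
print(review.get("reviewerID"), review.get("asin"), review.get("unixReviewTime"))

with open("meta_Video_Games.json") as f:
    meta = json.loads(f.readline())
print(meta.get("asin"), meta.get("title"))
```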
cd ./with_history
- Process for BIGRec and IFairLRS
python ./code/process.py --category "Video_Games"
- Process for SASRec
bash to_SASRec.sh
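For context, SASRec-style trainers conventionally read chronologically ordered (user, item) pairs, one interaction per line. The sketch below illustrates that format under the assumption that interactions are available as (user, item, timestamp) tuples; to_SASRec.sh performs the repository's actual conversion.
```python
# Illustrative only: write interactions in the common "user item"
# per-line SASRec format, sorted by timestamp within each user.
interactions = [(1, 10, 3), (1, 7, 1), (2, 10, 5)]  # (user, item, timestamp)

with open("sasrec_data.txt", "w") as f:
    for user, item, _ in sorted(interactions, key=lambda x: (x[0], x[2])):
        f.write(f"{user} {item}\n")
```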
- Process for Flower
Run the code in process.ipynb.
To reproduce the results in RQ1, follow these steps:
- Data processing
Run the code in ./without_history/gfn/process.ipynb.
- Baseline
cd ./without_history/base_line
bash ./shell/train_sft_100.sh
bash ./shell/train_sft_1500.sh
bash ./shell/ppo.sh
bash ./shell/dpo.sh
- Flower (replace X below with the index of the GPU to use)
cd ./without_history/gfn
CUDA_VISIBLE_DEVICES=X python train.py task=movie_all_param_1.5B_100 device=gpu > movie_1.5B_0.00001_0.05.out &
CUDA_VISIBLE_DEVICES=X python train.py task=movie_all_param_3B_1500 device=gpu > movie_3B_0.00001_0.4.out &
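Both jobs run in the background and write their logs to the .out files named in the commands (the numeric suffixes appear to encode hyperparameters such as the learning rate). Progress can be followed with:
tail -f movie_1.5B_0.00001_0.05.out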
To reproduce the results in RQ2, follow these steps:
cd ./with_history
- Train SASRec
bash run_SASRec.sh
- Train and evaluate BIGRec (evaluation grounds the generated text to real catalog items; see the sketch after this list)
bash run_sft.sh
bash evaluate_sft.sh
- Train Flower
bash run_sft-gfn_logp_div_s.sh
- Train IFairLRS
bash item_side_reweight.sh
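For orientation: BIGRec-style evaluation maps the model's free-form generations back onto real catalog items by nearest-neighbor search in an embedding space. The following is a minimal sketch of that grounding step, not the exact code behind evaluate_sft.sh:
```python
import numpy as np

def ground_to_catalog(gen_emb: np.ndarray, item_embs: np.ndarray) -> np.ndarray:
    """Rank catalog items by L2 distance to a generated embedding.

    Illustrative sketch of BIGRec-style grounding; the repository's
    evaluation scripts implement the actual procedure.
    """
    dists = np.linalg.norm(item_embs - gen_emb, axis=1)  # distance to every item
    return np.argsort(dists)  # item indices, nearest first
```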
To reproduce the results in RQ3, follow these steps:
cd ./with_history/dpo
- Data processing
Run the code in dpo_dataset.ipynb.
As above, the training-set files exceed GitHub's size limits and are not included in the repository; all other files are provided.
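For reference, DPO-style methods train on preference pairs. The record below is purely illustrative; the field names are hypothetical and the actual schema built by dpo_dataset.ipynb may differ:
```python
# Hypothetical preference pair; field names are illustrative, not
# necessarily the schema produced by dpo_dataset.ipynb.
example_pair = {
    "prompt": "The user has played: <history>. Recommend the next game:",
    "chosen": "title of the ground-truth next item",
    "rejected": "title of a negatively sampled item",
}
```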
- BIGRec as a Reference Policy
bash dmpo.sh
bash sdpo.sh
bash ppo.sh
bash rosedpo.sh
- Flower as a Reference Policy
bash dmpo_gfn.sh
bash sdpo_gfn.sh
bash ppo_gfn.sh
bash rosedpo_gfn.sh
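In these runs the reference policy is the frozen model that the objective is anchored to: the DPO-style variants (dmpo, sdpo, rosedpo) reward the policy for widening the chosen/rejected log-probability margin relative to the reference, while ppo.sh uses it for a KL penalty. A minimal sketch of the standard DPO loss, which the variant scripts modify:
```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards are log-prob ratios against the frozen reference
    # policy (BIGRec or Flower here); beta scales the strength of the
    # implicit KL regularization toward the reference.
    chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen - rejected).mean()
```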
To reproduce the results in RQ4, follow these steps:
cd ./with_history
- Effects of Reward Setting
bash run_sft-gfn_logp_div_s.sh
bash run_sft-gfn_logp_add_logs.sh
bash run_sft-gfn_logp.sh
- Impact of Supervision Granularity
bash run_sft-gfn_logp_n.sh
- Performance Varying 𝜆
bash run_sft-gfn_logp_div_s_lambda.sh