
LogADReft

This repository contains the code for an exploratory study comparing LoRA and ReFT for log anomaly detection.

Prerequisites

Process Dataset

  1. Navigate to the Data Loader Script

    • Go to data_process_logs/data_loader.py.
  2. Modify the Dataset Setting

    • Change the line dataset = "BGL" to one of the following options: "BGL", "HDFS", "Spirit", or "Thunderbird".
  3. Adjust Settings (if required)

    • You can modify the following settings as needed:
      • window_size: Default is 50.
      • step_size: Default is 50.
      • train_size: Default is 0.8.
      • is_test_train_ratio: Default is False.
  4. Evaluate Train Ratio

    • To generate splits for the train-ratio experiments, set is_test_train_ratio to True and adjust train_size. The experimental settings are 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8 (a shell sketch that automates all eight splits follows this list).
  5. Run the Data Loader Script

    • Execute the following command in your terminal:
      python data_process_logs/data_loader.py
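
Step 4 can be automated with a short shell loop, sketched below. This is a convenience sketch, not part of the repository: the sed patterns assume the settings appear in data_loader.py as plain top-level assignments (e.g., train_size = 0.8), and GNU sed; adjust if your setup differs.

LOADER=data_process_logs/data_loader.py
# Switch on the train-ratio mode once (assumes a top-level assignment).
sed -i 's/^is_test_train_ratio = .*/is_test_train_ratio = True/' "$LOADER"
# Regenerate the dataset for every experimental train ratio.
for r in 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8; do
    sed -i "s/^train_size = .*/train_size = ${r}/" "$LOADER"
    python "$LOADER"
done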

Example Configuration

Here is an example configuration for data_loader.py:

dataset = "HDFS"
window_size = 50
step_size = 50
train_size = 0.8
is_test_train_ratio = False

Note: Ensure you have the dataset from Le et al. in the logs_dataset directory. Source: https://github.com/LogIntelligence/LogADEmpirical/tree/dev

Raw and preprocessed datasets (including parsed logs and their embeddings) are available at Zenodo.

Main results and hyperparameters

  • Optionally set --max_n_train_example and --max_n_eval_example to limit the number of training and evaluation samples.
  • To change the rank, set -r to the desired value.
  • To change the intervention position, set -p. Options include fx, lx, and fx+lx, where x is a number of tokens: e.g., f1 is the first input token, l1 the last.
  • Other hyperparameters can be adjusted as desired (a sketch of a full invocation follows this list).
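
For reference, the sketch below shows how these flags fit together in one invocation. The flag names come from this README; the entry point (train.py) and the concrete values are illustrative assumptions, not the repository's actual script contents.

# Hypothetical excerpt from a main_results script; flag names follow the
# options above, while train.py and the values are assumptions.
DATASET="BGL"
python train.py \
    -train_dataset ./logs_dataset/${DATASET}/train.pkl \
    -eval_dataset ./logs_dataset/${DATASET}/test.pkl \
    -do_train -do_eval \
    -r 4 \
    -p "f1+l1" \
    -e 3 \
    --max_n_train_example 1000 \
    --max_n_eval_example 1000
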
  1. Run the Script for Llama3-ReFT

    • Execute the following script:
      ./scripts/main_results/llama3_reft.sh
    • Settings for all datasets are as in the script; set DATASET to the target dataset: "BGL", "HDFS", "Spirit", or "Thunderbird".
    • Epochs (-e): 3 for all datasets.
  2. Run the Script for RoBERTa-ReFT

    • Execute the following script:
      ./scripts/main_results/roberta_reft.sh
    • Settings for all datasets are as in the script; set DATASET to the target dataset: "BGL", "HDFS", "Spirit", or "Thunderbird".
    • Epochs (-e): 6, 3, 3, and 6 for "BGL", "HDFS", "Spirit", and "Thunderbird" respectively.
  3. Run the Script for GPT2-ReFT

    • Execute the following script:
      ./scripts/main_results/gpt2_reft.sh
    • Settings for all datasets are as in the script; set DATASET to the target dataset: "BGL", "HDFS", "Spirit", or "Thunderbird".
    • Epochs (-e): 6 for all datasets.
  4. Run the Script for Llama3-LoRA

    • Execute the following script:
      ./scripts/main_results/llama3_lora.sh
    • Settings for all datasets are as in the script; set DATASET to the target dataset: "BGL", "HDFS", "Spirit", or "Thunderbird".
    • Epochs (-e): 3 for all datasets.
  5. Run the Script for RoBERTa-LoRA

    • Execute the following script:
      ./scripts/main_results/roberta_lora.sh
    • Settings for all datasets are as in the script; set DATASET to the target dataset: "BGL", "HDFS", "Spirit", or "Thunderbird".
    • Epochs (-e): 3, 3, 3, and 9 for "BGL", "HDFS", "Spirit", and "Thunderbird" respectively.
  6. Run the Script for GPT2-LoRA

    • Execute the following script:
      ./scripts/main_results/gpt2_lora.sh
    • Settings for all datasets are as in the script; set DATASET to the target dataset: "BGL", "HDFS", "Spirit", or "Thunderbird".
    • Epochs (-e): 3, 6, 6, and 6 for "BGL", "HDFS", "Spirit", and "Thunderbird" respectively.

Train ratio experiments

First, generate all the necessary datasets (see Process Dataset). Then run the scripts as in Main results and hyperparameters, with the following modifications:

  • Add the constant TRAIN_RATIO=0.1.
  • Edit the dataset flags as follows: -train_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}train.pkl -eval_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}test.pkl. Note the addition of the TRAIN_RATIO constant.
  • Adjust TRAIN_RATIO to match the generated dataset, e.g., 0.1 to 0.7 in increments of 0.1.
  • Examples for Llama3-ReFT and Llama3-LoRA are given in /scripts/train_ratio.
  • Epochs: 3 for all experiments; the other settings are kept the same. (A sketch of the modified invocation follows this list.)
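
The sketch below shows the modified dataset flags in context. The flag names and path pattern come from the bullets above; the entry point (train.py) is the same illustrative assumption as earlier.

# Sketch of a train-ratio run; train.py is an assumed entry point.
DATASET="BGL"
TRAIN_RATIO=0.1   # adjust per generated split, e.g., 0.1 to 0.7
python train.py \
    -train_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}train.pkl \
    -eval_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}test.pkl \
    -e 3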

Unstable logs experiments

  1. Run the Script for Llama3-ReFT

    • Execute the following script:
       ./scripts/unstable_logs/llama3_reft.sh
    • Settings for all datasets are as in the script; set INJECTION_RATIO to one of 0.01, 0.02, 0.03, 0.05, 0.1, 0.2, or 0.3.
  2. Run the Script for Llama3-LoRA

    • Execute the following script:
       ./scripts/unstable_logs/llama3_lora.sh
    • Settings for all datasets are as in the script; set INJECTION_RATIO to one of 0.01, 0.02, 0.03, 0.05, 0.1, 0.2, or 0.3 (see the note after this list).
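
In both scripts the injection ratio is adjusted by editing one line before running. The exact line below is an assumption about the script's layout, not its verbatim contents.

# Hypothetical line inside ./scripts/unstable_logs/llama3_reft.sh (or
# llama3_lora.sh); edit before running.
INJECTION_RATIO=0.05   # one of: 0.01 0.02 0.03 0.05 0.1 0.2 0.3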

Zero-shot experiments

  1. First, set the training dataset by changing DATASET_TRAIN to one of "BGL", "HDFS", "Spirit", or "Thunderbird". Also remove -do_eval, and ensure -save_model is set.
  2. Start training by running the script
       ./scripts/zero_shot/llama3_reft.sh
    OR
       ./scripts/zero_shot/llama3_lora.sh
  3. Once the model is fine-tuned, locate the model directory under results; the directory name is printed in the logs.
  4. Add -my_model ${NAME_OF_MODEL}, e.g., -my_model ./results/REFT_HDFS_llama3HF_20240831080505129560.
  5. Remove -do_train and add -do_eval. Also set DATASET_TEST to the desired evaluation dataset: one of "BGL", "HDFS", "Spirit", or "Thunderbird". (A sketch of the full two-phase workflow follows this list.)
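
The sketch below summarizes the two phases. The flags (-do_train, -do_eval, -save_model, -my_model) and the example model directory come from the steps above; the entry point (train.py) and exact argument layout are assumptions.

# Phase 1: fine-tune on the training dataset, saving the model (no -do_eval).
DATASET_TRAIN="HDFS"
python train.py \
    -train_dataset ./logs_dataset/${DATASET_TRAIN}/train.pkl \
    -do_train -save_model

# Phase 2: evaluate the saved model on a different dataset (no -do_train).
DATASET_TEST="BGL"
python train.py \
    -my_model ./results/REFT_HDFS_llama3HF_20240831080505129560 \
    -eval_dataset ./logs_dataset/${DATASET_TEST}/test.pkl \
    -do_eval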

Other methods

These are contained in the /other_methods directory. Each repository was cloned directly from its source, with some code updated so that it runs on the common datasets in logs_dataset.

LogEmpirical

Source: https://github.com/LogIntelligence/LogADEmpirical/tree/dev

Run

python ./other_methods/LogADEmpirical/main_run.py --config_file=<config_file>
   # where `<config_file>` is the path to the configuration file.
   # e.g., python ./other_methods/LogADEmpirical/main_run.py --config_file=./config/other_methods/LogADEmpirical/HDFS/cnn.yaml

LogBERT

Source: https://github.com/HelenGuohx/logbert

  1. Navigate to /other_methods/logbert/.
  2. Navigate to a dataset folder, e.g., BGL, HDFS, Tbird, or Spirit.
  3. Run:
       bash init.sh
  4. Navigate back to logbert.
  5. Copy train.pkl and test.pkl from the respective dataset's folder in logs_dataset to the output/${DATASET} folder.
  6. Run:
       bash running_script_${DATASET}.sh
     where DATASET is one of bgl, hdfs, spirit, or tbird (a consolidated sketch of these steps follows this list).
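
The sketch below consolidates the steps for BGL. The relative paths assume logs_dataset sits at the repository root and that init.sh creates the output/bgl folder; adjust if your layout differs.

# Consolidated LogBERT workflow for BGL (relative paths are assumptions).
cd other_methods/logbert/BGL
bash init.sh              # prepare the BGL data for LogBERT
cd ..                     # back to other_methods/logbert
cp ../../logs_dataset/BGL/train.pkl ../../logs_dataset/BGL/test.pkl output/bgl/
bash running_script_bgl.sh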
