This repository contains the code used in our exploratory study comparing LoRA and ReFT for log anomaly detection.
## Setup

- Python 3.11
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Ensure you have the dataset from Le et al. in the `logs_dataset` directory. Source: https://github.com/LogIntelligence/LogADEmpirical/tree/dev. Raw and preprocessed datasets (including parsed logs and their embeddings) are available at Zenodo.
- Download Llama 3 from https://github.com/meta-llama/llama3 and refer to its Download section. Convert the weights to Hugging Face format using the conversion script linked at https://huggingface.co/docs/transformers/en/model_doc/llama3 (an example command is shown below). Get the 8B model and store the converted weights in the `llama3HF` folder.
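For reference, the Hugging Face documentation page linked above shows a conversion command along these lines (the input path and the location of your transformers clone are placeholders you must adapt):

```bash
# Run from a local clone of the transformers repository.
# --input_dir points at the weights downloaded from meta-llama/llama3.
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights \
    --model_size 8B \
    --output_dir ./llama3HF \
    --llama_version 3
```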
## Process Dataset

1. **Navigate to the Data Loader Script**: go to `data_process_logs/data_loader.py`.
2. **Modify the Dataset Setting**: change the line `dataset = "BGL"` to one of the following options: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
3. **Adjust Settings (if required)**: you can modify the following settings as needed:
   - `window_size`: default is `50`.
   - `step_size`: default is `50`.
   - `train_size`: default is `0.8`.
   - `is_test_train_ratio`: default is `False`.
4. **Evaluate Train Ratio**: to evaluate the train ratio, set `is_test_train_ratio` to `True` and adjust `train_size`. The experimental settings are `0.1`, `0.2`, `0.3`, `0.4`, `0.5`, `0.6`, `0.7`, and `0.8`.
5. **Run the Data Loader Script**: execute the following command in your terminal:

   ```bash
   python data_process/data_loader.py
   ```
Here is an example configuration for `data_loader.py`:

```python
dataset = "HDFS"
window_size = 50
step_size = 50
train_size = 0.8
is_test_train_ratio = False
```

Note: ensure you have the dataset from Le et al. in the `logs_dataset` directory. Source: https://github.com/LogIntelligence/LogADEmpirical/tree/dev. Raw and preprocessed datasets (including parsed logs and their embeddings) are available at Zenodo.
## Main Results and Hyperparameters

- Optionally set `--max_n_train_example` and `--max_n_eval_example` to limit the sample size.
- To adjust the rank, set `-r` to the desired rank.
- To adjust the intervention position, set `-p` to the desired position. Options include `fx`, `lx`, and `fx+lx`, where x is replaced with a number; e.g., `f1` means the first input position, while `l1` means the last input token.
- Other hyperparameters can be adjusted as desired; a combined invocation is sketched below.
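As a rough illustration, these flags might be combined in a single run as follows. The `train.py` entry point is a placeholder (an assumption, not necessarily the repository's actual script name); only the flags themselves are taken from this README:

```bash
# Hypothetical invocation; the entry point name is a placeholder.
# -r sets the intervention rank, -p the intervention position (here the
# first and last input tokens), -e the number of epochs; the --max_n_*
# flags cap the train/eval sample sizes.
python train.py \
    -r 4 \
    -p f1+l1 \
    -e 3 \
    --max_n_train_example 1000 \
    --max_n_eval_example 1000
```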
### Run the Script for Llama3-ReFT

- Execute the following script:

  ```bash
  ./scripts/main_results/llama3_reft.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"` (see the example below).
- The number of epochs (`-e`) is 3 for all of `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`.
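For example, switching this run to HDFS should only require changing the dataset constant in the script (a sketch; the exact layout depends on the script itself):

```bash
# In ./scripts/main_results/llama3_reft.sh (sketch):
DATASET="HDFS"   # one of "BGL", "HDFS", "Spirit", "Thunderbird"
# -e 3 is used for all four datasets in this configuration.
```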
### Run the Script for RoBERTa-ReFT

- Execute the following script:

  ```bash
  ./scripts/main_results/roberta_reft.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
- The number of epochs (`-e`) is 6, 3, 3, and 6 for `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"` respectively.
### Run the Script for GPT2-ReFT

- Execute the following script:

  ```bash
  ./scripts/main_results/gpt2_reft.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
- The number of epochs (`-e`) is 6 for all of `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`.
### Run the Script for Llama3-LoRA

- Execute the following script:

  ```bash
  ./scripts/main_results/llama3_lora.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
- The number of epochs (`-e`) is 3 for all of `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`.
### Run the Script for RoBERTa-LoRA

- Execute the following script:

  ```bash
  ./scripts/main_results/roberta_lora.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
- The number of epochs (`-e`) is 3, 3, 3, and 9 for `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"` respectively.
### Run the Script for GPT2-LoRA

- Execute the following script:

  ```bash
  ./scripts/main_results/gpt2_lora.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
- The number of epochs (`-e`) is 3, 6, 6, and 6 for `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"` respectively.
## Train Ratio

First, generate all the necessary datasets (refer to Process Dataset above). Then run the scripts as in Main Results and Hyperparameters, with the following modifications:

- Add the constant `TRAIN_RATIO=0.1`.
- Edit the dataset arguments as follows: `-train_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}train.pkl` and `-eval_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}test.pkl`. Note the addition of the `TRAIN_RATIO` constant (see the sketch after this list).
- Adjust `TRAIN_RATIO` according to the datasets generated, e.g., `0.1` to `0.7` in `0.1` increments.
- Examples for Llama3-ReFT and Llama3-LoRA are given in `/scripts/train_ratio`.
- The number of epochs is 3 for all experiments; the other settings are kept the same.
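Put together, the modified portion of a train-ratio script might look like this. The `train.py` entry point is again a placeholder; only the two dataset arguments are as given above:

```bash
# Sketch of a train-ratio run; only the dataset arguments are from this README.
DATASET="BGL"
TRAIN_RATIO=0.1   # 0.1 to 0.7 in 0.1 increments

python train.py \
    -e 3 \
    -train_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}train.pkl \
    -eval_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}test.pkl
```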
## Unstable Logs

### Run the Script for Llama3-ReFT

- Execute the following script:

  ```bash
  ./scripts/unstable_logs/llama3_reft.sh
  ```

- The settings are as per the script; adjust `INJECTION_RATIO` to one of `0.01`, `0.02`, `0.03`, `0.05`, `0.1`, `0.2`, or `0.3` (see the sketch below).

### Run the Script for Llama3-LoRA

- Execute the following script:

  ```bash
  ./scripts/unstable_logs/llama3_lora.sh
  ```

- The settings are as per the script; adjust `INJECTION_RATIO` to one of `0.01`, `0.02`, `0.03`, `0.05`, `0.1`, `0.2`, or `0.3`.
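In both scripts, the only setting that changes across runs is the injection ratio, e.g. (a sketch):

```bash
# In ./scripts/unstable_logs/llama3_reft.sh or llama3_lora.sh (sketch):
INJECTION_RATIO=0.05   # one of 0.01, 0.02, 0.03, 0.05, 0.1, 0.2, 0.3
```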
## Zero-shot

- First, set the train dataset by changing `DATASET_TRAIN` to one of `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`. Also remove `-do_eval` and ensure `-save_model` is set.
- Start training by running one of the following scripts:

  ```bash
  ./scripts/zero_shot/llama3_reft.sh
  ```

  or

  ```bash
  ./scripts/zero_shot/llama3_lora.sh
  ```

- Once the model is finetuned, locate the model directory in `results` (the directory is printed in the logs). Add `-my_model {$NAME_OF_MODEL}`, for example:

  ```bash
  -my_model ./results/REFT_HDFS_llama3HF_20240831080505129560 \
  ```

- Remove `-do_train` and add `-do_eval`. Also update `DATASET_TEST` to the desired dataset to test on: one of `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`. A two-stage sketch follows.
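Here is a sketch of the two stages, using only the flags named above; the `train.py` entry point and the dataset paths are assumptions (check the actual zero-shot scripts):

```bash
# Stage 1: finetune on the source dataset.
# Remove -do_eval and keep -save_model, as described above.
DATASET_TRAIN="HDFS"
python train.py -do_train -save_model \
    -train_dataset ./logs_dataset/${DATASET_TRAIN}/train.pkl

# Stage 2: evaluate the saved model on a different dataset.
# Remove -do_train, add -do_eval, and point -my_model at the saved directory.
DATASET_TEST="BGL"
python train.py -do_eval \
    -my_model ./results/REFT_HDFS_llama3HF_20240831080505129560 \
    -eval_dataset ./logs_dataset/${DATASET_TEST}/test.pkl
```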
## Other Methods

The other methods are contained in the `/other_methods` directory. I cloned each repository directly from its source and updated some of the code so that it can run on the common dataset in `logs_dataset`.

### LogADEmpirical

Source: https://github.com/LogIntelligence/LogADEmpirical/tree/dev

Run:

```bash
python ./other_methods/LogADEmpirical/main_run.py --config_file=<config_file>
# where <config_file> is the path to the configuration file,
# e.g., python ./other_methods/LogADEmpirical/main_run.py --config_file=./config/other_methods/LogADEmpirical/HDFS/cnn.yaml
```
### LogBERT

Source: https://github.com/HelenGuohx/logbert

1. Navigate to `/other_methods/logbert/`.
2. Navigate to a dataset folder, e.g., `BGL`, `HDFS`, `Tbird`, or `Spirit`.
3. Run `bash init.sh`.
4. Navigate back to `logbert`.
5. Copy `train.pkl` and `test.pkl` from `logs_dataset` of the respective dataset to the `output/${DATASET}` folder.
6. Run `bash running_script_${DATASET}.sh`, where `DATASET` is one of `bgl`, `hdfs`, `spirit`, or `tbird`. A worked example for BGL follows.
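For example, a full run for BGL might look like the following; the relative path back to `logs_dataset` is an assumption and depends on your checkout layout:

```bash
cd other_methods/logbert/BGL
bash init.sh
cd ..
# Copy the common splits into logbert's output folder (relative path assumed).
cp ../../logs_dataset/BGL/train.pkl ../../logs_dataset/BGL/test.pkl output/bgl/
bash running_script_bgl.sh
```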