This repository contains the code used in our exploratory study comparing LoRA and ReFT for log anomaly detection.
## Setup

- Python 3.11
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Ensure you have the dataset from Le et al. in the `logs_dataset` directory. Source: https://github.com/LogIntelligence/LogADEmpirical/tree/dev. Raw and preprocessed datasets (including parsed logs and their embeddings) are available at Zenodo.
- Download Llama 3 from https://github.com/meta-llama/llama3 and refer to its Download section. Convert the weights to Hugging Face format using the conversion script linked at https://huggingface.co/docs/transformers/en/model_doc/llama3 (an example command is shown below). Get the 8B model and store the converted weights in the `llama3HF` folder.
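For reference, the Hugging Face documentation page linked above shows a conversion command along these lines (the input path and the location of your transformers clone are placeholders you must adapt):

```bash
# Run from a local clone of the transformers repository.
# --input_dir points at the weights downloaded from meta-llama/llama3.
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights \
    --model_size 8B \
    --output_dir ./llama3HF \
    --llama_version 3
```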
## Process Dataset

1. **Navigate to the Data Loader Script**: go to `data_process_logs/data_loader.py`.
2. **Modify the Dataset Setting**: change the line `dataset = "BGL"` to one of the following options: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
3. **Adjust Settings (if required)**: you can modify the following settings as needed:
   - `window_size`: default is `50`.
   - `step_size`: default is `50`.
   - `train_size`: default is `0.8`.
   - `is_test_train_ratio`: default is `False`.
4. **Evaluate Train Ratio**: to evaluate the train ratio, set `is_test_train_ratio` to `True` and adjust `train_size`. The experimental settings are `0.1`, `0.2`, `0.3`, `0.4`, `0.5`, `0.6`, `0.7`, and `0.8`.
5. **Run the Data Loader Script**: execute the following command in your terminal:

   ```bash
   python data_process/data_loader.py
   ```
Here is an example configuration for `data_loader.py`:

```python
dataset = "HDFS"
window_size = 50
step_size = 50
train_size = 0.8
is_test_train_ratio = False
```

Note: ensure you have the dataset from Le et al. in the `logs_dataset` directory. Source: https://github.com/LogIntelligence/LogADEmpirical/tree/dev. Raw and preprocessed datasets (including parsed logs and their embeddings) are available at Zenodo.
## Main Results and Hyperparameters

- Optionally set `--max_n_train_example` and `--max_n_eval_example` to limit the sample size.
- To adjust the rank, set `-r` to the desired rank.
- To adjust the intervention position, set `-p` to the desired position. Options include `fx`, `lx`, and `fx+lx`, where x is replaced with a number; e.g., `f1` means the first input position, while `l1` means the last input token.
- Other hyperparameters can be adjusted as desired; a combined invocation is sketched below.
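As a rough illustration, these flags might be combined in a single run as follows. The `train.py` entry point is a placeholder (an assumption, not necessarily the repository's actual script name); only the flags themselves are taken from this README:

```bash
# Hypothetical invocation; the entry point name is a placeholder.
# -r sets the intervention rank, -p the intervention position (here the
# first and last input tokens), -e the number of epochs; the --max_n_*
# flags cap the train/eval sample sizes.
python train.py \
    -r 4 \
    -p f1+l1 \
    -e 3 \
    --max_n_train_example 1000 \
    --max_n_eval_example 1000
```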
### Run the Script for Llama3-ReFT

- Execute the following script:

  ```bash
  ./scripts/main_results/llama3_reft.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"` (see the example below).
- The number of epochs (`-e`) is 3 for all of `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`.
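For example, switching this run to HDFS should only require changing the dataset constant in the script (a sketch; the exact layout depends on the script itself):

```bash
# In ./scripts/main_results/llama3_reft.sh (sketch):
DATASET="HDFS"   # one of "BGL", "HDFS", "Spirit", "Thunderbird"
# -e 3 is used for all four datasets in this configuration.
```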
### Run the Script for RoBERTa-ReFT

- Execute the following script:

  ```bash
  ./scripts/main_results/roberta_reft.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
- The number of epochs (`-e`) is 6, 3, 3, and 6 for `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"` respectively.
### Run the Script for GPT2-ReFT

- Execute the following script:

  ```bash
  ./scripts/main_results/gpt2_reft.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
- The number of epochs (`-e`) is 6 for all of `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`.
### Run the Script for Llama3-LoRA

- Execute the following script:

  ```bash
  ./scripts/main_results/llama3_lora.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
- The number of epochs (`-e`) is 3 for all of `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"`.
### Run the Script for RoBERTa-LoRA

- Execute the following script:

  ```bash
  ./scripts/main_results/roberta_lora.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
- The number of epochs (`-e`) is 3, 3, 3, and 9 for `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"` respectively.
### Run the Script for GPT2-LoRA

- Execute the following script:

  ```bash
  ./scripts/main_results/gpt2_lora.sh
  ```

- The settings for all datasets are as per the script; adjust `DATASET` to the desired dataset: `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`.
- The number of epochs (`-e`) is 3, 6, 6, and 6 for `"BGL"`, `"HDFS"`, `"Spirit"`, and `"Thunderbird"` respectively.
## Train Ratio

First, generate all the necessary datasets (refer to Process Dataset above). Then run the scripts as in Main Results and Hyperparameters, with the following modifications:

- Add the constant `TRAIN_RATIO=0.1`.
- Edit the dataset arguments as follows: `-train_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}train.pkl` and `-eval_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}test.pkl`. Note the addition of the `TRAIN_RATIO` constant (see the sketch after this list).
- Adjust `TRAIN_RATIO` according to the datasets generated, e.g., `0.1` to `0.7` in `0.1` increments.
- Examples for Llama3-ReFT and Llama3-LoRA are given in `/scripts/train_ratio`.
- The number of epochs is 3 for all experiments; the other settings are kept the same.
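Put together, the modified portion of a train-ratio script might look like this. The `train.py` entry point is again a placeholder; only the two dataset arguments are as given above:

```bash
# Sketch of a train-ratio run; only the dataset arguments are from this README.
DATASET="BGL"
TRAIN_RATIO=0.1   # 0.1 to 0.7 in 0.1 increments

python train.py \
    -e 3 \
    -train_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}train.pkl \
    -eval_dataset ./logs_dataset/${DATASET}/${TRAIN_RATIO}test.pkl
```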
## Unstable Logs

### Run the Script for Llama3-ReFT

- Execute the following script:

  ```bash
  ./scripts/unstable_logs/llama3_reft.sh
  ```

- The settings are as per the script; adjust `INJECTION_RATIO` to one of `0.01`, `0.02`, `0.03`, `0.05`, `0.1`, `0.2`, or `0.3` (see the sketch below).

### Run the Script for Llama3-LoRA

- Execute the following script:

  ```bash
  ./scripts/unstable_logs/llama3_lora.sh
  ```

- The settings are as per the script; adjust `INJECTION_RATIO` to one of `0.01`, `0.02`, `0.03`, `0.05`, `0.1`, `0.2`, or `0.3`.
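In both scripts, the only setting that changes across runs is the injection ratio, e.g. (a sketch):

```bash
# In ./scripts/unstable_logs/llama3_reft.sh or llama3_lora.sh (sketch):
INJECTION_RATIO=0.05   # one of 0.01, 0.02, 0.03, 0.05, 0.1, 0.2, 0.3
```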
## Zero-shot

- First, set the train dataset by changing `DATASET_TRAIN` to one of `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`. Also remove `-do_eval` and ensure `-save_model` is set.
- Start training by running one of the following scripts:

  ```bash
  ./scripts/zero_shot/llama3_reft.sh
  ```

  or

  ```bash
  ./scripts/zero_shot/llama3_lora.sh
  ```

- Once the model is finetuned, locate the model directory in `results` (the directory is printed in the logs). Add `-my_model {$NAME_OF_MODEL}`, for example:

  ```bash
  -my_model ./results/REFT_HDFS_llama3HF_20240831080505129560 \
  ```

- Remove `-do_train` and add `-do_eval`. Also update `DATASET_TEST` to the desired dataset to test on: one of `"BGL"`, `"HDFS"`, `"Spirit"`, or `"Thunderbird"`. A two-stage sketch follows.
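Here is a sketch of the two stages, using only the flags named above; the `train.py` entry point and the dataset paths are assumptions (check the actual zero-shot scripts):

```bash
# Stage 1: finetune on the source dataset.
# Remove -do_eval and keep -save_model, as described above.
DATASET_TRAIN="HDFS"
python train.py -do_train -save_model \
    -train_dataset ./logs_dataset/${DATASET_TRAIN}/train.pkl

# Stage 2: evaluate the saved model on a different dataset.
# Remove -do_train, add -do_eval, and point -my_model at the saved directory.
DATASET_TEST="BGL"
python train.py -do_eval \
    -my_model ./results/REFT_HDFS_llama3HF_20240831080505129560 \
    -eval_dataset ./logs_dataset/${DATASET_TEST}/test.pkl
```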
## Other Methods

The other methods are contained in the `/other_methods` directory. I cloned each repository directly from its source and updated some of the code so that it can run on the common dataset in `logs_dataset`.

### LogADEmpirical

Source: https://github.com/LogIntelligence/LogADEmpirical/tree/dev

Run:

```bash
python ./other_methods/LogADEmpirical/main_run.py --config_file=<config_file>
# where <config_file> is the path to the configuration file,
# e.g., python ./other_methods/LogADEmpirical/main_run.py --config_file=./config/other_methods/LogADEmpirical/HDFS/cnn.yaml
```
### LogBERT

Source: https://github.com/HelenGuohx/logbert

1. Navigate to `/other_methods/logbert/`.
2. Navigate to a dataset folder, e.g., `BGL`, `HDFS`, `Tbird`, or `Spirit`.
3. Run `bash init.sh`.
4. Navigate back to `logbert`.
5. Copy `train.pkl` and `test.pkl` from `logs_dataset` of the respective dataset to the `output/${DATASET}` folder.
6. Run `bash running_script_${DATASET}.sh`, where `DATASET` is one of `bgl`, `hdfs`, `spirit`, or `tbird`. A worked example for BGL follows.
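For example, a full run for BGL might look like the following; the relative path back to `logs_dataset` is an assumption and depends on your checkout layout:

```bash
cd other_methods/logbert/BGL
bash init.sh
cd ..
# Copy the common splits into logbert's output folder (relative path assumed).
cp ../../logs_dataset/BGL/train.pkl ../../logs_dataset/BGL/test.pkl output/bgl/
bash running_script_bgl.sh
```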