This repository contains the code used to produce the results for my master's thesis. I find it useful to have access to the code when checking the implementations described in scientific papers, so here it is.
- Clone the repository and its `datasets` submodule[^1]

  ```shell
  git clone --recurse-submodules https://github.com/S1M0N38/master-thesis-code.git
  ```
- Enter the repository

  ```shell
  cd master-thesis-code
  ```
- Create a folder (or a symbolic link) to store experiments, i.e. training results

  ```shell
  mkdir experiments
  # This folder will become heavy by storing training results (checkpoints,
  # models' outputs, etc.), so you may want to create it where you have enough
  # space and then just create a symbolic link to it:
  # ln -s /path/to/experiments experiments
  ```
- Create a virtual environment with Python 3.10 (check with `python -V`)

  ```shell
  python -m venv .venv
  ```
- Activate the virtual environment

  ```shell
  source .venv/bin/activate
  ```
- Install the requirements

  ```shell
  python -m pip install -r requirements.txt
  ```
- Download the datasets
- Create symbolic links to the datasets

  ```shell
  # Symbolic link to CIFAR100
  ln -s path/to/cifar-100-python datasets/datasets/CIFAR100/inputs/cifar-100-python

  # Symbolic links to iNaturalist19
  # ln -s path/to/iNaturalist19/train datasets/datasets/iNaturalist19/inputs/train
  # ln -s path/to/iNaturalist19/val datasets/datasets/iNaturalist19/inputs/val
  # ln -s path/to/iNaturalist19/test datasets/datasets/iNaturalist19/inputs/test

  # Symbolic links to tieredImageNet
  # ln -s path/to/tieredImageNet/train datasets/datasets/tieredImageNet/inputs/train
  # ln -s path/to/tieredImageNet/val datasets/datasets/tieredImageNet/inputs/val
  # ln -s path/to/tieredImageNet/test datasets/datasets/tieredImageNet/inputs/test
  ```
The entire pipeline consists of the following steps:
- Train the model
- Test the model
- Evaluate testing results
- Visualize results
The training step requires at least one GPU (mine was ...) because training on a CPU is unbearably slow. Assuming you have activated the virtual environment, you can train a model using a configuration file with:
```shell
python train.py configs/CIFAR100/xe-onehot.toml
# This trains an EfficientNetB0 with cross-entropy loss and one-hot encoding
# on the CIFAR100 dataset. Use the other .toml files in configs or define your own.
```
Everything about training is defined in the TOML configuration file, whose key/value pairs are used to dynamically initialize the model, dataloaders, metrics, etc. (This project is based on the [🔥] template, so take a look at that to understand how it works under the hood.)
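For illustration, a configuration file might look like the following. Only the `name` key is confirmed by this document; the other tables and keys are hypothetical stand-ins for the kind of key/value pairs the template consumes:

```toml
# Hypothetical sketch of a training configuration; the actual schema is
# defined by the template this project is based on.
name = "xe-onehot"

[model]
class = "EfficientNetB0"

[loss]
class = "CrossEntropyLoss"

[dataloader]
dataset = "CIFAR100"
batch_size = 128
```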
If training started successfully, a new directory is created inside `experiments/CIFAR100` with the following naming scheme:

```
{MONTHDAY}_{HOURMINUTE}_{CONFIGHASH}_{NAME}
```
- The first part contains the date and time, so it is easy to sort experiments by creation time.
- `{CONFIGHASH}` is the hash of the configuration file, so it is easy to quickly group different experiments with exactly the same configuration.
- `{NAME}` is the name of the experiment, defined in the TOML file with the key `name`.
For example:

```
0707_1458_8bc6fb3e_xe-onehot
├── checkpoints
│   └── ...
├── runs
│   └── events.out.tfevents.1688741895.hostname.localhost.3233031.0
├── config.toml
└── trainer.log
```
where `config.toml` contains a copy of the configuration file specified in the previous command.
You can track the training progress by

- following the log file:

  ```shell
  tail -f experiments/CIFAR100/*/trainer.log
  ```

- using TensorBoard:

  ```shell
  tensorboard --logdir experiments/CIFAR100/
  ```
Model checkpoints (model graph and weights) will be saved inside `checkpoints`. In the next step, these checkpoints will be used to load the trained model into memory.
After training the model, we want to test it, i.e. run the test dataset through the model and store the results. Testing also requires a GPU. The testing script will:

- Save the model outputs
- Save the features extracted from the penultimate layer
- Perform an FGSM attack (targeted or untargeted)
- Save the model outputs and features produced by the adversarial inputs
```shell
python test.py configs/CIFAR100/xe-onehot.toml --epsilon 0.001
```
This will search `experiments` for all experiments that were trained using `configs/CIFAR100/xe-onehot.toml` as the configuration file and invite the user to choose one. Then it will ask for the target of the adversarial attack (suppose we choose `apple` as the target).
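The FGSM attack mentioned above perturbs each input in the direction of the sign of the loss gradient, scaled by `--epsilon`. A minimal sketch of the core step (the repository's actual implementation may differ; the function name and toy values below are illustrative):

```python
def fgsm_step(x, grad, epsilon, targeted=False):
    """Fast Gradient Sign Method perturbation of a flat list of features.

    Untargeted attacks move *up* the gradient of the loss w.r.t. the true
    label; targeted attacks move *down* the gradient of the loss w.r.t. the
    chosen target (e.g. "apple"), pushing the model toward that target.
    """
    sign = -1.0 if targeted else 1.0

    def sgn(g):
        return (g > 0) - (g < 0)  # -1, 0, or 1

    return [xi + sign * epsilon * sgn(g) for xi, g in zip(x, grad)]


# Toy usage: a loss gradient of [2.0, -0.5] with epsilon = 0.001
adv = fgsm_step([0.10, 0.20], [2.0, -0.5], epsilon=0.001)
```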
After testing, the experiment folder should contain a new directory named `results`:
```
0707_1458_8bc6fb3e_xe-onehot
├── results
│   ├── apple
│   │   ├── features-0.00100.npy
│   │   └── outputs-0.00100.npy
│   ├── features.npy
│   ├── outputs.npy
│   └── targets.npy
└── ...
```
`targets.npy` is simply a numpy array containing the targets of the test dataset (in the case of one-hot encoding, its values are simply integers). `outputs.npy` and `features.npy` respectively contain the model outputs and features obtained by feeding the model the images from the dataset. `{TARGET}/outputs-{EPSILON}.npy` and `{TARGET}/features-{EPSILON}.npy` are the model's outputs and features in the case of the adversarial attack. If the attack was untargeted, `{TARGET}` is `_`.
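These arrays can be inspected with NumPy. For instance, one could compare clean and adversarial top-1 accuracy with a sketch like the following (`accuracy` is a helper defined here, not part of the repository; the commented paths follow the tree above):

```python
import numpy as np


def accuracy(outputs: np.ndarray, targets: np.ndarray) -> float:
    """Top-1 accuracy from raw model outputs (N x C) and integer targets (N,)."""
    return float((outputs.argmax(axis=1) == targets).mean())


# Example with the files produced above:
# base = "experiments/CIFAR100/0707_1458_8bc6fb3e_xe-onehot/results"
# targets = np.load(f"{base}/targets.npy")
# clean = accuracy(np.load(f"{base}/outputs.npy"), targets)
# adv = accuracy(np.load(f"{base}/apple/outputs-0.00100.npy"), targets)
# print(f"clean: {clean:.3f}  adversarial: {adv:.3f}")
```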
Footnotes

[^1]: Alternatively, you can `git clone` https://github.com/S1M0N38/master-thesis-datasets and create the symbolic links.