- Training:
  - Baseline: `main.py` with `baseline` set to true (alternatively `run_main.sh`)
  - Subgraph: `main.py` (alternatively `run_main.sh`)
- Inference:
  - Baseline: `inference_baseline.py` (alternatively `run_inference_baseline.sh`)
  - Subgraph: `inference.py` (alternatively `run_inference.sh`)
- Saved Models: Models are stored in the `./save/` directory.
Refer to the following `.sh` files for examples:
- Training (both subgraph and baseline): `run_main.sh`
- Inference for baselines: `run_inference_baseline.sh`
- Inference for subgraphs: `run_inference.sh`
Refer to the csv file: dataset_info.csv
pip install -r requirements.txt
- `dataset`: Dataset name
  - Node Classification: cora, citeseer, pubmed, dblp, Physics
  - Node Regression: chameleon, squirrel, crocodile
  - Graph Classification: ENZYMES, AIDS, PROTEINS
  - Graph Regression: QM9, ZINC (subset)
- `experiment`: {fixed, random, few}
  - Parameter specific to Node Classification that controls how nodes are split into train, val and test sets.
    - fixed: cora, citeseer, pubmed
    - few: cora, citeseer, pubmed, dblp, Physics
    - random: dblp, Physics
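The dataset and split pairings listed above can be checked before launching a run. The following is a minimal sketch; `SUPPORTED_SPLITS` and `is_supported_split` are illustrative names and are not part of the repository.

```python
# Node-classification datasets available for each split type, copied from the
# list above. SUPPORTED_SPLITS and is_supported_split are hypothetical helpers.
SUPPORTED_SPLITS = {
    "fixed": {"cora", "citeseer", "pubmed"},
    "few": {"cora", "citeseer", "pubmed", "dblp", "Physics"},
    "random": {"dblp", "Physics"},
}


def is_supported_split(dataset: str, experiment: str) -> bool:
    """Return True if `experiment` is listed as available for `dataset`."""
    return dataset in SUPPORTED_SPLITS.get(experiment, set())


if __name__ == "__main__":
    print(is_supported_split("cora", "fixed"))
    print(is_supported_split("dblp", "fixed"))
```

Dataset names are compared case-sensitively here because the list above mixes lower-case names with `Physics`.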
- `runs`: default = 20
  - Number of times to run a node-level task.
- `baseline`: default = True
  - Whether to train the baseline model.
- `train_fitgnn`: default = False
  - Whether to train the FIT_GNN model.
  - Note: If both `baseline` and `train_fitgnn` are set to true, `train_fitgnn` takes precedence.
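The precedence rule between the two flags can be written out as a small function. This is only a sketch of the documented behaviour: `resolve_training_mode` is a hypothetical name, and the handling of the case where both flags are false is an assumption, since the README does not describe it.

```python
def resolve_training_mode(baseline: bool = True, train_fitgnn: bool = False) -> str:
    """Pick the model to train; train_fitgnn wins when both flags are true."""
    if train_fitgnn:
        return "fitgnn"
    if baseline:
        return "baseline"
    # Assumption: the README does not say what happens when both are false,
    # so this sketch treats it as a configuration error.
    raise ValueError("neither baseline nor train_fitgnn is enabled")


if __name__ == "__main__":
    print(resolve_training_mode())            # documented defaults
    print(resolve_training_mode(True, True))  # both set to true
```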
- `exp_setup`: {Gc_train_2_Gs_infer, Gs_train_2_Gs_infer, Gc_train_2_Gs_train}
  - Type of experiment setup to run.
    - Gc_train_2_Gs_infer: Train and val on Gc >> Test on Gs
    - Gs_train_2_Gs_infer: Train, val and test on Gs
    - Gc_train_2_Gs_train: Train and val on Gc >> transfer learnt weights >> Train, val and test on Gs
- `extra_node`: {True, False}
  - Boolean parameter to train the model by incorporating extra nodes.
- `cluster_node`: {True, False}
  - Boolean parameter to train the model by incorporating cluster nodes.
- `coarsening_ratio`: [0, 1]
  - Extent of coarsening. Values near 0 produce fewer subgraphs with more nodes in each; values near 1 produce a large number of subgraphs with fewer nodes in each.
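To make the trade-off concrete, the sketch below assumes the ratio is roughly the number of subgraphs divided by the number of nodes. That formula is an assumption consistent with the description above, not something the README states, and `approx_partition` is a hypothetical helper.

```python
from typing import Tuple


def approx_partition(num_nodes: int, coarsening_ratio: float) -> Tuple[int, float]:
    """Estimate (number of subgraphs, average nodes per subgraph).

    Assumption: num_subgraphs ~= coarsening_ratio * num_nodes. The actual
    count depends on the coarsening method and may differ.
    """
    if not 0.0 <= coarsening_ratio <= 1.0:
        raise ValueError("coarsening_ratio must lie in [0, 1]")
    num_subgraphs = max(1, round(coarsening_ratio * num_nodes))
    return num_subgraphs, num_nodes / num_subgraphs


if __name__ == "__main__":
    for ratio in (0.1, 0.5):
        print(ratio, approx_partition(1000, ratio))
```

Under this assumption, a graph with 1000 nodes yields about 100 subgraphs of 10 nodes each at a ratio of 0.1, and about 500 subgraphs of 2 nodes each at 0.5.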
- `coarsening_method`: {variation_neighborhoods, algebraic_JC, affinity_GS, kron}
  - Method used to coarsen graphs into subgraphs.
- `output_dir`:
  - Directory in which to save the best model.
- `task`: {node_cls, node_reg, graph_cls, graph_reg}
  - Type of node-level or graph-level task being performed.
- `multi_prop`: {True, False}
  - Boolean parameter specific to the QM9 `dataset` for the Graph Regression task. Should be set to True when running experiments on QM9, else False.
- `property`: {0, 1, ... , 18}
  - Parameter specific to the QM9 `dataset` for the Graph Regression task. Selects one of the 19 targets for prediction.
- `hidden`: default = 512
  - Number of hidden units in each hidden layer of the GNN.
- `epochs1`: default = 100
  - Parameter specific to the Gc_train_2_Gs_infer `exp_setup`. Number of epochs to train on Gc.
- `epochs2`: default = 300
  - Parameter specific to the Gs_train_2_Gs_infer `exp_setup`. Number of epochs to train on Gs.
- `num_layers1`: default = 2
  - Parameter specific to the Gc_train_2_Gs_infer `exp_setup`. Number of layers in the Gc training model.
- `num_layers2`: default = 2
  - Parameter specific to the Gs_train_2_Gs_infer `exp_setup`. Number of layers in the Gs training model.
- `train_ratio`: [0, 1], default = 0.3
  - Parameter specific to graph-level tasks. Fraction of the graphs in the dataset reserved for training.
- `val_ratio`: [0, 1], default = 0.2
  - Parameter specific to graph-level tasks. Fraction of the graphs in the dataset reserved for validation.
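The two ratios can be illustrated with a small index split. This is a sketch under stated assumptions: the graphs left over after the training and validation shares are used for testing, and the split is a seeded random shuffle. Neither detail is confirmed by the README, and `split_graph_indices` is a hypothetical helper.

```python
import random
from typing import List, Tuple


def split_graph_indices(
    num_graphs: int,
    train_ratio: float = 0.3,
    val_ratio: float = 0.2,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Split graph indices into train, val and test lists.

    Assumption: whatever remains after train and val becomes the test set.
    """
    if train_ratio < 0 or val_ratio < 0 or train_ratio + val_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    indices = list(range(num_graphs))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(train_ratio * num_graphs)
    n_val = int(val_ratio * num_graphs)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_graph_indices(100)
    print(len(train), len(val), len(test))
```

With the documented defaults and 100 graphs, this gives 30 training graphs, 20 validation graphs and 50 test graphs.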
- `use_community_detection`: default = False
  - When enabled, the Leiden algorithm is used to detect the top k communities, from which a proxy graph of a large graph is constructed.
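The documented defaults can be collected in one place with a hypothetical `argparse` parser. This is not the parser in `main.py`: the `--name` option spelling, the `str2bool` helper and the subset of parameters shown are assumptions made for illustration, so check `run_main.sh` for the actual invocation.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'true'/'false' style strings (hypothetical helper)."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose defaults mirror the values listed above."""
    parser = argparse.ArgumentParser(description="Sketch of the documented parameters")
    parser.add_argument("--dataset", type=str)
    parser.add_argument("--task", choices=["node_cls", "node_reg", "graph_cls", "graph_reg"])
    parser.add_argument("--runs", type=int, default=20)
    parser.add_argument("--baseline", type=str2bool, default=True)
    parser.add_argument("--train_fitgnn", type=str2bool, default=False)
    parser.add_argument("--hidden", type=int, default=512)
    parser.add_argument("--epochs1", type=int, default=100)
    parser.add_argument("--epochs2", type=int, default=300)
    parser.add_argument("--num_layers1", type=int, default=2)
    parser.add_argument("--num_layers2", type=int, default=2)
    parser.add_argument("--train_ratio", type=float, default=0.3)
    parser.add_argument("--val_ratio", type=float, default=0.2)
    parser.add_argument("--use_community_detection", type=str2bool, default=False)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--dataset", "cora", "--task", "node_cls"])
    print(args.dataset, args.task, args.hidden, args.epochs1, args.epochs2)
```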