Utility for tracking experiment execution inside Docker containers through Weights & Biases


This repository is useful for tracking experiments run within Docker containers by leveraging Weights & Biases cloud services.

In particular, it is designed around ns-3 simulations carried out via ns3-woss images, with the goal of improving the tracking and reproducibility of results.

Key features:

  • Automated experiment execution and tracking

  • Configuration based on shell environment variables and YAML files

  • Support for repeated experiment runs: increases results' robustness

  • Support for hyperparameter search via Weights & Biases Sweep: simplifies finding optimal tuning

  • Parallel execution: multi-processing used to execute independent runs or different sweep agents


The following requirements must be satisfied to use this tool:

  • Internet connection on machines for experiment execution

  • Docker CLI w/ Docker Compose plugin

  • Weights & Biases account w/ API subscription key

  • An ns-3 simulation script to track

Basic usage

Follow these steps to set up and run tracked experiments:

  1. Clone this repository to your Internet-connected simulation machine of choice and cd into the main directory:

git clone
cd docker-experiments-tracking

  2. Set the Docker image to use for your experiment in the docker-compose.yml file (e.g. fully replace egiona/ns3-woss:u18.04-n3.37-w1.12.4)

  3. Grant execute permission to the shell scripts in tracking/:

chmod +x tracking/*.sh

  4. Modify the environment variables file vars.env as desired:

  • WANDB_API_KEY must be set to a valid Weights & Biases API key for login purposes

  • PROJ_NAME must be set to a project name, which will appear within your profile

  • (Optional) EXP_CONFIG_FILE may be set to a different YAML file; this file contains the general configuration as well as the single experiment run setup (see next section for more)

  • (Optional) SWEEP_CONFIG_FILE may be uncommented to run a W&B Sweep and may be set to a different YAML file; this file contains the sweep configuration (see next section for more)

  • (Optional) EXP_ARGS may be modified to specify static experiment arguments

  5. Modify the experiment script under tracking/ as desired; it should run the ns-3 simulation script and write the metrics files, the log file, and any artifacts to the paths defined in vars.env

  6. Launch Docker Compose and enjoy your experiment tracking!

docker compose up

Note: to leave the experiments running after logging out of your simulation machine, use the following instead:

nohup docker compose up &

  7. (Optional) Remove the container once the experiment has finished:

docker compose down

Usage details

This tool runs Docker Compose, which creates a container from the specified image and mounts the tracking directory of this repository as a volume within the container under the path /home/tracking.

The user-provided simulation script is expected to:

  • write metrics files within the directory specified by $METRICS_DIRPATH (vars.env file);

  • optionally write a log file to the path specified by $LOGS_DIRPATH (vars.env file);

  • optionally produce artifacts within the directory specified by $ARTIFACTS_DIRPATH (vars.env file).

A Python script reads the metrics files and logs each entry using wandb.log; log files and every other file produced inside the directory specified by $ARTIFACTS_DIRPATH are pushed as artifacts.
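As a rough illustration of the contract above, the sketch below gathers the files a simulation script left behind, using only the environment variable names from vars.env. The function name collect_outputs is hypothetical, and descending into subdirectories of the artifacts directory is an assumption; the actual tracking script may behave differently.

```python
import os
from pathlib import Path


def collect_outputs(env=None):
    """Return (metrics_files, artifact_files) found in the configured directories.

    Hypothetical helper: it only lists files and does not talk to W&B.
    """
    env = os.environ if env is None else env
    metrics_dir = Path(env["METRICS_DIRPATH"])
    artifacts_dir = Path(env["ARTIFACTS_DIRPATH"])

    # Metrics files are expected directly inside the metrics directory.
    metrics_files = sorted(p for p in metrics_dir.iterdir() if p.is_file())
    # Assumption: artifacts may also live in subdirectories, so search recursively.
    artifact_files = sorted(p for p in artifacts_dir.rglob("*") if p.is_file())
    return metrics_files, artifact_files
```

Each returned metrics file would then be parsed and logged, while the artifact files would be uploaded as they are.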

Execution modes

This tool may be used to launch experiments in two execution modes: single experiment or sweep.

The tracking/config.yaml file specifies general Weights & Biases configuration and the single experiment mode setup.

General configuration is specified by the following reserved YAML entries, with one method/args pair each for metrics, log files, and artifacts:

    method:     --> name of the function to use to parse metrics
    args:       --> dictionary of metrics-parsing configuration
    method:     --> name of the function to use to parse logfiles
    args:       --> dictionary of logfile-parsing configuration
    method:     --> name of the function to use to parse artifacts
    args:       --> dictionary of artifacts-parsing configuration

Single experiment mode

Single experiment configuration is specified in the tracking/config.yaml file via the following reserved YAML entries:

  group-by:             --> grouping name for repeated runs or "auto"
  num-runs-limit:       --> upper bound for repeated runs
  num-runs-start:       --> lower bound for repeated runs
  parallel:             --> number of processes to execute runs

Repeated runs of a single experiment are obtained by iterating over the range [num-runs-start, num-runs-limit) and passing the current value via the --run parameter. If a specific name is provided via group-by, the runs are logged to Weights & Biases as a single group; with "auto", a different ID is generated for each run.

The remaining YAML keys are passed directly to the experiment script as arguments in the following format:

--key1=value1 ... --keyN=valueN
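The mapping from configuration to command-line arguments can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name build_run_args, the default bounds, and the exact --run=N spelling are assumptions, and the set of reserved keys only includes the entries listed above (the general parsing configuration would need to be excluded as well).

```python
# Assumption: only the four single-experiment entries are treated as reserved here.
RESERVED_KEYS = {"group-by", "num-runs-limit", "num-runs-start", "parallel"}


def build_run_args(config):
    """Return one argument list per repeated run.

    Non-reserved keys become --key=value; the run index is appended as
    --run=N for N in [num-runs-start, num-runs-limit).
    """
    static_args = [
        f"--{key}={value}"
        for key, value in config.items()
        if key not in RESERVED_KEYS
    ]
    start = config.get("num-runs-start", 0)  # default chosen for this sketch
    limit = config.get("num-runs-limit", 1)  # default chosen for this sketch
    return [static_args + [f"--run={run}"] for run in range(start, limit)]


example_config = {
    "group-by": "auto",
    "num-runs-start": 0,
    "num-runs-limit": 3,
    "parallel": 2,
    "nodes": 10,          # hypothetical experiment argument
    "protocol": "aloha",  # hypothetical experiment argument
}
for args in build_run_args(example_config):
    print(" ".join(args))
```

With this example configuration, three argument lists are produced, differing only in the trailing --run value (0, 1, and 2).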

Sweep mode

Sweep configuration is specified in the tracking/sweep.yaml file according to Weights & Biases guidelines, with the following additional reserved YAML entries:

  agents:               --> number of parallel sweep agents to run
  runs-per-sweep:       --> number of repeated runs per sweep instance

The value of agents determines how many W&B sweep agents are started, each in its own Python process. Each agent creates a sweep instance, with the configuration determined by the sweep algorithm in place, and executes a number of experiment runs equal to the runs-per-sweep value. Every run uses the same configuration provided by the sweep instance; only the --run parameter changes, taking the current run number.

Note: Repeated runs are not individually pushed to Weights & Biases. Instead, metrics are aggregated across runs and logged as metrics of a single sweep instance. By default, the aggregation method is the arithmetic mean across runs.
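The default aggregation described in the note above could look like the following sketch. The function name aggregate_runs and the per-run dictionary layout are assumptions made for illustration; only the arithmetic mean itself comes from this document.

```python
from statistics import fmean


def aggregate_runs(run_metrics):
    """Average each metric across repeated runs.

    run_metrics holds one {metric_name: value} dict per run. Only metrics
    present in every run are aggregated (a choice made for this sketch).
    """
    if not run_metrics:
        return {}
    common = set(run_metrics[0])
    for metrics in run_metrics[1:]:
        common &= set(metrics)
    return {
        name: fmean([metrics[name] for metrics in run_metrics])
        for name in sorted(common)
    }


runs = [
    {"throughput_avg": 10.0, "delay_avg": 2.0},
    {"throughput_avg": 14.0, "delay_avg": 4.0},
]
print(aggregate_runs(runs))  # {'delay_avg': 3.0, 'throughput_avg': 12.0}
```

The resulting dictionary is what would be logged once for the sweep instance, in place of the individual runs.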

Note: the random and bayes sweep methods may run indefinitely; validate your configuration accordingly.

Metrics format

Default metrics parsing operates on metrics exported to YAML files containing a type entry.

This repository ships with a pre-defined network-size type that is suitable for tracking network performance metrics at varying network sizes. It accepts files compliant with the following format:

type: network-size      --> Type for metrics parsing
metric-name:            --> Name of metrics tracked
x-axis:                 --> Name of X axis in a W&B Lineseries plot
y-axis:                 --> Title of Y axis in a W&B Lineseries plot
x-values:               --> List of values for the X axis
  - x1
  - x2
  - xN
y-values:               --> List of values for the Y axis
  - y1
  - y2
  - yN

This results in pushing to Weights & Biases the following information:

  • <metric-name>_at_xI with value equal to yI (for I in [1, N]);

  • <metric-name>_avg with value equal to the arithmetic mean of the Y values across all points of the X axis (useful for W&B Sweep metric configuration);

  • <metric-name>_series as a custom W&B Chart (Lineseries) with the usual performance vs network size shape.

The <metric-name>_series output is enabled only if the args entry in metrics (within parsing-setup configuration) contains the lineseries key and its value equals true, e.g.:

  method: "default"
  args: {lineseries: true}
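To make the derived values concrete, here is a small sketch that computes the _at_ and _avg entries from a network-size metrics mapping that has already been loaded from YAML. The function name parse_network_size is hypothetical, and the _series chart is left out because it requires the W&B client; this is not the repository's actual parser.

```python
def parse_network_size(metrics):
    """Derive the per-point and average values from a network-size mapping."""
    if metrics.get("type") != "network-size":
        raise ValueError("unsupported metrics type")
    name = metrics["metric-name"]
    xs, ys = metrics["x-values"], metrics["y-values"]
    if not xs or len(xs) != len(ys):
        raise ValueError("x-values and y-values must be non-empty and of equal length")

    # One entry per point: <metric-name>_at_<x> -> y
    result = {f"{name}_at_{x}": y for x, y in zip(xs, ys)}
    # Arithmetic mean of the Y values across all X points
    result[f"{name}_avg"] = sum(ys) / len(ys)
    return result


example = {
    "type": "network-size",
    "metric-name": "throughput",
    "x-axis": "Nodes",
    "y-axis": "Throughput",
    "x-values": [10, 20, 30],
    "y-values": [4.0, 3.0, 2.0],
}
print(parse_network_size(example))
```

For the example mapping, the result contains throughput_at_10, throughput_at_20, throughput_at_30, and throughput_avg, the last being 3.0.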

Custom metrics, logfile, and artifacts handling

Additional methods and arguments interpretations may be provided in the files and

Users must take care of updating Python dictionaries at the end of the aforementioned files after implementing their own methods, or they will not be able to use their custom handling routines.

Note: Custom functions must preserve the existing function signature; otherwise, further customization is required.

Note: Apart from custom metrics type handling, it is possible to specify custom aggregation methods for repeated runs within a W&B Sweep instance. By default, a simple arithmetic mean is used.
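The dictionary-based registration mentioned above generally follows a dispatch pattern like the one sketched here for aggregation methods. All names (AGGREGATION_METHODS, aggregate, and the handler functions) and the handler signature are hypothetical; check the actual dictionaries and signatures in the repository before adapting this.

```python
from statistics import fmean, median


# Hypothetical signature: a list of per-run values in, one aggregated value out.
def aggregate_mean(values):
    return fmean(values)


def aggregate_median(values):
    return median(values)


# Hypothetical registry: a custom method is usable only once it is listed here.
AGGREGATION_METHODS = {
    "mean": aggregate_mean,
    "median": aggregate_median,
}


def aggregate(values, method="mean"):
    """Look up the configured method by name and apply it."""
    try:
        handler = AGGREGATION_METHODS[method]
    except KeyError:
        raise ValueError(f"unknown aggregation method: {method!r}") from None
    return handler(values)


print(aggregate([1.0, 2.0, 9.0]))            # 4.0
print(aggregate([1.0, 2.0, 9.0], "median"))  # 2.0
```

Forgetting to add a new function to the dictionary leads to the lookup failure shown in aggregate, which is why the registration step matters.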


Copyright (c) 2023 Emanuele Giona

This repository is distributed under MIT License. However, software packages, tools, and other external components used may be subject to a different license, and the license chosen for this repository does not necessarily apply to them.


