User Manual

Ryan edited this page Mar 21, 2018 · 10 revisions

PyBioNetFit v0.1

Introduction

PyBioNetFit (PyBNF) is a tool for parameter fitting of models written in the BioNetGen language (BNGL). It currently runs on most Linux and macOS platforms.

Installation

Python

PyBNF requires Python version 3.5 or higher. Most new Linux and Mac operating systems ship with a suitable version. However, we recommend installing the Anaconda Python distribution for Python v3.5 or greater. Anaconda makes it easier to install and manage Python packages and to maintain multiple Python environments. Instructions for installing on various platforms can be found on the Anaconda website.

PyBNF

The pip package manager comes with Anaconda and should be used to install PyBNF from the command line.

Installing from PyPI (not yet available)

Simply type

pip install pybnf

to install the most recent version of PyBNF released on the Python Package Index.

Installing from source (developer mode)

Download the source code repository or clone the project and run from the root directory:

pip install -e .

This allows the user to make changes to the source code as desired, while still having access to the command line functionality anywhere in the filesystem.

(Optional) Configuring logging on remote machines (clusters or networked computers only)

By default, PyBNF logs to the file bnf.log to maintain a record of important events in the application. When running PyBNF on a cluster, some of the logs may be written while on a node distinct from the main thread. If these logs are desired, the user must configure the scheduler to retrieve these logs.

Upon installation of PyBNF, the dependencies dask and distributed should be installed. Installing them will create a .dask/ folder in the home directory with a single file: config.yaml. Open this file to find a logging: block containing information for how distributed outputs logs. Add the following line to the file, appropriately indented:

pybnf.algorithms.job: info

where info can be any string corresponding to a Python logging level (e.g. info, debug, warning).

Installation of Simulators

PyBNF is designed primarily to work with the simulator BioNetGen, version 2. The current BioNetGen distribution includes support for both network-based simulations and network-free simulations (via the NFSim software). BioNetGen can be installed from http://www.bionetgen.org.

PyBNF will need to know the location of BioNetGen – specifically the location of the script BNG2.pl within the BioNetGen installation. This path can be included in the PyBNF configuration file (see below). A convenient alternative is to set the environment variable BNGPATH to the BioNetGen directory with the following command, where /path/to/bng2 is the path of the folder that contains BNG2.pl:

export BNGPATH=/path/to/bng2 

This setting can be made permanent as of your next login, by copying the above command into the file .bash_profile in your home directory.
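
Before starting a fitting run, it can help to confirm that BNGPATH points where you expect. The following is a minimal sketch using only the Python standard library; the helper name find_bng2 is hypothetical and is not part of PyBNF.

```python
import os


def find_bng2(env=os.environ):
    """Return the path to BNG2.pl under BNGPATH, or None if it cannot be found."""
    bng_dir = env.get("BNGPATH")
    if not bng_dir:
        return None  # BNGPATH is unset or empty
    candidate = os.path.join(bng_dir, "BNG2.pl")
    return candidate if os.path.isfile(candidate) else None


path = find_bng2()
print(path if path else "BNGPATH is unset or does not contain BNG2.pl")
```

If the message is printed, check the export command above and that the folder really contains BNG2.pl.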

Quick Start

Setting Up a Fitting Job

Model Files

Models for fitting in PyBNF are plain text files written in BioNetGen language (BNGL). Documentation for BNGL can be found at http://www.csb.pitt.edu/Faculty/Faeder/?page_id=409.

Two small modifications of a BioNetGen-compatible BNGL file are necessary to use the file with PyBNF:

  1. Replace each value to be fit with a name that ends in the string “__FREE__”.

For example, if the parameters block in our original file was the following:

begin parameters

	v1 17
	v2 42
	v3 37
	NA 6.02e23

end parameters

the revised version for PyBNF should look like:

begin parameters

	v1 v1__FREE__
	v2 v2__FREE__
	v3 v3__FREE__
	NA 6.02e23

end parameters

We have replaced each fixed parameter value in the original file with a “__FREE__” parameter to be fit. Parameters that we do not want to fit (such as the physical constant NA) are left as is.

  2. Use the “suffix” argument to create a correspondence between your simulation command and your experimental data file.

For example, if your simulation call simulate({method=>"ode"}) generates data to be fit using the data file data1.exp, you should edit your call to simulate({method=>"ode", suffix=>"data1"}).
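
To double-check the first modification, you can list the names in a model file that follow the __FREE__ naming convention. The sketch below is only an illustration: the function free_parameters is hypothetical, and PyBNF's own parsing of BNGL files may differ.

```python
import re


def free_parameters(bngl_text):
    """Return the names ending in __FREE__, in order of first appearance."""
    seen = []
    for name in re.findall(r"\b\w+__FREE__\b", bngl_text):
        if name not in seen:
            seen.append(name)
    return seen


model = """begin parameters
    v1 v1__FREE__
    v2 v2__FREE__
    v3 v3__FREE__
    NA 6.02e23
end parameters
"""
print(free_parameters(model))  # ['v1__FREE__', 'v2__FREE__', 'v3__FREE__']
```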

Experimental Data Files

Experimental data files are plain text files with the extension “.exp” that contain whitespace-delimited tables of data to be used for fitting.

The first line of the .exp file is the header. It should contain the character #, followed by the names of each column. The first column name should be the name of the independent variable (e.g. “time” for a time course simulation). The rest of the column names should match the names of observables in the model file. The following lines should contain data, with numbers separated by whitespace. Use “nan” to indicate missing data. Here is a simple example of an exp file. In this case, the corresponding BNGL file should contain observables named X and Y:

#	time	X	Y
	0	5	1e4
	5	7	1.5e4
	10	9	4e4
	15	nan	6.5e4
	20	15	1.1e5
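
The format described above is simple enough to read with a few lines of Python. This sketch is not PyBNF's own reader; the function read_exp is a hypothetical helper that shows how the header, the whitespace-delimited rows, and the nan marker fit together.

```python
import io
import math


def read_exp(handle):
    """Parse a whitespace-delimited .exp table into {column name: list of floats}."""
    header = handle.readline().lstrip("#").split()
    columns = {name: [] for name in header}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        for name, field in zip(header, fields):
            columns[name].append(float(field))  # float("nan") marks missing data
    return columns


example = io.StringIO(
    "#\ttime\tX\tY\n"
    "\t0\t5\t1e4\n"
    "\t5\t7\t1.5e4\n"
    "\t10\t9\t4e4\n"
    "\t15\tnan\t6.5e4\n"
    "\t20\t15\t1.1e5\n"
)
data = read_exp(example)
print(data["time"])
print(math.isnan(data["X"][3]))  # the missing X value at time 15
```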

If you are fitting with the chi-squared objective function, you also need to provide a standard deviation for each experimental data point. To do so, include a column in the .exp file with "_SD" appended to the variable name. For example:

#	time	X	Y		X_SD	Y_SD
	0	5	1e4		1	2e2
	5	7	1.5e4	1.2	2e2
	10	9	4e4		1.4	4e2
	15	nan	6.5e4	nan	5e2
	20	15	1.1e5	0.9	5e2
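
To see why the _SD columns are needed, consider the usual textbook form of a chi-squared objective, in which each residual is divided by its standard deviation. The sketch below assumes that form and assumes that points with a nan value or nan standard deviation are skipped; PyBNF's exact normalization and nan handling may differ, and the function name chi_squared is hypothetical.

```python
import math


def chi_squared(sim, exp, sd):
    """Sum of ((sim - exp) / sd)**2 over points where exp and sd are both present."""
    total = 0.0
    for s, e, d in zip(sim, exp, sd):
        if math.isnan(e) or math.isnan(d):
            continue  # no data at this point, so it contributes nothing
        total += ((s - e) / d) ** 2
    return total


# Column X and X_SD from the example table, compared with a made-up simulation.
x_exp = [5, 7, 9, float("nan"), 15]
x_sd = [1, 1.2, 1.4, float("nan"), 0.9]
x_sim = [6, 7, 9, 12, 15]
print(chi_squared(x_sim, x_exp, x_sd))
```

Only the first point differs from the data here, by exactly one standard deviation, so it is the only point that contributes to the sum.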

The Configuration File

The configuration file is a plain text file with the extension “.conf” that specifies all of the information that PyBNF needs to perform the fitting: the location of the model and data files, and the details of the fitting algorithm to be run.

Several examples of .conf files are included in the examples/ folder.

Each line of a conf file has the general format config_key=value, which assigns the configuration key “config_key” to the value “value”.
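
As an illustration of the config_key=value format, the following sketch splits each line at the first equals sign. It is not PyBNF's parser, and the key names in the example are placeholders rather than real PyBNF configuration keys.

```python
def parse_conf(text):
    """Split each non-blank line at the first '=' into a key and a value."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected config_key=value, got: " + line)
        settings[key.strip()] = value.strip()
    return settings


# Placeholder keys, for illustration only.
print(parse_conf("config_key=value\nother_key = 3\n"))
```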

The available configuration keys to be specified are detailed in the sections below.

[During development, please refer to config_documentation.txt]

Constraint files

Constraint files are plain text files with the extension ".con" that contain inequality constraints to be imposed on the outputs of the model. Such constraints can be used to formalize qualitative data known about the biological system of interest.

Each line of the .con file should contain a constraint declaration consisting of three parts: an inequality to be satisfied, an enforcement condition that specifies when in the simulation time course the constraint is applied, and a weight indicating the penalty to add to the objective function if the constraint is not satisfied. The inequality and enforcement clauses are required. The weight may be omitted and defaults to 1.

Inequality

The inequality can consist of any relationship (<, >, <=, or >=) between two observables, or between one observable and a constant, for example A < 5 or A >= B. Note that < and <= are equivalent unless the min keyword is used (see Weight, below).

Enforcement

Four keywords are available to specify when the inequality is enforced.

always - Enforce the inequality at all time points during the simulation.
A < 5 always

once - Require that the inequality be true at at least one time point during the simulation.
A < 5 once

at - Enforce the inequality at one specific time point. This could be a constant time point:
A < 5 at 6 or equivalently, A < 5 at time=6

It is also possible to specify the time point in terms of another observable.
A < 5 at B=6 would enforce the inequality at the first time point such that B=6 (more exactly, the first time such that B crosses the value of 6 between two consecutive time steps).

Using similar syntax, we can specify that the constraint is enforced at every time B=6, not just the first, using the everytime keyword:
A < 5 at B=6 everytime
The first keyword says that the constraint should be enforced only the first time the condition is met (this is the default behavior, so this keyword is optional):
A < 5 at B=6 first

If the specified condition (B=6 in the example) is never met, then the constraint is not applied. It is often useful to add a second constraint to ensure that an "at" constraint is enforced. In this example, assuming the initial value of B is below 6, we could add the constraint B>=6 once to the file.

between - Enforce the inequality at all times between the two specified time points. The time points may be specified in the same format as with the at keyword above, and should be separated by a comma.
A < 5 between 7, B=6 would enforce the inequality from time=7 to the first time after time=7 such that B=6.

If the first condition (time=7 in the example) is never met, then the constraint is never enforced. If the second condition (B=6 in the example) is never met, then the constraint is enforced from the start time until the end of the simulation.
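
The enforcement keywords can be pictured as different ways of selecting time points from a simulated time course. The sketch below is an illustration, not PyBNF's implementation. All function names and the sample data are hypothetical, and it assumes that a crossing is evaluated at the later of the two time steps it falls between.

```python
def always(values, test):
    """True if the inequality holds at every time point."""
    return all(test(v) for v in values)


def once(values, test):
    """True if the inequality holds at one or more time points."""
    return any(test(v) for v in values)


def first_crossing(values, target):
    """Index of the first step at which values reaches or crosses target, else None."""
    for i in range(1, len(values)):
        if (values[i - 1] - target) * (values[i] - target) <= 0:
            return i
    return None


def below_5(a):
    return a < 5


times = [0, 2, 4, 6, 8]
A = [1, 2, 3, 4, 6]
B = [0, 3, 5, 7, 9]

print(always(A, below_5))            # A < 5 always
print(once(A, below_5))              # A < 5 once
print(below_5(A[times.index(6)]))    # A < 5 at 6
i = first_crossing(B, 6)
print(i is None or below_5(A[i]))    # A < 5 at B=6 (not applied if B never reaches 6)
```

With this sample data, the always constraint fails because A reaches 6 at the last time point, while the other three are satisfied.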

Weight

The weight clause consists of the weight keyword followed by a number. This number is multiplied by the extent of constraint violation to give the value to be added to the objective function. For example:
A < 5 at 6 weight 2
If the inequality A < 5 is not satisfied at time 6, then a penalty of 2*(A-5) is added to the objective function.

The min keyword indicates the minimum possible penalty to apply if the constraint is violated.
A < 5 at 6 weight 2 min 4
If the inequality A < 5 is not satisfied at time 6, the penalty is max(2*(A-5), 4). Since we used the strict < operator, the penalty of 4 is applied even if A=5 at time 6.
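
The weight and min rules above amount to a short calculation. Here is a minimal sketch for a constraint of the form A < bound or A <= bound; the function name and its arguments are hypothetical, chosen only to mirror the keywords.

```python
def penalty(a, bound, weight=1.0, minimum=0.0, strict=True):
    """Penalty for the constraint a < bound (strict) or a <= bound (not strict)."""
    violated = a >= bound if strict else a > bound
    if not violated:
        return 0.0
    return max(weight * (a - bound), minimum)


# A < 5 at 6 weight 2 min 4, for three possible values of A at time 6
print(penalty(4, 5, weight=2, minimum=4))  # satisfied
print(penalty(5, 5, weight=2, minimum=4))  # violated at the boundary, so min applies
print(penalty(8, 5, weight=2, minimum=4))  # violated, weighted penalty exceeds min
```

Without a min value, the strict and non-strict forms give the same penalty at A=5 (zero), which is why < and <= only differ when min is used.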

In some unusual cases, it is desirable to use a different observable for calculating penalties than the one used in the inequality. For example, the variable in the inequality might be a discrete variable, and it would be desirable to calculate the penalty with a corresponding continuous variable. This substitution may be made using the altpenalty keyword in the weight clause, followed by the new inequality to use for calculating the penalty.

A < 5 at B=3 weight 10 altpenalty A2<4 min 1
This constraint would check if A<5 when B reaches 3. If A >= 5 at that time, it instead calculates the penalty based on the inequality A2<4 with a weight of 10: 10*max(0, A2-4). If the initial inequality is violated but the penalty inequality is satisfied, then the penalty is equal to the min value (1 in the example), or zero if no min was declared.
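
The altpenalty rule can be sketched the same way. This example hard-codes the constraint from the text, with one inequality deciding whether a penalty applies and another deciding its size. The function name is hypothetical. The sketch also assumes that min acts as a floor on a nonzero penalty, by analogy with the min keyword described earlier, which the text does not state explicitly.

```python
def alt_penalty(a, a2, weight=10.0, minimum=1.0):
    """Penalty for: A < 5 at B=3 weight 10 altpenalty A2<4 min 1.

    a and a2 are the values of A and A2 at the time when B reaches 3.
    """
    if a < 5:
        return 0.0  # the constraint itself is satisfied
    # Constraint violated: size the penalty using A2 < 4 instead of A < 5.
    return max(weight * max(0.0, a2 - 4), minimum)


print(alt_penalty(4, 4.5))  # A < 5 holds, so no penalty
print(alt_penalty(6, 4.5))  # both violated: 10 * (4.5 - 4)
print(alt_penalty(6, 3.0))  # A < 5 violated but A2 < 4 holds: the min value
```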

Algorithms

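PyBNF contains 7 fitting algorithms: differential evolution, scatter search, particle swarm optimization, simulated annealing, Markov chain Monte Carlo, parallel tempering, and simplex. Each has its own section below.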

Differential Evolution

How it works

A population of individuals (points in parameter space) is iteratively evaluated with an objective function. Parent individuals from the current iteration are selected to form new individuals in the next iteration. Each new individual's parameters are derived by combining parameters from its parents.
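
To make the idea concrete, here is a minimal sketch of one common textbook variant of differential evolution (often written DE/rand/1/bin). It is not PyBNF's implementation. The function names, the scale factor f, and the crossover rate cr are assumptions for illustration and do not correspond to PyBNF configuration keys.

```python
import random


def propose(population, index, f=0.5, cr=0.9, rng=random):
    """Build one trial individual for population[index] from three other parents."""
    others = [p for i, p in enumerate(population) if i != index]
    a, b, c = rng.sample(others, 3)
    target = population[index]
    forced = rng.randrange(len(target))  # one coordinate always comes from the parents
    trial = []
    for j, x in enumerate(target):
        if j == forced or rng.random() < cr:
            trial.append(a[j] + f * (b[j] - c[j]))  # combine the three parents
        else:
            trial.append(x)  # keep the current individual's value
    return trial


def evolve(population, objective, iterations, rng=random):
    """Replace an individual whenever its trial scores at least as well."""
    scores = [objective(p) for p in population]
    for _ in range(iterations):
        for i in range(len(population)):
            trial = propose(population, i, rng=rng)
            score = objective(trial)
            if score <= scores[i]:
                population[i], scores[i] = trial, score
    return min(zip(scores, population))


def sum_of_squares(p):
    return sum(x * x for x in p)


rng = random.Random(0)
start = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(10)]
initial_best = min(sum_of_squares(p) for p in start)
best_score, best_point = evolve(start, sum_of_squares, 50, rng=rng)
print(initial_best, best_score)
```

Because an individual is only replaced by a trial that scores at least as well, the best score in the population can never get worse from one iteration to the next.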

Running in Parallel

PyBNF offers parallel, synchronous differential evolution. In each iteration, n simulations are run in parallel, but all must complete before moving on to the next iteration. It also offers parallel, asynchronous differential evolution, in which the current population consists of m islands. Each island is able to move on to the next iteration even if other islands are still in progress. If m is set to the number of available processors, then processors will never sit idle. Note however that this might not be the most efficient thing to do.
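
The synchronous case can be pictured as a barrier at the end of each iteration. The sketch below uses a thread pool from the Python standard library as a stand-in for PyBNF's actual scheduler, and the function name evaluate_generation is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_generation(objective, individuals, workers=4):
    """Score every individual in parallel and return only when all are done."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list(...) waits for every result, so the slowest simulation
        # determines when the next iteration can start.
        return list(pool.map(objective, individuals))


def sum_of_squares(p):
    return sum(x * x for x in p)


print(evaluate_generation(sum_of_squares, [[1, 2], [0, 0], [3]]))  # [5, 0, 9]
```

In the asynchronous island scheme, each island would run a loop like this independently, so a slow simulation holds up only its own island.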

When to use it

In our experience, differential evolution tends to be the best general-purpose algorithm, and we suggest it as a starting point for a new fitting problem if you are unsure which algorithm to choose.

Configuration options

Scatter Search

How it works

Scatter Search functions similarly to differential evolution, but maintains a current population that is smaller than the number of available processors. In each iteration, every possible pair of individuals is combined to propose a new individual.
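
Pairing every individual with every other is what lets a small population keep many processors busy, because the number of pairs grows quadratically with the population size. The sketch below enumerates unordered pairs and combines each pair by taking the midpoint. Both choices are assumptions made for illustration; the text above does not specify how PyBNF combines a pair.

```python
from itertools import combinations


def propose_from_pairs(population):
    """Yield one proposal per unordered pair; here, simply the midpoint of the pair."""
    for p, q in combinations(population, 2):
        yield [(x + y) / 2 for x, y in zip(p, q)]


population = [[0, 0], [2, 2], [4, 0]]
proposals = list(propose_from_pairs(population))
print(proposals)  # [[1.0, 1.0], [2.0, 0.0], [3.0, 1.0]]
```

A population of n individuals yields n*(n-1)/2 proposals under this assumption, so 10 individuals would give 45 simulations per iteration.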

Particle Swarm Optimization

How it works

In particle swarm optimization, each parameter set is represented by a particle moving through parameter space at some velocity. Each particle accelerates towards the

Running in Parallel

Particle swarm optimization is fundamentally an asynchronous, parallel algorithm. As soon as one simulation completes, that particle can calculate its next parameter set and begin a new simulation. Processors will never remain idle, and adding an arbitrarily large number of processors will continue to improve the performance of the algorithm [citation needed].
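
The asynchronous pattern described above can be sketched with the Python standard library: whenever any evaluation finishes, that particle is immediately given a new point and resubmitted. This is only an illustration of the scheduling idea. The function names are hypothetical, a thread pool stands in for PyBNF's scheduler, and the move rule (step halfway toward the best point seen so far) is a placeholder rather than the particle swarm velocity update.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_async(objective, start_points, next_point, total_evals, workers=4):
    """Keep workers busy: resubmit a particle as soon as its evaluation returns."""
    best_score, best_point = float("inf"), None
    submitted = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {}
        for point in start_points:
            pending[pool.submit(objective, point)] = point
            submitted += 1
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                point = pending.pop(future)
                score = future.result()
                if score < best_score:
                    best_score, best_point = score, point
                if submitted < total_evals:
                    new_point = next_point(point, best_point)
                    pending[pool.submit(objective, new_point)] = new_point
                    submitted += 1
    return best_score, best_point


calls = []


def sum_of_squares(p):
    calls.append(1)  # count evaluations
    return sum(x * x for x in p)


def halfway_to_best(point, best):
    # Placeholder move rule, not the particle swarm update.
    return [(x + b) / 2 for x, b in zip(point, best)]


starts = [[4, 4], [-2, 1], [3, -3]]
best_score, best_point = run_async(sum_of_squares, starts, halfway_to_best, 12)
print(best_score, best_point, len(calls))
```

No particle waits for the others to finish an iteration, which is the property that keeps processors from sitting idle.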

When to use it

Particle swarm optimization becomes advantageous over the other available algorithms when many processors are available (>100). Be warned that if your problem is under-constrained, this algorithm tends to choose parameters that sit on the edge of box constraints. This solution is arguably fine, but makes it very obvious to a reader that your model is under-constrained.

Simulated Annealing

Markov Chain Monte Carlo

Parallel Tempering

Simplex

Running on a cluster

PyBNF is designed to run on computing clusters that utilize a shared network filesystem, regardless of what cluster manager is used (Slurm, Torque, etc.). The user is expected to interact with the cluster manager to allocate cluster nodes for the job, and then tell PyBNF which nodes to run on.

The Dask.Distributed package, which you installed as a dependency of PyBNF, provides the scheduler that PyBNF uses to handle simulations in distributed computing environments (clusters). More information is available in the Dask and Dask.Distributed documentation.

While users can likely install PyBNF on a cluster using pip's --user flag, assistance from the cluster administrators may be helpful.

SLURM

The user may run PyBNF interactively or as a batch job using the salloc or sbatch commands, respectively. Note that the user must have set up their Python environment prior to running PyBNF on a cluster.

Interactive (quickstart)

Execute the salloc -Nx command, where x is an integer denoting the number of nodes the user wishes to allocate.

Log in to one of the allocated nodes with the slogin command.

Load the appropriate Python environment.

Initiate a PyBNF fitting run.

Batch

Write a shell script specifying the desired nodes and their properties according to SLURM specifications.

Submit the batch job to the queueing system using the sbatch script.sh command, where script.sh is the name of the shell script.

TORQUE/PBS

Not yet implemented