Analysis scripts for light sheet microscopy and the cerebellar tracing project, using a Slurm-based computing cluster.
Includes a three-dimensional CNN with a U-Net architecture (Gornet et al., 2019; K. Lee, Zung, Li, Jain, & Sebastian Seung, 2017) with added packages developed by Kisuk Lee (Massachusetts Institute of Technology), Nick Turner (Princeton University), James Gornet (Columbia University), and Kannan Umadevi Venkataraju (Cold Spring Harbor Laboratory).
- Things you will need to do beforehand:
- Elastix needs to be compiled on the cluster. This was challenging for our IT and I suspect it will be for yours as well.
- After downloading this package onto your data server (where the cluster has access to it), you will need to install the following dependencies. I suggest using a Python environment to do this.
- This package was made for Linux/OSX, not Windows. If running Windows, I suggest using a virtual machine: (1) download VirtualBox, (2) download Ubuntu Linux, (3) install the VM.
Create an Anaconda Python environment (install Anaconda first if you have not already):
I suggest naming the environment 'lightsheet' (with Python 3.7+) to help with setup.
$ conda create -n lightsheet python=3.7.3
$ conda activate lightsheet
$ pip install opencv-python scikit-image==0.15.0 scikit-learn seaborn tqdm numba natsort tifffile numpy==1.20.2 scipy pandas h5py==2.9.0 SimpleITK matplotlib futures xvfbwrapper xlrd openpyxl cython tensorboardX torch torchvision tensorflow
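After the install finishes, a quick import check can confirm that the environment is usable. The snippet below is only a sketch: it covers a subset of the packages listed above, and the mapping from pip names to import names (for example `opencv-python` to `cv2`) is maintained by hand, so extend it as needed.

```python
import importlib

# pip name -> import name, for a subset of the packages installed above.
# Several packages are imported under a different name than they are installed with.
PIP_TO_IMPORT = {
    "opencv-python": "cv2",
    "scikit-image": "skimage",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "scipy": "scipy",
    "pandas": "pandas",
    "h5py": "h5py",
    "SimpleITK": "SimpleITK",
    "tifffile": "tifffile",
    "torch": "torch",
}


def find_missing(pip_to_import):
    """Return the pip names whose import name fails to import."""
    missing = []
    for pip_name, import_name in pip_to_import.items():
        try:
            importlib.import_module(import_name)
        except Exception:
            # Broad catch on purpose: a broken binary install can raise
            # errors other than ImportError, and we still want it reported.
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = find_missing(PIP_TO_IMPORT)
    print("missing packages:", ", ".join(missing) if missing else "none")
```

Run it inside the activated 'lightsheet' environment; anything reported as missing can be reinstalled with pip.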
If on a local machine:
$ sudo apt-get install elastix
$ sudo apt-get install xvfb
If on a local machine, make sure you have all the boost libraries installed (important for working with torms3's DataTools)
$ sudo apt-get install libboost-all-dev
Navigate to tools/conv_net
and clone the necessary C++ extension scripts for working with DataProvider3:
$ git clone
Go to the dataprovider3 and DataTools directories in tools/conv_net
and (making sure you have your new lightsheet conda environment activated) run (for each directory):
$ python install
Then go to the augmentor directory in tools/conv_net
and (making sure you have your new lightsheet conda environment activated) run:
$ pip install -e .
- Download and unpack TeraStitcher:
$ bash
- Modify PATH in ~/.bashrc:
export PATH="<path/to/software>TeraStitcher-Qt4-standalone-1.16.11-Linux/bin:$PATH"
- Check to see if successful
$ which terastitcher
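The same check can be done from Python, which is convenient inside scripts that later call these tools. This is a small sketch using only the standard library; the executable names `terastitcher` and `elastix` are the ones used in this README.

```python
import shutil


def locate_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools(["terastitcher", "elastix"]).items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If a tool is reported as not found, open a new shell (so that ~/.bashrc is re-read) or load the matching module on the cluster and try again.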
- Need to load anacondapy 5.3.1 on cluster (something like):
module load anacondapy/5.3.1
- Need to load elastix on cluster (something like):
module load elastix/4.8
- Need to then activate the Python environment where everything is installed (if your environment is named 'lightsheet' then you do not need to change this):
. activate <<<your python environment>>>
- Check to make sure your slurm job dependencies and batch structure are similar to what our cluster uses.
- Each of these needs the same changes as file, e.g.
module load anacondapy/5.3.1
module load elastix/4.8
. activate <<<your python environment>>>
- Check/change the resource allocations and email alerts at the top of each .sh file based on cluster and settings
- Add your paths for BOTH the cluster and local machine
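Because the same file server is usually mounted at different locations on the local machine and on the cluster, the path handling boils down to swapping one root for another. The sketch below illustrates that idea only; it is not the package's own function, and both mount points are hypothetical placeholders to replace with your own.

```python
from pathlib import PurePosixPath

# Hypothetical mount points of the same file server; replace with your own.
LOCAL_ROOT = PurePosixPath("/home/user/mounts/labserver")
CLUSTER_ROOT = PurePosixPath("/cluster/labserver")


def reroot(path, old_root, new_root):
    """Move `path` from under `old_root` to under `new_root`.

    Paths that are not under `old_root` are returned unchanged.
    """
    path = PurePosixPath(path)
    try:
        relative = path.relative_to(old_root)
    except ValueError:
        # Not under old_root: nothing to translate.
        return str(path)
    return str(new_root / relative)


if __name__ == "__main__":
    local = "/home/user/mounts/labserver/brain01/raw"
    print(reroot(local, LOCAL_ROOT, CLUSTER_ROOT))
```

Translating in one place keeps the parameter dictionaries identical on both machines, so only the two root constants differ between setups.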
- main GPU-based scripts are located in the pytorchutils directory
--> training- lines 64-98: modify data directory, train and validation sets, and named experiment directory (in which the experiment directory of logs and model weights is stored)
--> inference- lines 57 & 65: modify experiment and data directory
--> large-scale inference- lines 82 & 90: modify experiment and data directory
- if working with a slurm-based scheduler:
- modify
- use
python pytorchutils/ -h
for more info on command line arguments
- modify parameters (stride, window, # of iterations, etc.) in the main parameter dictionaries
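To make the stride and window parameters concrete, here is a generic sliding-window sketch along one axis. It shows how the two values interact in general and is not taken from this package; the function name and the choice to add a final shifted window are assumptions made for illustration.

```python
def window_starts(length, window, stride):
    """Start indices of windows of size `window`, stepped by `stride`.

    A final window is shifted back to end exactly at `length` when the
    regular steps would leave the end of the axis uncovered.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if window > length:
        raise ValueError("window is larger than the axis length")
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] + window < length:
        starts.append(length - window)
    return starts


if __name__ == "__main__":
    # stride < window gives overlapping windows
    print(window_starts(100, 40, 25))
```

A stride smaller than the window produces overlapping patches, which costs more compute but reduces edge effects when the patches are merged.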
--> CPU-based pre-processing and post-processing- output is a "3dunet_output" directory containing a '[brain_name]_cell_measures.csv'
- if working with a slurm-based scheduler,
--> chunks full sized data from working processed
--> reconstructs and uses connected components to find cell measures- these need the same changes as
file, e.g.
module load anacondapy/5.3.1
. activate <<<your python environment>>>
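Once post-processing has finished, a quick way to confirm that output was produced is to look for the cell measures files and count their rows. This sketch assumes each CSV has one header row followed by one row per detected cell; the column layout is not assumed, and the directory path in the usage line is a placeholder.

```python
import csv
from pathlib import Path


def count_cells(output_dir):
    """Return {file name: number of data rows} for each '*_cell_measures.csv'."""
    counts = {}
    for csv_path in sorted(Path(output_dir).glob("*_cell_measures.csv")):
        with open(csv_path, newline="") as handle:
            rows = list(csv.reader(handle))
        # Assumption: the first row is a header, so it is not counted.
        counts[csv_path.name] = max(len(rows) - 1, 0)
    return counts


if __name__ == "__main__":
    # Placeholder path: point this at your own "3dunet_output" directory.
    for name, n_rows in count_cells("3dunet_output").items():
        print(f"{name}: {n_rows} rows")
```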
- Open
- For each brain modify:
- NOTE: we've noticed that elastix (registration software) can have issues if there are spaces in the path name. I suggest removing ALL spaces from paths.
- Then I suggest running 'step 0' on a local machine (be sure that
is edited beforehand):
preprocessing.generateparamdict(os.getcwd(), **params)
if not os.path.exists(os.path.join(params['outputdirectory'], 'lightsheet')):
    shutil.copytree(os.getcwd(), os.path.join(params['outputdirectory'], 'lightsheet'),
- Why: this creates the folder where the data will be generated, which allows multiple brains to be run on the cluster at once.
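Since spaces in paths can break elastix, it may be worth scanning the parameter dictionary for them before running step 0. The sketch below walks nested dicts, lists, and tuples and reports every string that contains a space. It cannot tell paths from other strings, so treat its output as a list of candidates; the example values are hypothetical.

```python
def strings_in(obj):
    """Yield every string found in nested dicts (keys included), lists, and tuples."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from strings_in(key)
            yield from strings_in(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from strings_in(item)


def find_spaces(params):
    """Return all strings in `params` that contain at least one space."""
    return [text for text in strings_in(params) if " " in text]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    example = {
        "outputdirectory": "/data/my brain",
        "inputdictionary": {"/raw/ch 1": [["regch", "00"]]},
    }
    print(find_spaces(example))
```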
- then using the cluster's headnode (in the new folder's lightsheet directory generated from the previous step) submit the batch job:
file to be used to submit to a slurm scheduler- this can change depending on scheduler+cluster but generally batch structure requires 2 variables to pass to
= controlling which 'step' to run
jobid = controlling the job ID (iteration) of each step
- Steps:
0: set up dictionary and save; requires a single job (jobid=0)
1: process (stitch, resize) zplns; ensure that 1000 > zplns/slurmfactor. Typically submit 80 jobs for LBVT (jobid=0-80).
2: resample and combine; typically submit 3 jobs (requires 1 job/channel; jobid=0-3)
3: registration via elastix
(will add to this)
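The constraint in step 1 can be checked before submitting. The sketch below assumes that each array job handles `slurmfactor` z-planes, so the number of jobs is the plane count divided by `slurmfactor`, rounded up. That reading of `slurmfactor` is an assumption, and the function name is hypothetical.

```python
import math

MAX_RATIO = 1000  # limit quoted in the step 1 description: 1000 > zplns/slurmfactor


def step1_job_count(n_zplanes, slurmfactor):
    """Number of step 1 jobs, assuming each job handles `slurmfactor` z-planes."""
    if n_zplanes <= 0 or slurmfactor <= 0:
        raise ValueError("n_zplanes and slurmfactor must be positive")
    if n_zplanes / slurmfactor >= MAX_RATIO:
        raise ValueError(
            f"zplns/slurmfactor = {n_zplanes / slurmfactor:.1f} is not below "
            f"{MAX_RATIO}; increase slurmfactor"
        )
    return math.ceil(n_zplanes / slurmfactor)


if __name__ == "__main__":
    # Example numbers chosen for illustration only.
    print(step1_job_count(4000, 50))
```

If the check fails, increasing `slurmfactor` lowers the ratio at the cost of longer individual jobs.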
file to be used to manage the parallelization to a SLURM cluster- inputdictionary and params need to be changed for each brain
- the function
REQUIRES MODIFICATION for both your local machine and cluster. This function handles different paths to the same file server.
- generally, the process is: using a local machine, run step 0 (be sure that files are saved *BEFORE* running this step) to generate a folder where data will be stored
- then using the cluster's headnode (in the new folder's lightsheet directory generated from the previous step) submit the batch job:
file to be used to manage the parallelization of CNN preprocessing to a SLURM cluster- params need to be changed per cohort.
- see the tutorial for more info.
tools: convert 3D STP stack to 2D representation based on colouring
- imageprocessing:
: functions used to preprocess, stitch, 2D cell detect, and save light sheet images
- analysis:
: simple function used to generate atlas list of structures in coordinate space- other functions useful when comparing multiple brains that have been processed using the pipeline
- imageprocessing:
, image used to generate registration visualization
allen_id_table.xlsx, list of structures from Allen Brain Atlas used to determine anatomical correspondence of xyz location.
- folder consisting of elastix parameter files with prefixes
to specify application order
- folder consisting of elastix parameter files with prefixes
- demo script to run training and large-scale inference
- useful to make sure the environment and modules are imported correctly
- if working with a slurm-based scheduler:
- run
within the tools/conv_net directory
- make sure you have an environment set up under your cluster username named "3dunet" or "lightsheet" that has the dependencies described in the installation instructions
- NOTE: the environments "3dunet" and "lightsheet" are sometimes used interchangeably in all bash scripts (but represent the same environment)
- make sure you have the correct environment name in your bash scripts before executing them
- you will also need CUDA installed under your username; check with IT on how to set up CUDA properly under your cluster username
- load the modules and environment in the bash script as such:
module load cudatoolkit/10.0 cudnn/cuda-10.0/7.3.1 anaconda3/5.3.1
. activate <<<your python environment>>>
- else, navigate to tools/conv_net; in the terminal, in the lightsheet environment, run:
$ python
$ cd pytorchutils/
$ python demo models/ samplers/ augmentors/ 10 --batch_sz 1 --nobn --noeval --tag demo
- output will be in a 'tools/conv_net/demo/cnn_output' subfolder (as a TIFF)
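Before running the demo on a GPU node, it can help to confirm that PyTorch sees CUDA, since the CUDA setup mentioned above is a common source of problems. This is a minimal sketch that uses only `torch.cuda.is_available()` and `torch.cuda.device_count()`, and it degrades gracefully if torch is not installed.

```python
def cuda_status():
    """Return a short, human-readable description of the CUDA setup."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.device_count()} device(s)"
    return "torch is installed but CUDA is not available"


if __name__ == "__main__":
    print(cuda_status())
```

On a cluster, run this inside a job on a GPU node after loading the modules; the head node often has no GPU, so a negative result there is not conclusive.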