Searches for candidate biomarkers in RNA sequencing (RNA-Seq) data.
Author: Andrew E. Davidson
Set up a conda environment with the required packages:
$ conda create --name findBiomarkers --file requirements.txt
You may need to install TensorFlow and Jupyter Notebook manually, as follows:
$ conda activate findBiomarkers
(findBiomarkers) $ pip install tensorflow
(findBiomarkers) $ pip install -q git+
(findBiomarkers) $ conda install -c conda-forge notebook
(findBiomarkers) $ conda install -c conda-forge jupyter_contrib_nbextensions
- clone
- download the required data set
- you can find a copy at s3://
- or run Rob Currie's ingest notebook
cd ~/workSpace/UCSC/findBiomarkers
conda activate findBiomarkers
export PYTHONPATH="${PYTHONPATH}:`pwd`/src"
cd src/test
python -m unittest discover .
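For orientation, `python -m unittest discover .` collects files in the starting directory whose names match the default pattern `test*.py` and runs every `unittest.TestCase` method whose name starts with `test`. The sketch below shows the shape of such a file. The file name `testExample.py` and the helper `normalizeCounts` are hypothetical, chosen only for illustration, and are not part of this repository.

```python
# testExample.py: hypothetical file name that matches the default
# discovery pattern test*.py used by `python -m unittest discover .`
import unittest


def normalizeCounts(counts):
    """Hypothetical helper: scale raw counts so that they sum to 1.0."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return [c / total for c in counts]


class TestNormalizeCounts(unittest.TestCase):
    # Method names must start with "test" to be picked up by the loader.
    def testSumsToOne(self):
        self.assertAlmostEqual(sum(normalizeCounts([10, 30, 60])), 1.0)

    def testZeroTotalRaises(self):
        with self.assertRaises(ValueError):
            normalizeCounts([0, 0])
```

In a real test the helper would be imported from a module under `src`, which is why `PYTHONPATH` is extended before the tests are run.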
Jupyter notebooks
- basic exploration to get an idea of how to train a classifier
- checks whether the data set is balanced; use it to suggest data subsets to train with
- binary classifier
- evaluates how well the model works
- used to develop dataUtilities/
- test dataUtilities/
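As a rough illustration of the balance check mentioned in the list above, the following sketch computes the fraction of samples in each class and flags a data set whose classes deviate from an even split. It is not the code used in the notebooks: the function names, the `tolerance` parameter, and the sample labels are all assumptions made for this example.

```python
from collections import Counter


def classBalance(labels):
    """Return {label: fraction of samples} for a list of class labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def isBalanced(labels, tolerance=0.1):
    """True if every class fraction is within `tolerance` of an even split.

    The default tolerance of 0.1 is an arbitrary choice for this sketch.
    """
    fractions = classBalance(labels)
    even = 1.0 / len(fractions)
    return all(abs(f - even) <= tolerance for f in fractions.values())


# Hypothetical labels: three tumor samples and one normal sample.
labels = ["tumor", "tumor", "tumor", "normal"]
print(classBalance(labels))
print(isBalanced(labels))
```

When a data set turns out to be unbalanced, one option is to train on a subset in which the larger class has been down-sampled to the size of the smaller one.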
Deprecated Treehouse notebooks. They use a data set that combines the TCGA-TARGET-GTEx data sets with Treehouse childhood cancer data.
- lungCancerClassifierExploration-TreeHouse.ipynb
- lungCancerClassifier-TreeHouse.ipynb
- lungCancerClassifierEvaluation-Treehouse.ipynb