Searches for candidate biomarkers in RNA-Seq data
Author: Andrew E. Davidson aedavids@ucsc.edu https://github.com/aedavids/findBiomarkers
bin/startNotesbooks.sh
set up conda env with required packages
$ conda create --name findBiomarkers --file requirements.txt
You may need to install TensorFlow and Jupyter Notebook manually, as follows:
$ conda activate findBiomarkers
(findBiomarkers) $ pip install tensorflow
(findBiomarkers) $ pip install -q git+https://github.com/tensorflow/docs
(findBiomarkers) $ conda install -c conda-forge notebook
(findBiomarkers) $ conda install -c conda-forge jupyter_contrib_nbextensions
- clone https://github.com/aedavids/findBiomarkers
- download the required data set
- you can find a copy at s3://bme-230a.santacruzintegration.com/tcga_target_gtex.h5
- or run Rob Currie's ingest notebook
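After downloading, a quick sanity check can confirm that the file is really HDF5 before a notebook tries to open it. The sketch below is not part of the repository; `looks_like_hdf5` is a hypothetical helper name, and it assumes the file has no user block, so the 8-byte HDF5 signature sits at offset 0.

```python
from pathlib import Path

# The 8-byte signature that begins an HDF5 file with no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature.

    Hypothetical helper for illustration; not part of findBiomarkers.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

For example, `looks_like_hdf5("tcga_target_gtex.h5")` should return `True` for a complete download saved in the current directory; the local path is an assumption.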
cd ~/workSpace/UCSC/findBiomarkers
conda activate findBiomarkers
export PYTHONPATH="${PYTHONPATH}:`pwd`/src"
cd src/test
python -m unittest discover .
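With its default settings, `python -m unittest discover .` collects `unittest.TestCase` subclasses from files whose names match `test*.py`. As a rough illustration of what discovery picks up, here is a minimal sketch; `ExampleTest` and `run_tests` are hypothetical names, not tests from this repository.

```python
import unittest


class ExampleTest(unittest.TestCase):
    """Hypothetical test case; not taken from the repository."""

    def test_addition(self):
        # Any method whose name starts with "test" is run as a test.
        self.assertEqual(1 + 1, 2)


def run_tests():
    """Load and run ExampleTest, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Saving a class like this in a file named, say, `testExample.py` under `src/test` would let the discover command find it.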
Jupyter notebooks
- lungCancerClassifierExploration.ipynb
  - basic exploration to get an idea of how to train a classifier
-
  - checks whether the data set is balanced; use it to suggest data subsets to train with
-
  - binary classifier
- lungCancerClassifierEvaluation.ipynb
  - how well does the model work?
- TCGA_Target_GTexPrototype.ipynb
  - used to develop dataUtilities/TCGA_Target_GTex.py
-
  - tests dataUtilities/TCGA_Target_GTex.py
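The balance check mentioned in the notebook list above could look roughly like the following. This is a sketch under stated assumptions, not the notebook's actual code: `class_balance`, `is_balanced`, and the `tolerance` parameter are hypothetical, and labels are assumed to be a plain iterable of class names.

```python
from collections import Counter


def class_balance(labels):
    """Return {label: fraction of samples} for an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def is_balanced(labels, tolerance=0.1):
    """True if every class fraction is within `tolerance` of an even split."""
    fractions = class_balance(labels)
    if not fractions:
        return True
    even = 1 / len(fractions)
    return all(abs(f - even) <= tolerance for f in fractions.values())
```

A data set that fails the check suggests training on a subset in which each class is sampled more evenly.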
Deprecated Treehouse notebooks
These use a data set that combines the TCGA-Target-GTEx data sets with Treehouse childhood cancer data.
- lungCancerClassifierExploration-TreeHouse.ipynb
- lungCancerClassifier-TreeHouse.ipynb
- lungCancerClassifierEvaluation-Treehouse.ipynb