NanoRMS: predicting NANOpore Rna Modification Stoichiometry

Below you'll find information how to retrieve pre-read features and estimate modification stoichiometry from Nanopore raw data.

First install all dependencies

from conda: conda install hdf5 minimap2
from pip: pip install jupyterlab matplotlib numba numpy ont-fast5-api ont-tombo[full] openpyxl pandas seaborn scipy sklearn
manually: nanopolish

Then download the data

# downsampled Fast5 for CC, rRNA & ncRNA (7GB)
wget https://public-docs.crg.es/enovoa/public/lpryszcz/src/nanoRMS/per_read/guppy3.0.3.hac -q --show-progress -r -c -nc -np -nH --cut-dirs=6 --reject="index.html*"

# pre-computed BAMs for mRNA (42GB) - this is only needed if you wish to analyse yeast mRNA for pseudoU
wget https://public-docs.crg.es/enovoa/public/lpryszcz/src/nanoRMS/per_read.bam/bams -q --show-progress -r -c -nc -np -nH --cut-dirs=6 --reject="index.html*"

CC, rRNA and ncRNA are downsampled to up to 1,000 reads per each reference in order to facillicate reasonable size and speed-up analysis. It wasn't feasible to downsample mRNA dataset, thus you'll need to download it from SRA (see paper for accession).

Retrieve per-read features from all samples.

# for rRNA and CC # ~10 minutes
./get_features.py --rna -f cc_yeast_rrna.fa -t 6 -i guppy3.0.3.hac/{CC,*snR,*wt}*/workspace/*.fast5

# for yeast ncRNA # ~10 minutes
./get_features.py --rna -f guppy3.0.3.hac/Saccharomyces_cerevisiae.R64-1-1_firstcolumn.ncrna.fa -t 6 -i guppy3.0.3.hac/*WT??C/workspace/*.fast5
# and link output files
cd guppy3.0.3.hac && for f in *_WT??C/workspace/batch0.fast5.bam*; do d=`echo $f|cut -f1 -d'/'`; ln -s $f $d.`basename $f|cut -f3- -d"."`; done && ../

BAM files from yeast mRNAs Pus1/4 KO and heat-shock are already preprocessed. For the raw Fast5 files check the paper.

Run nanopolish if you wish to compare it with tombo. You can skip this step if you don't want to compare the tools.

ref=cc_yeast_rrna.fa 
for d in guppy3.0.3.hac/*_{snR*,wt}/; do
 if [ ! -s $d/fq.gz.bam.events.gz ]; then
  echo `date` $d;
  cat $d/*.fastq.gz > $d/fq.gz;
  nanopolish index -d $d $d/fq.gz 2> /dev/null;
  minimap2 -ax map-ont $ref $d/fq.gz 2> /dev/null | samtools sort -o $d/fq.gz.bam;
  samtools index $d/fq.gz.bam;
  nanopolish eventalign -n -t 6 -q 10 --progress --signal-index --scale-events --reads $d/fq.gz --bam $d/fq.gz.bam --genome $ref | gzip > $d/fq.gz.bam.events.gz;
 fi;
done; date

Finally you can analyse the data using one of the Jupyter Notebooks:

rRNA_density.ipynb: plot feature densities for rRNA (Fig S6)
rRNA_stoichometry.ipynb: benchmarking of methods and features for varying stoichiometries (Fig S5C)
and estimation of modification frequency across all samples (Fig 3J)
rRNA_tombo_vs_nanopolish.ipynb: comparison of Nanopolish and Tombo (Fig S5 A-B)
mRNA.ipynb: prediction of RNA modification stoichiometries in mRNAs (Fig 6)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

NanoRMS: predicting NANOpore Rna Modification Stoichiometry

Files

README.md

Latest commit

History

README.md

File metadata and controls

NanoRMS: predicting NANOpore Rna Modification Stoichiometry