Unified pipeline for RNA-seq data processing:
Please check our Contributing page.
All scripts should be executed from the RNA-seq main directory, e.g.:
$ Rscript ./R/ReadData.R
- The raw data files are already downloaded from GDC.
- The raw data is organized according to the Data Structure section.
- The raw data has been quantified using HTSeq for the hg38.pXX human genome.
- The required R packages are installed, as listed in the respective section.
- Obtain the normalized expression matrix.
- The data is stored in a SummarizedExperiment object.
The data follows this directory structure:
| | - raw
| | |- metadata.txt
| | |-> manifest
| | | |- experimentalcondition1.manifest
| | | |- experimentalcondition2.manifest
| | | ...
| | | |- experimentalconditionN.manifest
| | |-> experimentalcondition1
| | | |- ID_bla-bla.htseq.counts
| | |-> experimentalcondition2
| | | |- ID_bla-bla.htseq.counts
| | |...
| | |-> experimentalconditionN
| | | |- ID_bla-bla.htseq.counts
| | - summarized_experiment
| | |- Out.RData
|- ReadData.R
- The md5.txt file contains the MD5 checksum of each file.
- The metadata.txt file contains all metadata for each sample.
- The manifest folder contains the metadata of the respective samples in tab-separated format.
- Each experimentalconditionX folder contains the respective htseq.counts data files. For example:
Ensembl_ID.Version \t Raw_Counts
ENSG00000000003.13 \t 25
- The summarized_experiment folder contains the RData object (Out.RData).
- The R directory contains the different R files.
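
As a minimal sketch of how this layout might be consumed (not the actual ReadData.R), the base R snippet below reads every *.htseq.counts file of one experimental-condition folder into a genes x samples matrix. The helper name readConditionCounts, the toy folder created under tempdir(), and the sample file names are hypothetical. It assumes headerless, two-column, tab-separated files as in the example above, and that every file lists the same genes.

```r
# Hypothetical helper: read all htseq.counts files of one condition folder
# into a genes x samples matrix of raw counts.
readConditionCounts <- function(conditionDir) {
  countFiles <- list.files(conditionDir, pattern = "\\.htseq\\.counts$",
                           full.names = TRUE)
  perSample <- lapply(countFiles, function(path) {
    counts <- read.delim(path, header = FALSE,
                         col.names = c("ensembl_id_version", "raw_counts"),
                         stringsAsFactors = FALSE)
    # htseq-count appends summary rows such as "__no_feature"; drop them
    counts <- counts[!startsWith(counts$ensembl_id_version, "__"), ]
    # Named vector: names are gene IDs, values are raw counts
    setNames(counts$raw_counts, counts$ensembl_id_version)
  })
  # Use the file name without its extension as the sample name
  names(perSample) <- sub("\\.htseq\\.counts$", "", basename(countFiles))
  # Align every sample to the gene order of the first file
  geneIds <- names(perSample[[1]])
  do.call(cbind, lapply(perSample, function(x) x[geneIds]))
}

# Toy data standing in for raw/experimentalcondition1 (file names are made up)
demoDir <- file.path(tempdir(), "experimentalcondition1")
dir.create(demoDir, showWarnings = FALSE, recursive = TRUE)
writeLines(c("ENSG00000000003.13\t25", "ENSG00000000005.5\t0",
             "__no_feature\t100"),
           file.path(demoDir, "sampleA.htseq.counts"))
writeLines(c("ENSG00000000003.13\t31", "ENSG00000000005.5\t2",
             "__no_feature\t90"),
           file.path(demoDir, "sampleB.htseq.counts"))

countMatrix <- readConditionCounts(demoDir)
print(countMatrix)
```

Aligning on the gene IDs of the first file keeps the rows consistent even if the files are ordered differently; genes missing from a later file would show up as NA rather than being silently shifted.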
- Check if the required R and Bioconductor packages are installed
- Install the missing R and Bioconductor packages
- Read Experimental Data
- Read BioMart Data
- Merge counts and annotations
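
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's own code: the helper names findMissingPackages and installMissingPackages are hypothetical, the package list passed in is a placeholder, and the small annotation data frame stands in for the result of a BioMart query (its column names ensembl_gene_id and hgnc_symbol follow common biomaRt attribute names). Installation is delegated to BiocManager::install, which handles both CRAN and Bioconductor packages.

```r
# Steps 1-2: find and install missing packages.
# Hypothetical helper: return the names of packages that are not installed.
findMissingPackages <- function(requiredPackages) {
  isInstalled <- vapply(requiredPackages, requireNamespace,
                        logical(1), quietly = TRUE)
  requiredPackages[!isInstalled]
}

# Hypothetical helper: install whatever is missing (defined, not called here).
installMissingPackages <- function(missingPackages) {
  if (length(missingPackages) == 0) return(invisible(NULL))
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(missingPackages)
}

# Placeholder list; the real requirements are given in the packages section
missingPackages <- findMissingPackages(c("stats", "utils"))

# Steps 3-4: toy stand-ins for the experimental counts and the BioMart table
counts <- data.frame(
  ensembl_id_version = c("ENSG00000000003.13", "ENSG00000000005.5"),
  raw_counts = c(25L, 0L),
  stringsAsFactors = FALSE
)
annotation <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005"),
  hgnc_symbol = c("TSPAN6", "TNMD"),
  stringsAsFactors = FALSE
)

# Step 5: strip the version suffix so the IDs match, then merge.
# all.x = TRUE keeps genes that have no annotation (their symbol becomes NA).
counts$ensembl_gene_id <- sub("\\.[0-9]+$", "", counts$ensembl_id_version)
annotatedCounts <- merge(counts, annotation,
                         by = "ensembl_gene_id", all.x = TRUE)
print(annotatedCounts)
```

The version suffix is removed before merging because the count files use versioned Ensembl IDs, while annotation tables are often keyed by the unversioned gene ID.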
- Variable and function names: camel case.
- Column names: lower case and underscore notation.
- Experimental condition names: lower case without underscore.
- Folder names: lower case with underscore.
- 80 characters per line.
- Function definitions: above the code that calls them.
- Source code file name: lower case without underscore.
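
A short, hypothetical snippet that follows these conventions may help as a reference. All names in it are made up for illustration, and reading "Function definition: above" as "functions are defined before the code that calls them" is an assumption.

```r
# Function definitions first, above the code that uses them.
# Function and variable names: camel case.
countsPerSample <- function(countTable) {
  # Column names: lower case with underscores
  tapply(countTable$raw_counts, countTable$sample_id, sum)
}

countTable <- data.frame(
  sample_id = c("a", "a", "b"),
  raw_counts = c(25L, 5L, 7L),
  stringsAsFactors = FALSE
)

# Lines stay within 80 characters
totalCounts <- countsPerSample(countTable)
print(totalCounts)
```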