IDEA

A unified pipeline for RNA-seq data processing.

Contributing

Please check our Contributing page.

Warning

All scripts should be executed from the main directory (RNA-seq).

e.g. $ Rscript ./R/ReadData.R
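
A script can guard against being launched from the wrong place. The following is a minimal sketch, not part of the repository: checkProjectRoot is a hypothetical helper, and it assumes the main directory can be recognized by the presence of the R and data/raw folders.

```r
# Hypothetical helper: stop early if the working directory is not the
# project root (assumed here to contain the R and data/raw folders).
checkProjectRoot <- function(root = getwd()) {
  expected <- c("R", file.path("data", "raw"))
  absent <- expected[!dir.exists(file.path(root, expected))]
  if (length(absent) > 0) {
    stop("Run this script from the RNA-seq main directory; missing: ",
         paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}
```

Calling checkProjectRoot() at the top of a script would then fail with a clear message instead of a missing-file error later on.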

Pre-conditions

The raw data files have already been downloaded from GDC.
The raw data is organized according to the Data Structure section.
The raw data has been quantified with HTSeq against the hg38.pXX human genome.
The required R packages are installed, as described in the R section.

Post-conditions

The normalized expression matrix is obtained.
The data is stored in a SummarizedExperiment object.

Data Structure

The data follows this directory structure

|--data
|     | - raw
|     |     |- metadata.txt
|     |     |-> manifest
|     |     |    |- experimentalcondition1.manifest
|     |     |    |- experimentalcondition2.manifest
|     |     |    ...
|     |     |    |- experimentalconditionN.manifest
|     |     |-> experimentalcondition1
|     |     |    |- ID_bla-bla.htseq.counts
|     |     |-> experimentalcondition2
|     |     |    |- ID_bla-bla.htseq.counts
|     |     |...
|     |     |-> experimentalconditionN
|     |          |- ID_bla-bla.htseq.counts
|     | - summarized_experiment                  
|           |- Out.RData
|--R
   |- ReadData.R

The md5.txt file contains the MD5 checksum of each file.
The metadata.txt file contains the metadata for every sample.
The manifest folder contains one manifest file per experimental condition, with the respective sample metadata in tab-separated format.
Each experimentalconditionX folder contains the respective htseq.counts data files. For example:

  Ensembl_ID.Version \t Raw_Counts
  ENSG00000000003.13    25

The summarized_experiment folder contains the RData object (Out.RData).
The R directory contains the R scripts.
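
To make the htseq.counts layout concrete, here is a minimal sketch of reading one such file in base R. It is not the code in ReadData.R: readHtseqCounts and the column names are hypothetical, and the toy file contents are illustrative values. The sketch also drops the summary rows that htseq-count appends with a "__" prefix (for example __no_feature).

```r
# Hypothetical helper: read one two-column, tab-separated, header-less
# htseq.counts file into a data frame.
readHtseqCounts <- function(path) {
  counts <- read.delim(path, header = FALSE, stringsAsFactors = FALSE,
                       col.names = c("ensembl_id_version", "raw_counts"))
  # htseq-count appends summary rows such as "__no_feature"; drop them.
  counts[!startsWith(counts$ensembl_id_version, "__"), , drop = FALSE]
}

# Toy file for illustration only (made-up values in the documented format).
toyFile <- tempfile(fileext = ".htseq.counts")
writeLines(c("ENSG00000000003.13\t25", "__no_feature\t100"), toyFile)
toyCounts <- readHtseqCounts(toyFile)
print(toyCounts)
```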

R

requirements.R

Checks whether the required R and Bioconductor packages are installed.
Installs any missing R and Bioconductor packages.
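
The check-then-install pattern can be sketched as follows. This is an illustration under stated assumptions rather than the contents of requirements.R: the helper names are hypothetical and the package lists are placeholders, not the project's actual requirements. Installation is wrapped in a function so that sourcing the sketch installs nothing.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
findMissingPackages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

# Placeholder package lists, not the project's actual requirements.
cranPackages <- c("stats", "utils")
biocPackages <- c("SummarizedExperiment")

missingCran <- findMissingPackages(cranPackages)
missingBioc <- findMissingPackages(biocPackages)

# Hypothetical helper: install what is missing. Nothing runs until called.
installMissingPackages <- function(missingCran, missingBioc) {
  if (length(missingCran) > 0) {
    install.packages(missingCran)
  }
  if (length(missingBioc) > 0) {
    # Bioconductor packages are installed through BiocManager.
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(missingBioc)
  }
  invisible(NULL)
}
```

Running installMissingPackages(missingCran, missingBioc) would then install only what the check reported as missing.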

ReadData.R

Reads the experimental data.
Reads the BioMart data.
Merges counts and annotations.
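
The merge step can be illustrated with base R alone. This is a sketch under assumptions, not the logic in ReadData.R: both tables are toy data, the annotation table merely stands in for a BioMart export, and the column names are hypothetical. It assumes the annotation keys are unversioned Ensembl IDs, so the version suffix is stripped from the count IDs before joining.

```r
# Toy counts table in the format of the htseq.counts files (made-up counts).
counts <- data.frame(
  ensembl_id_version = c("ENSG00000000003.13", "ENSG00000000005.5"),
  raw_counts = c(25L, 0L),
  stringsAsFactors = FALSE
)

# Toy annotation table standing in for a BioMart export; the column
# names are illustrative.
annotations <- data.frame(
  ensembl_id = c("ENSG00000000003", "ENSG00000000005"),
  gene_name = c("TSPAN6", "TNMD"),
  stringsAsFactors = FALSE
)

# Strip the version suffix (".13") so the IDs match the annotation keys.
counts$ensembl_id <- sub("\\.[0-9]+$", "", counts$ensembl_id_version)

# Left join: keep every counted gene, even if it has no annotation.
annotatedCounts <- merge(counts, annotations, by = "ensembl_id",
                         all.x = TRUE)
print(annotatedCounts)
```

Using all.x = TRUE keeps genes without an annotation in the result, with NA in the annotation columns, so no count rows are silently lost.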

Conventions

Variable and function names: camel case.
Column names: lower case and underscore notation.
Experimental condition names: lower case without underscore.
Folder names: lower case with underscore.
Line length: at most 80 characters.
Function definitions: placed above the code that uses them.
Source code file name: lower case without underscore.
