-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathREADME.Rmd
175 lines (115 loc) · 6.99 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# SeedMatchR
<!-- badges: start -->
[](https://github.com/tacazares/SeedMatchR/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
The goal of SeedMatchR is to help users identify potential seed-mediated effects in their RNAseq data.
## Installation
You can install the development version of SeedMatchR from [GitHub](https://github.com/) or the stable build from CRAN.
```{r eval = FALSE}
# Install from GitHub
install.packages("devtools")
devtools::install_github("tacazares/SeedMatchR")
```
### Troubleshooting installation issues
There have been some issues with installing `SeedMatchR` on Mac OS and Linux, especially if you use the most recent version of R (≥ 4.3). One potential suggestion is to use the version of R that was used in development (4.1.2), though `SeedMatchR` should work on the most recent version as well.
In addition, some users have issues with `ggmsa`. The package `ggmsa` requires the installations of the `msa` package to work. There have been issues with using the `plot_seeds()` function that cause errors due to a missing loaded package. To fix this, users must install and load the `msa` package from CRAN, in addition to `SeedMatchR`.
`SeedMatchR` has been installed and tested on the most recent builds of R for Mac OS, Windows, and Linux. The results of the build tests are found here: https://github.com/tacazares/SeedMatchR/actions/runs/6634297920
The file containing the working versions of each dependency is listed in the GitHub actions logs. For example, the most recent Mac OS build dependencies that work can be found in this [log](https://github.com/tacazares/SeedMatchR/actions/runs/6634297920/job/18023503364#step:5:5857).
## Quick start example with public siRNA data
```{r include = FALSE}
# Import library
library(SeedMatchR)
library(msa)
```
This example uses the siRNA sequence, D1, targeting the Ttr gene in rat liver from the publication:
Schlegel MK, Janas MM, Jiang Y, Barry JD, Davis W, Agarwal S, Berman D, Brown CR, Castoreno A, LeBlanc S, Liebow A, Mayo T, Milstein S, Nguyen T, Shulga-Morskaya S, Hyde S, Schofield S, Szeto J, Woods LB, Yilmaz VO, Manoharan M, Egli M, Charissé K, Sepp-Lorenzino L, Haslett P, Fitzgerald K, Jadhav V, Maier MA. From bench to bedside: Improving the clinical safety of GalNAc-siRNA conjugates using seed-pairing destabilization. Nucleic Acids Res. 2022 Jul 8;50(12):6656-6670. doi: 10.1093/nar/gkac539. PMID: 35736224; PMCID: PMC9262600.
The guide sequence of interest is 23 bp long and oriented 5' -\> 3'.
```{r}
# siRNA sequence of interest targeting a 23 bp region of the Ttr gene
guide.seq = "UUAUAGAGCAAGAACACUGUUUU"
```
### Load rat specific annotation data.
We use `AnnotationHub` to derive the `GTF` and `DNA` sequence files for the species of interest. Once you have derived the annotations, you could save them as an Rdata object to increase the speed of loading the datasets.
#### Load annotation databases
```{r echo = T, results = 'hide', message=FALSE, warning=FALSE, error=FALSE}
# Load the species specific annotation database object
anno.db <- load_species_anno_db("rat")
```
#### Extract features and sequences of interest from annotations
We will use the annotations to derive the features and feature sequences that we want to scan for each gene.
```{r}
features = get_feature_seqs(anno.db$tx.db, anno.db$dna, feature.type = "3UTR")
```
### Prepare DESEQ2 Results
The test data that is provided with `SeedMatchR` was derived from the 2022 publication by Schlegel et al. The data set represents a DESeq2 analysis performed on rat liver that had been treated with Ttr targeting siRNA. We will use this example to explore seed mediated activity.
#### Download data (only need to perform once, can skip to loading if done)
We start by downloading the example data set. This function will download three files from the GEO accession [GSE184929](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE184929). These files represent three samples with different siRNA treatments at two dosages.
```{r echo = T, results = 'hide', message=FALSE, warning=FALSE, error=FALSE}
get_example_data("sirna")
```
#### Load example data
We can load the example data into the environment.
```{r}
sirna.data = load_example_data("sirna")
```
The DESeq2 results are available through the names `Schlegel_2022_Ttr_D1_30mkg`, `Schlegel_2022_Ttr_D4_30mkg` and `Schlegel_2022_Ttr_D1_10mkg`. The data set name is long, so it will be renamed to `res`.
```{r}
res <- sirna.data$Schlegel_2022_Ttr_D1_30mkg
```
#### Filter example results
The DESeq2 results file is then filtered. The function `filter_deseq()` can be used to filter a results file by log2FoldChange, padj, baseMean, and remove NA entries.
```{r}
# Dimensions before filtering
dim(res) # [1] 32883 6
# Filter DESeq2 results for SeedMatchR
res = filter_deseq(res, fdr.cutoff=1, fc.cutoff=0, rm.na.log2fc = T)
# Dimensions after filtering
dim(res) # [1] 13582 8
```
### Counting seed matches in transcripts
You can perform a seed match for a single seed using the `SeedMatchR()` function.
```{r}
res = SeedMatchR(res, anno.db$gtf, features$seqs, guide.seq, "mer7m8")
head(res)
```
#### Match multiple seeds
You can perform seed matching for all available seeds using a for loop. The results will be appended as a new column to the results data frame.
```{r}
for (seed in c("mer8", "mer6", "mer7A1")){
res <- SeedMatchR(res, anno.db$gtf, features$seqs, guide.seq, seed.name = seed)
}
head(res)
```
### Comparing the expression profiles of seed targets to background
Many factors that perturb gene expression, like miRNA, show cumulative changes in their targets gene expression. Cumulative changes in the profile of genes expression can be visualized and tested with the emperical distribution function (ecdf) coupled with a statistical test such as the Kolmogorov-Smirnov test.
`SeedMatchR` provides functions for comparing the log2(Fold Change) of two gene sets. The function `deseq_fc_ecdf` is designed to work directly with a DESeq2 results data frame.
Required Inputs:
- `res`: DESeq2 results data frame
- `gene.lists`: A list of lists containing gene names
```{r fig.height=5, fig.width=6, out.retina=1}
# Gene set 1
mer7m8.list = res$gene_id[res$mer7m8 >= 1]
# Gene set 2
background.list = res$gene_id[res$mer7m8 == 0]
ecdf.results = de_fc_ecdf(res,
list("Background" = background.list,
"mer7m8" = mer7m8.list),
stats.test = "KS",
factor.order = c("Background",
"mer7m8"),
null.name = "Background",
target.name = "mer7m8",)
ecdf.results$plot
```