Skip to content

Commit c5c6423

Browse files
committed
Update velocity material
1 parent e2d798b commit c5c6423

12 files changed

+221
-192
lines changed

introduction/Welcome.pptx

6 MB
Binary file not shown.

rna-velocity/rna-velocity.Rmd

+35-28
Original file line numberDiff line numberDiff line change
@@ -48,7 +48,7 @@ _RNA velocity_ (introduced by @La_Manno2018-velocyto) is one approach to address
4848
In practice, RNA velocity analyses are often summarized by plots such as those shown below (from the [scVelo tutorial](https://scvelo.readthedocs.io/DynamicalModeling.html)): on the left, a vector field overlaid on a low-dimensional embedding, visualizing the 'flow' encoded by the velocities, and on the right, phase plots illustrating single genes.
4949
We will see in this lecture how to generate such plots from raw droplet scRNA-seq data, and how to interpret the results.
5050

51-
![](figures/scvelo_example_dynmod.png)
51+
![](rna-velocity-figures/scvelo_example_dynmod.png)
5252

5353
The RNA velocity is defined as the rate of change of the mature RNA abundance in a cell, and can be estimated from scRNA-seq data by joint modeling of estimated unspliced (pre-mRNA) and spliced (mature mRNA) abundances.
5454
This exploitation of the underlying molecular dynamics of the process sets it apart from other approaches for trajectory analysis, which typically use the similarity of the estimated gene expression profiles among cells to construct a path through the observed data.
@@ -126,7 +126,7 @@ This is, in essence, the idea behind the approach taken by @La_Manno2018-velocyt
126126
If we fix one of the parameter values (e.g., setting $\beta=1$ as in @La_Manno2018-velocyto, corresponding to an assumption of a shared splicing rate between genes) we can estimate the other one ($\gamma$), and consequently obtain an estimate of the RNA velocity $v$, since $$v=\frac{ds(t)}{dt}=\beta u(t)-\gamma s(t).$$
127127
Notably, these velocities can be derived directly from the phase plot:
128128

129-
<img src="figures/steady_state_velocity.001.png" width = "50%">
129+
<img src="rna-velocity-figures/steady_state_velocity.001.png" width = "50%">
130130

131131
Consider any point along the trajectory.
132132
By construction, the y coordinate of this point is equal to $u(t)$.
@@ -230,7 +230,7 @@ For this reason, below we will focus on methods defining the pre-mRNA abundance
230230

231231
Let's consider the gene in the figure below.
232232

233-
![](figures/intron_definition_2.001.png)
233+
![](rna-velocity-figures/intron_definition_2.001.png)
234234

235235
It has two transcript isoforms, one with two exons and one with three exons.
236236
The isoforms are partly overlapping.
@@ -263,18 +263,18 @@ In order to better understand some of these differences, we show below a few exa
263263

264264
* Chkb - overlapping features on the same strand. In this case, only _alevin_ assigns a non-zero UMI count (and _STARsolo-diff_, which defines the intronic count as the difference between a "gene body count" and the regular gene expression).
265265

266-
![](figures/pancreas_Chkb.png)
266+
![](rna-velocity-figures/pancreas_Chkb.png)
267267

268268
* Rassf1 - overlapping features on different strands.
269269
Whether or not the tool accounts for the strandedness of the reads makes a difference.
270270

271-
![](figures/pancreas_Rassf1.png)
271+
![](rna-velocity-figures/pancreas_Rassf1.png)
272272

273273
* Tspan3 - many ambiguous regions.
274274
The way that the introns are defined makes a substantial difference.
275275
The intronic count is much higher with the 'separate' intron definition approach.
276276

277-
![](figures/pancreas_Tspan3.png)
277+
![](rna-velocity-figures/pancreas_Tspan3.png)
278278

279279
These differences between counts obtained by different methods propagate also to the estimated velocities, and can affect the biological interpretation of the final results.
280280

@@ -321,9 +321,9 @@ We will practice generating the [_Salmon_](https://salmon.readthedocs.io/en/late
321321
Here, we first set the path to the data (`datadir`), as well as to the folder where we will store the generated index and quantifications (`outdir`).
322322

323323
```{r setpaths, class.source = "rchunk", eval = TRUE}
324-
if (file.exists("/home/rstudio/adv_scrnaseq_2020")) {
325-
datadir <- "/home/rstudio/adv_scrnaseq_2020/data/spermatogenesis_subset"
326-
outdir <- "/home/rstudio/adv_scrnaseq_2020/data/spermatogenesis_subset/txintron"
324+
if (file.exists("/work/adv_scrnaseq_2020")) {
325+
datadir <- "/work/adv_scrnaseq_2020/data/spermatogenesis_subset"
326+
outdir <- "/work/adv_scrnaseq_2020/data/spermatogenesis_subset/txintron"
327327
} else {
328328
datadir <- "data/spermatogenesis_subset"
329329
outdir <- "data/spermatogenesis_subset/txintron"
@@ -334,8 +334,8 @@ Sys.setenv(datadir = datadir, outdir = outdir)
334334

335335
```{bash listfiles, class.source = "bashchunk"}
336336
## If run in a console
337-
## datadir=/home/rstudio/adv_scrnaseq_2020/data/spermatogenesis_subset
338-
## outdir=/home/rstudio/adv_scrnaseq_2020/data/spermatogenesis_subset/txintron
337+
## datadir=/work/adv_scrnaseq_2020/data/spermatogenesis_subset
338+
## outdir=/work/adv_scrnaseq_2020/data/spermatogenesis_subset/txintron
339339
## Check what is included in the data directory
340340
ls $datadir
341341
```
@@ -633,8 +633,8 @@ from os import path
633633

634634
```{python set-velodir, class.source = "pythonchunk"}
635635
## Path to data to use for RNA velocity calculations
636-
if (path.exists("/home/rstudio/adv_scrnaseq_2020")):
637-
velodir = Path('/home/rstudio/adv_scrnaseq_2020/data/spermatogenesis_rnavelocity')
636+
if (path.exists("/work/adv_scrnaseq_2020")):
637+
velodir = Path('/work/adv_scrnaseq_2020/data/spermatogenesis_rnavelocity')
638638
else:
639639
velodir = Path('data/spermatogenesis_rnavelocity')
640640
```
@@ -746,7 +746,7 @@ The model assumes the existence of four different transcriptional states - two s
746746
The EM algorithm iterates between estimating the latent time of a cell (the 'position' of the cell along the phase space trajectory) and assigning it a transcriptional state, and optimizing the values of the parameters (see Figure below from @Bergen2019-scvelo).
747747
The likelihood is obtained by assuming that the observations follow a normal distribution:$$x_i^{obs}\sim N((\hat{u}(t), \hat{s}(t)), \sigma^2).$$
748748

749-
![](figures/BergenFig1.jpg)
749+
![](rna-velocity-figures/BergenFig1.jpg)
750750

751751
Here, we will focus on the dynamical model, since it is generally the most accurate, and although it's a bit slower than the other methods, usually it's not prohibitively slow.
752752

@@ -779,6 +779,13 @@ This step adds several columns to `adata.var` (see [https://scvelo.readthedocs.i
779779
* estimates of switching time points (`fit_t_`)
780780
* the likelihood value of the fit (`fit_likelihood`), averaged across all cells. The likelihood value for a gene and a cell indicates how well the cell is described by the learned phase trajectory.
781781

782+
Since the step above is quite time consuming, we'll save an intermediate object at this point:
783+
784+
```{python save-object-1, class.source = "pythonchunk"}
785+
adata.write(velodir/'AdultMouseRep3_alevin_GRCm38.gencode.vM21.spliced.intron.fl90.gentrome.k31_sce_nometa_with_velocity.h5ad')
786+
```
787+
788+
782789
Once the kinetic rate parameters are estimated, the `tl.velocity` function estimates the actual velocities based on these.
783790
This adds a `velocity` layer to the `adata` object, and the `velocity_genes` column in `adata.var`.
784791
This column indicates whether the fit for a gene is considered 'good enough' for downstream use.
@@ -839,8 +846,8 @@ The graph can also be used to estimate the most likely cell transitions, and the
839846

840847
```{python trace-descendants, class.source = "pythonchunk"}
841848
x, y = scv.utils.get_cell_transitions(adata, basis = 'tsne', starting_cell = 70)
842-
ax = scv.pl.velocity_graph(adata, basis = 'tsne', c = 'lightgrey', edge_width = 0.05, show = False)
843-
ax = scv.pl.scatter(adata, x = x, y = y, s = 120, c = 'ascending', cmap = 'gnuplot', ax = ax)
849+
ax = scv.pl.velocity_graph(adata, basis = 'tsne', color = 'lightgrey', edge_width = 0.05, show = False)
850+
ax = scv.pl.scatter(adata, x = x, y = y, size = 120, color = 'ascending', ax = ax)
844851
```
845852

846853
### Visualizing the velocities in low dimension
@@ -937,25 +944,25 @@ Such genes may help to explain the vector field and the inferred lineages.
937944
The module `tl.rank_velocity_genes` runs a differential velocity t-test and outpus a gene ranking for each cluster.
938945

939946
```{python rank-velocity-genes, class.source = "pythonchunk"}
947+
## min_corr is the minimum accepted Spearman correlation coefficient
948+
## between spliced and unspliced
940949
scv.tl.rank_velocity_genes(adata, groupby = 'celltype', min_corr = 0.3)
941950
942951
df = scv.DataFrame(adata.uns['rank_velocity_genes']['names'])
943952
df.head()
944-
945-
kwargs = dict(frameon = False, size = 20, linewidth = 1.5)
946-
947-
for cluster in ['DIplotene/Secondary spermatocytes', 'Mid Round spermatids']:
948-
scv.pl.scatter(adata, df[cluster][:5], ylabel = cluster, **kwargs, color = 'celltype')
949953
```
950954

951-
Moreover, partial gene likelihoods (average likelihood over a subset of the cells) can be computed for a each cluster of cells to enable cluster-specific identification of potential drivers.
955+
In the most recent release of _scVelo_ (0.2.0), the possibility of performing a 'differential kinetics' test was introduced.
956+
The purpose of this is to detect genes that display a different kinetic behaviour in some cell types than in others, giving rise to multiple trajectories.
957+
The `tl.differential_kinetic_test` module performs a likelihood ratio test evaluating whether allowing different kinetics for different cell populations give a significantly better likelihood than forcing them to follow the same one.
952958

953-
```{python dynamical-genes, class.source = "pythonchunk"}
954-
scv.tl.rank_dynamical_genes(adata, groupby='celltype')
955-
df = scv.get_df(adata, 'rank_dynamical_genes/names')
956-
df.head(5)
957-
for cluster in ['DIplotene/Secondary spermatocytes', 'Mid Round spermatids']:
958-
scv.pl.scatter(adata, df[cluster][:5], ylabel = cluster, frameon = False, color = 'celltype')
959+
```{python diff-kinetics, class.source = "pythonchunk"}
960+
scv.tl.differential_kinetic_test(adata, var_names = 'velocity_genes', groupby = 'celltype')
961+
top_genes_kin = adata.var['fit_pval_kinetics'].sort_values(ascending = True).index[:5]
962+
scv.get_df(adata[:, top_genes_kin], ['fit_diff_kinetics', 'fit_pval_kinetics'], precision = 2)
963+
964+
scv.pl.scatter(adata, basis = top_genes_kin, legend_loc = 'none', size = 80,
965+
frameon = True, ncols = 5, fontsize = 20, color = 'celltype')
959966
```
960967

961968

rna-velocity/rna-velocity.html

+127-107
Large diffs are not rendered by default.

0 commit comments

Comments
 (0)