---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# pangoRo
<!-- badges: start -->
[R-CMD-check](https://github.com/al-obrien/pangoRo/actions/workflows/R-CMD-check.yaml)
[Codecov test coverage](https://app.codecov.io/gh/al-obrien/pangoRo?branch=master)
<!-- badges: end -->
COVID-19 lineage names can be confusing to navigate: a single lineage may be written under several aliases, and catching all of them for further analysis is easier with dedicated tools.
{pangoRo} is an R package for working with [PANGO lineage](https://cov-lineages.org/index.html) information. Its core functionality was inspired by [pango_aliasor](https://github.com/corneliusroemer/pango_aliasor), a similar Python package created by Cornelius Roemer.
## Installation
You can install {pangoRo} from GitHub:
``` r
remotes::install_github('al-obrien/pangoRo')
```
## Examples
The basic usage of {pangoRo} is to expand, collapse, and sort COVID-19 lineages. Start by creating the *pangoro* object, which links to the latest (or cached) PANGO reference. This object is then passed to subsequent operations as the reference.
```{r example}
library(pangoRo)
# Create pangoro object
my_pangoro <- pangoro()
```
### Collapse
Given a vector of PANGO lineages, return each one in its fully collapsed (aliased) form.
```{r}
# Vector of COVID-19 lineages to collapse
cov_lin <- c('B.1.617.2', 'BL.2', 'B.1.1.529.2.75.1.2', 'BA.2.75.1.2', 'XD.1')
# Collapse lineage names as far as possible
collapse_pangoro(my_pangoro, cov_lin)
```
You can also set how far to collapse each input with `max_level`.
```{r}
collapse_pangoro(my_pangoro, cov_lin, max_level = 1)
```
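To make the idea of full collapse concrete, here is a minimal base-R sketch. It is not how {pangoRo} is implemented: `alias_table` and `collapse_sketch()` are hypothetical names, the table is hand-written with only three entries, and the sketch assumes its input is already fully expanded.

```r
# Hypothetical, hand-written alias table (not the package's reference data).
# Names are alias prefixes; values are the full lineages they stand for.
alias_table <- c(
  AY = "B.1.617.2",
  BA = "B.1.1.529",
  BL = "B.1.1.529.2.75.1"
)

# Replace the longest matching full-name prefix with its alias.
# Assumes every input is already fully expanded.
collapse_sketch <- function(lineages, aliases = alias_table) {
  # Try the longest full names first so the deepest alias wins.
  ordered <- aliases[order(nchar(aliases), decreasing = TRUE)]
  vapply(lineages, function(lin) {
    for (alias in names(ordered)) {
      # The trailing dot requires at least one further level, so a lineage
      # that equals an alias target (e.g. "B.1.617.2") is left unchanged.
      full <- paste0(ordered[[alias]], ".")
      if (startsWith(lin, full)) {
        return(paste0(alias, ".", substring(lin, nchar(full) + 1)))
      }
    }
    lin
  }, character(1), USE.NAMES = FALSE)
}

collapsed <- collapse_sketch(c("B.1.617.2", "B.1.617.2.4", "B.1.1.529.2.75.1.2"))
print(collapsed)
```

Matching the longest prefix first is what lets `B.1.1.529.2.75.1.2` become `BL.2` rather than stopping at `BA.2.75.1.2`.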
### Expand
Given a vector of PANGO lineages, expand each alias to its full lineage name.
```{r}
# Vector of COVID-19 lineages to expand
cov_lin <- c('B.1.617.2', 'B.1.617.2.6', 'AY.4', 'AY.39', 'BL.2', 'BA.1', 'AY.2', 'XD.1')
# Expand lineage names as far as possible
exp_lin <- expand_pangoro(my_pangoro, cov_lin)
exp_lin
```
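Conceptually, expansion swaps the leading alias for the full lineage it abbreviates. The base-R sketch below illustrates that idea only. `alias_table` and `expand_sketch()` are hypothetical names, the three-entry table is hand-written, and the package itself works from the PANGO reference held by the *pangoro* object.

```r
# Hypothetical, hand-written alias table (not the package's reference data).
alias_table <- c(
  AY = "B.1.617.2",
  BA = "B.1.1.529",
  BL = "B.1.1.529.2.75.1"
)

# Expand the leading alias of each lineage; unknown prefixes pass through.
expand_sketch <- function(lineages, aliases = alias_table) {
  vapply(lineages, function(lin) {
    parts  <- strsplit(lin, ".", fixed = TRUE)[[1]]
    prefix <- parts[1]
    if (!prefix %in% names(aliases)) return(lin)
    # Glue the full parent name to the remaining numeric levels.
    paste(c(aliases[[prefix]], parts[-1]), collapse = ".")
  }, character(1), USE.NAMES = FALSE)
}

expanded <- expand_sketch(c("AY.4", "BL.2", "BA.1", "B.1.617.2"))
print(expanded)
```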
### Sort
Perform a pseudo-sort on the lineage names.
```{r}
# Sort lineages
sort_pangoro(my_pangoro, exp_lin)
```
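A plain `sort()` compares lineage names as text, so `B.1.10` lands before `B.1.2`. The sketch below shows one way to order lineages level by level with numeric comparison. It is an assumption about what a reasonable ordering looks like, not a description of what `sort_pangoro()` does internally, and `sort_sketch()` is a hypothetical helper.

```r
# Order lineages by letter prefix, then numerically by each dotted level.
sort_sketch <- function(lineages) {
  if (length(lineages) == 0) return(lineages)
  parts  <- strsplit(lineages, ".", fixed = TRUE)
  depth  <- max(lengths(parts))
  prefix <- vapply(parts, `[`, character(1), 1)
  # One integer key per level; missing levels become -1 so parents sort first.
  nums <- lapply(seq_len(depth)[-1], function(i) {
    vals <- suppressWarnings(as.integer(vapply(parts, `[`, character(1), i)))
    ifelse(is.na(vals), -1L, vals)
  })
  # method = "radix" keeps the prefix ordering independent of locale.
  lineages[do.call(order, c(list(prefix), nums, list(method = "radix")))]
}

sorted <- sort_sketch(c("B.1.10", "B.1.2", "AY.4", "B.1", "AY.39"))
print(sorted)
```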
Split the lineages by their lowest alias codes and sort within each grouping.
```{r}
collapsed_full <- collapse_pangoro(my_pangoro, cov_lin, aliase_parent = TRUE)
grps <- split(collapsed_full, sapply(strsplit(collapsed_full, split = '\\.'), `[[`, 1))
lapply(grps, function(x) sort_pangoro(my_pangoro, x))
```
### Detect recombinant lineages
Recombinant lineages are usually recognizable by their *X* prefix, but their aliased descendants may not be (e.g. *EG.1*).
```{r}
is_recombinant(my_pangoro,
               c('EG.1', 'EC.1', 'BA.1', 'XBB.1.9.1.1.5.1', 'B.1.529.1'))
```
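The reason *EG.1* counts as recombinant becomes clearer once its alias is resolved: *EG* abbreviates a lineage that starts with *XBB*. The base-R sketch below illustrates that check under simplifying assumptions. `alias_table` and `is_recombinant_sketch()` are hypothetical, the two-entry table is hand-written, and this is not the package's implementation.

```r
# Hypothetical, hand-written alias table (not the package's reference data).
alias_table <- c(
  EG = "XBB.1.9.2",
  BA = "B.1.1.529"
)

# A lineage is flagged when its resolved root starts with "X".
is_recombinant_sketch <- function(lineages, aliases = alias_table) {
  prefix <- sub("\\..*$", "", lineages)          # text before the first dot
  known  <- prefix %in% names(aliases)
  root   <- ifelse(known, unname(aliases[prefix]), prefix)
  startsWith(root, "X")
}

flags <- is_recombinant_sketch(c("EG.1", "BA.1", "XBB.1.9.1.1.5.1", "B.1.529.1"))
print(flags)
```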