-
Notifications
You must be signed in to change notification settings - Fork 263
/
Copy pathstacked_bars_optimization.Rmd
130 lines (107 loc) · 4.08 KB
/
stacked_bars_optimization.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
---
title: "Optimize_stacked_bars"
author: "Chenxin Li"
date: "2023-11-17"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Introduction
Stacked bar plots are useful for visualizing proportion data.
Stacked bar plots are commonly used to visualize community structure or population structure or admixture analysis.
This kind of visualization boils down to a collection of samples, where each sample contains multiple classes of members.
However, when we have many samples and many classes, stacked bar plots need to be optimized to be effective.
And by "optimize" I mean the grouping and ordering of samples.
# Required packages
```{r}
library(tidyverse)
library(patchwork) # Not actually required, for this discussion only
library(RColorBrewer) # Only required for coloring the graphs
```
# Data
```{r}
set.seed(666)
```
```{r}
generate_percentages <- function(N){
random_N <- rlnorm(n = N, sdlog = 3)
(random_N / sum(random_N)) * 100
}
my_data <- replicate(100, generate_percentages(8)) %>%
as.data.frame() %>%
cbind(class = letters[1:8]) %>%
pivot_longer(cols = !class, names_to = "sample", values_to = "percentage")
my_data
```
Don't worry about what happened here. I just simulated some data.
Here we have an example data with 100 samples and 8 classes of member.
# Without optimization
Let's see what will happen if we just make a plot as such without optimization.
```{r}
no_reorder <- my_data %>%
ggplot(aes(x = sample, y = percentage)) +
geom_bar(stat = "identity", aes(fill = class)) +
scale_fill_manual(values = brewer.pal(8, "Set2")) +
theme_classic() +
theme(axis.text.x = element_blank())
no_reorder
```
Due to the number of samples and classes, it is very hard to discern anything from this graph without optimizing the order of bars.
# Group samples by peak class
A prerequisite for reordering samples is grouping them.
Usually there is a natural way to group them.
For example grouping by control vs. treatment according to the design of the experiment.
In this example, we don't have any experimental condition, I will just group them by peak class.
This just means I am grouping samples by which class is the most abundant member in said sample.
```{r}
sample_grouping <- my_data %>%
group_by(sample) %>%
slice_max(order_by = percentage) %>%
select(class, sample) %>%
rename(peak_class = class)
sample_grouping
```
# Reordering bars
This is a 3 step process.
1. Grouping by sample grouping using `group_by()`.
2. Rank samples at _each level of sample grouping_ with `rank()`. R knows it should rank at each level of sample grouping because we called `group_by()`.
3. Reorder samples by rank using `reorder()`.
```{r}
my_data_reordered <- my_data %>%
inner_join(sample_grouping, by = "sample") %>%
group_by(peak_class) %>%
mutate(rank = rank(percentage)) %>% # rank samples at the level of each peak subtype
mutate(sample = reorder(sample, -rank)) %>% # this reorders samples
ungroup()
head(my_data_reordered)
```
# Plot reordered
Now let's plot the optimized graph
```{r}
bars_reordered <- my_data_reordered %>%
ggplot(aes(x = sample, y = percentage)) +
geom_bar(stat = "identity", aes(fill = class)) +
scale_fill_manual(values = brewer.pal(8, "Set2")) +
theme_classic() +
theme(axis.text.x = element_blank(),
strip.background = element_blank(),
strip.text = element_blank(),
panel.spacing.x = unit(0.1, "line"))
```
...and make a comparison.
```{r}
wrap_plots(
no_reorder +
labs(title = "Without reordering bars"),
bars_reordered +
labs(title = "Bars reordered"),
guides = "collect",
nrow = 2
) &
labs(y = "relative abundance (%)") &
theme(title = element_text(size = 10))
ggsave("../Results/Reorder_stacked_bars.svg", height = 4, width = 6, bg = "white")
ggsave("../Results/Reorder_stacked_bars.png", height = 4, width = 6, bg = "white")
```
After reordering the the bars, wow, that really made a difference, don't you think?