human_volunteer_16S_analysis.Rmd

---
title: "Human Volunteer 16S Amplicon Analysis"
author: "Barry Hykes Jr. and Scott A. Handley"
date: '`r format(Sys.Date(), "%B %d, %Y")`'
output: html_document
---
**Project Description:** Analysis of 16S rRNA amplicon data from a cohort of human volunteers (healthy men aged 18-35) randomized into three treatment groups: 1) control (no antibiotics), 2) narrow-spectrum antibiotics (oral vancomycin), or 3) broad-spectrum antibiotics (oral vancomycin, ciprofloxacin, and metronidazole). Antibiotics were taken for 7 days prior to vaccination with Rotarix (RVV), pneumococcal polysaccharide (Pneumo 23), and tetanus toxoid vaccines. The primary endpoint was the difference in anti-RV IgA at 28 days post-vaccination. Secondary endpoints were the proportion of volunteers with day 7 anti-RV IgA boosting (>=2-fold increase), absolute RV-antigen shedding and the proportion of volunteers shedding, and anti-RV, anti-pneumococcal, and anti-tetanus IgG.

**Primary Collaborator:**
Vanessa Harris (v.harris@aighd.org)

**Other relevant files:**
The following phyloseq objects are available. They differ in the 16S reference database used for taxonomic classification (Greengenes, Silva, or RDP). The Silva and RDP objects were also processed through the species-assignment workflow:

* ps0.human_volunteer.gg.RDS
* ps0.human_volunteer.silva.RDS
* ps0.human_volunteer.rdp.RDS

* Mapping file: mapping_human_volunteer.txt

**Workflow details:** The R commands below represent a full analysis of the following:

1) Sample assessment
2) ASV properties
3) Community composition
4) Alpha diversity
5) Beta diversity
6) Differential abundance testing

```{r initiate-environment}
# Set default knitr options
knitr::opts_chunk$set(fig.width=8,
                      fig.height=6,
                      fig.path="./figures/",
                      dev='png',
                      warning=FALSE,
                      message=FALSE)
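# Load libraries
# plyr is loaded before tidyverse so that it does not mask dplyr's summarise(), mutate(), and arrange(); the grouped summarise() in the phylum-level summary relies on dplyr's version
library("plyr"); packageVersion("plyr")
library("tidyverse"); packageVersion("tidyverse")
library("reshape2"); packageVersion("reshape2")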
library("phyloseq"); packageVersion("phyloseq")
library("RColorBrewer"); packageVersion("RColorBrewer")
library("vegan"); packageVersion("vegan")
library("gridExtra"); packageVersion("gridExtra")
library("knitr"); packageVersion("knitr")
library("plotly"); packageVersion("plotly")
library("microbiome"); packageVersion("microbiome")
library("ggpubr"); packageVersion("ggpubr")
library("data.table"); packageVersion("data.table")
library("pairwiseAdonis"); packageVersion("pairwiseAdonis")
library("DESeq2"); packageVersion("DESeq2")

# ggplot2 settings
# Set global theming
theme_set(theme_bw(base_size = 12))

# Set a seed value so that exact results can be reproduced
set.seed(1000)

```

## Read in data

```{r initiate-data}
# Load Phyloseq Object
# RDP was selected for its up-to-date, conservative taxonomy. The Greengenes and Silva objects are also valid for analysis and are available in the ./data/ directory, but are not explored here
ps0 <- readRDS("~/Dropbox/Research/human_volunteer/data/ps0.human_volunteer.rdp.RDS")
ps0

# Load mapping file
map <- import_qiime_sample_data("~/Dropbox/Research/human_volunteer/data/mapping_human_volunteer.txt")
ps0 <- merge_phyloseq(ps0, map)
ps0

# View sample variables & generate basic stats
sample_variables(ps0)
sd(sample_sums(ps0))
get_taxa_unique(ps0, "Phylum")
ntaxa(ps0)

```

## Factor reordering and renaming

```{r factor-adjustments}
# Order and update randomization arms factor
sample_data(ps0)$randomization_arm
sample_data(ps0)$randomization_arm <- factor(sample_data(ps0)$randomization_arm, levels = c("No_antibiotics", "Narrow_spectrum_antibiotics", "Broad_spectrum_antibiotics"))
sample_data(ps0)$randomization_arm
sample_data(ps0)$randomization_arm <- factor(sample_data(ps0)$randomization_arm, labels = c("No antibiotics", "Narrow spectrum antibiotics", "Broad spectrum antibiotics"))
sample_data(ps0)$randomization_arm

```
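
The two `factor()` calls above first fix the level order and then rename the levels by position. A minimal base-R sketch on a toy vector (values are illustrative, not taken from the mapping file) shows that the new labels follow the order set by the first call:

```r
# Toy randomization-arm values as they might appear in a mapping file
arm_toy <- c("Broad_spectrum_antibiotics", "No_antibiotics",
             "Narrow_spectrum_antibiotics", "No_antibiotics")

# Step 1: fix the level order (control first, then increasing spectrum)
arm_toy <- factor(arm_toy, levels = c("No_antibiotics",
                                      "Narrow_spectrum_antibiotics",
                                      "Broad_spectrum_antibiotics"))

# Step 2: relabel; labels are matched to the existing levels by position
arm_toy <- factor(arm_toy, labels = c("No antibiotics",
                                      "Narrow spectrum antibiotics",
                                      "Broad spectrum antibiotics"))

levels(arm_toy)
as.character(arm_toy)
```

One caveat: the second call takes its levels from the values actually present, so an arm with no samples would make the `labels` vector too long and `factor()` would stop with an error.
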

## Sample assessment

```{r sample-removal-identification}
# Format a data table to combine sample summary data with sample variable data
ss <- sample_sums(ps0)
sd <- as.data.frame(sample_data(ps0)) # useful to coerce the phyloseq object into an R data frame
ss.df <- merge(sd, data.frame("ASV" = ss), by ="row.names") # merge the per-sample read totals (ss) with sd by row names; the totals are stored in a column named ASV

# Plot the data by the treatment variable
y = 100 # Minimum number of reads per sample to accept; start with a rough guess and refine it after viewing the plot
x = "randomization_arm" # x-axis variable to examine
label = "Description" # Sample-specific variable intended for labelling points below threshold y (not drawn by the plot below)

# Plot
p.ss.boxplot<- ggplot(ss.df, aes_string(x, y = "ASV")) + # x is what you assigned it above
  geom_boxplot(outlier.colour="NA", aes(group = randomization_arm)) +
  scale_y_log10() +
  geom_hline(yintercept = y, lty = 2) + # Draws a dashed line across the threshold you set above as y
  geom_jitter(alpha = 0.6, width = 0.15, size = 3) +
  labs(y = "Reads per sample (log10 scale)") +
  facet_grid(~day) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  theme(legend.position = "none") +
  theme(axis.title.x = element_blank())
p.ss.boxplot

```

## Outlier sample removal

```{r sample-removal}
# Save samples with fewer than y reads (below the selected threshold)
ps.failed <- prune_samples(sample_sums(ps0) < y, ps0)
nsamples(ps.failed)

# Create a table of failed sample information
failed.samples <- data.frame(sample = sample_names(ps.failed),
                             sample_ID = sample_data(ps.failed)$sample_ID,
                             tube_ID = sample_data(ps.failed)$tube_ID)
write.table(failed.samples, file = "./results/failed_samples.txt", sep = "\t")

# Remove samples with fewer than y reads
ps0
ps0 <- prune_samples(sample_sums(ps0) >=y, ps0)
ps0
min(sample_sums(ps0))

```
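
The threshold step amounts to comparing each sample's total read count with `y`. The base-R sketch below uses toy counts with samples stored as rows (an orientation assumed for illustration) and reproduces the split into failed and retained samples that `prune_samples()` performs above:

```r
# Toy ASV count matrix: rows are samples, columns are ASVs (made-up values)
toy_counts <- matrix(c(500, 20,  0,
                         3,  2,  1,
                        80, 90, 40,
                         0,  0,  0),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("S", 1:4), paste0("ASV", 1:3)))

min_reads <- 100                 # plays the role of y above
depth <- rowSums(toy_counts)     # per-sample totals, like sample_sums()

failed <- names(depth)[depth < min_reads]           # logged and set aside
kept   <- toy_counts[depth >= min_reads, , drop = FALSE]

depth
failed
rownames(kept)
```
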

## Taxon cleaning

```{r taxon-cleaning}
# Keep only sequences classified as Bacteria, removing those classified as mitochondria or chloroplasts
# Note: taxa with NA at Phylum, Class, or Family also fail the != tests, so subset_taxa() removes them too
ps0 # Check the number of taxa prior to removal
ps1 <- ps0 %>%
  subset_taxa(
    Kingdom == "Bacteria" &
    Family  != "mitochondria" &
    Class   != "Chloroplast" &
    Phylum != "Cyanobacteria/Chloroplast"
  )
ps1 # Confirm that the taxa were removed

```
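
One consequence of the `!=` comparisons is easy to miss: for taxa with an unassigned (NA) rank, the condition evaluates to NA, and `subset_taxa()`, which filters the taxonomy table through base `subset()`, drops those rows as well. A base-R sketch on a toy taxonomy table illustrates the difference; the NA-tolerant variant is a hypothetical alternative, and whether unclassified taxa should be kept is an analysis decision this sketch does not settle:

```r
# Toy taxonomy table; NA marks a rank the classifier could not assign
tax_toy <- data.frame(
  Kingdom = c("Bacteria", "Bacteria", "Bacteria", "Archaea"),
  Class   = c("Clostridia", NA, "Chloroplast", "Methanobacteria"),
  Family  = c("Lachnospiraceae", NA, NA, NA),
  row.names = paste0("ASV", 1:4),
  stringsAsFactors = FALSE
)

# Filter written as in the chunk above: a comparison with NA yields NA,
# and subset() drops rows whose condition is NA, so unclassified ASV2 is lost
strict <- subset(tax_toy, Kingdom == "Bacteria" &
                   Family != "mitochondria" &
                   Class != "Chloroplast")

# NA-tolerant variant (hypothetical alternative): unassigned ranks are kept
lenient <- subset(tax_toy, Kingdom == "Bacteria" &
                    (is.na(Family) | Family != "mitochondria") &
                    (is.na(Class)  | Class  != "Chloroplast"))

rownames(strict)   # "ASV1"
rownames(lenient)  # "ASV1" "ASV2"
```
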

## Data transformations

```{r data-transform}
# Some of these are not used in subsequent analysis, but are coded as a chunk here for convenience
# Transform to relative abundances
ps1.ra <- transform_sample_counts(ps1, function(OTU) OTU/sum(OTU))

```

## Subsetting

```{r subsetting}
# No antibiotics (control)
ps1.con <- subset_samples(ps1, randomization_arm == "No antibiotics")
ps1.con
ps1.con <- prune_samples(sample_sums(ps1.con) > 0, ps1.con)
ps1.con
ps1.con.log <- transform_sample_counts(ps1.con, function(x) log(1 + x))

# Narrow spectrum antibiotics
ps1.narrow <- subset_samples(ps1, randomization_arm == "Narrow spectrum antibiotics")
ps1.narrow
ps1.narrow <- prune_samples(sample_sums(ps1.narrow) > 0, ps1.narrow)
ps1.narrow
ps1.narrow.log <- transform_sample_counts(ps1.narrow, function(x) log(1 + x))

# Broad spectrum antibiotics
ps1.broad <- subset_samples(ps1, randomization_arm == "Broad spectrum antibiotics")
ps1.broad
ps1.broad <- prune_samples(sample_sums(ps1.broad) > 0, ps1.broad)
ps1.broad
ps1.broad.log <- transform_sample_counts(ps1.broad, function(x) log(1 + x))

# Antibiotics
ps1.abx <- subset_samples(ps1, antibiotics == "Antibiotics")
ps1.abx
ps1.abx <- prune_samples(sample_sums(ps1.abx) > 0, ps1.abx)
ps1.abx
ps1.abx.log <- transform_sample_counts(ps1.abx, function(x) log(1 + x))

```

## Community composition plotting

These plots were not included in the final manuscript but remain here as compositional overviews. Interactive plots are also produced.

```{r community-composition-plots}
# Phyla level plots
# Melt to long format (for ggploting) 
# Prune out phyla below % in each sample

# Set propotional threshold for taxa to be removed
threshold = 0.03

ps1_phylum <- ps1 %>%
  tax_glom(taxrank = "Phylum") %>%                     # agglomerate at phylum level
  transform_sample_counts(function(x) {x/sum(x)} ) %>% # Transform to rel. abundance
  psmelt() %>%                                         # Melt to long format
  filter(Abundance > threshold) %>%                    # Filter out low abundance taxa
  arrange(Phylum)                                      # Sort data frame alphabetically by phylum

# Convert patient_ID to a factor so ggplot2 treats it as a discrete x-axis variable
ps1_phylum$patient_ID <- as.factor(ps1_phylum$patient_ID)

p.comm.bar <- ggplot(ps1_phylum, aes(x = patient_ID, y = Abundance, fill = Phylum)) + 
  geom_bar(stat = "identity", width = 0.9) +
  facet_wrap(randomization_arm~day, scales = "free_x") +
  ggtitle("Community Composition Over Time and Randomization Arm") +
  labs(y = "Relative Abundance") +
  theme(legend.position = "bottom") +
  theme(axis.text.x = element_blank()) +
  theme(axis.title.x = element_blank()) +
  scale_fill_brewer(palette = "Dark2")
p.comm.bar

# Draw interactive plot
ggplotly(p.comm.bar)

```
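
The pipeline above agglomerates ASVs to phyla, converts each sample to proportions, and then drops phylum-sample entries at or below `threshold`. A base-R sketch of the same three steps on a toy matrix, with `rowsum()` standing in for `tax_glom()` and all values made up, is:

```r
# Toy ASV counts (rows = ASVs, columns = samples) and each ASV's phylum
otu_toy <- matrix(c(50, 10,
                    30, 60,
                    15, 29,
                     5,  1),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("ASV", 1:4), c("S1", "S2")))
phylum_toy <- c("Firmicutes", "Bacteroidetes", "Firmicutes", "Tenericutes")

# 1) Agglomerate: sum the counts of ASVs sharing a phylum
glom_toy <- rowsum(otu_toy, group = phylum_toy)

# 2) Convert each sample (column) to relative abundances
ra_toy <- sweep(glom_toy, 2, colSums(glom_toy), "/")

# 3) Long format, keeping only phylum-sample entries above the threshold
threshold <- 0.03
long_toy <- data.frame(Phylum    = rep(rownames(ra_toy), times = ncol(ra_toy)),
                       Sample    = rep(colnames(ra_toy), each = nrow(ra_toy)),
                       Abundance = as.vector(ra_toy))
long_toy <- long_toy[long_toy$Abundance > threshold, ]
long_toy
```

Because entries are removed sample by sample, the stacked bars in `p.comm.bar` can sum to slightly less than 1 for samples that contain several low-abundance phyla.
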

## Phyla-level summary plots

```{r phyla-level-boxplots}
# Agglomerate taxa
glom <- tax_glom(ps1.ra, taxrank = 'Phylum')

# Create dataframe from phyloseq object
dat <- as_tibble(psmelt(glom))

# Set comparisons
my_comparisons.1 <- list( c("No antibiotics", "Narrow spectrum antibiotics"), c("No antibiotics", "Broad spectrum antibiotics"), c("Narrow spectrum antibiotics", "Broad spectrum antibiotics") )
my_comparisons.2 <- list( c("-9", "0"), c("-9", "7"), c("0", "7") )

# Table of mean abundances per Phylum
dat.summary <- dat %>%
  group_by(Phylum) %>%
  summarise(mean = mean(Abundance)) %>%
  arrange(desc(mean))
ggplot(dat.summary, aes(x = fct_reorder(Phylum, mean), y = mean)) +
  geom_point() +
  coord_flip()

# Reorder
levels(dat$Phylum)
dat.reorder <- dat %>%
  mutate(Phylum = reorder(Phylum, Abundance, mean))
levels(dat.reorder$Phylum)

# Determine how mean abundances of Phyla are altered by treatment
p.boxplot.phylum.1 <- ggboxplot(dat.reorder, x = "randomization_arm", y = "Abundance", outlier.shape = NA) +
  geom_jitter(width = 0.2) +
  labs(title = "Figure S3", y = "Relative Abundance", x = "") +
  coord_cartesian(ylim = c(0, 1.5)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  stat_compare_means(comparisons = my_comparisons.1, label = "p.signif", hide.ns = TRUE) +
  facet_grid(day~Phylum)
p.boxplot.phylum.1
# Note: For the figure in the manuscript we manually replaced the asterisks and removed the ns for aesthetic purposes

# Subset to phyla whose mean abundances were altered by treatment
dat.1 <- filter(dat.reorder, Phylum %in% c("Firmicutes",
                                   "Bacteroidetes",
                                   "Proteobacteria",
                                   "Verrucomicrobia",
                                   "Actinobacteria",
                                   "Fusobacteria"))
levels(dat.1$Phylum)
dat.1 <- droplevels(dat.1)
levels(dat.1$Phylum)

# Boxplot with treatment altered phyla
p.boxplot.phylum.2 <- ggboxplot(dat.1, x = "randomization_arm", y = "Abundance", outlier.shape = NA) +
  geom_jitter(width = 0.2) +
  labs(title = "Figure S3", y = "Relative Abundance", x = "") +
  coord_cartesian(ylim = c(0, 1.5)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  stat_compare_means(comparisons = my_comparisons.1, label = "p.format", hide.ns = TRUE) +
  facet_grid(day~Phylum)
p.boxplot.phylum.2
# Note: For the figure in the manuscript we manually replaced the asterisks and removed the ns for aesthetic purposes

# Not included in the final manuscript, but referenced in the text
p.boxplot.phylum.3 <- ggpaired(dat.1, x = "day", y = "Abundance", outlier.shape = NA, id = "patient_ID") +
  geom_jitter(width = 0.2) +
  labs(y = "Relative Abundance") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  stat_compare_means(comparisons = my_comparisons.2, hide.ns = FALSE, label = "p.signif") +
  facet_grid(randomization_arm~Phylum) +
  coord_cartesian(ylim = c(0, 1.5))
p.boxplot.phylum.3

```
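
As we understand ggpubr's defaults, `stat_compare_means(comparisons = ...)` without a `method` runs a separate two-sided Wilcoxon rank-sum test for each listed pair and reports unadjusted p-values. The sketch below reproduces that logic in base R on simulated data (illustrative values only) and adds an optional Holm adjustment across the three pairs, which the plots above do not apply:

```r
set.seed(1)
# Simulated relative abundances of one phylum on one day, three arms
dat_toy <- data.frame(
  arm = rep(c("No antibiotics", "Narrow spectrum antibiotics",
              "Broad spectrum antibiotics"), each = 8),
  Abundance = c(runif(8, 0.4, 0.6), runif(8, 0.1, 0.3), runif(8, 0.0, 0.1))
)

comparisons_toy <- list(c("No antibiotics", "Narrow spectrum antibiotics"),
                        c("No antibiotics", "Broad spectrum antibiotics"),
                        c("Narrow spectrum antibiotics", "Broad spectrum antibiotics"))

# One Wilcoxon rank-sum test per pair of arms
p_raw <- sapply(comparisons_toy, function(pair) {
  a <- dat_toy$Abundance[dat_toy$arm == pair[1]]
  b <- dat_toy$Abundance[dat_toy$arm == pair[2]]
  wilcox.test(a, b)$p.value
})
names(p_raw) <- sapply(comparisons_toy, paste, collapse = " vs ")

# Optional multiple-testing correction across the pairs
p_adj <- p.adjust(p_raw, method = "holm")
signif(cbind(p_raw, p_adj), 3)
```
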

```{r phyla-smoother-plots}
# Reorder phyla by decreasing mean abundance for legend consistency
levels(dat.1$Phylum)
dat.1.reorder <- dat.1 %>%
  mutate(Phylum = reorder(Phylum, Abundance, function(x) -mean(x)))
levels(dat.1.reorder$Phylum)

p.gam.phylum <- ggplot(dat.1.reorder, aes(x = day, y = Abundance, color = Phylum)) +
  stat_smooth(method = "gam", formula = y ~ s(x, bs = "cr", k = 3)) +
  labs(y = "Relative Abundance", x = "Day") +
  scale_x_continuous(breaks = c(-9,0,7)) +
  coord_cartesian(ylim = c(0, 1)) +
  geom_jitter(size = 3, alpha = 0.4, width = 1.5) +
  facet_grid(~randomization_arm)
p.gam.phylum

```
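
Negating the mean inside `reorder()` sorts phyla from most to least abundant, which sets the legend and colour order of `p.gam.phylum`. A toy base-R example with illustrative values makes the ordering explicit:

```r
# Toy long-format data: relative abundance of three phyla in three samples
dat_toy <- data.frame(
  Phylum = rep(c("Actinobacteria", "Bacteroidetes", "Firmicutes"), times = 3),
  Abundance = c(0.05, 0.30, 0.65,
                0.02, 0.40, 0.58,
                0.08, 0.20, 0.72)
)

# reorder() sorts levels by increasing value of the summary function,
# so -mean(x) puts the most abundant phylum first
dat_toy$Phylum <- reorder(factor(dat_toy$Phylum), dat_toy$Abundance,
                          function(x) -mean(x))
levels(dat_toy$Phylum)  # "Firmicutes" "Bacteroidetes" "Actinobacteria"
```
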

## Alpha diversity

```{r add-sample-data}
# Diversity calculations
diversity <- global(ps1)
head(diversity)

# Bind sample data to diversity data
sd.1 <- as.data.frame(sample_data(ps1)) # useful to coerce the phyloseq object into an R data frame
ps1.rich <- merge(sd.1, diversity, by ="row.names") # merge sd.1 by row names

```
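
We assume that the `richness_0` and `diversities_shannon` columns returned by `microbiome::global()` correspond to observed richness (ASVs with a count above zero) and the natural-log Shannon index; the package documentation should be checked to confirm this. Under that assumption, both metrics can be sketched in base R on two toy samples with the same richness but different evenness:

```r
# Toy ASV counts for two samples (rows = samples, columns = ASVs)
counts_toy <- rbind(even   = c(25, 25, 25, 25, 0),
                    skewed = c(97,  1,  1,  1, 0))

# Observed richness: number of ASVs detected (count > 0)
richness <- rowSums(counts_toy > 0)

# Shannon index H = -sum(p * log(p)) over detected ASVs (natural log)
shannon <- apply(counts_toy, 1, function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
})

richness
round(shannon, 3)
```

Both samples have four detected ASVs, yet the skewed one has a much lower Shannon index, so the two metrics can diverge.
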

```{r paired-richness}
my.comparisons.paired <- list(c("-9", "0"), c("-9", "7"), c("0","7"))

p.paired.rich.1 <- ggpaired(ps1.rich, x = "day", y = "richness_0", outlier.shape = NA, id = "patient_ID") +
  geom_jitter(width = 0.2) +
  labs(y = "Richness") +
  theme(axis.title.x = element_blank()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_grid(~randomization_arm) +
  stat_compare_means(label = "p.signif", method = "t.test", ref.group = "-9", hide.ns = TRUE) +
  theme(axis.text.x = element_blank())

p.paired.sd.1 <- ggpaired(ps1.rich, x = "day", y = "diversities_shannon", outlier.shape = NA, id = "patient_ID") +
  geom_jitter(width = 0.2) +
  labs(y = "Shannon Diversity") +
  theme(axis.title.x = element_blank()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_grid(~randomization_arm) +
  stat_compare_means(label = "p.signif", method = "t.test", ref.group = "-9", hide.ns = TRUE)

ggarrange(p.paired.rich.1, p.paired.sd.1, nrow = 2, labels = c("A)", "B)"))

# Alpha diversity comparisons between treatments
p.paired.rich.2 <- ggboxplot(ps1.rich, x = "randomization_arm", y = "richness_0", outlier.shape = NA, id = "patient_ID") +
  geom_jitter(width = 0.2) +
  labs(y = "Richness") +
  theme(axis.title.x = element_blank()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_grid(~day) +
  stat_compare_means(label = "p.format", comparisons = my_comparisons.1, hide.ns = TRUE) +
  theme(axis.text.x = element_blank())

p.paired.sd.2 <- ggboxplot(ps1.rich, x = "randomization_arm", y = "diversities_shannon", outlier.shape = NA, id = "patient_ID") +
  geom_jitter(width = 0.2) +
  labs(y = "Shannon Diversity") +
  theme(axis.title.x = element_blank()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  facet_grid(~day) +
  stat_compare_means(label = "p.format", comparisons = my_comparisons.1, hide.ns = TRUE)

ggarrange(p.paired.rich.2, p.paired.sd.2, nrow = 2, labels = c("A)", "B)"))
# Note: This figure was not included in the final manuscript, but the analysis and p-values were referenced in the text

```

```{r alpha-div-day7-roto-boost-alpha-diversity}
# These plots were not included in the final manuscript, but were discussed in the results
p.rich.rota_boost <- ggboxplot(subset(ps1.rich, randomization_arm == "Narrow spectrum antibiotics"), x = "d7_rota_boost_updated", y = "richness_0", outlier.shape = NA) +
  geom_jitter(width = 0.2) +
  ylim(0,250) +
  labs(y = "Richness", title = "Boost") +
  theme(axis.title.x = element_blank()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  stat_compare_means(label = "p.format", method = "t.test", hide.ns = TRUE) +
  facet_grid(~day) +
  theme(axis.text.x = element_blank())

p.sd.rota_boost <- ggboxplot(subset(ps1.rich, randomization_arm == "Narrow spectrum antibiotics"), x = "d7_rota_boost_updated", y = "diversities_shannon", outlier.shape = NA) +
  geom_jitter(width = 0.2) +
  ylim(0,5) +
  labs(y = "Shannon diversity") +
  theme(axis.title.x = element_blank()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  stat_compare_means(label = "p.format", method = "t.test", hide.ns = TRUE) +
  facet_grid(~day)

ggarrange(p.rich.rota_boost, p.sd.rota_boost, nrow = 2, labels = c("A)", "B)"))
# Note: This figure was not included in the final manuscript, but the analysis and p-values were referenced in the text

```

## Alpha diversity and shedding

Shedding was defined as having one or more stool samples per patient positive for rotavirus shedding.
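
This per-patient definition can be derived from per-sample results by asking whether any sample from a patient is positive. A base-R sketch follows; the `stool` data frame and its `rv_positive` column are hypothetical stand-ins for the study's shedding data:

```r
# Hypothetical per-sample rotavirus test results
stool <- data.frame(
  patient_ID  = c(1, 1, 1, 2, 2, 3, 3, 3),
  rv_positive = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE)
)

# A patient counts as shedding if at least one stool sample is positive
shedding <- aggregate(rv_positive ~ patient_ID, data = stool, FUN = any)
names(shedding)[2] <- "Shedding"
shedding
```
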

```{r alpha-div-shedding-alpha-diversity}
p.rich.shedding <- ggboxplot(subset(ps1.rich, randomization_arm == "Narrow spectrum antibiotics"), x = "Shedding", y = "richness_0", outlier.shape = NA) +
  geom_jitter(width = 0.2) +
  ylim(0,250) +
  labs(y = "Richness", title = "Shedding") +
  theme(axis.title.x = element_blank()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  stat_compare_means(label = "p.format", method = "t.test", hide.ns = TRUE) +
  facet_grid(~day) +
  theme(axis.text.x = element_blank())

p.sd.shedding <- ggboxplot(subset(ps1.rich, randomization_arm == "Narrow spectrum antibiotics"), x = "Shedding", y = "diversities_shannon", outlier.shape = NA) +
  geom_jitter(width = 0.2) +
  ylim(0,5) +
  labs(y = "Shannon diversity") +
  theme(axis.title.x = element_blank()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  stat_compare_means(label = "p.format", method = "t.test", hide.ns = TRUE) +
  facet_grid(~day) + # facet by day to match the richness panel instead of pooling all days in one test
  scale_x_discrete(labels = c("No", "Yes"))

ggarrange(p.rich.shedding, p.sd.shedding, nrow = 2, labels = c("A)", "B)"))
# Note: This figure was not included in the final manuscript, but the analysis and p-values were referenced in the text

```

## Beta diversity analysis

```{r beta-div}
# UniFrac
ord.ps1.uni <- ordinate(ps1, method = "PCoA", distance = "unifrac")

# weighted UniFrac
ord.ps1.wuni <- ordinate(ps1, method = "PCoA", distance = "wunifrac")

# PCoA plot - Boost
p.ord.wuni <- plot_ordination(ps1, ord.ps1.wuni, color = "randomization_arm", shape = "d7_rota_boost_updated") +
  geom_point(size=3, alpha = 0.7) +
  scale_color_brewer(palette = "Dark2") +
  geom_point(colour = "grey50", size = 0.5) +
  facet_grid(~ day) +
  labs(color = "Treatment", shape = "Day 7 Rotavirus Boosted") +
  theme(legend.position = "none") +
  geom_hline(yintercept = 0, size = 0.1, lty = 2) +
  geom_vline(xintercept = 0, size = 0.1, lty = 2) +
  theme(panel.grid.major = element_blank()) +
  theme(panel.grid.minor = element_blank())

p.ord.uni <- plot_ordination(ps1, ord.ps1.uni, color = "randomization_arm", shape = "d7_rota_boost_updated") +
  geom_point(size=3, alpha = 0.7) +
  scale_color_brewer(palette = "Dark2") +
  geom_point(colour = "grey50", size = 0.5) +
  facet_grid(~ day) +
  labs(color = "Treatment", shape = "Rotavirus Boosted") +
  theme(legend.position = "none") +
  geom_hline(yintercept = 0, size = 0.1, lty = 2) +
  geom_vline(xintercept = 0, size = 0.1, lty = 2) +
  theme(panel.grid.major = element_blank()) +
  theme(panel.grid.minor = element_blank())

ggarrange(p.ord.uni, p.ord.wuni, nrow = 2)

# Taxa ordination for all phyla
# Note: The final manuscript shows only the 6 most abundant phyla; the code for this is in the following chunk
plot_ordination(ps1, ord.ps1.wuni, color = "Genus", type = "taxa") +
  geom_point(size=2.5) +
  geom_point(colour = "grey50", size = 0.25) +
  labs(color = "Genus") +
  theme(legend.position = "none") +
  facet_wrap(~Phylum, ncol = 4) +
  geom_hline(yintercept = 0, size = 0.1, lty = 2) +
  geom_vline(xintercept = 0, size = 0.1, lty = 2) +
  theme(panel.grid.major = element_blank()) +
  theme(panel.grid.minor = element_blank())

```
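
`ordinate(..., method = "PCoA")` performs principal coordinates analysis on a distance matrix. Classical multidimensional scaling with base R's `cmdscale()` is the same technique, although phyloseq's implementation may differ in details such as the handling of negative eigenvalues. Because UniFrac needs a phylogenetic tree, this sketch swaps in Bray-Curtis dissimilarity computed on toy relative abundances:

```r
# Toy relative-abundance profiles (rows = samples, columns = taxa)
ra_toy <- rbind(A1 = c(0.70, 0.20, 0.10),
                A2 = c(0.65, 0.25, 0.10),
                B1 = c(0.10, 0.30, 0.60),
                B2 = c(0.15, 0.25, 0.60))

# Bray-Curtis dissimilarity between every pair of samples
bray <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
  }
  as.dist(d)
}

# Classical multidimensional scaling = principal coordinates analysis
pcoa_toy <- cmdscale(bray(ra_toy), k = 2, eig = TRUE)

# Percentage of variation on each axis (positive eigenvalues only)
pos <- pcoa_toy$eig[pcoa_toy$eig > 0]
round(100 * pos[1:2] / sum(pos), 1)
pcoa_toy$points
```
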

## Beta diversity with top 6 Phyla

```{r beta-div-top-6-phylum}
# Select only top phyla for display purposes
phylum.sum <- tapply(taxa_sums(ps1), tax_table(ps1)[, "Phylum"], sum, na.rm=TRUE)

# Select the 6 most abundant phyla (to correspond with other figures)
top6phyla <- names(sort(phylum.sum, TRUE))[1:6]
top6phyla
ps1.top6 <- prune_taxa((tax_table(ps1)[, "Phylum"] %in% top6phyla), ps1)
get_taxa_unique(ps1.top6, "Phylum")

# UniFrac
ord.ps1.uni.top6 <- ordinate(ps1.top6, method = "PCoA", distance = "unifrac")

# weighted UniFrac
ord.ps1.wuni.top6 <- ordinate(ps1.top6, method = "PCoA", distance = "wunifrac")

p.ord.species <- plot_ordination(ps1.top6, ord.ps1.wuni, color = "Genus", type = "taxa") +
  geom_point(size=2.5) +
  geom_point(colour = "grey50", size = 0.25) +
  labs(color = "Genus") +
  theme(legend.position = "none") +
  facet_wrap(~Phylum, ncol = 6) +
  geom_hline(yintercept = 0, size = 0.1, lty = 2) +
  geom_vline(xintercept = 0, size = 0.1, lty = 2) +
  theme(panel.grid.major = element_blank()) +
  theme(panel.grid.minor = element_blank())
p.ord.species

```

```{r beta-div-figure-for-manuscript}
ggarrange(p.ord.uni, p.ord.wuni, p.ord.species, nrow = 3, labels = c("A)", "B)", "C)"))
# Note: p.ord.species was re-ordered in Keynote because SH couldn't figure out a simple way to reorder the phyla in PhyloSeq. He is sure there is a more sophisticated way to do this though :)

```
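
`facet_wrap()` lays out panels in the level order of the faceting factor, so the phylum panels can be ordered in code by setting factor levels from the abundance ranking. A base-R sketch with illustrative per-ASV totals:

```r
# Toy per-ASV totals with phylum labels (illustrative values)
asv_totals <- c(ASV1 = 900, ASV2 = 50, ASV3 = 400, ASV4 = 700, ASV5 = 10)
asv_phylum <- c("Firmicutes", "Fusobacteria", "Bacteroidetes",
                "Firmicutes", "Fusobacteria")

# Total abundance per phylum, as tapply(taxa_sums(ps1), ...) does above
phylum_total <- tapply(asv_totals, asv_phylum, sum, na.rm = TRUE)
phylum_order <- names(sort(phylum_total, decreasing = TRUE))

# Setting the levels of the faceting column fixes the panel order
plot_df <- data.frame(Phylum = asv_phylum)
plot_df$Phylum <- factor(plot_df$Phylum, levels = phylum_order)
levels(plot_df$Phylum)
```

For `p.ord.species`, one untested option is to apply the same `factor(..., levels = top6phyla)` step to the `Phylum` column of `p.ord.species$data` before printing, since ggplot2 keeps the plot data in that slot.
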

## PERMANOVA significance testing (ADONIS)

```{r pairwise-ADONIS}
# Create relevant subsets
ps1.d_9 <- subset_samples(ps1, day == "-9")
ps1.d0 <- subset_samples(ps1, day == "0")
ps1.d7 <- subset_samples(ps1, day == "7")

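# pairwise.adonis() defaults to Bray-Curtis dissimilarity with Bonferroni-adjusted p-values, not the UniFrac distances used in the ordinations above
# The first ntaxa() columns hold the ASV counts (samples are assumed to be rows); the last column is randomization_arm

# Day -9
ps1_otu_table.d_9 <- as.data.frame(otu_table(ps1.d_9))
sd.df.d_9 <- as.data.frame(sample_data(ps1.d_9))
ps1_otu_table.d_9$randomization_arm <- sd.df.d_9$randomization_arm

kable(pairwise.adonis(ps1_otu_table.d_9[, 1:ntaxa(ps1.d_9)], ps1_otu_table.d_9$randomization_arm), format = "pandoc", caption = "Day -9")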

# Day 0
ps1_otu_table.d0 <- as.data.frame(otu_table(ps1.d0))
sd.df.d0 <- as.data.frame(sample_data(ps1.d0))
ps1_otu_table.d0$randomization_arm <- sd.df.d0$randomization_arm

kable(pairwise.adonis(ps1_otu_table.d0[, 1:ntaxa(ps1.d0)], ps1_otu_table.d0$randomization_arm), format = "pandoc", caption = "Day 0")

# Day 7
ps1_otu_table.d7 <- as.data.frame(otu_table(ps1.d7))
sd.df.d7 <- as.data.frame(sample_data(ps1.d7))
ps1_otu_table.d7$randomization_arm <- sd.df.d7$randomization_arm

kable(pairwise.adonis(ps1_otu_table.d7[, 1:ntaxa(ps1.d7)], ps1_otu_table.d7$randomization_arm), format = "pandoc", caption = "Day 7")

```
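
`pairwise.adonis()` runs a one-way PERMANOVA for each pair of arms: a pseudo-F statistic is built from squared between-sample distances, and its significance is assessed by permuting the group labels. The compact base-R sketch below illustrates that technique; it uses simulated data and Euclidean distances (`dist()`) rather than the Bray-Curtis default, and it is not a re-implementation of vegan:

```r
# Pseudo-F for a one-way PERMANOVA computed from a distance matrix
permanova_f <- function(D, groups) {
  D2 <- as.matrix(D)^2                      # squared distances
  n <- nrow(D2)
  a <- length(unique(groups))               # number of groups
  ss_total <- sum(D2[upper.tri(D2)]) / n    # total sum of squares
  ss_within <- sum(sapply(split(seq_len(n), groups), function(idx) {
    sub <- D2[idx, idx, drop = FALSE]
    sum(sub[upper.tri(sub)]) / length(idx)  # within-group sum of squares
  }))
  ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))
}

# Permutation p-value: shuffle labels, recompute F, count values >= observed
permanova_test <- function(D, groups, perm = 999) {
  f_obs <- permanova_f(D, groups)
  f_perm <- replicate(perm, permanova_f(D, sample(groups)))
  list(F = f_obs, p = (sum(f_perm >= f_obs) + 1) / (perm + 1))
}

set.seed(1000)
# Simulated profiles: two groups of 6 samples with clearly different means
x_toy <- rbind(matrix(rnorm(18, mean = 0), ncol = 3),
               matrix(rnorm(18, mean = 5), ncol = 3))
grp_toy <- rep(c("No antibiotics", "Broad spectrum antibiotics"), each = 6)
res_toy <- permanova_test(dist(x_toy), grp_toy)
res_toy
```

With one-dimensional Euclidean data the pseudo-F reduces to the ordinary one-way ANOVA F statistic, which is a convenient check of the sums of squares.
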

# Differential Abundance Testing

## Pairwise comparisons across treatments at each sampling time

- Control vs. narrow spectrum at 3 time points (-9, 0, 7)
- Control vs. broad spectrum at 3 time points (-9, 0, 7)
- Narrow vs. broad spectrum at 3 time points (-9, 0, 7)

**Total:** 9 comparisons
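
The nine comparisons are the three pairs of arms crossed with the three sampling days. A short base-R sketch enumerates them:

```r
arms <- c("No antibiotics", "Narrow spectrum antibiotics",
          "Broad spectrum antibiotics")
days <- c(-9, 0, 7)

# All unordered pairs of arms (3 pairs), crossed with the 3 sampling days
arm_pairs <- t(combn(arms, 2))
comparison_grid <- data.frame(
  group1 = rep(arm_pairs[, 1], times = length(days)),
  group2 = rep(arm_pairs[, 2], times = length(days)),
  day    = rep(days, each = nrow(arm_pairs)),
  stringsAsFactors = FALSE
)
nrow(comparison_grid)  # 9
```

Looping over a table like this is one way to avoid repeating the DESeq2 chunk for every comparison; the chunks below keep one explicit chunk per comparison.
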

## Differential abundance testing preparation
```{r diffab-treatment-pairwise-prep}
# Set alpha
alpha = 0.05

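# Calculate geometric means before estimating size factors
# DESeq2's default estimator fails when every ASV has at least one zero count, as is typical of sparse 16S data; this version excludes zeros from the product but still divides by the total number of samples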
gm_mean = function(x, na.rm=TRUE){
  exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
}

# Generate subsets
# Pairwise treatment groups
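ps1.CvN <- subset_samples(ps1, randomization_arm != "Broad spectrum antibiotics")
# Drop the unused arm level so the DESeq2 design contains only the two compared arms
sample_data(ps1.CvN)$randomization_arm <- droplevels(sample_data(ps1.CvN)$randomization_arm)
sample_data(ps1.CvN)$randomization_arm

ps1.CvB <- subset_samples(ps1, randomization_arm != "Narrow spectrum antibiotics")
sample_data(ps1.CvB)$randomization_arm <- droplevels(sample_data(ps1.CvB)$randomization_arm)
sample_data(ps1.CvB)$randomization_arm

ps1.NvB <- subset_samples(ps1, randomization_arm != "No antibiotics")
sample_data(ps1.NvB)$randomization_arm <- droplevels(sample_data(ps1.NvB)$randomization_arm)
sample_data(ps1.NvB)$randomization_arm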

# Day subsets
ps1.CvN
ps1.CvN_9 <- subset_samples(ps1.CvN, day == "-9")
ps1.CvN_9
sample_data(ps1.CvN_9)$day

ps1.CvN.0 <- subset_samples(ps1.CvN, day == "0")
ps1.CvN.0
sample_data(ps1.CvN.0)$day

ps1.CvN.7 <- subset_samples(ps1.CvN, day == "7")
ps1.CvN.7
sample_data(ps1.CvN.7)$day

# Control vs Broad by day
ps1.CvB_9 <- subset_samples(ps1.CvB, day == "-9")
ps1.CvB_9
sample_data(ps1.CvB_9)$day

ps1.CvB.0 <- subset_samples(ps1.CvB, day == "0")
ps1.CvB.0
sample_data(ps1.CvB.0)$day

ps1.CvB.7 <- subset_samples(ps1.CvB, day == "7")
ps1.CvB.7
sample_data(ps1.CvB.7)$day

# Narrow vs Broad by day
ps1.NvB_9 <- subset_samples(ps1.NvB, day == "-9")
ps1.NvB_9
sample_data(ps1.NvB_9)$day

ps1.NvB.0 <- subset_samples(ps1.NvB, day == "0")
ps1.NvB.0
sample_data(ps1.NvB.0)$day

ps1.NvB.7 <- subset_samples(ps1.NvB, day == "7")
ps1.NvB.7
sample_data(ps1.NvB.7)$day

```
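
To see why the custom `gm_mean()` is needed, the sketch below uses a toy count matrix in which every ASV has at least one zero. The ordinary geometric mean is then zero for every row, which is the situation in which DESeq2 reports that every gene contains at least one zero. `gm_mean()` keeps the geometric means positive, and the size factors below follow the median-of-ratios idea over non-zero counts; DESeq2's own implementation may differ in details such as a final rescaling:

```r
# Geometric mean as defined in the prep chunk: zeros are excluded from the
# product but still counted in the denominator (i.e. treated as 1s)
gm_mean <- function(x, na.rm = TRUE) {
  exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
}

# Toy counts: rows = ASVs, columns = samples; every ASV has a zero somewhere
cts <- matrix(c(10, 0, 20,
                 0, 5, 10,
                 4, 8,  0),
              nrow = 3, byrow = TRUE)

exp(rowMeans(log(cts)))   # ordinary geometric means: all 0
geo <- apply(cts, 1, gm_mean)

# Median-of-ratios size factors, using only non-zero counts in each sample
size_factors <- apply(cts, 2, function(col) {
  ok <- col > 0 & geo > 0
  median(col[ok] / geo[ok])
})
size_factors
```
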

## Control vs Narrow Spectrum Day -9

```{r diff-ab-control-v-narrow-d.9}
# Differential Abundance Testing
sample_data(ps1.CvN_9)$randomization_arm
sample_data(ps1.CvN_9)$day
ds.NoAbx.Narrow.d_9 <- phyloseq_to_deseq2(ps1.CvN_9, ~randomization_arm)

geoMeans.ds.NoAbx.Narrow.d_9 <- apply(counts(ds.NoAbx.Narrow.d_9), 1, gm_mean)
ds.NoAbx.Narrow.d_9 <- estimateSizeFactors(ds.NoAbx.Narrow.d_9, geoMeans = geoMeans.ds.NoAbx.Narrow.d_9)
dds.NoAbx.Narrow.d_9 <- DESeq(ds.NoAbx.Narrow.d_9, test="Wald", fitType="local", betaPrior = FALSE)

# Tabulate and write results
res.dds.NoAbx.Narrow.d_9 = results(dds.NoAbx.Narrow.d_9, cooksCutoff = FALSE)
sigtab_dds.dds.NoAbx.Narrow.d_9 = res.dds.NoAbx.Narrow.d_9[which(res.dds.NoAbx.Narrow.d_9$padj < alpha), ]
sigtab_dds.dds.NoAbx.Narrow.d_9 = cbind(as(sigtab_dds.dds.NoAbx.Narrow.d_9, "data.frame"), as(tax_table(ps1.CvN_9)[rownames(sigtab_dds.dds.NoAbx.Narrow.d_9), ], "matrix"))
summary(res.dds.NoAbx.Narrow.d_9)
head(sigtab_dds.dds.NoAbx.Narrow.d_9)
write.table(sigtab_dds.dds.NoAbx.Narrow.d_9, file="./results/deseq_d_9_NoAbx.Narrow.txt", sep = "\t") # Change filename to save results to appropriate file

# Quick check of factor levels
mcols(res.dds.NoAbx.Narrow.d_9, use.names = TRUE)

# Prepare to join results data.table and taxonomy table
resdt.dds.NoAbx.Narrow.d_9 = data.table(as(results(dds.NoAbx.Narrow.d_9, cooksCutoff = FALSE), "data.frame"),
                   keep.rownames = TRUE)
setnames(resdt.dds.NoAbx.Narrow.d_9, "rn", "OTU")
taxdt.dds.d_9.com = data.table(data.frame(as(tax_table(ps1.CvN_9), "matrix")), keep.rownames = TRUE)
setnames(taxdt.dds.d_9.com, "rn", "OTU")

# Join results data.table and taxonomy table
setkeyv(taxdt.dds.d_9.com, "OTU")
setkeyv(resdt.dds.NoAbx.Narrow.d_9, "OTU")
resdt.dds.NoAbx.Narrow.d_9 <- taxdt.dds.d_9.com[resdt.dds.NoAbx.Narrow.d_9]
resdt.dds.NoAbx.Narrow.d_9
resdt.dds.NoAbx.Narrow.d_9[, Significant := padj < alpha]
resdt.dds.NoAbx.Narrow.d_9[!is.na(Significant)]
resdt.dds.NoAbx.Narrow.d_9

volcano.NoAbx.Narrow.d_9 = ggplot(
  data = resdt.dds.NoAbx.Narrow.d_9[!is.na(Significant)][(pvalue < 1)],
  mapping = aes(x = log2FoldChange,
                y = -log10(pvalue),
                color = Phylum,
                label = OTU, label1 = Genus)) +
  theme_bw() +
  geom_point() + 
  geom_point(data = resdt.dds.NoAbx.Narrow.d_9[(Significant)], size = 7, alpha = 0.7) + 
  # geom_text(data = resdt[(Significant)], mapping = aes(label = paste("Genus:", Genus)), color = "black", size = 3) +
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)) +
  geom_hline(yintercept = -log10(alpha)) +
  ggtitle("DESeq2 Negative Binomial Test Volcano Plot\nDay -9 No Antibiotics vs Narrow Spectrum Antibiotics") +
  theme(axis.title = element_text(size=12)) +
  theme(axis.text = element_text(size=12)) +
  theme(legend.text = element_text(size=12)) +
  geom_vline(xintercept = 0, lty = 2)
volcano.NoAbx.Narrow.d_9
summary(res.dds.NoAbx.Narrow.d_9)
mcols(res.dds.NoAbx.Narrow.d_9, use.names = TRUE)

ggplotly(volcano.NoAbx.Narrow.d_9)

```
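
In the volcano plot, the y-axis and the horizontal line use raw p-values, while the `Significant` flag uses the adjusted `padj`, which `results()` computes with the Benjamini-Hochberg method by default. A small base-R sketch with made-up p-values shows how a point can sit above the line without being flagged:

```r
# Hypothetical raw Wald p-values for six ASVs
pvalue <- c(0.0001, 0.004, 0.02, 0.03, 0.045, 0.5)

# Benjamini-Hochberg adjustment
padj <- p.adjust(pvalue, method = "BH")

alpha <- 0.05
above_line  <- -log10(pvalue) > -log10(alpha)  # plotted above geom_hline()
significant <- padj < alpha                    # the Significant flag

data.frame(pvalue, padj = signif(padj, 3), above_line, significant)
```

The fifth ASV (p = 0.045, padj = 0.054) lies above the line but is not flagged. DESeq2's independent filtering can also leave `padj` as NA; the `!is.na(Significant)` filter in the plot drops those ASVs.
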

## Control vs Narrow Spectrum Day 0

```{r diff-ab-control-v-narrow-d0}
# Differential Abundance Testing
sample_data(ps1.CvN.0)$randomization_arm
sample_data(ps1.CvN.0)$day
ds.NoAbx.Narrow.d0 <- phyloseq_to_deseq2(ps1.CvN.0, ~randomization_arm)

geoMeans.ds.NoAbx.Narrow.d0 <- apply(counts(ds.NoAbx.Narrow.d0), 1, gm_mean)
ds.NoAbx.Narrow.d0 <- estimateSizeFactors(ds.NoAbx.Narrow.d0, geoMeans = geoMeans.ds.NoAbx.Narrow.d0)

dds.NoAbx.Narrow.d0 <- DESeq(ds.NoAbx.Narrow.d0, test="Wald", fitType="local", betaPrior = FALSE)

# Tabulate and write results
res.dds.NoAbx.Narrow.d0 = results(dds.NoAbx.Narrow.d0, cooksCutoff = FALSE)
sigtab_dds.dds.NoAbx.Narrow.d0 = res.dds.NoAbx.Narrow.d0[which(res.dds.NoAbx.Narrow.d0$padj < alpha), ]
sigtab_dds.dds.NoAbx.Narrow.d0 = cbind(as(sigtab_dds.dds.NoAbx.Narrow.d0, "data.frame"), as(tax_table(ps1.CvN.0)[rownames(sigtab_dds.dds.NoAbx.Narrow.d0), ], "matrix"))
summary(res.dds.NoAbx.Narrow.d0)
head(sigtab_dds.dds.NoAbx.Narrow.d0)
write.table(sigtab_dds.dds.NoAbx.Narrow.d0, file="./results/deseq_d0_NoAbx.Narrow.txt", sep = "\t") # Change filename to save results to appropriate file

# Quick check of factor levels
mcols(res.dds.NoAbx.Narrow.d0, use.names = TRUE)

# Prepare to join results data.table and taxonomy table
resdt.dds.NoAbx.Narrow.d0 = data.table(as(results(dds.NoAbx.Narrow.d0, cooksCutoff = FALSE), "data.frame"),
                   keep.rownames = TRUE)
setnames(resdt.dds.NoAbx.Narrow.d0, "rn", "OTU")
taxdt.dds.d0.com = data.table(data.frame(as(tax_table(ps1.CvN.0), "matrix")), keep.rownames = TRUE)
setnames(taxdt.dds.d0.com, "rn", "OTU")

# Join results data.table and taxonomy table
setkeyv(taxdt.dds.d0.com, "OTU")
setkeyv(resdt.dds.NoAbx.Narrow.d0, "OTU")
resdt.dds.NoAbx.Narrow.d0 <- taxdt.dds.d0.com[resdt.dds.NoAbx.Narrow.d0]
resdt.dds.NoAbx.Narrow.d0
resdt.dds.NoAbx.Narrow.d0[, Significant := padj < alpha]
resdt.dds.NoAbx.Narrow.d0[!is.na(Significant)]
resdt.dds.NoAbx.Narrow.d0

volcano.NoAbx.Narrow.d0 = ggplot(
  data = resdt.dds.NoAbx.Narrow.d0[!is.na(Significant)][(pvalue < 1)],
  mapping = aes(x = log2FoldChange,
                y = -log10(pvalue),
                color = Phylum,
                label = OTU, label1 = Genus)) +
  theme_bw() +
  geom_point() + 
  geom_point(data = resdt.dds.NoAbx.Narrow.d0[(Significant)], size = 7, alpha = 0.7) + 
  # geom_text(data = resdt[(Significant)], mapping = aes(label = paste("Genus:", Genus)), color = "black", size = 3) +
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)) +
  geom_hline(yintercept = -log10(alpha)) +
  ggtitle("DESeq2 Negative Binomial Test Volcano Plot\nDay 0 No Antibiotics vs Narrow Spectrum Antibiotics") +
  theme(axis.title = element_text(size=12)) +
  theme(axis.text = element_text(size=12)) +
  theme(legend.text = element_text(size=12)) +
  geom_vline(xintercept = 0, lty = 2)
volcano.NoAbx.Narrow.d0
summary(res.dds.NoAbx.Narrow.d0)
mcols(res.dds.NoAbx.Narrow.d0, use.names = TRUE)

ggplotly(volcano.NoAbx.Narrow.d0)

nrow(res.dds.NoAbx.Narrow.d0)

```

## Control vs Narrow Spectrum Day 7

```{r diff-ab-control-v-narrow-d7}
# Differential Abundance Testing
sample_data(ps1.CvN.7)$randomization_arm
sample_data(ps1.CvN.7)$day
ds.NoAbx.Narrow.d7 <- phyloseq_to_deseq2(ps1.CvN.7, ~randomization_arm)

geoMeans.ds.NoAbx.Narrow.d7 <- apply(counts(ds.NoAbx.Narrow.d7), 1, gm_mean)
ds.NoAbx.Narrow.d7 <- estimateSizeFactors(ds.NoAbx.Narrow.d7, geoMeans = geoMeans.ds.NoAbx.Narrow.d7)

dds.NoAbx.Narrow.d7 <- DESeq(ds.NoAbx.Narrow.d7, test="Wald", fitType="local", betaPrior = FALSE)

# Tabulate and write results
res.dds.NoAbx.Narrow.d7 = results(dds.NoAbx.Narrow.d7, cooksCutoff = FALSE)
sigtab_dds.dds.NoAbx.Narrow.d7 = res.dds.NoAbx.Narrow.d7[which(res.dds.NoAbx.Narrow.d7$padj < alpha), ]
sigtab_dds.dds.NoAbx.Narrow.d7 = cbind(as(sigtab_dds.dds.NoAbx.Narrow.d7, "data.frame"), as(tax_table(ps1.CvN.7)[rownames(sigtab_dds.dds.NoAbx.Narrow.d7), ], "matrix"))
summary(res.dds.NoAbx.Narrow.d7)
head(sigtab_dds.dds.NoAbx.Narrow.d7)
write.table(sigtab_dds.dds.NoAbx.Narrow.d7, file="./results/deseq_d7_NoAbx.Narrow.txt", sep = "\t") # Change filename to save results to appropriate file

# Quick check of factor levels
mcols(res.dds.NoAbx.Narrow.d7, use.names = TRUE)

# Prepare to join results data.table and taxonomy table
resdt.dds.NoAbx.Narrow.d7 = data.table(as(results(dds.NoAbx.Narrow.d7, cooksCutoff = FALSE), "data.frame"),
                   keep.rownames = TRUE)
setnames(resdt.dds.NoAbx.Narrow.d7, "rn", "OTU")
taxdt.dds.d7.com = data.table(data.frame(as(tax_table(ps1.CvN.7), "matrix")), keep.rownames = TRUE)
setnames(taxdt.dds.d7.com, "rn", "OTU")

# Join results data.table and taxonomy table
setkeyv(taxdt.dds.d7.com, "OTU")
setkeyv(resdt.dds.NoAbx.Narrow.d7, "OTU")
resdt.dds.NoAbx.Narrow.d7 <- taxdt.dds.d7.com[resdt.dds.NoAbx.Narrow.d7]
resdt.dds.NoAbx.Narrow.d7
resdt.dds.NoAbx.Narrow.d7[, Significant := padj < alpha]
resdt.dds.NoAbx.Narrow.d7[!is.na(Significant)]
resdt.dds.NoAbx.Narrow.d7

volcano.NoAbx.Narrow.d7 = ggplot(
  data = resdt.dds.NoAbx.Narrow.d7[!is.na(Significant)][(pvalue < 1)],
  mapping = aes(x = log2FoldChange,
                y = -log10(pvalue),
                color = Phylum,
                label = OTU, label1 = Genus)) +
  theme_bw() +
  geom_point() + 
  geom_point(data = resdt.dds.NoAbx.Narrow.d7[(Significant)], size = 7, alpha = 0.7) + 
  # geom_text(data = resdt[(Significant)], mapping = aes(label = paste("Genus:", Genus)), color = "black", size = 3) +
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)) +
  geom_hline(yintercept = -log10(alpha)) +
  ggtitle("DESeq2 Negative Binomial Test Volcano Plot\nDay 7 No Antibiotics vs Narrow Spectrum Antibiotics") +
  theme(axis.title = element_text(size=12)) +
  theme(axis.text = element_text(size=12)) +
  theme(legend.text = element_text(size=12)) +
  geom_vline(xintercept = 0, lty = 2)
volcano.NoAbx.Narrow.d7
summary(res.dds.NoAbx.Narrow.d7)
mcols(res.dds.NoAbx.Narrow.d7, use.names = TRUE)

ggplotly(volcano.NoAbx.Narrow.d7)

```
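
Every chunk computes per-ASV geometric means with `gm_mean` before it calls `estimateSizeFactors()`. `gm_mean` is defined earlier in the document and is not shown in this section. The sketch below assumes it is the zero-tolerant version from the phyloseq DESeq2 vignette: it sums the logs of the non-zero counts but divides by the total number of samples. This matters because sparse 16S tables usually contain at least one zero for every ASV, and DESeq2's default size-factor estimation then stops with an error. The size-factor step is a base-R approximation of median-of-ratios normalization, not DESeq2's own code:

```r
# Zero-tolerant geometric mean (assumed to match gm_mean used above):
# logs of the positive values, divided by the total number of values
gm_mean <- function(x, na.rm = TRUE) {
  exp(sum(log(x[x > 0]), na.rm = na.rm) / length(x))
}

# Toy count matrix: 3 ASVs (rows) x 3 samples (columns)
toy_counts <- matrix(c(10, 20, 40,
                        0,  5, 10,
                        4,  8, 16),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(paste0("ASV", 1:3), paste0("S", 1:3)))

# One geometric mean per ASV, as in apply(counts(ds), 1, gm_mean)
toy_geo <- apply(toy_counts, 1, gm_mean)

# Median-of-ratios size factor per sample, using only ASVs with a
# positive geometric mean and a non-zero count in that sample
toy_size_factors <- apply(toy_counts, 2, function(cnts) {
  keep <- toy_geo > 0 & cnts > 0
  median((cnts / toy_geo)[keep])
})
toy_size_factors
```

Here the geometric means are 20, about 3.68 and 8. The size factors come out at approximately 0.5, 1 and 2, which matches the visible doubling of depth from S1 to S3.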
# Control vs. Broad Spectrum Antibiotics

## Control vs. Broad Spectrum Antibiotics Day -9

```{r diffab-control-v-broad-d.9}
# Differential Abundance Testing
sample_data(ps1.CvB_9)$randomization_arm
sample_data(ps1.CvB_9)$day
ds.NoAbx.Broad.d_9 <- phyloseq_to_deseq2(ps1.CvB_9, ~randomization_arm)

geoMeans.ds.NoAbx.Broad.d_9 <- apply(counts(ds.NoAbx.Broad.d_9), 1, gm_mean)
ds.NoAbx.Broad.d_9 <- estimateSizeFactors(ds.NoAbx.Broad.d_9, geoMeans = geoMeans.ds.NoAbx.Broad.d_9)
dds.NoAbx.Broad.d_9 <- DESeq(ds.NoAbx.Broad.d_9, test="Wald", fitType="local", betaPrior = FALSE)

# Tabulate and write results
res.dds.NoAbx.Broad.d_9 = results(dds.NoAbx.Broad.d_9, cooksCutoff = FALSE)
sigtab_dds.dds.NoAbx.Broad.d_9 = res.dds.NoAbx.Broad.d_9[which(res.dds.NoAbx.Broad.d_9$padj < alpha), ]
sigtab_dds.dds.NoAbx.Broad.d_9 = cbind(as(sigtab_dds.dds.NoAbx.Broad.d_9, "data.frame"), as(tax_table(ps1.CvB_9)[rownames(sigtab_dds.dds.NoAbx.Broad.d_9), ], "matrix"))
summary(res.dds.NoAbx.Broad.d_9)
head(sigtab_dds.dds.NoAbx.Broad.d_9)
write.table(sigtab_dds.dds.NoAbx.Broad.d_9, file="./results/deseq_d_9_NoAbx.Broad.txt", sep = "\t") # Change filename to save results to appropriate file

# Quick check of factor levels
mcols(res.dds.NoAbx.Broad.d_9, use.names = TRUE)

# Prepare to join results data.table and taxonomy table
resdt.dds.NoAbx.Broad.d_9 = data.table(as(results(dds.NoAbx.Broad.d_9, cooksCutoff = FALSE), "data.frame"),
                   keep.rownames = TRUE)
setnames(resdt.dds.NoAbx.Broad.d_9, "rn", "OTU")
taxdt.dds.d_9.com = data.table(data.frame(as(tax_table(ps1.CvB_9), "matrix")), keep.rownames = TRUE)
setnames(taxdt.dds.d_9.com, "rn", "OTU")

# Join results data.table and taxonomy table
setkeyv(taxdt.dds.d_9.com, "OTU")
setkeyv(resdt.dds.NoAbx.Broad.d_9, "OTU")
resdt.dds.NoAbx.Broad.d_9 <- taxdt.dds.d_9.com[resdt.dds.NoAbx.Broad.d_9]
resdt.dds.NoAbx.Broad.d_9
resdt.dds.NoAbx.Broad.d_9[, Significant := padj < alpha]
resdt.dds.NoAbx.Broad.d_9[!is.na(Significant)]
resdt.dds.NoAbx.Broad.d_9

volcano.NoAbx.Broad.d_9 = ggplot(
  data = resdt.dds.NoAbx.Broad.d_9[!is.na(Significant)][(pvalue < 1)],
  mapping = aes(x = log2FoldChange,
                y = -log10(pvalue),
                color = Phylum,
                label = OTU, label1 = Genus)) +
  theme_bw() +
  geom_point() + 
  geom_point(data = resdt.dds.NoAbx.Broad.d_9[(Significant)], size = 7, alpha = 0.7) + 
  # geom_text(data = resdt[(Significant)], mapping = aes(label = paste("Genus:", Genus)), color = "black", size = 3) +
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)) +
  geom_hline(yintercept = -log10(alpha)) +
  ggtitle("DESeq2 Negative Binomial Test Volcano Plot\nDay -9 No Antibiotics vs Broad Spectrum Antibiotics") +
  theme(axis.title = element_text(size=12)) +
  theme(axis.text = element_text(size=12)) +
  theme(legend.text = element_text(size=12)) +
  geom_vline(xintercept = 0, lty = 2)
volcano.NoAbx.Broad.d_9
summary(res.dds.NoAbx.Broad.d_9)
mcols(res.dds.NoAbx.Broad.d_9, use.names = TRUE)

ggplotly(volcano.NoAbx.Broad.d_9)

```
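
Two filtering idioms recur in these chunks: `which(res$padj < alpha)` for the significance table and `[!is.na(Significant)]` before plotting. Both are needed because DESeq2 reports `padj = NA` for ASVs removed by independent filtering, and `pvalue = NA` for ASVs with all-zero counts. A comparison with `NA` returns `NA`, not `FALSE`. The base-R illustration below uses made-up values, with 0.05 standing in for the document's `alpha`:

```r
toy_alpha <- 0.05  # stands in for the document's alpha
toy_res <- data.frame(padj = c(0.001, NA, 0.2, 0.04),
                      row.names = paste0("ASV", 1:4))

toy_sig <- toy_res$padj < toy_alpha
toy_sig  # TRUE NA FALSE TRUE

# Logical indexing keeps the NA position as an all-NA row ...
nrow(toy_res[toy_sig, , drop = FALSE])            # 3
# ... whereas which() returns only the TRUE positions
rownames(toy_res[which(toy_sig), , drop = FALSE]) # "ASV1" "ASV4"
```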
## Control vs. Broad Spectrum Antibiotics Day 0

```{r diffab-control-vs-broad-d0}
# Differential Abundance Testing
sample_data(ps1.CvB.0)$randomization_arm
sample_data(ps1.CvB.0)$day
ds.NoAbx.Broad.d0 <- phyloseq_to_deseq2(ps1.CvB.0, ~randomization_arm)

geoMeans.ds.NoAbx.Broad.d0 <- apply(counts(ds.NoAbx.Broad.d0), 1, gm_mean)
ds.NoAbx.Broad.d0 <- estimateSizeFactors(ds.NoAbx.Broad.d0, geoMeans = geoMeans.ds.NoAbx.Broad.d0)
dds.NoAbx.Broad.d0 <- DESeq(ds.NoAbx.Broad.d0, test="Wald", fitType="local", betaPrior = FALSE)

# Tabulate and write results
res.dds.NoAbx.Broad.d0 = results(dds.NoAbx.Broad.d0, cooksCutoff = FALSE)
sigtab_dds.dds.NoAbx.Broad.d0 = res.dds.NoAbx.Broad.d0[which(res.dds.NoAbx.Broad.d0$padj < alpha), ]
sigtab_dds.dds.NoAbx.Broad.d0 = cbind(as(sigtab_dds.dds.NoAbx.Broad.d0, "data.frame"), as(tax_table(ps1.CvB.0)[rownames(sigtab_dds.dds.NoAbx.Broad.d0), ], "matrix"))
summary(res.dds.NoAbx.Broad.d0)
head(sigtab_dds.dds.NoAbx.Broad.d0)
write.table(sigtab_dds.dds.NoAbx.Broad.d0, file="./results/deseq_d0_NoAbx.Broad.txt", sep = "\t") # Change filename to save results to appropriate file

# Quick check of factor levels
mcols(res.dds.NoAbx.Broad.d0, use.names = TRUE)

# Prepare to join results data.table and taxonomy table
resdt.dds.NoAbx.Broad.d0 = data.table(as(results(dds.NoAbx.Broad.d0, cooksCutoff = FALSE), "data.frame"),
                   keep.rownames = TRUE)
setnames(resdt.dds.NoAbx.Broad.d0, "rn", "OTU")
taxdt.dds.d0.com = data.table(data.frame(as(tax_table(ps1.CvB.0), "matrix")), keep.rownames = TRUE)
setnames(taxdt.dds.d0.com, "rn", "OTU")

# Join results data.table and taxonomy table
setkeyv(taxdt.dds.d0.com, "OTU")
setkeyv(resdt.dds.NoAbx.Broad.d0, "OTU")
resdt.dds.NoAbx.Broad.d0 <- taxdt.dds.d0.com[resdt.dds.NoAbx.Broad.d0]
resdt.dds.NoAbx.Broad.d0
resdt.dds.NoAbx.Broad.d0[, Significant := padj < alpha]
resdt.dds.NoAbx.Broad.d0[!is.na(Significant)]
resdt.dds.NoAbx.Broad.d0

volcano.NoAbx.Broad.d0 = ggplot(
  data = resdt.dds.NoAbx.Broad.d0[!is.na(Significant)][(pvalue < 1)],
  mapping = aes(x = log2FoldChange,
                y = -log10(pvalue),
                color = Phylum,
                label = OTU, label1 = Genus)) +
  theme_bw() +
  geom_point() + 
  geom_point(data = resdt.dds.NoAbx.Broad.d0[(Significant)], size = 7, alpha = 0.7) + 
  # geom_text(data = resdt[(Significant)], mapping = aes(label = paste("Genus:", Genus)), color = "black", size = 3) +
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)) +
  geom_hline(yintercept = -log10(alpha)) +
  ggtitle("DESeq2 Negative Binomial Test Volcano Plot\nDay 0 No Antibiotics vs Broad Spectrum Antibiotics") +
  theme(axis.title = element_text(size=12)) +
  theme(axis.text = element_text(size=12)) +
  theme(legend.text = element_text(size=12)) +
  geom_vline(xintercept = 0, lty = 2)
volcano.NoAbx.Broad.d0
summary(res.dds.NoAbx.Broad.d0)
mcols(res.dds.NoAbx.Broad.d0, use.names = TRUE)

ggplotly(volcano.NoAbx.Broad.d0)

```
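
The `taxdt[resdt]` step is a keyed data.table join. With both tables keyed on `OTU`, `X[Y]` returns every row of `Y` (here, the DESeq2 results) together with the matching taxonomy columns from `X`. Where an ASV has no taxonomy row, those columns are filled with `NA`. The same right-outer join can be written in base R with `merge()`. The sketch below uses hypothetical placeholder taxa:

```r
toy_tax <- data.frame(OTU = c("ASV1", "ASV2", "ASV3"),
                      Phylum = c("Phylum_A", "Phylum_B", "Phylum_A"),
                      stringsAsFactors = FALSE)
toy_res <- data.frame(OTU = c("ASV3", "ASV1", "ASV4"),
                      log2FoldChange = c(-2.1, 1.5, 0.3),
                      stringsAsFactors = FALSE)

# all.y = TRUE keeps every results row, mirroring taxdt[resdt];
# ASV2 (taxonomy only) is dropped and ASV4 (no taxonomy entry) gets NA
toy_joined <- merge(toy_tax, toy_res, by = "OTU", all.x = FALSE, all.y = TRUE)
toy_joined
```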
## Control vs. Broad Spectrum Antibiotics Day 7

```{r diffab-control-vs-broad-d7}
# Differential Abundance Testing
sample_data(ps1.CvB.7)$randomization_arm
sample_data(ps1.CvB.7)$day
ds.NoAbx.Broad.d7 <- phyloseq_to_deseq2(ps1.CvB.7, ~randomization_arm)

geoMeans.ds.NoAbx.Broad.d7 <- apply(counts(ds.NoAbx.Broad.d7), 1, gm_mean)
ds.NoAbx.Broad.d7 <- estimateSizeFactors(ds.NoAbx.Broad.d7, geoMeans = geoMeans.ds.NoAbx.Broad.d7)

dds.NoAbx.Broad.d7 <- DESeq(ds.NoAbx.Broad.d7, test="Wald", fitType="local", betaPrior = FALSE)

# Tabulate and write results
res.dds.NoAbx.Broad.d7 = results(dds.NoAbx.Broad.d7, cooksCutoff = FALSE)
sigtab_dds.dds.NoAbx.Broad.d7 = res.dds.NoAbx.Broad.d7[which(res.dds.NoAbx.Broad.d7$padj < alpha), ]
sigtab_dds.dds.NoAbx.Broad.d7 = cbind(as(sigtab_dds.dds.NoAbx.Broad.d7, "data.frame"), as(tax_table(ps1.CvB.7)[rownames(sigtab_dds.dds.NoAbx.Broad.d7), ], "matrix"))
summary(res.dds.NoAbx.Broad.d7)
head(sigtab_dds.dds.NoAbx.Broad.d7)
write.table(sigtab_dds.dds.NoAbx.Broad.d7, file="./results/deseq_d7_NoAbx.Broad.txt", sep = "\t") # Change filename to save results to appropriate file

# Quick check of factor levels
mcols(res.dds.NoAbx.Broad.d7, use.names = TRUE)

# Prepare to join results data.table and taxonomy table
resdt.dds.NoAbx.Broad.d7 = data.table(as(results(dds.NoAbx.Broad.d7, cooksCutoff = FALSE), "data.frame"),
                   keep.rownames = TRUE)
setnames(resdt.dds.NoAbx.Broad.d7, "rn", "OTU")
taxdt.dds.d7.com = data.table(data.frame(as(tax_table(ps1.CvB.7), "matrix")), keep.rownames = TRUE)
setnames(taxdt.dds.d7.com, "rn", "OTU")

# Join results data.table and taxonomy table
setkeyv(taxdt.dds.d7.com, "OTU")
setkeyv(resdt.dds.NoAbx.Broad.d7, "OTU")
resdt.dds.NoAbx.Broad.d7 <- taxdt.dds.d7.com[resdt.dds.NoAbx.Broad.d7]
resdt.dds.NoAbx.Broad.d7
resdt.dds.NoAbx.Broad.d7[, Significant := padj < alpha]
resdt.dds.NoAbx.Broad.d7[!is.na(Significant)]
resdt.dds.NoAbx.Broad.d7

volcano.NoAbx.Broad.d7 = ggplot(
  data = resdt.dds.NoAbx.Broad.d7[!is.na(Significant)][(pvalue < 1)],
  mapping = aes(x = log2FoldChange,
                y = -log10(pvalue),
                color = Phylum,
                label = OTU, label1 = Genus)) +
  theme_bw() +
  geom_point() + 
  geom_point(data = resdt.dds.NoAbx.Broad.d7[(Significant)], size = 7, alpha = 0.7) + 
  # geom_text(data = resdt[(Significant)], mapping = aes(label = paste("Genus:", Genus)), color = "black", size = 3) +
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)) +
  geom_hline(yintercept = -log10(alpha)) +
  ggtitle("DESeq2 Negative Binomial Test Volcano Plot\nDay 7 No Antibiotics vs Broad Spectrum Antibiotics") +
  theme(axis.title = element_text(size=12)) +
  theme(axis.text = element_text(size=12)) +
  theme(legend.text = element_text(size=12)) +
  geom_vline(xintercept = 0, lty = 2)
volcano.NoAbx.Broad.d7
summary(res.dds.NoAbx.Broad.d7)
mcols(res.dds.NoAbx.Broad.d7, use.names = TRUE)

ggplotly(volcano.NoAbx.Broad.d7)

```
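
In the volcano plots, the y axis is `-log10(pvalue)` and the horizontal line sits at `-log10(alpha)`. The enlarged points, however, are those with `padj < alpha`. DESeq2 adjusts p-values with the Benjamini-Hochberg procedure by default, so a point can sit above the line without being significant. The base-R illustration below uses made-up p-values. It omits the independent filtering that DESeq2 also applies before adjustment:

```r
toy_alpha <- 0.05
toy_pvalue <- c(0.001, 0.03, 0.2, 0.04)

# Benjamini-Hochberg adjustment (DESeq2's default pAdjustMethod)
toy_padj <- p.adjust(toy_pvalue, method = "BH")

# Above the horizontal line vs. actually significant after adjustment
above_line  <- -log10(toy_pvalue) > -log10(toy_alpha)
significant <- toy_padj < toy_alpha

data.frame(pvalue = toy_pvalue, padj = round(toy_padj, 4),
           above_line, significant)
```

Three of the four points clear the line, but only the first survives adjustment. The other two have adjusted p-values of about 0.053. One possible adjustment, not used in the plots above, is to draw the line at the largest raw p-value among significant ASVs, e.g. `-log10(max(pvalue[which(padj < alpha)]))`. That would bring the line roughly into agreement with the highlighted points.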
## Narrow vs. Broad Spectrum Antibiotics Day -9

```{r diffab-narrow-vs-broad-d.9}
# Differential Abundance Testing
sample_data(ps1.NvB_9)$randomization_arm
sample_data(ps1.NvB_9)$day
ds.Narrow.Broad.d_9 <- phyloseq_to_deseq2(ps1.NvB_9, ~randomization_arm)

geoMeans.ds.Narrow.Broad.d_9 <- apply(counts(ds.Narrow.Broad.d_9), 1, gm_mean)
ds.Narrow.Broad.d_9 <- estimateSizeFactors(ds.Narrow.Broad.d_9, geoMeans = geoMeans.ds.Narrow.Broad.d_9)
dds.Narrow.Broad.d_9 <- DESeq(ds.Narrow.Broad.d_9, test="Wald", fitType="local", betaPrior = FALSE)

# Tabulate and write results
res.dds.Narrow.Broad.d_9 = results(dds.Narrow.Broad.d_9, cooksCutoff = FALSE)
sigtab_dds.dds.Narrow.Broad.d_9 = res.dds.Narrow.Broad.d_9[which(res.dds.Narrow.Broad.d_9$padj < alpha), ]
sigtab_dds.dds.Narrow.Broad.d_9 = cbind(as(sigtab_dds.dds.Narrow.Broad.d_9, "data.frame"), as(tax_table(ps1.NvB_9)[rownames(sigtab_dds.dds.Narrow.Broad.d_9), ], "matrix"))
summary(res.dds.Narrow.Broad.d_9)
head(sigtab_dds.dds.Narrow.Broad.d_9)
write.table(sigtab_dds.dds.Narrow.Broad.d_9, file="./results/deseq_d_9_Narrow.Broad.txt", sep = "\t") # Change filename to save results to appropriate file

# Quick check of factor levels
mcols(res.dds.Narrow.Broad.d_9, use.names = TRUE)

# Prepare to join results data.table and taxonomy table
resdt.dds.Narrow.Broad.d_9 = data.table(as(results(dds.Narrow.Broad.d_9, cooksCutoff = FALSE), "data.frame"),
                   keep.rownames = TRUE)
setnames(resdt.dds.Narrow.Broad.d_9, "rn", "OTU")
taxdt.dds.d_9.com = data.table(data.frame(as(tax_table(ps1.NvB_9), "matrix")), keep.rownames = TRUE)
setnames(taxdt.dds.d_9.com, "rn", "OTU")

# Join results data.table and taxonomy table
setkeyv(taxdt.dds.d_9.com, "OTU")
setkeyv(resdt.dds.Narrow.Broad.d_9, "OTU")
resdt.dds.Narrow.Broad.d_9 <- taxdt.dds.d_9.com[resdt.dds.Narrow.Broad.d_9]
resdt.dds.Narrow.Broad.d_9
resdt.dds.Narrow.Broad.d_9[, Significant := padj < alpha]
resdt.dds.Narrow.Broad.d_9[!is.na(Significant)]
resdt.dds.Narrow.Broad.d_9

volcano.Narrow.Broad.d_9 = ggplot(
  data = resdt.dds.Narrow.Broad.d_9[!is.na(Significant)][(pvalue < 1)],
  mapping = aes(x = log2FoldChange,
                y = -log10(pvalue),
                color = Phylum,
                label = OTU, label1 = Genus)) +
  theme_bw() +
  geom_point() + 
  geom_point(data = resdt.dds.Narrow.Broad.d_9[(Significant)], size = 7, alpha = 0.7) + 
  # geom_text(data = resdt[(Significant)], mapping = aes(label = paste("Genus:", Genus)), color = "black", size = 3) +
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)) +
  geom_hline(yintercept = -log10(alpha)) +
  ggtitle("DESeq2 Negative Binomial Test Volcano Plot\nDay -9 Narrow Spectrum vs Broad Spectrum") +
  theme(axis.title = element_text(size=12)) +
  theme(axis.text = element_text(size=12)) +
  theme(legend.text = element_text(size=12)) +
  geom_vline(xintercept = 0, lty = 2)
volcano.Narrow.Broad.d_9
summary(res.dds.Narrow.Broad.d_9)
mcols(res.dds.Narrow.Broad.d_9, use.names = TRUE)

ggplotly(volcano.Narrow.Broad.d_9)

```
## Narrow vs. Broad Spectrum Antibiotics Day 0

```{r diffab-narrow-vs-broad-d0}
# Differential Abundance Testing
sample_data(ps1.NvB.0)$randomization_arm
sample_data(ps1.NvB.0)$day
ds.Narrow.Broad.d0 <- phyloseq_to_deseq2(ps1.NvB.0, ~randomization_arm)

geoMeans.ds.Narrow.Broad.d0 <- apply(counts(ds.Narrow.Broad.d0), 1, gm_mean)
ds.Narrow.Broad.d0 <- estimateSizeFactors(ds.Narrow.Broad.d0, geoMeans = geoMeans.ds.Narrow.Broad.d0)
dds.Narrow.Broad.d0 <- DESeq(ds.Narrow.Broad.d0, test="Wald", fitType="local", betaPrior = FALSE)

# Tabulate and write results
res.dds.Narrow.Broad.d0 = results(dds.Narrow.Broad.d0, cooksCutoff = FALSE)
sigtab_dds.dds.Narrow.Broad.d0 = res.dds.Narrow.Broad.d0[which(res.dds.Narrow.Broad.d0$padj < alpha), ]
sigtab_dds.dds.Narrow.Broad.d0 = cbind(as(sigtab_dds.dds.Narrow.Broad.d0, "data.frame"), as(tax_table(ps1.NvB.0)[rownames(sigtab_dds.dds.Narrow.Broad.d0), ], "matrix"))
summary(res.dds.Narrow.Broad.d0)
head(sigtab_dds.dds.Narrow.Broad.d0)
write.table(sigtab_dds.dds.Narrow.Broad.d0, file="./results/deseq_d0_Narrow.Broad.txt", sep = "\t") # Change filename to save results to appropriate file

# Quick check of factor levels
mcols(res.dds.Narrow.Broad.d0, use.names = TRUE)

# Prepare to join results data.table and taxonomy table
resdt.dds.Narrow.Broad.d0 = data.table(as(results(dds.Narrow.Broad.d0, cooksCutoff = FALSE), "data.frame"),
                   keep.rownames = TRUE)
setnames(resdt.dds.Narrow.Broad.d0, "rn", "OTU")
taxdt.dds.d0.com = data.table(data.frame(as(tax_table(ps1.NvB.0), "matrix")), keep.rownames = TRUE)
setnames(taxdt.dds.d0.com, "rn", "OTU")

# Join results data.table and taxonomy table
setkeyv(taxdt.dds.d0.com, "OTU")
setkeyv(resdt.dds.Narrow.Broad.d0, "OTU")
resdt.dds.Narrow.Broad.d0 <- taxdt.dds.d0.com[resdt.dds.Narrow.Broad.d0]
resdt.dds.Narrow.Broad.d0
resdt.dds.Narrow.Broad.d0[, Significant := padj < alpha]
resdt.dds.Narrow.Broad.d0[!is.na(Significant)]
resdt.dds.Narrow.Broad.d0

volcano.Narrow.Broad.d0 = ggplot(
  data = resdt.dds.Narrow.Broad.d0[!is.na(Significant)][(pvalue < 1)],
  mapping = aes(x = log2FoldChange,
                y = -log10(pvalue),
                color = Phylum,
                label = OTU, label1 = Genus)) +
  theme_bw() +
  geom_point() + 
  geom_point(data = resdt.dds.Narrow.Broad.d0[(Significant)], size = 7, alpha = 0.7) + 
  # geom_text(data = resdt[(Significant)], mapping = aes(label = paste("Genus:", Genus)), color = "black", size = 3) +
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)) +
  geom_hline(yintercept = -log10(alpha)) +
  ggtitle("DESeq2 Negative Binomial Test Volcano Plot\nDay 0 Narrow Spectrum vs Broad Spectrum") +
  theme(axis.title = element_text(size=12)) +
  theme(axis.text = element_text(size=12)) +
  theme(legend.text = element_text(size=12)) +
  geom_vline(xintercept = 0, lty = 2)
volcano.Narrow.Broad.d0
summary(res.dds.Narrow.Broad.d0)
mcols(res.dds.Narrow.Broad.d0, use.names = TRUE)

ggplotly(volcano.Narrow.Broad.d0)

```
## Narrow vs. Broad Spectrum Antibiotics Day 7

```{r diffab-narrow-vs-broad-d7}
# Differential Abundance Testing
sample_data(ps1.NvB.7)$randomization_arm
sample_data(ps1.NvB.7)$day
ds.Narrow.Broad.d7 <- phyloseq_to_deseq2(ps1.NvB.7, ~randomization_arm)

geoMeans.ds.Narrow.Broad.d7 <- apply(counts(ds.Narrow.Broad.d7), 1, gm_mean)
ds.Narrow.Broad.d7 <- estimateSizeFactors(ds.Narrow.Broad.d7, geoMeans = geoMeans.ds.Narrow.Broad.d7)

dds.Narrow.Broad.d7 <- DESeq(ds.Narrow.Broad.d7, test="Wald", fitType="local", betaPrior = FALSE)

# Tabulate and write results
res.dds.Narrow.Broad.d7 = results(dds.Narrow.Broad.d7, cooksCutoff = FALSE)
sigtab_dds.dds.Narrow.Broad.d7 = res.dds.Narrow.Broad.d7[which(res.dds.Narrow.Broad.d7$padj < alpha), ]
sigtab_dds.dds.Narrow.Broad.d7 = cbind(as(sigtab_dds.dds.Narrow.Broad.d7, "data.frame"), as(tax_table(ps1.NvB.7)[rownames(sigtab_dds.dds.Narrow.Broad.d7), ], "matrix"))
summary(res.dds.Narrow.Broad.d7)
head(sigtab_dds.dds.Narrow.Broad.d7)
write.table(sigtab_dds.dds.Narrow.Broad.d7, file="./results/deseq_d7_Narrow.Broad.txt", sep = "\t") # Change filename to save results to appropriate file

# Quick check of factor levels
mcols(res.dds.Narrow.Broad.d7, use.names = TRUE)

# Prepare to join results data.table and taxonomy table
resdt.dds.Narrow.Broad.d7 = data.table(as(results(dds.Narrow.Broad.d7, cooksCutoff = FALSE), "data.frame"),
                   keep.rownames = TRUE)
setnames(resdt.dds.Narrow.Broad.d7, "rn", "OTU")
taxdt.dds.d7.com = data.table(data.frame(as(tax_table(ps1.NvB.7), "matrix")), keep.rownames = TRUE)
setnames(taxdt.dds.d7.com, "rn", "OTU")

# Join results data.table and taxonomy table
setkeyv(taxdt.dds.d7.com, "OTU")
setkeyv(resdt.dds.Narrow.Broad.d7, "OTU")
resdt.dds.Narrow.Broad.d7 <- taxdt.dds.d7.com[resdt.dds.Narrow.Broad.d7]
resdt.dds.Narrow.Broad.d7
resdt.dds.Narrow.Broad.d7[, Significant := padj < alpha]
resdt.dds.Narrow.Broad.d7[!is.na(Significant)]
resdt.dds.Narrow.Broad.d7

volcano.Narrow.Broad.d7 = ggplot(
  data = resdt.dds.Narrow.Broad.d7[!is.na(Significant)][(pvalue < 1)],
  mapping = aes(x = log2FoldChange,
                y = -log10(pvalue),
                color = Phylum,
                label = OTU, label1 = Genus)) +
  theme_bw() +
  geom_point() + 
  geom_point(data = resdt.dds.Narrow.Broad.d7[(Significant)], size = 7, alpha = 0.7) + 
  # geom_text(data = resdt[(Significant)], mapping = aes(label = paste("Genus:", Genus)), color = "black", size = 3) +
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5)) +
  geom_hline(yintercept = -log10(alpha)) +
  ggtitle("DESeq2 Negative Binomial Test Volcano Plot\nDay 7 Narrow Spectrum vs Broad Spectrum") +
  theme(axis.title = element_text(size=12)) +
  theme(axis.text = element_text(size=12)) +
  theme(legend.text = element_text(size=12)) +
  geom_vline(xintercept = 0, lty = 2)
volcano.Narrow.Broad.d7
summary(res.dds.Narrow.Broad.d7)
mcols(res.dds.Narrow.Broad.d7, use.names = TRUE)

ggplotly(volcano.Narrow.Broad.d7)

```
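
The `write.table(..., sep = "\t")` calls keep the ASV sequences as row names. By default, `write.table()` then writes a header with one field fewer than the data rows. R's `read.table()` handles that case, but spreadsheets and many other tools shift every column name one place to the left. Setting `col.names = NA` adds an empty first header cell. The self-contained check below uses a temporary file and toy values:

```r
toy_tab <- data.frame(baseMean = c(120.5, 300), padj = c(0.01, 0.002),
                      row.names = c("ASV1", "ASV2"))

# Count tab-separated fields in the header and first data line
count_fields <- function(path) {
  lines <- readLines(path)
  c(header = length(strsplit(lines[1], "\t")[[1]]),
    row1   = length(strsplit(lines[2], "\t")[[1]]))
}

f_default <- tempfile(fileext = ".txt")
write.table(toy_tab, file = f_default, sep = "\t", quote = FALSE)
count_fields(f_default)  # header 2, row1 3: misaligned

f_na <- tempfile(fileext = ".txt")
write.table(toy_tab, file = f_na, sep = "\t", quote = FALSE, col.names = NA)
count_fields(f_na)       # header 3, row1 3: aligned

# Reading back: the first column holds the ASV IDs
read.delim(f_na, row.names = 1)
```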
## Enrich and compile DESeq2 results tables

```{r diff-abund-tables}
# Control vs. Narrow D_9
nrow(res.dds.NoAbx.Narrow.d_9)
df1 <- as.data.frame(res.dds.NoAbx.Narrow.d_9[ which(res.dds.NoAbx.Narrow.d_9$padj < 0.05), ])
nrow(df1)
df1 <- as.data.frame(df1[ which(df1$baseMean > 100), ])
nrow(df1)
df1 <- rownames_to_column(df1, var = "ASV")
df1$Comparison <- rep("Control_v_Narrow_d_9", nrow(df1)) # rep() avoids the error when no taxa pass the filters
write.table(df1, file = "./results/df1.txt", sep = "\t")

# Control vs. Narrow D0
nrow(res.dds.NoAbx.Narrow.d0)
df2 <- as.data.frame(res.dds.NoAbx.Narrow.d0[ which(res.dds.NoAbx.Narrow.d0$padj < 0.05), ])
nrow(df2)
df2 <- as.data.frame(df2[ which(df2$baseMean > 100), ])
nrow(df2)
df2 <- rownames_to_column(df2, var = "ASV")
df2$Comparison <- "Control_v_Narrow_d0"
write.table(df2, file = "./results/df2.txt", sep = "\t")

# Control vs. Narrow D7
nrow(res.dds.NoAbx.Narrow.d7)
df3 <- as.data.frame(res.dds.NoAbx.Narrow.d7[ which(res.dds.NoAbx.Narrow.d7$padj < 0.05), ])
nrow(df3)
df3 <- as.data.frame(df3[ which(df3$baseMean > 100), ])
nrow(df3)
df3 <- rownames_to_column(df3, var = "ASV")
df3$Comparison <- rep("Control_v_Narrow_d7", nrow(df3)) # rep() avoids the error when no taxa pass the filters
write.table(df3, file = "./results/df3.txt", sep = "\t")

# Control vs. Broad D_9
nrow(res.dds.NoAbx.Broad.d_9)
df4 <- as.data.frame(res.dds.NoAbx.Broad.d_9[ which(res.dds.NoAbx.Broad.d_9$padj < 0.05), ])
nrow(df4)
df4 <- as.data.frame(df4[ which(df4$baseMean > 100), ])
nrow(df4)
df4 <- rownames_to_column(df4, var = "ASV")
df4$Comparison <- "Control_v_Broad_d_9"
write.table(df4, file = "./results/df4.txt", sep = "\t")

# Control vs. Broad D0
nrow(res.dds.NoAbx.Broad.d0)
df5 <- as.data.frame(res.dds.NoAbx.Broad.d0[ which(res.dds.NoAbx.Broad.d0$padj < 0.05), ])
nrow(df5)
df5 <- as.data.frame(df5[ which(df5$baseMean > 100), ])
nrow(df5)
df5 <- rownames_to_column(df5, var = "ASV")
df5$Comparison <- "Control_v_Broad_d0"
write.table(df5, file = "./results/df5.txt", sep = "\t")

# Control vs. Broad D7
nrow(res.dds.NoAbx.Broad.d7)
df6 <- as.data.frame(res.dds.NoAbx.Broad.d7[ which(res.dds.NoAbx.Broad.d7$padj < 0.05), ])
nrow(df6)
df6 <- as.data.frame(df6[ which(df6$baseMean > 100), ])
nrow(df6)
df6 <- rownames_to_column(df6, var = "ASV")
df6$Comparison <- "Control_v_Broad_d7"
write.table(df6, file = "./results/df6.txt", sep = "\t")

# Narrow vs. Broad D_9
nrow(res.dds.Narrow.Broad.d_9)
df7 <- as.data.frame(res.dds.Narrow.Broad.d_9[ which(res.dds.Narrow.Broad.d_9$padj < 0.05), ])
nrow(df7)
df7 <- as.data.frame(df7[ which(df7$baseMean > 100), ])
nrow(df7)
df7 <- rownames_to_column(df7, var = "ASV")
df7$Comparison <- rep("Narrow_v_Broad_d_9", nrow(df7)) # rep() avoids the error when no taxa pass the filters
write.table(df7, file = "./results/df7.txt", sep = "\t")

# Narrow vs. Broad D0
nrow(res.dds.Narrow.Broad.d0)
df8 <- as.data.frame(res.dds.Narrow.Broad.d0[ which(res.dds.Narrow.Broad.d0$padj < 0.05), ])
nrow(df8)
df8 <- as.data.frame(df8[ which(df8$baseMean > 100), ])
nrow(df8)
df8 <- rownames_to_column(df8, var = "ASV")
df8$Comparison <- "Narrow_v_Broad_d0"
write.table(df8, file = "./results/df8.txt", sep = "\t")

# Narrow vs. Broad D7
nrow(res.dds.Narrow.Broad.d7)
df9 <- as.data.frame(res.dds.Narrow.Broad.d7[ which(res.dds.Narrow.Broad.d7$padj < 0.05), ])
nrow(df9)
df9 <- as.data.frame(df9[ which(df9$baseMean > 100), ])
nrow(df9)
df9 <- rownames_to_column(df9, var = "ASV")
df9$Comparison <- "Narrow_v_Broad_d7"
write.table(df9, file = "./results/df9.txt", sep = "\t")

# Combine all differential abundance tables
df.all <- rbind(df1, df2, df3, df4, df5, df6, df7, df8, df9)
nrow(df.all)
write.table(df.all, file = "./results/df_all.txt", sep = "\t")

# Create table of unique differentially abundant ASVs
dfs <- list(df1, df2, df3, df4, df5, df6, df7, df8, df9)
df.unique <- join_all(dfs, type = "full", by = "ASV")
nrow(df.unique)
write.table(df.unique, file = "./results/df_unique.txt", sep = "\t")

```
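
Each of the nine blocks above repeats the same three steps: keep `padj < 0.05`, keep `baseMean > 100`, and move the row names into an `ASV` column. The blocks also skip the `Comparison` label when no taxa pass, because assigning a length-one value to a zero-row data frame errors. `filter_da()` below is a hypothetical helper, not part of the original analysis. It applies the same thresholds and uses `rep()` so the label also works for empty results. It accepts anything `as.data.frame()` can coerce, which includes DESeq2 results objects:

```r
# Hypothetical helper (not in the original analysis)
filter_da <- function(res, comparison, alpha = 0.05, min_base_mean = 100) {
  df <- as.data.frame(res)
  # which() drops rows whose padj is NA
  keep <- which(df$padj < alpha & df$baseMean > min_base_mean)
  df <- df[keep, , drop = FALSE]
  out <- data.frame(ASV = rownames(df), df, row.names = NULL,
                    stringsAsFactors = FALSE)
  # rep(..., nrow(out)) has length 0 for an empty table, so this never errors
  out$Comparison <- rep(comparison, nrow(out))
  out
}

# Toy results table standing in for a DESeq2 results object
toy_res <- data.frame(baseMean = c(500, 50, 250, 1000),
                      log2FoldChange = c(2, -1, 0.5, -3),
                      padj = c(0.01, 0.001, NA, 0.2),
                      row.names = paste0("ASV", 1:4))

hits <- filter_da(toy_res, "Control_v_Broad_d0")
none <- filter_da(toy_res[0, ], "Control_v_Narrow_d7")
rbind(hits, none)  # rbind() drops zero-row data frames
```

With such a helper, each block could shrink to a single call, e.g. `df5 <- filter_da(res.dds.NoAbx.Broad.d0, "Control_v_Broad_d0")`.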
## Bind taxonomy to results

```{r bind-taxonomy-dataframes}
# Load other taxonomies preprocessed with dada2
ps1.rdp <- readRDS("./data/ps0.human_volunteer.rdp.RDS")

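# Create an appropriately formatted taxonomy table
# RDP
# Go through a plain data.frame so the ASV sequences (row names) are kept;
# as_tibble() drops row names by default
ps1.rdp.tax <- as.data.frame(as(tax_table(ps1.rdp), "matrix"), stringsAsFactors = FALSE)
ps1.rdp.tax <- rownames_to_column(ps1.rdp.tax, var = "ASV")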
df.all.rdp <- left_join(df.all, ps1.rdp.tax, by = "ASV")
write.table(df.all.rdp, file = "./results/df_all_rdp.txt", sep = "\t")
df.unique.rdp <- left_join(df.unique, ps1.rdp.tax, by = "ASV")
write.table(df.unique.rdp, file = "./results/df_unique_rdp.txt", sep = "\t")

```
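
The taxonomy join only works if `ps1.rdp.tax` still carries the ASV sequences. In current tibble versions, `as_tibble()` drops row names unless its `rownames` argument is set. `rownames_to_column()` then typically yields `"1"`, `"2"`, and so on, and every taxonomy column comes back `NA`. A cheap guard is to check, after the left join, what fraction of result rows found a taxonomy match. The base-R sketch below uses placeholder IDs:

```r
# Taxonomy matrix with ASV IDs as row names (placeholder values)
toy_tax_mat <- matrix(c("Phylum_A", "Family_A",
                        "Phylum_B", "Family_B"),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("ASV1", "ASV2"), c("Phylum", "Family")))

# Keep the row names by copying them into an explicit ASV column
toy_tax <- data.frame(ASV = rownames(toy_tax_mat), toy_tax_mat,
                      row.names = NULL, stringsAsFactors = FALSE)

toy_hits <- data.frame(ASV = c("ASV2", "ASV9"), log2FoldChange = c(1.2, -0.4),
                       stringsAsFactors = FALSE)

# Left join (all.x = TRUE), like left_join(df.all, ps1.rdp.tax, by = "ASV")
toy_annotated <- merge(toy_hits, toy_tax, by = "ASV", all.x = TRUE)

# Fraction of results that received a taxonomy; values near 0 suggest
# the ASV IDs were lost before the join
match_rate <- mean(!is.na(toy_annotated$Phylum))
match_rate  # 0.5: ASV9 has no taxonomy entry
```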
# Individual ASV plots

```{r ground-truth-plots-prep}
##Ground truth plots
replace_counts = function(physeq, dds) {

  dds_counts = counts(dds, normalized = TRUE)
  if (!identical(taxa_names(physeq), rownames(dds_counts))) {
    stop("OTU ids don't match")
  }
  otu_table(physeq) = otu_table(dds_counts, taxa_are_rows = TRUE)
  return(physeq)

}

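# Make DESeq2-ready object from the full dataset
ds.all <- phyloseq_to_deseq2(ps1, ~randomization_arm)
geoMeans.all <- apply(counts(ds.all), 1, gm_mean)
ds.all <- estimateSizeFactors(ds.all, geoMeans = geoMeans.all)
dds.all <- DESeq(ds.all, test="Wald", betaPrior = TRUE)
# Note: despite the "rlog" name, replace_counts() inserts size-factor-normalized
# counts (counts(dds, normalized = TRUE)), not rlog-transformed values
rlog.all <- replace_counts(ps1, dds.all)
rlog.all <- psmelt(rlog.all)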

# Convert day to numeric (via as.character() so the factor codes are not used as values)
class(rlog.all$day)
rlog.all$day <- as.numeric(as.character(rlog.all$day))
class(rlog.all$day)

# Rename OTU to ASV (plyr syntax: c(old = new); namespaced to avoid dplyr::rename)
rlog.all <- plyr::rename(rlog.all, c("OTU" = "ASV"))

# Select out taxa of interest from rlog.all
rlog.all.rdp <- inner_join(df.all.rdp, rlog.all, by = "ASV")
rlog.unique.rdp <- inner_join(df.unique.rdp, rlog.all, by = "ASV")

# Note: ggplot2 will not draw smoothers for the remaining groups if the first smoother "fails",
# so stat_smooth() is applied to each treatment group separately
p.unique.rdp <- ggplot(rlog.all.rdp, aes(x = day, y = Abundance, group = randomization_arm, color = randomization_arm)) +
  geom_jitter(size = 1, alpha = 0.6, width = 0.5) +
  scale_y_log10() +
  theme(legend.position = "none") +
  facet_wrap(Phylum.x~Family.x) +
  stat_smooth(data = subset(rlog.all.rdp, randomization_arm == "Narrow spectrum antibiotics"), method = "loess") +
  stat_smooth(data = subset(rlog.all.rdp, randomization_arm == "Broad spectrum antibiotics"), method = "loess") +
  stat_smooth(data = subset(rlog.all.rdp, randomization_arm == "No antibiotics"), method = "loess") +
  labs(color = "Treatment", x = "Day", y = "Abundance (normalized counts)")
p.unique.rdp

```
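
`replace_counts()` swaps the phyloseq OTU table for `counts(dds, normalized = TRUE)`. When the model has no gene-specific normalization factors, these values are the raw counts divided column-wise by each sample's size factor. They therefore stay on the count scale, and `scale_y_log10()` only handles the log display. They are not the `rlog()` transform that the object names suggest. The division itself, in base R with toy values:

```r
toy_counts <- matrix(c(10, 20, 40,
                        4,  8, 16),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("ASV1", "ASV2"), c("S1", "S2", "S3")))
toy_size_factors <- c(S1 = 0.5, S2 = 1, S3 = 2)

# Divide each column (sample) by its size factor
toy_normalized <- sweep(toy_counts, 2, toy_size_factors, "/")
toy_normalized
```

If log-scale, variance-stabilized values were wanted instead, the usual route is DESeq2's `rlog()` or `varianceStabilizingTransformation()`, followed by `assay()` to extract the matrix.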

```{r diffab-summary-volcano-plots}
# Combine the comparisons that returned > 0 taxa; the tables share the same
# columns, so they are stacked rather than joined
dfs.2 <- list(df2, df4, df5, df6, df8, df9)
df.all.2 <- do.call(rbind, dfs.2)
nrow(df.all.2)

# Bind taxonomy
df.all.rdp.2 <- left_join(df.all.2, ps1.rdp.tax, by = "ASV")

# Select from rlog normalized data
dfs.all.rdp.2 <- inner_join(df.all.2, rlog.all, by = "ASV")

# Adjust factor levels to order by time point (roughly)
df.all.rdp.2$Comparison <- as.factor(df.all.rdp.2$Comparison)
levels(df.all.rdp.2$Comparison)
df.all.rdp.2$Comparison <- factor(df.all.rdp.2$Comparison, levels = c("Control_v_Broad_d_9", "Control_v_Narrow_d0", "Control_v_Broad_d0", "Narrow_v_Broad_d0", "Control_v_Broad_d7", "Narrow_v_Broad_d7"))
levels(df.all.rdp.2$Comparison)
df.all.rdp.2$Comparison <- factor(df.all.rdp.2$Comparison, labels = c("Control v. Broad: Day -9", "Control v. Narrow: Day 0", "Control v. Broad: Day 0", "Narrow v. Broad: Day 0", "Control v. Broad: Day 7", "Narrow v. Broad: Day 7"))
df.all.rdp.2$Comparison

p.volcano.summary.rdp <- ggplot(df.all.rdp.2, aes(x = log2FoldChange, y = -log10(padj), color = Comparison, size = log10(baseMean))) +
  geom_point(alpha = 0.7) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "dark grey", lwd = 1) +
  facet_wrap(Phylum~Family, ncol = 3) +
  annotate("text", x = -15, y = 27, label = "Control") +
  annotate("text", x = 15, y = 27, label = "Treated") +
  labs(color = "Comparison") +
  labs(x=expression(log[2]*" fold-change")) +
  labs(y=expression(-log[10]*" adjusted p-value")) +
  labs(size=expression(log[10]*" base mean")) +
  scale_color_brewer(palette = "Dark2") +
  scale_y_continuous(limits = c(0,30), expand = c(0, 0)) +
  guides(colour = guide_legend(override.aes = list(size=3))) +
  theme(legend.position = "none")
p.volcano.summary.rdp

# Interactive plot
ggplotly(p.volcano.summary.rdp)

```
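
The two `factor()` calls first fix the display order with `levels =` and then rename the levels with `labels =`, matched by position. Any value not listed in `levels` silently becomes `NA`, so it is worth counting `NA`s after reordering. A base-R sketch with three of the comparison names:

```r
toy_comp <- c("Narrow_v_Broad_d7", "Control_v_Broad_d_9", "Control_v_Narrow_d0")

# Order the levels, then relabel them position by position
toy_comp_f <- factor(toy_comp,
                     levels = c("Control_v_Broad_d_9", "Control_v_Narrow_d0", "Narrow_v_Broad_d7"),
                     labels = c("Control v. Broad: Day -9", "Control v. Narrow: Day 0", "Narrow v. Broad: Day 7"))
levels(toy_comp_f)

# A level missing from (or misspelled in) `levels` turns into NA
toy_typo <- factor(toy_comp, levels = c("Control_v_Broad_d9", "Control_v_Narrow_d0", "Narrow_v_Broad_d7"))
sum(is.na(toy_typo))  # 1
```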

```{r treatment-volcano-smoother-for-manuscript}
ggarrange(p.volcano.summary.rdp, p.unique.rdp, nrow = 2, labels = c("A)", "B)"), heights = c(1.3,1))

```
## Differential abundance for boosting at each time point

```{r diffab-boosting}
# Day -9
ds.boost.d_9 <- phyloseq_to_deseq2(ps1.d_9, ~randomization_arm + d7_rota_boost_updated)
geoMeans.ds.boost.d_9 <- apply(counts(ds.boost.d_9), 1, gm_mean)
ds.boost.d_9 <- estimateSizeFactors(ds.boost.d_9, geoMeans = geoMeans.ds.boost.d_9)
dds.boost.d_9 <- DESeq(ds.boost.d_9, test="Wald")
res.dds.boost.d_9 <- results(dds.boost.d_9, cooksCutoff = FALSE)
summary(res.dds.boost.d_9)

# Day 0
ds.boost.d0 <- phyloseq_to_deseq2(ps1.d0, ~randomization_arm + d7_rota_boost_updated)
geoMeans.ds.boost.d0 <- apply(counts(ds.boost.d0), 1, gm_mean)
ds.boost.d0 <- estimateSizeFactors(ds.boost.d0, geoMeans = geoMeans.ds.boost.d0)
dds.boost.d0 <- DESeq(ds.boost.d0, test="Wald")
res.dds.boost.d0 = results(dds.boost.d0, cooksCutoff = FALSE)
summary(res.dds.boost.d0)

# Day 7
ds.boost.d7 <- phyloseq_to_deseq2(ps1.d7, ~randomization_arm + d7_rota_boost_updated)
geoMeans.ds.boost.d7 <- apply(counts(ds.boost.d7), 1, gm_mean)
ds.boost.d7 <- estimateSizeFactors(ds.boost.d7, geoMeans = geoMeans.ds.boost.d7)
dds.boost.d7 <- DESeq(ds.boost.d7, test="Wald")
res.dds.boost.d7 = results(dds.boost.d7, cooksCutoff = FALSE)
summary(res.dds.boost.d7)

# Tabulate results
# Day -9
nrow(res.dds.boost.d_9)
df1.boost <- as.data.frame(res.dds.boost.d_9[ which(res.dds.boost.d_9$padj < 0.05), ])
nrow(df1.boost)
df1.boost <- as.data.frame(df1.boost[ which(df1.boost$baseMean > 100), ])
nrow(df1.boost)
df1.boost <- rownames_to_column(df1.boost, var = "ASV")
df1.boost$Comparison <- "Boost: Day -9"
df1.boost$Variable <- "Boost"

# Day 0
nrow(res.dds.boost.d0)
df2.boost <- as.data.frame(res.dds.boost.d0[ which(res.dds.boost.d0$padj < 0.05), ])
nrow(df2.boost)
df2.boost <- as.data.frame(df2.boost[ which(df2.boost$baseMean > 100), ])
nrow(df2.boost)
df2.boost <- rownames_to_column(df2.boost, var = "ASV")
df2.boost$Comparison <- "Boost: Day 0"
df2.boost$Variable <- "Boost"

# Day 7
nrow(res.dds.boost.d7)
df3.boost <- as.data.frame(res.dds.boost.d7[ which(res.dds.boost.d7$padj < 0.05), ])
nrow(df3.boost)
df3.boost <- as.data.frame(df3.boost[ which(df3.boost$baseMean > 100), ])
nrow(df3.boost)
df3.boost <- rownames_to_column(df3.boost, var = "ASV")
df3.boost$Comparison <- "Boost: Day 7"
df3.boost$Variable <- "Boost"

# Combine all differential abundance tables
df.all.boost <- rbind(df1.boost, df2.boost, df3.boost)
nrow(df.all.boost)

dfs.boost <- list(df1.boost, df2.boost, df3.boost)
df.unique.boost <- join_all(dfs.boost, type = "full", by = "ASV")
nrow(df.unique.boost)

# Bind taxonomy
df.all.boost <- left_join(df.all.boost, ps1.rdp.tax, by = "ASV")
df.unique.boost <- left_join(df.unique.boost, ps1.rdp.tax, by = "ASV")
write.table(df.unique.boost, file = "./results/df_unique_boost.txt", sep = "\t")
write.table(df.all.boost, file = "./results/df_all_boost.txt", sep = "\t")

p.boost.point <- ggplot(subset(df.unique.boost, Comparison != "Boost: Day -9"), aes(x = log2FoldChange, y = Family, color = Phylum)) +
  geom_point(aes(size = log10(baseMean)), alpha = 0.7) +
  facet_grid(~Comparison) +
  geom_vline(xintercept = 0, lty = 2) +
  theme(legend.position = "none") +
  xlim(-40, 40) +
  scale_color_brewer(palette = "Dark2") +
  geom_errorbarh(aes(xmax = log2FoldChange + lfcSE, xmin = log2FoldChange - lfcSE), size=0.5, height = 0.3) +
  labs(x=expression(log[2]*" fold-change")) +
  labs(size=expression(log[10]*" base mean"))
p.boost.point

```
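
The boosting (and shedding) models call `results(dds)` without `name` or `contrast`. In that case DESeq2 reports the comparison for the last variable in the design, which for `~randomization_arm + d7_rota_boost_updated` is the boosting term adjusted for treatment arm. You can check the coefficient order with `resultsNames(dds)`. The base-R sketch below shows the same ordering via `model.matrix()`. It assumes a two-level boost variable coded `"No"`/`"Yes"`; the actual coding of `d7_rota_boost_updated` is not shown in this section:

```r
toy_meta <- data.frame(
  randomization_arm = factor(rep(c("Broad spectrum antibiotics",
                                   "Narrow spectrum antibiotics",
                                   "No antibiotics"), each = 2)),
  d7_rota_boost_updated = factor(c("No", "Yes", "Yes", "No", "No", "Yes"))
)

toy_mm <- model.matrix(~ randomization_arm + d7_rota_boost_updated, data = toy_meta)
colnames(toy_mm)

# The last column is the boosting effect, the one results(dds)
# reports by default for this design
tail(colnames(toy_mm), 1)
```

A named alternative is `results(dds, name = ...)`, using one of the strings that `resultsNames(dds)` returns. This makes the tested coefficient explicit in the code.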
## Differential abundance for shedding at each time point

```{r diffab-shedding}
# Day -9
ds.shed.d_9 <- phyloseq_to_deseq2(ps1.d_9, ~randomization_arm + Shedding)
geoMeans.ds.shed.d_9 <- apply(counts(ds.shed.d_9), 1, gm_mean)
ds.shed.d_9 <- estimateSizeFactors(ds.shed.d_9, geoMeans = geoMeans.ds.shed.d_9)
dds.shed.d_9 <- DESeq(ds.shed.d_9, test="Wald")
res.dds.shed.d_9 <- results(dds.shed.d_9, cooksCutoff = FALSE)
summary(res.dds.shed.d_9)

# Day 0
ds.shed.d0 <- phyloseq_to_deseq2(ps1.d0, ~randomization_arm + Shedding)
geoMeans.ds.shed.d0 <- apply(counts(ds.shed.d0), 1, gm_mean)
ds.shed.d0 <- estimateSizeFactors(ds.shed.d0, geoMeans = geoMeans.ds.shed.d0)
dds.shed.d0 <- DESeq(ds.shed.d0, test="Wald")
res.dds.shed.d0 = results(dds.shed.d0, cooksCutoff = FALSE)
summary(res.dds.shed.d0)

# Day 7
ds.shed.d7 <- phyloseq_to_deseq2(ps1.d7, ~randomization_arm + Shedding)
geoMeans.ds.shed.d7 <- apply(counts(ds.shed.d7), 1, gm_mean)
ds.shed.d7 <- estimateSizeFactors(ds.shed.d7, geoMeans = geoMeans.ds.shed.d7)
dds.shed.d7 <- DESeq(ds.shed.d7, test="Wald")
res.dds.shed.d7 <- results(dds.shed.d7, cooksCutoff = FALSE)
summary(res.dds.shed.d7)

# Tabulate results: keep ASVs with padj < 0.05 and baseMean > 100
# Day -9
nrow(res.dds.shed.d_9)
df1.shed <- as.data.frame(res.dds.shed.d_9[ which(res.dds.shed.d_9$padj < 0.05), ])
nrow(df1.shed)
df1.shed <- as.data.frame(df1.shed[ which(df1.shed$baseMean > 100), ])
nrow(df1.shed)
df1.shed <- rownames_to_column(df1.shed, var = "ASV")
df1.shed$Comparison <- "Shedding: Day -9"
df1.shed$Variable <- "Shedding"

# Day 0
nrow(res.dds.shed.d0)
df2.shed <- as.data.frame(res.dds.shed.d0[ which(res.dds.shed.d0$padj < 0.05), ])
nrow(df2.shed)
df2.shed <- as.data.frame(df2.shed[ which(df2.shed$baseMean > 100), ])
nrow(df2.shed)
df2.shed <- rownames_to_column(df2.shed, var = "ASV")
df2.shed$Comparison <- "Shedding: Day 0"
df2.shed$Variable <- "Shedding"

# Day 7
nrow(res.dds.shed.d7)
df3.shed <- as.data.frame(res.dds.shed.d7[ which(res.dds.shed.d7$padj < 0.05), ])
nrow(df3.shed)
df3.shed <- as.data.frame(df3.shed[ which(df3.shed$baseMean > 100), ])
nrow(df3.shed)
df3.shed <- rownames_to_column(df3.shed, var = "ASV")
df3.shed$Comparison <- "Shedding: Day 7"
df3.shed$Variable <- "Shedding"

# Combine all differential abundance tables
df.all.shed <- rbind(df1.shed, df2.shed, df3.shed)
nrow(df.all.shed)

dfs.shed <- list(df1.shed, df2.shed, df3.shed)
df.unique.shed <- join_all(dfs.shed, type = "full", by = "ASV")
nrow(df.unique.shed)

# Bind taxonomy
df.all.shed <- left_join(df.all.shed, ps1.rdp.tax, by = "ASV")
df.unique.shed <- left_join(df.unique.shed, ps1.rdp.tax, by = "ASV")
write.table(df.unique.shed, file = "./results/df_unique_shed.txt", sep = "\t")
write.table(df.all.shed, file = "./results/df_all_shed.txt", sep = "\t")

p.shed.point <- ggplot(subset(df.unique.shed, Comparison != "Shedding: Day -9"), aes(x = log2FoldChange, y = Family, color = Phylum)) +
  geom_point(aes(size = log10(baseMean)), alpha = 0.7) +
  facet_grid(~Comparison) +
  geom_vline(xintercept = 0, lty = 2) +
  xlim(-40, 40) +
  theme(legend.position = "none") +
  scale_color_brewer(palette = "Dark2") +
  geom_errorbarh(aes(xmax = log2FoldChange + lfcSE, xmin = log2FoldChange - lfcSE), size=0.5, height = 0.3) +
  labs(x=expression(log[2]*" fold-change")) +
  labs(size=expression(log[10]*" base mean"))
p.shed.point

```
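The three tabulation blocks in the chunk above apply the same filter to each results table. Wrapping the filter in `which()` matters because DESeq2 sets `padj` to `NA` for some ASVs (for example those removed by independent filtering), and plain logical indexing on `NA` returns an all-`NA` row instead of dropping it. The sketch below factors the repeated steps into a hypothetical helper, `tabulate_hits()`, and demonstrates the `NA` behaviour on a toy base-R data frame standing in for `as.data.frame(res)`; the thresholds default to the values used above.

```r
# Hypothetical helper: keep significant, reasonably abundant ASVs from one
# results table (rownames = ASV IDs) and label the comparison
tabulate_hits <- function(res_df, comparison, variable,
                          alpha = 0.05, min_base_mean = 100) {
  # which() drops rows where the condition is NA (e.g. padj = NA)
  keep <- which(res_df$padj < alpha & res_df$baseMean > min_base_mean)
  hits <- res_df[keep, , drop = FALSE]
  # Move rownames into an ASV column, as rownames_to_column() does above
  hits <- data.frame(ASV = rownames(hits), hits,
                     row.names = NULL, stringsAsFactors = FALSE)
  # rep() keeps this valid even when no ASV passes the filter
  hits$Comparison <- rep(comparison, nrow(hits))
  hits$Variable <- rep(variable, nrow(hits))
  hits
}

# Toy results table: ASV2 is too rare, ASV3 has padj = NA, ASV4 is not significant
toy_res <- data.frame(baseMean = c(500, 50, 800, 300),
                      log2FoldChange = c(2.1, -3.0, 1.2, -0.5),
                      padj = c(0.01, 0.001, NA, 0.20),
                      row.names = paste0("ASV", 1:4))

# Plain logical indexing keeps an all-NA row for ASV3
naive <- toy_res[toy_res$padj < 0.05, ]
nrow(naive)

hits <- tabulate_hits(toy_res, "Shedding: Day 7", "Shedding")
hits
```

On the real objects, `tabulate_hits(as.data.frame(res.dds.shed.d7), "Shedding: Day 7", "Shedding")` should reproduce `df3.shed`, assuming the results object converts cleanly with `as.data.frame()`.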

```{r diffab-trajectories}
nrow(df.all.boost)
nrow(df.all.shed)
df.all.boost_shed <- rbind(df.all.boost, df.all.shed)
df.all.boost_shed.distinct <- distinct(df.all.boost_shed, ASV, .keep_all = TRUE)
nrow(df.all.boost_shed.distinct)

df.all.boost.counts <- inner_join(df.all.boost, rlog.all, by = "ASV")
nrow(df.all.boost.counts)
df.all.shed.counts <- inner_join(df.all.shed, rlog.all, by = "ASV")
nrow(df.all.shed.counts)

# LOESS-smoothed rlog abundance over time for each differentially abundant ASV, by group
p.diff.ab.boost.smooth <- ggplot(df.all.boost.counts, aes(x = day, y = Abundance, group = d7_rota_boost_updated, color = d7_rota_boost_updated)) +
  geom_jitter(size = 1, alpha = 0.6, width = 0.5) +
  scale_y_log10() +
  labs(color = "Boost") +
  theme(legend.position = "none") +
  facet_wrap(Phylum.x~Family.x) +
  stat_smooth(data = subset(df.all.boost.counts, d7_rota_boost_updated == "No"), method = "loess") +
  stat_smooth(data = subset(df.all.boost.counts, d7_rota_boost_updated == "Yes"), method = "loess") +
  labs(y = "Abundance (rlog)", x = "Day")

p.diff.ab.shed.smooth <- ggplot(df.all.shed.counts, aes(x = day, y = Abundance, group = Shedding, color = Shedding)) +
  geom_jitter(size = 1, alpha = 0.6, width = 0.5) +
  scale_y_log10() +
  theme(legend.position = "none") +
  facet_wrap(Phylum.x~Family.x) +
  stat_smooth(data = subset(df.all.shed.counts, Shedding == "No shedding"), method = "loess") +
  stat_smooth(data = subset(df.all.shed.counts, Shedding == "Shedding"), method = "loess") +
  labs(y = "Abundance (rlog)", x = "Day")

```
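The trajectory plots fit one LOESS curve per group with `stat_smooth()`. Two details are easy to get wrong: `method` must be an argument of `stat_smooth()` itself, because `subset()` silently accepts and ignores unknown arguments, and each layer fits a separate curve for every mapped group. The base-R sketch below shows both on hypothetical toy trajectories (`toy`, with made-up days and abundances); it uses `stats::loess()` directly rather than ggplot2.

```r
set.seed(1)

# Hypothetical toy trajectories: two groups, three replicates per sampling day
toy <- expand.grid(day = c(-9, -2, 0, 3, 7, 14, 21, 28),
                   rep = 1:3,
                   group = c("No", "Yes"),
                   stringsAsFactors = FALSE)
slope <- ifelse(toy$group == "Yes", 0.1, -0.1)
toy$abundance <- 5 + slope * toy$day + rnorm(nrow(toy), sd = 0.3)

# subset() forwards unknown arguments to `...` and ignores them, so a
# `method` placed inside subset() never reaches the smoother
with_method <- subset(toy, group == "No", method = "loess")
without_method <- subset(toy, group == "No")
identical(with_method, without_method)

# One LOESS fit per group, analogous to stat_smooth(method = "loess")
# drawing one curve for each mapped group
fits <- lapply(split(toy, toy$group),
               function(d) loess(abundance ~ day, data = d))
grid <- data.frame(day = seq(-9, 28, by = 1))
curves <- sapply(fits, predict, newdata = grid)
dim(curves)
```

With `group` and `color` already mapped in `aes()`, a single `stat_smooth(method = "loess")` layer should draw the same per-group curves as two subsetted layers. When `method` is left unset, ggplot2 chooses a smoother automatically (LOESS for small groups), so a misplaced argument can go unnoticed.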

```{r diffab-figure-for-manuscript}
ggarrange(ggarrange(p.boost.point, p.shed.point, nrow = 2, labels = c("A)", "C)")),
          ggarrange(p.diff.ab.boost.smooth, p.diff.ab.shed.smooth, nrow = 2, labels = c("B)", "D)")), widths = c(1,1.25), ncol = 2)

```

```{r session-info}
# Display current R session information
sessionInfo()

```