-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of peak integration #46
Comments
Hi @plyush1993, good question! I've been meaning to document my integration process with RaMS in a vignette for a while now and just haven't gotten around to it yet. Here's a quick summary of my process at the moment for reference: First, load libraries and the data (here I'm using the minified demo files included in RaMS) library(RaMS)
library(tidyverse)
library(plotly)
file_dir <- system.file("extdata", package = "RaMS")
ms_files <- list.files(file_dir, pattern = "LB12.*mzML", full.names = TRUE)
msdata <- grabMSdata(ms_files) If it's just a single compound that we're interested in, we can just extract the chromatogram, pick out the starting and ending retention times, and use the bet_chrom <- msdata$MS1[mz%between%pmppm(118.0865, 10)]
qplotMS1data(bet_chrom, color_col="filename")
ggplotly() # Activate an interactive interface and then mouse over it to find the peak bounds
rtstart <- 7.55 #Manually entered
rtend <- 8.15 #Manually entered
bet_chrom %>%
filter(rt%between%c(rtstart, rtend)) %>%
arrange(rt) %>%
summarise(area=trapz(rt, int, baseline="square"), .by=filename) which returns the peak areas for each file.
With multiple compounds, it gets a bit fancier but still just consists of looping between plotting the chromatogram to find the retention time bounds on the peak, entering them manually (when I'm integrating lots of compounds I'll do this in Excel instead of in R as a compound_df <- tribble(
~compound_name, ~mz,
"Alanine", 90.0555,
"Glycine betaine", 118.0865,
"Creatine", 132.0773,
"Adenine", 136.0623
)
compound_df %>%
pmap(function(...){
row_data <- list(...)
cmpd_eic <- msdata$MS1[mz%between%pmppm(row_data$mz, 10)]
cmpd_chrom <- qplotMS1data(cmpd_eic, color_col="filename") +
scale_x_continuous(breaks=1:20) +
ggtitle(row_data$Compound_Name)
print(ggplotly(cmpd_chrom)) # Force printing during loop
readline() # Pause for interactive browsing
}) We use compound_rtbounds <- tribble(
~compound_name, ~rtstart, ~rtend,
"Alanine", 10.8, 11.3,
"Glycine betaine", 7.5, 8.2,
"Creatine", 11.1, 11.4,
"Adenine", 5.2, 5.9
) Once we have m/z and RT bounds for each compound, we can either loop over the dataframe again or we can do something a little more clever with a compound_df %>%
left_join(compound_rtbounds) %>%
mutate(mzbounds=map(mz, pmppm, 10)) %>%
unnest_wider(mzbounds, names_sep = "_") %>%
left_join(msdata$MS1, join_by(between(y$mz, x$mzbounds_1, x$mzbounds_2),
between(y$rt, x$rtstart, x$rtend))) %>%
summarise(area=trapz(rt, int), .by=c(compound_name, filename)) which returns the peak areas we're interested in by file and compound:
Of course, this only works if you don't have huge amounts of retention time drift over the course of the run. If your peaks are moving around a lot between samples, you'll have to either specify the RT bounds specific to each file or accept that there's going to be some noise introduced by integrating outside of the peak. If you're considering the first, I'd strongly recommend software like Skyline instead. If you somehow have a way of mapping corrected RTs to the original ones for each scan in your files, you can also use the RaMS isn't meant to perform automated peak detection and integration like XCMS, MS-DIAL, or MzMine. They've got dedicated teams working on problems like that and they do a better job as dedicated platforms for automated analysis. |
Amazing! Thanks a lot, @wkumler! |
Hi all! Thanks for such a brilliant R package!
Is it possible to implement a peak integration process within the RaMS workflow? Could you please provide an example?
The text was updated successfully, but these errors were encountered: