Implementation of peak integration #46

Closed
plyush1993 opened this issue Nov 20, 2024 · 2 comments
@plyush1993
Hi all! Thanks for such a brilliant R package!
Is it possible to implement a peak integration process within the RaMS workflow? Could you please provide an example?


wkumler commented Nov 20, 2024

Hi @plyush1993, good question! I've been meaning to document my integration process with RaMS in a vignette for a while now and just haven't gotten around to it yet. Here's a quick summary of my process at the moment for reference:

First, load libraries and the data (here I'm using the minified demo files included in RaMS)

library(RaMS)
library(tidyverse)
library(plotly)

file_dir <- system.file("extdata", package = "RaMS")
ms_files <- list.files(file_dir, pattern = "LB12.*mzML", full.names = TRUE)
msdata <- grabMSdata(ms_files)

If we're only interested in a single compound, we can extract the chromatogram, pick out the starting and ending retention times, and use the trapz function to perform the integration in about 10 lines of code.

bet_chrom <- msdata$MS1[mz%between%pmppm(118.0865, 10)]
qplotMS1data(bet_chrom, color_col="filename")
ggplotly() # Activate an interactive interface and then mouse over it to find the peak bounds

rtstart <- 7.55 #Manually entered
rtend <- 8.15   #Manually entered

peak_areas <- bet_chrom %>%
  filter(rt%between%c(rtstart, rtend)) %>%
  arrange(rt) %>%
  summarise(area=trapz(rt, int, baseline="square"), .by=filename)

which returns the peak areas for each file. trapz is a newer function in RaMS that performs integration via a trapezoidal Riemann sum. Here I'm using it with tidyverse functions, but data.table has equivalent syntax if that's preferred.

> peak_areas
           filename     area
1 LB12HL_AB.mzML.gz 53288432
2 LB12HL_EF.mzML.gz 31605536
3 LB12HL_CD.mzML.gz 94247562
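
For reference, a sketch of the equivalent data.table syntax mentioned above (assuming the same `bet_chrom` table and the `rtstart`/`rtend` bounds from the previous chunk):

```r
# data.table equivalent of the tidyverse pipeline above (a sketch;
# assumes bet_chrom, rtstart, and rtend exist as defined earlier)
library(data.table)
peak_areas <- bet_chrom[rt %between% c(rtstart, rtend)][
  order(rt), .(area = trapz(rt, int, baseline = "square")), by = filename]
peak_areas
```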

With multiple compounds it gets a bit fancier, but the process still just consists of looping over the compounds: plotting the chromatogram to find the retention time bounds on the peak, entering them manually (when I'm integrating lots of compounds I'll do this in Excel instead of in R as a tribble), and then performing the integration.

compound_df <- tribble(
  ~compound_name, ~mz,
  "Alanine", 90.0555,
  "Glycine betaine", 118.0865,
  "Creatine", 132.0773,
  "Adenine", 136.0623
)
compound_df %>%
  pmap(function(...){
    row_data <- list(...)
    cmpd_eic <- msdata$MS1[mz%between%pmppm(row_data$mz, 10)]
    cmpd_chrom <- qplotMS1data(cmpd_eic, color_col="filename") +
      scale_x_continuous(breaks=1:20) +
      ggtitle(row_data$compound_name) # column name must match compound_df
    print(ggplotly(cmpd_chrom)) # Force printing during loop
    readline() # Pause for interactive browsing
  })

We use pmap to loop over the rows of the data frame, each one containing a compound, and plot the chromatogram. Here I've added some interactive steps (ggplotly to make it easier to get RT values, and readline to make the loop pause after each compound until I hit Enter in the console), and with each loop I'm filling out another data frame like the one below. When I'm doing this for a lot of compounds I'll do it in an Excel document instead: write out the above compound_df, use Excel for the manual data entry, then read it back in afterward.

compound_rtbounds <- tribble(
  ~compound_name, ~rtstart, ~rtend,
  "Alanine", 10.8, 11.3,
  "Glycine betaine", 7.5, 8.2,
  "Creatine", 11.1, 11.4,
  "Adenine", 5.2, 5.9
)
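
The Excel round trip described above might look something like this (a sketch; the CSV filenames here are placeholders of my own choosing, not from the original workflow):

```r
# Write the compound list out, add rtstart/rtend columns by hand in
# Excel, then read the completed table back in (filenames are examples)
write.csv(compound_df, "compound_bounds_to_fill.csv", row.names = FALSE)
# ... manual data entry in Excel ...
compound_rtbounds <- read.csv("compound_bounds_filled.csv")
```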

Once we have m/z and RT bounds for each compound, we can either loop over the data frame again or do something a little more clever with a left_join using the tidyverse's new non-equi join_by syntax:

compound_df %>%
  left_join(compound_rtbounds) %>%
  mutate(mzbounds=map(mz, pmppm, 10)) %>%
  unnest_wider(mzbounds, names_sep = "_") %>%
  left_join(msdata$MS1, join_by(between(y$mz, x$mzbounds_1, x$mzbounds_2), 
                                between(y$rt, x$rtstart, x$rtend))) %>%
  summarise(area=trapz(rt, int), .by=c(compound_name, filename))

which returns the peak areas we're interested in by file and compound:

# A tibble: 12 × 3
   compound_name   filename                   area
   <chr>           <chr>                     <dbl>
 1 Alanine         LB12HL_AB.mzML.gz 1.628335092e5
 2 Alanine         LB12HL_CD.mzML.gz 2.267466924e5
 3 Alanine         LB12HL_EF.mzML.gz 2.341240033e5
 4 Glycine betaine LB12HL_AB.mzML.gz 6.496509653e7
 5 Glycine betaine LB12HL_CD.mzML.gz 1.060818334e8
 6 Glycine betaine LB12HL_EF.mzML.gz 4.369075640e7
 7 Creatine        LB12HL_AB.mzML.gz 2.887160182e4
 8 Creatine        LB12HL_CD.mzML.gz 2.003315878e4
 9 Creatine        LB12HL_EF.mzML.gz 2.294787089e4
10 Adenine         LB12HL_AB.mzML.gz 1.939750206e6
11 Adenine         LB12HL_CD.mzML.gz 1.647856023e6
12 Adenine         LB12HL_EF.mzML.gz 1.900026187e6
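
The loop-based alternative mentioned above (rather than the non-equi join) could be sketched like this, assuming the compound_df and compound_rtbounds tables from earlier:

```r
# Loop over compounds with pmap_dfr, filtering and integrating each
# (a sketch of the "loop over the data frame again" option)
compound_df %>%
  left_join(compound_rtbounds, by = "compound_name") %>%
  pmap_dfr(function(compound_name, mz, rtstart, rtend) {
    mzbounds <- pmppm(mz, 10) # computed outside the data.table subset
                              # so the mz column doesn't shadow the argument
    msdata$MS1[mz %between% mzbounds][rt %between% c(rtstart, rtend)] %>%
      summarise(area = trapz(rt, int), .by = filename) %>%
      mutate(compound_name = compound_name, .before = 1)
  })
```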

Of course, this only works if you don't have huge amounts of retention time drift over the course of the run. If your peaks are moving around a lot between samples, you'll have to either specify the RT bounds specific to each file or accept that there's going to be some noise introduced by integrating outside of the peak. If you're considering the first, I'd strongly recommend software like Skyline instead. If you somehow have a way of mapping corrected RTs to the original ones for each scan in your files, you can also use the editMSfileRTs() function in RaMS to apply an RT correction to the mzMLs themselves before integration.

RaMS isn't meant to perform automated peak detection and integration like XCMS, MS-DIAL, or MZmine. They've got dedicated teams working on problems like that, and they do a better job as dedicated platforms for automated analysis.

@wkumler wkumler pinned this issue Nov 20, 2024
@plyush1993

Amazing! Thanks a lot, @wkumler!
