Implementation of peak integration #46

Closed
plyush1993 opened this issue Nov 20, 2024 · 2 comments
@plyush1993
Hi all! Thanks for such a brilliant R package!
Is it possible to implement a peak integration process within the RaMS workflow? Could you please provide an example?


wkumler commented Nov 20, 2024

Hi @plyush1993, good question! I've been meaning to document my integration process with RaMS in a vignette for a while now and just haven't gotten around to it yet. Here's a quick summary of my process at the moment for reference:

First, load libraries and the data (here I'm using the minified demo files included in RaMS)

library(RaMS)
library(tidyverse)
library(plotly)

file_dir <- system.file("extdata", package = "RaMS")
ms_files <- list.files(file_dir, pattern = "LB12.*mzML", full.names = TRUE)
msdata <- grabMSdata(ms_files)

If we're only interested in a single compound, we can extract the chromatogram, pick out the starting and ending retention times, and use the trapz function to perform the integration in about 10 lines of code.

bet_chrom <- msdata$MS1[mz%between%pmppm(118.0865, 10)]
qplotMS1data(bet_chrom, color_col="filename")
ggplotly() # Activate an interactive interface and then mouse over it to find the peak bounds

rtstart <- 7.55 #Manually entered
rtend <- 8.15   #Manually entered

peak_areas <- bet_chrom %>%
  filter(rt%between%c(rtstart, rtend)) %>%
  arrange(rt) %>%
  summarise(area=trapz(rt, int, baseline="square"), .by=filename)

which returns the peak areas for each file. trapz is a newer function in RaMS that performs integration via a trapezoidal Riemann sum. Here I'm using it with tidyverse functions, but data.table has equivalent syntax if that's preferred.

> peak_areas
           filename     area
1 LB12HL_AB.mzML.gz 53288432
2 LB12HL_EF.mzML.gz 31605536
3 LB12HL_CD.mzML.gz 94247562
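
For reference, a sketch of the equivalent data.table syntax mentioned above (assuming the same `bet_chrom` table and the `rtstart`/`rtend` bounds from the previous chunk):

```r
# data.table equivalent of the tidyverse pipeline above (a sketch;
# assumes bet_chrom, rtstart, and rtend exist as defined earlier)
library(data.table)
peak_areas <- bet_chrom[rt %between% c(rtstart, rtend)][
  order(rt), .(area = trapz(rt, int, baseline = "square")), by = filename]
peak_areas
```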

With multiple compounds it gets a bit fancier, but the process still just consists of looping over the compounds: plotting the chromatogram to find the retention time bounds on the peak, entering them manually (when I'm integrating lots of compounds I'll do this in Excel instead of in R as a tribble), and then performing the integration.

compound_df <- tribble(
  ~compound_name, ~mz,
  "Alanine", 90.0555,
  "Glycine betaine", 118.0865,
  "Creatine", 132.0773,
  "Adenine", 136.0623
)
compound_df %>%
  pmap(function(...){
    row_data <- list(...)
    cmpd_eic <- msdata$MS1[mz%between%pmppm(row_data$mz, 10)]
    cmpd_chrom <- qplotMS1data(cmpd_eic, color_col="filename") +
      scale_x_continuous(breaks=1:20) +
      ggtitle(row_data$compound_name) # column name must match compound_df
    print(ggplotly(cmpd_chrom)) # Force printing during loop
    readline() # Pause for interactive browsing
  })

We use pmap to loop over the rows of the data frame, each one containing a compound, and plot the chromatogram. Here I've added some interactive steps (ggplotly to make it easier to get RT values, and readline to make the loop pause after each compound until I hit Enter in the console), and with each loop I'm filling out another data frame like the one below. When I'm doing this for a lot of compounds I'll do it in an Excel document instead: write out the above compound_df, use Excel for the manual data entry, then read it back in afterward.

compound_rtbounds <- tribble(
  ~compound_name, ~rtstart, ~rtend,
  "Alanine", 10.8, 11.3,
  "Glycine betaine", 7.5, 8.2,
  "Creatine", 11.1, 11.4,
  "Adenine", 5.2, 5.9
)
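
The Excel round trip described above might look something like this (a sketch; the CSV filenames here are placeholders of my own choosing, not from the original workflow):

```r
# Write the compound list out, add rtstart/rtend columns by hand in
# Excel, then read the completed table back in (filenames are examples)
write.csv(compound_df, "compound_bounds_to_fill.csv", row.names = FALSE)
# ... manual data entry in Excel ...
compound_rtbounds <- read.csv("compound_bounds_filled.csv")
```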

Once we have m/z and RT bounds for each compound, we can either loop over the data frame again or do something a little more clever with a left_join using the tidyverse's new non-equi join_by syntax:

compound_df %>%
  left_join(compound_rtbounds) %>%
  mutate(mzbounds=map(mz, pmppm, 10)) %>%
  unnest_wider(mzbounds, names_sep = "_") %>%
  left_join(msdata$MS1, join_by(between(y$mz, x$mzbounds_1, x$mzbounds_2), 
                                between(y$rt, x$rtstart, x$rtend))) %>%
  summarise(area=trapz(rt, int), .by=c(compound_name, filename))

which returns the peak areas we're interested in by file and compound:

# A tibble: 12 × 3
   compound_name   filename                   area
   <chr>           <chr>                     <dbl>
 1 Alanine         LB12HL_AB.mzML.gz 1.628335092e5
 2 Alanine         LB12HL_CD.mzML.gz 2.267466924e5
 3 Alanine         LB12HL_EF.mzML.gz 2.341240033e5
 4 Glycine betaine LB12HL_AB.mzML.gz 6.496509653e7
 5 Glycine betaine LB12HL_CD.mzML.gz 1.060818334e8
 6 Glycine betaine LB12HL_EF.mzML.gz 4.369075640e7
 7 Creatine        LB12HL_AB.mzML.gz 2.887160182e4
 8 Creatine        LB12HL_CD.mzML.gz 2.003315878e4
 9 Creatine        LB12HL_EF.mzML.gz 2.294787089e4
10 Adenine         LB12HL_AB.mzML.gz 1.939750206e6
11 Adenine         LB12HL_CD.mzML.gz 1.647856023e6
12 Adenine         LB12HL_EF.mzML.gz 1.900026187e6
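
The loop-based alternative mentioned above (rather than the non-equi join) could be sketched like this, assuming the compound_df and compound_rtbounds tables from earlier:

```r
# Loop over compounds with pmap_dfr, filtering and integrating each
# (a sketch of the "loop over the data frame again" option)
compound_df %>%
  left_join(compound_rtbounds, by = "compound_name") %>%
  pmap_dfr(function(compound_name, mz, rtstart, rtend) {
    mzbounds <- pmppm(mz, 10) # computed outside the data.table subset
                              # so the mz column doesn't shadow the argument
    msdata$MS1[mz %between% mzbounds][rt %between% c(rtstart, rtend)] %>%
      summarise(area = trapz(rt, int), .by = filename) %>%
      mutate(compound_name = compound_name, .before = 1)
  })
```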

Of course, this only works if you don't have huge amounts of retention time drift over the course of the run. If your peaks are moving around a lot between samples, you'll have to either specify the RT bounds specific to each file or accept that there's going to be some noise introduced by integrating outside of the peak. If you're considering the first, I'd strongly recommend software like Skyline instead. If you somehow have a way of mapping corrected RTs to the original ones for each scan in your files, you can also use the editMSfileRTs() function in RaMS to apply an RT correction to the mzMLs themselves before integration.

RaMS isn't meant to perform automated peak detection and integration like XCMS, MS-DIAL, or MZmine. They've got dedicated teams working on problems like that, and they do a better job as dedicated platforms for automated analysis.

@wkumler wkumler pinned this issue Nov 20, 2024
@plyush1993

Amazing! Thanks a lot, @wkumler!
