Processing methods for signal suppression of FTMS data
© Ma et al; licensee BioMed Central Ltd. 2011
Published: 14 October 2011
Skip to main content
Volume 9 Supplement 1
© Ma et al; licensee BioMed Central Ltd. 2011
Published: 14 October 2011
Fourier Transform Mass Spectrometry coupled with Liquid Chromatography(LC-FTMS) has been widely used in proteomics. Past investigation has revealed that there exists an intensity dependent random suppression in peptide elution profiles in LC-FTMS data. The suppression is homogenous for the same peptide but non-homogenous for different peptides. The correction of suppressed profiles and an estimation on the range of suppression are necessary for accurate and reliable quantification using FTMS data.
A software package, Gcorr, is presented. The software corrects peptide profiles that satisfy correction conditions, and it can predict fold change null distributions at different intensity levels. Subsequently, the significance P-values of measured fold changes can be estimated based on the predicted null distributions. We have used an 1:1 LC-FTMS label-free dataset pair collected based on the same sample to verify that our predicted null distributions conforms to that of the observed null distribution.
This software is able to provide suppression correction for peptide profiles, suppression distribution analysis and peptide differential expression analysis in terms of its fold change significance. The software is freely available at http://compgenomics.utsa.edu/Suppression_Study.html.
Due to its capability in achieving high resolution and mass accuracy simultaneously, Fourier Transform Mass Spectrometry(FTMS) has gained popularity in quantitative analysis of biomolecules and biomaker discovery . However, many researchers have found ion abundance accuracy of FTMS instrument problematic. Padley et al make note of several sources of non-linearity in measurement. Schrader et al also mention signal loss in large compound library experiment. Sterner et al find that the signal of small proteins is suppressed by larger ones. Gordon et al find that due to ion interactions, the spectral signal intensities do not necessarily reflect true trapped-ion abundances . Additional signal suppression phenomena due to the effects of measuring several peptides are brought up in .
In our previous work , we found that: 1)there exists signal suppression in Liquid Chromatrograhpy-FTMS (LC-FTMS) by investigating the isotope ratios between 13 C and 12 C; 2) the suppression is intensity dependent, the lower the intensity level, the severer the suppression; and 3)the suppression is non-homogenous for different peptide but homogenous for the same peptide. We developed a correction algorithm to correct peptide profiles with relative high intensity. For peptide profiles that cannot be corrected, we analyzed the range of suppression and it showed that 10% to 300% of measurement error could be resulted due to suppression.
Given such severe random suppression that affects a significant portion of peptides, the use of FTMS for biomarker discovery is questionable unless we can estimate the impact of random suppression. In this paper, we consider how the random suppression would affect fold change measurements of peptides between two label-free LC-FTMS samples or labeled LC-FTMS samples which is critical in differential analysis aiming for biomarkers discovery. Since many current biomarker discovery projects  employs LC-FTMS, our considered problem is critical.
Given a 1:1 label-free LC-FTMS dataset pair containing the same sample, because of experimental variation and random suppression, the measured peptide fold change is actually randomly distributed around 1:1, which is defined as the null distribution. We will need such null distributions to estimate the significance of measured fold changes in LC-FTMS experiments that compare two different samples. Note that since suppression characteristics change with intensity levels, null distributions also changes . At a lower intensity level, due to significant random suppression, the null distribution generally has a large variance. At higher intensity levels, the random suppression effect is considerably less, and the null distribution is mainly caused by experimental variations. Generally, the null distribution at a given intensity level is not directly available in a regular differential LC-FTMS experiments, and we have to estimate them in order to provide significance P-values for all fold changes. Without estimating the appropriate null distributions, it would be hard to detect differentially expressed proteins reliably especially in the low intensity region. Currently no software provides such significance estimation or suppression correction. We develop a software, Gcorr, that performs correction/suppression characteristics analysis, and fold change significance estimation at different intensity levels.
We use two replicate QC datasets and a UPS1 dataset for development and demonstration of Gcorr. The QC dataset is a quality control dataset generated on the organism LTQ_Orb_2. We downloaded the QC datasets DatasetQC_Shew_07_011pt0_b_04Feb07_Falcon_070202 and DatasetQC_Shew_07_011pt0_b_04Feb07_Falcon_070203 from http://omics.pnl.gov/browse/. For details of the data please see . In the QC datasets, tandem peptide identification is performed at the same time of the LC/MS experiment and a peptide list annotated with sequence, charge state and elution time information is provided for each dataset. UPS1 is a Proteomics Standard Set (from SIGMA – ALDRICH TM ), consisting of a mixture of 48 individual human source or human sequence recombinant proteins, each of which has been selected to limit heterogeneous post-translational modifications (PTMs). The total protein content in each vial is 10.6 mg. Each protein has been quantitated by amino acid analysis (AAA) prior to formulation. All these datasets are collected on LC-FTMS.
Both LC-MS(MS1) data and LC-MS/MS (MS2) data are required to run the software. Currently, the software support MS1 data in mzxml format and MS2 data in excel format. Many instruments support the mzxml format, and many tools are available to convert .raw to .mzxml files, . We require that MS2 files in excel format must be the output of Trans-Proteomic Pipeline(TPP) , the MS2 data file after TPP contains a list of peptides that have been identified, and need to be quantified. Most importantly, the MS2 peptide list is annotated with PeptideProphet  scores which allows us to pick existing peptides for suppression characteristic analysis. More detail of the data format is provided in the supplemental material. After the MS1 and MS2 information are loaded, the mass, XIC and peak interval of identified peptide are calculated. As the preprocessing could be very time consuming(depend on the size of the input data and the number of peptide of interest), the output of mass, XIC and peak interval values can be exported as MATLAB .mat files and stored for future uses. The output of the software is a peptide list in text format. The result file include these information: the peptide sequence, the abundance before correction, whether the peptide is correctable, abundance after correction and the p-value of fold change. The Gcorr software takes these input files: 1) Two label free LC-FTMS files that contains samples of two classes to be compared; 2) A list of peptides to be differentially analyzed; and 3) a null distribution file that describes experimental variations. Given these inputs, Gcorr outputs: 1) Corrected peptide profiles and their corrected fold changes if they can be corrected. 2) Fold change significance P-values for peptides that can not be corrected. The Gcorr software can be easily extended to process labeled LC-FTMS datasets.
In this section, we briefly describe the correction function in Gcorr. In , we developed an peptide profile correction algorithm based on iterative conditional mode (ICM) algorithm. Given a peptide, we denote its observed elution profiles as y 1 = [y 1(t 1), y 1(t 2), ⋯] for the higher peptide profile, and y 2 = [y 2(t 1), y 2(t 2), ⋯] for the lower profile of 12 C and 13 C respectively, where t 1, t 2, ⋯ are sampling time of the elution profiles. Define x 1 and x 2 as the true profiles of the peptide. x 1 and y 1 are related as x 1 = f(y 1), where f(·) is the correction function. We define the inverse function of f(·) as the distortion function g(·). We verified in our previous work, that the distortion of different isotopes is the same for the same peptide, and we have x 1 = f(y 1) and x 2 = f(y 2). Let T = x 1 + x 2 represents the total ion count of the peptide at 12 C and 13 C. The basic idea is: 1.Set an initial value for the total ion count T = y 1 + y 2. 2. Based on the total ion count, the isotope ratio r and y 2, calculate the most probable correction function. 3. Correct the elution profiles y 1 and y 2 using currently estimated correction functions, and the values of x 1 and x 2 are updated. 4. Estimate a new correction function with the updated x 2, the isotope ratio and the total ion count T. 5. Repeat step 4, and 5 until the convergence condition is met. We can obtain the correction function and the corrected peptide profiles after the application of the algorithm. For detail of the algorithm please refer to . However, the algorithm can only be used to correct peptide profiles that satisfy these correction conditions: 1. The isotope ratios are not close to zero or one (x 2 / x 1 = 0.2 to 0.8 for example), since in these cases, the suppression is not detectable by comparing y 1 and y 2; 2.The maximum intensity of y 2 need to be greater than the distortion free range lower limit (106) , as the function can only be estimated on the range that y 2 spans, and if max(y 2 ) does not reach the lower limit, then the correction function cannot be found for the full range that needs correction.
The correction algorithm corrects profiles that meet the correction condition almost perfectly. However, only part of peptide profiles with y 2 reaching certain threshold can be corrected. In general, such threshold is very high, and the intensity of most peptides (70% - 90%)are lower than the threshold . We want to understand the characteristics of those uncorrectable peptides. We consider the correction functions that has been found as random samples of all possible correction functions, based on which, we can estimate the overall suppression characteristics. We want to know how the peptide profiles are suppressed at different intensity levels. To estimate suppression characteristics, we apply random correction functions to typical peptide profiles at lower intensities to get an estimation of the range of suppressions at different intensity levels.
Once we obtain samples of correction functions for each LC-MS dataset, we can further examine the fold change variation of peptides from different samples in differential analysis using LC-FTMS. The variations in measured fold change are caused by experimental and instrumental variations. These variations need to be considered carefully in biomarker discovery, as the variation are intensity dependant. Gcorr provides a tool for evaluating whether fold changes of peptides are significant.
There are two kind of variations in the FTMS data. One is experimental variation which is caused by sample preparations and other experimental steps. The other is instrumental variation which is the result of random suppression.
Experimental variations can be obtained by inspecting the measured ratios between identical peptides in two datasets that are supposed to contain the same amount of such peptides. The most convenient way to get such a null distribution is to inspect a 1:1 label-free LC-FTMS dataset pair collected based on the same sample. We have found that the suppression is intensity dependent in our previous work, and when the intensity of a peptide profile is greater than a certain threshold, the observed profile is suppression free. The variation in measured fold changes of suppression free peptide profiles is only determined by the experimental process, and we can consider the fold change distribution of suppression free peptide profiles as the experimental null distribution. For the profiles that is lower than the threshold, they suffers from both experimental and instrumental variation. As the suppression is intensity dependent, the resulted null distributions are different at different intensity levels and they can not be considered as experimental distributions. Several runs of 1:1 data can be collected to further confirm the experimental variation in a particular set of experimental conditions. If 1:1 dataset are not available, identical amount of peptide standard can be spiked in two samples for experimental null distribution estimation.
From these observations we hypothesize that the overall variation if measured fold changes is caused by experimental variations and instrumental random suppression. While the experimental variations are not intensity dependant, the instrument variation is. Under this hypothesis, if given experimental variations of an typical differential LC-FTMS experiments, we can get a null distribution of fold changes at any given intensity level with the collection of estimated correction/distortion functions. Then base on null distributions at different intensity levels, we can provide a significance p-value for any measured fold change between two samples. Note that generally the null distribution is not directly available in a typical differential LC-FTMS experiment due to different sample contents.
To test the hypothesis, here we show that for a 1:1 label-free dataset pair, we can predict a null distributions that conforms to the observed null distribution at different intensity levels after we transform an experimental null distribution by estimated correction/distortion functions. Note that with a 1:1 dataset pair collected based on the same sample, it is possible to get the observed null distribution on fold changes at different intensity levels. However in most cases, such distributions are not available.
As the experimental null distribution and the overall null distribution are base on 1:1 label free samples, it is expected that a lab will first conduct a 1:1 sample run to get an estimation of experimental variation using the same set of equipments.
Subsequently at a regular differential experiment, the Gcorr software would be applied to find correction/distortion functions. Subsequently, given a fold change measurement between two peptides, we will first determine its intensity range and estimate its overall null distribution based on the set of estimated correction/distortion functions. Once we have obtain the null distribution estimation, then the fold change’s significance score will be calculated based on the null distributions.
We develop a software for correcting the signal suppression in FTMS data based on the interactive correction mode algorithm. The software have been tested with replicate QC data, part of peptides satisfying the correction condition can be corrected perfectly. With the corrected peptide profiles, the overall null distribution is estimated and compared to the theoretical prediction at a lower intensity level. Based on such null distributions, the significance P-value of fold changes in a typical LC-FTMS differential analysis experiment can be calculated.
XM implemented the software, performed simulations and wrote the first draft of the paper. JC assisted the development of the software. JZ conceive the concept of the software, advised the development of the software and revised the manuscript.
This work is supported by a San Antonio Life Sciences Institute Research Enhancement grant, and a grant from National Institute of Health (NIH 2G12RR013646-11).
This article has been published as part of Proteome Science Volume 9 Supplement 1, 2011: Proceedings of the International Workshop on Computational Proteomics. The full contents of the supplement are available online at http://www.proteomesci.com/supplements/9/S1.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.