Increasing peptide identifications and decreasing search times for ETD spectra by pre-processing and calculation of parent precursor charge
© The article is a work of the United States Government; licensee BioMed Central Ltd. 2012
Received: 20 September 2011
Accepted: 9 February 2012
Published: 9 February 2012
Electron Transfer Dissociation [ETD] can dissociate multiply charged precursor polypeptides, providing extensive peptide backbone cleavage. ETD spectra contain charge reduced precursor peaks, usually of high intensity, whose pattern depends on the parent precursor charge. These charge reduced precursor peaks and associated neutral loss peaks should be removed before these spectra are searched for peptide identifications. ETD spectra can also contain ion-types other than c and z˙. Modifying search strategies to accommodate these ion-types may increase peptide identifications. Additionally, if the precursor mass is measured using a lower resolution instrument such as a linear ion trap, the charge of the precursor is often not known, reducing sensitivity and increasing search times. We implemented algorithms to remove these precursor peaks, to accommodate new ion-types in the noise filtering routine in OMSSA, and to estimate any unknown precursor charge using Linear Discriminant Analysis [LDA].
Spectral pre-processing to remove precursor peaks and their associated neutral losses prior to protein sequence library searches resulted in a 9.8% increase in peptide identifications at a 1% False Discovery Rate [FDR] compared to the previous OMSSA filter. Modifications to the OMSSA noise filter to accommodate various ion-types resulted in a further 4.2% increase in peptide identifications at 1% FDR. Moreover, searches of ETD spectra with charge states obtained from the precursor charge determination algorithm are shown to be up to 3.5 times faster than the general range search method, with a minor 3.8% increase in sensitivity.
Overall, there is an 18.8% increase in peptide identifications at 1% FDR by incorporating the new precursor filter, noise filter and by using the charge determination algorithm, when compared to previous versions of OMSSA.
Mass-spectrometry based proteomics is a major technique for the identification of the constituents of complex protein mixtures. Analysis of peptide and protein sequences using gas-phase ion chemistry and tandem mass spectrometry has been described by various groups [2–5], where common methods of peptide identification involve enzymatic digestion of proteins isolated from a protein mixture, fractionation and fragmentation of the resultant peptides, followed by MS/MS sequence search algorithms to match the peptide sequence to the tandem mass spectrometry data. Some of the widely used search algorithms are OMSSA, X!Tandem, Sequest, MyriMatch, SpectrumMill (Agilent) and Mascot. A key step is the fragmentation method used to dissociate the peptides obtained after enzyme cleavage. Techniques currently used include Collision Activated Dissociation [CAD], Electron Capture Dissociation [ECD] and ETD. The use of ETD is becoming increasingly prevalent as it can be used on more common instruments such as quadrupole ion traps.
ETD/ECD spectra can contain precursor peaks in various charge states, called charge reduced precursors. Neutral losses from these precursors have been shown to be prevalent in both ECD and ETD spectra. Some of the widely observed neutral losses in ETD are ammonia (17 Da), water (18 Da), and carbon monoxide (28 Da). Also, the presence of arginine in the peptide sometimes leads to loss of a guanidino group (43 Da). To reduce false positives and improve sensitivity (true positives/total hits) in OMSSA, the intense precursor and neutral loss peaks that can be present in ETD MS/MS spectra should be removed before these spectra are searched against the protein sequence library. OMSSA currently has a precursor filter which removes these precursor peaks and neutral losses. We modified this routine to accommodate higher charged precursor peaks and their associated neutral losses.
ETD works exceptionally well on large multiply charged peptides. In the MS/MS dataset used in this paper the precursor charge ranges from 3+ to 7+. Unlike high resolution instruments, lower resolution instruments, such as the linear ion trap used in this experiment, are typically not used to determine a precursor charge. Peptide identification is generally done by MS/MS database search algorithms, which require either the precursor charge or a possible charge range as input, along with the precursor mass and the peak list file. If the range option is used, the algorithm looks for peptides across all precursor charges in the range and the corresponding molecular weights, which can be computationally expensive and result in false positives. Determining the precursor charge state accurately may improve sensitivity and specificity as well as the sequence library search times. Several algorithms have been developed [18–20] to determine the parent precursor charge from the tandem mass spectrometry data. Charger uses two methods to infer the precursor charge state. The first method employs self-correlation analysis of product ions to infer the precursor charge from the peptide mass, obtained from the complementary ions. When the first method fails, Charger uses linear discriminant analysis [LDA] to predict charge states using different features of the ETD spectra. The Charge Prediction Machine employs Bayesian decision theory to classify charge states using the features found in the ETD spectra. Recently, another algorithm that determines the precursor charge using a support vector machine [SVM] has been developed. The algorithm described in this paper uses LDA with a unique set of features to estimate whether a charge state can be assigned to a spectrum, and if so, which charge states are possible. The algorithm is trained not to assign a charge state if the features do not warrant determination of the precursor charge.
Results and discussion
In this study, we examined algorithms for pre-processing and charge determination of ETD spectra. To validate these algorithms, we used a dataset consisting of ETD MS/MS spectra of yeast phosphopeptides. The dataset has a total of 16901 spectra, of which 10000 spectra were used as the training set, while the rest were used as the test set. We searched the spectra against a target-decoy library using target sequences from the NCBI 6298 yeast protein sequence library.
Spectral pre-processing prior to protein sequence library search is an important step to identify high-confidence peptide-spectrum matches. Generally, this pre-processing step involves removing possible noise peaks and other non-product ion peaks, for example, precursor peaks and their associated neutral losses in ETD spectra. These pre-processing steps, which are already present in OMSSA, were revised to accommodate more ion-types and higher precursor charge states. For our analysis here, we divided this spectral pre-processing step into 2 stages of filtering: precursor filtering and noise filtering.
Other than the product ion fragmentation, MS/MS spectra obtained from ETD can contain charge reduced precursor peaks, usually of high intensity. This series of charge reduced precursor peaks is distributed in n non-overlapping bins of the MS/MS spectra where n is the parent precursor charge. These bins are mass windows around <MH(z+)>, where z ranges from 1 to n, M is neutral mass of the peptide, H is mass of proton and MH(z+)=(M+zH)/z. Figure 1b illustrates the precursor peaks and their associated neutral losses observed in an ETD spectrum. The width of the isotopic distribution of these ions generally depends on their mass.
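As a minimal illustration of the bin centres defined above, the following Python sketch lists the m/z positions MH(z+) = (M+zH)/z for z from 1 to n. The function name and the example mass are hypothetical, and the proton mass is the standard value of about 1.00728 Da; this is not OMSSA's own code.

```python
PROTON_MASS = 1.007276  # Da; H in the text (standard value, not from the paper)


def charge_reduced_mz(neutral_mass, parent_charge):
    """Return {z: MH(z+)} with MH(z+) = (M + z*H) / z for z = 1..n."""
    return {
        z: (neutral_mass + z * PROTON_MASS) / z
        for z in range(1, parent_charge + 1)
    }


if __name__ == "__main__":
    # Hypothetical 2000 Da peptide with a 3+ parent precursor charge.
    for z, mz in charge_reduced_mz(2000.0, 3).items():
        print(f"MH({z}+) = {mz:.3f}")
```

Each returned value is the centre of one of the n non-overlapping bins in which a charge reduced precursor peak would be expected.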
removal of a fixed window width around these precursor peaks,
removal of a variable window width around these precursor peaks, and
removal of the neutral loss region (-60 Da or -18 Da, scaled to z) for the precursor and its reduced series.
The motivation behind using a variable window is to make the algorithm applicable to both smaller peptides (3+ parent charge) and larger peptides (7+ or higher charged precursor peptides). If we use a "fixed" window around the precursor, we may remove "product ion signal" regions for lower charged precursors (3+), or, for higher charged precursors (7+), we may retain these precursor peaks, which could affect the scoring adversely. We compared our results with a recently developed spectral processing algorithm by Good et al. and with the previous OMSSA filter. Below is some terminology used in this comparison:
W→ Window upstream of monoisotopic precursor peaks,
M→ Neutral mass of peptide,
n→ Parent precursor charge,
N1, N2→ Width of neutral losses which are downstream to precursor peaks,
H→ Mass of proton and
mz→ Corresponds to the region where all the peaks are removed.
Removal of precursor peaks
Removal of neutral loss peaks
Apart from the precursor filtering routine, OMSSA employs a noise filtering routine to remove noise peaks found in the mass spectra prior to submitting the spectra to sequence library search. In this noise filtering algorithm, the first step involves removing the isotope peaks other than the monoisotopic product ion peak. This is done by removing peaks which are 1-2 Da upstream of the most intense peak. We did not make any changes to this routine. The second step in this noise filtering involves removing peaks that are too close together. This is explained in detail in the original OMSSA paper. This filter retains the top 2 most intense peaks in a sliding window of +/- 27 Da (or +/- 14 Da) when looking for 1+ (or 2+) product ion peaks. The reason for picking 2 peaks is that the filter assumes there is one forward ion series (c ions) and one reverse ion series (z˙) in each region. Since it has been shown that ETD spectra can also contain other ion types [13, 15, 16], we modified this routine to accommodate the extra ion-types present in the spectra. For example, if we are looking for c, z˙ and y ions, then the probability of any product ion being found in a +/- 27 Da window would be higher than if we are looking for only c and z˙ ions. In OMSSA, the number of 1+ product ion peaks allowed in a window of +/- 27 Da is given by "h1", while the number of 2+ product ion peaks allowed in a +/- 14 Da window is given by "h2". By default, these values are set to 2. Since we are using c, z˙ and y ions in our peptide sequence library search, we ran OMSSA searches on the training set and found that the optimum value for "h1" and "h2" is 3. Hence, we made "h1" and "h2" equal to the number of possible ion types found in the spectra. OMSSA users can change these values, if needed.
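The effect of raising "h1" from 2 to 3 can be sketched with a simplified top-h window filter in Python. This is one possible reading of such a filter, written for illustration only; it is not OMSSA's actual implementation, and the function name, parameter names and peak list are hypothetical.

```python
def top_h_filter(peaks, half_width=27.0, h=3):
    """Keep a peak only if it ranks among the h most intense peaks
    found within +/- half_width Da of its own m/z.

    peaks: list of (mz, intensity) tuples.
    """
    kept = []
    for mz, intensity in peaks:
        # Count peaks inside the window that are strictly more intense.
        stronger = sum(
            1
            for other_mz, other_intensity in peaks
            if abs(other_mz - mz) <= half_width and other_intensity > intensity
        )
        if stronger < h:
            kept.append((mz, intensity))
    return kept


if __name__ == "__main__":
    spectrum = [(100.0, 10.0), (105.0, 50.0), (110.0, 30.0),
                (115.0, 20.0), (200.0, 5.0)]
    print(top_h_filter(spectrum, h=2))  # two ion series assumed (c, z)
    print(top_h_filter(spectrum, h=3))  # three ion series assumed (c, z, y)
```

With h=2 the peak at 115 is discarded because two stronger peaks share its window; with h=3 it is retained, which mirrors the motivation for allowing one peak per expected ion series.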
Precursor charge determination
Charge reduced precursor series
where X(M, z) = M/(W*z) and W = 500. Parameter W is taken from the precursor filter analysis. If M is 2000 Da, then X for MH(2+) will be 2000/(500*2) = 2 Da. Similarly for MH(1+), X will be 4 Da. Parameter tolp is set to 2 Da.
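The worked values above can be reproduced with a few lines of Python. The function name is hypothetical; the formula X(M, z) = M/(W*z) and W = 500 are taken from the text.

```python
def window_width(neutral_mass, z, w=500.0):
    """X(M, z) = M / (W * z), in Da."""
    return neutral_mass / (w * z)


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"X(2000, {z}) = {window_width(2000.0, z):.2f} Da")
```

The width shrinks with z because the isotopic envelope of a higher charged ion occupies a narrower m/z range.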
Similarly, neutral losses associated with these precursor peaks can be calculated, depending on the types of neutral losses, which is explained below in detail.
where tol is the tolerance (bin width) around the neutral losses window and N2 is the neutral loss considered in this analysis. The values considered for N2 and tol are 18 Da and 4 Da respectively. One of the differences between this equation compared to (4) and (5) is that here we removed only neutral loss peaks of water and ammonia, instead of removing the entire region from the precursor to these peaks, the reason being the identification of the precursor charge exactly rather than spectral cleaning. All the features used in this study are normalized with the total ion current present in the spectra. OMSSA searches with assigned precursor charges from the algorithm and by general range search method are compared using ROC curves and the results are described below.
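Normalization by the total ion current can be sketched as follows in Python. The placement of the neutral loss window (total width tol, centred on the precursor m/z minus N2/z) is an assumption made for this illustration, since the exact bin definition is not reproduced here; all function names and the example peak list are hypothetical.

```python
def tic_normalized_intensity(peaks, low_mz, high_mz):
    """Sum of intensities with low_mz <= m/z <= high_mz, divided by the
    total ion current (sum of all intensities in the spectrum)."""
    tic = sum(intensity for _, intensity in peaks)
    if tic == 0:
        return 0.0
    in_window = sum(i for mz, i in peaks if low_mz <= mz <= high_mz)
    return in_window / tic


def neutral_loss_window(precursor_mz, z, loss=18.0, tol=4.0):
    """ASSUMPTION: a window of total width tol centred on the loss peak,
    with the neutral loss mass scaled to the charge z."""
    centre = precursor_mz - loss / z
    return centre - tol / 2.0, centre + tol / 2.0


if __name__ == "__main__":
    peaks = [(500.0, 10.0), (991.5, 20.0), (1001.0, 60.0), (1200.0, 10.0)]
    low, high = neutral_loss_window(1001.0, z=2)
    print(low, high, tic_normalized_intensity(peaks, low, high))
```

Dividing by the total ion current makes the feature comparable between spectra of different overall intensity, which matters when the values are fed to a single classifier.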
For the charge determination algorithm, we used the same training and test sets that we used for the spectral pre-processing. There were some differences in how we used the training set to build an LDA classifier. Our first step in this analysis was to find a good set of peptide-spectrum matches with which to classify the spectra into different charge states based on the information obtained from the MS/MS scan. We used OMSSA on the training set to get peptide-spectrum matches. To have a reliable input set for LDA, we used an OMSSA e-value cut-off of 1e-6 on the training set results to pick high-confidence peptide-spectrum matches. We chose this e-value to avoid any decoy assignments. In cases where OMSSA identified 2 peptides (for an ETD MS/MS spectrum) with different precursor charges, we used the top-most hit. We found 428 unique peptide-spectrum matches that satisfied these criteria, and these were used as input to the LDA classifier.
Top 1: Only the best predicted charge state is used to search for peptides using OMSSA.
Top 1/Top 2: Consider a threshold t1 for the posterior probabilities obtained from the LDA classifier. The top 2 predicted charges are assigned to spectra whose posterior probability for the best predicted charge is below t1, while only the best predicted charge is considered for spectra above this threshold. This can be considered similar to Charger, as it can assign the 2 best predicted charge states if it cannot assign a single best possible precursor charge.
1/2/All: A third scenario uses 2 thresholds, t1 and t2 (t1>t2). If the posterior probability of the best predicted charge is greater than t1, only the best charge is considered. If the probability falls between t1 and t2, the top 2 charges are searched. If the probability is less than t2, the spectrum is assigned the entire range to search for the probable precursor charge states. This is similar to changing the relaxation parameter in the Charge Prediction Machine. We determined the thresholds using the training set, varying the settings for t1 and t2 and selecting the values that worked best. For the present analysis, we found that the optimum values for t1 and t2 are 0.99 and 0.9 respectively. One of the criteria in determining these threshold values was to prefer introducing a few false positives over losing many true positives.
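The 1/2/All rule can be sketched in Python as below. The function and parameter names are hypothetical, the posterior probabilities are assumed to come from any classifier, the full range of 3+ to 7+ is taken from the dataset described in this paper, and the handling of values exactly equal to a threshold is an assumption.

```python
def charges_to_search(posteriors, t1=0.99, t2=0.9, full_range=(3, 4, 5, 6, 7)):
    """Choose which precursor charges to search for one spectrum.

    posteriors: dict mapping charge -> posterior probability.
    """
    # Rank candidate charges from most to least probable.
    ranked = sorted(posteriors, key=posteriors.get, reverse=True)
    best_probability = posteriors[ranked[0]]
    if best_probability > t1:
        return ranked[:1]        # confident: best charge only
    if best_probability >= t2:   # ASSUMPTION: boundary values count as "between"
        return ranked[:2]        # less confident: top 2 charges
    return list(full_range)      # not confident: fall back to the range search


if __name__ == "__main__":
    print(charges_to_search({3: 0.995, 4: 0.005}))
    print(charges_to_search({4: 0.95, 5: 0.04, 3: 0.01}))
    print(charges_to_search({5: 0.5, 6: 0.3, 4: 0.2}))
```

Setting t2 equal to 0 reduces this to the Top 1/Top 2 variant, and setting both thresholds to 0 reduces it to Top 1.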
Table 1: Database search times for OMSSA using different variants of the precursor charge determination algorithm. Column: Computational Time (minutes). Rows include CP+NL (Top 1) and CP+NL (Top 1/Top 2).
As we mentioned earlier, one of the reasons to determine the precursor charge is to reduce the database search time and to decrease the number of false identifications. The computational times for the OMSSA range search and for the OMSSA searches after precursor charge assignment are shown in Table 1. It can be seen that if we assign only the best charge to all of the spectra in the test set, the search is almost 10 times (91/9) faster than the range search method. However, this results in a loss of peptide identifications. Similarly, we lose some true positives with the top 1/top 2 option. Since it is important not to lose any identifications, the 1/2/All variant looks optimal and is 3.5 times faster than the range search method.
Apart from the precursor peaks and neutral losses, we also considered the density and distribution of product ions as a feature. The density and distribution of the product ion peaks in ETD spectra depend on the precursor charge, i.e., a 3+ precursor ion can produce product ions of up to 2+ charge and a 4+ precursor ion can produce product ions of up to 3+ charge. It can be inferred that the higher the charge of the precursor, the denser the product ion peaks in the MS/MS scan, although to a small degree this is counteracted by a reduction in the intensity of the charge reduced precursors. This kind of approach was used previously to differentiate between 2+ and 3+ precursor charge states in CAD data. We used a similar approach to see if there is a further increase in sensitivity using product ion distribution as a feature. In our analysis, we did not see any improvement from using product ion distribution as another input feature to LDA.
Table: Sensitivity increase of peptide identifications with updated filters and precursor charge estimation. Column: True positives (1% FDR). Rows: New Precursor Filter; New Precursor and Noise Filters; Precursor Charge Determination Algorithm (3.5 times faster).
ETD can dissociate precursor ions over a wide charge range. MS/MS spectra of these peptides can have charge reduced precursor peaks with corresponding neutral losses, all of which are generally intense. To reduce false positives and false negatives, these peaks should be removed before submitting the spectra to a protein database search algorithm for peptide identification. We developed an algorithm to remove these precursor peaks and neutral losses more effectively. We removed bins upstream of precursor peaks, with a width proportional to the molecular weight of the precursor. Similarly, we removed the possible neutral losses associated with these precursor peaks in the ETD spectra. ROC plots (see Figure 2) show better performance compared to the previous OMSSA filter and the spectral pre-processing algorithm developed by Good et al. There was an increase of at least 9.8% in identifications at 1% FDR when the new precursor filter is compared to the original OMSSA filter (see Figure 6).
An additional improvement to the spectral pre-processing was based on the observation that ETD spectra can contain different ion-series such as y ions, depending on the precursor charge of the peptide. We incorporated this information into OMSSA's noise filtering. This led to a further 4.2% increase in peptide identifications at 1% FDR (see Figure 6). Charge reduced precursor peak filtering along with the noise filtering should result in an increase in peptide identifications for MS/MS data obtained from both lower resolution and higher resolution instruments, although we did not test the filters on data from a high resolution instrument.
Precursor charge is often not measured on lower resolution instruments, although the distribution of charge reduced precursor peaks in the MS/MS spectra has a pattern that depends on the precursor charge. In this study, we used this pattern of charge reduced precursor peaks and their neutral losses to determine the precursor charge. Neutral loss peaks can aid in classifying ambiguous charge states, i.e., multiples such as 3+/6+. We developed an algorithm to predict the parent precursor charge state using statistical methods. Using LDA, we found that the intensity and pattern of charge reduced precursor peaks and neutral losses are a good predictor of the precursor charge. Using this precursor charge determination algorithm, OMSSA's run times were 3.5 times faster compared to the range search method, with a minor 3.8% increase in peptide identifications at 1% FDR (see Figure 6). Previous charge determination algorithms did not report any increase in sensitivity of peptide identifications, while our algorithm showed a small increase in sensitivity. MS/MS database search algorithms could incorporate charge state determination algorithms as an important tool for significantly reducing database search times. Overall, using the new versions of the precursor and noise filters in OMSSA and incorporating the charge determination algorithm, there was an increase of at least 18.8% in peptide identifications, and searches were almost 3.5 times faster than with the previous version of OMSSA. Such improvements in sensitivity and database search times with the updated filters and precursor charge determination could be useful for mass spectrometry labs with lower resolution instruments.
ETD MS/MS spectra of yeast phosphopeptides were used for this study. These spectra were acquired using the Finnigan LTQ mass spectrometer (Thermo Electron, San Jose, CA). This spectrometer was equipped with a nano-flow HPLC microelectrospray ionization source and was modified to facilitate ETD. The dataset used for this study has a total of 16901 spectra, of which 10000 spectra were used as the training set, while the rest were used as the test set. We compared the charge state breakdown of the training and test sets for hits with an e-value better than 1e-6. There were 20.1%, 51.4%, 19.4%, 7.7% and 1.4% peptide hits of +3, +4, +5, +6 and +7 charge states respectively in the training set. In the test set, there were 22.8%, 44.4%, 20.1%, 10.7% and 2.0% peptide hits of +3, +4, +5, +6 and +7 charge states respectively. The charge state distributions are approximately the same for both training and test sets. Of the peptide identifications, there were only 18 unique peptide hits that were common to both training and test sets.
The OMSSA precursor and noise filtering algorithm was implemented using the NCBI C++ toolkit. For precursor charge determination, we used LDA, where precursor charge states are the predefined classes. LDA was performed using MATLAB 7.8.0 [R2009a]. We wrote scripts in MATLAB to extract features from the spectra as input to LDA.
After the spectral processing was done and precursor charge states were assigned to the spectra, we used OMSSA 2.1.7 for peptide identification. Here is a brief outline of the parameters and the sequence library used for the OMSSA search. A static modification of alkylation with iodoacetamide on cysteine, static modifications of methyl ester formation on aspartic acid, glutamic acid and the peptide C terminus, a variable modification of oxygen on methionine and phosphorylation of serine, threonine and tyrosine were considered. A precursor mass tolerance of 3.0 Da and a fragment mass tolerance of 0.4 Da were used, and c, z˙ and y ions were searched in these ETD MS/MS spectra. For all our analyses, we searched the MS/MS spectra against a target-decoy library using target sequences from the NCBI 6298 yeast protein sequence library. Using the target-decoy database strategy, we get decoy and forward database assignments. The number of false positives is generally considered equal to the number of decoy assignments, while the number of true positives is the number of forward database assignments minus the number of decoy database assignments at the e-value considered. OMSSA search results were then analyzed using receiver operating characteristic [ROC] curves. A ROC curve is a plot of sensitivity (true positives) against 1-specificity (false positives). All the OMSSA searches were run on a cluster of SUSE Linux machines.
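The target-decoy counting described above can be sketched in Python. The input format (one e-value and a decoy flag per top hit), the function names and the example hits are hypothetical; the counting rule follows the text, with false positives equal to decoy assignments and true positives equal to forward assignments minus decoy assignments.

```python
def target_decoy_counts(hits, evalue_cutoff):
    """Return (true_positives, false_positives) at an e-value cut-off.

    hits: iterable of (evalue, is_decoy) pairs, one per top-ranked match.
    """
    passing = [is_decoy for evalue, is_decoy in hits if evalue <= evalue_cutoff]
    decoy = sum(1 for is_decoy in passing if is_decoy)
    forward = len(passing) - decoy
    return forward - decoy, decoy


def roc_points(hits):
    """(false_positives, true_positives) at each observed e-value cut-off."""
    cutoffs = sorted({evalue for evalue, _ in hits})
    points = []
    for cutoff in cutoffs:
        true_pos, false_pos = target_decoy_counts(hits, cutoff)
        points.append((false_pos, true_pos))
    return points


if __name__ == "__main__":
    example = [(1e-8, False), (1e-7, False), (1e-5, True),
               (1e-4, False), (1e-2, True), (0.1, False)]
    print(target_decoy_counts(example, 1e-3))
    print(roc_points(example))
```

Plotting the second element of each point against the first gives a curve of the kind used to compare the filters and charge assignment variants.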
This research was supported in part by the Intramural Research Program of the NIH, National Library of Medicine and by NIH GM 037537 to DFH.
- Aebersold R, Mann M: Mass spectrometry-based proteomics. Nature 2003, 422: 198–207.
- Kelleher NL: Top-down proteomics. Anal Chem 2004, 76: 197A-203A.
- Schroeder MJ, Shabanowitz J, Schwartz JC, Hunt DF, Coon JJ: A neutral loss activation method for improved phosphopeptide sequence analysis by quadrupole ion trap mass spectrometry. Anal Chem 2004, 76: 3590–3598.
- Swaney DL, McAlister GC, Wirtala M, Schwartz JC, Syka JE, Coon JJ: Supplemental activation method for high-efficiency electron-transfer dissociation of doubly protonated peptide precursors. Anal Chem 2007, 79: 477–485.
- Syka JE, Coon JJ, Schroeder MJ, Shabanowitz J, Hunt DF: Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proc Natl Acad Sci USA 2004, 101: 9528–9533.
- Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, Yang X, Shi W, Bryant SH: Open mass spectrometry search algorithm. J Proteome Res 2004, 3: 958–964.
- Craig R, Beavis RC: TANDEM: matching proteins with tandem mass spectra. Bioinformatics 2004, 20: 1466–1467.
- Yates JR III, Eng JK, McCormack AL, Schieltz D: Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal Chem 1995, 67: 1426–1436.
- Tabb DL, Fernando CG, Chambers MC: MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J Proteome Res 2007, 6: 654–661.
- Perkins DN, Pappin DJ, Creasy DM, Cottrell JS: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20: 3551–3567.
- Zubarev RA, Zubarev AR, Savitski MM: Electron capture/transfer versus collisionally activated/induced dissociations: solo or duet? J Am Soc Mass Spectrom 2008, 19: 753–761.
- Falth M, Savitski MM, Nielsen ML, Kjeldsen F, Andren PE, Zubarev RA: Analytical utility of small neutral losses from reduced species in electron capture dissociation studied using SwedECD database. Anal Chem 2008, 80: 8089–8094.
- Sun RX, Dong MQ, Song CQ, Chi H, Yang B, Xiu LY, Tao L, Jing ZY, Liu C, Wang LH, et al.: Improved peptide identification for proteomic analysis based on comprehensive characterization of electron transfer dissociation spectra. J Proteome Res 2010, 9: 6354–6367.
- Geer LY, Bai DL, Shabanowitz J, Kowalak JA, Markey SP, Bryant SH, Hunt DF: An algorithm for sequence searching of peptide spectra generated via electron transfer dissociation. J Am Soc Mass Spec 2005, 16: S2-S169.
- Chalkley RJ, Medzihradszky KF, Lynn AJ, Baker PR, Burlingame AL: Statistical analysis of peptide electron transfer dissociation fragmentation mass spectrometry. Anal Chem 2010, 82: 579–584.
- Liu X, Shan B, Xin L, Ma B: Better score function for peptide identification with ETD MS/MS spectra. BMC Bioinformatics 2010, 11(Suppl 1): S4.
- Coon JJ, Ueberheide B, Syka JE, Dryhurst DD, Ausio J, Shabanowitz J, Hunt DF: Protein identification using sequential ion/ion reactions and tandem mass spectrometry. Proc Natl Acad Sci USA 2005, 102: 9463–9468.
- Sadygov RG, Hao Z, Huhmer AF: Charger: combination of signal processing and statistical learning algorithms for precursor charge-state determination from electron-transfer dissociation spectra. Anal Chem 2008, 80: 376–386.
- Carvalho PC, Cociorva D, Wong CC, Carvalho MD, Barbosa VC, Yates JR: Charge prediction machine: tool for inferring precursor charge states of electron transfer dissociation tandem mass spectra. Anal Chem 2009.
- Sharma V, Eng JK, Feldman S, von Haller PD, MacCoss MJ, Noble WS: Precursor charge state prediction for electron transfer dissociation tandem mass spectra. J Proteome Res 2010, 9: 5438–5444.
- Chi A, Huttenhower C, Geer LY, Coon JJ, Syka JE, Bai DL, Shabanowitz J, Burke DJ, Troyanskaya OG, Hunt DF: Analysis of phosphorylation sites on proteins from Saccharomyces cerevisiae by electron transfer dissociation (ETD) mass spectrometry. Proc Natl Acad Sci USA 2007, 104: 2193–2198.
- Elias JE, Gygi SP: Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 2007, 4: 207–214.
- Good DM, Wenger CD, McAlister GC, Bai DL, Hunt DF, Coon JJ: Post-acquisition ETD spectral processing for increased peptide identifications. J Am Soc Mass Spectrom 2009, 20: 1435–1440.
- Sweet SM, Jones AW, Cunningham DL, Heath JK, Creese AJ, Cooper HJ: Database search strategies for proteomic data sets generated by electron capture dissociation mass spectrometry. J Proteome Res 2009, 8: 5475–5484.
- Na S, Paek E, Lee C: CIFTER: automated charge-state determination for peptide tandem mass spectra. Anal Chem 2008, 80: 1520–1528.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.