A method for improving SELDI-TOF mass spectrometry data quality



Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) is a powerful tool for rapidly generating high-throughput protein profiles from a large number of samples. However, the events that occur between the first and last sample run are likely to introduce technical variation in the results.


We fractionated and analyzed quality control (QC) and investigational serum samples on 3 types of ProteinChip arrays and used statistical methods to identify poor-quality spectra and to identify and reduce technical variation.


Using diagnostic plots, we were able to visually depict all spectra and to identify and remove those of poor quality. We detected technical variation associated with when the samples were run (the batch effect) and corrected for it using analysis of variance. These corrections increased the number of peaks that were reproducibly detected.


By removing poor-quality, outlier spectra, we were able to increase peak detection, and by reducing the variance introduced when samples are processed and analyzed in batches, we were able to increase the reproducibility of peak detection.


Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) allows users to generate protein expression data rapidly from a large number of samples and has been used increasingly to identify diagnostic biomarkers of cancer [1–3], mental illness [4, 5], and neurological disorders [6, 7]. However, as with any analytic technique, its results must be reproducible if one is to have confidence in them.

Several challenges to implementing SELDI-TOF MS in routine clinical diagnostics have already been overcome [8–10]. These include challenges pertaining to biologic samples, such as the characterization of sample donors (e.g., by age, sex, fasting status, diurnal rhythm) [11]; sample collection and handling [12, 13]; and the effects of freezing, thawing, and storage on specimen stability [14]. Parameters of the SELDI-TOF MS technique that have been assessed range from its sample-processing and robotic-handling systems to its application of the energy-absorbing matrix [15–17]. Finally, methods developed to improve the calibration and quality of spectra [10, 18–21] and peak detection and quantification [22–24] have made SELDI-TOF MS one of the most promising methods for protein biomarker discovery.

Even though a variety of software packages can be used to analyze SELDI-TOF MS data, few are effective in averaging replicate spectra or identifying poor-quality spectra [25, 26], and none are capable of analyzing and adjusting for the variation introduced when samples are processed and analyzed in batches. We demonstrate that conventional statistical approaches can be used to identify outlying spectra and correct for batch variation, as well as to increase the number of peaks detected by SELDI-TOF MS and improve the reproducibility of peak detection.


To identify and remove poor-quality spectra, we assessed the degree of linear relationship among all spectra in each data set (one ProteinChip-fraction combination). We generated a pair-wise similarity matrix by computing the Pearson correlation coefficient between the normalized intensity values of every pair of spectra. To depict the data visually, we drew a diagnostic plot of 1 minus the mean (1-mean) of each spectrum's Pearson correlation coefficients (x-axis) against the range of its correlation coefficients (y-axis) (Figure 1). By comparing these diagnostic plots with other evaluation methods, such as principal component analysis of the processed spectra and signal-to-noise (S/N) ratios, and by comparing the number of peaks in each spectrum with the average number of peaks for all spectra in the data set, we established cut-off values of 1-mean > 0.2 for QC spectra and > 0.4 for specimen spectra.

Figure 1

Diagnostic plot generated from a Pearson correlation matrix of 66 QC spectra from the CMLS-F4 data set. A cut-off value of 1-mean Pearson correlation coefficient > 0.2 (blue line) was used to exclude spectra from the QC analysis. In this data set, 2 QC sample spectra were above this cut-off (colored blue) and were therefore removed before further processing. For spectra from the investigational data sets, a cut-off of 1-mean > 0.4 was used for all ProteinChip-fraction combinations.

Variation in analytic results is introduced when samples are processed and analyzed in different batches. To examine the extent of this batch effect, we used the nonparametric Kruskal-Wallis test to compare, for each peak (identified by its mass-to-charge (m/z) value), the normalized peak intensities across batches. Our null hypothesis was that, for each peak, the intensity distributions would be identical across batches. Using a corrected p-value threshold of 0.005 to count the peaks that differed in at least one batch, we found a statistically significant batch effect in at least 50% of peaks for each ProteinChip-fraction combination (Figure 2).
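A per-peak Kruskal-Wallis screen for batch effects can be sketched with the Python standard library. This is an illustrative sketch rather than the Partek implementation: it computes the tie-corrected H statistic with its asymptotic chi-square p-value and applies the 0.005 threshold to uncorrected p-values, whereas the text uses bootstrap-corrected p-values. The function names and the layout of `peak_table` (one list of intensities per peak, aligned with a list of batch labels) are hypothetical.

```python
import math
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    # Start from df = 2 (exp) or df = 1 (erfc), then step up two df at a time.
    q, k = (math.exp(-x / 2), 2) if df % 2 == 0 else (math.erfc(math.sqrt(x / 2)), 1)
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its asymptotic p-value."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # all intensities identical: no evidence of a batch effect
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


def fraction_batch_affected(peak_table, batches, alpha=0.005):
    """Fraction of peaks whose intensities differ across batches at p < alpha."""
    labels = sorted(set(batches))
    hits = 0
    for intensities in peak_table:
        groups = [[x for x, b in zip(intensities, batches) if b == lab] for lab in labels]
        _, p = kruskal_wallis(groups)
        hits += p < alpha
    return hits / len(peak_table)
```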

Figure 2

Sources of technical variation for the QC (left) and investigational (right) data sets before the Partek Batch Remover process. The plot shows the average F ratio (a signal-to-noise ratio) produced when the ANOVA model (Batch and Spot) was applied to the QC data sets (left) and to the investigational data sets (right), for which ProteinChip lot number (Array) was also included. Analysis was performed on spectra after removal of outlier spectra and before removal of the batch technical effect. Batch effect is the largest contributor to variance. The numbers above each group indicate the percentage of peaks that were different in at least one batch (p < 0.005), as determined by the Kruskal-Wallis test. The QC data set is derived from a pooled serum sample so no. After batch removal, none of the peaks were significantly different.

We used a 2-way analysis of variance (ANOVA) model to explore batch effect variation in the QC sample and a 3-way ANOVA model to explore it in the investigational samples. The batch in which spectra were processed was the largest source of variation in both the QC and the investigational samples (Figure 2). The F ratio (a signal-to-noise ratio) ranged from 4 to 14 for the QC sample, a much narrower range than the 2 to 34 observed for the investigational samples. The CM10 low-stringency (CMLS) fractions 1 to 4 (F1-F4) and the IMAC F1 and F3 data sets showed the lowest batch variance, with QC and investigational samples having similar F ratios (Figure 2).

As described in the Methods section, we used the Batch Remover tool (Partek Genomics Suite) to reduce the effects of batch variation. Hierarchical clustering of spectra in each data set showed that, before Batch Remover was applied, each batch clustered as a distinct node (Figure 3 for CMLS-F4-QC). In fact, 2 nodes were apparent in all data sets: one for batches run before the unanticipated instrument maintenance (batches A through F) and one for batches run after it (batches G through K). After Batch Remover was applied, however, we observed no clustering by batch (Figure 3).
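Partek's Batch Remover fits a mixed-model ANOVA whose internal details are not given here. As a much simpler stand-in that conveys the idea, the sketch below removes an additive batch effect from one peak by subtracting each batch's mean and adding back the grand mean. This one-factor adjustment is an assumption made for illustration, not the Partek algorithm; it is reasonable only in a balanced design like the one described here, where batches are not confounded with biological groups. The function names are hypothetical.

```python
from statistics import mean


def remove_batch_effect(intensities, batches):
    """Center each batch on the grand mean for one peak (additive batch model)."""
    grand = mean(intensities)
    batch_means = {
        b: mean(x for x, lab in zip(intensities, batches) if lab == b)
        for b in set(batches)
    }
    return [x - batch_means[b] + grand for x, b in zip(intensities, batches)]


def remove_batch_effect_table(peak_table, batches):
    """Apply the adjustment peak by peak; peak_table holds one list per peak."""
    return [remove_batch_effect(peak, batches) for peak in peak_table]
```

After this adjustment every batch has the same mean intensity for every peak, so clustering driven purely by batch-level shifts would be expected to disappear.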

Figure 3

Dendrograms from hierarchical cluster analysis of spectra from the CMLS-F4-QC data set, labeled by batch, before (left) and after (right) Partek Batch Remover was applied. Peak intensities of the QC spectra at these two points in the analysis were used to generate dendrograms by hierarchical cluster analysis (Spearman rank dissimilarity metric with average linkage). Each spectrum was labeled with the batch in which it was processed (A through K). In the left dendrogram, spectra cluster in nodes according to the batch in which they were processed, and two large clusters are evident: one with spectra from batches A through F and one with spectra from batches G through K. This split reflects an unanticipated instrument adjustment that had to be made between the sixth and seventh batches. The right dendrogram shows the same data after Partek Batch Remover was used to reduce the batch-effect technical variance: spectra no longer cluster by the batch in which they were processed, and spectra from before and after the instrument maintenance are intermingled across the 2 major nodes.

We assessed the quality of each spectrum by computing Pearson correlation coefficients between spectra over all 11,876 intensity values. Using the cut-off criteria we established (1-mean > 0.2 for QC spectra and > 0.4 for specimen spectra), we obtained very similar results when we instead used peak intensities (fewer than 100 values per spectrum) to generate the correlation matrix. Before outlier spectra and batch effect variance were removed, the correlation coefficients ranged from 0.75 to 0.95 in each full data set (Table 1A). Removing poor-quality spectra improved the correlations to 0.88–0.96 (Table 1B), and removing the batch effect improved them further to 0.95–0.99 (Table 1C). Duplicate spectra from individual samples were highly reproducible, with a median Pearson correlation coefficient of 0.98 for the 207 pairs of spectra in the CMLS-F4 data set. Results for the other data sets were similar (results not shown).

Table 1

Summary data showing stages in the quality assessment of QC sample spectra.

To measure reproducibility, we calculated the coefficient of variation (CV) of the peak intensities across all spectra in each QC sample data set (Table 1). Similar data are available for the investigational samples [see Additional file 1]. Removing low-quality spectra generally increased the number of peaks common to all spectra in a data set and reduced the average CV for the full spectrum (Table 1B). Batch removal had a more dramatic effect (Table 1C): the number of peaks remained the same, but the average CV decreased and the number of peaks in each data set with a CV < 30% increased (Table 1). For example, the CMLS_F5 QC serum data set started with 66 spectra with 54 peaks present; the average peak CV in this set was 70% (range: 11–274%, Table 1A). Using the diagnostic plot criteria, we removed two spectra, thereby reducing the CV range to 10–48% and the average CV to 24% (Table 1B). Removing the batch effect technical variance further reduced the CV range to 6–31% and the average CV to 13% (Table 1C). We obtained similar results with the specimen data sets [see Additional file 1]. For all data sets, the CVs of m/z values were within the 0.3% reported in the literature [19].
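The CV summaries reported in Table 1 can be sketched as follows. This minimal example assumes the sample standard deviation (n-1 denominator) and a peak table with one list of intensities per peak; the function names are hypothetical, and the exact CV definition used by the analysis software may differ.

```python
from statistics import mean, stdev


def peak_cvs(peak_table):
    """CV (%) of each peak's intensities across spectra: 100 * SD / mean."""
    return [100 * stdev(peak) / mean(peak) for peak in peak_table]


def cv_summary(peak_table, threshold=30.0):
    """Average and range of peak CVs, plus the fraction of peaks below threshold (%)."""
    cvs = peak_cvs(peak_table)
    return {
        "average_cv": mean(cvs),
        "min_cv": min(cvs),
        "max_cv": max(cvs),
        "fraction_below_threshold": sum(cv < threshold for cv in cvs) / len(cvs),
    }
```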


Even though SELDI-TOF MS is designed as a high-throughput automated assay, large studies involving many biological samples are often divided into batches that are analyzed over several days to weeks. To detect any variability that may occur, analysts process pooled human serum (QC samples) along with the study samples. In this study, we used an ANOVA model to assess technical variance in peak intensities that could be introduced by differences among sample batches, variations in the spot position of each sample on the ProteinChip, and variations among ProteinChip arrays. We found that batch differences were the largest source of technical variability in each data set, with spot position and ProteinChip array contributing little. Therefore, any analysis that ignores the variation associated with processing samples in different batches leaves a considerable amount of noise in the data. The balanced design of our experiments allowed us to reliably estimate the batch effect and then remove it using the Partek Batch Remover (based on a mixed-model ANOVA). Because only technical factors were included in the ANOVA model, biological differences among samples are not removed by this adjustment, and the adjusted peak intensity data can be used in further statistical analyses.

Hong et al. [27] identified the correlation matrix as an effective metric for identifying lower-quality spectra. However, we found this approach less effective when a single cut-off value had to be applied to several large data sets. To automate our decisions about which spectra to include in the analysis, we drew on our experience with microarray quality control and presented our data in diagnostic plots [28] (Figure 1). The results we obtained using diagnostic plots compared favorably with assessments based on visual inspection and on normalization factors more than 2 standard deviations from the mean (as recommended by Ciphergen Biosystems, Fremont, CA, USA). Using statistical measures of spectrum quality allowed us to remove even more poor-quality spectra, and to do so automatically. Other measures, such as peak resolution [18, 19], could also be used to assess data quality; however, the software packages used in this analysis did not compute this parameter.

The QC data sets consist of the same pooled serum sample run with each batch of investigational sera, so they directly evaluate the repeatability of measurements. Because repeatability is expected to be much higher in the QC data sets, we used a more stringent cut-off (1-mean > 0.2) for them than for the investigational data sets (1-mean > 0.4). Good performance should be associated with low coefficients of variation for the peak intensities, because the data are all derived from the same pooled reference serum. Table 1 illustrates the improvements in data quality and reproducibility that resulted from removing outlying, poor-quality spectra and the technical batch effect. The average CVs for all data sets except H50-F1 were ≤ 20%, even though all peaks were considered rather than only the 3 to 7 major peaks reported in some studies [29, 30]; furthermore, more than 90% of all peaks in each data set other than H50-F1 had peak intensity CVs < 30%.


In this study, we used a diagnostic plot to detect and discard low-quality spectra. This method was easy to implement and effective in detecting outlier spectra. Our use of the model-based ANOVA to account for the technical variance introduced by batch processing of spectra further improved the data quality.



A reference or QC sample was prepared by pooling serum from 10 donors, collected in Vacutainer tubes without additives. It was processed, aliquoted, and frozen in the same manner as the study subjects' samples. Serum samples from 207 subjects (referred to as investigational sera) were collected during a clinical study of chronic fatigue syndrome in Wichita, Kansas [31].

Serum fractionation

All of the experimental protocols were performed by a single laboratorian.

To reduce sample complexity and increase the number of protein peaks detected, we performed anion exchange fractionation using the Expression Difference Mapping™ Kit – Serum Fractionation (Ciphergen Biosystems Inc., Fremont, CA, USA) on the robotic Biomek 2000 liquid handling system (Beckman Coulter, Fullerton, CA, USA). We collected six fractions: pH 9 (F1), pH 7 (F2), pH 5 (F3), pH 4 (F4), pH 3 (F5), and organic (F6). The investigational serum samples were fractionated in 11 batches over a period of 7 months. Twenty investigational samples and 3 QC samples were processed in each batch and then frozen at -80°C. For each batch, we analyzed fractions in the same order, kept freezing times within 2 to 11 days, and held processing conditions constant.

Protein expression profiling

Aliquots of each fraction were bound in duplicate, using a randomized ProteinChip/spot position allocation scheme, to 3 types of ProteinChip arrays: IMAC-Cu (metal binding), H50 (hydrophobic chemistry), and CM10 (anionic chemistry). Two CM10 arrays were used, differing in the wash performed before sample application to allow selective binding of proteins: a high-stringency (HS) wash with 50 mM HEPES, pH 7, and a low-stringency (LS) wash with 0.1 M sodium acetate, pH 4. From previous studies [32], we know that F2 is not particularly informative and that F5 has many peaks that overlap with those in F4 and/or F6; therefore, we did not run these fractions in this study.

For each ProteinChip array, the relevant QC fraction was applied to one spot position. The details of ProteinChip processing have been described previously [32]. We used saturated sinapinic acid in 50% acetonitrile/0.5% trifluoroacetic acid as the matrix and applied it using the robot. We read the ProteinChips in a PBSIIc mass spectrometer (Ciphergen Biosystems) using automated data collection protocols with previously optimized conditions [32]. We used data from the low mass range protocols (3,000 to 30,000 Daltons) in our analysis and calibrated for mass accuracy using the "all-in-one" protein standard II on NP20 ProteinChips (Ciphergen Biosystems). If greater accuracy is required at m/z < 8,000, the "all-in-one" peptide standard should be used, applied with the sinapinic acid matrix to keep the data comparable.

Instrument performance evaluation is critical to spectrometer function; complete details of the calibration, alignment, and accuracy assessments performed routinely are given in a previous publication [32].

Using data from 15 fractions, we generated 414 investigational spectra (207 samples in duplicate) and 66 QC spectra (3 QC samples in each of 11 batches, in duplicate) per ProteinChip-fraction combination; we considered each combination a data set. Because of a laboratory relocation, we had to make an unanticipated instrument adjustment, which involved a preventive maintenance service, between the sixth and seventh batches.

Processing of spectral data sets

We used the QC serum sample to develop and evaluate data processing procedures, which we then used in processing data for the 207 investigational samples.

We exported raw spectrum data files for each ProteinChip-fraction combination and processed them using the following calibration equation:

$$m/z = U\left(a(t - t_0)^2 + b\right)$$

where m/z is the mass-to-charge ratio, U is the voltage (20,000 for this data set), t is the time-of-flight, t0 is a time offset, and a and b are calibration constants. For our mass calibration, we used the values a = 0.336302, b = 0, and t0 = 0.09, which we obtained from the calibration equation generated from the protein standard. The final spectrum, from m/z 3,000 to 30,000, comprised 11,876 data points. We saved the m/z and intensity values as comma-separated values (CSV) files. We used SpecAlign [33] to preprocess each spectral data set of QC spectra (66 per ProteinChip-fraction) and specimen spectra (414 per ProteinChip-fraction). We then followed the steps below to process the data:

  1. Smooth the data using the Savitzky-Golay filter with a setting of 8.
  2. Denoise the spectra using a wavelet transform with a threshold setting of 0.5.
  3. View baseline subtraction using a window setting of 5.
  4. Subtract the baseline.
  5. Rescale intensity values to positive values.
  6. Normalize intensity values using Total Ion Current.
  7. Generate an average spectrum.
  8. Align spectra using the combined Fast Fourier Transform (FFT)/peak matching method on the full m/z range, with a scale of 1, a maximum shift of 20, a look-ahead of 1, and the average spectrum as the reference.
  9. Export the processed data as a single file (to be used for correlation analysis).
  10. Pick peaks with a baseline cut-off of 0.5, a window of 10, and a height ratio of 1.5.
  11. Export peak intensity values for all spectra in a single file.
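Two of the numeric operations in this workflow can be sketched directly: converting time-of-flight values to m/z with the calibration equation and the constants given above (U = 20,000, a = 0.336302, b = 0, t0 = 0.09), and Total Ion Current normalization (step 6). This is a hedged illustration, not SpecAlign's implementation: the time units are assumed to be those of the exported raw data, and TIC normalization is assumed here to scale each spectrum so that its summed intensity equals the mean summed intensity across spectra. The function names are hypothetical.

```python
U = 20_000     # voltage used for this data set
A = 0.336302   # calibration constant a
B = 0.0        # calibration constant b
T0 = 0.09      # time offset t0


def tof_to_mz(t, u=U, a=A, b=B, t0=T0):
    """Calibration equation m/z = U * (a * (t - t0)**2 + b)."""
    return u * (a * (t - t0) ** 2 + b)


def tic_normalize(spectra):
    """Scale each spectrum so its total ion current (sum of intensities)
    equals the mean total ion current across all spectra (assumed convention)."""
    tics = [sum(s) for s in spectra]
    target = sum(tics) / len(tics)
    return [[x * target / tic for x in s] for s, tic in zip(spectra, tics)]
```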

Statistical Analysis

We performed all statistical analysis using Partek Genomics Suite software, version 6.2 (Partek Inc., St. Charles, Missouri).

To detect outlier spectra, we used the full-spectrum processed data, consisting of 11,876 intensity values covering the m/z range from 3,000 to 30,000 (the file exported in step 9 above). We generated a similarity matrix using the Pearson correlation coefficient for all pairs of spectra within the data set, calculated a mean correlation coefficient for each spectrum, and depicted the results on diagnostic plots [28]. Our cut-off criteria were 1-mean > 0.2 for removing QC spectra and > 0.4 for removing spectra from investigational samples.

After the quality assessment of the spectra and before the batch removal process, we used a 2-way ANOVA model to partition the variation in the QC data sets into components attributable to the batch in which spectra were processed (Batch, Figure 2) and to the position on the ProteinChip (Spot, Figure 2). For the larger investigational data sets, we used a 3-way model that also incorporated ProteinChip array (Array, determined by lot number) as a factor nested in Batch. The Partek Batch Remover employs a mixed-model ANOVA of the technical factors to identify and remove these sources of variation. Variation is reported as the average F ratio, a measure of the average signal-to-noise ratio over all computed variables for each factor. Each factor (Batch, Spot, or Array) is compared with the error term, which is normalized to 1 for reference.
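The F ratios for the 2-way QC model can be illustrated with a standard-library sketch. It is not Partek's mixed-model procedure: it assumes a balanced, fully crossed Batch x Spot design (so the main-effect sums of squares are orthogonal), fits main effects only with any interaction pooled into the error term, and omits the nested Array factor used for the investigational data. The function names are hypothetical.

```python
from statistics import mean


def main_effect_ss(y, labels):
    """Sum over levels of n_level * (level mean - grand mean)^2."""
    grand = mean(y)
    ss = 0.0
    for level in set(labels):
        group = [v for v, lab in zip(y, labels) if lab == level]
        ss += len(group) * (mean(group) - grand) ** 2
    return ss


def two_way_f_ratios(y, batch, spot):
    """F ratios for Batch and Spot in an additive two-way ANOVA of one peak
    (assumes a balanced, fully crossed design)."""
    grand = mean(y)
    ss_total = sum((v - grand) ** 2 for v in y)
    ss_batch = main_effect_ss(y, batch)
    ss_spot = main_effect_ss(y, spot)
    df_batch = len(set(batch)) - 1
    df_spot = len(set(spot)) - 1
    df_error = len(y) - 1 - df_batch - df_spot
    ms_error = (ss_total - ss_batch - ss_spot) / df_error
    return {
        "Batch": ss_batch / df_batch / ms_error,
        "Spot": ss_spot / df_spot / ms_error,
    }


def average_f_ratios(peak_table, batch, spot):
    """Average each factor's F ratio over all peaks."""
    per_peak = [two_way_f_ratios(peak, batch, spot) for peak in peak_table]
    return {factor: mean(f[factor] for f in per_peak) for factor in ("Batch", "Spot")}
```

Because every F ratio is a factor's mean square divided by the error mean square, the error term itself has an F ratio of 1 by construction, matching the reference level described above.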

We averaged replicate spectra and performed all statistical analyses using nonparametric tests: the Mann-Whitney test to compare 2 groups and the Kruskal-Wallis test to compare more than 2 groups. We used a bootstrap method to correct for multiple testing; the bootstrap estimates the probability of obtaining a particular p-value by chance. Group labels were randomly reassigned (with replacement) for a total of 2,000 bootstrap iterations. The bootstrap method does not assume that tests are independent.
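The resampling correction described above is implemented inside Partek Genomics Suite, and its exact algorithm is not given in the text. The sketch below shows one common way to realize the idea, a max-statistic (Westfall-Young-style) adjustment: group labels are resampled with replacement, a standardized Mann-Whitney statistic is recomputed for every peak, and each peak's adjusted p-value is the fraction of iterations whose largest statistic reaches the observed one. The normal-approximation standardization (without tie correction), the +1 smoothing, and all function names are assumptions made for illustration.

```python
import math
import random


def mann_whitney_z(values, labels, g1, g2):
    """|z| of the Mann-Whitney U statistic (normal approximation, no tie correction)."""
    x = [v for v, lab in zip(values, labels) if lab == g1]
    y = [v for v, lab in zip(values, labels) if lab == g2]
    if not x or not y:  # a resample can leave a group empty
        return 0.0
    n1, n2 = len(x), len(y)
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return abs(u - n1 * n2 / 2) / sd


def bootstrap_adjusted_p(peak_table, labels, iterations=2000, seed=0):
    """Max-statistic adjusted p-values with labels resampled with replacement."""
    rng = random.Random(seed)
    g1, g2 = sorted(set(labels))  # exactly two groups
    observed = [mann_whitney_z(p, labels, g1, g2) for p in peak_table]
    exceed = [0] * len(peak_table)
    for _ in range(iterations):
        resampled = rng.choices(labels, k=len(labels))
        max_stat = max(mann_whitney_z(p, resampled, g1, g2) for p in peak_table)
        for i, obs in enumerate(observed):
            exceed[i] += max_stat >= obs
    return [(e + 1) / (iterations + 1) for e in exceed]
```

Because the same resampled labels are applied to every peak within an iteration, the correlation structure among peaks is preserved, which is why a procedure of this kind does not need to assume independent tests.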

Hierarchical clustering was performed on the peak intensities of the spectra using a Spearman rank dissimilarity metric with average linkage.


  1. Liu XP, Shen J, Li ZF, Yan L, Gu J: A serum proteomic pattern for the detection of colorectal adenocarcinoma using surface enhanced laser desorption and ionization mass spectrometry. Cancer Invest 2006, 24: 747–753. 10.1080/07357900601063873

  2. Munro NP, Cairns DA, Clarke P, Rogers M, Stanley AJ, Barrett JH, Harnden P, Thompson D, Eardley I, Banks RE, Knowles MA: Urinary biomarker profiling in transitional cell carcinoma. Int J Cancer 2006, 119: 2642–2650. 10.1002/ijc.22238

  3. Oh JH, Gao J, Nandi A, Gurnani P, Knowles L, Schorge J: Diagnosis of early relapse in ovarian cancer using serum proteomic profiling. Genome Inform 2005, 16: 195–204.

  4. Lakhan SE: Schizophrenia proteomics: biomarkers on the path to laboratory medicine? Diagn Pathol 2006, 1: 11. 10.1186/1746-1596-1-11

  5. Novikova SI, He F, Cutrufello NJ, Lidow MS: Identification of protein biomarkers for schizophrenia and bipolar disorder in the postmortem prefrontal cortex using SELDI-TOF-MS ProteinChip profiling combined with MALDI-TOF-PSD-MS analysis. Neurobiol Dis 2006, 23: 61–76. 10.1016/j.nbd.2006.02.002

  6. Lewczuk P, Esselmann H, Groemer TW, Bibl M, Maler JM, Steinacker P, Otto M, Kornhuber J, Wiltfang J: Amyloid beta peptides in cerebrospinal fluid as profiled with surface enhanced laser desorption/ionization time-of-flight mass spectrometry: evidence of novel biomarkers in Alzheimer's disease. Biol Psychiatry 2004, 55: 524–530. 10.1016/j.biopsych.2003.10.014

  7. Sanchez JC, Guillaume E, Lescuyer P, Allard L, Carrette O, Scherl A, Burgess J, Corthals GL, Burkhard PR, Hochstrasser DF: Cystatin C as a potential cerebrospinal fluid marker for the diagnosis of Creutzfeldt-Jakob disease. Proteomics 2004, 4: 2229–2233. 10.1002/pmic.200300799

  8. Clarke CH, Buckley JA, Fung ET: SELDI-TOF-MS proteomics of breast cancer. Clin Chem Lab Med 2005, 43: 1314–1320. 10.1515/CCLM.2005.225

  9. Bons JA, Wodzig WK, van Dieijen-Visser MP: Protein profiling as a diagnostic tool in clinical chemistry: a review. Clin Chem Lab Med 2005, 43: 1281–1290. 10.1515/CCLM.2005.222

  10. White CN, Chan DW, Zhang Z: Bioinformatics strategies for proteomic profiling. Clin Biochem 2004, 37: 636–641. 10.1016/j.clinbiochem.2004.05.004

  11. Albrethsen J, Bogebo R, Olsen J, Raskov H, Gammeltoft S: Preanalytical and analytical variation of surface-enhanced laser desorption-ionization time-of-flight mass spectrometry of human serum. Clin Chem Lab Med 2006, 44: 1243–1252. 10.1515/CCLM.2006.228

  12. Banks RE, Stanley AJ, Cairns DA, Barrett JH, Clarke P, Thompson D, Selby PJ: Influences of Blood Sample Processing on Low-Molecular-Weight Proteome Identified by Surface-Enhanced Laser Desorption/Ionization Mass Spectrometry. Clin Chem 2005, 51: 1637–1649. 10.1373/clinchem.2005.051417

  13. Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD, Mehigh RJ, Cockrill SL, Scott GB, Tammen H, Schulz-Knappe P, Speicher DW, Vitzthum F, Haab BB, Siest G, Chan DW: HUPO Plasma Proteome Project specimen collection and handling: towards the standardization of parameters for plasma proteome samples. Proteomics 2005, 5: 3262–3277. 10.1002/pmic.200401245

  14. Traum AZ, Wells MP, Aivado M, Libermann TA, Ramoni MF, Schachter AD: SELDI-TOF MS of quadruplicate urine and serum samples to evaluate changes related to storage conditions. Proteomics 2006, 6: 1676–1680. 10.1002/pmic.200500174

  15. Cordingley HC, Roberts SL, Tooke P, Armitage JR, Lane PW, Wu W, Wildsmith SE: Multifactorial screening design and analysis of SELDI-TOF ProteinChip array optimization experiments. Biotechniques 2003, 34: 364–373.

  16. Guerreiro N, Gomez-Mancilla B, Charmont S: Optimization and evaluation of surface-enhanced laser-desorption/ionization time-of-flight mass spectrometry for protein profiling of cerebrospinal fluid. Proteome Sci 2006, 4: 7. 10.1186/1477-5956-4-7

  17. Jock CA, Paulauskis JD, Baker D, Olle E, Bleavins MR, Johnson KJ, Heard PL: Influence of matrix application timing on spectral reproducibility and quality in SELDI-TOF-MS. Biotechniques 2004, 37: 30–34.

  18. Semmes OJ, Feng Z, Adam BL, Banez LL, Bigbee WL, Campos D, Cazares LH, Chan DW, Grizzle WE, Izbicka E, Kagan J, Malik G, McLerran D, Moul JW, Partin A, Prasanna P, Rosenzweig J, Sokoll LJ, Srivastava S, Srivastava S, Thompson I, Welsh MJ, White N, Winget M, Yasui Y, Zhang Z, Zhu L: Evaluation of serum protein profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the detection of prostate cancer: I. Assessment of platform reproducibility. Clin Chem 2005, 51: 102–112. 10.1373/clinchem.2004.038950

  19. Bons JA, de BD, van Dieijen-Visser MP, Wodzig WK: Standardization of calibration and quality control using surface enhanced laser desorption ionization-time of flight-mass spectrometry. Clin Chim Acta 2006, 366: 249–256. 10.1016/j.cca.2005.10.019

  20. Aivado M, Spentzos D, Alterovitz G, Otu HH, Grall F, Giagounidis AA, Wells M, Cho JY, Germing U, Czibere A, Prall WC, Porter C, Ramoni MF, Libermann TA: Optimization and evaluation of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) with reversed-phase protein arrays for protein profiling. Clin Chem Lab Med 2005, 43: 133–140. 10.1515/CCLM.2005.022

  21. Wolski WE, Lalowski M, Jungblut P, Reinert K: Calibration of mass spectrometric peptide mass fingerprint data without specific external or internal calibrants. BMC Bioinformatics 2005, 6: 203. 10.1186/1471-2105-6-203

  22. Morris JS, Coombes KR, Koomen J, Baggerly KA, Kobayashi R: Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum. Bioinformatics 2005, 21: 1764–1775. 10.1093/bioinformatics/bti254

  23. Meleth S, Eltoum IE, Zhu L, Oelschager D, Piyathilake C, Chhieng D, Grizzle WE: Novel approaches to smoothing and comparing SELDI TOF spectra. Cancer Informatics 2005, 1: 78–85.

  24. Yasui Y, Pepe M, Thompson ML, Adam BL, Wright GL Jr., Qu Y, Potter JD, Winget M, Thornquist M, Feng Z: A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection. Biostatistics 2003, 4: 449–463. 10.1093/biostatistics/4.3.449

  25. Coombes KR, Tsavachidis S, Morris JS, Baggerly KA, Hung MC, Kuerer HM: Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics 2005, 5: 4107–4117. 10.1002/pmic.200401261

  26. Prados J, Kalousis A, Sanchez JC, Allard L, Carrette O, Hilario M: Mining mass spectra for diagnosis and biomarker discovery of cerebral accidents. Proteomics 2004, 4: 2320–2332. 10.1002/pmic.200400857

  27. Hong H, Dragan Y, Epstein J, Teitel C, Chen B, Xie Q, Fang H, Shi L, Perkins R, Tong W: Quality control and quality assessment of data from surface-enhanced laser desorption/ionization (SELDI) time-of flight (TOF) mass spectrometry (MS). BMC Bioinformatics 2005, 6 Suppl 2: S5. 10.1186/1471-2105-6-S2-S5

  28. Park T, Yi SG, Lee S, Lee JK: Diagnostic plots for detecting outlying slides in a cDNA microarray experiment. Biotechniques 2005, 38: 463–471.

  29. Su Y, Shen J, Qian H, Ma H, Ji J, Ma H, Ma L, Zhang W, Meng L, Li Z, Wu J, Jin G, Zhang J, Shou C: Diagnosis of gastric cancer using decision tree classification of mass spectral data. Cancer Sci 2007, 98: 37–43. 10.1111/j.1349-7006.2006.00339.x

  30. Rossi L, Martin BM, Hortin GL, White RL, Foster M, Moharram R, Stroncek D, Wang E, Marincola FM, Panelli MC: Inflammatory protein profile during systemic high dose interleukin-2 administration. Proteomics 2006, 6: 709–720. 10.1002/pmic.200500004

  31. Vernon SD, Whistler T, Aslakson E, Rajeevan M, Reeves WC: Challenges for molecular profiling of chronic fatigue syndrome. Pharmacogenomics 2006, 7: 211–218. 10.2217/14622416.7.2.211

  32. Rollin D, Whistler T, Vernon SD: Laboratory methods to improve SELDI peak detection and quantitation. Proteome Sci 2007, 5:

  33. Wong JWH, Cagney G, Cartwright HM: SpecAlign--processing and alignment of mass spectra datasets. Bioinformatics 2005, 21: 2088–2090. 10.1093/bioinformatics/bti300



We thank Dr Elizabeth Unger for her critical reading of the manuscript and her invaluable comments.

Author information

Corresponding author

Correspondence to Toni Whistler.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

TW designed the experiments, developed the analytical approach, implemented the analysis and wrote the manuscript. DDR performed the laboratory experiments. SDV had the original idea for the study and assisted in the writing of the manuscript. All authors read and approved the manuscript.

Electronic supplementary material


Additional file 1: Summary data showing stages in the quality assessment of specimen spectra. Pearson correlation coefficients were calculated for the entire spectrum before peak detection; the coefficient for the entire data set is reported (Grand statistic). The coefficient of variation (CV) was calculated for the peak intensities present in the entire spectrum. (PDF 10 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Whistler, T., Rollin, D. & Vernon, S.D. A method for improving SELDI-TOF mass spectrometry data quality. Proteome Sci 5, 14 (2007).
