A method for improving SELDI-TOF mass spectrometry data quality
© Whistler et al; licensee BioMed Central Ltd. 2007
Received: 25 June 2007
Accepted: 05 September 2007
Published: 05 September 2007
Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) is a powerful tool for rapidly generating high-throughput protein profiles from a large number of samples. However, the events that occur between the first and last sample run are likely to introduce technical variation in the results.
We fractionated and analyzed quality control and investigational serum samples on 3 Protein Chips and used statistical methods to identify poor-quality spectra and to identify and reduce technical variation.
Using diagnostic plots, we were able to visually depict all spectra and to identify and remove those that were of poor quality. We detected a technical variation associated with when the samples were run (referred to as batch effect) and corrected for this variation using analysis of variance. These corrections increased the number of peaks that were reproducibly detected.
By removing poor-quality, outlier spectra, we were able to increase peak detection, and by reducing the variance introduced when samples are processed and analyzed in batches, we were able to increase the reproducibility of peak detection.
Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) allows users to generate protein expression data rapidly from a large number of samples and has been used increasingly to identify diagnostic biomarkers of cancer [1–3], mental illness [4, 5], and neurological disorders [6, 7]. However, as with any analytic technique, its results must be reproducible if one is to have confidence in them.
Several challenges to implementing SELDI-TOF MS in routine clinical diagnostics have already been overcome [8–10]. These include challenges pertaining to biologic samples such as the characterization of sample donors (e.g., by age, sex, fasting status, diurnal rhythm); sample collection and handling [12, 13]; and the effects of freezing, thawing, and storage on specimen stability. Parameters of the SELDI-TOF MS technique that have been assessed range from its sample-processing and robotic-handling systems to its application of the energy-absorbing matrix [15–17]. Finally, many aspects of the technique designed to improve the calibration and quality of the spectra [10, 18–21] and of peak detection and quantification [22–24] have made SELDI-TOF MS one of the most promising protein biomarker discovery methods.
Even though a variety of software packages can be used to analyze SELDI-TOF MS data, few are effective in averaging replicate spectra or identifying poor-quality spectra [25, 26], and none are capable of analyzing and adjusting for the variation introduced when samples are processed and analyzed in batches. We demonstrate that conventional statistical approaches can be used to identify outlying spectra and correct for batch variation, as well as to increase the number of peaks detected by SELDI-TOF MS and improve the reproducibility of peak detection.
We used a 2-way analysis of variance (ANOVA) model to explore batch effect variation in the QC sample and a 3-way ANOVA model to explore batch effect variation in the investigational samples. The batch in which spectra were processed was the largest source of variation in both the QC and the investigational samples (Figure 2). The F ratio (or signal-to-noise ratio) ranged from 4 to 14 for the QC sample, a narrower range than the 2 to 34 observed for the investigational samples. The CM10 low-stringency (CMLS) fractions 1 to 4 (F1-F4) and the IMAC F1 and F3 ProteinChips showed the lowest batch variance, with QC and investigational samples having similar F ratios (Figure 2).
Table 1. Summary data showing stages in the quality assessment of QC sample spectra.
Column headings: QC Data Set; Pearson Correlation Coefficient (Grand Mean ± Std Dev); Coefficient of Variation (%); % Peaks CV < 30%.
Pearson correlation coefficient values (grand mean ± std dev):
0.834 ± 0.225; 0.829 ± 0.135; 0.946 ± 0.038; 0.851 ± 0.142; 0.918 ± 0.110; 0.879 ± 0.146; 0.904 ± 0.173; 0.948 ± 0.065; 0.844 ± 0.189; 0.891 ± 0.084; 0.790 ± 0.256; 0.939 ± 0.038; 0.884 ± 0.112; 0.911 ± 0.080; 0.728 ± 0.265
0.940 ± 0.037; 0.878 ± 0.085; 0.946 ± 0.038; 0.911 ± 0.059; 0.941 ± 0.049; 0.910 ± 0.060; 0.935 ± 0.055; 0.961 ± 0.037; 0.922 ± 0.055; 0.905 ± 0.068; 0.922 ± 0.057; 0.939 ± 0.038; 0.925 ± 0.045; 0.921 ± 0.061; 0.929 ± 0.041
0.980 ± 0.018; 0.971 ± 0.027; 0.975 ± 0.026; 0.961 ± 0.052; 0.972 ± 0.039; 0.979 ± 0.018; 0.975 ± 0.024; 0.990 ± 0.012; 0.982 ± 0.012; 0.977 ± 0.019; 0.971 ± 0.028; 0.976 ± 0.197; 0.968 ± 0.045; 0.945 ± 0.055; 0.989 ± 0.009
To measure reproducibility, we calculated the coefficient of variation (CV) for the peak intensities of all spectra in each QC sample data set (Table 1). Similar data are available for the investigational samples [see Additional file 1]. The removal of low-quality spectra generally increased the number of peaks common to all spectra in that data set and reduced the average CV for the full spectrum (Table 1B). Batch removal produced a more dramatic effect (Table 1C): the number of peaks remained the same, but the average CV improved, as did the number of peaks in each data set with a CV < 30% (Table 1). For example, the CMLS_F5 QC serum data set started with 66 spectra with 54 peaks present; the average CV for specimens in this set was 70% (range: 11–274%, Table 1A). Using the diagnostic plot criteria, we removed two spectra, thereby reducing the CV range to 10–48% and the average CV to 24% (Table 1B). Removing the batch effect technical variance further reduced the CV range to 6–31% and the average CV to 13% (Table 1C). We obtained similar results with the specimen data sets [see Additional file 1]. For all data sets, the CVs for m/z values were within the 0.3% reported in the literature.
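The CV summary described above can be illustrated with a minimal sketch. It assumes peak intensities are already aligned so that each spectrum lists the same peaks in the same order; the function names and the example intensities are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def peak_cvs(intensities):
    """Return the percent CV of each peak across spectra.

    `intensities` is a list of spectra; each spectrum is a list of peak
    intensities given in the same peak order.
    """
    cvs = []
    for peak_values in zip(*intensities):  # iterate peak by peak
        cvs.append(100.0 * stdev(peak_values) / mean(peak_values))
    return cvs


def summarize(cvs, limit=30.0):
    """Return the average CV and the percentage of peaks with CV < limit."""
    below = sum(1 for cv in cvs if cv < limit)
    return mean(cvs), 100.0 * below / len(cvs)


# Made-up intensities: 4 replicate spectra x 3 peaks.
spectra = [
    [10.0, 50.0, 5.0],
    [11.0, 48.0, 9.0],
    [9.0, 52.0, 2.0],
    [10.0, 50.0, 8.0],
]
cvs = peak_cvs(spectra)
avg_cv, pct_below_30 = summarize(cvs)
print([round(cv, 1) for cv in cvs], round(avg_cv, 1), round(pct_below_30, 1))
```

In this toy example the third peak is noisy, so only two of the three peaks fall under the 30% limit used in Table 1.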
Even though SELDI-TOF MS is designed as a high-throughput automated assay, large studies involving many biological samples are often divided into batches that are analyzed over several days to weeks. To detect any variability that may occur, analysts process pooled human serum (QC samples) with the study samples. In this study, we used an ANOVA model to assess technical variance in peak intensities that could be introduced by differences among sample batches, variations in the spot position of each sample on the ProteinChip, and variations in the ProteinChip array. We found that batch differences accounted for the largest source of technical variability in each data set, with variations in spot position and ProteinChip array contributing little. Therefore, any analysis that ignores the variation associated with processing samples in different batches leaves a considerable amount of noise in the data. The balanced design of the experiments we conducted allowed us to reliably estimate the batch effect and then to remove that effect using the Partek Batch Remover (based on a mixed-model ANOVA). As only technical factors were included in the ANOVA model, the peak intensity data can then be used in further statistical analyses.
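To make the idea of removing a batch effect concrete, the following sketch centers one peak's intensities within each batch and restores the grand mean. This is a deliberately simplified fixed-effect adjustment, not the mixed-model ANOVA used by the Partek Batch Remover; the function name and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def remove_batch_means(values, batches):
    """Center one peak's intensities within each batch.

    Subtracts the batch mean from every value and adds back the grand
    mean, so the overall level is preserved while between-batch shifts
    are removed. Assumes every batch holds a comparable mix of samples.
    """
    grand = mean(values)
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    return [v - batch_mean[b] + grand for v, b in zip(values, batches)]


# Made-up intensities for one peak measured in two batches.
values = [10.0, 12.0, 20.0, 22.0]
batches = ["b1", "b1", "b2", "b2"]
adjusted = remove_batch_means(values, batches)
print(adjusted)
```

A centering of this kind is only reasonable when the design is balanced, as it was here; if a biological group were confined to a single batch, the adjustment would remove the biological signal along with the technical one.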
Hong et al. identified the correlation matrix as an effective metric for identifying lower quality spectra. However, we found that this approach was less effective when used to establish one cut-off value for several large data sets. In an attempt to automate our decisions on which spectra should be included in our analysis, we drew on our knowledge of microarrays and presented our data in diagnostic plots (Figure 1). The results we obtained using diagnostic plots to assess poor spectrum quality compared favorably with assessments based on visual inspection and normalization factors >2 standard deviations from the mean (as recommended by Ciphergen Biosystems, Fremont, CA, USA). Our use of statistical measures to assess spectrum quality allowed us to automatically remove even more poor-quality spectra. Other measures could be considered in determining data quality, for example peak resolution [18, 19]; however, the software packages used in this analysis did not determine this parameter.
QC data sets represent the same pooled serum sample run with each batch of investigational sera. They therefore directly evaluate the repeatability of measurements, and because repeatability is expected to be much higher in the QC data sets, we used a more stringent cut-off value (1-mean > 0.2) for them than for the investigational data sets. Good performance should be associated with low coefficients of variation for the peak intensities, as the data are all derived from the same pooled reference serum. Table 1 illustrates the improvements in data quality and reproducibility resulting from the removal of outlying, poor-quality spectra and the removal of the technical batch effect. The average CVs for all data sets (except H50-F1) were ≤ 20% when all peaks were considered rather than just 3 to 7 major peaks as reported in some studies [29, 30]; furthermore, more than 90% of all peaks in each data set, other than H50-F1, had peak intensity CVs < 30%.
In this study, we used a diagnostic plot to detect and discard low-quality spectra. This method was easy to implement and effective in detecting outlier spectra. Our use of the model-based ANOVA to account for the technical variance introduced by batch processing of spectra further improved the data quality.
A reference or QC sample was prepared by pooling serum collected from 10 donors in Vacutainer tubes with no additives. This pool was processed, aliquoted, and frozen in the same manner as the study subject samples. Serum samples from 207 subjects (referred to as investigational sera) were collected during a clinical study of Chronic Fatigue Syndrome in Wichita, Kansas.
All of the experimental protocols were performed by a single laboratorian.
To reduce sample complexity and increase the number of protein peaks detected, we performed anion exchange fractionation using the Expression Difference Mapping™ Kit – Serum Fractionation (Ciphergen Biosystems Inc., Fremont, CA, USA) and the robotic Biomek 2000 liquid handling system (Beckman Coulter, Fullerton, CA, USA). We collected six different fractions, pH 9 (F1), pH 7 (F2), pH 5 (F3), pH 4 (F4), pH 3 (F5), and organic (F6), from investigational serum samples that were fractionated in 11 batches over a period of 7 months. Twenty investigational samples and 3 QC samples were processed in each batch and then frozen at -80°C. For each batch, we analyzed fractions in the same order and kept freezing times (2 to 11 days) and processing conditions constant.
Protein expression profiling
Aliquots of each fraction were bound in duplicate, with a randomized ProteinChip/spot position allocation scheme, to 3 different types of ProteinChip arrays: IMAC-Cu (metal binding), H50 (hydrophobic chemistry), and CM10 (anionic chemistry). Two CM10 arrays were used: one with a high-stringency (HS) wash using 50 mM HEPES, pH 7, performed before sample application to allow selective binding of proteins, and one with a low-stringency (LS) wash using 0.1 M sodium acetate, pH 4, also performed before sample application. From previous studies, we know that F2 is not particularly informative and that F5 has many overlapping peaks present in F4 and/or F6. Therefore, we did not run these fractions in this study.
For each ProteinChip array, the relevant QC fraction was present on one spot position. The details of ProteinChip processing have been described previously. We used saturated sinapinic acid in 50% acetonitrile/0.5% trifluoroacetic acid as matrix and applied it using the robot. We read the ProteinChips in a PBSIIc mass spectrometer (Ciphergen Biosystems) using automated data collection protocols with previously optimized conditions. We used data from the low mass range protocols (3,000 to 30,000 Daltons) in our analysis and calibrated for mass accuracy using the "all-in-one" protein standard II on NP20 ProteinChips (Ciphergen Biosystems). If greater accuracy is required at m/z < 8,000, the "all-in-one" peptide standard should be used, applied with the sinapinic acid matrix to keep data comparable.
Instrument performance and evaluation are critical to spectrometer function; complete details of the calibration, alignment, and accuracy assessments that we perform routinely are outlined in a previous publication.
Using data from 15 fractions, we generated 414 investigational spectra and 66 QC spectra per ProteinChip-fraction, each of which we considered a data set. We had to make an unanticipated instrument adjustment, which involved a preventative maintenance service, between the sixth and seventh batches because of laboratory relocation.
Processing of spectral data sets
We used the QC serum sample to develop and evaluate data processing procedures, which we then used in processing data for the 207 investigational samples.
We exported raw spectrum data files for each ProteinChip-fraction and processed them using the following steps:
1. Convert the raw data to m/z values using the calibration equation $m/z = U\left(a(t - t_0)^2 + b\right)$.
2. Smooth the data using the Savitzky-Golay filter with a setting of 8.
3. Denoise the spectra using a wavelet transform with a threshold setting of 0.5.
4. Apply baseline subtraction using a window setting of 5.
5. Rescale intensity values to positive.
6. Normalize intensity values using Total Ion Current.
7. Generate an average spectrum.
8. Align spectra using the combined Fast Fourier Transform (FFT)/Peak matching method on the full m/z range, with a scale of 1, a maximum shift of 20, looking ahead by 1, and using the average as a reference.
9. Export the processed data as a single file (to be used for correlation analysis).
10. Pick peaks with a baseline cut-off of 0.5, a window of 10, and a height ratio of 1.5.
11. Export peak intensity values for all spectra in a single file.
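Two of the processing steps listed above, the calibration equation and Total Ion Current normalization, are simple enough to sketch directly. The sketch below assumes that normalization means dividing each intensity by the spectrum's summed intensity, which is one common form and may differ from the scaling applied by the software used in the study. The function names and all numeric constants are hypothetical.

```python
def tof_to_mz(t, U, a, b, t0):
    """Quadratic calibration from the text: m/z = U * (a * (t - t0)**2 + b)."""
    return U * (a * (t - t0) ** 2 + b)


def normalize_tic(intensities):
    """Scale a spectrum so that its intensities sum to 1.

    Assumes intensities have already been rescaled to be non-negative.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total ion current must be positive")
    return [x / total for x in intensities]


# Illustrative constants only; real values come from instrument calibration.
mz = tof_to_mz(t=3.0, U=2.0, a=1.0, b=0.5, t0=1.0)
normalized = normalize_tic([1.0, 1.0, 2.0])
print(mz, normalized)
```

Normalizing to the total ion current makes spectra with different overall signal levels comparable before they are averaged and aligned.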
We performed all statistical analysis using Partek Genomics Suite software, version 6.2 (Partek Inc., St. Charles, Missouri).
To detect outlier spectra, we used full spectrum processed data consisting of 11,876 intensity values covering the m/z range from 3,000 to 30,000 (the data file exported in step 9 above). We generated a similarity matrix using the Pearson correlation coefficient on all combinations of spectra within the data set. We then calculated a mean correlation coefficient for each spectrum and visually depicted the coefficients on diagnostic plots. Our cut-off criteria, expressed as 1 minus the mean correlation coefficient (1-mean), were > 0.2 for the removal of QC spectra and > 0.4 for the removal of spectra from investigational samples.
After the quality assessment of the spectra and prior to the batch removal process, we used a 2-way ANOVA model to determine the variation in the data sets. We evaluated the variation in the QC data sets attributable to the batch process (Batch, Figure 2) and to the position on the ProteinChips (Spot, Figure 2). For the larger investigational data sets, we used a 3-way model that also incorporated ProteinChip array (Array, determined by Lot Number) as a factor (nested in Batch). The Partek Batch Remover that we used employs a mixed-model ANOVA of the technical factors to identify and remove these sources of variation. The variation is reported as the average F ratio, a measure of the average signal-to-noise ratio of all the computed variables for each factor. Each component (Batch, Spot, or Array) is compared to the error measurement, which is normalized to unity for reference.
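The outlier criterion described above can be sketched as follows. The text does not say whether each spectrum's self-correlation enters its mean, so this sketch assumes it is excluded; the function names and the example spectra are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_outliers(spectra, cutoff=0.2):
    """Return indices of spectra whose 1 - mean correlation exceeds cutoff.

    The mean is taken over correlations with every other spectrum in the
    data set (self-correlation excluded, which is an assumption).
    """
    flagged = []
    for i, spectrum in enumerate(spectra):
        rs = [pearson(spectrum, other)
              for j, other in enumerate(spectra) if j != i]
        if 1.0 - sum(rs) / len(rs) > cutoff:
            flagged.append(i)
    return flagged


# Five made-up spectra with the same shape and one with a different shape.
spectra = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [2.0, 3.0, 4.0, 5.0, 6.0],
    [3.0, 5.0, 7.0, 9.0, 11.0],
    [0.5, 1.0, 1.5, 2.0, 2.5],
    [3.0, 1.0, 4.0, 2.0, 5.0],
]
outliers = flag_outliers(spectra, cutoff=0.2)
print(outliers)
```

Passing `cutoff=0.4` would correspond to the more lenient criterion applied to the investigational samples.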
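As an illustration of how an F ratio compares a technical factor with the error term, the sketch below computes main-effect F ratios for a single peak in a balanced layout with one observation per Batch and Spot combination. It uses an additive fixed-effects model with no interaction or nesting, so it is a simplification of the models described above, and the function name and intensities are hypothetical.

```python
def two_way_f(table):
    """F ratios for the row and column factors of a balanced table.

    `table[i][j]` holds one observation for row level i (for example a
    batch) and column level j (for example a spot). The model is additive,
    so the residual serves as the error term.
    """
    r = len(table)
    c = len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )
    ms_err = ss_err / ((r - 1) * (c - 1))
    f_rows = (ss_rows / (r - 1)) / ms_err
    f_cols = (ss_cols / (c - 1)) / ms_err
    return f_rows, f_cols


# Made-up intensities for one peak: rows are batches, columns are spots.
table = [
    [10.0, 11.0, 10.0],
    [14.0, 15.0, 15.0],
    [20.0, 19.0, 21.0],
]
f_batch, f_spot = two_way_f(table)
print(round(f_batch, 2), round(f_spot, 2))
```

Averaging such ratios over all peaks would give a per-factor summary comparable in spirit to the average F ratio reported in Figure 2.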
We averaged all spectra with replicates and performed all statistical analyses using nonparametric tests: the Mann-Whitney test to compare 2 groups and the Kruskal-Wallis test to compare more than 2 groups. A bootstrap method was used to perform multiple test correction in the statistical tests. The bootstrap is used to determine the probability of obtaining a particular p-value by chance. Group labels are randomly re-assigned (with replacement) for a total of 2,000 iterations of the bootstrap. The bootstrap method does not assume that tests are independent.
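The label-resampling idea can be sketched for a single peak as shown below. This is only an illustration of how reassigning group labels estimates the chance of a result at least as extreme as the observed one; it does not reproduce the multiple test correction performed by the software, whose details are not given in the text. The function names, the normalized U statistic, and the example values are assumptions.

```python
import random


def mann_whitney_u(x, y):
    """U statistic for x versus y: each pair with x > y counts 1, ties 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def effect(x, y):
    """Distance of the normalized U statistic from its null value of 0.5."""
    return abs(mann_whitney_u(x, y) / (len(x) * len(y)) - 0.5)


def resampled_p(x, y, iterations=2000, seed=0):
    """Estimate how often resampled labels give an effect >= the observed one."""
    rng = random.Random(seed)
    values = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    observed = effect(x, y)
    hits = 0
    used = 0
    for _ in range(iterations):
        # Draw labels with replacement, as described in the text.
        new_labels = [rng.choice(labels) for _ in values]
        gx = [v for v, lab in zip(values, new_labels) if lab == 0]
        gy = [v for v, lab in zip(values, new_labels) if lab == 1]
        if not gx or not gy:
            continue  # skip draws that leave one group empty
        used += 1
        if effect(gx, gy) >= observed:
            hits += 1
    return (hits + 1) / (used + 1)


# Made-up intensities for one peak in two groups.
p_separated = resampled_p([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12])
p_identical = resampled_p([1, 2, 3], [1, 2, 3])
print(p_separated, p_identical)
```

Because the labels are drawn with replacement, group sizes vary between iterations, which is why the statistic is normalized by the number of pairs before comparison.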
Hierarchical clustering was performed on the peak intensities of the spectra using a Spearman rank dissimilarity metric with average linkage.
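A small, naive sketch of this kind of clustering is given below. It assumes the Spearman rank dissimilarity is defined as 1 minus the Spearman correlation, which is a common convention but is not stated in the text, and it does not handle constant profiles. The function names and the example profiles are hypothetical.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_dissimilarity(x, y):
    """1 - Spearman correlation (assumed definition of the dissimilarity)."""
    return 1.0 - pearson(ranks(x), ranks(y))


def average_linkage(items):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    n = len(items)
    d = [[spearman_dissimilarity(items[i], items[j]) for j in range(n)]
         for i in range(n)]
    clusters = [[i] for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                dist = sum(d[i][j] for i, j in pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Made-up peak intensity profiles: two rising and two falling.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 9.0],
    [4.0, 3.0, 2.0, 1.0],
    [9.0, 5.0, 3.0, 2.0],
]
merges = average_linkage(profiles)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 3))
```

Because the metric is rank-based, profiles that rise or fall together cluster at zero distance regardless of their absolute intensities.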
We thank Dr Elizabeth Unger for her critical reading of the manuscript and her invaluable comments.
- Liu XP, Shen J, Li ZF, Yan L, Gu J: A serum proteomic pattern for the detection of colorectal adenocarcinoma using surface enhanced laser desorption and ionization mass spectrometry. Cancer Invest 2006, 24: 747–753. doi:10.1080/07357900601063873
- Munro NP, Cairns DA, Clarke P, Rogers M, Stanley AJ, Barrett JH, Harnden P, Thompson D, Eardley I, Banks RE, Knowles MA: Urinary biomarker profiling in transitional cell carcinoma. Int J Cancer 2006, 119: 2642–2650. doi:10.1002/ijc.22238
- Oh JH, Gao J, Nandi A, Gurnani P, Knowles L, Schorge J: Diagnosis of early relapse in ovarian cancer using serum proteomic profiling. Genome Inform 2005, 16: 195–204.
- Lakhan SE: Schizophrenia proteomics: biomarkers on the path to laboratory medicine? Diagn Pathol 2006, 1: 11. doi:10.1186/1746-1596-1-11
- Novikova SI, He F, Cutrufello NJ, Lidow MS: Identification of protein biomarkers for schizophrenia and bipolar disorder in the postmortem prefrontal cortex using SELDI-TOF-MS ProteinChip profiling combined with MALDI-TOF-PSD-MS analysis. Neurobiol Dis 2006, 23: 61–76. doi:10.1016/j.nbd.2006.02.002
- Lewczuk P, Esselmann H, Groemer TW, Bibl M, Maler JM, Steinacker P, Otto M, Kornhuber J, Wiltfang J: Amyloid beta peptides in cerebrospinal fluid as profiled with surface enhanced laser desorption/ionization time-of-flight mass spectrometry: evidence of novel biomarkers in Alzheimer's disease. Biol Psychiatry 2004, 55: 524–530. doi:10.1016/j.biopsych.2003.10.014
- Sanchez JC, Guillaume E, Lescuyer P, Allard L, Carrette O, Scherl A, Burgess J, Corthals GL, Burkhard PR, Hochstrasser DF: Cystatin C as a potential cerebrospinal fluid marker for the diagnosis of Creutzfeldt-Jakob disease. Proteomics 2004, 4: 2229–2233. doi:10.1002/pmic.200300799
- Clarke CH, Buckley JA, Fung ET: SELDI-TOF-MS proteomics of breast cancer. Clin Chem Lab Med 2005, 43: 1314–1320. doi:10.1515/CCLM.2005.225
- Bons JA, Wodzig WK, van Dieijen-Visser MP: Protein profiling as a diagnostic tool in clinical chemistry: a review. Clin Chem Lab Med 2005, 43: 1281–1290. doi:10.1515/CCLM.2005.222
- White CN, Chan DW, Zhang Z: Bioinformatics strategies for proteomic profiling. Clin Biochem 2004, 37: 636–641. doi:10.1016/j.clinbiochem.2004.05.004
- Albrethsen J, Bogebo R, Olsen J, Raskov H, Gammeltoft S: Preanalytical and analytical variation of surface-enhanced laser desorption-ionization time-of-flight mass spectrometry of human serum. Clin Chem Lab Med 2006, 44: 1243–1252. doi:10.1515/CCLM.2006.228
- Banks RE, Stanley AJ, Cairns DA, Barrett JH, Clarke P, Thompson D, Selby PJ: Influences of Blood Sample Processing on Low-Molecular-Weight Proteome Identified by Surface-Enhanced Laser Desorption/Ionization Mass Spectrometry. Clin Chem 2005, 51: 1637–1649. doi:10.1373/clinchem.2005.051417
- Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD, Mehigh RJ, Cockrill SL, Scott GB, Tammen H, Schulz-Knappe P, Speicher DW, Vitzthum F, Haab BB, Siest G, Chan DW: HUPO Plasma Proteome Project specimen collection and handling: towards the standardization of parameters for plasma proteome samples. Proteomics 2005, 5: 3262–3277. doi:10.1002/pmic.200401245
- Traum AZ, Wells MP, Aivado M, Libermann TA, Ramoni MF, Schachter AD: SELDI-TOF MS of quadruplicate urine and serum samples to evaluate changes related to storage conditions. Proteomics 2006, 6: 1676–1680. doi:10.1002/pmic.200500174
- Cordingley HC, Roberts SL, Tooke P, Armitage JR, Lane PW, Wu W, Wildsmith SE: Multifactorial screening design and analysis of SELDI-TOF ProteinChip array optimization experiments. Biotechniques 2003, 34: 364–373.
- Guerreiro N, Gomez-Mancilla B, Charmont S: Optimization and evaluation of surface-enhanced laser-desorption/ionization time-of-flight mass spectrometry for protein profiling of cerebrospinal fluid. Proteome Sci 2006, 4: 7. doi:10.1186/1477-5956-4-7
- Jock CA, Paulauskis JD, Baker D, Olle E, Bleavins MR, Johnson KJ, Heard PL: Influence of matrix application timing on spectral reproducibility and quality in SELDI-TOF-MS. Biotechniques 2004, 37: 30–34.
- Semmes OJ, Feng Z, Adam BL, Banez LL, Bigbee WL, Campos D, Cazares LH, Chan DW, Grizzle WE, Izbicka E, Kagan J, Malik G, McLerran D, Moul JW, Partin A, Prasanna P, Rosenzweig J, Sokoll LJ, Srivastava S, Srivastava S, Thompson I, Welsh MJ, White N, Winget M, Yasui Y, Zhang Z, Zhu L: Evaluation of serum protein profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the detection of prostate cancer: I. Assessment of platform reproducibility. Clin Chem 2005, 51: 102–112. doi:10.1373/clinchem.2004.038950
- Bons JA, de BD, van Dieijen-Visser MP, Wodzig WK: Standardization of calibration and quality control using surface enhanced laser desorption ionization-time of flight-mass spectrometry. Clin Chim Acta 2006, 366: 249–256. doi:10.1016/j.cca.2005.10.019
- Aivado M, Spentzos D, Alterovitz G, Otu HH, Grall F, Giagounidis AA, Wells M, Cho JY, Germing U, Czibere A, Prall WC, Porter C, Ramoni MF, Libermann TA: Optimization and evaluation of surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) with reversed-phase protein arrays for protein profiling. Clin Chem Lab Med 2005, 43: 133–140. doi:10.1515/CCLM.2005.022
- Wolski WE, Lalowski M, Jungblut P, Reinert K: Calibration of mass spectrometric peptide mass fingerprint data without specific external or internal calibrants. BMC Bioinformatics 2005, 6: 203. doi:10.1186/1471-2105-6-203
- Morris JS, Coombes KR, Koomen J, Baggerly KA, Kobayashi R: Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum. Bioinformatics 2005, 21: 1764–1775. doi:10.1093/bioinformatics/bti254
- Meleth S, Eltoum IE, Zhu L, Oelschager D, Piyathilake C, Chhieng D, Grizzle WE: Novel approaches to smoothing and comparing SELDI TOF spectra. Cancer Informatics 2005, 1: 78–85.
- Yasui Y, Pepe M, Thompson ML, Adam BL, Wright GL Jr., Qu Y, Potter JD, Winget M, Thornquist M, Feng Z: A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection. Biostatistics 2003, 4: 449–463. doi:10.1093/biostatistics/4.3.449
- Coombes KR, Tsavachidis S, Morris JS, Baggerly KA, Hung MC, Kuerer HM: Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics 2005, 5: 4107–4117. doi:10.1002/pmic.200401261
- Prados J, Kalousis A, Sanchez JC, Allard L, Carrette O, Hilario M: Mining mass spectra for diagnosis and biomarker discovery of cerebral accidents. Proteomics 2004, 4: 2320–2332. doi:10.1002/pmic.200400857
- Hong H, Dragan Y, Epstein J, Teitel C, Chen B, Xie Q, Fang H, Shi L, Perkins R, Tong W: Quality control and quality assessment of data from surface-enhanced laser desorption/ionization (SELDI) time-of flight (TOF) mass spectrometry (MS). BMC Bioinformatics 2005, 6 Suppl 2: S5. doi:10.1186/1471-2105-6-S2-S5
- Park T, Yi SG, Lee S, Lee JK: Diagnostic plots for detecting outlying slides in a cDNA microarray experiment. Biotechniques 2005, 38: 463–471.
- Su Y, Shen J, Qian H, Ma H, Ji J, Ma H, Ma L, Zhang W, Meng L, Li Z, Wu J, Jin G, Zhang J, Shou C: Diagnosis of gastric cancer using decision tree classification of mass spectral data. Cancer Sci 2007, 98: 37–43. doi:10.1111/j.1349-7006.2006.00339.x
- Rossi L, Martin BM, Hortin GL, White RL, Foster M, Moharram R, Stroncek D, Wang E, Marincola FM, Panelli MC: Inflammatory protein profile during systemic high dose interleukin-2 administration. Proteomics 2006, 6: 709–720. doi:10.1002/pmic.200500004
- Vernon SD, Whistler T, Aslakson E, Rajeevan M, Reeves WC: Challenges for molecular profiling of chronic fatigue syndrome. Pharmacogenomics 2006, 7: 211–218. 10.2217/146224184.108.40.206
- Rollin D, Whistler T, Vernon SD: Laboratory methods to improve SELDI peak detection and quantitation. Proteome Sci 2007, 5.
- Wong JWH, Cagney G, Cartwright HM: SpecAlign--processing and alignment of mass spectra datasets. Bioinformatics 2005, 21: 2088–2090. doi:10.1093/bioinformatics/bti300
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.