Comparison of two label-free global quantitation methods, APEX and 2D gel electrophoresis, applied to the Shigella dysenteriae proteome
© Kuntumalla et al; licensee BioMed Central Ltd. 2009
Received: 10 April 2009
Accepted: 29 June 2009
Published: 29 June 2009
The in vitro stationary phase proteome of the human pathogen Shigella dysenteriae serotype 1 (SD1) was quantitatively analyzed in Coomassie Blue G250 (CBB)-stained 2D gels. More than four hundred and fifty proteins, of which 271 were associated with distinct gel spots, were identified. In parallel, we employed 2D-LC-MS/MS followed by the label-free computationally modified spectral counting method APEX for absolute protein expression measurements. Of the 4502 genome-predicted SD1 proteins, 1148 proteins were identified with a false positive discovery rate of 5% and quantitated using 2D-LC-MS/MS and APEX. The dynamic range of the APEX method was approximately one order of magnitude higher than that of CBB-stained spot intensity quantitation. A squared Pearson correlation analysis revealed a reasonably good correlation (R² = 0.67) for protein quantities surveyed by both methods. The correlation was decreased for protein subsets with specific physicochemical properties, such as low Mr values and high hydropathy scores. Stoichiometric ratios of subunits of protein complexes characterized in E. coli were compared with APEX quantitative ratios of orthologous SD1 protein complexes. A high correlation was observed for subunits of soluble cellular protein complexes in several cases, demonstrating versatile applications of the APEX method in quantitative proteomics.
Until recently, quantitative proteomics studies have mainly relied on two-dimensional (2D) gel electrophoresis combined with protein identification by mass spectrometry (MS) to analyze large datasets of proteins from complex protein mixtures [1, 2]. Quantitation of relative protein abundances from 2D gels has involved the comparison of protein spot intensities across two or more sample groups . Limited dynamic range caused by low detection sensitivity, the saturation of protein staining, and insufficient spot resolution from overlapping and co-migrating protein spots have confounded the accuracy and depth of protein quantitation in 2D gels [4, 5]. In addition, proteins with certain physicochemical traits are difficult to analyze in 2D gels, including those with a basic pI value, a high or low Mr value, and transmembrane domains. Alternative protein quantitation strategies based on shotgun proteomics have evolved to address some of these limitations [6, 7], including peptide or protein labeling [8, 9], and label-free strategies .
Label-free approaches have included measurements of mass spectral peak intensities  and spectral counting . While peak intensities of peptide ions can be correlated with protein abundances, spectral counting methods estimate protein abundances by comparing the number of MS/MS spectra assigned to each protein, based on the assumption that the number of peptides observed from a protein correlates with its abundance . Spectral counting provides the advantage of measuring both relative  and absolute abundances of different proteins in complex samples . To account for the fact that larger proteins contribute more peptides compared to smaller proteins, spectral counting data is normalized to avoid abundance over-estimation of high Mr proteins [13, 15]. However, since the ionization efficiency of peptides and their subsequent observation in the mass spectrometer depend on a variety of factors including their physicochemical properties, peptide composition and local chemical environment , spectral counting based solely on the number of experimentally observed, proteotypic peptides is often not an accurate measure of protein abundance [16, 17].
To address this, the APEX methodology, a label-free quantitation method for absolute protein expression measurements, was developed by the Marcotte group [14, 18]. The APEX quantitation method correlates spectral counts obtained from mass spectrometric data with computational predictions of proteotypic peptides for each protein to estimate protein abundance from the fraction of observed peptide mass spectra. For proteotypic peptide prediction, machine learning classification algorithms are applied to a training dataset composed of peptides from a limited set of abundant proteins to build a classification model for the prediction of proteotypic peptides generated in silico from the entire proteome. Prior expectation of observing these peptides and the confidence in protein identification serve as correction factors in APEX quantitation. APEX thereby estimates absolute protein concentration as the proportionality between the abundance of a protein and the number of its proteotypic peptides versus that of the total protein concentration and all proteotypic peptides.
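The proportionality described above can be sketched in a few lines of Python. This is a minimal illustration, not the APEX tool's code: it assumes the commonly cited form of the APEX equation, in which the spectral count n_i of protein i is weighted by its identification probability p_i, divided by its expected number of proteotypic peptides O_i, normalized over all proteins, and scaled by the total protein concentration C. The function name `apex_abundances` and the three example proteins are hypothetical.

```python
def apex_abundances(proteins, total_c):
    """Estimate abundances from (n_i, p_i, O_i) tuples.

    proteins: dict mapping a protein name to (spectral count n_i,
              identification probability p_i, expected proteotypic
              peptide count O_i).
    total_c:  total protein concentration C (e.g. molecules per cell).
    """
    # Corrected spectral count for each protein: n_i * p_i / O_i
    weights = {name: n * p / o for name, (n, p, o) in proteins.items()}
    total = sum(weights.values())
    # Each protein receives its share of C
    return {name: total_c * w / total for name, w in weights.items()}


# Hypothetical input values, for illustration only
example = {
    "protA": (120, 1.00, 12.0),
    "protB": (30, 0.95, 3.0),
    "protC": (5, 0.90, 4.5),
}
abund = apex_abundances(example, total_c=2.5e6)
```

Because the estimates are shares of C, they always sum to C; an error in O_i for one protein therefore shifts the estimates of all others as well.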
In this study, we quantitatively analyzed the proteome of the Gram-negative bacterium Shigella dysenteriae serotype 1 (SD1) using two different approaches: (1) 2D gel display and quantitation of proteins via spot intensities; (2) tryptic digestion of the proteome, and LC-MS/MS in conjunction with APEX to estimate protein abundances from quantitation of peptides. The human pathogen SD1 is the most virulent of the four Shigella species and a causative agent of shigellosis [19, 20]. The predicted number of proteotypic peptides for each SD1 protein was derived from a species-specific SD1 training dataset generated from 100 abundant SD1 proteins, employing a recently developed software application based on the APEX methodology termed the APEX Quantitative Proteomics Tool . The APEX tool is freely available, user-friendly and easily downloadable for quantitation of proteins using LC-MS/MS datasets. We also describe a method to estimate protein abundances derived from CBB-stained 2D spot intensity values as molecules per cell. These experiments enabled us to generate a comparative proteomic dataset from two label-free global quantitation methods. Furthermore, we observed a high correlation of known stoichiometric ratios of subunits for several characterized E. coli protein complexes and the APEX ratios of equivalent SD1 proteins. These findings are significant as they demonstrate that computationally modified spectral counting methods, such as APEX, are among the most promising developments in quantitative proteomics.
Materials and methods
Bacterial strains and culture conditions
The strain Sd1617 of Shigella dysenteriae serotype 1 (SD1) was grown to stationary phase in Luria-Bertani (LB) medium at 37°C and pelleted by centrifugation at 7,000 × g for 10 min at 4°C. The SD1 cell pellet was washed with PBS by centrifuging at 6,000 × g for 15 min at 4°C and resuspended in a hypotonic lysis buffer composed of 25 mM Tris-HCl, pH 7.8 with 150 μg/mL lysozyme, 0.05% Triton X-100, 5 mM EDTA and protease inhibitors benzamidine (1 mM) and AEBSF (1 mM). After incubation in the lysis buffer for 30 min at room temperature (RT), the samples were immediately stored at -80°C until further processing. For nucleic acid digestion, bacterial samples suspended in the lysis buffer were thawed and gently agitated for 1 h at RT after the addition of leupeptin, DNAse and RNAse (10 μg/mL each) and 20 mM MgCl2. Cell lysates were centrifuged at 16,000 × g for 30 min at 4°C, and the supernatant containing bacterial cell lysate proteins was recovered.
2D-LC-MS/MS analysis of SD1 cell lysate
Following cell lysis, the extracted bacterial proteins were precipitated in six volumes of ice-cold acetone at -20°C for at least 1 h. Acetone-precipitated proteins were recovered as a pellet after centrifugation at 5,000 × g for 10 min. The protein pellet was resuspended in 0.1 M TAB (triethyl ammonium bicarbonate, Sigma Chemicals, St. Louis, MO) buffer, pH 8.5, and the protein concentration determined using the BCA assay (Sigma Chemicals). Proteins were denatured in 0.1% SDS and reduced using 5 mM TCEP (Tris(2-carboxyethyl)phosphine) for 1 h at 37°C, followed by alkylation using 10 mM MMTS (methyl methanethiosulfonate) for 1 h at RT. In-solution trypsin digestion of the complex protein mixture was performed by the addition of trypsin at 1:25 for 5 h at 37°C followed by 1:50 digestion overnight. Peptide digests (ca. 100 μg) were fractionated by 2D-LC-MS/MS, first on an offline Polysulfoethyl-A SCX column (4.6 × 50 mm, Nest Group, USA). Fractions collected from the SCX separation were then delivered from 96-well plates to a RP-C18 column (BioBasic C18, 75 μm × 10 cm, New Objective, USA), online with an ion trap mass spectrometer (LTQ, ThermoElectron). Spectra were acquired in automated MS/MS mode with the top five parent ions selected for fragmentation. LC-MS/MS was performed in three sequential m/z subscans (300–650, 650–900, 900–1500 m/z) to increase the sampling depth. MS/MS data from sequential runs were combined for analysis and searched by the Mascot search engine (Matrix Science) against an S. dysenteriae Sd197 database, a subset created from a non-redundant NCBI protein database. Mascot search parameters allowed for tryptic specificity of up to one missed cleavage, with methylthio-modifications of cysteine as a fixed modification and oxidation of methionine as a variable modification.
Mascot search results of three replicate 2D-LC-MS/MS experiments were validated by PeptideProphet™ and ProteinProphet™  which are part of the Trans-Proteomic Pipeline (TPP) accessed at http://tools.proteomecenter.org/wiki/index.php?title=Software:TPP.
Quantitation of a ten protein mixture using the APEX method
A ten protein standard mixture was initially used to assess the accuracy of the computational quantitation performed with the APEX Quantitative Proteomics Tool . Proteins were mixed in known concentrations ranging from 1 to 500 pmol in 0.1 M TAB, pH 8.5, denatured in 0.1% SDS, reduced with 5 mM TCEP for 1 h at 37°C, alkylated with 10 mM MMTS for 1 h at RT, and digested with trypsin (1:50) at 37°C overnight. The resulting peptides were analyzed by LC-MS/MS (LTQ) in three sequential m/z subscans (300–650, 650–900, 900–1500 m/z). LC-MS/MS data from three replicate runs were searched by Mascot against a NCBInr database, and the Mascot results validated by PeptideProphet™ and ProteinProphet™ analyses . Employing the APEX tool , a training dataset was generated, O i values calculated, and APEX abundances estimated by normalizing for the measured total protein concentration, as described in more detail for the APEX quantitation of SD1 proteins.
APEX quantitation from LC-MS/MS data of SD1 cell lysates
The APEX quantitation of SD1 proteins using the APEX Quantitative Proteomics Tool consisted of three steps: building a SD1 training dataset, computing SD1 protein O i (expected number of unique proteotypic peptides for protein i) values, and calculating SD1 protein APEX abundances. Proteins in the training dataset were chosen based on the 100 most abundant SD1 proteins in order to generate a species-specific training dataset. A list of the top 100 SD1 proteins was generated based on high spectral counts per protein and high protein and peptide identification probabilities . The training dataset .ARFF file was constructed based on 35 peptide sequence attributes including mass, length, pI, charge, hydrophobicity measures, amino acid composition, amino acid frequencies within secondary peptide structures and other peptide physicochemical properties deemed significant for the computational prediction of proteotypic peptides [14, 17]. The list of all 35 peptide physicochemical attributes is provided to users of the APEX tool at http://pfgrc.jcvi.org/index.php/bioinformatics/apex.html.
To compute SD1 protein Oi values, the Random Forest classifier algorithm available from the Weka data mining software package at http://www.cs.waikato.ac.nz/ml/weka was employed. Random Forest is the default classifier algorithm of the APEX tool due to its high performance. The classifier algorithm was applied to the SD1 training dataset constructed in the previous step, and then to all tryptic peptides generated in silico from the SD1 proteome to enable computation of SD1 protein Oi values. APEX abundances of the SD1 proteins observed by 2D-LC-MS/MS were calculated using the protXML file generated from the PeptideProphet™ and ProteinProphet™ validation of the Mascot search results and the SD1 protein Oi values. A <5% false positive rate (FPR) was chosen, along with a normalization factor of 2.5 × 10⁶. The normalization factor in the APEX tool is equivalent to the term C in the APEX equation, which represents the total concentration of protein molecules per cell. Since S. dysenteriae is very closely related to E. coli, the total number of protein molecules/cell estimated at 2–3 × 10⁶ for E. coli was used as a normalization factor in the APEX abundance measurements of S. dysenteriae proteins.
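The in silico generation of tryptic peptides mentioned above can be illustrated with a short Python sketch. It assumes the common trypsin rule (cleavage C-terminal to K or R, but not when the next residue is P) and an optional number of missed cleavages; the digestion settings used internally by the APEX tool are not stated here, so the function `tryptic_peptides`, its parameters, and the example sequence are hypothetical.

```python
import re


def tryptic_peptides(sequence, missed_cleavages=0, min_length=1):
    """Return in silico tryptic peptides of a protein sequence."""
    # Split after K or R unless the following residue is P;
    # drop the empty string produced when the sequence ends in K/R.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` additional neighbouring fragments
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


# Made-up sequence: the K-R bond is cleaved, the R-P bond is not
fully_cleaved = tryptic_peptides("MKRPAGKLLR")
one_missed = tryptic_peptides("MKRPAGKLLR", missed_cleavages=1)
```

Each peptide produced this way would then be described by its sequence attributes and passed to the trained classifier to decide whether it is likely to be observed.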
2D gel analysis of SD1 cell lysate
Following cell lysis, the extracted SD1 proteins were analyzed in 2D gels and by MS as described previously [24, 25]. Briefly, ca. 110 μg of protein was loaded onto 24 cm IPG strips (GE Healthcare) with pI range 4–7. The first-dimension protein separation in IPG strips and the second-dimension (SDS-PAGE) polyacrylamide slab gel separation (25 × 19 × 0.15 cm), as well as the Coomassie Brilliant Blue G-250 (CBB) gel staining and scanning procedures, were performed as described previously [24, 25]. For protein spot detection, scanned 2D gel images were analyzed by the gel image analysis software Proteomweaver v.4.0 (Bio-Rad). Tryptic peptides extracted from protein gel plugs of interest were analyzed by MALDI-TOF/TOF (4700 Proteomics Analyzer, Applied Biosystems), as well as LC-MS/MS (LTQ, ThermoElectron) interfaced with a nano-LC system (Agilent). The Mascot search engine was employed to search data against the S. dysenteriae Sd197 database, and the results viewed in an in-house LIMS system. MS protein identifications were matched to the excised protein spots. The 2D spots that matched to a single protein with high confidence were considered for quantitative comparison with APEX estimations of protein abundances.
Estimation of protein abundances from 2D gel spot intensities
Protein abundances were estimated from 2D gel spot intensities as

$$\text{abundance}_i = \frac{I_i}{\sum_{k} I_k} \times C$$

where the numerator $I_i$ is the (average) spot intensity of any protein i, while the denominator represents the total spot intensity of all spots detected. As in the APEX calculations, the term C represents the total number of protein molecules per cell (estimated to be 2.5 × 10⁶) or the measured total protein concentration in the sample. This approach allowed us to convert relative spot intensity volumes into protein abundances (molecules/cell) that were used for the comparative quantitative analysis with the APEX method.
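As a minimal sketch of this conversion, the Python snippet below scales each spot's share of the total detected intensity by C. The function name and the spot intensities are hypothetical, and the sketch assumes, as the text does, that all proteins stain with roughly equal efficiency; any additional correction for the fraction of protein visualized in the gel is not modeled.

```python
def spot_abundances(spot_intensities, total_molecules=2.5e6):
    """Convert relative spot intensities to molecules per cell.

    spot_intensities: dict mapping a protein to its (average) spot
                      intensity I_i, summed over its spots if it
                      appears as a spot train.
    total_molecules:  C, the estimated total protein molecules/cell.
    """
    total_intensity = sum(spot_intensities.values())
    return {
        protein: total_molecules * intensity / total_intensity
        for protein, intensity in spot_intensities.items()
    }


# Hypothetical intensities in arbitrary units
gel = spot_abundances({"spot_a": 2.0, "spot_b": 1.0, "spot_c": 1.0})
```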
Comparison of APEX-computed protein quantities with known quantities of a ten protein standard mixture
A ten protein mixture consisting of bovine α-casein (10 pmol), bovine cytochrome c (20 pmol), bovine serum albumin (40 pmol), bovine deoxyribonuclease (500 pmol), chicken lysozyme (5 pmol), chicken ovalbumin (100 pmol), equine myoglobin (60 pmol), rabbit glycogen phosphorylase (2 pmol), human transferrin (1 pmol) and human carbonic anhydrase I (200 pmol) was digested and analyzed by LC-MS/MS. The average number of MS/MS spectra was 10218 from three replicate analyses. APEX-calculated protein abundance estimates correlated well with the injected protein concentrations, with Spearman rank correlation coefficient Rs = 0.98 and squared Pearson correlation coefficient R² = 0.92 (Additional File 1). Interestingly, the APEX values for proteins in the low molarity range (1–20 pmol) were more precise than those for proteins with high molarities (500 pmol), possibly attributable to the saturation of MS/MS spectral sampling at very high protein concentrations. The correlations dropped significantly (Rs = 0.79, R² = 0.68) when APEX abundances were estimated without the calculation of Oi values (Oi = 1), emphasizing the importance of accurate Oi (expected number of unique proteotypic peptides for protein i) values for reliable protein abundance measurements.
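For readers who wish to reproduce this kind of comparison, the two statistics can be computed as sketched below in plain Python. The known quantities are the pmol amounts listed above; the estimated values are hypothetical placeholders, not the study's APEX results, and whether the original analysis used raw or log-transformed values is not stated here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


known = [10, 20, 40, 500, 5, 100, 60, 2, 1, 200]        # pmol, from the mixture
estimated = [12, 18, 35, 380, 6, 110, 50, 3, 1.5, 170]  # hypothetical values
rs = spearman_rs(known, estimated)
r_squared = pearson_r(known, estimated) ** 2
```

Rs depends only on the rank order of the proteins, whereas R² is sensitive to the few highest values, which is one reason the two statistics can differ for the same dataset.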
Profile of SD1 proteins in Coomassie-Blue-stained 2D gels and quantitative analysis
For a correlation analysis with the APEX method, relative abundances of CBB-stained 2D gel spots and spot trains were converted to molecules/cell estimates. An equation described in the Materials and Methods section was used for this conversion, based on an estimate of 2.5 × 10⁶ total protein molecules per cell and on the simplifying assumption that individual proteins were stained with CBB with roughly equal efficiency. From these calculations, the most abundant proteins in 2D gels were GroEL, GadB and TufA, each with >35,000 molecules/cell. These proteins are indeed known to be highly abundant in stationary phase cells of γ-proteobacteria [27, 28]. Surveyed as the least abundant proteins were the putative sugar-dephosphorylating enzyme YidA (gene locus SDY_4179) and the galactose-binding transport protein MglB, with <550 molecules/cell (Additional File 2).
Profile of SD1 proteins using 2D-LC-MS/MS and APEX for quantitative analysis
333,374 MS/MS spectra (average of three datasets) were generated by the 2D-LC-MS/MS analysis of SD1 proteins. Among the 1214 proteins identified from Mascot searches of LC-MS/MS runs, 1148 proteins were validated by the algorithms PeptideProphet™ and ProteinProphet™, assuming a FPR of <5%. Thirty-five of these proteins were derived from the virulence-associated pSD1 plasmid, including invasion plasmid antigens and type III secretion system components. More than 250 hypothetical proteins were identified, demonstrating that the corresponding genes were indeed expressed. The coverage of the genome-predicted SD1 proteome was ca. 26%. This dataset was subjected to protein quantitation using the APEX Quantitative Proteomics Tool (Fig. 1). The Random Forest classifier algorithm was trained on a high quality training dataset of 100 abundant proteins to predict protein Oi values. The algorithm classified ca. 23% of the peptides in the training dataset as 'observed', compared to ca. 9% reported previously. In addition, the 'observed' peptides were predicted with an F-measure of 0.75 (0.72 precision and 0.8 recall), while 'non-observed' peptides were predicted with a much higher F-measure (0.94 precision and 0.91 recall). This increased the overall accuracy of correct classifications on the training dataset by the classifier to ca. 88%. These results supported the notion that the proteins chosen for the training dataset resulted in the identification of a large number of proteotypic peptides, which in turn permitted better estimation of protein abundances.
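The relation between precision, recall and F-measure can be checked with a short calculation. The sketch below assumes the F-measure is the balanced harmonic mean of precision and recall (F1). Recomputed from the rounded precision and recall quoted above, the value for 'observed' peptides comes out near 0.76 rather than the reported 0.75, a difference that is presumably due to rounding of the reported figures.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f_observed = f_measure(0.72, 0.80)      # 'observed' peptides
f_non_observed = f_measure(0.94, 0.91)  # 'non-observed' peptides
```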
APEX abundance values were calculated using SD1 protein-specific Oi values normalized by an estimated total number of 2.5 × 10⁶ protein molecules/cell. The proteins are listed in the APEX protein quantitation table (Additional File 3). The most abundant proteins were the DNA-binding protein HU-alpha (HupA), the global regulator Dps and the PTS system protein PtsH, each estimated at >30,000 molecules (ca. 1.2% of total protein/cell). GroEL, GadB and TufA, the most abundant proteins from 2D gel measurements, also yielded high copy numbers (ca. 25,000 molecules, 1% of total protein/cell) using the APEX method. Estimates for the 100 least abundant proteins were in the range of 20 to 250 molecules per cell (ca. 0.001% to 0.01% of total protein/cell). For example, formate acetyltransferase 3 (TdcE), the Fe-S subunit of a putative oxidoreductase (YffG), and the large subunit of glutamate synthase (GltB) were calculated to be present at less than 30 molecules/cell. The dynamic range of APEX-based protein abundance measurements was 10³, about one order of magnitude higher than that of CBB-stained spot intensity quantitation from 2D gels. Correlation of SD1 protein APEX estimates with protein properties such as isoelectric point (pI) and net charge followed previously reported trends, with no significant correlation observed for these protein properties. Apparently, the combination of LC-MS/MS and APEX introduces little bias in abundance measurements based on protein characteristics such as protein pI or net charge. Of note, the APEX vs. 2D gel comparison of proteins with pI values >7 is of limited value, because most proteins are not displayed in the pH range of gels examined here (4 to 7).
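The percentages and the dynamic range quoted above follow from simple arithmetic, sketched here in Python. The bounds are the approximate figures given in the text (20 and >30,000 molecules/cell for APEX; <550 and >35,000 molecules/cell for 2D gels), so the results are rough estimates; the helper names are hypothetical.

```python
from math import log10

TOTAL_MOLECULES = 2.5e6  # estimated protein molecules per cell


def percent_of_total(molecules, total=TOTAL_MOLECULES):
    """Share of the total protein molecules per cell, in percent."""
    return 100.0 * molecules / total


def dynamic_range_log10(low, high):
    """Orders of magnitude spanned between the lowest and highest value."""
    return log10(high / low)


apex_range = dynamic_range_log10(20, 30_000)  # about 3.2 orders of magnitude
gel_range = dynamic_range_log10(550, 35_000)  # about 1.8 orders of magnitude
```

With these bounds the two methods differ by somewhat more than one order of magnitude, in line with the statement above.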
Biological and biochemical implications of APEX protein abundance data
Stoichiometric ratios of protein complexes as quantitated by APEX
(b) E. coli stoichiometric ratio | SD1 APEX ratio | (c) SD1 APEX abundances (± sd)
 |  | 6889(± 827):7004(± 651)
 |  | 2012(± 302):2086(± 121)
 |  | 1534(± 549):3258(± 460)
 |  | 6716(± 739):7029(± 707)
 |  | 4869(± 740):5917(± 44)
 |  | 1486(± 125):2953(± 955)
 |  | 2095(± 301):4312(± 1016)
 |  | 8713(± 216):8848(± 673)
 |  | 1052(± 295):1491(± 595)
 |  | 477(± 25):699(± 76)
1:1 or 1:5 |  | 2272(± 216):13673(± 303)
For a few protein complexes, the observed APEX stoichiometry was different from the reported ratio. The thioredoxin peroxidase AhpC/AhpF is composed of an equimolar dimer-dimer assembly according to the EcoCyc database, but the observed APEX ratio was 6:1. Interestingly, further review of the literature suggested decamer formation of AhpC in a reduced state, whereas the dimer is formed in an oxidized state . Thus, the examined stationary phase growth state of SD1 cells appeared to favor the reduced, active AhpC state, which is linked to reduction of hydroperoxide substrates. Correlation decreased for ratios of subunits that formed part of membrane-associated protein complexes. The integral outer membrane protein YaeT and four lipoproteins (NlpB, SmpA, YfiO and YfgL) each supposedly contribute a monomer to a five-protein outer membrane complex. The APEX quantitated stoichiometry of proteins in this complex was 2.8:1.9:4.6:1:10.6, respectively. A similar case was seen for subunits of the F1-ATP synthase complex. In comparison to AtpA and AtpD, the subunits AtpG and AtpH revealed lower APEX-calculated quantities than those expected from the reported stoichiometry of 3:3:1:1 (AtpA:AtpD:AtpG:AtpH) , with the observed stoichiometry being 8.4:8.3:1.4:1. Presumably, the causes were differences in the efficiency of extracting individual subunits from membranes during cell lysate preparation, with AtpA and AtpD being more soluble peripheral membrane proteins . Stoichiometric ratios of subunits of four protein complexes were also determined from 2D gel data. They deviated more from the expected ratios than those determined by the APEX method. For example, the stoichiometric ratios for SucC/SucD and SdhB/SdhA were 1:1.42 and 1:1.53 (2D gel), and 1:1.01 and 1:1.21 (APEX), respectively, whereas the expected ratios are 1:1 for both protein complexes.
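A stoichiometric ratio can be derived from a pair (or larger set) of subunit abundances by dividing each value by the smallest one. The Python sketch below applies this to three of the abundance pairs listed in the table above; the function name is hypothetical, and the standard deviations are ignored for simplicity.

```python
def stoichiometric_ratio(abundances, digits=2):
    """Express subunit abundances relative to the least abundant subunit."""
    smallest = min(abundances)
    return [round(a / smallest, digits) for a in abundances]


near_equimolar = stoichiometric_ratio([6889, 7004])
about_one_to_two = stoichiometric_ratio([1534, 3258])
about_one_to_six = stoichiometric_ratio([2272, 13673])
```

Given standard deviations of roughly 10 to 35% on some of the individual abundances, small departures from integer ratios should not be over-interpreted.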
Comparison of protein profiles and quantitative data derived from APEX and 2D gel analyses
Comparison of proteins quantitated by 2D-LC-MS/MS-APEX vs. 2-DE
 | 2D-LC-MS/MS-APEX | 2-DE
Abundance range (molecules/cell) | ~20 to ~45000 | ~500 to ~52000
Mr range (kDa) | 6.4 – 163.3 | 8.3 – 99.7
pI range | 3.59 – 11.81 | 4.52 – 8.48
Net charge range | 33.74 to -50 | 3.65 to -40
 | 1.36 to -1.53 | 0.31 to -1.53
 | 0.01 to 0.18 | 0.01 to 0.14
Protein physicochemical properties affect APEX vs. 2D gel abundance correlations
In contrast to 2D gels, proteins identified by 2D-LC-MS/MS included the alkaline pI range (Table 1). Our ability to compare proteins quantitated in 2D gels vs. APEX was compromised by the fact that proteins in 2D gels were only focused in the pI range of 4–7, thus excluding basic proteins from a meaningful quantitative analysis. The distribution of proteins detected by the APEX method followed a bimodal pattern with two distinct clusters for acidic proteins vs. basic proteins. Proteins with pI values in the pH range 7–8 are relatively rare due to their lower solubility at a near-neutral net charge under physiological growth conditions. Most of the predicted proteins for the Sd197 genome were observed in the 5–6 pI range, as predicted for other organisms, and reflected in the relative distribution of proteins quantitated by the APEX method and in 2D gels. The Rs and R² values for distinct pI ranges of proteins with pI values <7 did not deviate from the correlation for all proteins. Net charge of a protein at pH 7 was then calculated to determine the correlation of protein abundances based on charge. About 96% of the Sd197 proteins were predicted in the net charge range of -20 to 20 units, with ca. 94% of the proteins quantitated by APEX and in 2D gels within that range. Proteins with a net charge >20 or <-40 (at pH 7) were particularly rare in the 2D gel dataset. The Rs and R² values for net charge ranges were in good agreement with those observed for distinct pI ranges. The correlation of APEX vs. 2D gel abundance measurements for moderately acidic proteins in the net charge range from 0 to -10 (Rs = 0.82, R² = 0.66, n = 164), and for strongly acidic proteins (Rs = 0.78, R² = 0.67, n = 84) was close to that of the overall correlation, indicating no quantitative bias based on protein pI or net charge.
In summary, the evaluation of qualitative and quantitative data comparing APEX and 2D gels revealed several advantages of the APEX method: (1) higher detection sensitivity of the digested peptides via LC-MS/MS compared to proteins in CBB-stained 2D gels; (2) fewer constraints in the detection of peptides featuring a variety of physicochemical characteristics per protein (APEX) compared to that of proteins via 2D gel spots; (3) higher dynamic range of peptide spectral counts (LC-MS/MS) than that of proteins detected in CBB-stained 2D gel spots. Other computationally adjusted LC-MS/MS spectral counting methods [51, 52] may perform as well as APEX for global protein quantitation. Although these methods also employ peptide detectability, they were explored only in the context of relative quantitation, rather than absolute quantitation as performed by the APEX method. With appropriate adjustments to sample preparation procedures, shortcomings of the APEX method regarding quantitation of hydrophobic and membrane-bound proteins can likely be addressed. On the other hand, unlike LC-MS/MS-based methods, 2D gels retain the advantage that post-translational modification processes and functional characteristics of proteins are often measurable qualities and useful in interpreting biological processes.
While 2D gels have been used for more than 40 years for highly parallel protein quantitation, the APEX method was developed very recently by integrating spectral counting with computational predictions of proteotypic peptides from LC-MS/MS datasets to estimate protein abundances . In this report, proteomic datasets derived from cell lysates of S. dysenteriae serotype 1 were subjected to a direct comparison of these label-free quantitation methods. Applying the APEX Quantitative Proteomics Tool  to a high quality training dataset of 100 high abundance SD1 proteins ensured that optimal parameters and Oi values were established for the SD1 APEX quantitation. In-depth analysis of MS data obtained from replicate 2D gels also served as a quality control step. Proteins whose spot assignments were not reproducible or revealed evidence for extensive spot overlaps were not included in the APEX vs. 2D gel correlation analysis.
Strategies to enable absolute quantitation of proteins from 2D gels have involved radioactive labeling of proteins and scintillation counting of protein spots , while fluorescent dyes have been generally employed for relative protein quantitation (differential display), e.g. 2D-DIGE . Previous studies comparing APEX with 2D gel abundance measurements from 2D-DIGE and radioactive labeling resulted in lower correlations of R 2 = 0.21 for 210 E. coli proteins and R 2 = 0.52 for 48 yeast proteins . The usual quantitative analysis mode of CBB-stained 2D gels is also differential display which results in spot quantitation relative to another dataset. In this study, a direct label-free comparison of abundance measurements (APEX vs. CBB-stained 2D gels) was performed, which required the estimation of absolute protein abundances derived from relative spot quantities in 2D gels. This was achieved via an equation incorporating a factor estimating total protein molecules/cell corrected by the estimated ratio of gel-visualized vs. total protein per sample. A nonlinear relationship between spot intensity volumes and actual protein amounts has been mentioned as a caveat for measurements of accurate protein abundance in 2D gels [3, 43]. This pertains to the fact that spot staining saturation occurs for highly abundant proteins and to the notion that individual proteins differ in their affinity to the staining dye used. The dataset on highly abundant SD1 proteins resulted in a decreased correlation with APEX values, compared to the correlation for the entire SD1 dataset, suggesting that saturation effects may have compromised the accuracy of CBB-stained 2D spot intensity measurements. More sensitive fluorescent dyes such as SYPRO Ruby increase the dynamic range of protein abundance measurements in 2D gels and reduce the problem of spot saturation. In theory, this could result in improved protein abundance correlations with the APEX method. 
Technical problems, however, often limit the value of using a more sensitive 2D gel dye. Such problems include insufficient spot resolution, which is detrimental to the quantitation of low abundance proteins, and the requirement of high resolution imaging systems to detect the increased dynamic range of fluorescent dye-stained 2D spots. CBB is still a widely used dye for 2D gel-based proteomic studies [54, 55] and, therefore, a good first choice for the APEX vs. 2D gel-based comparative analysis.
The overall correlation between APEX- and 2D gel-based protein abundances yielded an Rs value of 0.81 and an R² value of 0.67. In comparison to the correlation for all 255 proteins, abundance correlations differed for subsets of proteins with distinct physicochemical properties. Based on protein Mr values, correlation of abundance estimates improved for 182 proteins in the Mr range 20 – 70 kDa (R² = 0.73), while the correlation decreased considerably for low Mr proteins (R² = 0.51). Very low Mr (<15 kDa) and very high Mr (>100 kDa) proteins are more challenging to quantitate, for reasons better known in the context of 2D gels, such as inefficient fixing and staining of low Mr proteins, and modifications of amino acid residues giving rise to multiple variants of high Mr proteins. During sample preparation for 2D-LC-MS/MS, protein loss due to ineffective acetone precipitation of low Mr proteins may result in the underestimation of protein quantities. Of note, protein abundances estimated by APEX correlated inversely with protein Mr. The underlying reasons appear to be biological rather than technical. Schmidt et al. reported that 2D gel analysis and ICAT-LC/MS, a peptide-based quantitation relying on isotope-labeled cysteine residues in proteins, each resulted in underestimation of proteins with Mr values <10 kDa. Our data support the notion that, if a low Mr protein has several unique proteotypic peptides with high identification probabilities by LC-MS/MS, the APEX method is well suited for quantitation (e.g. YjbJ with a Mr = 8.3 kDa in this dataset). In contrast, a low Mr protein with a small number of proteotypic peptides (e.g. EmrR with a Mr = 20.5 kDa in this dataset) may be less accurately measured by the APEX method.
Limitations in the quantitation of alkaline and hydrophobic proteins in 2D gels have been described previously . Due to the fact that the examined pI range of 2D gels was 4 – 7 in this study, the correlation analysis was more applicable to hydrophobic proteins than to basic proteins. The correlation between APEX and 2D gel datasets decreased with high protein hydrophobicity. There is considerable evidence for wide-spread quantitative underestimation of hydrophobic proteins in 2D gels . Such proteins are usually membrane-integrated or membrane-anchored, characteristics that lower protein solubilization and resolution in 2D gels. In the 2D gel dataset, 7.3% of the identified proteins were predicted to be membrane-associated, while the membrane-associated proteins formed 18.5% of the APEX dataset. Also, for very hydrophobic proteins such as Pfs and YhlB (hydropathy score >0.3) quantitated in the common protein dataset, abundance estimates in 2D gels were ca. two- to threefold lower than the equivalent APEX abundance measurements. This is in contrast to a report by Schmidt et al.  where 2D gels overestimated proteins in the hydrophobic range compared to ICAT-LC/MS. Inadvertent mislabeling of hydrophobic and hydrophilic score ranges in a figure pertaining to this experiment, however, may be the explanation (Jungblut, personal communication). Interestingly, the comparison of stoichiometric ratios of protein subunits that were part of soluble and membrane protein complexes allowed us to assess 2D-LC-MS/MS-APEX measurement accuracies. The stoichiometric ratios for the examined membrane protein complexes deviated more from the expected values than the ratios for soluble protein complexes. Likely causes of the differences in ratios comparing APEX values vs. known stoichiometric ratios of E. coli membrane protein complexes were ineffective protein solubilization and/or tryptic digestion. 
We cannot exclude the possibility that hydrophobic peptide analysis by LC-MS/MS followed by APEX computational adjustments also influenced the measurement accuracy of membrane protein complexes. In the 2D gel dataset, quantitative subunit ratios were available for only four protein complexes, and these ratios deviated more from the expected values than the corresponding APEX ratios.
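The comparison of measured subunit ratios with expected stoichiometries can be sketched as follows. This is a minimal illustration, assuming that the deviation is expressed as a log2 fold difference between the observed and the expected ratio; the function name and the abundance values are hypothetical, and this is not necessarily how deviations were scored in this study.

```python
from math import log2


def stoichiometry_deviation(abundance_a, abundance_b, expected_a, expected_b):
    """Return (observed ratio A:B, log2 fold deviation from the expected ratio)."""
    observed = abundance_a / abundance_b
    expected = expected_a / expected_b
    return observed, log2(observed / expected)


# Hypothetical abundances for a complex with an expected 1:1 subunit ratio.
observed, deviation = stoichiometry_deviation(200.0, 100.0, 1, 1)
print(observed, deviation)
```

A deviation of 0 means that the measured ratio matches the expected stoichiometry; positive values indicate that subunit A is over-represented relative to subunit B, and negative values the opposite.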
We are not aware of other reports comparing LC-MS/MS-based, computationally modified protein quantitation data with quantitation from CBB-stained 2D gel spot intensity data. Our study demonstrates a generally good correlation between 2D gel and APEX quantitative measurements. Combining quantitative 2D gel and APEX analyses is a powerful approach in proteomics research because both methodologies are inexpensive and versatile and bypass chemical or isotope-labeling steps that can introduce additional experimental variability. 2D gels provide the advantages of visual proteome representation and easy detection of protein isoforms with modifications resulting in Mr and pI changes, which are often biologically significant [1, 56]. Examples observed here are: (1) the periplasmic protein Agp (spot # 35, Fig. 2), whose spot pI precisely matches that of a protein N-terminally truncated by 22 amino acids, indicative of signal peptide cleavage; (2) the chaperone/protease ClpB (spot # 30, Fig. 2), which appeared in more than one isoform, one with an N-terminal truncation of ca. 160 amino acids; this N-terminal region has been linked to a binding site critical for activation of ClpB. The APEX method, which is more sensitive and has a higher dynamic range of quantitation, yields comprehensive protein abundance data. APEX also shows promise for determining stoichiometric ratios of the subunits of protein complexes. We demonstrated that the subunit ratios derived from APEX measurements for a variety of soluble protein complexes were close to the experimentally reported stoichiometries. We also discussed an example where the stoichiometric ratio of a protein complex, the peroxidase AhpC/AhpF, implied a specific structure-function relationship. The observed 6:1 APEX ratio (AhpC:AhpF) suggested a reduced, active state of AhpC associated with substrate reduction.
In proteomics, such quantitative data are ideally combined with parallel analysis of native protein complexes, e.g. by BN-PAGE, a tool that directly reveals participation of proteins in a specific complex. However, BN-PAGE is not as sensitive or as quantitatively accurate as the APEX method. In conclusion, we identified an additional area in protein research where APEX will be a useful discovery tool.
- APEX: Absolute protein expression
- ARFF: Attribute relation file format
- CBB: Coomassie Brilliant Blue
- FPR: False positive rate
- LIMS: Laboratory Information Management System
- LC-MS/MS: Liquid chromatography with tandem mass spectrometry
- MALDI-TOF/TOF: Matrix-assisted laser desorption ionization with tandem time of flight
- Mr: protein molecular weight
- O i: estimation of expected number of unique proteotypic peptides for a given protein i
- pI: protein isoelectric point
- R 2: squared Pearson correlation coefficient
- R s: Spearman rank correlation coefficient
- SD1: Shigella dysenteriae type 1
- TEAB: Triethyl ammonium bicarbonate
This work was funded by the Pathogen Functional Genomics Resource Center (PFGRC) through contract No. N01-AI-15447, awarded by the National Institute of Allergy and Infectious Diseases (NIAID) to the J. Craig Venter Institute (JCVI), Rockville, Maryland, USA. We wish to thank Christine Vogel and Edward M. Marcotte from the University of Texas at Austin for extremely helpful and extensive discussions regarding the APEX methodology. At Tufts University, this project was funded in whole or in part with Federal funds from the NIAID, NIH, DHHS, under contract number N01-AI-30050.
- Gorg A, Weiss W, Dunn MJ: Current two-dimensional electrophoresis technology for proteomics. Proteomics 2004, 4: 3665–3685. 10.1002/pmic.200401031
- Gygi SP, Corthals GL, Zhang Y, Rochon Y, Aebersold R: Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. Proc Natl Acad Sci USA 2000, 97: 9390–9395. 10.1073/pnas.160270797
- Moritz B, Meyer HE: Approaches for the quantification of protein concentration ratios. Proteomics 2003, 3: 2208–2220. 10.1002/pmic.200300581
- Smolka M, Zhou H, Aebersold R: Quantitative protein profiling using two-dimensional gel electrophoresis, isotope-coded affinity tag labeling, and mass spectrometry. Mol Cell Proteomics 2002, 1: 19–29. 10.1074/mcp.M100013-MCP200
- Pietrogrande MC, Marchetti N, Dondi F, Righetti PG: Spot overlapping in two-dimensional polyacrylamide gel electrophoresis maps: relevance to proteomics. Electrophoresis 2003, 24: 217–224. 10.1002/elps.200390018
- Link AJ, Eng J, Schieltz DM, Carmack E, Mize GJ, Morris DR, Garvik BM, Yates JR 3rd: Direct analysis of protein complexes using mass spectrometry. Nat Biotechnol 1999, 17: 676–682. 10.1038/10890
- Roe MR, Griffin TJ: Gel-free mass spectrometry-based high throughput proteomics: tools for studying biological response of proteins and proteomes. Proteomics 2006, 6: 4678–4687. 10.1002/pmic.200500876
- Ong SE, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, Mann M: Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 2002, 1: 376–386. 10.1074/mcp.M200025-MCP200
- Steen H, Pandey A: Proteomics goes quantitative: measuring protein abundance. Trends Biotechnol 2002, 20: 361–364. 10.1016/S0167-7799(02)02009-7
- Old WM, Meyer-Arendt K, Aveline-Wolf L, Pierce KG, Mendoza A, Sevinsky JR, Resing KA, Ahn NG: Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Mol Cell Proteomics 2005, 4: 1487–1502. 10.1074/mcp.M500084-MCP200
- Chelius D, Bondarenko PV: Quantitative profiling of proteins in complex mixtures using liquid chromatography and mass spectrometry. J Proteome Res 2002, 1: 317–323. 10.1021/pr025517j
- Zhang B, VerBerkmoes NC, Langston MA, Uberbacher E, Hettich RL, Samatova NF: Detecting differential and correlated protein expression in label-free shotgun proteomics. J Proteome Res 2006, 5: 2909–2918. 10.1021/pr0600273
- Rappsilber J, Ryder U, Lamond AI, Mann M: Large-scale proteomic analysis of the human spliceosome. Genome Res 2002, 12: 1231–1245. 10.1101/gr.473902
- Lu P, Vogel C, Wang R, Yao X, Marcotte EM: Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotechnol 2007, 25: 117–124. 10.1038/nbt1270
- Ishihama Y, Oda Y, Tabata T, Sato T, Nagasu T, Rappsilber J, Mann M: Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol Cell Proteomics 2005, 4: 1265–1272. 10.1074/mcp.M500061-MCP200
- Kuster B, Schirle M, Mallick P, Aebersold R: Scoring proteomes with proteotypic peptide probes. Nat Rev Mol Cell Biol 2005, 6: 577–583. 10.1038/nrm1683
- Mallick P, Schirle M, Chen SS, Flory MR, Lee H, Martin D, Ranish J, Raught B, Schmitt R, Werner T, et al.: Computational prediction of proteotypic peptides for quantitative proteomics. Nat Biotechnol 2007, 25: 125–131. 10.1038/nbt1275
- Vogel C, Marcotte EM: Calculating absolute and relative protein abundance from mass spectrometry-based protein expression data. Nat Protoc 2008, 3: 1444–1451. 10.1038/nprot.2008.132
- Levine MM, Kotloff KL, Barry EM, Pasetti MF, Sztein MB: Clinical trials of Shigella vaccines: two steps forward and one step back on a long, hard road. Nat Rev Microbiol 2007, 5: 540–553. 10.1038/nrmicro1662
- Donohue-Rolfe A, Keusch GT, Edson C, Thorley-Lawson D, Jacewicz M: Pathogenesis of Shigella diarrhea. IX. Simplified high yield purification of Shigella toxin and characterization of subunit composition and function by the use of subunit-specific monoclonal and polyclonal antibodies. J Exp Med 1984, 160: 1767–1781. 10.1084/jem.160.6.1767
- Braisted JC, Kuntumalla S, Vogel C, Marcotte EM, Rodrigues AR, Wang R, Huang ST, Ferlanti ES, Saeed AI, Fleischmann RD, et al.: The APEX Quantitative Proteomics Tool: Generating protein quantitation estimates from LC-MS/MS proteomics results. BMC Bioinformatics 2008, 9: 529. 10.1186/1471-2105-9-529
- Ross PL, Huang YN, Marchese JN, Williamson B, Parker K, Hattan S, Khainovski N, Pillai S, Dey S, Daniels S, et al.: Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics 2004, 3: 1154–1169. 10.1074/mcp.M400129-MCP200
- Keller A, Eng J, Zhang N, Li XJ, Aebersold R: A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol Syst Biol 2005, 1: 2005.0017. 10.1038/msb4100024
- Gatlin CL, Pieper R, Huang ST, Mongodin E, Gebregeorgis E, Parmar PP, Clark DJ, Alami H, Papazisi L, Fleischmann RD, et al.: Proteomic profiling of cell envelope-associated proteins from Staphylococcus aureus. Proteomics 2006, 6: 1530–1549. 10.1002/pmic.200500253
- Pieper R, Gatlin-Bunai CL, Mongodin EF, Parmar PP, Huang ST, Clark DJ, Fleischmann RD, Gill SR, Peterson SN: Comparative proteomic analysis of Staphylococcus aureus strains with differences in resistance to the cell wall-targeting antibiotic vancomycin. Proteomics 2006, 6: 4246–4258. 10.1002/pmic.200500764
- Meibom KL, Dubail I, Dupuis M, Barel M, Lenco J, Stulik J, Golovliov I, Sjostedt A, Charbit A: The heat-shock protein ClpB of Francisella tularensis is involved in stress tolerance and is required for multiplication in target organs of infected mice. Mol Microbiol 2008, 67: 1384–1401. 10.1111/j.1365-2958.2008.06139.x
- Liao X, Ying T, Wang H, Wang J, Shi Z, Feng E, Wei K, Wang Y, Zhang X, Huang L, et al.: A two-dimensional proteome map of Shigella flexneri. Electrophoresis 2003, 24: 2864–2882. 10.1002/elps.200305519
- Foster JW: Escherichia coli acid resistance: tales of an amateur acidophile. Nat Rev Microbiol 2004, 2: 898–907. 10.1038/nrmicro1021
- Coghlan A, Wolfe KH: Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast 2000, 16: 1131–1145. 10.1002/1097-0061(20000915)16:12<1131::AID-YEA609>3.0.CO;2-F
- Keseler IM, Bonavides-Martinez C, Collado-Vides J, Gama-Castro S, Gunsalus RP, Johnson DA, Krummenacker M, Nolan LM, Paley S, Paulsen IT, et al.: EcoCyc: a comprehensive view of Escherichia coli biology. Nucleic Acids Res 2009, 37: D464–470. 10.1093/nar/gkn751
- Kwon AR, Kessler BM, Overkleeft HS, McKay DB: Structure and reactivity of an asymmetric complex between HslV and I-domain deleted HslU, a prokaryotic homolog of the eukaryotic proteasome. J Mol Biol 2003, 330: 185–195. 10.1016/S0022-2836(03)00580-1
- Bordier C, Rossetti GP: Subunit composition of Escherichia coli RNA polymerase during transcription in vitro. Eur J Biochem 1976, 65: 147–153. 10.1111/j.1432-1033.1976.tb10399.x
- Miwa K, Yoshida M: The alpha 3 beta 3 complex, the catalytic core of F1-ATPase. Proc Natl Acad Sci USA 1989, 86: 6484–6487. 10.1073/pnas.86.17.6484
- Wood ZA, Poole LB, Hantgan RR, Karplus PA: Dimers to doughnuts: redox-sensitive oligomerization of 2-cysteine peroxiredoxins. Biochemistry 2002, 41: 5493–5504. 10.1021/bi012173m
- Boyer PD: A research journey with ATP synthase. J Biol Chem 2002, 277: 39045–39061. 10.1074/jbc.X200001200
- Zhou Y, Duncan TM, Cross RL: Subunit rotation in Escherichia coli FoF1-ATP synthase during oxidative phosphorylation. Proc Natl Acad Sci USA 1997, 94: 10583–10587. 10.1073/pnas.94.20.10583
- Gardy JL, Laird MR, Chen F, Rey S, Walsh CJ, Ester M, Brinkman FS: PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics 2005, 21: 617–623. 10.1093/bioinformatics/bti057
- Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001, 305: 567–580. 10.1006/jmbi.2000.4315
- Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 2004, 340: 783–795. 10.1016/j.jmb.2004.05.028
- Juncker AS, Willenbrock H, von Heijne G, Brunak S, Nielsen H, Krogh A: Prediction of lipoprotein signal peptides in Gram-negative bacteria. Protein Sci 2003, 12: 1652–1662. 10.1110/ps.0303703
- Berven FS, Flikka K, Jensen HB, Eidhammer I: BOMP: a program to predict integral beta-barrel outer membrane proteins encoded within genomes of Gram-negative bacteria. Nucleic Acids Res 2004, 32: W394–399. 10.1093/nar/gkh351
- Rabilloud T, Vaezzadeh AR, Potier N, Lelong C, Leize-Wagner E, Chevallet M: Power and limitations of electrophoretic separations in proteomics strategies. Mass Spectrom Rev 2008, in press.
- Schmidt F, Donahoe S, Hagens K, Mattow J, Schaible UE, Kaufmann SH, Aebersold R, Jungblut PR: Complementary analysis of the Mycobacterium tuberculosis proteome by two-dimensional electrophoresis and isotope-coded affinity tag technology. Mol Cell Proteomics 2004, 3: 24–42.
- Blankenhorn D, Phillips J, Slonczewski JL: Acid- and base-induced proteins during aerobic and anaerobic growth of Escherichia coli revealed by two-dimensional gel electrophoresis. J Bacteriol 1999, 181: 2209–2216.
- Weichart D, Querfurth N, Dreger M, Hengge-Aronis R: Global role for ClpP-containing proteases in stationary-phase adaptation of Escherichia coli. J Bacteriol 2003, 185: 115–125. 10.1128/JB.185.1.115-125.2003
- Lopez-Campistrous A, Semchuk P, Burke L, Palmer-Stone T, Brokx SJ, Broderick G, Bottorff D, Bolch S, Weiner JH, Ellison MJ: Localization, annotation, and comparison of the Escherichia coli K-12 proteome under two states of growth. Mol Cell Proteomics 2005, 4: 1205–1209. 10.1074/mcp.D500006-MCP200
- Rosen R, Ron EZ: Proteome analysis in the study of the bacterial heat-shock response. Mass Spectrom Rev 2002, 21: 244–265. 10.1002/mas.10031
- Wei C, Yang J, Zhu J, Zhang X, Leng W, Wang J, Xue Y, Sun L, Li W, Jin Q: Comprehensive proteomic analysis of Shigella flexneri 2a membrane proteins. J Proteome Res 2006, 5: 1860–1865. 10.1021/pr0601741
- Kyte J, Doolittle RF: A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982, 157: 105–132. 10.1016/0022-2836(82)90515-0
- Lobry JR, Gautier C: Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes. Nucleic Acids Res 1994, 22: 3174–3180. 10.1093/nar/22.15.3174
- Tang H, Arnold RJ, Alves P, Xun Z, Clemmer DE, Novotny MV, Reilly JP, Radivojac P: A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics 2006, 22: e481–488. 10.1093/bioinformatics/btl237
- Schrimpf SP, Weiss M, Reiter L, Ahrens CH, Jovanovic M, Malmstrom J, Brunner E, Mohanty S, Lercher MJ, Hunziker PE, et al.: Comparative Functional Analysis of the Caenorhabditis elegans and Drosophila melanogaster Proteomes. PLoS Biol 2009, 7: e48. 10.1371/journal.pbio.1000048
- Futcher B, Latter GI, Monardo P, McLaughlin CS, Garrels JI: A sampling of the yeast proteome. Mol Cell Biol 1999, 19: 7357–7368.
- Westermeier R: Sensitive, quantitative, and fast modifications for Coomassie Blue staining of polyacrylamide gels. Proteomics 2006, 6(Suppl 2): 61–64. 10.1002/pmic.200690121
- Sasse J, Gallagher SR: Staining proteins in gels. Curr Protoc Mol Biol 2009, Chapter 10(Unit 10): 16.
- Jungblut PR, Holzhutter HG, Apweiler R, Schluter H: The speciation of the proteome. Chem Cent J 2008, 2: 16. 10.1186/1752-153X-2-16
- Barnett ME, Zolkiewska A, Zolkiewski M: Structure and activity of ClpB from Escherichia coli. Role of the amino- and carboxyl-terminal domains. J Biol Chem 2000, 275: 37565–37571. 10.1074/jbc.M005211200
- Stenberg F, Chovanec P, Maslen SL, Robinson CV, Ilag LL, von Heijne G, Daley DO: Protein complexes of the Escherichia coli cell envelope. J Biol Chem 2005, 280: 34409–34419. 10.1074/jbc.M506479200
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.