Selection of beech trees and collection of bark samples
Ten healthy trees were identified in seven stands in Fredericton, New Brunswick, Canada (Table 1). Healthy trees comprised only 5% of the beech trees in this area and were included in this study only if they were greater than 15 cm DBH. All of these stands have been under attack by both Cryptococcus fagisuga and Neonectria spp. since the early 1930s and would be considered an aftermath forest. After the initial waves of mortality, the remaining trees in an aftermath forest are primarily heavily cankered, and a lower-density, persistent scale infestation is present in the stand. Diseased trees were selected along with healthy trees in five of the stands. A healthy tree (greater than 10 cm DBH) and a susceptible tree from Ludington, MI, USA were also included in this study. Beech scale is estimated to have been established in Ludington as early as 1990, and the presence of Neonectria was confirmed in 2001. At the time of tissue collection (2004), this area was considered a killing front. All diseased trees selected for this study showed visible signs of Neonectria infection, such as cankers or perithecia, as well as scale infestation.
The experimental sampling is unbalanced with respect to disease resistance because the primary interest is in resistant genotypes for breeding (more resistant trees are selected for grafting, testing, and planting). Current statistical methods and computing power can accommodate substantial imbalance in an experiment, and we took advantage of this when designing the study. All selected trees were tested for resistance to the beech scale insect in previously reported studies [16, 65], and the results are summarized in Table 1. These tests demonstrated that all of the healthy trees were resistant, and all of the diseased trees susceptible, to the beech scale insect.
Branches 1–2 cm in diameter were removed from the crown of each selected tree in October 2006 in the New Brunswick stands and in September 2004 in Ludington, MI. Bark was peeled from the branches with a potato peeler, and the bark strips were placed in labelled 50 mL Falcon tubes, flash frozen, and stored on site in liquid nitrogen or a dry ice/ethanol bath. Peeled bark collected from each tree was divided among three tubes and transferred to a −80°C freezer for storage at either the Natural Resources Canada lab in Fredericton, New Brunswick, Canada or the US Forest Service (USFS) lab in Delaware, Ohio, USA. In February 2007, samples from New Brunswick, Canada were shipped overnight on dry ice to Delaware, OH, USA.
Protein was extracted according to Bona et al., with minor modifications to account for the high soluble phenolic content of tree bark and phloem tissues. Bark tissue from each tree was combined with dry ice, ground to a coarse powder in a standard household coffee grinder, and then transferred to a −80°C freezer. Three technical replicates were produced from the tissue of each tree. For each replicate, 2 g of powdered tissue (after the dry ice had sublimated) was combined with 2 g of frozen polyvinylpolypyrrolidone and 20 mL of lysis buffer (as per Bona et al., except that 1% Sigma Plant Proteinase Inhibitor Cocktail, P-9599, was used in place of 2% phenylmethylsulfonyl fluoride/dimethyl sulphoxide) and homogenized using a tissue homogenizer (Janke & Kunkel, IKA Labortechnik, Ultra-Turrax T25 with 18N tip). The homogenate was centrifuged at 26,000 × g for 10 minutes at 4°C to pellet solids. The supernatant (generally 10 mL) was combined with 10 mL of Tris–HCl (pH 8.8)-saturated phenol and mixed for one hour at room temperature. The phenolic phase was separated by centrifugation and rinsed with another 10 mL of lysis buffer, followed by a further centrifugation to separate the phenolic phase. The final phenolic phase was recovered, and proteins were precipitated by adding five volumes of methanol/0.1 M ammonium acetate and incubating overnight at −20°C. Proteins were pelleted by centrifugation at 26,000 × g for 20 minutes, and the resulting pellet was rinsed three times with cold methanol and once with cold acetone, then dried under vacuum. The pellet was resolubilized in 450 μL of resolubilization buffer (Biorad ReadyPrep sequential extraction reagent II: 8 M urea, 4% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulphonate (CHAPS), 40 mM Tris, 0.2% Bio-Lyte 3/10 ampholytes) plus 1% tributylphosphine (TBP, Sigma T-7567) and 1% plant proteinase inhibitor cocktail. Proteins were quantified (in triplicate) using the Biorad RC DC protein assay kit (Biorad 500–0118) microfuge tube assay protocol with the optional second wash. Protein quality was checked by running 40 μg of protein on a denaturing polyacrylamide gel and staining with Coomassie stain according to standard protocols (Biorad Mini-PROTEAN 3 Cell Instruction Manual).
Two-dimensional electrophoresis (2-DE)
2-DE was conducted at the Plant Microbe Genomics Facility at The Ohio State University (OSU). Isoelectric focusing (IEF) was performed using 11 cm pH 3–10 immobilized pH gradient strips (Biorad, 163–2014) in the Protean IEF Cell (Biorad). For quantitative gels, 100 μg of protein was mixed with rehydration buffer (7.0 M urea, 2.0 M thiourea, 2.0% w/v CHAPS, 2.0% w/v sulfobetaine 10) and focused for 55 kVh at 25°C. After IEF, the strips were treated according to the ReadyPrep Reduction-Alkylation kit (Biorad, 163–2090), which uses TBP for reduction and iodoacetamide for alkylation. Second-dimension separation was carried out on Criterion 8–16% Tris–HCl gels (Biorad 161–1394) using a Criterion Dodeca cell so that all eight gels in a gel set could be run in parallel. Gels were run at 200 V for 60 minutes and then fixed for 30 minutes in a solution of 10% methanol and 6% acetic acid. Gels were then stained with 1× SYPRO Ruby (Biorad, 170–3138) following the manufacturer's instructions. After staining, gels were destained for 1 hour in the same solution used for fixation. Preparative gels for spot cutting were prepared in the same way, except that 450 μg of protein was used per sample and gels were stained with Coomassie stain (Biorad, 161–0787) following the manufacturer's instructions.
Image analysis and quantification
Gels were imaged on a VersaDoc imager (Biorad), and the software program PDQuest (version 8.0, Biorad) was used to conduct the image analysis, spot identification and quantification. Gel images were digitally cropped along the outer edge to remove the molecular size marker and gel edges, and to standardize image size, but both pI fronts and the full size resolving area were retained.
The spot selection and gel matching were conducted in two stages. First, a separate master gel was created for each tree by auto-matching the three replicate gels using the 'create experiment' dialog boxes of PDQuest. For these tree master gels, spot detection and automated spot matching are conducted as part of the same procedure. For spot detection, we used the spot detection wizard with vertical streak reduction on, and selected the user-chosen reference spots for small spot, faint spot, and large spot cluster from the same region of the gel for all gels. In addition, we selected the local area regression method of normalization, which is proprietary but appears to be based on similar microarray normalization methods (see Quackenbush). For spot matching, we defined no groups, and spots were added to the master image only if present in two of three gels. Auto-matched spots were manually checked and corrected by dividing the gel area into 81 quadrants and marking by hand, in each quadrant, landmark spots present in all three gels. All matches were checked by hand against these landmark spots, and manual corrections to the spot detection and auto-matching were made, including removal of spots detected on the unresolved pI fronts and gel edges.
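The presence rule used to build each tree master gel (a spot is retained only if it is detected in at least two of the three replicate gels) amounts to a simple count-and-threshold filter. The Python sketch below illustrates it; the spot IDs and detection patterns are hypothetical and do not come from PDQuest output.

```python
# HYPOTHETICAL detection results: spot ID -> set of replicate gels (1-3) in which it was detected.
detections = {
    "S0101": {1, 2, 3},
    "S0102": {1, 3},
    "S0103": {2},
    "S0104": set(),
}

def tree_master_spots(detections, min_gels=2):
    """Return, sorted, the spot IDs detected in at least `min_gels` replicate gels."""
    return sorted(spot for spot, gels in detections.items() if len(gels) >= min_gels)

print(tree_master_spots(detections))
```

The same threshold logic, with tree masters in place of replicate gels, corresponds to the rule of adding spots present in two or more tree masters when the 'compare experiments' master is built.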
The second phase of image analysis was to create a 'compare experiments' analysis including all sixteen individual tree 'master gels' (CM02d was selected by the software as the base master gel). Automated matching was used to create the initial master file, and then all matches were manually checked. Additional spots were added to the master manually if they were present in two or more tree masters. We applied the same hand-check quality control as for the individual tree masters (manually checking and correcting each of the 81 quadrants using landmark spots) and the same normalization method (local area regression model option). Note that we did not apply an additional scaling factor, and the normalization method does not scale the data, so the final spot quantities retain their original units (counts).
Once the 'compare experiments' master gel was fully checked, a quantitative dataset was created. It was exported from PDQuest using the Report | Quantity Table Report function with the following settings: all matched spots checked, configuration set to 'individual gels', missing spots set to 'estimate', and saturated spots set to 'estimate'. Spot quantities were estimated so that analysis options requiring balanced, nonzero datasets could be used. PDQuest estimates a saturated spot by fitting a Gaussian to the spot edges only, extrapolating the peak, and calculating the estimated volume from the extrapolated value. It estimates a missing spot as the value of a minimum detectable spot. The resulting report contained quantities for all spots in the master gel across all 48 experimental gels. Graphical analysis of the quantities for each spot indicated that they were sufficiently normally distributed to proceed with modelling.
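PDQuest's saturated-spot estimator is proprietary, but the idea described above (fit a Gaussian to the unsaturated edges and extrapolate the peak) can be illustrated in one dimension. The Python sketch below uses the fact that the logarithm of a Gaussian is a quadratic, fits that quadratic by least squares to the unsaturated points only, and recovers the peak height and area. The synthetic profile and saturation level are made-up values, real spots are two-dimensional, and this is not PDQuest's algorithm.

```python
import math

def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]                  # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]    # augmented matrix
    for col in range(3):                                             # Gauss-Jordan elimination
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def estimate_saturated_peak(xs, intensities, saturation):
    """Fit ln(I) = c0 + c1*x + c2*x**2 to unsaturated points; return (peak height, area)."""
    flank = [(x, y) for x, y in zip(xs, intensities) if 0 < y < saturation]
    c0, c1, c2 = fit_quadratic([x for x, _ in flank], [math.log(y) for _, y in flank])
    if c2 >= 0:
        raise ValueError("flank points do not describe a peak")
    sigma = math.sqrt(-1.0 / (2.0 * c2))           # Gaussian width from curvature
    peak = math.exp(c0 - c1 * c1 / (4.0 * c2))     # value of the fitted parabola at its vertex
    return peak, peak * sigma * math.sqrt(2.0 * math.pi)

# Synthetic profile: true peak 1000 at x = 0, sigma = 2, clipped at 600 by the detector.
xs = [k * 0.5 for k in range(-16, 17)]
clipped = [min(1000.0 * math.exp(-x * x / 8.0), 600.0) for x in xs]
peak, area = estimate_saturated_peak(xs, clipped, saturation=600.0)
print(round(peak, 2), round(area, 2))
```

Because the clipped points are excluded, the fit sees only the flanks, and the extrapolated vertex recovers the peak that the detector could not record.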
To confirm that the unmatched spots unique to one tree were not artifacts of low spot intensity, or of variance in protein quantification that made such spots difficult to match, a random check of the intensity distribution of unmatched spots was conducted. Four trees were selected at random, and the spot intensity output for all spots on the tree masters of these four trees was examined, both visually, by inspecting spot intensities in the gels, and by comparing the intensity distribution of the unique spots with that of the matched spots. Comparisons were made both graphically and with summary statistics (mean, minimum, Q1, median, Q3, and maximum values).
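The summary-statistics comparison described above can be sketched with the Python standard library. In this sketch the intensity values are hypothetical, and quartiles are computed with `statistics.quantiles` using its default method, which may differ slightly from the quartile definition used in the original analysis.

```python
from statistics import mean, median, quantiles

def six_number_summary(values):
    """Mean, minimum, Q1, median, Q3 and maximum of a list of spot intensities."""
    q1, _, q3 = quantiles(values, n=4)   # quartile cut points (default 'exclusive' method)
    return {"mean": mean(values), "min": min(values), "Q1": q1,
            "median": median(values), "Q3": q3, "max": max(values)}

# HYPOTHETICAL intensities (counts) for one tree master: unique vs. matched spots.
unique_spots = [410.0, 520.0, 760.0, 980.0, 1300.0, 2100.0]
matched_spots = [150.0, 380.0, 640.0, 900.0, 1250.0, 1900.0, 3400.0, 5200.0]

for label, values in [("unique", unique_spots), ("matched", matched_spots)]:
    stats = six_number_summary(values)
    print(label, {k: round(v, 1) for k, v in stats.items()})
```

If the unique spots were mostly faint artifacts, their quartiles would be expected to sit well below those of the matched spots.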
Experimental design and statistical analysis
The experiment included 16 trees, each of which had a location (stand) code and a disease condition code. Three replicate extractions (for three separate gels) were run per tree, so each replicate comprised 16 extractions, one per tree. Each protein extraction was assigned first to one of four extraction sets (three healthy and one diseased tree per set), and the extraction sets were then paired to form gel sets (two gel sets per replicate). Samples in an extraction set were extracted in parallel, and gel sets were run in parallel for both IEF and polyacrylamide gel separation. Hence, each extraction had a full list of assigned factors: tree, stand, disease state, replicate, extraction set, and gel set. The full dataset included these factors and the quantity of each spot on the master gel for each of the 48 gels (see Additional file 2, full study design, and Additional file 3, full set of spot quantities). Spot quantities were calculated from the normalized gel images and, after evaluation, were judged to need no additional transformation. An ANOVA approach was used so that multiple effects and interactions could be included in the same model to better control error variance, and because the other biological effects (i.e. stand and its interaction with BBD) will be informative in selecting proteins for future study.
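The degrees of freedom reported for the two statistical models in this section follow directly from this design. The Python sketch below rebuilds the extraction-set and gel-set layout and recovers those counts. The tree IDs, the in-order assignment of trees to extraction sets, and the figure of 14 occupied stand × disease cells (8 stands, inferred from the reported interaction and error df) are assumptions for illustration.

```python
# HYPOTHETICAL tree IDs, assigned to extraction sets in order; the study instead
# assigned three healthy and one diseased tree to each extraction set.
N_TREES, N_REPLICATES = 16, 3
TREES_PER_EXTRACTION_SET, EXTRACTION_SETS_PER_GEL_SET = 4, 2

trees = [f"T{i:02d}" for i in range(1, N_TREES + 1)]
rows = []
for rep in range(1, N_REPLICATES + 1):
    for idx, tree in enumerate(trees):
        ext = idx // TREES_PER_EXTRACTION_SET       # extraction set 0-3 within this replicate
        gel = ext // EXTRACTION_SETS_PER_GEL_SET     # gel set 0-1 within this replicate
        rows.append({"tree": tree, "replicate": rep,
                     "extraction_set": (rep, ext), "gel_set": (rep, gel)})

n_obs = len(rows)
n_extraction_sets = len({r["extraction_set"] for r in rows})
n_gel_sets = len({r["gel_set"] for r in rows})

# Phase 1 model: tree + extraction set nested within gel set.
df_tree = N_TREES - 1
df_technical = n_extraction_sets - 1
df_error_1 = (n_obs - 1) - df_tree - df_technical

# Phase 2 model: stand, BBD and stand x BBD.
# ASSUMPTION: 8 stands and 14 occupied stand x disease cells.
n_stands, n_disease_states, n_cells = 8, 2, 14
df_stand = n_stands - 1
df_bbd = n_disease_states - 1
df_interaction = n_cells - n_stands - n_disease_states + 1   # connected design with empty cells
df_error_2 = n_obs - n_cells

print(n_obs, n_extraction_sets, n_gel_sets)
print(df_tree, df_technical, df_error_1)
print(df_bbd, df_stand, df_interaction, df_error_2)
```

The interaction df uses the occupied-cell count rather than (stands − 1) × (disease states − 1), because not every stand contributes both healthy and diseased trees.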
Statistical analysis was generated using SAS® software version 9.2 of the SAS system for Windows, copyright 2002–2008 SAS Institute Inc. SAS and all other SAS Institute Inc. products or service names are registered trademarks or trademarks of SAS Institute, Cary, NC, USA.
A series of tests was used to categorize each protein spot and arrive at a list of candidate spots for further analysis (summarized in Figure). The first phase of the analysis sought to exclude constitutive proteins that did not differ between any trees, and to assess the significance of the technical factors. The following model, in which $y_{ijk(l)}$ denotes the quantity of a given spot, was fit using the General Linear Model procedure of SAS:

$$y_{ijk(l)} = \mu + t_j + g_{k(l)} + \varepsilon_{ijk(l)}$$
where $\mu$ is the overall mean, $t_j$ is the effect of the jth tree (tree effect), $g_{k(l)}$ is the effect of the lth extraction set nested within the kth gel set (technical effect), and $\varepsilon_{ijk(l)}$ is a random error term. The model was fit for each spot, and tests of significant effects were computed using type III sums of squares. Model fit was evaluated by verifying that the overall model had a significant F value (p < 0.05) and by examining standardized residuals (especially by plotting them against the levels of each effect). For each model, careful assessment of residual plots confirmed that the assumptions about error distribution and equal variances were sufficiently met (i.e. the spot quantities did not deviate from the assumed normal distribution enough to warrant additional transformation). Degrees of freedom (df) were the same for each spot model: the tree effect had 15 df, the technical effect 11 df, and error 21 df (total df = 47). For the technical effect, a Bonferroni adjustment was used to determine the significance level, but for the tree effect p ≤ 0.05 was considered significant. This permissive cutoff is appropriate because the goal of the test was to eliminate constitutive proteins (and to reduce the size of the dataset to ease computation), and because at this point in the analysis a false acceptance of the null hypothesis, which would discard a spot that truly differs between trees, is more problematic than a false rejection, which merely carries a constitutive spot forward to the next phase. Spots without a significant tree effect were dropped from the dataset. Technical effects were not significant for all but six spots; those six spots were dropped from the dataset, and the technical effects were excluded from further analysis.
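The first-phase screening rule can be expressed as a per-spot decision on two p-values. In the Python sketch below, per-spot p-values are assumed to have already been obtained from the fitted model (for example, exported from SAS); the spot IDs and p-values are hypothetical, and taking the Bonferroni family to be all spots tested is an assumption, since the family size is not stated.

```python
def phase1_screen(pvalues, alpha=0.05):
    """Classify spots from phase-1 p-values.

    pvalues: dict mapping spot ID -> (p_tree, p_technical).
    Returns (kept, constitutive, technical) lists of spot IDs.
    """
    bonferroni = alpha / len(pvalues)   # ASSUMPTION: family = all spots tested
    kept, constitutive, technical = [], [], []
    for spot, (p_tree, p_tech) in sorted(pvalues.items()):
        if p_tree > alpha:              # no tree differs: treated as constitutive, dropped
            constitutive.append(spot)
        elif p_tech < bonferroni:       # significant technical effect: dropped
            technical.append(spot)
        else:
            kept.append(spot)
    return kept, constitutive, technical

# HYPOTHETICAL p-values for four spots.
example = {"S01": (0.001, 0.50), "S02": (0.20, 0.30), "S03": (0.01, 1e-6), "S04": (0.05, 0.04)}
kept, constitutive, technical = phase1_screen(example)
print("kept:", kept, "constitutive:", constitutive, "technical:", technical)
```

Pairing a permissive cutoff on the tree effect with a strict one on the technical effect keeps a spot in the analysis unless there is strong evidence against it.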
The second phase of the analysis was designed to determine how spots differed between trees. Technical factors were dropped, and stand and disease state factors were added (there were insufficient degrees of freedom to analyse technical and biological factors in the same model, partly because of the imbalance in the number of trees selected per stand). The following model, in which $y_{ijk}$ denotes the quantity of a given spot, was fit using the General Linear Model procedure of SAS:

$$y_{ijk} = \mu + s_j + d_k + (sd)_{jk} + \varepsilon_{ijk}$$
where $\mu$ is the overall mean, $s_j$ is the effect of the jth stand (stand effect), $d_k$ is the effect of the kth disease state (BBD effect), $(sd)_{jk}$ is the interaction effect of stand and disease state, and $\varepsilon_{ijk}$ is a random error term. Model fit was evaluated by verifying that the overall model had a significant F value (p < 0.05) and by examining standardized residuals (especially by plotting them against the levels of each effect). For each model, careful assessment of residual plots confirmed that the assumptions about error distribution and equal variances were sufficiently met (i.e. the spot quantities did not deviate from the assumed normal distribution enough to warrant additional transformation). Degrees of freedom were the same for each spot model: the BBD effect had 1 df, the stand effect 7 df, the BBD × stand interaction 5 df, and error 34 df (total df = 47). The model was fit for each spot, and tests of significant effects were computed using type III sums of squares. The interaction and stand effects were tested using the conservative Bonferroni correction. For the BBD effect, p-values from the tests were output to a new dataset, and the package 'qvalue' (version 1.1) for the statistical program R (R-2.4.0-win32) was used to compute the q-value associated with each test. Significance was determined using q-values, controlling the false discovery rate at 5%. The false discovery rate is the expected proportion of rejected null hypotheses that were rejected in error (false positives), as opposed to the family-wise error rate, and controlling it is an accepted, typical approach for large genomic and proteomic datasets.
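For readers unfamiliar with q-values, the Python sketch below shows how they can be derived from a vector of p-values. It is a simplified illustration, not the 'qvalue' package's implementation. With `lam=None` it sets the null proportion π0 to 1, which gives Benjamini–Hochberg adjusted p-values. With a numeric `lam` it uses a single-λ estimate of π0, whereas the package can estimate π0 over a grid of λ values. The p-values are hypothetical.

```python
def qvalues(pvals, lam=None):
    """Simplified q-values.

    lam=None: pi0 = 1, giving Benjamini-Hochberg adjusted p-values.
    numeric lam: pi0 = #{p > lam} / (m * (1 - lam)), capped at 1 (single-lambda estimate).
    """
    m = len(pvals)
    pi0 = 1.0 if lam is None else min(1.0, sum(p > lam for p in pvals) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: pvals[i])   # indices by increasing p-value
    q = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):                       # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pi0 * m * pvals[i] / rank)
        q[i] = running
    return q

# HYPOTHETICAL BBD-effect p-values for four spots.
p = [0.01, 0.04, 0.03, 0.20]
q = qvalues(p)
significant = [i for i, qv in enumerate(q) if qv <= 0.05]
print([round(v, 4) for v in q], significant)
```

The running minimum enforces monotonicity, so a spot never receives a smaller q-value than a spot with a smaller p-value.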
Spot selection and cutting
All spots with a significant disease state effect were considered for spot cutting and sequencing. Spot quantities were evaluated in all trees, and the best-ranked trees were those with the most BBD-significant spots at the highest spot densities. The two top-ranked trees were used for preparative gels and spot cutting. All BBD-significant spots in the two selected trees were evaluated on the gel images to determine whether each spot could be excised cleanly and was sufficiently intense to support sequencing. Spots were excised from the preparative gels at the Plant Microbe Genomics Facility (PMGF) using the Protean 2-D spot cutter (Biorad Laboratories). Several constitutive spots were also selected as sequencing reference spots. High-resolution pre-cut and post-cut images of preparative gels were captured on the VersaDoc imager and evaluated for quality control. Only protein spots that were cleanly excised and showed no evidence of contamination from adjacent spots were sent for MS/MS analysis.
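The text does not give an explicit scoring formula for ranking trees, so the Python sketch below shows only one hypothetical way to operationalize "most BBD-significant spots at the highest spot densities". It counts the significant spots whose quantity reaches a chosen threshold, breaking ties by the summed quantity of those spots. The threshold, tree IDs, spot IDs, and quantities are all invented for illustration.

```python
def rank_trees(quantities, significant_spots, min_quantity):
    """Rank trees, best first.

    quantities: dict tree -> dict spot -> spot quantity (HYPOTHETICAL data).
    Score = (number of significant spots with quantity >= min_quantity,
             summed quantity of those spots).
    """
    scores = {}
    for tree, spots in quantities.items():
        strong = [spots[s] for s in significant_spots if spots.get(s, 0.0) >= min_quantity]
        scores[tree] = (len(strong), sum(strong))
    return sorted(scores, key=lambda t: scores[t], reverse=True)

quantities = {
    "T01": {"A": 900.0, "B": 50.0, "C": 700.0},
    "T02": {"A": 800.0, "B": 600.0, "C": 650.0},
    "T03": {"A": 100.0, "B": 900.0},
}
ranking = rank_trees(quantities, significant_spots={"A", "B", "C"}, min_quantity=500.0)
print(ranking[:2])   # the two top-ranked trees
```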
Mass spectrometry was performed at the OSU Campus Chemical Instrumentation Center. Gel pieces were washed twice in 50% methanol/5% acetic acid for one hour each, then dehydrated in acetonitrile. Cysteines were reduced by rehydrating and incubating the gel pieces in dithiothreitol (DTT) solution (5 mg/mL in 100 mM ammonium bicarbonate) for 30 minutes. Cysteines were alkylated by adding 15 mg/mL iodoacetamide in 100 mM ammonium bicarbonate and incubating in the dark for 30 minutes. The gel cores were washed again with cycles of acetonitrile and 100 mM ammonium bicarbonate in 5-minute increments, then dried under vacuum. Protein was digested overnight with sequencing-grade modified trypsin (Promega, Madison, WI) in Multiscreen Solvinert filter plates (Millipore, Bedford, MA). The peptides were extracted from the polyacrylamide by washing several times with 50% acetonitrile/5% formic acid, pooled, and concentrated under vacuum to ~30 μL.
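The reduction and alkylation reagents are given as mass concentrations. The short Python calculation below converts them to molarity, the form in which many in-gel digestion protocols state them. The molar masses (DTT 154.25 g/mol, iodoacetamide 184.96 g/mol) are standard values and are not taken from this study.

```python
# Standard molar masses (g/mol); not stated in the source protocol.
MOLAR_MASS_G_PER_MOL = {"dithiothreitol (DTT)": 154.25, "iodoacetamide": 184.96}

def mg_per_ml_to_mM(conc_mg_per_ml, molar_mass):
    """1 mg/mL = 1 g/L, so mol/L = (g/L) / (g/mol); multiply by 1000 for mM."""
    return conc_mg_per_ml / molar_mass * 1000.0

for name, conc in [("dithiothreitol (DTT)", 5.0), ("iodoacetamide", 15.0)]:
    mM = mg_per_ml_to_mM(conc, MOLAR_MASS_G_PER_MOL[name])
    print(f"{name}: {conc} mg/mL = {mM:.1f} mM")
```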
Capillary liquid chromatography–nanospray tandem mass spectrometry (Nano-LC/MS/MS) was performed on a Thermo Finnigan LTQ mass spectrometer equipped with a nanospray source operated in positive ion mode. The LC system was an UltiMate™ 3000 system from Dionex (Sunnyvale, CA). Five microliters of each sample were first injected onto the micro-precolumn cartridge (Dionex, Sunnyvale, CA) and washed with 50 mM acetic acid. The injector port was then switched to inject, and the peptides were eluted off the trap onto the column. A 5 cm, 75 μm ID ProteoPep II C18 column (New Objective, Inc., Woburn, MA) packed directly in the nanospray tip was used for chromatographic separations. Peptides were eluted directly off the column into the LTQ system using a gradient of 2–80% acetonitrile over 45 minutes at a flow rate of 300 nL/min; the total run time was 65 minutes. MS/MS spectra were acquired with a spray voltage of 3 kV and a capillary temperature of 200°C. The analysis was programmed for a full scan recorded over m/z 350–2000, followed by MS/MS scans of the ten most abundant peaks in the spectrum, acquired in consecutive instrument scans, to generate product ion spectra for determining amino acid sequence. The CID fragmentation energy was set to 35%. Dynamic exclusion was enabled with a repeat duration of 30 s, an exclusion duration of 350 s, a low mass width of 0.5 Da, and a high mass width of 1.50 Da.
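The data-dependent acquisition settings above (top-ten precursor selection with dynamic exclusion) can be illustrated with a simplified simulation. The Python sketch below is not the LTQ's acquisition logic. It assumes each precursor is excluded after a single selection (a repeat count of 1), ignores repeat duration, charge state, and intensity thresholds, and uses made-up scans with `top_n=1` so that the exclusion behaviour is easy to follow.

```python
def select_precursors(scans, top_n=10, excl_duration=350.0, low_width=0.5, high_width=1.5):
    """Simplified data-dependent acquisition with dynamic exclusion.

    scans: list of (time_s, [(mz, intensity), ...]) full scans.
    Each selected precursor excludes m/z in [mz - low_width, mz + high_width]
    for excl_duration seconds (ASSUMES a repeat count of 1).
    Returns a list of (time_s, mz) MS/MS events.
    """
    exclusions = []   # (expires_at, low_mz, high_mz)
    events = []
    for t, peaks in scans:
        exclusions = [e for e in exclusions if e[0] > t]    # drop expired entries
        picked = 0
        for mz, _ in sorted(peaks, key=lambda p: p[1], reverse=True):
            if picked == top_n:
                break
            if any(lo <= mz <= hi for _, lo, hi in exclusions):
                continue                                     # still on the exclusion list
            events.append((t, mz))
            exclusions.append((t + excl_duration, mz - low_width, mz + high_width))
            picked += 1
    return events

# HYPOTHETICAL full scans: (time in s, [(m/z, intensity), ...]).
scans = [
    (0.0, [(500.0, 1e6), (600.0, 5e5)]),
    (100.0, [(500.2, 1e6), (700.0, 1e4)]),
    (400.0, [(500.0, 1e6)]),
]
events = select_precursors(scans, top_n=1)
print(events)
```

At 100 s the intense ion at m/z 500.2 falls inside the exclusion window of the earlier selection, so a weaker ion is sampled instead; by 400 s the 350 s exclusion has expired.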
Sequence data processing and matching
MS/MS data were searched using Mascot Daemon (version 2.2.1, Matrix Science, Boston, MA) against several databases (detailed below), because woody plant and bark tissue sequences are poorly represented in public databases. The search parameters were: precursor ion mass accuracy = 2.0, fragment mass accuracy = 0.5 Da, variable modifications = methionine oxidation and carbamidomethyl cysteine, and missed cleavages = 2–4. Searching against the full SwissProt database version 54.1 (283454 sequences; 104030551 residues) was unproductive (only procedural peptides were identified; data not shown). A second search was conducted restricting the search set to the taxon Viridiplantae (version, sequences, residues). The Fagaceae genomics project has constructed EST libraries from American beech, red oak (Quercus rubra L.), white oak (Quercus alba L.), American chestnut (Castanea dentata (Marsh.) Borkh.) and Chinese chestnut (Castanea mollissima Blume), including libraries constructed from both healthy and diseased stem tissues. Twenty-four individual EST libraries (#13696, 8-21-2009, 10691208 sequences; 751178460 residues) were compiled into a custom database and searched. Peptide matches were checked manually, and only identifications with a Mascot score of 50 or higher and two or more unique peptides of five or more residues were accepted. For EST matches, peptides were matched to ESTs (by the same criteria), and the ESTs were then searched against GenBank (BLASTP, default settings) to obtain protein identifications. Analysis data are available in the PRIDE database [74, 75] under accession number 17706. The data were converted using the PRIDE Converter.
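The acceptance criteria applied to Mascot identifications can be written as a small filter. In this Python sketch the `ProteinHit` structure, its field names, and the example accessions and peptide sequences are hypothetical, and "unique peptides" is interpreted as distinct peptide sequences. Mascot's own report format is not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class ProteinHit:
    """HYPOTHETICAL container for one Mascot protein identification."""
    accession: str
    mascot_score: float
    peptides: list = field(default_factory=list)   # assigned peptide sequences

def accept(hit, min_score=50.0, min_unique=2, min_length=5):
    """Score >= 50 and at least two distinct peptides of >= 5 residues."""
    distinct_long = {p for p in hit.peptides if len(p) >= min_length}
    return hit.mascot_score >= min_score and len(distinct_long) >= min_unique

hits = [
    ProteinHit("hitA", 72.0, ["LVEGDK", "AGFAGDDAPR", "AGFAGDDAPR"]),  # passes
    ProteinHit("hitB", 45.0, ["LVEGDK", "AGFAGDDAPR"]),                # score too low
    ProteinHit("hitC", 88.0, ["LVEGDK", "AGK"]),                       # one peptide too short
]
accepted = [h.accession for h in hits if accept(h)]
print(accepted)
```

Repeated observations of the same peptide count once, which is why `hitA` passes with exactly two distinct qualifying peptides.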