A label-free differential quantitative mass spectrometry method for the characterization and identification of protein changes during citrus fruit development

Background Citrus is one of the most important and widely grown commodity fruit crops. In this study a label-free LC-MS/MS-based shotgun proteomics approach was taken to explore three main stages of citrus fruit development. These approaches were used to identify and evaluate changes occurring in juice sac cells in various metabolic pathways affecting citrus fruit development and quality. Results Protein changes in citrus juice sac cells were identified and quantified using label-free shotgun methodologies. Two alternative methods, differential mass spectrometry (dMS) and spectral counting (SC), were used to analyze protein changes occurring during early and late stages of fruit development. Both methods were compared in order to develop a proteomics workflow that could be used in a non-model plant lacking a sequenced genome. To overcome the bioinformatics limitations of EST databases from species that lack a fully sequenced genome, we established iCitrus, a comprehensive sequence database created by merging three major sources of sequences (HarvEST:citrus, NCBI/citrus/unigenes, NCBI/citrus/proteins) and improving the annotation of existing unigenes. iCitrus provided a useful bioinformatics tool for the high-throughput identification of citrus proteins. We identified approximately 1500 citrus proteins expressed in fruit juice sac cells and quantified the changes in their expression during fruit development. Both dMS and SC provided significant information on protein changes, with dMS providing higher accuracy. Conclusion Our data support the complementary use of dMS and SC for label-free comparative proteomics, broadening the identification spectrum and strengthening the identification of trends in protein expression changes during the particular processes being compared.


Background
Fruit ripening and development have been studied using transcriptomic, proteomic and metabolomic approaches [1][2][3][4][5][6][7][8]. Quantitative proteomics provides an alternative approach for studies of fruit development. In the last few years, quantitative proteomics has been widely applied to the quantification of complex biological samples [9][10][11]. The most commonly used approach for comparative proteomic analysis of plant tissues is two-dimensional gel electrophoresis (2DE). This method has limited sensitivity and a low dynamic range, is inefficient for insoluble proteins and for proteins with very high or very low molecular mass, and has limited reproducibility [12], although reproducibility has been improved with the use of difference gel electrophoresis (DIGE) [13,14]. Alternatives to 2DE gels are gel-free LC-MS/MS-based shotgun proteomics methods [15][16][17][18], in which quantification is performed using the mass spectrometer data. Some success in protein quantification has been achieved using stable isotope labeling with 15N, 13C or 2H, SILAC [19], ICAT [20,21], iTRAQ [22] and 18O incorporation [23]. One of the main limitations of these methods is that full labeling of the proteins is rarely achieved and that different peptides incorporate the label at different rates, which complicates data analysis. More recently, label-free methods for comparative proteomic analysis have emerged [9][10][11][24].
Label-free proteomics quantifies peptides using spectral characteristics such as retention time, m/z ratio and peak intensity, either by directly comparing the mass spectrometric signal intensity of any given peptide (differential mass spectrometry, dMS) or by counting the number of acquired tandem mass spectra matching a specific peptide as an indicator of its abundance in a given sample (spectral counting, SC) [25,26]. dMS compares the chromatographic peaks of the precursor ions of peptides belonging to a specific protein, extracted from an LC-MS/MS run [27][28][29][30][31][32]. This approach is based on the observation that the chromatographic peak area of a peptide is, in most cases, proportional to its concentration in the sample investigated [10,27-29]. The peak intensity of every individual spectrum is determined, and comparing spectra across multiple LC-MS runs provides quantitative measurements of thousands of peptides. From this large dataset, a list of differential peptides can be selected for subsequent fragmentation by LC-MS/MS for sequence determination and protein identification. Various software tools have been developed to match these spectra across runs according to their retention time and precursor m/z. Once peptides are matched, expression ratios are calculated from the peak areas of the matched peptides. SC counts and compares the number of spectra identifying peptides of a given protein to assess its relative abundance; spectral counts have also been found to correlate well with protein abundance [15,30].
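To make the difference between the two readouts concrete, the minimal Python sketch below computes a dMS-style ratio from summed precursor peak areas and an SC-style ratio from counts of assigned MS/MS spectra for one protein in two samples. The record layouts, the function names and the pseudocount of 0.5 (added so that a protein with no spectra in one sample does not cause a division by zero) are illustrative assumptions; summing peak areas is a simplification and not the weighting used by any particular software package.

```python
from collections import defaultdict


def dms_ratio(peak_areas, protein, num, den):
    """dMS-style ratio: summed precursor peak areas of one protein's peptides.

    peak_areas: iterable of (protein, peptide, sample, area) tuples
    (a hypothetical record layout chosen for this sketch).
    """
    totals = defaultdict(float)
    for prot, _peptide, sample, area in peak_areas:
        if prot == protein:
            totals[sample] += area
    return totals[num] / totals[den]


def sc_ratio(spectra, protein, num, den, pseudocount=0.5):
    """SC-style ratio: number of MS/MS spectra assigned to one protein.

    spectra: iterable of (protein, sample) tuples, one per assigned spectrum.
    The pseudocount is an assumption that avoids division by zero.
    """
    counts = defaultdict(int)
    for prot, sample in spectra:
        if prot == protein:
            counts[sample] += 1
    return (counts[num] + pseudocount) / (counts[den] + pseudocount)


# Toy data for one protein, "P1", measured in samples "S1" and "S2".
areas = [("P1", "pepA", "S2", 4.0e6), ("P1", "pepB", "S2", 2.0e6),
         ("P1", "pepA", "S1", 1.5e6), ("P1", "pepB", "S1", 1.5e6)]
spectra = [("P1", "S2")] * 9 + [("P1", "S1")] * 4

print(dms_ratio(areas, "P1", "S2", "S1"))  # 2.0
print(sc_ratio(spectra, "P1", "S2", "S1"))
```

The two ratios need not agree exactly: the dMS ratio reflects integrated signal, whereas the SC ratio depends on how many spectra happened to be sampled, which is why the two methods are compared rather than used interchangeably.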
Label-free comparative proteomics is a relatively new approach that has been used successfully in different systems (human, yeast, fly, etc.) [39][40][41][42], but its application in plants has been scarce [26,43]. Using LC-MS/MS, we recently analyzed soluble and enriched membrane fractions of mature citrus fruit to identify the proteome of fruit juice cells, and classified these proteins according to their putative function in known biosynthetic pathways [18]. Here, we describe a method for the use of label-free LC-MS/MS-based shotgun differential proteomics for the study of fruit development in Citrus, a non-model plant lacking a fully sequenced genome. The method combines the use of dMS and SC with the creation of iCitrus, a citrus fruit-specific database and interface, for the identification of the protein changes occurring during the development of citrus fruits.

Citrus protein annotation using iCitrus
Although the citrus genome has not yet been fully sequenced, a comprehensive citrus EST database has been developed in the past few years [44]. Several groups have contributed to EST sequencing efforts using different species, including C. sinensis (sweet orange), C. clementina (Clementine mandarin), C. paradisi (grapefruit), Poncirus trifoliata, and hybrids (C. sinensis × Poncirus trifoliata, Carrizo citrange). A wide range of libraries derived from multiple reproductive and vegetative tissues at different developmental stages, and from different treatments or stresses, were used to create a relatively large database. To date, there are 582,334 citrus sequences in the National Center for Biotechnology Information (NCBI) EST database. Even with this comprehensive sequence dataset in hand, many challenges had to be addressed before the databases could be used for proteomic research. Some of these challenges arose from the nature of EST databases: over-representation of highly expressed genes (and under-representation of weakly expressed genes), redundancy, incomplete sequences, poor annotation, etc. For proteomics specifically, a highly redundant database with many similar sequences would artificially decrease the significance of potential "hits". On the other hand, a strong reduction in sequence redundancy based on sequence similarity rather than identity would significantly reduce the number of possible hits. To address some of these problems, iCitrus http://citrus.bioinformatics.ucdavis.edu/ was created (Figure 1). The iCitrus dataset was produced by excluding sequences shorter than 50 amino acids between stop codons and removing redundant sequences with 100% identity to another, longer sequence in the dataset. Similar sequences sharing less than 100% identity were kept for spectra search.
Keeping sequences sharing high similarity (97-99% identity) was necessary because the citrus EST database consists of sequences originating from a wide range of citrus cultivars and species. Minor nucleotide differences between similar ESTs could lead to differences in amino acid sequences, and therefore to differences in the virtual spectra derived from the database during mass spectra search. Keeping these sequences broadened our chances of identifying proteins in the database, whereas discarding them could lead to misidentification or no identification of proteins. A disadvantage of this approach was the redundancy of accessions, which we dealt with by manually aligning the sequences of the proteins of interest. In a few cases where accessions shared high similarity, the redundancy resulted in two or more ESTs each being identified by only one peptide. If these ESTs belong to the same unigene, their peptides together (two or more) identify the same specific protein.
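The dataset construction described in the text and in the Figure 1 caption (six-frame translation, splitting at stop codons, a 50-residue minimum, and removal of sequences identical to a longer one) can be sketched in a few lines of Python. This is an illustrative approximation, not the pipeline used to build iCitrus: CD-HIT was used for the 100% identity step, and this sketch approximates that step by dropping any sequence that occurs as an exact substring of a longer kept sequence. All function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(nt):
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    return "".join(CODONS.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def six_frame_fragments(nt, min_len=50):
    """Translate all six frames, split at stops, keep fragments >= min_len residues."""
    nt = nt.upper()
    reverse_complement = nt.translate(COMPLEMENT)[::-1]
    fragments = []
    for strand in (nt, reverse_complement):
        for frame in range(3):
            for piece in translate(strand[frame:]).split("*"):
                if len(piece) >= min_len:
                    fragments.append(piece)
    return fragments


def remove_contained(proteins):
    """Approximate 100%-identity clustering: drop exact duplicates and any
    sequence that is an exact substring of a longer sequence already kept."""
    kept = []
    for seq in sorted(set(proteins), key=len, reverse=True):
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept


# Toy input: one EST-like nucleotide sequence plus two protein entries.
est = "ATG" + "GCT" * 60 + "TAA"
proteins = six_frame_fragments(est) + ["M" + "A" * 30, "QWERTY" * 10]
print(len(remove_contained(proteins)))  # 6
```

In the toy run, the fragment of 60 alanines (from one reverse-strand frame) and the short "M" + 30 alanines entry are both removed because each is contained in the 61-residue forward-frame translation, while distinct sequences are kept. Note that this substring rule never merges sequences that differ by even one residue, which matches the decision described above to keep 97-99% identical sequences.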
To date, there are 62,415 sequences in the iCitrus collected database: 41,018 from the HarvEST:Citrus assembly http://harvest.ucr.edu/, 20,949 from NCBI's unigenes (C. sinensis and C. clementina), and 448 from NCBI's proteins (C. sinensis and C. clementina) (Figure 1). The iCitrus dataset in FASTA format and a description of the iCitrus interface structure are provided in Additional File 1, and a conversion table of HarvEST:Citrus, NCBI/Citrus/ESTs and NCBI/Citrus/Proteins accessions into iCitrus accessions is provided in Additional File 2: Table S1.
Label-free LC-MS/MS based shotgun proteomics, differential Mass-Spec and Spectral Counting
To achieve better identification of differentially expressed proteins during fruit development and to decrease sample complexity, the juice sac cells were fractionated into soluble and membrane-bound proteins (Figure 2). Two alternative strategies for label-free mass spectrometric analysis were used: peptide ion intensity measurements and spectral counting. Peptide ion intensity measurement, also referred to as differential Mass Spec (dMS), integrates the chromatographic peak area, which is proportional to the concentration of the peptide in the sample (Additional File 3: Figure S1). Determining the area of each extracted peptide ion chromatogram (mass/retention time pair) and comparing the areas between multiple LC-MS runs of different samples can provide a comprehensive quantification of thousands of peptides within samples. The alternative strategy, Spectral Counting (SC), counts the number of MS/MS scans that are attributed to the same peptide ion. The frequency of these MS/MS scans correlates with the abundance of a given peptide in the sample.
Figure 1 iCitrus database. Three major sources were used in creating the iCitrus dataset: UC Riverside HarvEST:citrus (C46 assembly), NCBI/citrus/unigenes and NCBI/citrus/proteins (see text). The first two datasets were translated in all six reading frames and split at stop codons, and sequences shorter than 50 amino acids were removed. These were combined with the NCBI protein sequences, and all three protein sequence sets were then clustered at 100% identity using CD-HIT http://bioinformatics.ljcrf.edu/cd-hi/, meaning that sequences that aligned with 100% identity to a longer sequence in the combined set were removed. All remaining sequences were then searched with BLAST against TAIR proteins and, separately, against the subset of NCBI's nr database belonging to taxa within Viridiplantae, to collect GO term and descriptive annotations for the clustered sequences.
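The peak-area step of dMS can be illustrated with a small sketch: build an extracted ion chromatogram (XIC) for one precursor m/z by summing the MS1 intensity within a ppm tolerance in each scan, then integrate it over retention time with the trapezoidal rule. The 10 ppm tolerance, the scan layout and the function names are assumptions for illustration; dMS software additionally performs background filtering, peak detection and chromatographic alignment across runs, none of which is shown here.

```python
def extracted_ion_chromatogram(scans, target_mz, ppm=10.0, rt_window=None):
    """Return [(rt, intensity)] for one precursor m/z.

    scans: list of (retention_time, [(mz, intensity), ...]) MS1 scans sorted by rt.
    rt_window: optional (start, end) retention-time limits.
    """
    tolerance = target_mz * ppm * 1e-6
    xic = []
    for rt, peaks in scans:
        if rt_window is not None and not (rt_window[0] <= rt <= rt_window[1]):
            continue
        signal = sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tolerance)
        xic.append((rt, signal))
    return xic


def peak_area(xic):
    """Trapezoidal integration of intensity over retention time."""
    return sum((rt2 - rt1) * (i1 + i2) / 2.0
               for (rt1, i1), (rt2, i2) in zip(xic, xic[1:]))


scans = [
    (10.0, [(500.250, 0.0)]),
    (10.1, [(500.251, 100.0), (600.000, 999.0)]),  # 600.0 lies outside the tolerance
    (10.2, [(500.249, 100.0)]),
    (10.3, [(500.250, 0.0)]),
]
xic = extracted_ion_chromatogram(scans, target_mz=500.25)
area = peak_area(xic)
```

Under these assumptions, the relative quantity of a peptide between two runs is the ratio of the areas obtained for the same (m/z, retention time) feature in each run, once the runs have been aligned.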
In this study we used the dMS strategy to analyze and identify differential protein changes during fruit development in citrus juice sac cells (Figure 2), and SC as an alternative strategy to validate our findings. Proteins were identified by searching MS spectra against the iCitrus database and annotated using the iCitrus interface.
Label-free relative quantitative analysis detects, selects and compares spectra that differ significantly between samples (by either dMS or SC). However, many of the spectra selected as differing in intensity or abundance were found not to be statistically different between the developmental stages compared; these are discussed below.
Using dMS, 1494 and 1364 proteins were identified by at least two peptides in the comparisons of Stage II (55 mm fruit diameter) versus early Stage II (35 mm fruit diameter) and of Stage III (80 mm fruit diameter) versus Stage II, respectively (Figure 3). A large number of the identified proteins were down-regulated during the earlier stages of development and up-regulated during the later stages (Figure 3a).
Accessions identified by SC and dMS were compared using both the iCitrus accessions and their Arabidopsis homologs (Figure 4).
These comparisons were made to minimize possible redundancies among identified citrus ESTs and among conserved citrus protein accessions that might originate from different unigenes belonging to the same gene family. Once again, aconitase provides a good example of database redundancy: the accessions 45840 and 47264, sharing 99% amino acid similarity, are essentially the same unigene originating from two different citrus species (Table 1). These accessions shared little similarity with 39802; sequence alignment showed that their sequences did not overlap, but both shared high homology with the other members, namely 55395 and 43680. Notably, some proteins did not share homology with any Arabidopsis proteins, which supports the use of citrus accessions for comparisons. In some cases these accessions could be assembled into one contig, while in other cases they could not. Two possibilities arose: either these EST sequences originated from the same gene but did not overlap and therefore could not be assembled, or they originated from different genes belonging to the same family.
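Comparing identifications at the level of Arabidopsis homologs rather than iCitrus accessions can be sketched as a set operation: redundant citrus accessions that map to the same homolog collapse into one entry before the dMS and SC lists are compared. The accession IDs and the mapping below are hypothetical toy values, and keeping accessions without a homolog under their own ID is an assumption that mirrors the observation that some citrus proteins had no Arabidopsis homolog.

```python
def collapse_to_homologs(accessions, homolog_of):
    """Map citrus accessions to Arabidopsis homologs; unmapped ones keep their own ID."""
    return {homolog_of.get(acc, acc) for acc in accessions}


def compare_lists(first, second):
    """Overlap between two sets of identifications."""
    return {"both": first & second,
            "only_first": first - second,
            "only_second": second - first}


# Hypothetical accessions: a1, a2 and a4 are redundant entries of one gene.
homolog_of = {"a1": "AT_GENE1", "a2": "AT_GENE1", "a4": "AT_GENE1", "a3": "AT_GENE2"}
dms_hits = {"a1", "a2", "a3", "a5"}  # a5 has no Arabidopsis homolog
sc_hits = {"a2", "a4", "a5"}

by_accession = compare_lists(dms_hits, sc_hits)
by_homolog = compare_lists(collapse_to_homologs(dms_hits, homolog_of),
                           collapse_to_homologs(sc_hits, homolog_of))
```

At the accession level, a1, a3 and a4 appear to be identified by only one method; after collapsing, the only remaining difference is AT_GENE2, illustrating why method-specific identifications shrink when Arabidopsis homologs are used.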
Most of the proteins identified by both dMS and SC also showed similar expression patterns (Figure 5). Of the 452 proteins identified by both methods in the comparison between fruits at Stage II and fruits at early Stage II, 308 proteins (69%) had the same expression pattern and were therefore referred to as "matching" (Figure 5a). In the comparison between fruits at Stage III and Stage II, 51% of the shared proteins displayed a similar expression pattern and most of the rest fell into the "weak matching" category (Figure 5a). "Weak matching" refers to proteins showing significant expression changes with one method but no significant expression differences with the other (Figure 5b-d). Only a few proteins (1 and 16, respectively) showed contradicting expression patterns in the comparisons between Stage II and early Stage II and between Stage III and Stage II. The high percentage of proteins shared by dMS and SC that show the same expression pattern also serves as a strong validation of the measured protein expression changes.
Table 1 Identification of aconitase by dMS and SC. The column "direction" under SC represents up-regulated = 1, no change = 0, down-regulated = -1. Aconitase iCitrus accession amino acid sequence similarities are also given. All iCitrus accessions for aconitase that were identified by both methods were homologous to the Arabidopsis gene At2g05710.
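The "matching", "weak matching" and "contradicting" categories can be expressed as a simple rule on per-protein calls. This sketch uses the dMS thresholds stated in the table footnotes (up-regulated when fold change > 2, down-regulated when < 0.5) and takes the SC direction (1, 0, -1) as given input, since in the study it comes from a Bayes factor test that is not reproduced here. Treating "no change" by both methods as matching, and the protein IDs and values, are assumptions for illustration.

```python
from collections import Counter


def dms_direction(fold_change, up=2.0, down=0.5):
    """dMS call: 1 if fold > 2, -1 if fold < 0.5, otherwise 0 (not changed)."""
    if fold_change > up:
        return 1
    if fold_change < down:
        return -1
    return 0


def agreement(dms_call, sc_call):
    """Classify a protein identified by both methods."""
    if dms_call == sc_call:
        return "matching"
    if dms_call == 0 or sc_call == 0:
        return "weak matching"  # significant by one method only
    return "contradicting"      # opposite directions


def summarize(shared):
    """shared: {protein: (dms_fold_change, sc_direction)} for proteins found by both."""
    return Counter(agreement(dms_direction(fold), sc_dir)
                   for fold, sc_dir in shared.values())


# Hypothetical proteins identified by both methods.
shared = {"p1": (3.4, 1), "p2": (0.3, -1), "p3": (1.4, -1), "p4": (0.2, 1)}
summary = summarize(shared)
matching_fraction = summary["matching"] / len(shared)
```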

Changes in protein expression during fruit development
Label-free LC-MS/MS analysis of juice sac cells indicated significant changes in protein expression during fruit development (Table 2). Changes in the expression of 1834 and 1004 iCitrus accessions during fruit development were identified by dMS and SC, respectively. These numbers include accessions identified in all four types of comparisons conducted (Stage II vs. early Stage II and Stage III vs. Stage II, for both membrane-bound and soluble proteins), and proteins appearing at more than one stage of development were counted only once. In most cases, the discrepancies between the two methods were due to differences in the bioinformatics associated with the dMS and SC workflows (see Discussion). A significant number of proteins (772 and 560 by dMS and SC, respectively) were identified but classified as "not changed" (Table 2). Although these proteins matched differentially expressed peptides, they did not pass the statistical threshold. Even though they were not differentially expressed, their identification provides valuable information because (i) they are proteins that are active during fruit development, and (ii) they strengthen the confidence in the identification of the same peptides in other comparisons [39]. Here, we classified the fruit proteins into 14 major functional groups (Table 2). In general, more of the identified proteins decreased than increased in expression during the transition from early Stage II to Stage II (617 were down-regulated and 451 were up-regulated). This trend reversed during the transition from Stage II to Stage III, when 850 proteins were up-regulated and 86 were down-regulated (Table 2). Most of the up-regulated proteins belonged to Metabolism, Processing, Oxidative processes, Trafficking, Transcription and Transport.
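Counting each accession only once across the four comparisons, and tallying the direction of change per functional group, reduces to a set union and a counter. The comparison names, accessions and group assignments below are hypothetical, and labelling accessions without a group as "Unclassified" is an assumption of this sketch.

```python
from collections import Counter


def unique_accessions(comparisons):
    """Union of accessions over all comparisons, so repeats are counted once."""
    seen = set()
    for calls in comparisons.values():
        seen.update(calls)
    return seen


def tally_by_group(calls, group_of):
    """Count (functional group, direction) pairs for one comparison."""
    labels = {1: "up", 0: "not changed", -1: "down"}
    return Counter((group_of.get(acc, "Unclassified"), labels[direction])
                   for acc, direction in calls.items())


# Hypothetical calls (1 = up, 0 = not changed, -1 = down) per comparison.
comparisons = {
    "II_vs_earlyII_soluble": {"c1": -1, "c2": 1},
    "II_vs_earlyII_membrane": {"c3": -1},
    "III_vs_II_soluble": {"c1": 1, "c4": 0},
    "III_vs_II_membrane": {"c2": 1},
}
group_of = {"c1": "Metabolism", "c2": "Transport", "c3": "Trafficking"}

total = len(unique_accessions(comparisons))
late_soluble = tally_by_group(comparisons["III_vs_II_soluble"], group_of)
```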

Changes in proteins associated with vesicular trafficking during fruit development
In order to illustrate similarities and disparities between dMS and SC for the quantitation of protein changes during fruit development, we analyzed changes in proteins associated with vesicular trafficking and protein movements. The global changes in protein profiles and the metabolic processes associated with the quantitative protein changes during fruit development will be presented and discussed elsewhere (Katz et al., in preparation).
Among the members of the RHO family, ROP4 was down-regulated at earlier stages of development (Table 4). Interestingly, dMS showed that two RAB GDIs (GDP-RAB dissociation inhibitors), GDI1 and GDI2-like, were up-regulated during the later stages of fruit development, while only GDI1 was identified by SC. Three R-SNAREs were identified: SEC22, which was down-regulated during the transition from early Stage II to Stage II, and VAMP27-1 and VAMP713, which were not found to be differentially expressed. Five Q-SNAREs were identified, but only VTI11 (Qb-SNARE) was down-regulated during early stages of development, while SYP132 (Qa-SNARE, syntaxin) was up-regulated during late stages of development. SNF7, a component of the endosomal ESCRT III complex that functions in cargo recognition and sorting [49], was up-regulated during the late stages of development. Additional proteins related to vesicular trafficking, such as dynamin, COP-I coatomer and reticulons 3 and 6, and proteins related to secretory membrane carriers, such as SEC14, PATL2 and SYT1, were up-regulated during the late stages of fruit development, while SEC14, SYT1 and the light chain of clathrin were up-regulated during the transition from early Stage II to Stage II. The heavy chain of clathrin was down-regulated throughout development (Table 3). Differential protein expression was also found in two other important groups of proteins, actins and tubulins, which are key factors in trafficking, cell division and enlargement [50]. TUB1, TUA3, TUA4, TUB5, TUA6, TUB6 and TUB8 were down-regulated in the transition from early Stage II to Stage II (Table 4). TUB1, TUA4, TUB5, TUA6 and TUB6 were further down-regulated during the transition from Stage II to Stage III, while TUB7 and TUB8 were up-regulated during this transition. Actins, which drive the movement of vesicles towards their destinations, showed significant changes during fruit development (Table 4).
ACT1, ACT7, ACT8 and ACT11 were down-regulated during the transition from early Stage II to Stage II and up-regulated during the transition from Stage II to Stage III (Table 4).
Down-regulation of other proteins related to vesicle movement, such as CaM5 (which binds to the motor protein kinesin [51,52]) and myosin, was also detected (Table 4). The profilins PFN1, PFN3 and PFN5, involved in actin polymerization and cytoskeleton organization, did not change during the transition from early Stage II to Stage II, but PFN1 and PFN3 were up-regulated during the transition from Stage II to Stage III. Another protein, ADF4, involved in actin depolymerization, was down-regulated during the transition to Stage III. Microtubule Associated Protein 65 (MAP65) and KIS (Tubulin cofactor A), involved in tubulin complex assembly and cell division [53,54], were down-regulated throughout fruit development (Table 4).
Transporters play a crucial role in cell growth and homeostasis, especially in specialized solute-accumulating cells such as citrus juice cells. As expected, many changes in transporter protein expression were noted during fruit development (Table 5). During the transition from early Stage II to Stage III, there was a significant down-regulation of subunits of lysosomal ATPases and of cation transporters associated with K+- and Na+-coupled transport. On the other hand, only one plasma membrane-bound ATPase (similar to AHA8) displayed down-regulation, while those similar to AHA2, AHA4 and AHA10 were not significantly changed. In general, these changes were noted using both dMS and SC. Most of the proteins that were down-regulated during the transition from early Stage II to Stage II were up-regulated during the transition from Stage II to Stage III (Table 5), suggesting a role during fruit expansion. Similar results were seen with mitochondrial membrane-bound proteins such as ACP4, ADP/ATP carriers and others. Two tonoplast monosaccharide transporters, TMT1 and TMT2, were up-regulated during the transition from early Stage II to Stage II, and TMT2 was further up-regulated during the later stages of fruit development. A dicarboxylate/tricarboxylate carrier was up-regulated throughout development. According to SC, the plasma membrane water channels PIP1B/PIP1;2, TMP-C/PIP1;4, PIP2;8/PIP3B and PIP2;5/PIP2D were down-regulated during the transition from early Stage II to Stage II (Table 5).

Discussion
In this study we describe a label-free shotgun approach for establishing a proteomics workflow to identify the protein changes occurring during citrus fruit development. We analyzed and compared juice sac cells extracted from fruits at three stages of development: the end of Stage I (early Stage II), characterized by extensive cell division; Stage II, in which cell division ceases and the juice sac cells expand with the accumulation of large amounts of solutes and water; and Stage III, in which the fruit matures and ripens [55,56]. It should be noted that it was practically impossible to extract juice sac cell proteins at Stage I (fruit diameter ≈10-15 mm) because at this stage the juice sac cells are not well developed.
Comparative proteomics studies in plants still lag behind those in mammalian cells and are predominantly performed using 2DE gels [57]. Although differential proteomics studies employing label-free quantification have been published during the last few years [9,10,24], such studies in plants are scarce [26,43].
To carry out an efficient proteomics study in citrus, a plant species lacking a fully sequenced genome, we established a workflow that addresses some of the problems arising from the use of an EST database. We created iCitrus, a database and interface that collects sequences from three different sources, HarvEST:Citrus http://harvest.ucr.edu/, NCBI's Citrus unigenes and NCBI's Citrus proteins http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=2711&lvl=3&lin=f&keep=1&srchmode=1&unlock, into one unified database with reduced redundancy for mass spectra search. iCitrus was created to provide a compact database for the identification of citrus proteins and more accurate quantitative expression measurements. The iCitrus interface enabled fast identification of lists of accessions, including Arabidopsis homologs, and the use of bioinformatics tools such as MapMan, AraCyc and Cytoscape (Katz et al. in preparation).
The iCitrus resource is essentially an interface for accessing pre-calculated BLAST results. iCitrus itself does not make or summarize GO assignments based on rules that weight GO terms from various hits; that is the (perfectly reasonable) philosophy behind Blast2GO and related tools. Instead, we chose to let users decide whether to trust and adopt particular annotations. We took this approach so that individual users can apply specific knowledge of protein families or taxonomic differences (e.g. Citrus versus Arabidopsis) when interpreting the BLAST results. In addition, there may be cases in which GO annotation is absent from the BLAST results against Arabidopsis or Viridiplantae, but a consensus could emerge from the descriptive text accompanying a hit. We think this combined approach of manual annotation assisted by pre-computed BLAST results is more effective for predicting functional information in a poorly annotated organism such as Citrus.
* The column "direction" under spectral counting represents expression direction: 1 = up-regulated, 0 = no change, -1 = down-regulated. Proteins identified by dMS were considered up-regulated when fold change > 2, not changed when 0.5 < fold change < 2, and down-regulated when fold change < 0.5. For spectral counting, a Bayes factor > 10 was used as the threshold for a significant difference.
Two widely used but fundamentally different label-free quantification methods were used in this study: peak integration (dMS) and spectral counting (SC). For dMS, we used a two-fold change as the threshold for differential expression of the identified proteins [25], and for spectral counting a Bayes factor of 10 [58]. Such a stringent threshold is needed because protein ratios are calculated as intensity-weighted averages of peptide ratios, and because the number of peptides identifying each protein is highly variable.
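The intensity-weighted averaging of peptide ratios mentioned above can be sketched as follows. The exact weighting used by the quantification software is not specified in the text, so this example assumes that each peptide ratio is weighted by the peptide's summed intensity in the two samples; the function name and numbers are hypothetical.

```python
def protein_ratio(peptide_intensities):
    """Intensity-weighted mean of peptide ratios (assumed weight: a + b).

    peptide_intensities: list of (intensity_sample_a, intensity_sample_b) per peptide.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for a, b in peptide_intensities:
        if b <= 0:
            continue  # skip peptides without signal in the reference sample
        weight = a + b
        weighted_sum += weight * (a / b)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no quantifiable peptides")
    return weighted_sum / total_weight


# One intense peptide at 2-fold and one weak peptide with no change.
ratio = protein_ratio([(200.0, 100.0), (30.0, 30.0)])
```

Here the protein ratio (about 1.83) is dominated by the intense peptide yet stays below the two-fold cut-off, so the protein would be called "not changed"; with few and variable peptides per protein, small shifts like this are why a stringent threshold is applied.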
In most cases, both methods identified similar proteins, with some discrepancies (Figure 4a). These discrepancies derived from the way SIEVE (for dMS) and Scaffold (for SC) handle peptide information. Scaffold is able to identify peptides shared by similar proteins and group those proteins together, thus detecting database redundancy; SIEVE, on the other hand, does not group similar proteins. When we compared the numbers of proteins identified by the two methods using the corresponding Arabidopsis homolog of each identified iCitrus accession (Figure 4b), the differences decreased significantly, particularly for dMS (Figure 4). Yet additional redundancy could arise from possible gene families in Citrus. Because a wide range of Citrus species was used to create the HarvEST:Citrus database, including Citrus sinensis, Citrus paradisi, Citrus unshiu, C. reticulata, C. jambhiri, C. aurantium, C. clementina, C. macrophylla and Poncirus trifoliata, it contains sequences that are similar but not identical, which were therefore not screened out of the iCitrus dataset. In addition, some sequences in the database that might originate from the same unigene did not overlap and therefore could not be assembled, contributing to the difference in the number of proteins identified (Table 1). Currently, non-overlapping sequences cannot be assembled until more ESTs are produced to cover the missing gaps or until the Citrus genome is fully sequenced [59]. A significant number of proteins (144 by dMS and 118 by SC in the Stage II vs. early Stage II comparison, and 119 by dMS and 255 by SC in the Stage III vs. Stage II comparison) were identified by only one of the methods, owing to the inherent differences between the dMS and SC workflows. SEQUEST and SIEVE (dMS workflow) use a protein probability cut-off based on a false discovery rate (FDR) estimated with the decoy method [60]. X!Tandem, Scaffold and Qspec (SC workflow) use peptide identification probability criteria as specified by the PeptideProphet algorithm [61].
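The idea behind a decoy-based FDR cut-off can be shown with peptide-spectrum matches (PSMs) scored against a combined target and decoy (for example, reversed-sequence) database: the FDR at a score threshold is estimated as the number of decoy hits divided by the number of target hits at or above it. This is a simplified PSM-level sketch with made-up scores; it does not reproduce the protein-level probability cut-off applied in the SEQUEST/SIEVE workflow, and both the decoys/targets estimator and the choice of the lowest threshold meeting the FDR target are common conventions rather than the only ones.

```python
def decoy_fdr(psms, threshold):
    """Estimated FDR = decoy hits / target hits with score >= threshold.

    psms: list of (score, is_decoy) tuples.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def score_cutoff(psms, max_fdr=0.01):
    """Lowest observed score whose estimated FDR is <= max_fdr (None if none)."""
    for threshold in sorted({score for score, _ in psms}):
        if decoy_fdr(psms, threshold) <= max_fdr:
            return threshold
    return None


# Made-up scores: True marks a match to a decoy sequence.
psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
strict = score_cutoff(psms, max_fdr=0.01)
lenient = score_cutoff(psms, max_fdr=0.3)
```

A stricter FDR target moves the cut-off up and discards more identifications, which is one way the dMS and SC workflows, using different acceptance criteria, can end up with different protein lists.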
The different workflows affect the identification of some proteins. The performance of the SC method depends strongly on the depth of MS/MS sampling, because SC ratios are most significant for proteins with large numbers of product ion spectra, whereas dMS ratios are most significant for proteins with large numbers of overlapping peptide ions [25]. This also explains the higher percentage of proteins that were found to be significantly different by dMS but not by SC (Figures 3, 5a). Our data show that dMS provides more accurate measurements of differences in protein expression between the compared samples [25], while SC is faster and easier to use. dMS provides rich information from the LC-MS data but requires a massive computational effort for data processing, including background filtering, peak frame detection and alignment [62,63]. Spectral counting is conceptually simpler and can be as sensitive as dMS in terms of detection range while retaining linearity [25,30,64]. Nevertheless, SC is less accurate in detecting differences in protein expression, in particular for less abundant proteins. Our results clearly show that the integrated use of both methods for quantification increases the power to detect changes in shotgun proteomics experiments, and that both methods should be used in combination to gain insight into complex protein networks and to achieve a complete identification of their components.
Changes in a large number of small GTPases were identified during citrus fruit development. The expression of a relatively large number of members of the RAB, ARF, RHO and RAN families of small GTPases changed during the different stages. Although we cannot assign specific roles to all of these proteins, these changes indicate distinct roles for these members during the stages of citrus juice sac cell development. Vesicular trafficking is essential for fruit development [65][66][67]. During Stage I there is intensive cell division [56]. Cytoskeletal elements (actins, tubulins, etc.), together with small G-proteins and coatomer complexes, are vital for cell division, cell plate formation, cell polarity, etc. [68]. The expression of many of these proteins decreased during the transition from early Stage II to Stage II, which correlated well with the attenuation of cell division in the growing fruit and the prevalence of cell expansion. This notion was reinforced by the notable increase in expression of other small GTPases, auxiliary proteins and cytoskeletal components. As with the small G-proteins, changes were seen in the expression of proteins associated with vesicular movement, docking and fusion. In addition to different SNAREs (Qa, Qb, Qc, syntaxins, etc.), there were changes in COPI coatomers, clathrin, dynamin and others, suggesting the occurrence of endocytosis, exocytosis and vesicular trafficking during fruit development. Notably, while the expression of plasma membrane-associated H+-ATPases did not change during the early stages of development, changes in endosomal-associated H+-ATPases (V-type) paralleled the changes seen in the secretory and vesicular trafficking machinery. V-type ATPases and organellar acidification are essential for vesicular trafficking along exocytotic and endocytotic pathways [69,70].
Although significant changes in sugar content and sugar homeostasis are expected during fruit development [71,72], changes in expression were noted for only two putative vacuolar monosaccharide transporters (TMT1 and TMT2). A plausible explanation is that the expression of other sugar transporters did not change (although they could have been modified by post-translational mechanisms). In support of this notion, Etxeberria et al. [73,74] demonstrated a mechanism of sugar transport into the juice sac cells, and of sucrose into the vacuoles, that is mediated by endocytosis and intracellular vesicular trafficking. The protein inventory developed in this work provides a preliminary view of the function(s) of these proteins during the different stages of fruit development, in particular during cell division (Stage I, early Stage II), cell expansion (Stage II), assimilate mobilization, sugar accumulation and processes regulating fruit maturation and ripening.
In conclusion, we developed a workflow for the analysis and identification of proteins during fruit development in citrus, a non-model plant, using comparative label-free shotgun proteomics. We established iCitrus, a comprehensive sequence database built by merging three major sources of sequences and improving the annotation of existing unigenes. iCitrus provided a useful bioinformatics tool for the high-throughput identification of citrus proteins. Two methods for label-free shotgun proteomics were used and compared: peak integration (differential mass spectrometry, dMS) and spectral counting (SC). We identified approximately 1500 citrus protein accessions expressed in fruit and quantified their expression changes during fruit development. Our results showed that both methods can provide significant information on protein changes, with dMS providing higher accuracy. Our results also suggest that dMS and SC are complementary: used together, they broaden the identification spectrum and provide additional data on the trends of change during the particular processes being compared.

Plant material, protein extraction and precipitation
Navel orange fruits at three developmental stages, early Stage II (35 mm fruit diameter), Stage II (55 mm) and Stage III (80 mm) [55], were obtained from the Lindcove Research Center, University of California, Exeter, CA. Juice sacs were collected from at least 20 fruits and pooled at each stage. Two independent biological replicates from two consecutive years were used, and proteins were isolated as described previously [18]. Soluble proteins were precipitated using the chloroform/methanol extraction method of Wessel and Flügge [75]. The samples were resuspended in 100 μl of 1% acetonitrile, sonicated for 10 min and centrifuged at 10,000 × g for 3 min. The supernatant was spin-dialyzed into 50 mM ammonium bicarbonate (AMBIC) and then prepared for MS analysis using standard reduction, alkylation and tryptic digestion procedures [76]. Dichloromethane was added (50/50 v/v with the aqueous digest) before vortexing for 1 min. Samples were centrifuged for 5 min at 10,000 × g in a microcentrifuge; the peptide-containing upper layer was dried down and the peptides were resolubilized in 2% acetonitrile/0.1% trifluoroacetic acid for LC-MS/MS analysis.
Membrane-bound proteins were spin-dialyzed into 50 mM AMBIC. An endo-polygalacturonase (Megazyme) was used to degrade pectins overnight at room temperature; the suspensions were then centrifuged and the pellets retained. Membranes were resolubilized in 50 mM AMBIC and digested with trypsin. The suspension was centrifuged for 10 min at 10,000 × g and the supernatant containing tryptic peptides was retained. Delipidation was performed with dichloromethane, and the peptides were resolubilized in 2% acetonitrile/0.1% trifluoroacetic acid for LC-MS/MS analysis.

Mass Spectrometry and Data Analysis
Digested peptides were separated by reversed-phase chromatography on a Waters nanoACQUITY UPLC system (Milford, MA) with a Waters BEH C18 column (1.7 μm, 100 μm × 10 cm). A binary solvent gradient was used: buffer A was 0.1% formic acid and buffer B was 100% acetonitrile (ACN). The 120 min gradient consisted of the following steps: 2-45% buffer B in 40 min, 45-80% buffer B in 65 min, a 1 min hold, 80-2% buffer B in 4 min, and a final 10 min hold. Separated peptides were analyzed on a Thermo Scientific LTQ-FT Ultra mass spectrometer (San Jose, CA) with a Michrom captive spray nano-electrospray ionization source at a flow rate of 2 μl/min. MS and MS/MS spectra were acquired using a top-4 method, in which the 4 most abundant ions in each MS scan were selected for automated low-energy collision-induced dissociation (CID) with a 30 s exclusion time and a repeat count of 2. FTMS scans were acquired over the m/z range 300-1400 at 50,000 resolution. An isolation width of 2.5 Da was used for ITMS, and a normalized collision energy of 35% was used for fragmentation. Five technical replicates of each pooled sample (older vs. younger fruit) were run, with blank (wash) runs between samples, and analyzed with SIEVE.
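As an illustration of the gradient program described above, the following Python sketch encodes the five steps as (duration, start %B, end %B) segments and interpolates the buffer B composition at any time point. It assumes linear ramps between set points; the names `GRADIENT`, `percent_b` and `total_runtime` are hypothetical and not part of the instrument software.

```python
# Gradient program from the text as (duration_min, start_%B, end_%B) segments.
# Assumption: ramps are linear and holds keep %B constant.
GRADIENT = [
    (40, 2, 45),   # 2-45% B in 40 min
    (65, 45, 80),  # 45-80% B in 65 min
    (1, 80, 80),   # hold at 80% B for 1 min
    (4, 80, 2),    # return to 2% B in 4 min
    (10, 2, 2),    # re-equilibrate at 2% B for 10 min
]


def total_runtime(program):
    """Sum of segment durations in minutes."""
    return sum(duration for duration, _, _ in program)


def percent_b(t, program=GRADIENT):
    """Percentage of buffer B at time t (min), linearly interpolated within its segment."""
    if t < 0 or t > total_runtime(program):
        raise ValueError("time outside the gradient program")
    start = 0.0
    for duration, b0, b1 in program:
        if t <= start + duration:
            return b0 + (b1 - b0) * (t - start) / duration
        start += duration
    # Unreachable: the range check above guarantees that some segment matches.
    raise AssertionError("no segment found")


print(total_runtime(GRADIENT))  # 120
print(percent_b(20))            # 23.5
```

Checking that the segment durations add up to the stated 120 min is a quick consistency test when transcribing a gradient from a method file.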

Protein Identification and Validation, dMS workflow
Tandem mass spectra were extracted with Xcalibur version 2.0.7. All MS/MS samples were analyzed using SEQUEST (Proteome Discoverer 1.1; Thermo Scientific, San Jose, CA). SEQUEST was set up to search a FASTA file of the iCitrus protein database (see below), assuming trypsin as the digestion enzyme, with a peptide ion mass tolerance of 25 ppm and a fragment ion mass tolerance of 1.0 Da. Oxidation of methionine and iodoacetamide derivatization of cysteine were specified in SEQUEST as possible modifications. DTASelect software was used to filter out low-scoring matches. The filtering criteria were cross-correlation (Xcorr) values greater than 1.5 for singly charged ions, 2.2 for doubly charged ions and 3.3 for triply charged ions, for both half- and fully tryptic peptides. This resulted in a false discovery rate of less than 5%, estimated with a decoy search strategy.
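The charge-state-dependent Xcorr cutoffs and the decoy-based error estimate can be sketched as follows. This is a minimal illustration, not DTASelect itself: the `PSM` record, the rejection of charge states above 3+ (for which no cutoff is stated) and the decoys/targets estimator are all assumptions.

```python
from dataclasses import dataclass

# Charge-state-specific Xcorr thresholds from the text (values must be strictly larger).
XCORR_MIN = {1: 1.5, 2: 2.2, 3: 3.3}


@dataclass
class PSM:
    """Hypothetical peptide-spectrum match record."""
    peptide: str
    charge: int
    xcorr: float
    is_decoy: bool  # True if the match came from the decoy database


def passes_xcorr(psm):
    """Accept a PSM only if its Xcorr exceeds the cutoff for its charge state.

    Assumption: charge states without a stated cutoff (e.g. 4+) are rejected.
    """
    threshold = XCORR_MIN.get(psm.charge)
    return threshold is not None and psm.xcorr > threshold


def estimated_fdr(psms):
    """Target-decoy estimate: decoy hits divided by target hits among accepted PSMs."""
    accepted = [p for p in psms if passes_xcorr(p)]
    decoys = sum(p.is_decoy for p in accepted)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


psms = [
    PSM("A", 1, 1.6, False), PSM("B", 2, 2.1, False), PSM("C", 2, 2.5, False),
    PSM("D", 3, 3.4, True), PSM("E", 3, 3.5, False), PSM("F", 4, 5.0, False),
]
print(estimated_fdr(psms))  # 1 decoy among 3 accepted targets
```

In practice the cutoffs would be tuned until the estimate falls below the chosen level (here 5%).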

Differential Expression mass spectrometry, dMS workflow
Samples were analyzed using a Thermo Scientific LTQ-FT mass spectrometer and a Michrom Paradigm HPLC. Peptides were separated on a 200 μm × 15 cm Michrom Magic C18 reversed-phase column over 45 min using a 2%-60% acetonitrile gradient. The mass spectrometer was set to acquire spectra with a standard top-3 method, in which one high-resolution scan (100,000 resolution) was acquired every second, with the subsequent MS/MS spectra acquired simultaneously in the LTQ.
Samples were analyzed using SIEVE (Thermo Scientific, San Jose, CA), a label-free differential expression package that aligns MS spectra over time from different experimental conditions and then determines the features in the data (m/z and retention time pairs) that differ between conditions. Statistical measures such as p-values and standard deviations, computed from the data of each biological replicate, were assigned to these differences, which were then sorted by significance [10]. Label-free proteomic profiling was performed with SIEVE 1.3. Parameters were set to align retention times and generate the frames needed for abundance calculations. Significance was calculated within SIEVE using a standard t-test, and results were filtered for a minimum of two peptides identified per protein (using the identification criteria stated in this section), with frames having a p-value of less than 0.05.
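A stdlib-only sketch of this frame-level filter is given below. It applies a two-sample Student's t-test (pooled variance) to the per-replicate intensities of each frame, keeps frames with p < 0.05, and then retains proteins supported by at least two distinct significant peptides. The frame tuple layout and the function names are hypothetical, the two-sided p-value uses the exact closed-form series for integer degrees of freedom, and SIEVE's internal implementation may differ.

```python
import math
import statistics
from collections import defaultdict


def two_sided_p(t, df):
    """Two-sided p-value for Student's t with integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    term = total = 1.0
    if df % 2 == 0:
        # A(t|df) = sin(theta) * (1 + c^2/2 + 1*3*c^4/(2*4) + ...)
        for k in range(2, df, 2):
            term *= (k - 1) / k * c * c
            total += term
        a = s * total
    elif df == 1:
        a = 2 / math.pi * theta
    else:
        # A(t|df) = (2/pi) * (theta + s*c * (1 + 2*c^2/3 + 2*4*c^4/(3*5) + ...))
        for k in range(3, df, 2):
            term *= (k - 1) / k * c * c
            total += term
        a = 2 / math.pi * (theta + s * c * total)
    return max(0.0, 1.0 - a)  # clamp tiny negative values caused by rounding


def student_t_test(a, b):
    """Two-sample Student's t-test with pooled variance; returns (t, df, p)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    if pooled == 0:
        raise ValueError("zero variance in both groups")
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df, two_sided_p(t, df)


def significant_proteins(frames, alpha=0.05, min_peptides=2):
    """frames: iterable of (protein, peptide, intensities_a, intensities_b).

    Keeps frames with p < alpha, then proteins with >= min_peptides distinct peptides.
    """
    peptides = defaultdict(set)
    for protein, peptide, a, b in frames:
        _, _, p = student_t_test(a, b)
        if p < alpha:
            peptides[protein].add(peptide)
    return {prot for prot, peps in peptides.items() if len(peps) >= min_peptides}
```

Requiring two independent peptides per protein guards against a single mis-aligned or mis-identified frame driving a protein-level call.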
Tandem mass spectra from peptide features considered differentially expressed across conditions were then searched with SEQUEST against iCitrus (see below). Search results were filtered to a false discovery rate of 5% using a decoy search strategy with a reversed database [60].
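A reversed decoy database of the kind referred to above can be generated with a few lines of Python. In this sketch each target entry is followed by its reversed sequence; the `REV_` header prefix and the function names are assumptions, and search engines may expect a different decoy naming convention.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def target_decoy_fasta(text, prefix="REV_"):
    """Return FASTA text with each target entry followed by its reversed decoy."""
    out = []
    for header, seq in read_fasta(text):
        out.append(f">{header}\n{seq}")
        out.append(f">{prefix}{header}\n{seq[::-1]}")  # reversed sequence as decoy
    return "\n".join(out) + "\n"


print(target_decoy_fasta(">iCitrus_1 example\nMKTAY\nIAK\n"))
```

Reversal preserves amino acid composition and length distribution, so decoy matches approximate the behaviour of random matches against the target database.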

Protein Identification and Validation for Spectral counting
Tandem mass spectra were extracted with Bioworks 3.3. Charge-state deconvolution and deisotoping were not performed. All MS/MS samples were analyzed using X! Tandem (http://www.thegpm.org; version TORNADO (2008.02.01.2)). X! Tandem was set up to search the 62,415 entries of iCitrus (see below), assuming trypsin as the digestion enzyme, with a fragment ion mass tolerance of 0.40 Da and a parent ion tolerance of 25 ppm. Iodoacetamide derivatization of cysteine was specified in X! Tandem as a fixed modification. Deamidation of asparagine, oxidation of methionine, methionine sulfone, oxidation of tryptophan to formylkynurenin and N-terminal acetylation were specified in X! Tandem as variable modifications. Different search programs were used (SEQUEST for dMS and X! Tandem for spectral counting) because licensing restrictions and limited access to SEQUEST would have caused significant delays in the data analysis. Nonetheless, the choice of SEQUEST or X! Tandem would have made little or no difference. In addition, this report aims to compare the overall methodologies (i.e. dMS versus SC) and not their individual components.
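The digestion and tolerance settings above can be illustrated with a minimal sketch: an in-silico tryptic digest (cleavage after K or R, not before P, no missed cleavages), precursor m/z with cysteine fixed as carbamidomethyl, and a 25 ppm window. The variable modifications listed above are omitted, and the residue mass table and function names are illustrative assumptions rather than X! Tandem internals.

```python
import re

# Monoisotopic residue masses (Da); C includes the fixed carbamidomethyl shift (+57.02146).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919 + 57.02146, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
    "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def tryptic_peptides(protein):
    """Cleave C-terminal to K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def precursor_mz(peptide, charge):
    """m/z of [M + zH]z+ for a peptide carrying only the fixed Cys modification."""
    mass = sum(RESIDUE[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=25.0):
    """True if the observed precursor lies inside the +/- tol_ppm window."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


print(tryptic_peptides("AKPLRGEKSR"))  # ['AKPLR', 'GEK', 'SR']
```

A 25 ppm window at m/z 800 corresponds to roughly ±0.02 m/z units, which is why high-resolution precursor scans narrow the candidate list considerably.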

Criteria for protein identification for Spectral Counting
Scaffold 2.06.00 (Proteome Software Inc., Portland, OR) was used to validate MS/MS-based peptide and protein identifications. Peptide identifications were accepted if they could be established at greater than 80.0% probability as specified by the Peptide Prophet algorithm [61]. Protein identifications were accepted if they could be established at greater than 95.0% probability and contained at least 2 identified peptides. Protein probabilities were assigned by the Protein Prophet algorithm [77]. Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony.
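The Scaffold acceptance thresholds and a simple form of parsimony grouping can be sketched as follows. Grouping here only merges proteins whose accepted peptide sets are identical; Scaffold's own grouping rules are more elaborate, and the input dictionaries and function names are hypothetical.

```python
from collections import defaultdict

PEPTIDE_MIN_PROB = 0.80  # peptide probability must be greater than 80.0%
PROTEIN_MIN_PROB = 0.95  # protein probability must be greater than 95.0%
MIN_PEPTIDES = 2         # at least 2 identified peptides per protein


def accepted_proteins(peptide_probs, protein_probs, protein_peptides):
    """Apply the acceptance thresholds.

    peptide_probs: {peptide: probability}
    protein_probs: {protein: probability}
    protein_peptides: {protein: set of matched peptides}
    Returns {protein: frozenset of accepted peptides} for proteins passing all criteria.
    """
    result = {}
    for protein, peptides in protein_peptides.items():
        good = frozenset(p for p in peptides if peptide_probs.get(p, 0.0) > PEPTIDE_MIN_PROB)
        if protein_probs.get(protein, 0.0) > PROTEIN_MIN_PROB and len(good) >= MIN_PEPTIDES:
            result[protein] = good
    return result


def group_indistinguishable(accepted):
    """Group proteins whose accepted peptide sets are identical (one simple parsimony rule)."""
    groups = defaultdict(list)
    for protein, peptides in accepted.items():
        groups[peptides].append(protein)
    return sorted(sorted(group) for group in groups.values())
```

Proteins in the same group cannot be told apart from the MS/MS evidence alone and are therefore reported together.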

Statistical Analysis for Spectral Counting
Unweighted spectral counts for the identified proteins, obtained from the samples of two consecutive growth seasons, were exported from Scaffold and analyzed with QSpec [58] for significance. Proteins were considered significantly different across sample conditions if QSpec reported a Bayes factor greater than 10, which corresponds to a false discovery rate (FDR) of approximately 5%.

Proteomics Data Set
The data associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash: Cf3G8KatEeCbDv2kV1Gnw4njaSYARJgmtyzYl+5764Gsbb/M3LX+/oo1zcHnHK1Gs0ukuBM5Rk+Q1t5hpia109pVPXkAAAAAAAAoLg== The hash may be used to prove exactly what files were published as part of this manuscript's data set, and the hash may also be used to check that the data has not changed since publication.

Additional material
Additional file 1: iCitrus database file in FASTA format. Citrus sequences from UC Riverside HarvEST:citrus (C46 assembly), NCBI/citrus/unigenes and NCBI/citrus/proteins were used to create iCitrus. The datasets were merged and identical sequences were filtered for redundancy (the longest sequences were kept). All sequences were searched with BLAST against TAIR and, separately, against nr sequences belonging to taxa within Viridiplantae, in order to collect GO-term and descriptive annotations. The sequences are listed by iCitrus accession number, followed by their HarvEST accession, NCBI/citrus/unigene accession or NCBI/citrus/protein accession, best Arabidopsis homolog, annotation and amino acid sequence. Users can enter lists of citrus sequence IDs, which returns a table of ranked hits from a BLAST search of the citrus sequences against Arabidopsis or Viridiplantae sequences. IDs from 1 to 62415, representing the collected accessions, can be entered in the iCitrus interface (Figure 1). Each citrus ID receives its own section of the results table, and its hits to TAIR proteins are separated into two blocks, defined by the high-scoring segment pair (HSP)-to-query coverage cutoff that can be set on the front page. All BLAST hits with e-values below 1E-4 are reported; if no hits below that cutoff occur for a particular sequence, an empty list is returned. The TAIR ID (AGI number) and NCBI gi number of the Arabidopsis or Viridiplantae protein similar to the citrus sequence are shown next, with links to TAIR and NCBI. Finally, GO annotations are listed when available. The final column, "Annotation", contains TAIR-specific annotations that do not use Gene Ontology terms but are available for the TAIR proteins. The data can be downloaded and opened in any spreadsheet program.
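The redundancy-filtering step (keeping the longest sequence) might look like the sketch below. It assumes that a sequence counts as redundant when it is identical to, or fully contained in, a longer retained sequence; the text does not state the exact rule, and the accession strings in the example are invented. The pairwise comparison is quadratic, which is acceptable for a sketch but would be slow for tens of thousands of sequences.

```python
def remove_redundant(sequences):
    """Collapse redundant sequences, keeping the longest representative.

    sequences: {accession: amino acid sequence}, merged from several sources.
    A sequence is dropped if it is identical to, or contained in, an already kept one.
    Returns {kept_accession: [accessions collapsed into it]}.
    """
    kept = {}     # accession -> sequence of retained entries
    members = {}  # accession -> accessions folded into it
    # Longest first (ties broken by accession), so each sequence is only
    # compared against sequences at least as long as itself.
    for acc, seq in sorted(sequences.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        for k_acc, k_seq in kept.items():
            if seq in k_seq:
                members[k_acc].append(acc)
                break
        else:
            kept[acc] = seq
            members[acc] = []
    return members


merged = {"S00001": "MKTAYIAK", "UC46_00002": "MKTAYIAKQR", "12345": "TAYIA", "S00003": "GGGG"}
print(remove_redundant(merged))
```

Keeping the list of collapsed accessions makes it possible to build a conversion table between source accessions and the retained entries.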
Additional file 2: A (62,415 accessions). Column A contains accessions from three databases: (1) NCBI/Citrus/Unigenes (accessions numbered S#####), (2) HarvEST:Citrus ESTs (UC46_#####) and (3) NCBI/Citrus/Proteins (#####); column B contains the corresponding iCitrus IDs, which are organized in ascending order. Accessions from the three databases that were found to cluster together are listed in columns D-F: column D contains accessions found to cluster with other accessions, column E contains the accessions that clustered with those in column D, and column F contains the corresponding iCitrus accessions of the clustered accessions in columns D and E. Accessions found in the databases (NCBI and HarvEST:Citrus) that are shorter than 50 amino acids between stop codons are shown in column H; these sequences were removed from the iCitrus database and are not in the FASTA file (Additional file 1). A quick conversion table between the different sequence sources is provided in columns L-O.
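The column A accession formats and the 50-residue cutoff can be expressed as two small helpers. This sketch assumes translated sequences with stop codons written as `*`; the function names and the "unknown" fallback are hypothetical.

```python
import re


def accession_source(accession):
    """Map an accession to its source database using the formats listed for column A."""
    if re.fullmatch(r"S\d+", accession):
        return "NCBI/Citrus/Unigenes"
    if re.fullmatch(r"UC46_\d+", accession):
        return "HarvEST:Citrus"
    if re.fullmatch(r"\d+", accession):
        return "NCBI/Citrus/Proteins"
    return "unknown"


def longest_stop_free_stretch(translated):
    """Length of the longest run of residues between stop codons ('*')."""
    return max(len(segment) for segment in translated.split("*"))


def passes_length_filter(translated, min_len=50):
    """Keep a sequence only if some stretch between stops reaches min_len residues."""
    return longest_stop_free_stretch(translated) >= min_len


print(accession_source("UC46_00012"))                    # HarvEST:Citrus
print(passes_length_filter("M" * 49 + "*" + "A" * 30))   # False
```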