Proteomic analysis of small acid soluble proteins in the spore core of Bacillus subtilis ΔprpE and 168 strains with predictions of peptides liquid chromatography retention times as an additional tool in protein identification

Background: Sporulation, a process characteristic of some bacteria such as Bacillus subtilis, has not yet been fully characterized. Protein phosphatase E (PrpE) and small acid soluble spore proteins (SASPs) influence this process. Nevertheless, a direct effect of PrpE on the SASP content of the B. subtilis spore core has not been demonstrated so far. Because the proteomic approach enables global analysis of the proteins present, it was chosen in this study to compare SASP occurrence in two B. subtilis strains: the standard strain 168 and ΔprpE, which lacks the PrpE phosphatase. Proteomic analysis remains a challenge, and despite major advances in mass spectrometry (MS), the reliability of identification is still unsatisfactory. There is therefore a growing interest in new methods, particularly bioinformatic tools, that would strengthen protein identification. Most currently applied algorithms are based on MS data. Information from separation steps is still not routinely used, even though it also provides valuable information about the analyzed structures. The aim of this research was to apply a model for predicting peptide retention times, based on quantitative structure-retention relationships (QSRR), to the analysis of SASPs in proteome digests of two B. subtilis strains after separation and identification of the peptides by LC-ESI-MS/MS. The QSRR approach was applied as an additional constraint that verified the results of the MS/MS ion search, confirmed the correctness of peptide identifications and indicated potential false positives and false negatives.
Results: In both B. subtilis strains, peptides characteristic of SASPs were found, but their identification confidence varied. According to the MS identity parameter Xcorr and the difference between predicted and experimental retention times (ΔtR), four groups could be distinguished: correctly identified, incorrectly identified, potential false positives and potential false negatives. The ΔprpE strain contained a much higher amount of SASP peptides than the standard strain 168, and their identification confidence was satisfactory, mostly for alpha- and beta-type SASPs.
Conclusions: The QSRR-based model for predicting peptide retention times was a useful tool complementary to MS, enhancing protein identification. The higher SASP content in the strain lacking PrpE phosphatase suggests that this enzyme may influence their occurrence in spores by lowering the levels of these proteins.


Background
Although bacterial endospores have been studied for over 130 years, questions remain about the basic mechanisms of their unique features, i.e. high resistance to environmental stresses such as high temperature, chemicals or radiation, which enables their long-term survival in unfavorable conditions [1]. Bacillus subtilis (B. subtilis) is a sporulating model organism often used in biochemical, genetic and molecular research on Gram-positive bacteria. Its sporulation process is very complex and requires spatial and temporal regulation of gene expression. About 25% of the genes in the chromosome of this bacterium are involved in spore formation. Moreover, 154 proteins that are characteristic exclusively of spores and do not occur in vegetative cells have already been identified [2]. Small acid soluble spore proteins (SASPs) have been shown to be specific spore components that may contribute to spore resistance to unfavorable conditions [3]. The SASPs belong to a group of at least sixteen proteins found in the core of spores produced by B. subtilis [4][5][6]. The genes coding for these proteins are expressed only during late stages of sporulation, mainly in the forespore compartment, under the control of the sigma G subunit of RNA polymerase [6]. The major role of the alpha/beta-type SASPs is to bind chromosomal DNA and convert it into the A form. Promoting this conformational change of DNA in aqueous solution is a unique property of SspA (alpha-type) and SspB (beta-type) [7]. Such conversion increases the UV resistance of spores. The major SASPs are SspA and SspB (also known as alpha/beta-type SASPs), which may constitute 80% of all SASPs. The third major SASP is the SspE protein (gamma-type). In contrast to alpha- and beta-type SASPs, this protein exhibits little homology among bacteria [8]. Moreover, it was previously shown that SspE does not bind chromosomal DNA, and it was postulated that the protein has a different, as yet unidentified, physiological role [9].
It was also observed that deletion of a gene coding for one of the α/β-type SASPs leads to a severe decrease in the UV resistance of spores [10].
Protein phosphatase E (PrpE) is a 27 kDa enzyme whose structure contains motifs characteristic of PPP protein phosphatases and of diadenosine polyphosphate hydrolases. PrpE is a cytoplasmic protein, present at a very low level in vegetative cells as well as inside spores, in both soluble and insoluble fractions. Changes in the spore coat of a strain that does not produce functional PrpE have also been observed. Research on PrpE revealed that this enzyme is somehow implicated in the expression of GerA germination receptors during sporulation, which is also directed by RNA polymerase containing the sigma G subunit [11].
This led to the hypothesis that the lack of PrpE may also influence SASP expression.
To compare globally the occurrence of SASPs in the standard B. subtilis strain 168 and in the strain lacking PrpE phosphatase, a proteomic approach was chosen. However, despite the rapid development of analytical tools in proteomics, the identification, and hence the quantification, of proteins and peptides present in complex proteomic samples still remains a challenge. Because of the great diversity of the peptides present and the wide dynamic range of their concentrations, protein and peptide separation is one of the necessary steps of proteomic analysis. Among the many available techniques, two-dimensional gel electrophoresis, liquid chromatography and capillary electrophoresis are used most frequently. Proteins are then identified by mass spectrometry [12,13]. The large amount of data generated by mass spectrometry requires bioinformatic tools to draw correct conclusions about the presence of proteins and their concentrations in an analyzed sample. In general, database search algorithms such as Mascot [14] or Sequest [15,16] identify the proteins in a sample by finding the best match between experimental spectra and theoretical spectra computed for a set of peptides that may occur in it. True and false identifications are then distinguished by applying a scoring threshold. However, in many cases the confidence of identification remains unsatisfactory. This raises a dilemma: stringent scoring criteria, applied to MS/MS spectra to lower the false discovery rate, may cause proteins that are really present in the sample to be missed, whereas criteria that are too lenient may produce false identifications. Therefore, interest has recently been rising in additional solutions that may increase the reliability of identification in proteomics. This is especially important for proteins that occur at low concentrations and are therefore difficult to detect.
Nevertheless, such proteins may strongly influence cell metabolism and, for example, may serve as biomarkers of certain diseases or help in understanding biological processes. Various approaches that use mass spectrometry data aim to increase the confidence of protein identification [17,18]. One of the most frequently applied strategies is the target-decoy approach [19]. Other strategies additionally use information from the separation step, for example from liquid chromatography, which is commonly coupled on-line with the mass spectrometer. Interest in applying peptide retention time prediction to protein identification in proteomics has been rising recently [20][21][22][23][24]. Several approaches to predicting peptide retention times are used, such as artificial neural networks [25,26] or regression models [27,28]. In multivariate modeling, quantitative structure-retention relationships (QSRR) are often applied to predict the retention times of a test set of analytes on the basis of data from a model set. QSRR are relationships, derived by statistical methods, between chromatographic parameters and descriptors characterizing the molecular structure of the analytes [29][30][31]. In this study, a QSRR approach based on a multiple linear regression (MLR) model was used to build a model for predicting peptide retention times. The predicted (tR pred) and experimental (tR exp) retention times were then compared and, depending on the difference between them, properly and improperly identified peptides were determined.
The aim of this project was to perform a proteomic analysis of changes in the small acid soluble protein composition of the core of spores produced by the B. subtilis strain lacking PrpE phosphatase, in comparison with the standard strain 168, using a QSRR-based approach to predict peptide retention times and applying the predictions as a parameter complementary to the MS/MS ion search to enhance the confidence of protein identification.

Methods

Preparation of standard amino acid and model protein solutions
Standard amino acid solutions (about 0.6 mg/mL) were prepared by dissolving the amino acids in 0.1% aqueous trifluoroacetic acid (TFA). Standard protein solutions (about 3 mg/mL) were obtained by dissolving the lyophilized standard proteins in deionized water. The protein solutions were then treated according to the digestion protocol below.

Bacillus subtilis sample preparation
The growth conditions of Bacillus subtilis strains 168 and ΔprpE, the spore purification and the protein extraction procedures were as described previously [33]. The samples were then treated according to the digestion protocol below.

Digestion protocol
To 1 mL of each model protein sample (~3 mg/mL), 300 μL of DTT (100 mM, freshly prepared in 100 mM ammonium bicarbonate buffer, pH 8.5) was added. The samples were kept at 60°C for 30 min to reduce the disulfide bridges. Then 50 μg of trypsin was added to each sample (E/S ratio 1:50), and the samples were digested for 12 h (overnight) at 37°C. The digestion was stopped by adding 0.1 mL of TFA to each sample. The concentrations of the resulting standard solutions were about 50 pmol/μL. For the Bacillus subtilis spore lysates (1.2-1.5 mg/mL), 150 μL of DTT (100 mM, freshly prepared in 100 mM ammonium bicarbonate buffer, pH 8.5) was added to 1 mL of lysate. The samples were kept at 60°C for 30 min to reduce the disulfide bridges. Next, 25 μg of trypsin was added to each sample (E/S ratio 1:50), and the samples were digested for 12 h (overnight) at 37°C. The digestion was stopped by adding 0.05 mL of TFA to each sample. The concentrations of the resulting solutions were about 50 pmol/μL. Tryptic digests were stored at -20°C (in this reaction mixture the disulfide bonds do not reoxidize when frozen). The LC-ESI-MS/MS analyses were carried out within three weeks at the latest (the shelf life of such frozen solutions is a couple of months) [34].
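The enzyme-to-substrate (E/S) ratios above can be checked with simple mass arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the ratio is by weight and that the nominal protein concentrations apply. Under these assumptions, 50 μg of trypsin per 1 mL of ~3 mg/mL model protein corresponds to about 1:60, and 25 μg per 1 mL of 1.2-1.5 mg/mL lysate to about 1:48-1:60, which is close to the stated 1:50 given that the concentrations are approximate.

```python
def substrate_ug(volume_ml: float, conc_mg_per_ml: float) -> float:
    """Protein (substrate) mass in micrograms for a given volume and concentration."""
    return volume_ml * conc_mg_per_ml * 1000.0  # mg -> ug


def trypsin_for_ratio(volume_ml: float, conc_mg_per_ml: float, n: float = 50.0) -> float:
    """Trypsin mass (ug) that gives an E/S ratio of 1:n (w/w)."""
    return substrate_ug(volume_ml, conc_mg_per_ml) / n


def actual_ratio(trypsin_ug: float, volume_ml: float, conc_mg_per_ml: float) -> float:
    """Return n for the E/S ratio 1:n actually obtained with a given trypsin mass."""
    return substrate_ug(volume_ml, conc_mg_per_ml) / trypsin_ug


if __name__ == "__main__":
    # Model protein: trypsin needed for exactly 1:50, and n obtained with 50 ug.
    print(trypsin_for_ratio(1.0, 3.0))
    print(actual_ratio(50.0, 1.0, 3.0))
    # Spore lysates at the two ends of the stated concentration range, 25 ug trypsin.
    print(actual_ratio(25.0, 1.0, 1.2), actual_ratio(25.0, 1.0, 1.5))
```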

LC conditions
The LC-MS apparatus was equipped with a Surveyor autosampler thermostated at 20°C, a thermostated column oven (Thermo Finnigan, San Jose, CA, USA), a quaternary gradient Surveyor MS pump (Thermo Finnigan, San Jose, CA, USA) with a diode array detection (DAD) system, and an LTQ linear ion trap MS system with an ESI ion source controlled by Xcalibur 1.4 software (Thermo Finnigan, San Jose, CA, USA).
The mobile phase consisted of two solvents (A and B) mixed on-line. Solvent A was 0.1% aqueous TFA and solvent B was ACN containing 0.1% TFA. A 90 min linear gradient from 0% to 60% B was applied at a flow rate of 200 μL/min. The injection volume was 10 μL.
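Because the gradient is linear, the nominal mobile-phase composition at any time follows directly from the method parameters (0% to 60% B over 90 min). The Python sketch below assumes that the gradient starts at injection and that the system dwell volume can be ignored; holding the end composition after 90 min is also an assumption, since the post-gradient program is not described. The function names are hypothetical. With a slope of about 0.67% B per minute, a retention time difference of 5 min corresponds to roughly 3.3% B.

```python
START_B = 0.0        # % solvent B at the start of the gradient
END_B = 60.0         # % solvent B at the end of the gradient
GRADIENT_MIN = 90.0  # gradient duration (min)


def percent_b(t_min: float) -> float:
    """Nominal % B at time t (min), assuming no dwell-volume delay.

    Before the gradient the start composition is returned; after it, the end
    composition is returned (an assumption, not stated in the method).
    """
    if t_min <= 0.0:
        return START_B
    if t_min >= GRADIENT_MIN:
        return END_B
    return START_B + (END_B - START_B) * t_min / GRADIENT_MIN


def delta_percent_b(delta_t_min: float) -> float:
    """Change in % B corresponding to a retention time difference within the gradient."""
    return (END_B - START_B) * delta_t_min / GRADIENT_MIN


if __name__ == "__main__":
    print(percent_b(45.0))        # composition halfway through the gradient
    print(delta_percent_b(5.0))   # % B spanned by a 5 min retention time window
```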

MS conditions
The MS/MS analysis was performed on a Finnigan LTQ instrument (Thermo Finnigan, San Jose, CA, USA). The constant instrumental conditions applied to generate mass spectra in positive ion mode were as follows: source voltage 4.62 kV, capillary voltage 40.97 V, sheath gas flow rate 39.99 (arbitrary units), auxiliary gas flow rate 10 (arbitrary units), sweep gas flow rate 0.95 (arbitrary units), capillary temperature 219.96°C and tube lens voltage 250.43 V. MS/MS spectra were generated by collision-induced dissociation in the linear ion trap, with an isolation width of 3 Da (m/z) and an activation amplitude of 35% of the ejection RF amplitude, corresponding to 1.58 V.

Protein identification
The m/z values of the peptides were measured manually for the most intense peaks in the acquired MS/MS spectra, and the spectra were searched automatically against a protein database (FASTA format, downloaded from ExPASy [35]) using the Sequest algorithm included in Bioworks 3.0 (Thermo Finnigan, San Jose, CA, USA). Experimental retention times (tR exp) of the analyzed peptides were determined at the peak intensity maximum. The filtering criteria of Washburn et al. [36] were used to interpret the results of the correlation analysis between the experimental and predicted retention times of the peptides. Spectra of singly charged tryptic peptides with a cross-correlation score (Xcorr) higher than 1.9, of doubly charged tryptic peptides with Xcorr above 2.2 and of triply charged tryptic peptides with Xcorr of at least 3.75 were accepted as correctly identified by Sequest. All analyzed spectra had ΔCn values above 0.08.
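The charge-dependent acceptance criteria above can be expressed as a small filter. This Python sketch is only illustrative: the `Psm` record and the function name are hypothetical, and the thresholds (Xcorr > 1.9, > 2.2 and ≥ 3.75 for charges 1+, 2+ and 3+) are taken from the text. Treating ΔCn > 0.08 as an explicit cut-off is an assumption, since the text only states that all analyzed spectra met it. Charge states not mentioned in the text are rejected.

```python
from dataclasses import dataclass


@dataclass
class Psm:
    """A peptide-spectrum match from a database search (hypothetical record)."""
    peptide: str
    charge: int
    xcorr: float
    delta_cn: float


def passes_xcorr_filter(psm: Psm, min_delta_cn: float = 0.08) -> bool:
    """Apply the charge-dependent Xcorr thresholds described in the text."""
    # Assumption: the quoted ΔCn value is used as an explicit cut-off.
    if psm.delta_cn <= min_delta_cn:
        return False
    if psm.charge == 1:
        return psm.xcorr > 1.9
    if psm.charge == 2:
        return psm.xcorr > 2.2
    if psm.charge == 3:
        return psm.xcorr >= 3.75
    return False  # no threshold is given for other charge states


if __name__ == "__main__":
    # Placeholder sequences and scores, not data from this study.
    matches = [
        Psm("PEPTIDEK", 2, 2.45, 0.15),
        Psm("PEPTIDER", 2, 2.10, 0.20),
        Psm("SAMPLEPEPK", 3, 3.75, 0.11),
    ]
    for m in matches:
        print(m.peptide, m.charge, passes_xcorr_filter(m))
```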

QSRR analysis
Two structural descriptors were calculated for the analyzed peptides from the investigated standard proteins and from the B. subtilis cell lysates: the logarithm of the sum of the retention factors of the amino acids forming a given peptide, increased by one (log Sum(k+1)AA), and the calculated logarithm of the n-octanol-water partition coefficient (clog P). The log Sum(k+1)AA descriptor was calculated using retention data for the seven most retained amino acids (isoleucine, leucine, methionine, phenylalanine, tryptophan, tyrosine and valine). The other, hardly retained amino acids were assigned k = 0, and one was added to avoid zero in the calculation of the logarithm [37]. The clog P values were calculated with the average log P module of the ALOGPS 2.1 software (http://www.vcclab.org).
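A minimal Python sketch of the log Sum(k+1)AA descriptor follows. The retention factors of the seven amino acids are not listed in this section, so the function takes them as an input, and the values in the usage example are hypothetical placeholders. The description above can be read either as the logarithm of the sum of (k + 1) over all residues or as the logarithm of (sum of k) + 1; both readings are offered through a flag, and the decimal logarithm is assumed. The clog P descriptor comes from ALOGPS 2.1 and is not reproduced here.

```python
import math

# Residues with non-zero retention factors in the descriptor (one-letter codes):
# isoleucine, leucine, methionine, phenylalanine, tryptophan, tyrosine, valine.
RETAINED = frozenset("ILMFWYV")


def log_sum_k_plus_1(sequence: str, k_values: dict[str, float], per_residue: bool = True) -> float:
    """Compute the log Sum(k+1)AA descriptor for a peptide sequence.

    k_values maps one-letter codes to amino acid retention factors; any residue
    outside RETAINED is assigned k = 0, as described in the text.
    per_residue=True  -> log10(sum over residues of (k + 1))
    per_residue=False -> log10((sum over residues of k) + 1)
    """
    ks = [k_values.get(aa, 0.0) if aa in RETAINED else 0.0 for aa in sequence.upper()]
    if not ks:
        raise ValueError("empty peptide sequence")
    total = sum(k + 1.0 for k in ks) if per_residue else sum(ks) + 1.0
    return math.log10(total)


if __name__ == "__main__":
    # Hypothetical retention factors, for illustration only.
    k_demo = {"I": 3.0, "L": 3.5, "M": 1.5, "F": 6.0, "W": 9.0, "Y": 1.0, "V": 1.2}
    print(log_sum_k_plus_1("LVFK", k_demo))
    print(log_sum_k_plus_1("LVFK", k_demo, per_residue=False))
```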
A multiple linear regression equation was then derived for the model set of peptides from their experimental retention times, using Microsoft Excel (Microsoft Co., Redmond, WA, USA) and Statistica (StatSoft, Tulsa, OK, USA) on a personal computer. The regression coefficients (± standard deviations), multiple correlation coefficients (R), standard errors of estimate (s), significance levels of each term and of the whole equations (p) and values of the F-test of significance (F) were calculated. The general form of the QSRR equation for predicting peptide retention times is:
$$t_R = k_1 + k_2 \log \mathrm{Sum}(k+1)_{AA} + k_3 \, c\log P \qquad (1)$$
where tR is the gradient HPLC retention time and k1-k3 are regression coefficients.
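The QSRR model described above is an ordinary least-squares fit of tR on two descriptors, assuming the form tR = k1 + k2·log Sum(k+1)AA + k3·clog P. The authors used Excel and Statistica; the pure-Python sketch below is an independent illustration that fits k1-k3 by solving the normal equations and reports R, s and F (with two descriptors, F has 2 and n - 3 degrees of freedom). Coefficient standard deviations and p values are omitted, the function names are hypothetical, and real input would have to come from the model set of peptides.

```python
import math


def _solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_qsrr(log_sum_aa, clogp, t_r):
    """Fit t_R = k1 + k2*logSum(k+1)AA + k3*clogP by least squares.

    Returns ((k1, k2, k3), R, s, F) for the model set.
    """
    n = len(t_r)
    if not (len(log_sum_aa) == len(clogp) == n) or n < 4:
        raise ValueError("need equal-length inputs with at least 4 peptides")
    rows = [(1.0, a, b) for a, b in zip(log_sum_aa, clogp)]
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, t_r)) for i in range(3)]
    k1, k2, k3 = _solve3(xtx, xty)
    predicted = [k1 + k2 * a + k3 * b for a, b in zip(log_sum_aa, clogp)]
    mean_t = sum(t_r) / n
    ss_tot = sum((y - mean_t) ** 2 for y in t_r)
    if ss_tot == 0.0:
        raise ValueError("all retention times are identical")
    ss_res = sum((y - p) ** 2 for y, p in zip(t_r, predicted))
    dof = n - 3  # residual degrees of freedom: n minus 3 fitted coefficients
    r2 = 1.0 - ss_res / ss_tot
    s = math.sqrt(ss_res / dof)
    f = math.inf if ss_res == 0.0 else ((ss_tot - ss_res) / 2) / (ss_res / dof)
    return (k1, k2, k3), math.sqrt(max(r2, 0.0)), s, f


def predict_tr(coeffs, log_sum_aa, clogp):
    """Predicted gradient retention time from fitted coefficients (general QSRR form)."""
    k1, k2, k3 = coeffs
    return k1 + k2 * log_sum_aa + k3 * clogp
```

The fitted coefficients can then be passed to `predict_tr` for each test-set peptide, and the predictions compared with the experimental retention times.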
Finally, a general equation of satisfactory statistical quality (Equation 2; Table 1) was derived and verified with a test set of 21 peptides. The peptides were identified by LC-ESI-MS/MS analysis of 8 model proteins [36]. Because Xcorr values indicate whether the match between theoretical and experimental spectra is proper or improper, the model set included the peptides with the highest identification reliability, i.e. the highest Xcorr scores, whereas the test peptides had lower values that still qualified them as correctly identified.

Results and discussion
A statistically reliable QSRR equation (Equation 2), derived using a model set of peptides from 8 model proteins, was applied to predict the retention times of peptides from small acid soluble proteins in the proteomes of Bacillus subtilis strains ΔprpE and 168. The predictions were then analyzed together with the Sequest results, using the threshold levels given above.
Prediction of the gradient retention times of the analyzed peptides, originating from several types of small acid soluble spore proteins (SASPs) in the proteomic samples of both Bacillus subtilis strains, allowed the peptides to be divided into four groups according to their identification confidence (Table 2 and Table 3). The first group comprised peptides identified with high Xcorr values and with differences between their experimental (tR exp) and predicted (tR pred) retention times (ΔtR) lower than 5 min (from 0.01 to 3.90 min). Small differences between predicted and experimental retention times thus corresponded to reliable identification of the peptides in the analyzed sample. The second group consisted of peptides whose identification reliability was poor according to their Xcorr values and whose ΔtR values were also high (from 5.66 up to 83.74 min in a 90 min run). In this case, a detailed comparison of the MS and MS/MS spectra showed that the matches between theoretical and experimental spectra were poor. However, parent ions with these m/z values were found in other parts of the chromatogram, indicating that they possibly originated from peptides with other sequences. Hence, low Xcorr scores correlated with large differences between predicted and experimental retention times, which further supports the conclusion that these peptides were identified incorrectly. Peptides from the third group had Xcorr values that classified them as correctly identified, but their ΔtR values ranged from 6.41 to 10.05 min. This suggests that some of them could be false positives, so further examination of whether they are present in the analyzed samples would be useful.
The identification confidence of peptides from the fourth group was insufficient according to their Xcorr scores, but their ΔtR values were low (from 1.90 to 4.57 min). This may indicate that some of them are false negatives. In this case as well, an additional experiment confirming their actual occurrence in either of the analyzed B. subtilis strains could help to distinguish correct from incorrect identifications.
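The four-way grouping described above combines two yes/no decisions: the Xcorr filter and the agreement between predicted and experimental retention times. In the Python sketch below, the 5 min ΔtR cut-off is taken from the definition of the first group; applying the same cut-off to all four groups and taking ΔtR as an absolute difference are assumptions, although both are consistent with the ΔtR ranges reported for each group. The names are hypothetical.

```python
from enum import Enum


class Group(Enum):
    CORRECT = "correctly identified"
    INCORRECT = "incorrectly identified"
    POTENTIAL_FALSE_POSITIVE = "potential false positive"
    POTENTIAL_FALSE_NEGATIVE = "potential false negative"


def classify(passes_xcorr: bool, t_exp: float, t_pred: float, max_delta_t: float = 5.0) -> Group:
    """Assign a peptide to one of four groups from its Xcorr decision and ΔtR (min)."""
    delta_t = abs(t_exp - t_pred)
    rt_agrees = delta_t < max_delta_t
    if passes_xcorr and rt_agrees:
        return Group.CORRECT
    if passes_xcorr:
        return Group.POTENTIAL_FALSE_POSITIVE  # good MS match, retention time disagrees
    if rt_agrees:
        return Group.POTENTIAL_FALSE_NEGATIVE  # weak MS match, retention time agrees
    return Group.INCORRECT


if __name__ == "__main__":
    # Hypothetical peptides: (passes Xcorr filter, tR exp, tR pred) in minutes.
    for case in [(True, 30.0, 28.0), (True, 30.0, 38.0), (False, 30.0, 33.5), (False, 10.0, 60.0)]:
        print(case, classify(*case).value)
```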
The ΔprpE strain was characterized by a large number of various peptides derived from alpha-, beta- and gamma-type SASPs. Peptides characteristic of SASPs were also detected for which it was not possible to determine unambiguously which SASP they came from. Only alpha- and beta-type SASPs were identified with a satisfactory level of confidence according to both Xcorr scores and ΔtR values. The incorrectly identified group included peptides typical of alpha/beta-type SASPs, gamma-type SASP, SspI and Ssp Tlp. The group of potential false positives contained peptides characteristic of gamma-type SASP.
Although the MS evidence of their presence in the sample was good, their ΔtR values were high, which may suggest that they were not present. One peptide of beta-type SASP and two peptides assigned to SASPs in general had low ΔtR values, but their MS identification was insufficient at the applied threshold level. This may mean that these peptides were wrongly classified as improperly identified on the basis of their Xcorr scores, and that the proteins for which they are typical were actually present in the analyzed sample (Table 1). In contrast, fewer SASP peptides were detected in the standard strain 168. Moreover, only two peptides, from alpha- and beta-type SASPs, were identified correctly according to both Xcorr scores and ΔtR values. Most of the peptides characteristic of alpha/beta-, beta- and gamma-type SASPs were incorrectly identified considering their Xcorr and ΔtR values; hence, they were considered absent from the analyzed sample. There was only one potential false negative, a peptide with a sequence typical of SASPs: although its Xcorr was low, its ΔtR value indicated correct identification. No potential false positives were found in this strain (Table 2).

Conclusions
The proteomic approach applied in this study made it possible to analyze the whole protein content at once, which facilitated the comparison of the two B. subtilis strains, the standard 168 and the strain lacking PrpE phosphatase (ΔprpE), with respect to the occurrence of small acid soluble spore proteins (SASPs).
The QSRR-based retention time prediction model proved to be a useful tool supporting the MS/MS ion search in the analysis of small acid soluble proteins (SASPs) from two Bacillus subtilis strains. The strain lacking PrpE phosphatase was characterized by the occurrence of alpha- and beta-type SASPs, whose identification confidence was supported by both Sequest Xcorr values and small ΔtR values. Gamma-type SASP was identified in the ΔprpE strain according to Xcorr, but its ΔtR values suggest that it may be a false positive identification. The wild-type 168 strain had a low SASP content and, moreover, these proteins were identified on the basis of single peptides; hence, according to proteomic standards, they may not actually be present. This suggests that the absence of PrpE phosphatase results in a higher amount of SASPs, especially alpha- and beta-type SASPs, in the spores.