MALDI/MS peptide mass fingerprinting for proteome analysis: identification of hydrophobic proteins attached to eucaryote keratinocyte cytoplasmic membrane using different matrices in concert

Background: MALDI-TOF-MS has become an important analytical tool for identifying proteins and evaluating their role in biological processes. A typical protocol consists of sample purification, separation of proteins by 2D-PAGE, enzymatic digestion and identification of proteins by peptide mass fingerprint. Unfortunately, this approach is not appropriate for the identification of membrane proteins or proteins of low or high pI. An alternative technique uses 1D-PAGE, which results in a mixture of proteins in each gel band. The direct analysis of the proteolytic digest of this mixture is often problematic because of poor peptide detection and consequently poor sequence coverage in database searches. Sequence coverage can be improved through the combination of several matrices.
Results: The aim of this study was to make the MALDI analysis of complex biological samples more reliable, in order to identify proteins that interact with the membrane network of keratinocytes. Peptides obtained from tryptic digestion of proteins may have either hydrophobic or hydrophilic sections, in which case the direct analysis of such a mixture by MALDI does not desorb all peptides. In this work, MALDI/MS experiments were therefore performed using four different matrices in concert. The data were analysed with three algorithms in order to test each of them. We observed that the use of at least two matrices in concert leads to a twofold increase in the coverage of each protein. Considering the data obtained in this study, we recommend the use of HCCA in concert with the SA matrix to obtain good coverage of hydrophilic proteins, and DHB in concert with the SA matrix to obtain good coverage of hydrophobic proteins.
Conclusion: In this work, experiments were performed directly on complex biological samples, in order to compare different matrices systematically on real-life samples and to show a correlation that will be applicable to similar studies. When a 1D gel is needed, each band may contain a great number of proteins, each present in small amounts. To improve protein coverage, we performed experiments with several matrices in concert. These experiments enabled reliable identification of proteins without the use of nanospray MS/MS experiments.


Background
One of mass spectrometry's major concerns is the acquisition of complete sequence information from biopolymers such as proteins. The development of both matrix-assisted laser desorption-ionisation (MALDI) [1] and electrospray ionisation (ESI) [2] has contributed significantly to reaching this goal. Proteomics focuses on the identification of a large number of proteins from cellular extracts. Biological samples normally contain numerous proteins, and the complexity of these materials requires separation steps. The most common method involves the following steps: separation of proteins from a given biological sample by gel electrophoresis [3], excision of spots from the gel and digestion by trypsin [4], and extraction from the gel and analysis by mass spectrometry [5]. Protein identification is achieved by comparing the mass fingerprint of the resulting peptides with those derived from proteins in databases. Membrane proteins and proteins of low or high pI cannot be separated efficiently by two-dimensional (2D) gel electrophoresis [6]. In such cases, separation can only be achieved by one-dimensional (1D) gel electrophoresis, leading to more than one protein per gel band. In a direct analysis of the mixture of peptides generated from digestion of such proteins, suppression effects can occur. This phenomenon is attributed to matrix effects in MALDI [7] or solvent effects in ESI. In a mixture containing a large number of peptides, peaks for all components are usually not observed, which leads to poor coverage of the protein sequence. One explanation could be that the presence of one given peptide suppresses the response of another. Therefore, the choice of an adequate matrix plays an important role in peptide desorption. Some matrices can be complementary in achieving high protein sequence coverage. Such is the case for low-mass peptide ions compared to higher-mass peptides, or for hydrophobic compared to hydrophilic peptides.
Several methods have been employed to enhance the quality of the mass spectra and the number of peptides desorbed from a single sample. Various sample preparation methods have been tested [8], and some studies have reported the use of new acidic matrices [9] (with a pyruvic acid function instead of benzoic acid) or basic matrices [10]. Furthermore, additives such as ammonium salts [11] or acids have been added to MALDI samples, and co-matrices have been used, which may provide the most general and simplest means of improving the current matrix systems. These approaches serve different purposes: (1) increase the homogeneity of the matrix/analyte deposit, (2) decrease the levels of cationisation, (3) increase ion yields, and (4) increase sample-to-sample reproducibility. Several studies comparing different matrices have been published, and some recommendations for peptide mapping have been established: HCCA is a good matrix for peptides with masses below 2500 Da [7,12], whereas SA is recommended for higher masses (> 2500 Da) [13]. DHB [8,14,15] or 3-HPA [16] is recommended for hydrophobic peptides or peptides that are difficult to ionise, such as glyco- or phosphopeptides. In this work, we present new experimental evidence that some peptides that are not desorbed from one matrix are desorbed from another. To our knowledge, this is the first study in which the combination of different matrices in the analysis of a complex biological sample (e.g. a protein extract from a keratinocyte cell line separated by 1D SDS-PAGE) increased the coverage for protein identification. On the basis of these results, we recommend the use of at least two different matrices when performing MALDI experiments, particularly when the ESI response is weak because of low ionisation efficiency.

Matrix and preparation selections
There is no universal sample preparation or matrix that yields good results for a broad variety of peptides. Some reports [7,8,13,16] have described criteria for sample preparation and matrix selection for peptide analysis. They also emphasise that experimental conditions must be optimised. When the sample to be analysed is very small, the problem becomes critical because no optimisation is possible. After extraction from the gel and desalting of the peptides, the sample is usually dissolved in a very small volume (3 µl) and all experiments must be performed on it. Here we propose that MALDI/MS experiments be performed with at least two different matrices using the dried-droplet sample preparation. Three major deposition techniques exist [8]: the dried-droplet method [17], the thin-layer method, and the sandwich method. We chose the dried-droplet method, which is the simplest and can be used with almost all matrices. The following conventional matrices were used: α-cyano-4-hydroxycinnamic acid (HCCA) [18], 2,5-dihydroxybenzoic acid (DHB) [19] and sinapinic acid (SA) [20]. In order to detect the most hydrophobic peptides or proteins, a mixture of three matrices (2,5-dihydroxybenzoic acid (DHB), 2-hydroxy-5-methoxybenzoic acid (DHBs) and succinic acid [21]) was also tested. Each experiment was performed on three samples extracted from the 1D gel, corresponding approximately to molecular weights ranging from 51 to 61 kDa.

Choice of parameters to identify proteins by peptide mass fingerprint using three database search programs
The peptide mass fingerprint of a protein is believed to be specific enough to identify the protein solely by comparing the measured peptide masses with those calculated by applying the corresponding enzyme cleavage rules, using an appropriate scoring algorithm. Three major search algorithms are freely available on the web: Protein Prospector, Mascot and Profound. We decided to use all three to ensure the identification of proteins in each 1D gel band. As several proteins are assumed to be present within each band, the use of an algorithm able to identify mixtures of proteins (Mascot or Profound) is recommended, even though Protein Prospector can generate an exhaustive list of candidate proteins. Monoisotopic peptide masses were searched against the SWISS-PROT database with a mass tolerance of 20 ppm for the bands at 53 and 51 kDa and 50 ppm for the band at 61 kDa. Where possible, for instance after an internal calibration with trypsin autodigestion peptides, a mass tolerance of 20 ppm is recommended in order to identify proteins reliably. After this step, it is possible to broaden the search to 50 ppm in order to increase the coverage of identified proteins. Table 1 (see Additional file 1) displays the list of proteins identified in the three bands by these programs from the ions observed in the spectra generated for each matrix (Figure 1). The non-redundant ions of the four lists were joined in order to create a mixed list. The major problem was the large number of peptide ion masses able to match proteins that are not present. Care must be taken to avoid false identifications. After a database search at 50 ppm, the matched peptide ions must represent more than 50% of the entire list of peptide ions. In our case, this requirement leads to the selection of peptide ions with a relative intensity higher than approximately 10% above the baseline (for instance, 11% for peptides arising from the digest of the 53 kDa band).
For instance, the number of matched peptide ions generated from the digest of the 53 kDa band was 27 of 37 (73%) with DHB (Figure 1a). Table 1 (see Additional file 1) shows that the identification of proteins present in each band is consistent across the three programs. High scores are obtained, with values greater than 60 with Mascot (minimum limit for identity), a probability of 1 or 0.99 and "Est'dZ" values above 1.65 with Profound (minimum limits for identity), and MOWSE scores above 2000 with Protein Prospector (no minimum limit given, and no possibility to identify a mixture of proteins). The comparison of the results from the three algorithms suggests that higher confidence is obtained with the Profound algorithm, because two parameters are evaluated: probability and score (Est'dZ). Furthermore, this algorithm seems to be the most appropriate for detecting proteins in a mixture. We therefore recommend starting the analysis with the Protein Prospector algorithm to generate an exhaustive list of candidate proteins, followed by an analysis with the Profound program.
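The matching rules described in this section (a ppm mass tolerance, a mixed list of non-redundant ions, and the requirement that more than 50% of the ions be matched) can be illustrated with a minimal Python sketch. This is not the code of Mascot, Profound or Protein Prospector; the function names, the choice to keep the first of two redundant ions, and the peak values are assumptions made for illustration only.

```python
def ppm_error(observed, theoretical):
    """Signed mass error of an observed m/z value, in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def merge_ion_lists(ion_lists, tol_ppm=20.0):
    """Join several peak lists into one sorted, non-redundant list.

    Two ions closer than tol_ppm are treated as the same peak and the
    lower value is kept (an assumption; averaging would also be reasonable).
    """
    merged = []
    for mz in sorted(mz for ions in ion_lists for mz in ions):
        if not merged or abs(ppm_error(mz, merged[-1])) > tol_ppm:
            merged.append(mz)
    return merged


def match_ions(observed, theoretical, tol_ppm=20.0):
    """Observed ions lying within tol_ppm of any theoretical peptide mass."""
    return [mz for mz in observed
            if any(abs(ppm_error(mz, t)) <= tol_ppm for t in theoretical)]


def passes_match_rule(observed, theoretical, tol_ppm=50.0, min_fraction=0.5):
    """True if more than min_fraction of the observed ions are matched."""
    if not observed:
        return False
    matched = match_ions(observed, theoretical, tol_ppm)
    return len(matched) / len(observed) > min_fraction


# Hypothetical peak lists from two matrix preparations (illustrative values).
hcca_ions = [1104.57, 1500.00]
sa_ions = [1104.58, 2000.00]
mixed = merge_ion_lists([hcca_ions, sa_ions])
print(mixed)  # [1104.57, 1500.0, 2000.0]
```

With the figures quoted above, 27 matched ions out of 37 give a matched fraction of about 73%, which satisfies the 50% rule.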

Evidence of Matrix effects on protein identification by peptide mass fingerprint
Striking differences in the MALDI spectra of peptide solutions were observed depending on the matrix solution used. An example is illustrated in Figure 1, which compares mass spectra of the 53 kDa sample prepared in four different matrices using the dried-droplet method [17]. Figure 1a shows the spectrum of the digest obtained by mixing the tryptic digest with a matrix solution of 60 µg/µl DHB in methanol. This unconventional matrix concentration had previously been optimised for a digest of bovine serum albumin. Figure 1b shows the spectrum of the digest obtained from a mixture of three matrices (DHBsT) [21] consisting of a 50/45 formic acid]. This spectrum was noisier than the one obtained with DHB, and almost all peptide ions were common to both experiments. Figure 1c shows the spectrum of the digest obtained from a matrix solution of 10 µg/µl HCCA in methanol. In this case, fewer low-mass peptide ions were desorbed than in the spectrum obtained with DHB [7,12]. However, the intensity of some ions was enhanced, for example that of the ion at m/z 1104.57 (SAYGGPVGAGIR of cytokeratin 7). When a matrix solution of 10 µg/µl SA in 30/50 (v/v) CH3CN/1% formic acid was used (Figure 1d), the resulting spectrum showed fewer peptide ions, but with higher masses [13]. Figure 1 clearly demonstrates that peptide ions were desorbed differently according to the matrix used.
The analysis of Table 1 (see Additional file 1) leads to the following remarks: (1) the identification of proteins present in each band is consistent across the four matrices; (2) experiments with DHB and HCCA always led to interpretable spectra, whereas DHBsT (51 kDa band) and SA (61 kDa band) did not; (3) the mixed list of ions generated from the four experiments always leads to a greater number of matched peptides, higher scores and better coverage of proteins. For instance, for the band at 53 kDa, the coverage of cytokeratin 8 (obtained with the Profound algorithm) is 14% with the list of ions obtained from the experiment in the SA matrix and 28% with the mixed list; (4) the mixed list of ions leads to a greater number of matching proteins. Thus, for the band at 53 kDa, only the mixed list allows the identification of tubulin beta 5 or beta 2 in the sample.

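[Figure 1: MALDI mass spectra of the tryptic digest of the 53 kDa band prepared in four matrices: a) DHB, b) DHBsT, c) HCCA, d) SA]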
The analysis of the 1D gel bands allowed us to identify a mixture of three cytokeratins (5, 1 and 9) in the band at 61 kDa, with respective coverages of 15, 18 and 20% with the HCCA sample preparation, and 23, 20 and 22% from the mixed list generated with the four matrices (50 ppm). A mixture of three proteins was identified in the band at 53 kDa: cytokeratins 8 and 7, and tubulin beta 5 or beta 2, with respective coverages of 12, 0 and 0% with the HCCA sample preparation, and 28, 26 and 13% from the mixed list generated with the four matrices (20 ppm). The matched peptide ions did not allow us to discriminate between tubulin beta 5 and tubulin beta 2. In the band at 51 kDa, a mixture of at least two proteins was identified, containing tubulin beta 2 and/or beta 1 and tubulin alpha 1 and/or alpha 6, with respective coverages of 11 and 17% with the SA sample preparation, and 23 and 20% from the mixed list generated with the four matrices (20 ppm).
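The coverage figures quoted above are the percentage of a protein's residues that lie in at least one matched peptide. A minimal Python sketch of that calculation follows. The 25-residue protein is a made-up toy sequence (only the peptide SAYGGPVGAGIR is taken from the text), and the function name is hypothetical; the search programs used in this study may differ in detail.

```python
def sequence_coverage(protein, peptides):
    """Percentage of protein residues covered by at least one matched peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            # Mark every residue of this occurrence as covered.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Toy sequence, NOT a real protein; used only to show the arithmetic.
toy_protein = "AAAKSAYGGPVGAGIRCCCKDDDDR"
matrix_a = ["SAYGGPVGAGIR"]   # peptides matched with one matrix (hypothetical)
matrix_b = ["DDDDR"]          # peptides matched with another matrix (hypothetical)

print(sequence_coverage(toy_protein, matrix_a))             # 48.0
print(sequence_coverage(toy_protein, matrix_b))             # 20.0
print(sequence_coverage(toy_protein, matrix_a + matrix_b))  # 68.0
```

Because covered residues are counted once, overlapping or repeated peptides do not inflate the result, and a mixed list can only raise coverage when it contributes peptides from regions not already matched.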

Analysis of matched peptides based on hydrophobicity
In the literature, the HCCA matrix is usually recommended for the analysis of tryptic digests [7,12] and the DHB matrix when the tryptic digest is generated from hydrophobic proteins [8,14,15], while SA has been used more efficiently to detect high-mass peptide ions [13]. In order to validate these established results on a complex biological sample, we calculated two hydrophobicity parameters, "LogP" and "Gravy", for each matched peptide in our experiments. The "LogP" value [22,23] is the logarithm of the partition coefficient between the n-octanol and water phases. The "Gravy" parameter [24] is the grand average of hydropathicity of a peptide. The common hydropathy index, defined at one specific position in a sequence, is the mean value of the hydrophobicity (tendency to avoid water) of the amino acids within a window, usually 19 residues long, around each position. In transmembrane helices, the hydropathy index is high for a number of consecutive positions in the sequence. The "Gravy" value is the average of the hydropathy index over all positions. Table 2 (see Additional file 2) shows the matched peptides in the four matrix preparations for the three bands and the calculated values of LogP and Gravy. For both parameters, a negative value indicates a hydrophilic peptide and a positive value a hydrophobic peptide. Table 2 (see Additional file 2) shows that DHBsT is the "matrix" that desorbs the highest number of peptide ions. A homogeneous spot was difficult to obtain with this "matrix", which is a mixture of three matrices (DHB, DHBs and succinic acid). This may explain why no results were obtained from the 51 kDa sample. HCCA and DHB also allow the detection of a relatively high number of peptide ions. Concerning the hydrophobicity parameters, Table 2 (see Additional file 2) shows that tubulin peptides are more hydrophobic than cytokeratin peptides.
However, the two parameters do not always agree: a positive LogP value (hydrophobic peptide) may correspond to a negative Gravy value (hydrophilic peptide). One example is the peptide NSSYFVEWIPNNVK (M = 1695.815) (band at 51 kDa, tubulin beta 2), which has a LogP value of +6.18 and a Gravy value of -0.54. We therefore discuss hydrophobicity and mass range only for peptides whose LogP and Gravy values are both positive or both negative (bold and underlined in Table 2 (see Additional file 2), respectively). We also considered peptides with very different values of LogP and Gravy, for instance the peptide VGINYQPPTVVPGGDLAK (M = 1823.978), with a LogP value of +11.07 and a Gravy value of +0.02 (band at 51 kDa, tubulin alpha 1 or alpha 6). The LogP values of the matrices are -0.85 for succinic acid, +0.41 for HCCA, +0.65 for DHB, +1.26 for DHBs and +2.34 for SA (calculated online at http://www.unibas.ch/mdpi/ecsoc/e0002/logpcalc.htm). Highly hydrophobic matrices are known to be more efficient at detecting highly hydrophobic peptides. Concerning cytokeratins 1, 9 and 8, for which only hydrophilic matched peptides were generated by tryptic digestion, Table 2 (see Additional file 2) shows that the use of HCCA in concert with SA was needed in order to obtain good protein coverage. For instance, we observed suppression effects for ions generated from the tryptic digest of cytokeratin 9 (band at 61 kDa) in the DHB matrix when compared to the HCCA matrix. HCCA is used to detect low-mass peptide ions, whereas SA desorbs high-mass peptide ions in spite of its relatively high hydrophobicity. However, some particular samples may not give results with SA (the sample at 61 kDa, for instance). For the other identified proteins, which have both hydrophilic and hydrophobic matched peptides, the use of DHB in concert with SA gave the highest coverage.
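The Gravy values quoted in this section can be reproduced with a short Python sketch, assuming that the parameter is the sum of the Kyte-Doolittle hydropathy values of all residues divided by the peptide length (the usual GRAVY definition). The helper names are hypothetical. LogP is not computed here; the LogP values below are simply those given in the text.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def same_sign(logp, gravy_value):
    """True when LogP and Gravy agree on hydrophobic (+) or hydrophilic (-)."""
    return (logp > 0 and gravy_value > 0) or (logp < 0 and gravy_value < 0)


# Peptides and LogP values taken from the text above.
for peptide, logp in [("NSSYFVEWIPNNVK", 6.18), ("VGINYQPPTVVPGGDLAK", 11.07)]:
    g = gravy(peptide)
    print(peptide, round(g, 2), same_sign(logp, g))
```

Under this assumption the sketch gives -0.54 for NSSYFVEWIPNNVK and +0.02 for VGINYQPPTVVPGGDLAK, in line with the values reported above; the first peptide is a case where the two parameters disagree in sign.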

Conclusions
The direct analysis of a mixture of peptides resulting from proteolytic digestion is often a problem because of poor peptide detection and consequently weak sequence coverage in database searches. Furthermore, peptide identification with MALDI is also subject to signal suppression effects that occur in complex mixtures of peptides. This behaviour, which is enhanced when membrane peptides or peptides of low or high pI are concerned, is related to the hydrophobic/hydrophilic properties of the matrix. A solution to enhance sequence coverage lies in the combination of several matrices. Our goal was to make MALDI analysis of complex biological samples more reliable by increasing the number of matched peptides, and hence the coverage, thereby increasing the confidence in the identification of proteins.
In this study, we used four different matrices in concert to improve the identification of proteins. Our aim was to enhance the sequence coverage of proteins obtained from the characterisation of overlapping peptides. Our results show that the use of at least two matrices in concert allows a significant increase in the number of peptides identified per protein. The twofold increase in sequence coverage improves the confidence of the assignment. From the analysis of desorbed peptides based on mass range and hydrophobicity (LogP and Gravy values), we recommend the use of HCCA in concert with SA in order to obtain good coverage of hydrophilic proteins, while the coverage of hydrophobic proteins can be increased by the use of DHB in concert with SA. The risk of obtaining no results from the SA preparation can be avoided by using HCCA in concert with DHB, bearing in mind that high-mass peptide ions will not be efficiently desorbed. The comparison of the results generated by the three algorithms suggests that higher confidence is obtained with the Profound algorithm, because two parameters are evaluated: probability and score (Est'dZ). Furthermore, this is the most appropriate algorithm for detecting proteins in a mixture. We thus recommend the use of the Protein Prospector algorithm to generate an exhaustive list of candidate proteins, followed by analysis with the Profound software. The main motivation for this study was to determine a reliable protocol for identifying proteins in complex biological samples. Our results show that the identification of hydrophobic proteins can be improved with this protocol, which is not sample-consuming (1.5 µl of tryptic digest) and can be used even if the use of nanospray MS/MS remains possible for some peptides. Indeed, after the desalting procedure, samples were generally diluted in 3 µl of solution and only 1.5 µl was enough to perform MS/MS.
Work is currently in progress to analyse the other bands of the gel, using our protocol for hydrophobic protein identification.

Chemicals
2,5-dihydroxybenzoic acid (DHB), α-cyano-4-hydroxycinnamic acid (HCCA), sinapinic acid (SA), succinic acid and 2-hydroxy-5-methoxybenzoic acid (DHBs) were purchased from Sigma-Aldrich. Acetonitrile (ACN) and formic acid were purchased from Prolabo and used without further purification. Water was of Milli-Q grade.

Database searches were performed (1) with the ion list from each experiment and (2) with a mixed list containing non-redundant ions. Monoisotopic peptide masses were matched against the SWISS-PROT non-redundant database using a 20 or 50 ppm mass tolerance, limited to Homo sapiens proteins and with a minimum of 5 matched peptides (with Protein Prospector). The protein molecular mass was set to ± 50% of the molecular mass determined by 1D SDS-PAGE. Alkylation of cysteines by acrylamide was considered, but no missed cleavage was allowed.
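The search settings above (tryptic cleavage with no missed cleavage, cysteines alkylated by acrylamide, monoisotopic masses) can be illustrated by a small in-silico digestion in Python. This is a sketch under stated assumptions, not the procedure implemented by the search programs: trypsin is assumed to cleave after K or R except before P, the acrylamide adduct is taken as propionamide (+71.03711 Da per cysteine), the function names are hypothetical, and the 25-residue sequence is a toy example in which only SAYGGPVGAGIR comes from the text.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056         # H2O added to the summed residue masses
PROTON = 1.00728         # charge carrier for the singly protonated ion
PROPIONAMIDE = 71.03711  # assumed acrylamide adduct on each cysteine


def tryptic_peptides(protein):
    """Cleave after K or R, but not before P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_plus(peptide, alkylated_cys=True):
    """Monoisotopic m/z of the singly protonated peptide, [M+H]+."""
    mass = sum(MONO[aa] for aa in peptide) + WATER + PROTON
    if alkylated_cys:
        mass += PROPIONAMIDE * peptide.count("C")
    return mass


# Toy sequence, NOT a real protein.
for pep in tryptic_peptides("AAAKSAYGGPVGAGIRCCCKDDDDR"):
    print(pep, round(mh_plus(pep), 2))
```

For SAYGGPVGAGIR the sketch gives an [M+H]+ value of about 1104.58, close to the ion at m/z 1104.57 discussed in the text. Note that the zero-width `re.split` pattern requires Python 3.7 or later.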