Optimization and evaluation of surface-enhanced laser-desorption/ionization time-of-flight mass spectrometry for protein profiling of cerebrospinal fluid

Cerebrospinal fluid (CSF) potentially carries an archive of peptides and small proteins relevant to pathological processes in the central nervous system (CNS) and surrounding brain tissue. Proteomics is especially well suited for the discovery of biomarkers of diagnostic potential in CSF for early diagnosis and discrimination of several neurodegenerative diseases. ProteinChip surface-enhanced laser-desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) is one such approach, offering a platform for high-throughput profiling of peptides and small proteins in CSF. In this study, we evaluated methodologies for the retention of CSF proteins < 20 kDa in size, and identified a strategy for screening small proteins and peptides in CSF. ProteinChip array types, sample and binding buffer conditions, and matrices were investigated. By coupling the processing of arrays to a liquid handler, reproducible and reliable profiles, with mean peak coefficients of variation < 20%, were achieved for intra- and inter-assays under selected conditions. Based on peak m/z, we found a high degree of overlap between the tested array surfaces. The combination of CM10 and IMAC30 arrays was sufficient to represent 80-90% of all assigned peaks when using either sinapinic acid (SPA) or α-cyano-4-hydroxycinnamic acid (CHCA) as the energy-absorbing matrix. Moreover, arrays processed with SPA consistently showed better peak resolution and higher peak number across all surfaces within the measured mass range. We intend to use CM10 and IMAC30 arrays prepared in sinapinic acid as a fast and cost-effective approach to drive decisions on sample selection prior to more in-depth discovery of diagnostic biomarkers in CSF using alternative but complementary proteomic strategies.


Background
Human cerebrospinal fluid (CSF) is largely produced by the highly vascular choroid plexus [1]. CSF continuously circulates through cavities in the brain and spinal cord and in the subarachnoid space, and contains peptides and proteins that play critical roles in many physiological processes [2]. Its proximity to the brain and the low risk involved in procuring CSF samples from individuals make it an appropriate source of protein biomarkers for neurodegenerative disease. CSF is in direct contact with the extracellular space of the brain, and so contains some proteins and other products of neural cell origin. As such, any variation in protein composition or abundance relative to normal CSF may reflect pathological processes in the surrounding brain tissue and other parts of the central nervous system (CNS) [1,3-6]. Monitoring a combination of biomarkers in CSF exhibiting high sensitivity and specificity offers the potential for aiding early stage diagnosis, when correct diagnosis is often difficult, and when therapeutic compounds have the greatest potential for being effective. Biomarkers could also be used as quantitative indices of disease progression and response to therapeutics, and for discriminating early or incipient Alzheimer's disease (AD) from age-associated memory impairment, depression, and some secondary dementias.
Similar to plasma, the predominant proteins in CSF are isoforms of serum albumin, transferrin and immunoglobulins, which represent more than 70% of the total protein amount. Furthermore, CSF exhibits a high dynamic range of protein abundance, making the detection of lower abundance proteins extremely challenging with current analytical methods. An additional challenge with analyzing CSF is protein concentration. On average, CSF contains 100-fold less protein than plasma, therefore requiring larger sample amounts relative to plasma. A variety of proteomic approaches have recently been used to characterize the peptide and protein composition of CSF. Most of these proteomic technology platforms are centered on mass spectrometric techniques used in conjunction with other analytical techniques such as gel electrophoresis, isoelectric focusing, and liquid chromatography (LC) [3,7-11]. While these approaches provide a large amount of data and can identify hundreds of proteins, they are generally very time consuming and hence restrict the number of comparative samples that can be analyzed.
Surface-based enrichment in combination with MS, by contrast, offers a platform for high-throughput CSF protein profiling. ProteinChip surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) technology (Ciphergen Biosystems Inc., Fremont, CA) was developed to facilitate the high-throughput analysis of proteins in complex biological samples such as body fluids [12-17]. This technology uses chip-based protein sample arrays with different chromatographic surfaces designed to capture and retain subsets of proteins based on specific protein characteristics such as affinity, charge, hydrophobicity, and metal-binding capabilities. After a series of binding and washing steps on the chromatographic surfaces, matrix is added to the spots and the samples are analyzed by laser desorption/ionization-TOF-MS, generating mass/charge profiles of the applied sample [18]. Proteomic expression patterns derived from mass spectrometry have been put forward as potential biomarkers of clinical relevance [19-21]. Such spectral profiles can be compared to uncover patterns of differential abundance and aid in the identification of diagnostic patterns of disease and toxicity [13,15,22-25]. Moreover, surface-based enrichment approaches have the potential to capture and enrich for low abundance, low molecular weight species [12,17,22,26,27]. The low molecular weight region of CSF, comprising peptides and fragments of proteins, remains relatively unexplored and represents a potential treasure trove of histopathological information.
Although the ProteinChip SELDI-TOF approach is a straightforward, robust platform for high-throughput protein profiling, much has been discussed concerning its poor representation of the proteome, particularly for proteins above 20 kDa in mass. In this paper, however, we take advantage of the technology's potential for screening peptides and small proteins within the 2-20 kDa mass window, and its requirement of only small sample volumes for analysis. As with any technology, experimental procedures must be optimized and reproducible to ensure consistent data output. Therefore, the aim of this paper was to improve on existing methodologies to identify effective conditions of retention for profiling proteins in the low molecular weight region of the CSF proteome.

Assessment of ProteinChip array types, buffer conditions and matrix for profiling CSF
Four ProteinChips (CM10, Q10, H50 and IMAC30) were used for the detection of proteins present in CSF within the m/z range of 2.5-20 kDa. In order to identify effective conditions of protein retention, pooled human CSF samples were prepared under native, denatured, and denatured/reduced conditions, and analyzed on ProteinChip arrays processed using different buffer conditions and matrices. For each condition of retention the samples were processed in triplicate. The choice of conditions for profiling CSF was based on both the number and the quality of resolved peaks within the mass spectra. Tables 1 and 2 summarize the number of peaks automatically detected within the m/z range of 2.5-20 kDa across all tested conditions using either SPA or CHCA as the energy absorbing matrix. Peaks were required to have a signal-to-noise ratio of 3 or greater in order to be considered. Representative profiles obtained from the tested conditions on arrays processed with SPA and CHCA are shown in figures 1 and 2, respectively. Based on total peak count, denatured samples consistently demonstrated a higher number of resolved peaks when compared with samples prepared under native or denatured/reduced conditions. This was the case across all four ProteinChip arrays processed with either SPA or CHCA. When comparing peak count between CM10 and Q10 arrays, slightly better profiles were obtained with both array types processed in the absence of Triton X-100. In the case of IMAC30, surface activation with copper provided a higher peak number than activation with nickel, whereas with H50, higher peak numbers were obtained with 10% AcN/0.1% TFA as binding buffer compared to PBS. Typically for the analysis of proteins and peptides by MS, SPA is the matrix of choice for large proteins, whereas CHCA is the preferred matrix for peptides (< 4 kDa). Not surprisingly, arrays processed with SPA consistently showed better peak resolution and higher peak number across all surfaces within the measured mass range of 2.5 to 20 kDa (tables 1 and 2).
Figure 1 Representative SELDI-TOF MS spectra of pooled CSF sample obtained from all tested conditions using SPA as matrix. CSF was processed on CM10, Q10, H50 and IMAC30 ProteinChip arrays prepared under the following conditions:
Figure 2 Representative SELDI-TOF MS spectra of pooled CSF sample obtained from all tested conditions using CHCA as matrix. CSF was processed on CM10, Q10, H50 and IMAC30 ProteinChip arrays prepared under the same conditions as described in figure 1, but with CHCA as the energy absorbing matrix.

Assessment of peak overlaps between ProteinChip array types
Peak profiles for CSF samples prepared in denaturing buffer were compared across all four ProteinChip array types to assess the extent of peak overlap between the different array types. The assessment of peak overlaps was used to determine the optimal combination of array types, in terms of the number and resolution of peaks, to be adopted for a CSF profiling strategy. Figure 3 shows representative spectra of proteins retained on the four array types prepared with SPA. When assigning peak clusters across the different array types, peaks on different surfaces were assumed to be the same protein if their respective m/z were within 0.3%. Nevertheless, without assigned peak identities one can never be confident that peaks of similar mass observed on different array types represent the same protein. A Venn diagram representing peak counts, determined as unique or common across the array types, is shown in figure 4. A significant proportion of all detected peaks were found to be common to two or more array types. Approximately 8 and 12 peaks were common to all array types processed with SPA and CHCA, respectively, of which 6 peaks were common to all array types processed with both matrices (m/z 8740, 11956, 12055, 13996, 14122 and 14164). Overall, approximately 75 and 56 unique peaks (defined as present on only one surface) were detected across all four array types prepared with SPA and CHCA, respectively. The highest number of unique peaks was observed on CM10 and IMAC30 arrays. Moreover, the combination of CM10 and IMAC30 covered 89% and 79% of all detected peaks on SPA and CHCA, respectively.
Peaks with a signal-to-noise ratio of 3 or greater were assigned within the m/z range of 2.5-20 kDa. 5 µL of pooled CSF prepared under native, denaturing, and denaturing/reducing buffer was applied in triplicate to each condition tested on CM10, Q10, H50, and IMAC30 arrays.
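The 0.3% matching rule and the coverage calculation described in this section can be illustrated with a short Python sketch. This is not the clustering routine of the Ciphergen Express software: the greedy grouping of neighbouring peaks, the function names and the m/z lists are assumptions made for illustration, and the example values are invented rather than taken from figure 4.

```python
def same_peak(mz_a, mz_b, tol=0.003):
    """True if two m/z values differ by no more than tol (0.3%) of their mean."""
    return abs(mz_a - mz_b) <= tol * (mz_a + mz_b) / 2.0


def cluster_peaks(peaks_by_array, tol=0.003):
    """Group peaks from several array types into clusters of similar m/z.

    Peaks are sorted by m/z and each one joins the current cluster if it lies
    within tol of the previous member; otherwise it starts a new cluster.
    Note that this chaining can let a wide cluster span more than tol overall.
    """
    labelled = sorted(
        (mz, name) for name, mzs in peaks_by_array.items() for mz in mzs
    )
    clusters = []
    for mz, name in labelled:
        if clusters and same_peak(clusters[-1]["mzs"][-1], mz, tol):
            clusters[-1]["mzs"].append(mz)
            clusters[-1]["arrays"].add(name)
        else:
            clusters.append({"mzs": [mz], "arrays": {name}})
    return clusters


def coverage(clusters, chosen_arrays):
    """Fraction of all clusters detected on at least one of the chosen arrays."""
    chosen = set(chosen_arrays)
    hits = sum(1 for c in clusters if c["arrays"] & chosen)
    return hits / len(clusters)


if __name__ == "__main__":
    # Hypothetical peak lists (m/z), one per array type.
    example = {
        "CM10": [4300.0, 8740.0, 11956.0],
        "Q10": [8745.0, 13996.0],
        "H50": [6200.0, 8738.0],
        "IMAC30": [11950.0, 13990.0, 17000.0],
    }
    clusters = cluster_peaks(example)
    unique = [c for c in clusters if len(c["arrays"]) == 1]
    print(len(clusters), "clusters,", len(unique), "unique to one array")
    print("CM10 + IMAC30 coverage:", coverage(clusters, ["CM10", "IMAC30"]))
```

Counting clusters rather than raw peaks keeps a species detected on several surfaces from being counted more than once when the coverage of an array combination is computed.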

Assessment of spectral reproducibility
For comparative studies, reliable and reproducible protein profiles must be obtained to ensure that the variation in spectra reflects biological differences in protein concentration rather than systematic variability. As such, accurate mass peak heights are necessary, and the technical variation of the profiles must be known. In order to increase the reliability of the approach, we adapted the entire processing of arrays to a robotics system for consistency. Intra-chip reproducibility was assessed using 6 technical replicates of a pooled CSF sample spotted in equal volumes (5 µL) on four individual CM10 chips. A total of 30 peaks with a signal-to-noise ratio > 3, and common to all spectra, were randomly selected and compared with regard to their normalized peak intensities by calculating the coefficient of variation (CV) within each chip. The mean CV for intra-chip variability was 17%, ranging from 7-34% across individual peaks. To evaluate the inter-chip variability, pooled CSF sample was randomly placed on a single spot across each of twelve different CM10 chips on one bioprocessor plate. This was repeated using a second bioprocessor plate processed the following day. A total of 37 peaks with a signal-to-noise ratio > 3, and common to all spectra, were randomly selected and compared with regard to their normalized peak intensities by calculating the CV within each bioprocessor. Mean peak CVs of 19% (ranging from 6-29%) and 23% (ranging from 7-47%) for normalized intensity were calculated for the two bioprocessors. The spectral profiles for the inter-chip assay are shown in figure 5. Overall, the CVs obtained for both the inter- and intra-chip assays indicate that the processing of protein chips across a bioprocessor using a robotic system is reliable and reproducible.
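As a minimal sketch of how per-peak CVs and their mean and range can be computed from normalized peak intensities, consider the following Python example. The use of the sample standard deviation is an assumption (the text does not state which estimator was used), the function names are hypothetical, and the intensities are invented for illustration.

```python
import statistics


def percent_cv(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


def summarize_cvs(intensity_by_peak):
    """Return per-peak CVs, their mean, and their (min, max) range.

    intensity_by_peak maps a peak m/z to its normalized intensities
    across replicate spots.
    """
    cvs = {mz: percent_cv(vals) for mz, vals in intensity_by_peak.items()}
    all_cvs = list(cvs.values())
    return cvs, statistics.fmean(all_cvs), (min(all_cvs), max(all_cvs))


if __name__ == "__main__":
    # Hypothetical normalized intensities for three peaks over six replicates.
    replicates = {
        8740.0: [12.1, 13.4, 11.8, 12.9, 14.0, 12.5],
        11956.0: [5.2, 4.1, 6.0, 5.5, 4.8, 5.1],
        13996.0: [30.5, 28.9, 33.2, 31.0, 27.5, 29.8],
    }
    per_peak, mean_cv, cv_range = summarize_cvs(replicates)
    for mz, cv in per_peak.items():
        print(f"m/z {mz:.0f}: CV = {cv:.1f}%")
    print(f"mean CV = {mean_cv:.1f}%, range = {cv_range[0]:.1f}-{cv_range[1]:.1f}%")
```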
Representative SELDI-TOF MS spectra of CSF in denaturing buffer on CM10, Q10, H50 and IMAC30 arrays
CSF sample was randomly placed on a single spot across each of twelve different CM10 chips on one bioprocessor plate. The same was repeated for the other plates 4 and 24 h later. Figure 6 shows the principal component analysis (PCA) results, with each data point representing the spectrum of one sample and color denoting the bioprocessor plate. The clustering of points by plate indicates the presence of a discernible systematic variability across plates, with the largest separation between clusters seen for the plate processed at 24 h. This variability is very important in the context of large studies, in which a large number of bioprocessor plates would be needed. Intelligent randomization procedures using technical replicates would need to be adopted in order to minimize any systematic bias introduced by plate and time of processing.
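The kind of plate-effect check described here can be sketched with a small PCA in Python. The analysis in this study was run in SIMCA-P; the NumPy implementation below is a stand-in, and the mean-centering with unit-variance scaling, the function name and the simulated plate offsets are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project spectra (rows) onto the leading principal components.

    X is a spectra-by-peaks matrix of normalized peak intensities. Columns
    are mean-centered and scaled to unit sample variance before the SVD.
    Returns the score matrix and the fraction of variance per component.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    sd = centered.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    Z = centered / sd
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]


if __name__ == "__main__":
    # Simulated data: 3 plates x 12 spectra x 45 peaks, with an invented
    # plate-specific offset standing in for systematic variability.
    rng = np.random.default_rng(0)
    plates = np.repeat([0, 1, 2], 12)
    offsets = rng.normal(scale=1.5, size=(3, 45))
    X = 10.0 + offsets[plates] + rng.normal(scale=1.0, size=(36, 45))

    scores, explained = pca_scores(X)
    for plate in range(3):
        centroid = scores[plates == plate].mean(axis=0)
        print(f"plate {plate}: score centroid = {np.round(centroid, 2)}")
    print("variance explained:", np.round(explained, 3))
```

If spectra from the same plate form separate groups in the score plot, as the plate centroids suggest in this simulated case, plate should be treated as a blocking factor when samples are randomized.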

Discussion
Surface-based enrichment approaches in combination with MS, such as the ProteinChip SELDI-TOF approach, have been developed to facilitate the high-throughput analysis of peptides and proteins in complex biological samples such as body fluids. ProteinChip SELDI-TOF technology allows for facile sample analysis since very small sample volumes can be directly applied to the ProteinChip array surfaces, and the process can be easily automated for high-throughput analysis. In the present study, we applied the ProteinChip SELDI-TOF approach coupled with an automated robotic sample preparation workstation as a strategy for potentially screening large numbers of CSF samples from clinical studies. In particular, we take advantage of the technology's high-throughput potential for screening proteins within the 2.5-20 kDa mass window.
To date, few examples exist in the literature describing the application of the ProteinChip SELDI-TOF approach for analyzing CSF [28-33], and none describes a comparative evaluation of CSF profiles across different conditions and arrays. To our knowledge, this paper is the first to describe a comparative investigation of experimental procedures for identifying effective and consistent conditions for retention of CSF proteins on different ProteinChip array types within the 2.5-20 kDa mass range. ProteinChip array types, sample and binding buffer conditions, and matrices were evaluated based on the number of resolved peaks exhibiting a signal-to-noise ratio of 3 or greater. We found that CSF prepared under denaturing conditions without reduction performed best on all ProteinChip arrays processed with either SPA or CHCA as the matrix. With respect to the selection of binding buffer for protein retention, buffer in the absence of Triton X-100 performed better on CM10 and Q10 arrays. For IMAC30 and H50, slightly higher peak numbers were obtained by surface activation with copper and by using 10% AcN/0.1% TFA as binding buffer, respectively. However, the type of matrix rather than the binding condition used on each chip surface appears to dictate the overall protein profile from each chip type. ProteinChip arrays prepared with SPA consistently showed better peak sharpness and higher peak number across all surfaces.
Figure 4 Venn diagram representing the overlap of peaks between ProteinChip array types. CSF prepared in denaturing buffer was processed on CM10, Q10, H50 and IMAC30 ProteinChip arrays using (a) SPA, and (b) CHCA. Peaks with a signal-to-noise ratio of 3 or greater, within the m/z range of 2.5-20 kDa, were considered. When assigning peak clusters across spectra, two peaks on different surfaces were assumed to be the same protein if their respective m/z were within 0.3%.
The reliability of the approach is underlined by the low CVs, which compare favorably to the CVs reported for other protein profiling approaches [34]. A major contributor to this was very likely the adaptation of the entire process of ProteinChip preparation to a robotics system. Nevertheless, we observed systematic variability across plates processed at different times. Therefore, to ensure that systematic bias is minimized, sample randomization procedures using technical replicates must be properly addressed when large studies are conducted.

The surface-based enrichment approach using ProteinChips is a rapid and straightforward tool for screening peptides and small proteins below 20 kDa from small sample volumes. However, the major shortcomings of this approach are the lack of peak identities and limited proteome coverage. In this study, only a small subset of species, out of potentially hundreds if not thousands of circulating molecules in CSF, is actually monitored as peaks when the starting material is unfractionated. Moreover, due to the high level of overlap in profiles between the tested array types, we observed that the combination of CM10 and IMAC30 was sufficient to represent 80-90% of all assigned peaks on the tested arrays. Ideally, the overlap in profiles between the array types would have been much lower, so as to increase proteome coverage.
CSF contains a tremendous array of molecules, spanning a concentration range of 10 orders of magnitude between the highest and lowest abundance proteins, of which only a handful (e.g. albumin) constitute up to 90% of the total protein concentration. Consequently, it is likely that the low abundance molecules of diagnostic potential will be competed out by high abundance non-informative molecules for binding on the solid phase. Indeed, most differences identified by this approach in body fluids have shown bias towards the high abundance molecules (present in the µg/mL to mg/mL concentration range), implying that the ProteinChip SELDI-TOF technology is probably not adequate for 'deep' proteome analysis [19]. Initial prefractionation of CSF by LC-based methods and immunodepletion of abundant proteins are thus likely obligatory steps for exploiting low abundance molecules of diagnostic potential [35,36]. At the same time, a balance must be found between analysis depth, speed, throughput, and sample requirements.
Figure 6 Assessment of systematic variability across bioprocessor plates using principal component analysis. Variability was evaluated across three bioprocessor plates processed 4 and 24 hours apart. Spectra were obtained from pooled CSF sample randomly placed on a single spot across each of twelve different CM10 chips on one bioprocessor plate. The same was repeated for the other plates 4 and 24 h later. Following baseline subtraction, normalization and spectra alignment, 45 peaks which appeared in all spectra were used for PCA. The PCA results were color coded for the three bioprocessor plates: blue, 0 h; red, 4 h; black, 24 h.
Further developments in analytical strategies for selective protein adsorption on solid supports, coupled to high mass accuracy and high resolution MS technology, are necessary before this approach can be used as a more comprehensive proteomic profiling tool in an automated and high-throughput fashion. A promising area of development is the use of combinatorial ligands for mining the proteome [38,39]. Libraries of potentially millions of discrete amino acid ligands synthesized on solid-phase beads have been created, in which, theoretically, there is a ligand for every protein, antibody, and peptide present in the starting material. It is envisaged that these beads impregnated with complex proteomes could capture equal quantities of each and all the peptides and proteins present in CSF, thus reducing the concentration difference. This could make proteomic approaches using ProteinChips more adapted to 'deep' proteome analysis and biomarker discovery. Another interesting approach was adopted by both Mehta [39] and Zhou [40] to profile the proportion of low molecular weight species bound to specific circulating carrier proteins. It was found that by selectively targeting high abundance proteins in serum for depletion, many other peptides and small proteins associated with these abundant proteins are concomitantly removed. By this targeted selection, the associated peptides and small proteins are enriched to detectable levels. Indeed, some of the species identified represented clinically relevant biomarkers, including prostate-specific antigen, which in healthy males is present at a concentration of 1 ng/mL. Examination of the low molecular weight species bound to specific carrier proteins may, therefore, allow for the detection and mining of diagnostic information.
In spite of the current technical shortcomings of the ProteinChip SELDI-TOF technology, one could envisage the use of surface-based enrichment approaches in combination with MS as a strictly high-throughput screening tool to drive decisions on sample selection prior to more in-depth discovery of diagnostic markers. For instance, the ProteinChip SELDI-TOF technology could be used as an upfront quality control step for screening large sample numbers obtained from multiple clinical centres. Multivariate analysis of the data sets would help reveal potential sample outliers arising from either sample handling or intrinsic patient variability. This would aid in the selection of a smaller sample subset for more in-depth comparative analysis using alternative proteomic platforms such as multidimensional LC-MS based strategies.

Conclusion
In conclusion, we have shown that the ProteinChip SELDI-TOF technology can provide a fast, robust, straightforward and reproducible profiling platform for measuring peaks in the low molecular mass range of the CSF proteome. We are currently examining the robustness of the profile across sample sets from healthy individuals and from patients with mild cognitive impairment, AD, and other dementias, in order to address the feasibility of the current platform, in combination with decision algorithms, to detect biomarker panels associated with the different pathological conditions, and as a screening tool for sample selection prior to more in-depth analysis.

Cerebrospinal fluid samples
Normal CSF samples obtained from consenting patients were provided by PrecisionMed Inc. (San Diego, CA). CSF samples were obtained by lumbar puncture as part of a routine clinical procedure. The samples were collected in polypropylene tubes and gently mixed to avoid gradient effects. The samples were centrifuged at 2000 × g for 10 min to remove cells and other insoluble material. Supernatants were frozen in aliquots and stored at -80°C until analysis.

Sample preparation
A standard pooled human CSF sample was used to evaluate the different conditions of retention on all ProteinChip arrays. CSF samples were prepared on four different ProteinChip array surfaces: cation-exchange (CM10), strong anion-exchange (Q10), metal-binding (IMAC30) and reverse phase (H50). All ProteinChip arrays were processed on the same day following the procedures recommended by the manufacturer. Binding buffers used for the different arrays were 100 mM ammonium acetate pH 4.0 (with or without 0.1% Triton X-100) for CM10; 100 mM Tris-HCl pH 9.0 (with or without 0.1% Triton X-100) for Q10; 100 mM Na phosphate, 500 mM NaCl pH 7.0 (activated with either 100 mM copper sulphate or 100 mM nickel sulphate hexahydrate) for IMAC30; and 10% acetonitrile (AcN), 0.1% trifluoroacetic acid (TFA) or phosphate buffered saline (PBS) for H50.
Preliminary experiments were first performed on ProteinChip arrays spotted with different CSF sample volumes prepared in different ratios of sample to binding buffer. Based on the overall number of detected peaks and their associated intensities, we determined 5 µL of CSF diluted 1:4 in the appropriate binding buffer to be the optimal amount for loading onto the ProteinChip arrays (data not shown). Subsequent experiments reported in this paper were performed using this CSF volume. In brief, 5 µL of CSF was diluted 1:1 in sample buffer preparations representing the following conditions: native (dH2O), denatured (9.5 M urea, 2% CHAPS, 50 mM Tris-HCl, pH 9.0) and denatured/reduced (9.5 M urea, 2% CHAPS, 50 mM Tris-HCl, 10 mM dithiothreitol, pH 9.0). Following 20 min incubation at 4°C, the CSF samples were added to separate spots on the array surface using a Biomek laboratory station (Beckman-Coulter, CA) modified to make use of a ProteinChip array bioprocessor (Ciphergen Biosystems Inc.). The samples on the arrays were diluted 1:4 in the appropriate ProteinChip array binding buffer. The bioprocessor was then centrifuged for 10 s at 1000 rpm, using an Eppendorf Centrifuge 5804 system, to remove any air bubbles. The arrays were incubated for 1 h at room temperature with gentle shaking. The ProteinChip arrays were washed twice with 50 µL binding buffer for 5 minutes with gentle shaking, followed by two washes with 150 µL distilled water for 1 minute to remove buffer salts. The bioprocessor was subsequently removed and the ProteinChip arrays were air-dried at 23°C for 15 minutes. Once dry, two 1 µL aliquots of a 50% saturated sinapinic acid (SPA; Ciphergen Biosystems Inc.) solution prepared in 50% acetonitrile and 0.5% TFA were added to each spot of the ProteinChip array. The arrays were allowed to air-dry before SELDI analysis. The same was repeated for ProteinChip arrays prepared with 50% saturated α-cyano-4-hydroxycinnamic acid (CHCA; Ciphergen Biosystems Inc.) solution prepared in 50% acetonitrile and 0.5% TFA. Each condition was analysed in triplicate. For storage, the spotted arrays were kept in the dark at room temperature.

Data acquisition and spectral processing
ProteinChip arrays were placed in the ProteinChip reader Series 4000 mass spectrometer (Ciphergen Biosystems Inc.) and mass spectra were acquired using settings optimized for the m/z range of 2.5-20 kDa. For each spot, around 175 shots were collected in positive ionization mode using a laser intensity set at 1,500 nJ. For SPA preparations a deflector setting of 1000 Da and an ion focus mass of 9000 Da were used, whereas for CHCA preparations a deflector setting of 500 Da and an ion focus mass of 3500 Da were used. The spectra were externally calibrated using the "All-In-One" peptide mass standard (Ciphergen Biosystems Inc.). The standards, ranging from 1-7 kDa, were prepared on NP20 ProteinChip arrays according to the manufacturer's recommendations. The ProteinChip reader was calibrated daily and we typically achieved mass accuracies within 150 ppm.
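For readers less used to ppm figures, the following Python sketch shows how a mass error in ppm can be computed and checked against a tolerance such as the 150 ppm quoted above. The function names and the example masses are hypothetical and do not correspond to the actual calibrant peptides.

```python
def ppm_error(measured_mz, reference_mz):
    """Signed mass error in parts per million relative to the reference."""
    return 1e6 * (measured_mz - reference_mz) / reference_mz


def within_tolerance(measured_mz, reference_mz, tol_ppm=150.0):
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(measured_mz, reference_mz)) <= tol_ppm


def tolerance_in_da(mz, tol_ppm=150.0):
    """Width of a ppm tolerance expressed in Da at a given m/z."""
    return mz * tol_ppm * 1e-6


if __name__ == "__main__":
    # Hypothetical calibrant: reference mass 5000.0, measured at 5000.5.
    print(f"error = {ppm_error(5000.5, 5000.0):.1f} ppm")
    print("within 150 ppm:", within_tolerance(5000.5, 5000.0))
    # A 150 ppm window corresponds to about 1.35 Da at m/z 9000.
    print(f"150 ppm at m/z 9000 = {tolerance_in_da(9000.0):.2f} Da")
```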
Spectra were analysed using Ciphergen Express software Version 3.0.5 (Ciphergen Biosystems Inc.). The baseline was subtracted (baseline smooth width of 25) and the spectral intensities were normalized by total ion current (TIC) to an external normalization coefficient of 0.2 within the mass range of 2.5 to 20 kDa. Automatic peak detection was performed using the following settings: noise calculation within the mass range of 2.5 to 20 kDa, 3 times the signal-to-noise ratio and 2 times the valley depth for the first pass, and 2 times the signal-to-noise ratio and 2 times the valley depth for the second pass. In addition to the automatic assignment of peaks, manual inspection of the spectra was conducted as a quality control step to ensure that all peaks were correctly labelled. When assigning peak clusters across spectra, two peaks on different surfaces were assumed to be the same protein if their respective m/z were within 0.3%. Principal component analysis was performed using the SIMCA-P statistical package (Umetrics AB, Sweden), and was used to reveal major variance structure and clustering.
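The processing steps above were carried out in Ciphergen Express, whose algorithms are not reproduced here. As a rough illustration of the two ideas involved (TIC normalization within a mass window and signal-to-noise based peak selection), the following Python sketch may help. Treating the normalization coefficient of 0.2 as a target mean intensity, estimating noise from the median absolute deviation, and using a single detection pass without a valley-depth criterion are all assumptions, as are the function names and the toy spectrum.

```python
import statistics


def tic_normalize(mz, intensity, lo=2500.0, hi=20000.0, target=0.2):
    """Rescale a spectrum so its mean intensity inside [lo, hi] equals target."""
    window = [y for x, y in zip(mz, intensity) if lo <= x <= hi]
    factor = target / (sum(window) / len(window))
    return [y * factor for y in intensity]


def estimate_noise(intensity):
    """Robust noise estimate: 1.4826 * median absolute deviation."""
    med = statistics.median(intensity)
    return 1.4826 * statistics.median(abs(y - med) for y in intensity)


def pick_peaks(mz, intensity, snr=3.0, lo=2500.0, hi=20000.0):
    """Local maxima in [lo, hi] rising at least snr * noise above the median."""
    med = statistics.median(intensity)
    noise = estimate_noise(intensity)
    peaks = []
    for i in range(1, len(mz) - 1):
        if not lo <= mz[i] <= hi:
            continue
        y = intensity[i]
        is_local_max = y > intensity[i - 1] and y >= intensity[i + 1]
        if is_local_max and noise > 0 and (y - med) >= snr * noise:
            peaks.append((mz[i], y))
    return peaks


if __name__ == "__main__":
    # Toy baseline-subtracted spectrum with one clear peak at m/z 3100.
    mz = [3000.0 + 10.0 * i for i in range(21)]
    raw = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 2.0, 10.0,
           2.0, 1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 1.1, 0.9]
    normalized = tic_normalize(mz, raw)
    for peak_mz, height in pick_peaks(mz, normalized):
        print(f"peak at m/z {peak_mz:.0f}, normalized intensity {height:.3f}")
```

Because both the peak height and the noise estimate scale with the same factor, the signal-to-noise test gives the same peak list before and after normalization; normalization matters for comparing intensities between spectra.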

Authors' contributions
NG participated in the design of the study and drafted the manuscript. SC carried out all the SELDI-TOF experiments. BGM participated in the design of the study and helped to draft the manuscript.