Identification of serum proteomic biomarkers for early porcine reproductive and respiratory syndrome (PRRS) infection

Background: Porcine reproductive and respiratory syndrome (PRRS) is one of the most significant swine diseases worldwide. Despite its relevance, serum biomarkers associated with early-onset viral infection, when clinical signs are not detectable and the disease is characterized by a weak anti-viral response and persistent infection, have not yet been identified. Surface-enhanced laser desorption ionization time of flight mass spectrometry (SELDI-TOF MS) is a reproducible, accurate, and simple method for the identification of biomarker proteins related to disease in serum. This work describes the SELDI-TOF MS analyses of sera from 60 PRRSV-positive and 60 PRRSV-negative (as determined by PCR) asymptomatic Large White piglets at weaning. Sera with comparable and low hemoglobin content (< 4.52 μg/mL) were separated into 6 fractions by anion-exchange chromatography, and protein profiles in the mass range 1–200 kDa were obtained with the CM10, IMAC30, and H50 surfaces.

Results: A total of 200 significant peaks (p < 0.05) were identified in the initial discovery phase of the study and 47 of them were confirmed in the validation phase. The majority of peaks (42) were up-regulated in PRRSV-positive piglets, while 5 were down-regulated. A panel of 14 discriminatory peaks identified in fraction 1 (pH = 9), on the surface CM10, and acquired at low focus mass provided a diagnostic serum protein profile that discriminated between PRRSV-positive and -negative piglets with a sensitivity and specificity of 77% and 73%, respectively.

Conclusions: SELDI-TOF MS profiling of sera from PRRSV-positive and PRRSV-negative asymptomatic piglets provided a proteomic signature with large-scale diagnostic potential for early identification of PRRSV infection in weaning piglets. Furthermore, SELDI-TOF protein markers represent a refined phenotype of PRRSV infection that might be useful for whole genome association studies.


Background
Porcine reproductive and respiratory syndrome (PRRS) is one of the most important infectious swine diseases throughout the world [1][2][3] and is still having, more than two decades after its emergence, major impacts on pig health and welfare (reviewed by [4]). The responsible agent is an enveloped, ca. 15 kb long positive-stranded RNA virus (PRRSV) that belongs to the Arteriviridae family [5] and that can cause late-term abortions in sows and respiratory symptoms and mortality in young or growing pigs. Once this virus has entered a herd it tends to remain present and active indefinitely causing severe economic losses and marketing problems due to high direct medication costs and considerable animal health costs needed to control secondary pathogens [6,7].
Pigs of all ages are susceptible to this highly infectious virus, which has been shown to be present in most pigs for the first 105 days post infection [8]. However clinical manifestations vary with physiological status and age [9], as the virus uses several immune evasion ways to complicate the ability of the host to respond to the infection process [4,10,11]. Weaning piglets, in particular, are likely to be exposed to the infection. Although PRRSV viraemia is often asymptomatic in these piglets, their productive performance is significantly decreased. Indeed, despite being sero-negative, persistently infected piglets still harbor PRRSV and have been shown to be a source of virus for susceptible animals [12].
SELDI-TOF MS analysis allows the comparison of protein profiles obtained from a large number of diverse biological samples by combining two principles: chromatographic retention on a chip surface on the basis of defined properties (e.g. charge, surface hydrophobicity, or biospecific interaction with ligands) and mass spectrometry. It is thus distinct from common non-selective techniques, such as two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and matrix-assisted laser desorption ionisation (MALDI) MS. SELDI-TOF MS has been widely used for diagnostic biomarker discovery and validation across studies in blood serum/plasma, particularly in cancer research (reviewed by [13]), but also to characterize and identify biomarkers associated with viral and other infectious diseases [14][15][16][17][18][19]. The protein signatures identified by SELDI-TOF MS analysis thus have many potential applications in animal health, including early diagnosis of diseases, prediction of disease states, as well as monitoring of disease progression, recovery, and response to vaccination. However, few reports have been published for livestock applications [19][20][21][22].
Current needs in veterinary medicine and animal husbandry include the identification of tools that allow the early warning of diseases, especially during the incubation periods and before the onset of clinical signs. Therefore, the objective of this study was to identify by SELDI-TOF MS a proteomic profile able to differentiate PRRSV-positive from PRRSV-negative weaning piglets raised in commercial farms and without clinical symptoms of the disease. We optimized the experimental conditions previously described [20] and validated 47 statistically significant discriminatory biomarkers. Among these, a combination of 14 biomarkers identified in F1 on CM10 at low focus mass correctly assigned the piglets to the PRRSV-positive or PRRSV-negative groups with sensitivity and specificity of 77% and 73%, respectively.

Results
To enable identification of medium-low abundant proteins, only samples with a total content of hemoglobin lower than 4.52 μg/mL were included in the study. Total hemoglobin absorbance and the resulting hemoglobin content were calculated for all the piglet sera in both discovery (n = 50) and validation (n = 70) phases of the study [Additional file 1: Table S1 and Additional file 2: Table S2, respectively].
Fractionation of the sera resulted in six different pH fractions: F1 = pH9, F2 = pH7, F3 = pH5, F4 = pH4, F5 = pH3, and F6 = organic solvent. The fractions F1, F4, and F6 were analyzed on the three surfaces CM10, IMAC30, and H50 at both low and high focus masses. Fractions F2 and F3 were excluded from further analyses because preliminary data with 3 serum samples showed that they still contained elevated quantities of abundant proteins (such as albumin) and that both the quality of the spectra and the number of signals detected were very low. Fraction F5 was excluded because no signals were detected.
The fractions F1, F4, and F6 on the surfaces CM10, IMAC30, and H50 showed generally good signal intensities and low coefficient of variation (CV) values (< 30%) in both the discovery and validation phases. Exceptions were fraction F1 on IMAC30 (analyzed at high focus mass) and H50 (both low and high focus masses), as well as fraction F4 on H50 (low focus mass), which were therefore excluded from further analyses.

Discovery phase
A total of 50 pig sera, 25 from PRRSV-positive and 25 from PRRSV-negative piglets were analyzed during the discovery phase of the study [Additional file 1: Table S1].
We found a total of 785 protein peaks in the sera of all samples (Table 1). The most represented pH fraction was F6 (n = 381), followed by F4 (n = 223), and F1 (n = 181). On surface CM10 we identified 317 peaks, on IMAC30 302 peaks, and on H50 166 peaks. Furthermore, a much higher number of peaks was found in the low mass range (n = 512; 1-20 kDa) than in the high mass range (n = 273; 20-200 kDa).
The highest combined sensitivity (80%) and specificity (76%) were obtained with the 22 discriminatory peaks of F1 on CM10 at low focus mass. Higher sensitivities were found with the 18 peaks of F4 on CM10 at low focus mass (87%), the 7 peaks of F6 on CM10 at low focus mass (85%), and the 12 peaks of F6 on CM10 at high focus mass (87%); however, the specificities of these peak sets were lower (64%, 66%, and 66%, respectively).
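As an illustration of how sensitivity and specificity follow from group assignments, the Python sketch below counts true and false calls against the PCR status. The function name and the example counts are hypothetical: 20 of 25 positives and 19 of 25 negatives correctly assigned were chosen only because they reproduce 80% and 76%; the underlying counts are not reported in the text.

```python
def sensitivity_specificity(true_status, predicted_status):
    """Return (sensitivity, specificity) for boolean status lists.

    true_status: PCR result per piglet (True = PRRSV-positive).
    predicted_status: assignment from the peak panel (True = positive).
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_status, predicted_status):
        if truth and predicted:
            tp += 1          # infected piglet correctly called positive
        elif truth and not predicted:
            fn += 1          # infected piglet missed
        elif not truth and not predicted:
            tn += 1          # uninfected piglet correctly called negative
        else:
            fp += 1          # uninfected piglet wrongly called positive
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical discovery-phase assignments (25 positives, 25 negatives).
truth = [True] * 25 + [False] * 25
calls = [True] * 20 + [False] * 5 + [False] * 19 + [True] * 6
sens, spec = sensitivity_specificity(truth, calls)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Sensitivity depends only on the PRRSV-positive group and specificity only on the PRRSV-negative group, which is why a peak set can gain one while losing the other.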

Validation phase
The validation phase was performed on 35 new PRRSV-positive and 35 new PRRSV-negative piglets using the same experimental conditions applied in the discovery phase [Additional file 2: Table S2]. Of the total 200 peaks that were significant in the discovery phase, 47 were confirmed in the validation phase (Table 2).
In particular, 28 peaks were confirmed on CM10 and 19 on IMAC30, whereas none of the peaks could be validated on the surface H50. Among the 3 pH fractions tested, F1 contained 28 peaks, F4 3 peaks, and F6 16 peaks. A higher number of peaks (n = 36) corresponded to small peptides (acquired at low focus mass, 1-20 kDa), compared to big peptides (n = 11) that were acquired at high focus mass (Table 2). In line with the results of the discovery phase, the peak combinations with the highest sensitivities (77% and 64.5%) and specificities (73% and 69.7%) were found on CM10 at low focus mass with the 14 discriminatory peaks of F1 and the 6 discriminatory peaks of F6, respectively (Table 2). The correctly and incorrectly assigned piglets using these peaks are graphically illustrated in the heat map of Figure 1; part 1A shows the 14 peaks of F1 and part 1B the 6 peaks identified in F6.
Principal component analysis (PCA) was performed on the profiles of the 47 discriminatory peaks identified during the discovery phase and confirmed during the validation phase to identify and quantify independent sources of variation observed in the data. PCA showed that 58.2% (PCA1), 17.9% (PCA2), and 12.9% (PCA3) of the total variability within the data was accounted for by the X, Y, and Z axes, respectively. These axes were used to plot the data (Figure 2); they provide an overview of the variation between the individual samples and show how the samples grouped. Figure 2A shows in three dimensions that the PCA peak profiles of PRRSV-positive piglets differed from those of PRRSV-negative piglets and reveals a good separation between the profiles of the two groups, especially considering the high heterogeneity of the samples included in the study, as reported in the Materials and Methods section and in [Additional file 1: Table S1 and Additional file 2: Table S2]. Furthermore, with the exception of a few outliers, PCA1 combined with PCA2 also separated the two piglet populations well (Figure 2B).

The 785 total peaks detected and the 200 statistically significant (p < 0.05) discriminatory peaks associated with PRRS infection that were identified by the Ciphergen Express software are reported with the fraction, the array surface, and the acquisition focus mass (low: 1-20 kDa; high: 20-200 kDa).

Comparison with relevant protein peaks and immunity genes related to PRRSV infection in other studies
To provide an overview of the current literature and to try to correlate the discriminatory peaks identified in this study with relevant proteins, we summarized in Table 3 the molecular weights (MW) of several peaks that have been shown to be related to PRRSV infection. First of all, we summarized the available information on the PRRS viral proteins. The PRRSV genome is ca. 15 kb in size and consists of the 5' untranslated region (UTR), at least nine open reading frames (ORFs), and [...] Table 3, along with the MW of the closest discriminatory peak identified in the current study. Interestingly, the MWs of the viral proteins ORF2b, ORF4, and ORF7 were very similar (MW difference ≤0.3 kDa) to those of up-regulated discriminatory peaks identified here (Table 3).
Next, we compared proteins related to PRRSV infection that were identified in additional studies (Table 3); interestingly, all 9 peaks found by [28], and in particular the only one up-regulated in PRRSV-infected animals (corresponding to the Alpha 1 S (a1S)-subunit of porcine Haptoglobin), showed minimal MW differences (≤0.3 kDa) from up-regulated peaks identified in this study (Table 3).

Discussion
In the present work, we show that proteomic fingerprint profiling is useful in research on PRRS immuno-pathogenesis and might also be a robust, large-scale diagnostic tool for assessing the proportion of PRRSV-positive weaning piglets without clinical symptoms in a herd. Indeed, we confirmed that the high-throughput capacity of the SELDI-TOF MS technology allows hundreds of samples to be screened for disease biomarkers in a relatively short time and with minimal sample preparation (as previously also reported by [32]).
Our results indicate that of the 200 significant peaks found in the discovery phase, a total of 47 could be confirmed in the validation phase. These values are comparable with another study where similar experimental conditions were applied to ovine sera [19].
Our findings also show that the combination of 14 discriminatory peaks in F1 on CM10 at low focus mass provided the highest sensitivity (77%) and specificity (73%) in correctly assigning the piglets to the PRRSV-positive or PRRSV-negative groups. These percentages are in line with recent studies in humans using the [...] [33,34]. The PCA results also showed a good separation of the piglets into the two groups under examination. This was achieved even though the tested piglets had large variability and heterogeneity, as they were collected from several farms located in different regions and underwent high environmental pressures typical of field conditions. This is mainly due to the careful choice of the serum samples, where we tried to minimize the environmental differences by using the same experimental parameters (e.g. sample collection procedures, storage, handling) and by including a similar number of pigs from the same breed (Large White) and with very similar sex ratios and ages (at weaning).
In a preliminary work [20] we had successfully transferred the experimental conditions used in profiling experiments of human sera to pig sera. However, in that work, none of the potential biomarkers identified in the discovery phase could be validated in the subsequent validation phase, because of high sample heterogeneity and a high content of serum proteins (e.g. albumin) and contaminant proteins (e.g. hemoglobin), which had a negative effect on the detection of significant biomarkers, particularly those corresponding to the medium-low abundant proteins. It has been reported that low abundant proteins constitute about 1% of the entire human serum proteome, with the remaining 99% being comprised of only 22 proteins [35].
As it was therefore necessary to reduce the level of abundant proteins, in this follow-up study particular relevance was given to the content of the contaminant protein hemoglobin. Only non-hemolytic samples with similar, low contents of hemoglobin were included in the study. Additionally, to further increase the likelihood of identifying statistically significant discriminatory biomarkers, we introduced a fractionation step based on anion-exchange chromatography. In a similar study performed with MALDI-TOF [28], where serum samples were analyzed in the first weeks (2-16) [...] (Table 3). Furthermore, two peaks identified in this study (23.162 and 14.843 kDa) were similar to peaks identified elsewhere (corresponding to Heat shock 27 kDa protein 1 [29][30][31] and Galectin 1 [26,31], respectively). In accordance with [31], the identified peak corresponding to Heat shock 27 kDa protein 1 was up-regulated, while the peak corresponding to Galectin 1 was down-regulated. Thus, these proteins seem to be very interesting and suitable candidates for future investigations. The preponderance of the significant biomarkers had a molecular mass lower than 20 kDa, confirming that small peptides are a rich source of relevant biomarkers in SELDI-TOF MS analyses, as previously reported in human [36] and ovine [19] sera. This may also partly be caused by the fact that the low molecular weight (LMW) region of the serum proteome, called the peptidome, is an assortment of small intact proteins and proteolytic fragments of larger proteins, including several classes of physiologically important proteins like peptide hormones and components of both the innate and adaptive immune systems (i.e. cytokines and chemokines) [35,37].
This is particularly interesting as the patho-physiological state of the body's tissue is predominantly reflected in the LMW and low abundance region of the serum proteome, and specific protein fragments of the serum peptidome have been shown to contain a rich source of disease-specific diagnostic information and they have been correlated with disease stages in several studies (reviewed by [37]).
In agreement with other studies [29,31], we found that the majority of the discriminatory biomarkers were up-regulated in PRRSV-positive piglets. This seems to suggest that the corresponding proteins might be of viral origin or related to the innate or adaptive immune responses (e.g. cytokines, chemokines, acute phase proteins, Toll-like receptors). In fact, several peaks showed high similarities (MW differences ≤0.3 kDa) with previous works, in particular regarding viral proteins (Table 3). The assignment of a discriminatory peak to a specific protein will require additional work, because the SELDI-TOF technology can only detect masses/peaks of proteins that are differentially expressed between samples but cannot directly identify the proteins. This represents one of the major drawbacks of this technology compared to other methods. However, an advantage of SELDI-TOF MS in this regard is that its results might lead to the identification of new proteins that were previously not correlated to the disease, and thus of new biomarkers representing the field situation. The interpretation of these results and the continuation of this project will benefit from the imminent completion and publication of the sequence of the swine genome [38], which will contribute to a more precise annotation and a better identification of genes and proteins and thus will greatly facilitate genome-wide association mapping studies.

Conclusions
Although a combination of peaks identified with different experimental conditions (e.g. using different fractions and different surfaces) might have provided higher discriminatory power, here we developed a PRRSV diagnostic test based on peaks identified with the same experimental conditions (e.g. fraction, surface, and focus mass), which can be reproduced at high throughput at reasonable costs. These results provide a set of proteomic biomarkers and related, optimized experimental conditions for high-throughput profiling of pig populations by SELDI-TOF MS for whole genome association studies, where identification of proteins underlying the phenotype can be made a posteriori. SELDI-TOF MS might therefore represent a complementary test or a possible alternative to classical (PCR) and more recent diagnostic methods (e.g. antibody detection in saliva) for profiling large pig herds at reasonable costs, using blood samples that are routinely collected for general veterinary inspections. In addition, these SELDI-TOF MS based tests could complement and provide a broader reference for emerging diagnostic methods and have potential applications for the detection of relevant proteins having highly heritable traits (e.g. acute phase proteins).

Piglets
A total of 120 serum samples of Large White piglets were selected from a well defined and characterized repository database, presently containing more than 20,000 swine samples from 18 different farms of the Lombardy region, Italy. Selection of the piglets aimed to minimize environmental factors and experimental conditions that might influence the results [39]. Hence, all piglets were from the same breed (Large White), had similar ages (weaning: 45-50 days), and their sera showed a low and comparable amount of hemoglobin (calculated as shown below).
In the discovery phase of the study, a total of 50 pig sera, 25 from PRRSV-positive and 25 from PRRSV-negative piglets, as determined by PCR (see below), were analyzed [Additional file 1: Table S1]. The validation phase was performed with the same experimental conditions as the discovery phase. A total of 35 new PRRSV-positive and 35 new PRRSV-negative piglets were examined [Additional file 2: Table S2]. The actual duration of infection for each individual PRRSV-positive piglet was unknown, as sera were collected and analyzed once for each piglet (at weaning: 45-50 days of age). None of the piglets was treated, as they did not show any symptom of the disease.
To ensure large variability and heterogeneity of the samples and minimize environmental differences, we included in the PRRSV-positive and -negative groups similar numbers of piglets with the same sex that originated from several farms located in different regions. In fact, PRRSV-positive piglets originated from 6 farms of the Lodi region (n = 8) and 7 farms of the Mantua region (n = 52), while PRRSV-negative piglets were collected in 5 farms around Lodi (n = 19) and 9 farms around Mantua (n = 41). Sex ratios males/females (44/76) were very similar in PRRSV-positive (21 vs. 39) and -negative (23 vs. 37) piglets, respectively.
Veterinary inspections of the overall clinical status of the piglets on the day of serum collection did not reveal any clinical symptoms of PRRS, including respiratory distress or sneezing.

Serum samples
All the serum samples were collected, stored, and handled in the same way. They were obtained for each piglet by storing two mL of whole blood without anticoagulants at room temperature (RT) for 4 h followed by centrifugation at 3,500 rpm for 4 min. As suggested in a previous work [20], an abundant quantity of hemoglobin in the serum can hide early diagnostic biomarkers of PRRSV by competing with the other serum components for the binding site of the chromatographic surfaces. To avoid the consequent signal suppression of the medium-low abundant proteins, only non-hemolytic samples were included in the present study.
A total of 200 clear, transparent sera without red pigmentation (low hemoglobin content) were first selected by visual screening from the total sera available in the database. Hemoglobin content of each serum sample was then determined according to [40] with minor modifications. A calibration curve was generated using five standard solutions (concentrations: 1.8, 3.6, 5.4, 7.2, and 9 μg/mL) of porcine hemoglobin diluted in 400 μL commercially available porcine serum (Sigma Aldrich, St Louis, MO, USA). Triplicate samples were incubated for 5 min at RT, then absorbance (E) was measured at 380, 415, and 440 nm. Absorbance at 380 and 440 nm was used to discern background absorbance flanking the absorbance peak (415 nm) of oxygenated hemoglobin. Absorbance due to hemoglobin was calculated as: E415 - [(E380 + E440)/2]. Hemoglobin absorbance values of the samples were converted to μg/mL of hemoglobin by means of the calibration curve. Of the 200 initial samples, a total of 120 samples having an absorbance ≤ 0.085 (corresponding to a hemoglobin content below 4.52 μg/mL) were included in the study: 50 in the discovery phase and 70 in the validation phase.

Viral RNA extraction from the sera was performed following standard Roche procedures (High Pure Viral RNA Kit, Roche Diagnostics GmbH, Germany). Presence or absence of PRRSV was determined by multiplex PCR of conserved regions of viral ORF7 using primers and conditions previously described [41,42]. The test also discriminated between European and American genotypes and could detect all the different viral strains present in the Lombardy region at the time of sample collection.
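The hemoglobin calculation described above can be sketched in Python as follows. This is a minimal illustration, assuming a straight-line calibration fitted by ordinary least squares; the function names and the absorbance readings assigned to the five standards are hypothetical and are not values from the study.

```python
def hemoglobin_absorbance(e380, e415, e440):
    """Background-corrected absorbance: E415 - [(E380 + E440) / 2]."""
    return e415 - (e380 + e440) / 2


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absorbance_to_concentration(absorbance, slope, intercept):
    """Invert the calibration line to obtain hemoglobin in ug/mL."""
    return (absorbance - intercept) / slope


# Standard concentrations from the text (ug/mL).
standards = [1.8, 3.6, 5.4, 7.2, 9.0]
# Hypothetical corrected absorbances for those standards (assumption).
readings = [0.046, 0.082, 0.118, 0.154, 0.190]

slope, intercept = fit_line(standards, readings)
sample = hemoglobin_absorbance(e380=0.10, e415=0.30, e440=0.20)
print(absorbance_to_concentration(sample, slope, intercept))
```

With a real calibration curve, a sample would be retained only if its corrected absorbance did not exceed the study's cut-off of 0.085.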

Serum fractionation
All the detailed steps of the SELDI-TOF MS process performed here are schematically represented [see Additional file 3: Figure S1]. The protocol follows the manufacturer's instructions with minor modifications (Bio-Rad Laboratories, ProteinChip® Serum Fractionation Kit manual).

SELDI-TOF MS analysis
ProteinChip arrays were read using a Ciphergen ProteinChip Reader PCS4000 model and data were analyzed with Ciphergen Express Software (Ciphergen Biosystems).
Profiles were collected in the range 1-200 kDa at two different ion focus masses, 10 kDa ("low focus mass") and 50 kDa ("high focus mass"). The instrument was calibrated for dataset collection using all-in-one peptide standard (Bio-Rad Laboratories) in the 1-20 kDa range for the 10 kDa low ion focus mass and all-in-one protein standard in the 20-200 kDa range for the 50 kDa high ion focus mass [Additional file 3: Figure S1G].

Ciphergen Express software analysis
Spectra were normalized by total ion current, starting and ending at the M/Z of the collection ranges (1-20 or 20-200 kDa) after baseline subtraction and noise calculation. Outlier spectra were removed. The spectra were aligned to a reference spectrum with the normalization factor nearest 1.0. The spectra were aligned only if the percentage coefficient of variation was reduced after the alignment. Peaks from the different spectra were aligned using the cluster wizard function of the Ciphergen Express 3.0.6 software. The peak detection was automated within the M/Z range of analysis. Peaks were detected on the first pass when the signal-to-noise (S/N) ratio was 7 and the peak was 5 times the valley depth. Peaks below threshold were deleted and all first-pass peaks were preserved. Clusters were created within 0.15% of M/Z for each peak detected in the first pass. The clusters were completed by adding peaks with S/N ratio of 2 and two times the valley depth. P-values and ROC/AUC (Receiver Operating Characteristic/ Area Under Curve) values were calculated by using the P-value wizard.
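Two of the steps above, total ion current (TIC) normalization and the 0.15% M/Z clustering window, can be illustrated with a short Python sketch. It is an interpretation under stated assumptions, not the Ciphergen Express implementation: the normalization factor is assumed to be the mean TIC divided by each spectrum's TIC, and all function names and toy intensities are hypothetical.

```python
def tic_normalization_factors(spectra):
    """Return one factor per spectrum: mean TIC / that spectrum's TIC.

    spectra: list of intensity lists (baseline-subtracted, same M/Z range).
    Assumption: the software's exact definition may differ.
    """
    tics = [sum(intensities) for intensities in spectra]
    mean_tic = sum(tics) / len(tics)
    return [mean_tic / tic for tic in tics]


def reference_spectrum_index(factors):
    """Index of the spectrum whose normalization factor is nearest 1.0."""
    return min(range(len(factors)), key=lambda i: abs(factors[i] - 1.0))


def in_cluster(first_pass_mz, candidate_mz, window=0.0015):
    """True if candidate lies within 0.15% of the first-pass peak's M/Z."""
    return abs(candidate_mz - first_pass_mz) <= window * first_pass_mz


# Toy spectra (hypothetical intensities).
spectra = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
factors = tic_normalization_factors(spectra)
print(factors, reference_spectrum_index(factors))
print(in_cluster(10000.0, 10014.0), in_cluster(10000.0, 10020.0))
```

Because the window is relative, it widens with mass: 0.15% corresponds to 15 Da at 10 kDa but 150 Da at 100 kDa.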
A 2-tailed t-test was used for statistical analysis of differences in peak intensity between groups. P-values below 0.05 were considered statistically significant. Principal component analysis (PCA) and agglomerative hierarchical clustering algorithm were applied to investigate the pattern among the different statistically significant peaks.
PCA is a multivariate data analysis that transforms a number of correlated variables, without loss of essential information, into a smaller number of uncorrelated variables called principal components (PCs), which can sufficiently explain the data structure. PCA transformation allows many variables to be studied simultaneously, showing how similar samples are correlated and grouped together. The data structure is visualized directly in a graphical way by projection of objects onto the space defined by the selected PCs (for details see [43]).
Finally, to evaluate the influence of external variables (e.g. sample processing and acquisition) on the system under study and to calculate the dispersion of the acquired data, the coefficient of variation (CV) was also calculated. The CV is the normalized measure of dispersion of a probability distribution and expresses the dispersion of the data as a percentage of the mean (intensity variation). Six commercially available serum samples were prepared and analyzed in parallel with the pig samples of both the discovery and validation phases. The CV was calculated for all fractions and surfaces by choosing 6 peaks evenly distributed along the entire range.
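The CV described above can be computed as in the following Python sketch. It assumes the sample standard deviation (n - 1 denominator), which the text does not specify; the function names and the replicate intensities are hypothetical. The 30% acceptance limit is the one quoted in the Results.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """CV in percent: 100 * standard deviation / mean.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return 100 * stdev(intensities) / mean(intensities)


def acceptable(intensities, limit=30.0):
    """True if the CV is below the limit (< 30% in this study)."""
    return coefficient_of_variation(intensities) < limit


# Hypothetical intensities of one peak across six control sera.
peak_intensities = [90.0, 100.0, 110.0, 95.0, 105.0, 100.0]
cv = coefficient_of_variation(peak_intensities)
print(f"CV = {cv:.1f}%, acceptable = {acceptable(peak_intensities)}")
```

In practice the calculation would be repeated for each of the 6 chosen peaks per fraction and surface combination.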

Additional files
Additional file 1: Table S1. Pigs tested with SELDI-TOF MS during the discovery phase of the study. List of the 25 PRRS-positive and 25 PRRS-negative pigs (PCR-tested) analyzed with SELDI-TOF MS during the discovery phase of the study. The pig ID is reported with the total absorbance and the total amount of hemoglobin present in the sample, the status regarding the PRRS virus, as well as the sex and the number and location of the farm (MA = Mantua region, LO = Lodi region).
Additional file 2: Table S2. Pigs tested with SELDI-TOF MS during the validation phase of the study. List of the 35 PRRS-positive and 35 PRRS-negative pigs (PCR-tested) analyzed with SELDI-TOF MS during the validation phase of the study. The pig ID is reported with the total absorbance and the total amount of hemoglobin present in the sample, the status regarding the PRRS virus, as well as the sex and the number and location of the farm (MA = Mantua region, LO = Lodi region).