Discovery and identification of potential biomarkers of pediatric Acute Lymphoblastic Leukemia

Background: Acute lymphoblastic leukemia (ALL) is a common form of cancer in children. Currently, bone marrow biopsy is used for diagnosis. Noninvasive biomarkers for the early diagnosis of pediatric ALL are urgently needed. The aim of this study was to discover potential protein biomarkers for pediatric ALL.
Methods: Ninety-four pediatric ALL patients and 84 controls were randomly divided into a "training" set (45 ALL patients, 34 healthy controls) and a test set (49 ALL patients, 20 healthy controls and 30 pediatric acute myeloid leukemia (AML) patients). Serum proteomic profiles were measured using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS). A classification model was established with Biomarker Pattern Software (BPS). Candidate protein biomarkers were purified by HPLC, identified by LC-MS/MS and validated using ProteinChip immunoassays.
Results: A total of 7 protein peaks (9290 m/z, 7769 m/z, 15110 m/z, 7564 m/z, 4469 m/z, 8937 m/z, 8137 m/z) with differential expression levels in the sera of pediatric ALL patients and controls were found using SELDI-TOF-MS and then analyzed by BPS to construct a classification model in the "training" set. The sensitivity and specificity of the model were 91.8% and 90.0%, respectively, in the test set. Two candidate protein peaks (7769 and 9290 m/z) were down-regulated in ALL patients and were identified as platelet factor 4 (PF4) and connective tissue activating peptide III (CTAP-III), a fragment of pro-platelet basic protein precursor (PBP), respectively. Two other candidate protein peaks (8137 and 8937 m/z) were up-regulated in the sera of ALL patients and were identified as fragments of complement component 3a (C3a).
Conclusion: Platelet factor 4 (PF4), connective tissue activating peptide III (CTAP-III) and two fragments of C3a may be potential protein biomarkers of pediatric ALL and may be used to distinguish pediatric ALL patients from healthy controls and pediatric AML patients. Further studies with additional populations or using pre-diagnostic sera are needed to confirm the importance of these findings as diagnostic markers of pediatric ALL.


Introduction
Acute lymphoblastic leukemia (ALL) is the most common pediatric cancer, accounting for 30% of all pediatric malignancies [1]. ALL is diagnosed in 3000 to 4000 persons in the United States each year, two-thirds of whom are children [2,3]. ALL can develop from any lymphoid cell blocked at a particular stage of development, including primitive cells with multilineage potential [4,5]. The diagnosis of ALL presently depends on immunophenotyping. Although more than 339 different cluster-of-differentiation (CD) molecules expressed on human leukocytes have been defined [6], only a few of these molecules are truly lineage-specific. Melhem reported that oncoprotein 18 (Op18) phosphorylation was significantly correlated with the white blood cell count and the percentage of cells in S phase [7]. However, Brattsand and Chen reported that over-phosphorylation of Op18 was also found in breast cancer and lung adenocarcinomas [8,9]. Noninvasive and specific biomarkers for the early diagnosis of pediatric ALL remain an urgent need.
Proteome analysis provides valuable information about the dynamic and rapid changes in the proteome that occur during illness. Recent advances in proteomics have offered opportunities for finding biomarkers in biological fluids, especially in sera. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), which generates protein fingerprints, has proven to be a powerful tool for potential biomarker discovery [10,11]. Recently, SELDI-TOF-MS analysis has been successfully used to identify specific biomarkers for various cancers, such as prostate cancer, bladder cancer, ovarian cancer, lung cancer, colon cancer, breast cancer and pancreatic cancer [12-21]. Albitar explored the potential of proteomic analysis of peripheral blood plasma, using SELDI-TOF-MS, to predict the recurrence of ALL in adult patients and to build a proteomic-based model predicting clinical behavior in adult ALL [22]. Hegedus used SELDI-TOF-MS to analyze the proteomes of pediatric leukemia cell lines, including ALL, mixed lineage leukemia (MLL) and acute myeloid leukemia (AML) cell lines, and of pediatric leukemia bone marrow samples of different subtypes. Differences in protein expression were reported among ALL, MLL and AML cell lines and bone marrow. A protein of 8.3 kDa was found to be expressed at high levels in the ALL cell line as well as in bone marrow and was identified as a C-terminal truncated form of ubiquitin after purification and trypsin digestion. These results may guide further research toward understanding the development of leukemias [23].
In this study, we first used SELDI-TOF-MS technology to screen potential protein patterns specific to ALL, then purified the candidate protein biomarker peaks by HPLC and identified them by LC-MS/MS.

Patients, controls and serum samples
The pediatric ALL patients were randomly selected from patients whose disease had been confirmed by myelocytic cytological diagnosis at the China-Japan Friendship Hospital between July and August 2004. Over the same period, healthy controls were selected from children who underwent the health examinations required for school or kindergarten entrance and had not been diagnosed with any disease. Pediatric AML patients from the same hospital were also included as controls. Serum samples (n = 178) were collected from 94 ALL patients aged 2 to 14 years, 54 healthy children, and 30 pediatric AML patients. The three groups were similar in age and gender distribution (Table 1). All serum samples were collected in 2004. Informed consent was obtained from each participant. Immediately after collection, the sera were centrifuged at 1500 × g and 4°C for 10 min, divided into 20 μL aliquots, and stored at -80°C for further analysis. All SELDI-TOF-MS experiments were completed within three months.

SELDI-TOF-MS analysis of serum protein profiles
Protein profiling of serum samples was performed by SELDI-TOF-MS using eight-spot WCX2 (weak cation exchange) ProteinChip arrays (Ciphergen Biosystems, Fremont, CA, USA). Frozen serum samples were thawed on ice and spun at 10,000 rpm for 5 min at 4°C. Each serum sample (10 μL) was denatured and diluted before application to the WCX2 ProteinChip arrays. The arrays were held in a bioprocessor (Ciphergen Biosystems) and preactivated twice with 150 μL of low stringency buffer at room temperature for 5 min. The diluted serum sample was allowed to react with the surface of the WCX2 chip for 60 min at room temperature. Each spot was then washed three times with appropriate buffers of various pH values and ionic strengths to eliminate non-adsorbed proteins. After the array surface was air-dried, 1 μL of saturated sinapinic acid (SA) matrix in 50% ACN and 0.5% TFA was applied and allowed to dry. MS analysis was performed on a PBS-II ProteinChip reader (Ciphergen Biosystems). Mass peak detection was performed using ProteinChip Biomarker Software version 3.1 (Ciphergen Biosystems). The mass spectra of the proteins were generated using an average of 90 laser shots at a laser intensity of 150-160 arbitrary units, and detector sensitivity was set at 8. For data acquisition of low-molecular-weight proteins, the optimized detection mass range was set from 5 to 20 kDa for all study sample profiles. The m/z of each peak to be quantified (S/N ratio > 5) was determined according to externally calibrated standards (Ciphergen Biosystems). Peaks above 2000 m/z were normalized, and Biomarker Wizard was used to compile all spectra and automatically detect quantifiable mass peaks.

Bioinformatics and biostatistics
Serum samples were split into two groups, a training set and a test set. Samples from 45 ALL patients and 34 healthy controls were randomly selected for the training set. To evaluate the accuracy and validity of the classification tree, samples from 49 ALL patients and 50 controls (20 healthy children and 30 patients with pediatric AML) were selected for the test set (Table 1).
The profiling spectra of serum samples from the training set were normalized using total ion current normalization in Ciphergen's ProteinChip Software (version 3.1). Peak labeling was performed with the Biomarker Wizard feature of the software. A two-sample t-test was used to compare mean normalized intensities between the case and control groups. A p value below 0.05 was considered statistically significant. The intensities of the selected peaks were then transferred to Biomarker Pattern Software (BPS) to construct the classification tree for ALL. Briefly, the intensities of the selected peaks were submitted to BPS as a "root node". Based on peak intensity, BPS determined a threshold that split the root node into two child nodes. If the peak intensity of a blind sample was lower than or equal to the threshold, the sample was assigned to the "left-side child node"; a sample with a peak intensity higher than the threshold was assigned to the "right-side child node". After successive rounds of splitting, the tree that classified the training set with the least error was retained.
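The normalization and threshold-splitting steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the BPS implementation: the function names `tic_normalize` and `best_split` are hypothetical, scaling each spectrum to the mean total ion current is one common form of the normalization, and the split criterion used here (fewest misclassified samples) is an assumption, since the text does not state which criterion BPS applies. The toy intensities are invented for illustration.

```python
from statistics import mean

def tic_normalize(spectra):
    """Scale each spectrum so that its total ion current (sum of
    intensities) equals the mean total ion current of all spectra."""
    totals = [sum(s) for s in spectra]
    target = mean(totals)
    return [[x * target / t for x in s] for s, t in zip(spectra, totals)]

def best_split(intensities, labels):
    """Find the threshold on one peak that misclassifies the fewest samples.

    Samples with intensity <= threshold go to the left child node, the
    others to the right child node; each child predicts its majority class.
    Returns (threshold, number_of_errors), or None if no split is possible.
    """
    values = sorted(set(intensities))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for thr in candidates:
        left = [lab for x, lab in zip(intensities, labels) if x <= thr]
        right = [lab for x, lab in zip(intensities, labels) if x > thr]
        errors = 0
        for node in (left, right):
            majority = max(set(node), key=node.count)
            errors += sum(1 for lab in node if lab != majority)
        if best is None or errors < best[1]:
            best = (thr, errors)
    return best

# Toy data (hypothetical): one peak that is lower in ALL than in controls.
peak = [1.0, 1.5, 2.0, 6.0, 7.0, 8.0]
group = ["ALL", "ALL", "ALL", "control", "control", "control"]
threshold, errors = best_split(peak, group)
print(threshold, errors)  # 4.0 0
```

A full classification tree repeats this search over all candidate peaks and then recursively within each child node until further splits no longer reduce the error.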
All protein peak intensities of the samples in the test set were evaluated by BPS using the classification model. The ALL and control samples were then discriminated based on their proteomic profile characteristics. Sensitivity was defined as the proportion of ALL cases correctly classified as ALL, and specificity as the proportion of control samples correctly classified as controls. The positive predictive value was the probability that a sample was truly ALL given a positive test result.

Serum fractionation
Serum samples from both healthy controls and ALL patients were selected for the purification of the four candidate protein biomarkers. The serum sample was mixed with U9 buffer (1:2, v/v) and incubated for 30 min at room temperature. The sample was then diluted in 5 mL of WCX binding buffer (50 mM NaAc, pH 4.0) and loaded onto a CM Ceramic HyperD WCX SPE column (6 × 10 mm, Pall Life Sciences, USA). After washing with 2 mL of WCX binding buffer, the column was eluted with 5 mL of eluting buffer (2 M NaCl, 50 mM NaAc, pH 4.0) at a flow rate of 0.5 mL/min. The eluted fraction was further purified using HPLC.

Identification of candidate protein biomarkers by LC-MS/MS
In-solution digestion of each concentrated fraction, each containing one candidate protein biomarker, was performed with a standard protocol. Briefly, each fraction was dissolved in 25 mM NH4HCO3, reduced with 10 mM DTT for 1 hour, and alkylated with 40 mM iodoacetamide in the dark for 45 minutes at room temperature. Then, 40 mM DTT was added to quench the iodoacetamide for 30 min at room temperature. Proteinase K (0.1 μg, Promega Corporation, USA) was then added to the sample solution and incubated for 45 min at 37°C. The digestion was stopped by adding formic acid to a final concentration of 0.1%. The digested sample was loaded onto a homemade C18 column (100 mm × 100 μm) packed with Sunchrom packing material (SP-120-3-ODS-A, 3 μm), followed by nano-LC-ESI-MS/MS analysis. The LTQ mass spectrometer was operated in data-dependent mode, in which an initial MS scan recorded the mass-to-charge (m/z) ratios of ions over the range 400-2000. The five most abundant ions were automatically selected for subsequent collision-activated dissociation. All MS/MS data were searched against a human protein database downloaded from NCBI using the SEQUEST program (Thermo, USA).
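The data-dependent selection step can be illustrated with a short Python sketch. It only mirrors the rule stated above (take the five most abundant ions from a survey scan within the scanned m/z range); the function name `select_precursors` and the toy scan values are hypothetical, and any further instrument settings (for example exclusion lists) are not described in the text and are not modeled here.

```python
def select_precursors(scan, top_n=5, mz_min=400.0, mz_max=2000.0):
    """Return the top_n most intense (m/z, intensity) peaks of a survey
    scan, restricted to the m/z window covered by the MS scan."""
    in_range = [peak for peak in scan if mz_min <= peak[0] <= mz_max]
    # Sort by intensity, highest first, and keep the first top_n peaks.
    return sorted(in_range, key=lambda peak: peak[1], reverse=True)[:top_n]

# Toy survey scan (hypothetical values): (m/z, intensity) pairs.
scan = [
    (350.2, 9000), (420.1, 500), (455.7, 1200), (610.3, 3000),
    (733.9, 800), (812.4, 2500), (1500.6, 100), (1999.0, 1500),
    (2100.0, 7000),
]
selected = select_precursors(scan)
print([mz for mz, _ in selected])  # [610.3, 812.4, 1999.0, 455.7, 733.9]
```

The two most intense peaks in the toy scan (350.2 and 2100.0 m/z) are skipped because they fall outside the 400-2000 window.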

Confirmation of candidate protein biomarkers using ProteinChip immunoassays
To confirm the identity of the candidate protein biomarkers, specific antibodies (anti-PF4 rabbit antibody, ab49735; anti-NAP-2 mouse antibody, ab58142; anti-C3a mouse antibody, ab11872; all from Abcam) were applied to each spot of pre-activated PS20 ProteinChip arrays (Ciphergen Biosystems) and incubated overnight at 4°C in a humidity chamber. After blocking with BSA and rinsing, antibody-coated spots were incubated with 1.5 μL of serum sample and 3 μL of binding buffer (0.1 M Na3PO4, 0.5 M urea, 0.5% CHAPS, pH 7.2) for 90 min [24]. Spots were then washed twice each with PBST (0.5% Triton X-100), PBS and deionized water before drying. SELDI-TOF-MS analysis was performed on a PBS-II ProteinChip reader with CHCA as the matrix.

Serum protein profiles and data processing
All serum samples were collected from the selected patients and controls in 2004, divided into 20 μL aliquots, and stored at -80°C until use. The SELDI-TOF-MS experiments were completed within three months after the collection of all serum samples. SELDI-TOF-MS and MALDI-TOF-MS analysis showed no significant change in the protein markers during storage.
Serum samples from the training set were analyzed by SELDI-TOF-MS on WCX2 chips. All MS data were baseline-subtracted and normalized using total ion current. Peak clusters were then generated by Biomarker Wizard software. Twenty-six peaks differed significantly between pediatric ALL patients and healthy children (p value < 0.05). In the ALL group, five protein peaks were up-regulated and twenty-one peaks were down-regulated (Table 2). Figure 1 shows the protein profiling patterns of sera from ALL patients and control samples. Two protein peaks (8137 m/z, p value 6.07E-05 and 8937 m/z, p value 5.13E-05) were up-regulated in sera from ALL patients, while two other peaks (7769 m/z, p value 1.54E-07 and 9290 m/z, p value 7.59E-08) were down-regulated, compared with those from the healthy controls.

Protein peak detection and validation
To develop biomarker patterns for the diagnosis of ALL, the intensities of the protein peaks in the training set were submitted to BPS. A total of seven peaks (9290 m/z, 7769 m/z, 15110 m/z, 7564 m/z, 4469 m/z, 8937 m/z, 8137 m/z) with the highest discriminatory power were automatically selected to construct a classification tree. Figure 2 shows the tree structure and sample distribution. The classification tree using the combination of the seven peaks classified the 45 ALL and 34 healthy subjects with a calculated sensitivity of 96% and a specificity of 98%.
To evaluate the accuracy and validity of the classification model derived from the training set, we applied the classification tree to a test dataset consisting of 49 ALL and 50 control samples. The classification tree discriminated the ALL samples from the controls with a sensitivity of 91.8% and a specificity of 90%. The positive predictive value was 90%.
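The test-set figures can be cross-checked with a short calculation. The sketch below is illustrative only: the confusion-matrix counts (45 true positives, 4 false negatives, 45 true negatives, 5 false positives) are not stated in the text but are inferred as the integer counts that reproduce the reported percentages for 49 ALL samples and 50 controls, and the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity and positive predictive value
    from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # ALL cases classified as ALL
        "specificity": tn / (tn + fp),  # controls classified as controls
        "ppv": tp / (tp + fp),          # positive calls that are truly ALL
    }

# Inferred counts (assumption): 49 ALL samples and 50 controls in the test set.
metrics = diagnostic_metrics(tp=45, fn=4, tn=45, fp=5)
print({name: round(100 * value, 1) for name, value in metrics.items()})
# {'sensitivity': 91.8, 'specificity': 90.0, 'ppv': 90.0}
```

With these counts, 45 of the 50 positive calls are true ALL cases, which matches the reported positive predictive value of 90%.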

Purification and identification of candidate protein biomarkers
Serum samples from ALL patients were used to purify the two candidate protein biomarkers that were up-regulated in ALL (8137 and 8937 m/z), and serum samples from healthy controls were used to purify the two that were down-regulated in ALL (7769 and 9290 m/z). Purification was performed using WCX SPE and C18 HPLC. Figure 3 shows the results of MALDI-TOF-MS analyses of the four purified candidate protein biomarkers.
After digestion with proteinase K, the peptide mixture was analyzed by nano-LC-MS/MS. Figure 4 shows the LC-MS/MS chromatogram (A) and the MS/MS spectra of two identified peptides (B, C) from the 9290 m/z protein. Table 3 shows the identification results for the four candidate protein biomarkers: CTAP-III (a fragment of pro-platelet basic protein precursor, PBP, 9290 m/z), PF4 (platelet factor 4, 7769 m/z), and two fragments of C3a (a human complement component, 8137 and 8937 m/z). A combination of high sequence coverage and accurate MW measurement by MALDI-TOF-MS provided a complete sequence of the four candidate protein markers.
Figure 1. Representative mapping of SELDI-TOF-MS analysis of sera from healthy controls and pediatric ALL patients. Differentially expressed proteins with potential diagnostic significance are indicated by arrows. The top group denotes serum from a healthy volunteer, in which 7769 m/z and 9290 m/z were up-regulated. The bottom group denotes sera from patients with ALL, in which 8137 m/z and 8937 m/z were over-expressed.

Validation of four candidate protein biomarkers
To confirm the identity of the four proteins as PF4, CTAP-III, and two fragments of C3a, we performed immunoassays with specific antibodies directed against the four proteins immobilized on a ProteinChip (Ciphergen Biosystems). CTAP-III and PF4 were captured and detected in the serum of healthy controls but not in that of ALL patients, whereas the fragments of C3a were captured and detected in the serum of ALL patients but not in that of healthy controls (Figure 5).

Discussion
In this study, we obtained serum protein mass spectra from pediatric ALL patients and controls using SELDI-TOF-MS. During the pilot study, we tried SCX2, WCX, IMAC-Cu, IMAC-Ni and H4 protein chips for the serum samples and found that the WCX protein chip detected the most protein peaks (data not shown). Based on the serum proteomic profiles, we constructed a classification model to discriminate the ALL patients from the healthy controls. We used a two-step approach for proteomic biomarker screening. First, we employed a "training set" to identify differentially expressed proteins and used the distinguishing proteomic peaks selected by BPS to construct a classification tree. Second, we evaluated the model in a separate test set, in which it discriminated patients with ALL from healthy controls and pediatric AML patients with a sensitivity of 91.8% and a specificity of 90.0%.
Two down-regulated candidate protein biomarkers were identified as PF4 (platelet factor 4, 7769 m/z) and CTAP-III (fragment of pro-platelet basic protein precursor, PBP, 9290 m/z). Additionally, two up-regulated candidate protein biomarkers (8137 and 8937 m/z) were identified as two fragments of C3a.
Among the proteins identified by LC-MS/MS, both PF4 and CTAP-III are platelet-derived chemokines. PF4, which is also known as CXCL4, is present in platelet α-granules and is released during platelet aggregation. PF4 is also released from activated T lymphocytes and mast cells [25]. PF4 has been reported to influence many biological processes, including endothelial cell proliferation, migration, and angiogenesis [26-28]. PF4 can down-regulate cell proliferation and cytokine release, thereby inhibiting T cell functions [29], and could support the survival of normal hematopoietic precursors, protecting them from the toxicity of chemotherapeutic agents [30]. CTAP-III is an N-terminal cleavage product of PBP, which is synthesized by megakaryocytes. CTAP-III, β-thromboglobulin and NAP-2, which are derived from PBP through proteolysis, belong to a group of homologous and immunologically cross-reactive proteins derived from platelet α-granules. It was reported that CTAP-III could support stem cell-derived hematopoiesis [30]. Both PF4 and CTAP-III could protect early cells from the toxic effects of various chemotherapeutic agents [30]. Vermeulen found that average levels of PF4 and CTAP-III were down-regulated in the serum of benzene-exposed workers in comparison with control subjects [24].
Complement components play important roles as mediators of inflammation and immune responses. Complement 3 (C3), which is composed of α and β chains, is the most abundant complement component in serum (1.2 mg/mL). C3 convertase cleaves C3 at Arg726-Ser727, giving rise to C3b (Mr 176000 Da) and C3a (Mr 9000 Da) [31]. C3a was reported to be an inflammatory mediator of innate immune responses [32], suggesting that C3a might represent an inflammation biomarker. C3a was found to be up-regulated in the ascitic fluids of ovarian cancer patients [33]. Lee found that C3a is elevated in patients with chronic hepatitis C and HCV-related hepatocellular carcinoma [34]. In the present study, the 8.1 kDa C3a fragment was identified for the first time.
In summary, we identified a set of protein peaks that could discriminate pediatric ALL from healthy controls. From the protein peaks specific for pediatric ALL, we identified platelet factor 4 (PF4), a fragment of pro-platelet basic protein precursor (CTAP-III) and C3a as potential proteomic biomarkers of pediatric ALL. This panel of markers is likely to be limited to distinguishing pediatric ALL from healthy controls and pediatric AML patients. Further studies with additional populations or using pre-diagnostic sera are needed to confirm the importance of these findings as diagnostic markers of pediatric ALL.
Figure 3. MALDI-TOF-MS spectra of the four purified potential protein markers.