Volume 11 Supplement 1
Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2012: Proteome Science
Gaussian process regression model for normalization of LC-MS data using scan-level information
- Mohammad R Nezami Ranjbar^{1, 2},
- Yi Zhao^{3},
- Mahlet G Tadesse^{4},
- Yue Wang^{1} and
- Habtom W Ressom^{2}
https://doi.org/10.1186/1477-5956-11-S1-S13
© Nezami Ranjbar et al; licensee BioMed Central Ltd. 2013
Published: 7 November 2013
Abstract
Background
Differences in sample collection, biomolecule extraction, and instrument variability introduce bias to data generated by liquid chromatography coupled with mass spectrometry (LC-MS). Normalization is used to address these issues. In this paper, we introduce a new normalization method using the Gaussian process regression model (GPRM) that utilizes information from individual scans within an extracted ion chromatogram (EIC) of a peak. The proposed method is particularly applicable for normalization based on the analysis order of LC-MS runs. Our method uses measurement variabilities estimated through LC-MS data acquired from quality control samples to correct for bias caused by instrument drift. A maximum likelihood approach is used to find the optimal parameters for the fitted GPRM. We review several normalization methods and compare their performance with GPRM.
Results
To evaluate the performance of different normalization methods, we consider LC-MS data from a study in which a metabolomic approach is utilized to discover biomarkers for liver cancer. The LC-MS data were acquired by analysis of sera from liver cancer patients and cirrhotic controls. In addition, LC-MS runs from a quality control (QC) sample are included to assess the run-to-run variability and to evaluate the ability of the various normalization methods to reduce this undesired variability. Also, ANOVA models are applied to the normalized LC-MS data to identify ions with intensity measurements that are significantly different between cases and controls.
Conclusions
One of the challenges in using label-free LC-MS for quantitation of biomolecules is systematic bias in measurements. Several normalization methods have been introduced to overcome this issue, but there is no universally applicable approach at the present time. Each data set should be carefully examined to determine the most appropriate normalization method. We review here several existing methods and introduce the GPRM for normalization of LC-MS data. Through our in-house data set, we show that the GPRM outperforms other normalization methods considered here, in terms of decreasing the variability of ion intensities among quality control runs.
Background
Liquid chromatography coupled with mass spectrometry is one of the promising high-throughput tools for identification and quantification of biomolecules extracted from serum, plasma, tissue, etc. Analysis of a sample by LC-MS typically generates three pieces of information: a pair of mass-to-charge ratio (m/z) and retention time (RT) along with a related ion intensity. Following preprocessing of data from a set of LC-MS runs, a data matrix is created with each row and column representing a feature (RT, m/z) and a sample, respectively. Assuming p features and n samples, we consider in this paper a p × n data matrix.
Normalization of the preprocessed LC-MS data is considered before statistical analysis to decrease undesired bias [1]. The bias can be from differences in sample collection, biomolecule extraction, or from column separation nonlinearity, ionization variability, etc [2]. The importance of the sample preparation step to achieve consistent results in different runs of the same experiment was emphasized in recent studies [3].
To the best of our knowledge, limited studies investigated the performance of existing normalization methods through real LC-MS data [2, 4]. In these studies, a pooled mixture of multiple samples is utilized to generate replicate QC runs. Then, the QC runs are utilized to estimate and correct the bias.
In [5] we reviewed most of the existing methods for normalization of LC-MS data. Some of the methods were modified and all methods were employed to conduct an evaluation of their performances on a real data set. In this study, we expand the aforementioned work by introducing a new normalization method using Gaussian process regression to capture the variation of ion intensities. We use a maximum likelihood approach to find the parameters for the fitted stochastic process. Then the learned model is used to correct for the drift based on analysis order. This approach can be used with either preprocessed data [6] or the raw data (scan-level data). The latter allows us to capture information that may be lost during preprocessing, but also deals with more complex data.
A data set generated from both experimental and QC samples is used here to assess normalization methods. We use the number of ions with significant variation within the QC runs as a measure to evaluate the run-to-run variability for each ion. From this point of view, a normalization method is assumed to have better performance if it can decrease this variation for more ions. In addition, cross-validation is used to evaluate the performance of each normalization method. The methods are further compared based on their ability to detect statistically significant ions between cases and controls. In other words, we look into different batches of data in the same experiment with dependent and independent sets of samples and compare the normalization methods based on their ability to increase the number of statistically significant ions overlapping among different batches. However, we do not use this criterion to rank the methods, as the ground truth is not available. The variability within the QC runs is utilized to compare the GPRM with other normalization methods reviewed in this paper and particularly with those that use analysis order for normalization.
Results and discussion
Table 1. Performance comparison of analysis order-based normalization methods using QC runs and the number of statistically significant ions between cases and controls

(A) Percentage of ions with significant QC variation (q _{ζ} < 0.1)

Batch, mode | Raw | LOESS | LOESS-CV | GPRM | GPRM-EIC | 2D-GPRM-EIC |
---|---|---|---|---|---|---|
1 Positive | 5.0 | 5.2 | 4.6 | 2.9 | 2.5 | 2.1 |
2 Positive | 7.0 | 4.3 | 3.7 | 2.5 | 2.2 | 1.8 |
1 Negative | 5.0 | 3.8 | 3.6 | 3.0 | 2.6 | 2.2 |
2 Negative | 20 | 12 | 10.9 | 6.8 | 5.7 | 4.4 |

(B) Number of statistically significant ions between cases and controls, overlapping in batches B1 and B2

Mode | Raw | LOESS | LOESS-CV | GPRM | GPRM-EIC | 2D-GPRM-EIC |
---|---|---|---|---|---|---|
Positive | 23 | 27 | 30 | 37 | 39 | 42 |
Negative | 11 | 19 | 21 | 28 | 31 | 33 |
Table 2. Performance comparison of TIC, MedScale, and Quantile normalization

(A) Percentage of ions with significant QC variation (q _{ζ} < 0.1)

Batch, mode | Raw | TIC | MedScale | Quantile |
---|---|---|---|---|
1 Positive | 5.0 | 6.1 | 5.5 | 4.0 |
2 Positive | 7.0 | 4.1 | 3.5 | 3.0 |
1 Negative | 5.0 | 3.1 | 4.1 | 2.9 |
2 Negative | 20 | 11 | 9.9 | 7.3 |

(B) Number of statistically significant ions between cases and controls, overlapping in batches B1 and B2

Mode | Raw | TIC | MedScale | Quantile |
---|---|---|---|---|
Positive | 23 | 40 | 32 | 62 |
Negative | 11 | 19 | 15 | 105 |
By comparing all the reviewed methods across different batches in the data set, we observed that three methods, TIC, MedScale, and Quantile normalization, consistently showed better performance [5]. As shown in Tables 1 and 2, both GPRM and GPRM-EIC reduce the percentage of ions with q _{ ζ } < 0.1 compared with the other normalization methods and with unnormalized data (see Evaluation criteria). This indicates that our proposed methods lead to a decrease in the number of features with significant variation across the QC runs.
From Table 1 it can be seen that among analysis order-based normalization methods, our proposed approach has the highest efficiency in decreasing the variability within the QC runs. Table 1 shows that these methods also outperformed other normalization methods by considering the same measure for estimated variability of QC runs. We think that GPRM can perform better as it handles the drift by using a stochastic model and optimization to find the parameters. In comparison, other analysis order-based normalization methods work with limited possible values for parameters and as a result they may not reach the highest possible performance. Also, by taking advantage of the scan-level intensities from EICs, GPRM-EIC is able to achieve better performance than GPRM. However, GPRM-EIC requires appropriate alignment of the scan-level peaks. Finally, by merging information across different scans, 2D-GPRM-EIC showed the best performance.
Comparing Tables 1 and 2 reveals that some methods, although they yield a smaller decrease in QC variability, lead to a larger number of ions selected as statistically significant between cases and controls. For example, the Quantile method reduced the percentage of ions with q _{ ζ } < 0.1 to 4.0% and 3.0% for B1 and B2, respectively, in positive mode (see Evaluation criteria). In comparison, GPRM achieved 2.9% and 2.5% (Table 1). However, the numbers of ions selected as statistically significant between cases and controls are 62 and 37 for Quantile and GPRM, respectively, in positive mode. As pointed out before, the ground truth is not available to evaluate the performance of the normalization methods on the basis of detected differences between cases and controls. Thus, all ions found statistically significant are regarded as potential candidates until subsequent verification is conducted to determine if the differences are true or biologically meaningful. However, the LC-MS runs are expected to yield better reproducibility following normalization. In particular, the QC runs in this study are anticipated to have the least variability, at least for a considerable subset of the analytes.
Conclusions
Systematic bias is one of the challenges in quantitative comparison of biomolecules by LC-MS. Various normalization methods have been proposed to address this issue. However, there are no universally applicable solutions at the present time. Thus, each LC-MS data set should be carefully inspected to determine the most appropriate normalization procedure. Since most of the evaluation studies have been performed on data from a relatively small sample size without adequate replicates and QC runs, additional investigations on large-scale LC-MS data are needed [5].
We reviewed several existing normalization methods in this paper. Also a new method for normalization of LC-MS data is introduced. The method uses the analysis order information in a Gaussian process model. Compared to other methods that also use analysis order information, our model has some advantages. It can model the bias from instrument drift more efficiently as a statistical approach is used which includes noise in the model to estimate the parameters through optimization. Therefore it is more precise in estimation of the scale parameter compared to some analysis order-based methods which search heuristically for the span parameter of the smoothing algorithm. In addition we extended this method to perform normalization on the basis of EICs obtained from raw LC-MS data instead of the preprocessed peak list.
We evaluated the performance of the GPRM and other existing normalization methods using our in-house LC-MS data generated from both experimental (cases and controls) and QC samples. The QC runs were used to estimate and correct the drift in the ion intensities. The normalization methods were assessed based on two criteria: (1) the decrease in the within-sample variability; (2) the number of extra ions selected as statistically significant compared to those obtained without normalization. The first criterion is used to rank the models based on their performance, while the second criterion is used to investigate the effect of normalization in terms of the number of possible candidates with significant differences between cases and controls. Our method showed improvement over existing methods with respect to the first criterion. However, some methods with a lower rank, e.g. Quantile, provide a larger number of candidates. Therefore, a verification experiment is required to confirm the true differences and discard false positives.
While the performance of the normalization method can be improved by using the scan-level LC-MS data following appropriate alignment, there are some issues. One of these issues is misalignment of the peaks across the scans. We used a simple approach to align the peaks, but more advanced techniques are available to further improve the alignment. Also, including prior distributions on the parameters of the GPRM through Bayesian analysis can potentially elevate the performance of our method. Future work will focus on addressing these issues.
Methods
Several normalization techniques have been proposed for LC-MS data. As normalization is a well-known concept in the area of genomics, most of the methods have been adapted from the techniques developed for gene expression microarray data [4, 7–10]. Usually the underlying assumption of these approaches is that the average biomolecule concentrations should be equal for all samples in the same experiment. To examine the performance of these methods, replicate LC-MS runs of a reference sample can be used [5].
In this paper, we introduce a Gaussian process regression model for normalization based on analysis order. Also we extend this method to estimate variability of scan-level ion intensities within an EIC of a peak. For comparison we investigated the following normalization methods: (i) normalization based on total ion count (TIC), (ii) median scale normalization, (iii) pretreatment methods such as scaling, centering and transformation, (iv) normalization based on internal standards, (v) quantile normalization, (vi) MA transform linear/local regression normalization, (vii) normalization based on QC consistency, (viii) normalization based on stable features, and (ix) normalization based on analysis order. We implemented these methods and in some cases we modified the algorithm [5].
Existing normalization methods
where I _{ j } is the intensity of the pair (rt, m/z) for the j th sample. Here both TIC of the sample and TIC of the selected ions can be used. We preferred the latter as the former includes all the noisy ions which have been already removed in the preprocessing step.
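As a minimal sketch of TIC-based scaling on the p × n data matrix, the snippet below divides each sample (column) by the sum of its selected ion intensities. The function name `tic_normalize` and the rescaling by the mean TIC (used only to keep values on the original intensity scale) are assumptions for illustration and are not taken from the text.

```python
import numpy as np

def tic_normalize(X):
    """Scale each column (sample) of a p x n intensity matrix by its TIC.

    The TIC is computed over the selected ions only (the rows of X).
    Multiplying by the mean TIC is an assumed convention, not from the text.
    """
    X = np.asarray(X, dtype=float)
    tic = X.sum(axis=0)              # one total per sample
    return X / tic * tic.mean()      # broadcasts across columns

# Toy matrix: 2 ions x 3 samples that differ only by an overall scale factor.
X = np.array([[10.0, 20.0, 40.0],
              [30.0, 60.0, 120.0]])
X_norm = tic_normalize(X)
```

After scaling, every sample has the same total intensity, so samples that differ only by a global factor become identical.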
where x _{ i } ^{∗} is the i th ion of the reference sample x^{∗}. Similarly x _{ ij } is the i th ion of the j th sample x _{ j }. We modified this method by adopting some rules to select the reference. The first option is to select one of the QC runs. In the second scenario, any sample may be selected as the reference. In both cases it is convenient to choose randomly, but we decided to include the option to select the reference sample based on minimum number of missing values/outliers, where the outliers are detected based on projection statistics.
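The scaling formula itself is not reproduced in this text, so the following is only one plausible form consistent with the symbols defined above: each sample is divided by the median, over ions, of its ratio to the reference sample. The helper `pick_reference` implements just the "fewest missing values" rule; outlier detection by projection statistics is omitted, and both function names are hypothetical.

```python
import numpy as np

def pick_reference(X):
    """Return the index of the column (sample) with the fewest missing values."""
    return int(np.argmin(np.isnan(X).sum(axis=0)))

def median_scale(X, ref=None):
    """Scale samples by the median ion-wise ratio to a reference sample.

    Assumed form: x'_ij = x_ij / median_i(x_ij / x*_i). Reference intensities
    are assumed to be non-zero; NaN entries are ignored in the median.
    """
    X = np.asarray(X, dtype=float)
    if ref is None:
        ref = pick_reference(X)
    ratios = X / X[:, [ref]]                 # ion-wise ratio to the reference
    scale = np.nanmedian(ratios, axis=0)     # one factor per sample
    return X / scale

# Toy matrix: samples 2 and 3 are 2x and 3x the first one; one value is missing.
X = np.array([[1.0, 2.0, np.nan],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0]])
X_norm = median_scale(X)
```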
The drawback of this method is inflation of measurement errors. In class III, usually centered log magnitude or square root of intensities is used to reduce the effect of different data distributions and make skewed distributions more symmetric, but this class has difficulties with zero values and large variances.
Normalization based on internal standards is another popular approach for LC-MS data [12]. In this method, one or more internal standards are added at controlled concentrations, and normalization is done based on the variation of these landmarks. If there is only one standard available, one sample is considered as the reference, and we scale all samples by the ratio of the standard's intensity in the reference to the standard's intensity in each sample. This approach can be modified by selecting a robust value for the reference to avoid accepting outliers as standard ions. If there is more than one internal standard, two approaches are feasible. First, by using a distance measure we can find the closest standard to each ion and apply the previous method with one standard. Second, it is possible to find a regression model for the variation of standards versus order of ions and apply the result to all intensities. The problem with this method is that it is expensive, because internal standards with precise concentrations must be added in the sample preparation phase. This approach performs well when the correlation between an ion and the internal standard is not high, otherwise it is not meaningful to use it for normalization.
- (1) Find the smallest value in each vector or array. Save the average or median of these values.
- (2) Similarly, find the second smallest values, and so on up to the n th smallest values, in each vector or array. Save the averages or medians of these values.
- (3) For each vector or array, replace the sorted actual values with these averages, then restore the original order of the entries.
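The three steps above can be sketched as follows, assuming the mean is used as the summary at each rank and that each column of the matrix is one sample. The function name `quantile_normalize` is illustrative, and ties within a sample are broken arbitrarily by the sort rather than averaged.

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize the columns (samples) of a p x n matrix."""
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=0)                  # sort order within each sample
    ranks = np.argsort(order, axis=0)              # rank of each entry in its column
    rank_means = np.sort(X, axis=0).mean(axis=1)   # steps (1)-(2): mean at each rank
    return rank_means[ranks]                       # step (3): substitute, original order

# Toy matrix: 4 ions x 3 samples, no ties within any sample.
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 3.0, 6.0],
              [4.0, 2.0, 8.0]])
Xq = quantile_normalize(X)
```

After this transformation all samples share the same set of values, while the ranking of ions within each sample is unchanged.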
MA transform local regression normalization is similar to MA transform linear regression, but instead of using a linear model, it applies piecewise linear or other nonlinear models such as higher order polynomials and splines to find the baseline curve of intensities. Also, the locally weighted polynomial regression (LOESS) technique [14] has been used to smooth the log magnitude of relative ion intensities versus the reference or in a pairwise manner (Figure 2).
Normalization based on analysis order is one of the most recent approaches [2]. The main idea of this method is to model the variation of intensities versus the sample's run (injection) order in the experiment, and to remove this variation by applying a smoothing regression technique. This approach needs a set of reference samples to model the variation versus analysis order based on their deviation from expected intensity values. However, this method has not been examined with a large data set; only a set of technical replicates was used in the work reported in [2]. Also, only animal samples with a small number of known proteins were used in that study. The authors reported that their normalization method outperformed all other existing methods in their experiment.
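To illustrate the idea for a single ion, the sketch below smooths the QC intensities against injection order with a basic tricube-weighted local linear fit (a simplified LOESS-style smoother, not the exact procedure of [2]) and divides every run by the estimated drift. The span value, the ratio-based correction, the QC spacing, and the function names are all assumptions.

```python
import numpy as np

def local_linear(t_fit, y_fit, t_new, span=0.75):
    """Tricube-weighted local linear regression evaluated at t_new.

    Assumes distinct injection orders in t_fit and at least 3 fitted points.
    """
    t_fit = np.asarray(t_fit, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    k = min(len(t_fit), max(3, int(np.ceil(span * len(t_fit)))))
    out = np.empty(len(t_new))
    for j, t0 in enumerate(np.asarray(t_new, dtype=float)):
        d = np.abs(t_fit - t0)
        h = np.sort(d)[k - 1]                        # distance to k-th nearest QC
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3
        A = np.column_stack([np.ones_like(t_fit), t_fit - t0])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y_fit * sw, rcond=None)
        out[j] = beta[0]                             # fitted value at t0
    return out

def drift_correct(x, order, is_qc, span=0.75):
    """Divide one ion's intensities by the QC-derived drift curve."""
    x = np.asarray(x, dtype=float)
    order = np.asarray(order, dtype=float)
    is_qc = np.asarray(is_qc, dtype=bool)
    drift = local_linear(order[is_qc], x[is_qc], order, span)
    return x * np.median(x[is_qc]) / drift           # keep the median QC level

# Synthetic ion with a noise-free linear drift; every 4th injection is a QC run.
order = np.arange(1, 21)
is_qc = (order % 4 == 1)
x = 100.0 * (1 + 0.02 * order)
x_corr = drift_correct(x, order, is_qc)
```

The `span` argument is the smoothing parameter that has to be chosen by hand or by cross-validation in this family of methods.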
Proposed normalization method
One way to address the issue of selecting the smoothing parameter of the LOESS algorithm is to use a stochastic model and learn its parameters through optimization. We propose a stochastic model to correct for drift using information on the analysis order in which the LC-MS data were generated. This method uses preprocessed data to perform normalization [6]. Next, we extend the method to apply normalization using scan-level data.
Normalization based on Gaussian process (GPRM)
for t ∈ {t _{1}, ..., t _{ n }}, which is the analysis order index. In addition, index i points to the i th ion. To use the model in (11), all we need is to find the covariance matrix of the process, Σ^{(i)}.
where $\sigma_{\epsilon}^{2}$ is the variance of the zero mean Gaussian noise.
Any optimization method can be used to find the maximum likelihood estimator, θ^{ ∗ }. For example, using the gradient descent approach: $\theta(r+1) = \theta(r) - \lambda \nabla_{\theta}\mathcal{L}$, where r is the iteration index and 0 < λ < 1.
So far, we explained the algorithm for a given ion. As we have multiple ions, we repeat the procedure for each ion separately to estimate $\theta^{(i)} = [\ell^{(i)}\ \sigma^{(i)}\ \sigma_{\epsilon}^{(i)}\ \mu_{0}^{(i)}\ \mu_{1}^{(i)}]^{T}$ for $x^{(i)} \sim \mathcal{N}(\mu^{(i)}, \Sigma^{(i)})$.
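For a single ion, the fitting procedure might be sketched as below. The model equations are not reproduced in this text, so several parts are assumptions: a squared-exponential covariance with scale ℓ and amplitude σ, a linear mean µ_0 + µ_1 t, log-parametrized scale parameters, finite-difference gradients with step halving, and a correction that subtracts the predicted drift from log-scale intensities. All function names are hypothetical.

```python
import numpy as np

def sq_exp_kernel(t1, t2, ell, sigma):
    """Squared-exponential covariance between analysis-order indices (assumed kernel)."""
    d = t1[:, None] - t2[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

def unpack(theta):
    """theta = [log ell, log sigma, log sigma_eps, mu0, mu1]; logs keep scales positive."""
    ell, sigma, sigma_eps = np.exp(theta[:3])
    return ell, sigma, sigma_eps, theta[3], theta[4]

def noisy_cov(theta, t):
    """Process covariance plus noise variance (tiny jitter added for stability)."""
    ell, sigma, sigma_eps, _, _ = unpack(theta)
    return sq_exp_kernel(t, t, ell, sigma) + (sigma_eps**2 + 1e-8) * np.eye(len(t))

def neg_log_lik(theta, t, x):
    """Negative Gaussian log-likelihood of one ion's QC intensities x."""
    try:
        L = np.linalg.cholesky(noisy_cov(theta, t))
    except np.linalg.LinAlgError:
        return np.inf
    a = np.linalg.solve(L, x - (theta[3] + theta[4] * t))
    return 0.5 * a @ a + np.log(np.diag(L)).sum() + 0.5 * len(t) * np.log(2 * np.pi)

def init_theta(t, x):
    """Heuristic starting point (an assumption, not from the text)."""
    return np.array([np.log(np.ptp(t) / 4), np.log(x.std()),
                     np.log(x.std() / 2), x.mean(), 0.0])

def fit_gprm(t, x, n_iter=300, lam=0.5, h=1e-5):
    """Gradient descent on the negative log-likelihood with step halving."""
    f = lambda th: neg_log_lik(th, t, x)
    theta = init_theta(t, x)
    for _ in range(n_iter):
        grad = np.array([(f(theta + h * e) - f(theta - h * e)) / (2 * h)
                         for e in np.eye(len(theta))])
        step, f_old = lam, f(theta)
        while step > 1e-12 and not f(theta - step * grad) < f_old:
            step *= 0.5                    # shrink until the objective decreases
        if step <= 1e-12:
            break                          # no descent step found: stop
        theta = theta - step * grad
    return theta

def predict_drift(theta, t_qc, x_qc, t_all):
    """Posterior mean of the fitted process at every analysis-order index."""
    ell, sigma, _, mu0, mu1 = unpack(theta)
    resid = x_qc - (mu0 + mu1 * t_qc)
    k_star = sq_exp_kernel(t_all, t_qc, ell, sigma)
    return mu0 + mu1 * t_all + k_star @ np.linalg.solve(noisy_cov(theta, t_qc), resid)

# Synthetic log-intensities of one ion over 40 injections; every 4th run is a QC.
rng = np.random.default_rng(0)
t_all = np.arange(1.0, 41.0)
x_all = 10.0 + 0.5 * np.sin(t_all / 8.0) + 0.05 * rng.standard_normal(40)
qc = np.arange(0, 40, 4)
t_qc, x_qc = t_all[qc], x_all[qc]

theta_hat = fit_gprm(t_qc, x_qc)
drift = predict_drift(theta_hat, t_qc, x_qc, t_all)
x_corr = x_all - drift + x_qc.mean()      # remove drift, keep the QC mean level
```

In practice the same fit would be repeated for every ion, and a library optimizer with analytic gradients would usually replace the simple descent loop used here.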
Gaussian Process Regression Model - Extracted Ion Chromatogram (GPRM-EIC)
for t = t _{1} , .., t _{ n }, where index i points to the i th ion and s represents the scan number. To use the model in (18), similar to the previous model, all we need is to find the covariance matrix of the process Σ. Here $\ell $ and µ parameters are defined as scalars.
To summarize, first we find the base peaks for the corresponding mass of each ion to form the EIC. This can be done by using XCMS2 to find regions of interest (ROI) [18] for the ions selected in preprocessing. Alternatively, we can use segmentation along the mass axis, as employed in the original XCMS [19]. Thereafter, by looking into the raw data, each individual scan is used to model the drift based on analysis order. The model is used to correct for the variation. Finally, the normalized peak intensities are used to recalculate the area under the EIC curve and update the ion intensities. One issue here is the misalignment of the peaks: the drift in each scan may be partly due to retention time differences across samples. To correct for this, we simply align the first peak in each spectrum to match the scans across different samples.
2-D Gaussian Process Regression Model - Extracted Ion Chromatogram (2D-GPRM-EIC)
where z = [t s]^{ T } for analysis order $t = t_{1}, \ldots, t_{n_{QC}}$ and scan $s = s_{1}^{(i)}, \ldots, s_{S(i)}^{(i)}$. Here S(i) is the number of scans for ion i. By using this model, we consider two different scales along the analysis order and scan time axes and use a 2-D Gaussian process to model the variability.
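One way to realize two different scales along the two axes is a squared-exponential covariance with a separate length scale per coordinate of z. The kernel form, the parameter names `ell_t` and `ell_s`, and the grid sizes below are assumptions chosen for illustration, since the covariance function is not reproduced in this text.

```python
import numpy as np

def sq_exp_kernel_2d(Z1, Z2, ell_t, ell_s, sigma):
    """Squared-exponential covariance on z = [t, s] with one length scale per axis."""
    scale = np.array([ell_t, ell_s])
    A = np.asarray(Z1, dtype=float) / scale
    B = np.asarray(Z2, dtype=float) / scale
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)   # scaled squared distance
    return sigma**2 * np.exp(-0.5 * d2)

# Grid of 3 QC runs x 4 scans for one ion (sizes chosen for illustration only).
t = np.array([1.0, 5.0, 9.0])
s = np.array([0.0, 1.0, 2.0, 3.0])
Z = np.array([[ti, si] for ti in t for si in s])              # 12 x 2 inputs
K = sq_exp_kernel_2d(Z, Z, ell_t=4.0, ell_s=1.5, sigma=1.0)
```

The resulting matrix could take the place of the 1-D covariance in a likelihood-based fit, with `ell_t` and `ell_s` estimated jointly.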
LC-MS data
We used an in-house LC-MS data set to evaluate the normalization methods described in the previous sections. The data set is derived from three types of samples: cases, controls, and QCs. The samples were collected from adult patients at Tanta University Hospital, Tanta, Egypt. The participants consist of 40 hepatocellular carcinoma (HCC) cases and 50 patients with liver cirrhosis. A single blood sample was drawn by peripheral venepuncture into 10 mL BD Vacutainer sterile vacuum tubes without anticoagulant. The blood was immediately centrifuged at 1000 × g for 10 min at room temperature. The serum supernatant was carefully collected and centrifuged at 2500 × g for 10 min at room temperature. After aliquoting, serum was kept frozen at -80 °C until use.
The data were acquired using ultra performance liquid chromatography (UPLC) coupled with QTOF MS in both positive and negative modes. The raw data were preprocessed using the XCMS package [19]. The details of the data set can be found in [21].
Evaluation criteria
The main assumption of any normalization method is reproducibility of the experiment. Here we assume that at least a considerably large subset of the measured values is reproducible. A measured value refers to a single peak which represents an ion intensity across different runs. Since the QC runs are expected to be identical, their measurements can be used to estimate instrument variability and to assess the reproducibility of the experiment.
where $x_{ijk}^{QC}$ is the intensity of the k th QC run for the i th ion in batch j and ζ is the random effect, so that $\forall i: \mathbb{E}_{k}[\zeta_{ik}] = 0$.
A normalization method is evaluated on the basis of the number of ions with reduced variance of ζ _{ ik }. We evaluate this by using the F test for the ratio of the sum of squares from ζ to the sum of squares of ε, which is the unexplained variation or error. To correct for the multiple testing effect, we use q _{ ζ } < 0.1, where q is the FDR-adjusted p-value estimated using the Storey method [22].
Ions with significant group-batch interaction, i.e. q _{γ} < 0.1, were removed from the analysis, where q is the FDR-adjusted p-value estimated using the Storey method [22]. Significant ions are selected based on q _{ i,α } ≤ 0.1.
Declarations
Acknowledgements
This work was supported in part by the National Institutes of Health Grant R01CA143420. The LC-MS data presented in the manuscript were generated through support from the Proteomics and Metabolomics Shared Resource at the Lombardi Comprehensive Cancer Center.
Declarations
The publication costs for this article were funded by the corresponding author.
This article has been published as part of Proteome Science Volume 11 Supplement 1, 2013: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2012: Proteome Science. The full contents of the supplement are available online at http://www.proteomesci.com/supplements/11/S1.
References
1. Listgarten J, Emili A: Statistical and computational methods for comparative proteomic profiling using liquid chromatography-tandem mass spectrometry. Mol Cell Proteomics 2005, 4: 419–434. 10.1074/mcp.R500005-MCP200
2. Kultima K, Nilsson A, Scholz B, Rossbach UL, Fälth M, Andrén PE: Development and Evaluation of Normalization Methods for Label-free Relative Quantification of Endogenous Peptides. Molecular and Cellular Proteomics 2009, 8(10):2285–2295. 10.1074/mcp.M800514-MCP200
3. Tuli L, Ressom HW: LC-MS Based Detection of Differential Protein Expression. Journal of Proteomics & Bioinformatics 2009, 2(10):416–438. 10.4172/jpb.1000102
4. Callister SJ, Barry RC, Adkins JN, Johnson ET, Qian WJ, Webb-Robertson BJM, Smith RD, Lipton MS: Normalization Approaches for Removing Systematic Biases Associated with Mass Spectrometry and Label-Free Proteomics. J Proteome Res 2006, 5(2):277–286. 10.1021/pr050300l
5. Nezami Ranjbar M, Zhao Y, Tadesse M, Wang Y, Ressom H: Evaluation of normalization methods for analysis of LC-MS data. Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on 2012, 610–617.
6. Nezami Ranjbar M, Tadesse M, Wang Y, Ressom H: Normalization of LC-MS data using Gaussian process. Genomic Signal Processing and Statistics (GENSIPS), 2012 IEEE International Workshop on 2012, 187–190.
7. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 2002, 30(4):e15. 10.1093/nar/30.4.e15
8. Huber W, von Heydebreck A, Sültmann H, Poustka A, Vingron M: Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 2002, 18(1):S96-S104.
9. Anderle M, Roy S, Lin H, Becker C, Joho K: Quantifying reproducibility for differential proteomics: noise analysis for protein liquid chromatography-mass spectrometry of human serum. Bioinformatics 2004, 20(18):3575–3582. 10.1093/bioinformatics/bth446
10. Bolstad BM, Irizarry RA, Åstrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003, 19(2):185–193. 10.1093/bioinformatics/19.2.185
11. van den Berg R, Hoefsloot H, Westerhuis J, Smilde A, van der Werf M: Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics 2006, 7: 142. 10.1186/1471-2164-7-142
12. Sysi-Aho M, Katajamaa M, Yetukuri L, Oresic M: Normalization method for metabolomics data using optimal selection of multiple internal standards. BMC Bioinformatics 2007, 8: 93. 10.1186/1471-2105-8-93
13. Higgs RE, Knierman MD, Gelfanova V, Butler JP, Hale JE: Comprehensive label-free method for the relative quantification of proteins from biological samples. Journal of Proteome Research 2005, 4(4):1442–1450. 10.1021/pr050109b
14. Cleveland WS: Robust Locally Weighted Regression and Smoothing Scatterplots. Journal of the American Statistical Association 1979, 74(368):829–836. 10.1080/01621459.1979.10481038
15. Dunn WB, Broadhurst D, Begley P, Zelena E, Francis-McIntyre S, Anderson N, Brown M, Knowles JD, Halsall A, Haselden JN, Nicholls AW, Wilson ID, Kell DB, Goodacre R: Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nature Protocols 2011, 6(7):1060–1083. 10.1038/nprot.2011.335
16. Kamleh MA, Ebbels TMD, Spagou K, Masson P, Want EJ: Optimizing the Use of Quality Control Samples for Signal Drift Correction in Large-Scale Urine Metabolic Profiling Studies. Analytical Chemistry 2012, 84(6):2670–2677. 10.1021/ac202733q
17. Rasmussen CE, Williams CKI: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning series). The MIT Press; 2005.
18. Benton HP, Wong DM, Trauger SA, Siuzdak G: XCMS2: Processing Tandem Mass Spectrometry Data for Metabolite Identification and Structural Characterization. Analytical Chemistry 2008, 80(16):6382–6389. [PMID: 18627180] 10.1021/ac800795f
19. Smith CA, Want EJ, O'Maille G, Abagyan R, Siuzdak G: XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Analytical Chemistry 2006, 78(3):779–787. 10.1021/ac051437y
20. Nezami Ranjbar MR, Wang Y, Ressom HW: Quality assessment of LC-MS metabolomic data. Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on 2011, 1034–1036.
21. Xiao JF, Varghese RS, Zhou B, Nezami Ranjbar MR, Zhao Y, Tsai TH, Di Poto C, Wang J, Goerlitz D, Luo Y, Cheema AK, Sarhan N, Soliman H, Tadesse MG, Ziada DH, Ressom HW: LC-MS Based Serum Metabolomics for Identification of Hepatocellular Carcinoma Biomarkers in Egyptian Cohort. Journal of Proteome Research 2012, 11(12):5914–5923.
22. Storey JD: A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002, 64(3):479–498. 10.1111/1467-9868.00346
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.