In recent years, clinical urinary proteomic analyses have been widely used to discover biomarkers. A thorough and representative urinary proteome database of normal human samples is critically important, both as the background against which a disease proteome is compared in discovery proteomics and as a source of candidate proteins/peptides for targeted proteomics. Since 2001, a number of groups have addressed this issue, and more than 2500 proteins have been identified in the normal human urinary proteome. However, several important aspects remain to be defined.
To construct a representative urinary proteome, it is necessary to define the minimal sample number. Too few samples may yield individual-specific proteins that do not represent the group pattern. Previous studies [15–17] have used sample numbers ranging from one to more than ten. However, because this issue has not been thoroughly assessed, the minimal sample number has remained unknown. In this study, inter-individual and inter-gender variations were taken into consideration in a qualitative analysis aimed at obtaining a representative urinary proteome. We used replicate LC/MS/MS analyses of 20 urine samples from healthy volunteers to define the minimal sample number needed. The results showed that nine samples from either the male or the female group may achieve approximately 95% analytical completeness for that group, and that 10 samples can achieve 95% analytical completeness for the combined group. Importantly, these results may be helpful for constructing a new urinary proteome database or evaluating an existing one. However, these conclusions should not be applied universally, for several reasons. First, technical factors, including sample preparation, LC separation, mass spectrometric detection, and data processing, can affect the final identification results. The conclusions of this study are based on 1DLC separation and a mass spectrometer of low sensitivity and resolution (LTQ XL). Any change in these factors, such as using a high-sensitivity, high-resolution instrument (e.g. Orbitrap or TripleTOF 5600), might lead to a different conclusion. For example, in this report a total of 867 proteins were identified with one-dimensional separation (1DLC) and a low-resolution instrument (LTQ XL), whereas Kentsis et al. identified 2362 proteins using three-dimensional separation (centrifugation, SDS-PAGE, and 1DLC) and a high-resolution instrument (LTQ Orbitrap XL). Thus, with additional separation steps and a more accurate instrument, substantially more proteins could be identified and more useful information might be obtained. Second, the urinary proteome is well known to exhibit substantial biological variation. In this study, only inter-individual and inter-gender variations were considered; other biological variables (such as age, hormone level, exercise, and others) may also have a marked impact on the results and increase the required sample number. Therefore, the numbers presented here are preliminary and should be regarded as a lower bound on the sample number needed; if other sources of variation are included, the minimal sample number may increase.
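The notion of analytical completeness used above can be illustrated with a saturation (rarefaction) curve. The sketch below assumes that completeness for n samples is the size of the union of proteins identified in n randomly chosen samples divided by the size of the union over all samples, averaged over random draws; this definition, the function names, and the toy protein sets are assumptions for illustration, not the exact procedure used in this study.

```python
import random
from statistics import mean


def completeness_curve(samples, n_draws=200, seed=0):
    """Mean fraction of the group's total protein union recovered by n samples.

    samples: list of sets of protein identifiers, one set per individual.
    Assumed definition: completeness(n) = |union of n random samples| / |union of all|.
    """
    rng = random.Random(seed)
    total = set().union(*samples)
    curve = {}
    for n in range(1, len(samples) + 1):
        fractions = []
        for _ in range(n_draws):
            subset = rng.sample(samples, n)  # n distinct individuals
            fractions.append(len(set().union(*subset)) / len(total))
        curve[n] = mean(fractions)
    return curve


def minimal_sample_number(curve, target=0.95):
    """Smallest n whose mean completeness reaches the target fraction, else None."""
    for n in sorted(curve):
        if curve[n] >= target:
            return n
    return None


# Hypothetical toy data: 300 proteins shared by everyone, plus 40 proteins per
# individual drawn from a pool of 200 variable proteins.
data_rng = random.Random(1)
core = {f"P{i}" for i in range(300)}
variable = [f"V{i}" for i in range(200)]
samples = [core | set(data_rng.sample(variable, 40)) for _ in range(20)]

curve = completeness_curve(samples)
print(minimal_sample_number(curve))
```

The more individual-specific proteins a group contains, the more slowly the curve saturates and the larger the sample number needed to reach a given completeness.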
Another important issue regarding a normal urinary proteome database is quantitative information. In clinical research, the aim is generally to identify disease-related biomarkers. Quantitative information on each protein would help define biological and technical variation, so that differential proteins with statistical significance within a group could be identified. In addition, false differential proteins arising from high within-group variation could be excluded. To date, most analyses of the normal urinary proteome have been qualitative, and only the study by Nagaraj et al. has provided overall quantitative information on each protein, using a peak intensity method. In this report, we used spectral counting (SC) and western blotting to assess quantitative information on high- and low-abundance proteins, respectively, and to estimate the minimal sample number needed for quantitative analysis. For high-abundance proteins, the average minimal sample number was 18 for a 2-fold change, and for low-abundance proteins it was 30 for a 2-fold change. These results indicate that more samples are required to obtain statistical significance when detecting low-abundance proteins. We also estimated the minimal sample number using the data of Nagaraj et al. With a 66% inter-individual CV, the average minimal sample number across all acceptable levels of FDR and statistical power was 16 for a 2-fold change and 58 for a 1.5-fold change. These results were similar to our SC-based results for high-abundance proteins. However, the sample number, separation method, MS instrument, data processing software, and number of proteins used for quantitative analysis differed between the two studies. Moreover, because other quantitative methods, such as iTRAQ and TMT, are also available, it is difficult to draw a firm conclusion about the minimal sample number for quantitative urinary proteome analysis. Therefore, the variation of different quantitative methods should be evaluated in the future to define an appropriate minimal sample number for clinical research.
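How the minimal sample number depends on inter-individual CV and fold change can be illustrated with a textbook normal-approximation sample-size formula for a two-group comparison on log-transformed abundances. This is a sketch under stated assumptions (log-normally distributed abundances, so that the log-scale variance is ln(1 + CV²), and a single two-sided α and power in place of the FDR-controlled thresholds used here); it is not the calculation used in this study or applied to the Nagaraj et al. data, so its output differs from the values reported above. The function names and the spectral counts in the example are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def cv(values):
    """Coefficient of variation (sample SD / mean) of one protein across individuals."""
    return stdev(values) / mean(values)


def samples_per_group(cv_value, fold_change, alpha=0.05, power=0.8):
    """Per-group sample number for a two-sided, two-sample comparison.

    Assumes log-normal abundances, so the variance on the natural-log scale is
    ln(1 + CV^2), and uses the normal approximation
        n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sigma^2 / (ln FC)^2.
    """
    z = NormalDist()  # standard normal
    sigma2 = math.log(1 + cv_value ** 2)
    effect = math.log(fold_change)
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * z_sum ** 2 * sigma2 / effect ** 2)


# Hypothetical spectral counts of one protein in eight individuals.
counts = [12, 30, 18, 25, 9, 40, 22, 15]
print(round(cv(counts), 2))

print(samples_per_group(0.66, 2.0))  # 66% CV, 2-fold change   -> 12
print(samples_per_group(0.66, 1.5))  # 66% CV, 1.5-fold change -> 35
```

With α = 0.05 and 80% power, a 66% CV gives 12 samples per group for a 2-fold change and 35 for a 1.5-fold change. Stricter α values, as implied by multiple-testing control across many proteins, push these numbers upward, which is consistent with the direction of the differences between this sketch and the FDR-based estimates.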
Our previous work showed that it was difficult to distinguish between the male and female urinary proteomes, apart from the identification of several male-specific proteins. In addition, recent studies using both LC/MS/MS and 2DE approaches also failed to identify such differences. In this study, the protein overlap rates among the 21 samples and the results of hierarchical clustering analysis allowed us to separate the male and female samples into two groups, indicating a difference between the male and female urinary proteome patterns. However, because this study was based only on 1DLC/MS/MS analysis with low-resolution mass spectrometry, this conclusion should be confirmed by additional experiments before being applied universally. Moreover, given the existence of male-specific proteins, it is important to balance the numbers of male and female samples when constructing a database.
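The overlap-based grouping described above can be sketched as average-linkage hierarchical clustering on pairwise Jaccard distances between protein identification lists. Jaccard distance is an assumed stand-in for the overlap rate used in this study, the clustering is written in plain Python to keep the example self-contained, and the samples (including the placeholder proteins MALE1 to MALE3) are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of protein identifiers."""
    return 1 - len(a & b) / len(a | b)


def cluster_distance(c1, c2, d):
    """Average linkage: mean pairwise distance between members of two clusters."""
    total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def average_linkage(samples):
    """Agglomerative (UPGMA-style) clustering of samples on Jaccard distance.

    samples: dict mapping sample name -> set of identified proteins.
    Returns merges in order as (cluster_a, cluster_b, distance) tuples.
    """
    names = list(samples)
    d = {frozenset((a, b)): jaccard_distance(samples[a], samples[b])
         for a, b in combinations(names, 2)}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(pair[0], pair[1], d))
        merges.append((c1, c2, cluster_distance(c1, c2, d)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical data: a shared core, placeholder male-only proteins, and one
# individual-specific protein per sample.
core = {f"P{i}" for i in range(50)}
male_only = {"MALE1", "MALE2", "MALE3"}
samples = {
    "M1": core | male_only | {"X1"},
    "M2": core | male_only | {"X2"},
    "M3": core | male_only | {"X3"},
    "F1": core | {"Y1"},
    "F2": core | {"Y2"},
    "F3": core | {"Y3"},
}

merges = average_linkage(samples)
left, right, height = merges[-1]  # the top-level split of the dendrogram
print(sorted(left), sorted(right), round(height, 3))
```

In this toy example the male-only proteins make male samples closer to one another than to any female sample, so the top-level split separates the two groups. In practice a library routine such as SciPy's hierarchical clustering would typically be used instead of a hand-written loop.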
The choice between pooled and individual samples is also an important issue. Because the proteome is known to have substantial biological variation, an appropriate number of samples should be analyzed. However, until a few years ago the throughput of proteomic techniques was limited, and samples were pooled to circumvent this problem. Previous 2DE studies of cell lines or tissues [33, 34] showed that pooling could reduce biological variation. On the other hand, Diz et al., as well as our previous study, showed that pooling samples may lead to a loss of information through sample dilution. In this study, the pooled male sample clustered with the male samples and was closest to the samples from Male 7–10, indicating that a pooled sample may not adequately represent the pattern of all individual samples. Therefore, results obtained from a pooled sample should be carefully assessed before being applied to other experiments. In recent years, instruments with high sensitivity and high resolution (such as the Orbitrap) have made high-throughput urinary proteome analysis possible. For example, Nagaraj et al. identified over 800 proteins in a 4-h 1DLC/MS/MS analysis using an LTQ Orbitrap XL. Therefore, the use of individual samples is recommended in future work.
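The loss of information through dilution mentioned above can be made concrete with simple arithmetic: in an equal-volume pool, a protein's concentration is the mean of its individual concentrations, so a protein present in only a few individuals is diluted and may fall below the detection limit. The sketch below assumes equal-volume pooling and a single hard detection limit; the function names, protein names, concentrations, and limit are all hypothetical.

```python
def pooled_concentration(concentrations):
    """Concentration in an equal-volume pool: the mean of the individual values."""
    return sum(concentrations) / len(concentrations)


def lost_in_pool(protein_levels, detection_limit):
    """Proteins detectable in at least one individual sample but not in the pool.

    protein_levels: dict mapping protein -> list of per-individual concentrations
    (0 where absent). detection_limit: a single threshold in the same units.
    """
    lost = []
    for protein, levels in protein_levels.items():
        seen_individually = any(c >= detection_limit for c in levels)
        seen_in_pool = pooled_concentration(levels) >= detection_limit
        if seen_individually and not seen_in_pool:
            lost.append(protein)
    return sorted(lost)


# Hypothetical levels (arbitrary units) in ten individuals.
levels = {
    "common_high": [50.0] * 10,                # abundant in everyone: survives pooling
    "individual_specific": [5.0] + [0.0] * 9,  # one individual: 5.0 / 10 = 0.5 in pool
    "subgroup": [2.0] * 4 + [0.0] * 6,         # four individuals: 8.0 / 10 = 0.8 in pool
}

print(lost_in_pool(levels, detection_limit=1.0))
# -> ['individual_specific', 'subgroup']
```

Both low-frequency proteins are detectable in individual samples but not in the pool, which is one way a pooled sample can under-represent the pattern of the individuals it was made from.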