Automated production of recombinant human proteins as resource for proteome research

Background
An arbitrary set of 96 human proteins was selected and tested to set up a fully automated protein production strategy covering all steps from DNA preparation to protein purification and analysis. The target proteins are encoded by functionally uncharacterized open reading frames (ORFs) identified by the German cDNA consortium. Fusion proteins were produced in E. coli with four different fusion tags and tested in five different purification strategies depending on the respective fusion tag. The automated strategy relies on standard liquid handling and clone picking equipment.

Results
A robust automated strategy for the production of recombinant human proteins in E. coli was established based on a set of four different protein expression vectors resulting in NusA/His, MBP/His, GST and His-tagged proteins. The yield of soluble fusion protein was correlated with the induction temperature and the respective fusion tag. NusA/His and MBP/His fusion proteins were best expressed at low temperature (25°C), whereas the yield of soluble GST fusion proteins was higher when protein expression was induced at elevated temperature. In contrast, the yield of soluble His-tagged fusion proteins was independent of the temperature. Amylose was not found useful for affinity purification of MBP/His fusion proteins in a high-throughput setting, and metal chelating chromatography is recommended instead.

Conclusion
Soluble fusion proteins can be produced in E. coli in sufficient quality and in μg/ml culture quantities for downstream applications such as microarray-based assays and studies on protein-protein interactions, employing a fully automated protein expression and purification strategy. Future applications might include the optimization of experimental conditions for the large-scale production of soluble recombinant proteins from libraries of open reading frames.

approaches for the functional characterization of not yet annotated proteins [10][11][12][13][14]. In the recent past, microarray-based assays have been employed to identify novel protein-protein interactions, small molecule ligands, and protein phosphorylation sites [15,16]. The production of protein microarrays requires recombinant proteins in sufficient quantities and of adequate purity, or their production in situ [17]. To guarantee that proteins are full-length and presented at a defined concentration on the array, proteins must be produced ahead of the printing process. The baculovirus and yeast expression systems have been exploited to produce proteins on a large scale for subsequent production of microarrays [18]. Both expression systems introduce host-specific post-translational modifications. In contrast, the bacterial expression system Escherichia coli [19] produces proteins devoid of the post-translational modifications typically present in endogenously expressed mammalian proteins. This can be advantageous for certain applications, e.g. screening for novel substrates of human kinases. Furthermore, E. coli is a well-established expression system with known growth kinetics, robust handling characteristics, and high yields of recombinant proteins. Therefore, we selected E. coli as the expression system for the automated production of uncharacterized human proteins from the LIFEdb database [20]. The resulting in vitro data could help to bridge the knowledge from different large-scale technologies for functional genomics and proteomics applications [21,22]. Different automated strategies are commercially available for bacterial high-throughput protein expression screening [23], or were established by different research groups [24][25][26][27][28][29]. These approaches have several drawbacks in common.
For example, only a limited number of steps of the workflow are automated, leaving the challenge of integrating them into a fully automated system. The development of an automated platform for bacterial protein expression should also include DNA handling and quality control steps, as well as the production, purification and analysis of the recombinant proteins. Hence, we undertook an independent approach based on commercial robotics to set up an improved platform for automated protein expression screening. All individual steps, including the preparation and characterization of expression clones, transformation into bacteria, picking of expression clones, growing bacterial cultures, induction of protein expression, harvesting raw protein extracts, protein affinity purification and subsequent quality control of purified proteins (Figure 1, Table 1), were performed in a multititer plate format and integrated in our protein production strategy. Quality control steps were also included in the automated workflow. The correct insert size of the expression clones was verified by agarose gel electrophoresis, and the E-PAGE system (Invitrogen) was used to control the size and purity of affinity-purified proteins. This resulted in the development of a robust procedure which can easily be established on comparable clone picking and liquid handling equipment.

Figure 1 Work flow of the automated protein production strategy. Automated steps are shown in orange, steps involving manual intervention are shown in blue.

Technical set-up of the fully automated system
The liquid handling steps required for ORF cloning, protein expression and protein purification were implemented on the MULTI-probe II robot, which was controlled with its application system software wherever possible. Additional external equipment integrated into the robotic platform was controlled with the LabVIEW software. Clone picking was performed on the QPix robot. Figure 1 summarizes the individual steps implemented in the automated routine. Open reading frames were transferred by Gateway LR reaction into four different destination vectors (Step 1) and subsequently transformed into the bacterial strain DH5α for the amplification of recombinant expression plasmids (Step 2). The automated restriction digest of expression plasmids confirmed the correct insert size for 361 of the 384 expression clones (Steps 3-5). Thus, 94% of destination clones were available for transformation into the bacterial strain BL21-SI (Step 6). In summary, each candidate was subjected to 15 different expression tests varying in the choice of fusion tag, induction temperature and purification strategy, or a combination thereof. Again, clone picking and the growth of pre-cultures were performed using our automated set-up (Steps 7, 8). However, the induction of protein expression by addition of IPTG or AHT is faster when performed manually (Step 9). Cultures were placed on a shaker at the indicated temperature (Step 10). Protein expression was stopped by removing the culture medium using gravity-driven filter plates. After lysis and affinity purification (Step 11), the yield of recombinant fusion proteins was analyzed using the E-PAGE system, a gel-based approach suitable for the high-throughput analysis of proteins (Step   (Figure 2A, B). The final analysis is assisted by the E-PAGE software, which reassembles twelve sample lanes, corresponding to a single 96-well row, into a single image (Figure 2C).
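The clone bookkeeping behind these numbers can be illustrated with a minimal Python sketch. It assumes that the 384 expression clones correspond to 96 ORFs combined with the four fusion tags; the ORF identifiers and the function names are hypothetical placeholders and not part of the described platform.

```python
from itertools import product

# Tags named in the text; the ORF identifiers are placeholders (assumption).
FUSION_TAGS = ["NusA/His", "MBP/His", "GST", "His"]
ORF_IDS = [f"ORF{n:02d}" for n in range(1, 97)]  # 96 hypothetical identifiers


def expression_clones(orf_ids, tags):
    """Return every ORF x tag combination as (orf_id, tag) tuples."""
    return list(product(orf_ids, tags))


def pass_rate(n_passed, n_total):
    """Fraction of clones that passed a quality-control step."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_passed / n_total


clones = expression_clones(ORF_IDS, FUSION_TAGS)
rate = pass_rate(361, len(clones))  # 361 clones had the correct insert size
print(len(clones), f"{rate:.0%}")
```

Under these assumptions the sketch reproduces the 384 clones and the 94% pass rate quoted above.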
Calculation of the molecular weight of the purified fusion proteins is based on a molecular weight marker (Figure 2B, D). The yield is summarized in Additional file 1. To count as successfully purified, a fusion protein had to yield a clean band of the expected molecular weight. This analysis was performed using the E-PAGE system, which separates proteins over a distance of merely 2 cm. The low resolution of the E-PAGE system was accounted for by the rule that a protein was regarded as successfully purified only when at least two independent expression tests resulted in a protein band of the expected size. According to these criteria, 52% of the uncharacterized proteins were purified in fusion with at least one of the different tags, and quantities up to 10 µg/ml culture were obtained (Additional file 1). This yield was also reported for other strategies relying on the affinity purification of fusion proteins from small volume cultures [25,35]. However, the yield is lower than in our manual approach, where close to 80% of fusion proteins were obtained in quantities up to 100 µg/ml. Since the proteins analyzed in these two studies were comparable with respect to molecular weight and intracellular localization, we conclude that parameters such as aeration of the culture and the simplified one-step cell lysis and affinity purification strategy contribute to the reduced overall yield of the automated protein production strategy.
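The acceptance rule described above (a band of the expected size in at least two independent expression tests) could be expressed as in the following Python sketch. The record layout, the function name and the example records are assumptions made for illustration; they are not data or software from the study.

```python
from collections import Counter


def successfully_purified(observations, min_hits=2):
    """Apply the 'at least two independent tests' rule.

    observations: iterable of (orf_id, test_id, band_ok) tuples, where
    band_ok is True if a clean band of the expected size was observed.
    Returns the set of ORF ids with at least `min_hits` positive tests.
    """
    # Deduplicate on (orf, test) so that a repeated record of the same
    # test is not counted as an independent confirmation.
    positive = {(orf, test) for orf, test, ok in observations if ok}
    hits = Counter(orf for orf, _ in positive)
    return {orf for orf, n in hits.items() if n >= min_hits}


# Hypothetical records, not results from the study.
example = [
    ("ORF-A", "NusA/His_25C", True),
    ("ORF-A", "GST_37C", True),
    ("ORF-B", "His_30C", True),
    ("ORF-B", "His_37C", False),
    ("ORF-C", "MBP/His_25C", False),
]
purified = successfully_purified(example)
print(sorted(purified))
```

In this hypothetical example only ORF-A passes, because ORF-B shows the expected band in a single test.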

Influence of Fusion Tag and Temperature on Protein Yield
The influence of the different fusion tags was examined (Figure 3) and compared with the outcome of our manual approach. With respect to the impact of the induction temperature on His-tagged protein expression, 15% (14 proteins), 19% (18 proteins), and 5% (5 proteins) of His-tagged proteins were purified when induced at 25°C, 30°C, and 37°C, respectively. For reasons of technical simplicity, a one-step lysis and purification procedure was performed in the automated approach. This one-step procedure monitored exclusively the successfully purified proteins without analyzing the percentage of inducible proteins. Moreover, with an average yield of close to 30%, His-tagged fusion proteins were slightly more soluble when protein expression was induced in the manual approach [30].
We could confirm for the automated approach that the NusA tag potentially increases the solubility of difficult-to-express proteins. The expression of NusA fusion proteins is more efficient at lower temperature [30]. For example, 42 (44%) NusA fusion proteins could be purified when protein expression was induced at 25°C, but only 24 (25%) and 5 (5%) NusA fusion proteins were purified when protein expression was induced at 30°C and 37°C, respectively. The reverse was found for GST fusion proteins, which were produced more efficiently when protein expression was induced at elevated temperature. In our automated approach, 26 GST fusion proteins (27%) were successfully purified when protein expression was induced at 37°C, 18 (19%) at 30°C, and 16 (17%) at 25°C. The MBP tag behaved comparably to the NusA tag: the number of successfully purified proteins decreased with increasing induction temperature (17, 15, and 2 proteins at 25°C, 30°C, and 37°C, respectively).
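The percentages quoted for the individual tags follow directly from the counts and the 96 target proteins. A minimal Python sketch of this arithmetic is given below; it uses only the His, NusA and MBP counts stated in the text, and the dictionary layout and function names are illustrative assumptions.

```python
N_PROTEINS = 96  # size of the target set

# Purified proteins per induction temperature (degrees C), as quoted in the text.
COUNTS = {
    "His": {25: 14, 30: 18, 37: 5},
    "NusA/His": {25: 42, 30: 24, 37: 5},
    "MBP/His": {25: 17, 30: 15, 37: 2},
}


def success_percent(count, total=N_PROTEINS):
    """Success rate in percent, rounded to the nearest integer."""
    return round(100 * count / total)


def best_temperature(counts_by_temp):
    """Induction temperature with the highest number of purified proteins."""
    return max(counts_by_temp, key=counts_by_temp.get)


table = {
    tag: {temp: success_percent(n) for temp, n in by_temp.items()}
    for tag, by_temp in COUNTS.items()
}
best = {tag: best_temperature(by_temp) for tag, by_temp in COUNTS.items()}

for tag in COUNTS:
    print(tag, table[tag], "best at", best[tag], "C")
```

The rounded values agree with the percentages given in the text for the His and NusA tags.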

Quality control of recombinant fusion proteins
Furthermore, we could confirm that amylose-based affinity chromatography does not perform well in an automated setting, as previously reported by Braun et al. [25]. In detail, purification of MBP/His fusion proteins by metal chelate chromatography yielded 36 soluble fusion proteins (38%), whereas merely 19% of MBP/His fusion proteins were obtained after amylose-based affinity chromatography (Table 3).

Development of the automated process
A comprehensive automation of working steps including transformation, bacterial culture, cell disruption and protein extraction, as well as protein purification, and quality control of the purified proteins has been developed to provide material for the large-scale in vitro characterization of human proteins. Every single step (Figure 1) contributed its own particular challenge which had to be solved to fit into a comprehensive automated protein expression approach.
Bacteria can efficiently be transformed by electroporation on a single-clone basis. However, this procedure is difficult to automate and to parallelize, and technical limitations exclude its application in a multi-well format. Therefore the transformation of bacteria by heat shock was chosen, which can proficiently be realized by integrating a PCR machine or a thermoblock on the robot desk.
The vessel type (fermenter, Erlenmeyer flask, tube or deep well block), the well shape, size and volume, and the shaking frequency all influence the gas-liquid mass transfer characteristics. Gas-liquid mass transfer phenomena in microtiter plates were described by Hermann et al. [36], and therefore 48-well blocks instead of 96-well blocks were chosen to ensure sufficient aeration of the cultures. When we compared bacterial growth rates in 48-well plates with differently shaped wells, we observed that the cultures grew at a higher rate when square-shaped flat-bottom wells were employed instead of round U-bottom wells. This most likely reflects the more vigorous mixing of liquids in square-shaped wells. In the automated set-up presented here, bacterial cell lysis and affinity chromatography were performed as a one-step procedure without relying on sonication to break up cell walls. Insoluble material was not separated from the slurry because this step was difficult to implement on our automated platform. Consequently, this automated strat-

Influence of fusion tag and induction temperature on fusion protein yield

Influence of fusion tag and induction temperature on protein induction
Hydrophilic fusion tags such as NusA, MBP and GST enhance fusion protein solubility [33] when fused N-terminally to the ORF. This has previously been tested in large-scale protein expression strategies [25,30]. In the case of NusA and MBP fusion tags, protein expression at low temperatures yielded a higher percentage of soluble recombinant proteins. According to results from our automated approach, this finding applies exclusively to proteins induced at a low level (i.e. ORFs no. 3, 6, 96). In contrast, proteins inducible with a high yield were found to remain soluble over a broad temperature range (i.e. ORFs no. 13, 18, 22, 26, 41, 79).
The MBP tag is known to support proper folding of recombinant proteins and to enhance protein solubility [37,38]. The affinity of MBP to amylose can be exploited for affinity purification. Nevertheless, the binding of MBP to amylose is too inefficient to be useful in a high-throughput setting: a high proportion of MBP fusion proteins was observed in the flow-through and wash fractions, resulting in a low overall yield. Thus, purifying MBP fusion proteins via their internal His-tag by metal chelating chromatography turned out to be the better choice. With respect to difficult-to-express proteins such as membrane proteins, the NusA tag is useful as long as the induction of protein expression is performed at 20-25°C and with sufficient aeration [30].

Characterization of fusion proteins
Occasionally, translation of GST- and MBP-tag fusion proteins stopped prematurely and the fusion tag itself co-purified with the fusion protein. This effect was even more pronounced for the NusA tag. In summary, controlling the quality and purity of purified recombinant proteins by SDS-PAGE, for example by using the E-PAGE system, is mandatory for efficient quality control.

Comparison with other approaches
Bussow and coworkers have described the heterologous high-throughput production of 10,825 human clones in E. coli. In this case, 1,866 proteins (17%) were purified as hexahistidine-tagged soluble proteins of at least 15 kDa [39]. A comparable success rate, 16% of soluble His-tagged proteins, was obtained in our approach for the automated purification of His-tagged fusion proteins. However, in contrast to their approach, the vacuum-filter plate was replaced with a gravity-filter plate in our set-up, thus reducing the extensive foaming that we observed in filtration steps after applying a strong vacuum. Extensive foam formation can easily result in well-to-well cross-contamination.
Braun et al. [25] tested the automated purification of 32 different human proteins ranging in size from 16 to 220 kDa using four different fusion tags, among them MBP, GST and the hexahistidine tag. According to their results, sixty percent of the proteins were purified under non-denaturing conditions. MBP and GST fusion proteins resulted in better yields than fusion proteins with a short tag, such as the hexahistidine tag. They also reported that the affinity of MBP to amylose is too low to be employed in a high-throughput strategy. In contrast, 21% of GST fusion proteins and 11% of MBP fusion proteins were purified in our approach when expression tests performed at the three different temperatures were taken into account. However, Braun et al. tested protein expression exclusively at 25°C, and the apparent discrepancy between their results and our results can be explained by the temperature dependence of GST fusion protein expression. In our high-throughput set-up, the best yield was obtained when GST fusion proteins were induced at 37°C. Moreover, when our 37°C data were omitted from the comparison, success rates for our data set and for the Braun study were comparable. Pryor and Leiting tested the efficiency of the GST tag and the MBP tag for the production of soluble recombinant protein on a small scale at two different induction temperatures, 18°C and 37°C, and reported the MBP tag as superior at both temperatures [40]. This result contrasts with our experience with the MBP fusion tag, but might be explained by the very limited number of only two proteins tested by Pryor and Leiting.
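The pooled GST success rate can be checked against the per-temperature counts reported earlier (26 proteins at 37°C and 18 and 16 proteins at the two lower temperatures, each out of 96 targets). The following Python sketch shows this arithmetic; the variable and function names are illustrative, and the assumption is that every temperature series covered all 96 targets.

```python
N_PROTEINS = 96        # targets tested per temperature series
GST_AT_37C = 26        # purified GST fusion proteins at 37 degrees C
GST_AT_LOWER = [18, 16]  # counts at the two lower induction temperatures


def pooled_rate(counts, n_per_series=N_PROTEINS):
    """Success rate over several temperature series, pooled."""
    return sum(counts) / (n_per_series * len(counts))


all_temps = pooled_rate([GST_AT_37C] + GST_AT_LOWER)
without_37 = pooled_rate(GST_AT_LOWER)

print(f"all temperatures: {all_temps:.1%}")
print(f"without 37 C:     {without_37:.1%}")
```

Pooling all three series gives about 21%, matching the figure in the text, while omitting the 37°C series lowers the pooled rate to just under 18%.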
Moreover, Braun et al. [25] observed that the yield of recombinant proteins also strongly depends on the subcellular localization of the endogenous protein. Integral membrane proteins and secreted proteins require separate optimization and purification methods and were therefore excluded from their study. As much as 50% of the total proteins encoded in the human genome are supposedly membrane or secreted proteins, and a unique strategy would be useful to purify this large fraction of proteins as well. In contrast to Braun et al. [25], the strategy presented here did not exclude difficult-to-express proteins. We previously reported that the NusA tag is beneficial for the expression of difficult proteins, which was confirmed in other, non-high-throughput settings [24]. However, Hammarström et al. [41] compared the benefits of seven different fusion tags for the production of recombinant proteins in E. coli, and MBP was reported to be superior to NusA as fusion tag. In this instance, only small proteins (< 20 kDa) were tested, and protein expression was induced at 37°C. Again, the strong temperature dependence of both tags and the fact that only small proteins had been selected certainly contribute to the observed differences.

Conclusion
The automated protein production approach presented here introduces a simplified one-step lysis and purification procedure for affinity purification of soluble mammalian proteins. According to our data, NusA fusion proteins should be induced at a low temperature (25°C), whereas GST fusion proteins are better induced at elevated temperature. The purification of fusion proteins should be based on metal chelating chromatography or on affinity to glutathione. Our strategy is well suited as a screening routine for the identification of highly soluble proteins, which are required in structural analysis. The selected target proteins can subsequently be produced on a larger scale using a manual approach. In addition, our automated strategy is useful when large numbers of different fusion proteins are required but µg quantities of purified proteins are sufficient. This applies to high-throughput approaches such as functional assays performed in the protein microarray format, or on arrays with compound libraries. In summary, a robust robotic set-up based on standard instrumentation is described which overcomes inefficient steps of other strategies by introducing optimized automated steps, and comprises a larger number of automated steps than previously described. This set-up can easily be established on comparable liquid-handling robotics.

Automated cloning, purification and characterization of Gateway-expression clones
The Gateway Cloning system (Invitrogen, Karlsruhe, Germany) was used to generate the protein expression clones listed in Additional file 1 [34]. Open reading frames were available as entry clones without their native stop codons in vector pDONR201 [42]. Consequently, all fusion proteins contain additional C-terminal amino acids encoded by the respective destination plasmids [30]. All steps to clone the human ORFs [4,20]

Automated induction of protein expression
The heat shock transformation was performed using 50 ng of the expression plasmid added to 50 µL E. coli BL21(DE3) cells (Invitrogen). Target proteins were expressed in duplicate on a 4 mL scale in deep well blocks (Greiner).
Precultures were inoculated with a single colony from a 48-well agar plate (Genetix QPix) and grown in 48-well blocks (Greiner) in 1 mL LB medium. After incubation for 16 h at 30°C, aliquots of 100 µL preculture were used to inoculate 3.6 mL prewarmed LB medium in the 48-deep-well format. Two 48-well blocks were processed at a time at 25°C, 30°C, or 37°C. Recombinant protein expression was induced after 1.5 h, 2 h, or 3.5 h, depending on the expression temperature, by adding either 1 mM IPTG or 0.43 mM AHT (see Table 4 for details). Bacteria were harvested after 12 h of continued culture by centrifugation for 10 min at 2,500 × g. Medium was removed by aspiration, and the remaining pellets were kept at -20°C for further analysis.
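The pipetting volumes implied by this protocol can be estimated with a short calculation. The Python sketch below derives the inoculation dilution and the inducer volume per well; the stock concentrations (1 M IPTG, 43 mM AHT) are hypothetical values chosen for illustration and are not stated in the protocol, and the small volume of added inducer is neglected.

```python
PRECULTURE_ML = 0.1   # 100 µL preculture aliquot
MEDIUM_ML = 3.6       # prewarmed LB medium per well


def stock_volume_ul(final_mM, culture_ml, stock_mM):
    """Stock volume in µL needed to reach final_mM in culture_ml.

    Uses C1 * V1 = C2 * V2 and neglects the volume of the added stock.
    """
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the target")
    return final_mM * culture_ml * 1000.0 / stock_mM


culture_ml = MEDIUM_ML + PRECULTURE_ML       # total culture volume per well
dilution_factor = culture_ml / PRECULTURE_ML  # fold dilution of the preculture

# Stock concentrations below are assumptions, not values from the protocol.
iptg_ul = stock_volume_ul(1.0, culture_ml, stock_mM=1000.0)
aht_ul = stock_volume_ul(0.43, culture_ml, stock_mM=43.0)

print(f"dilution 1:{dilution_factor:.0f}")
print(f"IPTG {iptg_ul:.1f} µL, AHT {aht_ul:.1f} µL per well")
```

With other stock concentrations only the `stock_mM` arguments need to be changed.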
The E-PAGE system (Invitrogen), in which a single gel can be loaded with 96 samples, was used for protein expression analysis. All samples from one induction were loaded on a single E-PAGE gel with the pipetting robot. Electrophoresis was controlled by the standard software and hardware of the robot (Multiprobe, Perkin Elmer).

Additional file 1
Overview