Self-assembling protein microarrays represent a platform that is conceptually attractive for high-throughput analysis of proteins. The ability to synthesize proteins de novo from pre-spotted DNA elements, utilizing combined in vitro transcription and translation systems, in particular avoids problems associated with protein denaturation during microarray storage. It also provides extreme flexibility in terms of the choice of protein elements to be immobilized on the arrays.
To be useful, self-assembling protein microarrays require that array elements be produced at high levels, in native states, and at predefined and unique locations on the array surfaces. As originally described, NAPPA arrays employed DNA constructs encoding the proteins of interest fused to a C-terminal GST domain. Capture and immobilization of these chimeric species were achieved by co-spotting a polyclonal anti-GST antibody with the DNA constructs at the array element locations, the mixed macromolecules being immobilized on an aminosilane surface by chemical crosslinking [13, 14]. The amounts of immobilized proteins were then determined through addition of a second, horseradish peroxidase (HRP)-conjugated, monoclonal anti-GST antibody coupled to tyramide signal amplification (TSA). In the TSA system, HRP activity catalytically produces activated dye molecules that in turn react with tyrosine residues of local proteins. Although TSA detection is highly sensitive [35, 36], caution is needed to ensure that the signal does not saturate and thereby lead to misleading conclusions concerning the amounts of proteins synthesized. It should also be noted that detecting the epitope recognized by the anti-GST antibody provides little direct information about the folding state of the N-terminal protein contained within the chimeras.
To address these issues, we chose to explore the use of GFP not only as an epitope for protein array element capture and immobilization, but also as a monitor of protein production and folding. We established that expressing target proteins as N-terminal fusions with GFP allows their immobilization with an anti-GFP capture antibody, on the same principle as that employed by NAPPA arrays using GST. Further, for GFP to form its chromophore, it must first be folded correctly. It is generally accepted that this protein can be used as a folding reporter when expressed as a C-terminal fusion with other proteins, the proper folding of the GFP domain being related to the correct folding of the N-terminal moiety [37, 38]. The use of GFP provides an additional advantage for characterization of these arrays, since the fluorescence intensity also indicates the amount of properly folded chimeric protein at each element location. This property of autofluorescence obviates the need for indirect labeling with antibodies that are conjugated with fluorescent dyes, such as was done for NAPPA arrays. Signals from indirect labeling methods may not correlate linearly with the amount of folded protein present. The high concentrations of TSA reactants and unusually long incubation times in the TSA solution employed in that study are likely to result in signal saturation, and may be the reason for the relatively uniform levels of expression recorded for very different proteins across the NAPPA arrays. Other reported protein array platforms that use fluorescent proteins did not use intrinsic fluorescence for detection, instead opting for indirect labeling [12, 20]. A final advantage of autofluorescent microarrays is lower background: the labeling dyes required for detection in non-fluorescent protein arrays can produce background fluorescence, whereas the lower background intrinsic to autofluorescent GFP arrays results in higher quality microarray images.
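The concern about signal saturation can be illustrated with a minimal numerical sketch. The hyperbolic readout model, the function names, and the parameter values below are assumptions chosen purely for illustration; they are not fitted to TSA chemistry or to any data from these arrays. The sketch shows only the general point that a readout approaching its plateau compresses large differences in protein amount into nearly uniform signals, whereas a linear readout preserves them.

```python
def linear_signal(amount, gain=1.0):
    """Idealized linear readout: signal is proportional to protein amount."""
    return gain * amount


def saturating_signal(amount, s_max=1.0, k_half=1.0):
    """Hypothetical saturating readout (assumed hyperbolic form).

    Roughly linear when amount << k_half, and approaching the plateau
    s_max when amount >> k_half. All values are in arbitrary units.
    """
    return s_max * amount / (k_half + amount)


def fold_difference(signal_fn, low, high):
    """Apparent fold difference reported by a readout for two amounts."""
    return signal_fn(high) / signal_fn(low)


if __name__ == "__main__":
    # Two array elements whose true protein amounts differ ten-fold,
    # examined far below and far above the assumed half-saturation point.
    for low, high in [(0.001, 0.01), (10.0, 100.0)]:
        lin = fold_difference(linear_signal, low, high)
        sat = fold_difference(saturating_signal, low, high)
        print(f"amounts {low} vs {high}: linear {lin:.2f}-fold, "
              f"saturating {sat:.2f}-fold")
```

Under these assumed parameters, a ten-fold difference in amount is reported almost faithfully well below the half-saturation point but is nearly invisible close to the plateau, which is the behavior that would produce apparently uniform expression levels across very different proteins.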
The next question to be addressed was that of efficient production of the chimeric proteins in vitro. Our results using the E. coli S30 extract for protein synthesis indicated, for most of the Arabidopsis proteins that we aimed to express as GFP chimeras, levels of fluorescence not much higher than those of the negative control, although a few proteins, such as AXR3, SHY2, and ELF3, consistently showed very high fluorescence levels. In contrast, use of the wheat germ system led to the production of low levels of fluorescence above background for most proteins, but the highest levels seen for any protein were much lower than those seen using the E. coli system for the control and auxin proteins. By performing the wheat germ cell-free batch reaction a second time (after removing the first reaction mixture), we found that we could increase the fluorescence signal. This interesting result may be worthy of further study, but the observed increase came at the cost of longer incubation times and more extract, and was incompatible with high-throughput applications of the microarrays.
It is well established that in vivo expression of eukaryotic proteins in E. coli can be problematic. Many proteins aggregate, as a consequence of misfolding, to form insoluble inclusion bodies; this is particularly evident for large, multidomain proteins. Issues of misfolding have also been reported when using E. coli S30 extracts for protein synthesis in vitro. It therefore seems probable that an absence of fluorescence using the E. coli system reflects inappropriate folding of the chimeric proteins. The results using the wheat germ extract are consistent with a lower capacity, in terms of yield, of this system to synthesize proteins. To cast further light on this question, we decided to mix the E. coli and wheat germ extracts. If it were possible to complement the high protein synthetic capacity of the E. coli extract with a capacity for correct folding of eukaryotic proteins provided by the wheat germ system, then we would expect to observe high levels of fluorescence for the different chimeric proteins. Remarkably, when the hybrid system was used in this way, most array elements showed increased fluorescence, including those representing the majority of the Arabidopsis proteins that showed low fluorescence values when translated using the E. coli extract alone. The levels of expression still varied across different proteins (Table 1), but these levels were higher than those seen following in vitro translation in the presence of wheat germ extract alone. For proteins where the S30 system showed higher fluorescence values than the hybrid (e.g. MBP, GST, AXR3, SHY2, and ELF3), the difference may be due to dilution of the S30 extract by the wheat germ extract in the hybrid. These proteins were translated and folded successfully with the S30 extract alone, so more protein was produced when more of the prokaryotic extract was present. The hybrid system also performed better than the rabbit reticulocyte system (Figure 7); this image also indicates that our system is compatible with expression from linear DNA molecules immobilized on the array substrate. This implies that autofluorescent protein arrays can be produced from fully programmable arrays, similar to the NAPPA system. Analysis of the protein products, using western blotting, confirms appropriate sizes for most of the chimeric proteins. The presence of multiple bands for some proteins, for example FYPP3 and RGA, indicates that the quality of synthesis is protein-dependent, and suggests that routine quality assurance should be employed in different applications.
Given that the S30 and wheat germ extracts appear to be complementary and to act synergistically, the source of this effect merits more detailed discussion. In terms of the prokaryotic system, its main feature is a very high yield under conditions in which proteins express and fold successfully. A further feature is the ease with which E. coli can be genetically manipulated; different E. coli strains have been generated for the specific purpose of increasing protein yields during cell-free expression. For example, strain A19 was created with the aim of stabilizing PCR products in S30 extracts for high-throughput protein expression. The KC6 strain was designed for total amino acid stabilization. Energy regeneration, an important aspect of cell-free protein expression, has been the subject of continuous development associated with the S30 extract. For the Roche kit used in these experiments, efficient ATP regeneration comes from the PANOx system; this system regenerates ATP using phosphoenolpyruvate (PEP) and ADP, catalyzed by pyruvate kinase. A further component, oxalic acid, inhibits the reverse conversion of pyruvate to PEP by endogenous PEP synthase. Pyruvate, provided by the pyruvate kinase reaction, reacts with NAD+ and coenzyme A (CoA) to form acetyl phosphate, which regenerates ATP in excess of the ADP that is produced during protein synthesis.
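The ATP-regenerating steps described above can be summarized as a reaction scheme. This is a sketch only: the first reaction is as stated in the text, whereas the acetyl-CoA intermediate and the enzyme names given for the later steps (pyruvate dehydrogenase, phosphotransacetylase, acetate kinase) are assumptions drawn from standard E. coli metabolism, and are not specified in the text for the kit used here. The block assumes the amsmath package.

```latex
% Sketch of PANOx-type ATP regeneration; steps 2-4 are assumed
% from standard E. coli metabolism, not taken from the kit documentation.
\begin{align*}
\text{PEP} + \text{ADP}
  &\xrightarrow{\text{pyruvate kinase}}
  \text{pyruvate} + \text{ATP} \\
\text{pyruvate} + \text{CoA} + \text{NAD}^{+}
  &\xrightarrow{\text{pyruvate dehydrogenase}}
  \text{acetyl-CoA} + \text{CO}_{2} + \text{NADH} \\
\text{acetyl-CoA} + \text{P}_{i}
  &\xrightarrow{\text{phosphotransacetylase}}
  \text{acetyl phosphate} + \text{CoA} \\
\text{acetyl phosphate} + \text{ADP}
  &\xrightarrow{\text{acetate kinase}}
  \text{acetate} + \text{ATP}
\end{align*}
```

Read this way, each PEP can in principle yield two ATP: one directly from the pyruvate kinase step and a second through acetyl phosphate, which is consistent with the statement that ATP is regenerated in excess of the ADP produced during protein synthesis.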
In terms of the eukaryotic system, the wheat germ extract was developed with the primary aim of efficient cell-free protein synthesis. This extract, prepared from homogenized wheat embryos, contains all components necessary for translation, and ribosome-inactivating proteins such as tritin, together with other endogenous translation inhibitors, have been removed to improve its stability. Dialysis can be implemented with wheat germ expression reactions to provide a continuous supply of substrates and removal of inhibitory products, and high yields (up to 4 mg of individual proteins) can be obtained, but only after extremely long incubation times (more than 60 hours). Such long reaction times are impractical for protein microarray production under high-throughput conditions; further, dialysis chambers compatible with the microarray format are not currently available. The hybrid system that we have described is therefore an excellent alternative, since it synthesizes folded polypeptides at a high rate in a single batch reaction.
From the standpoint of existing knowledge concerning protein folding, it is not obvious why the two in vitro systems complement each other so effectively. Prokaryotic ribosomes, beyond synthesizing polypeptides at rates faster than those of eukaryotic ribosomes, are also actively involved in protein folding and can effect this process in vitro [47, 48]. For prokaryotes, protein folding is generally considered to be a post-translational process; in contrast, eukaryotic organisms are believed to employ a different protein-folding mechanism, polypeptides being folded co-translationally [49, 50]. Evidence nevertheless exists that wheat germ and rat liver ribosomes are capable of refolding denatured proteins. This activity of wheat germ ribosomes may be responsible for the folding of proteins rapidly synthesized by the E. coli system; the wheat germ extract presumably also provides chaperones, cofactors, and substrates that assist protein folding [23, 52]. Together these represent reasonable hypotheses as to why the hybrid system is particularly effective.
The molecular mechanism(s) underlying the cooperativity in protein production and expression observed between the bacterial and wheat germ systems might involve the following: (a) high-level protein synthesis on bacterial ribosomes, accompanied by co-translational folding, with components for the latter being supplied by the wheat germ extract, (b) stimulation of eukaryotic protein synthesis and of co-translational folding based on eukaryotic factors, by the E. coli extract, (c) post-translational folding of proteins synthesized on E. coli ribosomes mediated by the eukaryotic extract, or (d) some combination of these factors.
Protein size and domain structure may also influence folding, as reflected by the results that we obtained. Eukaryotic cells contain many more long proteins than prokaryotic cells, and these proteins contain more domains. The tendency of polypeptide chains to misfold increases significantly as a function of length. It has therefore been proposed that eukaryotic organisms developed a co-translational mechanism to ensure efficient folding, particularly of these larger, more complex proteins. Prokaryotic organisms exclusively use a post-translational folding mechanism, since their polypeptide elongation rates are considerably faster than those of eukaryotes. This is one reason why it is difficult to produce large multi-domain eukaryotic proteins in E. coli. The overall size of the protein appears to influence successful translation and folding in the hybrid system (Table 1), since the largest proteins (SEC and SPY) did not display high levels of fluorescence. It should be noted that since these proteins both contain glycosyl-transferase domains, which are membrane-associated, the addition of liposomes might improve their synthesis, as recently demonstrated for other membrane proteins [55, 56]. The sizes and structures of individual domains also appear to have an influence; for example, the DELLA proteins (RGL1, RGA, GAI, and RGL3), which are all poorly expressed and folded, have variable N-terminal domains that contain a unique DELLA motif, but all share the same multi-domain C-terminus [57, 58]. Heterologous expression of these proteins in E. coli has been reported, but only for their N-terminal domains. It therefore seems likely that the C-terminal region is, in this case, recalcitrant to folding. A final reason for low levels of fluorescence might be that those particular proteins can only be synthesized and folded co-translationally by the wheat germ components.