Deciphering the immunogenic potential of wheat flour: a reference map of the salt-soluble proteome from the U.S. wheat Butte 86

Background Within the complex wheat flour proteome, the gluten proteins have attracted most of the attention because of their importance in determining the functional properties of wheat flour doughs and their roles in human health conditions such as celiac disease and food allergies. However, certain non-gluten proteins also trigger immunological responses but may be present in flour in low amounts or obscured by the more abundant gluten proteins in two-dimensional gels of total protein preparations. Methods Non-gluten proteins were preferentially extracted from the flour with a dilute salt solution and separated by two-dimensional gel electrophoresis. Proteins in 173 gel spots were identified by tandem mass spectrometry after cleavage with trypsin or chymotrypsin. Transgenic wheat lines in which specific groups of gluten proteins were suppressed by RNA interference were used to estimate the amount of carry-over of gluten proteins in the salt-soluble protein fraction. Results Fifty-seven different types of non-gluten proteins were identified, including 14 types that are known or suspected immunogenic proteins. The predominant proteins in 18 gel spots were gluten proteins. Some of these also contained non-gluten proteins. Analysis of the salt-soluble proteins from a transgenic line in which omega-1,2 gliadins were eliminated by RNA interference indicated that certain omega-1,2 gliadins were present in large amounts in the salt-soluble fraction and obscured relatively small amounts of beta-amylase and protein disulfide isomerase. In comparison, analysis of a transgenic line in which alpha gliadins were absent revealed that glyceraldehyde-3 phosphate dehydrogenase was a moderately abundant protein that co-migrated with several alpha gliadins. Conclusions In this study, we constructed a proteomic map of the non-gluten protein fraction of wheat flour from the US wheat Butte 86 that complements a proteomic map of the total flour proteins developed previously for the same cultivar. Knowing the identities of low abundance proteins in the flour as well as proteins that are hidden by some of the major gluten proteins on two-dimensional gels is critical for studies aimed at assessing the immunogenic potential of wheat flour and determining which wheat proteins that should be targeted in future gene editing experiments to reduce the immunogenic potential of wheat flour.


Background
Wheat flour proteins can be separated on the basis of solubility into the water/salt-insoluble gliadins and glutenins (gluten proteins) and the water/salt-soluble albumins and globulins (non-gluten proteins). The gluten proteins, com-prising~80% of the total protein, are a collection of~60-80 different proteins that serve as the primary storage proteins of the grain while the non-gluten proteins are a diverse collection of different types of proteins that have metabolic, structural or defense functions. Much of the past research has focused on the gluten proteins because of their important role in determining the unique viscoelastic properties that give wheat flour its commercial value. These proteins are also of considerable interest because they are a major trigger of human health conditions, including celiac disease (CD), a chronic inflammatory disease of the small intestine that occurs in genetically susceptible individuals, and food allergies [1,2]. The gliadins are primarily monomeric proteins that are divided into four groups termed alpha, gamma, omega and delta gliadins while the glutenins are polymeric proteins that consist of high molecular weight glutenin subunits (HMW-GS) and low molecular weight glutenin subunits (LMW-GS). Epitopes that stimulate T-cells in CD patients have been identified in proteins within most major classes of gluten proteins, although epitopes found in certain alpha gliadins and a subclass of omega gliadins, the omega-1, 2 gliadins, have been found to be immunodominant [3,4]. Gluten proteins in all major classes also react with IgE from patients with wheat allergy [5]. In particular, the omega-5 gliadins, another subclass of omega gliadins, have been found to trigger the serious food allergy wheat-dependent exercise-induced anaphylaxis that occurs in sensitized individuals when the ingestion of wheat is followed by physical exercise [6].
Non-gluten proteins also have been found to be immunogenic. In studies that examined cohorts of 28 and 60 patients with confirmed cases of wheat allergy, Battais et al. [5,7] found that IgE from 67 and 72% of patients, respectively, reacted with proteins in the albumin/globulin fraction. Further studies using 1-D and 2-D immunoblots of albumin/globulin fractions demonstrated that a wide range of proteins often react with IgE from single patients and that the reactive proteins vary from individual to individual [8][9][10][11]. Non-gluten proteins likely to be allergenic are listed in the Allergome database (http://www.allergome. org/) and include beta amylase (Tri a 17), thioredoxin (Tri a 25), serpin (Tri a 33), triosephosphate isomerase (Tri a 31), peroxidase (Tri a peroxidase), chitinase (Tri a chitinase), peroxiredoxin (Tri a 32), glyceraldehyde 3-phosphate dehydrogenase (GAPDH) (Tri a 34), xylanase inhibitor (Tri a XI), dehydrin (Tri a 35), 9 kDa lipid transfer protein (Tri a 14) and a number of alpha amylase/trypsin inhibitors (AAI) (Tri a 28, Tri a 29, Tri a 30, Tri a 40). In addition, certain non-gluten proteins have been shown to be reactive with IgG and IgA antibodies from CD patients, suggesting a role for serpins, globulins, purinins, farinins and AAI in CD [12]. AAI also have been implicated in non-celiac wheat sensitivity (NCWS), a response to wheat ingestion that is distinct from CD and food allergies [13], while a globulin from wheat flour has been associated with the development of type 1 diabetes [14].
Because sensitivities to wheat are increasing, there is a need to identify clinically relevant proteins in wheat flour, associate specific flour proteins with particular types of allergies and sensitivities, and determine the frequencies of immunogenic responses to particular proteins. Twodimensional immunoblot analysis is a powerful method for assessing the reactivity of patient sera to specific proteins in the flour [8,[10][11][12][15][16][17]. The approach combines high resolution two-dimensional gel electrophoresis (2-DE), a robust, reproducible method for the both the separation and quantification of hundreds of proteins, with the identification of individual proteins by tandem mass spectrometry (MS/ MS). However, 2-DE analyses of total protein preparations from wheat flour can be problematic because the more abundant gluten proteins overlap with low-abundance nongluten proteins in some areas of the gels [18]. In a comprehensive analysis of total flour proteins, Dupont et al. [19] found MS/MS evidence for non-gluten proteins in 22% of 2-DE spots containing either LMW-GS or gliadins as the predominant proteins. In their study, LMW-GS overlapped with globulins, peroxidase, serpin and xylanase inhibitor; alpha gliadins overlapped with fructose bisphosphate aldolase, alcohol dehydrogenase and formate dehydrogenase; gamma gliadins overlapped with triticin and aspartic proteinase; and omega gliadins overlapped with beta-amylase. For this reason, saltsoluble protein fractions should also be considered for studies of immunogenic proteins. In this paper, we extracted proteins from flour from the US wheat Butte 86 with a dilute salt solution, separated the proteins by 2-DE, and identified proteins in 173 2-DE spots by MS/MS. The resulting proteomic map should make it possible to use 2-D immunoblots to rapidly identify and prioritize non-gluten proteins to be targeted in future gene editing experiments aimed at reducing the immunogenic potential of wheat flour.

Methods
Proteins were extracted from wheat flour from the US spring wheat cultivar Butte 86 with a buffer containing 100 mM KCl as described in Hurkman and Tanaka [18], quantified by Lowry, and analyzed by 2-DE as described in detail in Dupont et al. [19]. In short, 50 mg flour was suspended in 200 μl of 50 mM Tris-HCl pH 7.8, 100 mM KCl, 5 mM EDTA and incubated for 5 min with intermittent vortex mixing. Samples were centrifuged at 4°C at 14,500 x g for 15 min. The supernatant was collected and proteins were precipitated by the addition of 4 volumes of cold (− 20°C) acetone. The pellet was rinsed with cold acetone, air-dried, and stored at -20°C. After quantification, proteins were solubilized in urea buffer (9 M urea, 4% NP-40, 1% DTT, 2% ampholytes) for isoelectric focusing using capillary tube gels with a pI range of 3-10. NuPAGE 4-12% Bis-Tris protein gels (Life Technologies, Carlsbad, CA, USA) were used for the second dimension separation.

LC-MS analysis
Protein spots were excised from replicate 2-D gels and placed in 96-well plates where they were reduced, alkylated and digested separately with trypsin (Sequencing Grade Modified Trypsin, Promega Corp., Madison, WI, USA) or chymotrypsin (Sequencing Grade Chymotrypsin, Promega Corp., Madison, WI, USA) using a Diges-tPro according to the directions of the manufacturer (Intavis, Koeln, Germany).
Plates containing peptides from digested gel spots were placed in the autosampler of a NanoLC 425 (Sciex, Framinham, MA, USA) that was interfaced by a nano-electrospray source to an Orbitrap Elite mass spectrometer (Thermo Scientific, San Jose, CA, USA). Ten μl fractions were loaded by the autosampler onto one of three PicoChip reverse phase columns (PicoChip with ReproSil-PUR C18, 10.5 cm, ID 75 μm, 3 μm, 120 Å, ReproSil-PUR-C18-AQ, New Objective Inc., Woburn, MA, USA). The positions of the columns were switched in front of the mass spectrometer with the PicoSlide system for Thermo LTQ/Orbi (New Objective Inc., Woburn, MA, USA). With this configuration, Column A is eluting, Column B is loading and Column C is washing, thereby saving time and reducing carryover between samples.
The loaded sample was washed with solvent A to remove salts and the column position was switched in front of the Orbitrap Elite mass spectrometer and eluted with a gradient of acetonitrile into the mass spectrometer. Solvent A was 5% in acetonitrile and Solvent B was 80% in acetonitrile, both solvents were 0.05% in formic acid. Gradient elution was at a flow rate of 250 nl per minute from 100% Solvent A to 35% Solvent B in 45 min. Peptides were detected in the Orbitrap set to scan a range from 200 to 2000 m/z at a resolution of 60,000. The 10 most intense peaks were subject to collisioninduced dissociation (CID). The minimal signal threshold was set to 10,000. Dynamic exclusion with a repeat count of 2 was enabled for duration of 10 s. Normalized collision energy was set to 30%.
Results were loaded into Scaffold 4 (Scaffold_4.9.0, Proteome Software, Inc., Portland, OR, USA). The mass accuracy was set to 30 ppm on precursor level and 0.6 Da on MS/MS level. Scaffold summarized the Mascot and X!Tandem peptide hit results from the two different enzyme digestions (trypsin, chymotrypsin) with the MudPit analysis and compress data summarizing option.
Twenty-one spots that contained evidence of gluten proteins in this analysis were excised from replicate gels, digested with thermolysin (Promega Corp., Madison, WI, USA), and analyzed by MS/MS as above. These included spots 26, 27, 45, 54, 55, 60, 61, 80, 82, 83, 84, 87, 91, 101, 102, 104, 105, 106, 189, 190, and 191. Data from all three enzyme digestions were then combined in Scaffold. The predominant protein in each spot was deemed to be the protein for which the greatest number of unique peptides were identified and is reported in Table 1. All proteins identified in each spot are reported in Additional file 1. MS data for predominant proteins and proteins that contained at least half the number of unique peptides found in the predominant proteins are shown in Additional file 2. The mass spectrometry data have been deposited to the ProteomeXchange Consortium (http://proteomecentral. proteomexchange.org) via the PRIDE partner repository [23] with the dataset identifier PXD017260 and https:// doi.org/10.6019/PXD017260. Flour from transgenic lines missing specific gluten proteins was used to further evaluate spots containing both gluten proteins and non-gluten proteins. Transgenic lines were produced previously using RNA interference. Transgenic line SA-30-118a-5 is missing all omega-1,2 gliadins [16] while transgenic line SA-35a-124j is missing all alpha gliadins [17].

Results
KCl-soluble proteins from Butte 86 flour were separated by 2-DE and the pattern was compared to that of a total SDS protein extract (Fig. 1). The most abundant proteins in the total protein extract, representing 125 of 233 2-DE spots identified previously by MS/MS [19], were the gluten proteins; the HMW-GS and LMW-GS (shown in green), the alpha and gamma gliadins (shown in blue) and the omega gliadins (shown in red) (Fig. 1a). Several groups of non-gluten proteins, including serpins, purinins and alpha-amylase/trypsin inhibitors (AAI), were wellseparated from the major gluten proteins but of considerably lower abundance. In comparison, the most abundant      [19]. AAI sequences from Butte 86 are from [22] b a gluten protein with less than half the number of unique peptides found in the predominant protein was also identified in spot c also contains an omega gliadin d replicate sample was digested with thermolysin and analyzed by MS/MS e also contains a LMW-GS f also contains a non-gluten protein g also contains a gamma gliadin h also contains an alpha gliadin  (Fig. 1b) were located in the AAI, omega-1,2 gliadin and alpha and gamma gliadin regions of the gel. Additionally, proteins in the serpin and purinin regions were notably more abundant in the KCl-soluble fraction than in the total protein fraction. One-hundred-eighty-one spots were excised from 2-D gels of the KCl-soluble proteins and analyzed by MS/MS (Fig. 2). One hundred seventy-three of the spots yielded valid identifications. MS/MS coverage of the proteins ranged from 8 to 100% with an average of 49%. Sixty-one different protein types and 111 sequences were identified as the predominant proteins in these spots ( Table 1). The predominant proteins in 26 spots were globulins (Fig. 2). Nine different globulin sequences were identified. Seventeen spots were identified as AAI and 11 different sequences were found. Fourteen spots were serpins and seven different sequences were identified. In addition, the predominant proteins in multiple spots were identified as glucose and ribitol dehydrogenase (6), embryonic protein DC-8 (5), beta-amylase (4), enolase (4), GAPDH (4), lactoylglutathione lyase (4), peroxidase (4), chitinase (3), glutathione-S-transferase (3), late embryogenesis abundant protein (3), malate dehydrogenase (3), pyruvate phosphate dikinase (3), and purinin (3). Forty other non-gluten protein types were identified as the predominant proteins in either one or two 2-DE spots (Table 1). Only eight of these were identified previously in the total protein fraction [19].
In some cases, gluten proteins were separated effectively from non-gluten proteins on the basis of solubility. For example, peroxidase was the predominant protein in three moderately abundant spots (89, 90, 91) in the LMW-GS region of the gel from the KCl-soluble fraction (green arrows in Figs. 1b and 2). In the total protein fraction of Dupont et al. [19], the predominant proteins in these three spots were LMW-GS (indicated with green arrows in Fig. 1a). The absence of most LMW-GS in the KCl-soluble fraction also made it easier to visualize many of the minor globulins that are found in this region of the gel (labeled in yellow in Fig. 2). Fig. 2 Identification of proteins in a KCl-soluble fraction of wheat flour. Spots in which the predominant proteins were gluten proteins are labeled in red. Spots in which the predominant proteins were non-gluten proteins that are either known allergens or likely to be immunogenic are labeled in bright green (beta-amylase), dark blue (serpins), medium blue (glyceraldehyde-3-phosphate dehydrogenase (GAPDH), yellow (globulins), teal (peroxidase), magenta (purinins), orange (chitinase), purple (xylanase inhibitor), light blue (tritin), light purple (farinin), brown (dehydrin, peroxiredoxin, triosephosphate isomerase), or green (AAI). Protein identifications can be found in Table 1 and Additional file 1 Despite the fact that most gluten proteins are not readily soluble in salt solutions, gluten proteins were the predominant proteins identified in 18 spots in the KCl fraction (Fig. 2). Eight spots were alpha gliadins, four were gamma gliadins, one was an omega gliadin and five were LMW-GS ( Table 1). Fourteen of these spots also contained nongluten proteins (34, 55, 60, 61, 80, 82, 83, 84, 86, 87, 106,  189, 190, 191). Gluten proteins also were detected as minor components of 14 spots (Table 1, Additional file 1).
A number of spots in the region of the gel containing alpha and gamma gliadins also were problematic. Three minor spots in this region contained only alpha gliadins (79, 88, 191) while two minor spots contained only gamma gliadins (60, 61). However, five moderately abundant spots contained both an alpha gliadin as the predominant protein and GAPDH (83,106,189,190,191). Additionally, the predominant proteins in four other spots were GADPH, but MS/MS evidence also pointed to the presence of alpha gliadins in these spots (101, 102, 103, 105) (Fig. 2, Additional file 1).
To estimate the amounts of different gluten proteins that were carried over into the KCl-soluble protein fraction, we examined the profiles of flour proteins from transgenic lines in which the expression of either the omega-1,2 gliadins or the alpha gliadins was suppressed by RNA interference [16,17]. Detailed proteomic analyses of total protein fractions from flour of the transgenic lines revealed that all omega-1,2 gliadins were absent in line SA-30-118a-5 [16] while all alpha gliadins were absent in line SA-35a-124j [17]. In 2-D gels of the KCl-soluble protein fraction, spots 26, 27, 34, 38 containing omega-1,2 gliadins were major spots in the nontransgenic flour (Fig. 3a). Spot 26 was significantly reduced while other spots were not apparent in transgenic line SA-30-118a-5 (Fig. 3b), suggesting that significant amounts of certain omega-1,2 gliadins partition into the KCl-soluble fractions and that beta-amylase and protein disulfide isomerase are low abundance proteins. Within the alpha gliadin region of the gel (Fig. 3a, c), several spots in which either alpha gliadins (79) or alpha gliadins and GAPDH (83, 106, 189) were identified in the KCl fraction from Butte 86 were not apparent in transgenic line SA-35a-124j, suggesting that alpha gliadins comprise the bulk of the protein in these spots. Several other spots containing both alpha gliadin and GAPDH were present in similar amounts in Butte 86 and the transgenic line (101, 102, 103, 105, 190, 191), suggesting that most of the protein in these spots is GAPDH and that this protein is fairly abundant in the KCl fraction. Interestingly, GADPH was not identified in the total protein fraction in the Dupont et al. (2011) study [19].

Discussion
A number of non-gluten proteins that are abundant in the KCl-soluble protein fraction are well-separated from the major gluten proteins and can be identified unambiguously by MS/MS in a total SDS protein extract of Butte 86 flour [19]. These include the serpins, purinins and AAI. However, other non-gluten proteins are overshadowed by the more abundant gluten proteins. In 2-DE gels of total protein extracts, betaamylases are hidden by omega-1,2 gliadins, GAPDHs are masked by alpha gliadins, and peroxidases and some globulins are obscured by LMW-GS. In this study, non-gluten proteins were preferentially extracted from wheat flour using KCl and 57 different types of non-gluten proteins were identified, including 14 types that are known or suspected immunogenic proteins. Although most gluten proteins are insoluble in KCl, we observed some carry-over of the abundant gluten proteins in the KCl fraction. This was not unexpected and, in contrast to other studies [8][9][10][11], was likely noted because protein spots were digested separately with trypsin and chymotrypsin before analysis by MS/MS, thereby facilitating the identification of both non-gluten proteins that are readily digested with trypsin and gluten proteins that have few tryptic peptide cleavage sites. For example, 276 tryptic peptides were identified for beta-amylase in spot 27 and 62 tryptic peptides were identified for enolase in spot 38. The omega-1,2 gliadins in these two spots did not yield any tryptic peptides, but were identified on the basis of 81 and 41 chymotryptic peptides, respectively. Likewise, 72 tryptic peptides were identified for GAPDH in spot 101. Although the alpha gliadin in the same spot did not yield any tryptic peptides, it was identified by 24 chymotryptic peptides (Additional file 2). In general, protein spots that contained either gamma gliadins or LMW-GS were relatively minor while spots containing certain omega-1,2 gliadins or alpha gliadins were quite prominent in the KCl-soluble fraction. Interestingly, two spots found in the omega-1,2 gliadin region of the total protein fraction but not in the KCl-soluble fraction were identified previously as omega-1,2 gliadins that contain single cysteine residues [16]. These cysteine residues likely enable the proteins to be incorporated into glutenin polymers, rendering them insoluble in aqueous solutions even though their sequences are otherwise quite similar to omega-1,2 gliadins that partition into the KCl fraction. Indeed, in past experiments, these omega-1,2 gliadins were accumulated preferentially in the polymer fractions of Butte 86 wheat flour [24].
The availability of transgenic lines missing either the omega-1,2 gliadins or the alpha gliadins made it possible to estimate the abundance of specific non-gluten proteins that overlap with gluten proteins in the KClsoluble fraction. These analyses demonstrated that betaamylase and protein disulfide isomerase are low abundance proteins while GAPDH is a protein of moderate abundance.
Di Francesco et al. [25] recently used a shotgun approach to characterize the KCl-soluble proteins from old and modern durum wheats. In their study, proteins were prepared from mature kernels rather than milled flour and the protein extracts were digested only with trypsin. Analysis of the resulting spectral data used a database of protein sequences from Triticum, Oryza, Hordeum, Avena, Secale, Maize and Brachypodium downloaded from the UniProt database. Interestingly, only 21% of identified proteins were from Triticum, 10.8% from Hordeum and 1% from Secale. In the current study, proteins were digested individually with either trypsin or chymotrypsin and spectra were searched against a database containing all Triticeae sequences from NCBI along with a selection of sequences from Chinese Spring and Butte 86. Not surprisingly, many of the same types of proteins, including likely allergens, were identified in both studies. However, it is curious that Di Francesco et al. identified only one globulin in their study while nine different globulin sequences were identified in 26 2-DE spots in the current study. Likewise, Di Francesco et al. identified only three different serpins, while seven different sequences in 14 spots were found in the current study. They did, however, report a number of alpha and gamma gliadins as well as HMW-GS and LMW-GS in the KCl fraction, confirming the carryover of gluten proteins into this fraction that was also observed in our analysis. It is likely that they would have identified more gluten proteins in this fraction if proteins had been digested with both trypsin and chymotrypsin prior to MS analysis.

Conclusions
The proteomic map of KCl-soluble flour proteins generated in this study complements the map of total flour proteins from Butte 86 described by Dupont et al. [19] and serves as a reference map that, combined with 2-D immunoblots using sera from wellcharacterized patients, should provide new insight into non-gluten proteins that are responsible for immunogenic responses to wheat flour and lead to a  (d) and (e), respectively. Spots labeled in red in panel d were significantly reduced or absent in SA-35a-124j and are predominantly alpha gliadins that partition into the KCl fraction. Spots labeled in blue in panel (d) were also found in the KCl fraction of SA-35a-124j and contain mostly GAPDH better understanding of the complement of proteins associated with different types of allergies. Transgenic lines missing specific groups of gluten proteins should be of value in confirming whether reactivity with patient antibodies is due to non-gluten proteins or to gluten proteins that are carried over into the KClsoluble fraction. This information is critical for future efforts to develop wheat with reduced immunogenic potential.

Supplementary information
Supplementary information accompanies this paper at https://doi.org/10. 1186/s12953-020-00164-6.  Figure 2. For each spot, the predominant protein and all proteins that contained at least half the number of unique peptides as the predominant protein are highlighted in yellow. Proteins that contained less than half the number of unique peptides found in the predominant protein are highlighted in blue. Gluten proteins that contained less than half the number of unique peptides found in the predominant protein are highlighted in green.
Additional file 2. MS/MS data for individual protein spots shown in Figure 2. Scaffold data for all proteins for which the number of unique peptides was equal to or greater than half the number assigned to the predominant protein are shown.