Volume 10 Supplement 1

## Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2011: Proteome Science

# Modeling regionalized volumetric differences in protein-ligand binding cavities

- Brian Y Chen
^{1}Email author and - Soutir Bandyopadhyay
^{2}

**10(Suppl 1)**:S6

https://doi.org/10.1186/1477-5956-10-S1-S6

© Chen and Bandyopadhyay; licensee BioMed Central Ltd. 2012

**Published: **21 June 2012

## Abstract

Identifying elements of protein structures that create differences in protein-ligand binding specificity is an essential method for explaining the molecular mechanisms underlying preferential binding. In some cases, influential mechanisms can be visually identified by experts in structural biology, but subtler mechanisms, whose significance may only be apparent from the analysis of many structures, are harder to find. To assist this process, we present a geometric algorithm and two statistical models for identifying significant structural differences in protein-ligand binding cavities. We demonstrate these methods in an analysis of sequentially nonredundant structural representatives of the canonical serine proteases and the enolase superfamily. Here, we observed that statistically significant structural variations identified experimentally established determinants of specificity. We also observed that an analysis of individual regions inside cavities can reveal areas where small differences in shape can correspond to differences in specificity.

## Background

Engineering or reverse engineering the molecular mechanisms that underlie specificity in protein-ligand binding is a crucial challenge in many fields. Understanding these mechanisms can explain, for example, why resistance occurs against certain drugs and not others [1], how we can mutate proteins to alter binding preferences [2], or how preferential binding in a few crucial molecules can control the organization of molecular and cellular environments [3]. The heart of this challenge lies in the fact that the mechanisms driving specificity are a product of multiple interacting components, such as amino acids [1] or cavity regions [4]. Fortunately, when the components involved in the mechanism are unknown, molecular structures can suggest testable possibilities, based on spatial proximity and biophysical principles.

One such principle relates to the shape of ligand binding cavities from families of closely related proteins. In such families, regions where cavities vary may cause differing substrates to bind. Similar regions might bind a molecular fragment that is common to substrates acted on by the entire family. This principle has been observed frequently, such as in the serine proteases, where binding cavities vary in size to better accommodate differently sized substrates [5], and in the enolase superfamily, where varying arrangements of amino acids around a common scaffold enable related but distinct reactions to be catalyzed [6–8]. Structural variations of this kind can sometimes be identified by visual inspection, but when many exist, or when they are very subtle, it can be unclear whether the variations found are significant enough to test experimentally as potential specificity determinants. Visual inspection is even harder when many structures must be considered, or when the flexibility of proteins must be taken into account.

*fragments*, that lie within one cavity and not within another (e.g. Figure 1h,i). A fragment is thus one of possibly several shape differences between two cavities, and it may create a difference in binding specificity. We hypothesize that most fragments, which are very small, do not influence specificity, and that fragments that are unusually large may be more influential.

Seeking to automatically isolate potential influences on specificity along these principles, we introduce two statistical models for evaluating fragments. The first model, referred to below as the "standard model", represents the volume of fragments that occur between binding cavities of proteins with identical specificity. This approach can identify fragments that are too large to be consistent with cavities having identical specificity, and, in our experimental results, we observed that it can thus isolate regions in cavities that influence specificity.

The second model, the "regionalized model", represents fragments between training set cavities that lie within a user-defined region. The regionalized model thus redefines statistical significance based on local differences in the training set: Small but statistically significant fragments can be isolated in regions where training set cavities hardly differ, while equally sized fragments might be statistically insignificant in regions where training set cavities differ wildly. In ligand design applications, where ligand skeletons can be altered at limited sites, the regionalized model might reveal local cavity differences that point to the design of a more selective inhibitor. Below, we demonstrate the capabilities of these models on binding cavities in the serine proteases and the enolase superfamily. Together, these models represent a comprehensive statistical framework for analyzing fragments between similar cavities.

### Related work

The solid representations of protein structures and binding cavities used in this work differ considerably from typical comparison algorithms, which typically employ point-based and surface-based representations. Point-based representations encode atoms in protein structures using points in three dimensions [9–13], matrices of distances between points [14], and nodes in geometric graphs [15, 16]. These representations are traditionally applied to rigidly superpose and align whole protein structures, but, more recently, flexible methods [17] have also emerged. A second type of point-based representation is specialized for the comparison of functional sites, using motifs in three dimensions that encode atoms in catalytic sites [18–20], evolutionarily significant amino acids [21], "pseudo-centers" representing protein-ligand interactions [22], and pseudoatoms representing amino acid sidechains [23]. Point-based methods exhibit extreme efficiency, enabling them to rapidly search for evolutionarily remote homologs [18, 19, 24] in large databases of protein structure [25], but they are not intended for isolating variations in empty cavity regions, like the methods presented here.

Surface-based representations employ closed surfaces or surface patches to represent or approximate solvent-accessible shape [26, 27]. These representations are built from triangular meshes [28, 29], alpha shapes [30–32], three dimensional grids [33], and spherical harmonics [34–36]. In some cases surface representations have been applied for the comparison of protein structures [28, 29] and electrostatic potentials [37], as well as in hybrid representations that combine point-based and surface-based information [38], but they have had widest application in the identification of cavities and hot spots [39] in protein surfaces [30, 40–42]. While surface-based methods identify and compare surface cavities, the work described here offers the new capability of isolating individual variations within cavities.

Volume-based concepts can be found in algorithms that compute molecular surfaces [43–47], in many techniques for identifying protein-ligand binding pockets (e.g. [40, 48–50]), and for representing electrostatic isopotentials [12]. Throughout, volume-based techniques have more visibly been applied for the visualization of protein structures, but not for their comparison. For example, slab-based visualizations, which render a protein in cross section, are a fixture of protein structure visualization tools like Pymol [51] and Rasmol [52]. Slab-based visualization uses rendering parameters such as the view frustrum and Goldfeather-like algorithms [53] to draw the slab, rather than explicitly computing the geometry of the region defined by the cross section. In contrast to existing work, techniques using Boolean Set operations to identify influences on specificity [54–56], are distinct in both methodology and application.

Statistical modeling plays a critical role in the unsupervised comparison of protein structures, especially in the identification of geometrically similar catalytic sites at remote evolutionary distances. In that application, statistical modeling enables the computation of data-specific thresholds to identify catalytic sites that are improbably similar, and thus potential markers of functional similarity. Several independent results, using Gaussian mixture models [19], extreme value distributions [57], nonparametric models [18], and empirical models [31, 58], have observed that statistically significant geometric similarity is an accurate marker of similar functional sites. In contrast to existing work, we introduce a new application for statistical modeling by first paraphrasing our standard model for cavity variation, described earlier [55], and then extending that model to represent local variations within a user defined region.

## Methods

In earlier work, we demonstrated that individual fragments could be automatically separated and that a statistical model of fragment volume could be used to identify fragments of unusually large (e.g. statistically significant) size [54]. We paraphrase this work here, for completeness. We have since extended this work by showing that we can restrict the training of our model on structural variations within a user defined cube. This regionalized approach enables our statistical model to vary it's significance thresholds based on the local variations of training set cavities in the region defined by the user.

### Identifying fragments

*A*(Figure 2a) and

*B*(Figure 2c), representing cavities from aligned protein structures. Using Boolean Set operations, we compute

*AB*, the region inside

*A*and not inside

*B*, as well as

*BA*, the region inside

*B*and not

*A*. Like

*A*and

*B*,

*AB*and

*BA*are geometric solids represented by closed triangular meshes. Fragments from

*AB*and

*BA*are, together, referred to as the set of fragments relating to cavities

*A*and

*B*(Figure 2b).

First, we translate the triangular mesh of *AB* and *BA* into a graph *G*, mapping corners to graph nodes, and triangle edges to graph edges. Since *G* is likely to have several connected components, we separate each connected component into an individual graph *G*
_{
i
} (Figure 2d). This can be accomplished through depth first search in linear time [59]. Each connected component does not necessarily represent an individual fragment, because fragments occasionally contain interior voids, as illustrated in Figure 2b, that are composed of multiple disconnected components.

Next, we determine which connected components reside within another connected component. This is accomplished with ray casting (Figure 2e). For each component *G*
_{
i
}, a ray, beginning at one point on *G*
_{
i
} is pointed in a random direction, and the number of intersections with other components is counted. If the ray intersects another component *G*
_{
j
} an even number of times, then we say that *G*
_{
i
} is not inside *G*
_{
j
}. Alternately, if the ray intersects *G*
_{
j
} an odd number of times, then we say that *G*
_{
i
} is inside *G*
_{
j
}. For all pairs *G*
_{
i
} and *G*
_{
j
}, we determine which contains the other.

Next, we represent the pattern of containment as a directed acyclic graph *H* (Figure 2f), where each node represents a graph *G*
_{
i
}, and an edge from *G*
_{
i
} to *G*
_{
j
} indicates that *G*
_{
j
} is within *G*
_{
i
} according to the test above. This graph is redundant, because all nested fragments pass the test, but only some nested fragments are part of the same fragment. Fortunately, using the topology of *H*, we can determine which *G*
_{
i
} are part of the same fragment: First, we identify subgraphs ${G}_{{i}_{k}}$ that are not contained inside any other graph, because their in-degree is zero. From each ${G}_{{i}_{k}}$, we perform a depth first search and assign an integer depth *d* to each *G*
_{
i
} considered. Since *H* is an acyclic graph, some *G*
_{
i
} may be visited more than once. In these cases, if the number of edges traversed from the originating ${G}_{{i}_{j}}$ to the current *G*
_{
i
} is greater than the depth assigned already, *d* is reassigned the larger value. This reassignment process determines the number of times each graph *G*
_{
i
} is nested within the entire group of connected components, since the largest possible depth reflects the actual number of times one graph is nested inside the others.

Finally, we separate *H* into subgraphs. Each *G*
_{
i
} with an even depth *d* is an exterior surface for one fragment. Based on the topology of *H*, each *G*
_{
i
} with an odd depth *d* resides inside an exterior surface with even depth equal to *d* − 1. Thus, we can associate the graphs *G*
_{
i
} into groups that are all part of the same fragment, and output the fragment. This correctly separates fragments of arbitrary nesting.

### The standard model of fragment volume

Our statistical model is based on a hypothesis testing framework that detects fragments with volume large enough to be statistically significant, i.e. unlikely to occur by random chance. The underlying assumption of our model is that fragments derived from cavities with no difference in specificity will have *small* volumes related to incidental and functionally irrelevant structural variation. Alternatively, if there exists a structural variation in one cavity large enough to create a steric influence on specificity, then the fragment generated by the variation between the cavities will have *unusually large* volume. Thus, for a query fragment *F* , based on cavities *A* and *B*, our null hypothesis asserts that the volume of *F*, *v*(*F* ), is *small*. The alternative hypothesis asserts that *v*(*F*) is *unusually large*. Since they are logical complements, exactly one of these hypotheses can hold for any fragment *F*.

We test the null hypothesis by first assuming that it holds for *F*, and then estimating the probability *p* of randomly observing another fragment *F*', with volume *v*(*F*') ≥ *v*(*F*). If the probability of randomly observing another fragment with larger volume is improbably low, typically below 0.05, then it is hard to continue assuming that *F* is small. In this circumstance, the null hypothesis is rejected as improbable, leaving us to favor the alternative hypothesis, that *F* is large. The biological interpretation of this decision follows from our underlying assumption: *F* is unusually large, and may thus be a structural variation in either *A* or *B* that creates a steric influence on specificity. This statement is a prediction based on quantified evidence, not a statement of fact.

In order to perform this prediction, we must estimate the probability *p*, which requires us to first train the statistical model. Training begins with aligned cavities from the training sets described in Section. First, we separate the fragments generated between all pairs of cavities using the method described in Section. Using the Surveyor's Formula [60], which provides a rapid and very accurate estimation of volume in a closed surface, we compute the volume of each fragment. These data are represented in a frequency distribution *D* (See Figure 5A). The shape of *D* closely fits a log-normal distribution, as seen in Section.

*D*fits

*log-normal*(

*µ*,

*σ*), we can use the log-normal distribution to smoothly estimate the probability

*p*of observing any a fragment

*F*', with volume greater than or equal to the volume of our query fragment

*v*(

*F*). This estimation occurs when we realize that the mean

*µ*and the variance

*σ*of the log-normal distribution are unknown: we estimate

*µ*and

*σ*with the mean $\stackrel{\u0304}{x}$ and variance

*s*from the distribution of log-transformed values of

*D*. We can thus estimate

*p*using the Equation in Figure 3.

*p*is the proportion of the volume under the log-normal curve to the right of

*v*(

*F*), relative to the total volume under the curve (

*x*≥ 0).

Fitting the log-normal function to *D* enables this probability to be estimated without the discretizing effect of the training data. Also, assuming that the log-normal distribution is a sufficiently accurate estimation of the underlying probability density function, we can extrapolate the probability beyond the largest volume observed in our training data. Such extrapolation would not be possible using nonparametric models, which have finite support. The accuracy of this extrapolation is illustrated in our results.

Having trained our statistical model on fragments derived from cavities with identical binding specificities, we hypothesize that our statistical model will behave as follows: Fragments generated between a cavity binding preferences similar to those of the training set and a cavity with different binding preferences are be expected to have a statistically significant fragment, if there exists a steric influence on specificity. Likewise, for two cavities having the same binding preferences as the training set, fragments generated between them are not expected to be statistically significant. We test this hypothesis in our experimental results.

### The regionalized model of fragment volume

Our regionalized model has the same theoretical foundation as the standard model, with some critical differences. Like the standard model, it is also based on a hypothesis testing framework for detecting improbably large fragments within a user-defined cube *g*. The null hypothesis asserts that a given fragment *F* within *g* has a *small* volume, and the alternative hypothesis asserts that *F* has an *unusually large* volume.

The fragments used to train the regionalized model are generated in the same way as in the standard model, except that the Boolean intersection of every fragment and *g* is computed after all fragments are generated. This extra step results in the elimination of many fragments that do not intersect *g*, and a reduction in the volume for fragments that are partially contained in *g*.

Like the standard model, the distribution *D* of volumes from fragments regionalized to *g* in this way are fit to a log-normal distribution. Given a fragment *F*, Equation 3 allows us to estimate the probability of observing a fragment with volume equal to or greater than those that typically occur inside *g* between training set cavities. This approach functions like that of the standard model, with one special case: It may be that *g* intersects no training set cavities. In such cases, when asked to estimate the *p*-value of a fragment in *g*, we assert categorically that they are significant, because the fragment relates to a difference in shape that is not reflected by any training set cavity in the same region.

### Data set construction

#### Protein families

The serine protease and the enolase superfamilies were selected for demonstrating our statistical models because several sequentially nonredundant structures exist for both superfamilies. Each superfamily contained at least three subfamilies with distinct binding preferences and at least two nonredundant structural representatives in each subfamily.

Serine proteases catalyze the hydrolysis of specific peptide bonds by recognizing neighboring amino acids with specificity subsites numbered *S* 4, *S* 3, . . . *S* 1, *S* 1^{'}, *S* 2^{'}, . . . , *S* 4^{'}. Each subsite preferentially binds one amino acid before or after the hydrolyzed bond between *S* 1 and *S* 1^{'}. Our demonstration, on three subfamilies, focuses on the *S1* subsite, which binds aromatics in chymotrypsins [61], positively charged amino acids in trypsins [62], and small hydrophobics in elastases [63].

Members of the enolase superfamily catalyze a variety of reactions that involve the abstraction of a proton from a carbon adjacent to a carboxylic acid [6]. Assisted by an N-terminal "capping domain" [64], amino acids at the C-terminal ends of beta sheets in a conserved TIM-barrel act as acid/base catalysts to facilitate several different reactions [6]. Our demonstration, on three subfamilies, is focused on the primary catalytic site, which facilitates the dehydration of 2-phospho-D-glycerate to phosphoenolpyruvate in enolase, [65], the conversion of (R)-mandelate to and from (S)-mandelate [66] in mandelate racemase, and reciprocal cycloisomerization of cis,cis-muconate and muconolactone in muconate-lactonizing enzyme [6].

#### Selection

#### Alignment

Ska [13], an algorithm for whole-protein structure alignment, was used to align all serine protease structures to bovine gamma-chymotrypsin (pdb code: 8gch), and all enolase superfamily structures to mandelate racemase from pseudomonas putida (pdb code: 1mdr). Since proteins in these datasets have identical folds, alignments to a different structure has little effect: This was observed earlier [54], where cavity comparisons, recomputed with the same method, generated identical results. Beginning with this alignment, solid geometric representations of binding cavities were generated with a method described earlier [54], based on cavities defined in SCREEN [40].

#### Cavity preparation

Solid geometric representations of binding cavities were generated with a method described earlier [54], and paraphrased here for convenience. First, for each of the aligned structures, using GRASP2 [12], which applies the classical rolling-probe technique [27], we compute a molecular surface, with a water-sized probe of radius 1.4 Å, and an "envelope surface" with a probe of radius 5.0 Å. The radius of the larger probe is based on an external cavity boundary used in SCREEN [40]. Second, spheres with a radius of 5 Å, are centered at atoms bound in the binding sites of 8gch and 1mdr. In the case of 8gch, these are a tryptophan amino acid and five waters in the S1 subsite [67], and in the case of 1mdr, these are are the heavy atoms of a bound atrolactic acid molecule [68]. Since all structures are aligned to either 8gch or 1mdr, the 5 Åspheres completely fill the now-superposed binding sites of all structures in both sets. Third, we compute the Boolean union of the spheres in 8gch, and, separately, the Boolean union for spheres in 1mdr. The remainder of the procedure is performed identically for each member of the serine protease set and the enolase set, using the corresponding sphere union: We compute the boolean difference between the sphere union and the molecular surface, and then the intersection between the resulting difference region and the molecular envelope. The result is a geometric solid representing the binding cavity.

## Results

### Validating the standard model

We constructed a statistical model based on all trypsin cavities and a second based on all enolase cavities. The distribution of fragment volumes between trypsin and enolase cavities is illustrated in Figure 5A. Fragments with volumes near zero dominated, though both distributions exhibited a positive tail. Seeking the best fitting parametric model, we tested gamma, Weibull, Pareto, generalized extreme value, and log-normal distributions.

### Calculating fragment significance

*p*-value. This process was repeated once each trypsin and enolase cavity, producing 40 trypsin fragments and 36 enolase fragments (Figure 6, dark blue).

We also calculated the statistical significance of fragments that exist between cavities with the same binding specificities. In a leave-two-out experiment, we first trained the statistical model with all but two trypsins or elastases. With the remaining two trypsins or elastases we computed the *p*-value of all fragments. This process was repeated for every combination of two members in the trypsin and enolase sets, producing 1893 trypsin fragments and 340 enolase fragments (Figure 6, light red).

The largest fragments from cavities with different binding preferences were always statistically significant, following the standard 0.05 threshold of statistical significance. By the same standard, fragments from cavities with identical binding preferences were rarely significant, exhibiting widely distributed *p*-values.

### Verifying fragment function

^{3}, and is the largest of several differences between these cavities. The position of the fragment highlights a region in the trypsin cavity that extends deeper than the elastase cavity. This region is essential for accommodating the longer, positively charged substrates preferred by trypsins [62]. A modeled Gly-Ala-Arg peptide illustrates this point in Figure 7. Much like this example, similar significant fragments could be found between all trypsin and elastase cavities, as well as between trypsin and chymotrypsin cavities. A second class of related effects were observed in enolase cavities, where sidechains protruding from different parts of the conserved beta-barrel scaffold created differences in cavity shape that accommodate different catalytic reactions.

### Validating the regionalized statistical model

To establish the best fitting model, we tested gamma, Weibull, Pareto, generalized extreme value, and log-normal distributions as parametric models of the distribution of fragment volumes inside a user defined region *g*. To test a range of regions, we generated lattices of 252 and 125 cubes, with side-lengths of 5 Å, that generously surrounded the aligned trypsin cavities and enolase cavities, respectively. Due mainly to the wide margins used to surround the training cavities, 229 and 103 cubes surrounding the trypsin and enolase training sets, respectively, did not intersect with fragments from the training set.

### Statistically significant regional fragments influence specificity

Beginning with the lattices generated in the previous section, we trained our regional statistical models on the cavities of nine of the ten trypsins (all but human trypsin, pdb: 1h4w) and five of the six enolases (all but enolase from Toxoplasma gondii, pdb: 3otr). This created regional models corresponding to 252 trypsin lattice cubes and 125 enolase latice cubes, though most were trivial, as mentioned earlier. In this case, lattices simulate a wide range of possible user inputs. The cavities of 1h4w and 3otr were used for fragment generation with the non-trypsin serine proteases and the non-enolase Enolase superfamily members, respectively. The intersection of these fragments with each of the lattice cubes was determined, and their statistical significance within each regional model was estimated.

Every statistically significant fragment identified by a regional model was a part of a larger difference in cavity structure that is related to a difference in binding specificity. Categorically significant fragments plainly distinguished the major structural differences between cavities with different binding preferences. Among enolases, the variations between the mandelate racemases and 3otr exhibited several regions of this nature, occupying approximately 27 Å^{3}. These differences in cavity shape were caused by the different placement of amino acids surrounding the binding site. A similar effect could also be seen in the fragments between chymotrypsin and trypsin cavities, where the added depth of chymotrypsin S1 cavities, used to bind large hydrophobic side chains, led to significant variations within cubes in that region [61]. Overall, categorically significant fragments generally revealed the same observations as those made with our standard model.

*p*-values that differed considerably from the standard model and from other regions. For example, one 55 Å

^{3}fragment (Figure 9D) in a cube intersecting the S1 subsite of both 1h4w and atlantic salmon elastase (Figure 9C) was assigned the

*p*-value .02, while a much smaller 17 Å

^{3}fragment (Figure 9F) in another cube (Figure 9E) received a similar

*p*-value. Both fragments are in regions that trypsin requires for recognizing larger amino acids, but the difference in volume indicates that structural variability in trypsins is considerably larger in the first cube relative to the second cube.

## Conclusions

We have presented a computational method for generating fragments and two statistical models for estimating the significance of fragment volume. These methods represent the first algorithms capable of separating individual differences in cavity shape, and also the first to measure their statistical significance, creating a new strategy for identifying influences on protein-ligand binding specificity.

After verifying the choice of distributions for our statistical model, we used our standard model to identify statistically significant fragments between serine protease and enolase cavities. In both cases, the largest fragment between cavities with different binding preferences was always statistically significant, while all fragments between cavities with identical binding preferences were rarely so. By identifying differences in binding cavity shape that are too large to have randomly occurred between cavities with identical binding preferences, this approach predicts cavity regions that drive different binding preferences.

We verified the accuracy of some of these predictions by relating them to experimentally established observations, where possible. In both serine protease and enolase datasets, the most statistically significant fragment was frequently a difference in binding cavity geometry that enabled the accommodation of differently shaped substrates. While other physical phenomena (e.g. electrostatics [62]) are known to influence specificity in both datasets, statistically significant fragments remained strong markers of structural influences on specificity. On other data sets, variations in shape may not be as strongly correlated with specificity. This possibility points to potentials for future work.

Using our regionalized model, we observed that statistically significant fragments in different regions could have very different volumes. These observations indicate that variations between cavities are considerably larger in some regions than in others, and that we can identify such regions. From an applied perspective, this approach could be used to identify regions where small differences in ligand shape could lead to altered or more selective binding, and other regions larger variations in ligand shape do not affect specificity.

## Declarations

### Acknowledgements

The authors sincerely thank Viacheslav Y. Fofanov for critical discussions. This work was supported in part by start up funds from Lehigh University.

This article has been published as part of *Proteome Science* Volume 10 Supplement 1, 2012: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2011: Proteome Science. The full contents of the supplement are available online at http://www.proteomesci.com/supplements/10/S1.

## Authors’ Affiliations

## References

- Liu Y, Bishop A, Witucki L, Kraybill B, Shimizu E, Tsien J, Ubersax J, Blethrow J, Morgan DO, Shokat KM:
**Structural basis for selective inhibition of Src family kinases by PP1.***Chem Biol*1999,**6**(9):671–678. 10.1016/S1074-5521(99)80118-5PubMedView ArticleGoogle Scholar - Hedstrom L:
**Serine Protease Mechanism and Specificity.***Chem Rev*2002,**102**(12):4501–24. 10.1021/cr000033xPubMedView ArticleGoogle Scholar - Patel SD, Ciatto C, Chen CP, Bahna F, Rajebhosale M, Arkus N, Schieren I, Jessell TM, Honig B, Price SR, Shapiro L:
**Type II cadherin ectodomain structures: implications for classical cadherin specificity.***Cell*2006,**124**(6):1255–68. 10.1016/j.cell.2005.12.046PubMedView ArticleGoogle Scholar - Musah RA, Jensen GM, Bunte SW, Rosenfeld RJ, Goodin DB:
**Artificial protein cavities as specific ligand-binding templates: characterization of an engineered heterocyclic cation-binding site that preserves the evolved specificity of the parent protein.***J Mol Biol*2002,**315**(4):845–57. 10.1006/jmbi.2001.5287PubMedView ArticleGoogle Scholar - Shotton D, Watson H:
**The three-dimensional structure of porcine pancreatic elastase.***Philos Trans R Soc Lond B Biol Sci*1970,**257**(813):111–118. 10.1098/rstb.1970.0013PubMedView ArticleGoogle Scholar - Babbitt PC, Hasson MS, Wedekind JE, Palmer DR, Barrett WC, Reed GH, Rayment I, Ringe D, Kenyon GL, Gerlt JA:
**The enolase superfamily: a general strategy for enzyme-catalyzed abstraction of the alpha-protons of carboxylic acids.***Biochemistry*1996,**35**(51):16489–501. 10.1021/bi9616413PubMedView ArticleGoogle Scholar - Hasson MS, Schlichting I, Moulai J, Taylor K, Barrett WC, Kenyon GL, Babbitt PC, Gerlt JA, Petsko GA, Ringe D:
**Evolution of an enzyme active site: the structure of a new crystal form of muconate lactonizing enzyme compared with mandelate racemase and enolase.***Proc Natl Acad Sci USA*1998,**95**(18):10396–401. 10.1073/pnas.95.18.10396PubMed CentralPubMedView ArticleGoogle Scholar - Gerlt JA, Babbitt PC:
**Divergent evolution of enzymatic function: mechanistically diverse superfamilies and functionally distinct suprafamilies.***Annu Rev Biochem*2001,**70:**209–46. 10.1146/annurev.biochem.70.1.209PubMedView ArticleGoogle Scholar - Nussinov R, Wolfson HJ:
**Efficient detection of three-dimensional structural motifs in biological macro-molecules by computer vision techniques.***Proc Natl Acad Sci USA*1991,**88**(23):10495–9. 10.1073/pnas.88.23.10495PubMed CentralPubMedView ArticleGoogle Scholar - Orengo CA, Taylor WR:
**SSAP: Sequential Structure Alignment Program for Protein Structure Comparison.***Methods Enzymol*1996,**266:**617–635.PubMedView ArticleGoogle Scholar - Shindyalov IN, Bourne PE:
**Protein structure alignment by incremental combinatorial extension (CE) of the optimal path.***Protein Eng*1998,**11**(9):739–47. 10.1093/protein/11.9.739PubMedView ArticleGoogle Scholar - Petrey D, Honig B:
**GRASP2: visualization, surface properties, and electrostatics of macromolecular structures and sequences.***Methods Enzymol*2003,**374**(1991):492–509.PubMedView ArticleGoogle Scholar - Yang AS, Honig B:
**An integrated approach to the analysis and modeling of protein sequences and structures. I. Protein structural alignment and a quantitative measure for protein structural distance.***J Mol Biol*2000,**301**(3):665–78. 10.1006/jmbi.2000.3973PubMedView ArticleGoogle Scholar - Holm L, Sander C:
**Mapping the protein universe.***Science*1996,**273**(5275):595–603. 10.1126/science.273.5275.595PubMedView ArticleGoogle Scholar - Gibrat JF, Madej T, Bryant SH:
**Surprising similarities in structure comparison.***Curr Opin Struct Biol*1996,**6**(3):377–85. 10.1016/S0959-440X(96)80058-3PubMedView ArticleGoogle Scholar - Xie L, Bourne PE:
**Detecting evolutionary relationships across existing fold space, using sequence order-independent profile-profile alignments.***Proc Natl Acad Sci USA*2008,**105**(14):5441–6. 10.1073/pnas.0704422105PubMed CentralPubMedView ArticleGoogle Scholar - Shatsky M, Nussinov R, Wolfson HJ:
**FlexProt: alignment of flexible protein structures without a predefinition of hinge regions.***J Comput Biol*2004,**11:**83–106. 10.1089/106652704773416902PubMedView ArticleGoogle Scholar - Chen BY, Fofanov VY, Bryant DH, Dodson BD, Kristensen DM, Lisewski AM, Kimmel M, Lichtarge O, Kavraki LE:
**The MASH pipeline for protein function prediction and an algorithm for the geometric refinement of 3D motifs.***Journal of Computational Biology*2007,**14**(6):791–816. 10.1089/cmb.2007.R017PubMedView ArticleGoogle Scholar - Barker JA, Thornton JM:
**An algorithm for constraint-based structural template matching : application to 3D templates with statistical analysis.***Bioinformatics*2003,**19**(13):1644–1649. 10.1093/bioinformatics/btg226PubMedView ArticleGoogle Scholar - Russell RB:
**Detection of protein three-dimensional side-chain patterns: new examples of convergent evolution.***J Mol Biol*1998,**279**(5):1211–27. 10.1006/jmbi.1998.1844PubMedView ArticleGoogle Scholar - Chen BY, Fofanov VY, Kristensen DM, Kimmel M, Lichtarge O, Kavraki LE:
**Algorithms for structural comparison and statistical analysis of 3D protein motifs.***Pac Symp Biocomput*2005, 334–45.Google Scholar - Shatsky M, Shulman-peleg A, Nussinov R, J H:
**Recognition of Binding Patterns Common to a Set of Protein Structures.***Lect Notes Comput Sc*2005,**3500:**440–455. 10.1007/11415770_33View ArticleGoogle Scholar - Schmitt S, Kuhn D, Klebe G:
**A New Method to Detect Related Function Among Proteins Independent of Sequence and Fold Homology.***J Mol Biol*2002,**323**(2):387–406. 10.1016/S0022-2836(02)00811-2PubMedView ArticleGoogle Scholar - Moll M, Bryant DH, Kavraki LE:
**The LabelHash algorithm for substructure matching.***BMC Bioinformatics*2010,**11:**555. 10.1186/1471-2105-11-555PubMed CentralPubMedView ArticleGoogle Scholar - Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE:
**The Protein Data Bank.***Nucleic Acids Res*2000,**28:**235–42. 10.1093/nar/28.1.235PubMed CentralPubMedView ArticleGoogle Scholar - Lee B, Richards FM:
**The interpretation of protein structures: estimation of static accessibility.***J Mol Biol*1971,**55**(3):379–400. 10.1016/0022-2836(71)90324-XPubMedView ArticleGoogle Scholar - Connolly M:
**Solvent-accessible surfaces of proteins and nucleic acids.***Science*1983,**221**(4612):709–713. 10.1126/science.6879170PubMedView ArticleGoogle Scholar - Rosen M, Lin SL, Wolfson H, Nussinov R:
**Molecular shape comparisons in searches for active sites and functional similarity.***Protein Eng*1998,**11**(4):263–77. 10.1093/protein/11.4.263PubMedView ArticleGoogle Scholar - Kinoshita K, Nakamura H:
**Identification of the ligand binding sites on the molecular surface of proteins.***Protein Sci*2005,**14:**711–718. 10.1110/ps.041080105PubMed CentralPubMedView ArticleGoogle Scholar - Binkowski TA:
**CASTp: Computed Atlas of Surface Topography of proteins.***Nucleic Acids Res*2003,**31**(13):3352–3355. 10.1093/nar/gkg512PubMed CentralPubMedView ArticleGoogle Scholar - Binkowski TA, Adamian L, Liang J:
**Inferring Functional Relationships of Proteins from Local Sequence and Spatial Surface Patterns.***J Mol Biol*2003,**332**(2):505–526. 10.1016/S0022-2836(03)00882-9PubMedView ArticleGoogle Scholar - Binkowski TA, Joachimiak A:
**Protein functional surfaces: global shape matching and local spatial alignments of ligand binding sites.***BMC Struct Biol*2008,**8:**45. 10.1186/1472-6807-8-45PubMed CentralPubMedView ArticleGoogle Scholar - Laskowski RA:
**SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions.***J Mol Graph*1995,**13**(5):323–30, 307–8. 10.1016/0263-7855(95)00073-9PubMedView ArticleGoogle Scholar - Ritchie DW, Kemp GJL:
**Fast computation, rotation, and comparison of low resolution spherical harmonic molecular surfaces.***J Comput Chem*1999,**20**(4):383.. 10.1002/(SICI)1096-987X(199903)20:4<383::AID-JCC1>3.0.CO;2-MView ArticleGoogle Scholar - Kazhdan M, Funkhouser T, Rusinkiewicz S:
**Rotation Invariant Spherical Harmonic Representation of 3D Shape Descriptors.***European Symposium on Geometry Processing 2003*2003.Google Scholar - Kahraman A, Morris RJ, Laskowski Ra, Thornton JM:
**Shape variation in protein binding pockets and their ligands.***J Mol Biol*2007,**368:**283–301. 10.1016/j.jmb.2007.01.086PubMedView ArticleGoogle Scholar - Zhang X, Bajaj CL, Kwon B, Dolinsky TJ, Nielsen JE, Baker NA:
**Application of new multi-resolution methods for the comparison of biomolecular electrostatic properties in the absense of global structural similarity.***Multiscale Model Simul*2006,**5**(4):1196–1213. 10.1137/050647670PubMed CentralPubMedView ArticleGoogle Scholar - Chen BY, Bryant DH, Fofanov VY, Kristensen DM, Cruess AE, Kimmel M, Lichtarge O, Kavraki LE:
**Cavity-aware motifs reduce false positives in protein function prediction.***Proceedings of the 2006 IEEE Computational Systems Bioinformatics Conference (CSB 2006)*2006, 311–23.View ArticleGoogle Scholar - Bogan AA, Thorn KS:
**Anatomy of hot spots in protein interfaces.***J Mol Biol*1998,**280:**1–9. 10.1006/jmbi.1998.1843PubMedView ArticleGoogle Scholar - Nayal M, Honig B:
**On the Nature of Cavities on Protein Surfaces : Application to the Identification of Drug-Binding Sites.***Proteins*2006,**63:**892–906. 10.1002/prot.20897PubMedView ArticleGoogle Scholar - Glaser F, Morris RJ, Najmanovich RJ, Laskowski Ra, Thornton JM:
**A method for localizing ligand binding pockets in protein structures.***Proteins*2006,**62**(2):479–88.PubMedView ArticleGoogle Scholar - Coleman RG, Sharp KA:
**Travel depth, a new shape descriptor for macromolecules: application to ligand binding.***J Mol Biol*2006,**362**(3):441–58. 10.1016/j.jmb.2006.07.022PubMedView ArticleGoogle Scholar - Lee B, Richards F:
**The interpretation of protein structures: estimation of static accessibility.***Journal of molecular biology*1971,**55**(3):379–400. 10.1016/0022-2836(71)90324-XPubMedView ArticleGoogle Scholar - Connolly M:
**Solvent-accessible surfaces of proteins and nucleic acids.***Science*1983,**221:**709–713. 10.1126/science.6879170PubMedView ArticleGoogle Scholar - Liang J, Edelsbrunner H, Woodward C, Liang JIE, Woodward C:
**Anatomy of protein pockets and cavities: Measurement of binding site geometry and implications for ligand design Anatomy of protein pockets and cavities : Measurement of binding site geometry and implications for ligand design.***Protein Sci*1998,**7:**1884–1897. 10.1002/pro.5560070905PubMed CentralPubMedView ArticleGoogle Scholar - Bajaj CL, Xu G, Zhang Q:
**A Fast Variational Method for the Construction of Resolution Adaptive C**^{ 2 }**-Smooth Molecular Surfaces.***Comput Methods Appl Mech Eng*2009,**198**(21):1684–90. 10.1016/j.cma.2008.12.042PubMed CentralPubMedView ArticleGoogle Scholar - Zhao J, Dundas J, Kachalo S, Ouyang Z, Liang J:
**Accuracy of functional surfaces on comparatively modeled protein structures.***J Struct Funct Genomics*2011,**12**(2):97–107. 10.1007/s10969-011-9109-zPubMed CentralPubMedView ArticleGoogle Scholar - Laskowski R:
**SURFNET: a program for visualizing molecular surfaces, cavities, and intermolecular interactions.***Journal of molecular graphics*1995,**13**(5):323–30. 10.1016/0263-7855(95)00073-9PubMedView ArticleGoogle Scholar - Kleywegt G, Jones T:
**Detection, delineation, measurement and display of cavities in macromolecular structures.***Acta Crystallogr D Biol Crystallogr*1994,**50**(Pt 2):178–185.PubMedView ArticleGoogle Scholar - Chen BY, Bryant DH, Fofanov VY, Kristensen DM, Cruess AE, Kimmel M, Lichtarge O, Kavraki LE:
**Cavity scaling: automated refinement of cavity-aware motifs in protein function prediction.***J Bioinform Comput Biol*2007,**5**(2a):353–82. 10.1142/S021972000700276XPubMedView ArticleGoogle Scholar - DeLano WL:
*The PyMOL Molecular Graphics System*. 2002.Google Scholar - Sayle RA, Milner-White EJ:
**RASMOL : biomolecular graphics for all.***Trends in Biochemical Sciences*1995,**20**(9):374–376. 10.1016/S0968-0004(00)89080-5PubMedView ArticleGoogle Scholar - Goldfeather J, Hultquist JPM:
**Fast constructive solid geometry display in the Pixel-Powers Graphics System.***Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '86)*1986,**20:**107–116.View ArticleGoogle Scholar - Chen BY, Honig B:
**VASP: A Volumetric Analysis of Surface Properties Yields Insights into Protein-Ligand Binding Specificity.***PLoS Comput Biol*2010,**6**(8):e1000881. 10.1371/journal.pcbi.1000881PubMed CentralPubMedView ArticleGoogle Scholar - Chen B, Bandyopadhyay S:
**VASP-S: A Volumetric Analysis and Statistical Model for Predicting Steric Influences on Protein-Ligand Binding Specificity.***Proceedings of 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)*2011, 22–9.View ArticleGoogle Scholar - Chen B, Bandyopadhyay S:
**A Statistical Model of Overlapping Volume in Ligand Binding Cavities.***Proceedings of the Computational Structural Bioinformatics Workshop (CSBW 2011)*2011, 424–31.Google Scholar - Stark A, Sunyaev S, Russell RB:
**A Model for Statistical Significance of Local Similarities in Structure.***J Mol Biol*2003,**326:**1307–1316. 10.1016/S0022-2836(03)00045-7PubMedView ArticleGoogle Scholar - Polacco BJ, Babbitt PC:
**Automated Discovery of 3D Motifs for Protein Function Annotation.***Bioinformatics*2006,**22**(6):723–730. 10.1093/bioinformatics/btk038PubMedView ArticleGoogle Scholar - Cormen TH, Leiserson CE, Rivest RL:
**Introduction to Algorithms, Second Edition.***Computer*2001,**7**(9):984.Google Scholar - Schaer J, Stone M: Face traverses and a volume algorithm for polyhedra. Lect Notes Comput Sc 1991, (555/1991):290–297.Google Scholar
- Morihara K, Tsuzuki H:
**Comparison of the specificities of various serine proteinases from microorganisms.***Arch Biochem Biophys*1969,**129**(2):620–634. 10.1016/0003-9861(69)90223-9PubMedView ArticleGoogle Scholar - Gráf L, Jancsó a, Szilágyi L, Hegyi G, Pintér K, Náray-Szabó G, Hepp J, Medzihradszky K, Rutter WJ:
**Electrostatic complementarity within the substrate-binding pocket of trypsin.***Proc Natl Acad Sci USA*1988,**85**(14):4961–5. 10.1073/pnas.85.14.4961PubMed CentralPubMedView ArticleGoogle Scholar - Berglund GI, Smalas AO, Outzen H, Willassen NP:
**Purification and characterization of pancreatic elastase from North Atlantic salmon (Salmo salar).***Mol Mar Biol Biotechnol*1998,**7**(2):105–14.PubMedGoogle Scholar - Rakus JF, Fedorov AA, Fedorov EV, Glasner ME, Hubbard BK, Delli JD, Babbitt PC, Almo SC, Gerlt JA:
**Evolution of enzymatic activities in the enolase superfamily: L-rhamnonate dehydratase.***Biochemistry*2008,**47**(38):9944–54. 10.1021/bi800914rPubMed CentralPubMedView ArticleGoogle Scholar - Kühnel K, Luisi BF:
**Crystal structure of the Escherichia coli RNA degradosome component enolase.***J Mol Biol*2001,**313**(3):583–92. 10.1006/jmbi.2001.5065PubMedView ArticleGoogle Scholar - Schafer SL, Barrett WC, Kallarakal AT, Mitra B, Kozarich JW, Gerlt JA, Clifton JG, Petsko GA, Kenyon GL:
**Mechanism of the reaction catalyzed by mandelate racemase: structure and mechanistic properties of the D270N mutant.***Biochemistry*1996,**35**(18):5662–9. 10.1021/bi960174mPubMedView ArticleGoogle Scholar - Harel M, Su CT, Frolow F, Silman I, Sussman J:
**y-Chymotrypsin Is a Complex of a-Chymotrypsin with Its Own Autolysis.***Biochemistry*1991,**30:**5217–5225. 10.1021/bi00235a015PubMedView ArticleGoogle Scholar - Landro JA, Gerlt JA, Kozarich JW, Koo CW, Shah VJ, Kenyon GL, Neidhart DJ, Fujita S, Petsko GA:
**The role of lysine 166 in the mechanism of mandelate racemase from Pseudomonas putida: mechanistic and crystallographic evidence for stereospecific alkylation by (R)-alpha-phenylglycidate.***Biochemistry*1994,**33**(3):635–643. 10.1021/bi00169a003PubMedView ArticleGoogle Scholar

## Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.