The role of electrostatic energy in prediction of obligate protein-protein interactions
© Maleki et al; licensee BioMed Central Ltd. 2013
Published: 7 November 2013
Prediction and analysis of protein-protein interactions (PPI) and specifically types of PPIs is an important problem in life science research because of the fundamental roles of PPIs in many biological processes in living cells. In addition, electrostatic interactions are important in understanding inter-molecular interactions, since they are long-range, and because of their influence in charged molecules. This is the main motivation for using electrostatic energy for prediction of PPI types.
We propose a prediction model to analyze protein interaction types, namely obligate and non-obligate, using electrostatic energy values as properties. The prediction approach uses electrostatic energy values for pairs of atoms and amino acids present in interfaces where the interaction occurs. The main features of the complexes are found and then the prediction is performed via several state-of-the-art classification techniques, including linear dimensionality reduction (LDR), support vector machine (SVM), naive Bayes (NB) and k-nearest neighbor (k-NN). For an in-depth analysis of classification results, some other experiments were performed by varying the distance cutoffs between atom pairs of interacting chains, ranging from 5Å to 13Å. Moreover, several feature selection algorithms including gain ratio (GR), information gain (IG), chi-square (Chi2) and minimum redundancy maximum relevance (mRMR) are applied on the available datasets to obtain more discriminative pairs of atom types and amino acid types as features for prediction.
Our results on two well-known datasets of obligate and non-obligate complexes confirm that electrostatic energy is an important property for predicting obligate and non-obligate protein interaction types, achieving accuracies of over 98%. Furthermore, a comparison performed by changing the distance cutoff demonstrates that the best values for prediction of PPI types using electrostatic energy range from 9Å to 12Å, which shows that electrostatic interactions are long-range and cover a broader area in the interface. In addition, the results on using feature selection before prediction confirm that (a) a few pairs of atoms and amino acids are appropriate for prediction, and (b) prediction performance can be improved by eliminating irrelevant and noisy features and selecting the most discriminative ones.
Gene expression, cell growth, proliferation, signal transduction, cellular motion and gene regulation are some of the essential biological processes in living cells that are controlled by proteins. Consequently, more attention has been drawn to this field of study, in particular to the identification and analysis of interacting proteins and their relevant properties [2, 3]. Proteins bind to each other, creating protein-protein interactions (PPIs) through a combination of hydrophobic bonding, van der Waals forces and salt bridges. The strength of these interactions may depend on the size of the binding interface, which can be a large surface, a small binding cleft or even a few peptides.
Prediction of PPI types is one of the main challenges when studying protein interactions. There are different types of PPIs and their associated prediction problems, including homo vs. hetero-oligomers based on the similarities between sub-units, dimers vs. trimers based on the number of interacting sub-units, transient vs. permanent based on the duration of the interaction, and obligate vs. non-obligate based on the stability of the complex [6–9]. Unlike obligate and permanent interactions, which are more stable and last for a longer period of time, non-obligate and transient interactions are very difficult to study because of their instability and short life. We focus on distinguishing between obligate and non-obligate complexes.
Using relevant features or observed properties of protein complexes is essential in performing accurate predictions. Consequently, previous studies in PPI have considered a wide range of relevant properties that can be used for PPI prediction, including geometric properties, recognition of sites, conservation of residues present in the surface of PPIs [13, 14], hydrogen bonds and salt bridges on the surface of the proteins, solvent accessibility [6, 15], hydrophobicity [8, 16], sequence-based features, desolvation energy [18–20] and, recently, electrostatic energy. Electrostatic interactions, one of three types of non-covalent interactions, occur between electrically charged atoms and involve both positive and negative charges. Non-covalent interactions are very common between macromolecules such as proteins. Van der Waals interactions, which occur between any pair of charged atoms that are close to each other, and non-polar interactions, which occur between atoms that do not have any charge, are the other two types of non-covalent interactions.
In previous studies, it has been claimed that only a few highly conserved residues are important for protein interactions [23–25]. Moreover, removing irrelevant and redundant features not only can decrease the computational burden, but also may increase the prediction performance . These are the main tasks carried out by specialized machine learning algorithms for feature selection and classification. In this regard, automatic feature selection algorithms have been used in many biological problems such as prediction of tyrosine sulfation and lysine ubiquitination [27, 28], prediction of protein-protein interactions [25, 29], protein-nucleic acid interactions , gene selection [31, 32] and gene expression . In this study, a few feature selection methods, including gain ratio (GR), information gain (IG), chi-square (Chi2) and minimum redundancy maximum relevance (mRMR), are applied to score and rank features based on their relevance, and select the top ranked features for prediction of obligate and non-obligate PPIs.
In one of our recent works , a model to predict obligate and non-obligate protein interaction types has been presented in which electrostatic energy values for both atom and amino acid pairs present in the interface were considered as the input features of the classifiers. Linear dimensionality reduction (LDR) and a support vector machine (SVM) were applied as the classifiers to predict these types. The prediction results of that study for two well-known datasets, referred to as the ZH  and MW  datasets, show an impressive accuracy in prediction. For the ZH dataset, an accuracy of 96.18% was achieved by using SVM and electrostatic energy values of amino acid type features, which is much higher than the accuracy obtained by using six interface properties including interface area, interface area ratio, conservation score and gap volume index of NOXClass  with 88.52% prediction accuracy (as reported by the authors), 46 solvent accessible and interface area properties of  with 81.83% prediction accuracy, 210 features of solvent accessible area of  with 92.20% prediction accuracy, and even higher than 210 desolvation energy values for amino acid type features of  with 83.21% prediction accuracy. Similarly, applying the proposed scheme on the MW dataset demonstrates that using electrostatic energy values of amino acid type features (95.38% prediction accuracy for SVM) is better than using the four interface features as in  (77.96% prediction accuracy), and also better than using 210 desolvation energy properties as in  (78.83% prediction accuracy). Generally, the results reported in our previous study  implied an increase of at least 5% in prediction performance from previous approaches.
This paper extends our previous study by incorporating a wider range of classification techniques that include LDR, SVM, naive Bayes (NB) and k-nearest neighbor (k-NN). We also vary the distance cutoff (from 5Å to 13Å) to analyze long-range interactions, apply feature selection algorithms (GR, IG, Chi2 and mRMR) to identify relevant physicochemical properties of interacting pairs of atoms and amino acids, and present an extended visual analysis. The results confirm that electrostatic energy with distance cutoffs ranging from 9Å to 12Å is the best property to predict obligate and non-obligate PPIs on the basis of the experimental results using different classification methods and different distance cutoffs on two well-known datasets. This is due to the fact that, when electrostatic energy is used with a long distance cutoff, atoms on the surface and some atoms buried under the surface may participate in the prediction, which leads to excellent classification performance. In fact, the latter is a problem that opens an interesting research avenue in the field. Furthermore, using LDR as the classification scheme, we demonstrate that prediction results are improved by applying feature selection and identifying more relevant and discriminative features, while removing redundant and noisy ones for the two datasets.
In this study, we have used the same datasets as those used in [18, 25]. The first dataset, referred to as the ZH dataset, was obtained from the study of Zhu et al. . It originally contained 62 non-obligate and 75 obligate complexes. Since the electrostatic energy values of some complexes (1cc0 A:E, 1qbk B:C, 1b8a A:B, 1cli A:B, 1qav A:B, 1bkd R:S and 1nse A:B) cannot be computed, they were removed from the ZH dataset. The second dataset, referred to as the MW dataset, was obtained from the study of Mintseris et al. , and originally contained 209 non-obligate and 115 obligate complexes. Similarly, 24 complexes of the original dataset (1b7y A:B, 1be3 CDEGK:A, 1jb0 AB:C, 1jb0 AB:D, 1jb0 AB:E, 1jro A:BD, 1jv2 A:B, 1k28 A:D, 1kqf A:B, 1ldj A:B, 1m2v A:B, 1mjg AB:M, 1nbw AC:B, 1prc C:HLM, 1bgx HL:T, 1de4 CF:A, 1ezv E:XY, 1is8 ABEJCIDHGF:KLOMN, 1m2o AC:B, 1o94 AB:CD, 1qfu AB:HL, 2hmi AB:CD, 4cpa I:0 and 2q33 A:B) were left out because the electrostatic energy values for all atoms in their interfaces cannot be computed.
Different properties can be employed to predict protein interactions and, in particular, types of protein complexes. In our recent study, it has been demonstrated that electrostatic energy is a powerful property to predict obligate and non-obligate complexes. Moreover, we have previously shown that desolvation energy is also very effective for prediction of these types of PPIs [18, 20]. In this study, electrostatic energy properties are used for prediction of obligate and non-obligate interactions and desolvation energy properties are used for comparison purposes. Our methods to obtain these prediction properties are summarized below.
where all atom pairs (18 different atoms) are considered in the double summation and g(r_ij) is a smooth function based on the distance between interacting atoms i and j. For simplicity, in our comparisons, the value of g(r_ij) is 1 for pairs of atoms that are closer to each other than the selected distance cutoff, and 0 otherwise. Using Eq. (1), the desolvation energy between any pair of ligand and receptor can be calculated. Thus, by following the approach of , it is possible to compute the desolvation energy by using different criteria. Desolvation energy values are calculated for atom and amino acid types. More details about the computation of desolvation energy values for atom and amino acid types as features can be found in .
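The hard-cutoff form of g(r_ij) described above can be sketched as follows. This is only an illustration under stated assumptions: the per-pair energy table `pair_energy`, the atom-type labels and the coordinates are hypothetical placeholders, not values from the study, and the function names are ours.

```python
import math

def hard_cutoff(distance, cutoff):
    """g(r_ij): 1 if the two atoms are closer than the cutoff, 0 otherwise."""
    return 1.0 if distance < cutoff else 0.0

def pairwise_energy_sum(ligand_atoms, receptor_atoms, pair_energy, cutoff=7.0):
    """Double summation over ligand and receptor atoms, gated by g(r_ij).

    Each atom is (atom_type, (x, y, z)). `pair_energy` maps an unordered
    pair of atom types to an energy value (a hypothetical lookup table).
    """
    total = 0.0
    for type_i, pos_i in ligand_atoms:
        for type_j, pos_j in receptor_atoms:
            r_ij = math.dist(pos_i, pos_j)          # Euclidean distance
            key = tuple(sorted((type_i, type_j)))   # order of the pair is irrelevant
            total += hard_cutoff(r_ij, cutoff) * pair_energy.get(key, 0.0)
    return total

# Hypothetical toy data: one ligand atom 5 Å from the receptor atom, one far away.
ligand = [("N", (0.0, 0.0, 0.0)), ("C", (20.0, 0.0, 0.0))]
receptor = [("O", (3.0, 4.0, 0.0))]
energies = {("N", "O"): -0.5, ("C", "O"): 0.2}
print(pairwise_energy_sum(ligand, receptor, energies, cutoff=7.0))
```

Repeating the call with different `cutoff` values corresponds to the distance-cutoff experiments discussed later in the paper.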
The main property that we use in this study for predicting obligate and non-obligate complexes is electrostatic energy, because of its role in charged molecules . Electrostatic energy involves a long-range interaction and can occur between charged atoms of two interacting proteins or two different molecules. Moreover, these interactions can occur between charged atoms on the protein surface and charges in the environment. In order to compute electrostatic energy values, PDB2PQR and APBS  software packages are used.
For each complex in the datasets, after extracting the structural data from the Protein Data Bank (PDB) , PDB2PQR is employed for preparing the structures for electrostatic calculations. Adding missing heavy atoms, placing missing hydrogen atoms and assigning charges are some of the main tasks performed by PDB2PQR. To customize the parameters of PDB2PQR in our experiments, we consider the following parameters: (a) the AMBER forcefield is employed, (b) "apbs-input" is specified to create output files with ".in" extension, and (c) "--chain" is also specified to include the chain name in the ".pqr" files. The outputs of this package, a ".pqr" file and an ".in" file, are the inputs to APBS.
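As a rough illustration of the three settings listed above, the sketch below only assembles a command line; it does not run anything. The flag spellings (`--ff=`, `--apbs-input`, `--chain`) are an assumption based on older PDB2PQR command-line releases, and newer releases may name these options differently, so the output of `pdb2pqr --help` for the installed version should be checked. The helper name and file names are hypothetical.

```python
def build_pdb2pqr_command(pdb_path, pqr_path, forcefield="amber"):
    """Assemble a PDB2PQR command line for the settings (a)-(c) in the text.

    Flag names are assumed from older PDB2PQR releases; verify them against
    the installed version before use.
    """
    return [
        "pdb2pqr",
        f"--ff={forcefield}",   # (a) AMBER forcefield
        "--apbs-input",         # (b) also write an APBS ".in" template
        "--chain",              # (c) keep the chain name in the ".pqr" file
        pdb_path,
        pqr_path,
    ]

# Hypothetical file names for one complex.
command = build_pdb2pqr_command("1a2k.pdb", "1a2k.pqr")
print(" ".join(command))
```

The list form can be handed to `subprocess.run(command, check=True)` once the flags have been confirmed for the local installation.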
APBS is utilized to compute electrostatic energy values of interactions between solutes in salty and aqueous media. In APBS, the Poisson-Boltzmann equation is solved numerically, and electrostatic calculations can be performed for systems ranging from tens to millions of atoms. Before running APBS, the parameters should be set accordingly as detailed in .
As in , 18 different atom types and 20 different amino acid types were taken into account to calculate the features for prediction. Since the order of the interacting atoms and amino acid pairs is not important, we generated feature vectors for atom type features containing 171 values (18 × 19 / 2 unordered pairs, including same-type pairs). Similarly, for amino acid type features, the length of the feature vector is 210 (20 × 21 / 2). Each feature contains the cumulative sum of electrostatic energy values for all pairs of atoms or amino acids of the same type. More details about the computation of electrostatic energy values for atom and amino acid type features are described in .
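The feature-vector construction just described can be sketched as below: each unordered pair of types gets one slot, and energy values are accumulated per slot. The type labels and energy values in the example are hypothetical, and the function names are ours.

```python
from itertools import combinations_with_replacement

def pair_index(types):
    """Assign one slot to each unordered pair of types (same-type pairs included)."""
    pairs = combinations_with_replacement(sorted(types), 2)
    return {pair: slot for slot, pair in enumerate(pairs)}

def accumulate_features(pair_energies, index):
    """Cumulative sum of energy values for all pairs of the same type pair."""
    vector = [0.0] * len(index)
    for (type_a, type_b), energy in pair_energies:
        key = tuple(sorted((type_a, type_b)))  # (A, B) and (B, A) share a slot
        vector[index[key]] += energy
    return vector

# 18 atom types give 171 features; 20 amino acid types give 210.
print(len(pair_index(range(18))), len(pair_index(range(20))))

# Hypothetical toy example with two residue types.
index = pair_index(["ALA", "GLY"])
observations = [(("GLY", "ALA"), -1.5), (("ALA", "GLY"), 0.5), (("GLY", "GLY"), 2.0)]
print(accumulate_features(observations, index))
```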
After finding the features of the complexes of the MW and ZH datasets, a prediction method should be applied to them. In this paper, the prediction is performed via several commonly used classification methods, including LDR, SVM, NB and k-NN. More details regarding the applied prediction methods are discussed below.
Linear Dimensionality Reduction
Fisher's discriminant analysis (FDA): FDA is a homoscedastic criterion that maximizes the Mahalanobis distance between the means assuming that the covariance matrices are equal.
Heteroscedastic discriminant analysis (HDA): HDA is a criterion that starts from the Chernoff distance in original space and takes correlations between random variables to project the data onto a lower dimensional space.
Chernoff discriminant analysis (CDA): CDA is a heteroscedastic criterion and aims to maximize the Chernoff distance between random vectors in the transformed space.
LDR is followed by a Bayesian classifier (linear or quadratic). More details about these LDR methods and the corresponding classification tasks can be found in .
Support Vector Machine
SVMs are well-known machine learning techniques used for classification, regression and other tasks. The main goal of the SVM is to find a hyperplane that separates the feature vectors into two regions. In most cases, the separating hyperplane is not unique, and hence the SVM chooses the hyperplane that leaves the maximum margin between that hyperplane and the support vectors. Since most classification problems are not linearly separable, using a linear classifier is often inefficient. Thus, in order to achieve a more efficient classification, kernels can be used to map the data onto a higher dimensional space. There are a number of kernels that can be used in SVM models, such as polynomial, radial basis function (RBF) and sigmoid. The effectiveness of the SVM depends on the selection of the kernel, the kernel parameters and the soft margin. In addition, sequential minimal optimization (SMO) is a fast learning algorithm that has been widely applied in the training phase of an SVM classifier to solve the underlying optimization problem. In this study, the SMO module of the Waikato Environment for Knowledge Analysis (WEKA) with a polynomial kernel, default parameter settings and 10-fold cross validation is used for performing classification via the SVM.
k-NN is one of the simplest classification methods, in which the class of each test sample is found by voting on the class labels of its neighbors. To achieve this, after computing and sorting the distances between the test sample and each training sample, the most frequent class label among the first k training samples (nearest neighbors) is assigned to the test sample. Determining the appropriate number of neighbors is one of the challenges of this method. In this study, the IBk module of WEKA with Euclidean distance, default parameter settings, and 10-fold cross validation is used for k-NN classification.
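The voting procedure described above can be written in a few lines. This is a minimal sketch and not WEKA's IBk implementation; the toy feature vectors and the choice of k = 3 are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_vectors, train_labels, test_vector, k=3):
    """Assign the most frequent label among the k nearest training samples."""
    # Compute and sort Euclidean distances to every training sample.
    distances = sorted(
        (math.dist(vector, test_vector), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    # Vote over the labels of the first k samples.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional feature vectors.
train = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
labels = ["obligate", "obligate", "obligate", "non-obligate", "non-obligate"]
print(knn_predict(train, labels, (0.2, 0.2), k=3))
print(knn_predict(train, labels, (5.5, 5.0), k=3))
```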
One of the simplest probabilistic classifiers is NB. Assuming independence of the features, the class of each test sample can be found by applying Bayes' theorem. The basic mechanism of NB is rather simple. The reader is referred to  for more details. In this study, the NaiveBayes module of WEKA with default parameters and 10-fold cross validation is used.
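To make the independence assumption concrete, the sketch below fits one Gaussian per feature and per class and then applies Bayes' theorem in log space. It is an illustration only: modelling each feature with a normal distribution is our assumption here, the code is not WEKA's NaiveBayes module, and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(vectors, labels):
    """Estimate the class prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for vector, label in zip(vectors, labels):
        grouped[label].append(vector)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(column) / n for column in columns]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((value - mean) ** 2 for value in column) / n, 1e-9)
            for column, mean in zip(columns, means)
        ]
        model[label] = (n / len(vectors), means, variances)
    return model

def predict_gaussian_nb(model, vector):
    """Pick the class with the largest log prior plus summed log likelihoods."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for x, mean, var in zip(vector, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))

# Hypothetical two-feature samples.
samples = [(-2.0, -1.0), (-1.5, -1.2), (-1.8, -0.8), (1.0, 0.5), (1.2, 0.9), (0.8, 0.7)]
classes = ["obligate"] * 3 + ["non-obligate"] * 3
nb_model = fit_gaussian_nb(samples, classes)
print(predict_gaussian_nb(nb_model, (-1.7, -1.0)))
```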
Feature selection methods
Feature selection is the process of selecting the best subset of relevant features that represents the whole dataset efficiently and removing redundant and/or irrelevant ones. Applying feature selection before running a classifier is useful in reducing the dimensionality of the data and, thus, reducing the prediction time, while improving the prediction performance by eliminating irrelevant, redundant and noisy features. There are two different ways of doing feature selection: wrapper methods and filter methods. In this study, filter-based methods are used, in which features are scored and ranked according to relevance criteria, independently of the classification algorithm. The following filter-based feature selection methods are used in this study.
Minimum Redundancy Maximum Relevance
One of the most widely used feature selection methods based on mutual information is mRMR [45, 46]. In this method, features are scored and selected based on their relevance to the class and their redundancy with respect to the other features. A feature with minimum redundancy and maximum relevance with respect to the class concept is assigned a high score. After assigning a significance score to each feature, a ranking list of all features is generated. In this study, the online mRMR tool with default parameters is used to obtain a complete list of all scored features by mRMR.
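The relevance/redundancy trade-off can be sketched with a greedy ranking that, at each step, picks the feature maximizing relevance minus mean redundancy (the mutual-information-difference criterion). This is an assumption about one common formulation, not a reproduction of the online mRMR tool; features are assumed to be already discretized, and the feature names and values are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (count / n) * math.log2(count * n / (px[x] * py[y]))
        for (x, y), count in pxy.items()
    )

def mrmr_rank(features, labels):
    """Greedy ranking: relevance to the class minus mean redundancy with picks."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected, remaining = [], set(features)
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[chosen])
                for chosen in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical discretized features: f_b duplicates f_a, f_c is unrelated to f_a.
class_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "f_a": [0, 0, 0, 1, 1, 1, 1, 1],
    "f_b": [0, 0, 0, 1, 1, 1, 1, 1],
    "f_c": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(mrmr_rank(toy_features, class_labels))
```

In this toy case the duplicate feature is ranked last, because its redundancy with the already selected copy outweighs its relevance.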
where p(y) is the marginal probability density function for random variable Y and p(y|x) is the conditional probability of Y given X. In this study, the InfoGainAttributeEval module of WEKA is used for feature ranking based on the score of features by measuring the information gain with respect to the class.
where H(Y), the entropy of class Y, and H(Y|X), the conditional entropy of Y given X, are calculated using Eqs. (3) and (4) respectively. A value of GR = 1 indicates that feature X is highly relevant and one of the best features to predict class Y, while GR = 0 means that feature X is not relevant at all. In this study, the GainRatioAttributeEval module of WEKA is used for feature ranking based on the relevance of each feature by measuring its gain ratio with respect to the class.
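The quantities named above can be computed directly for discrete data. The sketch below assumes the standard definitions, H(Y) = −Σ p(y) log2 p(y) and H(Y|X) = Σ p(x) H(Y|X = x), with information gain taken as H(Y) − H(Y|X) and gain ratio as that difference divided by H(X). It also assumes the feature has already been discretized; the toy values are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """H(V) in bits for a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(ys, xs):
    """H(Y|X): entropy of Y within each value of X, weighted by p(x)."""
    n = len(xs)
    total = 0.0
    for x, count in Counter(xs).items():
        subset = [y for y, x_value in zip(ys, xs) if x_value == x]
        total += (count / n) * entropy(subset)
    return total

def information_gain(ys, xs):
    """IG = H(Y) - H(Y|X)."""
    return entropy(ys) - conditional_entropy(ys, xs)

def gain_ratio(ys, xs):
    """GR = IG / H(X); defined as 0 when X takes a single value."""
    split = entropy(xs)
    return information_gain(ys, xs) / split if split > 0 else 0.0

# Hypothetical class labels and two discretized features.
y = [0, 0, 1, 1]
informative = [0, 0, 1, 1]    # determines the class exactly
uninformative = [0, 1, 0, 1]  # independent of the class
print(gain_ratio(y, informative), gain_ratio(y, uninformative))
```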
Feature selection via the chi-square test is another very commonly used method. This method evaluates the relevance of a feature with respect to a class by computing the value of the chi-square statistic. In this study, the ChiSquaredAttributeEval module of WEKA is used to obtain the scored feature vector.
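For a discretized feature, the chi-square statistic compares the observed feature/class counts with the counts expected if the two were independent. The following is a minimal sketch of that computation, not WEKA's ChiSquaredAttributeEval; the toy values are hypothetical.

```python
from collections import Counter

def chi_square_score(xs, ys):
    """Chi-square statistic of the feature-by-class contingency table."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    x_totals, y_totals = Counter(xs), Counter(ys)
    score = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            expected = x_count * y_count / n  # count expected under independence
            score += (observed[(x, y)] - expected) ** 2 / expected
    return score

# Hypothetical class labels and two discretized features.
class_values = [0, 0, 1, 1]
print(chi_square_score([0, 0, 1, 1], class_values))  # feature tracks the class
print(chi_square_score([0, 1, 0, 1], class_values))  # feature independent of the class
```

A larger statistic indicates a stronger association between the feature and the class, so features can be ranked by this score.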
Results and discussion
To test our proposed method and perform an in-depth analysis of the strength of electrostatic energy as the prediction property, four different classification methods (SMO, k-NN, LDR and NB) and four different feature selection methods (IG, GR, Chi2 and mRMR) have been used. The performances of the prediction methods are compared in terms of their accuracies, which are computed as follows: acc = (TP + TN)/N, where TP and TN are the total numbers of true positives (obligate) and true negatives (non-obligate), respectively, accumulated over the 10-fold cross-validation procedure, and N is the total number of complexes in the dataset.
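The accuracy formula above pools the counts over all folds before dividing. A small sketch follows, assuming each fold yields a list of true labels and a list of predicted labels; the label strings and fold contents are hypothetical.

```python
def pooled_accuracy(fold_results, positive="obligate"):
    """acc = (TP + TN) / N, with TP, TN and N accumulated over all folds."""
    tp = tn = total = 0
    for true_labels, predicted_labels in fold_results:
        for truth, prediction in zip(true_labels, predicted_labels):
            if truth == prediction:
                if truth == positive:
                    tp += 1   # obligate complex predicted as obligate
                else:
                    tn += 1   # non-obligate complex predicted as non-obligate
            total += 1
    return (tp + tn) / total

# Two hypothetical folds: 3 of 5 complexes are classified correctly.
folds = [
    (["obligate", "non-obligate", "obligate"], ["obligate", "non-obligate", "non-obligate"]),
    (["non-obligate", "non-obligate"], ["non-obligate", "obligate"]),
]
print(pooled_accuracy(folds))
```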
Analysis of prediction properties
In previous works [18–20], it has been shown that desolvation energy is very efficient for prediction of obligate and non-obligate complexes in comparison with solvent accessible and interface area properties. However, in our recent study and in this work, it has been shown that employing electrostatic energy delivers impressive prediction accuracy.
To validate our previous results and compare the strength of electrostatic and desolvation energies as properties for prediction, SMO, k-NN, NB and LDR have been applied for prediction on these two types of features. For the LDR schemes, six different classifiers were implemented and evaluated, namely the combinations of FDA, HDA and CDA with quadratic and linear classifiers; the maximum average classification accuracy for each classifier is reported for each dataset. For SVM, k-NN and NB, the classification modules of WEKA have been used with default parameters in a 10-fold cross-validation process. The distance cut-offs between atom pairs of interacting chains are 9Å and 7Å for electrostatic and desolvation energies as properties respectively.
Comparison of accuracies for electrostatic and desolvation energies as properties.
Generally, from the table, it can be concluded that electrostatic energy yields much more efficient prediction than desolvation energy, on the basis of the experimental results shown here using different classification methods. In addition, for most subsets of features, SMO performs better than k-NN, NB and LDR, for both desolvation and electrostatic energies.
Analysis of distance cutoffs
In order to obtain a better insight into the classification results by using desolvation and electrostatic energies as properties, different experiments were performed by varying the distance cutoff between atom pairs of interacting chains.
Prediction accuracies using desolvation energy and different inter-atom distance cutoffs.
Prediction accuracies using electrostatic energy and different inter-atom distance cutoffs.
As a general remark, it can be concluded that the best distance cutoffs for prediction of obligate and non-obligate complexes using electrostatic energy range from 9Å to 12Å, while by using desolvation energy the best distance cutoffs range from 5Å to 7Å. These distance cutoffs for desolvation energy are reasonable and are in agreement with previous studies [5, 6, 36]. In most studies, a distance cutoff of 6Å is typically used to determine whether or not two atoms from different chains interact with each other. Moreover, in [20, 35, 36], a function g based on the distance between two atoms is used. These approaches consider a smooth function for inter-atom distances between 5Å and 7Å, while g evaluates to 0 if the distance is greater than 7Å. On the other hand, electrostatic energy is considered to be long-range [21, 48], extending inter-atom interactions up to a distance of 10Å or more, and hence covering a much broader and deeper area of the interface. In other words, this suggests that, when electrostatic energy is used with a long distance cutoff, the atoms on the surface and some atoms buried under the surface may participate in the prediction, which leads to outstanding classification performance. This is a topic of interest for further studies.
Analysis of feature selection
Prediction accuracies for electrostatic energy and different feature selection methods.
In general, it can be concluded that a few pairs of atoms/amino acids are appropriate for prediction. Also, feature selection increases the performance of classification models by eliminating redundant, irrelevant and noisy features and selecting the more discriminative features. Moreover, by comparing the performance of the applied feature selection methods, Chi2 is the best method for ranking features. In contrast, mRMR is the worst ranking method because it used more features and achieved lower performance for all datasets.
To show the effect of using electrostatic energy for prediction of PPI types from a different perspective, a visual analysis is presented. In this analysis, an obligate complex, PDB ID 2min, and a non-obligate complex, PDB ID 1a2k, both from the MW dataset are considered. For these protein complexes the solvent accessible surfaces by electrostatic potential are generated with the help of Jmol embedded in APBS. In the plots, positive electrostatic potentials are shown in blue, while negative electrostatic potentials are shown in red.
The proposed prediction model works exceptionally well for distinguishing protein interaction types. Our prediction approach uses electrostatic energy values for pairs of atoms or amino acids present in the interfaces of obligate and non-obligate complexes. The classification is performed via various classification techniques including LDR, SVM, k-NN and NB.
We observe that electrostatic energy values with distance cutoffs in the range 9Å to 12Å turn out to be the best ones for prediction of interaction types on the basis of our experimental results. Electrostatic energy yields better prediction results because electrostatic interactions are long-range. Thus, by using electrostatic energy with a large distance cutoff, not only the atoms on the surface but also some atoms that are buried under the surface may participate in the interaction, and this leads to excellent prediction results. Therefore, among various types of molecular interactions, electrostatic interactions play a special role. The proposed features then exploit the high affinity of proteins to interact with each other (in terms of negative and positive potentials). Furthermore, applying several feature selection algorithms on the MW and ZH datasets demonstrates that removing irrelevant and noisy pairs of atom type/amino acid type features and selecting the most relevant pairs improves the prediction results.
From this study, various open questions remain to be answered. One of these is to investigate domains and motifs present in the interface in order to achieve a better insight on proteins, their interactions, and function. Another problem that deserves attention is to investigate the role of buried atoms and their influence in obligate interactions. This study could consider atoms that are 10Å (or more) apart from each other, but one of these atoms may not be on the surface of the protein.
This work has been partially supported by NSERC, the Natural Sciences and Engineering Research Council of Canada. This work has also been made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET: http://www.sharcnet.ca) and Compute/Calcul Canada.
This article has been published as part of Proteome Science Volume 11 Supplement 1, 2013: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2012: Proteome Science. The full contents of the supplement are available online at http://www.proteomesci.com/supplements/11/S1.
- Mendelsohn A, Brent R: Protein interaction methods-toward an endgame. Science 1999,284(5422):1948–1950. 10.1126/science.284.5422.1948PubMedView ArticleGoogle Scholar
- Park S, Reyes J, Gilbert D, Kim J, Kim S: Prediction of protein-protein interaction types using association rule based classification. BMC Bioinformatics 2009, 10: 36. 10.1186/1471-2105-10-36PubMed CentralPubMedView ArticleGoogle Scholar
- Zhang Q, Petrey D, Deng L, Qiang L, Shi Y, Thu C, Bisikirska B, Lefebvre C, Accili D, Hunter T, et al.: Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 2012,490(7421):556–560. 10.1038/nature11503PubMed CentralPubMedView ArticleGoogle Scholar
- Qiu J, Sun X, Suo S, Shi S, Huang S, Liang P, Zhang L: Predicting homo-oligomers and hetero-oligomers by pseudo-amino acid composition: an approach from discrete wavelet transformation. Biochimie 2011,93(7):1132–1138. 10.1016/j.biochi.2011.03.010PubMedView ArticleGoogle Scholar
- Mintseris J, Weng Z: Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci 2005,102(31):10930–10935. 10.1073/pnas.0502667102PubMed CentralPubMedView ArticleGoogle Scholar
- Zhu H, Domingues F, Sommer I, Lengauer T: NOXclass: Prediction of Protein-protein Interaction Types. BMC Bioinformatics 2006.,7(27): 10.1186/1471-2105-7-27Google Scholar
- LoConte L, Chothia C, Janin J: The atomic structure of protein-protein recognition sites. J Mol Biol 1999,285(5):2177–2198. 10.1006/jmbi.1998.2439View ArticleGoogle Scholar
- Young J: A role for surface hydrophobicity in protein protein recognition. Protein Sci 1994, 3: 717–729.PubMed CentralPubMedView ArticleGoogle Scholar
- A Zen MichelettiOKC, Nussinov R: Comparing interfacial dynamics in protein-protein complexes: an elastic network approach. BMC Structural Biology 2010.,10(26): 10.1186/1472-6807-10-26Google Scholar
- Jones S, Thornton JM: Principles of protein-protein interactions. Proc Natl Acad Sci, USA 1996, 93: 13–20. 10.1073/pnas.93.1.13PubMed CentralPubMedView ArticleGoogle Scholar
- Lawrence MC, Colman PM: Shape complementarity at protein/protein interfaces. J Mol Biol 1993,234(4):946–950. 10.1006/jmbi.1993.1648PubMedView ArticleGoogle Scholar
- Chakrabarti P, Janin J: Dissecting protein-protein recognition sites. Proteins 2002,47(3):334–343. 10.1002/prot.10085PubMedView ArticleGoogle Scholar
- Xu D, Tsai C, Nussinov R: Hydrogen bonds and salt bridges accross protein-protein interfaces. Protein Eng 1997,10(9):999–1012. 10.1093/protein/10.9.999PubMedView ArticleGoogle Scholar
- Ma B, Elkayam T, Wolfson H, RNussinov : Protein-protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc Natl Acad Sci, USA 2003,100(10):5772–5777. 10.1073/pnas.1030237100PubMed CentralPubMedView ArticleGoogle Scholar
- Shanahan H, Thornton J: Amino acid architecture and the distribution of polar atoms on the surfaces of proteins. Biopolymers 2005,78(6):318–328. 10.1002/bip.20295PubMedView ArticleGoogle Scholar
- Glaser F, Steinberg DM, Vakser IA, Ben-Tal N: Residue frequencies and pairing preferences at protein-protein interfaces. Proteins 2001,43(2):89–102. 10.1002/1097-0134(20010501)43:2<89::AID-PROT1021>3.0.CO;2-HPubMedView ArticleGoogle Scholar
- Mintseris J, Weng Z: Atomic Contact Vectors in Protein-Protein Recognition. PROTEINS: Structure, Function and Genetics 2003, 53: 629–639. 10.1002/prot.10432View ArticleGoogle Scholar
- Rueda L, Banerjee S, Aziz M, Raza M: Protein-protein interaction prediction using desolvation energies and interface properties. Bioinformatics and Biomedicine (BIBM) 2010, 17–22.Google Scholar
- Rueda L, Garate C, Aziz MM: Biological Protein-protein Interaction Prediction Using Binding Free Energies and Linear Dimensionality Reduction. Proceedings of the 5th. IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB 2010) 2010, 383–394.Google Scholar
- Aziz MM, Maleki M, Rueda L, Raza M, Banerjee S: Prediction of Biological Protein-protein Interactions using Atom-type and Amino Acid Properties. Proteomics 2011, 11: 3802–3810. 10.1002/pmic.201100186PubMedView ArticleGoogle Scholar
- Vasudev G, Rueda L: A Model to Predict and Analyze Protein-protein Interaction Types Using Electrostatic Energies. 5th IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2012) 2012, 543–547.Google Scholar
- Kessel A, Ben-Tal N: Introduction to Proteins: Structure, Function, and Motion. CRC Press; 2010.View ArticleGoogle Scholar
- De S, Krishnadev O, Srinivasan N, Rekha N: Interaction preferences across protein-protein interfaces of obligatory and non-obligatory components are different. BMC Structural Biology 2005, 5: 15.
- Eichborn JV, Gunther S, Preissner R: Structural features and evolution of protein-protein interactions. International Conference of Genome Informatics 2010, 22: 1–10.
- Maleki M, Aziz M, Rueda L: Analysis of relevant physicochemical properties in obligate and non-obligate protein-protein interactions. IEEE International Conference in Bioinformatics and Biomedicine Workshops (BIBMW) 2011, 2011: 345–351.
- Theodoridis S, Koutroumbas K: Pattern Recognition. Elsevier Academic Press; 2006.
- Cai Y, Huang T, Hu L, Shi X, Xie L, Li Y: Prediction of lysine ubiquitination with mRMR feature selection and analysis. Amino Acids 2012.
- Niu S, Huang T, Feng K, Cai Y, Li Y: Prediction of tyrosine sulfation with mRMR feature selection and analysis. J Proteome Res 2010, 9(12):6490–6497. 10.1021/pr1007152
- Liu L, Cai Y, Lu W, Peng C, Niu B: Prediction of protein-protein interactions based on PseAA composition and hybrid feature selection. Biochemical and Biophysical Research Communications 2009, 380(2):318–322. 10.1016/j.bbrc.2009.01.077
- Yuan Y, Shi X, Li X, Lu W, Cai Y, Gu L, Liu L, Li M, Kong X, Xing M: Prediction of interactiveness of proteins and nucleic acids based on feature selections. Mol Divers 2009, 14(4):627–33.
- Mundra P, Rajapakse J: SVM-RFE With MRMR Filter for Gene Selection. IEEE Transactions on Nanobioscience 2010, 9: 31–37.
- Zhao Y, Yand Z: Improving MSVM-RFE for Multiclass Gene Selection. The Fourth International Conference on Computational Systems Biology (ISB2010) 2010.
- Lee Y, Chang C, Chao C: Incremental forward feature selection with application to microarray gene expression data. Journal of Biopharmaceutical Statistics 2008, 18(5):827–840. 10.1080/10543400802277868
- Liu Q, Li J: Propensity vectors of low-ASA residue pairs in the distinction of protein interactions. Proteins: Structure, Function, and Bioinformatics 2010, 78(3):589–602.
- Camacho C, Zhang C: FastContact: rapid estimate of contact and binding free energies. Bioinformatics 2005, 21(10):2534–2536. 10.1093/bioinformatics/bti322
- Zhang C, Vasmatzis G, Cornette JL, DeLisi C: Determination of Atomic Desolvation Energies From the Structures of Crystallized Proteins. J Mol Biol 1997, 267: 707–726. 10.1006/jmbi.1996.0859
- Hartvig R, van de Weert M, Ostergaard J, Jorgensen L, Jensen H: Protein Adsorption at Charged Surfaces: The Role of Electrostatic Interactions and Interfacial Charge Regulation. Langmuir 2011, 27(6):2634–2643. 10.1021/la104720n
- Dolinsky TJ, Czodrowski P, Li H, Nielsen JE, Jensen JH, Klebe G, Baker NA: PDB2PQR: expanding and upgrading automated preparation of biomolecular structures for molecular simulations. Nucleic Acids Research 2007, 35: 522–525. 10.1093/nar/gkm276
- Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA: Electrostatics of nanosystems: Application to microtubules and the ribosome. Proceedings of the National Academy of Sciences 2001, 98(18):10037–10041. 10.1073/pnas.181342398
- Berman H, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov I, Bourne P: The Protein Data Bank. Nucleic Acids Research 2000, 28: 235–242. 10.1093/nar/28.1.235
- Rueda L, Herrera M: Linear Dimensionality Reduction by Maximizing the Chernoff Distance in the Transformed Space. Pattern Recognition 2008, 41(10):3138–3152. 10.1016/j.patcog.2008.01.016
- Duda R, Hart P, Stork D: Pattern Classification. 2nd edition. New York, NY: John Wiley and Sons, Inc.; 2000.
- Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA Data Mining Software: An Update. SIGKDD Explorations 2009, 11: 10–18. 10.1145/1656274.1656278
- Novakovic J, Strbac P, Bulatovic D: Toward optimal feature selection using ranking methods and classification algorithms. Yugoslav J of Operations Research 2011, 21: 119–135. 10.2298/YJOR1101119N
- Ding C, Peng H: Minimum redundancy feature selection from microarray gene expression data. Journal of Bioinformatics and Computational Biology 2005, 3(2):185–205. 10.1142/S0219720005001004
- Peng H, Long F, Ding C: Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 2005, 27(8):1226–1238.
- minimum Redundancy Maximum Relevance Feature Selection (mRMR) [http://penglab.janelia.org/proj/mRMR/]
- Fadrná E, Hladecková K, Koca J: Long-range Electrostatic Interactions in Molecular Dynamics: An Endothelin-1 Case Study. Journal of Biomolecular Structure and Dynamics 2005, 23(2):151–162. 10.1080/07391102.2005.10531229
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.