IDDI: integrated domain-domain interaction and protein interaction analysis system
© Kim et al; licensee BioMed Central Ltd. 2012
- Published: 21 June 2012
Deciphering protein-protein interactions (PPIs) at the domain level provides valuable information about the binding mechanisms and functional roles of interacting proteins. The 3D structures of protein complexes are a reliable source of domain-domain interactions (DDIs), but the number of solved structures is very limited. Several resources of computationally predicted DDIs have been generated, but they are scattered across various places and their predictions show inconsistent performance. A well-organized PPI and DDI analysis system that integrates these data with a fair scoring system is therefore needed.
We integrated three structure-based DDI datasets and twenty computationally predicted DDI datasets and constructed an interaction analysis system, named IDDI, which enables users to browse protein and domain interactions together with their relationships. To integrate heterogeneous DDI information, we introduce a novel scoring scheme that determines the reliability of each DDI by considering its prediction scores, the confidence level of each prediction method in the datasets, and the independence between predicted datasets. In addition, we connected this DDI information to comprehensive PPI information and developed a unified interface for exploring interaction networks at both the protein and domain levels.
In its current version, IDDI provides 204,705 DDIs among a total of 7,351 Pfam domains. The total number of DDIs is eight times larger than in previous studies. Owing to this increase, 50.4% of PPIs could be associated with DDIs, more than twice the coverage of previous resources. The newly designed scoring scheme also outperformed the previous system in accuracy. The IDDI user interface supports interactive investigation of interacting proteins and domains in an interconnected way. A specific example shows how efficiently the system retrieves comprehensive information on a target protein together with its PPI and DDI relationships. IDDI is freely available at http://pcode.kaist.ac.kr/iddi/.
- Confidence Score
- Prediction Score
- Reliability Score
- Pfam Domain
- Protein Interaction Data
Protein interactions, including binary PPIs and co-complexes, regulate biological processes and biochemical reactions. Discovering protein interactions provides a detailed interpretation of the cellular mechanisms behind biological functions. The identification of protein interactions is therefore a critical issue for biologists. Recently, massive amounts of protein interaction data have become available thanks to advances in large-scale screening techniques such as yeast two-hybrid and affinity purification followed by mass spectrometry. Much of this protein interaction data, verified by different experimental methods, is publicly available. However, although the growing data give a landscape of the protein interactome, they are not very informative about detailed binding mechanisms, and the high false positive rate of the data is a major hurdle to interpreting the interactome .
Investigating protein interactions at the domain level can address these limitations. Proteins consist of one or more domains, which are regarded as the functional units of proteins. In most cases, domain-domain interactions (DDIs) are crucial clues to protein interactions. DDIs can therefore serve as key supporting evidence for protein interaction mechanisms.
DDIs were first identified from the 3-dimensional (3D) structures of protein complexes in the Protein Data Bank . 3DID , iPfam  and PInS  extract DDIs from the binding regions of known 3D structures. However, these datasets cover only a small proportion of DDIs because too few 3D structures are available. DDIs obtained from 3D structures cover less than 20% of the PPIs in Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens . To complement structure-based DDIs, various computational methods for predicting DDIs have been proposed in recent years [7–25]. However, gathering and integrating each predicted dataset is cumbersome for individual researchers, because each method has a different level of reliability that must be analyzed separately. An integrated system that combines all DDIs under a unified reliability scoring scheme is therefore needed.
To date, two combined DDI databases, DOMINE  and UniDomInt , have been published. DOMINE combined two 3D structure-based DDI datasets and thirteen predicted DDI datasets. The confidence level of each predicted DDI in DOMINE is classified as High, Medium or Low based on the prediction overlap indexes (POIs) of the predicted DDI datasets. UniDomInt, on the other hand, merged two 3D structure-based DDI datasets and eight predicted DDI datasets, and provides numerical reliability scores for predicted DDIs by comparing the accuracy of the predicted datasets. Although DOMINE and UniDomInt provide a large number of DDIs and allow the reliability of predicted DDIs to be compared in a unified format, some of their datasets are outdated and the total number of datasets is far below the number currently published. They also ignore the scores assigned by each prediction method, so reliabilities cannot be compared between DDIs predicted in the same dataset. In addition, neither DOMINE nor UniDomInt provides PPI information mediated by DDIs.
In this paper, we propose IDDI, an integrated analysis system for DDIs and their related protein interactions. We first combined three 3D structure-based DDI datasets and twenty predicted DDI datasets. To estimate the reliability of predicted DDIs, we developed a novel scoring scheme that considers the individual accuracy of each dataset, the independence among datasets and the internal prediction scores assigned to DDIs by each method. The total number of DDIs is significantly larger than in previous comprehensive DDI databases, and the new reliability scoring scheme achieved strong performance in ranking highly reliable DDIs. Furthermore, we joined our new DDI database with the comprehensive PPI database ComBiCom  and built a unified analysis system with an interface for protein interaction network analysis that allows protein and domain interaction mechanisms to be explored together.
Table 1. Statistics and confidence scores of DDI datasets in IDDI (columns: DDI data [Ref.], No. of domains, No. of DDIs; the first group of rows lists the 3D-structure-based datasets)
Assessment of the reliability score for the predicted DDI
Each predicted DDI in our new database is assigned a reliability score. We considered three factors that affect reliability: i) the confidence level of each predicted dataset, ii) the independence of the dataset and iii) the local prediction score of the DDI measured by each dataset.
Each predicted dataset has a different confidence level, and predicted DDIs are more reliable when they are found in more accurate datasets.
where $I$ is a set of DDIs, $I_{a \to b}$ is the subset of $I_a$ whose interacting domains belong to both datasets $a$ and $b$, and, likewise, $I_{b \to a}$ is the subset of $I_b$ whose interacting domains are found in both datasets.
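The subsets $I_{a \to b}$ and $I_{b \to a}$ can be illustrated with a minimal Python sketch. It rests on our own reading of the definition, namely that a DDI of dataset a "belongs to" dataset b when both of its domains occur somewhere in b; the sketch stops at the subsets and does not attempt the confidence score itself, and the domain labels are hypothetical.

```python
from typing import FrozenSet, Set

# A DDI is modelled as an unordered pair of domain identifiers;
# a homotypic DDI (e.g. D3-D3) collapses to a one-element frozenset.
DDI = FrozenSet[str]


def ddi(x: str, y: str) -> DDI:
    """Build an unordered DDI from two domain identifiers."""
    return frozenset((x, y))


def domains_of(dataset: Set[DDI]) -> Set[str]:
    """Return every domain that takes part in at least one DDI of the dataset."""
    return {domain for pair in dataset for domain in pair}


def restricted_subset(i_a: Set[DDI], i_b: Set[DDI]) -> Set[DDI]:
    """I_{a->b}: DDIs of dataset a whose interacting domains also occur in dataset b.

    Assumption: a DDI 'belongs to' dataset b when both of its domains appear
    somewhere in b (not necessarily as the same pair).
    """
    covered = domains_of(i_b)
    return {pair for pair in i_a if pair <= covered}


# Hypothetical toy datasets a and b.
I_a = {ddi("D1", "D2"), ddi("D1", "D5"), ddi("D3", "D3")}
I_b = {ddi("D1", "D3"), ddi("D2", "D4")}

print(sorted(sorted(p) for p in restricted_subset(I_a, I_b)))  # [['D1', 'D2'], ['D3']]
print(sorted(sorted(p) for p in restricted_subset(I_b, I_a)))  # [['D1', 'D3']]
```

Restricting each dataset to the domains the other one covers keeps the comparison fair: the D1-D5 DDI cannot be confirmed by dataset b, because b never mentions D5.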
Table 1 shows the confidence score of each predicted dataset. Although the gap between two scores does not represent an absolute difference between two datasets, DDIs are clearly more reliable when they are predicted in higher-confidence datasets. Based on the confidence scores, the most reliable dataset is ME, followed by TW, DIPD and Top-down. In contrast, RDFF, LLZ, KGIDDI and DIMA-DPROF have low confidence scores, which means that DDIs predicted in these datasets have a high probability of being false positives.
where $e$ ranges over all datasets other than $d$ that predict $i$. For example, a DDI in dataset $d$ that is not predicted by any other dataset receives an independence score of one.
Local prediction scores of DDIs measured by each predicted dataset are also key evidence for inferring reliability. Even DDIs found in the same dataset differ in reliability depending on their prediction scores. We rescaled the differing ranges of each dataset's original prediction scores to the interval from 0 to 1 using an ordinal scaling method. Six of the datasets, HiMAP, KGIDDI, LLZ, RDFF, P-value and TW, do not provide their own prediction scores. DDIs predicted in these datasets receive the average prediction score of the DDIs found in the same number of datasets.
where $d$ ranges over all datasets that predicted $i$, and $P_{d,i}$ is the prediction score of $i$ measured by dataset $d$.
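The ordinal rescaling and the fallback for datasets without scores can be sketched in Python. Both details are assumptions, since the text gives no formulas: the scaling below maps each score to rank / n with tied scores sharing their average rank, and the fallback pools all known scaled scores of DDIs supported by the same number of datasets. The sketch does not attempt the formula that combines the $P_{d,i}$ values per DDI, and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional


def ordinal_scale(scores: List[float], higher_is_better: bool = True) -> List[float]:
    """Rescale raw prediction scores onto (0, 1] by rank.

    Assumption: scaled = rank / n, with tied scores sharing their average rank.
    Use higher_is_better=False for scores such as p-values, where smaller is stronger.
    """
    n = len(scores)
    keyed = [s if higher_is_better else -s for s in scores]
    order = sorted(range(n), key=lambda idx: keyed[idx])  # weakest prediction first
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block over predictions tied with the one at `start`.
        while end + 1 < n and keyed[order[end + 1]] == keyed[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tied block
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    return [r / n for r in ranks]


def impute_missing(
    predictions: Dict[str, Dict[str, Optional[float]]],
) -> Dict[str, Dict[str, float]]:
    """Fill missing (None) scores with the mean scaled score of DDIs that are
    predicted by the same number of datasets.

    `predictions` maps a DDI id to {dataset name: scaled score, or None when
    the dataset reports no score}.
    """
    by_support: Dict[int, List[float]] = defaultdict(list)
    for per_dataset in predictions.values():
        by_support[len(per_dataset)].extend(s for s in per_dataset.values() if s is not None)

    filled: Dict[str, Dict[str, float]] = {}
    for ddi_id, per_dataset in predictions.items():
        known = by_support[len(per_dataset)]
        fallback = mean(known) if known else 0.0  # hypothetical default when nothing to average
        filled[ddi_id] = {d: (fallback if s is None else s) for d, s in per_dataset.items()}
    return filled


print(ordinal_scale([0.2, 0.9, 0.5]))                     # higher raw score -> higher rank
print(ordinal_scale([0.01, 0.5], higher_is_better=False))  # p-value style: smaller is stronger

# Hypothetical scaled scores; TW and HiMAP are among the datasets without their own scores.
preds = {
    "ddi1": {"ME": 0.8, "TW": None},
    "ddi2": {"ME": 0.4, "DIPD": 0.6},
    "ddi3": {"HiMAP": None},
    "ddi4": {"DIPD": 0.9},
}
print(impute_missing(preds)["ddi3"])  # {'HiMAP': 0.9}
```

Rank-based scaling makes scores from methods with very different raw ranges (probabilities, p-values, counts) comparable without assuming anything about their distributions.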
Integrated analysis system construction
IDDI includes not only our new integrated DDI database but also protein interactions from ComBiCom , so that interactions can be examined in detail at both the domain and protein levels. ComBiCom, developed in our group, is a database system providing 257,902 non-redundant binary PPIs and 11,964 protein complexes from nine databases of experimentally identified PPIs, which together cover most publicly available PPI information. To map domains to the proteins that contain them, we used SwissPfam, available at the Pfam site, which provides SWISS-PROT and TrEMBL proteins with their assigned Pfam domains. In addition, we stored protein functional annotations obtained from the Gene Ontology as a reference set of functional information. An update module is also implemented to update the database semi-automatically.
IDDI provides four kinds of search services: protein search, domain search, PPI search and DDI search. Searches are keyed on Pfam IDs for domains and UniProt accession numbers for proteins. PPI relationships are retrieved from ComBiCom, and protein function information is annotated from the Gene Ontology. A comprehensive search system requires mapping proteins to the domains they contain, and SwissPfam was used for this mapping. Using this mapping, IDDI can provide possible DDIs for a protein search and possible PPIs for a DDI search.
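The two-way lookups described above, possible DDIs for a protein pair and possible PPIs for a domain pair, can be sketched with in-memory dictionaries. This is a minimal illustration, not IDDI's implementation: the accessions, domain labels and scores are hypothetical placeholders standing in for SwissPfam mappings, ComBiCom PPIs and IDDI reliability scores.

```python
from itertools import product
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical SwissPfam-style mapping: protein accession -> domains it contains.
PROTEIN_DOMAINS: Dict[str, Set[str]] = {
    "PROT_A": {"DOM_1", "DOM_2"},
    "PROT_B": {"DOM_3"},
    "PROT_C": {"DOM_1"},
}

# Hypothetical integrated DDI table: unordered domain pair -> reliability score.
DDI_SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"DOM_2", "DOM_3"}): 0.41,
    frozenset({"DOM_1"}): 0.12,  # homotypic DOM_1-DOM_1 interaction
}

# Hypothetical binary PPIs, as they might come from a PPI database.
PPIS: Set[FrozenSet[str]] = {
    frozenset({"PROT_A", "PROT_B"}),
    frozenset({"PROT_A", "PROT_C"}),
}


def candidate_ddis(p: str, q: str) -> Dict[Tuple[str, str], float]:
    """Possible DDIs behind the PPI p-q: known DDIs between a domain of p and a domain of q."""
    hits: Dict[Tuple[str, str], float] = {}
    for dp, dq in product(PROTEIN_DOMAINS.get(p, set()), PROTEIN_DOMAINS.get(q, set())):
        score = DDI_SCORES.get(frozenset({dp, dq}))
        if score is not None:
            hits[(dp, dq)] = score
    return hits


def candidate_ppis(d1: str, d2: str) -> Set[FrozenSet[str]]:
    """Possible PPIs mediated by the DDI d1-d2: known PPIs whose partners carry d1 and d2."""
    result: Set[FrozenSet[str]] = set()
    for pair in PPIS:
        members = sorted(pair)
        p, q = members[0], members[-1]  # a self-interaction has a single member
        dp = PROTEIN_DOMAINS.get(p, set())
        dq = PROTEIN_DOMAINS.get(q, set())
        if (d1 in dp and d2 in dq) or (d2 in dp and d1 in dq):
            result.add(pair)
    return result


print(candidate_ddis("PROT_A", "PROT_B"))  # {('DOM_2', 'DOM_3'): 0.41}
print(candidate_ppis("DOM_1", "DOM_1"))
```

Storing DDIs as unordered frozensets means the lookup does not depend on which protein of a pair is listed first.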
Performance evaluation of reliability scoring scheme in IDDI
Figure 4(a) shows the ROC curves of IDDI and UniDomInt, each with its own DDI datasets and scoring scheme. The ROC curves show that IDDI has a higher true positive rate than UniDomInt at the same false positive rate, indicating that IDDI is better at filtering reliable DDIs. UniDomInt combines only 8 predicted datasets, and its reliability score depends heavily on ME owing to ME's overwhelming accuracy, which hinders accurate measurement of the reliability scores. IDDI, in contrast, includes additional predicted datasets such as TW, DIPD and Top-down, whose confidence is as high as that of ME. This prevents the reliability scores from focusing excessively on a single predicted dataset. For example, the interaction between the signal peptide binding domain (PF02978) and SRP19 protein (PF01922), a known DDI found in iPfam, is detected only by the p-value method among the 13,166 predicted DDIs of UniDomInt and receives a low reliability score of 0.0548. This score places it at 87.3% from the top of all predicted interactions, so UniDomInt would treat it as likely to be a false positive. In contrast, IDDI has additional prediction evidence for the same DDI from the updated version of InterDom and from DIPD, APMM and Top-down, which are not included in UniDomInt. IDDI's reliability score for this DDI ranks in the top 0.42% of all predicted interactions, indicating a high probability of being a true positive.
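A ROC comparison of two scoring schemes can be sketched as follows. The sketch assumes a benchmark in which each predicted DDI carries a reliability score and a boolean label (for instance, whether it is supported by a 3D structure); which gold standard underlies Figure 4 is not stated in this section, so the labels and both score lists are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points traced by lowering the reliability cut-off one distinct score at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative DDI")
    ranked = sorted(zip(scores, labels), key=lambda item: item[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for idx, (score, is_true) in enumerate(ranked):
        if is_true:
            tp += 1
        else:
            fp += 1
        # Emit a point only after every DDI tied at this score has been counted.
        if idx + 1 == len(ranked) or ranked[idx + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical benchmark: the same five DDIs scored by two schemes.
labels = [True, True, False, True, False]
scheme_a = [0.9, 0.7, 0.6, 0.5, 0.1]
scheme_b = [0.9, 0.4, 0.8, 0.3, 0.2]
print(auc(roc_points(scheme_a, labels)), auc(roc_points(scheme_b, labels)))
```

A curve that lies higher at every false positive rate, as described for IDDI, also yields a larger area under the curve; here scheme_a scores 5/6 and scheme_b 4/6.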
Figure 4(b) compares the IDDI and UniDomInt scoring schemes applied to the same DDI datasets in IDDI. The result shows that the additional factors in our new scoring scheme are effective at filtering reliable interactions. UniDomInt considers only the confidence level of the predicted datasets when assigning a reliability score to each DDI. As a result, DDIs found in the same dataset cannot be compared, because they all receive the same score. This also leads to overestimated reliability scores: DDIs in a high-confidence dataset are assigned high reliability scores even when their low prediction scores suggest they are likely to be false positives.
We tested the average accuracy obtained at different reliability score cut-offs in IDDI. A cut-off of 0.329 gave the highest accuracy, 0.98. For reference, the cut-off yielding an accuracy of 0.90 was 0.102, and 21,027 DDIs passed this cut-off. End users can choose a cut-off that suits their research purpose; DDIs passing a high-accuracy cut-off are expected to give more reliable results.
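A cut-off sweep of this kind can be sketched as below. It assumes that "accuracy" at a cut-off means the fraction of DDIs scoring at or above the cut-off that are labelled true, and that candidate cut-offs are the observed scores; both are our assumptions, and the scores and labels are hypothetical rather than IDDI data.

```python
from typing import Optional, Sequence, Tuple


def accuracy_at(cutoff: float, scores: Sequence[float], labels: Sequence[bool]) -> Optional[float]:
    """Fraction of DDIs scoring at or above the cut-off that are true; None if none pass."""
    kept = [is_true for s, is_true in zip(scores, labels) if s >= cutoff]
    return sum(kept) / len(kept) if kept else None


def best_cutoff(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Observed score with the highest accuracy as a cut-off; ties keep the lower cut-off."""
    if not scores:
        raise ValueError("no scores given")
    best: Optional[Tuple[float, float]] = None
    for c in sorted(set(scores)):
        acc = accuracy_at(c, scores, labels)  # never None here: c itself passes
        if best is None or acc > best[1]:
            best = (c, acc)
    return best


def lowest_cutoff_for(
    target: float, scores: Sequence[float], labels: Sequence[bool]
) -> Optional[Tuple[float, int]]:
    """Lowest observed cut-off reaching the target accuracy, with the number of DDIs it keeps."""
    for c in sorted(set(scores)):
        acc = accuracy_at(c, scores, labels)
        if acc is not None and acc >= target:
            return c, sum(1 for s in scores if s >= c)
    return None


# Hypothetical reliability scores and true/false labels for five DDIs.
scores = [0.05, 0.10, 0.20, 0.35, 0.50]
labels = [False, True, False, True, True]
print(best_cutoff(scores, labels))             # (0.35, 1.0)
print(lowest_cutoff_for(0.7, scores, labels))  # (0.1, 4)
```

The second function mirrors the trade-off in the text: a lower cut-off keeps more DDIs at a somewhat lower accuracy.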
Comparison of PPI coverage rates
We compared the PPI coverage rates of 3D structure-based DDIs, DOMINE, UniDomInt and IDDI using the binary PPIs in ComBiCom. A PPI is considered covered when at least one DDI is found between the interacting proteins.
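The coverage criterion can be sketched directly: a PPI counts as covered if any domain of one partner has a DDI with any domain of the other. The proteins, domain assignments and the two DDI sets below are hypothetical; they only show how a larger DDI collection raises the coverage rate.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def is_covered(p: str, q: str, protein_domains: Dict[str, Set[str]],
               ddis: Set[FrozenSet[str]]) -> bool:
    """A PPI is covered when at least one DDI links a domain of p with a domain of q."""
    dp = protein_domains.get(p, set())
    dq = protein_domains.get(q, set())
    return any(frozenset((a, b)) in ddis for a in dp for b in dq)


def coverage_rate(ppis: Iterable[Tuple[str, str]], protein_domains: Dict[str, Set[str]],
                  ddis: Set[FrozenSet[str]]) -> float:
    """Fraction of binary PPIs covered by the given DDI set."""
    pairs = list(ppis)
    if not pairs:
        raise ValueError("no PPIs given")
    return sum(is_covered(p, q, protein_domains, ddis) for p, q in pairs) / len(pairs)


# Hypothetical proteins, domain assignments and PPIs.
protein_domains = {"P1": {"A", "B"}, "P2": {"C"}, "P3": {"A"}, "P4": {"D"}}
ppis = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4")]

structure_only = {frozenset(("A", "A"))}
integrated = structure_only | {frozenset(("B", "C")), frozenset(("C", "D"))}

print(coverage_rate(ppis, protein_domains, structure_only))  # 0.25
print(coverage_rate(ppis, protein_domains, integrated))      # 0.75
```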
Comparison of PPI coverage rates in different DDI databases (categories: 3D-structure based only; PPI with DDI; PPI without DDI)
Functionality of the integrated interaction analysis system
Example of integrated interaction analysis
IDDI provides a comprehensive search service for exploring the relationships between proteins and domains. Its filtering function can be used to prioritize a list of proteins and thus select genes for further study. In this section, we present an example analysis of p53 interaction partners. Figures 5(a) and 5(d) illustrate the integrated analysis of specific PPIs and DDIs of the p53 protein. Interaction partners of p53 can be retrieved with protein search, and the list of partners is subdivided by domain interaction. Among them, partners that have a domain interaction with the transactivation domain of p53 can be selected with the filtering option (Figure 5(a); only part of the list is shown here). With this filter, 11 interaction partners were selected from the 355 partners of p53. The specified DDI can be investigated further through the "DDI" link, as shown in Figure 5(d). In this example, the summary shows that the Mage domain of Necdin interacts with the transactivation domain of p53. Indeed, the mechanism by which these two domains interact in the function of the two proteins has been elucidated by detailed experimental work . The investigation can be extended to the other selected proteins, or to other proteins containing the Mage domain, using our system. As this example shows, our system enables more sophisticated and efficient investigation of protein interactions and their functions by providing an integrated analysis of DDIs and PPIs.
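The filtering step in this example, keeping only the partners whose domains interact with one chosen domain of the query protein, can be sketched as below. It is a minimal illustration under stated assumptions, not IDDI's code: the domain names are plain labels rather than Pfam IDs, and apart from the transactivation-Mage pair taken from the example above, the partners and DDIs are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set


def partners_via_domain(
    query: str,
    query_domain: str,
    partners: Iterable[str],
    protein_domains: Dict[str, Set[str]],
    ddis: Set[FrozenSet[str]],
) -> Dict[str, Set[str]]:
    """Keep the interaction partners that carry a domain with a known DDI to query_domain.

    Returns partner -> the partner's domains that interact with query_domain.
    """
    if query_domain not in protein_domains.get(query, set()):
        raise ValueError(f"{query} has no domain {query_domain}")
    selected: Dict[str, Set[str]] = {}
    for partner in partners:
        hits = {d for d in protein_domains.get(partner, set())
                if frozenset((query_domain, d)) in ddis}
        if hits:
            selected[partner] = hits
    return selected


# Toy data: labels are plain names, not UniProt accessions or Pfam IDs.
protein_domains = {
    "p53": {"transactivation", "DNA-binding"},
    "Necdin": {"Mage"},
    "PARTNER_X": {"DOMAIN_X"},  # hypothetical partner
    "PARTNER_Y": {"DOMAIN_Y"},  # hypothetical partner
}
ddis = {
    frozenset(("transactivation", "Mage")),
    frozenset(("DNA-binding", "DOMAIN_X")),  # hypothetical DDI
}
partners = ["Necdin", "PARTNER_X", "PARTNER_Y"]
print(partners_via_domain("p53", "transactivation", partners, protein_domains, ddis))
# {'Necdin': {'Mage'}}
```

Returning the matching partner domains, not just the partner names, keeps the link to the DDI that justifies each selection.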
We proposed a new unified interaction analysis system, IDDI, which enables comprehensive analysis of protein and domain interactions and their interconnections. The large increase in the total number of DDIs yields high interconnectivity between DDIs and PPIs, and the advanced scoring scheme substantially improves the reliability assessment of the integrated DDIs. Furthermore, IDDI provides a convenient interface for investigating protein interactions together with detailed domain interactions. IDDI will be a valuable resource for in-depth studies of interaction mechanisms and thereby for deriving the functional implications of interacting proteins.
This work was partially supported by the Converging Research Center Program of the Ministry of Education, Science and Technology of Korea (Project No. 2011K000864) and by the National Research Foundation of Korea (NRF) grant (No. 2011-0018264) by the Ministry of Education, Science and Technology of Korea.
This article has been published as part of Proteome Science Volume 10 Supplement 1, 2012: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine 2011: Proteome Science. The full contents of the supplement are available online at http://www.proteomesci.com/supplements/10/S1.
- Pawson T, Nash P: Assembly of cell regulatory systems through protein interaction domains. Science 2003,300(5618):445–52. 10.1126/science.1083653
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000,28(1):235–42. 10.1093/nar/28.1.235
- Stein A, Russell RB, Aloy P: 3did: interacting protein domains of known three-dimensional structure. Nucleic Acids Res 2005, 33: D413–417.
- Finn RD, Marshall M, Bateman A: iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics 2005,21(3):410–412. 10.1093/bioinformatics/bti011
- Bordner AJ, Gorin AA: Comprehensive inventory of protein complexes in the Protein Data Bank from consistent classification of interfaces. BMC Bioinformatics 2008, 9: 234. 10.1186/1471-2105-9-234
- Schuster-Böckler B, Bateman A: Reuse of structural domain-domain interactions in protein networks. BMC Bioinformatics 2007, 8: 259. 10.1186/1471-2105-8-259
- Wang RS, Wang Y, Wu LY, Zhang XS, Chen L: Analysis on multi-domain cooperation for predicting protein-protein interactions. BMC Bioinformatics 2007, 8: 391. 10.1186/1471-2105-8-391
- Luo Q, Pagel P, Vilne B, Frishman D: DIMA 3.0: Domain Interaction Map. Nucleic Acids Res 2010, 39: D724–9.
- Zhao XM, Chen L, Aihara K: A discriminative approach for identifying domain-domain interactions from protein-protein interactions. Proteins 2010,78(5):1243–53. 10.1002/prot.22643
- Singhal M, Resat H: A domain-based approach to predict protein-protein interactions. BMC Bioinformatics 2007, 8: 199. 10.1186/1471-2105-8-199
- Riley R, Lee C, Sabatti C, Eisenberg D: Inferring protein domain interactions from databases of interacting proteins. Genome Biol 2005,6(10):R89. 10.1186/gb-2005-6-10-r89
- Guimarães KS, Przytycka TM: Interrogating domain-domain interactions with parsimony based approaches. BMC Bioinformatics 2008, 9: 171. 10.1186/1471-2105-9-171
- Rhodes DR, Tomlins SA, Varambally S, Mahavisno V, Barrette T, Kalyana-Sundaram S, Ghosh D, Pandey A, Chinnaiyan AM: Probabilistic model of the human protein-protein interaction network. Nat Biotechnol 2005,23(8):951–959. 10.1038/nbt1103
- Ng SK, Zhang Z, Tan SH: Integrative approach for computationally inferring protein domain interactions. Bioinformatics 2003,19(8):923–929. 10.1093/bioinformatics/btg118
- Schelhorn SE, Lengauer T, Albrecht M: An integrative approach for predicting interactions of protein regions. Bioinformatics 2008,24(16):i35–41. 10.1093/bioinformatics/btn290
- Liu M, Chen XW, Jothi R: Knowledge-guided inference of domain-domain interactions from incomplete protein-protein interaction networks. Bioinformatics 2009,25(19):2492–2499. 10.1093/bioinformatics/btp480
- Liu Y, Liu N, Zhao H: Inferring protein-protein interactions through high-throughput interaction data from diverse organisms. Bioinformatics 2005,21(15):3279–3285. 10.1093/bioinformatics/bti492
- Lee H, Deng M, Sun F, Chen T: An integrated approach to the prediction of domain-domain interactions. BMC Bioinformatics 2006, 7: 269. 10.1186/1471-2105-7-269
- Guimarães KS, Jothi R, Zotenko E, Przytycka TM: Predicting domain-domain interactions using a parsimony approach. Genome Biol 2006,7(11):R104. 10.1186/gb-2006-7-11-r104
- Nye TM, Berzuini C, Gilks WR, Babu MM, Teichmann SA: Statistical analysis of domains in interacting protein pairs. Bioinformatics 2005,21(7):993–1001. 10.1093/bioinformatics/bti086
- Jothi R, Cherukuri PF, Tasneem A, Przytycka TM: Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. J Mol Biol 2006,362(4):861–875. 10.1016/j.jmb.2006.07.072
- Chen XW, Liu M: Prediction of protein-protein interactions using random decision forest framework. Bioinformatics 2005,21(24):4394–4400. 10.1093/bioinformatics/bti721
- Guda C, King BR, Pal LR, Guda P: A top-down approach to infer and compare domain-domain interactions across eight model organisms. PLoS One 2009,4(3):e5096. 10.1371/journal.pone.0005096
- Wuchty S: Topology and weights in a protein domain interaction network--a novel way to predict protein interactions. BMC Genomics 2006, 7: 122. 10.1186/1471-2164-7-122
- Ng SK, Zhang Z, Tan SH: Integrative approach for computationally inferring protein domain interactions. Bioinformatics 2003,19(8):923–929. 10.1093/bioinformatics/btg118
- Yellaboina S, Tasneem A, Zaykin DV, Raghavachari B, Jothi R: DOMINE: a comprehensive collection of known and predicted domain-domain interactions. Nucleic Acids Res 2010, 39: D730–5.
- Björkholm P, Sonnhammer EL: Comparative analysis and unification of domain-domain interaction networks. Bioinformatics 2009,25(22):3020–3025. 10.1093/bioinformatics/btp522
- Han Y, Sun CH, Kim MS, Yi GS: Combined Database System for Binary Protein Interaction and Co-complex Association. Proceedings of the International Association of Computer Science and Information Technology 2009, 17–20.
- Taniura H, Matsumoto K, Yoshikawa K: Physical and functional interactions of neuronal growth suppressor necdin with p53. J Biol Chem 1999,274(23):16242–16248. 10.1074/jbc.274.23.16242
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.