Surface antigens and potential virulence factors from parasites detected by comparative genomics of perfect amino acid repeats
- Niklaus Fankhauser1,
- Tien-Minh Nguyen-Ha1,
- Joël Adler2 and
- Pascal Mäser1
https://doi.org/10.1186/1477-5956-5-20
© Fankhauser et al; licensee BioMed Central Ltd. 2007
- Received: 28 June 2007
- Accepted: 20 December 2007
- Published: 20 December 2007
Abstract
Background
Many parasitic organisms, eukaryotes as well as bacteria, possess surface antigens with amino acid repeats. Making up the interface between host and pathogen, such repetitive proteins may be virulence factors involved in immune evasion or cytoadherence. They find immunological applications in serodiagnostics and vaccine development. Here we use proteins that contain perfect repeats as a basis for comparative genomics between parasitic and free-living organisms.
Results
We have developed Reptile (http://reptile.unibe.ch), a program for proteome-wide probabilistic description of perfect repeats in proteins. Parasite proteomes exhibited a large variance regarding the proportion of repeat-containing proteins. Interestingly, there was a good correlation between the percentage of highly repetitive proteins and mean protein length in parasite proteomes, but not at all in the proteomes of free-living eukaryotes. Reptile combined with programs for the prediction of transmembrane domains and GPI-anchoring resulted in an effective tool for in silico identification of potential surface antigens and virulence factors from parasites.
Conclusion
Systematic surveys for perfect amino acid repeats allowed basic comparisons between free-living and parasitic organisms that were directly applicable to predict proteins of serological and parasitological importance. An on-line tool is available at http://genomics.unibe.ch/dora.
Keywords
- Virulence Factor
- Amino Acid Repeat
- Variable Surface Glycoprotein
- Perfect Repeat
- Amino Acid Subsequence
Background
Repetitive amino acid subsequences in polypeptides are of interest regarding the function as well as the evolution of proteins. At least 14% of all proteins contain internal repeats, the proportion being somewhat lower in prokaryote and higher in eukaryote proteomes [1]. Multicellular eukaryotes, in particular, possess numerous adhesion proteins of repetitive nature in the extracellular matrix. Other highly repetitive proteins are those of the cytoskeleton [1, 2]. Typical motifs involved in protein-protein interaction are the tetratricopeptide repeat (34 aa), armadillo (47 aa), ankyrin (33 aa), and the leucine-rich repeat (about 20 aa) [3]. Several tools are available for the detection of repeats in proteins: Radar [4, 5], Repro [6, 7], Internal Repeats Finder [8, 9], TRIPS [10, 11], Trust [12, 13], Davros [14], RepSeq [15, 16], REP [2, 17], Repper [18, 19], and ProtRepeatsDB [20, 21]. Apart from simply counting repetitive occurrences of amino acid subsequences in polypeptides, repeats can be detected by self-alignment or – if they are evenly distributed – by Fourier transform. Here we present Reptile, a simple tool for quantitative proteome-wide surveys of perfect amino acid repeats, and its use for the prediction of surface antigens and virulence factors from parasites.
Pathogenic bacteria as well as eukaryotic parasites often possess surface proteins of repetitive nature, presumably to protect themselves against their hosts' defence responses [22, 23]. Examples are the procyclins of the sleeping sickness parasite Trypanosoma brucei with over twenty Glu-Pro repeats (EP-type) or five Gly-Pro-Glu-Glu-Thr repeats (GPEET-type) [24, 25], the circumsporozoite protein of the malaria parasite Plasmodium falciparum with around forty Asn-Ala-Asn-Pro (NANP) repeats [26], or SdrE from Staphylococcus aureus, a determinant of staphylococcal sepsis with 83 Ser-Glu (SE) repeats [27]. Such short, perfect repeats are usually very immunogenic. They may serve for serological diagnostics – the presence of repeat-directed antibodies in the serum indicating infection – as is the case with PfHRP2 [28], a malaria antigen with over fifty Ala-His-His (AHH) repeats. Repetitive amino acid sequences also find applications in synthetic vaccines [29]. Furthermore, repeat-containing proteins from parasites may be virulence factors involved in immune evasion, cytoadherence, stress resistance, or biofilm formation [30–35]. The completion of the genome sequencing projects for P. falciparum, T. brucei, Leishmania major, and other parasites now permits systematic approaches to repeat-containing proteins. Here we identify all proteins from pathogens that contain repeats and use them for comparative genomics between parasitic and non-parasitic species. All data and programs are freely accessible via the world-wide web.
Results and Discussion
Probabilistic description of perfect repeats with Reptile
Where r is the length of the repeat and f_i is the frequency, in the corresponding proteome or in the set of sequences submitted by the user, of the amino acid at position i of the repeat. AM is symmetric about zero, with negative values indicating that a repeat consists predominantly of rare amino acids (and vice versa). Reptile runs on-line [37] and accepts batch input of up to 50,000 sequences in any of the commonly used formats.
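To make the idea of counting perfect repeats by hashing concrete, the following Python sketch indexes every subsequence of a fixed length in a hash table and counts its non-overlapping occurrences. It is only an illustration under stated assumptions: the function name `count_perfect_repeats`, the non-overlap rule, and the toy sequence are hypothetical, it is not Reptile's C++ implementation, and it does not compute a P-value.

```python
from collections import defaultdict


def count_perfect_repeats(seq, unit_len, min_copies=2):
    """Return {unit: copies} for every subsequence of length unit_len that
    occurs at least min_copies times without overlapping itself.

    Hypothetical helper for illustration; not the Reptile algorithm itself.
    """
    # Hash table mapping each subsequence (unit) to its start positions.
    positions = defaultdict(list)
    for start in range(len(seq) - unit_len + 1):
        positions[seq[start:start + unit_len]].append(start)

    repeats = {}
    for unit, starts in positions.items():
        copies, next_free = 0, 0
        for start in starts:  # start positions are already in ascending order
            if start >= next_free:  # skip copies that overlap the previous one
                copies += 1
                next_free = start + unit_len
        if copies >= min_copies:
            repeats[unit] = copies
    return repeats


# Toy sequence (made up): five GPEET units flanked by unrelated residues.
toy = "MK" + "GPEET" * 5 + "LL"
result = count_perfect_repeats(toy, 5)
print(result["GPEET"])  # 5
```

Note that the rotations of a tandem unit (here PEETG, EETGP, and so on) are also reported, each with one copy fewer; a real tool would need a rule for choosing among them.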
Comparison of programs for the detection of repetitive subsequences in proteins
Program | Method used | Detection of degenerate repeats | Calculation of a P-value | Analysis of whole proteomes | % hits found in Swiss-Prot | Detection of T. brucei procyclin 1 |
---|---|---|---|---|---|---|
Reptile | Hashing | No | Yes | Yes | 15 | Yes |
REP [2] | Profiles of known repeats | Yes | No | No | 1.1 | No |
RADAR [5] | Alignment | Yes | No | No | 28 | Yes |
REPRO [7] | Alignment | Yes | No | No | n.a. | Yes |
Internal Repeats finder [8] | Alignment | Yes | Yes | No | 14 | No |
TRIPS [10] | Fourier transform | Yes | No | No | 12 | No |
RepSeq [15] | Hashing | Yes | Yes | Yes | n.a. | Yes |
ProtRepeatsDB [20] | Mixed | Yes | Yes | Yes | n.a. | Yes |
Repper [18] | Fourier transform | Yes | No | No | n.a. | No |
Genome-wide surveys for highly repetitive proteins
Eukaryotic proteomes analyzed
Organism | Kingdom | Type (F, free-living; P, parasitic) | Proteins |
---|---|---|---|
Homo sapiens | Metazoa | F | 38220 |
Mus musculus | Metazoa | F | 35593 |
Arabidopsis thaliana | Viridiplantae | F | 34554 |
Caenorhabditis elegans | Metazoa | F | 22431 |
Drosophila melanogaster | Metazoa | F | 16239 |
Brachydanio rerio | Metazoa | F | 15647 |
Anopheles gambiae | Metazoa | F | 13486 |
Dictyostelium discoideum | Protozoa | F | 13017 |
Rattus norvegicus | Metazoa | F | 11987 |
Yarrowia lipolytica | Fungi | F | 6525 |
Saccharomyces cerevisiae | Fungi | F | 5810 |
Kluyveromyces lactis | Fungi | F | 5326 |
Schizosaccharomyces pombe | Fungi | F | 5009 |
Entamoeba histolytica | Protozoa | P | 9772 |
Giardia duodenalis | Protozoa | P | 9646 |
Trypanosoma brucei | Protozoa | P | 9210 |
Leishmania major | Protozoa | P | 8010 |
Cryptococcus neoformans | Fungi | P | 6569 |
Plasmodium falciparum | Protozoa | P | 5283 |
Theileria parva | Protozoa | P | 4071 |
Cryptosporidium hominis | Protozoa | P | 3886 |
Theileria annulata | Protozoa | P | 3790 |
Encephalitozoon cuniculi | Fungi | P | 1909 |
Comparative genomics of repeat-containing proteins. Double logarithmic plot of the percentage of highly repetitive (P < 10⁻¹⁰) proteins vs. mean protein length of eukaryotic proteomes. Ag, A. gambiae; At, A. thaliana; Br, B. rerio; Ce, C. elegans; Dd, D. discoideum; Dm, D. melanogaster; Hs, H. sapiens; Kl, K. lactis; Mm, M. musculus; Rn, R. norvegicus; Sc, S. cerevisiae; Sp, S. pombe; Yl, Y. lipolytica; Ch, C. hominis; Cn, C. neoformans; Ec, E. cuniculi; Eh, E. histolytica; Gd, G. duodenalis; Lm, L. major; Pf, P. falciparum; Ta, T. annulata; Tb, T. brucei; Tp, T. parva; rS, Spearman coefficient.
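The two quantities plotted per proteome, the percentage of highly repetitive proteins and the mean protein length, can be computed as in the following Python sketch. It assumes, purely for illustration, that each protein is represented by a hypothetical pair of its length and the P-value of its best repeat (or `None` if no repeat was found); the function name and the toy numbers are not from the paper.

```python
def proteome_summary(proteins, p_threshold=1e-10):
    """Return (percent highly repetitive, mean protein length).

    proteins: list of (length, best_repeat_p) tuples, where best_repeat_p is
    None for proteins without any detected repeat (hypothetical layout).
    """
    n = len(proteins)
    mean_length = sum(length for length, _ in proteins) / n
    repetitive = sum(
        1 for _, p in proteins if p is not None and p < p_threshold
    )
    return 100.0 * repetitive / n, mean_length


# Made-up toy proteome of four proteins.
toy_proteome = [(100, None), (300, 1e-20), (200, 1e-5), (400, 1e-11)]
percent, mean_len = proteome_summary(toy_proteome)
print(percent, mean_len)  # 50.0 250.0
```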
A selection of the most repetitive proteins from pathogens
Name, accession | Sp | L | Repeat | pP |
---|---|---|---|---|
Hypothetical protein, Tb927.1.1740 | Tb | 7154 | 132 × LAEESQQHTARSEADIDE | 2806 |
Gene 11-1 protein*, Q8I6U6 | Pf | 10589 | 967 × EEV | 2457 |
Conserved protein, LmjF29.0110 | Lm | 3418 | 146 × AEEQARR | 1080 |
Proteophosphoglycan-like, LmjF35.0550 | Lm | 2425 | 105 × SSSSSAPSA | 1052 |
Putative antigen*, Tb04.29M18.750 | Tb | 4455 | 66 × NEQYETLQRTNAA | 958 |
Gb4*, Tb09.160.1200 | Tb | 8214 | 35 × VVIIDCRLGSLLIDYKVI | 701 |
Hypothetical protein, Chro.50162 | Ch | 1589 | 84 × KKDAP | 407 |
Hypothetical protein, Q8I455 | Pf | 2349 | 67 × LKEEER | 389 |
Interspersed repeat antigen*, Q8I486 | Pf | 1720 | 67 × QEPVT | 313 |
Putative antigen 332*, Q8IHN3 | Pf | 5507 | 144 × EEI | 274 |
Cell wall surface anchor family, Q97P71 | Spn | 4776 | 1074 × SAS | 3418 |
Cell surface SD repeat protein, Q88XB6 | Lpl | 3360 | 796 × DS | 1619 |
Hypothetical protein, Q8E473 | Sag | 1310 | 106 × TSAS | 447 |
Putative peptidoglycan-bound, Q8Y697 | Lmo | 903 | 78 × ADADA | 403 |
Avirulence protein, Q5GYF3 | Xor | 1790 | 20 × ETVQRLLPVLCQDHGLTP | 401 |
Serine/threonine-rich antigen, Q99QY4 | Sau | 2271 | 163 × STS | 391 |
PE-PGRS family, PG54_MYCTU | Mtu | 1901 | 136 × GAG | 326 |
Structural toxin RtxA, Q5X7A6 | Lpn | 7679 | 29 × RFEDDGPVV | 247 |
Ice nucleation protein, Q8PD38 | Xca | 1333 | 52 × GYGST | 242 |
PPE family protein, Q6MX44 | Mtu | 3300 | 95 × NTG | 184 |
Amino acid composition of the repeats
Amino acid composition of the repeats. For each amino acid, the frequency in the repeats of P < 10⁻¹⁰ is plotted vs. its frequency in the remainder of the proteome (rS, Spearman coefficient). Data are pooled for bacteria (n = 193) and eukaryotes (n = 49). The small diamonds at 0.05 mark the expected frequency for random distribution; the diagonal represents equal frequency in the repeats and in the remainder of the respective proteome. Complete data tables including standard deviation are provided as a supplementary file [Additional file 1].
Potential N-glycosylation sites in the repeats. The percentage of asparagines that lie in the N-glycosylation consensus (Asn-X-Ser/Thr, where X is any residue except Pro) is plotted for repeats of P < 10⁻¹⁰ and for the remainders of the respective proteomes. Bars indicate the median. The organism in which 30% of the asparagines in the repeats lie in the N-glycosylation consensus is T. brucei.
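The percentage described in this caption can be illustrated with a short Python sketch that counts asparagines followed by any residue except proline and then by serine or threonine. The function name `percent_asn_in_sequon` and the test sequence are hypothetical, and the sketch makes no claim about how the authors implemented the calculation.

```python
import re

# N-glycosylation consensus: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead does not consume the following residues, so overlapping
# sites such as the two in "NNSS" are each counted.
SEQUON = re.compile(r"N(?=[^P][ST])")


def percent_asn_in_sequon(seq):
    """Percentage of asparagines in seq that lie in the consensus."""
    total_asn = seq.count("N")
    if total_asn == 0:
        return 0.0
    return 100.0 * len(SEQUON.findall(seq)) / total_asn


# Made-up sequence: Asn at positions 0 and 7 are in consensus, 4 and 8 are not.
print(percent_asn_in_sequon("NGSANPTNNT"))  # 50.0
```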
Prediction of repetitive surface antigens
Repetitive membrane proteins of P. falciparum (top) and T. brucei (bottom)
Name, accession | Topology | Repeat | pP |
---|---|---|---|
Hypothetical protein, Q8IJ50 | GPI | 16 × EESHNFYNPTH | 184 |
Circumsporozoite protein, Q7K740 | GPI | 38 × ANPN | 145 |
Merozoite surface protein 8, Q8I476 | GPI | 32 × NN | 29 |
Liver stage antigen, Q8IJ44 | 1 TM | 45 × AKEKLQEQQSDLEQER | 839 |
Erythrocyte membrane protein 3, O96124 | 1 TM | 61 × QQNTGLKNTP | 665 |
Trophozoite antigen, Q8IFL9 | 1 TM | 60 × NHKSD | 287 |
Glycophorin-binding protein, Q8I6U8 | 1 TM | 10 × DPEGQIMREYAADPEYRKHL | 213 |
MAEBL, Q8IHP3 | 1 TM | 19 × EEKKKADELKK | 213 |
PF70 exoantigen, Q8IK15 | 3 TM | 8 × TKKPSKYTMNLDSPLLKGSS | 165 |
MESA, Q8I492 | 1 TM | 94 × KE | 97 |
PfEMP1, Q8I519 | 1 TM | 16 × GGGGGS | 77 |
RESA, Q8IHN1 | 1 TM | 33 × EEN | 63 |
Hypothetical protein, Tb11.02.2360 | GPI | 11 × TAVTDVNDNNSANTSNEDE | 229 |
Hypothetical protein, Tb11.1550 | GPI | 12 × IIAHYC | 68 |
Procyclin (EP-type), Tb10.6k15.0020 | GPI | 29 × PE | 46 |
Hypothetical protein, Tb927.7.360 | GPI | 3 × DKEKTERTEVEEVPKKDPEG | 45 |
Procyclin (GPEET-type), Tb927.6.510 | GPI | 6 × EETGP | 24 |
VSG, Tb10.v4.0209 | GPI | 19 × AA | 13 |
CRAM, Tb10.6k15.3510 | 1 TM | 80 × ITGDCNETDDC | 1050 |
Hypothetical protein, Tb927.3.5530 | 2 TM | 49 × RLRAEEE | 337 |
Hypothetical protein, Tb10.61.0660 | 3 TM | 12 × NEEVPAGVSARRGGVAMSF | 241 |
Procyclic surface glycoprotein, Tb10.26.0790 | 2 TM | 5 × YGQPPPPQ | 31 |
Invariant surface glycoprotein, Tb927.5.350 | 1 TM | 18 × EA | 12 |
Flowchart to Dora, database of repetitive antigens. Reptile, Phobius [42], and GPI-SOM [43] are integrated into an automated pipeline for the classification of proteins (top). The data are stored in a database that is accessible on-line [44] via the depicted interface (bottom). This allows user-defined Boolean queries for repeat-containing surface proteins.
New specific and robust tests are urgently needed for the diagnosis of sleeping sickness, malaria, tuberculosis, and other neglected diseases [45, 46]. PCR not being applicable in the field, serology (i.e. the detection of parasite-specific antibodies) remains the principal method of detection for many tropical diseases. Dora provides a convenient portal for identification of candidate antigens for serological tests. In addition, it can be helpful for the selection of vaccine candidates. Dora returns the hits in Fasta format, which is suitable for subsequent bioinformatic analyses.
Conclusion
Reptile's simple algorithm allows large-scale and quantitative description of perfect amino acid repeats. Originally designed to scan parasite proteomes for potential antigens and virulence factors, Reptile detects any protein of repetitive nature and thereby complements existing tools which work by self-alignment. Parasite proteomes vary considerably regarding the proportion of repetitive proteins, in contrast to those of free-living eukaryotes which all contain around 3% highly repetitive (P < 10⁻¹⁰) proteins. Furthermore, the proportion of highly repetitive proteins correlates with mean protein length in parasites but not in the proteomes of free-living eukaryotes, illustrating the importance of amino acid repeats for parasites.
Scanning the predicted proteomes of parasites for amino acid repeats returned a large number of interesting proteins. Particularly useful was the combination of Reptile with prediction of glycosylation sites, export signals, transmembrane domains and GPI-anchor attachment sites, carried out on more than one million proteins from 242 different organisms. All data are accessible on-line via Dora, database of repetitive antigens. The approach was validated against T. brucei and P. falciparum, where a Dora search returned the known surface antigens, virulence factors, and vaccine candidates plus many new, so far uncharacterized proteins.
Methods
Proteome files
Predicted proteome files were obtained from the Integr8 database of the European Bioinformatics Institute [47]. The download was automated with a Python script that periodically checks for newly available proteomes and for updates to previously downloaded proteome files.
Statistics
Statistical tests were performed with Prism 4.0 (GraphPad Software). Since the percentages of repeats in proteomes as well as the frequencies of amino acids were not normally distributed, non-parametric tests were used: Mann-Whitney test [48], Wilcoxon signed rank test [49], and Spearman correlation [50].
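For readers unfamiliar with the Spearman coefficient, the following Python sketch shows how it can be computed as the Pearson correlation of ranks, with tied values receiving their average rank. The analyses in this paper were done with Prism; the function names and the toy data below are hypothetical and serve only to illustrate the statistic.

```python
def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if all values in x or in y are identical.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up data: a monotonic but non-linear relationship gives a coefficient of 1.
print(spearman([1, 2, 3, 4], [1, 4, 9, 100]))
```

Because only ranks enter the calculation, the coefficient does not depend on the data being normally distributed, which is the reason given above for choosing non-parametric tests.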
Reptile
The repeat detection algorithm is described under Results. The program is written in C++ and the web interface in Perl CGI. Reptile uses sreformat from the HMMer package [51] to convert different input formats (Fasta, GenBank, EMBL, Swiss-Prot, PIR, GCG) to Fasta. Reptile runs on a VMware (virtual infrastructure) server. Availability and requirements:
- Project name: Reptile
- Project home page: http://genomics.unibe.ch/software/reptile.tar.gz
- Operating systems: Linux, Unix
- Programming language: C++
- Licence: GNU GPL
Dora
A Python script periodically runs Reptile, GPI-SOM, and Phobius over all new or updated proteome files of Integr8. The results are stored in a MySQL database. For the sake of simplicity, only the repeat with the lowest P-value is stored for each protein. A Perl script is used to interconvert Fasta format and SQL. The web interface of Dora is written in PHP. The database and all the programs run on the VMware server of the Informatics Services of the University of Bern.
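The storage rule and the kind of Boolean query described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in for MySQL, and the table layout, column names, function names, and toy records are all hypothetical rather than Dora's actual schema.

```python
import sqlite3


def keep_best_repeat(hits):
    """Keep, per accession, only the repeat with the lowest P-value.

    hits: iterable of (accession, unit, copies, p_value) tuples (hypothetical).
    """
    best = {}
    for accession, unit, copies, p_value in hits:
        if accession not in best or p_value < best[accession][2]:
            best[accession] = (unit, copies, p_value)
    return best


def surface_candidates(conn, max_p=1e-10):
    """Accessions of repetitive proteins with a GPI anchor or a TM domain."""
    rows = conn.execute(
        "SELECT accession FROM protein "
        "WHERE p_value < ? AND (gpi_anchor = 1 OR tm_domains >= 1) "
        "ORDER BY p_value",
        (max_p,),
    )
    return [row[0] for row in rows]


# In-memory database with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein (accession TEXT PRIMARY KEY, repeat_unit TEXT, "
    "copies INTEGER, p_value REAL, tm_domains INTEGER, gpi_anchor INTEGER)"
)

# Made-up repeat hits and topology predictions: (TM domains, GPI flag).
hits = [("toyA", "EP", 20, 1e-40), ("toyA", "PE", 19, 1e-38),
        ("toyB", "KE", 30, 1e-25), ("toyC", "AG", 3, 1e-2)]
topology = {"toyA": (0, 1), "toyB": (1, 0), "toyC": (0, 0)}

for acc, (unit, copies, p) in keep_best_repeat(hits).items():
    tm, gpi = topology[acc]
    conn.execute("INSERT INTO protein VALUES (?, ?, ?, ?, ?, ?)",
                 (acc, unit, copies, p, tm, gpi))

print(surface_candidates(conn))  # ['toyA', 'toyB']
```

Storing a single best repeat per protein keeps the table at one row per accession, so a query of this kind needs no join or grouping.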
Declarations
Acknowledgements
We wish to thank the Informatikdienste of the University of Bern for resources and support. This work was supported by the Swiss National Science Foundation, the Roche Research Foundation, and Biomedizin-Naturwissenschaft-Forschung Bern (TN).
References
- Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D: A census of protein repeats. J Mol Biol 1999, 293: 151–160. doi:10.1006/jmbi.1999.3136
- Andrade MA, Ponting CP, Gibson TJ, Bork P: Homology-based method for identification of protein repeats using statistical significance estimates. J Mol Biol 2000, 298: 521–537. doi:10.1006/jmbi.2000.3684
- Andrade MA, Perez-Iratxeta C, Ponting CP: Protein repeats: structures, functions, and evolution. J Struct Biol 2001, 134: 117–131. doi:10.1006/jsbi.2001.4392
- Heger A, Holm L: Rapid automatic detection and alignment of repeats in protein sequences. Proteins 2000, 41: 224–237. doi:10.1002/1097-0134(20001101)41:2<224::AID-PROT70>3.0.CO;2-Z
- Radar [http://www.ebi.ac.uk/Radar]
- Repro [http://ibivu.cs.vu.nl/programs/reprowww]
- George RA, Heringa J: The REPRO server: finding protein internal sequence repeats through the Web. Trends Biochem Sci 2000, 25: 515–517. doi:10.1016/S0968-0004(00)01643-1
- Pellegrini M, Marcotte EM, Yeates TO: A fast algorithm for genome-wide analysis of proteins with repeated sequences. Proteins 1999, 35: 440–446. doi:10.1002/(SICI)1097-0134(19990601)35:4<440::AID-PROT7>3.0.CO;2-Y
- Internal Repeats Finder [http://nihserver.mbi.ucla.edu/Repeats]
- Katti MV, Sami-Subbu R, Ranjekar PK, Gupta VS: Amino acid repeat patterns in protein sequences: their diversity and structural-functional implications. Protein Sci 2000, 9: 1203–1209.
- TRIPS [http://www.ncl-india.org/trips]
- Szklarczyk R, Heringa J: Tracking repeats using significance and transitivity. Bioinformatics 2004, 20 Suppl 1: I311-I317. doi:10.1093/bioinformatics/bth911
- Trust [http://zeus.cs.vu.nl/programs/trustwww/]
- Murray KB, Taylor WR, Thornton JM: Toward the detection and validation of repeats in protein structure. Proteins 2004, 57: 365–380. doi:10.1002/prot.20202
- Depledge DP, Lower RP, Smith DF: RepSeq--a database of amino acid repeats present in lower eukaryotic pathogens. BMC Bioinformatics 2007, 8: 122. doi:10.1186/1471-2105-8-122
- RepSeq [http://www.repseq.gugbe.com]
- REP [http://www.embl-heidelberg.de/~andrade/papers/rep/search.html]
- Gruber M, Soding J, Lupas AN: REPPER--repeats and their periodicities in fibrous proteins. Nucleic Acids Res 2005, 33: W239–43. doi:10.1093/nar/gki405
- Repper [http://toolkit.tuebingen.mpg.de/repper]
- Kalita MK, Ramasamy G, Duraisamy S, Chauhan VS, Gupta D: ProtRepeatsDB: a database of amino acid repeats in genomes. BMC Bioinformatics 2006, 7: 336. doi:10.1186/1471-2105-7-336
- ProtRepeatsDB [http://bioinfo.icgeb.res.in/repeats]
- Leid RW, Suquet CM, Tanigoshi L: Parasite defense mechanisms for evasion of host attack; a review. Vet Parasitol 1987, 25: 147–162. doi:10.1016/0304-4017(87)90101-4
- Kedzierski L, Montgomery J, Curtis J, Handman E: Leucine-rich repeats in host-pathogen interactions. Arch Immunol Ther Exp (Warsz) 2004, 52: 104–112.
- Roditi I, Carrington M, Turner M: Expression of a polypeptide containing a dipeptide repeat is confined to the insect stage of Trypanosoma brucei. Nature 1987, 325: 272–274. doi:10.1038/325272a0
- Vassella E, Acosta-Serrano A, Studer E, Lee SH, Englund PT, Roditi I: Multiple procyclin isoforms are expressed differentially during the development of insect forms of Trypanosoma brucei. J Mol Biol 2001, 312: 597–607. doi:10.1006/jmbi.2001.5004
- Enea V, Ellis J, Zavala F, Arnot DE, Asavanich A, Masuda A, Quakyi I, Nussenzweig RS: DNA cloning of Plasmodium falciparum circumsporozoite gene: amino acid sequence of repetitive epitope. Science 1984, 225: 628–630. doi:10.1126/science.6204384
- Peacock SJ, Moore CE, Justice A, Kantzanou M, Story L, Mackie K, O'Neill G, Day NP: Virulent combinations of adhesin and toxin genes in natural populations of Staphylococcus aureus. Infect Immun 2002, 70: 4987–4996. doi:10.1128/IAI.70.9.4987-4996.2002
- Beadle C, Long GW, Weiss WR, McElroy PD, Maret SM, Oloo AJ, Hoffman SL: Diagnosis of malaria by detection of Plasmodium falciparum HRP-2 antigen with a rapid dipstick antigen-capture assay. Lancet 1994, 343: 564–568. doi:10.1016/S0140-6736(94)91520-2
- Snounou G, Renia L: The vaccine is dead--long live the vaccine. Trends Parasitol 2007, 23: 129–132. doi:10.1016/j.pt.2007.02.001
- Ansari FA, Kumar N, Bala Subramanyam M, Gnanamani M, Ramachandran S: MAAP: Malarial adhesins and adhesin-like proteins predictor. Proteins 2007.
- Samen U, Eikmanns BJ, Reinscheid DJ, Borges F: The surface protein Srr-1 of Streptococcus agalactiae binds human keratin 4 and promotes adherence to epithelial HEp-2 cells. Infect Immun 2007.
- Brinster S, Posteraro B, Bierne H, Alberti A, Makhzami S, Sanguinetti M, Serror P: Enterococcal leucine-rich repeat-containing protein involved in virulence and host inflammatory response. Infect Immun 2007, 75: 4463–4471. doi:10.1128/IAI.00279-07
- Tomley FM, Billington KJ, Bumstead JM, Clark JD, Monaghan P: EtMIC4: a microneme protein from Eimeria tenella that contains tandem arrays of epidermal growth factor-like repeats and thrombospondin type-I repeats. Int J Parasitol 2001, 31: 1303–1310. doi:10.1016/S0020-7519(01)00255-7
- de la Fuente J, Garcia-Garcia JC, Barbet AF, Blouin EF, Kocan KM: Adhesion of outer membrane proteins containing tandem repeats of Anaplasma and Ehrlichia species (Rickettsiales: Anaplasmataceae) to tick cells. Vet Microbiol 2004, 98: 313–322. doi:10.1016/j.vetmic.2003.11.001
- Cherny I, Rockah L, Levy-Nissenbaum O, Gophna U, Ron EZ, Gazit E: The formation of Escherichia coli curli amyloid fibrils is mediated by prion-like peptide repeats. J Mol Biol 2005, 352: 245–252.
- Inclusion-exclusion principle [http://en.wikipedia.org/wiki/Inclusion-exclusion_principle]
- Reptile [http://reptile.unibe.ch]
- Katinka MD, Duprat S, Cornillot E, Metenier G, Thomarat F, Prensier G, Barbe V, Peyretaillade E, Brottier P, Wincker P, Delbac F, El Alaoui H, Peyret P, Saurin W, Gouy M, Weissenbach J, Vivares CP: Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 2001, 414: 450–453. doi:10.1038/35106579
- Petersen C, Nelson R, Leech J, Jensen J, Wollish W, Scherf A: The gene product of the Plasmodium falciparum 11.1 locus is a protein larger than one megadalton. Mol Biochem Parasitol 1990, 42: 189–195. doi:10.1016/0166-6851(90)90161-E
- Ilg T: Proteophosphoglycans of Leishmania. Parasitol Today 2000, 16: 489–497. doi:10.1016/S0169-4758(00)01791-9
- Campuzano J, Aguilar D, Arriaga K, Leon JC, Salas-Rangel LP, Gonzalez-y-Merchand J, Hernandez-Pando R, Espitia C: The PGRS domain of Mycobacterium tuberculosis PE_PGRS Rv1759c antigen is an efficient subunit vaccine to prevent reactivation in a murine model of chronic tuberculosis. Vaccine 2007, 25: 3722–3729. doi:10.1016/j.vaccine.2006.12.042
- Kall L, Krogh A, Sonnhammer EL: A combined transmembrane topology and signal peptide prediction method. J Mol Biol 2004, 338: 1027–1036. doi:10.1016/j.jmb.2004.03.016
- Fankhauser N, Maser P: Identification of GPI anchor attachment signals by a Kohonen self-organizing map. Bioinformatics 2005, 21: 1846–1852. doi:10.1093/bioinformatics/bti299
- Dora [http://genomics.unibe.ch/dora]
- Usdin M, Guillerm M, Chirac P: Neglected tests for neglected patients. Nature 2006, 441: 283–284. doi:10.1038/441283a
- FIND diagnostics [http://www.finddiagnostics.org]
- Pruess M, Kersey P, Apweiler R: The Integr8 project--a resource for genomic and proteomic data. In Silico Biol 2005, 5: 179–185. [ftp://ftp.ebi.ac.uk/pub/databases/integr8/]
- Mann Whitney test [http://en.wikipedia.org/wiki/Mann-Whitney_U]
- Wilcoxon test [http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test]
- Spearman correlation [http://en.wikipedia.org/wiki/Spearman_correlation]
- Eddy SR: Multiple alignment using hidden Markov models. Proc Int Conf Intell Syst Mol Biol 1995, 3: 114–120.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.