iCitrus database. Three major sources were used to create the iCitrus dataset: UC Riverside HarvEST:citrus (C46 assembly), NCBI/citrus/unigenes, and NCBI/citrus/proteins (see text). The first two datasets were translated in all six reading frames and split at stop codons, and the resulting sequences shorter than 50 amino acids were removed. These translations were combined with the NCBI protein sequences, and all three protein sequence sets were then clustered at 100% identity using CD-HIT (http://bioinformatics.ljcrf.edu/cd-hi/), meaning that any sequence that aligned with 100% identity to a longer sequence in the combined set was removed. All remaining sequences were then searched with BLAST against TAIR proteins and, separately, against the subset of NCBI's nr database belonging to taxa within Viridiplantae, to collect GO terms and descriptive annotations for the clustered sequences.
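The translation and redundancy-removal steps described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names (`six_frame_peptides`, `remove_contained`) are hypothetical, the standard genetic code is assumed, and the substring check is only a simplified stand-in for CD-HIT clustering at 100% identity, which is alignment-based and far more efficient on large sets.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built from the
# conventional TCAG ordering of the three codon positions.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon; '*' marks a stop codon.

    Codons containing ambiguous bases (for example N) become 'X'.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_peptides(seq, min_len=50):
    """Yield peptides from all six reading frames, split at stop codons.

    Peptides shorter than min_len amino acids are discarded.
    """
    for strand in (seq.upper(), reverse_complement(seq)):
        for offset in range(3):
            for peptide in translate(strand[offset:]).split("*"):
                if len(peptide) >= min_len:
                    yield peptide


def remove_contained(peptides):
    """Drop peptides that occur exactly inside a longer retained peptide.

    Simplified stand-in for clustering at 100% identity: sequences are
    visited longest first, and each is kept only if it is not a substring
    of a sequence that has already been kept.
    """
    kept = []
    for pep in sorted(set(peptides), key=lambda p: (-len(p), p)):
        if not any(pep in longer for longer in kept):
            kept.append(pep)
    return kept
```

In use, the peptides yielded by `six_frame_peptides` for each nucleotide record would be pooled with the existing protein sequences and passed to `remove_contained` before any annotation step. The quadratic substring scan is acceptable only for small illustrative inputs.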