Figure 3
From: Protein sequence classification using feature hashing

The distribution of the variable length k-grams. The variable length k-grams in each protein data set, (a) non-plant, (b) plant, and (c) psortNeg, follow a Zipf distribution, i.e., only very few k-grams occur with high frequency, whereas the majority of them occur very rarely.
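
As an illustration of the kind of rank-frequency data such a figure plots, here is a minimal sketch that counts variable length k-grams and sorts them by frequency. It assumes that "variable length" means all contiguous substrings of length 1 up to some maximum; the function names, the `max_k` parameter, and the toy sequences are hypothetical and are not taken from the article or its data sets.

```python
from collections import Counter


def variable_length_kgrams(sequence, max_k=3):
    """Yield every contiguous substring of length 1..max_k (assumed definition)."""
    for k in range(1, max_k + 1):
        for i in range(len(sequence) - k + 1):
            yield sequence[i:i + k]


def rank_frequency(sequences, max_k=3):
    """Return (k-gram, count) pairs sorted from most to least frequent."""
    counts = Counter()
    for seq in sequences:
        counts.update(variable_length_kgrams(seq, max_k))
    return counts.most_common()


# Toy amino-acid strings, invented for illustration only.
toy_sequences = ["MKVLAAGIVG", "MKKLLAAGLV", "MAAGIVKKLL"]
ranked = rank_frequency(toy_sequences, max_k=3)

# Print the head of the rank-frequency list; on real data, plotting
# rank against count on log-log axes is a common way to eyeball a
# Zipf-like shape.
for rank, (kgram, count) in enumerate(ranked[:5], start=1):
    print(rank, kgram, count)
```

On a real protein data set, a Zipf-like distribution would appear as a short head of very frequent k-grams followed by a long tail of k-grams seen only once or twice.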