Figure 3 | Proteome Science

Figure 3

From: Protein sequence classification using feature hashing

The distribution of the variable-length k-grams. In each protein data set, (a) non-plant, (b) plant, and (c) psortNeg, the variable-length k-grams follow a Zipf distribution: only a few k-grams occur with high frequency, whereas the majority occur very rarely.
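
To make the rank-frequency idea behind the caption concrete, the following is a minimal sketch, not the authors' code. It assumes that "variable length k-grams" means all contiguous substrings of length 1 up to some maximum `max_k`; the function names, the value `max_k=3`, and the three toy sequences are hypothetical and chosen only for illustration. It counts the k-grams and lists them from most to least frequent, which is the ordering a Zipf-style plot is drawn from.

```python
from collections import Counter


def variable_length_kgram_counts(sequences, max_k=3):
    """Count every contiguous k-gram with 1 <= k <= max_k (assumed definition)."""
    counts = Counter()
    for seq in sequences:
        for k in range(1, max_k + 1):
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
    return counts


def rank_frequency(counts):
    """Return (rank, k-gram, frequency) tuples, most frequent first."""
    return [
        (rank, kgram, freq)
        for rank, (kgram, freq) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical toy sequences, not taken from the non-plant, plant, or psortNeg data.
toy_sequences = ["MKVLAAGIVG", "MKKLLAAGLA", "MAAGIVKLLA"]

counts = variable_length_kgram_counts(toy_sequences, max_k=3)
table = rank_frequency(counts)

# Show the head and the tail of the ranking.
for rank, kgram, freq in table[:5] + table[-5:]:
    print(rank, kgram, freq)
```

On real data sets, plotting frequency against rank on log-log axes would be the usual way to check for the roughly linear trend that a Zipf distribution implies; a toy input this small can only hint at the long tail of rare k-grams.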
