
Table 1 Comparison of fixed-length with variable-length k-gram representations.

From: Protein sequence classification using feature hashing

Bag of fixed- or variable-length k-grams   Non-plant accuracy (%)   # features
1-grams                                    71.21                    20
2-grams                                    70.85                    400
3-grams                                    79.80                    7999
4-grams                                    79.03                    146598
(1-2)-grams                                70.56                    420
(1-3)-grams                                79.69                    8419
(1-4)-grams                                82.83                    155017
(1-5)-grams                                80.09                    950849

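Performance of SVM classifiers trained with feature hashing on the non-plant data set, using the fixed-length 1-, 2-, 3-, and 4-gram representations and the variable-length (1-2)-, (1-3)-, (1-4)-, and (1-5)-gram representations. A (1-n)-gram representation contains all k-grams with 1 ≤ k ≤ n, so its feature count is the sum of the corresponding fixed-length counts (e.g. 420 = 20 + 400). The hash size is set to 2^22 (4,194,304).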
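The footnote describes hashing bags of k-grams into a fixed-size vector before SVM training. The following is a minimal sketch of that representation step, not the authors' implementation. It assumes CRC32 (Python's `zlib.crc32`) as the hash function and plain unsigned occurrence counts. The names `kgrams`, `hashed_features`, and the toy sequence are hypothetical, and the table does not specify the hash function or the SVM set-up.

```python
import zlib
from collections import Counter
from typing import Iterator

HASH_SIZE = 2 ** 22  # number of hash buckets, matching the table footnote


def kgrams(seq: str, k_min: int, k_max: int) -> Iterator[str]:
    """Yield every contiguous k-gram of seq for k = k_min..k_max.

    k_min == k_max gives a fixed-length bag (e.g. 3-grams);
    k_min = 1, k_max = n gives a variable-length (1-n)-gram bag.
    """
    for k in range(k_min, k_max + 1):
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k]


def hashed_features(seq: str, k_min: int, k_max: int,
                    hash_size: int = HASH_SIZE) -> Counter:
    """Return a sparse vector {bucket index: count} built by the hashing trick.

    Assumption (hypothetical choice): CRC32 as the hash function and
    unsigned counts; the hash used in the paper is not stated here.
    """
    vec: Counter = Counter()
    for gram in kgrams(seq, k_min, k_max):
        # CRC32 is deterministic across runs, unlike Python's salted hash() for str.
        bucket = zlib.crc32(gram.encode("ascii")) % hash_size
        vec[bucket] += 1
    return vec


if __name__ == "__main__":
    protein = "MKVLAAGIVGLLLA"  # hypothetical toy sequence
    fixed_3 = hashed_features(protein, 3, 3)  # analogous to the "3-grams" row
    var_1_4 = hashed_features(protein, 1, 4)  # analogous to the "(1-4)-grams" row
    print("3-grams: buckets used =", len(fixed_3), "total k-grams =", sum(fixed_3.values()))
    print("(1-4)-grams: buckets used =", len(var_1_4), "total k-grams =", sum(var_1_4.values()))
```

Because a variable-length bag is just the union of the fixed-length bags, the hashed (1-n)-gram vector is the element-wise sum of the hashed 1-, 2-, ..., n-gram vectors, mirroring the additive feature counts in the table. As a possible alternative, scikit-learn's `HashingVectorizer(analyzer="char", ngram_range=(1, 4), n_features=2**22, alternate_sign=False, norm=None)` hashes character n-grams (with MurmurHash3) into a sparse matrix that could be passed to a linear SVM such as `LinearSVC`.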