Читать книгу Informatics and Machine Learning - Stephen Winters-Hilt - Страница 30

2.2 Counting, the Enumeration Problem, and Statistics

In the example in the previous section we left off with counts on all 4096 hexamers seen in a given genome. If we go from counts on substrings of length 6 to substrings of length 30 we run into a problem – there are now a million million million (10¹⁸) substrings to get counts on. No genome is even remotely this large, so when getting counts on substrings in this situation most substring counts will necessarily be zero. Due to the large number of substrings, this is often referred to as “the enumeration problem,” but since counts need only be maintained that are nonzero, we are bounded by genome size, for which there is no enumeration problem. The main mechanism for capturing count information on substrings without dedicated (array) memory, is by use of associative memory constructs, such as the hash variable, and this technique is employed in the code examples.

Подняться наверх