Friday, February 25, 2011

Notes on Suffix tree

# A very nice tutorial on Suffix tree <http://www.cise.ufl.edu/~sahni/dsaaj/enrich/c16/suffix.htm


[Applications]
longest repeated substring problem:
The longest repeated substring problem is finding the longest substring of a string that occurs at least twice. This problem can be solved in linear time and space by building a suffix tree for the string, and finding the deepest internal node in the tree. The string spelled by the edges from the root to such a node is a longest repeated substring. The problem of finding the longest substring with at least k occurrences can be found by first preprocessing the tree to count the number of leaf descendants for each internal node, and then finding the deepest node with at least k descendants.

Solutions from Sartaj Sahni

Find the longest substring of S that appears at least m > 1 times. This query can be answered in O(|S|) time in the following way:
(a) Traverse the suffix tree labeling the branch nodes with the sum of the label lengths from the root and also with the number of information nodes in the subtrie.
(b) Traverse the suffix tree visiting branch nodes with information node count >= m. Determine the visited branch node with longest label length.


Note that step (a) needs to be done only once. Following this, we can do step (b) for as many values of m as is desired. Also, note that when m = 2 we can avoid determining the number of information nodes in subtries. In a compressed trie, every subtrie rooted at a branch node has at least two information nodes in it.

[Some Notes]
An example of Surfix Tree:

 
Figure 4 A more humane drawing of a suffix tree

A fundamental observation used when searching for a pattern P in a string S is that P appears in S (i.e., P is a substring of S) iff P is a prefix of some suffix of S
Let's take the above figure as an example, 
if follow A->C->H : pe is the prefix of eper

[Some Related Codes]
--

♥ ¸¸.•*¨*•♫♪♪♫•*¨*•.¸¸♥

No comments:

Post a Comment