Naive Implementation

This is also known as a keyword tree, but it accomplishes the same task.

Naive

Disadvantages:

  • Not compressed, traversals are redundant in some parts.

  • Use of literal characters rather than indices, so its also not space efficient.

  • O(n2)O(n^2) time complexity. For construction of single word of size P|P|, every nn suffixes of quantity PP are inserted character by character by traversing down through the tree.

Advantages:

  • Less error prone

  • Simple to implement

  • Works

#define ALPHA_SIZE 26
#define EOC '$'

struct Node {
    Node() {
        // assumption: the number unique characters does not 
        // extend beyond the english alphabet
        children.reserve(ALPHA_SIZE);
    }

    unordered_map<char, Node*> children;
    int index;
};

class Suffix_Tree {
public:
    Suffix_Tree(const string word) : _word(word) {
        _root = new Node();
        const size_t size = word.size();

        for (int i = 0; i < size - 1; i++) {
            _insert(word.substr(i, size - i));
        }
    }

    void bfs_pretty_print() 
    {
        if (!_root) return;

        queue<Node*> q;
        q.push(_root);
        q.push(nullptr);

        while (!q.empty()) {
            Node* cur_top = q.front();
            q.pop();

            if (cur_top) {
                for (auto i : cur_top->children) {
                    cout << i.first << ' ';
                    if (i.first == EOC) {
                        cout << "(" << _word.size() - cur_top->index << ") ";
                    }
                }

                for (auto i : cur_top->children) 
                    q.push(i.second);
            }
            else {
                if (!q.empty())
                    q.push(nullptr);
                cout << endl;
            }
        }
    }

private:
    Node *_root;
    const string _word;

    void _insert(const string name) 
    {
        int depth_count = 0;
        Node *iter = _root;
        for (char c : name) {
            depth_count++;

            if (iter->children.find(c) == iter->children.end()) {
                Node *newNode = new Node();
                iter->children.insert({ c, newNode });

                if (c == EOC) {
                    // making 1-indexed
                    iter->index = depth_count + 1;
                }
            }

            iter = iter->children.find(c)->second;

        }
    }
};

int main() {
    string test_str = "aabaacab$";
    Suffix_Tree tree(test_str);
    tree.bfs_pretty_print();
}

Output

a b c
a b c a $ (8) a
b c a $ (7) a a b
a a a b c $ (6)
a b c $ (5) a
c $ (4) a b
a b $ (3)
b $ (2)
$ (1)

Implementation Details

How this algorithm works can be summarized in the _insert method. Basically, we are recreating a trie with a few minor changes such as the special character delimiter, and insertion of additional suffixes.

A trie is essentially a n-ary tree. Since we know that each node that carries a character is guarenteed to be unique, we can substitute a vector with an unordered_map. For each word that gets inserted, its characters guide its traversal down the tree. If no node is found, a new one gets created. The leaf nodes, indicated by the special character delimiter carry the occurrences of a suffix.

Last updated