Longest Common Substring

Problem: Given two strings ‘X’ and ‘Y’, return the longest common substring.

Input : X = "abcdxyz", y = "xyzabcd"
Output : "abcd"

Input : X = "zxabcdezy", y = "yzabcdezx"
Output : "abcdez"

Solution

Assume strings s1 = "abab" and s2 = "baba"

S1 + S2 = a b a b $ b a b a #
Index   = 1 2 3 4 5 6 7 8 9 0

First build a generalized suffix tree that includes both strings. For this, you should add a unique delimiter to each string before making the generalized tree.

Then search for the deepest internal node in the tree that contains both delimiters within its subtree. The path label along this tree will be LCS. In our case, our tree returned two valid solutions: 'aba' or 'bab'.

Why this works

A internal node in a suffix tree carries that property that it shares a common suffix from its parent. Therefore the deeper we move down the tree, the longer the path label gets are is shared by at least on other suffix. Searching for both delimiters in the tree is a way of confirm that both strings have visited the internal node, and are therefore both guaranteed to share a common substring.

Last updated