Disjoint Sets

Disjoint sets are data structures that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. That means given a set of numbers, the challenge is to efficiently union groups together. One popular way this is implemented is using union by rank and path compression.

Applications:

Kruskel's algorithm
Finding a cycle in an undirected graph

The API:

/*
 * makes a new set by creating a new element with a unique id, a rank of 0,
 * and a parent pointer to itself
 */ 
 MakeSet(x)


/*
 * follows the chain of parent pointers from x upwards through the tree until 
 * an element is reached whose parent is itself
 */
 Find(x)


/*
 * uses Find to determine the roots of the trees x and y belong to
 */
 Union(x, y)

Complexity: O(N) space for N elements, and O(M*alpha(N)) where alpha(N) is the Inverse Ackermann function which grows very slowly. We can argue that alpha(N) <= 4, and hence the time complexity becomes O(4M) or O(M)

The Algorithm:

Disjoint_set(7)

We begin with a set of numbers, or id's. Calling MakeSet(x) on these will form each of the node into their own discrete set with a rank of 0 (in red).

Union(1,2)

Union groups two sets together. It does so by first performing a find operation on both 1 and 2 to identify it's root. In this process it also does path compression, but more on that later. Since these nodes are already the root, it simply returns it's id. Next we look at the rank of both nodes. Both are 0, so selecting either of the nodes as the representative of the set will do. For clarity, I always chose the smaller of the two. Since 1 is smaller, the parent pointer of the second set points to the representative node of the first set. Additionally, since both representative nodes have the same rank, the rank of the new merged representative node gets incremented by 1.

Union(2,3)

Both 2 and 3 are in their own sets, so this union will merge these groups together. Find 2 reveals the root node 1, and Find 3 returns 3. Since 1 has a higher rank, the parent pointer of 3 will point to the representative node of the first set (1). Since the ranks are different, the new merged representative node does not get incremented. Notice how the rank is simply the depth of the tree.Union(4,5)

This process was the same as performing Union(1,2) from earlier.

Union(6,7)

Same as before.Union(5,6)

Find 5 returns 4, and Find 6 returns 6.The ranks are identical, so we select the smaller of the two (5), and then have 6's parent point to 5. Finally, the rank of 5 gets incremented.Union(3,7)

Find 3 returns 1. Find 7 returns 4, but in the process of doing so it performed path compression, so that 7's new parent will now point to 4. Node 4 has a higher rank, so 1 will point to 4, and 4's rank does not get incremented.Find(2)

At this point, all the nodes are unioned into a single set. However, for the purposes of demonstration, finding 2 will perform path compression to it's parent 4.

Discussion

How is path compression beneficial?

It optimizes the performance for future findand unionoperations as it shorted the number of chained find operations in order to reach the root. Additionally, since this computation is done regardless, there is no harm is modifying these pointers since the entire purpose of findis to reach the root anyway.

What effect does prioritizing the higher rank have of two nodes have?

Minimizing the depth of the unioned tree, and hence few chained findoperations. Rank effectively represents the depth of the tree, and so if a higher ranked tree pointed to a lower ranked tree. The merged tree would have a smaller width. However, if we maximize the width of the tree (by merging a lower rank to a higher rank), the depth becomes smaller, and so findwill perform fewer calls.

Why does the merging on two trees with the same rank increment the depth of the new tree?

Because they have the same depth, and one pointer must be modified to point to the new root, hence increasing the depth of the new tree by one.

PreviousImplementation 2 NextImplementation

Last updated 4 years ago