Disjoint Sets
The Dynamic Connectivity Problem
Goal: Given a series of pairwise integers connectedness declarations, determine if two integers (or items) are connected. Two operations:
connect(p, q)
: Connect items p and q.isConnected(p, q)
: Are p and q connected?
1 | connect(0, 1) |
The Disjoint Sets ADT
1 | public interface DisjointSets { |
Goal: Design an efficient DisjointSets
implementation.
- Number of elements can be huge.
- Number of method calls can be huge.
- Calls to methods may be interspersed (e.g. can’t assume that we stop getting connect calls after some point).
The Naive Approach
- Connecting two things: Record every single connecting line in some data structure.
- Checking connectedness: Do some sort of (??) iteration over the lines to see if one thing can be reached from the other.
A Better Approach: Connected Components
- Rather than manually writing out every single connecting line, record the sets that something belongs to.
A connected component is a maximal set of items that are mutually connected.
- Naive approach: Record every single connecting line somehow.
- Better approach: Model connectedness in terms of sets.
- How things are connected isn’t something we need to know.
{ 0, 1, 2, 4 }, {3, 5}, {6}
Quick Find
Challenge: Pick Data Structures to Support Tracking of Sets
Map<Integer, SetGuy>
, where the Integer is the item in questionmap.get(1)
← return a reference to {0, 1, 2, 4}- A map from integers is also an array.
List<HashSet>
, where we move things around between sets- Requires iterating through all the hashsets to find something.
One natural choice: int[]
where ith entry gives set number of item i.
1 | public class QuickFindDS implements DisjointSets { |
Performance Summary
Implementation | constructor | connect | isConnected |
---|---|---|---|
QuickFindDS |
QuickFindDS
is too slow: Connecting two items takes N time.
Quick Union
Improving the Connect Operation
- Approach zero: Represent everything as boxes and lines. This was overkill.
- Approach one: Represent everything as connected components. Represented connected components as an array.
- Approach two: We’re still going to stick with connected components, but will represent connected components differently.
A hard question: How could we change our set representation so that combining two sets into their union requires changing one value?
Possibly harder question:
- Knowing that we only need to support the
union/belongs
operations, how can we represent a set such that the set union operation is very fast? - Idea: Assign each node a parent (instead of an id).
- An innocuous sounding, seemingly arbitrary solution.
- Unlocks a pretty amazing universe of math that we won’t discuss.
connect(5, 2)
How do we do this?
- If you’re not sure where to start, consider: why can’t we just set
id[5]
to 2? - Find the boss of 5. ← this isn’t free!
- Find the boss of 2.
- Change the value of the boss of 5 to boss of 2?
- make
root(5)
into a child ofroot()
. - What are the potential performance issues with this approach?
- Tree can get too tall!
If we always connect the first items’ tree below the second item’s tree, we can end up with a tree of height M after M operations:
1 |
|
Performance Summary
Implementation | constructor | connect | isConnected |
---|---|---|---|
QuickFindDS |
QuickUnion defect: Trees can get tall. Results in potentially even worse performance than QuickFind
if tree is imbalanced.
Weighted Quick Union
Modify quick-union to avoid tall trees.
- Track tree size (number of elements).
- New rule: Always link root of smaller tree to larger tree.
Minimal changes needed:
- Use
parent[]
array as before, but also addsize[]
array. isConnected(int p, int q)
requires no changes.connect(int p, int q)
needs two changes:- Link root of smaller tree to larger tree.
- Update size[] array.
Now the connect method looks like:
1 | public void connect(int p, int q) { |
As before, connect
and isConnected
require time proportional to depth of the items involved.
Max depth of any item:
Very brief proof for the curious (not covered in lecture):
- Depth of an element x increases only when tree T1 that contains x is linked below some other tree T2.
- The size of the tree at least doubles since weight(T2) ≥ weight(T1).
- Tree containing x doubles at most log N times.
Performance Summary
Implementation | constructor | connect | isConnected |
---|---|---|---|
QuickFindDS |
By tweaking QuickUnionDS
we’ve achieved logarithmic time performance.
- Fast enough for most (all?) practical problems.
Performing M operations on a DisjointSet object with N elements:
- Runtime goes from to
- For and , time to run goes from 30 years to 6 seconds.
- Key point: Good data structure unlocks solutions to problems that could otherwise not be solved!
- … pretty good for most problems, but could we do better?