The Dynamic Connectivity Problem

Goal: Given a series of pairwise integers connectedness declarations, determine if two integers (or items) are connected. Two operations:

  • connect(p, q): Connect items p and q.
  • isConnected(p, q): Are p and q connected?
1
2
3
4
5
6
7
8
9
10
connect(0, 1)
connect(1, 2)
connect(0, 4)
connect(3, 5)
isConnected(2, 4): true
isConnected(3, 0): false
connect(4, 2)
connect(4, 6)
connect(3, 6)
isConnected(3, 0): true

connect

The Disjoint Sets ADT

1
2
3
4
5
6
7
public interface DisjointSets {
/** Connects two items P and Q. */
void connect(int p, int q);

/** Checks to see if two items are connected. */
boolean isConnected(int p, int q);
}

Goal: Design an efficient DisjointSets implementation.

  • Number of elements NN can be huge.
  • Number of method calls MM can be huge.
  • Calls to methods may be interspersed (e.g. can’t assume that we stop getting connect calls after some point).

The Naive Approach

  • Connecting two things: Record every single connecting line in some data structure.
  • Checking connectedness: Do some sort of (??) iteration over the lines to see if one thing can be reached from the other.

A Better Approach: Connected Components

  • Rather than manually writing out every single connecting line, record the sets that something belongs to.

A connected component is a maximal set of items that are mutually connected.

  • Naive approach: Record every single connecting line somehow.
  • Better approach: Model connectedness in terms of sets.
    • How things are connected isn’t something we need to know.

{ 0, 1, 2, 4 }, {3, 5}, {6}

Quick Find

Challenge: Pick Data Structures to Support Tracking of Sets

  1. Map<Integer, SetGuy>, where the Integer is the item in question
    • map.get(1) ← return a reference to {0, 1, 2, 4}
    • A map from integers is also an array.
  2. List<HashSet>, where we move things around between sets
    • Requires iterating through all the hashsets to find something.

One natural choice: int[] where ith entry gives set number of item i.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
public class QuickFindDS implements DisjointSets {
private int[] id;

public boolean isConnected(int p, int q) { // Very fast: Two array accesses.
return id[p] == id[q];
}

public void connect(int p, int q) { // Relatively slow: N+2 to 2N+2 array accesses.
int pid = id[p];
int qid = id[q];
for (int i = 0; i < id.length; i++) {
if (id[i] == pid) {
id[i] = qid;
}
}...

public QuickFindDS(int N) {
id = new int[N];
for (int i = 0; i < N; i++)
id[i] = i;
}
}

Performance Summary

Implementation constructor connect isConnected
QuickFindDS Θ(N)Θ(N) Θ(N)Θ(N) Θ(1)Θ(1)

QuickFindDS is too slow: Connecting two items takes N time.

Quick Union

Improving the Connect Operation

  • Approach zero: Represent everything as boxes and lines. This was overkill.
  • Approach one: Represent everything as connected components. Represented connected components as an array.
  • Approach two: We’re still going to stick with connected components, but will represent connected components differently.

A hard question: How could we change our set representation so that combining two sets into their union requires changing one value?

Possibly harder question:

  • Knowing that we only need to support the union/belongs operations, how can we represent a set such that the set union operation is very fast?
  • Idea: Assign each node a parent (instead of an id).
    • An innocuous sounding, seemingly arbitrary solution.
    • Unlocks a pretty amazing universe of math that we won’t discuss.

connect(5, 2)

How do we do this?

  • If you’re not sure where to start, consider: why can’t we just set id[5] to 2?
  • Find the boss of 5. ← this isn’t free!
  • Find the boss of 2.
  • Change the value of the boss of 5 to boss of 2?
  • make root(5) into a child of root().
  • What are the potential performance issues with this approach?
    • Tree can get too tall!

If we always connect the first items’ tree below the second item’s tree, we can end up with a tree of height M after M operations:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

public class QuickUnionDS implements DisjointSets {
private int[] parent;

public QuickUnionDS(int N) {
parent = new int[N];
for (int i = 0; i < N; i++)
parent[i] = i;
}

private int find(int p) { // For N items, this means a worst case runtime of Θ(N).
while (p != parent[p])
p = parent[p];
return p;
}

public boolean isConnected(int p, int q) {
return find(p) == find(q);
}

public void connect(int p, int q) {
int i = find(p);
int j = find(q);
parent[i] = j;
}
}

Performance Summary

Implementation constructor connect isConnected
QuickFindDS Θ(N)Θ(N) O(N)O(N) O(N)O(N)

QuickUnion defect: Trees can get tall. Results in potentially even worse performance than QuickFind if tree is imbalanced.

Weighted Quick Union

Modify quick-union to avoid tall trees.

  • Track tree size (number of elements).
  • New rule: Always link root of smaller tree to larger tree.

Minimal changes needed:

  • Use parent[] array as before, but also add size[] array.
  • isConnected(int p, int q) requires no changes.
  • connect(int p, int q) needs two changes:
    • Link root of smaller tree to larger tree.
    • Update size[] array.

Now the connect method looks like:

1
2
3
4
5
6
7
8
9
10
11
12
public void connect(int p, int q) {
int i = find(p);
int j = find(q);
if (i == j) return;
if (size[i] < size[j]) {
parent[i] = j;
size[j] += size[i];
} else {
parent[j] = i;
size[i] += size[j];
}
}

As before, connect and isConnected require time proportional to depth of the items involved.

Max depth of any item: logNlog N

Very brief proof for the curious (not covered in lecture):

  • Depth of an element x increases only when tree T1 that contains x is linked below some other tree T2.
    • The size of the tree at least doubles since weight(T2) ≥ weight(T1).
    • Tree containing x doubles at most log N times.

Performance Summary

Implementation constructor connect isConnected
QuickFindDS Θ(N)Θ(N) O(logN)O(log N) O(logN)O(log N)

By tweaking QuickUnionDS we’ve achieved logarithmic time performance.

  • Fast enough for most (all?) practical problems.

Performing M operations on a DisjointSet object with N elements:

  • Runtime goes from O(MN)O(MN) to O(N+MlogN)O(N + M log N)
  • For N=109N = 10^9 and M=109M = 10^9, time to run goes from 30 years to 6 seconds.
    • Key point: Good data structure unlocks solutions to problems that could otherwise not be solved!
  • … pretty good for most problems, but could we do better?

Path Compression