Union-Find

Dynamic connectivity


Given a set of N objects.

  • Union command: connect two objects.
  • Find/connect query: is there a path connecting the two objects?

When programming, convenient to name objects 00 to N1N - 1.

  • Use integers as array index.
  • Suppress details not relevant to union-find.

We assume "is connected to" is an equivalence relation:

  • Reflexive
  • Symmetric
  • Transitive

Connect components. Maximal set of objects that are mutually connected.
Find query. Check if two objects are in the same component.
Union command. Replace components containing two objects with their union.

Goal. Design efficient data structure for union-find.

  • Number of objects NN can be huge.
  • Number of operations MM can be huge.
  • Find queries and union commands may be intermixed.

Public class UF.

1
2
3
4
5
UF(int N)
void union(int p, int q)
boolean connected(int p, int q)
int find(int p)
int count()

Dynamic-connectivity client.

  • Read in number of objects NN from standard input.
  • Repeat:
    • Read in pair of integers from standard input.
    • If they are not yet connected, connect them and print out pair.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
public static void main(String[] args)
{
int N = StdIn.readInt();
UF uf = new UF(N);
While (!StdIn.isEmpty())
{
int p = StdIn.readInt();
int q = StdIn.readInt();
if (!uf.connected(p, q))
{
uf.union(p, q);
StdOut.println(p + " " + q);
}
}
}

Quick-find [eager approach]


Data structure.

  • Integer array id[] of size NN.
  • Interpretation: pp and qq are connected iff they have the same idid.

Find. Check if pp and qq have the same idid.

Union. To merge components containing pp and qq, change all entries whose idid equals id[p] to id[q].

Java implementation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
public class QuickFindUF
{
private int[] id;

public QuickFindUF(int N)
{
id = new int[N];
for (int i = 0; i < N; i++)
id[i] = i;
}

public boolean connected(int p, int q)
{ return id[p] == id[q]; }

public void union(int p, int q)
{
int pid = id[p];
int qid = id[q];
for (int i = 0; i < id.length; i++)
if (id[i] == pid)
id[i] = qid;
}
}

Cost model. Number of array accesses (for read or write).

algorithm initialize union find
quick-find NN NN 11

Quadratic algorithms do not scale.
Rough standard (for now).

  • 10910^9 operations per second.
  • 10910^9 words of main memory.
  • Touch all words in approximately 1 second.

Huge problem for quick find.

  • 10910^9 union commands on 10910^9 objects.
  • Quick-find takes more than 101810^{18} operations.
  • 30+ years of computer time!

Quick-union [lazy approach]


Data structure.

  • Integer array id[] of size NN.
  • Interpretation: id[i] is parent of ii.
  • Root of ii is id[id[id[...id[i]...]]]. (Keep going until it doesn't change)

Find. Check if pp and qq have the same root.

Union. To merge components containing pp and qq, set the idid of p's root to the idid of q's root.

Java implementation.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
public class QuickUnionUF
{
private int[] id;

public QuickUnionUF(int N)
{
id = new int[N];
for (int i = 0; i < N; i++)
id[i] = i;
}

private int root(int i)
{
while (i != id[i])
i = id[i];
return i;
}

public boolean connected(int p, int q)
{ return root(p) == root(q); }

public void union(int p, int q)
{
int i = root(p);
int j = root(q);
id[i] = j;
}
}

Cost model. Number of array accesses (for read or write).

algorithm initialize union find
quick-find NN NN 11
quick-union NN NN NN

Quick-find defect.

  • Union too expensive (NN array accesses).
  • Trees are flat, but too expensive to keep them flat.

Quick-union defect.

  • Trees can get tall.
  • Find too expensive (could be NN array accesses).

Improvements


Improvement 1: weighting

Weighted quick-union. Always put the small tree down below.

  • Modify quick-union to avoid tall trees.
  • Keep track of size of each tree (number of objects).
  • Balance by linking root of smaller tree to root of larger tree.

Data structure. Same as quick-union, but maintain extra array sz[i] to count number of objects in the tree rooted at ii.

Find. Identical to quick-union.
return root(p) == root(q);

Union. Modify quick-union to:

  • Link root of smaller tree to root of larger tree.
  • Update the sz[] array.
1
2
3
4
5
int i = root(p);
int j = root(q);
if (i = j) return;
if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
else { id[j] = i; sz[i] += sz[j]; }

Running time.

  • Find: takes time proportional to depth of pp and qq.
  • Union: takes constant time, given roots.

Proposition. Depth of any node xx is at most lgNlg\enspace N. (lglg = base 2 logarithm)

Pf. When does depth of xx increase?
Increase by 11 when tree T1T_1 containing xx is merged in to another tree T2T_2.

  • The size of the tree containing xx at least doubles since T2T1|T_2| \ge |T_1|.
  • Size of tree containing xx can double at most lgNlg\enspace N times. (Start with 11, double lgNlg\enspace N times, and you will get a tree with NN nodes.)
algorithm initialize union find
quick-find NN NN 11
quick-union NN NN NN
weighted QU NN lgNlg\enspace N lgNlg\enspace N

Q. Stop at guaranteed acceptable performance?
A. No, easy to improve further.

Improvement 2: path compression

Quick union with path compression. Just after computing the root of pp, set the idid of each examined node to point to that root.

Two-pass implementation: add second loop to root() to set the id[] of each examined node to the root.

Simpler one-pass variant: Make every other node in path point to its grandparent (thereby halving path length).

1
2
3
4
5
6
7
8
9
private int root(int i)
{
while (i != id[i])
{
id[i] = id[id[i]];
i = id[i];
}
return i;
}

In practice. No reason not to! Keeps tree almost completely flat.

Proposition. Starting from an empty data structure, any sequence of MM union-find ops on NN objects makes c(N+MlgN)\ge c (N + M lg^* N) array accesses.

  • Analysis can be improved to N+Mα(M,N)N + M \alpha(M, N).
  • Simple algorithm with fascinating mathematics.

Iterate log function.

NN lgNlg^* N
11 00
22 11
44 22
1616 33
6553665536 44
2655362^{65536} 55

Linear-time algorithm for M union-find ops on N objects?

  • Cost within constant factor of reading in the data.
  • In theory, WQUPC is not quite linear.
  • In practice, WQUPC is linear.

Bottom line. Weighed quick union (with path compression) makes it possible to solve problems that could not otherwise be addressed.

M union-find operations on a set of N objects

algorithm worst-case time
quick-find MM NN
quick-union MM NN
weighted QU N+MlogNN + M log N
QU + path compression N+MlogNN + M log N
weighted QU + path compression N+MlgNN + M lg^* N

Ex. [10910^9 unions and finds with 10910^9 objects]

  • WQUPC reduces time from 30 years to 6 seconds.
  • Supercomputer won't help much; good algorithm enables solution.

Applications


Percolation

A model for many physical systems:

  • NN-by-NN grid of sites.
  • Each site is open with probability pp (or blocked with probability 1p1 - p).
  • System percolates iff top and bottom are connected by open sites.

Likelihood of percolation. Depends on site vacancy probability pp.

When NN is large, theory guarantees a sharp threshold pp*.

  • p>pp > p^*: almost certainly percolates.
  • p<pp < p^*: almost certainly does not percolate.

Q. What is the value of pp^*?

Monte Carlo simulation:

  • Initialize NN-by-NN whole grid to be blocked.
  • Declare random sites open until top connected to bottom.
  • Vacancy percentage estimates pp^*.

Q. How to check whether an N-by-N system percolates?

  • Create an object for each site and name them 00 to N21N^2 - 1.
  • Sites are in same component if connected by open sites.
  • Percolates iff any sites on bottom row is connected to site on top row. (Brute-force algorithm: N2N^2 calls to connected())

Clever trick. Introduce 2 virtual sites (and connections to top and bottom).

  • Percolates iff virtual top site is connected to virtual bottom site.

Q. How to model opening a new site?
A. Connect newly opened site to all of its adjacent open sites.

Q. What is percolation threshold pp^*?
A. About 0.592746 for large square lattices.

Steps to developing a usable algorithm.

  • Model the problem.
  • Find an algorithm to solve it.
  • Fast enough? Fits in memory?
  • If not, figure out why.
  • Find a way to address the problem.
  • Iterate until satisfied.