Algorithms Week 1 Union-Find
Union-Find
Dynamic connectivity
Given a set of N objects.
- Union command: connect two objects.
- Find/connect query: is there a path connecting the two objects?
When programming, convenient to name objects to .
- Use integers as array index.
- Suppress details not relevant to union-find.
We assume "is connected to" is an equivalence relation:
- Reflexive
- Symmetric
- Transitive
Connect components. Maximal set of objects that are mutually connected.
Find query. Check if two objects are in the same component.
Union command. Replace components containing two objects with their union.
Goal. Design efficient data structure for union-find.
- Number of objects can be huge.
- Number of operations can be huge.
- Find queries and union commands may be intermixed.
Public class UF.
1 | UF(int N) |
Dynamic-connectivity client.
- Read in number of objects from standard input.
- Repeat:
- Read in pair of integers from standard input.
- If they are not yet connected, connect them and print out pair.
1 | public static void main(String[] args) |
Quick-find [eager approach]
Data structure.
- Integer array
id[]
of size . - Interpretation: and are connected iff they have the same .
Find. Check if and have the same .
Union. To merge components containing and , change all entries whose equals id[p]
to id[q]
.
Java implementation
1 | public class QuickFindUF |
Cost model. Number of array accesses (for read or write).
algorithm | initialize | union | find |
---|---|---|---|
quick-find |
Quadratic algorithms do not scale.
Rough standard (for now).
- operations per second.
- words of main memory.
- Touch all words in approximately 1 second.
Huge problem for quick find.
- union commands on objects.
- Quick-find takes more than operations.
- 30+ years of computer time!
Quick-union [lazy approach]
Data structure.
- Integer array
id[]
of size . - Interpretation:
id[i]
is parent of . - Root of is
id[id[id[...id[i]...]]]
. (Keep going until it doesn't change)
Find. Check if and have the same root.
Union. To merge components containing and , set the of p's root to the of q's root.
Java implementation.
1 | public class QuickUnionUF |
Cost model. Number of array accesses (for read or write).
algorithm | initialize | union | find |
---|---|---|---|
quick-find | |||
quick-union |
Quick-find defect.
- Union too expensive ( array accesses).
- Trees are flat, but too expensive to keep them flat.
Quick-union defect.
- Trees can get tall.
- Find too expensive (could be array accesses).
Improvements
Improvement 1: weighting
Weighted quick-union. Always put the small tree down below.
- Modify quick-union to avoid tall trees.
- Keep track of size of each tree (number of objects).
- Balance by linking root of smaller tree to root of larger tree.
Data structure. Same as quick-union, but maintain extra array sz[i]
to count number of objects in the tree rooted at .
Find. Identical to quick-union.
return root(p) == root(q);
Union. Modify quick-union to:
- Link root of smaller tree to root of larger tree.
- Update the
sz[]
array.
1 | int i = root(p); |
Running time.
- Find: takes time proportional to depth of and .
- Union: takes constant time, given roots.
Proposition. Depth of any node is at most . ( = base 2 logarithm)
Pf. When does depth of increase?
Increase by when tree containing is merged in to another tree .
- The size of the tree containing at least doubles since .
- Size of tree containing can double at most times. (Start with , double times, and you will get a tree with nodes.)
algorithm | initialize | union | find |
---|---|---|---|
quick-find | |||
quick-union | |||
weighted QU |
Q. Stop at guaranteed acceptable performance?
A. No, easy to improve further.
Improvement 2: path compression
Quick union with path compression. Just after computing the root of , set the of each examined node to point to that root.
Two-pass implementation: add second loop to root()
to set the id[]
of each examined node to the root.
Simpler one-pass variant: Make every other node in path point to its grandparent (thereby halving path length).
1 | private int root(int i) |
In practice. No reason not to! Keeps tree almost completely flat.
Proposition. Starting from an empty data structure, any sequence of union-find ops on objects makes array accesses.
- Analysis can be improved to .
- Simple algorithm with fascinating mathematics.
Iterate log function.
Linear-time algorithm for M union-find ops on N objects?
- Cost within constant factor of reading in the data.
- In theory, WQUPC is not quite linear.
- In practice, WQUPC is linear.
Bottom line. Weighed quick union (with path compression) makes it possible to solve problems that could not otherwise be addressed.
M union-find operations on a set of N objects
algorithm | worst-case time |
---|---|
quick-find | |
quick-union | |
weighted QU | |
QU + path compression | |
weighted QU + path compression |
Ex. [ unions and finds with objects]
- WQUPC reduces time from 30 years to 6 seconds.
- Supercomputer won't help much; good algorithm enables solution.
Applications
Percolation
A model for many physical systems:
- -by- grid of sites.
- Each site is open with probability (or blocked with probability ).
- System percolates iff top and bottom are connected by open sites.
Likelihood of percolation. Depends on site vacancy probability .
When is large, theory guarantees a sharp threshold *.
- : almost certainly percolates.
- : almost certainly does not percolate.
Q. What is the value of ?
Monte Carlo simulation:
- Initialize -by- whole grid to be blocked.
- Declare random sites open until top connected to bottom.
- Vacancy percentage estimates .
Q. How to check whether an N-by-N system percolates?
- Create an object for each site and name them to .
- Sites are in same component if connected by open sites.
- Percolates iff any sites on bottom row is connected to site on top row. (Brute-force algorithm: calls to connected())
Clever trick. Introduce 2 virtual sites (and connections to top and bottom).
- Percolates iff virtual top site is connected to virtual bottom site.
Q. How to model opening a new site?
A. Connect newly opened site to all of its adjacent open sites.
Q. What is percolation threshold ?
A. About 0.592746 for large square lattices.
Steps to developing a usable algorithm.
- Model the problem.
- Find an algorithm to solve it.
- Fast enough? Fits in memory?
- If not, figure out why.
- Find a way to address the problem.
- Iterate until satisfied.