Analysis of Algorithms

Introduction


Cast of characters.

  • Programmer needs to develop a working solution.
  • Client wants to solve problem efficiently.
  • Theoretician wants to understand.
  • Basic blocking and tackling is sometimes necessary.
  • Student might play any or all of these roles someday.

Running time.

  • Analytic Engine: how many times do you have to turn the crank?

Reasons to analyze algorithms:

  • Predict performance.
  • Compare algorithms.
  • Provide guarantees.
  • Understand theoretical basis.
  • Primary practical reason: avoid performance bugs.

Discrete Fourier transform.

  • Break down waveform of N samples into periodic components.
  • Applications: DVD, JPEG, MRI, astrophysics, ….
  • Brute force: N^2 steps.
  • FFT algorithm: N log N steps, enables new technology.

N-body simulation.

  • Simulate gravitational interactions among N bodies.
  • Brute force: N^2 steps.
  • Barnes-Hut algorithm: N log N steps, enables new research.

The challenge.
Q. Will my program be able to solve a large practical input?
Insight. [Knuth 1970s] Use scientific method to understand performance.

A framework for predicting performance and comparing algorithms.

Scientific method.

  • Observe some feature of the natural world.
  • Hypothesize a model that is consistent with the observations.
  • Predict events using the hypothesis.
  • Verify the predictions by making further observations.
  • Validate by repeating until the hypothesis and observations agree.

Huge problem for quick find.

  • 10^9 union commands on 10^9 objects.
  • Quick-find takes more than 10^18 operations.
  • 30+ years of computer time!

Principles.

  • Experiments must be reproducible.
  • Hypotheses must be falsifiable.

Feature of the natural world. Computer itself.

Observations

Example: 3-SUM. Given N distinct integers, how many triples sum to exactly zero?

% more 8ints.txt
8
30 -40 -20 -10 40 0 10 5

% java ThreeSum 8ints.txt
4

a[i]   a[j]   a[k]   sum
  30    -40     10     0
  30    -20    -10     0
 -40     40      0     0
 -10      0     10     0

Brute-force algorithm.

public class ThreeSum {
    public static int count(int[] a) {
        int N = a.length;
        int count = 0;
        for (int i = 0; i < N; i++)
            for (int j = i + 1; j < N; j++)
                for (int k = j + 1; k < N; k++)
                    if (a[i] + a[j] + a[k] == 0)
                        count++;
        return count;
    }

    public static void main(String[] args) {
        int[] a = In.readInts(args[0]);
        StdOut.println(count(a));
    }
}

Measuring the running time.
Q. How to time a program?
A. Manual.

Q. How to time a program?
A. Automatic.

public class Stopwatch
    Stopwatch()               create a new stopwatch
    double elapsedTime()      time since creation (in seconds)
public static void main(String[] args) {
    int[] a = In.readInts(args[0]);
    Stopwatch stopwatch = new Stopwatch();
    StdOut.println(ThreeSum.count(a));
    double time = stopwatch.elapsedTime();
}
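Stopwatch comes from the course's standard libraries; a minimal sketch of how such a class could be implemented, assuming it is built on System.currentTimeMillis():

```java
public class Stopwatch {
    private final long start = System.currentTimeMillis();  // record creation time

    // elapsed time (in seconds) since this Stopwatch was created
    public double elapsedTime() {
        return (System.currentTimeMillis() - start) / 1000.0;
    }

    public static void main(String[] args) throws InterruptedException {
        Stopwatch stopwatch = new Stopwatch();
        Thread.sleep(100);                           // simulate some work
        System.out.println(stopwatch.elapsedTime()); // roughly 0.1
    }
}
```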

Empirical analysis.
Run the program for various input sizes and measure running time.

N          time (seconds)
250        0.0
500        0.0
1,000      0.1
2,000      0.8
4,000      6.4
8,000      51.1
16,000     ?

Data analysis.
Standard plot. Plot running time T(N) vs. input size N.
Log-log plot. Plot running time T(N) vs. input size N using log-log scale.
Regression. Fit straight line through data points: a N^b.

lg(T(N)) = b lg N + c
b = 2.999
c = -33.2103

T(N) = a N^b, where a = 2^c

Hypothesis. The running time is about 1.006 × 10^–10 × N^2.999 seconds.
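As a sanity check, solving T(N) = a N^b for a using the observed 51.1-second run at N = 8,000 reproduces the constant (a back-of-the-envelope sketch):

```java
public class PowerLawFit {
    public static void main(String[] args) {
        double b = 2.999;                   // exponent from log-log regression
        double time = 51.1, n = 8000;       // observed data point
        double a = time / Math.pow(n, b);   // T(N) = a N^b  =>  a = T(N) / N^b
        System.out.println(a);              // about 1.0e-10, matching the hypothesis
    }
}
```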

Prediction and validation.
Predictions.

  • 51.0 seconds for N = 8,000.
  • 408.1 seconds for N = 16,000.

Observations. Agree with the predictions, validating the hypothesis!

Doubling hypothesis. Quick way to estimate b in a power-law relationship.
Run the program, doubling the size of the input; the lg of the ratio of successive running times converges to a constant.
Hypothesis. Running time is about a N^b with b = lg ratio.
Caveat. Cannot identify logarithmic factors with doubling hypothesis.
Q. How to estimate a (assuming we know b)?
A. Run the program (for a sufficiently large value of N) and solve for a.
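The lg-ratio estimate can be checked against the running times observed earlier (0.1, 0.8, 6.4, 51.1 seconds as N doubles):

```java
public class DoublingRatio {
    public static void main(String[] args) {
        // observed ThreeSum running times as N doubles: 1,000 up to 8,000
        double[] times = { 0.1, 0.8, 6.4, 51.1 };
        for (int i = 1; i < times.length; i++) {
            double ratio = times[i] / times[i - 1];
            double b = Math.log(ratio) / Math.log(2);  // lg of the ratio estimates b
            System.out.printf("ratio = %.2f, b = %.2f%n", ratio, b);
        }
    }
}
```

The ratios converge to 8, so b = lg 8 = 3, consistent with the cubic brute-force algorithm.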

System independent effects. Determine exponent b and constant a in power law.

  • Algorithm.
  • Input data.

System dependent effects. Determine constant a (but not exponent b) in power law.

  • Hardware: CPU, memory, cache, …
  • Software: compiler, interpreter, garbage collector, …
  • System: operating system, network, other apps, …

Bad news. Difficult to get precise measurements.
Good news. Much easier and cheaper than other sciences.

Mathematical Models

Total running time: sum of cost * frequency for all operations.

  • Need to analyze program to determine set of operations.
  • Cost depends on machine, compiler.
  • Frequency depends on algorithm, input data.

In principle, accurate mathematical models are available.

Cost of basic operations.

operation                 example            nanoseconds †
integer add               a + b              2.1
integer multiply          a * b              2.4
integer divide            a / b              5.4
floating-point add        a + b              4.6
floating-point multiply   a * b              4.2
floating-point divide     a / b              13.5
sine                      Math.sin(theta)    91.3
arctangent                Math.atan2(y, x)   129.0
...                       ...                ...

operation              example               nanoseconds †
variable declaration   int a                 c1
assignment statement   a = b                 c2
integer compare        a < b                 c3
array element access   a[i]                  c4
array length           a.length              c5
1D array allocation    new int[N]            c6 N
2D array allocation    new int[N][N]         c7 N^2
string length          s.length()            c8
substring extraction   s.substring(N/2, N)   c9
string concatenation   s + t                 c10 N

Novice mistake. Abusive string concatenation.
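To see why the concatenation pitfall matters: since `s + t` copies both strings (cost proportional to N, per the table above), building an N-character result one character at a time costs ~N^2; `StringBuilder` is the standard linear-time fix. A sketch of both:

```java
public class ConcatDemo {
    // quadratic: each += copies the entire string built so far
    public static String slow(int n) {
        String s = "";
        for (int i = 0; i < n; i++)
            s += "x";
        return s;
    }

    // linear: StringBuilder appends without recopying
    public static String fast(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++)
            sb.append("x");
        return sb.toString();
    }

    public static void main(String[] args) {
        // same result, very different cost for large n
        System.out.println(slow(1000).equals(fast(1000)));
    }
}
```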

Example: 1-SUM.
Q. How many instructions as a function of input size N?

Proposition. Depth of any node x is at most lg N. (lg = base-2 logarithm)

Pf. When does depth of x increase?
It increases by 1 when tree T1 containing x is merged into another tree T2.

  • The size of the tree containing x at least doubles since |T2| >= |T1|.
  • Size of tree containing x can double at most lg N times. (Start with 1, double lg N times, and you get a tree with N nodes.)

algorithm     initialize   union   find
quick-find    N            N       1
quick-union   N            N       N
weighted QU   N            lg N    lg N

Q. Stop at guaranteed acceptable performance?
A. No, easy to improve further.

Improvement 2: path compression

Quick union with path compression. Just after computing the root of p, set the id of each examined node to point to that root.

Two-pass implementation: add second loop to root() to set the id[] of each examined node to the root.

Simpler one-pass variant: Make every other node in path point to its grandparent (thereby halving path length).

private int root(int i) {
    while (i != id[i]) {
        id[i] = id[id[i]];  // make every other node in path point to its grandparent
        i = id[i];
    }
    return i;
}

In practice. No reason not to! Keeps tree almost completely flat.
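Putting weighting and path compression together, a minimal sketch of the full data structure (the class name and the sz[] size array are illustrative; root() is the one-pass variant shown above):

```java
public class WeightedQuickUnionPC {
    private final int[] id;  // id[i] = parent of i
    private final int[] sz;  // sz[i] = number of sites in tree rooted at i

    public WeightedQuickUnionPC(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) { id[i] = i; sz[i] = 1; }
    }

    private int root(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]];  // path compression: point to grandparent
            i = id[i];
        }
        return i;
    }

    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    public void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }  // link smaller tree below larger
        else               { id[j] = i; sz[i] += sz[j]; }
    }
}
```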

Proposition. Starting from an empty data structure, any sequence of M union-find ops on N objects makes at most c (N + M lg* N) array accesses.

  • Analysis can be improved to N + M α(M, N).
  • Simple algorithm with fascinating mathematics.

Iterated log function.

N         lg* N
1         0
2         1
4         2
16        3
65536     4
2^65536   5
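The iterated log function is easy to compute directly; a sketch using floor(lg) on long arguments, so it covers the table up to 65536 (the last row, 2^65536, exceeds any primitive type):

```java
public class IteratedLg {
    // lg* n: number of times lg must be applied to n before the result is <= 1
    public static int lgStar(long n) {
        int count = 0;
        while (n > 1) {
            n = 63 - Long.numberOfLeadingZeros(n);  // floor(lg n)
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (long n : new long[] { 1, 2, 4, 16, 65536 })
            System.out.println(n + " -> " + lgStar(n));  // matches the table above
    }
}
```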

Linear-time algorithm for M union-find ops on N objects?

  • Cost within constant factor of reading in the data.
  • In theory, WQUPC is not quite linear.
  • In practice, WQUPC is linear.

Bottom line. Weighted quick union (with path compression) makes it possible to solve problems that could not otherwise be addressed.

M union-find operations on a set of N objects

algorithm                        worst-case time
quick-find                       M N
quick-union                      M N
weighted QU                      N + M log N
QU + path compression            N + M log N
weighted QU + path compression   N + M lg* N

Ex. [10^9 unions and finds with 10^9 objects]

  • WQUPC reduces time from 30 years to 6 seconds.
  • Supercomputer won't help much; good algorithm enables solution.

Applications


Percolation

A model for many physical systems:

  • N-by-N grid of sites.
  • Each site is open with probability p (or blocked with probability 1 - p).
  • System percolates iff top and bottom are connected by open sites.

Likelihood of percolation. Depends on site vacancy probability p.

When N is large, theory guarantees a sharp threshold p*.

  • p > p*: almost certainly percolates.
  • p < p*: almost certainly does not percolate.

Q. What is the value of p*?

Monte Carlo simulation:

  • Initialize all sites in an N-by-N grid to be blocked.
  • Declare random sites open until top connected to bottom.
  • Vacancy percentage estimates p*.

Q. How to check whether an N-by-N system percolates?

  • Create an object for each site, named 0 to N^2 - 1.
  • Sites are in same component if connected by open sites.
  • Percolates iff any site on bottom row is connected to any site on top row. (Brute-force algorithm: N^2 calls to connected().)

Clever trick. Introduce 2 virtual sites (and connections to top and bottom).

  • Percolates iff virtual top site is connected to virtual bottom site.

Q. How to model opening a new site?
A. Connect newly opened site to all of its adjacent open sites.
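The whole simulation can be sketched end to end: an N-by-N grid on top of weighted quick-union with the two virtual sites, opening random sites until the system percolates. A self-contained sketch (class and method names are illustrative):

```java
import java.util.Random;

public class PercolationEstimate {
    private final int n;
    private final boolean[] open;     // open[site] = is site open?
    private final int[] id, sz;       // weighted quick-union arrays
    private final int top, bottom;    // virtual top and bottom sites

    public PercolationEstimate(int n) {
        this.n = n;
        open = new boolean[n * n];
        id = new int[n * n + 2];
        sz = new int[n * n + 2];
        for (int i = 0; i < id.length; i++) { id[i] = i; sz[i] = 1; }
        top = n * n;
        bottom = n * n + 1;
    }

    private int root(int i) {
        while (i != id[i]) { id[i] = id[id[i]]; i = id[i]; }
        return i;
    }

    private void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    public void open(int row, int col) {
        int site = row * n + col;
        if (open[site]) return;
        open[site] = true;
        if (row == 0)     union(site, top);     // connect to virtual top
        if (row == n - 1) union(site, bottom);  // connect to virtual bottom
        int[] dr = { -1, 1, 0, 0 }, dc = { 0, 0, -1, 1 };
        for (int d = 0; d < 4; d++) {           // connect to adjacent open sites
            int r = row + dr[d], c = col + dc[d];
            if (r >= 0 && r < n && c >= 0 && c < n && open[r * n + c])
                union(site, r * n + c);
        }
    }

    public boolean percolates() {
        return root(top) == root(bottom);
    }

    // one Monte Carlo trial: open random sites until the system percolates,
    // then return the fraction of open sites as an estimate of p*
    public static double trial(int n, Random rnd) {
        PercolationEstimate perc = new PercolationEstimate(n);
        int opened = 0;
        while (!perc.percolates()) {
            int site = rnd.nextInt(n * n);
            if (!perc.open[site]) { perc.open(site / n, site % n); opened++; }
        }
        return (double) opened / (n * n);
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        double sum = 0;
        int trials = 20;
        for (int t = 0; t < trials; t++)
            sum += trial(20, rnd);
        System.out.println(sum / trials);  // roughly 0.59 for moderate grid sizes
    }
}
```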

Q. What is percolation threshold p*?
A. About 0.592746 for large square lattices.

Steps to developing a usable algorithm.

  • Model the problem.
  • Find an algorithm to solve it.
  • Fast enough? Fits in memory?
  • If not, figure out why.
  • Find a way to address the problem.
  • Iterate until satisfied.