Analysis of Algorithms

Introduction


Cast of characters.

  • Programmer needs to develop a working solution.
  • Client wants to solve problem efficiently.
  • Theoretician wants to understand.
  • Basic blocking and tackling is sometimes necessary.
  • Student might play any or all of these roles someday.

Running time.

  • Analytic Engine: how many times do you have to turn the crank?

Reasons to analyze algorithms:

  • Predict performance.
  • Compare algorithms.
  • Provide guarantees.
  • Understand theoretical basis.
  • Primary practical reason: avoid performance bugs.

Discrete Fourier transform.

  • Break down waveform of N samples into periodic components.
  • Applications: DVD, JPEG, MRI, astrophysics, ….
  • Brute force: N^2 steps.
  • FFT algorithm: N log N steps, enables new technology.

N-body simulation.

  • Simulate gravitational interactions among N bodies.
  • Brute force: N^2 steps.
  • Barnes-Hut algorithm: N log N steps, enables new research.

The challenge.
Q. Will my program be able to solve a large practical input?
Insight. [Knuth 1970s] Use scientific method to understand performance.

A framework for predicting performance and comparing algorithms.

Scientific method.

  • Observe some feature of the natural world.
  • Hypothesize a model that is consistent with the observations.
  • Predict events using the hypothesis.
  • Verify the predictions by making further observations.
  • Validate by repeating until the hypothesis and observations agree.

Huge problem for quick find.

  • 10^9 union commands on 10^9 objects.
  • Quick-find takes more than 10^18 operations.
  • 30+ years of computer time!

Principles.

  • Experiments must be reproducible.
  • Hypotheses must be falsifiable.

Feature of the natural world. Computer itself.

Observations

Example: 3-SUM. Given N distinct integers, how many triples sum to exactly zero?

% more 8ints.txt
8
30 -40 -20 -10 40 0 10 5

% java ThreeSum 8ints.txt
4

a[i]   a[j]   a[k]   sum
  30    -40     10     0
  30    -20    -10     0
 -40     40      0     0
 -10      0     10     0

Brute-force algorithm.

public class ThreeSum {
    public static int count(int[] a) {
        int N = a.length;
        int count = 0;
        for (int i = 0; i < N; i++)
            for (int j = i + 1; j < N; j++)
                for (int k = j + 1; k < N; k++)
                    if (a[i] + a[j] + a[k] == 0)
                        count++;
        return count;
    }

    public static void main(String[] args) {
        int[] a = In.readInts(args[0]);
        StdOut.println(count(a));
    }
}

Measuring the running time.
Q. How to time a program?
A. Manual.

Q. How to time a program?
A. Automatic.

public class Stopwatch
    Stopwatch()               create a new stopwatch
    double elapsedTime()      time since creation (in seconds)
public static void main(String[] args) {
    int[] a = In.readInts(args[0]);
    Stopwatch stopwatch = new Stopwatch();
    StdOut.println(ThreeSum.count(a));
    double time = stopwatch.elapsedTime();
}
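Stopwatch comes from the course's standard libraries; a minimal sketch of how such a class could be implemented, assuming it is built on System.currentTimeMillis():

```java
public class Stopwatch {
    private final long start = System.currentTimeMillis();  // record creation time

    // elapsed time (in seconds) since this Stopwatch was created
    public double elapsedTime() {
        return (System.currentTimeMillis() - start) / 1000.0;
    }

    public static void main(String[] args) throws InterruptedException {
        Stopwatch stopwatch = new Stopwatch();
        Thread.sleep(100);                           // simulate some work
        System.out.println(stopwatch.elapsedTime()); // roughly 0.1
    }
}
```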

Empirical analysis.
Run the program for various input sizes and measure running time.

N          time (seconds)
250        0.0
500        0.0
1,000      0.1
2,000      0.8
4,000      6.4
8,000      51.1
16,000     ?

Data analysis.
Standard plot. Plot running time T(N) vs. input size N.
Log-log plot. Plot running time T(N) vs. input size N using log-log scale.
Regression. Fit straight line through data points: a N^b.

lg(T(N)) = b lg N + c
b = 2.999
c = -33.2103

T(N) = a N^b, where a = 2^c

Hypothesis. The running time is about 1.006 × 10^–10 × N^2.999 seconds.
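As a sanity check, solving T(N) = a N^b for a using the observed 51.1-second run at N = 8,000 reproduces the constant (a back-of-the-envelope sketch):

```java
public class PowerLawFit {
    public static void main(String[] args) {
        double b = 2.999;                   // exponent from log-log regression
        double time = 51.1, n = 8000;       // observed data point
        double a = time / Math.pow(n, b);   // T(N) = a N^b  =>  a = T(N) / N^b
        System.out.println(a);              // about 1.0e-10, matching the hypothesis
    }
}
```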

Prediction and validation.
Predictions.

  • 51.0 seconds for N = 8,000.
  • 408.1 seconds for N = 16,000.

Observations. Agree with the predictions, validating the hypothesis!

Doubling hypothesis. Quick way to estimate b in a power-law relationship.
Run the program, doubling the size of the input; the lg of the ratio of successive running times converges to a constant.
Hypothesis. Running time is about a N^b with b = lg ratio.
Caveat. Cannot identify logarithmic factors with doubling hypothesis.
Q. How to estimate a (assuming we know b)?
A. Run the program (for a sufficiently large value of N) and solve for a.
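The lg-ratio estimate can be checked against the running times observed earlier (0.1, 0.8, 6.4, 51.1 seconds as N doubles):

```java
public class DoublingRatio {
    public static void main(String[] args) {
        // observed ThreeSum running times as N doubles: 1,000 up to 8,000
        double[] times = { 0.1, 0.8, 6.4, 51.1 };
        for (int i = 1; i < times.length; i++) {
            double ratio = times[i] / times[i - 1];
            double b = Math.log(ratio) / Math.log(2);  // lg of the ratio estimates b
            System.out.printf("ratio = %.2f, b = %.2f%n", ratio, b);
        }
    }
}
```

The ratios converge to 8, so b = lg 8 = 3, consistent with the cubic brute-force algorithm.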

System independent effects. Determine exponent b and constant a in power law.

  • Algorithm.
  • Input data.

System dependent effects. Determine constant a (but not exponent b) in power law.

  • Hardware: CPU, memory, cache, …
  • Software: compiler, interpreter, garbage collector, …
  • System: operating system, network, other apps, …

Bad news. Difficult to get precise measurements.
Good news. Much easier and cheaper than other sciences.

Mathematical Models

Total running time: sum of cost * frequency for all operations.

  • Need to analyze program to determine set of operations.
  • Cost depends on machine, compiler.
  • Frequency depends on algorithm, input data.

In principle, accurate mathematical models are available.

Cost of basic operations.

operation                 example            nanoseconds †
integer add               a + b              2.1
integer multiply          a * b              2.4
integer divide            a / b              5.4
floating-point add        a + b              4.6
floating-point multiply   a * b              4.2
floating-point divide     a / b              13.5
sine                      Math.sin(theta)    91.3
arctangent                Math.atan2(y, x)   129.0
...                       ...                ...

operation              example               nanoseconds †
variable declaration   int a                 c1
assignment statement   a = b                 c2
integer compare        a < b                 c3
array element access   a[i]                  c4
array length           a.length              c5
1D array allocation    new int[N]            c6 N
2D array allocation    new int[N][N]         c7 N^2
string length          s.length()            c8
substring extraction   s.substring(N/2, N)   c9
string concatenation   s + t                 c10 N

Novice mistake. Abusive string concatenation.
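To see why the concatenation pitfall matters: since `s + t` copies both strings (cost proportional to N, per the table above), building an N-character result one character at a time costs ~N^2; `StringBuilder` is the standard linear-time fix. A sketch of both:

```java
public class ConcatDemo {
    // quadratic: each += copies the entire string built so far
    public static String slow(int n) {
        String s = "";
        for (int i = 0; i < n; i++)
            s += "x";
        return s;
    }

    // linear: StringBuilder appends without recopying
    public static String fast(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++)
            sb.append("x");
        return sb.toString();
    }

    public static void main(String[] args) {
        // same result, very different cost for large n
        System.out.println(slow(1000).equals(fast(1000)));
    }
}
```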

Example: 1-SUM.
Q. How many instructions as a function of input size N?

Proposition. Depth of any node x is at most lg N. (lg = base-2 logarithm)

Pf. When does depth of x increase?
It increases by 1 when tree T1 containing x is merged into another tree T2.

  • The size of the tree containing x at least doubles since |T2| >= |T1|.
  • Size of tree containing x can double at most lg N times. (Start with 1, double lg N times, and you get a tree with N nodes.)

algorithm     initialize   union   find
quick-find    N            N       1
quick-union   N            N       N
weighted QU   N            lg N    lg N

Q. Stop at guaranteed acceptable performance?
A. No, easy to improve further.

Improvement 2: path compression

Quick union with path compression. Just after computing the root of p, set the id of each examined node to point to that root.

Two-pass implementation: add second loop to root() to set the id[] of each examined node to the root.

Simpler one-pass variant: Make every other node in path point to its grandparent (thereby halving path length).

private int root(int i) {
    while (i != id[i]) {
        id[i] = id[id[i]];  // make every other node in path point to its grandparent
        i = id[i];
    }
    return i;
}

In practice. No reason not to! Keeps tree almost completely flat.
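Putting weighting and path compression together, a minimal sketch of the full data structure (the class name and the sz[] size array are illustrative; root() is the one-pass variant shown above):

```java
public class WeightedQuickUnionPC {
    private final int[] id;  // id[i] = parent of i
    private final int[] sz;  // sz[i] = number of sites in tree rooted at i

    public WeightedQuickUnionPC(int n) {
        id = new int[n];
        sz = new int[n];
        for (int i = 0; i < n; i++) { id[i] = i; sz[i] = 1; }
    }

    private int root(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]];  // path compression: point to grandparent
            i = id[i];
        }
        return i;
    }

    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    public void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }  // link smaller tree below larger
        else               { id[j] = i; sz[i] += sz[j]; }
    }
}
```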

Proposition. Starting from an empty data structure, any sequence of M union-find ops on N objects makes at most c (N + M lg* N) array accesses.

  • Analysis can be improved to N + M α(M, N).
  • Simple algorithm with fascinating mathematics.

Iterated log function.

N         lg* N
1         0
2         1
4         2
16        3
65536     4
2^65536   5
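The iterated log function is easy to compute directly; a sketch using floor(lg) on long arguments, so it covers the table up to 65536 (the last row, 2^65536, exceeds any primitive type):

```java
public class IteratedLg {
    // lg* n: number of times lg must be applied to n before the result is <= 1
    public static int lgStar(long n) {
        int count = 0;
        while (n > 1) {
            n = 63 - Long.numberOfLeadingZeros(n);  // floor(lg n)
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (long n : new long[] { 1, 2, 4, 16, 65536 })
            System.out.println(n + " -> " + lgStar(n));  // matches the table above
    }
}
```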

Linear-time algorithm for M union-find ops on N objects?

  • Cost within constant factor of reading in the data.
  • In theory, WQUPC is not quite linear.
  • In practice, WQUPC is linear.

Bottom line. Weighted quick union (with path compression) makes it possible to solve problems that could not otherwise be addressed.

M union-find operations on a set of N objects

algorithm                        worst-case time
quick-find                       M N
quick-union                      M N
weighted QU                      N + M log N
QU + path compression            N + M log N
weighted QU + path compression   N + M lg* N

Ex. [10^9 unions and finds with 10^9 objects]

  • WQUPC reduces time from 30 years to 6 seconds.
  • Supercomputer won't help much; good algorithm enables solution.

Applications


Percolation

A model for many physical systems:

  • N-by-N grid of sites.
  • Each site is open with probability p (or blocked with probability 1 - p).
  • System percolates iff top and bottom are connected by open sites.

Likelihood of percolation. Depends on site vacancy probability p.

When N is large, theory guarantees a sharp threshold p*.

  • p > p*: almost certainly percolates.
  • p < p*: almost certainly does not percolate.

Q. What is the value of p*?

Monte Carlo simulation:

  • Initialize all sites in an N-by-N grid to be blocked.
  • Declare random sites open until top connected to bottom.
  • Vacancy percentage estimates p*.

Q. How to check whether an N-by-N system percolates?

  • Create an object for each site, named 0 to N^2 - 1.
  • Sites are in same component if connected by open sites.
  • Percolates iff any site on bottom row is connected to any site on top row. (Brute-force algorithm: N^2 calls to connected().)

Clever trick. Introduce 2 virtual sites (and connections to top and bottom).

  • Percolates iff virtual top site is connected to virtual bottom site.

Q. How to model opening a new site?
A. Connect newly opened site to all of its adjacent open sites.
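The whole simulation can be sketched end to end: an N-by-N grid on top of weighted quick-union with the two virtual sites, opening random sites until the system percolates. A self-contained sketch (class and method names are illustrative):

```java
import java.util.Random;

public class PercolationEstimate {
    private final int n;
    private final boolean[] open;     // open[site] = is site open?
    private final int[] id, sz;       // weighted quick-union arrays
    private final int top, bottom;    // virtual top and bottom sites

    public PercolationEstimate(int n) {
        this.n = n;
        open = new boolean[n * n];
        id = new int[n * n + 2];
        sz = new int[n * n + 2];
        for (int i = 0; i < id.length; i++) { id[i] = i; sz[i] = 1; }
        top = n * n;
        bottom = n * n + 1;
    }

    private int root(int i) {
        while (i != id[i]) { id[i] = id[id[i]]; i = id[i]; }
        return i;
    }

    private void union(int p, int q) {
        int i = root(p), j = root(q);
        if (i == j) return;
        if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; }
        else               { id[j] = i; sz[i] += sz[j]; }
    }

    public void open(int row, int col) {
        int site = row * n + col;
        if (open[site]) return;
        open[site] = true;
        if (row == 0)     union(site, top);     // connect to virtual top
        if (row == n - 1) union(site, bottom);  // connect to virtual bottom
        int[] dr = { -1, 1, 0, 0 }, dc = { 0, 0, -1, 1 };
        for (int d = 0; d < 4; d++) {           // connect to adjacent open sites
            int r = row + dr[d], c = col + dc[d];
            if (r >= 0 && r < n && c >= 0 && c < n && open[r * n + c])
                union(site, r * n + c);
        }
    }

    public boolean percolates() {
        return root(top) == root(bottom);
    }

    // one Monte Carlo trial: open random sites until the system percolates,
    // then return the fraction of open sites as an estimate of p*
    public static double trial(int n, Random rnd) {
        PercolationEstimate perc = new PercolationEstimate(n);
        int opened = 0;
        while (!perc.percolates()) {
            int site = rnd.nextInt(n * n);
            if (!perc.open[site]) { perc.open(site / n, site % n); opened++; }
        }
        return (double) opened / (n * n);
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        double sum = 0;
        int trials = 20;
        for (int t = 0; t < trials; t++)
            sum += trial(20, rnd);
        System.out.println(sum / trials);  // roughly 0.59 for moderate grid sizes
    }
}
```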

Q. What is percolation threshold p*?
A. About 0.592746 for large square lattices.

Steps to developing a usable algorithm.

  • Model the problem.
  • Find an algorithm to solve it.
  • Fast enough? Fits in memory?
  • If not, figure out why.
  • Find a way to address the problem.
  • Iterate until satisfied.