Introduction to Algorithm Design

posted 2 years ago

What is an Algorithm?

An algorithm is a procedure to accomplish a specific task.

An algorithm is the idea behind any reasonable computer program.

Algorithms should solve a general, well-specified problem.

For example, the algorithmic program of sorting is defined as follows:

Problem: Sorting
Input: A sequence of n keys
Output: The reordering of the input sequence such that a1 <= a2 <= a3< an

There are three desirable properties for a good algorithm:

  1. Algorithms should be correct.
  2. Algorithms should be efficient.
  3. Algorithms should be easy to implement.

Correct algorithms usually come with a proof of correctness, which explains why we know an algorithm must take every instance of the problem to the desired result.

The steps of an algorithm must should meet certain characteristics:

  1. The steps must terminate after a finite number of steps. An algorithm cannot run forever.
  2. The steps must be precise, so that we can execute them without confusion.
  3. The algorithm may operate on some input.
  4. The algorithm has some output. The whole purpose of an algorithm is to produce something as a result.
  5. The algorithm must be effective. A human should be able to execute each step in a reasonable amount of time with pen and paper.

Reasoning about Correctness

We need tools to distinguish correct algorithms from incorrect ones, the primary tool being a proof.

A proper mathematical proof comprises several parts:

  1. A clear, precise statement of what you are trying to prove.
  2. A set of assumptions of things which are taken to be true and hence used as part of the proof.
  3. A chain of reasoning which takes you from these assumptions to the statement you are trying to prove.
  4. A little square or QED at the bottom to denote that you have finished, representing the Latin phrase for "thus it is demonstrated".

Expressing Algorithms

The three most common forms of algorithmic notation are:

  1. English
  2. pseudocode
  3. a actual programming language

All three notations are useful because there is a natural tradeoff between greater ease of expression and precision.

The choice of which notation is best really just depends upon which method you are most comfortable with.

Problems and Properties

Problem specifications have two parts:

  1. The set of allowed input instances.
  2. The required properties of the algorithm's output.

It is impossible to prove the correctness of an algorithm for a badly stated problem.

Ask the wrong problem, and you will receive a wrong answer.

There are two common traps in specifying the output requirements of a problem:

  1. Asking an ill-defined question. For example, asking for the best route between two places on a map is a silly question unless you define what best means.
  2. Creating compound goals. You must pick a single criterion. Problems should try to be as easy to reason and solve as possible.

Demonstrating Incorrectness

The best way to prove that an algorithm is incorrect is to produce an instance in which it yields an incorrect answer.

We know these instances as counter-examples.

Good counter-examples have two important properties:

  1. Verifiability - You must be able to calculate what answer your algorithm will give in this instance, and you must display a better answer so as to prove the algorithm didn't find it.
  2. Simplicity - You must make it clear exactly why the proposed algorithm fails. Once a counter-example has been found, it is worth simplifying it down to its simplest form.

Here are some techniques to hunt for counter-examples:

  1. Think small - Look carefully at several small examples, because they are easier to verify and reason about.
  2. Think exhaustively - There are only a small number of possibilities for the smallest nontrivial value of n.
  3. Hunt for the weakness - If a proposed algorithm is of the form "always take the biggest", think about why that might prove to be the wrong thing to do.
  4. Go for a tie - Provide instances where everything is the same size. The heuristic will have nothing to base its decision on, and perhaps will have the freedom to return something suboptimal as the answer.
  5. Seek extremes - Many counter-examples are mixtures of huge and tiny discrepancies.

Induction and Recursion

Failure to find a counter-example to a given algorithm does not mean that it is obviously true.

The algorithm must include a proof or demonstration of correctness. Often, mathematical induction is the method of choice.

Consider the correctness of insertion sort, for example:

  1. The base case comprises a single element, and by definition a one-element array is completely sorted.
  2. We can assume that the first n - 1 elements of array A are completely sorted after n - 1 iterations of insertion sort.
  3. To insert one last element x to A, we find where it goes. This is done by moving all the greater elements back by one position, creating room for x in the desired location. QED

Let's consider a problem:

Incremental Correctness

Problem: Prove the correctness of the following recursive algorithm for incrementing natural numbers: y -> y + 1:

Increment(y)
    if y = 0 then return (1) else
        if (y mod 2) = 1 then
            return (2 * Increment([y / 2]))
        else
            return (y + 1)

Solution:

  • The base case of y = 0 is obviously correct because the value 1 is returned and 0 + 1 = 1.
  • For even numbers, they are correct, as y + 1 is explicitly returned.
  • For odd numbers, the answer depends upon what is returned by Increment([y / 2]). We can assume that increment worked correctly for y = n - 1, but not for a value which is about half of it.

    The case of odd y (y = 2m + 1 for some integer m) can be dealt with as:

    2 * Increment([2m + 1) / 2]) = 2 * Increment([m + 1 / 2])
                               = 2 * Increment(m)
                               = 2 (m + 1)
                               = 2m + 2 = y + 1
    

    and the general case is resolved. QED

Modeling the Problem

Modeling is the art of formulating your application in terms of precisely described, well-understood problems.

Proper modeling is the key to applying algorithmic design techniques to real-world problems.

To exploit the algorithms literature, you must learn to describe your problem abstractly, in terms of procedures on fundamental structures.

Combinatorial Objects

Odds are very good that others have stumbled upon your algorithmic problem before you, perhaps in substantially different contexts.

You can formulate optimizations in terms of computing properties of common structures such as:

  • Permutations - Arrangements of items. For example {1, 4, 3, 2} and {4, 3, 2, 1} are two distinct permutations of the same set of four integers.
  • Subsets - Selections from a set of items. For example, {1, 3, 4} and {2} are two distinct subsets of the first four integers. Order does not matter in subsets the way it does in permutations. Thus, the subsets {1, 3, 4} and {4, 3, 1} would be identical.
  • Trees - Hierarchical relationships between items. For example, a family tree.
  • Graphs - Relationships between arbitrary pairs of objects. For example, the network of roads.
  • Points - Locations in some geometric space. For example, the location of buildings can be described by points on a map.
  • Polygons - Regions in some geometric spaces. For example, the borders of a country can be described by a polygon on a map.
  • Strings - Sequences of characters or patterns. For example, the names of employees in a company.

Familiarity with these structures is important, because they provide the structure for modeling applications.

Modeling your application in terms of well-defined structures and algorithms is the most important single step towards a solution.

Recursive Objects

Learning to think recursively is learning to look for big things that are made from smaller things of exactly the same type as the big thing.

The primary abstract structures can be thought about recursively:

  • Permutations - Delete the first element of a permutation of {1,...,*n*} things and you get a permutation of the remaining n - 1 things.
  • Subsets - Every subset of the elements {1,...,*n*} contains a subset of {1,...,*n* - 1} made visible by deleting element n if it is present.
  • Trees - Delete the root of a tree and what do you get? A collection of smaller trees. Delete the leaf of a tree and what do you get? A slightly smaller tree.
  • Graphs - Delete any vertex from a graph, and you get a smaller graph. Cut through all edges which span from left to right, and what do you get? Two smaller graphs, and a bunch of broken edges.
  • Points - Take a cloud of points, and separate them into two groups by drawing a line. Now you have two smaller clouds of points.
  • Polygon - Inserting any internal chord between two nonadjacent vertices of a simple ploygon on n vertices cuts it into two smaller polygons.
  • Strings - Delete the first character from a string and you'll get a shorter string.

Proof by Contradiction

Induction isn't the only way to prove things.

Sometimes, the best method is contradiction.

The basic scheme of contradiction is as follows:

  1. Assume that the hypothesis is false.
  2. Develop some logical consequences of this assumption.
  3. Show that one consequence is demonstrably false, thereby showing that the assumption is incorrect and the hypothesis is true.

Estimation

When you don't know the right answer, the best thing to do is to make a guess.

Principled guessing is called estimation.

Estimation problems are best solved through some kind of logical reasoning process, typical a mix of principled calculations and anologies.

A best practice in estimation is to try to solve the problem in different ways and see if the answers generally agree in magnitude.

Sources

Data structure - https://en.wikipedia.org/wiki/Data_structure

Discussion on induction and conduction - https://portal.research.lu.se/ws/files/5654540/4249227.pdf

The Algorithm Design Manual - https://www.algorist.com/