Math 55 - Fall 2007 - Lecture notes #10 - Sep 19 (Wednesday)

   Goals for today: (Finish Big-O)
                    Expressing algorithms in pseudo code
                    Using Big-O to measure, compare running times
                    Introduction to Complexity Theory

Algorithms:

ASK&WAIT: what does this program do?
 
          prog1(integer n, integer array a(1),...,a(n))
             M = a(1)
             for i = 2 to n
                M = max(M,a(i))
             end for
             return M
ASK&WAIT: What does this program do?

          prog2(integer n, integer array a(1),...,a(n))
             for i = 1 to n
                M = a(i)
                for j = 1 to n except i
                   if a(j)>M goto next
                end for
                return M
                next:
             end for
ASK&WAIT: Which program do you think is faster? How much faster?

          How do we determine how long prog1 takes to run?
          Approach 1: run it and measure the run time in seconds.
ASK&WAIT: What are the pros and cons of this approach?

          Approach 2: it takes time proportional to n, i.e. about k*n
          for some constant k, or O(n)
ASK&WAIT: What are the pros and cons of this approach?

          How long does prog1 take to run, in a Big-O sense?
          Label each line of code with cost, add up:
             M = a(1)                  O(1)
             for i = 2 to n            multiply cost of loop body by n-1
                M = max(M,a(i))           O(1)
             end for                   cost of loop = (n-1)*O(1) = O(n)
             return M                  O(1)
                                       total = O(1) + O(n) + O(1) = O(n)
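
          The line-by-line count above can be checked by instrumenting
          the program. This is only a sketch in Python, not part of the
          original notes: the counter "steps" is a hypothetical addition,
          and Python lists are 0-indexed, so a(1) becomes a[0].

```python
def prog1(a):
    """Return (M, steps): the result of prog1 and the number of loop iterations."""
    M = a[0]                   # O(1); a(1) in the notes is a[0] here
    steps = 0                  # hypothetical counter, added only for measurement
    for i in range(1, len(a)): # i = 2 to n in the notes
        M = max(M, a[i])       # O(1) work per iteration
        steps += 1
    return M, steps


for n in (10, 100, 1000):
    _, steps = prog1(list(range(n)))
    print(n, steps)            # steps is always n - 1, i.e. O(n)
```

          The count does not depend on the values in the array, only on n.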

          How long does prog2 take to run, in a Big-O sense?
             for i = 1 to n               multiply inside cost by n
                M = a(i)                     O(1)
                for j = 1 to n except i      multiply inside cost by n
                   if a(j)>M goto next          O(1)
                end for                      cost of inner loop = n*O(1)=O(n)
                return M                     O(1)
                next:
             end for                      cost = n*O(n) = O(n^2)

          But the actual cost depends on the values of a(1),...,a(n),
          since the program may jump out of the inner loop early, and
          so finish faster than the O(n^2) bound suggests.
  
ASK&WAIT: What is the best (fastest) case?
          what is the worst (slowest) case?
ASK&WAIT: Is prog2 or prog1 faster, in the worst case?
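
          One way to explore these questions is to count comparisons on
          different inputs. The following is a sketch in Python under
          stated assumptions: the counter "comparisons" is hypothetical,
          indices are 0-based, and Python's for/else with break stands
          in for "goto next".

```python
def prog2(a):
    """Return (M, comparisons), following the control flow of prog2."""
    n = len(a)
    comparisons = 0                 # hypothetical counter, added for measurement
    for i in range(n):
        M = a[i]
        for j in range(n):
            if j == i:              # "for j = 1 to n except i"
                continue
            comparisons += 1
            if a[j] > M:
                break               # plays the role of "goto next"
        else:
            return M, comparisons   # inner loop finished without jumping out
    return None, comparisons        # only reached when the array is empty


n = 1000
largest_first = [n] + list(range(n - 1))   # largest entry in position 1
increasing = list(range(n))                # entries in increasing order

print(prog2(largest_first)[1])   # n - 1 comparisons: a fast case
print(prog2(increasing)[1])      # (n-1)(n+2)/2 comparisons: a slow case
```

          With the largest entry first, the outer loop runs once and the
          cost is O(n). With increasing entries, the i-th pass does about
          i comparisons before jumping out, which adds up to about n^2/2.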

ASK&WAIT: Generally, if prog1 runs in time O(n) and prog2 runs in time O(n^2),
     then is prog1 necessarily always faster than prog2 for large enough n?

ASK&WAIT: If prog1 runs in time BIG_THETA(n) and prog2 runs in time 
     BIG_THETA(n^2), then is prog1 faster than prog2 for large enough n?

     "Complexity Theory" is the study of how fast certain
     problems can be solved, expressed in terms like 
     "Given an input of size n, this algorithm will run in time O(f(n))"

       EG: There is a standard list of functions that appears frequently
           when computing the running time of programs, and you should
           recognize them, and know which grows faster than which:
           O(1)       = time for any fixed number of operations
           O(log n)   = time to find an element s in a sorted list of
                        n numbers, using binary search
           O(n)       = time to find an element s in an unsorted list of
                        n numbers, using linear search
           O(n*log n) = time to sort n numbers, using a good algorithm
                        (e.g. merge sort)
           O(n^2)     = time to sort n numbers, using a dumb algorithm
                        (e.g. bubble sort)
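
           To make the gap between O(n) and O(log n) concrete, here is a
           sketch in Python of the two searches named in the list, each
           returning how many list entries it looked at. The function
           names and the "probes" counter are illustrative assumptions,
           not from the notes.

```python
def linear_search(a, s):
    """Scan a left to right; return (index or -1, number of entries examined)."""
    probes = 0
    for idx, x in enumerate(a):
        probes += 1
        if x == s:
            return idx, probes
    return -1, probes


def binary_search(a, s):
    """Search a sorted list by halving; return (index or -1, entries examined)."""
    lo, hi = 0, len(a) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if a[mid] == s:
            return mid, probes
        if a[mid] < s:
            lo = mid + 1      # s can only be in the upper half
        else:
            hi = mid - 1      # s can only be in the lower half
    return -1, probes


a = list(range(1_000_000))            # sorted, so both searches apply
print(linear_search(a, 999_999)[1])   # examines all 1,000,000 entries
print(binary_search(a, 999_999)[1])   # at most 20 entries, since 2^20 > 10^6
```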
     EX: Given a highway map labelled with distances between n towns,
         finding the shortest way to drive between every pair of towns
         (also called "all pairs shortest paths"), costs O(n^3)
         using the Floyd-Warshall algorithm (see CS170)
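
         A minimal sketch of Floyd-Warshall in Python, to show where the
         n^3 comes from: three nested loops over the n towns. The input
         format (an n-by-n table with math.inf for "no direct road") and
         the 3-town map are assumptions made up for illustration.

```python
import math


def all_pairs_shortest_paths(dist):
    """Floyd-Warshall: dist[i][j] is the direct distance from i to j (inf if none)."""
    n = len(dist)
    d = [row[:] for row in dist]          # copy so the input table is not modified
    for k in range(n):                    # allow town k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d                              # n * n * n iterations, O(1) each


# Hypothetical map: town 0 to town 1 is 10 directly, but only 3 + 2 via town 2.
roads = [
    [0, 10, 3],
    [10, 0, 2],
    [3, 2, 0],
]
print(all_pairs_shortest_paths(roads)[0][1])   # 5
```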
     DEF All these algorithms and many others are called "polynomial
         time algorithms" because they cost O(n^a) for some constant a.

     Here is another problem which cannot be solved in polynomial time
     as far as anyone knows:
        given any compound logical proposition such as
               q = p1 and p2 or not p3 .... 
        combining n propositions p1, p2, ... , pn, can q ever be True for
        any values of the p1,..,pn? 
     This problem is called the "satisfiability" problem or SAT for short:
         can any values of p1,..,pn satisfy q, i.e. make it true?
     EX: if q = p1 and p2 and not p3
         then setting p1 = True, p2 = True and p3 = False makes q = True
     EX: if q = (p1 or p2) and (not p1 or not p2) and
                (not p1 or p2) and (p1 or not p2)
         then no matter what values p1 and p2 have, q is False
     Here is an obvious algorithm to solve this problem:
         evaluate q for all possible values of p1, p2,..., pn
         if q is ever True then the answer is yes (q can be true) else no
ASK&WAIT: What is the cost of this algorithm, at least?
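
     The obvious algorithm can be sketched in Python as follows. This is
     an illustration only: representing q as a Python function of n
     Boolean arguments, and the "tries" counter, are assumptions made
     here. The two formulas are the EX formulas above.

```python
from itertools import product


def brute_force_sat(q, n):
    """Try every assignment of n truth values; return (satisfying values or None, tries)."""
    tries = 0
    for values in product([False, True], repeat=n):   # all 2^n assignments
        tries += 1
        if q(*values):
            return values, tries      # q can be true
    return None, tries                # q is false for every assignment


def q1(p1, p2, p3):
    return p1 and p2 and not p3


def q2(p1, p2):
    return ((p1 or p2) and (not p1 or not p2)
            and (not p1 or p2) and (p1 or not p2))


print(brute_force_sat(q1, 3))   # ((True, True, False), 7)
print(brute_force_sat(q2, 2))   # (None, 4)
```

     When q is never true, as with q2, the loop must try all 2^n
     assignments before answering no.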
     What may be surprising is that no significantly better algorithm
     is known, i.e. no algorithm that runs in polynomial time, O(n^a)
     for some constant a. All known algorithms run in "exponential time"
     in the worst case (maybe faster than 2^n, but still exponential)

     One of the most famous open problems in mathematics (and computer
     science) is the question of whether any polynomial time algorithm
     for SAT can exist. The problem is also sometimes asked as
     "does P = NP or not?". Here P is the set of problems you can
     solve in polynomial time, and NP is a (possibly) larger class
     that includes SAT.
     CS 170 and especially CS 172 talk about this question in more detail.