(Administrivia: relation to CS276, CS261; office hours; workload; etc.)

How to break simple substitution ciphers:
- Frequency analysis.
  (e has prob. 0.12; t,a,o,i,n,s,h,r each have 0.06-0.09; d,l 0.04;
   c,u,m,w,f,g,y,p,b 0.015-0.028; v,k,j,x,q,z < 0.01)
- Digraph analysis
  (most common: th, he, in, er, an, re, ed, on, es, st, en, at,
   to, nt, ha, nd, ou, ea, ng, as, or, ti, is, et, it, ar, te, se, hi,
   of (in order))
- Trigraphs
  (the, ing, and, her, ere, ent, tha, nth, was, eth, for, dth)
- Word patterns
  (e.g., WXYYXYYXZZX probably corresponds to MISSISSIPPI)
- Probable plaintext
  (e.g., pick the word "mathematics" and drag it through the ciphertext)
- Vowel-consonant contacts
  (the low-frequency things are probably consonants; vowels
   contact them a lot; and you can iterate)
  Q: relate this to Hidden Markov models
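
The word-pattern idea can be sketched in a few lines of Python.  This is
only an illustration: the names pattern, candidates and WORDLIST are made
up here, and the tiny word list stands in for a real dictionary.  The
matching is strict (equal letters must line up with equal letters, and
distinct letters must stay distinct), so it may return fewer candidates
than a looser matcher that only checks the repeats.

```python
def pattern(word):
    """Canonical pattern: first distinct letter -> 0, second -> 1, ..."""
    seen = {}
    # setdefault returns the existing index, or assigns the next free one
    return tuple(seen.setdefault(ch, len(seen)) for ch in word.lower())


def candidates(cipher_word, wordlist):
    """Words whose repeated-letter pattern equals that of cipher_word."""
    target = pattern(cipher_word)
    return [w for w in wordlist if pattern(w) == target]


# Hypothetical word list; a real attack would use a full dictionary.
WORDLIST = ["mississippi", "mathematics", "probability", "difficulties"]

print(pattern("WXYYXYYXZZX"))               # (0, 1, 2, 2, 1, 2, 2, 1, 3, 3, 1)
print(candidates("WXYYXYYXZZX", WORDLIST))  # ['mississippi']
```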


do not worry about your difficulties in mathematics.
i assure you that mine are greater.  - albert einstein

kt vtx ztrra pltex ater kuiiuoeyxumd uv gpxnmgpxuod.
u pdderm ate xnpx guvm prm jrmpxmr.  - pylmrx muvdxmuv


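key (ciphertext letters on the left, plaintext letters on the right;
listed once in ciphertext order and once in plaintext order):
abcdefghijklmnopqrstuvwxyz  ykqsupmjfgdbehcazrvoinxtlw
plokmijnuhbygvtfcrdxeszwaq  abcdefghijklmnopqrstuvwxyz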
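
The key can be checked mechanically against the quotation.  A minimal
Python sketch, assuming the second row is read as "plaintext letter on
the right encrypts to the ciphertext letter on the left"; the names
PLAIN, CIPHER, encrypt and decrypt are illustrative.

```python
import string

PLAIN = string.ascii_lowercase
CIPHER = "plokmijnuhbygvtfcrdxeszwaq"  # ciphertext letters for a, b, c, ..., z

ENC = str.maketrans(PLAIN, CIPHER)  # plaintext -> ciphertext
DEC = str.maketrans(CIPHER, PLAIN)  # ciphertext -> plaintext


def encrypt(text):
    # Characters outside a-z (spaces, punctuation) pass through unchanged.
    return text.translate(ENC)


def decrypt(text):
    return text.translate(DEC)


print(encrypt("do not worry about your difficulties in mathematics."))
print(decrypt("u pdderm ate xnpx guvm prm jrmpxmr."))
```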


above ciphertext has 86 letters, in total
frequency analysis of above ciphertext:
  10: x m         5: v e d
   9: u           3: g a
   8: r p         2: y o n l k i
   6: t           1: z j
digraph analysis of above ciphertext:
   4: uv px       3: te rm
   2: xu xn xm uo mu mr gp er at
trigraph analysis of above ciphertext:
   2: gpx muv ate
word patterns:
   kuiiuoeyxumd: difficulties missionaries
   gpxnmgpxuod: amalgamated amalgamates mathematics
   u: a i
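
The letter, digraph and trigraph counts above can be reproduced with a
short script.  A sketch in Python, assuming n-grams are counted inside
words only (never across spaces or punctuation); ngram_counts is an
illustrative name, not part of any library.

```python
from collections import Counter

CIPHERTEXT = (
    "kt vtx ztrra pltex ater kuiiuoeyxumd uv gpxnmgpxuod. "
    "u pdderm ate xnpx guvm prm jrmpxmr.  - pylmrx muvdxmuv"
)


def ngram_counts(text, n):
    """Count n-grams of letters, without crossing word boundaries."""
    counts = Counter()
    for word in text.split():
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


letters = ngram_counts(CIPHERTEXT, 1)
digraphs = ngram_counts(CIPHERTEXT, 2)
trigraphs = ngram_counts(CIPHERTEXT, 3)

print("total letters:", sum(letters.values()))
print("letters:", letters.most_common(6))
print("digraphs:", [(g, c) for g, c in digraphs.most_common() if c >= 2])
print("trigraphs:", [(g, c) for g, c in trigraphs.most_common() if c >= 2])
```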
   



How to break polyalphabetic substitution (extended Vigenere) ciphers:
- Determine the period
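  - Kasiski's method: plaintext trigraphs that repeat at a distance
    divisible by the period leak through into the ciphertext as
    repeats; look for distances between repeated ciphertext segments
    and take their gcd (discounting accidental repeats, e.g. by
    preferring the factor shared by most of the distances)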
  - Index of Coincidence: see HAC, Stinson
    If r.v. X is some text (n characters' worth) and i,j are
    independent uniform r.v.'s on {1,..,n}, let f(X) = Pr[X_i = X_j].
    Then f(X) ~ 0.065 if X is random English text; f(X) = 1/26 ~ 0.038
    if X is uniform white noise; and f is unchanged by a simple
    substitution, so we can estimate f(X) from a simple-substitution
    ciphertext x by computing \sum_a p_a^2, where p_a denotes the
    fraction of characters of x that are equal to a.
    This gives us a test to distinguish between encryptions of
    English text and white noise, and so we can guess the period t,
    apply the test to every t-th character of the ciphertext, and
    use the result to confirm whether our guess is correct.
  - Autocorrelation: Compute g(t) = \sum_i 1_{x_i = x_{i+t}};
    if t is a multiple of the cipher's period, we expect g(t) to
    be large (say, g(t) ~ 0.065 n); otherwise, g(t) should be
    smaller (say, g(t) ~ 0.038 n).
- Then, with the period t known, split the ciphertext into t
  simple substitutions (every t-th character belongs to the same
  one), and solve each separately, possibly using digraphs to
  link them together.
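
The two statistical tests for the period can be sketched as follows.
This is an illustration under stated assumptions, not a full attack: the
test input is made with the classical shift-based Vigenere cipher (a
special case of the extended cipher above), the key "lemon" and the
sample paragraph are made up for the demo, and all function names are
illustrative.  On a text this short the numbers are noisy, so treat the
printed table as a hint rather than a verdict.

```python
from collections import Counter
import string


def letters_only(text):
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)


def index_of_coincidence(text):
    """sum_a p_a^2, where p_a is the fraction of characters equal to a."""
    n = len(text)
    if n == 0:
        return 0.0
    return sum((c / n) ** 2 for c in Counter(text).values())


def column_ic(text, t):
    """Average index of coincidence of the t columns x_i, x_{i+t}, ..."""
    # Note: sum p_a^2 is biased upward on short columns, so values for
    # very different t are not directly comparable.
    return sum(index_of_coincidence(text[i::t]) for i in range(t)) / t


def autocorrelation(text, t):
    """g(t): number of positions i with x_i == x_{i+t}."""
    return sum(1 for a, b in zip(text, text[t:]) if a == b)


def vigenere_encrypt(plaintext, key):
    """Classical shift Vigenere on a-z; used only to build a test input."""
    out = []
    for i, ch in enumerate(plaintext):
        shift = ord(key[i % len(key)]) - ord("a")
        out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
    return "".join(out)


SAMPLE = (
    "the index of coincidence measures how likely it is that two letters "
    "drawn at random from a text are the same letter in english text this "
    "happens far more often than in uniformly random letters because a few "
    "letters such as e t and a account for a large share of all the letters "
    "written a cipher with several alphabets flattens these frequencies "
    "unless we look at every letter in the same column"
)

x = vigenere_encrypt(letters_only(SAMPLE), "lemon")
n = len(x)
for t in range(1, 11):
    print(t, round(column_ic(x, t), 3), round(autocorrelation(x, t) / n, 3))
```

A guessed t is plausible when the column value is near 0.065 and
g(t)/n stands out from its neighbours.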
 

How to break two-time pads:
- Text-xors.  XORing the two ciphertexts cancels the pad and leaves
  the XOR of the two plaintexts.  Pick a crib, drag it through,
  extend on both ends.
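
Crib dragging can be sketched in Python as below.  The second message,
the crib, and the "lowercase letters and spaces" plausibility filter are
all assumptions made for the demo; a real attack would use a better
language model and would extend each hit by hand.

```python
import secrets
import string

# Bytes we accept as plausible plaintext in this toy example.
ALLOWED = set((string.ascii_lowercase + " ").encode())


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def drag_crib(text_xor, crib):
    """Yield (offset, fragment) with fragment = text_xor[offset:] XOR crib.

    If the crib really occurs in one plaintext at that offset, the
    fragment is the other plaintext at the same position.
    """
    for off in range(len(text_xor) - len(crib) + 1):
        yield off, xor_bytes(text_xor[off:off + len(crib)], crib)


def looks_like_text(fragment):
    return all(b in ALLOWED for b in fragment)


# Two hypothetical messages encrypted under the same one-time pad.
p1 = b"do not worry about your difficulties in mathematics"
p2 = b"meet me at the old bridge at noon and bring the key"
pad = secrets.token_bytes(len(p1))
c1, c2 = xor_bytes(p1, pad), xor_bytes(p2, pad)

text_xor = xor_bytes(c1, c2)  # the pad cancels: equals p1 XOR p2
for off, frag in drag_crib(text_xor, b"mathematics"):
    if looks_like_text(frag):
        print(off, frag)
```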


How to break linear ciphers:
- Linear algebra.
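
As one concrete instance (an assumption, since the notes do not name a
cipher), take a 2x2 Hill cipher c = K p mod 26.  With two known
plaintext blocks as the columns of P and the matching ciphertext blocks
as the columns of C, we have C = K P, so K = C P^{-1} mod 26 whenever
det(P) is invertible mod 26.  A Python sketch with illustrative names:

```python
def mat_mul(a, b, m=26):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) % m for j in range(2)]
            for i in range(2)]


def mat_inv(a, m=26):
    """Inverse of a 2x2 matrix mod m via the adjugate formula."""
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) % m
    det_inv = pow(det, -1, m)  # Python 3.8+; ValueError if not invertible
    return [[a[1][1] * det_inv % m, -a[0][1] * det_inv % m],
            [-a[1][0] * det_inv % m, a[0][0] * det_inv % m]]


def to_columns(text4):
    """Four letters -> 2x2 matrix whose columns are the two 2-letter blocks."""
    n = [ord(ch) - ord("a") for ch in text4]
    return [[n[0], n[2]], [n[1], n[3]]]


def hill_encrypt(key, text):
    """Encrypt an even-length a-z string block by block: c = K p mod 26."""
    nums = [ord(ch) - ord("a") for ch in text]
    out = []
    for i in range(0, len(nums), 2):
        x, y = nums[i], nums[i + 1]
        out.append((key[0][0] * x + key[0][1] * y) % 26)
        out.append((key[1][0] * x + key[1][1] * y) % 26)
    return "".join(chr(v + ord("a")) for v in out)


def recover_key(plain4, cipher4):
    """Known-plaintext attack: K = C * P^{-1} mod 26."""
    return mat_mul(to_columns(cipher4), mat_inv(to_columns(plain4)))


KEY = [[3, 3], [2, 5]]  # example key, det = 9 is invertible mod 26
ct = hill_encrypt(KEY, "help")
print(ct)                       # hiat
print(recover_key("help", ct))  # [[3, 3], [2, 5]]
```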

=> Lessons: Types of attack (ciphertext-only, ...);
   complexity of attack (data complexity, workfactor, ...)