(Administrivia: relation to CS276, CS261; office hours; workload; etc.)
How to break simple substitution ciphers:
- Frequency analysis.
(e has prob. 0.12; t,a,o,i,n,s,h,r have 0.06-0.09; d,l 0.04;
c,u,m,w,f,g,y,p,b 0.015-0.028; v,k,j,x,q,z < 0.01)
- Digraph analysis
(most common: th, he, in, er, an, re, ed, on, es, st, en, at,
to, nt, ha, nd, ou, ea, ng, as, or, ti, is, et, it, ar, te, se, hi,
of (in order))
- Trigraphs
(the, ing, and, her, ere, ent, tha, nth, was, eth, for, dth)
- Word patterns
(e.g., WXYYXYYXZZX probably corresponds to MISSISSIPPI)
- Probable plaintext
(e.g., pick the word "mathematics" and drag it through the ciphertext)
- Vowel-consonant contacts
(the low-frequency letters are probably consonants; vowels
tend to appear next to them; and you can iterate as more
letters get classified)
Q: relate this to Hidden Markov models
example plaintext:
do not worry about your difficulties in mathematics.
i assure you that mine are greater. - albert einstein
ciphertext (simple substitution):
kt vtx ztrra pltex ater kuiiuoeyxumd uv gpxnmgpxuod.
u pdderm ate xnpx guvm prm jrmpxmr. - pylmrx muvdxmuv
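key, sorted by ciphertext letter (ciphertext above, plaintext below):
abcdefghijklmnopqrstuvwxyz
ykqsupmjfgdbehcazrvoinxtlw
key, sorted by plaintext letter (ciphertext above, plaintext below):
plokmijnuhbygvtfcrdxeszwaq
abcdefghijklmnopqrstuvwxyz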
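As a sanity check on the example, here is a minimal Python sketch that applies the key table (plokm... paired with abc...) using str.translate. The names CIPHER_ALPHABET, encrypt and decrypt are illustrative and not from the lecture.

```python
PLAIN_ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# From the key table in the notes: the ciphertext letter for a, b, c, ...
CIPHER_ALPHABET = "plokmijnuhbygvtfcrdxeszwaq"

ENCRYPT = str.maketrans(PLAIN_ALPHABET, CIPHER_ALPHABET)
DECRYPT = str.maketrans(CIPHER_ALPHABET, PLAIN_ALPHABET)


def encrypt(text):
    """Apply the simple substitution; spaces and punctuation pass through."""
    return text.translate(ENCRYPT)


def decrypt(text):
    """Invert the simple substitution."""
    return text.translate(DECRYPT)
```

For instance, decrypt("kt vtx ztrra") gives "do not worry".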
the above ciphertext has 86 letters in total
frequency analysis of above ciphertext:
10: x m            5: v e d
 9: u              3: g a
 8: r p            2: y o n l k i
 6: t              1: z j
digraph analysis of above ciphertext:
4: uv px
3: te rm
2: xu xn xm uo mu mr gp er at
trigraph analysis of above ciphertext:
2: gpx muv ate
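The counts above can be reproduced with a short Python sketch. It assumes that n-graphs are counted within words only (none spans a space or punctuation mark), which appears consistent with the tables; the helper names words and ngram_counts are hypothetical.

```python
import re
from collections import Counter

CIPHERTEXT = (
    "kt vtx ztrra pltex ater kuiiuoeyxumd uv gpxnmgpxuod. "
    "u pdderm ate xnpx guvm prm jrmpxmr. - pylmrx muvdxmuv"
)


def words(text):
    """Split text into lowercase alphabetic words, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(text, n):
    """Count n-graphs inside each word (n=1 gives letter frequencies)."""
    counts = Counter()
    for w in words(text):
        for i in range(len(w) - n + 1):
            counts[w[i:i + n]] += 1
    return counts


letters = ngram_counts(CIPHERTEXT, 1)
digraphs = ngram_counts(CIPHERTEXT, 2)
trigraphs = ngram_counts(CIPHERTEXT, 3)
```

Calling letters.most_common(5) or digraphs.most_common(4) then lists the most frequent entries for comparison against the English statistics.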
word patterns:
kuiiuoeyxumd: difficulties missionaries
gpxnmgpxuod: amalgamated amalgamates mathematics
u: a i
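Word-pattern matching can be sketched in Python as follows. This version assumes a strict test: two words match only if their letter-repetition patterns are identical in both directions. The function names and the tiny candidate lists are illustrative; a real attack would scan a full dictionary.

```python
def word_pattern(word):
    """Replace each letter by the index of its first appearance.

    Example: "mississippi" -> (0, 1, 2, 2, 1, 2, 2, 1, 3, 3, 1)
    """
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word)


def candidates(cipher_word, dictionary):
    """Return the dictionary words whose pattern equals the ciphertext word's."""
    target = word_pattern(cipher_word)
    return [w for w in dictionary if word_pattern(w) == target]
```

This strict test is narrower than the candidate lists above: it also rejects words in which one plaintext letter would need two different ciphertext letters, so of the listed candidates only "difficulties" and "mathematics" pass.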
How to break polyalphabetic substitution (extended Vigenere) ciphers:
- Determine the period
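- Kasiski's method: when a repeated plaintext trigraph falls at
the same position in the key, it leaks through into the
ciphertext as a repeated trigraph; look for distances between
repeated ciphertext segments, and take their gcd, which should
be a multiple of the period (discard or outvote distances that
come from accidental repeats)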
- Index of Coincidence: see HAC, Stinson
If r.v. X is some text (n characters' worth) and i,j are
independent uniform r.v.'s on {1,..,n}, let f(X) = Pr[X_i = X_j].
Then f(X) ~ 0.065 if X is random English text; f(X) = 1/26 ~ 0.038
if X is uniformly random letters (white noise); and we can
estimate f(X) from a simple sub. ciphertext x by computing
\sum_a p_a^2, where p_a denotes the fraction of characters of x
that are equal to a (a simple substitution only permutes the
p_a, so the sum is unchanged). This gives us a test to
distinguish encryptions of English text from white noise. To use
it, guess a period t and apply the test to every t-th ciphertext
character: if the guess is correct, each such subsequence is a
simple substitution of English and should score ~ 0.065;
otherwise it should score closer to 0.038.
- Autocorrelation: Compute g(t) = \sum_i 1_{x_i = x_{i+t}};
if t is a multiple of the cipher's period, we expect g(t) to be
large (say, g(t) ~ 0.065 n); otherwise, g(t) should be smaller
(say, g(t) ~ 0.038 n).
- Then, split the ciphertext into t simple substitution
ciphertexts (characters at positions congruent mod t), and
solve each separately, possibly using digraphs to link them
together.
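The period-finding tests and the final split can be sketched in Python as below. This is a rough sketch under stated assumptions: the ciphertext x is a string of letters with spaces and punctuation already removed, positions are 0-based, and kasiski_guess takes a plain gcd with no handling of accidental repeats. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from functools import reduce
from math import gcd


def kasiski_distances(x, n=3):
    """Distances between successive occurrences of each repeated n-graph."""
    positions = defaultdict(list)
    for i in range(len(x) - n + 1):
        positions[x[i:i + n]].append(i)
    return [b - a for p in positions.values() for a, b in zip(p, p[1:])]


def kasiski_guess(x, n=3):
    """gcd of all repeat distances; 0 if nothing repeats.

    Assumption: no accidental repeats. One stray distance can drag the
    gcd down to 1, so in practice one would vote on common factors.
    """
    return reduce(gcd, kasiski_distances(x, n), 0)


def index_of_coincidence(x):
    """Estimate f(X) as sum_a p_a^2 (0.0 for empty input)."""
    n = len(x)
    if n == 0:
        return 0.0
    return sum((c / n) ** 2 for c in Counter(x).values())


def columns(x, t):
    """Split x into t subsequences: positions k, k+t, k+2t, ..."""
    return [x[k::t] for k in range(t)]


def mean_column_ic(x, t):
    """Average index of coincidence over the t columns for a guessed period t."""
    return sum(index_of_coincidence(c) for c in columns(x, t)) / t


def autocorrelation(x, t):
    """g(t): number of positions i with x[i] == x[i + t]."""
    return sum(1 for i in range(len(x) - t) if x[i] == x[i + t])
```

A plausible way to use this is to compute mean_column_ic(x, t) and autocorrelation(x, t) for t = 1, 2, 3, ... and look for the values of t where both jump, then pass each string from columns(x, t) to the simple-substitution techniques.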
How to break two-time pads:
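- Text-xors: the xor of the two ciphertexts equals the xor of
the two plaintexts, since the pad cancels. Pick a crib, drag it
through the text-xor, and extend any plausible match on both ends.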
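Crib dragging can be sketched in Python as follows, assuming byte-string messages encrypted by xor with the same pad. The function names and the crude looks_like_text filter (lowercase letters and spaces only) are illustrative assumptions; real plaintexts would need a better plausibility test.

```python
def xor_bytes(a, b):
    """Xor two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def drag_crib(text_xor, crib):
    """Xor the crib against every offset of the text-xor.

    If the crib occurs in one plaintext at that offset, the result is
    the corresponding fragment of the other plaintext.
    """
    for off in range(len(text_xor) - len(crib) + 1):
        yield off, xor_bytes(text_xor[off:off + len(crib)], crib)


def looks_like_text(fragment):
    """Crude filter: only lowercase ASCII letters and spaces."""
    return all(ch == 32 or 97 <= ch <= 122 for ch in fragment)


def plausible_offsets(text_xor, crib):
    """Offsets where dragging the crib yields something text-like."""
    return [(off, frag) for off, frag in drag_crib(text_xor, crib)
            if looks_like_text(frag)]
```

Each surviving fragment suggests how to extend the crib: guess the rest of the word it appears to start or end, xor that longer guess back in, and check that the other plaintext stays readable.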
How to break linear ciphers:
- Linear algebra.
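The notes do not say which linear ciphers are meant. As one illustrative assumption, take a 2x2 Hill cipher over Z_26, where each ciphertext block is C = K P mod 26. Given two known plaintext blocks whose matrix is invertible mod 26, the key follows from K = C P^{-1}. The Python sketch below uses hypothetical helper names and needs Python 3.8+ for pow(det, -1, n).

```python
def inv2x2_mod(m, n=26):
    """Inverse of a 2x2 matrix mod n; raises ValueError if it does not exist."""
    (a, b), (c, d) = m
    det_inv = pow((a * d - b * c) % n, -1, n)
    return [[(d * det_inv) % n, (-b * det_inv) % n],
            [(-c * det_inv) % n, (a * det_inv) % n]]


def matmul_mod(x, y, n=26):
    """Product of two 2x2 matrices mod n."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) % n
             for j in range(2)] for i in range(2)]


def recover_key(plain_blocks, cipher_blocks, n=26):
    """Solve C = K P for K, where the columns of P and C are known blocks."""
    return matmul_mod(cipher_blocks, inv2x2_mod(plain_blocks, n), n)
```

Here two known plaintext/ciphertext block pairs suffice, which is the point of the lesson: for a linear cipher the data complexity and workfactor of a known-plaintext attack are tiny.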
=> Lessons: Types of attack (ciphertext-only, ...);
complexity of attack (data complexity, workfactor, ...)