First week: Introduction
Aug 28: #1: Introduction (Richard and Henry)
Course outline, projects, theory, practice. Why this is interesting,
challenging, and fun.
READINGS:
This next paper is useful, but it really describes the authors' own specific system (which is nearly, but not entirely, "generic" to the field): S. Tsujimoto & H. Asada, ``Major Components of a Complete Text Reading System,'' reprinted in O'Gorman & Kasturi, pp. 298-314.
Aug 30: #2: Applications: the major niche markets (Henry)
postal codes, checks, tables, forms, ...
nominally "general" OCR: desktop OCR; state of the art
online "electronic ink"
multilingual challenges; open problems
G. Nagy, S. Seth, & M. Viswanathan, ``DIA, OCR, & the WWW,'' in Bunke & Wang, pp. 729-755.
A. L. Spitz, ``Multilingual Document Recognition,'' in Bunke & Wang, pp. 259-284.
Sept 6: #3: The prerequisite engineering (Richard)
what the hardware provides
TIFF: a simplified view
compressed forms: token-based compression, DigiPaper, CPC, DjVu, PDF
output formats: ASCII, Unicode, XML
(Note: the following links are extracted from the PPT presentations and are worth perusing for background reading and projects.)
Sept 11: #4: Methodological Outline (Richard)
- contrast w/ computer vision, image processing, ...
- contrast w/ Gestalt theory, human reading, psychophysics of reading
- image encodings, esp. high-resolution bi-level raster images
- separation of text from non-text: halftones, graphics
- partitioning (segmentation into blocks, textlines, words, etc)
- morphological transforms, thinning
READINGS:
Unfortunately, this next paper doesn't really give concrete examples; the lecture notes should help orient you so that you can make sense of it. The paper is also not comprehensive, in that it avoids medial axis computation (deferred to part 2).
Sept 13: #5: Metrics for Success (Richard)
- Most tasks are currently performed rather well by humans, but not all (e.g. fast browsing and word-spotting)
- Performance measures are dictated by downstream use
  - mostly non-human users (i.e. programs)
  - decision-theoretic cost model
    - string-matching measures (Levenshtein distance, etc.)
  - cost structure imposed by post-editing; effects of rejects and substitutions
  - ("economic value of information")
  - error/reject characteristics
- error feedback, detection, and correction depend on the application
- linguistic constraints
  - morphological, lexical, syntactic, semantic
  - language-based differences
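As a concrete illustration of the string-matching measures mentioned above, here is a minimal Python sketch of Levenshtein (edit) distance and a character error rate derived from it. The function names and the choice to normalize by the length of the ground-truth string are assumptions made for illustration; they are not taken from any particular scoring program used in the course.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; start with the empty prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(truth: str, ocr_output: str) -> float:
    """Edit distance normalized by ground-truth length (an assumed convention)."""
    return levenshtein(truth, ocr_output) / max(len(truth), 1)


if __name__ == "__main__":
    # One substitution ("i" read as "l") in an 11-character word.
    print(levenshtein("recognition", "recognitlon"))
    print(character_error_rate("recognition", "recognitlon"))
```

In a decision-theoretic setting the three edit operations would usually carry different costs (and rejects a cost of their own); the uniform unit cost here is the simplest case.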
READINGS:
T. A. Nartker, "Benchmarking DIA Systems," in Bunke & Wang, pp. 801-820.
PARC scoring program?
Hal Varian: Economics and Search: http://www.sims.berkeley.edu/~hal/Papers/sigir/sigir.html (though not entirely what we want, it includes a discussion of the value of uncertain information)
The rest of this readings list will be filled in as we progress through the course.
Approximate lecture numbers are noted.
6-10 Symbol Recognition
11-12 Recognition and Matching
13-14 Physics of Document Images
15-18 Layout Analysis
19-23 Contextual Analysis
24-25 Coding the Answer: outputs and their downstream fates
26-27 Proofreading, correction, interaction
28-30 Odds and Ends, student presentations