First week: Introduction
Aug 28: #1: Introduction (Richard and Henry)
Course outline, projects, theory, practice. Why this is interesting,
challenging, and fun.
READINGS:
The next paper is useful, but it really describes the authors' own specific system (which is nearly, but not entirely, “generic” to the field):
S. Tsujimoto & H. Asada, ``Major Components of a Complete Text Reading System,'' reprinted in O'Gorman & Kasturi, pp. 298-314.
Aug 30: #2: Applications: the major niche markets (Henry)
postal codes, checks, tables, forms, ...
nominally "general" OCR: desk-top OCR; state of the art
online "electronic ink"
multilingual challenges
open problems
READINGS:
G. Nagy, S. Seth, & M. Viswanathan, ``DIA, OCR, & the WWW,'' in Bunke & Wang, pp. 729-755.
A. L. Spitz, ``Multilingual Document Recognition,'' in Bunke & Wang, pp. 259-284.
Sept 6: #3: The prerequisite engineering (Richard)
what does hardware provide
TIFF: a simplified view
compressed forms: token-based compression, DigiPaper, CPC, DjVu, PDF
output formats: ASCII, Unicode, XML
(Note: the following links are extracted from the PPT presentations and are worth perusing for background reading and projects.)
READINGS:
Unfortunately, the next paper doesn't really give concrete examples; the lecture notes should help orient you so you can make some sense of it. The paper is also not comprehensive, in the sense that it avoids medial axis computation (deferred to part 2...).
Sept 13: #5: Metrics for Success (Richard)
- Most tasks are currently performed rather well by humans, but not all: e.g. fast browsing / word-spotting
- Performance measures dictated by downstream use
  - mostly non-human users (i.e. programs)
  - decision-theoretic cost model
    - string-matching measures (Levenshtein distance, etc.)
  - cost structure imposed by post-editing, effects of rejects and substitutions
  - (“economic value of information”)
  - error/reject characteristics
- error feedback/detection/correction dependent on application
- linguistic constraints
  - morphological, lexical, syntactic, semantic
  - language-based differences
READINGS:
T. A. Nartker, ``Benchmarking DIA Systems,'' in Bunke & Wang, pp. 801-820.
PARC scoring program?
Hal Varian, ``Economics and Search,'' http://www.sims.berkeley.edu/~hal/Papers/sigir/sigir.html (though not entirely what we want, it includes a discussion of the value of uncertain information)
The rest of this readings list will be filled in as we progress through the course. Approximate lecture numbers are noted.
6-10 Symbol Recognition
11-12 Recognition and Matching
13-14 Physics of Document Images
15-18 Layout Analysis
19-23 Contextual Analysis
24-25 Coding the Answer: outputs and their downstream fates
26-27 Proofreading, correction, interaction
28-30 Odds and Ends, student presentations
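As a small illustration of the string-matching measures listed under lecture #5 (Metrics for Success), here is a minimal sketch of Levenshtein distance between a ground-truth string and an OCR output string. The function names and the normalization by reference length are assumptions made for illustration; they are not taken from the readings or from any particular scoring program.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `ref` into `hyp`."""
    # prev[j] is the distance between the previous prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Edit distance divided by reference length (hypothetical helper)."""
    if not ref:
        raise ValueError("reference string must be non-empty")
    return levenshtein(ref, hyp) / len(ref)
```

For example, levenshtein("recognition", "recognitlon") is 1, a single substitution. A cost model that prices rejects and substitutions differently would replace the unit costs above with application-specific weights.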
Sept 11: #4: Methodological Outline (Richard)
- contrast w/ computer vision, image processing, ...
- contrast w/ Gestalt theory, human reading, psychophysics of reading
- image encodings, esp. high-resolution bi-level raster images
- separation of text from non-text: halftones, graphics
- partitioning (segmentation into blocks, textlines, words, etc)
- morphological transforms, thinning
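To make the last item a little more concrete, the following is a minimal sketch of the two basic morphological transforms, erosion and dilation, on a bi-level raster image. It assumes a 3x3 square structuring element, represents the image as a list of rows with 1 for black and 0 for white, and treats pixels outside the image as white; all of these choices, and the function names, are illustrative assumptions rather than the method used in lecture. Thinning is not shown here.

```python
def _neighborhood(img, r, c):
    """Yield the 3x3 neighborhood of pixel (r, c); off-image pixels count as white (0)."""
    rows, cols = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def erode(img):
    """A pixel stays black only if its whole 3x3 neighborhood is black."""
    return [[int(all(_neighborhood(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def dilate(img):
    """A pixel becomes black if any pixel in its 3x3 neighborhood is black."""
    return [[int(any(_neighborhood(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]
```

Erosion followed by dilation (an "opening") removes isolated specks smaller than the structuring element while leaving larger solid regions roughly intact, which is one reason these operations appear early in document image cleanup.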