First week: Introduction

Aug 28: #1: Introduction (Richard and Henry)
 

Course outline, projects, theory, practice. Why this is interesting,
challenging, and fun.
 

READINGS:

L. O'Gorman & R. Kasturi, ``Text Analysis and Recognition,'' Chapter 4 Preface in O'Gorman & Kasturi, pp. 161-179.

S. Mori, C. Y. Suen, & K. Yamamoto, ``Historical Review of OCR Research and Development,'' reprinted in O'Gorman & Kasturi, pp. 244-273.

This next paper is useful, but it really describes the authors' own specific system (which is nearly, but not entirely, “generic” to the field):

S. Tsujimoto & H. Asada, ``Major Components of a Complete Text Reading System,'' reprinted in O'Gorman & Kasturi, pp. 298-314.

Aug 30: #2: Applications: the major niche markets (Henry)

   postal codes, checks, tables, forms, ...
   nominally "general" OCR:  desk-top OCR;  state of the art
   online "electronic ink"
   multilingual challenges; open problems

G. Nagy, S. Seth, & M. Viswanathan, ``DIA, OCR, & the WWW,'' in Bunke & Wang, pp. 729-755.

P. W. Palumbo & S. N. Srihari, ``Postal Address Reading in Real Time,'' Int'l J. of Imaging Systems & Tech, Vol. 7, No. 4, Winter 1996, pp. 370-378.

C. Y. Suen, L. Lam, D. Guillevic, N. W. Strathy, M. Cheriet, J. N. Said, & R. Fan, ``Bank Check Processing System,'' Int'l J. of Imaging Systems & Tech., Vol. 7, No. 4, Winter 1996, pp. 392-403.

 A. L. Spitz, ``Multilingual Document Recognition,'' in Bunke & Wang, pp. 259-284.

 

Sept 6: #3: The prerequisite engineering (Richard)

  what the hardware provides
  TIFF: a simplified view
  compressed forms:  token-based compression, Digipaper, CPC, DjVu, PDF
  output formats:  ASCII, Unicode, XML
(Note: the following links were extracted from the PPT presentations and are worth perusing for background reading and projects.)

Information on TIFF

 

 

Digipaper (Xerox)

 

 

CPC (Cartesian) compression

 

DjVu

JSTOR application

 

 

ResearchIndex

 

Multivalent documents

 

 

Tilepics

 

Unicode

 
Sept 11: #4: Methodological Outline  (Richard)
 
   - contrast w/ computer vision, image processing, ...
   - contrast w/ Gestalt theory, human reading, psychophysics of reading
   - image encodings, esp. high-resolution bi-level raster images
   - separation of text from non-text:  halftones, graphics
   - partitioning (segmentation into blocks, textlines, words, etc)
   - morphological transforms, thinning
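
As a rough illustration of the morphological transforms listed above, here is a minimal Python sketch of binary erosion, dilation, and opening on a bi-level raster image. The 3x3 square structuring element, the list-of-lists image representation, and all function names are assumptions made for this example; they are not taken from the lecture notes or the readings.

```python
# Sketch only: binary erosion and dilation on a bi-level raster image,
# using a 3x3 square structuring element (an assumption for illustration).
# Pixels are 1 (ink) or 0 (background); pixels outside the image count as 0.

def _get(img, r, c):
    """Return the pixel at (r, c), treating out-of-bounds as background."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0

def erode(img):
    """Keep a pixel only if its entire 3x3 neighbourhood is ink."""
    return [[int(all(_get(img, r + dr, c + dc)
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
             for c in range(len(img[0]))]
            for r in range(len(img))]

def dilate(img):
    """Set a pixel if any pixel in its 3x3 neighbourhood is ink."""
    return [[int(any(_get(img, r + dr, c + dc)
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
             for c in range(len(img[0]))]
            for r in range(len(img))]

def opening(img):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(img))
```

Applied to a scanned page, an opening of this kind would remove isolated "salt" noise pixels while leaving strokes at least three pixels wide largely intact. Thinning, covered in the Lam, Lee & Suen survey, requires additional connectivity-preserving rules and is not attempted here.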
 

READINGS:

T. M. Ha & H. Bunke, ``Image Processing Methods for Document Image Analysis,'' in Bunke & Wang, pp. 1-48.

Unfortunately, this next paper doesn’t really give concrete examples; the lecture notes should help orient you so that you can make sense of it. The paper is also not comprehensive, in that it avoids medial axis computation (deferred to part 2).

 L. Lam, S-W Lee, & C. Y. Suen, ``Thinning Methodologies -- A Comprehensive Survey,'' reprinted in O'Gorman & Kasturi, pp. 61-77.

 

Sept 13: #5: Metrics for Success (Richard)

   - Most tasks are currently performed rather well by humans, but not all; e.g. fast browsing / word-spotting
   - Performance measures dictated by downstream use
      - mostly non-human users (i.e. programs)
      - decision-theoretic cost model
         - string-matching measures (Levenshtein distance, etc.)
      - cost structure imposed by post-editing; effects of rejects and substitutions
      - (“economic value of information”)
      - error/reject characteristics
   - error feedback/detection/correction dependent on application
   - linguistic constraints
      - morphological, lexical, syntactic, semantic
      - language-based differences
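
To make the string-matching measures mentioned above concrete, the following Python sketch computes the Levenshtein (edit) distance between a reference transcription and an OCR output, together with a simple character accuracy derived from it. The unit cost for every insertion, deletion, and substitution, the function names, and the accuracy formula are assumptions for this example; a real scoring program may weight errors and rejects differently, as the cost-model items in the list suggest.

```python
def levenshtein(ref, hyp):
    """Minimum number of single-character insertions, deletions, and
    substitutions separating hyp from ref (unit cost each; an assumption)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, rc in enumerate(ref, start=1):
        curr = [i]
        for j, hc in enumerate(hyp, start=1):
            cost = 0 if rc == hc else 1
            curr.append(min(prev[j] + 1,          # ref character missing from hyp
                            curr[j - 1] + 1,      # extra character in hyp
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def character_accuracy(ref, hyp):
    """(n - errors) / n, where n is the reference length. Can be negative
    when hyp contains many spurious characters."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return (len(ref) - levenshtein(ref, hyp)) / len(ref)
```

For example, if the reference word is "modern" and the recognizer returns "modem" (the familiar "rn" read as "m" confusion), the distance is 2 and the character accuracy is 4/6.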

READINGS:

T. A. Nartker, ``Benchmarking DIA Systems,'' in Bunke & Wang, pp. 801-820.

PARC scoring program?

Hal Varian: Economics and Search: http://www.sims.berkeley.edu/~hal/Papers/sigir/sigir.html (though not entirely what we want, it includes a discussion of the value of uncertain information)

The rest of this readings list will be filled in as we progress through the course.

Approximate lecture numbers are noted.

6-10 Symbol Recognition

11-12 Recognition and Matching

13-14 Physics of Document Images

15-18 Layout Analysis

19-23 Contextual Analysis

24-25 Coding the Answer: outputs and their downstream fates

26-27 Proofreading, correction, interaction

28-30 Odds and Ends, student presentations