Informal talk at UWO June 11, 2003
Problems and Solutions in Recognizing Handwritten Mathematics
Prof. Richard Fateman
Computer Science
University of California, Berkeley
This talk is based on our experience at UC Berkeley in writing
programs to read mathematics formulas from printed form and
translating these into an electronic form that can be manipulated
further. There are differences in the commercial approach to
OCR (or more properly, Document Image Analysis) and handwriting,
but it is likely that for the task of reading math, the commercial
approaches tend to be misdirected. One may be thrown back on
one's own resources, for reasons to be explained below.
There are many plausible routes to solving each of the related small
problems, many of which lead to dead ends, delicate and error-prone
methods, and morasses of unmaintainable programs.
In the hundreds of projects of this nature implemented since 1963,
intractable difficulties are often overcome by over-simplifying the
problem. By declaring success on toy problems, one can then move
on to other problems, or (often) pick a well-defined sub-problem
and address that.
We discuss the route taken by our group at Berkeley to resolve issues
of character recognition, disambiguation of characters and other
symbols, lexical recognition, geometric parsing, (e.g. pictures into
TeX), and finally ambiguous string-math parsing into a superset of
``Maple'' or equivalent.
We speculate on the true advantages of on-line (real-time handwriting)
recognition and other uses of interaction to improve the communication
of mathematics from humans to computers.
....
points
A program that makes too many mistakes is slow, inefficient, useless.
Correcting an incorrect result changes the dynamics and
cost considerably.
If handwriting input is NOT the right thing to do, you can consider other
tasks, like gesture-directed editing of draft mathematics, for
which the same equipment -- pen, tablet, voice input -- could be used.
You can consider math as a subset of text, or text as merely a special
case of ``linear math''. It is unlikely that anyone else's text recognition
program will be remotely useful in math unless special efforts have
been made to accomodate math. It is unlikely that YOUR efforts
to do a bang-up job on math will, as a side effect do a superior
job on text unless you use all the tricks of the trade, including
letter frequencies, dictionaries, specialized or common sense knowledge,
etc.
Special symbols, integral, sigma, especially lines >=, a/b etc
more attention to super/sub script, size, etc.
Oddly enough you may not care to distinguish 1l, O0, etc. as characters,
since any effort you make at the character level is devoid of context.
Later on you can distinguish 3.l4l5 from 3.1415, and O.OO1 from 0.001.
.........
How we expect handwriting math to work (first approach, believing
it will in fact work like humans).
Phase 1.
Blank tablet --> write a symbol --> collect character id, location, size,
(offshoot of this: uncertainty, error estimation, correction, LEARNING)
LOSING information: details of strokes, timing, less-likely
interpretations. (Do we maintain a list of possibilities, e.g. 1
20\%, I 15\% | 10 \% ? We don't but you could, if you want to make
parsing much harder, but with a chance of recovery. We prefer to
leave ambiguity in the character, e.g. ``vertical stroke''. Most
other uncertainties, e.g. alpha or ``a'' will NOT be resolved later and
so what can one do? exponentially many parses?
Phase 2: Collection of boxes:
a. process boxes with horizontal lines to distinguish -, = overbar, underbar,
>=, a/b, etc.
Also, Look at all the DOTS as separate characters, even if you think that they might be parts of letters like i, j.
Consider
5.6 7.8
h k which is (* (^ h 5.6)(k 7.8))
vs
5.6 7.8
h l k which is perhaps (* (^ h 5)(^i 6)(k 7.8))
5 6 7.8
h i k
.
X makes sense ... (deriv x t) usually.
b. process lexical tokens, e.g. 3, ., 1, 4, ...--> 3.14, a number
s i n -> sin an operator
LOSING information: exact sizes of characters.
Phase 3:
{One can formalize the geometric parsing for sub, super, etc
grammar. This was done rather exhaustively early on, R. Anderson (1965 or so)
and try to do this with great generality. This is probably
a losing technology.
Maintaining a consistent unambiguous grammar is probably impossible
for any decent math subset.
Recursive descent left-to-write seems more plausible (implemented!) and
essentially instantaneous. Flexibility in modifying the patterns based
on heuristics (gaps, margins, sizes) on a per-operator basis.
Essentially, based on a governing operator, match patterns in some
order, above, below, right, recursively.
Special hacks for fences or brackets.
Concepts of looking recursively for an expression in a region, e.g.
above a divide bar, above and to the right of an expression (superscript)
but delimited how? (How far above and to the right do you look)
Hand made recognizer is not very large and can be extended
to cover examples.}
Collect via parsing, left to right top-down, and hope there are
no exceptions to this general rule. Start with the whole collection
and look for the leftmost symbol. These boxes still have a lot of
geometric info.
e.g.
divide-bar --> recursively parse numerator in ``above'' region
and in below region. Remove pieces as recognized, and build up
a block.
then if still more elements, concatenate to the right (etc)
operator, implicit multiply look to the right of the collection so far.
similar rules for sigma, integral (look for the dx)
bracket operators like () or [] must be balanced, generally.
exception for \{ \} which may be a conditional or choice.
How to deal with multiple lines... paste lower lines to the right of
the initial line.
Result is something like
(hbox (dividebox (superbox a x) (subbox b 3)) z) .
LOSING information : Geometric information is now relative,
not absolute and numeric.
Phase 4 prepare the result
Now this is a linear expression. write it algebraically instead
of geometrically.
(superbox a x) becomes (expt a x),
(hbox a b) becomes (times a b)
(hbox sin x) becomes (sin x) or maybe (apply sin x)
note that \verb|\cos^2x| is just fine for \TeX, but
Maple needs \verb|(cos(x))\ ^ 2|
We can convert some form (I used a convenient Lisp parenthesized
prefix form) to TeX , Maple, MathML. In each case one LOSES
even more information.