Informal talk at UWO June 11, 2003 Problems and Solutions in Recognizing Handwritten Mathematics Prof. Richard Fateman Computer Science University of California, Berkeley This talk is based on our experience at UC Berkeley in writing programs to read mathematics formulas from printed form and translating these into an electronic form that can be manipulated further. There are differences in the commercial approach to OCR (or more properly, Document Image Analysis) and handwriting, but it is likely that for the task of reading math, the commercial approaches tend to be misdirected. One may be thrown back on one's own resources, for reasons to be explained below. There are many plausible routes to solving each of the related small problems, many of which lead to dead ends, delicate and error-prone methods, and morasses of unmaintainable programs. In the hundreds of projects of this nature implemented since 1963, intractable difficulties are often overcome by over-simplifying the problem. By declaring success on toy problems, one can then move on to other problems, or (often) pick a well-defined sub-problem and address that. We discuss the route taken by our group at Berkeley to resolve issues of character recognition, disambiguation of characters and other symbols, lexical recognition, geometric parsing, (e.g. pictures into TeX), and finally ambiguous string-math parsing into a superset of ``Maple'' or equivalent. We speculate on the true advantages of on-line (real-time handwriting) recognition and other uses of interaction to improve the communication of mathematics from humans to computers. .... points A program that makes too many mistakes is slow, inefficient, useless. Correcting an incorrect result changes the dynamics and cost considerably. If handwriting input is NOT the right thing to do, you can consider other tasks, like gesture-directed editing of draft mathematics, for which the same equipment -- pen, tablet, voice input -- could be used. You can consider math as a subset of text, or text as merely a special case of ``linear math''. It is unlikely that anyone else's text recognition program will be remotely useful in math unless special efforts have been made to accomodate math. It is unlikely that YOUR efforts to do a bang-up job on math will, as a side effect do a superior job on text unless you use all the tricks of the trade, including letter frequencies, dictionaries, specialized or common sense knowledge, etc. Special symbols, integral, sigma, especially lines >=, a/b etc more attention to super/sub script, size, etc. Oddly enough you may not care to distinguish 1l, O0, etc. as characters, since any effort you make at the character level is devoid of context. Later on you can distinguish 3.l4l5 from 3.1415, and O.OO1 from 0.001. ......... How we expect handwriting math to work (first approach, believing it will in fact work like humans). Phase 1. Blank tablet --> write a symbol --> collect character id, location, size, (offshoot of this: uncertainty, error estimation, correction, LEARNING) LOSING information: details of strokes, timing, less-likely interpretations. (Do we maintain a list of possibilities, e.g. 1 20\%, I 15\% | 10 \% ? We don't but you could, if you want to make parsing much harder, but with a chance of recovery. We prefer to leave ambiguity in the character, e.g. ``vertical stroke''. Most other uncertainties, e.g. alpha or ``a'' will NOT be resolved later and so what can one do? exponentially many parses? Phase 2: Collection of boxes: a. process boxes with horizontal lines to distinguish -, = overbar, underbar, >=, a/b, etc. Also, Look at all the DOTS as separate characters, even if you think that they might be parts of letters like i, j. Consider 5.6 7.8 h k which is (* (^ h 5.6)(k 7.8)) vs 5.6 7.8 h l k which is perhaps (* (^ h 5)(^i 6)(k 7.8)) 5 6 7.8 h i k . X makes sense ... (deriv x t) usually. b. process lexical tokens, e.g. 3, ., 1, 4, ...--> 3.14, a number s i n -> sin an operator LOSING information: exact sizes of characters. Phase 3: {One can formalize the geometric parsing for sub, super, etc grammar. This was done rather exhaustively early on, R. Anderson (1965 or so) and try to do this with great generality. This is probably a losing technology. Maintaining a consistent unambiguous grammar is probably impossible for any decent math subset. Recursive descent left-to-write seems more plausible (implemented!) and essentially instantaneous. Flexibility in modifying the patterns based on heuristics (gaps, margins, sizes) on a per-operator basis. Essentially, based on a governing operator, match patterns in some order, above, below, right, recursively. Special hacks for fences or brackets. Concepts of looking recursively for an expression in a region, e.g. above a divide bar, above and to the right of an expression (superscript) but delimited how? (How far above and to the right do you look) Hand made recognizer is not very large and can be extended to cover examples.} Collect via parsing, left to right top-down, and hope there are no exceptions to this general rule. Start with the whole collection and look for the leftmost symbol. These boxes still have a lot of geometric info. e.g. divide-bar --> recursively parse numerator in ``above'' region and in below region. Remove pieces as recognized, and build up a block. then if still more elements, concatenate to the right (etc) operator, implicit multiply look to the right of the collection so far. similar rules for sigma, integral (look for the dx) bracket operators like () or [] must be balanced, generally. exception for \{ \} which may be a conditional or choice. How to deal with multiple lines... paste lower lines to the right of the initial line. Result is something like (hbox (dividebox (superbox a x) (subbox b 3)) z) . LOSING information : Geometric information is now relative, not absolute and numeric. Phase 4 prepare the result Now this is a linear expression. write it algebraically instead of geometrically. (superbox a x) becomes (expt a x), (hbox a b) becomes (times a b) (hbox sin x) becomes (sin x) or maybe (apply sin x) note that \verb|\cos^2x| is just fine for \TeX, but Maple needs \verb|(cos(x))\ ^ 2| We can convert some form (I used a convenient Lisp parenthesized prefix form) to TeX , Maple, MathML. In each case one LOSES even more information.