Find connected components Local Morphological transformations Analyze grey scale.? Learn characters by properties Look up characters by properties Compare bitmaps Top-down tools Scanning at angles Skew detection De-skewing Parsing of rectangular regions / Math Other kinds of parsing
THE FOLLOWING UNORDERED PIECES ARE GATHERED HERE FOR REFERENCE...
Computing property vectors.
As one classification technique for identifying connected
components as potential characters,
we
divide the rectangular region into by
regions and count
the ratio of black vs. white bits in each region.
(we currently use 5 by 5, but more or fewer divisions could be used).
For each character we compute the 25 values, each
scaled from 0 to 255, for convenience in storage.
The cost for this computation
(for the 5 by 5 scale) averages (for page35) to about .51ms, or
about 1958 computations per second. (1.09 seconds for the 2134
items on page35).
We can compute
the (optionally, weighted) distance between property vectors and
as
the Euclidean distance:
.
We found it advantageous to make an ``absolute'' grouping
for complete characters; that is
we would only bother to compare
and
if they were within 10%
in absolute height and their height to width ratio was within 10%.
This characterization assume (in general this is an optimistic
assumption) that all the parts of the character are connected, or
have, by means of heuristics, been put into a single picture form.
Note that there are some absolutely or relatively tall or wide characters in our domain of interest: divide bars for fractions; tall parentheses or integral signs, etc.
Another routine computes the count of the bits in the exclusive-or
of two bitmaps A and B.
This is done rather faster than computing the x-or followed
by a bitcount:
First note that
we can get a quick upper and lower bound on K=count(xor(A,B)):
assume we precompute count(A) and count(B). Then
let M= max(count(A),count(B)), N=min(count(A),count(B)).
also if we compute count(and(A,B)), then K = count(A)+count(B)-2 count(and(A,B)).
The big hazard here is that to make this a good detector of similarity, the bitmaps for A and B must be aligned accurately. To achieve this with minimal cost, it seems appropriate to pre-compute for each template, a number of shifted images. This tends to proliferate templates dramatically: In addition to shifting North/South and East/West, one can shift NW (etc.). Programs to shift are called:
(In the past we have used other measures such as the Hausdorf distance; this is less sensitive to minor shifts or rotations. We are unconvinced that the complexity of this computation is justified.)
Technology used: All of the library libtiff is available in the lisp system. The specifications are mapped from the usual C-interface into Lisp, roughly as follows. We have prepared a special version of the Allegro Common Lisp system by loaded it with libtiff, and then dumping it back out. Thus subsequent users do not have to specify the library, nor wait for the loading. Next, any functions that are needed are mapped on to lisp names: for example, the declarations for TIFFOpen and TIFFReadScanline look like this:
(defforeign 'tiffopen :entry-point (convert-to-lang "TIFFOpen") :arguments '(string string) :return-type :integer) (defforeign 'tiffreadscanline :entry-point (convert-to-lang "TIFFReadScanline") :arguments '(t array integer integer) )
Any conversion of Lisp's data types (string, integer, etc) to the C conventions is done automatically, with the exception of return-values by reference. In such cases, the Lisp programmer passes a fixnum array A of length 1 as the argument to the C program, and on return, the value of A[0] (or in Lisp, (aref a 0) is set).
We wrote one program in C, called solely by tiff2pict to convert the tiff scanlines into interval endpoint data, and continue to use it, although it provides only about a factor of 2 speed improvement.
implementation notes: tiff2pict calls a C-language routine defined in the file cfuns10.c with two arguments. one is a bit array that has just been filled in by TIFFreadscanline, and the other is a scratch-array of fixnums. The vector of bits represents background and foreground colors in the picture. Unfortunately, the order of the bits is backwards in bytes, so we have to read them out (this is what the C routine does) in order 7-6-5-4-3-2-1-0-15-14-13-12-11-10-9-8 etc. When the color changes from background to foreground, we record in next 16 bits of the fixnum array, the index or position of the start of a foreground interval. When the color changes to background, we record in the next 16 bits of the fixnum array, the end. This working array has been provided by the Lisp program. Now the array looks like
-------------------------------- | s0 | e0 | s1 | e1 | ... -------------------------------- (each segment is 16 bits).
on returning to Lisp this object as an array of 32-bit fixnum quantities. Because we are going to be handling many such vectors, of varying lengths, and in a manner that may require us to add or delete intervals from the middle, a linked list representation of this would seem to be a good representation. Thus we can represent a scanline from the file (a row in the picture) by a list like ((s0 . e0) (s1 . e1) ....) or we could have a list of pairs of 16-bit numbers packed into 32-bit INOBs ``Immediate Number-Like Objects''. That is, n0 = s0e0 packed together, n1 = s1e1 packed together, etc. Thus the row is now a list (n0 n1 ...). Since INOBs, as long as we do not use ALL 32 bits, but only 30, are packed nicely by lisp into the same space as a pointer, very little excess storage allocation is needed.
There is nothing essential in the computing requiring the C-language program, and in fact we have a version entirely in Lisp: This also calls the tiff library, but does the fancy footwork for decoding, directly in lisp. This is called tiff3pict. In fact, the Lisp system's handling of bit vectors seems sufficiently fast that we left the program pict2tiff without a C-language helper.
The major operator of the C program is to count how many bits in a row are the same color, where ``in a row'' involves counting backwards in a byte (Presumably this is caused of byte-order differences between our scanner's native mode and our machine's native mode.)