De-skewing



next up previous
Next: Connected Components Up: Utility Programs Previous: Utility Programs

De-skewing

Pages scanned or printed by a mechanical device are ordinarily read at a slight angle. While humans do not usually have difficulty reading such skewed pages, it is advantageous for computer process to remove skew when possible. Two programs are provided:

skew(p) given a picture p computes heuristically a skew angle d in degrees that the picture is tilted. Typically the angle will be less than degrees.

unskew(p d) given a picture and an angle, produces a new picture that is unskewed.

There are a number of recent papers (refs) written on deskewing algorithms, but it appears that a simple and useful approach is based on a simple observation. Compute the distribution of black dots on a line-by-line basis in a page of text (or mostly text). If the scan lines are aligned with the text grid, there will be substantial numbers of blank or nearly-blank lines, and substantial numbers of lines with considerable black density. In the case of horizontal lines (say as a table-rule, underline, or mathematical fraction divide-bar), the line may even be a majority of black bits.

On the other hand, if the scan lines are at an angle to the text (think of it as 30 or 45 degrees), then the distribution of white and black bits will be far more uniform.

A statistical measure of the variance of the distribution of bit-counts is easy to compute: we, in effect, do a ``ray scan'' of lines oriented horizontally (very easy and fast), or at slight positive and negative angles. The scanning angle that corresponds most closely to the skew angle will have the highest variance. A short somewhat heuristic search can identify the ``best'' angle.

Given our representation, scanning at a zero angle is very inexpensive, but increases as the angle increases. Counting the number of black bits in a full row is a simple operation, and we can compute the variance of 6.tif at zero degrees in less than 0.1 second. The cost at one degree is more than one second: to scan a single ``row'' one must scan and sum up the count of black bits from some original row for bit positions 0 to 56 bits, and then hop up one row and scan bit positions 57 to 113 (etc.). The step-size for one degree is approximately = 57.29. Hoping up one row and skipping to the interval that encloses the appropriate bits can eventually become time-consuminggif

Fortunately, the statistical computation need not rely on looking at every scan-line, and therefore one can select every kth row (perhaps with a random perturbation), to speed up the calculation.

We have programmed two picture-oriented transformations that, when combined, can form a near-rotation of a picture. Each of the transformations is a shear: think of a stack of dominos sitting on a table. Pushing it sideways so that each domino is shifted a distance proportional to its height, is a horizontal shear. This is a row-preserving transformation we call a ``slant.'' If each row going upward progressively moves further to the right as the row number increases, we have ``italicized'' the picture. The second kind of shear is at right angles to the first shear: a column-preserving transformation we refer to as a ``tilt''. A combination of the two is nearly a rotation for small angles. An actual rotation would require that in each of the two shears, the amount of shifting varies as the row (or the column) changes (Ref: Foley, van Dam etal 829). In particular, their row-preserving transformation is and the column-preserving transformation is

How far off is the maximum error in using our near-rotation? The approximation we use is based on the fact that and (in radians) are approximately equal for small , and that is approximately one. Thus

For an angle of one degree (.01745 radians) and an 8.5 by 11 inch page scanned at 300dpi, or even at 1000dpi the true rotation point given by the formula is off by one pixel at the worst. (A page with such a slant might have been inserted in the copier offset about 0.2 inches out of 11).

At 3 degrees, and 300dpi, the center of the page is displaced = (1, 3) pixels, about 0.01 inches from the true location, and the worst case, the upper right-hand corner is displaced by (4, 4) pixels, still less than 0.02 inch off the correct position. This represents a slant of about 0.58 inches out of 11; this would be a quite noticeable skew.

We do not expect pages to be scanned skewed at angles approaching 10 degrees, since that represents a slant of about 2 inches out of 11. This is quite a crooked page.

For 10 degrees the deviation of our formula from a true rotation is substantial, with a (33, 47) displacement at the upper right-hand corner, nearly 0.2 inches.

Note that deskewing necessarily puts a jag into the data at the point of the corrections. One alternative approach to deskewing would be to maintain all intra-connected-component relationships and only deskew BETWEEN such objects. This keeps the letter shapes unchanged, but levels out the base-lines of text. For small letters this is fine, but presents a problem for those connected components, the long divide bar in fractions, for example that we would also like to deskew. For the present, we deskew full pages and suffer the consequences of the occasional letter with a jag in it.



next up previous
Next: Connected Components Up: Utility Programs Previous: Utility Programs



Class Account
Fri Dec 1 14:31:16 PST 1995