Skew Detection and Correction
Our recognition algorithm relies on the characters of a page
being oriented correctly. For this reason, detecting and correcting any skew introduced when the page was scanned is vital. Fateman has done some work in this area.
Skew Detection
There are several commonly used methods for detecting skew in a page.
Some rely on detecting connected components (which, for many purposes, are
roughly equivalent to characters) and finding the average angle of the lines
connecting their centroids. The method we employed (after observing it in Fateman's program) was to project the
page at several angles and determine, for each angle, the variance in the number of
black pixels per projected line.
The projection parallel to the true alignment of the text lines will likely have the maximum variance, since each ray
projected through the image at that angle will hit either almost no black pixels (as
it passes between text lines) or many black pixels (while passing
through many characters in sequence).
Sample image and a projection parallel to its text.
Oblique projections will commonly pass through both lines of text and the spaces between them. The variance in the number of pixels hit by the individual rays will thus be smaller than in the parallel case.
Sample image and an oblique projection. Notice the uniform nature of the distribution.
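The projection method described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in this project or in Fateman's program: the image is assumed to be a list of rows holding 1 for black and 0 for white, and the function names, the search range (max_angle) and the step size are all hypothetical choices.

```python
import math

def projection_variance(image, angle_deg):
    """Variance of the black-pixel counts per projected line at one angle.

    Assumption: image is a list of rows, each a list of 0 (white) / 1 (black).
    Rays are taken to run along the direction (cos a, sin a) in (x, y) image
    coordinates, so every pixel on one ray shares the same value of
    y*cos(a) - x*sin(a).
    """
    height, width = len(image), len(image[0])
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # One bin per possible ray offset; the same bin count is used for every
    # angle so that the variances are comparable.
    counts = [0] * (height + 2 * width)
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel:
                counts[round(y * cos_t - x * sin_t) + width] += 1
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)

def estimate_skew(image, max_angle=5.0, step=0.5):
    """Return the candidate angle (degrees) whose projection has the largest variance."""
    n_steps = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(n_steps + 1)]
    return max(candidates, key=lambda a: projection_variance(image, a))
```

The sketch visits every pixel once per candidate angle, so a coarse search followed by a finer one around the best coarse angle would likely be preferable on full-size scans.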
After detecting the skew, it is necessary to correct for it.
Skew Correction
After the skew angle of the page has been detected, our recognition
algorithm requires that the page be rotated to correct for this
skew. Our rotation algorithm had to be both fairly fast and fairly accurate.
We looked into two strategies for correcting slight skews in scanned
images.
The first was a pure coordinate transformation, which takes some
time on large images but gets the rotation exact. The second, which uses
two shears to achieve its result, is somewhat less accurate but faster
(with the proper data structures) and works well for the slight skews
that are common in scanned images. Our final choice was the
coordinate rotation, since we were focusing more on accuracy than speed
and this method was not unacceptably slow.
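As an illustration of the coordinate-transformation approach, here is a minimal Python sketch. It is not the project's implementation: it assumes the same list-of-rows image representation (1 for black, 0 for white), rotates about the image centre, and uses inverse mapping with nearest-neighbour sampling, all of which are choices made for this example. The function name deskew is hypothetical.

```python
import math

def deskew(image, skew_deg, background=0):
    """Rotate image so that text running at skew_deg becomes horizontal.

    Assumption: image is a list of rows of 0/1 values. Each output pixel is
    filled by rotating its coordinates about the image centre and copying the
    nearest source pixel (inverse mapping), which leaves no holes in the output.
    """
    height, width = len(image), len(image[0])
    theta = math.radians(skew_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    result = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Source position of this output pixel in the skewed image.
            sx = round(cos_t * dx - sin_t * dy + cx)
            sy = round(sin_t * dx + cos_t * dy + cy)
            if 0 <= sx < width and 0 <= sy < height:
                result[y][x] = image[sy][sx]
    return result
```

Every output pixel requires its own coordinate computation, which is consistent with the observation that this method is slower on large images. Pixels whose source falls outside the original page are filled with the background value.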
An almost working rotation (before and after images).