The first cut at computing with images is likely to be based on dealing with an array of numbers representing pixel values/colors. The value may be an index into a color map, or some other quantity. In the case of a black and white image, an array of zero and one bits seems to suffice.
We found this initial approach impractical except for rather small images and/or low resolutions. Recall that we are scanning at 400 to 600 dots per inch. Displaying a 600 dpi, 8.5 by 11 inch picture on a 72 dot-per-inch workstation screen means that we would need a display linearly about 8.3 times larger than the document size, or about 7.6 by 6 feet. It also means we are using about 4.2 megabytes of RAM for the buffer, at 1 bit per pixel.
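Spelling out the arithmetic (our own derivation from the figures just given): the linear scale factor is $600/72 \approx 8.3$, so the on-screen page would measure about $8.3 \times 11 \approx 92$ inches ($\approx 7.6$ feet) by $8.3 \times 8.5 \approx 71$ inches ($\approx 5.9$ feet), and the buffer needs
$$ (8.5 \times 600) \times (11 \times 600) = 5100 \times 6600 = 33{,}660{,}000 \ \text{bits} \approx 4.2 \ \text{megabytes}. $$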
We are not saying that one needs to display such full pages at full resolution, just that this is a substantially sized array, and computing with it on a bit-wise basis is not something that one should do casually.
Normally, some substantially compressed data format is used for raster images. Example:
6.tif is a b/w TIFF file of a picture of a page approximately 8.2 by 5.7 inches in size, scanned at about 600 dots per inch. It happens to be a page (half of a double-page spread that we scanned at the same time) from a table of mathematical formulas [][]. This image, comprising 3408 by 5064 bits, is delivered by our scanner and associated software as a compressed TIFF file of 72,592 bytes. In terms of black and white bits, uncompressed, it would be about 17.3 megabits, or 2.16 megabytes. Thus the TIFF file represents about a 30:1 compression.
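Explicitly, the arithmetic is:
$$ 3408 \times 5064 = 17{,}258{,}112 \ \text{bits} = 2{,}157{,}264 \ \text{bytes}, \qquad \frac{2{,}157{,}264}{72{,}592} \approx 29.7 \approx 30{:}1. $$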
At first blush it would seem that to do any processing we would have to unpack the image to bits, and it is certainly possible to do so. Yet the raster image as an array is actually not as convenient as one might think. A basic operation in character recognition and document processing would seem to require access to successive individual bits: one is routinely trying to find out the extent of a uniformly-colored region. Yet access to the individual adjacent bits in a word, one by one, may not be the ideal access; a better result would be one that provided higher-level groupings. Ideally, if we had one grouping per character, much of our processing would be done! In point of fact, we have changed our representation to a run-length encoding, and have found this to be enormously advantageous in space and time.
Consider an image to be a sequence of (say, horizontal) scan-lines. Each scan-line is a sequence of pairs $(s, e)$, where $s$ is an integer denoting the start of a section of black bits, and $e$ is the corresponding end index. The endpoints of these intervals can be encoded in a modest number of bits: for example, with 15 bits we can encode numbers up to $2^{15} = 32768$; assuming 600 dpi this limits us to 54.6 inches across, which is far wider than our scanners.
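As a concrete illustration, here is a minimal Common Lisp sketch of this conversion for a single scan-line (the function name and the bit-vector input are our own assumptions, not the paper's code):

    (defun runs-in-row (row)
      "Convert a bit-vector ROW into a list of (start . end) pairs,
    one pair per maximal run of 1 (black) bits."
      (let ((runs '()) (start nil))
        (dotimes (i (length row))
          (cond ((and (= (aref row i) 1) (null start))
                 (setf start i))                  ; a black run begins here
                ((and (= (aref row i) 0) start)
                 (push (cons start (1- i)) runs)  ; the run just ended
                 (setf start nil))))
        (when start                               ; a run reaches the row's edge
          (push (cons start (1- (length row))) runs))
        (nreverse runs)))

For example, (runs-in-row #*0011100011) returns ((2 . 4) (8 . 9)): each pair gives the inclusive start and end indexes of one black interval.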
Converted into our picture data as intervals, 6.tif is 3408 lines of intervals. If we scan this along the long dimension (arguably the less efficient way if we wish to compress horizontal lines efficiently), we find this particular page has a total of 78,873 entries in the scan lines, where, as indicated above, each entry is a pair of numbers. How much storage does this take? Our current design uses a few words for descriptive header information, and then:
1. An array of 3408 words (pointers) to the data structure for each scan-line;
2. A scan-line structure of variable length. The conceptually simplest Lisp structure would use 4 words for each pair: two Lisp CONS cells, one for maintaining the ``backbone'' of the list, and another for the pair (s . e) of start and end indexes. This would look like ((23 . 47) (102 . 131) ...).
With this encoding, we would use a total of 315,492 words for the lists of pairs. Considering the array and the pairs together, we have a total of 318,900 words, or 1.275 megabytes, for the representation of this picture.
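Spelled out, the accounting is:
$$ 78{,}873 \times 4 = 315{,}492 \ \text{words}, \qquad (315{,}492 + 3408) \times 4 \ \text{bytes} = 1{,}275{,}600 \ \text{bytes} \approx 1.275 \ \text{megabytes}. $$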
This is somewhat wasteful: since the indexes are numbers less than $2^{15}$, two can easily be stored as ``immediate'' values in the same space as a 32-bit pointer. In particular, in our Lisp, numbers of fewer than about 30 bits are easily packed into one ``fixnum'' (an encoded immediate number form). So if we care to pack these integers two-per-word, we now have a linked list whose elements are fixnums (not conses), of length 78,873.
Although the exact
details are irrelevant, we actually use two 16-bit fields (``short'' in
C-language parlance) for storing the two parts.
For many computers, extracting bits by loading half-words from a register (or shifting) is much faster than following a pointer to memory, so this is a savings in both time and space.
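A minimal sketch of such packing in Common Lisp (the byte layout and the function names are our assumptions; the paper leaves the exact field order unspecified):

    (defun pack-run (start end)
      "Pack two 16-bit indexes into a single integer:
    END in the high half, START in the low half."
      (dpb end (byte 16 16) start))

    (defun run-start (packed) (ldb (byte 16 0)  packed))
    (defun run-end   (packed) (ldb (byte 16 16) packed))

A scan-line is then a list of such packed numbers rather than a list of conses, e.g. (list (pack-run 23 47) (pack-run 102 131) ...), and (run-end (pack-run 23 47)) returns 47.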
We now have:
1. 3408 words, one for each line;
2. 1 cons cell (= 2 words) for each of the 78,873 pairs = 157,746 words;
for a total storage of 645 kilobytes.
This is the representation we use.
Another step in storage reduction would be to replace the lists by vectors, reducing the storage to 1 word per pair, and the total storage to about 320 kbytes. But the use of linked lists is such a convenience that we are not especially eager to trade that off for space.
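For illustration, such a vector representation might look like this (again a sketch under our assumptions, reusing pack-run from the sketch above):

    (defun line->vector (runs)
      "Convert a scan-line, a list of (start . end) pairs, into a
    vector of packed fixnums: one word per pair."
      (map 'vector
           (lambda (run) (pack-run (car run) (cdr run)))
           runs))

The saving comes from dropping the two-word cons spine of each list, at the cost of cheap insertion and deletion.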