Sunday, December 18, 2011

Scanning and Photocopying Documents With a Digital Camera

Occasionally one needs to make backup copies of important paper documents in case they get lost in the mail. In addition to the obvious ways of doing this, I’d like to offer a quick and potentially less-wasteful alternative:

  •  Scan them quickly with a digital camera and only print them if necessary.
A well-focused 3-megapixel image should produce results comparable to a high-resolution fax (about 200dpi).
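The 200dpi figure can be checked with a little arithmetic. The sketch below assumes a 2048x1536 pixel frame (a common 3-megapixel size) and a US Letter page (8.5 x 11 inches) filling the frame. Neither number comes from a particular camera, and the function name is made up for illustration.

```shell
# Rough effective resolution when a page fills the camera frame.
# Arguments: pixels along one edge, paper length along that edge in tenths of an inch.
effective_dpi() {
    pixels=$1
    tenths_of_inch=$2
    # Integer arithmetic: dots per inch = pixels * 10 / tenths of an inch.
    echo $(( pixels * 10 / tenths_of_inch ))
}

effective_dpi 2048 110   # long edge, 11.0 inches: prints 186
effective_dpi 1536 85    # short edge, 8.5 inches: prints 180
```

The smaller of the two results is the limiting one, so under these assumptions a 3-megapixel photo lands a little under 200dpi.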
The trick to making this practical is an effortless transformation which converts the images you get from your camera into something black and white which can print like a photocopy (if necessary) and is highly compressed if you just want to store or email it. For that, I offer first the following simple one-line “ImageMagick” command:

mogrify -format tif -colorspace gray -compress group4 -resize 5120x5120 -normalize -threshold 65% *.jpg 

Full IRS Tax Form Photo 1.2MB
piece of tax photo
Full Black and White TIF 98KB
piece of tax.tif

which produces the image above right from the one on the left. The threshold value (65%) sets how light a pixel can be and still come out black. You may need to adjust it up (darker) or down (lighter) depending on your camera’s exposure metering.
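One way to find a good threshold is to convert a single photo at several values and compare the results. The sketch below is only an illustration: the function names and the "-t65" naming scheme are made up, and it uses "convert" rather than "mogrify" so that each trial gets its own output file instead of overwriting the previous one. (On ImageMagick 7 the command may need to be "magick" instead of "convert".)

```shell
# Name of the trial output for an input file and a threshold percentage,
# e.g. "tax.jpg" at 65 gives "tax-t65.tif". Assumes the name has an extension.
trial_name() {
    base=${1%.*}            # strip the extension
    echo "${base}-t${2}.tif"
}

# Convert one photo at several thresholds, using the same options as the
# one-line command above but writing a separate TIF for each threshold.
try_thresholds() {
    input=$1
    shift
    for pct in "$@"; do
        convert "$input" -colorspace gray -resize 5120x5120 -normalize \
            -threshold "${pct}%" -compress group4 "$(trial_name "$input" "$pct")"
    done
}

# Usage (requires ImageMagick):
#   try_thresholds tax.jpg 55 65 75
```

This would leave tax-t55.tif, tax-t65.tif and tax-t75.tif next to the original, and the best value can then go back into the mogrify command for the whole batch.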
ImageMagick is free open-source software which runs from a command line on Windows, Linux/Unix, Macintosh OS X, and Windows+Cygwin and is available from http://www.imagemagick.com. (It is also installed by default on many Unix/Linux systems, including those at UW managed by C&C, so if you don’t already have it on your PC, that is another option to consider.) Binaries of ImageMagick are here: ftp://ftp.imagemagick.org/pub/ImageMagick/binaries/ or here: ftp://gd.tuwien.ac.at/pub/graphics/ImageMagick/binaries/.
Unfortunately, the simple ImageMagick command above may obscure text on a non-white background and is somewhat sensitive to variations in exposure metering. If you’re using ImageMagick under Unix/Linux/OS-X/Cygwin, you can instead use my scancvt shell script, which does a better job by subtracting out a local average background value before thresholding. Scancvt also creates two output files (in the current directory): “b-name.tif” and “g-name.jpg” (where name refers to the input filename). Hopefully one of them will be perfect for your needs.
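To give a feel for the idea of comparing each pixel against its local background, here is a minimal sketch using ImageMagick's local adaptive threshold option (-lat). This is not the scancvt script and may differ from it considerably. The function names and the default window size and offset are assumptions that would need tuning for real photos.

```shell
# Build the geometry argument for -lat, e.g. "25x25+10%":
# a square window of the given size and an offset in percent.
lat_geometry() {
    echo "${1}x${1}+${2}%"
}

# Threshold each pixel against the average of its neighbourhood rather than
# against one global value. The image is negated first so that the text is
# lighter than its surroundings (-lat makes such pixels white), then negated
# back so the text ends up black on white.
local_threshold() {
    input=$1
    output=$2
    window=${3:-25}    # neighbourhood size in pixels (assumed default)
    offset=${4:-10}    # percent above the local average (assumed default)
    convert "$input" -colorspace gray -negate \
        -lat "$(lat_geometry "$window" "$offset")" -negate \
        -compress group4 "$output"
}

# Usage (requires ImageMagick):
#   local_threshold vote.jpg b-vote.tif
```

Because the comparison is local, a shaded or colored patch of paper shifts the average around it and the text on it has a better chance of surviving than with a single global threshold.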
When you take the pictures, put each original on a white background on the floor, use flash, take the pictures from about waist-height (2.5-3 feet away) and use zoom to fill the frame with your document. (If you’re doing many pages and your camera has a manual focus feature, use it so you won’t need to verify each picture is in focus.)
How well does scancvt work? Judge for yourself. Here are two examples taken with a hand-held 3-megapixel point-and-shoot camera and processed with the scancvt script:
Full IRS Tax Form Photo 1.2MB
piece of tax photo
Full Black and White TIF 106KB
piece of b-tax.tif
Full Grayscale JPG 1.1MB
piece of g-tax.jpg

Full Voters Pamphlet Photo 1.5MB
piece of voter photo
Full Black and White TIF 144KB
piece of b-vote.tif
Full Grayscale JPG 1.5MB
piece of g-vote.jpg
If your images contain sensitive information, consider encrypting them. TrueCrypt (http://www.truecrypt.org) is one good/easy/free option for doing that.
Because Windows, Macintosh and Linux all come with software which can easily display and print TIF files, what follows is only a matter of convenience. It turns out the TIF files above can be quickly converted into similarly sized and completely equivalent PDF files or combined into a single multi-page PDF file. This tiff2pdf shell script will do that for you on Linux/Unix/OS-X/Cygwin (if you have the necessary building blocks installed). Here is the combined PDF output of the two TIF files above as an example.
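For readers without that script, the conversion can be approximated with the tiffcp and tiff2pdf programs that ship with libtiff. The sketch below is an assumption about one workable approach, not a copy of the script mentioned above, and its function names are made up for illustration.

```shell
# PDF name for a TIF file, e.g. "b-tax.tif" gives "b-tax.pdf".
pdf_name() {
    echo "${1%.*}.pdf"
}

# Convert a single TIF into a PDF of the same name.
tif_to_pdf() {
    tiff2pdf -o "$(pdf_name "$1")" "$1"
}

# Combine several TIFs into one multi-page PDF: tiffcp first joins them into
# a temporary multi-page TIF, which tiff2pdf then converts.
tifs_to_pdf() {
    output=$1
    shift
    tmp=$(mktemp "${TMPDIR:-/tmp}/combined.XXXXXX") || return 1
    tiffcp "$@" "$tmp" && tiff2pdf -o "$output" "$tmp"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (requires the libtiff tools):
#   tif_to_pdf b-tax.tif
#   tifs_to_pdf combined.pdf b-tax.tif b-vote.tif
```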
