fScanX Details

Overview

The primary component of the fScanX package is the fscanx command-line utility. It does the hard work of interfacing with the scanner and of creating the output files. It is intended to be used to script workflow applications, and can easily be called from shell scripts, AppleScript, or any programming language that supports executing command-line utilities--which is probably all current languages. There is also a simple GUI application included, fScanX.app, which is convenient for standalone scanning. It is a Cocoa application, and the source is included as an example of how to build a user interface on top of the command-line utility.

A TWAIN driver is planned for a future 1.x release.

Installation Notes

The software requires at least OS X 10.3.9, but it does not check the OS version or provide a proper error message. If you run it on an older version, it will probably quit unexpectedly.

The installer puts the GUI application, fScanX, into the Applications folder. This application is self-contained and can just be dragged to any other location on your hard disk if you wish. The user interface is self-explanatory: a few options for scan type and size, and a scan button that when clicked asks you to choose a location and name for the file.

The installer puts the command-line utility fscanx into your /usr/sbin/ directory. The command-line utility can also be located anywhere on your hard disk. Like any command-line program, if you move it to a directory that is not part of your command search path, you will have to change to its containing directory or use the full path name in order to invoke it.

The installer installs the FujiScannerDontSeize.kext kernel extension into /System/Library/Extensions/. This is needed to prevent the following situation: at startup, if OS X finds a USB device for which it does not have a driver installed, it allows Classic to "seize" the device. If an OS X native program later tries to access that device, there may be a small delay (typically a second or two) while OS X and Classic go through some arbitration to make sure there is no Classic application already using the device. The kernel extension prevents Classic from seizing the scanner. Please note that although this bundle goes into the extensions folder, it does not contain any executable code (just a property list that tells OS X "if you find this device, don't let Classic have it"), so it cannot cause stability problems. Further, if you are running an Intel Mac or 10.5 or 10.6, it will simply be ignored since there is no Classic environment.

The installer does not install the source for the GUI application. That is in the "fScanX project" folder, and if you want to look at it you can copy it where you want it. Please note that it was built with XCode 3.1 and will not work with old versions of Apple's development tools. XCode is available from Apple as a free download.

Command-Line Utility

The command-line utility takes a number of options, followed by a file name. The options all have reasonable defaults: monochrome scanning, 300dpi, 8.5x11 inch page with no margins, multi-page TIFF output. So the command:

fscanx myfile.tif

will scan all the pages on the scanner's document feeder and create myfile.tif in the current directory. This assumes that fscanx is located where it will be found in the search path; if it is not, you'll have to either use the full path or be in the directory where the utility is located. For instance:

/Users/me/myscanstuff/fscanx myfile.tif

or:

cd /Users/me/myscanstuff
./fscanx myfile.tif

The rest of the examples will assume the utility is on the search path, as is the case with the default location /usr/sbin.
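
A script that cannot rely on the search path might look for the utility before calling it. The following is only a sketch: find_tool is a hypothetical helper name, and /usr/sbin is used as the fallback because it is the installer's default location.

```shell
# Print the full path of a tool: try the search path first, then a
# fallback directory. find_tool is a hypothetical helper, not part of fScanX.
find_tool() {
    tool_name=$1
    fallback_dir=$2
    if command -v "$tool_name" >/dev/null 2>&1; then
        command -v "$tool_name"
    elif [ -x "$fallback_dir/$tool_name" ]; then
        printf '%s\n' "$fallback_dir/$tool_name"
    else
        echo "$tool_name not found" >&2
        return 1
    fi
}

# Example use:
#   FSCANX=$(find_tool fscanx /usr/sbin) && "$FSCANX" myfile.tif
```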

Note that trying the command fscanx with no arguments will in typical UNIX fashion give you a concise summary of the options:

Build 1.3.1 2009-10-24 usage:

  fscanx --enumerate
    scan the USB bus for all supported scanners and list them

  fscanx --capabilities location
    given a usb location (from --enumerate), list the scanner capabilities

  fscanx --register name email license
    add the license

  fscanx --registrations
    list all licenses

  fscanx options filename
    scan pages into file 'filename' where possible options are:

  --append to add to, rather than overwrite, an existing file
  --adf to scan from automatic document feeder, --flatbed to scan from flatbed
    default is adf
  --tiff to create TIFF file, --pdf to create PDF file, default is tiff
  --compress option, where only option is zlib for gray & color TIFF
  --duplex to scan front and back, otherwise scan front only
  --landscape to scan pages in landscape orientation
  --mono, --gray, --rgb, default is --mono
  --resolution xxx, dots per inch, from 50-600 in 1 dpi increments,
    default is 300
  --paper-width xxx, --paper-height xxx, size of document in units of
    1/1200 inch, default is 8.5 x 11 inches
  --auto-length, if specified, automatically detect length of paper up
    to length specified in paperheight, and create output image of
    actual length (does not work with landscape scanning)
  --left xxx, --top xxx, --width xxx, --height xxx, scan area in units
    of 1/1200 inch, centered on scan area, default is the entire page
  --threshold xxx, is 1-254 or float, applies to mono, ignored for gray
    & color, default of float
  --despeckle xxx, is 0-100, applies to mono, ignored for gray & color,
    default is 0, reasonable values are: 92 @ 600dpi, 60 @ >= 400dpi,
    39 @ >= 300dpi, 32 @ >= 200dpi, 15 @ >= 150dpi, 7 @ < 150dpi
  --bits x, bit depth of output, 1-8, applies to gray & color, ignored
    for mono, default is 6
  --black xxx, --white xxx, specify endpoints for contrast stretch,
    defaults are: 7 & 240 for 3-7 bits out, 0 & 255 for 8-bits out
  --over-sample, applies to gray & color only, up to 300dpi only;
    scan at 2x resolution and scale to 1x resolution for
    output; the use of an advanced scaling algorithm can result in
    slightly higher quality than scanning at 1x resolution, but with
    the same file size
  --fcreator code, is file creator, either ASCII like ABCD or
    hex-encoded like 0x41424344, default is 0x00000000
  --images-per-file xx, is 1 or more, scan into multiple files with xx
    images per file, file number is inserted before last "." character
    if any, otherwise at end of file name, default is to put all pages
    into a single file
  --double-feed option
    option is n for none, t for overlap/thickness variance, l for length
    variance, tl for both overlap/thickness & length, default is t
  --location location
    use the scanner at the specified USB location rather than the first
    matching scanner, locations can be found by using the --enumerate
    option, or in System Profiler, or using IORegistryExplorer

returns 0 for success, 1 for argument error, 2 for failure to find
    scanner (or failure to validate license), 3 for failure to start scan,
    4 for unspecified error during scan, 5 for failure to create or open file,
    6 for paper feed failure

(c) 2004-2009 Scott Ribe. http://www.elevated-dev.com/Products/fScanX/

Many thanks to libtiff.org for the libtiff library, portions of which
  are (C) 1988-1997 Sam Leffler and (C) 1991-1997 Silicon Graphics Inc.
Also thanks for zlib, (C) 1995-2004 Jean-loup Gailly and Mark Adler
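
The return codes listed in the usage summary make it straightforward for a calling script to report what went wrong. As a minimal sketch (describe_fscanx_status is a hypothetical helper name; the messages paraphrase the summary above):

```shell
# Translate an fscanx return code into a short message, following the
# codes listed in the usage summary. Hypothetical helper, not part of fScanX.
describe_fscanx_status() {
    case $1 in
        0) echo "success" ;;
        1) echo "argument error" ;;
        2) echo "scanner not found or license not validated" ;;
        3) echo "failed to start scan" ;;
        4) echo "unspecified error during scan" ;;
        5) echo "failed to create or open file" ;;
        6) echo "paper feed failure" ;;
        *) echo "unknown return code $1" ;;
    esac
}

# Example use in a script:
#   fscanx --pdf job.pdf
#   code=$?
#   [ "$code" -eq 0 ] || echo "scan failed: $(describe_fscanx_status "$code")" >&2
```

After a paper feed failure (code 6), a script could reload the feeder and run the same command again with --append to continue into the same file.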

Command-Line Examples

The output above is a complete list of the options; following are some examples to help you figure out the options.

This command is the same as using the default options; they're just spelled out:
fscanx --mono --resolution 300 --left 75 --top 75 --width 10050 --height 13050 --threshold 128 myfile.tif
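
The size and position options are given in units of 1/1200 inch, so sizes in inches have to be multiplied by 1200. A small helper can do the arithmetic; this is a sketch, and inches_to_units is a hypothetical name:

```shell
# Convert inches to the 1/1200 inch units used by --paper-width,
# --paper-height, --left, --top, --width and --height.
# Hypothetical helper; rounds to the nearest whole unit.
inches_to_units() {
    awk -v inches="$1" 'BEGIN { printf "%d\n", inches * 1200 + 0.5 }'
}

# Example use (legal-size paper, 8.5 x 14 inches):
#   fscanx --paper-width "$(inches_to_units 8.5)" \
#          --paper-height "$(inches_to_units 14)" legal.tif
```

With this conversion, 8.5 inches is 10200 units and 11 inches is 13200 units.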

This command would be good to scan a form with gray backgrounds:
fscanx --gray myfile.tif

This command would be good to scan a form with colored text and/or backgrounds:
fscanx --rgb myfile.tif

The threshold option is good for adjusting for light or dark originals when scanning in black and white. For dark originals try lower values and for light originals try higher values. The despeckle option controls how aggressively the software tries to remove "specks", isolated pixels not part of larger groups. A setting of 0 doesn't remove any specks; a setting of 100 removes any pixel that's not completely surrounded by adjacent pixels.

These two settings do interact. To deal with problematic originals, try a 2-step process. First, adjust the threshold until the density of text is correct, that is, until the strokes of letters are not too thin (look at letters "l" and "t") but small interior spaces (as in "e") are not filled in. Second, adjust the despeckling to reduce specks. Note that for a good quality original on clean paper, the despeckling settings will not make a visible difference. For really bad originals, the best thing to do may be to scan to 2-, 3-, or 4-bit grayscale.
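
As a starting point for the second step, a script could pick the despeckle value from the resolution, using the "reasonable values" listed in the usage summary. This is a sketch; suggest_despeckle is a hypothetical helper name:

```shell
# Suggest a --despeckle value for a given resolution in dpi, using the
# reasonable values from the fscanx usage summary. Hypothetical helper.
suggest_despeckle() {
    dpi=$1
    if   [ "$dpi" -ge 600 ]; then echo 92
    elif [ "$dpi" -ge 400 ]; then echo 60
    elif [ "$dpi" -ge 300 ]; then echo 39
    elif [ "$dpi" -ge 200 ]; then echo 32
    elif [ "$dpi" -ge 150 ]; then echo 15
    else echo 7
    fi
}

# Example use:
#   dpi=400
#   fscanx --mono --resolution "$dpi" --despeckle "$(suggest_despeckle "$dpi")" myfile.tif
```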

The contrast stretch options, --black and --white, are good for adjusting the contrast of forms. Blank paper is not quite white; it's typically seen by the scanner as a light gray, and likewise printed black may be a dark gray. The contrast stretch options let you specify a cutoff below which all grays will be pushed to black, and one above which all values will be pushed to white. The default values are good for a wide variety of inputs. But if you have problem originals (streaky, blotchy, stained, light printing, smeared backgrounds) you may be able to improve the scans by using these options.
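
To illustrate the idea of a contrast stretch, here is a sketch of how a single 8-bit gray level might be remapped. How fscanx treats values between the two cutoffs is not documented here; the linear mapping below is an assumption (a common choice), and stretch_level is a hypothetical name:

```shell
# Illustrative contrast stretch of one 8-bit gray level (0-255).
# Levels at or below the black cutoff become 0, levels at or above the
# white cutoff become 255. The linear mapping in between is an assumption,
# not taken from fScanX.
stretch_level() {
    level=$1 black=$2 white=$3
    if   [ "$level" -le "$black" ]; then echo 0
    elif [ "$level" -ge "$white" ]; then echo 255
    else echo $(( (level - black) * 255 / (white - black) ))
    fi
}

# Example use with the default cutoffs for 3-7 bit output:
#   stretch_level 245 7 240    # paper seen as light gray is pushed to white
```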

The default reduction of output to 6-bit gray or 6-6-6 rgb works well for forms. This provides enough colors to match form colors well enough, and dropping the less significant bits reduces noise that would otherwise show up as inconsistent shades. It also results in files that compress better.

Stretching light and dark grays to white and black and throwing away half the color information produces poor results for photographs. The Fuji scanners are document scanners and documents are the focus of this software, but there are options that won't mess up photos.

This would be better to scan a black & white photograph:
fscanx --gray --bits 8 myfile.tif

This would be better to scan a color photograph, and would type the file to be opened by GraphicConverter:
fscanx --rgb --bits 8 --fcreator GKON myfile.tif

This would scan into multiple files, 1 per page, named test1.tiff and so on:
fscanx --images-per-file 1 test.tiff
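
A script that needs to pick up the numbered files afterwards can build the names the same way the usage summary describes: the number goes before the last "." if there is one, otherwise at the end. The sketch below assumes plain, unpadded numbers as in the test1.tiff example, and expects a bare file name rather than a path; numbered_name is a hypothetical helper:

```shell
# Build the name of the Nth output file for --images-per-file: insert the
# number before the last "." in the name, or append it if there is no ".".
# Assumes unpadded numbers; hypothetical helper, pass a bare file name.
numbered_name() {
    name=$1 number=$2
    case $name in
        *.*) printf '%s%s.%s\n' "${name%.*}" "$number" "${name##*.}" ;;
        *)   printf '%s%s\n' "$name" "$number" ;;
    esac
}

# Example use:
#   numbered_name test.tiff 1
```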

The --duplex option scans both sides of each page. With this option, the --images-per-file option counts each side separately, so --images-per-file 2 would put 2 images into each file: the front and back of a single page.
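
To estimate how many files a job will produce, count images rather than sheets. The sketch below assumes that every side produces one image and that any leftover images go into a final, shorter file; files_needed is a hypothetical helper:

```shell
# Estimate the number of output files for a job.
#   sheets:   number of sheets in the feeder
#   per_file: value given to --images-per-file
#   sides:    1 for front only, 2 with --duplex
# Assumes one image per side and a shorter last file for any remainder.
files_needed() {
    sheets=$1 per_file=$2 sides=$3
    images=$(( sheets * sides ))
    echo $(( (images + per_file - 1) / per_file ))
}

# Example use (10 sheets, duplex, 2 images per file):
#   files_needed 10 2 2
```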

The --pdf option creates a PDF file instead of a TIFF. PDF documents are good for printing; the Adobe Reader application has an option in the print dialog to not scale the pages when printing, which provides good print quality. (See the next section for more information.)

Printing Notes

Scaling scanned images by non-integral amounts during printing produces ugly results. In other words, scaling a 600dpi image to 50% to print on a 300dpi printer is fine, but scaling an image down by 10% to fit into the printable margins produces an ugly jagged result. The Preview application in OS X will scale scans to "fit" according to its own notions, and there does not seem to be any combination of preference settings that will prevent this. Many other applications have this problem, so if you print scans and they look bad, it might be an issue with scaling.
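
When choosing a scan resolution for documents that will be printed at actual size, one simple check is whether the scan and printer resolutions are whole multiples of each other. This is a sketch under that assumption; is_integral_scale is a hypothetical helper, and it says nothing about any extra scaling an application applies on its own:

```shell
# Succeed if one resolution is a whole multiple of the other, so that
# printing at actual size needs only an integral scale factor.
# Hypothetical helper; both arguments are in dpi.
is_integral_scale() {
    scan_dpi=$1 printer_dpi=$2
    [ $(( scan_dpi % printer_dpi )) -eq 0 ] || [ $(( printer_dpi % scan_dpi )) -eq 0 ]
}

# Example use:
#   is_integral_scale 600 300 && echo "600dpi scan prints cleanly at 300dpi"
```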

GraphicConverter can be made to print a scanned image without scaling, more easily in some versions than others. I use it for my quality checks. Note that with many recent Macs it is included in the software Apple ships.

Adobe Reader has an option to not perform any scaling when printing, and this seems to be very reliable about not messing up scanned images.

Compression Notes

Monochrome scans are compressed using CCITT Group 4 (fax) compression. This offers about as much compression as can be had, and is lossless. Gray and color scans are compressed using lossless compression.

The TIFF format only supports LZW and JPEG compression for gray and color images. JPEG is lossy (using JPEG without loss typically provides very little compression). I don't think it's a good idea for me to discard image data right out of the scanner; if your originals and your application are a good match for high-ratio lossy JPEG compression, you can certainly compress the files, but I want to provide the option of keeping all the data--so I use LZW for TIFF files with gray and color scans.

The PDF format supports flate (zlib) compression in addition to LZW and JPEG. Flate is lossless and usually provides more compression than LZW, so this is what I use for gray and color images in PDF. For monochrome, I use CCITT Group 4 just as with TIFF so PDF files are very nearly the same size as TIFF, only about 2% larger or less because of some overhead in PDF. (I have discovered that most PDF utilities available use less effective compression options for monochrome scans and thus typically produce files about twice as large.)

The PNG format is lossless and often provides good compression ratios on scanned forms (thanks in part to the color reduction), but does not support multiple pages in a single file. Using bzip2 on TIFF files without compression gives even better, still lossless, compression but the output is not a format recognized by graphic programs so you would have to decompress the files before opening them.

So in summary, for gray and color scans I give you a fairly large file with all the data, and let you decide whether or not to perform some post-scanning conversion to reduce the file size.

FileMaker Notes

fScanX can easily be integrated with FileMaker using FileMaker's AppleScript capabilities. There are 2 basic approaches to take, each with its own tradeoffs.

You can use the AppleScript do shell script command to directly execute the command-line utility. The scan will run synchronously and FileMaker will pause while the scan is running. When that command of the FileMaker script completes, the TIFF file is ready and the script can use it, copy it, or display it as needed. This is no problem most of the time, but if you are using FileMaker client/server and scan a long job, the server can time out the client and disconnect it. I don't know exactly how long this takes, but it has been reported to me that it may only be an issue for jobs where you add paper during scanning in order to scan more than the scanner's 50-page capacity in a single batch.

You can tell application "Terminal" to run the utility. The Terminal application will launch in the background and the scan will run asynchronously; the user can do other work in FileMaker; there is no problem with a timeout in client/server operation. But there is no easy way for the script to know when the scan is completed, so you may have to depend on the user to click a button when the scan is completed.
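
One possible workaround for the asynchronous case, offered only as an untested idea, is to have the command that runs in Terminal leave a marker file behind when the scan finishes, so that a later script step can check for it. In the sketch below, run_with_marker and the /tmp paths are hypothetical:

```shell
# Run a command, then write its return code to a marker file so that
# another process can tell the command has finished and how it ended.
# Hypothetical helper; the marker is removed first so a stale one from a
# previous job is never mistaken for the current result.
run_with_marker() {
    marker=$1
    shift
    rm -f "$marker"
    exit_code=0
    "$@" || exit_code=$?
    printf '%s\n' "$exit_code" > "$marker"
    return "$exit_code"
}

# Example use as the command line handed to Terminal:
#   run_with_marker /tmp/scan.done fscanx --pdf /tmp/scan.pdf
```

The calling script would then treat the scan as complete once /tmp/scan.done exists, and read the return code from it.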

4th Dimension Notes

Integration with 4D is very easy. 4D 2004 should be able to use the command-line utility directly with the LAUNCH EXTERNAL PROCESS command. Earlier versions can use the plug-in Scripting Tools from Pluggers Software. Scanned images can be displayed using QPix from Escape.

OCR Notes

I've done some light testing with Readiris 9.0. I could not find documentation on its AppleScript features, but through inspecting its AppleScript dictionary and a bit of trial and error developed the following example script:

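-- Full POSIX path, as expected by the command-line utility
set fpath to "/Users/sribe/Documents/testocr.tif"
-- File reference to the same file, as expected by applications
set fref to (POSIX file fpath as file)
-- quoted form protects paths that contain spaces
do shell script "fscanx --mono --resolution 600 " & quoted form of fpath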

tell application "Readiris"
  activate
  open fref
  recognize front document saving to ("OSX:Users:sribe:Documents:testocr-ri.doc" as file specification)
  close front document
end tell

OmniPage Pro X was also pretty easy. The same command to scan the file, with slightly different vocabulary to perform the OCR, and OmniPage automatically names the file:

tell application "OmniPage Pro X"
  activate
  load and OCR fref
end tell

Notes: Command-line utilities such as fscanx expect POSIX-style file paths, while applications expect HFS-style paths via AppleScript. This is why I use the two variables fpath and fref. The POSIX file conversion only works for full paths; if you had a relative path you would have to expand it to a full path before using it in a script like the one above.
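
Expanding a relative path can be done on the shell side. A minimal sketch (absolute_path is a hypothetical helper; it simply prefixes the current directory and does not resolve ".." or symbolic links):

```shell
# Turn a relative path into a full path by prefixing the current
# directory; full paths are returned unchanged. Hypothetical helper.
absolute_path() {
    case $1 in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$PWD" "$1" ;;
    esac
}

# Example use:
#   fscanx --mono --resolution 600 "$(absolute_path testocr.tif)"
```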

Version History

2009-10-24 version 1.3.1: Added "Licenses & Scanners" menu item to let users review their registrations. Added --flatbed and --capabilities to the command line in preparation for flatbed support.

2009-10-22 version 1.3: Added ability to append to existing files. Used this to make it easy to scan larger files and to recover from paper feed failures. Added --append to command-line utility, and new return codes for failure to open/create file and paper feed failures. Stopped outputting partial pages in case of paper feed failures. Fixed a bug where some errors from deep inside the scanning code were not reported to the user.

2009-10-21 version 1.2.4: Fixed a bug with activation codes related to Unicode character composition.

2009-09-29 version 1.2.3: Fixed a bug scanning black-and-white PDFs. Fixed behavior of "Detect Paper Length" checkbox. Forced correction of kext privileges installed by earlier versions for compatibility with 10.6.

2009-09-24 version 1.2.2: Fixed a bug that would cause a crash scanning large (about 300 pages) black-and-white PDFs.

2009-09-21 version 1.2.1b: Corrected installer for 10.6.

2009-05-26 version 1.2.1: Fixes for minor glitches with activation codes.

2009-05-26 version 1.2: Support for the fi-5350C2 and fi-6670 scanners. Single build, with activation codes to unlock support for different levels of scanners. Official release of support for the fi-6140. Replacement of confusing "landscape" checkbox with icons for selection of paper orientation. Fixed black point and white point settings to work correctly for both grayscale and color scans. Support for double-feed detection options. Option to automatically open scans in Preview as they are completed. Command-line functions to enumerate all attached scanners, and to select a specific scanner per job.

2008-08-15 version 1.1: Support for the fi-6130 scanner; Universal Binary for native operation on Intel-based Macs; support for length detection; support for oversampling; revised user interface for easier changing of paper sizes & landscape printing, and easy re-scanning. Support for the fi-6140 scanner in the level 2 (more expensive) version.

2006-07-20 version 1.0.8: Support for the fi-5120c scanner. Fix error where choice of PDF format would not be saved across launches. Fix error where metric input of sizes was inappropriately limited. Finally remember to add new scanner models to FujiScannerDontSeize.kext to get rid of the 1-2 second pause at the beginning of scans. Abandon all attempts to support 10.2. Complete move to XCode in preparation for Universal Binary version.

2005-05-20 version 1.0.7: Fix threshold, despeckle, black level, and white level options. Enable controls correctly at startup after loading saved parameters. Move some of the components from CodeWarrior to XCode in preparation for Universal Binary version. (Distributed only to early adopters; never announced for general release.)

2005-05-17 version 1.0.6: No changes were required for Tiger compatibility, so I believe all versions of fScanX will be compatible, but this is the only one tested and supported under Tiger. Support for long document scanning, using the new --paperheight setting. Changed default format of gray and color TIFF files back to using no compression instead of zlib compression, because most software on versions of OS X prior to 10.4 (Tiger) will not handle TIFF files using zlib compression. Added --compress zlib option to support creation of compressed gray and color TIFF files for those cases where files will be used by software that supports the format. Removed smoothing of gray and color images because when needed this is better left to post-scan image processing. Significantly reduced memory used during gray and color scanning. Rewrote the GUI in Cocoa, which will make addition of features easier; the GUI now saves all its settings between runs. A level 2 (more expensive) version with preliminary support for the fi-5650C: black & white scanning at full speed, gray and color need further optimization.

2005-03-12 version 1.0.5: Support for the fi-4120C2 scanner with USB 2.

2005-03-01 version 1.0.4: Support for duplex scanning. Option to output PDF files in addition to TIFF. Compression (lossless, zlib) of grayscale and color scans. Modest improvement in throughput of monochrome scans at 300dpi and lower resolutions.

2004-11-14 version 1.0.3: Added images-per-file option to allow scanning into multiple files. Also added options to GUI utility to display the command line corresponding to the selected options and/or put the command on the clipboard.

2004-09-19 version 1.0.2: Fixed bug that a scan width of exactly 8.5 inches would be reported as out of range. Added section on integration with OCR to this file.

2004-09-18 version 1.0.1: Fixed problem that caused banding in low-resolution (< 300dpi) monochrome scans. Added control over despeckle factor to command-line. Made command line report many errors in options, rather than just silently substituting default values. Added control over threshold, despeckle, black point, and white point to GUI. Added to GUI ability to view or put on clipboard the command built from the options specified. Added brief comment on integration with 4th Dimension to this file.

2004-09-07 version 1.0: First public release.