Scantools

Scantools is a high-quality library and a matching set of command line programs for the handling and manipulation of scanned documents. The library is written in C++ and makes heavy use of Qt5.

At present, the library can convert JBIG2 images to PDF without re-encoding the image data. JBIG2 is a graphics file format that was specifically designed for the compression of scanned documents. HOCR files, which are produced by optical character recognition programs such as ‘tesseract’, can be used to make the PDF file searchable. The resulting files comply with the ISO PDF/A standard for long-term archiving of digital documents and offer compression rates comparable to those of the DJVU file format.

There are currently two command line utilities.

  • image2pdf converts JBIG2 images to PDF/A-compliant PDF files.
  • hocr2any converts HOCR files to text, or renders them as raster graphics or PDF files.
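
For readers unfamiliar with HOCR: it is an HTML-based format in which each recognized word or line carries its position on the page as a "bbox x0 y0 x1 y1" property inside the element's title attribute. The sketch below shows one way such a property could be extracted in C++17 using only the standard library. It is an illustration of the file format, not the scantools API: the BBox type and the parseBBox function are hypothetical names introduced for this example, and the library's own parser may work quite differently.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper type for this example; not part of scantools.
struct BBox {
    int x0, y0, x1, y1;
};

// Extracts the "bbox x0 y0 x1 y1" property from the value of an HOCR
// title attribute, e.g. "bbox 36 92 96 116; x_wconf 95".
// Returns std::nullopt if the string contains no bbox property.
std::optional<BBox> parseBBox(const std::string& title) {
    static const std::regex pattern(R"(\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+))");
    std::smatch m;
    if (!std::regex_search(title, m, pattern)) {
        return std::nullopt;
    }
    return BBox{std::stoi(m[1].str()), std::stoi(m[2].str()),
                std::stoi(m[3].str()), std::stoi(m[4].str())};
}
```

Calling parseBBox with the title string of a word element yields the pixel rectangle that a renderer would need in order to place the recognized text over the scanned image.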

License

The scantools software suite is open source and available for free. It is licensed under the GNU General Public License v3, or any later version of the GNU General Public License.

Links

  • An encoder for the JBIG2 file format is found here.