Scantools is a high-quality library and a matching set of command-line programs for handling and manipulating scanned documents. The library is written in C++ and makes heavy use of Qt5.
At present, the library can convert image files to PDF/A. Files in JBIG2, JPEG and JPEG2000 format are embedded directly in the PDF; all other files are compressed losslessly. HOCR files, which are produced by optical character recognition programs such as ‘tesseract’, can be used to make the PDF file searchable. The resulting files comply with the ISO PDF/A standard for long-term archiving of digital documents and offer compression rates comparable to those of the DjVu file format.
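The rule described above (copy already-compressed formats into the PDF unchanged, re-encode everything else losslessly) can be pictured with a small sketch. This is only an illustration in standard C++, not the scantools API: the names `EmbedStrategy`, `lowerCaseExtension` and `strategyFor` are hypothetical, and deciding by file extension is an assumption made for brevity. The library itself may well inspect the file contents instead.

```cpp
#include <algorithm>
#include <cctype>
#include <initializer_list>
#include <string>

// Hypothetical names for illustration only; not part of the scantools API.
enum class EmbedStrategy {
    EmbedDirectly,      // JBIG2, JPEG, JPEG2000: data is placed in the PDF as is
    CompressLosslessly  // any other format: re-encoded without loss of information
};

// Returns the lower-cased extension of a file name, without the dot.
// Returns an empty string if the name contains no dot.
std::string lowerCaseExtension(const std::string& fileName)
{
    const auto dot = fileName.find_last_of('.');
    if (dot == std::string::npos)
        return {};
    std::string ext = fileName.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// Picks a strategy from the file extension.
// Assumption: the extension list below is a guess at common spellings.
EmbedStrategy strategyFor(const std::string& fileName)
{
    const std::string ext = lowerCaseExtension(fileName);
    for (const char* direct : {"jb2", "jbig2", "jpg", "jpeg", "jp2", "jpx"}) {
        if (ext == direct)
            return EmbedStrategy::EmbedDirectly;
    }
    return EmbedStrategy::CompressLosslessly;
}
```

With this sketch, `strategyFor("page01.JPG")` selects direct embedding, while `strategyFor("page02.png")` selects lossless compression.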
There are currently three command-line utilities:
- image2pdf converts images to a PDF/A-compliant PDF file.
- hocr2any converts HOCR files to text, or renders them as raster graphics or PDF files.
- ocrPDF adds a text layer to a graphics-only PDF file, without re-encoding graphics data or otherwise modifying file content.
The scantools software suite is open source and available for free. It is licensed under the GNU General Public License v3, or any later version of the GNU General Public License.
- An encoder for the JBIG2 file format is found here.