libscantools is a library for graphics manipulation, written with a view towards the handling of scanned documents and generation of high-quality PDF files. The library is written in C++ and makes heavy use of Qt5.
Current Features
The development of libscantools currently concentrates on the production of high-quality, well-compressed and standards-compliant PDF/A files; PDF/A is the ISO standard for long-term archiving of digital documents. More features, including graphics manipulation and scanner access, will be added in the future.
- Conversion of images to PDF/A format. HOCR files, which are produced by optical character recognition programs such as 'tesseract', can optionally be used to make the PDF file searchable.
How to start
First-time users will likely want to start with the following classes and namespaces.
- The class 'PDFAWriter' generates well-crafted, PDF/A-2b compliant documents. Construct a PDFAWriter instance, then add graphic files and HOCR files to create a well-crafted, searchable PDF file. Files in JBIG2 and JPEG format, as well as JPEG2000 files in JPX format, are written directly into the PDF; all other graphic files are converted to RGB and encoded losslessly in a way that depends on the image characteristics. Multi-page TIFF files are well supported.
- The class 'HOCRDocument' reads and interprets HOCR files, the standard output file format for Optical Character Recognition systems. It converts HOCR files to text, or renders them on any QPaintDevice.
- The namespace 'compression' gives access to zlib and Fax G4, as well as state-of-the-art zopfli compression routines, all implemented in a thread-safe manner.
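
To illustrate what "encoded losslessly in a way that depends on the image characteristics" might mean in practice, here is a minimal, self-contained sketch. It does not use the libscantools API: the names RGBPixel, ImageKind, classifyImage and suggestLosslessCodec are hypothetical and were invented for this example. The mapping from image kind to codec is also an assumption, based only on the fact that Fax G4 is a codec for bitonal images while zlib and zopfli compress arbitrary data. The actual selection logic in PDFAWriter may differ.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not part of libscantools.
struct RGBPixel {
    std::uint8_t r, g, b;
};

enum class ImageKind { Bitonal, Grayscale, Color };

// Classify an RGB image by inspecting every pixel.
// - Color:     at least one pixel has differing channel values
// - Bitonal:   every pixel is pure black (0) or pure white (255)
// - Grayscale: all channels equal, but intermediate values occur
ImageKind classifyImage(const std::vector<RGBPixel>& pixels)
{
    bool bitonal = true;
    for (const RGBPixel& p : pixels) {
        if (p.r != p.g || p.g != p.b) {
            return ImageKind::Color;
        }
        if (p.r != 0 && p.r != 255) {
            bitonal = false;
        }
    }
    return bitonal ? ImageKind::Bitonal : ImageKind::Grayscale;
}

// ASSUMPTION: bitonal images go to Fax G4, everything else to a
// general-purpose lossless compressor. This is a plausible policy,
// not the documented behaviour of the library.
std::string suggestLosslessCodec(ImageKind kind)
{
    return kind == ImageKind::Bitonal ? "Fax G4" : "zlib/zopfli";
}
```

Under these assumptions, a scanned black-and-white text page would be classified as bitonal and routed to Fax G4, while a color photograph would fall through to the general-purpose compressor.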
API stability
The API is currently experimental, and subject to change. We expect that the API will stabilize with the 1.0.0 release.