Today I published the very first release of scantools, a library and a matching set of Linux command-line programs for manipulating graphics files, written with a view towards handling scanned documents. At present, there is only one command-line utility, image2pdf, which converts JBIG2 files to PDF/A-compliant PDF files.
As the next milestone, I plan to support HOCR files. HOCR is a standard output format used by text recognition programs. Once that is in place, users will be able to produce searchable, very space-efficient PDF files from their scanned data that fully comply with the PDF/A standard.
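
For readers who have not seen HOCR before: it is ordinary HTML in which each recognized word is wrapped in an element of class `ocrx_word`, with the word's position on the page stored in the `title` attribute as `bbox x0 y0 x1 y1`. The following Python sketch shows how those words and their bounding boxes could be read out. It is only an illustration of the file format and not part of scantools; the names `parse_bbox`, `HocrWordParser` and `extract_words` are made up for this example, and it assumes that every word sits in a `span` element with no further `span` nested inside it.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an HOCR title attribute, or None."""
    # Properties are separated by ';', e.g. "bbox 36 92 96 116; x_wconf 95".
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 5 and parts[0] == "bbox":
            return tuple(int(p) for p in parts[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collects (text, bbox) pairs for every ocrx_word span (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_word = False
        self._bbox = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            self._in_word = True
            self._bbox = parse_bbox(attributes.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumption: word spans contain no nested span elements.
        if tag == "span" and self._in_word:
            text = "".join(self._text).strip()
            if text and self._bbox is not None:
                self.words.append((text, self._bbox))
            self._in_word = False


def extract_words(hocr_text):
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


SAMPLE = """
<span class='ocr_line' title='bbox 36 92 580 122'>
  <span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
  <span class='ocrx_word' title='bbox 109 92 179 122; x_wconf 91'>world</span>
</span>
"""

if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

Word positions like these are what allow a PDF to carry an invisible text layer on top of the scanned image, which is what makes the resulting file searchable.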