scantools
1.0.4
Graphics manipulation with a view towards scanned documents
Simple generator for PDF/A-2b compliant documents. More...
#include <PDFAWriter.h>
Inherits QObject.
Public Slots | |
void | waitForWorkerThreads () |
Waits for all worker threads to finish. More... | |
Signals | |
void | authorChanged () |
Emitted when author changes. | |
void | keywordsChanged () |
Emitted when keywords change. | |
void | subjectChanged () |
Emitted when subject changes. | |
void | titleChanged () |
Emitted when title changes. | |
void | pageSizeChanged () |
Emitted when pageSize changes. | |
void | resolutionOverrideHorizontalChanged () |
Emitted when resolutionOverrideHorizontal changes. | |
void | resolutionOverrideVerticalChanged () |
Emitted when resolutionOverrideVertical changes. | |
void | autoOCRChanged () |
Emitted when autoOCR changes. | |
void | autoOCRLanguagesChanged () |
Emitted when autoOCRLanguages change. | |
void | finished () |
Emitted just before waitForWorkerThreads() returns. More... | |
void | progress (qreal percentage) |
Progress indicator. More... | |
Public Member Functions | |
~PDFAWriter () | |
Destructor. More... | |
PDFAWriter (bool bestCompression=false) | |
Constructor. More... | |
QString | author () |
Metadata: Author. More... | |
void | setAuthor (const QString &author) |
Set the author string in the PDF/A meta data. More... | |
QString | keywords () |
Metadata: Keywords. More... | |
void | setKeywords (const QString &keywords) |
Set the keywords string in the PDF/A meta data. More... | |
QString | subject () |
Metadata: Subject string. More... | |
void | setSubject (const QString &subject) |
Set the subject string in the PDF/A meta data. More... | |
QString | title () |
Metadata: Title String. More... | |
void | setTitle (const QString &title) |
Set the title string in the PDF/A meta data. More... | |
paperSize | pageSize () |
Page Size. More... | |
void | setPageSize (const paperSize &size) |
Sets page size, effective for future calls of the addPages() methods. More... | |
void | setPageSize (paperSize::format size=paperSize::empty) |
Sets page size, effective for future calls of the addPages() methods. More... | |
resolution | resolutionOverrideHorizontal () |
Horizontal resolution. More... | |
void | setResolutionOverrideHorizontal (resolution horizontal) |
Set horizontal resolution. More... | |
resolution | resolutionOverrideVertical () |
Vertical resolution. More... | |
void | setResolutionOverrideVertical (resolution vertical) |
Set vertical resolution. More... | |
void | setResolutionOverride (resolution horizontal, resolution vertical) |
Sets graphic resolution for future calls of the addPages() methods. More... | |
void | setResolutionOverride (resolution res) |
Overloaded method that sets horizontal and vertical resolution to the same value. More... | |
void | clearResolutionOverride () |
Set horizontal and vertical override resolution to zero. | |
bool | autoOCR () |
AutoOCR. More... | |
void | setAutoOCR (bool autoOCR) |
Specify if the tesseract OCR engine should be run automatically. More... | |
QStringList | autoOCRLanguages () |
List of languages used for OCR. More... | |
QString | setAutoOCRLanguages (const QStringList &OCRLanguages) |
Specify languages used by the tesseract OCR engine. More... | |
void | appendToOCRData (const HOCRDocument &doc) |
Specify pre-processed OCR data. More... | |
HOCRDocument | OCRData () |
Return a copy of the internal HOCRDocument. More... | |
void | clearOCRData () |
Delete all pages from the internal HOCRDocument. More... | |
QString | addPages (const QImage &image, QStringList *warnings=0) |
Add an image to the PDF document. More... | |
QString | addPages (const JBIG2Document &jbig2doc, QStringList *warnings=0) |
Add JBIG2 images to the PDF document. More... | |
QString | addPages (const QString &imageFileName, QStringList *warnings=0) |
Add images to the PDF document. More... | |
operator QByteArray () | |
Conversion to a QByteArray containing PDF data. More... | |
Simple generator for PDF/A-2b compliant documents.
The class takes a number of images and embeds them into a PDF/A file (conformance level PDF/A-2b), generating one page per image. OCR data can optionally be used to create an invisible text overlay, which makes the PDF/A file searchable and allows text to be copied from it. The class contains a conversion operator to a QByteArray, which generates the actual PDF/A file. This makes it simple to write the PDF data to a QFile or any other I/O device. The life cycle of a PDF/A writer object is mostly this
A minimal example which creates a PDF/A file might look like this.
A more sophisticated example which uses preprocessed HOCR files to create a text overlay might look like this.
All methods of this class are reentrant and thread-safe.
Definition at line 127 of file PDFAWriter.h.
PDFAWriter::~PDFAWriter | ( | ) |
Destructor.
The destructor waits for all worker threads to finish and can therefore take considerable time.
PDFAWriter::PDFAWriter | ( | bool | bestCompression = false | ) | explicit |
Constructor.
Constructs a PDFAWriter. The following default values are set.
bestCompression | If true, then use the slow, but very effective zopfli compression algorithm for the lossless compression of bitmap graphics. Once the PDFAWriter is constructed, this property cannot be changed anymore. |
QString PDFAWriter::addPages | ( | const JBIG2Document & | jbig2doc, | QStringList * | warnings = 0 | ) |
Add JBIG2 images to the PDF document.
This method differs from the generic method addPages() only in the arguments: it expects a JBIG2Document instead of a filename.
The images contained in jbig2doc will be embedded in the PDF without re-encoding. The method does not check in detail whether the data complies with the JBIG2 standard. If invalid input data is fed into this method, the resulting PDF file might not comply with the PDF/A standard.
jbig2doc | Reference to a document whose images will be added to the PDF file |
warnings | If non-zero, pointer to a QStringList where warnings will be stored |
QString PDFAWriter::addPages | ( | const QImage & | image, | QStringList * | warnings = 0 | ) |
Add an image to the PDF document.
This method differs from the generic method addPages() only in the arguments: it expects a QImage instead of a filename. The input image must not be empty. The format of the PDF/A data stream will be chosen according to the image content.
Alpha channels will be deleted. The images will be compressed using a lossless compressor. The method is therefore slow. Currently, black-and-white and bitonal images are compressed using FAX G4 compression; all other images are compressed using zlib or zopfli compression with heuristic prediction.
image | Image that is added to the document |
warnings | If non-zero, pointer to a QStringList where warnings will be stored |
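The compression rule described for QImage input can be summarised in a few lines. The following is only a sketch under stated assumptions: the names ImageKind, Codec and chooseCodec are hypothetical and not part of scantools, and the link between the constructor argument bestCompression and the choice of zopfli over zlib is inferred from the constructor documentation rather than confirmed.

```cpp
// Hypothetical names for illustration; not part of the scantools API.
enum class ImageKind { Bitonal, Other };
enum class Codec { FaxG4, Zlib, Zopfli };

// Pick a lossless codec following the rule described above.
// Assumption: bestCompression selects zopfli instead of zlib.
Codec chooseCodec(ImageKind kind, bool bestCompression) {
    if (kind == ImageKind::Bitonal) {
        return Codec::FaxG4;  // black-and-white and bitonal images
    }
    return bestCompression ? Codec::Zopfli : Codec::Zlib;
}
```

In this model, bestCompression has no effect on bitonal images, because they always take the FAX G4 path.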
QString PDFAWriter::addPages | ( | const QString & | imageFileName, | QStringList * | warnings = 0 | ) |
Add images to the PDF document.
Adds all images contained in 'imageFileName' as individual pages to the PDF document. The method accepts files in JBIG2, JPEG and JPX format, and any other format that Qt can read. The way that the image is encoded in the PDF file depends on the file type.
If a non-empty page size has been set using the method setPageSize(), then the page will be of that size, and the graphics will be centered on their pages. Otherwise, the page size will be chosen to fit the graphic size exactly.
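The page-size rule above can be illustrated with a small model. This is a sketch only: Size, Placement and placeGraphic are hypothetical stand-ins, not the paperSize class or any other scantools type, and all lengths are assumed to be in the same unit (for example PDF points). What happens when the graphic is larger than the requested page is not specified above; the sketch simply yields negative offsets in that case.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not scantools types.
struct Size {
    double width;
    double height;
    bool empty() const { return width <= 0.0 || height <= 0.0; }
};

struct Placement {
    Size page;       // resulting page size
    double offsetX;  // left margin of the graphic on the page
    double offsetY;  // bottom margin of the graphic on the page
};

// Decide page size and graphic position following the rule described above.
Placement placeGraphic(Size graphic, Size requestedPage) {
    assert(!graphic.empty());
    if (requestedPage.empty()) {
        // No page size set: the page fits the graphic exactly.
        return {graphic, 0.0, 0.0};
    }
    // Page size set: centre the graphic on the page.
    return {requestedPage,
            (requestedPage.width - graphic.width) / 2.0,
            (requestedPage.height - graphic.height) / 2.0};
}
```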
If preprocessed OCR data has been added to the internal HOCRDocument through the method appendToOCRData(), then a text overlay is generated from the first page of the internal HOCRDocument, and the first page is then deleted. If the internal HOCRDocument is empty and the property autoOCR is true, then the tesseract OCR engine is run to create the data needed to generate a text overlay. If autoOCR is false, no text overlay is generated.
This method will never leave the PDFAWriter in any invalid state. It will add as many pages to the document as can be read from the file without errors.
The method might or might not return immediately, as most of the computationally intense jobs (image conversion, optical character recognition, compression) are run concurrently in separate worker threads.
imageFileName | Name of a graphics file whose images are added one-by-one as pages to the PDF/A document. |
warnings | If non-zero, warnings that come up while reading the graphics files are added to this list. |
void PDFAWriter::appendToOCRData | ( | const HOCRDocument & | doc | ) |
Specify pre-processed OCR data.
This method can be used to specify pre-processed OCR data that will be used to generate a text layer whenever pages are added to the document.
More precisely: every PDFAWriter keeps an internal HOCRDocument, and this method appends the given HOCRDocument to the internal one. Whenever pages are added to the document, the first page of the internal document is used to generate a text layer and is then removed from the internal document. If the internal document is empty, then either the tesseract OCR engine is run (if setAutoOCR() has been set to true) or no text layer is generated at all.
doc | HOCRDocument that will be appended to the internal document |
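The first-in, first-out behaviour of the internal HOCRDocument might be easier to follow as a small model. The sketch below is not the scantools implementation: OverlayQueue is a hypothetical class, each OCR page is represented by a plain string, and the return values "tesseract" and "none" are placeholders for "run the OCR engine" and "no text layer".

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model of the internal HOCRDocument; one string per OCR page.
class OverlayQueue {
public:
    bool autoOCR = false;  // models the autoOCR property

    // Models appendToOCRData(): append pages to the internal document.
    void append(const std::vector<std::string>& pages) {
        pending_.insert(pending_.end(), pages.begin(), pages.end());
    }

    // Called once per page added to the PDF; returns the text layer source.
    std::string nextOverlay() {
        if (!pending_.empty()) {
            std::string page = pending_.front();
            pending_.pop_front();  // the used page is removed
            return page;
        }
        return autoOCR ? "tesseract" : "none";
    }

    std::size_t pendingPages() const { return pending_.size(); }

private:
    std::deque<std::string> pending_;
};
```

In this model, appending two OCR pages and then adding three images consumes the two pages in order; the third image falls back to the autoOCR setting.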
QString PDFAWriter::author | ( | ) |
Metadata: Author.
bool PDFAWriter::autoOCR | ( | ) |
AutoOCR.
QStringList PDFAWriter::autoOCRLanguages | ( | ) |
List of languages used for OCR.
void PDFAWriter::clearOCRData | ( | ) |
Delete all pages from the internal HOCRDocument.
void PDFAWriter::finished | ( | ) | signal |
Emitted just before waitForWorkerThreads() returns.
This signal is emitted by the method waitForWorkerThreads(), immediately before the method returns.
QString PDFAWriter::keywords | ( | ) |
Metadata: Keywords.
HOCRDocument PDFAWriter::OCRData | ( | ) |
Return a copy of the internal HOCRDocument.
PDFAWriter::operator QByteArray | ( | ) |
Conversion to a QByteArray containing PDF data.
This operator converts the document to a QByteArray holding a PDF/A file. This allows a PDFAWriter to be written directly to a QFile, resulting in a valid PDF/A document on disk.
This method waits for all worker threads to finish and can therefore take considerable time. Just before returning, the signal finished() is emitted.
paperSize PDFAWriter::pageSize | ( | ) |
Page Size.
void PDFAWriter::progress | ( | qreal | percentage | ) | signal |
Progress indicator.
This signal is emitted at irregular intervals while the method waitForWorkerThreads() is running, in order to provide progress information.
percentage | Number in the interval [0.0 .. 1.0] that indicates the fraction of PDF objects that are still being constructed by worker threads among all PDF objects. |
resolution PDFAWriter::resolutionOverrideHorizontal | ( | ) |
Horizontal resolution.
resolution PDFAWriter::resolutionOverrideVertical | ( | ) |
Vertical resolution.
void PDFAWriter::setAuthor | ( | const QString & | author | ) |
Set the author string in the PDF/A meta data.
author | Name of author |
void PDFAWriter::setAutoOCR | ( | bool | autoOCR | ) |
Specify if the tesseract OCR engine should be run automatically.
autoOCR | If set to true, then the PDFAWriter will automatically run the tesseract OCR engine in the background whenever pages are added to the PDF, unless preprocessed OCR data has been specified via the method appendToOCRData(). |
QString PDFAWriter::setAutoOCRLanguages | ( | const QStringList & | OCRLanguages | ) |
Specify languages used by the tesseract OCR engine.
To improve recognition quality, the tesseract OCR engine needs to know the language(s) of the text. The languages specified here will be passed on to tesseract in future runs.
OCRLanguages | List of languages to be used in the OCR process. Tesseract identifies languages by their 3-character ISO 639-2 language codes (e.g. "deu" for German or "fra" for French). The languages specified must be present in the current tesseract installation. If an empty list is provided, English will be used as a default language. |
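Tesseract itself accepts several languages as a single string joined by '+' (for example "deu+fra"). How PDFAWriter passes the list on internally is not documented here, so the following is only a sketch of one plausible mapping, including the documented English default. The function name tesseractLanguageArgument is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: turn a list of ISO 639-2 codes into the
// '+'-separated form that tesseract understands.
std::string tesseractLanguageArgument(const std::vector<std::string>& languages) {
    if (languages.empty()) {
        return "eng";  // documented default: English
    }
    std::string result;
    for (const std::string& code : languages) {
        if (!result.empty()) {
            result += '+';
        }
        result += code;
    }
    return result;
}
```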
void PDFAWriter::setKeywords | ( | const QString & | keywords | ) |
Set the keywords string in the PDF/A meta data.
keywords | Keyword string |
void PDFAWriter::setPageSize | ( | const paperSize & | size | ) |
Sets page size, effective for future calls of the addPages() methods.
size | Paper size |
void PDFAWriter::setPageSize | ( | paperSize::format | size = paperSize::empty | ) |
Sets page size, effective for future calls of the addPages() methods.
size | Paper size |
void PDFAWriter::setResolutionOverride | ( | resolution | horizontal, | resolution | vertical | ) |
Sets graphic resolution for future calls of the addPages() methods.
To add a raster graphic to a PDF file, the resolution of the raster graphic needs to be known. This method can be used to manually set resolutions before adding graphic files that either do not specify their resolution, or that contain incorrect information.
horizontal | Horizontal resolution, which must either be valid (in other words, horizontal.isValid() must return true) or zero. If zero, this is interpreted as "no override resolution set". |
vertical | Ditto for vertical resolution. |
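The reason the resolution matters is that it determines the physical size of a raster graphic on the page. The sketch below shows the arithmetic with plain numbers in dots per inch. It does not use the scantools resolution class, and the names effectiveDpi and lengthInPoints are hypothetical. It assumes the usual PDF convention of 72 points per inch.

```cpp
#include <cassert>

// Resolution to use for a graphic: the override if one is set (non-zero),
// otherwise the resolution stored in the graphic file.
double effectiveDpi(double embeddedDpi, double overrideDpi) {
    assert(embeddedDpi >= 0.0 && overrideDpi >= 0.0);
    return overrideDpi > 0.0 ? overrideDpi : embeddedDpi;
}

// Physical length of a pixel run in PDF points (1 inch = 72 points).
double lengthInPoints(int pixels, double dpi) {
    assert(dpi > 0.0);
    return pixels / dpi * 72.0;
}
```

For example, a scan that is 2480 pixels wide with an override of 300 dpi comes out at roughly 595 points, which is about the width of an A4 page.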
void PDFAWriter::setResolutionOverride | ( | resolution | res | ) | inline |
Overloaded method that sets horizontal and vertical resolution to the same value.
res | resolution |
Definition at line 280 of file PDFAWriter.h.
void PDFAWriter::setResolutionOverrideHorizontal | ( | resolution | horizontal | ) |
void PDFAWriter::setResolutionOverrideVertical | ( | resolution | vertical | ) |
void PDFAWriter::setSubject | ( | const QString & | subject | ) |
Set the subject string in the PDF/A meta data.
subject | Subject string |
void PDFAWriter::setTitle | ( | const QString & | title | ) |
Set the title string in the PDF/A meta data.
title | Title string |
QString PDFAWriter::subject | ( | ) |
Metadata: Subject string.
QString PDFAWriter::title | ( | ) |
Metadata: Title String.
void PDFAWriter::waitForWorkerThreads | ( | ) | slot |
Waits for all worker threads to finish.
This method blocks until all worker threads have finished execution. While this method is running, the signal progress() is emitted at irregular intervals. The signal finished() is emitted before the method exits, even if no thread was running at the time the method was called.
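The pattern described here (start jobs in the background, block until they are all done, report progress, then signal completion) can be sketched with the C++ standard library alone. This is not how PDFAWriter is implemented: JobPool is a hypothetical class, the two callbacks stand in for the Qt signals progress() and finished(), and the sketch reports the completed fraction, which is a choice of the sketch rather than a statement about the real signal.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical model of background jobs with a blocking wait.
class JobPool {
public:
    // Start a job in a separate thread; returns immediately.
    void add(std::function<void()> job) {
        jobs_.push_back(std::async(std::launch::async, std::move(job)));
    }

    // Block until every job is done. onProgress receives the completed
    // fraction in [0.0, 1.0]; onFinished always runs before returning,
    // even if no job was pending.
    void waitForAll(const std::function<void(double)>& onProgress,
                    const std::function<void()>& onFinished) {
        const std::size_t total = jobs_.size();
        for (std::size_t i = 0; i < total; ++i) {
            jobs_[i].get();  // blocks until job i has finished
            onProgress(static_cast<double>(i + 1) / total);
        }
        jobs_.clear();
        onFinished();
    }

private:
    std::vector<std::future<void>> jobs_;
};
```

As in the description above, the completion callback fires even when the pool is empty, so callers can rely on it unconditionally.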