scantools  1.0.4
Graphics manipulation with a view towards scanned documents
HOCRTextBox Class Reference

Text box, as defined in an HOCR file.

#include <HOCRTextBox.h>

Public Member Functions

 HOCRTextBox ()
 Constructs an empty text box.
 
bool hasText () const
 Decide if the text box contains non-trivial text.

qreal angle () const
 Text angle.

QXmlStreamAttributes attributes () const
 Returns the attributes of the textBox.

QVector< qreal > baselinePolynomial () const
 Base line as a polynomial.

QPoint baselineReferencePoint () const
 Base line reference point.

QRect boundingBox () const
 Bounding box.

QString classType () const
 Class of this textBox.

int confidence () const
 Confidence level.

QString direction () const
 Text flow direction.

qreal fontSize () const
 Font size.

QString imageName () const
 Image associated with content of this text box.

QString language () const
 Language of the content of this text box.

void render (QPainter &painter) const
 Paint the contents of the text box to a painter.

QImage toImage (const QFont &overrideFont, QImage::Format format=QImage::Format_Grayscale8) const
 Export this text box as an image.

QByteArray toRawPDFContentStream (const QFont &font, resolution xRes, resolution yRes, length deltaX=length(), length deltaY=length()) const
 Return raw PDF text rendering commands.

QString toText () const
 Export this text box as text.

qint64 estimateFit (const QFont &font) const
 Estimate how well a given font fits the textbox.

QFont suggestFont () const
 Suggest font.

QString text () const
 Text content of the text box.
 

Detailed Description

Text box, as defined in an HOCR file.

This class represents a box that may contain text and also other text boxes. Every text box has a bounding rectangle, text, a type, and a list of attributes. There are several helper functions to interpret the attributes.

Text boxes can be rendered on a QPainter. For users who wish to implement their own rendering routines, there are several helper functions that give suggestions for optimal rendering.

The methods of this class are reentrant, but not thread safe.

Definition at line 44 of file HOCRTextBox.h.

Member Function Documentation

◆ angle()

qreal HOCRTextBox::angle ( ) const
inline

Text angle.

Returns
The text angle, as specified in the HOCR file or inherited from parent. If no angle was specified, this number is zero.

Definition at line 62 of file HOCRTextBox.h.

◆ attributes()

QXmlStreamAttributes HOCRTextBox::attributes ( ) const
inline

Returns the attributes of the textBox.

Returns
The attributes of the textBox, as stored in the corresponding entity of the HOCR file.

Definition at line 69 of file HOCRTextBox.h.

◆ baselinePolynomial()

QVector<qreal> HOCRTextBox::baselinePolynomial ( ) const
inline

Base line as a polynomial.

Returns
The base line as a polynomial, as specified in the HOCR file or inherited from parent. If no base line was specified, this vector is empty.

Definition at line 77 of file HOCRTextBox.h.

◆ baselineReferencePoint()

QPoint HOCRTextBox::baselineReferencePoint ( ) const
inline

Base line reference point.

Returns
The base line reference point, as specified in the HOCR file or inherited from parent. If no base line polynomial is specified, this member is meaningless.

Definition at line 85 of file HOCRTextBox.h.
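
The following is a minimal sketch of how the baseline polynomial and its reference point might be combined to find the baseline height at a given horizontal position. It uses plain standard-library types instead of QVector and QPoint. The function names evaluateBaseline and baselineY are hypothetical, and two points are assumptions rather than documented behaviour: that the coefficients are ordered from the highest-degree term down to the constant term, and that the polynomial is evaluated relative to the reference point.

```cpp
#include <vector>

// Evaluate a polynomial with Horner's scheme.
// ASSUMPTION: coefficients run from the highest-degree term down to the
// constant term. The class documentation does not state the order.
// An empty coefficient vector evaluates to 0.
double evaluateBaseline(const std::vector<double>& coefficients, double x)
{
    double y = 0.0;
    for (double c : coefficients) {
        y = y * x + c;
    }
    return y;
}

// Hypothetical helper: baseline height at horizontal pixel position x.
// ASSUMPTION: the polynomial is expressed relative to the reference point
// (refX, refY), so x is shifted by refX and the result is offset by refY.
double baselineY(const std::vector<double>& coefficients,
                 double refX, double refY, double x)
{
    return refY + evaluateBaseline(coefficients, x - refX);
}
```

Since the reference point is meaningless when no polynomial was specified, a caller would normally check that the coefficient vector is non-empty before using the result.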

◆ boundingBox()

QRect HOCRTextBox::boundingBox ( ) const
inline

Bounding box.

Returns
The bounding box of this textBox. If no bounding box was specified in the HOCR file, an empty box is returned.

Definition at line 92 of file HOCRTextBox.h.

◆ classType()

QString HOCRTextBox::classType ( ) const

Class of this textBox.

Returns
The type of this textBox. Typical types are "ocr_page", "ocr_carea", "ocr_par", "ocr_line" or "ocrx_word". If the HOCR element described by this box is not of OCR-related type, an empty string is returned.

◆ confidence()

int HOCRTextBox::confidence ( ) const
inline

Confidence level.

Returns
The confidence level for the content of this textBox. If no confidence level was specified in the HOCR file, -1 is returned.

Definition at line 108 of file HOCRTextBox.h.

◆ direction()

QString HOCRTextBox::direction ( ) const
inline

Text flow direction.

Returns
The text flow direction of the corresponding element in the HOCR file. Returns 'ltr' for left-to-right, and 'rtl' for right-to-left. Any other return value means 'undefined'.

Definition at line 116 of file HOCRTextBox.h.
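
Because any value other than 'ltr' or 'rtl' means 'undefined', callers may find it convenient to map the returned string to an enumeration. The sketch below shows one way to do this with std::string standing in for QString; the names TextFlow and parseDirection are hypothetical and not part of scantools.

```cpp
#include <string>

// Hypothetical enumeration for the three cases described above.
enum class TextFlow { LeftToRight, RightToLeft, Undefined };

// Map a direction string to a TextFlow value. Everything that is not
// exactly "ltr" or "rtl" is treated as undefined.
TextFlow parseDirection(const std::string& direction)
{
    if (direction == "ltr") {
        return TextFlow::LeftToRight;
    }
    if (direction == "rtl") {
        return TextFlow::RightToLeft;
    }
    return TextFlow::Undefined;
}
```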

◆ estimateFit()

qint64 HOCRTextBox::estimateFit ( const QFont &  font) const

Estimate how well a given font fits the textbox.

For this box, and for each subbox, this method computes the width of the text when rendered with the given font, and compares it to the width of the bounding box. The square of the difference is then computed, and the results are added up.

Parameters
font: Font with which the text is to be rendered. The font's pixelSize will be changed in the process.
Returns
Sum of squares. The lower the number, the better the font fits the text box.
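
The computation described above can be sketched without Qt as follows. This is an illustration of the sum-of-squares idea only, not the scantools implementation: FitBox is a hypothetical stand-in for a text box, and the renderedWidth callback replaces whatever font metrics the real method uses. The adjustment of the font's pixelSize is not modelled.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a text box: its own text, the width of its
// bounding box in pixels, and its sub-boxes.
struct FitBox {
    std::string text;
    std::int64_t boxWidth;
    std::vector<FitBox> children;
};

// Callback returning the width, in pixels, of a string rendered in the
// candidate font. ASSUMPTION: stands in for real font metrics.
using WidthFn = std::function<std::int64_t(const std::string&)>;

// For the box and each sub-box, square the difference between rendered
// width and bounding-box width, and add up the results.
std::int64_t sumOfSquares(const FitBox& box, const WidthFn& renderedWidth)
{
    const std::int64_t diff = renderedWidth(box.text) - box.boxWidth;
    std::int64_t total = diff * diff;
    for (const FitBox& child : box.children) {
        total += sumOfSquares(child, renderedWidth);
    }
    return total;
}
```

Squaring the differences penalises a single badly fitting box more strongly than many small deviations, and makes over-wide and over-narrow text count equally.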

◆ fontSize()

qreal HOCRTextBox::fontSize ( ) const
inline

Font size.

Returns
The font size for the content of this textBox. If no font size was specified in the HOCR file, 0.0 is returned.

Definition at line 123 of file HOCRTextBox.h.

◆ hasText()

bool HOCRTextBox::hasText ( ) const

Decide if the text box contains non-trivial text.

Returns
True if the text box or any of its subboxes contains text other than white space.
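
A recursive check of this kind might look like the sketch below. TextNode is a hypothetical stand-in for a text box and std::string replaces QString; white space is detected with std::isspace, which may differ from the rules the actual method applies.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a text box with its sub-boxes.
struct TextNode {
    std::string text;
    std::vector<TextNode> children;
};

// True if the string contains at least one non-white-space character.
bool containsNonWhitespace(const std::string& s)
{
    return std::any_of(s.begin(), s.end(),
                       [](unsigned char c) { return !std::isspace(c); });
}

// True if the node or any node below it contains non-white-space text.
bool hasVisibleText(const TextNode& node)
{
    if (containsNonWhitespace(node.text)) {
        return true;
    }
    return std::any_of(node.children.begin(), node.children.end(),
                       hasVisibleText);
}
```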

◆ imageName()

QString HOCRTextBox::imageName ( ) const
inline

Image associated with content of this text box.

Returns
The name of an image file associated with the content of this text box. If nothing is specified in the HOCR file, an empty string is returned.

Definition at line 131 of file HOCRTextBox.h.

◆ language()

QString HOCRTextBox::language ( ) const
inline

Language of the content of this text box.

Returns
The name of the language of the content of this text box (for example, 'eng'). If nothing is specified in the HOCR file, this string is empty.

Definition at line 138 of file HOCRTextBox.h.

◆ render()

void HOCRTextBox::render ( QPainter &  painter) const

Paint the contents of the text box to a painter.

Coordinates are in pixels. This is convenient if the contents of an HOCR file are drawn onto a bitmap that is meant to resemble the original scanned image as closely as possible. When writing to a PDF, for instance, resolution needs to be taken into account.

Parameters
painter: QPainter to be used in the paint job

◆ suggestFont()

QFont HOCRTextBox::suggestFont ( ) const

Suggest font.

Suggests a font for rendering this text box. Three standard fonts ("Helvetica", "Times", "Courier") are tried, and the one that fits the text box best is chosen.

Returns
The best-fitting font
Warning
This method is expensive
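
The selection step can be pictured as choosing the candidate with the lowest fit score, where a lower score means a better fit, as for estimateFit(). The sketch below is an assumption about the general shape of such a selection, not the actual implementation; bestFittingFamily and the fit callback are hypothetical names, and font families are represented as plain strings.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Return the family with the lowest fit score. The first family wins ties.
// An empty candidate list yields an empty string.
// ASSUMPTION: 'fit' plays the role of a per-font fit estimate where lower
// values are better.
std::string bestFittingFamily(
    const std::vector<std::string>& families,
    const std::function<std::int64_t(const std::string&)>& fit)
{
    std::string best;
    std::int64_t bestScore = 0;
    bool haveBest = false;
    for (const std::string& family : families) {
        const std::int64_t score = fit(family);
        if (!haveBest || score < bestScore) {
            best = family;
            bestScore = score;
            haveBest = true;
        }
    }
    return best;
}
```

Each candidate requires a full fit estimate over the box and its sub-boxes, which is consistent with the warning that the method is expensive; callers may want to cache the result.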

◆ text()

QString HOCRTextBox::text ( ) const
inline

Text content of the text box.

Returns
The text of this text box. If nothing is specified in the HOCR file, an empty string is returned.

Definition at line 232 of file HOCRTextBox.h.

◆ toImage()

QImage HOCRTextBox::toImage ( const QFont &  overrideFont,
QImage::Format  format = QImage::Format_Grayscale8 
) const

Export this text box as an image.

Paints the contents of the text box to an image and returns the image. In case of an error, an empty image is returned.

Parameters
overrideFont: If null, a standard font is used. Otherwise, the specified font is used.
format: Format of the returned QImage
Returns
An image

◆ toRawPDFContentStream()

QByteArray HOCRTextBox::toRawPDFContentStream ( const QFont &  font,
resolution  xRes,
resolution  yRes,
length  deltaX = length(),
length  deltaY = length() 
) const

Return raw PDF text rendering commands.

This method converts the text box into a sequence of raw PDF text rendering commands that are suitable for inclusion in a PDF file. The commands refer to the font /F1 and assume that this font exists. The text is encoded using the Windows-1252 encoding. Characters that cannot be represented in this encoding are discarded.

Parameters
font: A reference to a QFont that should match the font "/F1" as closely as possible. The font is used to optimise placement of the text.
xRes: Horizontal resolution. Used to transform the bitmap coordinates found in the HOCR file to physical sizes. Needs to be valid, or else the result is undefined.
yRes: Vertical resolution. Used to transform the bitmap coordinates found in the HOCR file to physical sizes. Needs to be valid, or else the result is undefined.
deltaX: Horizontal offset for text placement
deltaY: Vertical offset for text placement
Returns
A QByteArray containing PDF drawing commands
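
To illustrate why the resolution parameters are needed, the sketch below converts bitmap coordinates to PDF points (1/72 inch). It uses plain doubles in place of the resolution and length types, and the helper names pixelsToPoints and pdfYFromPixelY are hypothetical. The vertical flip assumes the default PDF coordinate system, with the origin at the bottom-left of the page, and is not taken from the scantools source.

```cpp
// Convert a distance in pixels to PDF points (1/72 inch).
// 'dpi' is the resolution in dots per inch and must be positive; as with
// xRes and yRes above, the result for an invalid resolution is not defined.
double pixelsToPoints(double pixels, double dpi)
{
    return pixels * 72.0 / dpi;
}

// Convert a vertical pixel coordinate measured from the top of the bitmap
// into a PDF coordinate measured from the bottom of the page.
// ASSUMPTION: default PDF user space with the y axis pointing upwards.
double pdfYFromPixelY(double pixelY, double pageHeightPixels, double dpi)
{
    return pixelsToPoints(pageHeightPixels - pixelY, dpi);
}
```

For example, on a page scanned at 300 dpi, a box that is 300 pixels wide corresponds to 72 points, that is, one inch.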

◆ toText()

QString HOCRTextBox::toText ( ) const

Export this text box as text.

Returns
The contents of this text box, and of all child text boxes, with line breaks added.

The documentation for this class was generated from the following file: