Operator Reference

apply_deep_ocr (Operator)

apply_deep_ocr - Apply a Deep OCR model on a set of images for inference.

Signature

apply_deep_ocr(Image : : DeepOcrHandle, Mode : DeepOcrResult)

Description

apply_deep_ocr applies the Deep OCR model given by DeepOcrHandle on the tuple of input images Image. The operator returns DeepOcrResult, a tuple with a result dictionary for every input image.

The operator apply_deep_ocr poses requirements on the input Image:

Image type: byte.
Number of channels: 1 or 3.

Further, the operator apply_deep_ocr preprocesses the given Image to match the model specifications: the byte images are normalized and converted to type real. In addition, for Mode = 'auto' or 'detection', the input image Image is padded to the model input dimensions and, if it has only one channel, converted into a three-channel image. For Mode = 'recognition', three-channel images are automatically converted to single-channel images.

The parameter Mode specifies which component is executed. Supported values:

'auto' (A): Perform both parts, detection of the word and its recognition.
'detection' (DET): Perform only the detection part. Hence, the model will merely localize the word regions within the image.
'recognition' (REC): Perform only the recognition part. Hence, the model requires that the image contains solely a tight crop of the word.

Note that the model must have been created with the desired component, see create_deep_ocr.
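The mode-dependent channel handling described above can be summarized as a small lookup. The following Python sketch is purely illustrative and does not call HALCON; the function name channels_after_preprocessing is hypothetical, and the function only restates the rules given in the text.

```python
def channels_after_preprocessing(mode: str, channels: int) -> int:
    """Return the channel count of the preprocessed image, per the rules above.

    Hypothetical helper for illustration only; it does not call HALCON.
    """
    if channels not in (1, 3):
        raise ValueError("input images must have 1 or 3 channels")
    if mode in ("auto", "detection"):
        # Single-channel inputs are converted into three-channel images.
        return 3
    if mode == "recognition":
        # Three-channel inputs are converted to single-channel images.
        return 1
    raise ValueError(f"unsupported mode: {mode!r}")
```

For example, a single-channel image passed with 'auto' ends up with three channels, while a three-channel image passed with 'recognition' ends up with one.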

The output dictionary DeepOcrResult can have entries according to the applied Mode (marked by its abbreviation):

image (A, DET, REC):

Preprocessed image.

score_maps (A, DET):

Scores given as image with four channels:

Character score: Score for the character detection.
Link score: Score for the connection of detected character centers to a connected word.
Orientation 1: Sine component of the predicted word orientation.
Orientation 2: Cosine component of the predicted word orientation.
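Assuming the two orientation channels hold the sine and cosine of the same angle at a given pixel, the angle can be recovered with atan2. This is a minimal sketch in plain Python; the function name word_orientation is hypothetical, and how the operator itself combines these channels is not stated here.

```python
import math


def word_orientation(sine: float, cosine: float) -> float:
    """Combine the two orientation components into an angle in radians.

    atan2 covers the full range (-pi, pi] and does not require the two
    components to be normalized to unit length.
    """
    return math.atan2(sine, cosine)
```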

words (A, DET):

Dictionary containing the following entries. Each entry is a tuple with one value for every found word.

word (A): Recognized word.
char_candidates (A): A dictionary with information for every character of every recognized word. For every word, the dictionary contains a key/value pair with the index of the word as key and a tuple of dictionaries as value. Each of these character dictionaries contains the following key/value pairs:
- 'candidate': Tuple with the best 'recognition_num_char_candidates' candidates.
- 'confidence': Softmax-based confidence values of the best candidates. Note that these values are not calibrated and should be used with care. They can vary significantly for different models.
word_image (A): Preprocessed image part containing the word.
row (A, DET): Localized word: Center point, row coordinate.
col (A, DET): Localized word: Center point, column coordinate.
phi (A, DET): Localized word: Angle phi.
length1 (A, DET): Localized word: Half length of edge 1.
length2 (A, DET): Localized word: Half length of edge 2.
line_index (A, DET): Line index of the localized word if 'detection_sort_by_line' is set to 'true'.

The word localization is given by the parameters of an oriented rectangle, see gen_rectangle2 for further information.
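Because every entry of words is a tuple with one value per found word, the entries can be read in parallel. The sketch below groups recognized words by line_index and orders them by column within each line. It works on a plain Python dict standing in for the words dictionary and does not call HALCON; the function name is hypothetical, and ordering by column assumes roughly horizontal text. It also assumes that both word (mode 'auto') and line_index ('detection_sort_by_line' set to 'true') are present.

```python
from collections import defaultdict


def group_words_by_line(words: dict) -> dict:
    """Group recognized words by their line index.

    `words` is a plain dict standing in for the 'words' dictionary:
    every entry is a sequence with one value per found word.
    """
    lines = defaultdict(list)
    for text, col, line in zip(words["word"], words["col"], words["line_index"]):
        lines[line].append((col, text))
    # Within a line, order the words from left to right by column coordinate.
    return {
        line: [text for _, text in sorted(items)]
        for line, items in sorted(lines.items())
    }
```

The index values used to label the lines are taken as they appear in line_index; no assumption is made about where the numbering starts.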

word_boxes_on_image (A, DET):

Dictionary with the word localization in the coordinate system of the preprocessed image stored in image. The entries are tuples with a value for every found word.

row (A, DET): Localized word: Center point, row coordinate.
col (A, DET): Localized word: Center point, column coordinate.
phi (A, DET): Localized word: Angle phi.
length1 (A, DET): Localized word: Half length of edge 1.
length2 (A, DET): Localized word: Half length of edge 2.

The word localization is given by the parameters of an oriented rectangle, see gen_rectangle2 for further information.
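To draw or post-process a word box outside of HALCON, the five rectangle parameters can be converted into corner points. The following Python sketch assumes that phi is given in radians, measured counterclockwise from the horizontal axis to the edge-1 axis, with the row axis pointing downwards; this convention should be checked against the gen_rectangle2 reference. The function name rectangle2_corners is hypothetical.

```python
import math


def rectangle2_corners(row, col, phi, length1, length2):
    """Return the four corner points (row, col) of an oriented rectangle.

    Assumed convention: phi in radians, counterclockwise positive,
    row axis pointing down, length1/length2 are half edge lengths.
    """
    # Half-edge vectors in (row, col) coordinates.
    r1, c1 = -math.sin(phi) * length1, math.cos(phi) * length1
    r2, c2 = math.cos(phi) * length2, math.sin(phi) * length2
    return [
        (row + r1 + r2, col + c1 + c2),
        (row + r1 - r2, col + c1 - c2),
        (row - r1 - r2, col - c1 - c2),
        (row - r1 + r2, col - c1 + c2),
    ]
```

With phi = 0 the result is an axis-aligned box that extends length1 to the left and right of the center and length2 above and below it.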

word_boxes_on_score_maps (A, DET):

Dictionary with the word localization in the coordinate system of the score images stored in score_maps. The entries are the same as for word_boxes_on_image above.

word (REC):

Recognized word.

char_candidates (REC):

A tuple of dictionaries with information for every character in the recognized word.

Each of these character dictionaries contains the following key/value pairs:

'candidate': Tuple with the best 'recognition_num_char_candidates' candidates.
'confidence': Softmax-based confidence values of the best candidates. Note that these values are not calibrated and should be used with care. They can vary significantly for different models.
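The character dictionaries can be used to flag characters where the model was unsure. The sketch below operates on a plain Python list of dicts standing in for the tuple of character dictionaries and does not call HALCON. The function name and the threshold of 0.5 are hypothetical; since the confidences are not calibrated, any threshold would need to be tuned per model.

```python
def best_characters(char_candidates, min_confidence=0.5):
    """Return the top candidate per character and the indices of uncertain ones.

    `char_candidates` is a plain list of dicts standing in for the tuple of
    character dictionaries. The default threshold is an arbitrary
    illustrative value, not a recommendation.
    """
    text = []
    uncertain = []
    for index, char in enumerate(char_candidates):
        # Pick the candidate with the highest confidence for this character.
        confidence, candidate = max(zip(char["confidence"], char["candidate"]))
        text.append(candidate)
        if confidence < min_confidence:
            uncertain.append(index)
    return "".join(text), uncertain
```

The function does not rely on the candidates being sorted by confidence; it searches for the maximum explicitly.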

The recognition component can be retrained with custom data in order to further enhance the performance. See OCR / Deep OCR for more information.

Attention

System requirements: To run this operator on GPU (see get_deep_ocr_param), cuDNN and cuBLAS are required. For further details, please refer to the “Installation Guide”, paragraph “Requirements for Deep Learning and Deep-Learning-Based Methods”. Alternatively, this operator can also be run on CPU.

Execution Information

Multithreading type: reentrant (runs in parallel with non-exclusive operators).
Multithreading scope: global (may be called from any thread).
Automatically parallelized on internal data level.

This operator returns a handle. Note that the state of an instance of this handle type may be changed by specific operators even though the handle is used as an input parameter by those operators.

This operator supports canceling timeouts and interrupts.

This operator supports breaking timeouts and interrupts.

Parameters

Image (input_object) (multichannel-)image(-array) → object (byte)

Input image.

DeepOcrHandle (input_control) deep_ocr → (handle)

Handle of the Deep OCR model.

Mode (input_control) string → (string)

Inference mode.

Default: []

List of values: 'auto', 'detection', 'recognition'

DeepOcrResult (output_control) dict(-array) → (handle)

Tuple of result dictionaries.

Result

If the parameters are valid, the operator apply_deep_ocr returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.

Possible Predecessors

get_deep_ocr_param, set_deep_ocr_param, create_deep_ocr

Module

OCR/OCV
