Operator Reference

do_ocr_word_svm (Operator)

do_ocr_word_svm: Classify a related group of characters with an OCR classifier.

Signature

do_ocr_word_svm(Character, Image : : OCRHandle, Expression, NumAlternatives, NumCorrections : Class, Word, Score)

Description

do_ocr_word_svm works like do_ocr_multi_class_svm insofar as it computes the best class for each of the characters given by the regions Character and the gray values Image with the OCR classifier OCRHandle, and returns the results in Class.

In contrast to do_ocr_multi_class_svm, do_ocr_word_svm treats the group of characters as an entity, which yields a Word by concatenating the class names of the individual character regions. This makes it possible to restrict the classification results on a textual level by specifying an Expression that describes the expected word.

The Expression may restrict the word to belong to a predefined lexicon created with create_lexicon or import_lexicon. To do so, specify the name of the lexicon in angle brackets, as in '<mylexicon>'. If the Expression has any other form, it is interpreted as a regular expression with the same syntax as specified for tuple_regexp_match. Note that you will usually want to use an expression of the form '^...$' when using variable quantifiers like '*', to ensure that the expression has to match the entire word. Also note that, in contrast to tuple_regexp_match, do_ocr_word_svm does not support passing extra options in an expression tuple.
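The dispatch between the two expression forms can be sketched as follows. This is a minimal Python illustration, not HALCON code. The helper `matches_expression` and the `lexicons` dictionary are hypothetical, and Python's `re` module stands in for the regular-expression syntax of tuple_regexp_match, which may differ in details. The example also shows why anchoring matters: without '^...$', a pattern such as '[0-9]*' accepts any word, because it can match an empty substring.

```python
import re


def matches_expression(word, expression, lexicons):
    """Return True if word satisfies expression (hypothetical helper).

    '<name>' selects a lexicon from the lexicons dict (name -> set of
    words); any other expression is treated as a regular expression and
    searched for anywhere in word.
    """
    if len(expression) > 2 and expression.startswith("<") and expression.endswith(">"):
        return word in lexicons[expression[1:-1]]
    return re.search(expression, word) is not None


lexicons = {"mylexicon": {"HALCON", "MVTEC"}}

print(matches_expression("HALCON", "<mylexicon>", lexicons))  # True
print(matches_expression("AB12", "[0-9]*", lexicons))         # True: empty match at position 0
print(matches_expression("AB12", "^[0-9]*$", lexicons))       # False: whole word must be digits
print(matches_expression("1234", "^[0-9]*$", lexicons))       # True
```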

If the word derived from the best class of each character does not match the Expression, do_ocr_word_svm attempts to correct it by considering the NumAlternatives best classes for each character. These alternatives are identical to those returned by do_ocr_single_class_svm for a single character. The operator tests all possible corrections in which the classification result is changed for at most NumCorrections character regions. Note that NumAlternatives and NumCorrections affect the complexity of the algorithm, so internal restrictions are applied in some cases. See the section 'Complexity' below for further information.
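The correction search described above can be sketched as a brute-force enumeration. This is a simplified Python model, not the operator's implementation. `correct_word`, `alternatives` (per-character class names, best first) and `accept` (a predicate standing in for the Expression) are hypothetical names. The real operator grades candidates with the Score, whereas this sketch simply returns the first accepted word with the fewest corrections.

```python
import re
from itertools import combinations, product


def correct_word(alternatives, accept, num_corrections):
    """Search for an accepted word, changing at most num_corrections characters.

    alternatives[i] lists the class names for character i, best first
    (a stand-in for the NumAlternatives best classes). Returns
    (word, number_of_corrections), or None if nothing is accepted.
    """
    n = len(alternatives)
    best = [alts[0] for alts in alternatives]
    if accept("".join(best)):
        return "".join(best), 0
    for k in range(1, min(num_corrections, n) + 1):
        for positions in combinations(range(n), k):
            # At each corrected position, pick a non-best alternative (rank >= 1).
            choices = [range(1, len(alternatives[p])) for p in positions]
            for ranks in product(*choices):
                chars = list(best)
                for p, r in zip(positions, ranks):
                    chars[p] = alternatives[p][r]
                word = "".join(chars)
                if accept(word):
                    return word, k
    return None


def is_upper_word(word):
    # Hypothetical expression '^[A-Z]+$': only uppercase letters allowed.
    return re.search(r"^[A-Z]+$", word) is not None


# Hypothetical classifier output: '0' is ranked above 'O' for the first character.
alternatives = [["0", "O", "D"], ["K", "X", "R"]]
print(correct_word(alternatives, is_upper_word, 2))  # ('OK', 1)
```

With NumCorrections set to 0, the same input yields no correction, because the best-class word "0K" does not match the expression.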

If the Expression is a lexicon and the above procedure did not yield a result, the most similar word in the lexicon is returned, provided that it requires fewer than NumCorrections edit operations for the correction (see suggest_lexicon).
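The lexicon fallback can be illustrated with an edit-distance search. This sketch assumes a plain Levenshtein distance (insertions, deletions, substitutions). The exact similarity measure used by suggest_lexicon is not specified here, and `edit_distance` and `lexicon_fallback` are hypothetical helpers.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def lexicon_fallback(word, lexicon, num_corrections):
    """Return the most similar lexicon word if it needs fewer than
    num_corrections edits, otherwise None (hypothetical helper)."""
    if not lexicon:
        return None
    # Sorting makes the choice among equally similar words deterministic.
    best = min(sorted(lexicon), key=lambda w: edit_distance(word, w))
    return best if edit_distance(word, best) < num_corrections else None


lexicon = {"HALCON", "MVTEC"}
print(lexicon_fallback("HALC0N", lexicon, 2))  # HALCON (one substitution)
print(lexicon_fallback("HALC0N", lexicon, 1))  # None (1 edit is not fewer than 1)
```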

The resulting word is graded by a Score between 0.0 (no correction found) and 1.0 (original word correct). The Score is lowered by adding a penalty according to the number of corrected characters and another (minor) penalty depending on how many better classes have been ignored in order to match the Expression:

with num_corr being the actual number of applied corrections and num_alt the total number of discarded alternatives.

Execution Information

Multithreading type: reentrant (runs in parallel with non-exclusive operators).
Multithreading scope: global (may be called from any thread).
Processed without parallelization.

Parameters

Character (input_object) region(-array) → object

Characters to be recognized.

Image (input_object) singlechannelimage → object (byte / uint2)

Gray values of the characters.

OCRHandle (input_control) ocr_svm → (handle)

Handle of the OCR classifier.

Expression (input_control) string → (string)

Expression describing the allowed word structure.

NumAlternatives (input_control) integer → (integer)

Number of classes per character considered for the internal word correction.

Default: 3

Suggested values: 3, 4, 5

Value range: 1 ≤ NumAlternatives

NumCorrections (input_control) integer → (integer)

Maximum number of corrected characters.

Default: 2

Suggested values: 1, 2, 3, 4, 5

Value range: 0 ≤ NumCorrections

Class (output_control) string(-array) → (string)

Result of classifying the characters with the SVM.

Number of elements: Class == Character

Word (output_control) string → (string)

Word text after classification and correction.

Score (output_control) real → (real)

Measure of similarity between corrected word and uncorrected classification results.

Complexity

The complexity of checking all possible corrections is of magnitude $O((a \cdot n)^c)$, where a is the number of alternatives, n is the number of character regions, and c is the number of allowed corrections. However, to avoid excessive run times for large n, c is internally clipped to 5, 3, or 1 if a·n ≥ 30, 60, or 90, respectively.
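To get a feel for the numbers, the sketch below applies the clipping rule described above and counts the candidate words explicitly. Two assumptions apply here. First, "clipped" is read as c = min(c, limit). Second, each corrected position is assumed to take one of the a − 1 non-best alternatives, which gives a count of Σ C(n, k)(a − 1)^k for k = 0 … c. `effective_corrections` and `num_candidates` are hypothetical helpers, not part of HALCON.

```python
from math import comb


def effective_corrections(a, n, c):
    """Clip the number of corrections c once a*n reaches 30, 60 or 90."""
    an = a * n
    if an >= 90:
        return min(c, 1)
    if an >= 60:
        return min(c, 3)
    if an >= 30:
        return min(c, 5)
    return c


def num_candidates(a, n, c):
    """Count words that differ from the best-class word in at most c
    positions, each changed to one of the a - 1 other alternatives."""
    return sum(comb(n, k) * (a - 1) ** k for k in range(min(c, n) + 1))


for a, n in [(3, 10), (4, 20), (3, 30)]:
    c = effective_corrections(a, n, 5)
    print(a, n, c, num_candidates(a, n, c))
# 3 10 5 12585
# 4 20 3 32551
# 3 30 1 61
```

Without the clipping, the (4, 20) case with c = 5 would already require several million candidates, which illustrates why the restriction is needed.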

Result

If the parameters are valid, the operator do_ocr_word_svm returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.

Possible Predecessors

trainf_ocr_class_svm, read_ocr_class_svm

Alternatives

do_ocr_multi_class_svm

Module

OCR/OCV
