Operator Reference

select_feature_set_trainf_svm (Operator)

select_feature_set_trainf_svm - Selects an optimal combination of features to classify OCR data.

Signature

Herror T_select_feature_set_trainf_svm(const Htuple TrainingFile, const Htuple FeatureList, const Htuple SelectionMethod, const Htuple Width, const Htuple Height, const Htuple GenParamName, const Htuple GenParamValue, Htuple* OCRHandle, Htuple* FeatureSet, Htuple* Score)

void SelectFeatureSetTrainfSvm(const HTuple& TrainingFile, const HTuple& FeatureList, const HTuple& SelectionMethod, const HTuple& Width, const HTuple& Height, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* OCRHandle, HTuple* FeatureSet, HTuple* Score)

HTuple HOCRSvm::SelectFeatureSetTrainfSvm(const HTuple& TrainingFile, const HTuple& FeatureList, const HString& SelectionMethod, Hlong Width, Hlong Height, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* Score)

HTuple HOCRSvm::SelectFeatureSetTrainfSvm(const HString& TrainingFile, const HString& FeatureList, const HString& SelectionMethod, Hlong Width, Hlong Height, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* Score)

HTuple HOCRSvm::SelectFeatureSetTrainfSvm(const char* TrainingFile, const char* FeatureList, const char* SelectionMethod, Hlong Width, Hlong Height, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* Score)

HTuple HOCRSvm::SelectFeatureSetTrainfSvm(const wchar_t* TrainingFile, const wchar_t* FeatureList, const wchar_t* SelectionMethod, Hlong Width, Hlong Height, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* Score)   (Windows only)

def select_feature_set_trainf_svm(training_file: MaybeSequence[str], feature_list: MaybeSequence[str], selection_method: str, width: int, height: int, gen_param_name: Sequence[str], gen_param_value: Sequence[Union[int, str, float]]) -> Tuple[HHandle, Sequence[str], Sequence[float]]

Description

select_feature_set_trainf_svm selects an optimal combination of features for classifying the data given in the training file TrainingFile with a support vector machine (SVM). For details on the classifier, see create_ocr_class_svm.

Possible features are all OCR features listed and explained in create_ocr_class_svm. All candidates that should be tested are specified in FeatureList. A subset of these features is returned as the selected features in FeatureSet.

select_feature_set_trainf_svm is specialized for OCR problems and supports only the features in the list mentioned above. To use other features, use the more general operator select_feature_set_svm.

The selection method SelectionMethod is either a greedy search 'greedy' (iteratively add the feature with the highest gain) or the dynamically oscillating search 'greedy_oscillating' (add the feature with the highest gain, then test whether any of the already added features can be left out without great loss). The method 'greedy' is generally preferable because it is faster. Only when a large training set is available might 'greedy_oscillating' return better results.
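The two search strategies can be illustrated with a minimal Python sketch. It is not HALCON's implementation: the scoring callback, the toy feature coverage, the stopping threshold min_gain, and the rule "drop a feature only if the score does not decrease" are assumptions chosen for illustration (the tolerance HALCON uses for "without great loss" is not stated here). Ties are broken by feature name purely as a side effect of tuple comparison.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

ScoreFn = Callable[[FrozenSet[str]], float]


def _best_addition(selected: List[str], remaining: List[str],
                   score: ScoreFn) -> Tuple[float, str]:
    # Score every remaining candidate when added to the current set.
    return max((score(frozenset(selected + [f])), f) for f in remaining)


def greedy_select(candidates: Sequence[str], score: ScoreFn,
                  min_gain: float = 1e-9) -> Tuple[List[str], float]:
    """Forward selection: repeatedly add the feature with the highest gain."""
    selected: List[str] = []
    best = score(frozenset())
    while True:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        trial_score, feature = _best_addition(selected, remaining, score)
        if trial_score - best < min_gain:
            break  # no candidate improves the score any further
        selected.append(feature)
        best = trial_score
    return selected, best


def greedy_oscillating_select(candidates: Sequence[str], score: ScoreFn,
                              min_gain: float = 1e-9) -> Tuple[List[str], float]:
    """Forward selection with a backward check after every addition."""
    selected: List[str] = []
    best = score(frozenset())
    while True:
        remaining = [f for f in candidates if f not in selected]
        if not remaining:
            break
        trial_score, feature = _best_addition(selected, remaining, score)
        if trial_score - best < min_gain:
            break
        selected.append(feature)
        best = trial_score
        # Backward step (assumption): drop an earlier feature only if the
        # score does not decrease without it. The newest feature is kept.
        for f in list(selected[:-1]):
            reduced = [g for g in selected if g != f]
            reduced_score = score(frozenset(reduced))
            if reduced_score >= best:
                selected, best = reduced, reduced_score
    return selected, best


# Hypothetical toy score: each feature "covers" some aspects of the data,
# and the score grows with the number of covered aspects.
TOY_COVERAGE = {"width": {"a", "b"}, "pixel": {"a", "c"},
                "ratio": {"b", "d"}, "phi": set()}


def toy_score(features: FrozenSet[str]) -> float:
    covered = set()
    for f in features:
        covered |= TOY_COVERAGE[f]
    return 0.5 + 0.125 * len(covered)


CANDIDATES = ["ratio", "width", "pixel", "phi"]
GREEDY_RESULT = greedy_select(CANDIDATES, toy_score)
OSCILLATING_RESULT = greedy_oscillating_select(CANDIDATES, toy_score)
```

In this toy setup both searches reach the same score, but the oscillating variant ends with a smaller set because 'width' becomes redundant once 'ratio' and 'pixel' are both selected. The price is one extra score evaluation per already selected feature after each addition, which is consistent with the note that 'greedy' is faster.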

The optimization criterion is the classification rate of a two-fold cross-validation of the training data. The best achieved value is returned in Score.
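As a rough illustration of this criterion, the following Python sketch computes a two-fold cross-validation classification rate. Several things are assumptions and not taken from HALCON: the folds are formed by alternating samples, a nearest-centroid classifier stands in for the SVM to keep the example dependency-free, and the sample data is made up.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)
Classifier = Callable[[Sequence[float]], str]


def train_nearest_centroid(samples: Sequence[Sample]) -> Classifier:
    """Stand-in classifier (not an SVM): assign the class of the closest mean."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: [s / counts[label] for s in acc]
                 for label, acc in sums.items()}

    def classify(vec: Sequence[float]) -> str:
        return min(centroids, key=lambda label: sum(
            (a - b) ** 2 for a, b in zip(vec, centroids[label])))

    return classify


def two_fold_cv_rate(samples: Sequence[Sample],
                     train: Callable[[Sequence[Sample]], Classifier]
                     = train_nearest_centroid) -> float:
    """Train on one half, test on the other, swap, and pool the hits."""
    fold_a = samples[0::2]  # assumption: alternate samples between the folds
    fold_b = samples[1::2]
    correct = 0
    for train_set, test_set in ((fold_a, fold_b), (fold_b, fold_a)):
        classify = train(train_set)
        correct += sum(1 for vec, label in test_set if classify(vec) == label)
    return correct / len(samples)


# Made-up one-dimensional data; the sample at 9.0 is a deliberate outlier.
SAMPLES: List[Sample] = [
    ([0.0], "A"), ([1.0], "A"), ([10.0], "B"), ([11.0], "B"),
    ([0.5], "A"), ([9.0], "A"), ([10.5], "B"), ([12.0], "B"),
]
CV_RATE = two_fold_cv_rate(SAMPLES)
```

Because every sample is tested exactly once by a classifier that never saw it during training, the pooled rate is a less optimistic estimate than the accuracy on the training data itself.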

The parameters 'nu' and 'gamma' of the SVM used for classification can be set with GenParamName and GenParamValue. If a parameter is set to 'auto', its optimal value is estimated automatically. The automatic estimation of 'nu' and 'gamma' can take a substantial amount of time (up to days, depending on the data set and the number of features). Alternatively, a specific value can be set for either parameter in the same way. An explanation of 'nu' and of 'gamma', the kernel parameter of the radial basis function (RBF) kernel, can be found in create_class_svm.
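A hedged sketch of how the two generic parameter tuples might be assembled and passed through the Python signature shown above. The helper names build_svm_gen_params and select_features, the file name 'ocr.trf', and the numeric values are hypothetical. The call itself requires the HALCON Python bindings (assumed here to be importable as halcon) and a licensed HALCON installation, so the import is deferred until the function is actually called.

```python
from typing import List, Optional, Sequence, Tuple, Union

GenValue = Union[int, float, str]


def build_svm_gen_params(nu: Optional[GenValue] = None,
                         gamma: Optional[GenValue] = None
                         ) -> Tuple[List[str], List[GenValue]]:
    """Build matching name/value lists; parameters left as None are omitted."""
    names: List[str] = []
    values: List[GenValue] = []
    for name, value in (("nu", nu), ("gamma", gamma)):
        if value is not None:
            names.append(name)
            values.append(value)
    return names, values


def select_features(training_file: Union[str, Sequence[str]],
                    feature_list: Union[str, Sequence[str]],
                    nu: Optional[GenValue] = None,
                    gamma: Optional[GenValue] = None):
    """Hypothetical wrapper around the operator, using the default
    selection method 'greedy' and the default size 15 x 16."""
    import halcon as ha  # assumption: HALCON Python bindings are installed

    names, values = build_svm_gen_params(nu, gamma)
    # Returns (OCRHandle, FeatureSet, Score) according to the Python signature.
    return ha.select_feature_set_trainf_svm(
        training_file, feature_list, "greedy", 15, 16, names, values)


# Example tuples: estimate 'nu' automatically, fix 'gamma' to a made-up value.
NAMES, VALUES = build_svm_gen_params(nu="auto", gamma=0.02)

# Not executed here because it needs HALCON and a real training file:
# handle, feature_set, score = select_features(
#     "ocr.trf", ["ratio", "pixel_invar"], nu="auto", gamma=0.02)
```

Fixing one parameter and estimating only the other is one way to limit the cost of the automatic estimation; whether that is acceptable depends on the data set.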

Attention

This operator may take considerable time, depending on the size of the data set in the training file, and the number of features.

Note that this operator should not be called if only a small set of training data is available. Due to the risk of overfitting, select_feature_set_trainf_svm may deliver a classifier with a very high score that nevertheless performs poorly when tested.

Execution Information

  • Multithreading type: reentrant (runs in parallel with non-exclusive operators).
  • Multithreading scope: global (may be called from any thread).
  • Automatically parallelized on internal data level.

This operator returns a handle. Note that the state of an instance of this handle type may be changed by specific operators even though the handle is used as an input parameter by those operators.

Parameters

TrainingFile (input_control)  filename.read(-array) HTuple, MaybeSequence[str], HTuple, Htuple (string) (string) (HString) (char*)

Names of the training files.

Default: ''

File extension: .trf, .otr

FeatureList (input_control)  string(-array) HTuple, MaybeSequence[str], HTuple, Htuple (string) (string) (HString) (char*)

List of features that should be considered for selection.

Default: ['zoom_factor','ratio','width','height','foreground','foreground_grid_9','foreground_grid_16','anisometry','compactness','convexity','moments_region_2nd_invar','moments_region_2nd_rel_invar','moments_region_3rd_invar','moments_central','phi','num_connect','num_holes','projection_horizontal','projection_vertical','projection_horizontal_invar','projection_vertical_invar','chord_histo','num_runs','pixel','pixel_invar','pixel_binary','gradient_8dir','cooc','moments_gray_plane']

List of values: 'anisometry', 'chord_histo', 'compactness', 'convexity', 'cooc', 'default', 'foreground', 'foreground_grid_16', 'foreground_grid_9', 'gradient_8dir', 'height', 'moments_central', 'moments_gray_plane', 'moments_region_2nd_invar', 'moments_region_2nd_rel_invar', 'moments_region_3rd_invar', 'num_connect', 'num_holes', 'num_runs', 'phi', 'pixel', 'pixel_binary', 'pixel_invar', 'projection_horizontal', 'projection_horizontal_invar', 'projection_vertical', 'projection_vertical_invar', 'ratio', 'width', 'zoom_factor'

SelectionMethod (input_control)  string HTuple, str, HTuple, Htuple (string) (string) (HString) (char*)

Method to perform the selection.

Default: 'greedy'

List of values: 'greedy', 'greedy_oscillating'

Width (input_control)  integer HTuple, int, HTuple, Htuple (integer) (int / long) (Hlong) (Hlong)

Width of the rectangle to which the gray values of the segmented character are zoomed.

Default: 15

Height (input_control)  integer HTuple, int, HTuple, Htuple (integer) (int / long) (Hlong) (Hlong)

Height of the rectangle to which the gray values of the segmented character are zoomed.

Default: 16

GenParamName (input_control)  string-array HTuple, Sequence[str], HTuple, Htuple (string) (string) (HString) (char*)

Names of generic parameters to configure the selection process and the classifier.

Default: []

List of values: 'gamma', 'nu'

GenParamValue (input_control)  number-array HTuple, Sequence[Union[int, str, float]], HTuple, Htuple (real / integer / string) (double / int / long / string) (double / Hlong / HString) (double / Hlong / char*)

Values of generic parameters to configure the selection process and the classifier.

Default: []

Suggested values: 'auto', '0.1', '0.3'

OCRHandle (output_control)  ocr_svm HOCRSvm, HTuple, HHandle, HTuple, Htuple (handle) (IntPtr) (HHandle) (handle)

Trained OCR-SVM Classifier.

FeatureSet (output_control)  string-array HTuple, Sequence[str], HTuple, Htuple (string) (string) (HString) (char*)

Selected feature set; contains only entries from FeatureList.

Score (output_control)  real-array HTuple, Sequence[float], HTuple, Htuple (real) (double) (double) (double)

Achieved score using two-fold cross-validation.

Result

If the parameters are valid, the operator select_feature_set_trainf_svm returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.

Alternatives

select_feature_set_trainf_mlp, select_feature_set_trainf_knn, select_feature_set_trainf_mlp_protected

See also

select_feature_set_trainf_svm_protected, select_feature_set_svm

Module

OCR/OCV