Operator Reference

create_dl_layer_box_proposals (Operator)

create_dl_layer_box_proposals — Create a layer for generating box proposals.

Signature

Herror T_create_dl_layer_box_proposals(const Htuple DLLayerClassScore, const Htuple DLLayerBoxDelta, const Htuple DLLayerBox, const Htuple DLLayerInputImage, const Htuple LayerName, const Htuple GenParamName, const Htuple GenParamValue, Htuple* DLLayerBoxProposals)

void CreateDlLayerBoxProposals(const HTuple& DLLayerClassScore, const HTuple& DLLayerBoxDelta, const HTuple& DLLayerBox, const HTuple& DLLayerInputImage, const HTuple& LayerName, const HTuple& GenParamName, const HTuple& GenParamValue, HTuple* DLLayerBoxProposals)

static HDlLayer HDlLayer::CreateDlLayerBoxProposals(const HDlLayerArray& DLLayerClassScore, const HDlLayerArray& DLLayerBoxDelta, const HDlLayerArray& DLLayerBox, const HDlLayer& DLLayerInputImage, const HString& LayerName, const HTuple& GenParamName, const HTuple& GenParamValue)

HDlLayer HDlLayer::CreateDlLayerBoxProposals(const HDlLayer& DLLayerBoxDelta, const HDlLayer& DLLayerBox, const HDlLayer& DLLayerInputImage, const HString& LayerName, const HString& GenParamName, const HString& GenParamValue) const

HDlLayer HDlLayer::CreateDlLayerBoxProposals(const HDlLayer& DLLayerBoxDelta, const HDlLayer& DLLayerBox, const HDlLayer& DLLayerInputImage, const char* LayerName, const char* GenParamName, const char* GenParamValue) const

HDlLayer HDlLayer::CreateDlLayerBoxProposals(const HDlLayer& DLLayerBoxDelta, const HDlLayer& DLLayerBox, const HDlLayer& DLLayerInputImage, const wchar_t* LayerName, const wchar_t* GenParamName, const wchar_t* GenParamValue) const   ( Windows only)

def create_dl_layer_box_proposals(dllayer_class_score: MaybeSequence[HHandle], dllayer_box_delta: MaybeSequence[HHandle], dllayer_box: MaybeSequence[HHandle], dllayer_input_image: HHandle, layer_name: str, gen_param_name: MaybeSequence[str], gen_param_value: MaybeSequence[Union[int, float, str]]) -> HHandle

Description

The operator create_dl_layer_box_proposals creates a layer for generating box proposals whose handle is returned in DLLayerBoxProposals.

This layer expects several feeding input layers: DLLayerClassScore, DLLayerBoxDelta, DLLayerBox, and DLLayerInputImage (see the parameter descriptions below).

The parameter LayerName sets an individual layer name. Note that if creating a model using create_dl_model, each layer of the created network must have a unique name.

The box proposal layer processes the input boxes in the following steps (see also the detailed description of generic parameters below):

Apply scores:

For each input box in DLLayerBox, the corresponding score in DLLayerClassScore is set as box confidence. If DLLayerClassScore contains scores for more than one class, one box is created for each class whose score exceeds 'min_confidence', and the class ID belonging to the class index is set as output class ID. All boxes with a score smaller than 'min_confidence' are removed. During training, the score threshold 'min_confidence_train' is used instead of 'min_confidence'. A lower value allows more boxes to be forwarded to subsequent stages of the network. At most the 'max_num_pre_nms' boxes with the highest scores per input are kept for the following steps.
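The score-filtering step above can be sketched in plain Python. This is an illustrative model of the described logic, not the HALCON implementation; the flattened box/score layout is an assumption:

```python
def apply_scores(boxes, class_scores, class_ids, min_confidence, max_num_pre_nms):
    """boxes: list of box tuples; class_scores: per-box list of scores,
    one entry per class; class_ids: class ID for each class index."""
    candidates = []
    for box, scores in zip(boxes, class_scores):
        for class_index, score in enumerate(scores):
            # One candidate per (box, class) whose score exceeds the threshold.
            if score > min_confidence:
                candidates.append((box, class_ids[class_index], score))
    # Keep at most 'max_num_pre_nms' candidates with the highest scores.
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:max_num_pre_nms]
```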

Apply box deltas:

If box deltas are given in DLLayerBoxDelta and 'apply_box_regression' is 'true', the box deltas are applied to the input boxes given by DLLayerBox. Before being applied, the box deltas are transformed by the inverse of the function with which their targets are transformed in create_dl_layer_box_targets. All coordinates shall be given subpixel-precisely. If DLLayerBoxDelta is set to an empty tuple, the box coordinates are kept as given in DLLayerBox.
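HALCON's exact delta parameterization is defined by create_dl_layer_box_targets and its inverse is applied internally. As a hedged illustration of what such a decoding looks like, here is the widely used Faster R-CNN style decoding for 'rectangle1' boxes; the (dy, dx, dh, dw) layout is an assumption, not the documented transform:

```python
import math

def decode_box_delta(box, delta):
    """box: (row1, col1, row2, col2) anchor; delta: (dy, dx, dh, dw)
    in the common Faster R-CNN parameterization (an assumption here).
    Returns the regressed (row1, col1, row2, col2) box, subpixel-precise."""
    r1, c1, r2, c2 = box
    dy, dx, dh, dw = delta
    h, w = r2 - r1, c2 - c1
    cy, cx = r1 + 0.5 * h, c1 + 0.5 * w
    # Center deltas are relative to the box size; size deltas are in log space.
    cy += dy * h
    cx += dx * w
    h *= math.exp(dh)
    w *= math.exp(dw)
    return (cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w)
```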

Class specific non-maximum-suppression (NMS):

For each box B that has not been suppressed by another box, all other boxes B' that are not suppressed, have the same class ID, have a lower score, and have an intersection over union (IoU) of at least 'max_overlap' with B are suppressed.

Class agnostic NMS:

For each box B that has not been suppressed by another box, all other boxes B' that are not suppressed, have a lower score, and have an IoU of at least 'max_overlap_class_agnostic' with B are suppressed.
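Both NMS variants above follow the same suppression rule and differ only in whether the class ID is compared. A minimal sketch, not the HALCON implementation, with 'rectangle1' boxes assumed:

```python
def iou(a, b):
    """IoU of two axis-aligned (row1, col1, row2, col2) boxes."""
    ir1, ic1 = max(a[0], b[0]), max(a[1], b[1])
    ir2, ic2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ir2 - ir1) * max(0.0, ic2 - ic1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(candidates, max_overlap, class_specific=True):
    """candidates: (box, class_id, score) tuples. Higher-scoring boxes
    suppress lower-scoring ones whose IoU is at least max_overlap; with
    class_specific=True, only boxes of the same class suppress each other."""
    kept = []
    for box, cid, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        suppressed = any(
            (not class_specific or cid == kcid) and iou(box, kbox) >= max_overlap
            for kbox, kcid, _ in kept)
        if not suppressed:
            kept.append((box, cid, score))
    return kept
```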

Set outputs:

After NMS has been applied, at most the 'max_num_post_nms' boxes (in total over all inputs) with the highest scores that have not been suppressed are given as output. If fewer than 'max_num_post_nms' boxes are present within one batch item, the remaining output values are filled up with zeros.

For each box the output contains its parameters, its class index, and its score, in this order. The number of box parameters depends on the 'instance_type': 4 for 'rectangle1' (row1, column1, row2, column2) and 5 for 'rectangle2' (row, column, phi, length1, length2), respectively. Hence, the output depth equals the maximum number of output boxes per batch item 'max_num_post_nms', the height equals the number of box parameters plus two (for class index and score), and the width equals one. The subpixel-precise coordinates (pixel-centered, see Transformations / 2D Transformations) of the output boxes are given with respect to the input image dimensions.
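The resulting output shape per batch item follows directly from this description; a small illustrative helper, not HALCON API:

```python
def output_shape(instance_type, max_num_post_nms):
    """Shape (depth, height, width) of the box proposals output per batch
    item: depth = maximum number of boxes, height = number of box
    parameters + 2 (class index and score), width = 1."""
    nbp = {'rectangle1': 4, 'rectangle2': 5}[instance_type]
    return (max_num_post_nms, nbp + 2, 1)
```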

After creating a network with create_dl_model and setting the model type to 'detection' via set_dl_model_param, the last box proposals layer within the network is used as the box output layer: Its outputs are given in tuples within the result dictionary of the model, similar to the outputs given by a detection model that has been created by create_dl_model_detection. The output dictionary contains:

  • box parameters. Depending on the 'instance_type' the keys are:

    • 'bbox_row1', 'bbox_col1', 'bbox_row2', and 'bbox_col2' for 'instance_type' = 'rectangle1'.

    • 'bbox_row', 'bbox_col', 'bbox_length1', 'bbox_length2', and 'bbox_phi' for 'instance_type' = 'rectangle2'.

  • class IDs. Key: 'bbox_class_id'

  • scores. Key: 'bbox_confidence'

For a created model with type set to 'detection', the following parameters (see their explanation below) of the last box proposal layer within the network can be set with the operators set_dl_model_param or set_dl_model_layer_param:

  • 'max_num_detections' (overwrites 'max_num_post_nms'),

  • 'max_overlap',

  • 'max_overlap_class_agnostic',

  • 'min_confidence', and

  • 'nms_pre_top_n_per_level' (overwrites 'max_num_pre_nms').

The following generic parameters GenParamName and the corresponding values GenParamValue are supported:

'apply_box_regression':

If set to 'false', box regression is not applied.

Default: 'true'.

'box_cls_specific':

Should be set to 'true' if the box deltas are calculated class-specifically, see create_dl_layer_box_targets.

Default: 'false'.

'clip_boxes':

If set to 'true', output boxes are clipped to the image boundaries.

Restriction: Only for 'instance_type' 'rectangle1'.

Default: 'false'.

'ignore_direction':

If set to 'false', the orientation of 'rectangle2' boxes is in the range [-π, π), else in the range [-π/2, π/2).

Restriction: Only for 'instance_type' 'rectangle2'.

Default: 'false'.

'input_mode':

Type of the underlying box inputs and box deltas. The following types can be set:

'anchors':

The input boxes given by DLLayerBox are anchors, e.g., generated with create_dl_layer_anchors. In this case, both the score inputs and the optional box delta inputs shall have the same width and height as the anchors.

The depth of the score inputs corresponds to the anchor type and class index. Hence, if k is the number of anchor types (the number of subscales times the number of aspect ratios times the number of angles) and n is the number of classes, the depth shall be k times n. The ordering is the same as given in the class target output of the box target layer, i.e., (anchor type 0, class index 0), (anchor type 0, class index 1), ..., (anchor type 0, class index n-1), ..., (anchor type k-1, class index 0), (anchor type k-1, class index 1), ..., (anchor type k-1, class index n-1).

The depth of the box delta inputs corresponds to the number of anchor types times the number of box parameters (NBP), i.e., k * NBP. NBP depends on the 'instance_type': there are 4 parameters for 'rectangle1' (row1, column1, row2, column2) and 5 parameters for 'rectangle2' (row, column, phi, length1, length2), respectively. Hence, the box delta inputs and the input anchors shall be equal in depth and ordered in the same way. If create_dl_layer_box_targets was used to generate the score and box delta targets, the correct order is already on hand.
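The expected input depths in 'anchors' mode can be computed as follows; a sketch of the bookkeeping described above, not HALCON API:

```python
def expected_anchor_input_depths(num_subscales, num_aspect_ratios, num_angles,
                                 num_classes, instance_type):
    """Expected depths of the score and box delta inputs in 'anchors' mode:
    k anchor types = subscales * aspect ratios * angles,
    score depth = k * n classes, box delta depth = k * NBP."""
    k = num_subscales * num_aspect_ratios * num_angles
    nbp = {'rectangle1': 4, 'rectangle2': 5}[instance_type]
    return {'score_depth': k * num_classes, 'box_delta_depth': k * nbp}
```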

DLLayerBox, DLLayerClassScore, and DLLayerBoxDelta can be tuples of layers of the same length to be processed simultaneously. For example, each input can correspond to one level in a Feature Pyramid Network (see the references given below).

'dense':

The input boxes given by DLLayerBox are box proposals, e.g., generated with another box proposals layer. In this case, the batch size of the score and box delta inputs shall be the same as the batch size of the box inputs times the depth of the box inputs. This change of batch size is achieved by a ROI pooling layer (create_dl_layer_roi_pooling) that uses the box proposals as input.

The depth of the score inputs shall be the number of classes plus one, since the first index is interpreted as the background class.

The depth of the box delta inputs shall be the number of box parameters NBP if 'box_cls_specific' is set to 'false', or NBP times the number of classes if 'box_cls_specific' is set to 'true'.

If create_dl_layer_box_targets was used to generate the score and box delta targets, here as well, the correct order is already on hand.

Default: 'anchors'.
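The expected input depths in 'dense' mode can be computed similarly; an illustrative sketch, not HALCON API:

```python
def expected_dense_input_depths(num_classes, instance_type, box_cls_specific):
    """Expected depths in 'dense' mode: score depth = classes + 1 (first
    index is background), box delta depth = NBP, or NBP * classes when
    the deltas are class-specific."""
    nbp = {'rectangle1': 4, 'rectangle2': 5}[instance_type]
    return {'score_depth': num_classes + 1,
            'box_delta_depth': nbp * num_classes if box_cls_specific else nbp}
```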

'inside_angle_weight':

Inside weight multiplier for the box angle coordinate (phi). Box angle deltas are divided by this value to account for the inside box weights in the box targets layer. Hence, the value should match the corresponding value that is set in the box targets layer.

Restriction: Only for 'instance_type' 'rectangle2'.

Default: 1.0.

'inside_center_weight':

Inside weight multiplier for the box center coordinates (row and column). Box center deltas are divided by this value to account for the inside box weights in the box targets layer. Hence, the value should match the corresponding value that is set in the box targets layer.

Default: 1.0.

'inside_dimension_weight':

Inside weight multiplier for the box dimensions (width and height). Box dimension deltas are divided by this value to account for the inside box weights in the box targets layer. Hence, the value should match the corresponding value that is set in the box targets layer.

Default: 1.0.
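The three inside weights act as divisors on the corresponding delta components before the deltas are applied; a sketch for 'rectangle2' deltas, where the component layout is an illustrative assumption:

```python
def undo_inside_weights(delta, center_weight, dimension_weight, angle_weight=1.0):
    """Divide predicted 'rectangle2' deltas (drow, dcol, dphi, dlen1, dlen2)
    by the inside weights, mirroring the weighting applied in the box
    targets layer (this component layout is an illustrative assumption)."""
    drow, dcol, dphi, dlen1, dlen2 = delta
    return (drow / center_weight, dcol / center_weight,
            dphi / angle_weight,
            dlen1 / dimension_weight, dlen2 / dimension_weight)
```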

'instance_type':

Instance type of the generated boxes. Possible values:

  • 'rectangle1': axis-aligned rectangles.

  • 'rectangle2': oriented rectangles.

Default: 'rectangle1'.

'is_inference_output':

Determines whether apply_dl_model will include the output of this layer in the dictionary DLResultBatch even without specifying this layer in Outputs ('true') or not ('false').

Default: 'false'.

'max_num_post_nms':

Maximal number of detections after applying NMS. The effective limit is the minimum of this value and the number of inputs in DLLayerClassScore times 'max_num_pre_nms'.

Restriction: Must be an integer larger than zero.

Default: 1000.

'max_num_pre_nms':

Maximum number of detections per DLLayerClassScore input before applying NMS.

Restriction: Must be an integer larger than zero.

Default: 10000.

'max_overlap':

The maximum allowed intersection over union (IoU) between two boxes of the same class. Class-specific NMS can be switched off by setting this value to 1.0.

Default: 0.7.

'max_overlap_class_agnostic':

The maximum allowed IoU between two boxes of any class. Class-agnostic NMS can be switched off by setting this value to 1.0.

Default: 1.0.

'max_side_length':

Boxes with at least one side length larger than this value are discarded. Possible values:

'default' or 0.0:

For 'instance_type' 'rectangle1', the thresholds are set to 1.5 times the image height for the box height and 1.5 times the image width for the box width.

For 'instance_type' 'rectangle2', the threshold is set to two times the maximum of the image width and height for both the box width and height.

Number:

Determines the maximum side length that is allowed.

'none':

No thresholding is used.

Restriction: Needs to be larger than or equal to zero, or 'default' or 'none'.

Default: 'default'.
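The 'default' thresholds described above amount to the following; an illustrative helper, not HALCON API:

```python
def default_max_side_length(instance_type, image_height, image_width):
    """Default side-length thresholds: 1.5 * image height/width per side
    for 'rectangle1', 2 * max(width, height) for both sides of
    'rectangle2'."""
    if instance_type == 'rectangle1':
        return {'height': 1.5 * image_height, 'width': 1.5 * image_width}
    longest = 2.0 * max(image_height, image_width)
    return {'height': longest, 'width': longest}
```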

'min_confidence':

Boxes with a confidence smaller than this value are discarded during inference.

Default: 0.5.

'min_confidence_train':

Boxes with a confidence smaller than this value are discarded during training.

Default: 0.05.

'min_side_length':

Boxes with at least one side length smaller than this value are discarded.

Restriction: Shall be larger than or equal to zero.

Default: 0.0.

'nms_mode':

Determines which IoU is used for the NMS calculation. Possible values:

'exact':

Exact IoU.

'arIoU':

Angle-related IoU. The angle-related IoU is defined for two 'rectangle2' boxes A and B as the cosine of their intermediate angle times the 'rectangle1' IoU of A' and B, where A' is the box A aligned to the box B. See also create_dl_layer_box_targets.

Restriction: Only applicable for 'instance_type' 'rectangle2'.

Default: 'exact'.

'nms_type':

Determines which type of NMS is used. Possible values:

'standard':

A box is discarded if it overlaps with another box with a higher score.

'soft':

Soft NMS is applied. This means that a box is not discarded if it overlaps with another box with a higher score. Instead, its score is reduced depending on the value of the IoU with the higher-scoring box, see 'soft_nms_type' and the references below.

Default: 'standard'.

'soft_nms_type':

Defines how the scores are updated in case of 'nms_type' 'soft'. Possible values:

'linear':

The confidence of a box is scaled by a factor 1 - IoU if its IoU with another higher-scoring box is greater than 'max_overlap' or 'max_overlap_class_agnostic', respectively.

'gaussian':

Iteratively, each box causes a scaling of the box confidence of all other boxes with lower confidence by a factor exp(-IoU²/σ), where σ is given as 'max_overlap' and 'max_overlap_class_agnostic', respectively.

Default: 'linear'.
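The two score-update rules can be sketched as follows, using the standard Soft-NMS formulations; treating sigma as the respective overlap threshold follows the description above:

```python
import math

def soft_nms_rescale(score, overlap, sigma, soft_nms_type):
    """Score update of a lower-scoring box given its IoU ('overlap') with a
    higher-scoring box; sigma plays the role of 'max_overlap' (or
    'max_overlap_class_agnostic')."""
    if soft_nms_type == 'linear':
        # Only boxes overlapping more than the threshold are down-weighted.
        return score * (1.0 - overlap) if overlap > sigma else score
    if soft_nms_type == 'gaussian':
        # Every overlap down-weights the score smoothly.
        return score * math.exp(-(overlap ** 2) / sigma)
    raise ValueError(soft_nms_type)
```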

Certain parameters of layers created using this operator create_dl_layer_box_proposals can be set and retrieved using further operators. The following tables give an overview of which parameters can be set using set_dl_model_layer_param and which ones can be retrieved using get_dl_model_layer_param or get_dl_layer_param. Note that the operators set_dl_model_layer_param and get_dl_model_layer_param require a model created by create_dl_model.

Layer Internal Parameters      set  get
'input_layer' (DLLayerClassScore, DLLayerBoxDelta, DLLayerBox, and/or DLLayerInputImage)  -  x
'name' (LayerName)             x    x
'output_layer' (DLLayerBoxProposals)  -  x
'shape'                        -    x
'type'                         -    x

Generic Layer Parameters       set  get
'apply_box_regression'         x    x
'box_cls_specific'             -    x
'clip_boxes'                   x    x
'has_box_regression_inputs'    -    x
'ignore_direction'             -    x
'input_mode'                   -    x
'inside_angle_weight'          -    x
'inside_center_weight'         -    x
'inside_dimension_weight'      -    x
'is_inference_output'          x    x
'instance_type'                -    x
'max_overlap'                  x    x
'max_overlap_class_agnostic'   x    x
'max_num_post_nms'             -    x
'max_num_pre_nms'              x    x
'max_side_length'              x    x
'min_confidence'               x    x
'min_confidence_train'         x    x
'min_side_length'              x    x
'nms_mode'                     -    x
'nms_type'                     x    x
'num_class_ids_no_orientation' -    x
'num_trainable_params'         -    x
'soft_nms_type'                x    x

Execution Information

  • Multithreading type: reentrant (runs in parallel with non-exclusive operators).
  • Multithreading scope: global (may be called from any thread).
  • Processed without parallelization.

Parameters

DLLayerClassScore (input_control)  dl_layer(-array) → (handle)

Feeding layers with classification scores.

DLLayerBoxDelta (input_control)  dl_layer(-array) → (handle)

Feeding layers with bounding box regression values.

DLLayerBox (input_control)  dl_layer(-array) → (handle)

Feeding layers with anchors or input box proposals.

DLLayerInputImage (input_control)  dl_layer → (handle)

Feeding layer with the network input image.

LayerName (input_control)  string → (string)

Name of the output layer.

GenParamName (input_control)  attribute.name(-array) → (string)

Generic input parameter names.

Default: []

List of values: 'apply_box_regression', 'box_cls_specific', 'clip_boxes', 'ignore_direction', 'input_mode', 'inside_angle_weight', 'inside_center_weight', 'inside_dimension_weight', 'instance_type', 'is_inference_output', 'max_num_post_nms', 'max_num_pre_nms', 'max_overlap', 'max_overlap_class_agnostic', 'max_side_length', 'min_confidence', 'min_confidence_train', 'min_side_length', 'nms_mode', 'nms_type', 'soft_nms_type'

GenParamValue (input_control)  attribute.value(-array) → (string / integer / real)

Generic input parameter values.

Default: []

Suggested values: 'rectangle1', 'rectangle2', 'dense', 'anchors', 'standard', 'soft', 'exact', 'arIoU', 'linear', 'gaussian', 'true', 'false', 'default', 'none', 0.05, 0.5, 1.0, 0.7, 10.0, 5.0, 2000

DLLayerBoxProposals (output_control)  dl_layer → (handle)

BoxProposals layer.

Example (HDevelop)

* Minimal example for the usage of layers
*  - create_dl_layer_box_proposals
*  - create_dl_layer_box_targets
* for creating and training a model to perform object detection.
*
dev_update_off ()
NumClasses := 1
AnchorAspectRatios := 1.0
AnchorNumSubscales := 1
* Define the input image layer.
create_dl_layer_input ('image', [224,224,3], [], [], DLLayerInputImage)
* Define the input ground truth box layers.
create_dl_layer_input ('bbox_row1', [1, 1, 10], ['allow_smaller_tuple'], \
                       ['true'], DLLayerInputRow1)
create_dl_layer_input ('bbox_row2', [1, 1, 10], ['allow_smaller_tuple'], \
                       ['true'], DLLayerInputRow2)
create_dl_layer_input ('bbox_col1', [1, 1, 10], ['allow_smaller_tuple'], \
                       ['true'], DLLayerInputCol1)
create_dl_layer_input ('bbox_col2', [1, 1, 10], ['allow_smaller_tuple'], \
                       ['true'], DLLayerInputCol2)
create_dl_layer_input ('bbox_label_id', [1, 1, 10], \
                       ['allow_smaller_tuple'], ['true'], \
                       DLLayerInputLabelID)
create_dl_layer_class_id_conversion (DLLayerInputLabelID, \
                                     'class_id_conversion', \
                                     'from_class_id', [], [], \
                                     DLLayerClassIdConversion)
* Concatenate all box coordinates.
create_dl_layer_concat ([DLLayerInputRow1, DLLayerInputCol1, \
                        DLLayerInputRow2, DLLayerInputCol2, \
                        DLLayerClassIdConversion], \
                        'gt_boxes', 'height', [], [], DLLayerGTBoxes)
*
* Perform some operations on the input image to extract features.
* -> this serves as our backbone CNN here.
create_dl_layer_convolution (DLLayerInputImage, 'conv1', 3, 1, 2, 8, 1, \
                             'half_kernel_size', 'relu', [], [], \
                             DLLayerConvolution)
create_dl_layer_convolution (DLLayerConvolution, 'conv2', 3, 1, 2, 8, 1, \
                             'half_kernel_size', 'relu', [], [], \
                             DLLayerConvolution)
create_dl_layer_pooling (DLLayerConvolution, 'pool', 2, 2, 'none', \
                         'maximum', [], [], DLLayerPooling)
*
* Create the anchor boxes -> adapt the scale to fit the object size.
create_dl_layer_anchors (DLLayerPooling, DLLayerInputImage, 'anchor', \
                         AnchorAspectRatios, AnchorNumSubscales, [], \
                         ['scale'], [8], DLLayerAnchors)
*
* Create predictions for the classification and regression of anchors.
* We set the bias such that background is a lot more likely than foreground.
PriorProb := 0.05
BiasInit := -log((1.0 - PriorProb) / PriorProb)
create_dl_layer_convolution (DLLayerPooling, 'cls_logits', 3, 1, 1, \
                             NumClasses, 1, 'half_kernel_size', 'none', \
                             ['bias_filler_const_val'], \
                             [BiasInit], DLLayerClsLogits)
create_dl_layer_convolution (DLLayerPooling, 'box_delta_predictions', 5, 1, \
                             1, 4*|AnchorAspectRatios|*|AnchorNumSubscales|, \
                             1, 'half_kernel_size', 'none', [], [], \
                             DLLayerBoxDeltaPredictions)
*
* Generate the class and box regression targets for the anchors
* according to the ground truth boxes.
* -> we use inside-weights here, they also need to be set in the
*    corresponding box proposals layer later.
Targets := ['cls_target', 'cls_weight', 'box_target', 'box_weight', \
            'num_fg_instances']
create_dl_layer_box_targets (DLLayerAnchors, DLLayerGTBoxes, [], Targets, \
                             'anchors', Targets, NumClasses, \
                             ['inside_center_weight', \
                             'inside_dimension_weight'], [10.0, 5.0], \
                             DLLayerClassTarget, DLLayerClassWeight, \
                             DLLayerBoxTarget, DLLayerBoxWeight, \
                             DLLayerNumFgInstances, _, _)
*
* We use a focal loss for the classification predictions.
create_dl_layer_loss_focal (DLLayerClsLogits, DLLayerClassTarget, \
                            DLLayerClassWeight, DLLayerNumFgInstances, \
                            'loss_cls', 1.0, 2.0, 0.25, \
                            'sigmoid_focal_binary', [], [], DLLayerLossCls)
* We use an L1-loss for the box deltas.
create_dl_layer_loss_huber (DLLayerBoxDeltaPredictions, DLLayerBoxTarget, \
  DLLayerBoxWeight, [], 'loss_box', 1.0, 0.0, [], [], DLLayerLossBox)
*
* Apply sigmoid to class-predictions and compute box outputs.
* --> alternatively, we could directly apply the prediction and set the
*     focal loss mode to 'focal_binary' instead of 'sigmoid_focal_binary'.
create_dl_layer_activation (DLLayerClsLogits, 'cls_probs', 'sigmoid', \
                            [], [], DLLayerClsProbs)
create_dl_layer_box_proposals (DLLayerClsProbs, DLLayerBoxDeltaPredictions, \
                               DLLayerAnchors, DLLayerInputImage, \
                               'anchors', ['inside_center_weight', \
                               'inside_dimension_weight'], [10.0, 5.0], \
                               DLLayerBoxProposals)
*
* Create the model.
OutputLayers := [DLLayerLossCls, DLLayerLossBox, DLLayerBoxProposals]
create_dl_model (OutputLayers, DLModelHandle)
*
* Prepare the model for using it as a detection model.
set_dl_model_param (DLModelHandle, 'type', 'detection')
ClassIDs := [2]
set_dl_model_param (DLModelHandle, 'class_ids', ClassIDs)
set_dl_model_param (DLModelHandle, 'max_overlap', 0.1)
*
* Create a sample.
create_dict (DLSample)
gen_image_const (Image, 'real', 224, 224)
gen_circle (Circle, [50., 100.], [50., 150.], [20., 20.])
overpaint_region (Image, Circle, [255], 'fill')
compose3 (Image, Image, Image, Image)
set_dict_object (Image, DLSample, 'image')
smallest_rectangle1 (Circle, Row1, Col1, Row2, Col2)
set_dict_tuple (DLSample, 'bbox_row1', Row1)
set_dict_tuple (DLSample, 'bbox_row2', Row2)
set_dict_tuple (DLSample, 'bbox_col1', Col1)
set_dict_tuple (DLSample, 'bbox_col2', Col2)
set_dict_tuple (DLSample, 'bbox_label_id', [2,2])
*
* Train the model for some iterations (heavy overfitting).
set_dl_model_param (DLModelHandle, 'learning_rate', 0.0001)
Iteration := 0
TotalLoss := 1e6
LossCls := 1e6
LossBox := 1e6
dev_inspect_ctrl ([Iteration, TotalLoss, LossCls, LossBox])
while (TotalLoss > 0.2 and Iteration < 3000)
  train_dl_model_batch (DLModelHandle, DLSample, DLResult)
  get_dict_tuple (DLResult, 'loss_cls', LossCls)
  get_dict_tuple (DLResult, 'loss_box', LossBox)
  get_dict_tuple (DLResult, 'total_loss', TotalLoss)
  Iteration := Iteration + 1
endwhile
dev_close_inspect_ctrl ([Iteration, TotalLoss, LossCls, LossBox])
*
* Apply the detection model.
apply_dl_model (DLModelHandle, DLSample, [], DLResult)
*
* Display ground truth and result.
create_dict (DLDatasetInfo)
set_dict_tuple (DLDatasetInfo, 'class_ids', ClassIDs)
set_dict_tuple (DLDatasetInfo, 'class_names', ['circle'])
create_dict (WindowHandleDict)
dev_display_dl_data (DLSample, DLResult, DLDatasetInfo, \
                    ['image', 'bbox_ground_truth', 'bbox_result'], [], \
                    WindowHandleDict)
stop ()
dev_close_window_dict (WindowHandleDict)
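When 'apply_box_regression' is enabled, the layer decodes the predicted box deltas onto the anchor boxes. As a conceptual sketch, assuming the common Faster R-CNN parameterization (an assumption for illustration; the operator's exact encoding is not documented here), decoding a single 'rectangle1' anchor could look like this in Python:

```python
import math

def decode_box(anchor, delta):
    """Decode (dr, dc, dh, dw) deltas onto an anchor (row1, col1, row2, col2).

    Center offsets are scaled by the anchor extents, the extents themselves
    are rescaled exponentially (standard Faster R-CNN parameterization).
    """
    r1, c1, r2, c2 = anchor
    dr, dc, dh, dw = delta
    h, w = r2 - r1, c2 - c1
    cr, cc = r1 + 0.5 * h, c1 + 0.5 * w        # anchor center
    cr, cc = cr + dr * h, cc + dc * w          # shift center by scaled deltas
    h, w = h * math.exp(dh), w * math.exp(dw)  # rescale extents
    return (cr - 0.5 * h, cc - 0.5 * w, cr + 0.5 * h, cc + 0.5 * w)
```

A zero delta reproduces the anchor unchanged; a positive `dr` shifts the box down by that fraction of the anchor height. Clipping the decoded coordinates to the image bounds corresponds to the 'clip_boxes' parameter.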

See also

create_dl_layer_box_targets, create_dl_layer_roi_pooling, create_dl_layer_anchors

References

Tsung-Yi Lin, Piotr Dollár, Ross B. Girshick, Kaiming He, Bharath Hariharan, and Serge J. Belongie, "Feature Pyramid Networks for Object Detection," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 936--944, doi: 10.1109/CVPR.2017.106.
Navaneeth Bodla, Bharat Singh, Rama Chellappa, and Larry S. Davis, "Soft-NMS - Improving Object Detection with One Line of Code," 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017, pp. 5562--5570, doi: 10.1109/ICCV.2017.593.

Module

Deep Learning Professional