HALCON Operator reference

create_dl_layer_loss_ctc (Operator)

create_dl_layer_loss_ctc - Create a CTC loss layer.

Signature

create_dl_layer_loss_ctc( : : DLLayerInput, DLLayerInputLengths, DLLayerTarget, DLLayerTargetLengths, LayerName, GenParamName, GenParamValue : DLLayerLossCTC)

Description

The operator create_dl_layer_loss_ctc creates a Connectionist Temporal Classification (CTC) loss layer whose handle is returned in DLLayerLossCTC. See the reference cited below for information about the CTC loss.

This loss layer makes it possible to train sequence-to-sequence (Seq2Seq) models, for example a model that reads text in an image. To do so, the layer compares two sequences: the network prediction DLLayerInput with sequence length DLLayerInputLengths, and the given target DLLayerTarget with sequence length DLLayerTargetLengths.

The following variables are important to understand the input shapes:

T: Maximum input sequence length (i.e., width of DLLayerInput)
S: Maximum output sequence length (i.e., width of DLLayerTarget)
C: Number of classes including 0 as the blank class ID (i.e., depth of DLLayerInput)

This layer expects multiple layers as input:

DLLayerInput: Specifies the network prediction.

Shape: [T,1,C]

DLLayerInputLengths: Specifies the input sequence length of each item in the batch.

Shape: [1,1,1]

DLLayerTarget: Specifies the target sequences.

Shape: [S,1,1]

DLLayerTargetLengths: Specifies the target sequence length of each item in the batch.

Shape: [1,1,1]

The parameter LayerName sets an individual layer name. Note that if a model is created using create_dl_model, each layer of the created network must have a unique name.

The CTC loss is typically applied in a CNN as follows. The input sequence is expected to be encoded in a CNN layer with the output shape [width: T, height: 1, depth: C]. Typically, the end of a large fully convolutional classifier is pooled in height down to 1 with an average pooling layer. It is important that the last layer is wide enough to hold enough information. To obtain the sequence prediction in the output depth, a 1x1 convolutional layer with C kernels is added after the pooling. In this use case, this convolutional layer is passed to the CTC loss as the input layer DLLayerInput. The width of the input layer determines the maximum output sequence length of the model.

The CTC loss can be applied to a batch of input items with differing input and target sequence lengths. T and S are the maximum lengths. The individual length of each item in a batch needs to be specified in DLLayerInputLengths and DLLayerTargetLengths. Note that each individual loss within the batch is first normalized by dividing it by its corresponding target length. The final output loss is then computed as the mean of these normalized values across the entire batch.
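The normalization described above can be illustrated with a small plain-Python sketch. It does not call HALCON and is not the operator's implementation: the function names (`ctc_loss`, `ctc_batch_loss`) are hypothetical, and the per-item loss is computed with the textbook CTC forward recursion from the cited reference, assuming class 0 is the blank and that the softmax is applied inside the loss.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over the class axis."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def logaddexp(a, b):
    """log(exp(a) + exp(b)) that tolerates -inf operands."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def ctc_loss(logits, input_length, target, target_length, blank=0):
    """Negative log-likelihood of one target given logits[t][c] (hypothetical helper)."""
    log_probs = [log_softmax(row) for row in logits[:input_length]]
    # Extended label sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for label in target[:target_length]:
        ext += [label, blank]
    n = len(ext)
    alpha = [-math.inf] * n
    alpha[0] = log_probs[0][ext[0]]
    if n > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, input_length):
        new = [-math.inf] * n
        for s in range(n):
            a = alpha[s]
            if s >= 1:
                a = logaddexp(a, alpha[s - 1])
            # Skipping a blank is only allowed between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = logaddexp(a, alpha[s - 2])
            if a != -math.inf:
                new[s] = a + log_probs[t][ext[s]]
        alpha = new
    total = alpha[n - 1]
    if n > 1:
        total = logaddexp(total, alpha[n - 2])
    return -total

def ctc_batch_loss(batch):
    """Divide each item's loss by its target length, then average over the batch."""
    normalized = [
        ctc_loss(logits, in_len, target, tgt_len) / tgt_len
        for logits, in_len, target, tgt_len in batch
    ]
    return sum(normalized) / len(normalized)

# Two items padded to T = 3 and S = 2, with C = 3 and uniform predictions.
uniform = [[0.0, 0.0, 0.0]] * 3
batch = [
    (uniform, 2, [1, 0], 1),  # input length 2, target [1] padded to S
    (uniform, 2, [1, 2], 2),  # input length 2, target [1, 2]
]
batch_loss = ctc_batch_loss(batch)
```

In this example the per-item losses are ln 3 and ln 9. After division by the target lengths 1 and 2, both normalized values equal ln 3, so the batch mean is ln 3 (about 1.0986).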

Restrictions

A model containing this layer cannot be trained on a CPU.
A model containing this layer cannot be trained with a 'batch_size_multiplier' != 1.0.
The input layer DLLayerInput must not be a softmax layer. The softmax calculation is done internally in this layer. For inference, an extra softmax layer should be connected to DLLayerInput (see create_dl_layer_softmax).
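The inference path mentioned in the last restriction (softmax over the class depth, followed by an argmax) can be sketched in plain Python as follows. This is only an illustration of the arithmetic, not HALCON code; `softmax` and `predict_classes` are hypothetical helper names, and the logit values are made up.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one time step's class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_classes(logit_sequence):
    """Per time step: softmax over the class axis, then argmax."""
    probs = [softmax(step) for step in logit_sequence]
    classes = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return probs, classes

# Two time steps (T = 2) with three classes (C = 3); values are illustrative.
logit_sequence = [[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]]
probs, classes = predict_classes(logit_sequence)
```

Because softmax is monotonic, the argmax is the same with or without it; the extra layer matters when the per-class probabilities themselves are needed at inference time.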

The following generic parameters GenParamName and the corresponding values GenParamValue are supported:

'is_inference_output':

Determines whether apply_dl_model includes the output of this layer in the dictionary DLResultBatch even without specifying this layer in Outputs ('true') or not ('false').

Default: 'false'

'zero_infinity':

Determines whether infinite loss values are set to zero in order to stabilize the training. If set to 'true', the loss values and gradients of samples that would yield an infinite loss are set to zero.

List of values: 'true', 'false'

Default: 'false'
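One common reason for an infinite CTC loss is a target that cannot be aligned to the input at all: every label needs at least one time step, and two identical neighboring labels need a blank between them. The plain-Python sketch below illustrates that condition and the effect of zeroing such a loss. It is an assumption-laden illustration, not the layer's implementation; `min_input_length` and `apply_zero_infinity` are hypothetical names, and gradients are not modeled.

```python
import math

def min_input_length(target):
    """Smallest input length that still admits a valid CTC alignment."""
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats

def sample_is_alignable(input_length, target):
    """False means the CTC loss of this sample would be infinite."""
    return input_length >= min_input_length(target)

def apply_zero_infinity(loss, zero_infinity):
    """Replace an infinite per-sample loss by zero if requested."""
    if zero_infinity and math.isinf(loss):
        return 0.0
    return loss

# Per-sample losses; the second one stands for a sample that cannot be aligned.
losses = [0.7, math.inf, 1.2]
stabilized = [apply_zero_infinity(v, True) for v in losses]
```

For instance, the target [1,1,2] needs at least four time steps, so with an input length of 3 its loss would be infinite unless it is zeroed.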

Certain parameters of layers created using the operator create_dl_layer_loss_ctc can be set and retrieved using further operators. The following tables give an overview of which parameters can be set using set_dl_model_layer_param and which ones can be retrieved using get_dl_model_layer_param or get_dl_layer_param. Note that the operators set_dl_model_layer_param and get_dl_model_layer_param require a model created by create_dl_model.

Layer Parameters	`set`	`get`
'input_layer' (`DLLayerInput`, `DLLayerInputLengths`, `DLLayerTarget`, and/or `DLLayerTargetLengths`)		`x`
'name' (`LayerName`)	`x`	`x`
'output_layer' (`DLLayerLossCTC`)		`x`
'shape'		`x`
'type'		`x`
'zero_infinity'	`x`	`x`

Generic Layer Parameters	`set`	`get`
'is_inference_output'	`x`	`x`
'num_trainable_params'		`x`

Execution Information

Multithreading type: reentrant (runs in parallel with non-exclusive operators).
Multithreading scope: global (may be called from any thread).
Processed without parallelization.

Parameters

DLLayerInput (input_control) dl_layer → (handle)

Input layer with network predictions.

DLLayerInputLengths (input_control) dl_layer → (handle)

Input layer which specifies the input sequence length of each item in the batch.

DLLayerTarget (input_control) dl_layer → (handle)

Input layer which specifies the target sequences. If the input dimensions of the CNN are changed, the width of this layer is automatically resized to the same width as the DLLayerInput layer.

DLLayerTargetLengths (input_control) dl_layer → (handle)

Input layer which specifies the target sequence length of each item in the batch.

LayerName (input_control) string → (string)

Name of the output layer.

GenParamName (input_control) attribute.name(-array) → (string)

Generic input parameter names.

Default: []

List of values: 'is_inference_output', 'zero_infinity'

GenParamValue (input_control) attribute.value(-array) → (string / integer / real)

Generic input parameter values.

Default: []

Suggested values: 'true', 'false'

DLLayerLossCTC (output_control) dl_layer → (handle)

CTC loss layer.

Example (HDevelop)

* Create a simple Seq2Seq model which overfits to a single output sequence.

* Input sequence length
T := 6
* Number of classes including blank (blank is always class_id 0)
C := 3
* Batch Size
N := 1
* Maximum length of target sequences
S := 3

* Model creation
create_dl_layer_input ('input', [T,1,1], [], [], Input)
create_dl_layer_dense (Input, 'dense', T*C, [], [], DLLayerDense)
create_dl_layer_reshape (DLLayerDense, 'dense_reshape', [T,1,C], [], [],\
                         ConvFinal)

* Training part
create_dl_layer_input ('ctc_input_lengths', [1,1,1], [], [],\
                       DLLayerInputLengths)
create_dl_layer_input ('ctc_target', [S,1,1], ['allow_smaller_tuple'], ['true'], DLLayerTarget)
create_dl_layer_input ('ctc_target_lengths', [1,1,1], [], [],\
                       DLLayerTargetLengths)
* Create the loss layer
create_dl_layer_loss_ctc (ConvFinal, DLLayerInputLengths, DLLayerTarget,\
                          DLLayerTargetLengths, 'ctc_loss', [], [],\
                          DLLayerLossCTC)

* Inference part
create_dl_layer_softmax (ConvFinal, 'softmax', [], [], DLLayerSoftMax)
create_dl_layer_depth_max (DLLayerSoftMax, 'prediction', 'argmax', [], [],\
                           DLLayerDepthMaxArg, _)

* Setting a seed because the weights of the network are randomly initialized
set_system ('seed_rand', 35)

create_dl_model ([DLLayerLossCTC,DLLayerDepthMaxArg], DLModel)

set_dl_model_param (DLModel, 'batch_size', N)
set_dl_model_param (DLModel, 'runtime', 'gpu')
set_dl_model_param (DLModel, 'learning_rate', 1)

* Create input sample for training
InputSequence := [0,1,2,3,4,5]
TargetSequence := [1,2,1]
create_dict (InputSample)
InputSample.input := InputSequence
InputSample.ctc_input_lengths := |InputSequence|
InputSample.ctc_target := TargetSequence
InputSample.ctc_target_lengths := |TargetSequence|

PredictedSequence := []
dev_inspect_ctrl ([InputSequence, TargetSequence, CTCLoss, PredictedValues,\
                  PredictedSequence])
Eps := 0.01
MaxIterations := 15
for I := 0 to MaxIterations by 1
    apply_dl_model (DLModel, InputSample, ['prediction','softmax'], \
                    DLResultBatch)
    get_grayval (DLResultBatch.prediction, rep(0,T), [0:T-1], PredictedValues)

    train_dl_model_batch (DLModel, InputSample, DLTrainResult)
    CTCLoss := DLTrainResult.ctc_loss
    if (CTCLoss < Eps)
        break
    endif
    stop()
endfor

* Greedy decoding of predicted sequence
PredictedSequence := uniq(int(PredictedValues))
PredictedSequence := select_mask(PredictedSequence, PredictedSequence [#] 0)
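
The greedy decoding step at the end of the example (collapse successive identical predictions, then drop the blank class 0) can be restated as a plain-Python sketch for readers who want to check the logic outside HDevelop. The function name `greedy_ctc_decode` is hypothetical and the input values are illustrative.

```python
def greedy_ctc_decode(predicted_classes, blank=0):
    """Collapse runs of identical class IDs, then remove the blank class."""
    collapsed = []
    for class_id in predicted_classes:
        if not collapsed or collapsed[-1] != class_id:
            collapsed.append(class_id)
    return [class_id for class_id in collapsed if class_id != blank]

# Per-time-step argmax predictions for T = 6 (illustrative values).
decoded = greedy_ctc_decode([1, 1, 0, 2, 2, 1])
```

Note that the order matters: repeats are collapsed before blanks are removed, so a blank between two identical labels keeps them as two separate output symbols.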

References

Graves, Alex, et al. "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks." Proceedings of the 23rd International Conference on Machine Learning, 2006.

Module

Deep Learning Professional
