Operator Reference
Semantic Segmentation and Edge Extraction
This chapter explains how to use semantic segmentation based on deep learning, both for the training and inference phases.
With semantic segmentation we assign each pixel of the input image to a class using a deep learning (DL) network.
The result of semantic segmentation is an output image in which the pixel value signifies the assigned class of the corresponding pixel in the input image. Thus, in HALCON the output image is of the same size as the input image. For general DL networks, the deeper feature maps, representing more complex features, are usually smaller than the input image (see the section “The Network and the Training Process” in Deep Learning). To obtain an output of the same size as the input, HALCON uses segmentation networks with two components: an encoder and a decoder. The encoder determines features of the input image, as done, e.g., for deep-learning-based classification. As this information is 'encoded' in a compressed format, the decoder is needed to reconstruct the desired outcome from it, which, in this case, is the assignment of each pixel to a class. Note that, since individual pixels are classified, overlapping instances of the same class are not distinguished from each other.
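To illustrate how a per-pixel class assignment yields an output image of the same size as the input, here is a minimal Python sketch. It assumes that the decoder delivers, for every pixel, one score per class (for example softmax probabilities), and that the confidence of a pixel is its highest score. The function name to_segmentation and the nested-list image representation are hypothetical and not part of HALCON.

```python
from typing import List, Tuple

# Hypothetical representation: scores[row][col] is a list with one score per class
# (e.g., softmax probabilities); class_ids[k] is the class ID of score index k.
Scores = List[List[List[float]]]


def to_segmentation(scores: Scores, class_ids: List[int]) -> Tuple[List[List[int]], List[List[float]]]:
    """Assign each pixel the class with the highest score.

    Returns (segmentation_image, segmentation_confidence), both with the same
    height and width as the input score map.
    """
    seg_image: List[List[int]] = []
    confidence: List[List[float]] = []
    for row in scores:
        seg_row: List[int] = []
        conf_row: List[float] = []
        for pixel_scores in row:
            # Index of the best-scoring class for this pixel.
            best = max(range(len(pixel_scores)), key=lambda k: pixel_scores[k])
            seg_row.append(class_ids[best])
            conf_row.append(pixel_scores[best])
        seg_image.append(seg_row)
        confidence.append(conf_row)
    return seg_image, confidence


if __name__ == "__main__":
    # A 1x2 image with two classes (IDs 0 = 'background', 1 = 'defect').
    demo = [[[0.9, 0.1], [0.3, 0.7]]]
    seg, conf = to_segmentation(demo, class_ids=[0, 1])
    print(seg)   # [[0, 1]]
    print(conf)  # [[0.9, 0.7]]
```

Since every pixel of the score map produces exactly one output pixel, the result automatically has the spatial size of the decoder output, which is why the decoder must restore the input resolution.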
Edge extraction is a special case of semantic segmentation, where the
model is trained to distinguish two classes: 'edge' and 'background'.
For more information, see “Solution Guide I - Basics”.
Semantic segmentation with deep learning is implemented within the more
general deep learning model of HALCON.
For more information about the latter, see the chapter
Deep Learning / Model.
For the specific system requirements for applying deep learning,
please refer to the HALCON “Installation Guide”.
The following sections introduce the general workflow needed for semantic segmentation, provide information related to the involved data and parameters, and explain the evaluation measures.
General Workflow
In this section, we describe the general workflow for a semantic
segmentation task based on deep learning.
It is subdivided into four parts:
preprocessing of the data, training of the model,
evaluation of the trained model, and inference on new images.
We assume that your dataset is already labeled; see also the section
“Data” below.
Have a look at the HDevelop example series
segment_pill_defects_deep_learning
for an application.
The example segment_edges_deep_learning_with_retraining
shows the complete workflow for an edge extraction application.
- Preprocess the data
  This part is about how to preprocess your data. The single steps are also shown in the HDevelop example segment_pill_defects_deep_learning_1_preprocess.hdev.
  - The information about what is to be found in which image of your training dataset needs to be transferred. This is done by the procedure read_dl_dataset_segmentation. Thereby a dictionary DLDataset is created, which serves as a database and stores all necessary information about your data. For more information about the data and the way it is transferred, see the section “Data” below and the chapter Deep Learning / Model.
  - Split the dataset represented by the dictionary DLDataset. This can be done using the procedure split_dl_dataset. The resulting split is saved under the key split in each sample entry of DLDataset.
  - Now you can preprocess your dataset. For this, you can use the procedure preprocess_dl_dataset. This procedure also offers guidance on how to implement a customized preprocessing procedure.
    To use this procedure, specify the preprocessing parameters, e.g., the image size. For the image size, you should select the smallest size at which the regions to segment are still well recognizable. Store all parameters with their values in a dictionary DLPreprocessParam, for which you can use the procedure create_dl_preprocess_param. We recommend saving this dictionary DLPreprocessParam in order to have access to the preprocessing parameter values later during the inference phase.
    During the preprocessing of your dataset, preprocess_dl_dataset also generates the images weight_image for the training dataset. They assign each class the weight ('class weights') its pixels get during training (see the section “Model Parameters and Hyperparameters” below).
- Training of the model
  This part is about how to train a DL semantic segmentation model. The single steps are also shown in the HDevelop example segment_pill_defects_deep_learning_2_train.hdev.
  - A network has to be read using the operator read_dl_model.
  - The model parameters need to be set via the operator set_dl_model_param. Such parameters are, e.g., 'image_dimensions' and 'class_ids'; see the documentation of get_dl_model_param. You can always retrieve the current parameter values using the operator get_dl_model_param.
  - Set the training parameters and store them in the dictionary TrainParam. These parameters include:
    - the hyperparameters; for an overview, see the section “Model Parameters and Hyperparameters” below and the chapter Deep Learning.
    - parameters for possible data augmentation (optional).
    - parameters for the evaluation during training.
    - parameters for the visualization of training results.
    - parameters for serialization.
    This can be done using the procedure create_dl_train_param.
  - Train the model. This can be done using the procedure train_dl_model. The procedure expects:
    - the model handle DLModelHandle
    - the dictionary with the data information DLDataset
    - the dictionary with the training parameters TrainParam
    - the information over how many epochs the training shall run.
    In case the procedure train_dl_model is used, the total loss as well as optional evaluation measures are visualized.
- Evaluation of the trained model
  In this part we evaluate the semantic segmentation model. The single steps are also shown in the HDevelop example segment_pill_defects_deep_learning_3_evaluate.hdev.
  - Set the model parameters which may influence the evaluation, e.g., 'batch_size', using the operator set_dl_model_param.
  - The evaluation can conveniently be done using the procedure evaluate_dl_model.
  - The dictionary EvaluationResult holds the requested evaluation measures. You can visualize your evaluation results using the procedure dev_display_segmentation_evaluation.
- Inference on new images
  This part covers the application of a DL semantic segmentation model. The single steps are also shown in the HDevelop example segment_pill_defects_deep_learning_4_infer.hdev.
  - Set the parameters, e.g., 'batch_size', using the operator set_dl_model_param.
  - Generate a data dictionary DLSample for each image. This can be done using the procedure gen_dl_samples_from_images.
  - Preprocess the images as done for the training. We recommend doing this using the procedure preprocess_dl_samples. If you saved the dictionary DLPreprocessParam during the preprocessing step, you can directly use it as input to specify all parameter values.
  - Apply the model using the operator apply_dl_model.
  - Retrieve the results from the dictionary 'DLResultBatch'. The regions of the particular classes can be selected on the segmentation image using, e.g., the operator threshold.
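After the model has been applied, the class regions are read off the segmentation image. The following Python sketch mimics that last step outside of HALCON: select_class_region collects the pixels whose value equals a given class ID (the idea behind applying a gray-value threshold with the class ID as both lower and upper bound), and class_areas counts the pixels per class. Both functions and the nested-list image are hypothetical illustrations, not HALCON API.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, column)


def select_class_region(seg_image: List[List[int]], class_id: int) -> List[Pixel]:
    """Return all pixels whose value in the segmentation image equals class_id.

    This mimics a gray-value threshold whose lower and upper bounds are both
    class_id, applied to a segmentation image.
    """
    return [
        (r, c)
        for r, row in enumerate(seg_image)
        for c, value in enumerate(row)
        if value == class_id
    ]


def class_areas(seg_image: List[List[int]]) -> Dict[int, int]:
    """Count the pixels assigned to each class (the area of each class region)."""
    areas: Dict[int, int] = {}
    for row in seg_image:
        for value in row:
            areas[value] = areas.get(value, 0) + 1
    return areas


if __name__ == "__main__":
    seg = [
        [0, 0, 1],
        [0, 2, 1],
    ]
    print(select_class_region(seg, 1))  # [(0, 2), (1, 2)]
    print(class_areas(seg))             # {0: 3, 1: 2, 2: 1}
```

Because every pixel carries exactly one class ID, the regions selected for different classes never overlap, which matches the remark that overlapping instances of the same class are not separated.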
Data
We distinguish between data used for training and evaluation, and data
for inference.
The latter ones consist of bare images.
The first ones consist of images with their information and ground truth
annotations. You provide this information defining for each pixel,
to which class it belongs (over the segmentation_image
, see
below for further explanations).
As basic concept, the model handles data over dictionaries, meaning it
receives the input data over a dictionary DLSample
and
returns a dictionary DLResult
and DLTrainResult
,
respectively. More information on the
data handling can be found in the chapter Deep Learning / Model.
- Data for training and evaluation
  The training data is used to train a network for your specific task. The dataset consists of images and corresponding information. They have to be provided in a way the model can process them. Concerning the image requirements, find more information in the section “Images” below. The information about the images and their ground truth annotations is provided via the dictionary DLDataset and, for every sample, the respective segmentation_image, which defines the class of every pixel.
  - Classes
    The different classes are the sets or categories differentiated by the network. They are set in the dictionary DLDataset and are passed to the model via the operator set_dl_model_param.
    In semantic segmentation, we call your attention to two special cases: the class 'background' and classes declared as 'ignore':
    - 'background' class: The network treats the background class like any other class. It is also not necessary to have a background class. But if your dataset contains different classes you are not interested in, although they have to be learned by the network, you can set them all as 'background'. As a result, the class background will be more diverse. See the procedure preprocess_dl_samples for more information.
    - 'ignore' classes: It is possible to declare one or multiple classes as 'ignore'. Pixels assigned to an 'ignore' class are ignored by the loss as well as for all measures and evaluations. Please see the section “The Network and the Training Process” in the chapter Deep Learning for more information about the loss. The network does not classify any pixel into a class declared as 'ignore'. Instead, the pixels labeled as belonging to such a class are classified by the network like every other pixel into a non-'ignore' class. In the example given in the image below, this means the network will also classify the pixels of the class 'border', but it will not classify any pixel into the class 'border'. You can declare a class as 'ignore' using the parameter 'ignore_class_ids' of set_dl_model_param.
    In edge extraction only two classes are distinguished: 'edge' and 'background'. The class 'edge' is labeled just like a normal class. Thus, only one class is labeled, and this class is called 'edge'.
  - DLDataset
    This dictionary serves as a database, i.e., it stores all information about your data that is necessary for the network, e.g., the names and paths of the images, the classes, and so on. Please see the documentation of Deep Learning / Model for the general concept and key entries. The keys only applicable for semantic segmentation concern the segmentation_image (see the entry below). Via the keys segmentation_dir and segmentation_file_name you provide the information on how these images are named and where they are saved.
  - segmentation_image
    In order for the network to learn what the members of the different classes look like, you tell it, for each pixel of every image in the training dataset, to which class the pixel belongs. This is done by storing, for every pixel of the input image, the class encoded as pixel value in the corresponding segmentation_image. These annotations are the ground truth annotations.
  You need enough training data to split it into three subsets: one used for training, one for validation, and one for testing the network. These subsets are preferably independent and identically distributed (see the section “Data” in the chapter Deep Learning). For the splitting you can use the procedure split_dl_dataset.
- Images
  Regardless of the application, the network poses requirements on the images regarding the image dimensions, the gray value range, and the type. The specific values depend on the network itself; see the documentation of read_dl_model for the specific values of different networks. For a loaded network they can be queried with get_dl_model_param. In order to fulfill these requirements, you may have to preprocess your images. Standard preprocessing of an entire sample, and therewith also of the image, is implemented in preprocess_dl_samples. This procedure also offers guidance on how to implement a customized preprocessing procedure.
- Network output
  The network output depends on the task:
  - training
    As output, the operator train_dl_model_batch will return a dictionary DLTrainResult with the current value of the total loss as well as values for all other losses included in your model.
  - inference and evaluation
    As output, the network will return a dictionary DLResult for every sample. For semantic segmentation, this dictionary will include for each input image the handles of the following two images:
    - segmentation_image: An image where each pixel has a value corresponding to the class its corresponding pixel has been assigned to (see the illustration below).
    - segmentation_confidence: An image where each pixel has the confidence value of the classification of the corresponding pixel in the input image (see the illustration below).
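The three-way split described under “Data for training and evaluation” can be sketched as follows. This Python function is a hypothetical stand-in, not the procedure split_dl_dataset: it assumes percentages for the training and validation shares, gives the remainder to the test set, and returns one split label per sample, in the spirit of the split key stored in each sample entry of DLDataset. The labels 'train', 'validation' and 'test' are chosen here for illustration, and the actual procedure may split differently (for example, per class).

```python
import random
from typing import List


def split_samples(num_samples: int, train_percent: float, validation_percent: float,
                  seed: int = 42) -> List[str]:
    """Return one split label per sample index: 'train', 'validation' or 'test'.

    The remaining share (100 - train_percent - validation_percent) goes to 'test'.
    """
    if train_percent < 0 or validation_percent < 0 or train_percent + validation_percent > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # reproducible random order
    n_train = int(num_samples * train_percent / 100)
    n_val = int(num_samples * validation_percent / 100)
    split = [""] * num_samples
    for position, sample_index in enumerate(indices):
        if position < n_train:
            split[sample_index] = "train"
        elif position < n_train + n_val:
            split[sample_index] = "validation"
        else:
            split[sample_index] = "test"
    return split


if __name__ == "__main__":
    labels = split_samples(10, train_percent=70, validation_percent=15)
    print(labels.count("train"), labels.count("validation"), labels.count("test"))  # 7 1 2
```

A random shuffle before cutting the index list is what keeps the subsets approximately identically distributed; a fixed seed makes the split reproducible across runs.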
Model Parameters and Hyperparameters
In addition to the general DL hyperparameters explained in Deep Learning, there is a further hyperparameter relevant for semantic segmentation:
- 'class weights', see the explanations below.
For a semantic segmentation model, the model parameters as well as the
hyperparameters (with the exception of 'class weights') are set using
set_dl_model_param.
The model parameters are explained in more detail in
get_dl_model_param.
Note that, due to the large memory usage, typically only small batch
sizes are possible for training. As a consequence, training is rather
slow, and we advise using a higher momentum than, e.g., for classification.
The HDevelop example
segment_pill_defects_deep_learning_2_train.hdev
provides
good initial parameter values for the training of a segmentation network
in HALCON.
- 'class weights'
  With the hyperparameter 'class weights' you can assign each class the weight its pixels get during training. By giving the unique classes different weights, it is possible to force the network to learn the classes with different importance. This is useful in cases where a class dominates the images, e.g., in defect detection, where the defects take up only a small fraction of an image. In such a case, a network classifying every pixel as background (thus, 'not defect') would generally achieve good loss results. Assigning different weights to the distinct classes helps to re-balance the distribution. In short, you can focus the loss on those pixels you determine to be important.
  The network obtains these weights via weight_image, an image which is created for every training sample. In weight_image, every pixel value corresponds to the weight the corresponding pixel of the input image gets during training. You can create these images with the help of the following two procedures:
  - calculate_dl_segmentation_class_weights helps you to create the class weights. The procedure uses the concept of inverse class frequency weights.
  - gen_dl_segmentation_weight_images uses the class weights and generates the weight_image.
  This step has to be done before the training. Usually it is done during the preprocessing, as part of the procedure preprocess_dl_dataset. Note that this hyperparameter is referred to as class_weights or ClassWeights within procedures. An illustration of what such an image with different weights looks like is shown in the figure below.
  Note that if you give a specific part of the image the weight 0.0, these pixels do not contribute to the loss (see the section “The Network and the Training Process” in Deep Learning for more information about the loss).
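The two procedures above can be illustrated with a small Python sketch. It assumes the inverse class frequency formulation w_c = N / (K * n_c), where N is the number of counted pixels, K the number of non-ignored classes present, and n_c the number of pixels of class c. The optional cap max_weight, all function names, and the weighted-mean normalization of the loss are assumptions and need not match calculate_dl_segmentation_class_weights, gen_dl_segmentation_weight_images, or the loss used in HALCON. Pixels of 'ignore' classes receive the weight 0.0 and therefore do not contribute.

```python
from typing import Dict, Iterable, List, Optional

Image = List[List[int]]


def inverse_frequency_class_weights(seg_images: Iterable[Image],
                                    ignore_class_ids: Iterable[int] = (),
                                    max_weight: Optional[float] = None) -> Dict[int, float]:
    """Weight each class by N / (K * n_c); pixels of ignored classes are not counted."""
    ignore = set(ignore_class_ids)
    counts: Dict[int, int] = {}
    for image in seg_images:
        for row in image:
            for class_id in row:
                if class_id not in ignore:
                    counts[class_id] = counts.get(class_id, 0) + 1
    total = sum(counts.values())
    num_classes = len(counts)
    weights = {cid: total / (num_classes * n) for cid, n in counts.items()}
    if max_weight is not None:
        # Hypothetical cap so that very rare classes do not get extreme weights.
        weights = {cid: min(w, max_weight) for cid, w in weights.items()}
    return weights


def weight_image(seg_image: Image, class_weights: Dict[int, float]) -> List[List[float]]:
    """Per-pixel weights; classes without a weight (e.g. 'ignore' classes) get 0.0."""
    return [[class_weights.get(cid, 0.0) for cid in row] for row in seg_image]


def weighted_pixel_loss(pixel_losses: List[List[float]], weights: List[List[float]]) -> float:
    """Weighted mean of per-pixel losses; pixels with weight 0.0 do not contribute."""
    numerator = sum(l * w for lr, wr in zip(pixel_losses, weights) for l, w in zip(lr, wr))
    denominator = sum(w for wr in weights for w in wr)
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    seg = [[0, 0, 0], [0, 1, 9]]  # class 9 is declared as 'ignore'
    cw = inverse_frequency_class_weights([seg], ignore_class_ids=[9])
    print(cw)                    # {0: 0.625, 1: 2.5}
    print(weight_image(seg, cw))  # [[0.625, 0.625, 0.625], [0.625, 2.5, 0.0]]
```

In this example the rare class 1 receives four times the weight of the dominant class 0, so its single pixel contributes as much to the loss as the four background pixels together.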
Evaluation measures for the Data from Semantic Segmentation
For semantic segmentation, the following evaluation measures are supported
in HALCON.
Note that computing such a measure for an image requires the related
ground truth information.
All the measure values explained below for a single image
(e.g., mean_iou) can also be calculated for an arbitrary number
of images.
For this, imagine a single, large image formed by the ensemble of the
output images, for which the measure is computed.
Note that all pixels of a class declared as 'ignore' are ignored in the
computation of the measures.
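Because all measures can be derived from pixel counts, evaluating many images amounts to accumulating one confusion matrix over all of them, which is exactly the "single, large image" view described above. The Python sketch below does this while skipping pixels whose ground-truth class is declared as 'ignore'. The function name, the row = ground truth / column = prediction layout, and the nested-list image representation are assumptions for illustration and may differ from HALCON's pixel_confusion_matrix.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Image = List[List[int]]


def pixel_confusion_matrix(pairs: Iterable[Tuple[Image, Image]],
                           class_ids: Sequence[int],
                           ignore_class_ids: Iterable[int] = ()) -> List[List[int]]:
    """Accumulate one confusion matrix over several (ground_truth, prediction) pairs.

    Entry [i][j] counts pixels annotated as class_ids[i] and predicted as
    class_ids[j] (row = ground truth, a convention chosen for this sketch).
    Pixels whose ground-truth class is an 'ignore' class are skipped.
    """
    index: Dict[int, int] = {cid: i for i, cid in enumerate(class_ids)}
    ignore = set(ignore_class_ids)
    n = len(class_ids)
    matrix = [[0] * n for _ in range(n)]
    for ground_truth, prediction in pairs:
        for gt_row, pred_row in zip(ground_truth, prediction):
            for gt_class, pred_class in zip(gt_row, pred_row):
                if gt_class in ignore:
                    continue
                matrix[index[gt_class]][index[pred_class]] += 1
    return matrix


if __name__ == "__main__":
    image_1 = ([[0, 0], [1, 9]], [[0, 1], [1, 0]])  # class 9 is 'ignore'
    image_2 = ([[1]], [[1]])
    print(pixel_confusion_matrix([image_1, image_2], class_ids=[0, 1], ignore_class_ids=[9]))
    # [[1, 1], [0, 2]]
```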
pixel_accuracy
  The pixel accuracy is simply the ratio of all pixels that have been predicted with the correct class label to the total number of pixels.
class_pixel_accuracy
  The per-class pixel accuracy considers only pixels of a single class. It is defined as the ratio between the correctly predicted pixels and the total number of pixels labeled with this class.
  In case a class does not occur, it gets a class_pixel_accuracy value of -1 and does not contribute to the average value, mean_accuracy.
mean_accuracy
  The mean accuracy is defined as the averaged per-class pixel accuracy, class_pixel_accuracy, of all occurring classes.
class_iou
  The per-class intersection over union (IoU) gives for a specific class the ratio of correctly predicted pixels to the union of annotated and predicted pixels. Visually, this is the ratio between the intersection and the union of the areas; see the image below.
  In case a class does not occur, it gets a class_iou value of -1 and does not contribute to the mean_iou.
mean_iou
  The mean IoU is defined as the averaged per-class intersection over union, class_iou, of all occurring classes. Note that every occurring class has the same impact on this measure, independent of the number of pixels it contains.
frequency_weighted_iou
  As for the mean IoU, the per-class IoU is calculated first. But the contribution of each occurring class to this measure is weighted by the ratio of pixels that belong to that class. Note that classes with many pixels can dominate this measure.
pixel_confusion_matrix
  The concept of a confusion matrix is explained in the section “Supervising the training” within the chapter Deep Learning. It also applies to semantic segmentation, where the instances are single pixels.
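Given such a pixel confusion matrix, the measures above follow directly. In this hypothetical Python sketch, rows are ground-truth classes and columns predicted classes, and a class is treated as 'not occurring' when it has no annotated pixels; that reading of "does not occur", as well as the function name, are assumptions.

```python
from typing import Dict, List, Sequence


def segmentation_measures(matrix: List[List[int]], class_ids: Sequence[int]) -> Dict[str, object]:
    """Compute the segmentation measures from a pixel confusion matrix.

    matrix[i][j] counts pixels annotated as class_ids[i] and predicted as
    class_ids[j]. Classes without annotated pixels get the value -1 and are
    excluded from all averages.
    """
    n = len(class_ids)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix contains no pixels")
    gt_counts = [sum(matrix[i]) for i in range(n)]                          # annotated pixels
    pred_counts = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # predicted pixels
    correct = [matrix[i][i] for i in range(n)]
    occurring = [i for i in range(n) if gt_counts[i] > 0]

    class_pixel_accuracy = {cid: -1.0 for cid in class_ids}
    class_iou = {cid: -1.0 for cid in class_ids}
    for i in occurring:
        cid = class_ids[i]
        class_pixel_accuracy[cid] = correct[i] / gt_counts[i]
        union = gt_counts[i] + pred_counts[i] - correct[i]  # annotated or predicted
        class_iou[cid] = correct[i] / union

    mean_accuracy = sum(class_pixel_accuracy[class_ids[i]] for i in occurring) / len(occurring)
    mean_iou = sum(class_iou[class_ids[i]] for i in occurring) / len(occurring)
    # Each occurring class contributes with the share of pixels annotated with it.
    frequency_weighted_iou = sum(gt_counts[i] / total * class_iou[class_ids[i]] for i in occurring)

    return {
        "pixel_accuracy": sum(correct) / total,
        "class_pixel_accuracy": class_pixel_accuracy,
        "mean_accuracy": mean_accuracy,
        "class_iou": class_iou,
        "mean_iou": mean_iou,
        "frequency_weighted_iou": frequency_weighted_iou,
    }
```

For example, with the matrix [[3, 1], [1, 1]] and class IDs [0, 1], the pixel accuracy is 4/6, the mean IoU is about 0.467, and the frequency-weighted IoU is about 0.511: the frequent class 0 (IoU 0.6) pulls the frequency-weighted value above the plain mean, illustrating how classes with many pixels can dominate that measure.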