CN114693967B - Multi-classification semantic segmentation method based on classification tensor enhancement - Google Patents
Multi-classification semantic segmentation method based on classification tensor enhancement
- Publication number
- CN114693967B (application CN202210274049.0A)
- Authority
- CN
- China
- Prior art keywords
- classification
- tensors
- network
- features
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
Abstract
The invention provides a multi-class semantic segmentation method based on two-class (binary) tensor enhancement, comprising the following steps: input the picture to be classified into an original segmentation network; the feature extraction part of the original segmentation network extracts features, and the extracted features are fed in parallel into N binary classification heads and a transit part; the N binary heads each perform binary classification on the input features and output N two-class tensors, while the transit part outputs the features to be classified; the N binary foreground score maps are concatenated with the features to be classified, and the concatenated tensor is finally fed into a multi-class head, which performs N-class classification on it and outputs an N-class tensor as the final multi-class result. The invention can be added simply to most segmentation network structures at the cost of only a small increase in network parameters. Compared with optimizing the multi-class head result with CE loss alone, the method improves the classification performance of the segmentation network for a small additional parameter cost.
Description
Technical Field
The present invention relates to multi-class classification technology, and more particularly to a technique for supporting multi-class segmentation based on two-class (binary) tensor enhancement.
Background
With the continuous development of hardware computing power and deep learning, the need for high-precision pixel-level processing of images is becoming increasingly common. Image segmentation, one of the most common tasks in computer vision, realizes pixel-level classification of images by means of deep neural networks. In many application scenarios, such as autonomous driving and augmented reality, segmentation is an indispensable link in the processing pipeline, and the segmentation result directly affects downstream processing. The present method mainly strengthens the classification stage of the segmentation network and thereby improves the segmentation result.
Most semantic segmentation networks optimize the final multi-class output tensor only with Cross Entropy (CE) loss, which achieves reasonably good segmentation results on most tasks. However, since CE loss only rewards the prediction score of the correct class, it tends to neglect the suppression of similar-class scores, which in turn makes the network prone to confusion when distinguishing similar classes. Especially in fine-grained tasks such as human parsing, the misclassifications caused by confusable classes are not negligible. Using binary (two-class) heads to support the multi-class head can increase the classification capability of the segmentation network, but BCE loss alone cannot capture the relations among the binary predictions of the different classes; the output tensors of the binary heads therefore benefit from an additional loss constraint so that they better support the final result.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a method that optimizes the two-class tensors with a novel loss so that they better support the final multi-class result of the semantic segmentation network. The goal is to reduce, within each truth region, the scores of incorrect-class binary predictions and to further raise the scores of correct predictions, thereby improving the segmentation network's ability to distinguish similar classes. The loss is applied by adding simple binary heads to an existing network structure, which improves the support the two-class tensors give the final result and, in turn, the mean intersection-over-union (mIoU) of the final segmentation.
The technical scheme adopted by the invention to solve this problem is a multi-class semantic segmentation method based on two-class tensor enhancement, comprising the following steps:
1) Input the picture to be classified into an original segmentation network; the original segmentation network comprises a feature extraction part and a multi-class head;
2) The feature extraction part of the original segmentation network extracts features, and the extracted features are fed in parallel into N binary classification heads and a transit part;
3) The N binary heads each perform binary classification on the input features and output N two-class tensors; the transit part keeps the dimensions of the input features, performs a simple feature transformation, and outputs the features to be classified;
4) The N binary foreground score maps are concatenated with the features to be classified, and the concatenated tensor is finally fed into the multi-class head, which performs N-class classification on it and outputs an N-class tensor as the final multi-class result.
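Steps 2)-4) above can be sketched numerically as follows. This is only an illustrative sketch: the heads are modeled as random 1×1 convolutions (channel-mixing matrices), and all weight values and channel sizes here are hypothetical, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_sketch(features, n_classes):
    """Sketch of steps 2)-4): features (C, H, W) from the backbone are fed
    to N binary heads and a transit part, then concatenated and re-classified."""
    C, H, W = features.shape
    # N binary heads: each reduces C channels to one foreground score map (1x1 conv).
    w_bin = rng.standard_normal((n_classes, C)) * 0.01
    score_maps = sigmoid(np.einsum('nc,chw->nhw', w_bin, features))   # (N, H, W)
    # Transit part: keeps the channel count C while transforming the features.
    w_transit = rng.standard_normal((C, C)) * 0.01
    transit = np.maximum(np.einsum('dc,chw->dhw', w_transit, features), 0.0)  # ReLU
    # Concatenate the N score maps with the features to be classified: (C+N, H, W).
    cascaded = np.concatenate([transit, score_maps], axis=0)
    # Multi-class head with C+N input channels outputs N class logits per pixel.
    w_multi = rng.standard_normal((n_classes, C + n_classes)) * 0.01
    logits = np.einsum('nk,khw->nhw', w_multi, cascaded)
    return score_maps, logits

feats = rng.standard_normal((8, 4, 4))
scores, logits = forward_sketch(feats, n_classes=3)
```

Note how the multi-class head consumes C+N channels, matching the patent's remark that it adds N input channels to accommodate the concatenated binary tensors.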
Further, the loss function L used to train the whole network is: L = L_CE + α·L_BCE + β·L_B2M, where L_CE and L_BCE are the cross-entropy losses of the multi-class head and the binary heads respectively, and α and β are hyper-parameters;
L_B2M is the loss that strengthens the support property of the two-class tensors, L_B2M = L_overlap + L_missing; L_overlap is the loss term reflecting unreasonable overlap among the binary predictions, and L_missing is the loss term reflecting missing predictions.
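The patent's equations (1), (3) and (4) appear only as images in the source and are not reproduced here, so the following is a hedged sketch of how a B2M-style loss could combine an overlap term and a missing term, following only the verbal description; the exact formulas and normalizations in the patent may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f(x, k=10.0, b=-5.0):
    # Nonlinear mapping of equation (2): f(x) = sigmoid(k*x + b) - sigmoid(b).
    return sigmoid(k * x + b) - sigmoid(b)

def b2m_loss(p, g, present, k=10.0, b=-5.0):
    """Hypothetical sketch of L_B2M = L_overlap + L_missing.
    p: (N, H, W) binary foreground score maps; g: (N, H, W) binary truth masks;
    present: indices l_1..l_C of the classes present in the picture."""
    overlap_terms, missing_terms = [], []
    for li in present:
        region = g[li]                       # truth region of class l_i
        area = region.sum()
        others = np.delete(p, li, axis=0)    # predictions of all other classes
        # Degree to which other classes overlap the truth region of l_i
        # (normalization by area and class count is an assumption).
        overlap = (others * region).sum() / (area * len(others))
        overlap_terms.append(f(overlap, k, b))
        # Mean correct score inside the truth region; a low mean indicates
        # a missing prediction, so 1 - mean is penalized.
        missing_terms.append(1.0 - (p[li] * region).sum() / area)
    return float(np.mean(overlap_terms) + np.mean(missing_terms))
```

With perfect, mutually disjoint binary predictions (p equal to g), both terms vanish, which matches the stated goal of suppressing overlap and filling in missing foreground.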
The advantage of the invention is that the method can be added simply to most segmentation network structures at the cost of only a small increase in network parameters; the improved strategy raises the mean intersection-over-union of the segmentation results, and the output tensors of the simple binary heads are further constrained by the designed B2M loss. Compared with optimizing the multi-class head result with CE loss alone, the method improves the classification performance of the segmentation network for a small additional parameter cost.
Drawings
Fig. 1: multi-classification schematic diagram of the invention;
Fig. 2: schematic of the network used in the invention;
Fig. 3: structure of the two-class heads.
Detailed Description
The semantic segmentation task can be regarded as a pixel-wise classification task: assuming the labels comprise N classes, a conventional semantic segmentation network finally classifies each pixel into one of the N classes through a multi-class head. Considering that binary heads can assist the classification of the multi-class head, the applicant proposes an improved strategy that enhances the segmentation result by strengthening the characteristics of the two-class tensors. Structurally, simple binary heads and a transit part are designed; a loss designed to strengthen the support property of the two-class tensors is named the B2M (Binary to Multiple) loss.
As shown in Fig. 1, we select HRNetV2 as the baseline network for implementing the improvement strategy. First, the tensor that would pass through the multi-class head of the original segmentation network is fed simultaneously into multiple parallel binary heads and a transit part. The parallel binary heads convert the N-class problem into N binary problems and output N binary results; the transit part keeps the channel count of the tensor to be classified and outputs the features to be classified. The N binary results are then concatenated with the features to be classified, and the concatenated tensor is finally fed into the multi-class head, which outputs the final N-class result. The multi-class head keeps the original network design but adds N input channels to match the concatenated binary tensors.
During training, the final result of the network is still optimized with CE loss; optimizing the two-class tensors requires the proposed B2M loss in addition to BCE loss.
By converting the N-class problem into N binary problems, we obtain the truth labels g_1, g_2, ..., g_N of the binary heads from the multi-class truth label. Given an input picture, the binary heads output N binary foreground score maps (the two-class tensors), one per class; the k-th is denoted p_k, with p_k ∈ [0,1]^{H×W}, k = 1, 2, ..., N, where H and W are the height and width of the score map, and its truth value is g_k ∈ {0,1}^{H×W}. The region of p_k with foreground score greater than 0.5 is the foreground region. The indices of the label classes actually present in the input picture are denoted l_1, l_2, ..., l_C, where C is the number of truth classes contained in the picture. Since the score map of each class in the two-class tensor is optimized by BCE loss alone, the foreground regions predicted for different classes may overlap or be missing, i.e., some pixels have a prediction score above 0.5 on more than one class, or below 0.5 on all classes. Such overlaps and omissions weaken the support that the binary predictions provide to the multi-class result at the corresponding positions.
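The conversion from a multi-class truth label to the N binary truth labels g_1...g_N described above is a straightforward one-vs-rest split; a minimal sketch (function names are illustrative):

```python
import numpy as np

def to_binary_labels(label_map, n_classes):
    """Convert a multi-class truth label (H, W) with values in {0..N-1}
    into N binary truth masks g_1..g_N, each in {0,1}^{H x W}."""
    return np.stack([(label_map == k).astype(np.uint8) for k in range(n_classes)])

# Example: a 2x2 picture containing classes 0, 1 and 2.
label = np.array([[0, 1], [2, 1]])
g = to_binary_labels(label, n_classes=3)
# Indices l_1..l_C of the truth classes actually present in the picture.
present = [k for k in range(3) if g[k].any()]
```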
For the prediction of class l_i, we calculate the degree Overlap(l_i) to which the other classes overlap its truth region, where "sum" denotes summing all elements and "*" denotes element-wise multiplication:
Having obtained the overlap degree Overlap(l_i), we map it through a nonlinearity, where "σ" denotes the Sigmoid function and k and b are hyper-parameters:
f(x)=σ(k·x+b)-σ(b) (2)
after mapping the overlapping degree of the C categories in turn, averaging over the categories, we get the first term of B2M loss, denoted L overlap The loss term is mainly directed to suppression of unreasonable overlap in two-class prediction:
for areas lacking predictions, we further enhance the correct score of the true areas by using a similar method to calculate the cross-ratios. Note this loss as L missing The calculation method is as follows:
from this, the proposed B2M loss, L, can be calculated B2M :
L B2M =L overlap +L missing (5)
The loss function of the whole network is as follows, where L_CE and L_BCE are the cross-entropy losses of the multi-class head and the binary heads respectively, and α and β are hyper-parameters:
L = L_CE + α·L_BCE + β·L_B2M (6)
the invention is implemented on a server containing 8 TITAN X PASCALs, and the network adopts HRNetV2 as a base, as shown in figure 2. The whole network mainly comprises a backbone network backbone, a two-classification link binary classitication head and a multi-classification link multi-classitication head. The backbone network is HRNetV2-W48 and is used for extracting characteristics; the structure of the two sorting heads is shown in fig. 3, and the transit part transformation part is similar to the structure of the two sorting heads, and is composed of 2 1×1 convolution layers, a batch standardization Batch Normalization layer and an activation function Relu layer, which are only different in the number of output channels of the last convolution layer; the multi-sort head structure is similar to baseline. The main steps of the design strategy are as follows: and obtaining two classification tensors by using a light-weight two-classification head, calculating the overlapping degree corresponding to each truth class, taking an average value on the class, calculating the average score of each truth class in the truth area, taking an average value on the class, and optimizing the two classification tensors by using the two average values.
The effect of the invention is illustrated with experimental results. Since the datasets for human parsing tasks in semantic segmentation contain many similar classes, the following three datasets are selected for the experiments. The baseline and the network improved by the strategy were trained and tested under identical experimental conditions:
Table 1. Comparative experiments, mIoU (percent), on three datasets
On all three datasets (LIP, ATR, PPSS), the proposed method (ours) clearly improves mIoU over the original segmentation network (baseline), which demonstrates the effectiveness of the strategy for improving network performance.
Claims (2)
1. A multi-class semantic segmentation method based on two-class tensor enhancement, characterized by comprising the following steps:
1) Input the picture to be classified into an original segmentation network; the original segmentation network comprises a feature extraction part and a multi-class head;
2) The feature extraction part of the original segmentation network extracts features, and the extracted features are fed in parallel into N binary classification heads and a transit part;
3) The N binary heads each perform binary classification on the input features and output N two-class tensors; the transit part keeps the channel count of the input features and outputs the features to be classified;
4) The N binary foreground score maps are concatenated with the features to be classified, and the concatenated tensor is finally fed into the multi-class head, which performs N-class classification on it and outputs an N-class tensor as the final multi-class result;
the loss function L adopted by the whole network for realizing the multi-classification method in the training process is as follows: l=l CE +α.L BCE +β.L B2M ;
Wherein L is CE And L is equal to BCE Cross entropy loss of multi-class header and two-class header respectively, alpha and beta are super parameters; l (L) B2M To enhance loss of the classification tensor support attribute, L B2M =L overlap +L missing ;L overlap To reflect unreasonably overlapping loss terms in the two classifications, L missing To reflect in two categoriesMissing a predicted penalty term;
The loss term L_overlap reflecting unreasonable overlap among the binary predictions is calculated as follows:
where C denotes the number of truth classes contained in the current input picture; for the i-th truth class contained, l_i denotes the index of its class label; f(x) denotes a nonlinear mapping of the input x, f(x) = σ(k·x + b) − σ(b), where σ denotes the Sigmoid function and k and b are hyper-parameters; Overlap(l_i) is the degree to which the other classes overlap the truth region of class l_i;
The loss term L_missing reflecting missing predictions is calculated as follows:
where "sum" denotes summing all elements and "*" denotes element-wise multiplication; p_{l_i} denotes the binary prediction score map corresponding to class l_i, and g_{l_i} denotes the binary truth label of class l_i.
2. The method of claim 1, wherein the original split network is HRNetV2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210274049.0A CN114693967B (en) | 2022-03-20 | 2022-03-20 | Multi-classification semantic segmentation method based on classification tensor enhancement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114693967A CN114693967A (en) | 2022-07-01 |
CN114693967B true CN114693967B (en) | 2023-10-31 |
Family
ID=82138917
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210274049.0A Active CN114693967B (en) | 2022-03-20 | 2022-03-20 | Multi-classification semantic segmentation method based on classification tensor enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114693967B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108268870A (en) * | 2018-01-29 | 2018-07-10 | 重庆理工大学 | Multi-scale feature fusion ultrasound image semantic segmentation method based on adversarial learning
CN109509192A (en) * | 2018-10-18 | 2019-03-22 | 天津大学 | Semantic segmentation network fusing multi-scale feature space and semantic space
CN111462163A (en) * | 2020-01-03 | 2020-07-28 | 华中科技大学 | Weakly supervised semantic segmentation method and application thereof |
WO2020192469A1 (en) * | 2019-03-26 | 2020-10-01 | 腾讯科技(深圳)有限公司 | Method and apparatus for training image semantic segmentation network, device, and storage medium |
CN111860514A (en) * | 2020-05-21 | 2020-10-30 | 江苏大学 | Orchard scene multi-class real-time segmentation method based on improved deep Lab |
CN112465844A (en) * | 2020-12-29 | 2021-03-09 | 华北电力大学 | Multi-class loss function for image semantic segmentation and design method thereof |
CN112801104A (en) * | 2021-01-20 | 2021-05-14 | 吉林大学 | Image pixel level pseudo label determination method and system based on semantic segmentation |
WO2021097055A1 (en) * | 2019-11-14 | 2021-05-20 | Nec Laboratories America, Inc. | Domain adaptation for semantic segmentation via exploiting weak labels |
CN113191392A (en) * | 2021-04-07 | 2021-07-30 | 山东师范大学 | Breast cancer image information bottleneck multi-task classification and segmentation method and system |
CN114092818A (en) * | 2022-01-07 | 2022-02-25 | 中科视语(北京)科技有限公司 | Semantic segmentation method and device, electronic equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11188799B2 (en) * | 2018-11-12 | 2021-11-30 | Sony Corporation | Semantic segmentation with soft cross-entropy loss |
-
2022
- 2022-03-20 CN CN202210274049.0A patent/CN114693967B/en active Active
Non-Patent Citations (5)
Title |
---|
Lian Xu 等.Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation.《Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 》.2022,4310-4319. * |
Longrong Yang等.Learning with Noisy Class Labels for Instance Segmentation.《Computer Vision – ECCV 2020》.2020,38–53. * |
Rosario Delgado 等.Enhancing Confusion Entropy (CEN) for binary and multiclass classification.《PLOS ONE》.2019,1-30. * |
Zhang Hongzhao et al. Multi-scale adversarial network image semantic segmentation algorithm based on a weighted loss function. Computer Applications and Software, 2020, (01), 290-297. *
Wang Shan. Research on MRI image segmentation based on multi-scale convolution. China Master's Theses Full-text Database, Medicine and Health Sciences, 2021, (05), E060-49. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||