CN116645512A - Self-adaptive semantic segmentation method and device under severe conditions - Google Patents

Self-adaptive semantic segmentation method and device under severe conditions

Info

Publication number
CN116645512A
CN116645512A (application CN202310638992.XA)
Authority
CN
China
Prior art keywords
image
domain data
model
target domain
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310638992.XA
Other languages
Chinese (zh)
Inventor
欧阳攀
李偲
姚孝国
黄光奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Science and Industry Shenzhen Group Co Ltd
Original Assignee
Aerospace Science and Industry Shenzhen Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Science and Industry Shenzhen Group Co Ltd filed Critical Aerospace Science and Industry Shenzhen Group Co Ltd
Priority to CN202310638992.XA priority Critical patent/CN116645512A/en
Publication of CN116645512A publication Critical patent/CN116645512A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/096Transfer learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a self-adaptive semantic segmentation method and device under severe conditions, belonging to the field of semantic segmentation. Pseudo labels are first corrected; the source domain data and the target domain data carrying the corrected pseudo labels are then mixed in a preset proportion, and the model is trained on the mixed data, improving its robustness under severe conditions. Performing semantic segmentation with a model trained by this method improves segmentation quality under severe conditions and reduces dependence on target-domain data and labels.

Description

Self-adaptive semantic segmentation method and device under severe conditions
Technical Field
The invention belongs to the field of semantic segmentation, and particularly relates to a self-adaptive semantic segmentation method and device under severe conditions.
Background
Semantic segmentation plays an important role in automatic driving and intelligent monitoring. In the prior art, deep-learning-based semantic segmentation models perform well in clear scenes but face great challenges under severe conditions (such as rain, fog, snow, and night). In these cases image collection is difficult, the labeling effort is large, and the lack of high-quality labeled data degrades model performance. Among existing methods, the mainstream approach is domain adaptation via self-training, which generalizes a clear-scene segmentation model to severe conditions. However, during self-training, pseudo-label noise caused by the domain bias between the source domain dataset (the labeled dataset in clear scenes) and the target domain dataset (the unlabeled dataset under severe conditions) negatively impacts model performance.
Disclosure of Invention
The invention aims to solve the technical problem of pseudo tag noise under severe conditions, and provides a self-adaptive semantic segmentation method and device under severe conditions.
In order to solve the technical problems, the invention adopts the following technical scheme:
a self-adaptive semantic segmentation method under severe conditions comprises the following steps:
step 1: acquiring a source domain data set and a target domain data set;
step 2: preprocessing image data in a source domain data set and a target domain data set;
step 3: predicting the preprocessed target domain data by using a teacher model, and generating a pseudo tag according to a prediction result;
step 4: correcting the pseudo tag;
step 5: mixing the source domain data and the target domain data with the corrected pseudo tag according to a preset proportion, respectively inputting the mixed domain data into a teacher model and a student model for mixed training, and obtaining a trained model after repeated iterative training;
step 6: and evaluating the model, and performing semantic segmentation on the input image by using the trained student model after the evaluation is passed.
Further, the method for generating the pseudo tag according to the prediction result in the step 3 is as follows:
predicting the target domain data by using a teacher model to obtain probability distribution of each pixel belonging to each category;
and taking the category with the highest probability in each pixel point as a pseudo tag of the pixel point.
Further, the method for correcting the pseudo tag comprises the following steps:
inputting the target domain data and the corresponding pseudo tag into a student model, and using a new pseudo tag predicted by the student model as a pseudo tag after the target domain data is corrected; in the cross entropy loss function of the student model:
the global weight parameter is used for setting the global weight of the source domain as 1, and taking the proportion of the part of the predicted value of the target domain larger than the threshold value as the global weight of the target domain;
the local weight parameter takes the difference of each pixel point in the target domain between the main and auxiliary classifiers as the local weight, where the difference is measured by the KL divergence between the main classifier and the auxiliary classifier;
and setting the threshold parameter, wherein the initial value is 1/C, C is the number of categories, and then dynamically adjusting according to the predicted value of each pixel point category.
Further, the method for performing the hybrid training in the step 5 is as follows:
step 5.1: initializing student model and teacher model parameters;
step 5.2, randomly sampling an image and a label of a batch from the source domain data set, and randomly sampling an image of a batch from the target domain data set;
step 5.3: training a cross entropy loss function of a student model by using a mixed image and mixed labels, wherein the mixed image is obtained by taking object pixel points corresponding to one half of classes of class labels from one image of a source domain data set, the rest pixel points are filled with other pixel points in a target domain, and the mixed labels are formed by mixing labels of various classes taken from the source domain image in the mixed image and pseudo labels of the pixel points in the taken target domain image;
step 5.4: training a cross entropy loss function of a student model by using an enhanced mixed image and enhanced mixed labels, wherein the enhanced mixed image is formed by taking half of object pixel points corresponding to classes in class labels from a source domain image, filling the rest pixel points with pixel points in an image after the target domain image is subjected to visual enhancement, and the enhanced mixed labels are formed by mixing labels of various classes taken from the source domain image in the enhanced mixed image and pseudo labels of the pixel points taken from the enhanced target domain image;
step 5.5: updating the student model, and obtaining a new teacher model according to the weighted summation of the student model weight at the current moment and the previous teacher model weight;
step 5.6: stopping training after the training reaches the preset times, and outputting the trained student model.
Further, the method for performing visual enhancement in step 5.4 is:
the image is enhanced using an atmospheric scattering model,
the following formula is adopted for rain, fog and snow:
I(x)=J(x)t(x)+A(1-t(x))
the following formula was used for the night:
1-I(x)=(1-J(x))t(x)+A(1-t(x)),
wherein I(x) represents the input original image, J(x) represents the enhanced clear image, A is the atmospheric light, and t(x) is the transmission map of the original image.
Further, the method for updating the teacher model in step 5.5 is as follows:
ψ_{t+1} ← α·ψ_t + (1−α)·θ_t
where θ_t represents the student model after the t-th training iteration, ψ_t and ψ_{t+1} represent the teacher model after the t-th and (t+1)-th training iterations respectively, and α is a momentum coefficient.
Further, the method for dynamically adjusting the threshold parameter is as follows:
and the threshold value of each category is weighted and summed with the average value of the predicted value of each pixel point category at the current moment through the threshold value of the category at the previous moment.
The invention also provides a self-adaptive semantic segmentation device under severe conditions, which comprises the following modules:
an input module: for acquiring a source domain data set and a target domain data set;
and a pretreatment module: the method comprises the steps of preprocessing image data in a source domain data set and a target domain data set, and respectively extracting features of the preprocessed images to obtain a feature map of each training sample;
a pseudo tag generation module: the method comprises the steps of predicting target domain data by using a teacher model, and generating a pseudo tag according to a prediction result;
and a pseudo tag correction module: for correcting the pseudo tag;
and (3) a hybrid training module: the method comprises the steps of mixing source domain data and target domain data with corrected pseudo labels according to a preset proportion, respectively inputting the mixed domain data into a teacher model and a student model for mixed training, and obtaining a trained model after repeated iterative training;
and a segmentation module: the method is used for evaluating the model, and semantic segmentation is carried out on the input image by using the trained model after the evaluation is passed.
By adopting the technical scheme, the invention has the following beneficial effects:
the invention provides a self-adaptive semantic segmentation method and a device under severe conditions, which are characterized in that a pseudo tag is corrected; and then mixing the source domain data and the target domain data with the corrected pseudo tag according to a certain proportion, and training by mixing the source domain data and the target domain data, thereby improving the robustness and the robustness of the model under severe conditions. The semantic segmentation is carried out by using the model trained by the method, so that the semantic segmentation effect under severe conditions can be improved.
Drawings
FIG. 1 is a flow chart of a system of the present invention;
FIG. 2 is a schematic diagram of the model framework.
Detailed Description
The technical solution in the embodiments of the present invention is described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by those of ordinary skill in the art based on these embodiments without inventive effort fall within the scope of the invention.
FIG. 1 shows a specific embodiment of an adaptive semantic segmentation method under severe conditions, comprising the following steps:
step 1: a source domain data set and a target domain data set are acquired.
Step 2: image data in the source domain data set and the target domain data set are preprocessed.
In this embodiment, after the data sets are acquired, each image is preprocessed, including operations such as image resizing and data augmentation: resizing gives the images a uniform specification, and augmentation improves the quality of images captured under severe conditions.
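As a concrete illustration, the resizing step might look as follows. This is a minimal NumPy sketch under stated assumptions: the patent does not fix the target size, the interpolation method, or the normalization, so the 512×512 default, nearest-neighbor sampling, and [0, 1] scaling here are all hypothetical choices.

```python
import numpy as np

def preprocess(img, size=(512, 512)):
    """Hypothetical preprocessing: nearest-neighbor resize to a common
    specification plus [0, 1] normalization. The exact operations are
    not specified by the patent; this only illustrates the idea of
    bringing all images to the same shape before training."""
    h, w = img.shape[:2]
    # Nearest-neighbor row/column index maps for the target size.
    ys = np.arange(size[0]) * h // size[0]
    xs = np.arange(size[1]) * w // size[1]
    resized = img[ys][:, xs]
    return resized.astype(np.float32) / 255.0
```

A data-augmentation stage (random crops, flips, color jitter) would typically follow, but its composition is likewise left open by the patent.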
Step 3: predicting the preprocessed target domain data by using the teacher model, and generating pseudo labels according to the prediction result. Because the teacher model is relatively mature and does not participate in gradient backpropagation, using it for the preliminary prediction on the target-domain data improves prediction accuracy.
In this embodiment, as shown in fig. 2, the method for generating the pseudo tag according to the prediction result is:
predicting the target domain data by using a teacher model to obtain probability distribution of each pixel belonging to each category;
and taking the category with the highest probability in each pixel point as a pseudo tag of the pixel point.
Step 4: and correcting the pseudo tag.
In this embodiment, the method for correcting the pseudo tag is:
inputting the target domain data and the corresponding pseudo tag into a student model, and using a new pseudo tag predicted by the student model as a pseudo tag after the target domain data is corrected; in the cross entropy loss function of the student model:
the global weight parameter is used for setting the global weight of the source domain as 1, and taking the proportion of the part of the predicted value of the target domain larger than the threshold value as the global weight of the target domain;
and taking the difference of each pixel point in the target domain on the main classifier and the auxiliary classifier as a local weight, wherein the local weight is determined by using the KL divergences of the main classifier F_main and the auxiliary classifier F_auxliary.
The threshold parameter is initialized to 1/C, where C is the number of categories, so the initial value is deliberately small; it is then dynamically adjusted according to the predicted value of each pixel's category. In this embodiment the dynamic adjustment works as follows: the threshold of each category is a weighted sum of that category's threshold at the previous moment and the mean predicted value of that category over all pixels at the current moment. As training progresses, the predicted value of each category is continually adjusted upward, so the threshold rises as well. This dynamic threshold favors pseudo-label diversity early in training and pseudo-label reliability later, further improving pseudo-label quality.
Correcting the pseudo labels with the student model comes down to how the parameters of the student model's cross-entropy loss are set: the global and local weights used in this embodiment are both uncertainty weights that change continuously with the target-domain predictions, and the threshold parameter is continuously adjusted as the per-pixel class predictions change.
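The global weight and the dynamic threshold update described above can be sketched as follows. The function names and the momentum value β are hypothetical; the patent only fixes the initial threshold 1/C and the weighted-sum update between the previous threshold and the current per-class mean prediction.

```python
import numpy as np

def global_weight(confidence, thresholds, labels):
    # The source-domain global weight is fixed at 1; the target-domain
    # global weight is the fraction of pixels whose predicted confidence
    # exceeds the threshold of their (pseudo-)labelled class.
    return float((confidence > thresholds[labels]).mean())

def update_thresholds(thresholds, confidence, labels, beta=0.9):
    # Each class threshold is a weighted sum of its previous value and
    # the mean predicted confidence of that class at the current step,
    # so thresholds grow as the network's predictions strengthen.
    new = thresholds.copy()
    for c in np.unique(labels):
        new[c] = beta * thresholds[c] + (1 - beta) * confidence[labels == c].mean()
    return new
```

Usage would start from `thresholds = np.full(C, 1.0 / C)` and call `update_thresholds` once per training step, matching the "diversity early, reliability later" behavior the embodiment describes.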
Step 5: mixing the source domain data and the target domain data with the corrected pseudo tag according to a preset proportion, respectively inputting the mixed training into a teacher model and a student model, and obtaining a trained model after repeated iterative training.
As shown in the model framework of fig. 2, the method for performing the hybrid training in this embodiment is:
step 5.1: initializing student model and teacher model parameters;
step 5.2, randomly sampling an image and a label of a batch from the source domain data set, and randomly sampling an image of a batch from the target domain data set;
step 5.3: training the cross-entropy loss function of the student model with a mixed image and mixed label. The mixed image takes the object pixels corresponding to half of the classes in the class label from one source-domain image and fills the remaining pixels from a target-domain image; the mixed label combines the class labels taken from the source-domain image with the pseudo labels of the pixels taken from the target-domain image. Mixing source- and target-domain data in a certain proportion, with the pseudo labels added to the training data, further improves model performance.
Step 5.4: training the cross-entropy loss function of the student model with an enhanced mixed image and enhanced mixed label. The enhanced mixed image takes the object pixels corresponding to half of the classes in the class label from a source-domain image and fills the remaining pixels from the visually enhanced target-domain image; the enhanced mixed label combines the class labels taken from the source-domain image with the pseudo labels of the pixels taken from the enhanced target-domain image. The enhanced mixed label differs from the mixed label of step 5.3 only in which target-domain pseudo label is used: step 5.3 uses the pseudo label the teacher model predicts on the original target-domain image, while step 5.4 uses the pseudo label predicted on the visually enhanced target-domain data, which improves pseudo-label quality.
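The class-wise mixing used in steps 5.3 and 5.4 (paste the pixels of half of the source image's classes onto a target image, and mix the labels the same way) can be sketched as follows; function and argument names are assumptions, not from the patent.

```python
import numpy as np

def class_mix(src_img, src_lbl, tgt_img, tgt_pseudo, rng=None):
    """Blend a source image/label pair into a target image/pseudo-label
    pair: half of the classes present in the source label map are
    pasted onto the target; labels follow the same mask."""
    rng = rng if rng is not None else np.random.default_rng(0)
    classes = np.unique(src_lbl)
    # Randomly select half of the classes appearing in the source label.
    keep = rng.choice(classes, size=max(1, len(classes) // 2), replace=False)
    mask = np.isin(src_lbl, keep)
    # Pasted pixels come from the source; the rest from the target.
    mixed_img = np.where(mask[..., None], src_img, tgt_img)
    # Labels: source ground truth where pasted, target pseudo labels elsewhere.
    mixed_lbl = np.where(mask, src_lbl, tgt_pseudo)
    return mixed_img, mixed_lbl
```

For step 5.4 the same function would simply be called with the visually enhanced target image and its pseudo label in place of the originals.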
In this embodiment, the method for performing visual enhancement is:
the atmospheric scattering model is used for enhancing the image, and the following formula is adopted for rain, fog and snow:
I(x)=J(x)t(x)+A(1-t(x))
the following formula was used for the night:
1-I(x)=(1-J(x))t(x)+A(1-t(x)),
wherein I(x) represents the input original image, J(x) represents the enhanced clear image, A is the atmospheric light, and t(x) is the transmission map of the original image. The atmospheric scattering model comes from the atmospheric visibility formula proposed by Koschmieder in his paper Theorie der horizontalen Sichtweite (1924). The formula describes the propagation and scattering of light in the atmosphere; from it the atmospheric transmittance can be computed, allowing haze effects in images or video to be removed or synthesized.
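Solving the two formulas above for J(x) gives the enhancement step. A minimal NumPy sketch follows; the clamp on t(x) is an assumption added here to avoid division by zero and is not part of the patent's formulas.

```python
import numpy as np

def degrade_haze(J, t, A):
    # I(x) = J(x)*t(x) + A*(1 - t(x))  -- rain / fog / snow model
    return J * t + A * (1 - t)

def enhance_haze(I, t, A, eps=1e-6):
    # Invert the scattering model to recover the clear image J(x).
    return (I - A * (1 - t)) / np.maximum(t, eps)

def enhance_night(I, t, A, eps=1e-6):
    # Night variant: 1 - I(x) = (1 - J(x))*t(x) + A*(1 - t(x)),
    # solved for J(x).
    return 1 - ((1 - I) - A * (1 - t)) / np.maximum(t, eps)
```

Estimating t(x) and A from a single image is a separate problem (e.g. via dark-channel-style priors) that the patent does not detail.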
Step 5.5: and updating the student model, and updating the teacher model by using EMA momentum according to the weight of the student model at the current moment and the weight of the previous teacher model.
The method for updating the teacher model in this embodiment is:
ψ_{t+1} ← α·ψ_t + (1−α)·θ_t
where θ_t represents the student model after the t-th training iteration, ψ_t and ψ_{t+1} represent the teacher model after the t-th and (t+1)-th training iterations respectively, and α is a momentum coefficient; in this embodiment α = 0.99.
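The EMA update above is applied parameter-by-parameter. A plain-Python sketch over parameter dictionaries (the dictionary representation is an illustration; a real implementation would iterate over framework tensors):

```python
def ema_update(teacher, student, alpha=0.99):
    # psi_{t+1} <- alpha * psi_t + (1 - alpha) * theta_t, applied to
    # every parameter; the teacher is never updated by gradients.
    return {name: alpha * teacher[name] + (1 - alpha) * student[name]
            for name in teacher}
```

With α = 0.99 the teacher tracks a slow moving average of the student, which is what makes its pseudo-label predictions stable across iterations.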
Step 5.6: stopping training after the training reaches the preset times, and outputting the trained student model.
Step 6: and evaluating the model, and performing semantic segmentation on the input image by using the trained student model after the evaluation is passed.
In this embodiment, a cross-validation method is used to evaluate the model, including computing metrics such as per-class pixel accuracy and mean intersection-over-union (mIoU).
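Pixel accuracy and mIoU follow their standard definitions, which the patent names but does not spell out; a sketch:

```python
import numpy as np

def pixel_accuracy(pred, gt):
    # Fraction of pixels whose predicted class matches the ground truth.
    return float((pred == gt).mean())

def mean_iou(pred, gt, num_classes):
    # Per-class intersection-over-union, averaged over the classes
    # that occur in either the prediction or the ground truth.
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```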
By adopting dynamic thresholds, uncertainty weights and the like, this embodiment further reduces pseudo-label noise and domain bias, and training on mixed source- and target-domain data improves the robustness of the model under severe conditions. Experimental results show that the technical scheme of the invention significantly improves the semantic segmentation effect under severe conditions, making semantic segmentation models more reliable in automatic driving and intelligent monitoring.
The invention also provides a self-adaptive semantic segmentation device under severe conditions, which comprises the following modules:
an input module: for acquiring a source domain data set and a target domain data set;
and a pretreatment module: the method comprises the steps of preprocessing image data in a source domain data set and a target domain data set, and respectively extracting features of the preprocessed images to obtain a feature map of each training sample;
a pseudo tag generation module: the method comprises the steps of predicting target domain data by using a teacher model, and generating a pseudo tag according to a prediction result;
and a pseudo tag correction module: for correcting the pseudo tag;
and (3) a hybrid training module: the method comprises the steps of mixing source domain data and target domain data with corrected pseudo labels according to a preset proportion, respectively inputting the mixed domain data into a teacher model and a student model for mixed training, and obtaining a trained model after repeated iterative training;
and a segmentation module: the method is used for evaluating the model, and semantic segmentation is carried out on the input image by using the trained model after the evaluation is passed.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (8)

1. The self-adaptive semantic segmentation method under the severe condition is characterized by comprising the following steps of:
step 1: acquiring a source domain data set and a target domain data set;
step 2: preprocessing image data in a source domain data set and a target domain data set;
step 3: predicting the preprocessed target domain data by using a teacher model, and generating a pseudo tag according to a prediction result;
step 4: correcting the pseudo tag;
step 5: mixing the source domain data and the target domain data with the corrected pseudo tag according to a preset proportion, inputting the mixed data into a student model for mixed training, and obtaining a trained student model after repeated iterative training;
step 6: and evaluating the model, and performing semantic segmentation on the input image by using the trained student model after the evaluation is passed.
2. The adaptive semantic segmentation method under severe conditions according to claim 1, wherein the method for generating pseudo tags according to the prediction result in step 3 is as follows:
predicting the target domain data by using a teacher model to obtain probability distribution of each pixel belonging to each category;
and taking the category with the highest probability in each pixel point as a pseudo tag of the pixel point.
3. The method for adaptively partitioning semantics under severe conditions as in claim 2, wherein the method for correcting the pseudo tag is:
inputting the target domain data and the corresponding pseudo tag into a student model, and using a new pseudo tag predicted by the student model as a pseudo tag after the target domain data is corrected; in the cross entropy loss function of the student model:
the global weight parameter is used for setting the global weight of the source domain as 1, and taking the proportion of the part of the predicted value of the target domain larger than the threshold value as the global weight of the target domain;
the local weight parameter takes the difference of each pixel point in the target domain between the main and auxiliary classifiers as the local weight, where the difference is measured by the KL divergence between the main classifier and the auxiliary classifier;
and setting the threshold parameter, wherein the initial value is 1/C, C is the number of categories, and then dynamically adjusting according to the predicted value of each pixel point category.
4. A method for adaptive semantic segmentation under severe conditions according to claim 3, wherein the method for performing hybrid training in step 5 is:
step 5.1: initializing student model and teacher model parameters;
step 5.2, randomly sampling an image and a label of a batch from the source domain data set, and randomly sampling an image of a batch from the target domain data set;
step 5.3: training a cross entropy loss function of a student model by using a mixed image and mixed labels, wherein the mixed image is obtained by taking object pixel points corresponding to one half of classes of class labels from one image of a source domain data set, the rest pixel points are filled with other pixel points in a target domain, and the mixed labels are formed by mixing labels of various classes taken from the source domain image in the mixed image and pseudo labels of the pixel points in the taken target domain image;
step 5.4: training a cross entropy loss function of a student model by using an enhanced mixed image and enhanced mixed labels, wherein the enhanced mixed image is formed by taking half of object pixel points corresponding to classes in class labels from a source domain image, filling the rest pixel points with pixel points in an image after the target domain image is subjected to visual enhancement, and the enhanced mixed labels are formed by mixing labels of various classes taken from the source domain image in the enhanced mixed image and pseudo labels of the pixel points taken from the enhanced target domain image;
step 5.5: updating the student model, and obtaining a new teacher model according to the weighted summation of the student model weight at the current moment and the previous teacher model weight;
step 5.6: stopping training after the training reaches the preset times, and outputting the trained student model.
5. The method for adaptive semantic segmentation under severe conditions according to claim 4, wherein the method for visual enhancement in step 5.4 is as follows:
the image is enhanced using an atmospheric scattering model,
the following formula is adopted for rain, fog and snow:
I(x)=J(x)t(x)+A(1-t(x))
the following formula was used for the night:
1-I(x)=(1-J(x))t(x)+A(1-t(x)),
wherein I(x) represents the input original image, J(x) represents the enhanced clear image, A is the atmospheric light, and t(x) is the transmission map of the original image.
6. The adaptive semantic segmentation method under severe conditions according to claim 4, wherein the method for updating the teacher model in step 5.5 is as follows:
ψ_{t+1} ← α·ψ_t + (1−α)·θ_t
where θ_t represents the student model after the t-th training iteration, ψ_t and ψ_{t+1} represent the teacher model after the t-th and (t+1)-th training iterations respectively, and α is a momentum coefficient.
7. A method for adaptive semantic segmentation under severe conditions according to claim 3, wherein the method for dynamically adjusting the threshold parameter is:
and the threshold value of each category is weighted and summed with the average value of the predicted value of each pixel point category at the current moment through the threshold value of the category at the previous moment.
8. The self-adaptive semantic segmentation device under severe conditions is characterized by comprising the following modules:
an input module: for acquiring a source domain data set and a target domain data set;
and a pretreatment module: the method comprises the steps of preprocessing image data in a source domain data set and a target domain data set, and respectively extracting features of the preprocessed images to obtain a feature map of each training sample;
a pseudo tag generation module: the method comprises the steps of predicting target domain data by using a teacher model, and generating a pseudo tag according to a prediction result;
and a pseudo tag correction module: for correcting the pseudo tag;
and (3) a hybrid training module: the method comprises the steps of mixing source domain data and target domain data with corrected pseudo labels according to a preset proportion, respectively inputting the mixed domain data into a teacher model and a student model for mixed training, and obtaining a trained model after repeated iterative training;
and a segmentation module: the method is used for evaluating the model, and semantic segmentation is carried out on the input image by using the trained model after the evaluation is passed.
CN202310638992.XA 2023-06-01 2023-06-01 Self-adaptive semantic segmentation method and device under severe conditions Pending CN116645512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310638992.XA CN116645512A (en) 2023-06-01 2023-06-01 Self-adaptive semantic segmentation method and device under severe conditions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310638992.XA CN116645512A (en) 2023-06-01 2023-06-01 Self-adaptive semantic segmentation method and device under severe conditions

Publications (1)

Publication Number Publication Date
CN116645512A true CN116645512A (en) 2023-08-25

Family

ID=87624360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310638992.XA Pending CN116645512A (en) 2023-06-01 2023-06-01 Self-adaptive semantic segmentation method and device under severe conditions

Country Status (1)

Country Link
CN (1) CN116645512A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117636086A (en) * 2023-10-13 2024-03-01 中国科学院自动化研究所 Passive domain adaptive target detection method and device

Similar Documents

Publication Publication Date Title
CN108986050B (en) Image and video enhancement method based on multi-branch convolutional neural network
CN113688723B (en) Infrared image pedestrian target detection method based on improved YOLOv5
WO2022111219A1 (en) Domain adaptation device operation and maintenance system and method
CN114022432B (en) Insulator defect detection method based on improved yolov5
CN108229550B (en) Cloud picture classification method based on multi-granularity cascade forest network
CN111160553B (en) Novel field self-adaptive learning method
CN116645512A (en) Self-adaptive semantic segmentation method and device under severe conditions
CN110211052A (en) A kind of single image to the fog method based on feature learning
CN114067173A (en) Small sample low-quality image target detection method based on multi-definition integrated self-training
CN111680702A (en) Method for realizing weak supervision image significance detection by using detection frame
CN116563687A (en) Teacher-student network method for semi-supervised directivity target detection
CN112016594B (en) Collaborative training method based on field self-adaption
CN116468938A (en) Robust image classification method on label noisy data
CN112417752A (en) Cloud layer track prediction method and system based on convolution LSTM neural network
CN104318528A (en) Foggy weather image restoration method based on multi-scale WLS filtering
CN114596477A (en) Foggy day train fault detection method based on field self-adaption and attention mechanism
CN114663665A (en) Gradient-based confrontation sample generation method and system
CN113280820A (en) Orchard visual navigation path extraction method and system based on neural network
US20230342938A1 (en) Adaptive Semi-Supervised Image Segmentation Method Based on Uncertainty Knowledge Domain and System thereof
CN113077438A (en) Cell nucleus region extraction method and imaging method for multi-cell nucleus color image
CN116993975A (en) Panoramic camera semantic segmentation method based on deep learning unsupervised field adaptation
CN116433909A (en) Similarity weighted multi-teacher network model-based semi-supervised image semantic segmentation method
CN115797904A (en) Active learning method for multiple scenes and multiple tasks in intelligent driving visual perception
CN113256517A (en) Video rain removing method based on semi-supervised probability map model
CN114862724A (en) Contrast type image defogging method based on exponential moving average knowledge distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination