CN115761735A

CN115761735A - Semi-supervised semantic segmentation method based on self-adaptive pseudo label correction

Info

Publication number: CN115761735A
Application number: CN202211432700.9A
Authority: CN
Inventors: 王军; 杨宇宇; 潘在宇; 李玉莲; 申政文
Original assignee: China University of Mining and Technology CUMT
Current assignee: China University of Mining and Technology CUMT
Priority date: 2022-11-16
Filing date: 2022-11-16
Publication date: 2023-03-07

Abstract

A semi-supervised semantic segmentation method based on self-adaptive pseudo label correction is disclosed, comprising the following steps: selecting the GTA5 data set to construct a source domain and the Cityscapes data set to construct a target domain; inputting source domain images into a deep convolutional neural network for training to obtain a pre-trained semantic segmentation model; constructing an uncertain region selection strategy using information entropy and a density clustering algorithm, based on the prediction probability matrix generated for a target image; constructing a self-adaptive pseudo label correction strategy to obtain a final pseudo label as supervision, and training a semi-supervised semantic segmentation model; and inputting the target images of the target domain verification set into the trained semi-supervised semantic segmentation model to verify semantic segmentation performance. The invention updates the pseudo labels online, addresses confirmation bias, alleviates class imbalance, overcomes the shortcomings of full convolution, and improves the semantic segmentation performance of the model on the target domain.

Description

Semi-supervised semantic segmentation method based on self-adaptive pseudo label correction

Technical Field

The invention belongs to the field of self-supervised domain adaptive semantic segmentation, and particularly relates to a semi-supervised semantic segmentation method based on self-adaptive pseudo label correction.

Background

Semantic segmentation aims to assign a semantic-level label to each pixel in an image, and is widely applied in real-world tasks such as automatic driving, robot operation and medical analysis. However, learning a segmentation model relies heavily on large amounts of data with pixel-level annotations, and manual annotation is time consuming and costly. Furthermore, the ability of the model to generalize across different data is also a significant challenge. Various research works have been carried out to solve these problems, and domain adaptation is a promising approach.

Recently, domain adaptation has been advanced by self-supervised training, in which pseudo labels generated from target domain predictions are used as supervision to train the network. For example, Cheng et al. propose a domain adaptive semantic segmentation method based on dual path learning, which aligns the source domain and the target domain through two complementary and interactive single-domain adaptation pipelines so that the target domain yields more reliable pseudo labels, and improves the performance of the semantic segmentation network through self-supervised training (Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Fang Wen, and Wenqiang Zhang. In ICCV, 9082-9091, 2021). Zheng et al. propose a domain adaptive semantic segmentation method that rectifies pseudo label learning by uncertainty estimation, modeling uncertainty through the prediction variance and incorporating the uncertainty into the optimization objective to improve semantic segmentation performance (Zhedong Zheng and Yi Yang). However, these semantic segmentation models gradually generate pseudo labels biased toward the dominant classes during training. Most current adaptive models pay more attention to pseudo labels with high confidence and discard pseudo labels with low confidence, which makes errors irreversible: the semantic segmentation network may never learn some pixels during the whole self-supervised training process, resulting in confirmation bias.

In order to fully utilize the unlabeled target image data, every pixel should be properly utilized. Wang et al. propose a semi-supervised semantic segmentation method using unreliable pseudo labels, which separates reliable and unreliable pixels by prediction entropy, pushes each unreliable pixel into a class-wise queue of negative examples, and tries to train the model with all candidate pixels (Yuchao Wang, Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, Xinyi Le. Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels. In CVPR, 4248-4257, 2022). Although this method makes full use of unlabeled data, it does not consider the effect of down-sampling and up-sampling on the detail information between image feature levels, and the amount of computation in the contrastive learning process is very large. Therefore, semi-supervised domain adaptive semantic segmentation algorithms need further research, and their performance needs to be improved.

Disclosure of Invention

The invention aims to provide a semi-supervised semantic segmentation method based on adaptive pseudo label correction. Based on the prediction probability matrix generated for a target image, the method uses information entropy and a density clustering algorithm to construct an uncertain region selection strategy and an adaptive pseudo label correction strategy, and thereby corrects the pseudo label of the target image online. By correcting the target image pseudo labels online, the invention makes full use of the pixel points of the unlabeled target image data, prevents over-fitting to incorrect pseudo labels, and addresses the confirmation bias of the semantic segmentation model toward the dominant classes during training. By raising the resolution of the uncertain region image and classifying it again, the invention takes the loss of detail information in target images into account, alleviates class imbalance, overcomes the shortcomings of full convolution, and improves both the classification performance of the semi-supervised semantic segmentation model and its generalization ability on the target domain.

The technical solution for realizing the purpose of the invention is as follows: a semi-supervised semantic segmentation method based on self-adaptive pseudo-label correction comprises the following steps:

Step 1, selecting the GTA5 data set to construct a source domain, selecting the Cityscapes data set to construct a target domain, dividing the target images in the target domain into a training set and a verification set, and turning to step 2.

Step 2, inputting the images in the source domain into a deep convolutional neural network for training to obtain a pre-trained semantic segmentation model, and turning to step 3.

Step 3, inputting the target images in the training set of the target domain into the pre-trained semantic segmentation model to generate the corresponding target image prediction probability matrix, constructing an uncertain region selection strategy by using information entropy and a density clustering algorithm, acquiring the uncertain region in the target image prediction probability matrix, and turning to step 4.

Step 4, finding the position on the target image that corresponds to the uncertain region on the target image prediction probability matrix, cropping that position on the target image to obtain an uncertain region image, enlarging the uncertain region image to serve as a secondary target image, inputting the secondary target image into the pre-trained semantic segmentation model, upsampling according to the size of the uncertain region to generate a secondary pseudo label to be fused with the target image pseudo label, constructing an adaptive pseudo label correction strategy to obtain a final pseudo label as supervision for the target image, and inputting the source domain image and the target domain image in the same batch to jointly train a semi-supervised semantic segmentation model; when the preset number of training iterations is reached, obtaining the trained semi-supervised semantic segmentation model, and turning to step 5.

Step 5, inputting the target images in the target domain verification set into the trained semi-supervised semantic segmentation model to generate pseudo labels and verify the semantic segmentation performance of the network.

Compared with the prior art, the invention has the advantages that:

1) Existing semantic segmentation methods have two problems. First, most of them consider only labels with high confidence and ignore labels with low confidence, so incorrect pseudo labels are over-fitted and errors become irreversible, which causes confirmation bias. Second, most of them encode and decode images with full convolution: the resolution of the feature map is reduced during encoding, which means that some detail information is lost, so the model must be very powerful during decoding to restore the image information well, which in turn means a larger model and more computation. To solve these two problems, the invention provides an uncertain region selection strategy and an adaptive pseudo label correction strategy, and improves the classification performance of the semi-supervised semantic segmentation model and its generalization ability on the target domain.

2) The uncertain region selection strategy based on the information entropy and density clustering changes the traditional training mode of only using the pseudo label with high confidence coefficient as supervision, and the method not only uses the label with high confidence coefficient on the target image but also fully considers the label with low confidence coefficient, so that each pixel point on the target image can be fully utilized;

3) The self-adaptive pseudo label correction strategy provided by the invention, on the one hand, corrects the pseudo label of the target image online, prevents the semantic segmentation model from over-fitting incorrect pseudo labels during training, prevents errors from becoming irreversible, addresses the confirmation bias of the model toward the dominant classes during training, and improves the performance of the model; on the other hand, it uses bilinear interpolation to enlarge the low-resolution uncertain region image in equal proportion, which not only raises the resolution of the uncertain region image but also effectively augments the target images that contain hard-to-classify samples, avoids the loss of detail information between feature levels of the uncertain region image caused by down-sampling, and alleviates the class imbalance of the target domain training set;

4) While the pseudo label of the target image is corrected online, the enlarged secondary target image is used as input for a second classification prediction, and during decoding the upsampling is performed according to the size of the uncertain region, which reduces the amount of upsampling, lowers the demands on the coding model, reduces the amount of computation, and overcomes the shortcomings of full convolution.

Drawings

FIG. 1 is a flow chart of a semi-supervised semantic segmentation method based on adaptive pseudo-label correction according to the present invention.

FIG. 2 is a model diagram of the semi-supervised semantic segmentation method based on adaptive pseudo-label correction according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.

With reference to fig. 1-2, a semi-supervised semantic segmentation method based on adaptive pseudo-label correction includes the following steps:

Step 1, GTA5 is selected as the source domain data set; it contains 24966 images with corresponding labels, each 1914 × 1052 pixels. The Cityscapes urban street-scene data set is selected as the target domain data set; it contains 5000 images with corresponding labels (2975 for training, 500 for verification and 1525 for testing), each 1024 × 2048 pixels. Because the source domain and the target domain differ in resolution, normalization is required, and the pixel sizes are unified to 1024 × 512. The source domain and the target domain share 19 classes, so segmentation is performed over 19 classes. Let the source domain be denoted as

$$D_S = \{(S, Y_S) \mid S \in \mathbb{R}^{H \times W \times 3},\ Y_S \in \mathbb{R}^{H \times W}\}$$

where $S$ represents an image of the source domain, $Y_S$ represents the ground-truth label corresponding to $S$, $H$ represents the height of $S$, i.e. $H = 1024$, $W$ represents the width of $S$, i.e. $W = 512$, $H \times W$ represents the resolution of $S$ and also its total number of pixels, and the 3 in $H \times W \times 3$ represents the three RGB color channels. Let the target domain be denoted as $D_T = \{T \mid T \in \mathbb{R}^{H \times W \times 3}\}$, where $T$ represents a target image of the target domain, which has no corresponding semantic label. Go to step 2;

step 2, inputting the source domain image into a deep convolution neural network to train to obtain a pre-trained semantic segmentation model, which is specifically as follows:

The invention uses ResNet101-Deeplabv2 as the semantic segmentation model. ResNet101 serves as the backbone network for feature extraction. The 101-layer network is first divided into 5 convolutional layers, which act as the feature extractor, followed by a fully connected layer, which acts as the classifier. In the invention, the fully connected layer is discarded and only the preceding 5 convolutional layers are kept as the encoder to extract features; Deeplabv2 replaces the fully connected layer as the classifier. Deeplabv2 provides an atrous spatial pyramid pooling (ASPP) scheme, which applies parallel dilated convolutions at different rates to the input feature map and then fuses the results. ASPP helps to account for different object sizes, since objects of the same category may appear at different sizes in an image. In the invention, Deeplabv2 is used as the classifier to obtain the prediction probability matrix of the pixel points. The network has four branches, each composed of 3 fully connected layers but with a different dilation rate; the rates are [6, 12, 28, 24].

The images in the source domain are loaded as input to the ResNet101-Deeplabv2 semantic segmentation model. The ResNet101 encoder extracts the feature vector of the source domain image, which is input to the Deeplabv2 classifier to obtain the prediction probability matrix $P_S$, giving the probability that each pixel point of the source domain image belongs to each of the 19 classes. The channel index of the maximum prediction probability of each pixel point is taken as its class, which generates the source domain image pseudo label. The cross-entropy loss between the prediction probability matrix and the ground-truth label is used to optimize the segmentation performance of the semantic segmentation model, finally yielding the pre-trained semantic segmentation model, as shown in formula (1):

$$\mathcal{L}_S = -\sum_{i=1}^{H \times W} \sum_{c=1}^{C} Y_S^{(i,c)} \log P_S^{(i,c)} \tag{1}$$

where $\mathcal{L}_S$ represents the cross-entropy loss of the source domain image $S \in \mathbb{R}^{H \times W}$, $C$ represents the total number of classes, $Y_S^{(i,c)}$ represents the one-hot encoding of the $i$-th pixel of the ground-truth label $Y_S$, and $P_S^{(i,c)}$ represents the prediction probability that the $i$-th pixel of the source domain image belongs to class $c$ ($c \in C$).
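
The pixel-wise cross-entropy of formula (1) and the argmax rule for labels can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the probabilities are toy values, the loss is summed over pixels without normalization (the text does not say whether it is averaged), and the small `eps` guard is added for numerical safety.

```python
import math

def source_cross_entropy(probs, labels):
    """Sum over pixels of -log p(true class); probs[i] is pixel i's class distribution."""
    eps = 1e-12  # numerical guard against log(0); an assumption, not part of the text
    # With one-hot targets, only the true-class term of the inner sum over c survives.
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels))

def pseudo_label(probs):
    """Channel index of the maximum probability, per pixel."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]

probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # two pixels, three classes (toy values)
labels = [0, 1]                               # ground-truth class indices
loss = source_cross_entropy(probs, labels)
preds = pseudo_label(probs)
```

In a real pipeline the same computation would run over the full H × W × C probability matrix in a tensor library; the list-based form is used here only to keep the sketch self-contained.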

Step 3, inputting the target images in the training set in the target domain into a pre-trained semantic segmentation model to generate a corresponding prediction probability matrix of the target images, and constructing a selection strategy of an uncertain region by using an information entropy and density clustering algorithm to obtain the uncertain region in the prediction probability matrix of the target images, wherein the method specifically comprises the following steps:

Step 3.1, a target image $T \in \mathbb{R}^{H \times W}$ from the target domain training set is input into the pre-trained semantic segmentation model to generate the corresponding target image prediction probability matrix $P_T$. Information entropy is used to measure the dispersion of the prediction probability of each pixel point on the target image, and a pixel point is taken as an uncertain point when its entropy is smaller than the entropy threshold, as shown in formulas (2) and (3):

$$\mathcal{H}^{(i)} = -\sum_{c=1}^{C} P_T^{(i,c)} \log P_T^{(i,c)} \tag{2}$$

$$X_n = (x, y), \quad \text{if } \mathcal{H}^{(x,y)} < \gamma_t \tag{3}$$

where $\mathcal{H}^{(i)}$ represents the entropy map value of the $i$-th pixel point of the target image, $P_T^{(i,c)}$ represents the prediction probability of the $i$-th pixel point of the target image for class $c$, $X_n$ denotes the $n$-th uncertain point, $n \in \{1, 2, \ldots, N\}$, $N$ denotes the total number of uncertain points, $(x, y)$ denotes the coordinates of the uncertain point on the target image, and $\gamma_t$ represents the lowest information entropy threshold at the $t$-th iteration. $\gamma_t$ is set to the quantile corresponding to $\alpha_t$, i.e. $\gamma_t$ = np.percentile($\mathcal{H}$.flatten(), 100 × (1 − $\alpha_t$)), where $\mathcal{H}$ is the entropy map over all pixel points of the target image and $\alpha_t$ is the proportion of uncertain points selected, which is adjusted through a linear strategy as shown in formula (4):

$$\alpha_t = \alpha_0 \cdot \left(1 - \frac{iter}{total\_iter}\right) \tag{4}$$

where $\alpha_0$ represents the initially selected proportion of uncertain points, set to 20%, $iter$ represents the current iteration number, and $total\_iter$ represents the preset number of iterations.
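
The entropy map, the linear schedule for the proportion of uncertain points and the percentile threshold can be sketched as below. This is a hedged illustration: the function names are hypothetical, the linear schedule is an assumed reading of formula (4), and `percentile` is a small standard-library stand-in that mirrors the default linear interpolation of `np.percentile`, so that the sketch runs without NumPy.

```python
import math

def entropy(p):
    """Shannon entropy of one pixel's class-probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def alpha_at(iter_idx, total_iter, alpha0=0.20):
    """Assumed linear schedule: alpha0 at the start, decaying to 0 at total_iter."""
    return alpha0 * (1.0 - iter_idx / total_iter)

def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100] (stand-in for np.percentile)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def entropy_threshold(prob_map, iter_idx, total_iter):
    """Return the per-pixel entropy map and the threshold gamma_t for this iteration."""
    ent = [[entropy(p) for p in row] for row in prob_map]
    flat = [e for row in ent for e in row]
    gamma_t = percentile(flat, 100.0 * (1.0 - alpha_at(iter_idx, total_iter)))
    return ent, gamma_t

# Toy 2 x 2 image with two classes per pixel.
prob_map = [[[0.5, 0.5], [0.9, 0.1]],
            [[1.0, 0.0], [0.6, 0.4]]]
ent, gamma_t = entropy_threshold(prob_map, iter_idx=0, total_iter=100)
```

The uncertain points are then obtained by comparing each entry of `ent` with `gamma_t` as the description specifies.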

Step 3.2, density clustering groups points according to density, and density is in turn derived from the distance between pairs of points. Many clustering algorithms exist, such as K-means clustering and spectral clustering. K-means clustering requires the number of clusters to be specified in advance, which is unknown for the uncertain points to be clustered in the invention, so it is not applicable. Spectral clustering and density clustering can both determine the number of clusters automatically during clustering, but dividing uncertain regions from a spectral clustering result still requires the position of the center point and the cropping direction to be determined, whereas density clustering makes it easier to determine the center position and the cropped region. Most importantly, density clustering is resistant to noise, where noise refers to an object that does not belong to any cluster, so noise information can be removed while the uncertain points are clustered. The invention therefore uses density clustering to cluster the positions of the uncertain points.

Based on the selected uncertain points, the density clustering algorithm is used to search for the uncertain region $T_{un}$ on the prediction probability matrix $P_T$ of the target image. The sample set input to the density clustering algorithm is the set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$, together with the neighborhood parameters $(\varepsilon, M)$, where $\varepsilon$ is the neighborhood radius used by density clustering, the samples of the sample set whose distance from a core object is not greater than $\varepsilon$ form the $\varepsilon$-neighborhood, and $M$ is the minimum number of samples that the $\varepsilon$-neighborhood must contain. The output of the density clustering algorithm is the cluster division $A = \{A_1, A_2, \ldots, A_K\}$, where $A$ represents the set of all uncertain points divided into $K$ clusters and $A_K$ represents the $K$-th cluster. The $\varepsilon$-neighborhood is given by formula (5):

$$N_\varepsilon(X_j) = \{X_i \in D \mid \mathrm{dist}(X_i, X_j) \le \varepsilon\} \tag{5}$$

where $N_\varepsilon(X_j)$ represents the set of samples contained in the $\varepsilon$-neighborhood of $X_j$, $X_i$ and $X_j$ represent core objects, and $\mathrm{dist}(X_i, X_j)$ represents the distance between the two points. If $X_j$ lies in the $\varepsilon$-neighborhood of $X_i$ and $X_i$ is a core object, then $X_j$ is said to be directly density-reachable from $X_i$.

Given the neighborhood parameters $(\varepsilon, M)$, the density clustering algorithm first finds all core objects. It then arbitrarily selects one core object in the data set as a seed and, starting from it, generates a cluster from the samples that are density-reachable from the seed. For $X_j$ and $X_i$, if there exists a sample sequence $R_1, R_2, \ldots, R_Z$ with $R_1 = X_i$, $R_Z = X_j$ and each $R_{z+1}$ directly density-reachable from $R_z$, then $X_j$ is said to be density-reachable from $X_i$. The procedure repeats until all core objects have been visited, which completes the clustering.
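
The density clustering procedure described above matches the well-known DBSCAN scheme, and a minimal pure-Python sketch of it is given below. It is an illustration rather than the patented implementation: the function names, the Euclidean distance and the toy points are assumptions, and a production system would more likely use an optimized library routine.

```python
import math

def region_query(points, j, eps):
    """Indices of all points within eps of points[j] (its eps-neighborhood)."""
    return [i for i, p in enumerate(points) if math.dist(p, points[j]) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id 0..K-1, or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for j in range(len(points)):
        if labels[j] is not UNVISITED:
            continue
        neighbors = region_query(points, j, eps)
        if len(neighbors) < min_pts:
            labels[j] = NOISE          # not a core object; may later become a border point
            continue
        labels[j] = cluster            # j is a core object and seeds a new cluster
        queue = list(neighbors)
        while queue:
            i = queue.pop()
            if labels[i] == NOISE:
                labels[i] = cluster    # border point: reachable, but not expanded further
            if labels[i] is not UNVISITED:
                continue
            labels[i] = cluster
            reach = region_query(points, i, eps)
            if len(reach) >= min_pts:  # i is itself a core object, so keep expanding
                queue.extend(reach)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this toy input the two tight groups become clusters 0 and 1, while the isolated point is labeled as noise, which is the noise-removal property the description relies on.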

Step 3.3, according to the parameter requirements of the density clustering algorithm, the set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ is input as the sample set, the neighborhood parameters $(\varepsilon, M)$ are set, and the cluster division $A = \{A_1, A_2, \ldots, A_K\}$ is output. The cluster with the highest density among the $K$ clusters is selected, and its cluster center is taken as the center $X_o$ of the uncertain region, with the width $w$ set to $2\varepsilon$ and the height $h$ set to $4\varepsilon$. If the uncertain region exceeds the bounds of the target image, the cluster with the next-highest density among the $K$ clusters is used to select the uncertain region instead, and so on; if none of the $K$ clusters satisfies the condition, the next uncertain region selection strategy is entered.
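
The region selection rule of this step can be sketched as follows. Several details are assumptions made only for illustration, because the text does not define them: cluster density is approximated by the number of member points, the cluster center is taken as the coordinate mean, and `select_region` is a hypothetical name.

```python
def select_region(points, labels, eps, img_w, img_h):
    """Return (center, w, h) for the densest cluster whose box fits the image, else None."""
    clusters = {}
    for p, k in zip(points, labels):
        if k >= 0:                                   # skip noise points (label -1)
            clusters.setdefault(k, []).append(p)
    w, h = 2 * eps, 4 * eps                          # region width and height from the text
    # Assumption: rank clusters by member count as a proxy for density.
    for members in sorted(clusters.values(), key=len, reverse=True):
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        inside = (cx - w / 2 >= 0 and cx + w / 2 <= img_w and
                  cy - h / 2 >= 0 and cy + h / 2 <= img_h)
        if inside:
            return (cx, cy), w, h
    return None                                      # caller falls through to the next strategy

points = [(8, 8), (10, 10), (12, 12), (10, 8), (10, 12),
          (255, 512), (256, 512), (257, 512)]
labels = [0, 0, 0, 0, 0, 1, 1, 1]
region = select_region(points, labels, eps=32, img_w=512, img_h=1024)
```

Here the larger cluster sits too close to the image corner, so its box would leave the image and the second cluster is used instead.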

The input target image has size 1024 × 512 × 3 and is encoded layer by layer by the pre-trained semantic segmentation model; the feature vectors output by the successive convolutional layers have sizes 1024 × 512 × 3 → 512 × 256 × 64 → 256 × 128 × 64 → 256 × 128 × 256 → 128 × 64 × 512 → 128 × 64 × 1024 → 128 × 64 × 2048 → 128 × 64 × C. To recover the size of the input target image from the reduced feature vector and obtain the prediction probability matrix, decoding requires 8× upsampling. So that, during adaptive pseudo label correction, the enlarged image can be restored by upsampling directly to the size of the uncertain region image rather than to the size of the input secondary target image, the invention sets the neighborhood radius ε to 128, 64 and 32 respectively. Since the width w of the uncertain region is set to 2ε and the height h to 4ε, the size of the uncertain region can be 512 × 256, 256 × 128 or 128 × 64. Because the final downsampled size of the pre-trained semantic segmentation model is 128 × 64 × C, restoring the uncertain region image directly requires 4× upsampling, 2× upsampling and no upsampling respectively, which yields the secondary probability matrix and the secondary pseudo label.

According to the size of the uncertain region, the number of pixel points it contains is 131072, 32768 and 8192 respectively. A region of a given size is expected to contain at least half uncertain points in order to be called an uncertain region, so the minimum number of uncertain points contained in the ε-neighborhood is set to 65536, 16384 and 4096 respectively. As the semi-supervised semantic segmentation model is continuously optimized, if the uncertain points in the uncertain region number fewer than 4096, the model is considered well optimized and the uncertain region selection strategy is no longer used.
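
The size arithmetic in the two paragraphs above can be checked with a few lines of Python. The helper name `region_spec` and the dictionary keys are hypothetical; the numeric inputs (radii 128, 64 and 32, the 128 × 64 encoder output and the 1024 × 512 image) come from the text.

```python
def region_spec(eps, feat_h=128, feat_w=64, full_h=1024, full_w=512):
    """Derive region size, pixel count and scale factors from the neighborhood radius."""
    h, w = 4 * eps, 2 * eps                  # height 4*eps, width 2*eps
    return {
        "size": (h, w),
        "pixels": h * w,
        "min_pts": h * w // 2,               # at least half of the region must be uncertain
        "upsample": h // feat_h,             # decoder upsampling from the 128 x 64 features
        "magnify": full_h // h,              # enlargement needed to reach 1024 x 512
    }

specs = {eps: region_spec(eps) for eps in (128, 64, 32)}
for eps, spec in specs.items():
    print(eps, spec)
```

The product of the enlargement factor and the upsampling factor is 8 in every case, which equals the encoder's overall downsampling rate.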

In summary, the invention sets three groups of neighborhood parameters $(\varepsilon_1, M_1), (\varepsilon_2, M_2), \ldots, (\varepsilon_m, M_m)$ for the uncertain region selection strategy, namely (128, 65536), (64, 16384) and (32, 4096), which satisfy $\varepsilon_1 > \varepsilon_2 > \varepsilon_3$. The selection strategies of the uncertain region are as follows:

Strategy 1: when $N > M_1$, input the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_1, M_1)$, and output the uncertain region $T_{un} = [X_o, w, h]$.

Strategy 2: when $M_1 \ge N > M_2$, or when the previous strategy has passed on to the next selection strategy, input the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_2, M_2)$, and output the uncertain region $T_{un} = [X_o, w, h]$.

Strategy 3: when $M_2 \ge N > M_3$, or when the previous strategy has passed on to the next selection strategy, input the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_3, M_3)$, and output the uncertain region $T_{un} = [X_o, w, h]$.

When no strategy yields an uncertain region that meets the conditions, the uncertain region selection strategy is no longer used.
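
The three-level cascade can be sketched as a simple fall-through loop. The names `choose_strategies`, `select_with_fallback` and the `try_region` callback are hypothetical; the callback stands for the clustering and bounds check of one strategy and is assumed to return `None` when no cluster qualifies.

```python
PARAMS = [(128, 65536), (64, 16384), (32, 4096)]    # (eps, M) pairs from the text

def choose_strategies(n_uncertain, params=PARAMS):
    """Return the (eps, M) pairs to try, in order, for N uncertain points."""
    for idx, (_, m) in enumerate(params):
        if n_uncertain > m:
            # Start at the first strategy whose M is exceeded; later ones are fall-backs.
            return params[idx:]
    return []                                       # N <= M3: the strategy is not used

def select_with_fallback(n_uncertain, try_region, params=PARAMS):
    """Run the strategies in order until one yields a region."""
    for eps, m in choose_strategies(n_uncertain, params):
        region = try_region(eps, m)                 # hypothetical callback
        if region is not None:
            return region
    return None

# Toy callback that only succeeds for the smallest radius.
result = select_with_fallback(70000, lambda eps, m: ("box", eps) if eps == 32 else None)
```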

Step 4, constructing a self-adaptive pseudo label correction strategy to acquire a final pseudo label as the supervision of the target domain, and inputting source domain images and target domain images in the same batch to train a semi-supervised semantic segmentation model together, wherein the method specifically comprises the following steps:

Step 4.1, in semantic segmentation, most hard-to-classify objects are hard because their classes account for a small share of the data set and the objects occupy few pixels in the image. Such classes, for example person, motorcycle, bicycle and traffic light, may be called short-tail classes. To make full use of every pixel point of the target image and to address the short-tail classes, the invention provides an adaptive pseudo label correction strategy. On the one hand, the image is enlarged in equal proportion to 1024 × 512 by bilinear interpolation and can serve as a supplement to the target domain training set, which balances the share of the short-tail classes and alleviates class imbalance. On the other hand, the enlarged image is input into the pre-trained semantic segmentation model for encoding, which counters the loss of detail information during encoding, and decoding only needs to upsample the encoded feature vector to the size of the secondary target image before enlargement, which lowers the demands on the coding model and reduces the amount of computation.
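
Bilinear interpolation, which the strategy uses to enlarge the uncertain region image, can be illustrated on a single-channel grid as follows. This is a sketch under assumptions: the corner-aligned sampling convention is chosen for simplicity (the text does not state one), and `bilinear_resize` is a hypothetical helper rather than a library call.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D grid (list of rows) by bilinear interpolation with aligned corners."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for r in range(out_h):
        # Map the output row back to a fractional source row.
        sy = r * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for c in range(out_w):
            sx = c * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

patch = [[0.0, 2.0],
         [4.0, 6.0]]
enlarged = bilinear_resize(patch, 3, 3)
```

For an RGB region image the same routine would be applied to each channel; enlarging a 128 × 64 region to 1024 × 512 corresponds to a factor of 8 in both directions.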

The position on the target image corresponding to the uncertain region on the target image pseudo label is located and cropped to obtain the uncertain region image, which is enlarged in equal proportion to 1024 × 512 by bilinear interpolation to enhance its resolution; the magnification factor is $H/h$ (equal to $W/w$). The enlarged image is input into the pre-trained semantic segmentation model as the secondary target image, and encoding produces a feature vector of the secondary target image reduced by a factor of 8, of size 128 × 64 × C. According to the size $h \times w$ of the uncertain region, the feature vector is upsampled by a factor of $h/128$ to restore a feature vector of the same size as the uncertain region image, and softmax activation then yields the prediction probability matrix and the pseudo label of the uncertain region image, which serve as the secondary prediction probability matrix and the secondary pseudo label respectively, as shown in formulas (6) and (7):

$$P_{un} = \mathrm{softmax}(F_{un}) \tag{6}$$

$$\hat{Y}_{un} = \arg\max_{c \in C} P_{un}^{(c)} \tag{7}$$

where $P_{un}$ represents the secondary prediction probability matrix, $P_{un} \in \mathbb{R}^{h \times w}$, $F_{un}$ represents the feature vector of the secondary target image, $F_{un} \in \mathbb{R}^{h \times w \times C}$, and $\hat{Y}_{un}$ represents the secondary pseudo label.
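
The softmax activation and the argmax rule that turn the upsampled feature vector into a secondary probability matrix and a secondary pseudo label can be sketched as follows. The function names and the toy scores are assumptions for illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def secondary_labels(features):
    """features[y][x] is a length-C score vector; return (probability map, label map)."""
    probs = [[softmax(v) for v in row] for row in features]
    # The pseudo label is the channel index with the highest probability.
    labels = [[max(range(len(p)), key=p.__getitem__) for p in row] for row in probs]
    return probs, labels

features = [[[2.0, 0.0, 0.0], [0.0, 1.0, 3.0]]]   # toy 1 x 2 region with three classes
probs, labels = secondary_labels(features)
```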

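Step 4.2, the area surrounding the generated secondary pseudo label is zero-filled, and the filled pseudo label is used as the tertiary pseudo label $\hat{Y}_{pad} \in \mathbb{R}^{H \times W}$. The uncertain region $T_{un} = [X_o, w, h]$ is selected by the uncertain region selection strategy, and a mask $\mathrm{Mask} \in \mathbb{R}^{H \times W}$ is generated from the target image and the uncertain region, as shown in formula (8):

$$\mathrm{Mask}^{(x,y)} = \begin{cases} 1, & (x, y) \in T_{un} \\ 0, & \text{otherwise} \end{cases} \tag{8}$$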

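Step 4.3, the target image pseudo label and the secondary pseudo label are fused to form the final pseudo label of the target image, as shown in formula (9):

$$\hat{Y}_{final} = \mathrm{Mask} \odot \hat{Y}_{pad} + (1 - \mathrm{Mask}) \odot \hat{Y}_T \tag{9}$$

where $\hat{Y}_{final}$ represents the final pseudo label of the target image, $\hat{Y}_T$ represents the target image pseudo label, $\hat{Y}_{pad}$ represents the zero-filled secondary pseudo label, and Mask equals 1 inside the uncertain region and 0 elsewhere.

The final pseudo label is used as supervision, as shown in formula (10):

$$\mathcal{L}_T = -\sum_{i=1}^{H \times W} \sum_{c=1}^{C} \hat{Y}_{final}^{(i,c)} \log P_T^{(i,c)} \tag{10}$$

where $\hat{Y}_{final}^{(i,c)}$ represents the one-hot encoding of the $i$-th pixel of the final pseudo label and $P_T^{(i,c)}$ represents the prediction probability that the $i$-th pixel of the target image belongs to class $c$.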
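
The zero filling, mask generation and fusion of steps 4.2 and 4.3 can be sketched together as below. The sketch assumes that the fusion keeps the secondary pseudo label inside the uncertain region and the original target pseudo label elsewhere; addressing the region by its top-left corner (rather than by its center $X_o$) and the name `fuse_pseudo_labels` are illustrative choices.

```python
def fuse_pseudo_labels(target_labels, secondary, top, left):
    """Paste the secondary pseudo label into the target pseudo label.

    target_labels: H x W grid; secondary: h x w grid; (top, left): region's top-left corner.
    """
    H, W = len(target_labels), len(target_labels[0])
    h, w = len(secondary), len(secondary[0])
    padded = [[0] * W for _ in range(H)]     # zero-filled copy of the secondary label
    mask = [[0] * W for _ in range(H)]       # 1 inside the uncertain region, 0 elsewhere
    for r in range(h):
        for c in range(w):
            padded[top + r][left + c] = secondary[r][c]
            mask[top + r][left + c] = 1
    # Element-wise fusion: mask * padded + (1 - mask) * target.
    fused = [[mask[r][c] * padded[r][c] + (1 - mask[r][c]) * target_labels[r][c]
              for c in range(W)] for r in range(H)]
    return fused, mask

target = [[7] * 4 for _ in range(3)]         # toy 3 x 4 target pseudo label
secondary = [[1, 2]]                         # toy 1 x 2 secondary pseudo label
fused, mask = fuse_pseudo_labels(target, secondary, top=1, left=1)
```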

In the process of training the semi-supervised semantic segmentation model, the batch sizes of the input source domain images and target domain images are both 1, and the model parameters are optimized by self-training. The overall loss function $\mathcal{L}$ of the semi-supervised semantic segmentation model with adaptive pseudo label correction is shown in formula (11):

$$\mathcal{L} = \mathcal{L}_S + \lambda_T \mathcal{L}_T \tag{11}$$

where $\mathcal{L}_S$ is the source domain loss, $\mathcal{L}_T$ is the target domain loss, and $\lambda_T$ is the weight of the target loss. During training of the semi-supervised semantic segmentation model, the number of uncertain points becomes smaller and smaller, i.e. the share of pixels whose entropy is smaller than the threshold $\gamma_t$ shrinks, and the target domain is therefore expected to have a growing influence on the model. The weight is defined as the inverse of the percentage of pixels in the current batch whose entropy is smaller than the threshold $\gamma_t$, as shown in formula (12):

$$\lambda_T = \frac{|B_T| \times H \times W}{\sum_{n=1}^{|B_T| \times H \times W} \mathbb{1}[X_n]} \tag{12}$$

where $|B_T|$ represents the number of images input in the current batch, set to 1, and $\mathbb{1}[\cdot]$ is the indicator function, which equals 1 if $X_n$ is an uncertain point and 0 otherwise.
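
The target loss weight and the overall loss can be sketched as follows. The weight is read as the total pixel count of the batch divided by the number of pixels for which the indicator equals 1; returning 0 when no pixel is flagged is an added assumption to avoid division by zero, and the function names are hypothetical.

```python
def target_weight(n_flagged, batch_size, height, width):
    """Inverse of the fraction of pixels for which the indicator function equals 1."""
    if n_flagged == 0:
        return 0.0        # assumption: drop the target term rather than divide by zero
    return batch_size * height * width / n_flagged

def total_loss(loss_source, loss_target, lam_t):
    """Overall loss: source loss plus weighted target loss."""
    return loss_source + lam_t * loss_target

lam_t = target_weight(n_flagged=131072, batch_size=1, height=1024, width=512)
loss = total_loss(loss_source=0.5, loss_target=0.25, lam_t=lam_t)
```

With one 1024 × 512 image and 131072 flagged pixels, the flagged fraction is one quarter, so the weight is 4.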

In the process of training the semi-supervised semantic segmentation model, the optimizer is SGD with a weight decay of 0.0005, and the poly learning rate schedule is used, with the decay rule

$$lr = base\_lr \times \left(1 - \frac{iter}{total\_iter}\right)^{power}$$

where base_lr is the initial learning rate, set to 0.001, iter is the current iteration number, total_iter is the maximum iteration number, set to 2k, and power, which controls the learning rate decay, is set to 0.9.
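
The poly learning rate schedule is the standard polynomial decay and can be written in a few lines. The function name is hypothetical, and the value 2000 used for the maximum iteration number is an assumed reading of the "2k" stated in the text.

```python
def poly_lr(base_lr, it, total_iter, power=0.9):
    """Polynomial decay: base_lr at iteration 0, falling to 0 at total_iter."""
    return base_lr * (1.0 - it / total_iter) ** power

# Learning rate at the start, midpoint and end of training.
schedule = [poly_lr(0.001, it, 2000) for it in (0, 1000, 2000)]
```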

Step 5, the 500 target images in the target domain verification set are input into the trained semi-supervised semantic segmentation model to generate pseudo labels. Because the target images in the target domain verification set have ground-truth labels, the mean intersection over union (mIoU) between the generated pseudo labels and the ground-truth labels of the images is calculated to verify the segmentation performance of the semi-supervised semantic segmentation model.
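
The mean intersection over union used for verification can be sketched as follows. The `ignore_index` value of 255 is an assumption borrowed from common Cityscapes practice and is not stated in the text; classes that appear in neither the prediction nor the ground truth are skipped when averaging.

```python
def mean_iou(pred, truth, num_classes, ignore_index=255):
    """Mean intersection over union between two label grids of equal size."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if t == ignore_index:       # assumed convention for unlabeled pixels
                continue
            if p == t:
                inter[t] += 1
                union[t] += 1
            else:
                union[p] += 1           # counts toward the predicted class
                union[t] += 1           # and toward the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0

pred = [[0, 0],
        [1, 1]]
truth = [[0, 1],
         [1, 1]]
miou = mean_iou(pred, truth, num_classes=2)
```

In this toy case class 0 has an IoU of 1/2 and class 1 an IoU of 2/3, giving a mean of 7/12.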

Claims

1. A semi-supervised semantic segmentation method based on self-adaptive pseudo label correction is characterized by comprising the following steps of:

step 1, selecting a GTA5 data set to construct a source domain, selecting a Cityscapes data set to construct a target domain, dividing target images in the target domain into a training set and a verification set, and turning to step 2;

step 2, inputting the image in the source domain into a deep convolutional neural network for training to obtain a pre-trained semantic segmentation model, and turning to step 3;

step 3, inputting the target images in the training set of the target domain into a pre-trained semantic segmentation model to generate a prediction probability matrix of the corresponding target images, constructing a selection strategy of an uncertain region by using an information entropy and density clustering algorithm, acquiring the uncertain region in the prediction probability matrix of the target images, and turning to step 4;

step 4, locating on the target image the position corresponding to the uncertain region of the target image prediction probability matrix, cropping that position from the target image to obtain an uncertain region image, enlarging the uncertain region image to serve as a secondary target image, inputting the secondary target image into the pre-trained semantic segmentation model, sampling according to the size of the uncertain region to generate a secondary pseudo label that is fused with the target image pseudo label, constructing an adaptive pseudo label correction strategy to obtain a final pseudo label used as the supervision of the target image, and inputting the source domain images and the target domain images in the same batch to jointly train a semi-supervised semantic segmentation model; when the preset number of training iterations is reached, obtaining the trained semi-supervised semantic segmentation model, and turning to step 5;

2. The semi-supervised semantic segmentation method based on adaptive pseudo-label correction according to claim 1, wherein in step 2, the images in the source domain are input into a deep convolutional neural network for training to obtain a pre-trained semantic segmentation model, as shown in formula (1):

where $S \in \mathbb{R}^{H \times W}$ denotes the source domain image, $H$ denotes the height of the source domain image, $W$ denotes its width, $H \times W$ denotes the total number of pixels in the source domain image, and $C$ denotes the total number of classes; the label term is the one-hot encoding of the $i$-th pixel of the ground-truth label; and the prediction term is the predicted probability that the $i$-th pixel of the source domain image belongs to class $c$, $c \in C$.
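The definitions in claim 2 match a standard pixel-wise cross-entropy loss. One form consistent with them is sketched below; the symbol names $\mathcal{L}_S$, $Y_S^{(i,c)}$ and $P_S^{(i,c)}$ are assumptions introduced for illustration, and whether the original formula (1) also normalizes by $H \times W$ cannot be determined from the text.

```latex
\mathcal{L}_{S} \;=\; -\sum_{i=1}^{H \times W} \sum_{c=1}^{C} Y_{S}^{(i,c)} \,\log P_{S}^{(i,c)}
```

Here $Y_S^{(i,c)}$ stands for the one-hot ground-truth entry of pixel $i$ for class $c$, and $P_S^{(i,c)}$ for the predicted probability of that class at that pixel.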

3. The semi-supervised semantic segmentation method based on adaptive pseudo-label correction according to claim 2, wherein in step 3, the selection strategy for the uncertain region is constructed by using an information entropy and density clustering algorithm, specifically as follows:

step 3.1, inputting the target image $T \in \mathbb{R}^{H \times W}$ from the target domain training set into the pre-trained semantic segmentation model to generate the prediction probability matrix of the corresponding target image

where the entropy term denotes the entropy map value of the $i$-th pixel of the target image; the probability term denotes the prediction probability matrix of the $i$-th pixel of the target image; $X_n$ denotes the $n$-th uncertain point, $n \in \{1, 2, \ldots, N\}$, with $N$ the total number of uncertain points; $(x, y)$ denotes the coordinate position of the uncertain point on the target image; $\gamma_t$ denotes the lowest information-entropy threshold at the $t$-th iteration and is set to the quantile corresponding to $\alpha_t$, i.e. γ_t = np.percentile(H(·).flatten(), 100 × (1 − α_t)), where H(·) is the entropy map over all pixels of the target image and $\alpha_t$ is the proportion of uncertain points selected, which is adjusted by a linear strategy as shown in formula (4):
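As a rough illustration of step 3.1 (not of formula (4) itself), the entropy map and the percentile threshold can be sketched as follows. The function names, the Shannon-entropy form of the entropy map, the `(H, W, C)` array layout and the strict `>` comparison are all assumptions; only the `np.percentile` expression comes from the text.

```python
import numpy as np


def entropy_map(prob: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-pixel Shannon entropy of a (H, W, C) array of class probabilities."""
    return -(prob * np.log(prob + eps)).sum(axis=-1)


def select_uncertain_points(prob: np.ndarray, alpha_t: float):
    """Return gamma_t and the (x, y) coordinates of pixels with entropy above it."""
    ent = entropy_map(prob)
    # gamma_t is the (1 - alpha_t) quantile, so roughly a fraction alpha_t
    # of the pixels ends up above the threshold.
    gamma_t = np.percentile(ent.flatten(), 100 * (1 - alpha_t))
    ys, xs = np.nonzero(ent > gamma_t)
    return gamma_t, np.stack([xs, ys], axis=1)
```

The returned coordinate array plays the role of the sample set $D$ of uncertain points that the density clustering step consumes.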

step 3.2, based on the selected uncertain points, searching for the uncertain region $T_{un}$ on the prediction probability matrix of the target image by using a density clustering algorithm; the sample set input to the density clustering algorithm is the set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the input neighborhood parameters are $(\varepsilon, M)$, where $\varepsilon$ is the radius used by the density clustering, the samples in the sample set whose distance to a core object is not more than $\varepsilon$ form its $\varepsilon$-neighborhood, and $M$ is the minimum number of samples that the $\varepsilon$-neighborhood must contain; the output of the density clustering algorithm is the cluster division $A = \{A_1, A_2, \ldots, A_K\}$, where $A$ denotes the set of all uncertain points divided into $K$ clusters and $A_K$ denotes the $K$-th cluster, as shown in formula (5):

$$N_\varepsilon(X_j) = \{X_i \in D \mid \mathrm{dist}(X_i, X_j) \le \varepsilon\} \tag{5}$$

where $N_\varepsilon(X_j)$ denotes the set of samples contained in the $\varepsilon$-neighborhood of $X_j$; $X_i$ and $X_j$ denote core objects, related in that $X_j$ is directly density-reachable from $X_i$: if $X_j$ lies in the $\varepsilon$-neighborhood of $X_i$ and $X_i$ is also a core object, then $X_j$ is said to be directly density-reachable from $X_i$; and $\mathrm{dist}(X_i, X_j)$ denotes the distance between the two core points;

the density clustering algorithm finds all core objects according to the given neighborhood parameters $(\varepsilon, M)$; it first randomly selects one core object in the data set as a seed and then, starting from that core object, finds the density-reachable samples to generate a cluster; for $X_j$ and $X_i$, if there exists a sample sequence $R_1, R_2, \ldots, R_Z$ with $R_1 = X_j$, $R_Z = X_i$ and $R_{i+1}$ directly density-reachable from $R_i$, then $X_j$ is said to be density-reachable from $X_i$; clustering ends once all core objects have been visited;
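The clustering procedure described in step 3.2 follows the familiar DBSCAN pattern. A minimal NumPy sketch is given below under stated assumptions: the function name `density_cluster` is hypothetical, the distance is Euclidean, a point counts as a member of its own ε-neighborhood, and seeds are visited in index order rather than at random.

```python
import numpy as np


def density_cluster(points: np.ndarray, eps: float, min_samples: int) -> np.ndarray:
    """Label each point with a cluster id 0..K-1, or -1 if it joins no cluster."""
    n = len(points)
    # Pairwise Euclidean distances; quadratic in N, acceptable for a sketch.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbours = [np.nonzero(dist[j] <= eps)[0] for j in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbours], dtype=bool)
    labels = np.full(n, -1)
    cluster = 0
    for seed in range(n):
        if not is_core[seed] or labels[seed] != -1:
            continue
        # Grow one cluster from this core object through density-reachability.
        labels[seed] = cluster
        frontier = [seed]
        while frontier:
            j = frontier.pop()
            for k in neighbours[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    if is_core[k]:  # only core objects extend the cluster
                        frontier.append(k)
        cluster += 1
    return labels
```

Here `eps` and `min_samples` correspond to the neighborhood parameters $(\varepsilon, M)$, and the distinct non-negative labels correspond to the clusters $A_1, \ldots, A_K$.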

step 3.3, according to the parameter requirements of the density clustering algorithm, inputting the set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ as the sample set and setting the neighborhood parameters $(\varepsilon, M)$, the output being the cluster division $A = \{A_1, A_2, \ldots, A_K\}$; selecting the cluster with the highest density among the $K$ clusters and taking its cluster center as the center $X_o$ of the uncertain region, with the width $w$ set to $2\varepsilon$ and the height $h$ set to $4\varepsilon$; if the uncertain region exceeds the range of the target image, selecting the uncertain region by using the cluster with the next-highest density among the $K$ clusters, and so on; when none of the $K$ clusters satisfies the condition, entering the next uncertain region selection strategy; the uncertain region selection strategy can therefore set multiple groups of neighborhood parameters, namely $(\varepsilon_1, M_1), (\varepsilon_2, M_2), \ldots, (\varepsilon_m, M_m)$, which must satisfy $\varepsilon_1 > \varepsilon_2 > \ldots > \varepsilon_m$; the selection strategies of the uncertain region are respectively as follows:

strategy 1: when $N > M_1$, inputting the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_1, M_1)$, and outputting the uncertain region $T_{un} = [X_o, w, h]$;

strategy 2: when $M_1 \ge N > M_2$, or when this strategy is entered as the next selection strategy, inputting the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_2, M_2)$, and outputting the uncertain region $T_{un} = [X_o, w, h]$;

......

strategy m: when $M_{m-1} \ge N > M_m$, or when this strategy is entered as the next selection strategy, inputting the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_m, M_m)$, and outputting the uncertain region $T_{un} = [X_o, w, h]$;

when the uncertain region satisfies none of the strategies, the iterative training is stopped.
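The cascade of selection strategies in step 3.3 can be sketched as a loop over the parameter groups. Several choices here are assumptions not fixed by the text: the function names, measuring a cluster's density by its member count, taking the cluster center as the mean of its members, skipping any group whose $M$ is not exceeded by $N$, and passing the clustering routine in as the hypothetical callable `cluster_fn(points, eps, m)` that returns one label per point (negative for unclustered points).

```python
import numpy as np


def region_from_clusters(points, labels, eps, img_w, img_h):
    """Try clusters from most to least populated; return (center, w, h) or None."""
    w, h = 2 * eps, 4 * eps
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    for cid in ids[np.argsort(-counts)]:
        cx, cy = points[labels == cid].mean(axis=0)
        # The w x h box centred on the cluster must lie inside the image.
        inside = (cx - w / 2 >= 0 and cx + w / 2 <= img_w
                  and cy - h / 2 >= 0 and cy + h / 2 <= img_h)
        if inside:
            return (float(cx), float(cy)), w, h
    return None


def select_uncertain_region(points, params, cluster_fn, img_w, img_h):
    """Walk the groups (eps_1, M_1), ..., (eps_m, M_m), with eps decreasing."""
    n = len(points)
    for eps, m in params:
        if n <= m:
            continue  # each strategy requires N > M
        labels = cluster_fn(points, eps, m)
        region = region_from_clusters(points, labels, eps, img_w, img_h)
        if region is not None:
            return region
    return None  # no strategy applies; the text stops iterative training here
```

Because the box size is tied to $\varepsilon$ ($w = 2\varepsilon$, $h = 4\varepsilon$), falling through to a later group with a smaller $\varepsilon$ also shrinks the candidate region, which makes it more likely to fit inside the image.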

4. The semi-supervised semantic segmentation method based on adaptive pseudo-label correction according to claim 3, wherein in the step 4, the adaptive pseudo-label correction strategy is constructed by the following steps:

step 4.1, locating on the target image the position corresponding to the uncertain region of the target image prediction probability matrix, cropping that position from the target image to obtain an uncertain region image, and enlarging the uncertain region image proportionally by bilinear interpolation to increase its resolution, where the magnification factor is

inputting the enlarged image as a secondary target image into the pre-trained semantic segmentation network, obtaining in the encoding process a feature vector of the secondary target image downsampled by a factor of 8, with size 128 × 64 × C, and upsampling it according to the size $h \times w$ of the uncertain region

to recover a feature vector with the same size as the uncertain region image; after softmax activation, the prediction probability matrix and the pseudo label of the uncertain region image are obtained and serve as the secondary prediction probability matrix and the secondary pseudo label respectively, as shown in formulas (6) and (7):

where $P_{un} \in \mathbb{R}^{h \times w}$ denotes the secondary prediction probability matrix, $F_{un} \in \mathbb{R}^{h \times w \times C}$ denotes the feature vector of the secondary target image, and the label term denotes the secondary pseudo label;

the uncertain region $T_{un} = [X_o, w, h]$ is selected by the uncertain region selection strategy, and a mask combining the target image and the uncertain region, $\mathrm{Mask} \in \mathbb{R}^{H \times W}$, is generated as shown in formula (8):

where the two label terms denote, respectively, the final pseudo label of the target image and the target image pseudo label;
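The mask-based fusion of the secondary pseudo label with the target image pseudo label can be sketched as follows. This is an illustration only: the function name is hypothetical, the region is assumed to lie fully inside the image with integer `w` and `h`, and hard replacement inside the mask is an assumption, since the fusion formula itself is not reproduced in this text.

```python
import numpy as np


def fuse_pseudo_labels(label: np.ndarray, secondary: np.ndarray,
                       center, w: int, h: int):
    """Overwrite the uncertain region of `label` with `secondary` of shape (h, w)."""
    cx, cy = center
    # Top-left corner of the w x h box centred on X_o = (cx, cy).
    x0, y0 = int(round(cx - w / 2)), int(round(cy - h / 2))
    mask = np.zeros(label.shape, dtype=bool)
    mask[y0:y0 + h, x0:x0 + w] = True
    fused = label.copy()  # the original pseudo label is left untouched
    fused[y0:y0 + h, x0:x0 + w] = secondary
    return fused, mask
```

Outside the mask the fused label equals the original target image pseudo label; inside it, the refined prediction from the enlarged crop takes over.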

the final pseudo label is used as supervision, as shown in formula (10):

where the loss term denotes the cross-entropy loss of the target image $T \in \mathbb{R}^{H \times W}$, the label term denotes the one-hot encoding of the $i$-th pixel of the final pseudo label, and the prediction term denotes the predicted probability that the $i$-th pixel of the target image belongs to class $c$;

in the process of training the semi-supervised semantic segmentation model, the source domain images and the target domain images are input in the same batch and the model parameters are optimized by self-training, so that the overall loss function of the semi-supervised semantic segmentation model with adaptive pseudo-label correction is as shown in formula (11):

where $\lambda_T$ is the weight of the target loss, defined as the inverse of the percentage of pixels in the current training batch whose entropy is less than the threshold $\gamma_t$, as shown in formula (12):

where $|B_T|$ denotes the number of images input in the current batch,