CN115761735A - Semi-supervised semantic segmentation method based on self-adaptive pseudo label correction - Google Patents

Info

Publication number
CN115761735A
Authority
CN
China
Prior art keywords
image
semantic segmentation
target image
uncertain
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211432700.9A
Other languages
Chinese (zh)
Inventor
王军
杨宇宇
潘在宇
李玉莲
申政文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN202211432700.9A priority Critical patent/CN115761735A/en
Publication of CN115761735A publication Critical patent/CN115761735A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

A semi-supervised semantic segmentation method based on self-adaptive pseudo label correction is disclosed, which comprises the following steps: selecting the GTA5 data set to construct a source domain and the Cityscapes data set to construct a target domain; inputting the source domain images into a deep convolutional neural network for training to obtain a pre-trained semantic segmentation model; constructing a selection strategy for uncertain regions by using information entropy and a density clustering algorithm, based on the prediction probability matrix generated for a target image; constructing a self-adaptive pseudo label correction strategy to obtain a final pseudo label as supervision, and training a semi-supervised semantic segmentation model; and inputting the target images in the target domain verification set into the trained semi-supervised semantic segmentation model to verify the semantic segmentation performance. The invention updates the pseudo labels online, addresses the problem of confirmation bias, alleviates class imbalance, overcomes a drawback of full convolution, and improves the semantic segmentation performance of the model on the target domain.

Description

Semi-supervised semantic segmentation method based on self-adaptive pseudo label correction
Technical Field
The invention belongs to the field of self-supervised domain-adaptive semantic segmentation, and particularly relates to a semi-supervised semantic segmentation method based on self-adaptive pseudo label correction.
Background
Semantic segmentation aims to assign a semantic label to each pixel in an image, and is widely applied in the real world, for example in automatic driving, robot operation and medical analysis. However, learning a segmentation model relies heavily on large amounts of data with pixel-level annotations, and manual annotation is time consuming and costly. Furthermore, the ability of a model to generalize across different data is also a significant challenge. Various research works have been carried out to solve these problems, and domain adaptation is a promising approach.
Recently, domain adaptation has been advanced by self-supervised training, which uses pseudo labels generated from target domain predictions as supervision to train the network. For example, Cheng et al. propose a domain-adaptive semantic segmentation method based on dual path learning, which aligns the source domain and the target domain through two complementary and interactive single-domain adaptation pipelines so that the target domain generates more reliable pseudo labels, and improves the performance of the semantic segmentation network through self-supervised training (Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Fang Wen, and Wenjiang Zhang. In ICCV, 9082-9091, 2021). Zheng et al. propose a domain-adaptive semantic segmentation method that rectifies pseudo label learning via uncertainty estimation, modeling uncertainty through the prediction variance and incorporating it into the optimization objective to improve semantic segmentation performance (Zhengdong Zheng and Yi Yang). However, these semantic segmentation models gradually generate pseudo labels biased toward the dominant classes during training. Most current adaptive models pay more attention to high-confidence pseudo labels and discard low-confidence ones, which makes errors irreversible: the semantic segmentation network may never learn some pixels during the whole self-supervised training process, resulting in confirmation bias.
To make full use of the unlabeled target image data, each pixel should be properly utilized. Wang et al. propose a semi-supervised semantic segmentation method using unreliable pseudo labels, which separates reliable and unreliable pixels by prediction entropy, pushes each unreliable pixel into a class queue consisting of negative examples, and tries to train the model with all candidate pixels (Yuchao Wang, Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, Xinyi Le. Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels. In CVPR, 4248-4257, 2022). Although this method makes full use of unlabeled data, it does not consider the influence of down-sampling and up-sampling on detail information between image feature levels, and the computational cost of its contrastive learning is very large. Therefore, semi-supervised domain-adaptive semantic segmentation algorithms need further research, and their performance needs to be improved.
Disclosure of Invention
The invention aims to provide a semi-supervised semantic segmentation method based on adaptive pseudo label correction. Based on the prediction probability matrix generated for a target image, the method corrects the pseudo label of the target image online by constructing an uncertain region selection strategy, which uses information entropy and a density clustering algorithm, and an adaptive pseudo label correction strategy. By correcting the target image pseudo labels online, the invention makes full use of the pixels of the unlabeled target image data, prevents overfitting to incorrect pseudo labels, and addresses the confirmation bias of the semantic segmentation model toward the dominant classes during training. The resolution of the uncertain region image is increased and the region is classified again, which takes the loss of detail information in the target image into account, alleviates class imbalance, overcomes a drawback of full convolution, and improves both the classification performance of the semi-supervised semantic segmentation model and its generalization capability on the target domain.
The technical solution for realizing the purpose of the invention is as follows: a semi-supervised semantic segmentation method based on self-adaptive pseudo-label correction comprises the following steps:
Step 1: selecting the GTA5 data set to construct a source domain, selecting the Cityscapes data set to construct a target domain, dividing the target images in the target domain into a training set and a verification set, and going to step 2.
Step 2: inputting the images in the source domain into a deep convolutional neural network for training to obtain a pre-trained semantic segmentation model, and going to step 3.
Step 3: inputting the target images in the training set of the target domain into the pre-trained semantic segmentation model to generate the corresponding prediction probability matrix of each target image, constructing a selection strategy for uncertain regions by using information entropy and a density clustering algorithm, acquiring the uncertain region in the prediction probability matrix of the target image, and going to step 4.
Step 4: finding the position on the target image that corresponds to the uncertain region in the target image prediction probability matrix, cropping that position to obtain an uncertain region image, enlarging the uncertain region image to serve as a secondary target image, inputting the secondary target image into the pre-trained semantic segmentation model, upsampling according to the size of the uncertain region to generate a secondary pseudo label that is fused with the target image pseudo label, constructing an adaptive pseudo label correction strategy to obtain a final pseudo label used as supervision for the target image, and inputting source domain images and target domain images in the same batch to jointly train a semi-supervised semantic segmentation model; when the preset number of training iterations is reached, the trained semi-supervised semantic segmentation model is obtained, and the method goes to step 5.
Step 5: inputting the target images in the target domain verification set into the trained semi-supervised semantic segmentation model to generate pseudo labels and verify the semantic segmentation performance of the network.
Compared with the prior art, the invention has the advantages that:
1) Existing semantic segmentation methods have two problems. First, most of them consider only high-confidence labels and ignore low-confidence labels, which leads to overfitting to incorrect pseudo labels and makes errors irreversible, thereby causing confirmation bias. Second, most of them use full convolution to encode and decode images; the resolution of the feature map is reduced during encoding, so some detail information is lost, and the model must be very powerful in the decoding process to restore the image information well, which means a larger model and more computation. To solve these two problems, the invention provides an uncertain region selection strategy and an adaptive pseudo label correction strategy, which improve the classification performance of the semi-supervised semantic segmentation model and its generalization capability on the target domain.
2) The uncertain region selection strategy based on information entropy and density clustering changes the traditional training mode in which only high-confidence pseudo labels are used as supervision. The method uses the high-confidence labels on the target image and also fully considers the low-confidence labels, so that every pixel of the target image is utilized.
3) The adaptive pseudo label correction strategy has two benefits. On the one hand, it corrects the pseudo label of the target image online, prevents the semantic segmentation model from overfitting to incorrect pseudo labels during training, prevents errors from becoming irreversible, addresses the confirmation bias of the model toward the dominant classes during training, and improves the performance of the model. On the other hand, it uses bilinear interpolation to enlarge the low-resolution uncertain region image in equal proportion, which increases the resolution of the uncertain region image, effectively augments the target images with hard-to-classify samples, avoids the loss of detail information between feature levels of the uncertain region image caused by down-sampling, and alleviates the class imbalance of the target domain training set.
4) When the pseudo label of the target image is corrected online, the enlarged secondary target image is used as input for a second classification prediction, and upsampling in the decoding process follows the size of the uncertain region. This reduces the number of upsampling operations, lowers the requirement on the coding model, reduces the amount of computation, and overcomes a drawback of full convolution.
Drawings
FIG. 1 is a flow chart of a semi-supervised semantic segmentation method based on adaptive pseudo-label correction according to the present invention.
FIG. 2 is a model diagram of the semi-supervised semantic segmentation method based on adaptive pseudo-label correction according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
With reference to fig. 1-2, a semi-supervised semantic segmentation method based on adaptive pseudo-label correction includes the following steps:
Step 1: GTA5 is selected as the source domain data set; it contains 24966 images with corresponding labels, each of 1914 × 1052 pixels. The Cityscapes data set is selected as the target domain data set; it contains 5000 images with corresponding labels, of which 2975 form the training set, 500 the verification set and 1525 the test set, each of 1024 × 2048 pixels. Because the resolutions of the source domain and the target domain differ, normalization is required, and the pixel sizes are unified to 1024 × 512. The source domain and the target domain share 19 classes, so pixels are finally classified into 19 classes. Let the source domain be denoted as
$D_S = \{(S, Y_S) \mid S \in \mathbb{R}^{H \times W \times 3},\ Y_S \in \mathbb{R}^{H \times W}\}$
where $S$ denotes an image of the source domain, $Y_S$ denotes the ground truth label corresponding to $S$, $H$ denotes the height of $S$, i.e., $H = 1024$, $W$ denotes the width of $S$, i.e., $W = 512$, $H \times W$ denotes the resolution of $S$ and also its total number of pixels, and the 3 in $H \times W \times 3$ denotes the three RGB color channels. Let the target domain be denoted as $D_T = \{T \mid T \in \mathbb{R}^{H \times W \times 3}\}$, where $T$ denotes a target image of the target domain, which has no corresponding semantic label. Go to step 2.
Step 2: the source domain images are input into a deep convolutional neural network for training to obtain a pre-trained semantic segmentation model, specifically as follows:
The invention uses ResNet101-DeepLabv2 as the semantic segmentation model, with ResNet101 as the backbone network for feature extraction. The original ResNet101 organizes its layers into 5 convolutional stages, which act as a feature extractor, followed by a fully connected layer that acts as a classifier. In the invention, the fully connected layer is discarded and only the 5 convolutional stages are kept as an encoder to extract features; DeepLabv2 replaces the fully connected layer as the classifier. DeepLabv2 provides an atrous spatial pyramid pooling (ASPP) scheme that applies parallel dilated convolutions with different rates to the input feature map and then fuses the results. ASPP helps to account for different object sizes, since objects of the same class may appear at different sizes in an image. In the invention, DeepLabv2 is used as the classifier to obtain the prediction probability matrix of the pixels; the network has four branches, each composed of 3 fully connected layers but with a different dilation rate, the rates being [6, 12, 18, 24].
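The idea of parallel dilated convolutions can be illustrated in one dimension. The following is a minimal sketch, not the patent's network: the function names, the toy kernel, the toy rates and the choice to fuse branches by summation are all assumptions made for illustration.

```python
def dilated_conv1d(signal, kernel, rate):
    """Zero-padded 1-D convolution whose kernel taps are `rate` samples apart."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * rate  # position of the k-th tap
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        out.append(acc)
    return out


def aspp_1d(signal, kernel, rates):
    """Run one dilated branch per rate and fuse the branches by summation (assumed)."""
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(values) for values in zip(*branches)]


# Toy example: a single impulse is "seen" at several distances at once.
impulse = [0.0] * 9
impulse[4] = 1.0
fused = aspp_1d(impulse, kernel=[1.0, 1.0, 1.0], rates=(1, 2, 4))
```

Each branch responds to context at a different distance from the impulse, which is why larger rates help with larger objects.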
The images in the source domain are loaded as input to the ResNet101-DeepLabv2 semantic segmentation model. The ResNet101 encoder extracts the feature vector of the source domain image, which is input into the DeepLabv2 classifier to obtain the prediction probability matrix $P_S \in \mathbb{R}^{H \times W \times C}$, giving for each pixel of the source domain image the probability of belonging to each of the 19 classes. The channel index of the maximum prediction probability of each pixel is taken as its class to generate the source domain image pseudo label. The cross-entropy loss between the prediction probability matrix and the ground truth label is used to optimize the segmentation performance of the semantic segmentation model, finally yielding the pre-trained semantic segmentation model, as shown in formula (1):
$\mathcal{L}_S = -\sum_{i=1}^{H \times W} \sum_{c=1}^{C} Y_S^{(i,c)} \log P_S^{(i,c)} \quad (1)$
where $H \times W$ is the number of pixels of the source domain image $S \in \mathbb{R}^{H \times W}$, $C$ denotes the total number of classes, $Y_S^{(i,c)}$ denotes the one-hot encoding of the $i$-th pixel of the ground truth label $Y_S$, and $P_S^{(i,c)}$ denotes the prediction probability that the $i$-th pixel of the source domain image belongs to class $c$ ($c \in C$).
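A pixel-wise cross-entropy and the argmax pseudo label can be sketched as follows. This is a minimal illustration under stated assumptions: pixels are flattened into a list, labels are integer class indices rather than one-hot vectors, and the loss is averaged over pixels (the normalization is an assumption and only rescales the loss). The function names are hypothetical.

```python
import math


def pixel_cross_entropy(probs, labels, eps=1e-12):
    """Mean over pixels of -log p[label].

    probs  : list of per-pixel class-probability lists
    labels : list of integer class indices (equivalent to one-hot targets)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -math.log(max(p[y], eps))  # clamp to avoid log(0)
    return total / len(labels)


def pseudo_labels(probs):
    """Channel index of the maximum probability for every pixel."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


# Two pixels, two classes.
loss = pixel_cross_entropy([[0.5, 0.5], [1.0, 0.0]], [0, 0])
labels = pseudo_labels([[0.1, 0.9], [0.7, 0.3]])
```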
Step 3: the target images in the training set of the target domain are input into the pre-trained semantic segmentation model to generate the corresponding prediction probability matrix of each target image, and a selection strategy for uncertain regions is constructed by using information entropy and a density clustering algorithm to obtain the uncertain region in the prediction probability matrix of the target image, specifically as follows:
Step 3.1: a target image $T \in \mathbb{R}^{H \times W}$ in the target domain training set is input into the pre-trained semantic segmentation model to generate the corresponding prediction probability matrix $P_T$ of the target image. Information entropy is used to measure the dispersion of the prediction probabilities of each pixel of the target image, and a pixel is taken as an uncertain point when its entropy is greater than the entropy threshold, as shown in formulas (2) and (3):
$H(P_T^{(i)}) = -\sum_{c=1}^{C} P_T^{(i,c)} \log P_T^{(i,c)} \quad (2)$
$X_n = \{(x, y) \mid H(P_T^{(x,y)}) > \gamma_t\} \quad (3)$
where $H(P_T^{(i)})$ denotes the entropy map value of the $i$-th pixel of the target image, $P_T^{(i)}$ denotes the prediction probabilities of the $i$-th pixel of the target image, $X_n$ denotes the $n$-th uncertain point, $n \in \{1, 2, \ldots, N\}$, $N$ denotes the total number of uncertain points, $(x, y)$ denotes the coordinates of the uncertain point on the target image, and $\gamma_t$ denotes the lowest information entropy threshold at the $t$-th iteration. $\gamma_t$ is set to the quantile corresponding to $\alpha_t$, i.e., $\gamma_t$ = np.percentile(H().flatten(), 100 × (1 − $\alpha_t$)), where H() is the entropy map of the pixels of the target image and $\alpha_t$ is the proportion of uncertain points to select, adjusted by a linear strategy as shown in formula (4):
$\alpha_t = \alpha_0 \cdot \left(1 - \frac{iter}{total\_iter}\right) \quad (4)$
where $\alpha_0$ denotes the initial proportion of uncertain points and is set to 20%, iter denotes the current iteration number, and total_iter denotes the preset number of iterations.
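The entropy-based selection of uncertain points can be sketched in pure Python. This is a sketch under assumptions, not the patent's implementation: it assumes that pixels whose entropy lies above the quantile threshold are the uncertain ones (which is what the quantile at 100 × (1 − α_t) implies), that the linear schedule decays α_t from α_0 to zero, and that the percentile uses linear interpolation as NumPy's `percentile` does by default. All function names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy of one pixel's class-probability list."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def percentile(values, pct):
    """Percentile with linear interpolation between neighbouring ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * pct / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def alpha_at(iteration, total_iter, alpha0=0.20):
    """Assumed linear schedule: alpha_0 at the start, 0 at the end."""
    return alpha0 * (1.0 - iteration / total_iter)


def uncertain_points(prob_map, alpha_t):
    """prob_map[y][x] is a class-probability list; returns (gamma_t, [(x, y), ...])."""
    ent = [[entropy(p) for p in row] for row in prob_map]
    gamma = percentile([e for row in ent for e in row], 100.0 * (1.0 - alpha_t))
    points = [(x, y) for y, row in enumerate(ent)
              for x, e in enumerate(row) if e > gamma]
    return gamma, points


# One row of five pixels; only the most ambiguous pixel exceeds the 80th percentile.
row = [[1.0, 0.0], [0.99, 0.01], [0.9, 0.1], [0.7, 0.3], [0.5, 0.5]]
gamma, points = uncertain_points([row], alpha_at(0, 2000))
```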
Step 3.2: density clustering groups samples according to density, and density is in essence derived from the distance between points. There are many clustering algorithms, such as K-means clustering and spectral clustering. K-means clustering requires the number of clusters to be fixed in advance, which is unknown for the uncertain points to be clustered here, so it is not applicable. Spectral clustering and density clustering can both determine the number of clusters automatically during clustering, but when the uncertain regions are divided from a spectral clustering result, the position of the center point and the cropping direction still have to be determined, whereas density clustering helps to determine the center position and the cropped region more directly. Most importantly, density clustering is robust to noise, where noise refers to an object that does not belong to any cluster, so noise information can be removed while the uncertain points are clustered. The invention therefore uses density clustering to cluster the positions of the uncertain points.
Based on the selected uncertain points, the density clustering algorithm is used to search for the uncertain region $T_{un}$ on the prediction probability matrix $P_T$ of the target image. The sample set input to the density clustering algorithm is the set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$, together with the neighborhood parameters $(\varepsilon, M)$, where $\varepsilon$ is the radius used by density clustering, the samples in the sample set whose distance to a given object is not more than $\varepsilon$ form the $\varepsilon$-neighborhood of that object, and $M$ is the minimum number of samples that the $\varepsilon$-neighborhood of a core object must contain. The output of the density clustering algorithm is the cluster division $A = \{A_1, A_2, \ldots, A_K\}$, where $A$ denotes the set of all uncertain points divided into $K$ clusters and $A_K$ denotes the $K$-th cluster. The $\varepsilon$-neighborhood is given by formula (5):
$N_\varepsilon(X_j) = \{X_i \in D \mid dist(X_i, X_j) \le \varepsilon\} \quad (5)$
where $N_\varepsilon(X_j)$ denotes the samples contained in the $\varepsilon$-neighborhood of $X_j$, and $dist(X_i, X_j)$ denotes the distance between the two points. If $X_j$ is in the $\varepsilon$-neighborhood of $X_i$ and $X_i$ is a core object, then $X_j$ is said to be directly density-reachable from $X_i$.
The density clustering algorithm first finds all core objects according to the given neighborhood parameters $(\varepsilon, M)$. It then arbitrarily selects one core object in the data set as a seed and, taking it as the starting point, finds the samples that are density-reachable from it to generate a cluster. For $X_i$ and $X_j$, if there exists a sample sequence $R_1, R_2, \ldots, R_Z$ with $R_1 = X_i$, $R_Z = X_j$, and $R_{i+1}$ directly density-reachable from $R_i$, then $X_j$ is said to be density-reachable from $X_i$. The procedure continues until all core objects have been visited, which completes the clustering.
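The clustering procedure described above matches the well-known DBSCAN algorithm, so a compact DBSCAN is used here as a stand-in; that identification, the Euclidean distance, and the function names are assumptions of this sketch. Points that belong to no cluster receive the label -1 (noise).

```python
import math


def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:      # not a core object
            labels[i] = NOISE
            continue
        cluster += 1                       # start a new cluster from this seed
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:         # border point reached from a core object
                labels[j] = cluster
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reach = region_query(points, j, eps)
            if len(reach) >= min_pts:      # j is itself a core object: keep expanding
                queue.extend(reach)
    return labels


pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

In this toy input the isolated point at (50, 50) is labelled as noise, which is the noise-robustness property the text relies on.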
Step 3.3: following the parameter requirements of the density clustering algorithm, the set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ is input as the sample set, the neighborhood parameters $(\varepsilon, M)$ are set, and the cluster division $A = \{A_1, A_2, \ldots, A_K\}$ is output. The cluster with the highest density among the $K$ clusters is selected, and its cluster center is used as the center $X_o$ of the uncertain region, with the width $w$ set to $2\varepsilon$ and the height $h$ set to $4\varepsilon$. If the uncertain region extends beyond the target image, the cluster with the next highest density among the $K$ clusters is used to select the uncertain region, and so on. If none of the $K$ clusters satisfies the condition, the next uncertain region selection strategy is entered.
The input target image has size 1024 × 512 × 3. The pre-trained semantic segmentation model encodes it sequentially, and the feature sizes output by the successive convolutional layers are 1024 × 512 × 3 → 512 × 256 × 64 → 256 × 128 × 64 → 256 × 128 × 256 → 128 × 64 × 512 → 128 × 64 × 1024 → 128 × 64 × 2048 → 128 × 64 × C. To recover the size of the input target image from the reduced features and obtain the prediction probability matrix, 8× upsampling is needed during decoding. In the adaptive pseudo label correction process, the enlarged image should be upsampled directly to the size of the uncertain region image rather than to the size of the input secondary target image. The invention therefore sets the neighborhood radius ε to 128, 64 and 32; since the width of the uncertain region is w = 2ε and its height is h = 4ε, the uncertain region can be 512 × 256, 256 × 128 or 128 × 64. Because the final downsampled size in the pre-trained semantic segmentation model is 128 × 64 × C, the size of the uncertain region image can be recovered directly by 4× upsampling, 2× upsampling and no upsampling, respectively, yielding the secondary probability matrix and the secondary pseudo label.
For these region sizes, the numbers of pixels contained are 131072, 32768 and 8192, respectively. A region is expected to contain at least half uncertain points to be called an uncertain region, so the number M of uncertain points required in the ε-neighborhood is set to 65536, 16384 and 4096, respectively. As the semi-supervised semantic segmentation model is continually optimized, if the uncertain points in the uncertain region number fewer than 4096, the model is considered well optimized and the uncertain region selection strategy is stopped.
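The region sizes, pixel counts, thresholds and upsampling factors follow from ε by simple arithmetic, which the sketch below reproduces. The function name and the dictionary layout are hypothetical; the feature-map height of 128 is taken from the encoder sizes listed above.

```python
def region_table(eps_values=(128, 64, 32), feat_h=128):
    """Derive region size, pixel count, threshold M and upsampling factor from eps."""
    rows = []
    for eps in eps_values:
        w, h = 2 * eps, 4 * eps            # width 2*eps, height 4*eps
        pixels = w * h
        rows.append({
            "eps": eps,
            "h": h,
            "w": w,
            "pixels": pixels,
            "M": pixels // 2,              # at least half of the region is uncertain
            "upsample": h // feat_h,       # factor from the 128 x 64 feature map
        })
    return rows


table = region_table()
```

For ε = 64, for example, this gives a 256 × 128 region of 32768 pixels, a threshold M of 16384, and 2× upsampling.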
In summary, the invention sets three groups of neighborhood parameters $(\varepsilon_1, M_1), (\varepsilon_2, M_2), (\varepsilon_3, M_3)$ for the uncertain region selection strategy, i.e., (128, 65536), (64, 16384), (32, 4096), which satisfy $\varepsilon_1 > \varepsilon_2 > \varepsilon_3$. The selection strategies for the uncertain region are as follows:
Strategy 1: when $N > M_1$, input the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_1, M_1)$, and output the uncertain region $T_{un} = [X_o, w, h]$.
Strategy 2: when $M_1 \ge N > M_2$, or when the next selection strategy has been entered, input the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_2, M_2)$, and output the uncertain region $T_{un} = [X_o, w, h]$.
Strategy 3: when $M_2 \ge N > M_3$, or when the next selection strategy has been entered, input the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_3, M_3)$, and output the uncertain region $T_{un} = [X_o, w, h]$.
When none of the strategies is satisfied, the uncertain region selection strategy is no longer used.
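The cascade of strategies can be sketched as a loop over the parameter groups. This is an illustrative sketch with several assumptions that the text does not fix: cluster density is approximated by the number of points in the cluster, the cluster center is the mean of its points, and the clustering routine is passed in as a hypothetical `cluster_fn(points, eps, min_pts)` that returns a list of clusters (lists of `(x, y)` points).

```python
PARAMS = [(128, 65536), (64, 16384), (32, 4096)]  # (eps, M) groups from the text


def region_from_clusters(clusters, eps, img_w, img_h):
    """Try clusters from densest to sparsest; return [center, w, h] of the first box in bounds."""
    w, h = 2 * eps, 4 * eps
    for pts in sorted(clusters, key=len, reverse=True):
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        inside = (cx - w / 2 >= 0 and cx + w / 2 <= img_w and
                  cy - h / 2 >= 0 and cy + h / 2 <= img_h)
        if inside:
            return [(cx, cy), w, h]
    return None


def select_uncertain_region(points, cluster_fn, img_w=512, img_h=1024, params=PARAMS):
    """Walk through the strategies; fall through to the next one when no box fits."""
    n = len(points)
    for eps, min_pts in params:
        if n <= min_pts:                   # each strategy requires N > M
            continue
        clusters = cluster_fn(points, eps, min_pts)
        region = region_from_clusters(clusters, eps, img_w, img_h)
        if region is not None:
            return region
    return None                            # stop using the selection strategy


# Toy run with a stub clusterer that returns all points as one cluster.
one_cluster = lambda pts, eps, m: [pts]
toy = select_uncertain_region([(7, 15), (8, 16), (9, 17), (8, 16)], one_cluster,
                              img_w=16, img_h=32, params=[(4, 3), (2, 1)])
```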
Step 4: an adaptive pseudo label correction strategy is constructed to obtain the final pseudo label as supervision for the target domain, and source domain images and target domain images are input in the same batch to jointly train the semi-supervised semantic segmentation model, specifically as follows:
Step 4.1: in semantic segmentation, most hard-to-classify objects are hard because their class accounts for a small proportion of the data set and they occupy a small resolution in the image; such classes can also be called short-tail classes, for example people, motorcycles, bicycles and traffic lights. To make full use of every pixel of the target image and to handle the short-tail classes, the invention provides an adaptive pseudo label correction strategy. On the one hand, the image is enlarged in equal proportion to 1024 × 512 by bilinear interpolation and can serve as a supplement to the target domain training set, which balances the proportion of the short-tail classes and alleviates class imbalance. On the other hand, the image is input into the pre-trained semantic segmentation model for encoding after being enlarged in equal proportion, which counters the loss of detail information during encoding, and the decoding process only needs to upsample the encoded feature vector to the size of the secondary target image before enlargement, which lowers the requirement on the coding model and reduces the amount of computation.
The position on the target image corresponding to the uncertain region on the target image pseudo label is found and cropped to obtain the uncertain region image, which is enlarged in equal proportion to 1024 × 512 by bilinear interpolation to increase its resolution; the magnification factor is $H/h$. The enlarged image is input into the pre-trained semantic segmentation model as the secondary target image, and the encoding process yields a feature vector of the secondary target image reduced by a factor of 8, of size 128 × 64 × C. According to the size $h \times w$ of the uncertain region, the feature vector is upsampled by the corresponding factor (4, 2 or 1 for the three region sizes) to restore a feature vector of the same size as the uncertain region image, and softmax activation gives the prediction probability matrix and pseudo label of the uncertain region image, which serve as the secondary prediction probability matrix and the secondary pseudo label, as shown in formulas (6) and (7):
$P_{un} = \mathrm{softmax}(F_{un}) \quad (6)$
$\hat{Y}_{un} = \arg\max_{c} P_{un} \quad (7)$
where $P_{un}$ denotes the secondary prediction probability matrix, $P_{un} \in \mathbb{R}^{h \times w}$, $F_{un}$ denotes the feature vector of the secondary target image, $F_{un} \in \mathbb{R}^{h \times w \times C}$, and $\hat{Y}_{un}$ denotes the secondary pseudo label, $\hat{Y}_{un} \in \mathbb{R}^{h \times w}$.
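Cropping the uncertain region and enlarging it by bilinear interpolation can be sketched on a single-channel image stored as nested lists. This is a minimal sketch: the function names are hypothetical, and the corner-aligned sampling grid is one common bilinear convention chosen here as an assumption, since the text does not specify one.

```python
def crop_region(img, center, w, h):
    """Cut the w x h box centred on `center` = (cx, cy) out of a 2-D list."""
    cx, cy = center
    x0, y0 = int(cx - w / 2), int(cy - h / 2)
    return [row[x0:x0 + w] for row in img[y0:y0 + h]]


def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D list with bilinear interpolation (corner-aligned grid, assumed)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for oy in range(out_h):
        fy = oy * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        ty = fy - y0
        row = []
        for ox in range(out_w):
            fx = ox * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            tx = fx - x0
            top = img[y0][x0] * (1 - tx) + img[y0][x1] * tx
            bottom = img[y1][x0] * (1 - tx) + img[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out


image = [[r * 4 + c for c in range(4)] for r in range(4)]
patch = crop_region(image, center=(2, 2), w=2, h=2)
enlarged = bilinear_resize(patch, out_h=3, out_w=3)
```

In the setting of the text, the patch would be an h × w uncertain region and the output size would be 1024 × 512.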
Step 4.2: the area surrounding the generated secondary pseudo label is zero-filled, and the filled pseudo label is used as the tertiary pseudo label $\hat{Y}_{un}^{pad}$. The uncertain region $T_{un} = [X_o, w, h]$ is selected by the uncertain region selection strategy, and a mask $Mask \in \mathbb{R}^{H \times W}$ is generated by combining the target image and the uncertain region, as shown in formula (8):
$Mask(x, y) = \begin{cases} 1, & (x, y) \in T_{un} \\ 0, & \text{otherwise} \end{cases} \quad (8)$
Step 4.3: the target image pseudo label and the secondary pseudo label are fused to form the final pseudo label of the target image, as shown in formula (9):
$\hat{Y}_T^{final} = Mask \odot \hat{Y}_{un}^{pad} + (1 - Mask) \odot \hat{Y}_T \quad (9)$
where $\hat{Y}_T^{final}$ denotes the final pseudo label of the target image and $\hat{Y}_T$ denotes the target image pseudo label.
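Zero-filling, mask generation and fusion can be sketched together on integer label maps stored as nested lists. This is a sketch under assumptions: the region is addressed by its top-left corner `(x0, y0)` rather than by its center, and the fusion takes the secondary pseudo label inside the mask and the original target pseudo label elsewhere. The function name is hypothetical.

```python
def fuse_pseudo_labels(target_label, region_label, x0, y0):
    """Paste the region's pseudo label over the target pseudo label.

    Returns (final_label, mask); mask is 1 inside the uncertain region, else 0.
    """
    H, W = len(target_label), len(target_label[0])
    h, w = len(region_label), len(region_label[0])
    padded = [[0] * W for _ in range(H)]   # zero-filled (tertiary) pseudo label
    mask = [[0] * W for _ in range(H)]
    for dy in range(h):
        for dx in range(w):
            padded[y0 + dy][x0 + dx] = region_label[dy][dx]
            mask[y0 + dy][x0 + dx] = 1
    final = [[mask[y][x] * padded[y][x] + (1 - mask[y][x]) * target_label[y][x]
              for x in range(W)] for y in range(H)]
    return final, mask


target = [[7] * 4 for _ in range(3)]
final, mask = fuse_pseudo_labels(target, [[1, 2]], x0=1, y0=1)
```

Using an explicit mask rather than testing for non-zero labels keeps class index 0 usable inside the region.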
Using the final pseudo label as a supervision, as shown in equation (10):
Figure BDA0003945517000000093
in the process of training the semi-supervised semantic segmentation model, the batches of input source domain images and target domain images are both 1, and model parameters are optimized in a self-training mode, so that the overall loss function of the semi-supervised semantic segmentation model for self-adaptive pseudo-label correction
Figure BDA0003945517000000096
As shown in equation (11):
Figure BDA0003945517000000097
wherein λ is T The weight lost to the target. In the training process of the semi-supervised semantic segmentation model, the number of uncertain points is less and less, namely the entropy is smaller than the threshold value gamma t Is smaller, and therefore, the target domain is expected to have a larger and larger influence on the semi-supervised semantic segmentation model, the weight is defined as that the entropy in the current batch training is smaller than the threshold value gamma t The inverse of the percentage of pixels of (a), as shown in equation (12):
Figure BDA0003945517000000098
wherein $|B_T|$ denotes the number of images input in the current batch, which is set to 1, and
Figure BDA0003945517000000099
is an indicator function that equals 1 if $X_n$ is an uncertain point and 0 otherwise.
In the process of training the semi-supervised semantic segmentation model, the optimizer is SGD with a weight decay of 0.0005, and the poly learning rate schedule is used, with the decay rule
$$lr = base\_lr \times \left(1 - \frac{iter}{total\_iter}\right)^{power}$$
wherein base_lr is the initial learning rate, set to 0.001; iter is the current iteration number; total_iter is the maximum number of iterations, set to 2k; and power, which controls the decay of the learning rate, is set to 0.9.
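The poly schedule can be illustrated with a few lines of Python. This is a sketch of the standard poly rule, lr = base_lr × (1 − iter / total_iter)^power, which is assumed here to be the decay mechanism the text refers to; the function name `poly_lr` is hypothetical, and the value 2000 in the usage note is only one reading of "2k".

```python
def poly_lr(base_lr, it, total_iter, power=0.9):
    """Learning rate at iteration `it` under the standard poly decay rule."""
    if not 0 <= it <= total_iter:
        raise ValueError("it must lie in [0, total_iter]")
    # Decays smoothly from base_lr at it = 0 to 0 at it = total_iter.
    return base_lr * (1.0 - it / total_iter) ** power
```

With base_lr = 0.001 and total_iter = 2000, `poly_lr(0.001, 0, 2000)` returns the initial rate and `poly_lr(0.001, 2000, 2000)` returns 0.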
Step 5, inputting the 500 target images of the target domain verification set into the trained semi-supervised semantic segmentation model to generate pseudo labels. Because the target images in the target domain verification set have ground-truth labels, the mean intersection over union (mIoU) between the generated pseudo labels and the ground-truth labels is calculated to verify the segmentation performance of the semi-supervised semantic segmentation model.
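The mIoU check of step 5 can be sketched with a confusion matrix. The function name `mean_iou` and the `ignore_index=255` default (a common convention for unlabeled pixels in Cityscapes training labels) are assumptions for illustration. A full evaluation would usually accumulate the confusion matrix over all verification images before taking the per-class ratio, whereas this sketch scores a single prediction.

```python
import numpy as np


def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean intersection over union between a predicted and a ground-truth label map."""
    valid = gt != ignore_index  # skip pixels without a ground-truth class
    idx = gt[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    # conf[g, p] counts pixels of ground-truth class g predicted as class p.
    conf = np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0  # average only over classes that actually occur
    return float((inter[present] / union[present]).mean())
```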

Claims (4)

1. A semi-supervised semantic segmentation method based on self-adaptive pseudo label correction is characterized by comprising the following steps of:
step 1, selecting a GTA5 data set to construct a source domain, selecting a Cityscapes data set to construct a target domain, dividing target images in the target domain into a training set and a verification set, and turning to step 2;
step 2, inputting the image in the source domain into a deep convolutional neural network for training to obtain a pre-trained semantic segmentation model, and turning to step 3;
step 3, inputting the target images in the training set of the target domain into a pre-trained semantic segmentation model to generate a prediction probability matrix of the corresponding target images, constructing a selection strategy of an uncertain region by using an information entropy and density clustering algorithm, acquiring the uncertain region in the prediction probability matrix of the target images, and turning to step 4;
step 4, locating on the target image the position corresponding to the uncertain region of the target image prediction probability matrix, cropping that position from the target image to obtain an uncertain region image, enlarging the uncertain region image to serve as a secondary target image, inputting the secondary target image into the pre-trained semantic segmentation model, upsampling according to the size of the uncertain region to generate a secondary pseudo label, fusing the secondary pseudo label with the target image pseudo label, thereby constructing an adaptive pseudo label correction strategy that yields a final pseudo label used as the supervision of the target image, and inputting the source domain image and the target domain image in the same batch to jointly train a semi-supervised semantic segmentation model; when the preset number of training iterations is reached, obtaining the trained semi-supervised semantic segmentation model, and turning to step 5;
step 5, inputting the target images in the target domain verification set into the trained semi-supervised semantic segmentation model to generate pseudo labels, so as to verify the semantic segmentation performance of the network.
2. The semi-supervised semantic segmentation method based on adaptive pseudo-label correction according to claim 1, wherein in step 2, the images in the source domain are input into a deep convolutional neural network for training to obtain the pre-trained semantic segmentation model, as shown in formula (1):
$$\mathcal{L}_S = -\sum_{i=1}^{H \times W} \sum_{c=1}^{C} Y_S^{(i,c)} \log P_S^{(i,c)} \tag{1}$$
wherein $\mathcal{L}_S$ denotes the cross-entropy loss of the source domain image $S \in \mathbb{R}^{H \times W}$, $H$ denotes the height of the source domain image, $W$ denotes the width of the source domain image, $H \times W$ denotes the total number of pixels of the source domain image, $C$ denotes the total number of classes, $Y_S^{(i,c)}$ denotes the one-hot encoding of the $i$-th pixel of the ground-truth label $Y_S$, and $P_S^{(i,c)}$ denotes the predicted probability that the $i$-th pixel of the source domain image belongs to class $c$, $c \in \{1, \ldots, C\}$.
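As an illustration of the supervised loss in claim 2, the following NumPy sketch assumes that formula (1) is the usual pixel-wise cross-entropy between the one-hot ground-truth label and the predicted class probabilities, summed over all pixels. The function name `pixel_cross_entropy`, the `eps` stabilizer, and the absence of any averaging over pixels are assumptions of the example.

```python
import numpy as np


def pixel_cross_entropy(probs, labels, eps=1e-12):
    """Sum over pixels of -log p(true class).

    probs:  (H, W, C) array of per-pixel class probabilities.
    labels: (H, W) integer array of class indices in [0, C).
    """
    num_classes = probs.shape[-1]
    one_hot = np.eye(num_classes)[labels]  # (H, W, C) one-hot encoding
    # eps keeps log() finite when a predicted probability is exactly 0.
    return float(-(one_hot * np.log(probs + eps)).sum())
```

The same function applies to the target image when `labels` is the final pseudo label instead of a ground-truth label.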
3. The semi-supervised semantic segmentation method based on adaptive pseudo-label correction according to claim 2, wherein in step 3, the selection strategy of the uncertain region is constructed by using the information entropy and the density clustering algorithm, specifically as follows:
step 3.1, inputting the target image $T \in \mathbb{R}^{H \times W}$ of the target domain training set into the pre-trained semantic segmentation model to generate the prediction probability matrix $P_T$ of the corresponding target image; calculating the degree of dispersion of the prediction probabilities of each pixel of the target image by using the information entropy, and taking a pixel as an uncertain point when its entropy is not smaller than the entropy threshold, as shown in formula (2) and formula (3):
$$H^{(i)} = -\sum_{c=1}^{C} P_T^{(i,c)} \log P_T^{(i,c)} \tag{2}$$
$$X_n = (x, y), \quad H^{(x, y)} \ge \gamma_t \tag{3}$$
wherein $H^{(i)}$ denotes the entropy map at the $i$-th pixel of the target image, $P_T^{(i,c)}$ denotes the prediction probability matrix at the $i$-th pixel of the target image, $X_n$ denotes the $n$-th uncertain point, $n \in \{1, 2, \ldots, N\}$, $N$ denotes the total number of uncertain points, $(x, y)$ denotes the coordinate position of the uncertain point on the target image, and $\gamma_t$ denotes the lowest threshold of the information entropy at the $t$-th iteration; $\gamma_t$ is set to the quantile corresponding to $\alpha_t$, i.e. $\gamma_t$ = np.percentile(H.flatten(), 100 × (1 − $\alpha_t$)), wherein H is the entropy map over all pixels of the target image and $\alpha_t$ is the proportion of uncertain points to be selected, which is adjusted by a linear strategy, as shown in formula (4):
$$\alpha_t = \alpha_0 \times \left(1 - \frac{iter}{total\_iter}\right) \tag{4}$$
wherein $\alpha_0$ denotes the initially selected proportion of uncertain points, set to 20%, iter denotes the current iteration number, and total_iter denotes the preset number of iterations.
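Step 3.1 can be sketched in NumPy as follows. The sketch takes the uncertain points to be the $\alpha_t$ fraction of pixels with the highest entropy, which is the reading implied by defining $\gamma_t$ as the 100 × (1 − $\alpha_t$) percentile, and it assumes the linear strategy decays $\alpha_t$ from $\alpha_0$ to 0 over training. The function name `uncertain_points`, the exact form of the decay, and the `eps` stabilizer are assumptions.

```python
import numpy as np


def uncertain_points(probs, it, total_iter, alpha0=0.2, eps=1e-12):
    """Select uncertain points from an (H, W, C) prediction probability matrix.

    Returns (points, entropy, gamma_t), where points is an (N, 2) array of (x, y).
    """
    # Per-pixel information entropy of the predicted class distribution.
    entropy = -(probs * np.log(probs + eps)).sum(axis=-1)
    # Assumed linear strategy: the selected proportion shrinks as training proceeds.
    alpha_t = alpha0 * (1.0 - it / total_iter)
    # Threshold at the quantile that leaves a fraction alpha_t of pixels above it.
    gamma_t = np.percentile(entropy.flatten(), 100.0 * (1.0 - alpha_t))
    ys, xs = np.nonzero(entropy >= gamma_t)
    return np.stack([xs, ys], axis=1), entropy, gamma_t
```

The returned `points` array is the sample set $D$ that the density clustering of step 3.2 takes as input.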
step 3.2, based on the selected uncertain points, searching for the uncertain region $T_{un}$ on the prediction probability matrix of the target image by using a density clustering algorithm; the sample set input to the density clustering algorithm is the set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$, and the input neighborhood parameters are $(\varepsilon, M)$, wherein $\varepsilon$ is the radius used by the density clustering, the samples of the sample set whose distance from a core object is not greater than $\varepsilon$ form the $\varepsilon$-neighborhood of that object, and $M$ is the minimum number of samples that the $\varepsilon$-neighborhood of a core object must contain; the output of the density clustering algorithm is the cluster division $A = \{A_1, A_2, \ldots, A_K\}$, wherein $A$ denotes the set of all uncertain points divided into $K$ clusters and $A_K$ denotes the $K$-th cluster, the $\varepsilon$-neighborhood being given by formula (5):
$$N_\varepsilon(X_j) = \{X_i \in D \mid \mathrm{dist}(X_i, X_j) \le \varepsilon\} \tag{5}$$
wherein $N_\varepsilon(X_j)$ denotes the set of samples contained in the $\varepsilon$-neighborhood of $X_j$; $X_i$ and $X_j$ denote samples of $D$; if $X_j$ is located in the $\varepsilon$-neighborhood of $X_i$ and $X_i$ is a core object, $X_j$ is said to be directly density-reachable from $X_i$; and $\mathrm{dist}(X_i, X_j)$ denotes the distance between the two points;
the density clustering algorithm finds all core objects according to the given neighborhood parameters $(\varepsilon, M)$, first randomly selects one core object in the data set as a seed, and then, starting from that core object, finds the samples that are density-reachable from it to generate a cluster; for $X_i$ and $X_j$, if there exists a sample sequence $R_1, R_2, \ldots, R_Z$ with $R_1 = X_i$, $R_Z = X_j$, and $R_{z+1}$ directly density-reachable from $R_z$, then $X_j$ is said to be density-reachable from $X_i$; the clustering ends when all core objects have been visited;
step 3.3, according to the parameter requirements of the density clustering algorithm, inputting the set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ as the sample set and setting the neighborhood parameters $(\varepsilon, M)$, the output being the cluster division $A = \{A_1, A_2, \ldots, A_K\}$; selecting the cluster with the highest density among the $K$ clusters and taking its cluster center as the center $X_o$ of the uncertain region, with the width $w$ set to $2\varepsilon$ and the height $h$ set to $4\varepsilon$; if the uncertain region exceeds the range of the target image, selecting the uncertain region by using the cluster with the next highest density among the $K$ clusters, and so on; when none of the $K$ clusters meets the condition, entering the next uncertain region selection strategy; the uncertain region selection strategy can therefore set multiple groups of neighborhood parameters, namely $(\varepsilon_1, M_1), (\varepsilon_2, M_2), \ldots, (\varepsilon_m, M_m)$, which need to satisfy $\varepsilon_1 > \varepsilon_2 > \ldots > \varepsilon_m$, and the selection strategies of the uncertain region are respectively as follows:
strategy 1: when $N > M_1$, inputting the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_1, M_1)$, and outputting the uncertain region $T_{un} = [X_o, w, h]$;
strategy 2: when $M_1 \ge N > M_2$, or when strategy 1 has passed on to the next selection strategy, inputting the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_2, M_2)$, and outputting the uncertain region $T_{un} = [X_o, w, h]$;
......
strategy m: when $M_{m-1} \ge N > M_m$, or when strategy m-1 has passed on to the next selection strategy, inputting the sample set of uncertain points $D = \{X_1, X_2, \ldots, X_N\}$ and the neighborhood parameters $(\varepsilon_m, M_m)$, and outputting the uncertain region $T_{un} = [X_o, w, h]$;
when no strategy yields an uncertain region that meets the conditions, stopping the iterative training.
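Steps 3.2 and 3.3 can be sketched as a small DBSCAN-style clustering followed by the cascaded region selection. Several points are assumptions of this sketch: the function names `dbscan` and `select_uncertain_region` are hypothetical, cluster "density" is approximated by the number of points in the cluster (all clusters share the same $\varepsilon$), the cluster center is taken as the mean of its points, and seeds are visited in index order instead of at random. The pairwise distance matrix is quadratic in the number of points, so this is suitable only for illustration.

```python
import numpy as np


def dbscan(points, eps, min_samples):
    """Minimal density clustering; returns one label per point (-1 means noise)."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    neighbors = [np.nonzero(dist[i] <= eps)[0] for i in range(n)]
    # A core object has at least min_samples points in its eps-neighborhood.
    core = [len(nb) >= min_samples for nb in neighbors]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        queue = [i]
        while queue:  # grow the cluster through density-reachable samples
            j = queue.pop()
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    if core[k]:
                        queue.append(k)
        cluster += 1
    return labels


def select_uncertain_region(points, image_w, image_h, params):
    """Try each (eps, M) pair in order; return ((cx, cy), w, h) or None."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    for eps, m in params:  # params sorted by decreasing eps
        if len(pts) <= m:  # a strategy applies only when N > M
            continue
        labels = dbscan(pts, eps, m)
        clusters = [pts[labels == k] for k in range(labels.max() + 1)]
        clusters.sort(key=len, reverse=True)  # densest (largest) cluster first
        w, h = 2 * eps, 4 * eps
        for c in clusters:
            cx, cy = c.mean(axis=0)
            inside = (cx - w / 2 >= 0 and cy - h / 2 >= 0
                      and cx + w / 2 <= image_w and cy + h / 2 <= image_h)
            if inside:
                return (float(cx), float(cy)), w, h
    return None  # no strategy produced a valid region
```

Returning `None` corresponds to the case in which no strategy is satisfied and the iterative training stops.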
4. The semi-supervised semantic segmentation method based on adaptive pseudo-label correction according to claim 3, wherein in the step 4, the adaptive pseudo-label correction strategy is constructed by the following steps:
step 4.1, locating on the target image the position corresponding to the uncertain region of the target image prediction probability matrix, cropping that position from the target image to obtain an uncertain region image, and enlarging the uncertain region image proportionally by bilinear interpolation to increase its resolution, wherein the magnification factor is
Figure FDA0003945516990000033
inputting the enlarged image, as the secondary target image, into the pre-trained semantic segmentation network, obtaining in the encoding process a feature vector of the secondary target image downsampled by a factor of 8, the feature vector having a size of 128 × 64 × C, and upsampling it according to the size h × w of the uncertain region
Figure FDA0003945516990000034
to recover a feature vector of the same size as the uncertain region image, and obtaining, after softmax activation, the prediction probability matrix and the pseudo label of the uncertain region image, which serve as the secondary prediction probability matrix and the secondary pseudo label respectively, as shown in formulas (6) and (7):
Figure FDA0003945516990000041
Figure FDA0003945516990000042
wherein $P_{un}$ denotes the secondary prediction probability matrix, $P_{un} \in \mathbb{R}^{h \times w}$, $F_{un}$ denotes the feature vector of the secondary target image, $F_{un} \in \mathbb{R}^{h \times w \times C}$, and
Figure FDA0003945516990000043
denotes the secondary pseudo label
Figure FDA0003945516990000044
step 4.2, zero-padding the area surrounding the generated secondary pseudo label, and taking the padded pseudo label as the tertiary pseudo label $\hat{Y}_{pad} \in \mathbb{R}^{H \times W}$; selecting the uncertain region $T_{un} = [X_o, w, h]$ by the uncertain region selection strategy, and generating a mask $Mask \in \mathbb{R}^{H \times W}$ by combining the target image with the uncertain region, as shown in formula (8):
$$Mask(x, y) = \begin{cases} 1, & (x, y) \in T_{un} \\ 0, & \text{otherwise} \end{cases} \tag{8}$$
step 4.3, fusing the target image pseudo label with the zero-padded secondary pseudo label under the mask to obtain the final pseudo label of the target image, as shown in formula (9):
$$\hat{Y}_T = Mask \odot \hat{Y}_{pad} + (1 - Mask) \odot \tilde{Y}_T \tag{9}$$
wherein $\hat{Y}_T$ denotes the final pseudo label of the target image, and $\tilde{Y}_T$ denotes the target image pseudo label;
using the final pseudo label as supervision, as shown in formula (10):
$$\mathcal{L}_T = -\sum_{i=1}^{H \times W} \sum_{c=1}^{C} \hat{Y}_T^{(i,c)} \log P_T^{(i,c)} \tag{10}$$
wherein $\mathcal{L}_T$ denotes the cross-entropy loss of the target image $T \in \mathbb{R}^{H \times W}$, $\hat{Y}_T^{(i,c)}$ denotes the one-hot encoding of the $i$-th pixel of the final pseudo label $\hat{Y}_T$, and $P_T^{(i,c)}$ denotes the predicted probability that the $i$-th pixel of the target image belongs to class $c$;
in the process of training the semi-supervised semantic segmentation model, the source domain images and the target domain images are input in the same batch and the model parameters are optimized by self-training, so that the overall loss function $\mathcal{L}$ of the semi-supervised semantic segmentation model with adaptive pseudo-label correction is as shown in formula (11):
$$\mathcal{L} = \mathcal{L}_S + \lambda_T \mathcal{L}_T \tag{11}$$
wherein $\mathcal{L}_S$ is the source domain loss of formula (1) and $\mathcal{L}_T$ is the target loss of formula (10);
wherein $\lambda_T$ is the weight of the target loss, defined as the inverse of the percentage of pixels in the current training batch whose entropy is not smaller than the threshold $\gamma_t$, as shown in formula (12):
Figure FDA00039455169900000416
wherein $|B_T|$ denotes the number of images input in the current batch, and
Figure FDA00039455169900000418
is an indicator function that equals 1 if $X_n$ is an uncertain point and 0 otherwise.
CN202211432700.9A 2022-11-16 2022-11-16 Semi-supervised semantic segmentation method based on self-adaptive pseudo label correction Pending CN115761735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211432700.9A CN115761735A (en) 2022-11-16 2022-11-16 Semi-supervised semantic segmentation method based on self-adaptive pseudo label correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211432700.9A CN115761735A (en) 2022-11-16 2022-11-16 Semi-supervised semantic segmentation method based on self-adaptive pseudo label correction

Publications (1)

Publication Number Publication Date
CN115761735A true CN115761735A (en) 2023-03-07

Family

ID=85371696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211432700.9A Pending CN115761735A (en) 2022-11-16 2022-11-16 Semi-supervised semantic segmentation method based on self-adaptive pseudo label correction

Country Status (1)

Country Link
CN (1) CN115761735A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116204626A (en) * 2023-05-05 2023-06-02 江西尚通科技发展有限公司 Dialogue new intention discovery method, system and computer based on deep learning
CN116229080A (en) * 2023-05-08 2023-06-06 中国科学技术大学 Semi-supervised domain adaptive image semantic segmentation method, system, equipment and storage medium
CN116229080B (en) * 2023-05-08 2023-08-29 中国科学技术大学 Semi-supervised domain adaptive image semantic segmentation method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN115761735A (en) Semi-supervised semantic segmentation method based on self-adaptive pseudo label correction
CN112381116A (en) Self-supervision image classification method based on contrast learning
CN110929848A (en) Training and tracking method based on multi-challenge perception learning model
CN110188827A (en) A kind of scene recognition method based on convolutional neural networks and recurrence autocoder model
CN111008224A (en) Time sequence classification and retrieval method based on deep multitask representation learning
CN112115967A (en) Image increment learning method based on data protection
CN113139592A (en) Method, device and storage medium for identifying lunar meteorite crater based on depth residual error U-Net
CN115908908A (en) Remote sensing image gathering type target identification method and device based on graph attention network
CN114359930A (en) Depth cross-modal hashing method based on fusion similarity
CN116469561A (en) Breast cancer survival prediction method based on deep learning
CN114119966A (en) Small sample target detection method based on multi-view learning and meta-learning
CN111275702B (en) Loop detection method based on convolutional neural network
CN115631513A (en) Multi-scale pedestrian re-identification method based on Transformer
CN113807340A (en) Method for recognizing irregular natural scene text based on attention mechanism
CN116452862A (en) Image classification method based on domain generalization learning
CN117217368A (en) Training method, device, equipment, medium and program product of prediction model
CN116563682A (en) Attention scheme and strip convolution semantic line detection method based on depth Hough network
CN113870286A (en) Foreground segmentation method based on multi-level feature and mask fusion
CN116645562A (en) Detection method for fine-grained fake image and model training method thereof
CN116543269A (en) Cross-domain small sample fine granularity image recognition method based on self-supervision and model thereof
WO2024016424A1 (en) Sparse code multiple access encoding and decoding system based on generative adversarial network
CN113420706B (en) Vehicle detection method based on multi-layer feature fusion
CN114998731A (en) Intelligent terminal navigation scene perception identification method
CN114170460A (en) Multi-mode fusion-based artwork classification method and system
CN113177599A (en) Enhanced sample generation method based on GAN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination