CN113095328A - Self-training-based semantic segmentation method guided by Gini index - Google Patents
- Publication number
- CN113095328A (application number CN202110318561.6A)
- Authority
- CN
- China
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a self-training-based semantic segmentation method guided by the Gini index. The method measures the uncertainty of the network's output predictions with the Gini index and uses it to select reliable pseudo labels for unsupervised domain-adaptive semantic segmentation.
Description
Technical Field
The invention relates to a self-training-based domain-adaptive semantic labeling method. Unlike traditional methods, it selects pseudo labels on the basis of the Gini index. The invention belongs to the fields of pattern recognition and computer vision and can be applied to automatic driving and robot visual navigation.
Background
The self-training-based domain-adaptive semantic segmentation method uses two types of data: labeled source domain data and unlabeled target domain data. Labels serve as supervision information in the source domain and pseudo labels serve as supervision information in the target domain; a network trained on this supervision learns a model that labels target domain images well. Accurate unsupervised domain-adaptive semantic segmentation is important for applications with an obvious data gap between the model's learning stage and its deployment stage, such as automatic driving and robot navigation.
The main idea of self-training-based unsupervised domain adaptation is to create pseudo labels and use them as the real labels of target domain images during the training phase. The biggest problem such methods must solve is how to acquire correct pseudo labels: wrong pseudo labels can ultimately cause "confirmation bias", i.e. incorrect pseudo labels act as noise when used as supervision information, making the trained model perform worse.
In order to obtain pseudo labels that are as correct as possible, existing strategies include: selecting pseudo labels based on the network's output predictions, and selecting pseudo labels based on an uncertainty measure of those predictions. The first strategy sets a threshold in advance; pixels whose Softmax score exceeds the threshold are given the class label of the maximum prediction score as their pseudo label. This strategy produces erroneous labels in the early iterations, but as the number of iterations grows, the classifier's performance on the test data improves and label accuracy improves with it. Its weakness is that selecting pseudo labels where the model is highly uncertain about a pixel's prediction (e.g. boundary pixels) is error-prone: a softmax score above the threshold does not guarantee that the corresponding predicted label is correct. In response, researchers have proposed to measure the uncertainty of the network's output predictions and select pseudo labels accordingly. A typical approach of this type computes the entropy of the output predictions as the uncertainty measure and selects pseudo labels based on the entropy, improving pseudo-label reliability. However, in entropy-based gradient back propagation, optimization is biased toward easy-to-classify categories — the optimization weight of hard categories is smaller than that of easy categories — which leads to low pixel accuracy on the hard categories.
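The two prior strategies can be sketched as follows (a minimal numpy illustration; the function names, the threshold value, and the ignore index 255 are assumptions for the sketch, not the patent's implementation):

```python
import numpy as np

def threshold_pseudo_labels(prob, tau=0.9, ignore=255):
    """Strategy 1: keep the argmax label only where the softmax score exceeds tau."""
    scores = prob.max(axis=-1)
    labels = prob.argmax(axis=-1)
    labels[scores <= tau] = ignore  # pixels below the threshold get no pseudo label
    return labels

def entropy_map(prob, eps=1e-8):
    """Strategy 2: per-pixel prediction entropy as an uncertainty measure."""
    return -np.sum(prob * np.log(prob + eps), axis=-1)

# A confident pixel and an uncertain (e.g. boundary) pixel, 3 classes
probs = np.array([[[0.95, 0.03, 0.02],
                   [0.40, 0.35, 0.25]]])
labels = threshold_pseudo_labels(probs)
ent = entropy_map(probs)
```

The uncertain boundary-like pixel is the one both strategies must treat carefully: it is rejected by the score threshold and has the higher entropy.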
Disclosure of Invention
In order to effectively improve the accuracy of unsupervised domain-adaptive semantic segmentation under a self-training framework, the invention proposes to measure the uncertainty of output predictions with the Gini index and to let the Gini index guide pseudo-label selection: a pixel whose Gini index is smaller than a set threshold is given the class label of the maximum softmax score as its pseudo label. In FIG. 1, the abscissa is the output prediction probability and the ordinate is the back-propagated gradient of the uncertainty measure (entropy or Gini index). Comparing the back-propagated gradients over the prediction probability intervals [0.75, 0.9] and [0.9, 1] in FIG. 1: when uncertainty is computed from entropy, the gradient on the interval [0.9, 1] is far greater than on [0.75, 0.9]; when uncertainty is computed from the Gini index, the gradients on the two intervals differ little. In other words, the Gini-based gradient does not over-emphasize points in the [0.9, 1] interval during back propagation, and the model gives relatively greater update weight to classes whose prediction probability lies in [0.75, 0.9]. Research results indicate that a class's IoU value is positively correlated with its prediction probability.
Because, compared with entropy, the Gini index pays more attention during training to points whose output prediction probability lies in the [0.75, 0.9] interval, the IoU of classes whose predictions fall in that interval can be improved. Since the pseudo labels selected on the interval [0.9, 1] are already relatively accurate, raising the class prediction accuracy on [0.75, 0.9] also raises pseudo-label correctness, which introduces more correct supervision information and avoids introducing noise.
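The gradient behaviour described around FIG. 1 can be checked numerically on a two-class simplification (a sketch under that simplifying assumption; for binary probabilities, entropy is H(p) = −p·ln p − (1−p)·ln(1−p) and the Gini index is G(p) = 1 − p² − (1−p)²). The entropy gradient grows without bound as p → 1, while the Gini gradient stays bounded:

```python
import math

def entropy(p):
    # Binary-class prediction entropy H(p)
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def gini(p):
    # Binary-class Gini index: 1 - p^2 - (1-p)^2 = 2p(1-p)
    return 1 - p**2 - (1 - p)**2

def grad(f, p, eps=1e-6):
    # Magnitude of the back-propagated gradient |df/dp| (central difference)
    return abs((f(p + eps) - f(p - eps)) / (2 * eps))

# Compare gradient magnitudes across the probability intervals of FIG. 1
for p in (0.8, 0.95, 0.99):
    print(f"p={p}: |dH/dp|={grad(entropy, p):.3f}  |dG/dp|={grad(gini, p):.3f}")
```

Consistent with the figure's claim, the entropy gradient at p = 0.99 is several times larger than at p = 0.8, whereas the Gini gradient never exceeds 2 and changes far less between the two intervals.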
In the invention, the source domain refers to a synthetic data set and the target domain refers to a real data set. The network comprises two sub-networks with the same structure but different parameters, called G_te and G_st respectively. During training, source domain and target domain images are input to G_te for training. After training is complete, G_te performs semantic segmentation on the target domain images; the Gini index of G_te's output predictions on the target domain images is computed and used to guide the acquisition of target domain pseudo labels. The obtained pseudo labels are then used to train G_st.
The method comprises the following specific steps:
step (1): randomly take one RGB image from the source domain data set and one from the target domain data set as a batch and input them to the semantic segmentation network G_te;
step (2): compute the cross entropy losses of the source domain image based on the output prediction maps of the last two layers of the network and the ground truth, and take the weighted sum of the two layers' losses;
step (3): compute the Gini index and uncertainty loss of the output prediction maps of the last two layers for the target domain image, and take the weighted sum of the two layers' losses;
step (4): sum the weighted losses computed in steps (2) and (3), optimize the model by error back propagation, and iterate until the model's loss falls below a set threshold, completing training on this batch;
step (5): return to step (1) to select new batch data and repeat steps (1) to (4); after every 2000 batches, save the trained model;
step (6): repeat steps (1) to (5) until 120000 batches have been trained, i.e. 60 models have been saved;
step (7): test the 60 saved models on the target domain validation set; use the model with the best accuracy on the validation set to compute the output predictions of the target domain training set images, compute the corresponding Gini indexes, and assign pseudo labels to the target domain training set images based on the Gini index;
step (8): randomly take one RGB image from the source domain data set and one from the target domain data set as a batch and use them as input to train the semantic segmentation network G_st;
step (9): compute the cross entropy losses of the source domain image based on the output prediction maps of the last two layers and the ground truth, and take the weighted sum of the two layers' losses;
step (10): compute the cross entropy losses between the output prediction maps of the last two layers and the pseudo label of the target domain image, and take the weighted sum of the two layers' losses;
step (11): compute the Gini index and uncertainty loss of the output prediction maps of the last two layers for the target domain image, and take the weighted sum of the two layers' losses;
step (12): sum the weighted losses of steps (9), (10) and (11), optimize the model by error back propagation, and iterate until the model's loss falls below a set threshold, completing training on this batch;
step (13): return to step (8) to select new batch data and repeat steps (8) to (12); after every 2000 batches, save the trained model, until 120000 batches have been trained in total, i.e. 60 models have been saved;
step (14): at test time, evaluate the 60 saved models on the target domain test set and take the best segmentation result.
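The batch-and-checkpoint schedule in the steps above can be sketched as follows (a minimal pure-Python sketch with a stub training step; the function and checkpoint names are illustrative, not from the patent):

```python
def train_stage(step_fn, batches_per_checkpoint=2000, total_batches=120000):
    """Run one training stage, saving a model snapshot every 2000 batches.

    step_fn stands in for one full iteration of steps (1)-(4) or (8)-(12):
    sample a source/target batch, compute the weighted losses, back-propagate.
    """
    checkpoints = []
    for b in range(1, total_batches + 1):
        step_fn(b)  # forward pass, loss weighting, error back propagation
        if b % batches_per_checkpoint == 0:
            checkpoints.append(f"model_{b}")  # save the trained model
    return checkpoints

# With the patent's numbers, 120000 batches yield 60 saved models,
# which are then compared on the validation (or test) set.
saved = train_stage(lambda b: None)
print(len(saved))  # 60
```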
Compared with the prior art, the pseudo labels of target domain images selected on the basis of Gini-index uncertainty measurement are more reliable, which improves the semantic annotation accuracy on the target domain.
Drawings
FIG. 1: gradient values vs. prediction probability.
FIG. 2: structure of the self-supervised domain-adaptive semantic segmentation network.
FIG. 3: structure diagram of the semantic segmentation networks G_te and G_st.
FIG. 4: structure diagram of the ASPP module.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
The invention uses a synthetic data set as the source domain and a real data set as the target domain. The network comprises two sub-networks with the same structure but different parameters, called G_te and G_st respectively. During training, source domain and target domain images are input to the semantic segmentation network G_te. After training is complete, G_te computes predictions for the target domain training images, the Gini index of those predictions is computed, and the Gini index guides the acquisition of target domain pseudo labels. The target domain pseudo labels are then used to train the semantic segmentation network G_st.
The method comprises the following specific steps:
step (1): randomly take one RGB image from the source domain data set and one from the target domain data set as a batch and input them to the semantic segmentation network G_te;
step (2): compute the cross entropy losses of the source domain image based on the output prediction maps of the last two layers of the network and the ground truth, and take the weighted sum of the two source domain layers' losses;
step (3): compute the Gini index and uncertainty loss of the output prediction maps of the last two layers for the target domain image, and take the weighted sum of the two target domain layers' losses;
step (4): sum the weighted losses computed in steps (2) and (3), optimize the model by error back propagation, and iterate until the model's loss falls below a set threshold, completing training on this batch;
step (5): return to step (1) to select new batch data and repeat steps (1) to (4); after every 2000 batches, save the trained model;
step (6): repeat steps (1) to (5) until 120000 batches have been trained, i.e. 60 models have been saved;
step (7): test the 60 saved models on the target domain validation set; use the model with the best accuracy to compute the output predictions of the target domain training set images, compute the corresponding Gini indexes, and assign pseudo labels to the target domain training set images based on the Gini index;
step (8): randomly take one RGB image from the source domain data set and one from the target domain data set as a batch and use them as input to train the semantic segmentation network G_st;
step (9): compute the cross entropy losses of the source domain image based on the output prediction maps of the last two layers and the ground truth, and take the weighted sum of the two source domain layers' losses;
step (10): compute the cross entropy losses between the output prediction maps of the last two layers of the target domain image and the pseudo label, and take the weighted sum of the two target domain layers' losses;
step (11): compute the Gini index and uncertainty loss of the output prediction maps of the last two layers for the target domain image, and take the weighted sum of the two target domain layers' losses;
step (12): sum the weighted losses of steps (9), (10) and (11), optimize the model by error back propagation, and iterate until the model's loss falls below a set threshold, completing training on this batch;
step (13): return to step (8) to select new batch data and repeat steps (8) to (12); after every 2000 batches, save the trained model, until 120000 batches have been trained in total, i.e. 60 models have been saved;
step (14): at test time, evaluate the 60 saved models on the target domain test set and take the best segmentation result.
The model constructed by the proposed method is an unsupervised domain-adaptive network whose overall structure is shown in FIG. 2; it comprises two sub-networks, G_te and G_st. The network G_te is trained first; then, pseudo labels for the target domain images are selected according to G_te's uncertainty measurement of the target domain predictions. After the pseudo labels are assigned to the corresponding target domain images, the semantic segmentation network G_st is trained, and the added effective supervision information improves prediction accuracy on the target domain images.
1. Network structure of the semantic segmentation networks G_te and G_st:
The semantic segmentation networks G_te and G_st share the same structure, using Deeplab-V2 as the basic architecture, composed of an encoder and a decoder; the detailed network structure is shown in FIG. 3.
The encoder uses ResNet-101 as the base network; its structural parameters are shown in Table 1. The encoder is composed of five convolutional blocks, Conv_1, Conv_2, Conv_3, Conv_4 and Conv_5; Conv_2 to Conv_5 contain 3, 4, 23 and 3 residual modules respectively, and all activation functions are ReLU.
The convolutional layer Conv_1 contains 64 7×7 filters with stride 2 and padding 3. Among the four residual blocks, Conv_2 contains one 3×3 max pooling layer and 3 residual modules; the 1×1 filter of the first residual block of Conv_3 has stride 2 and no padding; the 3×3 filter of the first residual block of Conv_4 is a dilated (hole) convolution with stride 1, dilation 2 and padding 2; the 3×3 filter of the first residual block of Conv_5 is a dilated convolution with stride 1, dilation 4 and padding 4. In the remaining residual blocks not specifically described above, all 3×3 filters are convolutions with stride 1 and padding 1, and all 1×1 filters are convolutions with stride 1 and no padding.
The decoder feeds the feature maps produced by Conv_4 and Conv_5 into the ASPP module. The final feature map output by ASPP is 1/8 of the original image size; it is restored to the original size by bilinear interpolation, and finally a CRF smooths the boundaries to obtain the final semantic segmentation result. The structure of the ASPP module is shown in FIG. 4 and its detailed parameters in Table 2.
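The 1/8 output stride follows from the layer parameters above and can be verified with the standard convolution output-size formula (a sketch; the max-pooling padding of 1 is an assumption, as the text does not state it):

```python
def conv_out(size, kernel, stride, padding, dilation=1):
    """Output spatial size of a convolution / pooling layer."""
    eff_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * padding - eff_kernel) // stride + 1

s = conv_out(321, 7, 2, 3)             # Conv_1: 7x7, stride 2, padding 3
s = conv_out(s, 3, 2, 1)               # Conv_2: 3x3 max pooling, stride 2 (padding assumed 1)
s = conv_out(s, 1, 2, 0)               # Conv_3 first block: 1x1, stride 2, no padding
s = conv_out(s, 3, 1, 2, dilation=2)   # Conv_4: dilated conv keeps the size
s = conv_out(s, 3, 1, 4, dilation=4)   # Conv_5: dilated conv keeps the size
print(s)  # 41
```

For the 321×321 training resolution used later in the document, the feature map shrinks 321 → 161 → 81 → 41 and then stays at 41 through the dilated blocks, i.e. roughly 1/8 of the input.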
2. Selection of pseudo labels based on the Gini index
For a target domain training set image x_t ∈ R^(H×W×3), the final predicted segmentation map is computed by the semantic segmentation network G_te as shown in equation (1):

P_t = P_t^(5) + β3 · P_t^(4)    (1)

where P_t^(5) is the predicted segmentation map of the target domain image x_t output by Conv_5 of the semantic segmentation network G_te, P_t^(4) is the predicted segmentation map output by Conv_4, and β3 is a hyper-parameter.

The pixel of the target domain image x_t at position (h, w) is assigned class c as its pseudo label if and only if the predicted value of that pixel for class c is the maximum and the pixel's Gini index value G_t^(h,w) (computed as in equations (7), (8) and (9)) is less than v^(c), where v^(c) is a hyper-parameter. The selection rule is given in equation (2), where P_t^(h,w) is the prediction of the pixel at position (h, w):

ŷ_t^(h,w) = c, if c = argmax_c' P_t^(h,w,c') and G_t^(h,w) < v^(c); otherwise the pixel receives no pseudo label.    (2)
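The pseudo-label selection rule can be sketched in numpy (a simplified sketch: a single scalar threshold `v` stands in for the per-class thresholds v^(c), and the renormalization of the fused map is an assumption):

```python
import numpy as np

def gini_map(prob):
    """Pixel-level Gini index map: G[h, w] = 1 - sum_c prob[h, w, c]^2."""
    return 1.0 - np.sum(prob ** 2, axis=-1)

def select_pseudo_labels(p5, p4, beta3=0.1, v=0.2, ignore=255):
    """Fuse the Conv_5 and Conv_4 maps (equation (1)), then keep only
    low-Gini (high-confidence) argmax labels as pseudo labels (equation (2))."""
    fused = p5 + beta3 * p4
    fused = fused / fused.sum(axis=-1, keepdims=True)  # renormalize (assumption)
    labels = fused.argmax(axis=-1)
    gini = gini_map(fused)
    labels[gini >= v] = ignore  # uncertain pixels get no pseudo label
    return labels, gini
```

A confident pixel such as (0.9, 0.05, 0.05) has Gini index 0.185 and is kept, while an uncertain pixel such as (0.4, 0.3, 0.3) has Gini index 0.66 and is ignored.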
3. Loss function of the semantic segmentation network G_te
The loss of the unsupervised domain adaptive network comprises the source domain segmentation loss and the uncertainty loss of the target domain prediction.
i. Source domain segmentation loss
For source domain data, the invention uses the traditional cross entropy as the loss function to compute the segmentation loss. The segmentation losses L_seg^(5)(x_s, y_s) and L_seg^(4)(x_s, y_s) are computed based on the predictions output by Conv_5 and Conv_4 respectively, and their weighted sum is the source domain segmentation loss L_seg(x_s, y_s):

L_seg(x_s, y_s) = L_seg^(5)(x_s, y_s) + β1 · L_seg^(4)(x_s, y_s)

L_seg^(k)(x_s, y_s) = -(1/(H·W)) · Σ_(h,w) Σ_(c=1..C) y_s^(h,w,c) · log P_s^(k)(h, w, c),  k ∈ {4, 5}

where x_s ∈ R^(H×W×3) is a source domain RGB image with resolution H×W; y_s ∈ R^(H×W×C) is the ground truth of the source domain image x_s, and C is the number of classes; P_s^(5) is the predicted segmentation map of x_s output by Conv_5 of the semantic segmentation network G_te; P_s^(4) is the predicted segmentation map of x_s output by Conv_4 of G_te; β1 is a hyper-parameter.
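The weighted two-layer cross entropy can be sketched in numpy (a sketch with dense softmax maps and one-hot labels; the small `eps` for numerical stability is an assumption):

```python
import numpy as np

def cross_entropy_loss(prob, onehot, eps=1e-8):
    """Mean pixel-wise cross entropy between a softmax map (H, W, C)
    and one-hot ground truth of the same shape."""
    return -np.mean(np.sum(onehot * np.log(prob + eps), axis=-1))

def source_seg_loss(p5, p4, onehot, beta1=0.1):
    """Weighted sum of the Conv_5 and Conv_4 cross-entropy losses."""
    return cross_entropy_loss(p5, onehot) + beta1 * cross_entropy_loss(p4, onehot)
```

A perfect prediction yields (near) zero loss; a uniform prediction over C classes yields (1 + β1) · log C.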
ii. Uncertainty loss of the target domain prediction
The method measures the uncertainty of the target domain prediction with the Gini index; by minimizing the Gini index, the domain adaptive network is constrained to produce high-confidence predictions for target domain images.
Pixel-level Gini index maps G_t^(5) and G_t^(4) are computed from the target domain image predictions output by Conv_5 and Conv_4 respectively. The pixel-level Gini index is computed as follows:

G_t^(5)(h, w) = 1 - Σ_(c=1..C) (P_t^(5)(h, w, c))²    (7)

G_t^(4)(h, w) = 1 - Σ_(c=1..C) (P_t^(4)(h, w, c))²    (8)

where x_t ∈ R^(H×W×3) is a target domain RGB image with resolution H×W; G_t^(5) is the Gini index map computed from the predicted segmentation map of x_t output by Conv_5 of the semantic segmentation network G_te, and G_t^(5)(h, w) is the corresponding pixel-level Gini index; G_t^(4) and G_t^(4)(h, w) are defined likewise for Conv_4; P_t^(5) and P_t^(4) are the predicted segmentation maps of x_t output by Conv_5 and Conv_4 of G_te.

The pixel-level Gini index of the target domain image x_t is then computed as:

G_t(h, w) = G_t^(5)(h, w) + β2 · G_t^(4)(h, w)    (9)

where C is the number of classes and β2 is a hyper-parameter.
The uncertainty loss of the target domain prediction is the mean pixel-level Gini index over the image:

L_Gini(x_t) = (1/(H·W)) · Σ_(h,w) G_t(h, w)
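Equations (7)-(9) and the uncertainty loss reduce to a few numpy lines (a sketch assuming both prediction maps are softmax-normalized):

```python
import numpy as np

def gini_uncertainty_loss(p5, p4, beta2=0.2):
    """Mean pixel-level Gini index of the fused Conv_5/Conv_4 predictions.

    p5, p4: softmax maps of shape (H, W, C) from the last two layers.
    """
    g5 = 1.0 - np.sum(p5 ** 2, axis=-1)   # equation (7)
    g4 = 1.0 - np.sum(p4 ** 2, axis=-1)   # equation (8)
    return np.mean(g5 + beta2 * g4)       # equation (9) averaged over pixels
```

One-hot (fully confident) predictions give zero loss; a uniform prediction over C classes gives (1 + β2) · (1 − 1/C), the maximum, so minimizing this loss pushes predictions toward high confidence.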
The total loss L(x_s, x_t) of the semantic segmentation network G_te is:

L(x_s, x_t) = L_seg(x_s, y_s) + μ1 · L_Gini(x_t)    (10)

where μ1 is a hyper-parameter.
4. Loss function of the semantic segmentation network G_st
The loss of the semantic segmentation network G_st includes the source domain segmentation loss, the uncertainty loss of the target domain prediction, and the pseudo-label segmentation loss of the target domain.
The pseudo-label segmentation loss of the target domain uses the traditional cross entropy loss function. The segmentation losses are computed based on the predictions output by Conv_5 and Conv_4 respectively, and their weighted sum is the target domain segmentation loss L_seg(x_t, ŷ_t):

L_seg(x_t, ŷ_t) = L_seg^(5)(x_t, ŷ_t) + β2 · L_seg^(4)(x_t, ŷ_t)

where x_t ∈ R^(H×W×3) is a target domain RGB image with resolution H×W; ŷ_t is the pseudo label of the target domain image x_t, and C is the number of classes; the predicted segmentation maps of x_t output by Conv_5 and Conv_4 of the semantic segmentation network G_st are used in the two terms; β2 is a hyper-parameter.
The total loss L(x_s, x_t) of the semantic segmentation network G_st is:

L(x_s, x_t) = L_seg(x_s, y_s) + μ2 · L_seg(x_t, ŷ_t) + μ3 · L_Gini(x_t)

where μ2 and μ3 are hyper-parameters.
Examples
1. Experimental data set
The proposed method is evaluated on the common unsupervised domain-adaptive benchmark GTA5→Cityscapes, with the synthetic data set GTA5 as the source domain and the real data set Cityscapes as the target domain. Models are evaluated on the Cityscapes validation set.
GTA5: the synthetic data set GTA5 contains 24966 synthetic images with a resolution of 1914×1052 and corresponding ground truth. The images are collected from an urban-scenery video game modeled on the city of Los Angeles. The automatically generated ground truth contains 33 classes. Methods evaluated on GTA5→Cityscapes generally consider only the 19 classes compatible with the Cityscapes data set, and the present invention is no exception.
Cityscaps: as a dataset collected from the real world, cityscaps provides 3975 images with fine segmentation annotations. The training set contained 2975 images and the validation set contained 500 images.
2. Evaluation index of experiment
The present invention uses Intersection-over-Union (IoU) to evaluate semantic segmentation performance. IoU values lie in [0, 1]; the larger the value, the better the segmentation. IoU is defined as follows:
IoU = TP / (TP + FP + FN)
where TP, FP and FN are the numbers of true positive, false positive and false negative pixels, respectively. The mIoU in Tables 3 and 4 is the mean IoU over the 19 classes.
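The IoU definition above translates directly to code (a sketch over flat label arrays; averaging with `nanmean` so classes absent from both prediction and ground truth are skipped is an assumption):

```python
import numpy as np

def iou_per_class(pred, gt, num_classes):
    """IoU = TP / (TP + FP + FN) for each class, from flat label arrays."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))   # true positives
        fp = np.sum((pred == c) & (gt != c))   # false positives
        fn = np.sum((pred != c) & (gt == c))   # false negatives
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious

def mean_iou(pred, gt, num_classes):
    """mIoU: mean of per-class IoU values, ignoring absent classes."""
    return np.nanmean(iou_per_class(pred, gt, num_classes))
```

For example, pred = [0, 0, 1, 1] against gt = [0, 1, 1, 1] gives IoU 1/2 for class 0, 2/3 for class 1, and mIoU 7/12.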
3. Network training
The batch size of the unsupervised domain adaptive network is 2. The resolution of the source domain input images is 1280×720 and that of the target domain input images is 1024×512; both are resized to 321×321. During training, the labels are downsampled by a factor of 8 and the loss is computed against the network's output prediction maps; during testing, the network's output prediction maps are upsampled by a factor of 8 and compared with the labels. β1 and β3 are set to 0.1; β2 is set to 0.2; μ1, μ2 and μ3 are set to 0.01. The ResNet-101 encoders of the semantic segmentation networks G_te and G_st are pre-trained on ImageNet. We use the SGD optimizer to train G_te and G_st with an initial learning rate of 2.5×10^-4.
4. Results of the experiment
The invention is evaluated on the common unsupervised domain-adaptive benchmark GTA5→Cityscapes. Tables 3 and 4 show the results of applying the Gini-index-guided self-supervision method on top of the MinEnt method (Vu et al., which directly minimizes entropy as the uncertainty measure) and the AdvEnt method (Vu et al., which uses entropy adversarially), respectively. The mIoU improves significantly when the target domain pseudo labels are added on top of these base methods, and the segmentation quality also surpasses the common SSL and ESL pseudo-label self-training methods.
Table 1: encoder structure parameters
Table 2: decoder structure parameter
Table 3: comparison of improved experimental results (MinEnt baseline)
Table 4: comparison of improved experimental results (AdvEnt baseline)
Claims (2)
1. A self-training-based semantic segmentation method guided by the Gini index, characterized by comprising: using a synthetic data set as the source domain and a real data set as the target domain; during training, inputting source domain and target domain images into an inter-domain adaptive network for training; after that training is finished, segmenting the target domain images and inputting them into the intra-domain adaptive network for training to obtain the optimal segmentation result;
the method comprises the following specific steps:
step (1), randomly take one RGB image from the source domain data set and one from the target domain data set as a batch, and input them into the semantic segmentation network Gte;
step (2), calculate the cross entropy loss of the source domain image based on the output prediction maps of the last two network layers and the ground truth, and take the weighted sum of the source domain losses of the last two layers;
step (3), calculate the Gini index and the uncertainty loss of the output prediction maps of the last two layers for the target domain image, and take the weighted sum of the target domain losses of the last two layers;
step (4), sum the weighted losses calculated in steps (2) and (3), optimize the model by error back propagation, and iterate until the model loss falls below a set threshold, completing the training of this batch;
step (5), return to step (1) to select a new batch, and repeat steps (1) to (5) until 2000 batches have been trained, then save the trained model;
step (6), repeat steps (1) to (5) until 120000 batches have been trained, i.e. 60 models are saved;
step (7), test the 60 saved models on the target domain validation set; use the most accurate model to compute the output predictions of the target domain training set images and the corresponding Gini indexes, and assign pseudo labels to the target domain training set images based on the Gini indexes;
step (8), randomly take one RGB image from the source domain data set and one from the target domain data set as a batch input, and train the semantic segmentation network Gst;
step (9), calculate the cross entropy loss of the source domain image based on the output prediction maps of the last two layers and the ground truth, and take the weighted sum of the source domain losses of the last two layers;
step (10), calculate the cross entropy losses between the output prediction maps of the last two layers of the target domain image and the pseudo label, and take the weighted sum of the target domain losses of the last two layers;
step (11), calculate the Gini index and the uncertainty loss of the output prediction maps of the last two layers for the target domain image, and take the weighted sum of the target domain losses of the last two layers;
step (12), sum the weighted losses of steps (9), (10) and (11), optimize the model by error back propagation, and iterate until the model loss falls below a set threshold, completing the training of this batch;
step (13), return to step (8) to select a new batch, and repeat steps (8) to (12) until 2000 batches have been trained, then save the trained model; 120000 batches are trained in total, i.e. 60 models are saved;
step (14), during testing, test the 60 saved models on the target domain test set to obtain the final segmentation result.
2. The Gini-index-guided self-training-based semantic segmentation method of claim 1, characterized in that: the constructed model is an unsupervised domain adaptive network whose overall structure comprises two sub-networks Gte and Gst; the network Gte is trained first; then pseudo labels for the target domain images are selected according to network Gte's uncertainty measurement of the target domain predictions, and after the pseudo labels are assigned to the corresponding target domain images, the semantic segmentation network Gst is trained; adding this effective supervision information improves the prediction accuracy on the target domain images.
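The per-pixel Gini computations underlying steps (3), (7) and (11) can be sketched as follows: the Gini impurity of a softmax prediction, the uncertainty loss as its mean over pixels, and pseudo-label assignment that keeps only low-Gini (confident) pixels. This is a minimal illustrative sketch, not the patented implementation; the function names, the threshold value 0.4, and the ignore label 255 are assumptions.

```python
def gini(probs):
    """Gini impurity of one per-class probability vector: 1 - sum_c p_c^2.
    It is 0 for a one-hot (fully confident) prediction and grows with uncertainty."""
    return 1.0 - sum(p * p for p in probs)

def uncertainty_loss(prob_map):
    """Mean Gini impurity over all pixels; minimizing it sharpens predictions."""
    ginis = [gini(p) for p in prob_map]
    return sum(ginis) / len(ginis)

def assign_pseudo_labels(prob_map, threshold=0.4, ignore_label=255):
    """Argmax label where the Gini impurity is below the threshold;
    uncertain pixels get the ignore label and are excluded from the loss."""
    labels = []
    for p in prob_map:
        if gini(p) < threshold:
            labels.append(max(range(len(p)), key=lambda c: p[c]))
        else:
            labels.append(ignore_label)
    return labels

# Example: two confident pixels and one uncertain pixel (3 classes).
preds = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.34, 0.33, 0.33]]
print(assign_pseudo_labels(preds))  # → [0, 1, 255]
```

In a real pipeline these maps would be tensors, but the scalar logic is the same: pixels whose Gini impurity exceeds the threshold contribute no cross-entropy term when the pseudo labels are used in steps (10) to (12).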
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110318561.6A CN113095328B (en) | 2021-03-25 | 2021-03-25 | Semantic segmentation method guided by Gini index and based on self-training |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113095328A true CN113095328A (en) | 2021-07-09 |
CN113095328B CN113095328B (en) | 2024-08-23 |
Family
ID=76669614
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110318561.6A Active CN113095328B (en) | 2021-03-25 | 2021-03-25 | Semantic segmentation method guided by Gini index and based on self-training |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113095328B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114445413A (en) * | 2022-04-07 | 2022-05-06 | 宁波康达凯能医疗科技有限公司 | Inter-frame image semantic segmentation method and system based on domain self-adaptation |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222690A (en) * | 2019-04-29 | 2019-09-10 | Zhejiang University | An unsupervised domain adaptation semantic segmentation method based on maximum squares loss |
CN110322446A (en) * | 2019-07-01 | 2019-10-11 | Huazhong University of Science and Technology | A domain adaptive semantic segmentation method based on similarity space alignment |
CN112116593A (en) * | 2020-08-06 | 2020-12-22 | Beijing University of Technology | Domain adaptive semantic segmentation method based on Gini index |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110930397B (en) | Magnetic resonance image segmentation method and device, terminal equipment and storage medium | |
CN106228185B (en) | A kind of general image classifying and identifying system neural network based and method | |
CN112685597B (en) | Weak supervision video clip retrieval method and system based on erasure mechanism | |
CN110852273A (en) | Behavior identification method based on reinforcement learning attention mechanism | |
CN111462191B (en) | Non-local filter unsupervised optical flow estimation method based on deep learning | |
CN115147598B (en) | Target detection segmentation method and device, intelligent terminal and storage medium | |
CN112634296A (en) | RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism | |
CN112116593A (en) | Domain self-adaptive semantic segmentation method based on Gini index | |
CN111144483A (en) | Image feature point filtering method and terminal | |
CN113095254B (en) | Method and system for positioning key points of human body part | |
CN114692732B (en) | Method, system, device and storage medium for updating online label | |
CN115222998B (en) | Image classification method | |
CN112801104A (en) | Image pixel level pseudo label determination method and system based on semantic segmentation | |
CN114140469A (en) | Depth hierarchical image semantic segmentation method based on multilayer attention | |
CN114266894A (en) | Image segmentation method and device, electronic equipment and storage medium | |
Xu et al. | AutoSegNet: An automated neural network for image segmentation | |
CN115546171A (en) | Shadow detection method and device based on attention shadow boundary and feature correction | |
CN116208399A (en) | Network malicious behavior detection method and device based on metagraph | |
CN113947022B (en) | Near-end strategy optimization method based on model | |
CN113095328A (en) | Self-training-based semantic segmentation method guided by Gini index | |
CN117671261A (en) | Passive domain noise perception domain self-adaptive segmentation method for remote sensing image | |
CN117437423A (en) | Weak supervision medical image segmentation method and device based on SAM collaborative learning and cross-layer feature aggregation enhancement | |
CN111931841A (en) | Deep learning-based tree processing method, terminal, chip and storage medium | |
TWI781000B (en) | Machine learning device and method | |
CN115424012A (en) | Lightweight image semantic segmentation method based on context information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||