CN113095328A - Self-training-based semantic segmentation method guided by Gini index - Google Patents

Self-training-based semantic segmentation method guided by Gini index

Info

Publication number
CN113095328A
Authority
CN
China
Prior art keywords
target domain
training
network
loss
layers
Prior art date
Legal status
Granted
Application number
CN202110318561.6A
Other languages
Chinese (zh)
Other versions
CN113095328B (en)
Inventor
王立春
胡玉杰
王少帆
孔德慧
李敬华
尹宝才
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202110318561.6A
Publication of CN113095328A
Application granted
Publication of CN113095328B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a self-training-based semantic segmentation method guided by the Gini index.

Description

Self-training-based semantic segmentation method guided by Gini index
Technical Field
The invention relates to a self-training-based domain-adaptive semantic labeling method which, unlike traditional methods, selects pseudo labels on the basis of the Gini index. It belongs to the field of pattern recognition and computer vision and can be applied to automatic driving and robot visual navigation.
Background
The self-training-based domain-adaptive semantic segmentation method uses two types of data: labeled source domain data and unlabeled target domain data. Labels serve as the supervision information in the source domain and pseudo labels serve as the supervision information in the target domain; the network is trained on this supervision information so as to learn a model with good semantic labeling performance on target domain images. Accurate unsupervised domain-adaptive semantic segmentation is important for applications, such as automatic driving and robot navigation, in which the data available in the model learning stage differ markedly from the data encountered in the model deployment stage.
The main idea of self-training-based unsupervised domain adaptation is to create pseudo labels and use them as the ground-truth labels of target domain images in the training phase. The biggest problem that self-training-based unsupervised domain adaptation has to solve is how to obtain correct pseudo labels: wrong pseudo labels can ultimately cause "confirmation bias", i.e. wrong pseudo labels act as noise when used as supervision information and make the trained model perform worse.
To obtain pseudo labels that are as correct as possible, existing strategies include selecting pseudo labels based on the predictions output by the network, and selecting pseudo labels based on a measure of the uncertainty of the network's output predictions. In the first strategy a threshold is set in advance, and each pixel whose maximum softmax score exceeds the threshold is given the class label corresponding to that maximum prediction score as its pseudo label. This approach generates wrong labels in the early iterations, but as the number of iterations increases the performance of the classifier on the test data improves and so does the accuracy of the labels. Its problem is that selecting pseudo labels where the model is highly uncertain about a pixel's prediction (e.g. boundary pixels) is error prone: a softmax score above the threshold does not mean that the corresponding predicted label is correct. To address this, researchers have proposed measuring the uncertainty of the network's output predictions and selecting pseudo labels accordingly; a typical approach of this type computes the entropy of the output predictions to measure their uncertainty and selects pseudo labels based on the entropy, improving the reliability of the pseudo labels. However, in entropy-based gradient back-propagation, optimization is biased toward the easily classified categories, i.e. the optimization weight of hard-to-classify categories is smaller than that of easy-to-classify categories, so the pixel accuracy of hard-to-classify categories remains low.
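For concreteness, the first (softmax-threshold) strategy can be sketched as follows; this is an illustrative toy example with an arbitrary threshold and ignore value, not code from any cited work:

```python
import torch
import torch.nn.functional as F

# Prior strategy: keep the argmax class as a pseudo label only where the
# maximum softmax score exceeds a preset threshold; other pixels are ignored.
scores = torch.randn(19, 4, 4)                 # raw class scores for a 4x4 pixel grid
probs = F.softmax(scores, dim=0)
max_prob, labels = probs.max(dim=0)
threshold = 0.9                                # preset confidence threshold (arbitrary)
pseudo = torch.where(max_prob > threshold, labels,
                     torch.full_like(labels, 255))   # 255 marks ignored pixels
print(pseudo)
```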
Disclosure of Invention
To effectively improve the accuracy of self-training-based unsupervised domain-adaptive semantic segmentation, the invention proposes to measure the uncertainty of the output predictions with the Gini index and to use the Gini index to guide the selection of pseudo labels: a pixel whose Gini index is smaller than a set threshold is given the class label corresponding to its maximum softmax score as its pseudo label. In FIG. 1 the abscissa is the output prediction probability and the ordinate is the gradient used during back-propagation when the uncertainty measure (entropy or Gini index) is minimized. Comparing the back-propagated gradients over the prediction-probability intervals [0.75, 0.9] and [0.9, 1] in FIG. 1: when the uncertainty of the output prediction is computed from entropy, the back-propagated gradient over [0.9, 1] is far larger than that over [0.75, 0.9]; when the uncertainty is computed with the Gini index, the back-propagated gradients over [0.9, 1] and [0.75, 0.9] differ little. In other words, the gradient computed from the Gini index does not over-emphasize points in the [0.9, 1] interval during back-propagation, and the model gives relatively larger update weight to classes whose prediction probability lies in [0.75, 0.9]. Published results indicate that the IoU value of a class is positively correlated with its prediction probability. Because the Gini index, compared with entropy, pays more attention during training to points whose output prediction probability lies in [0.75, 0.9], the IoU of classes predicted in that interval can be improved. Pseudo labels selected in the interval [0.9, 1] are already relatively accurate, so ensuring the accuracy of class predictions in [0.75, 0.9] ensures the correctness of the pseudo labels, which helps to introduce more correct supervision information and avoids introducing noise.
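As a minimal numerical check of this difference (a two-class simplification for illustration only, not part of the formal derivation above), the magnitude of the entropy gradient with respect to the top-class probability grows without bound as the probability approaches 1, whereas the Gini-index gradient stays bounded:

```python
import numpy as np

# Two-class simplification: a pixel predicted with probability p for the top
# class and 1 - p for the other class. Entropy and Gini index of that prediction:
#   H(p)    = -p*log(p) - (1-p)*log(1-p)
#   Gini(p) = 1 - p**2 - (1-p)**2
# Gradients with respect to p (what back-propagation sees when the measure is minimized):
#   dH/dp    = -log(p / (1 - p))   -> unbounded as p -> 1
#   dGini/dp = -2 * (2*p - 1)      -> bounded by 2
for p in [0.75, 0.80, 0.90, 0.95, 0.99]:
    grad_entropy = -np.log(p / (1 - p))
    grad_gini = -2 * (2 * p - 1)
    print(f"p={p:.2f}  |dH/dp|={abs(grad_entropy):5.2f}  |dGini/dp|={abs(grad_gini):4.2f}")
# |dH/dp| roughly triples between p = 0.80 and p = 0.99, while |dGini/dp| changes
# little, so Gini-based optimization does not over-weight pixels that are already
# predicted with very high probability.
```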
In the invention, the source domain refers to a synthetic data set and the target domain refers to a real data set. The network comprises two sub-networks with the same structure but different parameters, called G_te and G_st respectively. During training, the source domain and target domain images are input into the network G_te for training. After training is complete, G_te is used to perform semantic segmentation on the target domain images: the Gini index is computed from the target domain predictions output by G_te and is used to guide the acquisition of target domain pseudo labels. The obtained pseudo labels are used to train G_st.
The method comprises the following specific steps:
Step (1): one RGB image is randomly taken from the source domain data set and one from the target domain data set as a batch and input into the semantic segmentation network G_te;
Step (2): the cross-entropy loss of the source domain image is computed based on the output prediction maps of the last two layers of the network and the ground truth, and the losses of the last two layers are weighted and summed;
Step (3): the Gini index and the uncertainty loss of the output prediction maps of the last two layers are computed for the target domain image, and the losses of the last two layers are weighted and summed;
Step (4): the weighted loss computed in step (2) and the weighted loss computed in step (3) are summed, the model is optimized by error back-propagation, and the iteration continues until the model loss is smaller than a set threshold, completing training on this batch of data;
Step (5): return to step (1) to select new batch data, and repeat steps (1) to (4) until 2000 batches have been trained, then save the trained model;
Step (6): repeat steps (1) to (5) until 120000 batches of data have been trained, i.e. 60 models are saved;
Step (7): test the 60 saved models on the target domain validation set, use the model with the best accuracy on the validation set to compute the output predictions of the target domain training set images, compute the Gini index corresponding to these output predictions, and assign pseudo labels to the target domain training set images based on the Gini index;
Step (8): one RGB image is randomly taken from the source domain data set and one from the target domain data set as a batch and used as input to train the semantic segmentation network G_st;
Step (9): the cross-entropy loss of the source domain image is computed based on the output prediction maps of the last two layers and the ground truth, and the losses of the last two layers are weighted and summed;
Step (10): the cross-entropy losses between the output prediction maps of the last two layers and the pseudo label of the target domain image are computed, and the losses of the last two layers are weighted and summed;
Step (11): the Gini index and the uncertainty loss of the output prediction maps of the last two layers are computed for the target domain image, and the losses of the last two layers are weighted and summed;
Step (12): the weighted losses of step (9), step (10) and step (11) are summed, the model is optimized by error back-propagation, and the iteration continues until the model loss is smaller than a set threshold, completing training on this batch of data;
Step (14): return to step (8) to select new batch data, and repeat steps (8) to (12) until 2000 batches have been trained, then save the trained model; 120000 batches are trained in total, i.e. 60 models are saved;
Step (15): at test time, the 60 saved models are tested on the target domain test set and the best segmentation result is taken.
Compared with the prior art, the method has the advantage that target domain pseudo labels selected on the basis of Gini-index-measured uncertainty are more reliable, which improves the semantic annotation accuracy on the target domain.
Drawings
FIG. 1: gradient value vs. prediction probability.
FIG. 2: structure of the self-supervised domain-adaptive semantic segmentation network.
FIG. 3: structure of the semantic segmentation networks (G_te and G_st).
FIG. 4: structure of the ASPP module.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
The invention uses the synthetic data set as the source domain and the real data set as the target domain. The network comprises two sub-networks with the same structure but different parameters, called G_te and G_st respectively. During training, the source domain and target domain images are input into the semantic segmentation network G_te for training. After training is complete, the semantic segmentation network G_te computes the predictions of the target domain training images, the Gini index of these predictions is computed, and the Gini index is used to guide the acquisition of target domain pseudo labels. The target domain image pseudo labels are then used to train the semantic segmentation network G_st.
The method comprises the following specific steps:
Step (1): one RGB image is randomly taken from the source domain data set and one from the target domain data set as a batch and input into the semantic segmentation network G_te;
Step (2): the cross-entropy loss of the source domain image is computed based on the output prediction maps of the last two layers of the network and the ground truth, and the losses of the last two layers are weighted and summed;
Step (3): the Gini index and the uncertainty loss of the output prediction maps of the last two layers are computed for the target domain image, and the losses of the last two layers are weighted and summed;
Step (4): the weighted loss computed in step (2) and the weighted loss computed in step (3) are summed, the model is optimized by error back-propagation, and the iteration continues until the model loss is smaller than a set threshold, completing training on this batch of data;
Step (5): return to step (1) to select new batch data, and repeat steps (1) to (4) until 2000 batches have been trained, then save the trained model;
Step (6): repeat steps (1) to (5) until 120000 batches of data have been trained, i.e. 60 models are saved;
Step (7): test the 60 saved models on the target domain validation set, use the model with the best accuracy to compute the output predictions of the target domain training set images, compute the Gini index corresponding to these output predictions, and assign pseudo labels to the target domain training set images based on the Gini index;
Step (8): one RGB image is randomly taken from the source domain data set and one from the target domain data set as a batch and used as input to train the semantic segmentation network G_st;
Step (9): the cross-entropy loss of the source domain image is computed based on the output prediction maps of the last two layers and the ground truth, and the losses of the last two layers are weighted and summed;
Step (10): the cross-entropy losses between the output prediction maps of the last two layers and the pseudo label of the target domain image are computed, and the losses of the last two layers are weighted and summed;
Step (11): the Gini index and the uncertainty loss of the output prediction maps of the last two layers are computed for the target domain image, and the losses of the last two layers are weighted and summed;
Step (12): the weighted losses of step (9), step (10) and step (11) are summed, the model is optimized by error back-propagation, and the iteration continues until the model loss is smaller than a set threshold, completing training on this batch of data;
Step (14): return to step (8) to select new batch data, and repeat steps (8) to (12) until 2000 batches have been trained, then save the trained model; 120000 batches are trained in total, i.e. 60 models are saved. Step (15): at test time, the 60 saved models are tested on the target domain test set and the best segmentation result is taken.
The model constructed by the proposed method is an unsupervised domain-adaptive network; the overall structure of the network is shown in FIG. 2 and comprises two sub-networks G_te and G_st. The network G_te is trained first; then pseudo labels of the target domain images are selected according to the uncertainty measurements of the target domain predictions made by G_te, and after the pseudo labels are assigned to the corresponding target domain images the semantic segmentation network G_st is trained, so that the prediction accuracy on the target domain images is improved by adding effective supervision information.
1. Network structure of the semantic segmentation networks G_te and G_st:
The semantic segmentation networks G_te and G_st have the same network structure: DeepLab-V2 is used as the basic architecture, which consists of an encoder and a decoder; the detailed network structure is shown in FIG. 3.
The encoder uses ResNet-101 as the base network; its structural parameters are listed in Table 1. The encoder consists of the convolutional layer Conv_1 and four blocks Conv_2, Conv_3, Conv_4 and Conv_5, which contain 3, 4, 23 and 3 residual modules respectively; all activation functions are ReLU.
The convolutional layer Conv_1 contains 64 7×7 filters with stride 2 and padding 3. Among the four blocks, Conv_2 contains one 3×3 max-pooling layer and 3 residual modules; the 1×1 filter of the first residual module of Conv_3 has stride 2 and no padding; the 3×3 filter of the first residual module of Conv_4 is a hole (dilated) convolution with stride 1, dilation 2 and padding 2; the 3×3 filter of the first residual module of Conv_5 is a hole convolution with stride 1, dilation 4 and padding 4. In the remaining residual modules not specifically described above, all 3×3 filters are convolutions with stride 1 and padding 1, and all 1×1 filters are convolutions with stride 1 and no padding.
The decoder feeds the feature maps produced by Conv_4 and Conv_5 into the ASPP module; the final feature map output by the ASPP is 1/8 the size of the original image and is restored to the original image size by bilinear interpolation; finally a CRF is used to smooth the boundaries and obtain the final semantic segmentation result. The structure of the ASPP module is shown in FIG. 4 and the detailed parameters are listed in Table 2.
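The detailed ASPP parameters are given only as an image (Table 2) in the original filing; for reference, the sketch below shows a generic DeepLab-V2-style ASPP head, where the dilation rates 6, 12, 18 and 24 and the summation of the parallel branches are the standard DeepLab-V2 choices and are assumed here rather than taken from Table 2:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """DeepLab-V2-style ASPP head: parallel 3x3 dilated convolutions whose
    per-class score maps are summed (sketch; exact parameters may differ
    from Table 2 of the patent)."""
    def __init__(self, in_channels: int, num_classes: int, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_channels, num_classes, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.branches[0](x)
        for branch in self.branches[1:]:
            out = out + branch(x)
        return out  # per-class score map at the input feature resolution

# Example: Conv_5 of ResNet-101 outputs 2048 channels; 19 classes (Cityscapes).
aspp = ASPP(in_channels=2048, num_classes=19)
scores = aspp(torch.randn(1, 2048, 41, 41))   # roughly 1/8 of a 321x321 crop
print(scores.shape)                           # torch.Size([1, 19, 41, 41])
```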
2. Selection of pseudo labels based on the Gini index
For a target domain training set image, the final predicted segmentation map $P_t$ of the target domain image $x_t \in \mathbb{R}^{H\times W\times 3}$ is computed by the semantic segmentation network G_te as shown in equation (1), and the pseudo labels are selected from this prediction:

$$P_t = P_t^{(5)} + \beta_3\, P_t^{(4)} \qquad (1)$$

where $P_t^{(5)}$ is the predicted segmentation map of the target domain image $x_t$ output by Conv_5 of G_te, $P_t^{(4)}$ is the predicted segmentation map of $x_t$ output by Conv_4 of G_te, and $\beta_3$ is a hyper-parameter.

The pixel at position $(h, w)$ of the target domain image $x_t$ is assigned class $c$ as its pseudo label if and only if the predicted value for class $c$ is the maximum over all classes and the corresponding pixel-level Gini index $G_t^{(h,w)}$ (computed by equations (7), (8) and (9)) is smaller than $v^{(c)}$, where $v^{(c)}$ is a hyper-parameter. The assignment is given by equation (2), where $P_t^{(h,w,c)}$ is the prediction for class $c$ at position $(h, w)$ of $x_t$:

$$\hat{y}_t^{(h,w)} = \begin{cases} \arg\max_{c} P_t^{(h,w,c)}, & \text{if } G_t^{(h,w)} < v^{(c)} \\ \text{ignored}, & \text{otherwise} \end{cases} \qquad (2)$$
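A minimal PyTorch-style sketch of this selection rule follows; the tensor names, the per-class thresholds v, the value used for β_2 and the use of 255 for ignored pixels are illustrative assumptions rather than values fixed by the description above:

```python
import torch
import torch.nn.functional as F

def select_pseudo_labels(pred_conv5: torch.Tensor,
                         pred_conv4: torch.Tensor,
                         v: torch.Tensor,
                         beta3: float = 0.1,
                         beta2: float = 0.2,
                         ignore_index: int = 255) -> torch.Tensor:
    """Assign Gini-guided pseudo labels to one target image.

    pred_conv5, pred_conv4: raw score maps of shape (C, H, W) from the last two
    outputs of G_te; v: per-class Gini thresholds of shape (C,).
    Returns an (H, W) label map where unreliable pixels are set to ignore_index.
    """
    p5 = F.softmax(pred_conv5, dim=0)            # class probabilities (Conv_5)
    p4 = F.softmax(pred_conv4, dim=0)            # class probabilities (Conv_4)
    p = p5 + beta3 * p4                          # combined prediction, as in eq. (1)

    gini5 = 1.0 - (p5 ** 2).sum(dim=0)           # pixel-level Gini index, Conv_5
    gini4 = 1.0 - (p4 ** 2).sum(dim=0)           # pixel-level Gini index, Conv_4
    gini = gini5 + beta2 * gini4                 # combined pixel-level Gini index

    labels = p.argmax(dim=0)                     # class with the maximum predicted value
    reliable = gini < v[labels]                  # compare with the class-wise threshold
    return torch.where(reliable, labels,
                       torch.full_like(labels, ignore_index))

# Toy usage: 19 classes, a 4x4 "image", hypothetical thresholds v^(c) = 0.3
scores5 = torch.randn(19, 4, 4)
scores4 = torch.randn(19, 4, 4)
thresholds = torch.full((19,), 0.3)
print(select_pseudo_labels(scores5, scores4, thresholds))
```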
3. Loss function of the semantic segmentation network G_te
The loss of the unsupervised domain adaptive network comprises the source domain segmentation loss and the uncertainty loss of the target domain prediction.
i. Source domain segmentation loss
For the source domain data, the invention uses the standard cross-entropy as the loss function. Segmentation losses $L_{seg}^{(5)}(x_s, y_s)$ and $L_{seg}^{(4)}(x_s, y_s)$ are computed from the predictions output by Conv_5 and Conv_4 respectively, and their weighted sum is the source domain segmentation loss $L_{seg}(x_s, y_s)$:

$$L_{seg}^{(5)}(x_s, y_s) = -\sum_{h,w}\sum_{c=1}^{C} y_s^{(h,w,c)} \log P_s^{(5)(h,w,c)} \qquad (3)$$

$$L_{seg}^{(4)}(x_s, y_s) = -\sum_{h,w}\sum_{c=1}^{C} y_s^{(h,w,c)} \log P_s^{(4)(h,w,c)} \qquad (4)$$

$$L_{seg}(x_s, y_s) = L_{seg}^{(5)}(x_s, y_s) + \beta_1\, L_{seg}^{(4)}(x_s, y_s) \qquad (5)$$

where $x_s \in \mathbb{R}^{H\times W\times 3}$ is a source domain RGB image with resolution $H \times W$; $y_s \in \mathbb{R}^{H\times W\times C}$ is the ground-truth label of the source domain image $x_s$ and $C$ is the number of classes; $P_s^{(5)}$ is the predicted segmentation map of $x_s$ output by Conv_5 of G_te; $P_s^{(4)}$ is the predicted segmentation map of $x_s$ output by Conv_4 of G_te; $\beta_1$ is a hyper-parameter.
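A compact PyTorch sketch of this weighted two-output cross-entropy (variable names are illustrative; the reduction over pixels follows PyTorch's default mean rather than the sum written in equations (3) and (4)):

```python
import torch
import torch.nn.functional as F

def source_seg_loss(score_conv5: torch.Tensor,
                    score_conv4: torch.Tensor,
                    y_s: torch.Tensor,
                    beta1: float = 0.1) -> torch.Tensor:
    """Cross-entropy on both outputs, with the Conv_4 term weighted by beta1.

    score_conv5, score_conv4: (N, C, H, W) raw score maps; y_s: (N, H, W) class labels.
    """
    loss5 = F.cross_entropy(score_conv5, y_s)   # loss on the Conv_5 prediction
    loss4 = F.cross_entropy(score_conv4, y_s)   # loss on the Conv_4 prediction
    return loss5 + beta1 * loss4

# Toy check: batch of 2, 19 classes, 8x8 predictions
scores5 = torch.randn(2, 19, 8, 8)
scores4 = torch.randn(2, 19, 8, 8)
labels = torch.randint(0, 19, (2, 8, 8))
print(source_seg_loss(scores5, scores4, labels))
```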
ii. Uncertainty loss of the target domain prediction
The method measures the uncertainty of the target domain prediction with the Gini index; by minimizing the Gini index, the inter-domain adaptive network is constrained to produce high-confidence predictions for target domain images.
Pixel-level Gini indexes $G_t^{(5)}$ and $G_t^{(4)}$ are computed from the target domain image predictions output by Conv_5 and Conv_4 respectively:

$$G_t^{(5)(h,w)} = 1 - \sum_{c=1}^{C} \left(P_t^{(5)(h,w,c)}\right)^2 \qquad (7)$$

$$G_t^{(4)(h,w)} = 1 - \sum_{c=1}^{C} \left(P_t^{(4)(h,w,c)}\right)^2 \qquad (8)$$

where $x_t \in \mathbb{R}^{H\times W\times 3}$ is a target domain RGB image with resolution $H \times W$; $G_t^{(5)}$ is the Gini index map computed from the predicted segmentation map of $x_t$ output by Conv_5 of G_te and $G_t^{(5)(h,w)}$ is the corresponding pixel-level Gini index; $G_t^{(4)}$ is the Gini index map computed from the predicted segmentation map of $x_t$ output by Conv_4 of G_te and $G_t^{(4)(h,w)}$ is the corresponding pixel-level Gini index; $P_t^{(5)}$ and $P_t^{(4)}$ are the predicted segmentation maps of $x_t$ output by Conv_5 and Conv_4 of G_te.

The pixel-level Gini index of the target domain image $x_t$ is their weighted combination:

$$G_t^{(h,w)} = G_t^{(5)(h,w)} + \beta_2\, G_t^{(4)(h,w)} \qquad (9)$$

where $C$ is the number of classes and $\beta_2$ is a hyper-parameter.

The uncertainty loss of the target domain prediction is:

$$L_{Gini}(x_t) = \sum_{h,w} G_t^{(h,w)}$$
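A corresponding PyTorch sketch of the Gini uncertainty loss (illustrative; averaging over pixels instead of summing is an implementation assumption):

```python
import torch
import torch.nn.functional as F

def gini_loss(score_conv5: torch.Tensor,
              score_conv4: torch.Tensor,
              beta2: float = 0.2) -> torch.Tensor:
    """Gini uncertainty loss on a batch of target images.

    score_conv5, score_conv4: (N, C, H, W) raw score maps for the target image;
    the loss is the mean pixel-level Gini index, with the Conv_4 term weighted by beta2.
    """
    p5 = F.softmax(score_conv5, dim=1)
    p4 = F.softmax(score_conv4, dim=1)
    gini5 = 1.0 - (p5 ** 2).sum(dim=1)     # (N, H, W) pixel-level Gini index, Conv_5
    gini4 = 1.0 - (p4 ** 2).sum(dim=1)     # (N, H, W) pixel-level Gini index, Conv_4
    return (gini5 + beta2 * gini4).mean()

# Minimizing this loss pushes target predictions toward low Gini index,
# i.e. toward confident (low-uncertainty) class assignments.
scores5 = torch.randn(2, 19, 8, 8)
scores4 = torch.randn(2, 19, 8, 8)
print(gini_loss(scores5, scores4))
```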
The total loss $L(x_s, x_t)$ of the semantic segmentation network G_te is:

$$L(x_s, x_t) = L_{seg}(x_s, y_s) + \mu_1\, L_{Gini}(x_t) \qquad (10)$$

where $\mu_1$ is a hyper-parameter.
4. Loss function of the semantic segmentation network G_st
The loss of the semantic segmentation network G_st includes the source domain segmentation loss, the uncertainty loss of the target domain prediction, and the pseudo-label segmentation loss of the target domain.
The pseudo-label segmentation loss of the target domain is also computed with the standard cross-entropy loss. Segmentation losses $\hat{L}_{seg}^{(5)}(x_t, \hat{y}_t)$ and $\hat{L}_{seg}^{(4)}(x_t, \hat{y}_t)$ are computed from the predictions output by Conv_5 and Conv_4 respectively, and their weighted sum is the target domain segmentation loss $\hat{L}_{seg}(x_t, \hat{y}_t)$:

$$\hat{L}_{seg}^{(5)}(x_t, \hat{y}_t) = -\sum_{h,w}\sum_{c=1}^{C} \hat{y}_t^{(h,w,c)} \log P_t^{(5)(h,w,c)}$$

$$\hat{L}_{seg}^{(4)}(x_t, \hat{y}_t) = -\sum_{h,w}\sum_{c=1}^{C} \hat{y}_t^{(h,w,c)} \log P_t^{(4)(h,w,c)}$$

$$\hat{L}_{seg}(x_t, \hat{y}_t) = \hat{L}_{seg}^{(5)}(x_t, \hat{y}_t) + \beta_2\, \hat{L}_{seg}^{(4)}(x_t, \hat{y}_t)$$

where $x_t \in \mathbb{R}^{H\times W\times 3}$ is a target domain RGB image with resolution $H \times W$; $\hat{y}_t$ is the pseudo label of the target domain image $x_t$ and $C$ is the number of classes; $P_t^{(5)}$ is the predicted segmentation map of $x_t$ output by Conv_5 of G_st; $P_t^{(4)}$ is the predicted segmentation map of $x_t$ output by Conv_4 of G_st; $\beta_2$ is a hyper-parameter.
The total loss $L(x_s, x_t)$ of the semantic segmentation network G_st is:

$$L(x_s, x_t) = L_{seg}(x_s, y_s) + \mu_2\, \hat{L}_{seg}(x_t, \hat{y}_t) + \mu_3\, L_{Gini}(x_t)$$

where $\mu_2$ and $\mu_3$ are hyper-parameters.
Examples
1. Experimental data set
Experiments are performed on the common unsupervised domain adaptation benchmark GTA5-Cityscapes, where the synthetic data set GTA5 is used as the source domain and the real data set Cityscapes as the target domain. Models are evaluated on the Cityscapes validation set.
GTA5: the synthetic data set GTA5 contains 24966 synthetic images with a resolution of 1914 × 1052 and the corresponding ground truth. These synthetic images are collected from an urban-scenery video game modeled on the city of Los Angeles. The automatically generated ground truth contains 33 classes. Methods evaluated on GTA5-Cityscapes generally consider only the 19 classes compatible with the Cityscapes dataset, and the present invention is no exception.
Cityscapes: as a dataset collected from the real world, Cityscapes provides 3975 images with fine segmentation annotations. The training set contains 2975 images and the validation set contains 500 images.
2. Evaluation index of experiment
The present invention uses Intersection-over-Union (IoU) to evaluate the performance of semantic segmentation. IoU values lie in [0, 1]; the larger the value, the better the segmentation. IoU is defined as follows:
IoU=TP/(TP+FP+FN)
where TP, FP and FN are the numbers of true positive, false positive and false negative pixels, respectively. The mIoU in Tables 3 and 4 is the mean IoU over the 19 classes.
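For a single class, IoU can be computed directly from boolean prediction and ground-truth masks, as in this small example; mIoU is then the mean of the per-class IoU values over the 19 classes:

```python
import numpy as np

def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """IoU = TP / (TP + FP + FN) for one class, given boolean masks."""
    tp = np.logical_and(pred_mask, gt_mask).sum()
    fp = np.logical_and(pred_mask, ~gt_mask).sum()
    fn = np.logical_and(~pred_mask, gt_mask).sum()
    return tp / (tp + fp + fn)

pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]], dtype=bool)
gt   = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]], dtype=bool)
print(iou(pred, gt))   # TP=2, FP=1, FN=1 -> 0.5
```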
3. Network training
The batch size of the unsupervised domain-adaptive network is 2; the resolution of the source domain input images is 1280 × 720 and that of the target domain input images is 1024 × 512, and both are resized to 321 × 321. During training, the labels are downsampled by a factor of 8 and the loss is computed against the network's output prediction map; during testing, the output prediction map is upsampled by a factor of 8 and compared with the labels. β_1 and β_3 are set to 0.1; β_2 is set to 0.2; μ_1, μ_2 and μ_3 are set to 0.01. The ResNet-101 encoders of the semantic segmentation networks G_te and G_st are pre-trained on ImageNet. The SGD optimizer is used to train G_te and G_st with an initial learning rate of 2.5 × 10^-4.
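A hedged sketch of the corresponding optimizer and input setup (the one-layer stand-in model and the SGD momentum value are assumptions; only the batch size, crop size, learning rate and loss weights come from the text above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyper-parameters stated above
beta1, beta3 = 0.1, 0.1
beta2 = 0.2
mu1 = mu2 = mu3 = 0.01
batch_size = 2
crop_size = (321, 321)
lr = 2.5e-4

# Stand-in for the DeepLab-V2 segmentation network (G_te or G_st): a single
# strided convolution that mimics the 1/8-resolution output of the real network.
model = nn.Conv2d(3, 19, kernel_size=8, stride=8)
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)  # momentum assumed

# Source images are resized from 1280x720 and target images from 1024x512 to 321x321.
x = torch.randn(batch_size, 3, *crop_size)
scores = model(x)                                   # ~1/8-resolution prediction map
# During training the labels are downsampled by 8 and compared with `scores`;
# at test time the prediction map is upsampled by 8 instead, as below.
scores_up = F.interpolate(scores, scale_factor=8, mode="bilinear", align_corners=False)
print(scores.shape, scores_up.shape)
```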
4. Results of the experiment
The invention performs experiments on the common unsupervised domain adaptation benchmark GTA5-Cityscapes. Tables 3 and 4 show the results of applying the Gini-index-guided self-training method on top of the MinEnt method (the network of Vu et al. that directly minimizes entropy) and the AdvEnt method (the network of Vu et al. that minimizes entropy adversarially), respectively. It can be seen that mIoU improves significantly when the target domain pseudo labels are added on top of the base methods, and the segmentation quality also improves compared with the common SSL and ESL pseudo-label self-training methods.
Table 1: encoder structure parameters
Table 2: decoder structure parameters
Table 3: comparison of experimental results of the improved method
Table 4: comparison of experimental results of the improved method
(Tables 1 to 4 are provided as images in the original document.)

Claims (2)

1. A self-training-based semantic segmentation method guided by the Gini index, characterized by comprising the following steps: using a synthetic data set as the source domain and a real data set as the target domain; during training, inputting source domain and target domain images into an inter-domain adaptive network for training, and after the training is finished, dividing the target domain images and inputting the divided target domain images into an intra-domain adaptive network for training to obtain an optimal segmentation result;
the method comprises the following specific steps:
Step (1): one RGB image is randomly taken from the source domain data set and one from the target domain data set as a batch and input into the semantic segmentation network G_te;
Step (2): the cross-entropy loss of the source domain image is computed based on the output prediction maps of the last two layers of the network and the ground truth, and the losses of the last two layers are weighted and summed;
Step (3): the Gini index and the uncertainty loss of the output prediction maps of the last two layers are computed for the target domain image, and the losses of the last two layers are weighted and summed;
Step (4): the weighted loss computed in step (2) and the weighted loss computed in step (3) are summed, the model is optimized by error back-propagation, and the iteration continues until the model loss is smaller than a set threshold, completing training on this batch of data;
Step (5): return to step (1) to select new batch data, and repeat steps (1) to (4) until 2000 batches have been trained, then save the trained model;
Step (6): repeat steps (1) to (5) until 120000 batches of data have been trained, i.e. 60 models are saved;
Step (7): test the 60 saved models on the target domain validation set, use the model with the best accuracy to compute the output predictions of the target domain training set images, compute the Gini index corresponding to these output predictions, and assign pseudo labels to the target domain training set images based on the Gini index;
Step (8): one RGB image is randomly taken from the source domain data set and one from the target domain data set as a batch and used as input to train the semantic segmentation network G_st;
Step (9): the cross-entropy loss of the source domain image is computed based on the output prediction maps of the last two layers and the ground truth, and the losses of the last two layers are weighted and summed;
Step (10): the cross-entropy losses between the output prediction maps of the last two layers and the pseudo label of the target domain image are computed, and the losses of the last two layers are weighted and summed;
Step (11): the Gini index and the uncertainty loss of the output prediction maps of the last two layers are computed for the target domain image, and the losses of the last two layers are weighted and summed;
Step (12): the weighted losses of step (9), step (10) and step (11) are summed, the model is optimized by error back-propagation, and the iteration continues until the model loss is smaller than a set threshold, completing training on this batch of data;
Step (14): return to step (8) to select new batch data, and repeat steps (8) to (12) until 2000 batches have been trained, then save the trained model; 120000 batches are trained in total, i.e. 60 models are saved;
Step (15): at test time, the 60 saved models are tested on the target domain test set to obtain a final segmentation result.
2. The self-training-based semantic segmentation method guided by the Gini index according to claim 1, characterized in that: the constructed model is an unsupervised domain-adaptive network whose overall structure comprises two sub-networks G_te and G_st; the network G_te is trained first; then, according to the uncertainty measurements of the target domain predictions made by G_te, pseudo labels of the target domain images are selected, and after the pseudo labels are assigned to the corresponding target domain images the semantic segmentation network G_st is trained, so that the prediction accuracy on the target domain images is improved by adding effective supervision information.
CN202110318561.6A 2021-03-25 2021-03-25 Semantic segmentation method guided by Gini index and based on self-training Active CN113095328B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110318561.6A CN113095328B (en) 2021-03-25 Semantic segmentation method guided by Gini index and based on self-training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110318561.6A CN113095328B (en) 2021-03-25 Semantic segmentation method guided by Gini index and based on self-training

Publications (2)

Publication Number Publication Date
CN113095328A true CN113095328A (en) 2021-07-09
CN113095328B CN113095328B (en) 2024-08-23

Family

ID=76669614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110318561.6A Active CN113095328B (en) Semantic segmentation method guided by Gini index and based on self-training

Country Status (1)

Country Link
CN (1) CN113095328B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445413A (en) * 2022-04-07 2022-05-06 宁波康达凯能医疗科技有限公司 Inter-frame image semantic segmentation method and system based on domain self-adaptation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222690A (en) * 2019-04-29 2019-09-10 浙江大学 A kind of unsupervised domain adaptation semantic segmentation method multiplying loss based on maximum two
CN110322446A (en) * 2019-07-01 2019-10-11 华中科技大学 A kind of domain adaptive semantic dividing method based on similarity space alignment
CN112116593A (en) * 2020-08-06 2020-12-22 北京工业大学 Domain self-adaptive semantic segmentation method based on Gini index
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field



Also Published As

Publication number Publication date
CN113095328B (en) 2024-08-23

Similar Documents

Publication Publication Date Title
CN110930397B (en) Magnetic resonance image segmentation method and device, terminal equipment and storage medium
CN106228185B (en) A kind of general image classifying and identifying system neural network based and method
CN112685597B (en) Weak supervision video clip retrieval method and system based on erasure mechanism
CN110852273A (en) Behavior identification method based on reinforcement learning attention mechanism
CN111462191B (en) Non-local filter unsupervised optical flow estimation method based on deep learning
CN115147598B (en) Target detection segmentation method and device, intelligent terminal and storage medium
CN112634296A (en) RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism
CN112116593A (en) Domain self-adaptive semantic segmentation method based on Gini index
CN111144483A (en) Image feature point filtering method and terminal
CN113095254B (en) Method and system for positioning key points of human body part
CN114692732B (en) Method, system, device and storage medium for updating online label
CN115222998B (en) Image classification method
CN112801104A (en) Image pixel level pseudo label determination method and system based on semantic segmentation
CN114140469A (en) Depth hierarchical image semantic segmentation method based on multilayer attention
CN114266894A (en) Image segmentation method and device, electronic equipment and storage medium
Xu et al. AutoSegNet: An automated neural network for image segmentation
CN115546171A (en) Shadow detection method and device based on attention shadow boundary and feature correction
CN116208399A (en) Network malicious behavior detection method and device based on metagraph
CN113947022B (en) Near-end strategy optimization method based on model
CN113095328A (en) Self-training-based semantic segmentation method guided by Gini index
CN117671261A (en) Passive domain noise perception domain self-adaptive segmentation method for remote sensing image
CN117437423A (en) Weak supervision medical image segmentation method and device based on SAM collaborative learning and cross-layer feature aggregation enhancement
CN111931841A (en) Deep learning-based tree processing method, terminal, chip and storage medium
TWI781000B (en) Machine learning device and method
CN115424012A (en) Lightweight image semantic segmentation method based on context information

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant