CN114463543A - Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion - Google Patents

Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion

Info

Publication number
CN114463543A
Authority
CN
China
Prior art keywords
network
pixel
data set
image
level mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210087653.2A
Other languages
Chinese (zh)
Inventor
缪佩翰
励雪巍
李玺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210087653.2A
Publication of CN114463543A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155: Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/2431: Multiple classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion. The method comprises the following steps: acquiring a target-domain data set and a network-domain data set; training on an existing weakly supervised semantic segmentation training framework with the target-domain data set to obtain a target-domain segmentation network; using the target-domain segmentation network to infer on the network-domain data set, obtaining the pixel set of each network image's segmentation result, and performing data cleaning to obtain a single-class-label network data set and a multi-class-label network data set; training on the weakly supervised semantic segmentation training framework with the cleaned data sets to obtain a network-domain segmentation network; inferring on the target-domain data set with the target-domain segmentation network, the network-domain segmentation network and an existing salient-object detection network to obtain a target-domain pixel-level mask, a network-domain pixel-level mask and a saliency map, and fusing them through a cascade decision into a final pixel-level mask; and performing supervised training with the final pixel-level mask to obtain the final segmentation network.

Description

Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion
Technical Field
The invention relates to the field of computer vision, in particular to a weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion.
Background
Semantic segmentation is one of the key problems in computer vision today; as a high-level task, it paves the way toward complete scene understanding. Semantic segmentation technology is widely applied in autonomous driving, medical image analysis, geographic information systems and other fields, and plays an important role in promoting social development. In recent years, semantic segmentation methods based on fully supervised learning have achieved remarkable results, but constructing the pixel-level labels required for training is very costly. To obtain high-performance semantic segmentation while reducing the cost of sample construction, researchers have explored weakly supervised semantic segmentation methods that exploit network images under class-label supervision.
In related methods, existing images with accurate class labels serve as target-domain data, while network images retrieved from a search engine by keyword serve as network-domain data. Network images are easy to obtain in large quantities: they can be gathered through keyword search, with the search keyword converted into the class label. However, this easily introduces two types of noisy images: label-error images, whose class label is completely inconsistent with the image content, and label-inaccurate images, whose class label matches only part of the content. If the noise interference of the network images cannot be reduced and the reliability of network-domain knowledge cannot be improved, the weakly supervised semantic segmentation performance will suffer. Existing methods use external techniques such as saliency detection, co-segmentation and GrabCut to reduce the influence of network-domain noise and to assist in acquiring network-domain knowledge. However, these methods do not substantially exploit target-domain knowledge, so the reliability of the knowledge they obtain cannot help a weakly supervised semantic segmentation training system to substantially improve segmentation performance.
In addition, the target-domain segmentation network is trained on target-domain images with accurate class labels, so the foreground regions of the pixel-level masks it infers on the target-domain data set are of good quality. However, during weakly supervised training the class labels used as supervision only indicate which classes are present in an image and provide no spatial information about any class, so the target-domain segmentation network, constrained by the inherent context prior of the data set, tends to segment objects incompletely. The semantic information of the network domain is rich but noisy, and the inference results of the network-domain segmentation network on the target-domain data set can serve as important supplementary information for foreground segmentation. Likewise, the context inferred by salient-object detection on the target-domain data set can serve as important contextual information. How to fuse the pixel-level results inferred by these different networks through a reliability-aware decision process, and thereby achieve robust knowledge transfer that improves the final semantic segmentation performance, becomes a challenge.
Disclosure of Invention
To address these problems, the invention provides a weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion. The technical scheme adopted by the invention is as follows:
A weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion comprises the following steps:
S1, acquiring a target-domain data set and a network-domain data set;
S2, training on a weakly supervised semantic segmentation network training framework with the target-domain data set to obtain a target-domain segmentation network;
S3, inferring on the network-domain data set with the target-domain segmentation network to obtain the pixel set of each network image's segmentation result, and performing data cleaning to obtain a single-class-label network data set and a multi-class-label network data set;
S4, training on the weakly supervised semantic segmentation network training framework with the single-class-label network data set and the multi-class-label network data set to obtain a network-domain segmentation network;
S5, inferring on the target-domain data set with the target-domain segmentation network, the network-domain segmentation network and a salient-object detection network to obtain a target-domain pixel-level mask, a network-domain pixel-level mask and a saliency map, respectively, and performing a cascade decision over the three to obtain a final pixel-level mask;
S6, performing supervised training of a semantic segmentation network with the final pixel-level mask to obtain a final segmentation network, and segmenting the target image with the final segmentation network.
Preferably, in S1, the target-domain data set I_t comprises a plurality of training images and the manually labeled class labels corresponding to those images, and the network-domain data set I_w comprises a plurality of images retrieved from a search engine together with class labels converted from the corresponding search keywords.
Preferably, in S2, the network structure in the weakly supervised semantic segmentation network training framework comprises two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask. After training on this framework with the target-domain data set I_t, the target-domain segmentation network F_t is obtained.
Preferably, S3 comprises the following sub-steps:
S31, using the target-domain segmentation network F_t to infer on the network-domain data set I_w, obtaining the pixel set Φ of each image's segmentation result;
S32, for each network-domain image X_i, defining
λ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k = l_i)
and
μ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k ≠ l_i ∧ p_k ≠ bg)
wherein Φ_i is the pixel set of the i-th image's segmentation result, |Φ_i| denotes the total number of pixels in Φ_i, and p_k is the semantic label of pixel k in Φ_i; χ(a) denotes the truth value of condition a, with χ(a) = 1 when condition a holds and χ(a) = 0 otherwise; bg denotes a background pixel; λ_i is the proportion of the single class l_i in image X_i, and μ_i is the proportion of the other foreground classes in image X_i;
S33, performing a preliminary cleaning of the network-domain data set I_w: traverse each network image X_i; if X_i satisfies δ_1 ≤ λ_i ≤ δ_2, it is retained, wherein δ_1 and δ_2 are thresholds controlling the proportion of the single class l_i in image X_i; otherwise it is directly judged to be a noisy image and removed from the network-domain data set I_w;
S34, performing single-class cleaning on the network images retained in step S33: traverse each retained network image X_i; if X_i satisfies the condition on μ_i, it is added to the single-class-label network data set, otherwise it is not added;
S35, performing multi-class cleaning on the network images retained in step S33: traverse each retained network image X_i; if X_i satisfies μ_i ≥ δ_3, all class labels appearing in Φ_i are taken as the class labels of X_i, and X_i with these class labels is added to the multi-class-label network data set, wherein δ_3 is a threshold on the proportion of the other foreground classes in image X_i; otherwise it is not added to the multi-class-label network data set.
Preferably, in S4, the network structure in the weakly supervised semantic segmentation network training framework comprises two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask. With the single-class-label network data set and the multi-class-label network data set as training data, the network-domain segmentation network F_w is obtained after training on this framework.
Preferably, S5 comprises the following sub-steps:
S51, using the target-domain segmentation network F_t, the network-domain segmentation network F_w and the salient-object detection network to infer on the target-domain data set I_t, obtaining the pixel-level mask M_t, the pixel-level mask M_w and the saliency map M_b, respectively;
S52, initializing a pixel-level mask M_f with no semantic labels, all of whose pixel values are 255, to store the cascade decision result, the resolution of M_f being the same as that of the input image;
S53, traversing all pixels p_1 in the pixel-level mask M_t: if the semantic label of pixel p_1 is a foreground class l_1, assigning l_1 to the pixel in M_f corresponding to p_1 as its semantic label;
S54, traversing all pixels p_2 in the pixel-level mask M_w: if the semantic label of pixel p_2 is a foreground class l_2 and the pixel in M_f corresponding to p_2 has not been assigned a semantic label, assigning l_2 to that pixel as its semantic label;
S55, traversing all pixels p_3 in the saliency map M_b: if pixel p_3 is background and the pixel in M_f corresponding to p_3 has been assigned a semantic label, reassigning that pixel in M_f to 255 and marking it as a first-type pixel; if pixel p_3 is foreground and the pixel in M_f corresponding to p_3 has value 255, keeping its value at 255 and marking it as a second-type pixel; all remaining pixels in M_f with value 255, other than the first-type and second-type pixels, are uniformly treated as background and assigned the value 0, yielding the final pixel-level mask M_f.
Preferably, S6 comprises the following sub-steps:
S61, training the semantic segmentation network with a cross-entropy loss function and a multi-label classification loss function, based on the target-domain data set I_t and the final pixel-level mask of each image, wherein pixels with value 255 in the final pixel-level mask do not participate in training; after training, the final segmentation network F_f is obtained;
S62, inputting the target image into the final segmentation network to segment it and obtain the semantic segmentation result.
Preferably, the semantic segmentation network that is supervised-trained with the final pixel-level mask in S6 is a ResNet38-based semantic segmentation network.
The invention realizes a weakly supervised semantic segmentation method based on dual-domain cascade decision and interactive annotation self-promotion. Under class-label supervision, it uses network images and saliency detection results to achieve robust knowledge transfer, improving the final segmentation performance and thus attaining high-performance semantic segmentation at reduced sample construction cost. The dual-domain interactive annotation self-promotion reduces the noise of the network-domain data set at the data level, improves image annotation quality, enriches the semantic information of the network domain, and raises the reliability of network-domain knowledge. The cascade decision performs knowledge transfer at the pixel level; its decision process is tied to the reliability of the knowledge, so the strengths of the different domains are exploited effectively to achieve robust knowledge transfer.
Drawings
FIG. 1 is a schematic diagram of the basic steps of the weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion of the present invention.
FIG. 2 is a schematic flow chart of the weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover the alternatives, modifications and equivalents that may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description, certain specific details are set forth in order to provide a better understanding of the invention. It will be apparent to one skilled in the art that the invention may be practiced without these specific details.
Referring to fig. 1, a preferred embodiment of the present invention provides a weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion, used for training a high-performance weakly supervised semantic segmentation network. The flow of the method is shown in fig. 2 and comprises the following steps:
and S1, acquiring the target domain data set and the network domain data set.
In the present embodiment, in the above step S1, the target domain data set ItA network domain data set I comprising a plurality of training images and manually labeled class labels corresponding to the training imageswComprises a plurality of images searched on a search engine and corresponding category labels formed by converting search keywords. The specific search engine is not limited as long as the corresponding image can be searched by the keyword, and each keyword can be used as a category label of each image searched by the keyword. The construction of the network domain data set can be realized by a crawler technology.
S2, training on a weakly supervised semantic segmentation network training framework with the target-domain data set to obtain the target-domain segmentation network.
It should be noted that the weakly supervised semantic segmentation network training framework in the present invention may be any existing publicly available method, as long as it can be used to construct a weakly supervised semantic segmentation network.
In this embodiment, in step S2, the network structure in the weakly supervised semantic segmentation network training framework is SSENet + AffinityNet, comprising two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask. After training on this framework with the target-domain data set I_t, the target-domain segmentation network F_t is obtained.
S3, inferring on the network-domain data set with the target-domain segmentation network to obtain the pixel set of each network image's segmentation result, and performing data cleaning to obtain the single-class-label network data set and the multi-class-label network data set.
In this embodiment, the step S3 includes the following sub-steps:
S31, using the target-domain segmentation network F_t to infer on the network-domain data set I_w, obtaining the pixel set Φ of each image's segmentation result; the pixel set of the i-th image's segmentation result is denoted Φ_i.
S32, for each network-domain image X_i, defining
λ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k = l_i)
and
μ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k ≠ l_i ∧ p_k ≠ bg)
wherein Φ_i is the pixel set of the i-th image's segmentation result, |Φ_i| denotes the total number of pixels in Φ_i, and p_k is the semantic label of pixel k in Φ_i; χ(a) denotes the truth value of condition a (a is a generic placeholder standing for the conditional expression in each formula), with χ(a) = 1 when condition a holds and χ(a) = 0 otherwise; bg denotes a background pixel; λ_i is the proportion of the single class l_i in image X_i, and μ_i is the proportion of the other foreground classes in image X_i.
S33, performing a preliminary cleaning of the network-domain data set I_w: traverse each network image X_i; if X_i satisfies δ_1 ≤ λ_i ≤ δ_2, it is retained, wherein δ_1 and δ_2 are thresholds controlling the proportion of the single class l_i in image X_i; otherwise it is directly judged to be a noisy image and removed from the network-domain data set I_w.
S34, performing single-class cleaning on the network images of I_w retained in step S33: traverse each retained network image X_i; if X_i satisfies the condition on μ_i, it is added to the single-class-label network data set, otherwise it is not added.
S35, performing multi-class cleaning on the network images of I_w retained in step S33: traverse each retained network image X_i; if X_i satisfies μ_i ≥ δ_3, all class labels appearing in Φ_i are taken as the class labels of X_i, and X_i with these class labels is added to the multi-class-label network data set, wherein δ_3 is a threshold on the proportion of the other foreground classes in image X_i; otherwise it is not added to the multi-class-label network data set. A numerical sketch of this cleaning procedure is given below.
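As a purely illustrative aid (not code from the patent), the per-image cleaning decision of steps S32-S35 can be sketched in a few lines of NumPy. The helper name clean_decision and the use of μ_i = 0 as the single-class condition are assumptions made here for illustration, since the published text leaves the exact μ_i condition of S34 implicit; δ_1, δ_2 and δ_3 take the values used later in the example (0.2, 0.8 and 0.05).

```python
import numpy as np

def clean_decision(pred, label, delta1=0.2, delta2=0.8, delta3=0.05):
    """Decide the fate of one network image given its predicted label map.

    pred  : (H, W) integer array from the target-domain network F_t,
            0 = background (bg), 1..N = foreground classes.
    label : the image's single class label l_i (from the search keyword).
    """
    total = pred.size                                    # |Phi_i|
    lam = np.sum(pred == label) / total                  # lambda_i: share of class l_i
    mu = np.sum((pred != label) & (pred != 0)) / total   # mu_i: share of other foreground

    # S33: preliminary cleaning - keep only images with delta1 <= lambda_i <= delta2.
    if not (delta1 <= lam <= delta2):
        return "discard as noise"
    # S34: single-class cleaning - mu_i == 0 ("no other foreground") is assumed here,
    # since the exact condition on mu_i is left implicit in the text.
    if mu == 0:
        return "single-class-label set"
    # S35: multi-class cleaning - relabel with all classes in Phi_i when mu_i >= delta3.
    if mu >= delta3:
        return "multi-class-label set"
    return "discard as noise"

# Toy check: class 3 covers half the pixels, class 5 a quarter, background the rest,
# so lambda_i = 0.5 and mu_i = 0.25, which sends the image to the multi-class set.
toy = np.array([[3, 3, 3, 3],
                [3, 3, 3, 3],
                [5, 5, 5, 5],
                [0, 0, 0, 0]])
print(clean_decision(toy, label=3))   # -> "multi-class-label set"
```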
S4, training on the weakly supervised semantic segmentation network training framework with the single-class-label network data set and the multi-class-label network data set to obtain the network-domain segmentation network.
It should be noted that the weakly supervised semantic segmentation network training framework in the present invention may be any existing publicly available method, as long as it can be used to construct a weakly supervised semantic segmentation network.
In this embodiment, in step S4, the network structure in the weakly supervised semantic segmentation network training framework is SSENet + AffinityNet, comprising two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask. With the single-class-label network data set and the multi-class-label network data set as training data, the network-domain segmentation network F_w is obtained after training on this framework.
In the present invention, although the same weakly supervised semantic segmentation training framework is adopted in steps S2 and S4 above, in other embodiments different frameworks may be used, as long as the semantic segmentation function is realized. The loss functions adopted in the weakly supervised training framework may also be adjusted as needed to meet the required model training accuracy.
S5, inferring on the target-domain data set with the target-domain segmentation network, the network-domain segmentation network and the salient-object detection network to obtain the target-domain pixel-level mask, the network-domain pixel-level mask and the saliency map, respectively, and performing a cascade decision over the three to obtain the final pixel-level mask.
In this embodiment, step S5 includes the following sub-steps:
S51, using the target-domain segmentation network F_t, the network-domain segmentation network F_w and the salient-object detection network to infer on the target-domain data set I_t, obtaining the pixel-level mask M_t, the pixel-level mask M_w and the saliency map M_b, respectively.
S52, initializing a pixel-level mask M_f with no assigned semantic labels, used to store the result of the cascade decision; all pixel values of M_f are initialized to 255, and the resolution of M_f is the same as that of the input image.
Note that the pixel-level masks M_f, M_t, M_w and the saliency map M_b all have the same resolution (i.e. image size) as the input image, and each pixel of the input image has a corresponding pixel at the same position in M_t, M_w, M_f and M_b. The pixel values in each mask and in the saliency map reflect the semantic label, i.e. the classification category, of the corresponding pixel. The subsequent steps use M_t, M_w and M_b to record, through the cascade decision, the final semantic label of each pixel in M_f, i.e. to assign a value to each pixel of M_f.
S53, traversing all pixels p_1 in the pixel-level mask M_t: if the semantic label of pixel p_1 is a foreground class l_1, assign l_1 to the pixel in M_f corresponding to p_1 as its semantic label.
S54, traversing all pixels p_2 in the pixel-level mask M_w: if the semantic label of pixel p_2 is a foreground class l_2 and the pixel in M_f corresponding to p_2 has not been assigned a semantic label, assign l_2 to that pixel as its semantic label.
S55, traversing all pixels p_3 in the saliency map M_b: if pixel p_3 is background and the pixel in M_f corresponding to p_3 has been assigned a semantic label, reassign that pixel in M_f to 255 and mark it as a first-type pixel; if pixel p_3 is foreground and the pixel in M_f corresponding to p_3 has value 255, keep its value at 255 and mark it as a second-type pixel; all remaining pixels in M_f with value 255, other than the first-type and second-type pixels, are uniformly treated as background and assigned the value 0, yielding the final pixel-level mask M_f.
Thus, assuming the number of foreground classes is N and their class labels are represented by 1 to N, the final pixel-level mask M_f contains three kinds of labels: the first is background, with pixel value 0; the second is foreground, with a pixel value between 1 and N; and the third is pixels with value 255. The third kind of pixels do not participate in the loss computation during the subsequent training of S6, because these pixels are very likely to be contradictory across the different inference results; excluding them from supervised training improves the accuracy of the network. An illustrative sketch of the cascade decision follows.
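A minimal NumPy sketch of the cascade decision of steps S52-S55 is given below for illustration; it is not code from the patent. The label maps m_t and m_w are assumed to use 0 for background and 1..N for foreground classes, and the saliency map m_b is assumed to be binarized (1 = salient foreground, 0 = background).

```python
import numpy as np

def cascade_decision(m_t, m_w, m_b):
    """Fuse the target-domain mask, network-domain mask and saliency map into M_f."""
    m_f = np.full(m_t.shape, 255, dtype=np.uint8)    # S52: initialise every pixel to "unlabeled"

    fg_t = m_t > 0
    m_f[fg_t] = m_t[fg_t]                            # S53: take target-domain foreground labels first

    fg_w = (m_w > 0) & (m_f == 255)
    m_f[fg_w] = m_w[fg_w]                            # S54: fill still-unlabeled pixels from the network domain

    # S55: reconcile with the saliency map.
    first_type = (m_b == 0) & (m_f != 255)           # labeled foreground on non-salient pixels -> ignored
    m_f[first_type] = 255
    second_type = (m_b == 1) & (m_f == 255)          # salient but unlabeled pixels -> stay ignored (255)
    background = (m_f == 255) & ~first_type & ~second_type
    m_f[background] = 0                              # everything else still at 255 becomes background
    return m_f
```

The value 255 plays the usual "ignore" role, matching the three-label encoding described above (0 for background, 1..N for foreground, 255 for pixels excluded from training).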
S6, performing supervised training of the semantic segmentation network with the final pixel-level mask to obtain the final segmentation network, and segmenting the target image with the final segmentation network.
In this embodiment, step S6 includes the following sub-steps:
S61, training the semantic segmentation network with a cross-entropy loss function and a multi-label classification loss function, based on the target-domain data set I_t and the final pixel-level mask of each image, wherein pixels with value 255 in the final pixel-level mask do not participate in training; after training, the final segmentation network F_f is obtained (an illustrative sketch of this loss follows these sub-steps).
S62, inputting the target image into the final segmentation network to segment it and obtain the semantic segmentation result.
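The patent does not tie S61 to a particular deep-learning framework. The sketch below shows, under that caveat, how the two losses could be combined in PyTorch so that pixels valued 255 are excluded; global pooling of the segmentation logits is only one illustrative way to obtain image-level scores for the multi-label term and is not specified in the text.

```python
import torch
import torch.nn.functional as F

def s61_loss(logits, final_mask, image_labels):
    """Illustrative S61 loss: pixel-wise cross-entropy plus multi-label classification.

    logits       : (B, C, H, W) segmentation outputs.
    final_mask   : (B, H, W) final pixel-level mask M_f, values 0..N, with 255 = ignored.
    image_labels : (B, C) multi-hot image-level class labels.
    """
    # Pixels with value 255 do not contribute to the loss, as required in S61.
    seg_loss = F.cross_entropy(logits, final_mask.long(), ignore_index=255)

    # Multi-label classification loss on globally pooled class scores
    # (the pooling choice is an assumption, not taken from the text).
    class_scores = logits.mean(dim=(2, 3))
    cls_loss = F.binary_cross_entropy_with_logits(class_scores, image_labels.float())
    return seg_loss + cls_loss
```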
It should be noted that the semantic segmentation network that is supervised-trained with the final pixel-level mask may be any existing publicly available model, as long as it can serve as a semantic segmentation network.
In this embodiment, the semantic segmentation network trained under final pixel-level mask supervision may be a ResNet38-based semantic segmentation network.
The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion described in steps S1-S6 above is applied to a specific example to show its technical effect.
Examples
The implementation of this example follows steps S1-S6 described above; the specific steps are not repeated here, and only the results on the case data are shown.
The invention is implemented on a target domain data set and a network domain data set:
target domain data set: this example uses the PASCAL VOC 2012 training set as the target domain data set, which contains 10582 images in total of 21 classes. And the final segmented network was tested for performance on a validation set containing 1449 images and a test set containing 1456 images.
Network domain data set: in the embodiment, the public network image is used as a network domain data set, the image is obtained by searching the keyword on a search engine, the search keyword is converted into a class label, the class of the class label is consistent with the target domain, the width and the height of the image are not more than 500, and the class label only comprises a single class. In this example, δ is set1=0.2、δ2The network domain data set is initially cleaned up to 0.8. The number of images in each specific category in the single-category cleaning is set to be not more than 500, so that a single-category label network data set containing 9742 images is obtained. Setting delta30.05, and thus a multi-class label network data set containing 5415 images is obtained.
In this embodiment, an existing weak supervised semantic segmentation network training framework (SSENet + affinity net) is used as a basis, and the performance of a segmentation network measured on a PASCAL VOC 2012 verification set and a test set by training on the PASCAL VOC 2012 training set is 63.5% mlio u and 64.2% mlio u, respectively. mlou is the average intersection ratio that computes the ratio of the correctly predicted positive samples (the intersection of the two sets of true and predicted values) to the sum (union) of the correctly predicted positive samples, the incorrectly predicted positive samples, and the incorrectly predicted negative samples. The semantic segmentation network based on ResNet38 is trained by using a weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion, the performances of the obtained final segmentation network measured on a PASCAL VOC 2012 verification set and a test set are respectively 68.2% mIoU and 69.6% mIoU, and the performances are obviously promoted.
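For reference, the mIoU computation described above can be sketched as follows (an illustrative helper, not code from the patent); pixels labeled 255 in the ground truth are excluded, matching the PASCAL VOC convention.

```python
import numpy as np

def mean_iou(preds, gts, num_classes=21):
    """Mean intersection over union: per class, IoU = TP / (TP + FP + FN), averaged."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, gt in zip(preds, gts):
        valid = gt != 255                                # ignore unlabeled pixels
        idx = num_classes * gt[valid].astype(int) + pred[valid].astype(int)
        conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

    tp = np.diag(conf)                                   # correctly predicted positives
    fp = conf.sum(axis=0) - tp                           # falsely predicted positives
    fn = conf.sum(axis=1) - tp                           # missed positives
    iou = tp / np.maximum(tp + fp + fn, 1)
    return iou.mean()
```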
Therefore, based on deep learning technology and the above technical scheme, a weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion is developed. Under class-label supervision, it uses network images and saliency detection results to achieve robust knowledge transfer, improving the final segmentation performance and thus attaining high-performance semantic segmentation at reduced sample construction cost. The dual-domain interactive annotation self-promotion reduces the noise of the network-domain data set at the data level, improves image annotation quality, enriches the semantic information of the network domain, and raises the reliability of network-domain knowledge. The cascade decision performs knowledge transfer at the pixel level; its decision process is tied to the reliability of the knowledge, so the strengths of the different domains are exploited effectively to achieve robust knowledge transfer.
The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (8)

1. A weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion, characterized by comprising the following steps:
S1, acquiring a target-domain data set and a network-domain data set;
S2, training on a weakly supervised semantic segmentation network training framework with the target-domain data set to obtain a target-domain segmentation network;
S3, inferring on the network-domain data set with the target-domain segmentation network to obtain the pixel set of each network image's segmentation result, and performing data cleaning to obtain a single-class-label network data set and a multi-class-label network data set;
S4, training on the weakly supervised semantic segmentation network training framework with the single-class-label network data set and the multi-class-label network data set to obtain a network-domain segmentation network;
S5, inferring on the target-domain data set with the target-domain segmentation network, the network-domain segmentation network and a salient-object detection network to obtain a target-domain pixel-level mask, a network-domain pixel-level mask and a saliency map, respectively, and performing a cascade decision over the target-domain pixel-level mask, the network-domain pixel-level mask and the saliency map to obtain a final pixel-level mask;
S6, performing supervised training of a semantic segmentation network with the final pixel-level mask to obtain a final segmentation network, and segmenting the target image with the final segmentation network.
2. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein in step S1 the target-domain data set I_t comprises a plurality of training images and the manually labeled class labels corresponding to those images, and the network-domain data set I_w comprises a plurality of images retrieved from a search engine together with class labels converted from the corresponding search keywords.
3. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein in S2 the network structure in the weakly supervised semantic segmentation network training framework comprises two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask; after training on this framework with the target-domain data set I_t, the target-domain segmentation network F_t is obtained.
4. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein S3 comprises the following sub-steps:
S31, using the target-domain segmentation network F_t to infer on the network-domain data set I_w, obtaining the pixel set Φ of each image's segmentation result;
S32, for each network-domain image X_i, defining
λ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k = l_i)
and
μ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k ≠ l_i ∧ p_k ≠ bg)
wherein Φ_i is the pixel set of the i-th image's segmentation result, |Φ_i| denotes the total number of pixels in Φ_i, and p_k is the semantic label of pixel k in Φ_i; χ(a) denotes the truth value of condition a, with χ(a) = 1 when condition a holds and χ(a) = 0 otherwise; bg denotes a background pixel; λ_i is the proportion of the single class l_i in image X_i, and μ_i is the proportion of the other foreground classes in image X_i;
S33, performing a preliminary cleaning of the network-domain data set I_w: traverse each network image X_i; if X_i satisfies δ_1 ≤ λ_i ≤ δ_2, it is retained, wherein δ_1 and δ_2 are thresholds controlling the proportion of the single class l_i in image X_i; otherwise it is directly judged to be a noisy image and removed from the network-domain data set I_w;
S34, performing single-class cleaning on the network images retained in step S33: traverse each retained network image X_i; if X_i satisfies the condition on μ_i, it is added to the single-class-label network data set, otherwise it is not added;
S35, performing multi-class cleaning on the network images retained in step S33: traverse each retained network image X_i; if X_i satisfies μ_i ≥ δ_3, all class labels appearing in Φ_i are taken as the class labels of X_i, and X_i with these class labels is added to the multi-class-label network data set, wherein δ_3 is a threshold on the proportion of the other foreground classes in image X_i; otherwise it is not added to the multi-class-label network data set.
5. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein in S4 the network structure in the weakly supervised semantic segmentation network training framework comprises two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask; with the single-class-label network data set and the multi-class-label network data set as training data, the network-domain segmentation network F_w is obtained after training on this framework.
6. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein S5 comprises the following sub-steps:
S51, using the target-domain segmentation network F_t, the network-domain segmentation network F_w and the salient-object detection network to infer on the target-domain data set I_t, obtaining the pixel-level mask M_t, the pixel-level mask M_w and the saliency map M_b, respectively;
S52, initializing a pixel-level mask M_f with no semantic labels, all of whose pixel values are 255, to store the cascade decision result, the resolution of M_f being the same as that of the input image;
S53, traversing all pixels p_1 in the pixel-level mask M_t: if the semantic label of pixel p_1 is a foreground class l_1, assigning l_1 to the pixel in M_f corresponding to p_1 as its semantic label;
S54, traversing all pixels p_2 in the pixel-level mask M_w: if the semantic label of pixel p_2 is a foreground class l_2 and the pixel in M_f corresponding to p_2 has not been assigned a semantic label, assigning l_2 to that pixel as its semantic label;
S55, traversing all pixels p_3 in the saliency map M_b: if pixel p_3 is background and the pixel in M_f corresponding to p_3 has been assigned a semantic label, reassigning that pixel in M_f to 255 and marking it as a first-type pixel; if pixel p_3 is foreground and the pixel in M_f corresponding to p_3 has value 255, keeping its value at 255 and marking it as a second-type pixel; all remaining pixels in M_f with value 255, other than the first-type and second-type pixels, are uniformly treated as background and assigned the value 0, yielding the final pixel-level mask M_f.
7. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein S6 comprises the following sub-steps:
S61, training the semantic segmentation network with a cross-entropy loss function and a multi-label classification loss function, based on the target-domain data set I_t and the final pixel-level mask of each image, wherein pixels with value 255 in the final pixel-level mask do not participate in training; after training, the final segmentation network F_f is obtained;
S62, inputting the target image into the final segmentation network to segment it and obtain the semantic segmentation result.
8. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein the semantic segmentation network that is supervised-trained with the final pixel-level mask in S6 is a ResNet38-based semantic segmentation network.
CN202210087653.2A 2022-01-25 2022-01-25 Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion Pending CN114463543A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210087653.2A CN114463543A (en) 2022-01-25 2022-01-25 Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion

Publications (1)

Publication Number Publication Date
CN114463543A true CN114463543A (en) 2022-05-10

Family

ID=81411244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210087653.2A Pending CN114463543A (en) 2022-01-25 2022-01-25 Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion

Country Status (1)

Country Link
CN (1) CN114463543A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842005A (en) * 2022-07-04 2022-08-02 海门市芳华纺织有限公司 Semi-supervised network-based textile surface defect detection method and system
CN114842005B (en) * 2022-07-04 2022-09-20 海门市芳华纺织有限公司 Method and system for detecting surface defects of textile fabric based on semi-supervised network

Similar Documents

Publication Publication Date Title
CN108399406B (en) Method and system for detecting weakly supervised salient object based on deep learning
CN111583263A (en) Point cloud segmentation method based on joint dynamic graph convolution
US20210326638A1 (en) Video panoptic segmentation
CN109783666A (en) A kind of image scene map generation method based on iteration fining
CN110009008B (en) Method for automatically classifying immune fixed electrophoretogram based on extracted features
CN113487629B (en) Image attribute editing method based on structured scene and text description
CN112287941B (en) License plate recognition method based on automatic character region perception
Shi et al. An image mosaic method based on convolutional neural network semantic features extraction
CN117079139B (en) Remote sensing image target detection method and system based on multi-scale semantic features
CN112069884A (en) Violent video classification method, system and storage medium
CN114565808B (en) Double-action contrast learning method for unsupervised visual representation
Yang et al. STA-TSN: Spatial-temporal attention temporal segment network for action recognition in video
Davis et al. Visual FUDGE: form understanding via dynamic graph editing
CN117690098B (en) Multi-label identification method based on dynamic graph convolution under open driving scene
CN114463543A (en) Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion
CN110942463B (en) Video target segmentation method based on generation countermeasure network
CN116363374B (en) Image semantic segmentation network continuous learning method, system, equipment and storage medium
Wang et al. Semantic segmentation of sewer pipe defects using deep dilated convolutional neural network
CN116958809A (en) Remote sensing small sample target detection method for feature library migration
CN111260659A (en) Image interactive segmentation method based on initial annotation point guidance
CN116385466A (en) Method and system for dividing targets in image based on boundary box weak annotation
Wang et al. Multiscale anchor box and optimized classification with faster R‐CNN for object detection
CN115269925A (en) Non-biased scene graph generation method based on hierarchical structure
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN111723301B (en) Attention relation identification and labeling method based on hierarchical theme preference semantic matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination