CN114463543A - Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion - Google Patents

Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion

Info

Publication number
CN114463543A
Authority
CN
China
Prior art keywords
network
pixel
data set
image
level mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210087653.2A
Other languages
Chinese (zh)
Inventor
缪佩翰
励雪巍
李玺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210087653.2A
Publication of CN114463543A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155: Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/2431: Multiple classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion. The method comprises the following steps: acquiring a target-domain data set and a network-domain data set; training on an existing weakly supervised semantic segmentation training framework with the target-domain data set to obtain a target-domain segmentation network; using the target-domain segmentation network to infer on the network-domain data set, obtaining the pixel set of each network image's segmentation result, and performing data cleaning to obtain a single-class-label network data set and a multi-class-label network data set; training on the weakly supervised semantic segmentation training framework with the cleaned data sets to obtain a network-domain segmentation network; inferring on the target-domain data set with the target-domain segmentation network, the network-domain segmentation network and an existing salient-object detection network to obtain a target-domain pixel-level mask, a network-domain pixel-level mask and a saliency map, and fusing them through a cascade decision into a final pixel-level mask; and performing supervised training with the final pixel-level mask to obtain the final segmentation network.

Description

Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion
Technical Field
The invention relates to the field of computer vision, in particular to a weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion.
Background
Semantic segmentation is one of the key problems in computer vision today; as a high-level task, it paves the way toward complete scene understanding. Semantic segmentation technology is widely applied in autonomous driving, medical image analysis, geographic information systems and other fields, and plays an important role in promoting social development. In recent years, semantic segmentation methods based on fully supervised learning have achieved remarkable results, but constructing the pixel-level labels required for training is very costly. To obtain high-performance semantic segmentation while reducing the cost of sample construction, researchers have explored weakly supervised semantic segmentation methods that exploit network images under class-label supervision.
In related methods, existing images with accurate class labels serve as target-domain data, while network images retrieved from a search engine by keyword serve as network-domain data. Network images are easy to obtain in large quantities: they can be gathered through keyword search, with the search keyword converted into the class label. However, this easily introduces two types of noisy images: label-error images, whose class label is completely inconsistent with the image content, and label-inaccurate images, whose class label matches only part of the content. If the noise interference of the network images cannot be reduced and the reliability of network-domain knowledge cannot be improved, the weakly supervised semantic segmentation performance will suffer. Existing methods use external techniques such as saliency detection, co-segmentation and GrabCut to reduce the influence of network-domain noise and to assist in acquiring network-domain knowledge. However, these methods do not substantially exploit target-domain knowledge, so the reliability of the knowledge they obtain cannot help a weakly supervised semantic segmentation training system to substantially improve segmentation performance.
In addition, the target-domain segmentation network is trained on target-domain images with accurate class labels, so the foreground regions of the pixel-level masks it infers on the target-domain data set are of good quality. However, during weakly supervised training the class labels used as supervision only indicate which classes are present in an image and provide no spatial information about any class, so the target-domain segmentation network, constrained by the inherent context prior of the data set, tends to segment objects incompletely. The semantic information of the network domain is rich but noisy, and the inference results of the network-domain segmentation network on the target-domain data set can serve as important supplementary information for foreground segmentation. Likewise, the context inferred by salient-object detection on the target-domain data set can serve as important contextual information. How to fuse the pixel-level results inferred by these different networks through a reliability-aware decision process, and thereby achieve robust knowledge transfer that improves the final semantic segmentation performance, becomes a challenge.
Disclosure of Invention
To address these problems, the invention provides a weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion. The technical scheme adopted by the invention is as follows:
A weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion comprises the following steps:
S1, acquiring a target-domain data set and a network-domain data set;
S2, training on a weakly supervised semantic segmentation network training framework with the target-domain data set to obtain a target-domain segmentation network;
S3, inferring on the network-domain data set with the target-domain segmentation network to obtain the pixel set of each network image's segmentation result, and performing data cleaning to obtain a single-class-label network data set and a multi-class-label network data set;
S4, training on the weakly supervised semantic segmentation network training framework with the single-class-label network data set and the multi-class-label network data set to obtain a network-domain segmentation network;
S5, inferring on the target-domain data set with the target-domain segmentation network, the network-domain segmentation network and a salient-object detection network to obtain a target-domain pixel-level mask, a network-domain pixel-level mask and a saliency map, respectively, and performing a cascade decision over the three to obtain a final pixel-level mask;
S6, performing supervised training of a semantic segmentation network with the final pixel-level mask to obtain a final segmentation network, and segmenting the target image with the final segmentation network.
Preferably, in S1, the target-domain data set I_t comprises a plurality of training images and the manually labeled class labels corresponding to those images, and the network-domain data set I_w comprises a plurality of images retrieved from a search engine together with class labels converted from the corresponding search keywords.
Preferably, in S2, the network structure in the weakly supervised semantic segmentation network training framework comprises two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask. After training on this framework with the target-domain data set I_t, the target-domain segmentation network F_t is obtained.
Preferably, S3 comprises the following sub-steps:
S31, using the target-domain segmentation network F_t to infer on the network-domain data set I_w, obtaining the pixel set Φ of each image's segmentation result;
S32, for each network-domain image X_i, defining
λ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k = l_i)
and
μ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k ≠ l_i ∧ p_k ≠ bg)
wherein Φ_i is the pixel set of the i-th image's segmentation result, |Φ_i| denotes the total number of pixels in Φ_i, and p_k is the semantic label of pixel k in Φ_i; χ(a) denotes the truth value of condition a, with χ(a) = 1 when condition a holds and χ(a) = 0 otherwise; bg denotes a background pixel; λ_i is the proportion of the single class l_i in image X_i, and μ_i is the proportion of the other foreground classes in image X_i;
S33, performing a preliminary cleaning of the network-domain data set I_w: traverse each network image X_i; if X_i satisfies δ_1 ≤ λ_i ≤ δ_2, it is retained, wherein δ_1 and δ_2 are thresholds controlling the proportion of the single class l_i in image X_i; otherwise it is directly judged to be a noisy image and removed from the network-domain data set I_w;
S34, performing single-class cleaning on the network images retained in step S33: traverse each retained network image X_i; if X_i satisfies the condition on μ_i, it is added to the single-class-label network data set, otherwise it is not added;
S35, performing multi-class cleaning on the network images retained in step S33: traverse each retained network image X_i; if X_i satisfies μ_i ≥ δ_3, all class labels appearing in Φ_i are taken as the class labels of X_i, and X_i with these class labels is added to the multi-class-label network data set, wherein δ_3 is a threshold on the proportion of the other foreground classes in image X_i; otherwise it is not added to the multi-class-label network data set.
Preferably, in S4, the network structure in the weakly supervised semantic segmentation network training framework comprises two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask. With the single-class-label network data set and the multi-class-label network data set as training data, the network-domain segmentation network F_w is obtained after training on this framework.
Preferably, S5 comprises the following sub-steps:
S51, using the target-domain segmentation network F_t, the network-domain segmentation network F_w and the salient-object detection network to infer on the target-domain data set I_t, obtaining the pixel-level mask M_t, the pixel-level mask M_w and the saliency map M_b, respectively;
S52, initializing a pixel-level mask M_f with no semantic labels, all of whose pixel values are 255, to store the cascade decision result, the resolution of M_f being the same as that of the input image;
S53, traversing all pixels p_1 in the pixel-level mask M_t: if the semantic label of pixel p_1 is a foreground class l_1, assigning l_1 to the pixel in M_f corresponding to p_1 as its semantic label;
S54, traversing all pixels p_2 in the pixel-level mask M_w: if the semantic label of pixel p_2 is a foreground class l_2 and the pixel in M_f corresponding to p_2 has not been assigned a semantic label, assigning l_2 to that pixel as its semantic label;
S55, traversing all pixels p_3 in the saliency map M_b: if pixel p_3 is background and the pixel in M_f corresponding to p_3 has been assigned a semantic label, reassigning that pixel in M_f to 255 and marking it as a first-type pixel; if pixel p_3 is foreground and the pixel in M_f corresponding to p_3 has value 255, keeping its value at 255 and marking it as a second-type pixel; all remaining pixels in M_f with value 255, other than the first-type and second-type pixels, are uniformly treated as background and assigned the value 0, yielding the final pixel-level mask M_f.
Preferably, S6 comprises the following sub-steps:
S61, training the semantic segmentation network with a cross-entropy loss function and a multi-label classification loss function, based on the target-domain data set I_t and the final pixel-level mask of each image, wherein pixels with value 255 in the final pixel-level mask do not participate in training; after training, the final segmentation network F_f is obtained;
S62, inputting the target image into the final segmentation network to segment it and obtain the semantic segmentation result.
Preferably, the semantic segmentation network that is supervised-trained with the final pixel-level mask in S6 is a ResNet38-based semantic segmentation network.
The invention realizes a weakly supervised semantic segmentation method based on dual-domain cascade decision and interactive annotation self-promotion. Under class-label supervision, it uses network images and saliency detection results to achieve robust knowledge transfer, improving the final segmentation performance and thus attaining high-performance semantic segmentation at reduced sample construction cost. The dual-domain interactive annotation self-promotion reduces the noise of the network-domain data set at the data level, improves image annotation quality, enriches the semantic information of the network domain, and raises the reliability of network-domain knowledge. The cascade decision performs knowledge transfer at the pixel level; its decision process is tied to the reliability of the knowledge, so the strengths of the different domains are exploited effectively to achieve robust knowledge transfer.
Drawings
FIG. 1 is a schematic diagram of the basic steps of the weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion of the present invention.
FIG. 2 is a schematic flow chart of the weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
On the contrary, the invention is intended to cover the alternatives, modifications and equivalents that may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description, certain specific details are set forth in order to provide a better understanding of the invention. It will be apparent to one skilled in the art that the invention may be practiced without these specific details.
Referring to fig. 1, a preferred embodiment of the present invention provides a weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion, used for training a high-performance weakly supervised semantic segmentation network. The flow of the method is shown in fig. 2 and comprises the following steps:
and S1, acquiring the target domain data set and the network domain data set.
In the present embodiment, in the above step S1, the target domain data set ItA network domain data set I comprising a plurality of training images and manually labeled class labels corresponding to the training imageswComprises a plurality of images searched on a search engine and corresponding category labels formed by converting search keywords. The specific search engine is not limited as long as the corresponding image can be searched by the keyword, and each keyword can be used as a category label of each image searched by the keyword. The construction of the network domain data set can be realized by a crawler technology.
S2, training on a weakly supervised semantic segmentation network training framework with the target-domain data set to obtain the target-domain segmentation network.
It should be noted that the weakly supervised semantic segmentation network training framework in the present invention may be any existing publicly available method, as long as it can be used to construct a weakly supervised semantic segmentation network.
In this embodiment, in step S2, the network structure in the weakly supervised semantic segmentation network training framework is SSENet + AffinityNet, comprising two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask. After training on this framework with the target-domain data set I_t, the target-domain segmentation network F_t is obtained.
S3, inferring on the network-domain data set with the target-domain segmentation network to obtain the pixel set of each network image's segmentation result, and performing data cleaning to obtain the single-class-label network data set and the multi-class-label network data set.
In this embodiment, the step S3 includes the following sub-steps:
S31, using the target-domain segmentation network F_t to infer on the network-domain data set I_w, obtaining the pixel set Φ of each image's segmentation result; the pixel set of the i-th image's segmentation result is denoted Φ_i.
S32, for each network-domain image X_i, defining
λ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k = l_i)
and
μ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k ≠ l_i ∧ p_k ≠ bg)
wherein Φ_i is the pixel set of the i-th image's segmentation result, |Φ_i| denotes the total number of pixels in Φ_i, and p_k is the semantic label of pixel k in Φ_i; χ(a) denotes the truth value of condition a (a is a generic placeholder standing for the conditional expression in each formula), with χ(a) = 1 when condition a holds and χ(a) = 0 otherwise; bg denotes a background pixel; λ_i is the proportion of the single class l_i in image X_i, and μ_i is the proportion of the other foreground classes in image X_i.
S33, performing a preliminary cleaning of the network-domain data set I_w: traverse each network image X_i; if X_i satisfies δ_1 ≤ λ_i ≤ δ_2, it is retained, wherein δ_1 and δ_2 are thresholds controlling the proportion of the single class l_i in image X_i; otherwise it is directly judged to be a noisy image and removed from the network-domain data set I_w.
S34, performing single-class cleaning on the network images of I_w retained in step S33: traverse each retained network image X_i; if X_i satisfies the condition on μ_i, it is added to the single-class-label network data set, otherwise it is not added.
S35, performing multi-class cleaning on the network images of I_w retained in step S33: traverse each retained network image X_i; if X_i satisfies μ_i ≥ δ_3, all class labels appearing in Φ_i are taken as the class labels of X_i, and X_i with these class labels is added to the multi-class-label network data set, wherein δ_3 is a threshold on the proportion of the other foreground classes in image X_i; otherwise it is not added to the multi-class-label network data set. A numerical sketch of this cleaning procedure is given below.
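As a purely illustrative aid (not code from the patent), the per-image cleaning decision of steps S32-S35 can be sketched in a few lines of NumPy. The helper name clean_decision and the use of μ_i = 0 as the single-class condition are assumptions made here for illustration, since the published text leaves the exact μ_i condition of S34 implicit; δ_1, δ_2 and δ_3 take the values used later in the example (0.2, 0.8 and 0.05).

```python
import numpy as np

def clean_decision(pred, label, delta1=0.2, delta2=0.8, delta3=0.05):
    """Decide the fate of one network image given its predicted label map.

    pred  : (H, W) integer array from the target-domain network F_t,
            0 = background (bg), 1..N = foreground classes.
    label : the image's single class label l_i (from the search keyword).
    """
    total = pred.size                                    # |Phi_i|
    lam = np.sum(pred == label) / total                  # lambda_i: share of class l_i
    mu = np.sum((pred != label) & (pred != 0)) / total   # mu_i: share of other foreground

    # S33: preliminary cleaning - keep only images with delta1 <= lambda_i <= delta2.
    if not (delta1 <= lam <= delta2):
        return "discard as noise"
    # S34: single-class cleaning - mu_i == 0 ("no other foreground") is assumed here,
    # since the exact condition on mu_i is left implicit in the text.
    if mu == 0:
        return "single-class-label set"
    # S35: multi-class cleaning - relabel with all classes in Phi_i when mu_i >= delta3.
    if mu >= delta3:
        return "multi-class-label set"
    return "discard as noise"

# Toy check: class 3 covers half the pixels, class 5 a quarter, background the rest,
# so lambda_i = 0.5 and mu_i = 0.25, which sends the image to the multi-class set.
toy = np.array([[3, 3, 3, 3],
                [3, 3, 3, 3],
                [5, 5, 5, 5],
                [0, 0, 0, 0]])
print(clean_decision(toy, label=3))   # -> "multi-class-label set"
```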
S4, training on the weakly supervised semantic segmentation network training framework with the single-class-label network data set and the multi-class-label network data set to obtain the network-domain segmentation network.
It should be noted that the weakly supervised semantic segmentation network training framework in the present invention may be any existing publicly available method, as long as it can be used to construct a weakly supervised semantic segmentation network.
In this embodiment, in step S4, the network structure in the weakly supervised semantic segmentation network training framework is SSENet + AffinityNet, comprising two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask. With the single-class-label network data set and the multi-class-label network data set as training data, the network-domain segmentation network F_w is obtained after training on this framework.
In the present invention, although the same weakly supervised semantic segmentation training framework is adopted in steps S2 and S4 above, in other embodiments different frameworks may be used, as long as the semantic segmentation function is realized. The loss functions adopted in the weakly supervised training framework may also be adjusted as needed to meet the required model training accuracy.
S5, inferring on the target-domain data set with the target-domain segmentation network, the network-domain segmentation network and the salient-object detection network to obtain the target-domain pixel-level mask, the network-domain pixel-level mask and the saliency map, respectively, and performing a cascade decision over the three to obtain the final pixel-level mask.
In this embodiment, step S5 includes the following sub-steps:
S51, using the target-domain segmentation network F_t, the network-domain segmentation network F_w and the salient-object detection network to infer on the target-domain data set I_t, obtaining the pixel-level mask M_t, the pixel-level mask M_w and the saliency map M_b, respectively.
S52, initializing a pixel-level mask M_f with no assigned semantic labels, used to store the result of the cascade decision; all pixel values of M_f are initialized to 255, and the resolution of M_f is the same as that of the input image.
Note that the pixel-level masks M_f, M_t, M_w and the saliency map M_b all have the same resolution (i.e. image size) as the input image, and each pixel of the input image has a corresponding pixel at the same position in M_t, M_w, M_f and M_b. The pixel values in each mask and in the saliency map reflect the semantic label, i.e. the classification category, of the corresponding pixel. The subsequent steps use M_t, M_w and M_b to record, through the cascade decision, the final semantic label of each pixel in M_f, i.e. to assign a value to each pixel of M_f.
S53, traversing all pixels p_1 in the pixel-level mask M_t: if the semantic label of pixel p_1 is a foreground class l_1, assign l_1 to the pixel in M_f corresponding to p_1 as its semantic label.
S54, traversing all pixels p_2 in the pixel-level mask M_w: if the semantic label of pixel p_2 is a foreground class l_2 and the pixel in M_f corresponding to p_2 has not been assigned a semantic label, assign l_2 to that pixel as its semantic label.
S55, traversing all pixels p_3 in the saliency map M_b: if pixel p_3 is background and the pixel in M_f corresponding to p_3 has been assigned a semantic label, reassign that pixel in M_f to 255 and mark it as a first-type pixel; if pixel p_3 is foreground and the pixel in M_f corresponding to p_3 has value 255, keep its value at 255 and mark it as a second-type pixel; all remaining pixels in M_f with value 255, other than the first-type and second-type pixels, are uniformly treated as background and assigned the value 0, yielding the final pixel-level mask M_f.
Thus, assuming the number of foreground classes is N and their class labels are represented by 1 to N, the final pixel-level mask M_f contains three kinds of labels: the first is background, with pixel value 0; the second is foreground, with a pixel value between 1 and N; and the third is pixels with value 255. The third kind of pixels do not participate in the loss computation during the subsequent training of S6, because these pixels are very likely to be contradictory across the different inference results; excluding them from supervised training improves the accuracy of the network. An illustrative sketch of the cascade decision follows.
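A minimal NumPy sketch of the cascade decision of steps S52-S55 is given below for illustration; it is not code from the patent. The label maps m_t and m_w are assumed to use 0 for background and 1..N for foreground classes, and the saliency map m_b is assumed to be binarized (1 = salient foreground, 0 = background).

```python
import numpy as np

def cascade_decision(m_t, m_w, m_b):
    """Fuse the target-domain mask, network-domain mask and saliency map into M_f."""
    m_f = np.full(m_t.shape, 255, dtype=np.uint8)    # S52: initialise every pixel to "unlabeled"

    fg_t = m_t > 0
    m_f[fg_t] = m_t[fg_t]                            # S53: take target-domain foreground labels first

    fg_w = (m_w > 0) & (m_f == 255)
    m_f[fg_w] = m_w[fg_w]                            # S54: fill still-unlabeled pixels from the network domain

    # S55: reconcile with the saliency map.
    first_type = (m_b == 0) & (m_f != 255)           # labeled foreground on non-salient pixels -> ignored
    m_f[first_type] = 255
    second_type = (m_b == 1) & (m_f == 255)          # salient but unlabeled pixels -> stay ignored (255)
    background = (m_f == 255) & ~first_type & ~second_type
    m_f[background] = 0                              # everything else still at 255 becomes background
    return m_f
```

The value 255 plays the usual "ignore" role, matching the three-label encoding described above (0 for background, 1..N for foreground, 255 for pixels excluded from training).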
S6, performing supervised training of the semantic segmentation network with the final pixel-level mask to obtain the final segmentation network, and segmenting the target image with the final segmentation network.
In this embodiment, step S6 includes the following sub-steps:
S61, training the semantic segmentation network with a cross-entropy loss function and a multi-label classification loss function, based on the target-domain data set I_t and the final pixel-level mask of each image, wherein pixels with value 255 in the final pixel-level mask do not participate in training; after training, the final segmentation network F_f is obtained (an illustrative sketch of this loss follows these sub-steps).
S62, inputting the target image into the final segmentation network to segment it and obtain the semantic segmentation result.
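The patent does not tie S61 to a particular deep-learning framework. The sketch below shows, under that caveat, how the two losses could be combined in PyTorch so that pixels valued 255 are excluded; global pooling of the segmentation logits is only one illustrative way to obtain image-level scores for the multi-label term and is not specified in the text.

```python
import torch
import torch.nn.functional as F

def s61_loss(logits, final_mask, image_labels):
    """Illustrative S61 loss: pixel-wise cross-entropy plus multi-label classification.

    logits       : (B, C, H, W) segmentation outputs.
    final_mask   : (B, H, W) final pixel-level mask M_f, values 0..N, with 255 = ignored.
    image_labels : (B, C) multi-hot image-level class labels.
    """
    # Pixels with value 255 do not contribute to the loss, as required in S61.
    seg_loss = F.cross_entropy(logits, final_mask.long(), ignore_index=255)

    # Multi-label classification loss on globally pooled class scores
    # (the pooling choice is an assumption, not taken from the text).
    class_scores = logits.mean(dim=(2, 3))
    cls_loss = F.binary_cross_entropy_with_logits(class_scores, image_labels.float())
    return seg_loss + cls_loss
```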
It should be noted that the semantic segmentation network that is supervised-trained with the final pixel-level mask may be any existing publicly available model, as long as it can serve as a semantic segmentation network.
In this embodiment, the semantic segmentation network trained under final pixel-level mask supervision may be a ResNet38-based semantic segmentation network.
The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion described in steps S1-S6 above is applied to a specific example to show its technical effect.
Examples
The implementation of this example follows steps S1-S6 described above; the specific steps are not repeated here, and only the results on the case data are shown.
The invention is implemented on a target domain data set and a network domain data set:
target domain data set: this example uses the PASCAL VOC 2012 training set as the target domain data set, which contains 10582 images in total of 21 classes. And the final segmented network was tested for performance on a validation set containing 1449 images and a test set containing 1456 images.
Network domain data set: in the embodiment, the public network image is used as a network domain data set, the image is obtained by searching the keyword on a search engine, the search keyword is converted into a class label, the class of the class label is consistent with the target domain, the width and the height of the image are not more than 500, and the class label only comprises a single class. In this example, δ is set1=0.2、δ2The network domain data set is initially cleaned up to 0.8. The number of images in each specific category in the single-category cleaning is set to be not more than 500, so that a single-category label network data set containing 9742 images is obtained. Setting delta30.05, and thus a multi-class label network data set containing 5415 images is obtained.
In this embodiment, an existing weak supervised semantic segmentation network training framework (SSENet + affinity net) is used as a basis, and the performance of a segmentation network measured on a PASCAL VOC 2012 verification set and a test set by training on the PASCAL VOC 2012 training set is 63.5% mlio u and 64.2% mlio u, respectively. mlou is the average intersection ratio that computes the ratio of the correctly predicted positive samples (the intersection of the two sets of true and predicted values) to the sum (union) of the correctly predicted positive samples, the incorrectly predicted positive samples, and the incorrectly predicted negative samples. The semantic segmentation network based on ResNet38 is trained by using a weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion, the performances of the obtained final segmentation network measured on a PASCAL VOC 2012 verification set and a test set are respectively 68.2% mIoU and 69.6% mIoU, and the performances are obviously promoted.
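For reference, the mIoU computation described above can be sketched as follows (an illustrative helper, not code from the patent); pixels labeled 255 in the ground truth are excluded, matching the PASCAL VOC convention.

```python
import numpy as np

def mean_iou(preds, gts, num_classes=21):
    """Mean intersection over union: per class, IoU = TP / (TP + FP + FN), averaged."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for pred, gt in zip(preds, gts):
        valid = gt != 255                                # ignore unlabeled pixels
        idx = num_classes * gt[valid].astype(int) + pred[valid].astype(int)
        conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

    tp = np.diag(conf)                                   # correctly predicted positives
    fp = conf.sum(axis=0) - tp                           # falsely predicted positives
    fn = conf.sum(axis=1) - tp                           # missed positives
    iou = tp / np.maximum(tp + fp + fn, 1)
    return iou.mean()
```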
Therefore, based on deep learning technology and the above technical scheme, a weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion is developed. Under class-label supervision, it uses network images and saliency detection results to achieve robust knowledge transfer, improving the final segmentation performance and thus attaining high-performance semantic segmentation at reduced sample construction cost. The dual-domain interactive annotation self-promotion reduces the noise of the network-domain data set at the data level, improves image annotation quality, enriches the semantic information of the network domain, and raises the reliability of network-domain knowledge. The cascade decision performs knowledge transfer at the pixel level; its decision process is tied to the reliability of the knowledge, so the strengths of the different domains are exploited effectively to achieve robust knowledge transfer.
The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (8)

1. A weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion, characterized by comprising the following steps:
S1, acquiring a target-domain data set and a network-domain data set;
S2, training on a weakly supervised semantic segmentation network training framework with the target-domain data set to obtain a target-domain segmentation network;
S3, inferring on the network-domain data set with the target-domain segmentation network to obtain the pixel set of each network image's segmentation result, and performing data cleaning to obtain a single-class-label network data set and a multi-class-label network data set;
S4, training on the weakly supervised semantic segmentation network training framework with the single-class-label network data set and the multi-class-label network data set to obtain a network-domain segmentation network;
S5, inferring on the target-domain data set with the target-domain segmentation network, the network-domain segmentation network and a salient-object detection network to obtain a target-domain pixel-level mask, a network-domain pixel-level mask and a saliency map, respectively, and performing a cascade decision over the target-domain pixel-level mask, the network-domain pixel-level mask and the saliency map to obtain a final pixel-level mask;
S6, performing supervised training of a semantic segmentation network with the final pixel-level mask to obtain a final segmentation network, and segmenting the target image with the final segmentation network.
2. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein in step S1 the target-domain data set I_t comprises a plurality of training images and the manually labeled class labels corresponding to those images, and the network-domain data set I_w comprises a plurality of images retrieved from a search engine together with class labels converted from the corresponding search keywords.
3. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein in S2 the network structure in the weakly supervised semantic segmentation network training framework comprises two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask; after training on this framework with the target-domain data set I_t, the target-domain segmentation network F_t is obtained.
4. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein S3 comprises the following sub-steps:
S31, using the target-domain segmentation network F_t to infer on the network-domain data set I_w, obtaining the pixel set Φ of each image's segmentation result;
S32, for each network-domain image X_i, defining
λ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k = l_i)
and
μ_i = (1 / |Φ_i|) · Σ_{k∈Φ_i} χ(p_k ≠ l_i ∧ p_k ≠ bg)
wherein Φ_i is the pixel set of the i-th image's segmentation result, |Φ_i| denotes the total number of pixels in Φ_i, and p_k is the semantic label of pixel k in Φ_i; χ(a) denotes the truth value of condition a, with χ(a) = 1 when condition a holds and χ(a) = 0 otherwise; bg denotes a background pixel; λ_i is the proportion of the single class l_i in image X_i, and μ_i is the proportion of the other foreground classes in image X_i;
S33, performing a preliminary cleaning of the network-domain data set I_w: traverse each network image X_i; if X_i satisfies δ_1 ≤ λ_i ≤ δ_2, it is retained, wherein δ_1 and δ_2 are thresholds controlling the proportion of the single class l_i in image X_i; otherwise it is directly judged to be a noisy image and removed from the network-domain data set I_w;
S34, performing single-class cleaning on the network images retained in step S33: traverse each retained network image X_i; if X_i satisfies the condition on μ_i, it is added to the single-class-label network data set, otherwise it is not added;
S35, performing multi-class cleaning on the network images retained in step S33: traverse each retained network image X_i; if X_i satisfies μ_i ≥ δ_3, all class labels appearing in Φ_i are taken as the class labels of X_i, and X_i with these class labels is added to the multi-class-label network data set, wherein δ_3 is a threshold on the proportion of the other foreground classes in image X_i; otherwise it is not added to the multi-class-label network data set.
5. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein in S4 the network structure in the weakly supervised semantic segmentation network training framework comprises two parts, SSENet and AffinityNet: SSENet takes the training image as input and outputs a class activation map, and AffinityNet takes the class activation map as input and outputs a pixel-level mask; with the single-class-label network data set and the multi-class-label network data set as training data, the network-domain segmentation network F_w is obtained after training on this framework.
6. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein S5 comprises the following sub-steps:
S51, using the target-domain segmentation network F_t, the network-domain segmentation network F_w and the salient-object detection network to infer on the target-domain data set I_t, obtaining the pixel-level mask M_t, the pixel-level mask M_w and the saliency map M_b, respectively;
S52, initializing a pixel-level mask M_f with no semantic labels, all of whose pixel values are 255, to store the cascade decision result, the resolution of M_f being the same as that of the input image;
S53, traversing all pixels p_1 in the pixel-level mask M_t: if the semantic label of pixel p_1 is a foreground class l_1, assigning l_1 to the pixel in M_f corresponding to p_1 as its semantic label;
S54, traversing all pixels p_2 in the pixel-level mask M_w: if the semantic label of pixel p_2 is a foreground class l_2 and the pixel in M_f corresponding to p_2 has not been assigned a semantic label, assigning l_2 to that pixel as its semantic label;
S55, traversing all pixels p_3 in the saliency map M_b: if pixel p_3 is background and the pixel in M_f corresponding to p_3 has been assigned a semantic label, reassigning that pixel in M_f to 255 and marking it as a first-type pixel; if pixel p_3 is foreground and the pixel in M_f corresponding to p_3 has value 255, keeping its value at 255 and marking it as a second-type pixel; all remaining pixels in M_f with value 255, other than the first-type and second-type pixels, are uniformly treated as background and assigned the value 0, yielding the final pixel-level mask M_f.
7. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein S6 comprises the following sub-steps:
S61, training the semantic segmentation network with a cross-entropy loss function and a multi-label classification loss function, based on the target-domain data set I_t and the final pixel-level mask of each image, wherein pixels with value 255 in the final pixel-level mask do not participate in training; after training, the final segmentation network F_f is obtained;
S62, inputting the target image into the final segmentation network to segment it and obtain the semantic segmentation result.
8. The weakly supervised semantic segmentation method based on cascade decision and interactive annotation self-promotion as claimed in claim 1, wherein the semantic segmentation network that is supervised-trained with the final pixel-level mask in S6 is a ResNet38-based semantic segmentation network.
CN202210087653.2A 2022-01-25 2022-01-25 Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion Pending CN114463543A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210087653.2A CN114463543A (en) 2022-01-25 2022-01-25 Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion

Publications (1)

Publication Number Publication Date
CN114463543A true CN114463543A (en) 2022-05-10

Family

ID=81411244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210087653.2A Pending CN114463543A (en) 2022-01-25 2022-01-25 Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion

Country Status (1)

Country Link
CN (1) CN114463543A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842005A (en) * 2022-07-04 2022-08-02 海门市芳华纺织有限公司 Semi-supervised network-based textile surface defect detection method and system
CN114842005B (en) * 2022-07-04 2022-09-20 海门市芳华纺织有限公司 Method and system for detecting surface defects of textile fabric based on semi-supervised network

Similar Documents

Publication Publication Date Title
CN108399406B (en) Method and system for detecting weakly supervised salient object based on deep learning
CN111583263A (en) Point cloud segmentation method based on joint dynamic graph convolution
US20210326638A1 (en) Video panoptic segmentation
CN109783666A (en) A kind of image scene map generation method based on iteration fining
CN110009008B (en) Method for automatically classifying immune fixed electrophoretogram based on extracted features
CN113487629B (en) Image attribute editing method based on structured scene and text description
CN112287941B (en) License plate recognition method based on automatic character region perception
Shi et al. An image mosaic method based on convolutional neural network semantic features extraction
CN117079139B (en) Remote sensing image target detection method and system based on multi-scale semantic features
CN112069884A (en) Violent video classification method, system and storage medium
CN114565808B (en) Double-action contrast learning method for unsupervised visual representation
Yang et al. STA-TSN: Spatial-temporal attention temporal segment network for action recognition in video
Davis et al. Visual FUDGE: form understanding via dynamic graph editing
CN117690098B (en) Multi-label identification method based on dynamic graph convolution under open driving scene
CN114463543A (en) Weak supervision semantic segmentation method based on cascade decision and interactive annotation self-promotion
CN110942463B (en) Video target segmentation method based on generation countermeasure network
CN116363374B (en) Image semantic segmentation network continuous learning method, system, equipment and storage medium
Wang et al. Semantic segmentation of sewer pipe defects using deep dilated convolutional neural network
CN116958809A (en) Remote sensing small sample target detection method for feature library migration
CN111260659A (en) Image interactive segmentation method based on initial annotation point guidance
CN116385466A (en) Method and system for dividing targets in image based on boundary box weak annotation
Wang et al. Multiscale anchor box and optimized classification with faster R‐CNN for object detection
CN115269925A (en) Non-biased scene graph generation method based on hierarchical structure
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention
CN111723301B (en) Attention relation identification and labeling method based on hierarchical theme preference semantic matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination