CN117079103B - Pseudo tag generation method and system for neural network training - Google Patents

Pseudo tag generation method and system for neural network training Download PDF

Info

Publication number
CN117079103B
CN117079103B CN202311331979.6A CN202311331979A CN117079103B CN 117079103 B CN117079103 B CN 117079103B CN 202311331979 A CN202311331979 A CN 202311331979A CN 117079103 B CN117079103 B CN 117079103B
Authority
CN
China
Prior art keywords
boundary
pseudo
input image
pixels
class activation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311331979.6A
Other languages
Chinese (zh)
Other versions
CN117079103A (en
Inventor
石敏
邓伟钊
骆爱文
易清明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University filed Critical Jinan University
Priority to CN202311331979.6A priority Critical patent/CN117079103B/en
Publication of CN117079103A publication Critical patent/CN117079103A/en
Application granted granted Critical
Publication of CN117079103B publication Critical patent/CN117079103B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of deep learning, and provides a pseudo tag generation method and a pseudo tag generation system for neural network training, wherein the pseudo tag generation method comprises the following steps: transmitting the input image and the image level label corresponding to the input image to a classification backbone network based on a residual structure for attention pooling to obtain a class activation diagram, and transmitting the input image to a salient object detection network for region detection to obtain a salient diagram; merging the region features of the class activation diagram and the saliency diagram, and synthesizing boundary pseudo labels of the input image; monitoring training of a boundary detection network by using a boundary pseudo tag, transmitting an input image to the trained boundary detection network for boundary detection, and extracting the boundary of the input image; and carrying out refinement propagation by utilizing the boundary guide class activation graph, and generating semantic segmentation pseudo tags corresponding to the input image. The method can greatly reduce the cost and time for manually labeling the pixel-level labels and improve the precision and the generation efficiency of the semantic segmentation pseudo labels.

Description

Pseudo tag generation method and system for neural network training
Technical Field
The invention relates to the technical field of deep learning, in particular to a pseudo tag generation method and a pseudo tag generation system for neural network training.
Background
Image semantic segmentation is a computer vision technique that assigns each pixel in an image to a predefined class for pixel-level image understanding and analysis. Conventional semantic segmentation methods capture visual features of an image, such as color, texture, edges, etc., using various feature extraction algorithms, and then perform image segmentation using machine learning algorithms such as SVM, random Forest, and CRF. In recent years, a semantic segmentation method based on a deep convolutional neural network makes breakthrough progress under the support of a large-scale data set and a high-performance computer, and becomes a current mainstream semantic segmentation technology.
The full-supervision semantic segmentation method can realize classification of pixel levels, but the manual labeling of labels of pixel levels is time-consuming and labor-consuming, is unsuitable for large-scale data sets or real-time updating application, has poor model generalization capability, and needs to be continuously retrained. In order to save manpower and time and reduce the dependence on pixel-level labels, researchers have proposed semantic segmentation methods using weak labels as supervision, and the weak labels commonly used are image-level labels, bounding box labels, graffiti labels, dot labels and the like. The image-level labels only give out the target categories existing in the image, have no position information and are the weak labels which are easiest to label and have the lowest cost, but the image-level labels are difficult to apply to pixel-level semantic segmentation.
In addition, paper Weakly supervised semantic segmentation with boundary exploration proposes a method for generating boundary pseudo labels by using a boundary detection technology, training a boundary detection network to explore object boundaries, and refining class activation diagrams. However, due to insufficient prior information of the boundary, the boundary detection network is difficult to detect the complete object boundary, uncertainty is brought to refinement of the class activation diagram, and the generated semantic segmentation pseudo tag is low in efficiency and precision.
Disclosure of Invention
The invention provides the following technical scheme for overcoming the defects of low efficiency and low precision of semantic segmentation pseudo labels in the prior art:
in a first aspect, the present invention provides a pseudo tag generation method for neural network training, including the steps of:
s1: and transmitting the input image and the image level label corresponding to the input image to a classification backbone network based on a residual structure for attention pooling to obtain a class activation graph, and transmitting the input image to a salient object detection network for region detection to obtain a salient graph.
S2: and fusing the region features of the class activation map and the saliency map, and synthesizing a boundary pseudo tag of the input image.
S3: and monitoring the training of the boundary detection network by using the boundary pseudo tag, transmitting the input image to the trained boundary detection network for boundary detection, and extracting the boundary of the input image.
S4: and guiding the class activation graph to conduct refinement propagation by utilizing the boundary, and generating a semantic segmentation pseudo tag corresponding to the input image.
In a second aspect, the present invention further proposes a pseudo tag generation system for neural network training, which is applied to the pseudo tag generation method for neural network training according to the first aspect, and includes:
and the class activation diagram acquisition module is used for transmitting the input image and the image level label corresponding to the input image to the classification backbone network based on the residual structure for attention pooling to obtain the class activation diagram.
The saliency map acquisition module is used for transmitting the input image to a saliency object detection network for region detection to obtain a saliency map.
And the boundary pseudo tag generation module is used for fusing the regional characteristics of the class activation graph and the saliency graph and synthesizing the boundary pseudo tag of the input image.
And the boundary detection module is used for supervising the training of the boundary detection network by utilizing the boundary pseudo tag, transmitting the input image to the trained boundary detection network for boundary detection, and extracting the boundary of the input image.
And the semantic segmentation pseudo tag generation module is used for guiding the class activation graph to conduct refinement propagation by utilizing the boundary to generate a semantic segmentation pseudo tag corresponding to the input image.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
(1) The boundary pseudo-labels are generated by combining the class activation diagrams and the saliency diagrams, the class activation diagrams provide foreground areas with high confidence and related to classes, a small number of boundary labels can be synthesized by the class activation diagrams, the saliency diagrams provide background areas with high confidence, a large number of boundaries between the foreground and the background can be provided, and the quality and coverage rate of the boundary pseudo-labels are improved.
(2) The boundary pseudo-label is used for supervising and training the boundary detection network, so that generalization capability and robustness of the boundary detection network can be enhanced, the boundary detection network can adapt to images of different scenes and environments, more complete object boundaries are detected, and accuracy and integrity of boundary detection are improved. Finally, the boundary of the input image and the class activation diagram are utilized to carry out refinement and propagation treatment, the class activation diagram can be corrected and optimized according to the boundary information, the cost and time for manually labeling the pixel-level label are greatly reduced, and the precision and the generation efficiency of the semantic segmentation pseudo label are improved.
(3) The saliency map is used for guiding the generation of the class activation map, so that the class activation map can better indicate the object region, more reliable information is provided for the synthesis and propagation of the boundary pseudo labels, and the precision and the generation efficiency of the semantic segmentation pseudo labels are further improved.
Drawings
Fig. 1 is a schematic diagram i of the pseudo tag generation network in embodiment 1.
Fig. 2 is a schematic diagram ii of the pseudo tag generation network in embodiment 1.
Fig. 3 is a schematic diagram iii of the pseudo tag generation network in embodiment 1.
Fig. 4 is a schematic diagram of the structure of the pseudo tag generation network in embodiment 1 after the pseudo tag generation networks in fig. 1, 2 and 3 are optimally combined.
Fig. 5 is a flow diagram of generating boundary pseudo tags under significance guidance in the pseudo tag generation network structure of fig. 2.
Fig. 6 is a flowchart of the pseudo tag generation network of fig. 4 in embodiment 1.
Fig. 7 is a flowchart of a pseudo tag generation method for neural network training according to an embodiment of the present application.
Part (a) in fig. 8 shows the input image i in example 2, (b) shows the true label value map corresponding to the input image i in example 2, (c) shows the boundary pseudo label map synthesized by a-CAMs in example 2, (d) shows the boundary pseudo label map synthesized by saliency guidance in example 2, (e) shows the boundary pseudo label map synthesized by a-CAMs and saliency guidance in example 2, (f) shows the boundary map output by the boundary detection network in example 2, and (g) shows the semantic segmentation pseudo label map generated corresponding to the input image i in example 2.
In fig. 9, (a) represents the input image ii in embodiment 2, (b) represents the true semantic division label map corresponding to the input image ii in embodiment 2, and (c) represents the semantic division pseudo label map corresponding to the input image ii in embodiment 2.
Fig. 10 is a hardware architecture diagram of a pseudo tag generation system for neural network training in embodiment 3.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present application; for a better illustration of the present embodiment, it may be understood that some of the illustrations of the figures may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
Example 1
The embodiment provides a pseudo tag generation method for neural network training.
As shown in fig. 1 to 5, fig. 1, fig. 2, and fig. 3 are schematic structural diagrams of pseudo tag generation networks with three different structures, respectively, and fig. 4 is a schematic structural diagram of a pseudo tag generation network after the pseudo tag generation networks of fig. 1, fig. 2, and fig. 3 are optimally combined in an embodiment of the present application.
In fig. 1, a classification network is trained by using image-level labels, a first class activation graph (hereinafter referred to as a-CAMs) using the image-level labels as supervisory signals is generated to synthesize boundary pseudo-labels, the synthesized boundary pseudo-labels are used as a supervisory training boundary detection network, and the output boundary of the boundary detection network is used as a constraint to propagate the a-CAMs to obtain semantic segmentation pseudo-labels.
In FIG. 2, the classification network is trained with image level labels, generating A-CAMs. Combining the A-CAMs and the saliency map to generate a synthesized boundary pseudo-label, using the synthesized boundary pseudo-label as a supervision training boundary detection network, using the output boundary of the boundary detection network as a constraint, and propagating the A-CAMs to obtain the semantic segmentation pseudo-label. The reliable background area information provided by the saliency and the boundary between the foreground and the background are fused into the boundary pseudo tag, so that more reliable supervision is provided for the boundary detection network. As shown in fig. 5, fig. 5 is a flow chart of generating boundary pseudo labels under significance guidance in the pseudo label generating network structure of fig. 2, and four parts of information including foreground information, background information, boundary information and uncertain parts are in the boundary pseudo labels generated in fig. 5. The composite of the boundary pseudo tag is divided into two parts, and the background information in the class activation graph is less reliable than the saliency graph and the saliency graph cannot indicate the possible class information in the foreground, so that in the boundary pseudo tag, by reserving the foreground information in the class activation graph and the background information in the saliency graph and reserving the boundary information of the foreground information and the saliency graph, the composite boundary pseudo tag is more accurate from the visual effect after the saliency graph is combined.
In fig. 3, the classification network is trained by using the image level label and the saliency map together, a second type of activation map (hereinafter referred to as S-CAMs) using the image level label and the saliency map as supervisory signals is generated to synthesize boundary pseudo labels, the boundary pseudo labels synthesized are used as a supervisory training boundary detection network, and the output boundary of the boundary detection network is used as a constraint to propagate the S-CAMs to obtain semantic segmentation pseudo labels.
As shown in fig. 6 and fig. 7, fig. 6 is a flowchart of a flow chart of generating a semantic segmentation pseudo tag by using the pseudo tag generation network of fig. 4 in the embodiment of the present application, and fig. 7 is a flowchart of a pseudo tag generation method for training a neural network according to the embodiment of the present application.
In this embodiment, the pseudo tag generation method for neural network training includes the following steps:
s1: and transmitting the input image and the image level label corresponding to the input image to a classification backbone network based on a residual structure for attention pooling to obtain a class activation graph, and transmitting the input image to a salient object detection network for region detection to obtain a salient graph.
Optionally, in one embodiment of the present application, the input image is transmitted to a classification backbone network stacked in a residual structure for classification processing, and a localization map is output.
Full-supervised training of the localization map using the image-level labels to generate a first class of activation maps employing a loss functionThe expression of (2) is as follows:
wherein,cfor the number of categories in the data set,representing sigmoid function->Pixels in class activation diagrams representing classification network outputiBelongs to the category ofcProbability values of (a) are provided.
The loss function employedThe expression of (2) is as follows:
in this embodiment, the classification backbone network based on the residual structure is a classification backbone network based on a Resnet 50.
In the implementation process, a group of color images are input, preprocessing is performed first, and modification of the uniform size of 512×512×3 is completed through supplementation or clipping. The number of input channels of the first layer input image in fig. 6 is 3, the last layer of the classification backbone network based on the Resnet50 is replaced by an attention pooling layer (the lower limit of the attention pooling layer is a global average pooling layer), after the color image is input into the classification network, a class activation map with the size of 512×512×c is output (c represents that c classes are in the data set), and when only an image level label is used as a supervision signal, the generated first class activation map is called a-CAMs; when combining the image level labels and saliency maps as supervisory signals, the second type of activation maps generated are called S-CAMs.
S2: and fusing the region features of the class activation map and the saliency map, and synthesizing a boundary pseudo tag of the input image.
Optionally, in an embodiment of the present application, the boundary pseudo tag based on the class activation map is utilized according to the following formulaAnd said saliency map-based boundary pseudo tag +.>Generating boundary pseudo tag of input image +.>
Wherein,representing pixelsiBelonging to boundary (I)>Representing pixelsiBelonging to the object area->Representing pixelsiBelonging to the background area.
In this embodiment, the boundary pseudo tag based on the class activation graph is generatedThe specific steps of (a) include:
traversing all pixels in the class activation graph using a sliding window of specified size, and determining each pixel in the class activation graph according to the following formulaiGenerates boundary pseudo tags based on class activation graphs
Wherein,representing pixelsiAt the border in class activation diagram, +.>And->The number of object region pixels and background region pixels in the sliding window are represented respectively,Wpixel set representing a sliding window, +.>Representing pixelsiIn class activation diagram in object region, +.>Representing class activation graphs belonging to a classcProbability of->Pixel threshold for the object region, +.>Representing pixelsiIn the class activation diagram in an uncertainty region.
It will be appreciated that since the class activation map indicates the object region and the background region represented by a small number of image level labels, and the object boundary exists between the object region and the background region, the object region and the background region in the class activation map may be used to indicate the composition of boundary pseudo labels. To achieve this object, the present invention employs a sliding window approach in which the center pixel of the window is considered to be the boundary if and only if a similar number of object region pixels and background region pixels are contained within the window.
In this embodiment, the boundary pseudo tag based on the saliency map is generatedThe specific steps of (a) include:
performing edge detection on the saliency map by using a Sobel operator to generate a Sobel gradient amplitude;
according to each pixel in the saliency mapiCorresponding Sobel gradient amplitudeAnd significance amplitude->For each pixel in the saliency map, using the following formulaiAssigning boundary pseudo tags based on saliency maps +.>
Wherein,representing pixels in a saliency mapiSobel gradient amplitude, +.>A preset threshold value for the gradient amplitude;representing pixels in a saliency mapiIs a significant magnitude of (1); />Representing pixelsiBelonging to a boundary in the saliency map; />Representing pixelsiBelonging to the object region in the saliency map; />Representing pixelsiBelonging to the background area in the saliency map.
In this embodiment, the horizontal gradients of the saliency maps are calculated using the Sobel operator, respectively, according to the following formulaAnd gradient in vertical direction->
Wherein,and->Sobel gradient operator in horizontal direction and vertical direction respectively,>is remarkable in thatA sex map;
according to the following, the horizontal gradient is utilizedAnd vertical gradient>Calculating Sobel gradient amplitude of saliency map>
It will be appreciated that the correct boundary tags should contain the complete object boundary, however, only a small number of boundary pseudo tags, and reliable object regions of the corresponding class, can be synthesized by the class activation map. From observation, salient regions in the saliency map generally represent object regions, non-salient regions generally represent background, but salient regions are not available because there is no explicit category information; in addition to this, the boundaries of the salient region and the non-salient region can also be deduced from the saliency map, which highly coincides with the boundary of the object. The present embodiment uses the Sobel operator to extract the boundaries of the salient region and the non-salient region.
S3: and monitoring the training of the boundary detection network by using the boundary pseudo tag, transmitting the input image to the trained boundary detection network for boundary detection, and extracting the boundary of the input image.
Optionally, in an embodiment of the present application, the training of the boundary detection network using the boundary pseudo tag includes: using loss functionsTraining the boundary detection network, the loss function +.>The calculation is carried out by the following formula:
wherein,loss function representing foreground pixels and background pixels,/->Loss function representing boundary pixels,/->And->Respectively representing a set of foreground pixels and background pixels in the boundary pseudo tag, +.>Pixels representing the output of the boundary detection networkiProbability values belonging to the boundary; />Andboundary pixels belonging to the salient region and boundary pixels belonging to the non-salient region in the boundary pseudo tag are represented, respectively.
It will be appreciated that due to the synthesized boundary pixelsDoes not represent an exact object boundary, so that the parameter +.>,/>The non-linear nature of (c) makes the boundary detection network more sensitive to predicted variations in boundary pixels. Thus, when the pixeliIn the boundary pseudo-label is considered as boundary pixel, and its boundary probability prediction value in the boundary network output +.>Lower, the contribution weight of the pixel in the loss function is reduced. Since the boundary extracted from the saliency map is not necessarily accurate, the present embodiment adopts two methods to solve the problem of inaccurate boundary pixels, namely, the boundary term is multiplied by +.>The boundary detection network is more sensitive to the predicted change of the boundary pixels, if the boundary pseudo label is defined as the boundary for the pixels predicted to be non-boundary, the contribution weight of the pixels to the loss function is reduced so as to reduce the negative influence of the inaccurate boundary pseudo label, and the boundary pixels in the saliency area and the boundary pixels in the non-saliency area in the boundary pseudo label are respectively activated, so that the correct object boundary can exist in the saliency area and the non-saliency area due to the uncertainty of the extracted object boundary, and the strategies proposed by the embodiment respectively activate the boundary pixels and the non-saliency area for the two possibilities so as to avoid the negative influence of the inaccurate boundary pixels on the network training.
In this embodiment, the boundary detection network selects a classification network using the Resnet50 as a backbone network, inputs a group of color pictures (with a self-defined number), performs preprocessing first, and completes modification of 512×512×3 uniform size by supplementing or clipping. The number of input channels of the image input in the first layer is 3, and after the color image is input into the classification network, a boundary prediction map of 128×128×1 in size is output, and then a loss function is calculated using the boundary prediction map and the synthesized boundary pseudo tag.
S4: and guiding the class activation graph to conduct refinement propagation by utilizing the boundary, and generating a semantic segmentation pseudo tag corresponding to the input image.
In this embodiment, the specific steps of S4 include:
s4.1: the boundary is converted into a semantic affinity matrix.
S4.2: and according to the semantic affinity matrix, performing refinement propagation processing on the class activation graph by using a Random Walk algorithm to generate a semantic segmentation pseudo tag.
In the implementation process, boundaries output by a boundary detection network are used as constraints, the boundaries are converted into a semantic affinity matrix, and a Random Walk algorithm is used for propagation refinement of the class activation graph. In addition, when dCRF (dense conditional random field) is applied to the propagation refinement process, the quality of the semantic segmentation pseudo tag can be further improved.
It can be understood that the boundary pseudo-labels are generated by combining the class activation diagrams and the saliency diagrams, the class activation diagrams provide foreground regions with high confidence and related to classes, a small number of boundary labels can be synthesized by the class activation diagrams, the saliency diagrams provide background regions with high confidence, a large number of boundaries between the foreground and the background can be provided, and the quality and coverage rate of the boundary pseudo-labels are improved. In addition, the boundary detection network is supervised and trained by using the boundary pseudo tag, so that the generalization capability and the robustness of the boundary detection network can be enhanced, the boundary detection network can adapt to images of different scenes and environments, more complete object boundaries can be detected, and the accuracy and the integrity of boundary detection are improved. Finally, the boundary of the input image and the class activation diagram are utilized to carry out refinement and propagation treatment, the class activation diagram can be corrected and optimized according to the boundary information, the cost and time for manually labeling the pixel-level label are greatly reduced, and the precision and the generation efficiency of the semantic segmentation pseudo label are improved. Finally, the saliency map is used for guiding the generation of the class activation map, so that the class activation map can better indicate the object region, more reliable information is provided for the synthesis and propagation of the boundary pseudo labels, and the precision and the generation efficiency of the semantic segmentation pseudo labels are further improved.
In this embodiment, by performing a lot of experiments on two data sets of the PASCAL VOC 2012 and the MS COCO 2014, the comparison of the base network, the generated pseudo tag has only 66.4% of mIoU, and only 67.2% of mIoU is obtained even after dCRF treatment, while the generated pseudo tag has 66.9% and 67.9% (+dcrf) when the boundary pseudo tag synthesis strategy of significance guidance proposed by the present invention is combined alone; when the boundary detection network training strategy provided by the invention is independently combined, the generated pseudo tag mIoU reaches 67.2% and 67.6% (+dCTF); when two strategies are combined simultaneously, the generated pseudo tag mIoU reaches 69.8 percent and 70.4 percent (+dCTF), and is greatly improved relative to a basic network; when the instruction of the saliency information is also added in the generation process of the class activation diagram, the generated pseudo tag mIoU reaches 73.8% and 74.1% (+dCRF) which far exceeds the basic network.
Example 2
The embodiment applies the pseudo tag generation method for neural network training proposed in embodiment 1 to perform specific implementation verification.
The experimental verification is performed by using a paspal VOC 2012 dataset and an MS COCO 2014 dataset, wherein the paspal VOC 2012 dataset comprises 21 types of images including a background, and the MS COCO 2014 dataset comprises 81 types of images including the background. Specifically, for the PASCAL VOC 2012 dataset, the present embodiment uses 10582 images as the training set, 1464 images as the validation set, and 1456 images as the test set. For the MS COCO 2014 dataset, this example excludes images without categories, using 82081 images for training and 40137 images for verification. In the pseudo tag generation stage in the experiment, the input image sizes from the above two data sets were each set to 512×512×3.
Illustratively, in class activation graph generation, for a-CAMs, the results after 5 iterations of Batch unit Batch size=16 tend to be substantially smooth/converging, and for S-CAMs, the results after 10 iterations of Batch unit Batch size=16 tend to be substantially smooth/converging. During training of the boundary detection network, the results after 3 iterations tend to be substantially smooth/converging for Batch unit Batch size=16. In the process of generating the semantic segmentation pseudo labels, after boundary detection network training is completed, inputting pictures of each training set into a network to generate a boundary graph of the pictures, taking boundaries output by the network as constraints, and carrying out label propagation on class activation graphs by using a Random Walk algorithm, so that the semantic segmentation pseudo labels are generated. The supervised training network model adopts deep Lab-v1 and deep Lab-v2 which take ResNet-101 or VGG-16 as backbone networks.
Table 1 pseudo tag experimental results generated on the paspal VOC 2012 training set
Table 2 pseudo tag experimental results generated by different classes of activation maps on the paspal VOC 2012 training set
Table 3 pseudo tag generation method based on a-CAMs strategy in the case of the PASCAL VOC 2012False label ablation experimental result (mIoU value) of value
Table 4 pseudo tag generation method based on S-CAMs strategy in the case of the PASCAL VOC 2012False label ablation experimental result (mIoU value) of value
As shown in table 1, the accuracy of the semantic segmentation pseudo tags generated on the PASCAL VOC 2012 training set was improved by 0.7% when the boundary pseudo tags were synthesized using the method of the present invention relative to the reference model. When the boundary detection training strategy of significance guidance provided by the invention is adopted, the precision of the generated semantic segmentation pseudo tag is improved by 0.4%. When the two methods are combined at the same time, the precision of the generated semantic segmentation pseudo tag reaches 70.4 percent and is 3.2 percent higher than that of the reference model. As shown in table 2, when the generation of the class activation map was also guided using the saliency information, the semantic segmentation pseudo tag accuracy generated on the PASCAL VOC 2012 training set reached 74.1%, well above 67.2% of the benchmark model.
Tables 3 and 4 showThe influence of different values of (2) on the performance of the model, it can be seen that when +.>When 0 is taken, that is, when boundaries synthesized through saliency are not screened, the boundaries may contain irrelevant information or noise, so that error boundary labels are introduced, accurate recognition of the boundaries in the image by the model is affected, and poor performance of the model is caused. After screening the border, i.e. +.>At values of 0.25 to 1.5, the performance of the model is very stable.
Table 5 full-supervised semantic segmentation experimental results (mIoU values) on the PASCAL VOC 2012 validation set and test set
Table 6 full supervised learning semantic segmentation mlou, average accuracy, and pixel accuracy results on the PASCAL VOC 2012 validation set
TABLE 7 full-supervised semantic segmentation mIoU experiment results on MS COCO 2014 validation set
In order to further study the performance of the generated semantic segmentation pseudo tag, the embodiment generates the semantic segmentation pseudo tag for the PASCAL VOC 2012 dataset and the MS COCO 2014 dataset, replaces the real tag to perform full-supervision learning training for the deep Lab-v1 and the deep Lab-v2, selects two backbone networks, namely ResNet-101 and VGG-16, for proving generalization of the invention, the semantic segmentation results are shown in table 5 and table 7, when the same class as the reference network is adopted to activate the image A-CAMs, whether ResNet-101 is selected as the backbone network or VGG-16 is selected as the backbone network, the algorithm provided by the invention far exceeds the reference model, wherein the algorithm reaches 67.4% on the PASCAL 2012 validation set based on the deep Lab-v1 of VGG-16, exceeds the reference network 7.3%, reaches 68.1% on the test set and exceeds the reference network 7.0%; deepLab-v2 based on ResNet-101, the algorithm reaches 71.0% over the benchmark network 5.3% on the PASCAL VOC 2012 validation set, and reaches 70.9% over the benchmark network 4.3% on the test set; when the class activation diagram is S-CAMs, the semantic segmentation result is further improved. On the MS COCO 2014 dataset, deepLab-v2 based on ResNet-101 and VGG-16 as backbones reached 37.7% and 39.1 mIoU on the validation set, respectively. Table 6 shows the mIoU (average cross-correlation ratio), average accuracy and pixel accuracy results on the verification set after the semantic segmentation pseudo tag generated by the invention performs full-supervised learning training on deep Lab-v1 and deep Lab-v2.
As shown in fig. 8, (a) represents an input image i, (b) represents a true label value map corresponding to the input image i, (c) represents a boundary pseudo label map synthesized by a-CAMs, (d) represents a boundary pseudo label map synthesized by saliency guidance, (e) represents a boundary pseudo label map synthesized by a-CAMs and saliency guidance, and (f) represents a boundary map output from a boundary detection network, and (g) represents a semantic division pseudo label map corresponding to the input image i.
It can be observed from fig. 8 that the boundary pseudo tag synthesized using only class activation maps contains fewer boundary pixels, and it is difficult to provide reliable supervision for the boundary detection network. While the saliency map provides rich boundary information. When the two are combined, a very reliable supervision can be provided for the training of the boundary detection network, as shown in part (f) of fig. 8. After training, the network can correctly identify a large amount of complete boundary information, and reliable constraint is provided for the generation of the semantic segmentation pseudo labels in the next step.
As shown in fig. 9, the result comparison graph of the semantic segmentation pseudo tag generated on the PASCAL VOC 2012 dataset according to the present embodiment is shown, where (a) part of fig. 9 represents the input image ii, (b) part represents the real semantic segmentation tag graph corresponding to the input image ii, and (c) part represents the semantic segmentation pseudo tag graph corresponding to the input image ii. In conclusion, the invention can use less manpower resources and time to complete large-scale semantic segmentation tasks. The experimental result also shows that the semantic segmentation model finally obtained achieves higher accuracy by training with the pseudo tag.
Example 3
The embodiment provides a pseudo tag generation system for training a neural network, and the pseudo tag generation method for training the neural network provided in the embodiment 1 is applied.
Fig. 10 is a hardware architecture diagram of a pseudo tag generation system for neural network training according to an embodiment of the present application.
As shown in fig. 10, the pseudo tag generation system includes: the system comprises a class activation graph acquisition module 100, a saliency graph acquisition module 200, a boundary pseudo tag generation module 300, a boundary detection module 400 and a semantic segmentation pseudo tag generation module 500.
The class activation diagram acquisition module 100 is configured to transmit an input image and an image level label corresponding to the input image to a classification backbone network based on a residual structure for attention pooling to obtain a class activation diagram; the saliency map obtaining module 200 is configured to transmit the input image to a salient object detection network for area detection, so as to obtain a saliency map; the boundary pseudo tag generation module 300 is configured to fuse each region feature of the class activation graph and the saliency graph, and synthesize a boundary pseudo tag of the input image; the boundary detection module 400 is configured to monitor training of a boundary detection network by using the boundary pseudo tag, transmit the input image to the trained boundary detection network for boundary detection, and extract a boundary of the input image; the semantic segmentation pseudo tag generation module 500 is configured to guide the class activation graph to perform refinement propagation by using the boundary, and generate a semantic segmentation pseudo tag corresponding to the input image.
It should be noted that the foregoing explanation of the embodiment of the pseudo tag generation method for neural network training is also applicable to the pseudo tag generation system for neural network training of this embodiment, and will not be repeated here.
It can be understood that the pseudo tag generation system for neural network training provided by the embodiment of the application generates boundary pseudo tags by using a mode of combining a class activation diagram and a saliency diagram, the class activation diagram provides a foreground region with high confidence and related to a class, meanwhile, a small number of boundary tags can be synthesized by using the class activation diagram, the saliency diagram provides a background region with high confidence, and meanwhile, a large number of boundaries between the foreground and the background can be provided, so that the quality and coverage rate of the boundary pseudo tags are improved. In addition, the boundary detection network is supervised and trained by using the boundary pseudo tag, so that the generalization capability and the robustness of the boundary detection network can be enhanced, the boundary detection network can adapt to images of different scenes and environments, more complete object boundaries can be detected, and the accuracy and the integrity of boundary detection are improved. Finally, the boundary of the input image and the class activation diagram are utilized to carry out refinement and propagation treatment, the class activation diagram can be corrected and optimized according to the boundary information, the cost and time for manually labeling the pixel-level label are greatly reduced, and the precision and the generation efficiency of the semantic segmentation pseudo label are improved. Finally, the saliency map is used for guiding the generation of the class activation map, so that the class activation map can better indicate the object region, more reliable information is provided for the synthesis and propagation of the boundary pseudo labels, and the precision and the generation efficiency of the semantic segmentation pseudo labels are further improved.
It is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (7)

1. A method for generating a pseudo tag for neural network training, comprising the steps of:
s1: transmitting an input image and an image level label corresponding to the input image to a classification backbone network based on a residual structure for attention pooling to obtain a class activation diagram, and transmitting the input image to a salient object detection network for region detection to obtain a salient diagram;
s2: fusing the region features of the class activation diagram and the saliency diagram to synthesize a boundary pseudo tag of the input image;
wherein boundary pseudo tags based on class activation graphs are utilized according to the following formulaAnd boundary pseudo tag based on saliency map +.>Generating boundary pseudo tag of input image +.>
Wherein,representing pixelsiBelonging to boundary (I)>Representing pixelsiBelonging to the object area->Representing pixelsiBelongs to a background area;
generating boundary pseudo tags for the class-based activation graphThe specific steps of (a) include:
traversing all pixels in the class activation graph using a sliding window of specified size, and determining each pixel in the class activation graph according to the following formulaiGenerates boundary pseudo tags based on class activation graphs
Wherein,representing pixelsiAt the border in class activation diagram, +.>And->The number of object region pixels and background region pixels in the sliding window are represented respectively,Wpixel set representing a sliding window, +.>Representing pixelsiIn class activation diagram in object region, +.>Representing pixels in class activation diagramsiBelongs to the category ofcProbability of->Pixel threshold for the object region, +.>Representing pixelsiIn the class activation diagram in an uncertainty region;
generating the saliency map-based boundary pseudo tagThe specific steps of (a) include:
performing edge detection on the saliency map by using a Sobel operator to generate a Sobel gradient amplitude;
according to each pixel in the saliency mapiCorresponding Sobel gradient amplitudeAnd significance amplitude->For each pixel in the saliency map, using the following formulaiAssigning boundary pseudo tags based on saliency maps +.>
Wherein,representing pixels in a saliency mapiSobel gradient amplitude, +.>A preset threshold value for the gradient amplitude; />Representing pixels in a saliency mapiIs a significant magnitude of (1); />Representing pixelsiBelonging to a boundary in the saliency map; />Representing pixelsiBelonging to the object region in the saliency map; />Representing pixelsiBelonging to the background area in the saliency map;
s3: monitoring the training of a boundary detection network by using the boundary pseudo tag, transmitting the input image to the trained boundary detection network for boundary detection, and extracting the boundary of the input image;
s4: and guiding the class activation graph to conduct refinement propagation by utilizing the boundary, and generating a semantic segmentation pseudo tag corresponding to the input image.
2. The method for pseudo tag generation for neural network training of claim 1,
performing edge detection on the saliency map by using a Sobel operator to generate a Sobel gradient amplitude, wherein the method specifically comprises the following steps:
respectively calculating the gradient of the saliency map in the horizontal direction by using the Sobel operator according to the following methodAnd a gradient in the vertical direction
Wherein,and->Sobel gradient operator in horizontal direction and vertical direction respectively,>is a saliency map;
according to the following, the horizontal gradient is utilizedAnd vertical gradient>Calculating Sobel gradient amplitude of saliency map
3. The method for pseudo tag generation for neural network training of claim 1,
s3, the training of the boundary detection network is supervised by using the boundary pseudo tag, which specifically comprises the following steps: using loss functionsTraining the boundary detection network, the loss function +.>The calculation is carried out by the following formula:
wherein,loss function representing foreground pixels and background pixels,/->Loss function representing boundary pixels,/->And->Respectively representing a set of foreground pixels and background pixels in the boundary pseudo tag, +.>Pixels representing the output of the boundary detection networkiProbability values belonging to the boundary; />And->Boundary pixels belonging to the salient region and boundary pixels belonging to the non-salient region in the boundary pseudo tag are represented, respectively.
4. The pseudo tag generation method for neural network training according to claim 1, wherein in S1, the specific step of transmitting the input image and the image level tag corresponding thereto to the classification backbone network based on the residual structure for attention pooling, to obtain the class activation map comprises:
transmitting the input image to a classification backbone network stacked in a residual structure for classification processing, and outputting a positioning map;
using the image level label to perform full-supervision training on the positioning map to generate a first type of activation map;
and generating a second type of activation graph by using the image level label and the full-supervised training of the saliency map to the positioning graph.
5. The method for pseudo tag generation for neural network training of claim 4,
full supervised training of the localization map using the image level labels, a loss function is employedThe expression of (2) is as follows:
wherein,cfor the number of categories in the data set,representing sigmoid function->Pixels in class activation diagrams representing classification network outputiBelongs to the category ofcProbability values of (2);
full supervision of localization maps using the image level labels and the saliency mapsDuring governor training, the loss function adoptedThe expression of (2) is as follows:
6. the method for generating pseudo labels for training a neural network according to any one of claims 1 to 5, wherein in S4, the boundary is used to guide the class activation graph to perform refinement propagation, and the specific step of generating semantic segmentation pseudo labels corresponding to the input image includes:
s4.1: converting the boundary into a semantic affinity matrix;
s4.2: and according to the semantic affinity matrix, performing refinement propagation processing on the class activation graph by using a Random Walk algorithm to generate a semantic segmentation pseudo tag.
7. A pseudo tag generation system for neural network training, applied to the pseudo tag generation method for neural network training of any one of claims 1 to 6, comprising:
the class activation diagram acquisition module is used for transmitting the input image and the image level label corresponding to the input image to the classification backbone network based on the residual structure for attention pooling to obtain a class activation diagram;
the saliency map acquisition module is used for transmitting the input image to a saliency object detection network to perform region detection so as to obtain a saliency map;
the boundary pseudo tag generation module is used for fusing the regional characteristics of the class activation graph and the saliency graph and synthesizing boundary pseudo tags of the input image;
the boundary detection module is used for supervising the training of the boundary detection network by utilizing the boundary pseudo tag, transmitting the input image to the trained boundary detection network for boundary detection, and extracting the boundary of the input image;
and the semantic segmentation pseudo tag generation module is used for guiding the class activation graph to conduct refinement propagation by utilizing the boundary to generate a semantic segmentation pseudo tag corresponding to the input image.
CN202311331979.6A 2023-10-16 2023-10-16 Pseudo tag generation method and system for neural network training Active CN117079103B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311331979.6A CN117079103B (en) 2023-10-16 2023-10-16 Pseudo tag generation method and system for neural network training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311331979.6A CN117079103B (en) 2023-10-16 2023-10-16 Pseudo tag generation method and system for neural network training

Publications (2)

Publication Number Publication Date
CN117079103A CN117079103A (en) 2023-11-17
CN117079103B true CN117079103B (en) 2024-01-02

Family

ID=88713751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311331979.6A Active CN117079103B (en) 2023-10-16 2023-10-16 Pseudo tag generation method and system for neural network training

Country Status (1)

Country Link
CN (1) CN117079103B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3171297A1 (en) * 2015-11-18 2017-05-24 CentraleSupélec Joint boundary detection image segmentation and object recognition using deep learning
CN108399406A (en) * 2018-01-15 2018-08-14 中山大学 The method and system of Weakly supervised conspicuousness object detection based on deep learning
CN111832573A (en) * 2020-06-12 2020-10-27 桂林电子科技大学 Image emotion classification method based on class activation mapping and visual saliency
CN113436204A (en) * 2021-06-10 2021-09-24 中国地质大学(武汉) High-resolution remote sensing image weak supervision building extraction method
CN115393598A (en) * 2022-10-31 2022-11-25 南京理工大学 Weakly supervised semantic segmentation method based on non-salient region object mining
CN115512169A (en) * 2022-11-09 2022-12-23 之江实验室 Weak supervision semantic segmentation method and device based on gradient and region affinity optimization
CN115546466A (en) * 2022-09-28 2022-12-30 北京工业大学 Weak supervision image target positioning method based on multi-scale significant feature fusion
CN115546490A (en) * 2022-11-23 2022-12-30 南京理工大学 Weak supervision semantic segmentation method based on significance guidance
WO2023077816A1 (en) * 2021-11-03 2023-05-11 中国华能集团清洁能源技术研究院有限公司 Boundary-optimized remote sensing image semantic segmentation method and apparatus, and device and medium
CN116681903A (en) * 2023-06-06 2023-09-01 大连民族大学 Weak supervision significance target detection method based on complementary fusion pseudo tag

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10410353B2 (en) * 2017-05-18 2019-09-10 Mitsubishi Electric Research Laboratories, Inc. Multi-label semantic boundary detection system

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3171297A1 (en) * 2015-11-18 2017-05-24 CentraleSupélec Joint boundary detection image segmentation and object recognition using deep learning
CN108399406A (en) * 2018-01-15 2018-08-14 中山大学 The method and system of Weakly supervised conspicuousness object detection based on deep learning
CN111832573A (en) * 2020-06-12 2020-10-27 桂林电子科技大学 Image emotion classification method based on class activation mapping and visual saliency
CN113436204A (en) * 2021-06-10 2021-09-24 中国地质大学(武汉) High-resolution remote sensing image weak supervision building extraction method
WO2023077816A1 (en) * 2021-11-03 2023-05-11 中国华能集团清洁能源技术研究院有限公司 Boundary-optimized remote sensing image semantic segmentation method and apparatus, and device and medium
CN115546466A (en) * 2022-09-28 2022-12-30 北京工业大学 Weak supervision image target positioning method based on multi-scale significant feature fusion
CN115393598A (en) * 2022-10-31 2022-11-25 南京理工大学 Weakly supervised semantic segmentation method based on non-salient region object mining
CN115512169A (en) * 2022-11-09 2022-12-23 之江实验室 Weak supervision semantic segmentation method and device based on gradient and region affinity optimization
CN115546490A (en) * 2022-11-23 2022-12-30 南京理工大学 Weak supervision semantic segmentation method based on significance guidance
CN116681903A (en) * 2023-06-06 2023-09-01 大连民族大学 Weak supervision significance target detection method based on complementary fusion pseudo tag

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于显著性背景引导的弱监督语义分割网络;白雪飞等;模式识别与人工智能;第第34卷卷(第第9期期);第824-834页 *

Also Published As

Publication number Publication date
CN117079103A (en) 2023-11-17

Similar Documents

Publication Publication Date Title
US10354392B2 (en) Image guided video semantic object segmentation method and apparatus
CN110276264B (en) Crowd density estimation method based on foreground segmentation graph
CN111985376A (en) Remote sensing image ship contour extraction method based on deep learning
CN113989604B (en) Tire DOT information identification method based on end-to-end deep learning
CN111553414A (en) In-vehicle lost object detection method based on improved Faster R-CNN
CN112488229A (en) Domain self-adaptive unsupervised target detection method based on feature separation and alignment
CN111461121A (en) Electric meter number identification method based on YO L OV3 network
CN113705579A (en) Automatic image annotation method driven by visual saliency
CN117011381A (en) Real-time surgical instrument pose estimation method and system based on deep learning and stereoscopic vision
CN117437647B (en) Oracle character detection method based on deep learning and computer vision
CN117829243A (en) Model training method, target detection device, electronic equipment and medium
CN117079103B (en) Pseudo tag generation method and system for neural network training
CN110889418A (en) Gas contour identification method
CN116681961A (en) Weak supervision target detection method based on semi-supervision method and noise processing
CN114943869B (en) Airport target detection method with enhanced style migration
CN116310293A (en) Method for detecting target of generating high-quality candidate frame based on weak supervised learning
CN113920254B (en) Monocular RGB (Red Green blue) -based indoor three-dimensional reconstruction method and system thereof
CN114067359B (en) Pedestrian detection method integrating human body key points and visible part attention characteristics
CN112069997B (en) Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net
CN112287906B (en) Template matching tracking method and system based on depth feature fusion
CN114943741A (en) Visual SLAM method based on target detection and geometric probability in dynamic scene
Huang et al. A Stepwise Refining Image-Level Weakly Supervised Semantic Segmentation Method for Detecting Exposed Surface for Buildings (ESB) From Very High-Resolution Remote Sensing Images
CN114639013A (en) Remote sensing image airplane target detection and identification method based on improved Orient RCNN model
CN114445649A (en) Method for detecting RGB-D single image shadow by multi-scale super-pixel fusion
CN113673534A (en) RGB-D image fruit detection method based on fast RCNN

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant