CN117079103B - Pseudo label generation method and system for neural network training


Info

Publication number
CN117079103B
Authority
CN
China
Prior art keywords
boundary
pseudo
input image
pixels
class activation
Prior art date
Legal status
Active
Application number
CN202311331979.6A
Other languages
Chinese (zh)
Other versions
CN117079103A (en)
Inventor
石敏
邓伟钊
骆爱文
易清明
Current Assignee
Jinan University
Original Assignee
Jinan University
Priority date
Filing date
Publication date
Application filed by Jinan University
Priority to CN202311331979.6A
Publication of CN117079103A
Application granted
Publication of CN117079103B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of deep learning and provides a pseudo label generation method and system for neural network training. The method comprises the following steps: feeding an input image and its corresponding image-level label into a classification backbone network based on a residual structure, where attention pooling yields a class activation map, and feeding the input image into a salient object detection network for region detection to obtain a saliency map; fusing the region features of the class activation map and the saliency map to synthesize a boundary pseudo label for the input image; supervising the training of a boundary detection network with the boundary pseudo label, then passing the input image through the trained network to detect and extract its boundary; and using the boundary to guide refinement propagation of the class activation map, generating the semantic segmentation pseudo label corresponding to the input image. The method greatly reduces the cost and time of manually annotating pixel-level labels and improves both the precision and the generation efficiency of semantic segmentation pseudo labels.

Description

Pseudo label generation method and system for neural network training
Technical Field
The invention relates to the technical field of deep learning, and in particular to a pseudo label generation method and system for neural network training.
Background
Image semantic segmentation is a computer vision technique that assigns each pixel in an image to a predefined class, enabling pixel-level image understanding and analysis. Conventional semantic segmentation methods capture visual features of an image, such as color, texture and edges, using various feature extraction algorithms, and then segment the image with machine learning algorithms such as support vector machines (SVM), random forests and conditional random fields (CRF). In recent years, semantic segmentation methods based on deep convolutional neural networks have made breakthrough progress with the support of large-scale datasets and high-performance computing, and have become the mainstream semantic segmentation technology.
Fully supervised semantic segmentation methods can classify at the pixel level, but manually annotating pixel-level labels is time-consuming and labor-intensive, making them unsuitable for large-scale datasets or applications that update in real time; moreover, the resulting models generalize poorly and must be retrained continually. To save labor and time and to reduce the dependence on pixel-level labels, researchers have proposed semantic segmentation methods supervised by weak labels; commonly used weak labels include image-level labels, bounding-box labels, scribble labels and point labels. Image-level labels only indicate which target categories are present in an image and carry no position information; they are the easiest and cheapest weak labels to annotate, but they are also the hardest to apply to pixel-level semantic segmentation.
In addition, the paper "Weakly supervised semantic segmentation with boundary exploration" proposes generating boundary pseudo labels with a boundary detection technique, training a boundary detection network to explore object boundaries, and using them to refine class activation maps. However, because the prior boundary information is insufficient, the boundary detection network struggles to detect complete object boundaries, which introduces uncertainty into the refinement of the class activation maps; the resulting semantic segmentation pseudo labels are therefore generated with low efficiency and low precision.
Disclosure of Invention
To overcome the defects of low efficiency and low precision of semantic segmentation pseudo labels in the prior art, the invention provides the following technical solution:
In a first aspect, the present invention provides a pseudo label generation method for neural network training, comprising the following steps:
S1: Feed the input image and its corresponding image-level label into a classification backbone network based on a residual structure and apply attention pooling to obtain a class activation map; feed the input image into a salient object detection network for region detection to obtain a saliency map.
S2: Fuse the region features of the class activation map and the saliency map and synthesize a boundary pseudo label of the input image.
S3: Supervise the training of a boundary detection network with the boundary pseudo label, feed the input image into the trained boundary detection network for boundary detection, and extract the boundary of the input image.
S4: Use the boundary to guide refinement propagation of the class activation map and generate the semantic segmentation pseudo label corresponding to the input image. An end-to-end sketch of these four steps follows.
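Read end to end, the four steps form a pipeline. The following is a minimal sketch of that flow with each stage injected as a function; all parameter names are illustrative and none come from the patent itself:

```python
from typing import Callable
import numpy as np

def generate_pseudo_label(
    image: np.ndarray,
    cam_fn: Callable[[np.ndarray], np.ndarray],        # S1: class activation map, H x W x C
    saliency_fn: Callable[[np.ndarray], np.ndarray],   # S1: saliency map, H x W
    fuse_fn: Callable[[np.ndarray, np.ndarray], np.ndarray],  # S2: boundary pseudo label
    train_boundary_fn: Callable[..., Callable[[np.ndarray], np.ndarray]],  # S3: returns trained net
    propagate_fn: Callable[[np.ndarray, np.ndarray], np.ndarray],  # S4: boundary-guided refinement
) -> np.ndarray:
    cam = cam_fn(image)                                       # S1
    sal = saliency_fn(image)                                  # S1
    boundary_label = fuse_fn(cam, sal)                        # S2
    boundary_net = train_boundary_fn(image, boundary_label)   # S3: supervised by the pseudo label
    boundary = boundary_net(image)                            # S3: extract the image boundary
    return propagate_fn(cam, boundary)                        # S4: semantic segmentation pseudo label
```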
In a second aspect, the present invention further proposes a pseudo tag generation system for neural network training, which is applied to the pseudo tag generation method for neural network training according to the first aspect, and includes:
and the class activation diagram acquisition module is used for transmitting the input image and the image level label corresponding to the input image to the classification backbone network based on the residual structure for attention pooling to obtain the class activation diagram.
The saliency map acquisition module is used for transmitting the input image to a saliency object detection network for region detection to obtain a saliency map.
And the boundary pseudo tag generation module is used for fusing the regional characteristics of the class activation graph and the saliency graph and synthesizing the boundary pseudo tag of the input image.
And the boundary detection module is used for supervising the training of the boundary detection network by utilizing the boundary pseudo tag, transmitting the input image to the trained boundary detection network for boundary detection, and extracting the boundary of the input image.
And the semantic segmentation pseudo tag generation module is used for guiding the class activation graph to conduct refinement propagation by utilizing the boundary to generate a semantic segmentation pseudo tag corresponding to the input image.
Compared with the prior art, the technical solution of the invention has the following beneficial effects:
(1) Boundary pseudo labels are generated by combining class activation maps and saliency maps. The class activation map provides high-confidence, class-related foreground regions, from which a small number of boundary labels can be synthesized; the saliency map provides high-confidence background regions and a large number of boundaries between foreground and background. Combining them improves the quality and coverage of the boundary pseudo labels.
(2) Supervising the boundary detection network with these boundary pseudo labels strengthens its generalization and robustness, allowing it to adapt to images from different scenes and environments and to detect more complete object boundaries, which improves the accuracy and completeness of boundary detection. Refinement propagation with the boundary of the input image then corrects and optimizes the class activation map according to the boundary information, greatly reducing the cost and time of manual pixel-level annotation and improving the precision and generation efficiency of the semantic segmentation pseudo labels.
(3) Using the saliency map to guide the generation of the class activation map makes the class activation map indicate object regions more accurately, providing more reliable information for the synthesis and propagation of boundary pseudo labels and further improving the precision and generation efficiency of the semantic segmentation pseudo labels.
Drawings
Fig. 1 is a schematic diagram I of the pseudo label generation network in Embodiment 1.
Fig. 2 is a schematic diagram II of the pseudo label generation network in Embodiment 1.
Fig. 3 is a schematic diagram III of the pseudo label generation network in Embodiment 1.
Fig. 4 is a schematic diagram of the pseudo label generation network in Embodiment 1 obtained by optimally combining the networks of figs. 1, 2 and 3.
Fig. 5 is a flow diagram of generating boundary pseudo labels under saliency guidance in the pseudo label generation network structure of fig. 2.
Fig. 6 is a flowchart of the pseudo label generation network of fig. 4 in Embodiment 1.
Fig. 7 is a flowchart of the pseudo label generation method for neural network training according to an embodiment of the present application.
In fig. 8, part (a) shows the input image I in Embodiment 2, (b) the ground-truth label map corresponding to input image I, (c) the boundary pseudo label map synthesized from A-CAMs, (d) the boundary pseudo label map synthesized under saliency guidance, (e) the boundary pseudo label map synthesized from A-CAMs together with saliency guidance, (f) the boundary map output by the boundary detection network, and (g) the semantic segmentation pseudo label map generated for input image I.
In fig. 9, (a) shows the input image II in Embodiment 2, (b) the ground-truth semantic segmentation label map corresponding to input image II, and (c) the semantic segmentation pseudo label map corresponding to input image II.
Fig. 10 is a hardware architecture diagram of the pseudo label generation system for neural network training in Embodiment 3.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present application; for clarity of illustration, some parts of the drawings may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
Example 1
This embodiment provides a pseudo label generation method for neural network training.
As shown in figs. 1 to 5, fig. 1, fig. 2 and fig. 3 are schematic structural diagrams of pseudo label generation networks with three different structures, and fig. 4 is a schematic structural diagram of the pseudo label generation network obtained by optimally combining the networks of figs. 1, 2 and 3 in an embodiment of the present application.
In fig. 1, a classification network is trained with image-level labels to generate a first kind of class activation map that uses only image-level labels as supervisory signals (hereinafter A-CAMs). Boundary pseudo labels are synthesized from the A-CAMs and used to supervise the training of a boundary detection network; the boundary output by that network then serves as a constraint for propagating the A-CAMs to obtain semantic segmentation pseudo labels.
In fig. 2, the classification network is again trained with image-level labels to generate A-CAMs. The A-CAMs are combined with the saliency map to synthesize boundary pseudo labels, which supervise the training of the boundary detection network; the boundary output by the network serves as a constraint for propagating the A-CAMs into semantic segmentation pseudo labels. The reliable background-region information provided by the saliency map, together with the boundary between foreground and background, is fused into the boundary pseudo label, providing more reliable supervision for the boundary detection network. Fig. 5 shows the flow of generating boundary pseudo labels under saliency guidance in the network structure of fig. 2; the generated boundary pseudo label carries four kinds of information: foreground, background, boundary and uncertain regions. The synthesis proceeds in two parts: because the background information in the class activation map is less reliable than that in the saliency map, while the saliency map cannot indicate the class information of the foreground, the boundary pseudo label keeps the foreground information from the class activation map, the background information from the saliency map, and the boundary information from both. After the saliency map is incorporated, the synthesized boundary pseudo label is visibly more accurate.
In fig. 3, the classification network is trained jointly with the image-level labels and the saliency map, generating a second kind of class activation map that uses both as supervisory signals (hereinafter S-CAMs). Boundary pseudo labels synthesized from the S-CAMs supervise the training of the boundary detection network, and the boundary output by the network serves as a constraint for propagating the S-CAMs to obtain semantic segmentation pseudo labels.
As shown in figs. 6 and 7, fig. 6 is a flowchart of generating semantic segmentation pseudo labels with the pseudo label generation network of fig. 4, and fig. 7 is a flowchart of the pseudo label generation method for neural network training according to an embodiment of the present application.
In this embodiment, the pseudo label generation method for neural network training comprises the following steps:
S1: Feed the input image and its corresponding image-level label into a classification backbone network based on a residual structure and apply attention pooling to obtain a class activation map; feed the input image into a salient object detection network for region detection to obtain a saliency map.
Optionally, in an embodiment of the present application, the input image is fed into a classification backbone network stacked with residual structures for classification, and a localization map is output.
The localization map is fully supervised with the image-level labels to generate the first class activation maps, employing a multi-label classification loss \(\mathcal{L}_A\):

\(\mathcal{L}_A = -\frac{1}{c}\sum_{k=1}^{c}\left[ y^{k}\log\sigma\left(P^{k}\right) + \left(1-y^{k}\right)\log\left(1-\sigma\left(P^{k}\right)\right) \right]\)

where \(c\) is the number of categories in the dataset, \(\sigma(\cdot)\) denotes the sigmoid function, \(y^{k}\) indicates whether category \(k\) appears in the image-level label, and \(P^{k}\) is the class score pooled from the probabilities \(p_{i}^{k}\) that pixel \(i\) in the class activation map output by the classification network belongs to category \(k\).

When the localization map is fully supervised jointly with the image-level labels and the saliency map, a corresponding loss function \(\mathcal{L}_S\) is employed.
in this embodiment, the classification backbone network based on the residual structure is a classification backbone network based on a Resnet 50.
In a concrete implementation, a batch of color images is input and first preprocessed, padding or cropping each to a uniform size of 512×512×3, so the first-layer input in fig. 6 has 3 channels. The last layer of the ResNet-50 classification backbone is replaced with an attention pooling layer (whose lower bound is a global average pooling layer). After a color image passes through the classification network, a class activation map of size 512×512×c is output, where c is the number of categories in the dataset. When only the image-level label serves as the supervisory signal, the generated first class activation maps are called A-CAMs; when the image-level labels and the saliency map are combined as supervisory signals, the generated second class activation maps are called S-CAMs.
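A minimal sketch of this classifier follows. The exact attention-pooling operator is not disclosed, so a softmax-weighted spatial pooling stands in for it (its uniform-weight limit is global average pooling, matching the stated lower bound), and the image-level supervision uses PyTorch's standard multi-label loss; both choices are assumptions, as is the omission of upsampling the CAMs back to the 512×512 input resolution:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CAMClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        backbone = resnet50(weights=None)
        # keep the convolutional stages, drop the final average pooling and FC layer
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.score = nn.Conv2d(2048, num_classes, kernel_size=1)  # per-class activation maps

    def forward(self, x: torch.Tensor):
        f = self.features(x)                      # B x 2048 x h x w
        cams = self.score(f)                      # B x C x h x w, the class activation maps
        # attention pooling: each class map is pooled with its own spatial softmax;
        # with uniform weights this reduces to global average pooling
        attn = torch.softmax(cams.flatten(2), dim=-1)   # B x C x (h*w)
        logits = (cams.flatten(2) * attn).sum(dim=-1)   # B x C image-level class scores
        return cams, logits

model = CAMClassifier(num_classes=20)             # e.g. PASCAL VOC foreground classes
criterion = nn.MultiLabelSoftMarginLoss()         # standard multi-label classification loss
x = torch.randn(2, 3, 512, 512)                   # preprocessed 512 x 512 x 3 inputs
y = torch.randint(0, 2, (2, 20)).float()          # image-level labels
cams, logits = model(x)
loss = criterion(logits, y)
```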
S2: Fuse the region features of the class activation map and the saliency map and synthesize the boundary pseudo label of the input image.
Optionally, in an embodiment of the present application, the boundary pseudo label \(\hat{B}_i\) of the input image is generated from the class-activation-map-based boundary pseudo label \(B^{cam}_i\) and the saliency-map-based boundary pseudo label \(B^{sal}_i\) according to the following formula:

\(\hat{B}_{i} = \begin{cases} \text{boundary}, & \text{if } B^{cam}_{i} = \text{boundary} \text{ or } B^{sal}_{i} = \text{boundary}\\ \text{object}, & \text{if } B^{cam}_{i} = \text{object}\\ \text{background}, & \text{if } B^{sal}_{i} = \text{background}\\ \text{uncertain}, & \text{otherwise} \end{cases}\)

where \(\hat{B}_{i} = \text{boundary}\) indicates that pixel \(i\) belongs to the boundary, \(\hat{B}_{i} = \text{object}\) that pixel \(i\) belongs to the object region, and \(\hat{B}_{i} = \text{background}\) that pixel \(i\) belongs to the background region.
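A minimal sketch of this fusion rule as read above; the numeric label coding is ours, not the patent's:

```python
import numpy as np

BOUNDARY, OBJECT, BACKGROUND, UNCERTAIN = 1, 2, 0, 255  # illustrative label coding

def fuse_boundary_labels(b_cam: np.ndarray, b_sal: np.ndarray) -> np.ndarray:
    """Fuse the CAM-based and saliency-based boundary pseudo labels: foreground
    is kept from the class activation map, background from the saliency map,
    and boundary pixels from both; everything else stays uncertain."""
    fused = np.full(b_cam.shape, UNCERTAIN, dtype=np.uint8)
    fused[b_cam == OBJECT] = OBJECT            # reliable foreground from the CAM
    fused[b_sal == BACKGROUND] = BACKGROUND    # reliable background from saliency
    fused[(b_cam == BOUNDARY) | (b_sal == BOUNDARY)] = BOUNDARY
    return fused
```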
In this embodiment, the specific steps of generating the class-activation-map-based boundary pseudo label \(B^{cam}_i\) include:
traversing all pixels in the class activation map with a sliding window of a specified size, and determining for each pixel \(i\) in the class activation map a class-activation-map-based boundary pseudo label \(B^{cam}_{i}\) according to the following formula:

\(B^{cam}_{i} = \begin{cases} \text{boundary}, & \text{if } \min(N_{o}, N_{b}) \ge \lambda\,\lvert W\rvert \\ \text{object}, & \text{if } \max_{c} p_{i}^{c} \ge \theta_{o} \\ \text{uncertain}, & \text{otherwise} \end{cases}\)

where \(B^{cam}_{i} = \text{boundary}\) means pixel \(i\) lies on a boundary in the class activation map, \(N_{o}\) and \(N_{b}\) are the numbers of object-region and background-region pixels in the sliding window, \(W\) is the pixel set of the sliding window and \(\lambda\) a balance ratio, \(B^{cam}_{i} = \text{object}\) means pixel \(i\) lies in the object region of the class activation map, \(p_{i}^{c}\) is the probability that pixel \(i\) in the class activation map belongs to category \(c\), \(\theta_{o}\) is the pixel threshold for the object region, and \(B^{cam}_{i} = \text{uncertain}\) means pixel \(i\) lies in an uncertain region of the class activation map.
It will be appreciated that, since the class activation map indicates the object regions and background regions represented by a small number of image-level labels, and object boundaries lie between object regions and background regions, the object and background regions in the class activation map can be used to guide the synthesis of boundary pseudo labels. To this end, the invention adopts a sliding-window approach: the center pixel of the window is considered a boundary pixel if and only if the window contains similar numbers of object-region pixels and background-region pixels, as sketched below.
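A minimal sketch of this sliding-window synthesis; the window size, object threshold and the numeric form of the "similar number" criterion are illustrative values, none of which are disclosed in the patent:

```python
import numpy as np

OBJECT, BACKGROUND, BOUNDARY, UNCERTAIN = 2, 0, 1, 255  # same coding as the fusion sketch

def cam_boundary_label(cam: np.ndarray, theta_obj: float = 0.3,
                       win: int = 5, balance: float = 0.3) -> np.ndarray:
    """cam is an H x W x C class activation map with values assumed in [0, 1]."""
    fg_prob = cam.max(axis=-1)                  # strongest class response per pixel
    region = np.where(fg_prob >= theta_obj, OBJECT, BACKGROUND)  # coarse regions for counting
    label = np.where(region == OBJECT, OBJECT, UNCERTAIN).astype(np.uint8)
    r = win // 2
    h, w = region.shape
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = region[y - r:y + r + 1, x - r:x + r + 1]
            n_obj = int((window == OBJECT).sum())
            n_bg = window.size - n_obj
            # the centre pixel is a boundary when object and background pixels
            # occur in similar numbers inside the window
            if min(n_obj, n_bg) >= balance * window.size:
                label[y, x] = BOUNDARY
    return label
```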
In this embodiment, the specific steps of generating the saliency-map-based boundary pseudo label \(B^{sal}_i\) include:
performing edge detection on the saliency map with the Sobel operator to generate the Sobel gradient magnitude;

assigning to each pixel \(i\) in the saliency map a saliency-map-based boundary pseudo label \(B^{sal}_{i}\) according to its Sobel gradient magnitude \(G_{i}\) and saliency magnitude \(S_{i}\):

\(B^{sal}_{i} = \begin{cases} \text{boundary}, & \text{if } G_{i} \ge \theta_{g} \\ \text{object}, & \text{if } G_{i} < \theta_{g} \text{ and pixel } i \text{ is salient} \\ \text{background}, & \text{if } G_{i} < \theta_{g} \text{ and pixel } i \text{ is non-salient} \end{cases}\)

where \(G_{i}\) is the Sobel gradient magnitude of pixel \(i\) in the saliency map, \(\theta_{g}\) is a preset threshold on the gradient magnitude, \(S_{i}\) is the saliency magnitude of pixel \(i\) in the saliency map, \(B^{sal}_{i} = \text{boundary}\) means pixel \(i\) belongs to a boundary in the saliency map, \(B^{sal}_{i} = \text{object}\) means pixel \(i\) belongs to the object region in the saliency map, and \(B^{sal}_{i} = \text{background}\) means pixel \(i\) belongs to the background region in the saliency map.
In this embodiment, the horizontal gradient \(G_{x}\) and vertical gradient \(G_{y}\) of the saliency map are computed with the Sobel operator as follows:

\(G_{x} = K_{x} * S, \qquad G_{y} = K_{y} * S, \qquad K_{x} = \begin{bmatrix} -1 & 0 & 1\\ -2 & 0 & 2\\ -1 & 0 & 1 \end{bmatrix}, \quad K_{y} = \begin{bmatrix} -1 & -2 & -1\\ 0 & 0 & 0\\ 1 & 2 & 1 \end{bmatrix}\)

where \(K_{x}\) and \(K_{y}\) are the Sobel gradient operators in the horizontal and vertical directions, \(S\) is the saliency map, and \(*\) denotes convolution.

The Sobel gradient magnitude \(G\) of the saliency map is then computed from the horizontal and vertical gradients:

\(G = \sqrt{G_{x}^{2} + G_{y}^{2}}\)
It will be appreciated that correct boundary labels should contain the complete object boundary; however, only a small number of boundary pseudo labels, together with reliable object regions of the corresponding classes, can be synthesized from the class activation map. Empirically, salient regions in a saliency map generally correspond to object regions and non-salient regions to background, but the salient regions alone are not usable because they carry no explicit category information. Beyond this, the boundary between salient and non-salient regions can also be derived from the saliency map, and it coincides closely with the object boundary. This embodiment therefore uses the Sobel operator to extract the boundary between salient and non-salient regions, as sketched below.
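A minimal sketch of the saliency-side synthesis, assuming saliency values in [0, 1]; theta_g plays the role of the preset gradient-magnitude threshold, and the 0.5 saliency cut-off separating object from background is our assumption:

```python
import numpy as np
from scipy.ndimage import convolve

BOUNDARY, OBJECT, BACKGROUND = 1, 2, 0      # same coding as the fusion sketch

def saliency_boundary_label(sal: np.ndarray, theta_g: float = 0.25) -> np.ndarray:
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal Sobel
    ky = kx.T                                                          # vertical Sobel
    gx = convolve(sal, kx)                   # horizontal gradient G_x
    gy = convolve(sal, ky)                   # vertical gradient G_y
    g = np.hypot(gx, gy)                     # Sobel gradient magnitude G
    label = np.where(g >= theta_g, BOUNDARY,
                     np.where(sal >= 0.5, OBJECT, BACKGROUND))
    return label.astype(np.uint8)
```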
S3: Supervise the training of the boundary detection network with the boundary pseudo label, feed the input image into the trained boundary detection network for boundary detection, and extract the boundary of the input image.
Optionally, in an embodiment of the present application, supervising the training of the boundary detection network with the boundary pseudo label comprises training the boundary detection network with a loss function \(\mathcal{L}_{b}\) computed as

\(\mathcal{L}_{b} = \mathcal{L}_{fb} + \mathcal{L}_{bd}, \qquad \mathcal{L}_{fb} = -\frac{1}{\lvert F \cup B\rvert}\sum_{i \in F \cup B}\log\left(1-\sigma_{i}\right)\)

where \(\mathcal{L}_{fb}\) is the loss over foreground and background pixels, \(\mathcal{L}_{bd}\) is the loss over boundary pixels, \(F\) and \(B\) are the sets of foreground and background pixels in the boundary pseudo label, \(\sigma_{i}\) is the probability output by the boundary detection network that pixel \(i\) belongs to the boundary, and \(B_{s}\) and \(B_{ns}\) are the boundary pixels of the boundary pseudo label lying in the salient and non-salient regions, respectively, over which \(\mathcal{L}_{bd}\) is computed separately.
It will be appreciated that the synthesized boundary pixels do not represent exact object boundaries. The boundary term is therefore weighted by the prediction \(\sigma_{i}\) itself; this non-linearity makes the boundary detection network more sensitive to changes in its boundary predictions. When a pixel \(i\) is marked as a boundary pixel in the boundary pseudo label but its predicted boundary probability \(\sigma_{i}\) is low, the contribution weight of that pixel in the loss is reduced, limiting the negative influence of inaccurate boundary pseudo labels. Because the boundary extracted from the saliency map is not necessarily accurate, this embodiment combines two measures: first, multiplying the boundary term by \(\sigma_{i}\), so that pixels predicted as non-boundary but labeled as boundary contribute little to the loss; second, activating separately the boundary pixels of the pseudo label lying in the salient region and those lying in the non-salient region. Since the true object boundary may fall in either region, treating the two possibilities separately avoids the negative influence of inaccurate boundary pixels on network training.
In this embodiment, the boundary detection network is a classification network with ResNet-50 as its backbone. A batch of color images (batch size user-defined) is input and preprocessed, padding or cropping each to a uniform size of 512×512×3, so the first layer has 3 input channels. After a color image passes through the network, a boundary prediction map of size 128×128×1 is output, and the loss function is computed between this prediction map and the synthesized boundary pseudo label, as sketched below.
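A minimal sketch of this supervision follows. Since the exact boundary term is not reproduced above, the sketch implements one reading of the description: a \(\log(1-\sigma_i)\) penalty on foreground and background pixels plus a \(\sigma_i\)-weighted \(-\log\sigma_i\) term evaluated separately over the salient-region and non-salient-region boundary pixels; it assumes a non-empty foreground/background set:

```python
import torch

def boundary_loss(pred: torch.Tensor, fb_mask: torch.Tensor,
                  bs_mask: torch.Tensor, bns_mask: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """pred holds the per-pixel boundary probabilities sigma_i from the boundary
    detection network; fb_mask marks foreground/background pixels of the boundary
    pseudo label, bs_mask / bns_mask its boundary pixels inside and outside the
    salient region (all boolean masks of the same shape as pred)."""
    p = pred.clamp(eps, 1 - eps)
    # foreground and background pixels are pushed away from the boundary class
    l_fb = -torch.log(1 - p[fb_mask]).mean()

    def bd_term(mask: torch.Tensor) -> torch.Tensor:
        # sigma-weighted boundary term: pixels labelled boundary but confidently
        # predicted as non-boundary contribute little, softening label noise
        if not mask.any():
            return pred.new_zeros(())
        q = p[mask]
        return -(q * torch.log(q)).mean()

    # salient-region and non-salient-region boundary pixels activated separately
    return l_fb + bd_term(bs_mask) + bd_term(bns_mask)
```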
S4: Use the boundary to guide refinement propagation of the class activation map and generate the semantic segmentation pseudo label corresponding to the input image.
In this embodiment, the specific steps of S4 include:
S4.1: converting the boundary into a semantic affinity matrix;
S4.2: according to the semantic affinity matrix, performing refinement propagation on the class activation map with the random walk algorithm to generate the semantic segmentation pseudo label.
In a concrete implementation, the boundary output by the boundary detection network serves as a constraint: it is converted into a semantic affinity matrix, and the random walk algorithm propagates and refines the class activation map, as sketched below. In addition, applying a dense conditional random field (dCRF) during the propagation refinement further improves the quality of the semantic segmentation pseudo label.
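A minimal sketch of this propagation follows, under stated assumptions: the boundary map (values in [0, 1]) is turned into pairwise affinities on a 4-connected pixel grid, low across strong boundaries in the spirit of AffinityNet-style random walks, and the row-normalized transition matrix repeatedly propagates the class activations. The exponent, iteration count and affinity form are illustrative, and dCRF post-processing is omitted:

```python
import numpy as np
from scipy import sparse

def random_walk_refine(cam: np.ndarray, boundary: np.ndarray,
                       n_iters: int = 10, beta: float = 8.0) -> np.ndarray:
    """cam is H x W x C, boundary is H x W with boundary probabilities."""
    h, w, c = cam.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, cols, vals = [], [], []
    for dy, dx in ((0, 1), (1, 0)):                  # right and down neighbours
        src = idx[:h - dy, :w - dx].ravel()
        dst = idx[dy:, dx:].ravel()
        # semantic affinity: low where either endpoint looks like a boundary
        a = (1.0 - np.maximum(boundary[:h - dy, :w - dx].ravel(),
                              boundary[dy:, dx:].ravel())) ** beta
        rows.extend((src, dst)); cols.extend((dst, src)); vals.extend((a, a))
    A = sparse.csr_matrix((np.concatenate(vals),
                           (np.concatenate(rows), np.concatenate(cols))),
                          shape=(n, n))
    A = A + sparse.identity(n, format="csr")         # self-loops keep mass in place
    T = sparse.diags(1.0 / np.asarray(A.sum(axis=1)).ravel()) @ A  # row-normalize
    x = cam.reshape(n, c)
    for _ in range(n_iters):
        x = T @ x                                    # propagate activations across the grid
    return x.reshape(h, w, c)
```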
It can be understood that boundary pseudo labels are generated by combining class activation maps and saliency maps: the class activation map provides high-confidence, class-related foreground regions from which a small number of boundary labels can be synthesized, while the saliency map provides high-confidence background regions and a large number of foreground-background boundaries, improving the quality and coverage of the boundary pseudo labels. Supervising the boundary detection network with these pseudo labels strengthens its generalization and robustness, allowing it to adapt to images from different scenes and environments, detect more complete object boundaries, and improve the accuracy and completeness of boundary detection. Refinement propagation with the extracted boundary then corrects and optimizes the class activation map according to the boundary information, greatly reducing the cost and time of manual pixel-level annotation while improving the precision and generation efficiency of the semantic segmentation pseudo labels. Finally, using the saliency map to guide the generation of the class activation map makes it indicate object regions more accurately, providing more reliable information for the synthesis and propagation of boundary pseudo labels and further improving the precision and generation efficiency of the semantic segmentation pseudo labels.
In this embodiment, extensive experiments were run on the PASCAL VOC 2012 and MS COCO 2014 datasets. The baseline network generates pseudo labels with only 66.4% mIoU, and only 67.2% even after dCRF post-processing. With the proposed saliency-guided boundary pseudo label synthesis strategy alone, the generated pseudo labels reach 66.9% and 67.9% (+dCRF); with the proposed boundary detection network training strategy alone, they reach 67.2% and 67.6% (+dCRF); with both strategies combined, they reach 69.8% and 70.4% (+dCRF), a large improvement over the baseline network. When saliency guidance is additionally introduced into the generation of the class activation map, the generated pseudo labels reach 73.8% and 74.1% (+dCRF), far exceeding the baseline network.
Example 2
This embodiment verifies the pseudo label generation method for neural network training proposed in Embodiment 1 through a concrete implementation.
Experimental verification uses the PASCAL VOC 2012 and MS COCO 2014 datasets: PASCAL VOC 2012 contains 21 image categories including background, and MS COCO 2014 contains 81 categories including background. Specifically, for PASCAL VOC 2012 this embodiment uses 10,582 images for training, 1,464 for validation and 1,456 for testing. For MS COCO 2014, images without category annotations are excluded, leaving 82,081 images for training and 40,137 for validation. During the pseudo label generation stage of the experiments, input images from both datasets are resized to 512×512×3.
Illustratively, in class activation map generation, the A-CAMs results essentially converge after 5 iterations with batch size 16, and the S-CAMs results after 10 iterations with batch size 16. During boundary detection network training, the results essentially converge after 3 iterations with batch size 16. In the semantic segmentation pseudo label generation stage, once the boundary detection network is trained, each training image is fed through the network to produce its boundary map; with the network's output boundary as a constraint, the random walk algorithm propagates the labels of the class activation map, generating the semantic segmentation pseudo labels. For supervised training, DeepLab-v1 and DeepLab-v2 with ResNet-101 or VGG-16 backbones are adopted.
Table 1: Results of pseudo labels generated on the PASCAL VOC 2012 training set
Table 2: Results of pseudo labels generated from different class activation maps on the PASCAL VOC 2012 training set
Table 3: Ablation results (mIoU) of the A-CAMs-based pseudo label generation method on PASCAL VOC 2012 for different values of the gradient-magnitude threshold \(\theta_{g}\)
Table 4: Ablation results (mIoU) of the S-CAMs-based pseudo label generation method on PASCAL VOC 2012 for different values of \(\theta_{g}\)
As shown in Table 1, synthesizing boundary pseudo labels with the proposed method improves the accuracy of the semantic segmentation pseudo labels generated on the PASCAL VOC 2012 training set by 0.7% over the baseline model. Adopting the proposed saliency-guided boundary detection training strategy improves it by 0.4%. Combining both methods raises the accuracy of the generated semantic segmentation pseudo labels to 70.4%, 3.2% higher than the baseline model. As shown in Table 2, when saliency information also guides the generation of the class activation map, the pseudo label accuracy on the PASCAL VOC 2012 training set reaches 74.1%, well above the 67.2% of the baseline model.
Tables 3 and 4 show the influence of different values of \(\theta_{g}\) on model performance. When \(\theta_{g}\) is 0, i.e., the boundaries synthesized from saliency are not screened at all, they may contain irrelevant information or noise, introducing erroneous boundary labels that hinder the model's accurate recognition of boundaries in the image and degrade its performance. Once the boundaries are screened, i.e., for \(\theta_{g}\) between 0.25 and 1.5, the model's performance is very stable.
Table 5: Fully supervised semantic segmentation results (mIoU) on the PASCAL VOC 2012 validation and test sets
Table 6: Fully supervised semantic segmentation mIoU, mean accuracy and pixel accuracy on the PASCAL VOC 2012 validation set
Table 7: Fully supervised semantic segmentation mIoU results on the MS COCO 2014 validation set
To further study the performance of the generated semantic segmentation pseudo labels, this embodiment generates them for the PASCAL VOC 2012 and MS COCO 2014 datasets and uses them in place of the real labels for fully supervised training of DeepLab-v1 and DeepLab-v2, with two backbones, ResNet-101 and VGG-16, chosen to demonstrate the generalization of the invention. The semantic segmentation results are shown in Tables 5 and 7. With the same A-CAMs class activation maps as the baseline network, the proposed algorithm far exceeds the baseline model with either ResNet-101 or VGG-16 as the backbone: DeepLab-v1 with VGG-16 reaches 67.4% on the PASCAL VOC 2012 validation set, exceeding the baseline by 7.3%, and 68.1% on the test set, exceeding it by 7.0%; DeepLab-v2 with ResNet-101 reaches 71.0% on the validation set, exceeding the baseline by 5.3%, and 70.9% on the test set, exceeding it by 4.3%. With S-CAMs as the class activation maps, the semantic segmentation results improve further. On MS COCO 2014, DeepLab-v2 with ResNet-101 and VGG-16 backbones reaches 37.7% and 39.1% mIoU on the validation set, respectively. Table 6 lists the mIoU (mean intersection over union), mean accuracy and pixel accuracy on the validation set after fully supervised training of DeepLab-v1 and DeepLab-v2 with the semantic segmentation pseudo labels generated by the invention.
As shown in fig. 8, (a) is the input image I; (b) is the ground-truth label map corresponding to input image I; (c) is the boundary pseudo label map synthesized from A-CAMs; (d) is the boundary pseudo label map synthesized under saliency guidance; (e) is the boundary pseudo label map synthesized from A-CAMs together with saliency guidance; (f) is the boundary map output by the boundary detection network; and (g) is the semantic segmentation pseudo label map generated for input image I.
It can be observed from fig. 8 that boundary pseudo labels synthesized using only class activation maps contain few boundary pixels and can hardly provide reliable supervision for the boundary detection network, whereas the saliency map provides rich boundary information. Combined, the two provide very reliable supervision for training the boundary detection network, as shown in part (f) of fig. 8: after training, the network correctly identifies a large amount of complete boundary information, providing reliable constraints for the subsequent generation of semantic segmentation pseudo labels.
Fig. 9 compares the semantic segmentation pseudo labels generated on the PASCAL VOC 2012 dataset in this embodiment: part (a) is the input image II, part (b) the ground-truth semantic segmentation label map corresponding to input image II, and part (c) the semantic segmentation pseudo label map corresponding to input image II. In conclusion, the invention can complete large-scale semantic segmentation tasks with less manpower and time. The experimental results also show that the semantic segmentation model finally obtained by training with the pseudo labels achieves high accuracy.
Example 3
This embodiment provides a pseudo label generation system for neural network training that applies the pseudo label generation method proposed in Embodiment 1.
Fig. 10 is a hardware architecture diagram of the pseudo label generation system for neural network training according to an embodiment of the present application.
As shown in fig. 10, the pseudo label generation system comprises: a class activation map acquisition module 100, a saliency map acquisition module 200, a boundary pseudo label generation module 300, a boundary detection module 400 and a semantic segmentation pseudo label generation module 500.
The class activation map acquisition module 100 is configured to feed the input image and its corresponding image-level label into a classification backbone network based on a residual structure for attention pooling, obtaining a class activation map. The saliency map acquisition module 200 is configured to feed the input image into a salient object detection network for region detection, obtaining a saliency map. The boundary pseudo label generation module 300 is configured to fuse the region features of the class activation map and the saliency map and synthesize the boundary pseudo label of the input image. The boundary detection module 400 is configured to supervise the training of a boundary detection network with the boundary pseudo label, feed the input image into the trained network for boundary detection, and extract the boundary of the input image. The semantic segmentation pseudo label generation module 500 is configured to use the boundary to guide refinement propagation of the class activation map and generate the semantic segmentation pseudo label corresponding to the input image.
It should be noted that the foregoing explanation of the embodiments of the pseudo label generation method for neural network training also applies to the pseudo label generation system of this embodiment and is not repeated here.
It can be understood that the pseudo label generation system for neural network training provided by this embodiment achieves the same beneficial effects as the method: combining class activation maps and saliency maps improves the quality and coverage of the boundary pseudo labels; supervising the boundary detection network with them strengthens its generalization and robustness and improves the accuracy and completeness of boundary detection; refinement propagation with the extracted boundary corrects and optimizes the class activation map, greatly reducing the cost and time of manual pixel-level annotation; and saliency-guided class activation map generation provides more reliable information for the synthesis and propagation of boundary pseudo labels, further improving the precision and generation efficiency of the semantic segmentation pseudo labels.
It should be understood that the above examples of the invention are provided by way of illustration only and do not limit the embodiments of the invention. Other variations or modifications based on the above description will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the protection scope of the claims.

Claims (7)

1. A pseudo label generation method for neural network training, comprising the following steps:
S1: feeding an input image and an image-level label corresponding to the input image into a classification backbone network based on a residual structure for attention pooling to obtain a class activation map, and feeding the input image into a salient object detection network for region detection to obtain a saliency map;
S2: fusing the region features of the class activation map and the saliency map to synthesize a boundary pseudo label of the input image;
wherein the boundary pseudo label \(\hat{B}_i\) of the input image is generated from the class-activation-map-based boundary pseudo label \(B^{cam}_i\) and the saliency-map-based boundary pseudo label \(B^{sal}_i\) according to the following formula:

\(\hat{B}_{i} = \begin{cases} \text{boundary}, & \text{if } B^{cam}_{i} = \text{boundary} \text{ or } B^{sal}_{i} = \text{boundary}\\ \text{object}, & \text{if } B^{cam}_{i} = \text{object}\\ \text{background}, & \text{if } B^{sal}_{i} = \text{background}\\ \text{uncertain}, & \text{otherwise} \end{cases}\)

where \(\hat{B}_{i} = \text{boundary}\) indicates that pixel \(i\) belongs to the boundary, \(\hat{B}_{i} = \text{object}\) that pixel \(i\) belongs to the object region, and \(\hat{B}_{i} = \text{background}\) that pixel \(i\) belongs to the background region;
the specific steps of generating the class-activation-map-based boundary pseudo label \(B^{cam}_i\) including:
traversing all pixels in the class activation map with a sliding window of a specified size, and determining for each pixel \(i\) in the class activation map a class-activation-map-based boundary pseudo label \(B^{cam}_{i}\) according to the following formula:

\(B^{cam}_{i} = \begin{cases} \text{boundary}, & \text{if } \min(N_{o}, N_{b}) \ge \lambda\,\lvert W\rvert \\ \text{object}, & \text{if } \max_{c} p_{i}^{c} \ge \theta_{o} \\ \text{uncertain}, & \text{otherwise} \end{cases}\)

where \(B^{cam}_{i} = \text{boundary}\) means pixel \(i\) lies on a boundary in the class activation map, \(N_{o}\) and \(N_{b}\) are the numbers of object-region and background-region pixels in the sliding window, \(W\) is the pixel set of the sliding window and \(\lambda\) a balance ratio, \(B^{cam}_{i} = \text{object}\) means pixel \(i\) lies in the object region of the class activation map, \(p_{i}^{c}\) is the probability that pixel \(i\) in the class activation map belongs to category \(c\), \(\theta_{o}\) is the pixel threshold for the object region, and \(B^{cam}_{i} = \text{uncertain}\) means pixel \(i\) lies in an uncertain region of the class activation map;
and the specific steps of generating the saliency-map-based boundary pseudo label \(B^{sal}_i\) including:
performing edge detection on the saliency map with the Sobel operator to generate the Sobel gradient magnitude;

assigning to each pixel \(i\) in the saliency map a saliency-map-based boundary pseudo label \(B^{sal}_{i}\) according to its Sobel gradient magnitude \(G_{i}\) and saliency magnitude \(S_{i}\):

\(B^{sal}_{i} = \begin{cases} \text{boundary}, & \text{if } G_{i} \ge \theta_{g} \\ \text{object}, & \text{if } G_{i} < \theta_{g} \text{ and pixel } i \text{ is salient} \\ \text{background}, & \text{if } G_{i} < \theta_{g} \text{ and pixel } i \text{ is non-salient} \end{cases}\)

where \(G_{i}\) is the Sobel gradient magnitude of pixel \(i\) in the saliency map, \(\theta_{g}\) is a preset threshold on the gradient magnitude, \(S_{i}\) is the saliency magnitude of pixel \(i\) in the saliency map, \(B^{sal}_{i} = \text{boundary}\) means pixel \(i\) belongs to a boundary in the saliency map, \(B^{sal}_{i} = \text{object}\) means pixel \(i\) belongs to the object region in the saliency map, and \(B^{sal}_{i} = \text{background}\) means pixel \(i\) belongs to the background region in the saliency map;
S3: supervising the training of a boundary detection network with the boundary pseudo label, feeding the input image into the trained boundary detection network for boundary detection, and extracting the boundary of the input image;
S4: using the boundary to guide refinement propagation of the class activation map and generating a semantic segmentation pseudo label corresponding to the input image.
2. The pseudo label generation method for neural network training according to claim 1, wherein performing edge detection on the saliency map with the Sobel operator to generate the Sobel gradient magnitude specifically comprises:

computing the horizontal gradient \(G_{x}\) and the vertical gradient \(G_{y}\) of the saliency map with the Sobel operator:

\(G_{x} = K_{x} * S, \qquad G_{y} = K_{y} * S, \qquad K_{x} = \begin{bmatrix} -1 & 0 & 1\\ -2 & 0 & 2\\ -1 & 0 & 1 \end{bmatrix}, \quad K_{y} = \begin{bmatrix} -1 & -2 & -1\\ 0 & 0 & 0\\ 1 & 2 & 1 \end{bmatrix}\)

where \(K_{x}\) and \(K_{y}\) are the Sobel gradient operators in the horizontal and vertical directions, \(S\) is the saliency map, and \(*\) denotes convolution; and

computing the Sobel gradient magnitude \(G\) of the saliency map from the horizontal and vertical gradients:

\(G = \sqrt{G_{x}^{2} + G_{y}^{2}}\)
3. The pseudo label generation method for neural network training according to claim 1, wherein supervising the training of the boundary detection network with the boundary pseudo label in S3 specifically comprises: training the boundary detection network with a loss function \(\mathcal{L}_{b}\) computed as

\(\mathcal{L}_{b} = \mathcal{L}_{fb} + \mathcal{L}_{bd}, \qquad \mathcal{L}_{fb} = -\frac{1}{\lvert F \cup B\rvert}\sum_{i \in F \cup B}\log\left(1-\sigma_{i}\right)\)

where \(\mathcal{L}_{fb}\) is the loss over foreground and background pixels, \(\mathcal{L}_{bd}\) is the loss over boundary pixels, \(F\) and \(B\) are the sets of foreground and background pixels in the boundary pseudo label, \(\sigma_{i}\) is the probability output by the boundary detection network that pixel \(i\) belongs to the boundary, and \(B_{s}\) and \(B_{ns}\) are the boundary pixels of the boundary pseudo label in the salient and non-salient regions, respectively, over which \(\mathcal{L}_{bd}\) is computed separately.
4. The pseudo label generation method for neural network training according to claim 1, wherein in S1 the specific step of feeding the input image and its corresponding image-level label into the classification backbone network based on the residual structure for attention pooling to obtain the class activation map comprises:
feeding the input image into a classification backbone network stacked with residual structures for classification and outputting a localization map;
fully supervising the localization map with the image-level label to generate a first class activation map; and
fully supervising the localization map with the image-level label and the saliency map to generate a second class activation map.
5. The pseudo label generation method for neural network training according to claim 4, wherein the localization map is fully supervised with the image-level labels using a multi-label classification loss \(\mathcal{L}_{A}\):

\(\mathcal{L}_{A} = -\frac{1}{c}\sum_{k=1}^{c}\left[ y^{k}\log\sigma\left(P^{k}\right) + \left(1-y^{k}\right)\log\left(1-\sigma\left(P^{k}\right)\right) \right]\)

where \(c\) is the number of categories in the dataset, \(\sigma(\cdot)\) denotes the sigmoid function, \(y^{k}\) indicates whether category \(k\) appears in the image-level label, and \(P^{k}\) is the class score pooled from the probabilities \(p_{i}^{k}\) that pixel \(i\) in the class activation map output by the classification network belongs to category \(k\); and

when the localization map is fully supervised with the image-level labels and the saliency map, a corresponding loss function \(\mathcal{L}_{S}\) is employed.
6. The pseudo label generation method for neural network training according to any one of claims 1 to 5, wherein in S4 the boundary is used to guide refinement propagation of the class activation map, and the specific steps of generating the semantic segmentation pseudo label corresponding to the input image comprise:
S4.1: converting the boundary into a semantic affinity matrix;
S4.2: according to the semantic affinity matrix, applying the random walk algorithm to perform refinement propagation on the class activation map and generate the semantic segmentation pseudo label.
7. A pseudo label generation system for neural network training, applying the pseudo label generation method for neural network training of any one of claims 1 to 6 and comprising:
a class activation map acquisition module, configured to feed the input image and its corresponding image-level label into a classification backbone network based on a residual structure for attention pooling, obtaining a class activation map;
a saliency map acquisition module, configured to feed the input image into a salient object detection network for region detection, obtaining a saliency map;
a boundary pseudo label generation module, configured to fuse the region features of the class activation map and the saliency map and synthesize the boundary pseudo label of the input image;
a boundary detection module, configured to supervise the training of a boundary detection network with the boundary pseudo label, feed the input image into the trained boundary detection network for boundary detection, and extract the boundary of the input image; and
a semantic segmentation pseudo label generation module, configured to use the boundary to guide refinement propagation of the class activation map and generate the semantic segmentation pseudo label corresponding to the input image.
CN202311331979.6A 2023-10-16 2023-10-16 Pseudo label generation method and system for neural network training Active CN117079103B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311331979.6A CN117079103B (en) Pseudo label generation method and system for neural network training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311331979.6A CN117079103B (en) Pseudo label generation method and system for neural network training

Publications (2)

Publication Number Publication Date
CN117079103A (en) 2023-11-17
CN117079103B true CN117079103B (en) 2024-01-02

Family

ID=88713751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311331979.6A Active CN117079103B (en) Pseudo label generation method and system for neural network training

Country Status (1)

Country Link
CN (1) CN117079103B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3171297A1 (en) * 2015-11-18 2017-05-24 CentraleSupélec Joint boundary detection image segmentation and object recognition using deep learning
CN108399406A (en) * 2018-01-15 2018-08-14 中山大学 The method and system of Weakly supervised conspicuousness object detection based on deep learning
CN111832573A (en) * 2020-06-12 2020-10-27 桂林电子科技大学 Image emotion classification method based on class activation mapping and visual saliency
CN113436204A (en) * 2021-06-10 2021-09-24 中国地质大学(武汉) High-resolution remote sensing image weak supervision building extraction method
CN115393598A (en) * 2022-10-31 2022-11-25 南京理工大学 Weakly supervised semantic segmentation method based on non-salient region object mining
CN115512169A (en) * 2022-11-09 2022-12-23 之江实验室 Weak supervision semantic segmentation method and device based on gradient and region affinity optimization
CN115546466A (en) * 2022-09-28 2022-12-30 北京工业大学 Weak supervision image target positioning method based on multi-scale significant feature fusion
CN115546490A (en) * 2022-11-23 2022-12-30 南京理工大学 Weak supervision semantic segmentation method based on significance guidance
WO2023077816A1 (en) * 2021-11-03 2023-05-11 中国华能集团清洁能源技术研究院有限公司 Boundary-optimized remote sensing image semantic segmentation method and apparatus, and device and medium
CN116681903A (en) * 2023-06-06 2023-09-01 大连民族大学 Weak supervision significance target detection method based on complementary fusion pseudo tag

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10410353B2 (en) * 2017-05-18 2019-09-10 Mitsubishi Electric Research Laboratories, Inc. Multi-label semantic boundary detection system

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3171297A1 (en) * 2015-11-18 2017-05-24 CentraleSupélec Joint boundary detection image segmentation and object recognition using deep learning
CN108399406A (en) * 2018-01-15 2018-08-14 中山大学 The method and system of Weakly supervised conspicuousness object detection based on deep learning
CN111832573A (en) * 2020-06-12 2020-10-27 桂林电子科技大学 Image emotion classification method based on class activation mapping and visual saliency
CN113436204A (en) * 2021-06-10 2021-09-24 中国地质大学(武汉) High-resolution remote sensing image weak supervision building extraction method
WO2023077816A1 (en) * 2021-11-03 2023-05-11 中国华能集团清洁能源技术研究院有限公司 Boundary-optimized remote sensing image semantic segmentation method and apparatus, and device and medium
CN115546466A (en) * 2022-09-28 2022-12-30 北京工业大学 Weak supervision image target positioning method based on multi-scale significant feature fusion
CN115393598A (en) * 2022-10-31 2022-11-25 南京理工大学 Weakly supervised semantic segmentation method based on non-salient region object mining
CN115512169A (en) * 2022-11-09 2022-12-23 之江实验室 Weak supervision semantic segmentation method and device based on gradient and region affinity optimization
CN115546490A (en) * 2022-11-23 2022-12-30 南京理工大学 Weak supervision semantic segmentation method based on significance guidance
CN116681903A (en) * 2023-06-06 2023-09-01 大连民族大学 Weak supervision significance target detection method based on complementary fusion pseudo tag

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Weakly supervised semantic segmentation network guided by salient background; Bai Xuefei et al.; Pattern Recognition and Artificial Intelligence, Vol. 34, No. 9, pp. 824-834 *

Also Published As

Publication number Publication date
CN117079103A (en) 2023-11-17

Similar Documents

Publication Publication Date Title
US10354392B2 (en) Image guided video semantic object segmentation method and apparatus
CN110276264B (en) Crowd density estimation method based on foreground segmentation graph
CN111985376A (en) Remote sensing image ship contour extraction method based on deep learning
CN113989604B (en) Tire DOT information identification method based on end-to-end deep learning
CN111553414A (en) In-vehicle lost object detection method based on improved Faster R-CNN
CN112488229A (en) Domain self-adaptive unsupervised target detection method based on feature separation and alignment
CN111461121A (en) Electric meter number identification method based on YOLOv3 network
CN113705579A (en) Automatic image annotation method driven by visual saliency
CN117011381A (en) Real-time surgical instrument pose estimation method and system based on deep learning and stereoscopic vision
CN117437647B (en) Oracle character detection method based on deep learning and computer vision
CN117829243A (en) Model training method, target detection device, electronic equipment and medium
CN117079103B (en) Pseudo tag generation method and system for neural network training
CN110889418A (en) Gas contour identification method
CN116681961A (en) Weak supervision target detection method based on semi-supervision method and noise processing
CN114943869B (en) Airport target detection method with enhanced style migration
CN116310293A (en) Method for detecting target of generating high-quality candidate frame based on weak supervised learning
CN113920254B (en) Monocular RGB (Red Green blue) -based indoor three-dimensional reconstruction method and system thereof
CN114067359B (en) Pedestrian detection method integrating human body key points and visible part attention characteristics
CN112069997B (en) Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net
CN112287906B (en) Template matching tracking method and system based on depth feature fusion
CN114943741A (en) Visual SLAM method based on target detection and geometric probability in dynamic scene
Huang et al. A Stepwise Refining Image-Level Weakly Supervised Semantic Segmentation Method for Detecting Exposed Surface for Buildings (ESB) From Very High-Resolution Remote Sensing Images
CN114639013A (en) Remote sensing image airplane target detection and identification method based on improved Orient RCNN model
CN114445649A (en) Method for detecting RGB-D single image shadow by multi-scale super-pixel fusion
CN113673534A (en) RGB-D image fruit detection method based on fast RCNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant