CN116524207A

CN116524207A - Weakly supervised RGBD image saliency detection method based on edge detection assistance

Info

Publication number: CN116524207A
Application number: CN202211575959.9A
Authority: CN
Inventors: 陈羽中; 朱文婧; 牛玉贞; 杨立芬
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2022-12-08
Filing date: 2022-12-08
Publication date: 2023-08-01

Abstract

The invention relates to a weakly supervised RGBD image saliency detection method based on edge detection assistance, comprising the following steps. Step S1: establishing a weakly supervised RGBD image saliency detection training set containing scribble (graffiti) annotation maps, and performing data augmentation. Step S2: designing a multi-level, multi-task weakly supervised RGBD image saliency detection network. Step S3: designing a fusion module. Step S4: designing a weakly supervised RGBD image saliency detection network based on edge detection assistance, and designing a loss function to optimize the network parameters. Step S5: inputting the RGBD image to be detected into the trained weakly supervised RGBD image saliency detection model based on edge detection assistance to obtain a saliency detection result. This technical scheme achieves weakly supervised RGBD image saliency detection with good performance.

Description

Weakly supervised RGBD image saliency detection method based on edge detection assistance

Technical Field

The invention relates to the technical field of image processing and computer vision, and in particular to a weakly supervised RGBD image saliency detection method based on edge detection assistance.

Background

Salient object detection is an important research topic in computer vision. It aims to simulate the human visual perception system in order to find the most attractive object in an image and segment that object at the pixel level. As a fundamental image processing problem, it plays a key role in tasks such as object detection, semantic segmentation, video tracking and image understanding.

With the development of convolutional neural networks, many deep-learning-based image saliency detection methods have been proposed, and they perform far better than traditional methods. However, deep learning needs a large amount of training data, and the pixel-by-pixel labels required by strongly supervised saliency detection models are expensive to obtain, so weakly supervised image saliency detection has become a research direction actively explored by many researchers.

Weakly supervised image saliency detection trains a model on incomplete weak labels and then relies on the strong generalization capability of the model to infer complete salient objects; common weak labels include noisy labels, image-level labels, bounding boxes and scribble (graffiti) annotations. Compared with pixel-by-pixel labels, these low-cost labels do not provide complete structural details of the salient object, which makes it harder for the saliency detection network to recover fine salient object edges. Most current methods introduce traditional unsupervised saliency detection methods, image classification tasks or edge detection tasks as auxiliaries that help determine the position and edges of the salient object. However, in some complex scenes, edge localization, which is already difficult for strongly supervised saliency detection, becomes even harder under weak supervision when only the color and texture features of the color image are available. Weakly supervised RGBD image saliency detection introduces a depth map and uses the rich structural and positional information it contains as a supplement, which improves salient object detection in complex scenes. Introducing the depth map, however, also brings new problems, such as cross-modal conflict between the color image and the depth map, rough edges in the depth map, and noise caused by low-quality depth maps.

Disclosure of Invention

In view of the above, the present invention aims to provide a weakly supervised RGBD image saliency detection method based on edge detection assistance that achieves weakly supervised RGBD image saliency detection with better performance.

To achieve the above purpose, the invention adopts the following technical scheme: a weakly supervised RGBD image saliency detection method based on edge detection assistance, comprising the following steps:

Step S1: establishing a weakly supervised RGBD image saliency detection training set containing scribble annotation maps, and performing data augmentation;

Step S2: designing a multi-level, multi-task weakly supervised RGBD image saliency detection network, and using the network to obtain multi-scale edge-refined saliency prediction results;

Step S3: designing a fusion module, and using the fusion module to fuse the multi-scale edge-refined saliency prediction results into a final saliency prediction result;

Step S4: designing a weakly supervised RGBD image saliency detection network based on edge detection assistance, and designing a loss function to optimize the network parameters, so as to obtain a trained weakly supervised RGBD image saliency detection model based on edge detection assistance;

Step S5: inputting the RGBD image to be detected into the trained weakly supervised RGBD image saliency detection model based on edge detection assistance to obtain a saliency detection result.

In a preferred embodiment, step S1 specifically includes:

Step S11: dividing the data set into a training set and a test set according to a certain proportion;

Step S12: for the training set, annotating each group of RGBD images with the brush tool in 'Adobe Photoshop 2020'; specifically, part of the salient foreground region is marked with black strokes, part of the background region is marked with white strokes, and the unmarked region is represented in gray;

Step S13: applying data augmentation to the images in the training set, specifically adding noise, random cropping and flipping; and normalizing the color images and depth maps of every group of RGBD images in the training set and the test set to highlight the foreground region.
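As an illustrative sketch of steps S12 and S13 only, the following NumPy code decodes a scribble annotation map into a label map and a mask of annotated pixels, and applies the same flip and crop to a color image, depth map and scribble map. The gray-level encoding (0, 128, 255), the min-max normalization and all function names are assumptions made for illustration; the source does not specify them.

```python
import numpy as np

# Assumed gray levels (hypothetical): black stroke = foreground,
# white stroke = background, anything else (e.g. 128) = unlabeled.
FG_VALUE, BG_VALUE = 0, 255


def decode_scribble(scribble):
    """Return (labels, mask): labels is 1 on foreground strokes, mask marks annotated pixels."""
    scribble = np.asarray(scribble)
    fg = scribble == FG_VALUE
    bg = scribble == BG_VALUE
    return fg.astype(np.float32), fg | bg


def random_flip_crop(rgb, depth, scribble, crop, rng):
    """Apply one shared horizontal flip and random crop to three aligned images."""
    if rng.random() < 0.5:
        rgb, depth, scribble = rgb[:, ::-1], depth[:, ::-1], scribble[:, ::-1]
    h, w = scribble.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    window = (slice(top, top + crop), slice(left, left + crop))
    return rgb[window], depth[window], scribble[window]


def normalize(image):
    """Min-max normalize to [0, 1] (an assumed choice); constant images map to zeros."""
    image = np.asarray(image, dtype=np.float32)
    span = image.max() - image.min()
    return (image - image.min()) / span if span > 0 else np.zeros_like(image)
```

Sharing one flip and one crop window across the three images keeps the color image, depth map and scribble strokes pixel-aligned, which the later per-pixel losses rely on.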

In a preferred embodiment, the step S2 specifically includes:

Step S21: first, the color image and the depth map are input into two separate VGG16 networks; the 6 levels of features extracted by the 5 convolution layers Conv1, Conv2, Conv3, Conv4 and Conv5 and the pooling layer Pool5 are then taken as the multi-level color image features $\{F_k^{rgb}\}_{k=1}^{6}$ and the multi-level depth map features $\{F_k^{d}\}_{k=1}^{6}$;

Step S22: designing the initial saliency prediction branch; at each of the 6 levels, the color image feature $F_k^{rgb}$ and the depth map feature $F_k^{d}$ are concatenated, and the concatenated feature is sent to a cross-modal feature fusion module CFF, which fuses the color image features and the depth map features; the cross-modal feature fusion module consists of a 3×3 convolution layer, channel attention, spatial attention and a 3×3 convolution layer connected in series; finally, the fused feature is reduced to 1 channel by a convolution layer with a kernel size of 1; the process is expressed as follows:

$$S_k^{init} = Conv_{1\times1}\big(F_{CFF}(Cat(F_k^{rgb}, F_k^{d}))\big)$$

where $S_k^{init}$ represents the initial saliency feature of the k-th layer, $F_k^{rgb}$ and $F_k^{d}$ are the color image feature and the depth map feature of the k-th layer respectively, $Cat(\cdot,\cdot)$ represents the concatenation operation, $F_{CFF}$ represents the cross-modal feature fusion module in the initial saliency prediction branch, and $Conv_{1\times1}$ represents a convolution layer with a kernel size of 1;

Step S23: designing the edge detection branch; the edge feature $E_k$ is obtained by the same procedure as in the initial saliency prediction branch, with the formula:

$$E_k = Conv_{1\times1}\big(F_{CFF}'(Cat(F_k^{rgb}, F_k^{d}))\big)$$

where $E_k$ represents the edge feature of the k-th layer, $F_k^{rgb}$ and $F_k^{d}$ are the color image feature and the depth map feature of the k-th layer respectively, $Cat(\cdot,\cdot)$ represents the concatenation operation, $F_{CFF}'$ represents the cross-modal feature fusion module in the edge detection branch, and $Conv_{1\times1}$ represents a convolution layer with a kernel size of 1;

Step S24: designing the edge refinement saliency prediction module; at each of the 6 levels, the initial saliency feature $S_k^{init}$ and the edge feature $E_k$ are first concatenated, and the concatenated feature is then reduced to 1 channel by a convolution layer with a kernel size of 1, with the formula:

$$S_k = Conv_{1\times1}\big(Cat(S_k^{init}, E_k)\big)$$

where $S_k$ represents the edge-refined saliency feature of the k-th layer, $S_k^{init}$ and $E_k$ represent the initial saliency feature and the edge feature of the k-th layer respectively, $Cat(\cdot,\cdot)$ represents the concatenation operation, and $Conv_{1\times1}$ represents a convolution layer with a kernel size of 1.
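The per-level computation of steps S22 to S24 can be sketched in NumPy as below. This is a simplified illustration, not the network of the invention: the attention blocks are parameter-free stand-ins (the source does not describe their internals), and the helper names, the `params` dictionary layout and the channel counts are hypothetical.

```python
import numpy as np


def conv2d(x, w):
    """Zero-padded 'same' convolution. x: (C_in, H, W); w: (C_out, C_in, k, k), k odd."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((w.shape[0], h, wd), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x):
    """Assumed stand-in: scale each channel by a sigmoid of its global average."""
    return x * sigmoid(x.mean(axis=(1, 2), keepdims=True))


def spatial_attention(x):
    """Assumed stand-in: scale each position by a sigmoid of the channel-wise mean."""
    return x * sigmoid(x.mean(axis=0, keepdims=True))


def cff(rgb_feat, depth_feat, w_in, w_out):
    """Concatenate, then 3x3 conv -> channel attention -> spatial attention -> 3x3 conv."""
    x = np.concatenate([rgb_feat, depth_feat], axis=0)
    return conv2d(spatial_attention(channel_attention(conv2d(x, w_in))), w_out)


def level_heads(rgb_feat, depth_feat, params):
    """Return (initial saliency, edge, edge-refined saliency) features for one level."""
    s_init = conv2d(cff(rgb_feat, depth_feat, *params["sal_cff"]), params["sal_1x1"])
    edge = conv2d(cff(rgb_feat, depth_feat, *params["edge_cff"]), params["edge_1x1"])
    s_ref = conv2d(np.concatenate([s_init, edge], axis=0), params["refine_1x1"])
    return s_init, edge, s_ref
```

The two branches use separate fusion weights (`sal_cff` and `edge_cff`), mirroring the distinct fusion modules of the saliency and edge branches, while the refinement step is only a learned 1×1 mix of the two single-channel maps.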

In a preferred embodiment, the step S3 specifically includes:

Step S31: designing a fusion module that integrates deep features into shallow features layer by layer, so that the aggregated feature of each level combines the edge-refined saliency feature of that level with the upsampled aggregated feature of the deeper level; the final prediction is obtained from the aggregated feature of the shallowest level:

$$S_{final} = \sigma\big(Conv_{3\times3}(H_1)\big)$$

where $H_k$ represents the aggregated feature of the k-th layer, $S_k$ represents the edge-refined saliency feature of the k-th layer, $F_{up}$ represents upsampling, $Conv_{3\times3}$ represents a convolution layer with a kernel size of 3, $\sigma$ represents the Sigmoid activation function, and $S_{final}$ represents the final saliency prediction result.
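A minimal sketch of such a deep-to-shallow fusion is given below. Only the last line, a 3×3 convolution of the shallowest aggregated feature followed by a Sigmoid, is taken from the text; the base case (deepest aggregated feature equal to the deepest refined feature), the element-wise addition and the nearest-neighbour upsampling are assumptions, and the function names are hypothetical.

```python
import numpy as np


def upsample_to(x, shape):
    """Nearest-neighbour resize of a (C, H, W) map to the spatial size `shape`."""
    rows = np.arange(shape[0]) * x.shape[1] // shape[0]
    cols = np.arange(shape[1]) * x.shape[2] // shape[1]
    return x[:, rows][:, :, cols]


def fuse_top_down(refined, w_final):
    """refined: [S_1, ..., S_6], each (1, H_k, W_k), shallow to deep; w_final: (3, 3)."""
    h = refined[-1]                                  # assumed base case: H_6 = S_6
    for s_k in reversed(refined[:-1]):
        h = s_k + upsample_to(h, s_k.shape[1:])      # assumed aggregation: addition
    padded = np.pad(h[0], 1)                         # zero padding for the 3x3 kernel
    hh, ww = h.shape[1:]
    logits = sum(w_final[dy, dx] * padded[dy:dy + hh, dx:dx + ww]
                 for dy in range(3) for dx in range(3))
    return 1.0 / (1.0 + np.exp(-logits))             # Sigmoid gives the final map
```

For example, six all-ones maps whose side lengths halve from level to level aggregate to a constant 6 at the shallowest level, so an identity 3×3 kernel gives a constant prediction of σ(6).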

In a preferred embodiment, step S4 specifically includes:

Step S41: combining the multi-level, multi-task weakly supervised RGBD image saliency detection network designed in step S2 with the fusion module designed in step S3 to obtain the weakly supervised RGBD image saliency detection network based on edge detection assistance;

Step S42: designing the loss function of the weakly supervised RGBD image saliency detection network based on edge detection assistance as follows:

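$$L = \sum_{k=1}^{6}\Big(L_{pce}^{init,k} + L_{pce}^{ref,k} + L_{sm}^{init,k} + L_{sm}^{ref,k} + L_{ce}^{edge,k}\Big) + L_{pce}^{final} + L_{sm}^{final}$$

where $L$ represents the loss function of the final training, $\sum$ represents summation, and $k \in \{1, \dots, 6\}$; $L_{pce}^{init,k}$, $L_{pce}^{ref,k}$ and $L_{pce}^{final}$ are the partial cross-entropy losses acting on the k-th layer of the initial saliency prediction branch, the k-th layer of the edge refinement saliency prediction module and the final saliency prediction result respectively; $L_{sm}^{init,k}$, $L_{sm}^{ref,k}$ and $L_{sm}^{final}$ are the smoothing losses acting on the k-th layer of the initial saliency prediction branch, the k-th layer of the edge refinement saliency prediction module and the final saliency prediction result respectively; and $L_{ce}^{edge,k}$ is the cross-entropy loss acting on the k-th layer of the edge detection branch. The prediction maps on which these losses act are computed as follows:

$$S_k^{init\prime} = \sigma(S_k^{init})$$

$$S_k' = \sigma(S_k)$$

$$E_k' = \sigma(E_k)$$

The partial cross-entropy loss of a prediction map $P \in \{S_k^{init\prime}, S_k', S_{final}\}$ is evaluated only on the annotated pixels, and the cross-entropy loss of the edge detection branch is evaluated on all pixels:

$$L_{pce}(P) = -\sum_{(i,j)\in U}\Big(Y[i,j]\log P[i,j] + (1 - Y[i,j])\log\big(1 - P[i,j]\big)\Big)$$

$$L_{ce}^{edge,k} = -\sum_{i,j}\Big(E[i,j]\log E_k'[i,j] + (1 - E[i,j])\log\big(1 - E_k'[i,j]\big)\Big)$$

where $\sigma$ represents the Sigmoid activation function; $S_k^{init}$ and $S_k^{init\prime}$ represent the initial saliency feature of the k-th layer and the initial saliency prediction map of the k-th layer in the initial saliency prediction branch respectively; $S_k$ and $S_k'$ represent the edge-refined saliency feature of the k-th layer and the edge-refined saliency prediction map of the k-th layer in the edge refinement saliency prediction module respectively; $Y$ represents the input scribble annotation map, $U$ represents the scribble region in $Y$, and $(i,j) \in U$ represents the pixels located in the scribble region; $\log$ represents the logarithm function; $S_{final}$ represents the final saliency prediction result map; $E_k$ and $E_k'$ represent the edge feature of the k-th layer and the edge map of the k-th layer in the edge detection branch respectively; and $E$ represents the input edge map.

In the smoothing losses, $\Delta$ represents the derivative, and $\Delta S_k^{init\prime}[i,j]$, $\Delta S_k'[i,j]$, $\Delta I[i,j]$, $\Delta G[i,j]$ and $\Delta S_{final}[i,j]$ represent the derivatives of the initial saliency prediction map of the k-th layer, the edge-refined saliency prediction map of the k-th layer, the color image, the depth map and the final saliency prediction result map respectively; $|\cdot|$ represents taking the absolute value, $e$ is a constant and $\alpha$ is a fixed parameter; the smoothing losses also use an auxiliary function that is defined so as to avoid the result being 0.

Throughout, $[i,j]$ denotes the pixel in the i-th row and j-th column of an image, so that $Y[i,j]$, $S_k^{init\prime}[i,j]$, $S_k'[i,j]$, $S_{final}[i,j]$, $\Delta S_k^{init\prime}[i,j]$, $\Delta S_k'[i,j]$, $\Delta I[i,j]$, $\Delta G[i,j]$, $E[i,j]$ and $E_k'[i,j]$ represent the values of $Y$, $S_k^{init\prime}$, $S_k'$, $S_{final}$, $\Delta S_k^{init\prime}$, $\Delta S_k'$, $\Delta I$, $\Delta G$, $E$ and $E_k'$ at that pixel;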

Step S43: repeating steps S2 to S4 batch by batch until the loss value calculated in step S4 converges and stabilizes, then saving the network parameters; this completes the training of the weakly supervised RGBD image saliency detection network based on edge detection assistance and yields the weakly supervised RGBD image saliency detection model based on edge detection assistance.
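The loss terms of step S42 can be illustrated with the NumPy sketch below. The partial cross-entropy over annotated pixels and the edge cross-entropy follow the definitions in the text. The smoothing term is an assumed form borrowed from common scribble-supervised practice (prediction gradients down-weighted by `exp(-alpha * |guide gradient|)` and stabilized by a square root with a small constant); the values `alpha=10.0` and `tiny=1e-6`, the separate color and depth guidance terms, the unweighted total and all function names are assumptions, not values from the source.

```python
import numpy as np

EPS = 1e-7  # numerical guard for log(); not a value from the source


def partial_cross_entropy(pred, labels, mask):
    """Binary cross-entropy summed over annotated pixels only (mask is True there)."""
    p = np.clip(pred, EPS, 1.0 - EPS)
    ce = -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
    return float(ce[mask].sum())


def cross_entropy(pred, target):
    """Binary cross-entropy over every pixel, as used for the edge branch."""
    return partial_cross_entropy(pred, target, np.ones(pred.shape, dtype=bool))


def gradients(img):
    """Forward differences along x and y, cropped to a common (H-1, W-1) shape."""
    return img[:-1, 1:] - img[:-1, :-1], img[1:, :-1] - img[:-1, :-1]


def smoothness(pred, guide, alpha=10.0, tiny=1e-6):
    """Assumed edge-aware smoothness: penalize prediction gradients where the guide is flat."""
    total = 0.0
    for dp, dg in zip(gradients(pred), gradients(guide)):
        weighted = np.abs(dp) * np.exp(-alpha * np.abs(dg))
        total += np.sqrt(weighted ** 2 + tiny).sum()   # sqrt(.^2 + tiny) is never 0
    return float(total)


def total_loss(init_maps, refined_maps, edge_maps, final_map,
               labels, mask, edge_gt, gray, depth):
    """Unweighted sum of all terms; every map is assumed resized to one resolution."""
    def saliency_terms(pred):
        return (partial_cross_entropy(pred, labels, mask)
                + smoothness(pred, gray) + smoothness(pred, depth))

    loss = saliency_terms(final_map)
    for s_init, s_ref, e in zip(init_maps, refined_maps, edge_maps):
        loss += saliency_terms(s_init) + saliency_terms(s_ref) + cross_entropy(e, edge_gt)
    return loss
```

Because the partial cross-entropy ignores unlabeled pixels, the smoothing terms are what propagate the sparse scribble labels into unannotated regions, stopping at boundaries visible in the color image or the depth map.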

Compared with the prior art, the invention has the following beneficial effects: it makes full use of the advantages provided by combining the color image with the depth map, avoids the problems introduced by the depth map, and achieves weakly supervised RGBD image saliency detection with better performance.

Drawings

Fig. 1 is a flow chart of an implementation of a preferred embodiment of the present invention.

Fig. 2 is an example of a group of RGBD images and the corresponding scribble annotation in a preferred embodiment of the present invention.

Fig. 3 is a diagram showing a network model structure in the preferred embodiment of the present invention.

Detailed Description

The invention will be further described with reference to the accompanying drawings and examples.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application; as used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

The invention provides a weakly supervised RGBD image saliency detection method based on edge detection assistance which, as shown in fig. 1, comprises steps S1 to S5 set out in the Disclosure of Invention above.

Further, step S1 specifically includes steps S11 to S13, step S2 specifically includes steps S21 to S24, step S3 specifically includes step S31, and step S4 specifically includes steps S41 to S43, each carried out as described in the Disclosure of Invention above.

The above is a preferred embodiment of the present invention; all changes made according to the technical solution of the present invention fall within the protection scope of the present invention, provided that the resulting functional effects do not exceed the scope of the technical solution of the present invention.

Claims

1. A weakly supervised RGBD image saliency detection method based on edge detection assistance, characterized by comprising the following steps:

2. The weakly supervised RGBD image saliency detection method based on edge detection assistance according to claim 1, wherein step S1 specifically comprises:

3. The weakly supervised RGBD image saliency detection method based on edge detection assistance according to claim 1, wherein step S2 specifically comprises:

Step S22: designing the initial saliency prediction branch; at each of the 6 levels, the color image feature $F_k^{rgb}$ and the depth map feature $F_k^{d}$ are concatenated, and the concatenated feature is sent to a cross-modal feature fusion module CFF, which fuses the color image features and the depth map features; the cross-modal feature fusion module consists of a 3×3 convolution layer, channel attention, spatial attention and a 3×3 convolution layer connected in series; finally, the fused feature is reduced to 1 channel by a convolution layer with a kernel size of 1; the process is expressed as follows:

$$S_k^{init} = Conv_{1\times1}\big(F_{CFF}(Cat(F_k^{rgb}, F_k^{d}))\big)$$

where $S_k^{init}$ represents the initial saliency feature of the k-th layer, $F_k^{rgb}$ and $F_k^{d}$ are the color image feature and the depth map feature of the k-th layer respectively, $Cat(\cdot,\cdot)$ represents the concatenation operation, $F_{CFF}$ represents the cross-modal feature fusion module in the initial saliency prediction branch, and $Conv_{1\times1}$ represents a convolution layer with a kernel size of 1;

$$E_k = Conv_{1\times1}\big(F_{CFF}'(Cat(F_k^{rgb}, F_k^{d}))\big)$$

where $E_k$ represents the edge feature of the k-th layer, $F_k^{rgb}$ and $F_k^{d}$ are the color image feature and the depth map feature of the k-th layer respectively, $Cat(\cdot,\cdot)$ represents the concatenation operation, $F_{CFF}'$ represents the cross-modal feature fusion module in the edge detection branch, and $Conv_{1\times1}$ represents a convolution layer with a kernel size of 1;

$$S_k = Conv_{1\times1}\big(Cat(S_k^{init}, E_k)\big)$$

where $S_k$ represents the edge-refined saliency feature of the k-th layer, $S_k^{init}$ and $E_k$ represent the initial saliency feature and the edge feature of the k-th layer respectively, $Cat(\cdot,\cdot)$ represents the concatenation operation, and $Conv_{1\times1}$ represents a convolution layer with a kernel size of 1.

4. The weakly supervised RGBD image saliency detection method based on edge detection assistance according to claim 1, wherein step S3 specifically comprises:

$$S_{final} = \sigma\big(Conv_{3\times3}(H_1)\big)$$

5. The weakly supervised RGBD image saliency detection method based on edge detection assistance according to claim 1, wherein step S4 specifically comprises:

$$L = \sum_{k=1}^{6}\Big(L_{pce}^{init,k} + L_{pce}^{ref,k} + L_{sm}^{init,k} + L_{sm}^{ref,k} + L_{ce}^{edge,k}\Big) + L_{pce}^{final} + L_{sm}^{final}$$

where $L$ represents the loss function of the final training, $\sum$ represents summation, and $k \in \{1, \dots, 6\}$; $L_{pce}^{init,k}$, $L_{pce}^{ref,k}$ and $L_{pce}^{final}$ are the partial cross-entropy losses acting on the k-th layer of the initial saliency prediction branch, the k-th layer of the edge refinement saliency prediction module and the final saliency prediction result respectively; $L_{sm}^{init,k}$, $L_{sm}^{ref,k}$ and $L_{sm}^{final}$ are the smoothing losses acting on the k-th layer of the initial saliency prediction branch, the k-th layer of the edge refinement saliency prediction module and the final saliency prediction result respectively; and $L_{ce}^{edge,k}$ is the cross-entropy loss acting on the k-th layer of the edge detection branch; the prediction maps on which these losses act are computed as follows:

$$S_k^{init\prime} = \sigma(S_k^{init})$$

$$S_k' = \sigma(S_k)$$

$$E_k' = \sigma(E_k)$$

the partial cross-entropy loss of a prediction map $P \in \{S_k^{init\prime}, S_k', S_{final}\}$ is evaluated only on the annotated pixels, and the cross-entropy loss of the edge detection branch is evaluated on all pixels:

$$L_{pce}(P) = -\sum_{(i,j)\in U}\Big(Y[i,j]\log P[i,j] + (1 - Y[i,j])\log\big(1 - P[i,j]\big)\Big)$$

$$L_{ce}^{edge,k} = -\sum_{i,j}\Big(E[i,j]\log E_k'[i,j] + (1 - E[i,j])\log\big(1 - E_k'[i,j]\big)\Big)$$

where $\sigma$ represents the Sigmoid activation function; $S_k^{init}$ and $S_k^{init\prime}$ represent the initial saliency feature of the k-th layer and the initial saliency prediction map of the k-th layer in the initial saliency prediction branch respectively; $S_k$ and $S_k'$ represent the edge-refined saliency feature of the k-th layer and the edge-refined saliency prediction map of the k-th layer in the edge refinement saliency prediction module respectively; $Y$ represents the input scribble annotation map, $U$ represents the scribble region in $Y$, and $(i,j) \in U$ represents the pixels located in the scribble region; $\log$ represents the logarithm function; $S_{final}$ represents the final saliency prediction result map; $E_k$ and $E_k'$ represent the edge feature of the k-th layer and the edge map of the k-th layer in the edge detection branch respectively; and $E$ represents the input edge map;

in the smoothing losses, $\Delta$ represents the derivative, and $\Delta S_k^{init\prime}[i,j]$, $\Delta S_k'[i,j]$, $\Delta I[i,j]$, $\Delta G[i,j]$ and $\Delta S_{final}[i,j]$ represent the derivatives of the initial saliency prediction map of the k-th layer, the edge-refined saliency prediction map of the k-th layer, the color image, the depth map and the final saliency prediction result map respectively; $|\cdot|$ represents taking the absolute value, $e$ is a constant and $\alpha$ is a fixed parameter; the smoothing losses also use an auxiliary function that is defined so as to avoid the result being 0;

$[i,j]$ denotes the pixel in the i-th row and j-th column of an image, so that $Y[i,j]$, $S_k^{init\prime}[i,j]$, $S_k'[i,j]$, $S_{final}[i,j]$, $\Delta S_k^{init\prime}[i,j]$, $\Delta S_k'[i,j]$, $\Delta I[i,j]$, $\Delta G[i,j]$, $E[i,j]$ and $E_k'[i,j]$ represent the values of $Y$, $S_k^{init\prime}$, $S_k'$, $S_{final}$, $\Delta S_k^{init\prime}$, $\Delta S_k'$, $\Delta I$, $\Delta G$, $E$ and $E_k'$ at that pixel;