CN116524207A - Weak supervision RGBD image significance detection method based on edge detection assistance - Google Patents
- Publication number
- CN116524207A
- Application number
- CN202211575959.9A
- Authority
- CN
- China
- Prior art keywords
- edge
- layer
- saliency
- representing
- weak supervision
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention relates to a weakly supervised RGBD image saliency detection method assisted by edge detection, comprising the following steps. Step S1: build a weakly supervised RGBD image saliency detection training set containing scribble (graffiti) annotation maps, and apply data augmentation. Step S2: design a multi-level, multi-task weakly supervised RGBD image saliency detection network. Step S3: design a fusion module. Step S4: design the edge-detection-assisted weakly supervised RGBD image saliency detection network, and design a loss function to optimize the network parameters. Step S5: input the RGBD image to be detected into the trained edge-detection-assisted weakly supervised RGBD image saliency detection model to obtain the saliency detection result. This technical scheme achieves weakly supervised RGBD image saliency detection with good performance.
Description
Technical Field
The invention relates to the technical field of image processing and computer vision, and in particular to a weakly supervised RGBD image saliency detection method assisted by edge detection.
Background
Salient object detection is an important research topic in computer vision. It aims to simulate the human visual perception system in order to find the most attention-grabbing object in an image and segment it at the pixel level. As a fundamental image processing problem, it plays a key role in tasks such as object detection, semantic segmentation, video tracking, and image understanding.
With the development of convolutional neural networks, many deep-learning-based image saliency detection methods have been proposed, and they perform much better than traditional methods. However, deep learning needs a large amount of training data, and the pixel-by-pixel labels required by fully supervised saliency detection models are expensive to obtain. Weakly supervised image saliency detection has therefore become a research direction actively explored by many researchers.
Weakly supervised image saliency detection trains a model on incomplete, weak labels and then relies on the generalization ability of the model to infer complete salient objects. Common weak labels include noisy labels, image-level labels, bounding boxes, and scribble (graffiti) labels. Compared with pixel-by-pixel labels, these low-cost labels do not provide the complete structural details of the salient object, which makes it harder for the saliency detection network to recover fine salient object edges. Most current methods introduce traditional unsupervised saliency detection methods, image classification tasks, or edge detection tasks as auxiliaries that help determine the position and edges of the salient object. However, when only the color and texture features of the color image are available, the edge localization problem that is already hard for fully supervised saliency detection in complex scenes becomes even harder under weak supervision. Weakly supervised RGBD image saliency detection introduces a depth map and uses the rich structural and positional information it contains as a supplement, which improves salient object detection in complex scenes. Introducing the depth map also brings new problems, such as cross-modal conflict between the color image and the depth map, coarse edges in the depth map, and noise from low-quality depth maps.
Disclosure of Invention
In view of the above, the present invention aims to provide a weakly supervised RGBD image saliency detection method assisted by edge detection, which achieves weakly supervised RGBD image saliency detection with better performance.
To achieve this purpose, the invention adopts the following technical scheme. The weakly supervised RGBD image saliency detection method assisted by edge detection comprises the following steps:
Step S1: build a weakly supervised RGBD image saliency detection training set containing scribble annotation maps, and apply data augmentation;
Step S2: design a multi-level, multi-task weakly supervised RGBD image saliency detection network, and use it to obtain multi-scale edge-refined saliency prediction results;
Step S3: design a fusion module, and use it to fuse the multi-scale edge-refined saliency prediction results into the final saliency prediction result;
Step S4: design the edge-detection-assisted weakly supervised RGBD image saliency detection network, and design a loss function to optimize the network parameters, obtaining the trained edge-detection-assisted weakly supervised RGBD image saliency detection model;
Step S5: input the RGBD image to be detected into the trained edge-detection-assisted weakly supervised RGBD image saliency detection model to obtain the saliency detection result.
In a preferred embodiment, the step S1 specifically includes:
Step S11: divide the data set into a training set and a test set according to a certain ratio;
Step S12: for the training set, use the brush tool in the "Adobe Photoshop 2020" software to draw scribbles on each group of RGBD images. Specifically, black strokes mark part of the salient foreground region, white strokes mark part of the background region, and regions without strokes are represented in gray;
Step S13: apply data augmentation to the images in the training set. The specific operations include adding noise, random cropping, and flipping. The color images and depth maps of each group of RGBD images in the training set and the test set are also normalized to highlight the foreground region.
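The color coding of step S12 can be turned into a label map and a mask of labeled pixels before training. The following is a minimal sketch, not the patent's implementation. It assumes an 8-bit grayscale scribble map in which pure black (0) is foreground and pure white (255) is background, and it treats every other value as unlabeled. The function name `decode_scribble` and the exact gray levels are illustrative assumptions.

```python
# Assumed gray levels: the text only states black, white and gray.
FOREGROUND_VALUE = 0      # black strokes mark part of the salient foreground
BACKGROUND_VALUE = 255    # white strokes mark part of the background


def decode_scribble(scribble):
    """Split a grayscale scribble map into a label map and a labeled-pixel mask.

    labels[i][j] is 1 for foreground strokes and 0 otherwise.
    mask[i][j] is 1 where a stroke exists and 0 for unlabeled (gray) pixels.
    """
    labels, mask = [], []
    for row in scribble:
        label_row, mask_row = [], []
        for value in row:
            if value == FOREGROUND_VALUE:
                label_row.append(1)
                mask_row.append(1)
            elif value == BACKGROUND_VALUE:
                label_row.append(0)
                mask_row.append(1)
            else:  # any gray value: no supervision at this pixel
                label_row.append(0)
                mask_row.append(0)
        labels.append(label_row)
        mask.append(mask_row)
    return labels, mask


# Toy 2x3 scribble map: one foreground pixel, two background pixels.
scribble = [[0, 128, 255],
            [128, 128, 255]]
labels, mask = decode_scribble(scribble)
```

Keeping the mask separate from the labels lets a later loss ignore the gray pixels entirely instead of treating them as background.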
In a preferred embodiment, the step S2 specifically includes:
Step S21: First, the color image and the depth map are fed into two separate VGG16 networks. The features extracted by the five convolution layers Conv1, Conv2, Conv3, Conv4 and Conv5 and by the pooling layer Pool5 are taken as six levels of color image features $F_k^{rgb}$ and six levels of depth map features $F_k^{d}$, with $k = 1, \dots, 6$.
Step S22: Design the initial saliency prediction branch. At each of the 6 levels, the color image feature $F_k^{rgb}$ and the depth map feature $F_k^{d}$ are concatenated, and the concatenated feature is sent to the cross-modal feature fusion module CFF, which fuses the color image features with the depth map features. The CFF module consists of a 3×3 convolution layer, channel attention, spatial attention and a 3×3 convolution layer connected in series. Finally, the fused feature is reduced to 1 dimension by a convolution layer with a kernel size of 1. The process is expressed as:
$$S_k^{init} = \mathrm{Conv}_{1\times1}\left(F_{CFF}\left(\mathrm{Cat}\left(F_k^{rgb}, F_k^{d}\right)\right)\right)$$
where $S_k^{init}$ denotes the initial saliency feature of the k-th layer, $F_k^{rgb}$ and $F_k^{d}$ denote the color image feature and the depth map feature of the k-th layer, $\mathrm{Cat}$ denotes the concatenation operation, $F_{CFF}$ denotes the cross-modal feature fusion module in the initial saliency prediction branch, and $\mathrm{Conv}_{1\times1}$ denotes a convolution layer with a kernel size of 1;
Step S23: Design the edge detection branch. The edge feature $E_k$ is obtained in the same way as in the initial saliency prediction branch:
$$E_k = \mathrm{Conv}_{1\times1}\left(F'_{CFF}\left(\mathrm{Cat}\left(F_k^{rgb}, F_k^{d}\right)\right)\right)$$
where $E_k$ denotes the edge feature of the k-th layer, $F_k^{rgb}$ and $F_k^{d}$ denote the color image feature and the depth map feature of the k-th layer, $\mathrm{Cat}$ denotes the concatenation operation, $F'_{CFF}$ denotes the cross-modal feature fusion module in the edge detection branch, and $\mathrm{Conv}_{1\times1}$ denotes a convolution layer with a kernel size of 1.
Step S24: Design the edge refinement saliency prediction module. At each of the 6 levels, the initial saliency feature $S_k^{init}$ and the edge feature $E_k$ are first concatenated, and the concatenated feature is then reduced to 1 dimension by a convolution layer with a kernel size of 1:
$$S_k = \mathrm{Conv}_{1\times1}\left(\mathrm{Cat}\left(S_k^{init}, E_k\right)\right)$$
where $S_k$ denotes the edge-refined saliency feature of the k-th layer, $S_k^{init}$ and $E_k$ denote the initial saliency feature and the edge feature of the k-th layer, $\mathrm{Cat}$ denotes the concatenation operation, and $\mathrm{Conv}_{1\times1}$ denotes a convolution layer with a kernel size of 1.
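The edge refinement of step S24 concatenates two maps and applies a convolution with a kernel size of 1. As a minimal sketch, assuming both inputs are single-channel maps of equal size, such a convolution reduces to a per-pixel weighted sum across the stacked maps. The function names and the weight values below are illustrative assumptions. In a trained network the weights would be learned.

```python
def conv1x1_to_single_map(channels, weights, bias=0.0):
    """Kernel-size-1 convolution from C stacked maps to one map.

    Each output pixel is the weighted sum of the C input values at that pixel.
    """
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [sum(w * ch[i][j] for w, ch in zip(weights, channels)) + bias
         for j in range(width)]
        for i in range(height)
    ]


def refine_with_edges(initial_saliency, edge_feature, weights, bias=0.0):
    """Concatenate the two maps along the channel axis, then mix them per pixel."""
    stacked = [initial_saliency, edge_feature]   # the concatenation step
    return conv1x1_to_single_map(stacked, weights, bias)


# Toy 2x2 maps; the weights are hand-picked for illustration only.
initial = [[0.2, 0.8], [0.4, 0.6]]
edges = [[1.0, 0.0], [0.0, 1.0]]
refined = refine_with_edges(initial, edges, weights=[1.0, 0.5])
```

Pixels where the edge map is active are pushed up by the edge weight, while the others keep their initial saliency value.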
In a preferred embodiment, the step S3 specifically includes:
Step S31: Design the fusion module, which integrates deep features into shallow features layer by layer. The specific process is expressed as follows:
$$S_{final} = \sigma\left(\mathrm{Conv}_{3\times3}(H_1)\right)$$
where $H_k$ denotes the aggregated feature of the k-th layer, $S_k$ denotes the edge-refined saliency feature of the k-th layer, $F_{up}$ denotes upsampling, $\mathrm{Conv}_{3\times3}$ denotes a convolution layer with a kernel size of 3, $\sigma$ denotes the Sigmoid activation function, and $S_{final}$ denotes the final saliency prediction result.
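The layer-by-layer integration of deep features into shallow features can be illustrated with a small top-down loop. This is a sketch under stated assumptions rather than the patent's fusion module. The aggregation rule used here (the deepest map starts the aggregate, and each shallower map is added to the 2x nearest-neighbor upsampled aggregate) is assumed, and the 3×3 convolution is omitted for brevity. Only the final Sigmoid follows the text directly.

```python
import math


def upsample_nearest_2x(feature):
    """Double height and width by repeating every value (nearest neighbor)."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_top_down(levels):
    """Fuse maps from deep to shallow; levels[0] is the shallowest (largest).

    Assumed rule: aggregate = shallow map + upsampled deeper aggregate.
    """
    aggregate = levels[-1]
    for shallow in reversed(levels[:-1]):
        upsampled = upsample_nearest_2x(aggregate)
        aggregate = [
            [s + u for s, u in zip(s_row, u_row)]
            for s_row, u_row in zip(shallow, upsampled)
        ]
    # Sigmoid turns the aggregated map into values in (0, 1).
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in aggregate]


# Three toy levels with sizes 4x4, 2x2 and 1x1.
levels = [
    [[0.0, 0.0, 0.0, 0.0] for _ in range(4)],
    [[1.0, -1.0], [-1.0, 1.0]],
    [[2.0]],
]
s_final = fuse_top_down(levels)
```

The output has the resolution of the shallowest level, which is why the aggregation runs from the deepest map upward.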
In a preferred embodiment, the step S4 is specifically:
Step S41: Combine the multi-level, multi-task weakly supervised RGBD image saliency detection network designed in step S2 with the fusion module designed in step S3 to obtain the edge-detection-assisted weakly supervised RGBD image saliency detection network;
Step S42: Design the loss function of the edge-detection-assisted weakly supervised RGBD image saliency detection network as follows:
where $L$ denotes the total training loss, $\Sigma$ denotes summation, and $k \in \{1, \dots, 6\}$ indexes the levels. Three partial cross-entropy losses act on the k-th layer of the initial saliency prediction branch, the k-th layer of the edge refinement saliency prediction module, and the final saliency prediction result, respectively. Three smoothness losses act on the same three outputs, respectively. A cross-entropy loss acts on the k-th layer of the edge detection branch. The specific calculation formulas of these losses are as follows:
$$S'_k = \sigma(S_k)$$
$$E'_k = \sigma(E_k)$$
where $\sigma$ denotes the Sigmoid activation function; $S_k^{init}$ and $S_k^{init\prime}$ denote the initial saliency feature and the initial saliency prediction map of the k-th layer in the initial saliency prediction branch; $S_k$ and $S'_k$ denote the edge-refined saliency feature and the edge-refined saliency prediction map of the k-th layer in the edge refinement saliency prediction module; $Y$ denotes the input scribble annotation map; $U$ denotes the scribbled region of $Y$, and $(i, j) \in U$ denotes a pixel located in that region; $\log$ denotes the logarithm; $S_{final}$ denotes the final saliency prediction map; $\Delta$ denotes the derivative, so that $\Delta S_k^{init\prime}[i,j]$, $\Delta S'_k[i,j]$, $\Delta I[i,j]$, $\Delta G[i,j]$ and $\Delta S_{final}[i,j]$ denote the derivatives of the initial saliency prediction map of the k-th layer, the edge-refined saliency prediction map of the k-th layer, the color image, the depth map and the final saliency prediction map; $|\cdot|$ denotes the absolute value; $e$ is a constant; $\alpha$ is a fixed parameter; the penalty function used in the smoothness losses is defined so that its result is never 0; $E_k$ and $E'_k$ denote the edge feature and the predicted edge map of the k-th layer in the edge detection branch; $E$ denotes the input edge map; and $[i, j]$ denotes the pixel in the i-th row and j-th column, so that $Y[i,j]$, $S_k^{init\prime}[i,j]$, $S'_k[i,j]$, $S_{final}[i,j]$, $E[i,j]$ and $E'_k[i,j]$ denote the values of the corresponding maps at that pixel;
Step S43: Repeat steps S2 to S4 batch by batch until the loss value computed in step S4 converges and stabilizes, then save the network parameters. This completes the training of the edge-detection-assisted weakly supervised RGBD image saliency detection network and yields the trained edge-detection-assisted weakly supervised RGBD image saliency detection model.
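A partial cross-entropy loss evaluates binary cross-entropy only at the pixels covered by scribbles and ignores every unlabeled pixel. The sketch below illustrates that idea in plain Python. Averaging over the labeled pixels and clamping predictions with `eps` are assumptions of this sketch, since the normalization of the loss is not recoverable from the text. The name `partial_cross_entropy` is illustrative.

```python
import math


def partial_cross_entropy(prediction, labels, mask, eps=1e-7):
    """Binary cross-entropy restricted to labeled pixels (mask == 1).

    prediction: values in (0, 1), e.g. after a Sigmoid.
    labels:     1 for foreground strokes, 0 for background strokes.
    mask:       1 inside the scribbled region, 0 elsewhere.
    """
    total, count = 0.0, 0
    for p_row, y_row, m_row in zip(prediction, labels, mask):
        for p, y, m in zip(p_row, y_row, m_row):
            if not m:
                continue  # unlabeled pixel: contributes nothing
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
            count += 1
    return total / count if count else 0.0


prediction = [[0.9, 0.5], [0.1, 0.5]]
labels = [[1, 0], [0, 0]]
mask = [[1, 0], [1, 0]]
loss = partial_cross_entropy(prediction, labels, mask)

# Changing predictions at unlabeled pixels leaves the loss unchanged.
other = partial_cross_entropy([[0.9, 0.01], [0.1, 0.99]], labels, mask)
```

Because unlabeled pixels carry no gradient from this term, other terms such as the smoothness losses and the edge branch are what propagate information into the unlabeled region.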
Compared with the prior art, the invention has the following beneficial effects: it makes full use of the advantages of combining the color image with the depth map, avoids the problems introduced by the depth map, and achieves weakly supervised RGBD image saliency detection with better performance.
Drawings
Fig. 1 is a flow chart of an implementation of a preferred embodiment of the present invention.
FIG. 2 is an example of a set of RGBD images and their corresponding graffiti labels in a preferred embodiment of the present invention.
Fig. 3 is a diagram showing a network model structure in the preferred embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application; as used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The invention provides a weakly supervised RGBD image saliency detection method assisted by edge detection which, as shown in Fig. 1, comprises the following steps:
Step S1: build a weakly supervised RGBD image saliency detection training set containing scribble annotation maps, and apply data augmentation;
Step S2: design a multi-level, multi-task weakly supervised RGBD image saliency detection network, and use it to obtain multi-scale edge-refined saliency prediction results;
Step S3: design a fusion module, and use it to fuse the multi-scale edge-refined saliency prediction results into the final saliency prediction result;
Step S4: design the edge-detection-assisted weakly supervised RGBD image saliency detection network, and design a loss function to optimize the network parameters, obtaining the trained edge-detection-assisted weakly supervised RGBD image saliency detection model;
Step S5: input the RGBD image to be detected into the trained edge-detection-assisted weakly supervised RGBD image saliency detection model to obtain the saliency detection result.
Further, the step S1 specifically includes the following steps:
Step S11: divide the data set into a training set and a test set according to a certain ratio;
Step S12: for the training set, use the brush tool in the "Adobe Photoshop 2020" software to draw scribbles on each group of RGBD images. Specifically, black strokes mark part of the salient foreground region, white strokes mark part of the background region, and regions without strokes are represented in gray;
Step S13: apply data augmentation to the images in the training set. The specific operations include adding noise, random cropping, and flipping. The color images and depth maps of each group of RGBD images in the training set and the test set are also normalized to highlight the foreground region.
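When geometric augmentation such as flipping and cropping is applied to RGBD training data, the color image, the depth map and the annotation map generally need the same transform so that they stay pixel-aligned. The following sketch illustrates this with single-channel toy maps. The function names, the 0.5 flip probability and the square crop are assumptions of the sketch, and noise injection is left out.

```python
import random


def flip_horizontal(image):
    """Mirror every row of a 2D map."""
    return [list(reversed(row)) for row in image]


def crop(image, top, left, height, width):
    """Cut a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def augment_group(color, depth, scribble, crop_size, rng):
    """Apply one random flip and one random crop identically to all three maps."""
    maps = [color, depth, scribble]
    if rng.random() < 0.5:                      # assumed flip probability
        maps = [flip_horizontal(m) for m in maps]
    height, width = len(color), len(color[0])
    top = rng.randint(0, height - crop_size)    # same window for every map
    left = rng.randint(0, width - crop_size)
    return [crop(m, top, left, crop_size, crop_size) for m in maps]


# Toy 4x4 maps whose values differ by fixed offsets, so alignment is checkable.
color = [[r * 4 + c for c in range(4)] for r in range(4)]
depth = [[v + 100 for v in row] for row in color]
scribble = [[v + 200 for v in row] for row in color]
color_out, depth_out, scribble_out = augment_group(
    color, depth, scribble, crop_size=2, rng=random.Random(0)
)
```

Drawing the random parameters once and reusing them for all three maps is what keeps the scribble strokes on the pixels they originally marked.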
Further, the step S2 specifically includes the following steps:
Step S21: First, the color image and the depth map are fed into two separate VGG16 networks. The features extracted by the five convolution layers Conv1, Conv2, Conv3, Conv4 and Conv5 and by the pooling layer Pool5 are taken as six levels of color image features $F_k^{rgb}$ and six levels of depth map features $F_k^{d}$, with $k = 1, \dots, 6$.
Step S22: Design the initial saliency prediction branch. At each of the 6 levels, the color image feature $F_k^{rgb}$ and the depth map feature $F_k^{d}$ are concatenated, and the concatenated feature is sent to the cross-modal feature fusion module CFF, which fuses the color image features with the depth map features. The CFF module consists of a 3×3 convolution layer, channel attention, spatial attention and a 3×3 convolution layer connected in series. Finally, the fused feature is reduced to 1 dimension by a convolution layer with a kernel size of 1. The process is expressed as:
$$S_k^{init} = \mathrm{Conv}_{1\times1}\left(F_{CFF}\left(\mathrm{Cat}\left(F_k^{rgb}, F_k^{d}\right)\right)\right)$$
where $S_k^{init}$ denotes the initial saliency feature of the k-th layer, $F_k^{rgb}$ and $F_k^{d}$ denote the color image feature and the depth map feature of the k-th layer, $\mathrm{Cat}$ denotes the concatenation operation, $F_{CFF}$ denotes the cross-modal feature fusion module in the initial saliency prediction branch, and $\mathrm{Conv}_{1\times1}$ denotes a convolution layer with a kernel size of 1;
Step S23: Design the edge detection branch. The edge feature $E_k$ is obtained in the same way as in the initial saliency prediction branch:
$$E_k = \mathrm{Conv}_{1\times1}\left(F'_{CFF}\left(\mathrm{Cat}\left(F_k^{rgb}, F_k^{d}\right)\right)\right)$$
where $E_k$ denotes the edge feature of the k-th layer, $F_k^{rgb}$ and $F_k^{d}$ denote the color image feature and the depth map feature of the k-th layer, $\mathrm{Cat}$ denotes the concatenation operation, $F'_{CFF}$ denotes the cross-modal feature fusion module in the edge detection branch, and $\mathrm{Conv}_{1\times1}$ denotes a convolution layer with a kernel size of 1.
Step S24: Design the edge refinement saliency prediction module. At each of the 6 levels, the initial saliency feature $S_k^{init}$ and the edge feature $E_k$ are first concatenated, and the concatenated feature is then reduced to 1 dimension by a convolution layer with a kernel size of 1:
$$S_k = \mathrm{Conv}_{1\times1}\left(\mathrm{Cat}\left(S_k^{init}, E_k\right)\right)$$
where $S_k$ denotes the edge-refined saliency feature of the k-th layer, $S_k^{init}$ and $E_k$ denote the initial saliency feature and the edge feature of the k-th layer, $\mathrm{Cat}$ denotes the concatenation operation, and $\mathrm{Conv}_{1\times1}$ denotes a convolution layer with a kernel size of 1.
Further, the step S3 specifically includes the following steps:
Step S31: Design the fusion module, which integrates deep features into shallow features layer by layer. The specific process is expressed as follows:
$$S_{final} = \sigma\left(\mathrm{Conv}_{3\times3}(H_1)\right)$$
where $H_k$ denotes the aggregated feature of the k-th layer, $S_k$ denotes the edge-refined saliency feature of the k-th layer, $F_{up}$ denotes upsampling, $\mathrm{Conv}_{3\times3}$ denotes a convolution layer with a kernel size of 3, $\sigma$ denotes the Sigmoid activation function, and $S_{final}$ denotes the final saliency prediction result.
Further, the step S4 specifically includes the following steps:
Step S41: Combine the multi-level, multi-task weakly supervised RGBD image saliency detection network designed in step S2 with the fusion module designed in step S3 to obtain the edge-detection-assisted weakly supervised RGBD image saliency detection network;
Step S42: Design the loss function of the edge-detection-assisted weakly supervised RGBD image saliency detection network as follows:
where $L$ denotes the total training loss, $\Sigma$ denotes summation, and $k \in \{1, \dots, 6\}$ indexes the levels. Three partial cross-entropy losses act on the k-th layer of the initial saliency prediction branch, the k-th layer of the edge refinement saliency prediction module, and the final saliency prediction result, respectively. Three smoothness losses act on the same three outputs, respectively. A cross-entropy loss acts on the k-th layer of the edge detection branch. The specific calculation formulas of these losses are as follows:
$$S'_k = \sigma(S_k)$$
$$E'_k = \sigma(E_k)$$
where $\sigma$ denotes the Sigmoid activation function; $S_k^{init}$ and $S_k^{init\prime}$ denote the initial saliency feature and the initial saliency prediction map of the k-th layer in the initial saliency prediction branch; $S_k$ and $S'_k$ denote the edge-refined saliency feature and the edge-refined saliency prediction map of the k-th layer in the edge refinement saliency prediction module; $Y$ denotes the input scribble annotation map; $U$ denotes the scribbled region of $Y$, and $(i, j) \in U$ denotes a pixel located in that region; $\log$ denotes the logarithm; $S_{final}$ denotes the final saliency prediction map; $\Delta$ denotes the derivative, so that $\Delta S_k^{init\prime}[i,j]$, $\Delta S'_k[i,j]$, $\Delta I[i,j]$, $\Delta G[i,j]$ and $\Delta S_{final}[i,j]$ denote the derivatives of the initial saliency prediction map of the k-th layer, the edge-refined saliency prediction map of the k-th layer, the color image, the depth map and the final saliency prediction map; $|\cdot|$ denotes the absolute value; $e$ is a constant; $\alpha$ is a fixed parameter; the penalty function used in the smoothness losses is defined so that its result is never 0; $E_k$ and $E'_k$ denote the edge feature and the predicted edge map of the k-th layer in the edge detection branch; $E$ denotes the input edge map; and $[i, j]$ denotes the pixel in the i-th row and j-th column, so that $Y[i,j]$, $S_k^{init\prime}[i,j]$, $S'_k[i,j]$, $S_{final}[i,j]$, $E[i,j]$ and $E'_k[i,j]$ denote the values of the corresponding maps at that pixel;
Step S43: Repeat steps S2 to S4 batch by batch until the loss value computed in step S4 converges and stabilizes, then save the network parameters. This completes the training of the edge-detection-assisted weakly supervised RGBD image saliency detection network and yields the trained edge-detection-assisted weakly supervised RGBD image saliency detection model.
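The smoothness losses are described through derivatives of the prediction and of the input images, an absolute value, an exponential with a fixed parameter α, and a penalty that never returns 0. One common edge-aware form that matches this description is sketched below. The exact formula is an assumption of this sketch, not a reconstruction of the patent's equation: saliency gradients are down-weighted by exp(-α·|guide gradient|) and passed through sqrt(x² + 1e-6). The value α = 10, the use of a single guide map, and the function names are likewise illustrative.

```python
import math


def robust_penalty(value, eps=1e-6):
    """sqrt(value^2 + eps): stays strictly positive even when value is 0."""
    return math.sqrt(value * value + eps)


def smoothness_loss(saliency, guide, alpha=10.0):
    """Edge-aware smoothness over horizontal and vertical neighbor pairs.

    A saliency change is penalized less where the guide image (for example
    the color image or the depth map) also changes.
    """
    height, width = len(saliency), len(saliency[0])
    total, count = 0.0, 0
    for i in range(height):
        for j in range(width):
            for di, dj in ((0, 1), (1, 0)):      # right and down neighbors
                ni, nj = i + di, j + dj
                if ni >= height or nj >= width:
                    continue
                d_sal = abs(saliency[ni][nj] - saliency[i][j])
                d_guide = abs(guide[ni][nj] - guide[i][j])
                total += robust_penalty(d_sal * math.exp(-alpha * d_guide))
                count += 1
    return total / count


guide = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]
aligned = [[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]      # boundary on the guide edge
misaligned = [[0.0, 1.0, 1.0, 1.0], [0.0, 1.0, 1.0, 1.0]]   # boundary in a flat area
loss_aligned = smoothness_loss(aligned, guide)
loss_misaligned = smoothness_loss(misaligned, guide)
```

In this toy case the prediction whose boundary coincides with the guide edge receives the smaller loss, which is the behavior an edge-aware smoothness term is meant to encourage.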
The above is a preferred embodiment of the present invention. All changes made according to the technical solution of the present invention fall within the protection scope of the present invention, provided that the resulting functional effects do not go beyond the scope of the technical solution.
Claims (5)
1. A weakly supervised RGBD image saliency detection method assisted by edge detection, characterized by comprising the following steps:
Step S1: build a weakly supervised RGBD image saliency detection training set containing scribble annotation maps, and apply data augmentation;
Step S2: design a multi-level, multi-task weakly supervised RGBD image saliency detection network, and use it to obtain multi-scale edge-refined saliency prediction results;
Step S3: design a fusion module, and use it to fuse the multi-scale edge-refined saliency prediction results into the final saliency prediction result;
Step S4: design the edge-detection-assisted weakly supervised RGBD image saliency detection network, and design a loss function to optimize the network parameters, obtaining the trained edge-detection-assisted weakly supervised RGBD image saliency detection model;
Step S5: input the RGBD image to be detected into the trained edge-detection-assisted weakly supervised RGBD image saliency detection model to obtain the saliency detection result.
2. The weakly supervised RGBD image saliency detection method assisted by edge detection according to claim 1, wherein step S1 specifically comprises:
Step S11: divide the data set into a training set and a test set according to a certain ratio;
Step S12: for the training set, use the brush tool in the "Adobe Photoshop 2020" software to draw scribbles on each group of RGBD images; specifically, black strokes mark part of the salient foreground region, white strokes mark part of the background region, and regions without strokes are represented in gray;
Step S13: apply data augmentation to the images in the training set, the specific operations including adding noise, random cropping, and flipping; and normalize the color images and depth maps of each group of RGBD images in the training set and the test set to highlight the foreground region.
3. The weak supervision RGBD image saliency detection method based on the edge detection assistance according to claim 1, wherein the step S2 specifically comprises:
step S21: firstly, respectively inputting a color image and a depth image into two VGG16 networks, and then respectively taking 6 layers of features extracted by 5 convolution layers Conv1, conv2, conv3, conv4 and Conv5 and pooling layer Pool5 as multi-layer color image featuresAnd multi-level depth map feature->
step S22: designing the initial saliency prediction branch; at each of the 6 levels, the color image feature $F_k^{rgb}$ and the depth map feature $F_k^{d}$ are first concatenated; the concatenated feature is then fed into a cross-modal feature fusion module (CFF) to fuse the color image feature and the depth map feature; the cross-modal feature fusion module consists of a 3×3 convolution layer, channel attention, spatial attention, and a 3×3 convolution layer connected in series; finally, the fused feature is reduced to 1 channel by a convolution layer with a kernel size of 1; the process is formulated as follows:
$$S_k^{init} = \mathrm{Conv}_{1\times1}\big(F_{CFF}(\mathrm{Cat}(F_k^{rgb}, F_k^{d}))\big)$$
wherein $S_k^{init}$ denotes the initial saliency feature of the k-th layer, $F_k^{rgb}$ and $F_k^{d}$ denote the color image feature and the depth map feature of the k-th layer, respectively, $\mathrm{Cat}(\cdot,\cdot)$ denotes the concatenation operation, $F_{CFF}$ denotes the cross-modal feature fusion module in the initial saliency prediction branch, and $\mathrm{Conv}_{1\times1}$ denotes a convolution layer with a kernel size of 1;
step S23: designing the edge detection branch; the edge feature $E_k$ is obtained by the same procedure as in the initial saliency prediction branch, formulated as follows:
$$E_k = \mathrm{Conv}_{1\times1}\big(F_{CFF}'(\mathrm{Cat}(F_k^{rgb}, F_k^{d}))\big)$$
wherein $E_k$ denotes the edge feature of the k-th layer, $F_k^{rgb}$ and $F_k^{d}$ denote the color image feature and the depth map feature of the k-th layer, respectively, $\mathrm{Cat}(\cdot,\cdot)$ denotes the concatenation operation, $F_{CFF}'$ denotes the cross-modal feature fusion module in the edge detection branch, and $\mathrm{Conv}_{1\times1}$ denotes a convolution layer with a kernel size of 1;
step S24: designing the edge refinement saliency prediction module; at each of the 6 levels, the initial saliency feature $S_k^{init}$ and the edge feature $E_k$ are first concatenated, and the concatenated feature is then reduced to 1 channel by a convolution layer with a kernel size of 1, formulated as follows:
$$S_k = \mathrm{Conv}_{1\times1}\big(\mathrm{Cat}(S_k^{init}, E_k)\big)$$
wherein $S_k$ denotes the edge-refined saliency feature of the k-th layer, $S_k^{init}$ and $E_k$ denote the initial saliency feature and the edge feature of the k-th layer, respectively, $\mathrm{Cat}(\cdot,\cdot)$ denotes the concatenation operation, and $\mathrm{Conv}_{1\times1}$ denotes a convolution layer with a kernel size of 1.
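The data flow of steps S22 to S24 at a single level can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim only names the components of the CFF module (3×3 convolution, channel attention, spatial attention, 3×3 convolution), so the parameter-free attention gates used here are stand-ins, and the names `cff`, `predict_level` and the `params` dictionary are hypothetical. Feature maps are laid out as `(channels, height, width)`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, weight):
    """1x1 convolution on a (C_in, H, W) map; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def conv3x3(x, weight):
    """3x3 convolution with zero padding; weight has shape (C_out, C_in, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx],
                             padded[:, dy:dy + h, dx:dx + w])
    return out

def cff(x, w_in, w_out):
    """Stand-in for the CFF module: conv3x3 -> channel attention ->
    spatial attention -> conv3x3. The attention forms are assumptions."""
    x = conv3x3(x, w_in)
    channel_gate = sigmoid(x.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    x = x * channel_gate
    spatial_gate = sigmoid(x.mean(axis=0, keepdims=True))       # (1, H, W)
    x = x * spatial_gate
    return conv3x3(x, w_out)

def predict_level(f_rgb, f_d, params):
    """Return (initial saliency, edge, edge-refined saliency) for one level,
    each of shape (1, H, W)."""
    joined = np.concatenate([f_rgb, f_d], axis=0)  # concatenate along channels
    s_init = conv1x1(cff(joined, *params["sal_cff"]), params["sal_1x1"])
    edge = conv1x1(cff(joined, *params["edge_cff"]), params["edge_1x1"])
    s = conv1x1(np.concatenate([s_init, edge], axis=0), params["refine_1x1"])
    return s_init, edge, s
```

The two branches share the concatenated input but use separate CFF weights, which mirrors the claim's distinction between the fusion module of the saliency branch and that of the edge branch.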
4. The weakly supervised RGBD image saliency detection method based on edge detection assistance according to claim 1, wherein step S3 specifically comprises:
step S31: designing a fusion module that integrates deep features into shallow features layer by layer; the specific process is expressed as follows:
$$S_{final} = \sigma\big(\mathrm{Conv}_{3\times3}(H_1)\big)$$
wherein $H_k$ denotes the aggregated feature of the k-th layer, $S_k$ denotes the edge-refined saliency feature of the k-th layer, $F_{up}$ denotes upsampling, $\mathrm{Conv}_{3\times3}$ denotes a convolution layer with a kernel size of 3, $\sigma$ denotes the Sigmoid activation function, and $S_{final}$ denotes the final saliency prediction result.
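The layer-by-layer aggregation of step S31 can be sketched as below. The text gives only the last step, S_final = σ(Conv 3×3 (H_1)), and does not state how H_k is built from S_k and the deeper aggregate, so this sketch assumes the simplest top-down rule: the deepest level starts the aggregation, and each shallower level adds the upsampled deeper aggregate to its own edge-refined saliency feature. The additive rule, nearest-neighbor upsampling, and the single-channel kernel are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def upsample_to(x, shape):
    """Nearest-neighbor resize of a 2-D map to `shape` (stand-in for F_up)."""
    rows = np.arange(shape[0]) * x.shape[0] // shape[0]
    cols = np.arange(shape[1]) * x.shape[1] // shape[1]
    return x[rows][:, cols]

def conv3x3_single(x, kernel):
    """3x3 convolution of a 2-D map with zero padding."""
    h, w = x.shape
    padded = np.pad(x, 1)
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def fuse(saliency_levels, kernel):
    """saliency_levels[0] is S_1 (shallowest, largest); the last is the deepest.

    Assumed rule: H_K = S_K and H_k = S_k + F_up(H_{k+1}).
    Returns S_final = sigmoid(Conv3x3(H_1)), as stated in the claim.
    """
    h = saliency_levels[-1]
    for s_k in reversed(saliency_levels[:-1]):
        h = s_k + upsample_to(h, s_k.shape)
    return 1.0 / (1.0 + np.exp(-conv3x3_single(h, kernel)))
```

Whatever combination rule is used, the aggregation has to run from the deepest level to the shallowest, so that H_1 carries information from every level at the highest resolution.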
5. The weakly supervised RGBD image saliency detection method based on edge detection assistance according to claim 1, wherein step S4 specifically comprises:
step S41: combining the multi-level, multi-task weakly supervised RGBD image saliency detection network designed in step S2 with the fusion module designed in step S3 to obtain the weakly supervised RGBD image saliency detection network based on edge detection assistance;
step S42: designing the loss function of the weakly supervised RGBD image saliency detection network based on edge detection assistance as follows:
wherein $L$ denotes the overall training loss, $\Sigma$ denotes summation, and $k \in \{1, \dots, 6\}$; the three partial cross-entropy loss terms act on the k-th layer of the initial saliency prediction branch, the k-th layer of the edge refinement saliency prediction module, and the final saliency prediction result, respectively; the three smoothness loss terms act on the k-th layer of the initial saliency prediction branch, the k-th layer of the edge refinement saliency prediction module, and the final saliency prediction result, respectively; and the cross-entropy loss term acts on the k-th layer of the edge detection branch; the specific calculation formulas of these loss terms are as follows:
$$S_k^{init\prime} = \sigma(S_k^{init})$$
$$S_k' = \sigma(S_k)$$
$$E_k' = \sigma(E_k)$$
wherein $\sigma$ denotes the Sigmoid activation function; $S_k^{init}$ and $S_k^{init\prime}$ denote the initial saliency feature and the initial saliency prediction map of the k-th layer in the initial saliency prediction branch, respectively; $S_k$ and $S_k'$ denote the edge-refined saliency feature and the edge-refined saliency prediction map of the k-th layer in the edge refinement saliency prediction module, respectively; $Y$ denotes the input scribble annotation map; $U$ denotes the scribbled region in the scribble annotation map $Y$, and $(i,j) \in U$ denotes a pixel located in the scribbled region; $\log$ denotes the logarithm function; $S_{final}$ denotes the final saliency prediction map; $\Delta$ denotes the derivative; $\Delta S_k^{init\prime}$, $\Delta S_k'$, $\Delta I[i,j]$, $\Delta G[i,j]$ and $\Delta S_{final}[i,j]$ denote the maps obtained by differentiating the initial saliency prediction map of the k-th layer, the edge-refined saliency prediction map of the k-th layer, the color image, the depth map, and the final saliency prediction map, respectively; $|\cdot|$ denotes the absolute value; $e$ is a constant; $\alpha$ is a fixed parameter; an auxiliary function is defined so as to avoid the result being 0; $E_k$ and $E_k'$ denote the edge feature and the edge map of the k-th layer in the edge detection branch, respectively; $E$ denotes the input edge map; $[i,j]$ denotes the pixel in the i-th row and j-th column of an image; and $Y[i,j]$, $S_k^{init\prime}[i,j]$, $S_k'[i,j]$, $S_{final}[i,j]$, $\Delta S_k^{init\prime}$, $\Delta S_k'$, $\Delta I[i,j]$, $\Delta G[i,j]$, $E[i,j]$ and $E_k'[i,j]$ denote the values of the images $Y$, $S_k^{init\prime}$, $S_k'$, $S_{final}$, $\Delta S_k^{init\prime}$, $\Delta S_k'$, $\Delta I$, $\Delta G$, $E$ and $E_k'$ at the pixel in the i-th row and j-th column;
step S43: repeating steps S2 to S4 batch by batch until the loss value computed in step S4 converges and stabilizes, then saving the network parameters, thereby completing the training of the weakly supervised RGBD image saliency detection network based on edge detection assistance and obtaining the weakly supervised RGBD image saliency detection model based on edge detection assistance.
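The loss terms named in step S42 can be sketched in NumPy as follows. The exact formulas are not available in the text, so the forms below are assumptions based on common practice for scribble supervision: a binary cross-entropy averaged over scribbled pixels only, and an edge-aware smoothness term in which prediction gradients are down-weighted by exp(-α·|gradient of the guide image|) and passed through sqrt(x² + ε) so the result is never 0. Unit weights for all terms, the value of `alpha`, the use of both the color image and the depth map as guides, and every function name are hypothetical.

```python
import numpy as np

EPS = 1e-6  # assumed small constant; keeps logs and square roots away from 0

def partial_cross_entropy(pred, label, mask):
    """Binary cross-entropy averaged over scribbled pixels only (mask == 1)."""
    pred = np.clip(pred, EPS, 1.0 - EPS)
    ce = -(label * np.log(pred) + (1.0 - label) * np.log(1.0 - pred))
    return float((ce * mask).sum() / max(mask.sum(), 1.0))

def smoothness(pred, guide, alpha=10.0):
    """Edge-aware smoothness: prediction gradients cost less where the guide
    image (gray color image or depth map) itself has strong gradients."""
    total = 0.0
    for axis in (0, 1):
        d_pred = np.abs(np.diff(pred, axis=axis))    # forward difference
        d_guide = np.abs(np.diff(guide, axis=axis))
        weighted = d_pred * np.exp(-alpha * d_guide)
        total += np.sqrt(weighted ** 2 + EPS).mean()
    return float(total)

def total_loss(init_preds, refined_preds, edge_preds, final_pred,
               label, mask, gray, depth, edge_gt, resize):
    """Sum of all terms with unit weights (the weighting is an assumption).

    `resize(x, shape)` is a caller-supplied function that brings the
    supervision maps to the resolution of each level.
    """
    def saliency_terms(pred):
        shape = pred.shape
        return (partial_cross_entropy(pred, resize(label, shape), resize(mask, shape))
                + smoothness(pred, resize(gray, shape))
                + smoothness(pred, resize(depth, shape)))

    loss = saliency_terms(final_pred)
    for s_init, s_ref, e in zip(init_preds, refined_preds, edge_preds):
        loss += saliency_terms(s_init) + saliency_terms(s_ref)
        full = np.ones(e.shape)  # the edge branch is supervised at every pixel
        loss += partial_cross_entropy(e, resize(edge_gt, e.shape), full)
    return loss
```

All predictions passed to these functions are the Sigmoid outputs (values in (0, 1)), not the raw features. The partial cross-entropy ignores unlabeled pixels entirely, which is why the smoothness terms are needed to propagate the sparse scribble labels to the rest of the image.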
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211575959.9A CN116524207A (en) | 2022-12-08 | 2022-12-08 | Weak supervision RGBD image significance detection method based on edge detection assistance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116524207A (en) | 2023-08-01 |
Family
ID=87401781
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211575959.9A Pending CN116524207A (en) | 2022-12-08 | 2022-12-08 | Weak supervision RGBD image significance detection method based on edge detection assistance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116524207A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117173394A (en) * | 2023-08-07 | 2023-12-05 | Shandong University | Weak supervision salient object detection method and system for unmanned aerial vehicle video data |
- 2022-12-08: CN application CN202211575959.9A, patent CN116524207A (en), active, Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117173394A (en) * | 2023-08-07 | 2023-12-05 | Shandong University | Weak supervision salient object detection method and system for unmanned aerial vehicle video data |
CN117173394B (en) * | 2023-08-07 | 2024-04-02 | Shandong University | Weak supervision salient object detection method and system for unmanned aerial vehicle video data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||