CN116524207A - Weak supervision RGBD image saliency detection method based on edge detection assistance - Google Patents

Weak supervision RGBD image saliency detection method based on edge detection assistance

Info

Publication number
CN116524207A
CN116524207A (application number CN202211575959.9A)
Authority
CN
China
Prior art keywords
edge
layer
saliency
representing
weak supervision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211575959.9A
Other languages
Chinese (zh)
Inventor
陈羽中
朱文婧
牛玉贞
杨立芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202211575959.9A priority Critical patent/CN116524207A/en
Publication of CN116524207A publication Critical patent/CN116524207A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention relates to a weak supervision RGBD image saliency detection method based on edge detection assistance, which comprises the following steps: step S1: establishing a weak supervision RGBD image saliency detection training set containing a graffiti annotation graph, and carrying out data enhancement; step S2: designing a multi-level and multi-task weak supervision RGBD image saliency detection network; step S3: designing a fusion module; step S4: designing a weak supervision RGBD image saliency detection network based on edge detection assistance, and designing a loss function to optimize network parameters; step S5: and inputting the RGBD image to be detected into a trained weak supervision RGBD image saliency detection model based on edge detection assistance, and obtaining a saliency detection result. By the technical scheme, weak supervision RGBD image saliency detection with good performance can be realized.

Description

Weak supervision RGBD image saliency detection method based on edge detection assistance
Technical Field
The invention relates to the technical field of image processing and computer vision, in particular to a weak supervision RGBD image saliency detection method based on edge detection assistance.
Background
Salient object detection is an important research topic in the field of computer vision. It aims to simulate the human visual perception system to find the most attention-grabbing object in an image and to segment that object at the pixel level. As a fundamental image processing problem, it plays a key role in tasks such as object detection, semantic segmentation, video tracking, and image understanding.
With the development of convolutional neural networks, many image saliency detection methods based on deep learning have been proposed, and their performance greatly exceeds that of traditional methods. However, deep learning requires a large amount of training data, and the pixel-wise annotations required by strongly supervised saliency detection models are very expensive to obtain, so weakly supervised image saliency detection has become a research direction actively explored by many researchers.
Weakly supervised image saliency detection learns from incomplete, weak-level labels and then infers complete salient objects by relying on the strong generalization ability of the model; common weak-level labels include noisy labels, image-level labels, bounding boxes and graffiti (scribble) annotations. Compared with pixel-wise annotations, these low-cost labels do not provide complete structural details of the salient object, which makes it much harder for the saliency detection network to recover fine edge structures of the salient object. Most current methods therefore introduce traditional unsupervised saliency detection methods, image classification tasks or edge detection tasks as auxiliaries to help determine the position and edges of the salient object. However, in some complex scenes, relying only on the color and texture features provided by the color image, the edge localization problem that is already difficult for strongly supervised saliency detection becomes even more difficult under weak supervision. Weakly supervised RGBD image saliency detection can improve salient object detection in complex scenes by introducing a depth map and using the rich structural and positional information it contains as a supplement. At the same time, however, introducing the depth map brings new problems, such as cross-modal conflicts between the color image and the depth map, the coarse edges of the depth map, and the noise caused by low-quality depth maps.
Disclosure of Invention
In view of the above, the present invention aims to provide a weak supervision RGBD image saliency detection method based on edge detection assistance, which can realize weak supervision RGBD image saliency detection with better performance.
In order to achieve the above purpose, the invention adopts the following technical scheme: the weak supervision RGBD image saliency detection method based on edge detection assistance comprises the following steps:
step S1: establishing a weak supervision RGBD image saliency detection training set containing a graffiti annotation graph, and carrying out data enhancement;
step S2: designing a multi-level and multi-task weak supervision RGBD image saliency detection network, and obtaining a saliency prediction result of multi-scale edge refinement by using the network;
step S3: designing a fusion module, and fusing the multi-scale edge refined significance prediction result by using the fusion module to obtain a final significance prediction result;
step S4: designing a weak supervision RGBD image saliency detection network based on edge detection assistance, and designing a loss function optimization network parameter to obtain a trained weak supervision RGBD image saliency detection model based on edge detection assistance;
step S5: and inputting the RGBD image to be detected into a trained weak supervision RGBD image saliency detection model based on edge detection assistance, and obtaining a saliency detection result.
In a preferred embodiment, the step S1 specifically includes:
step S11: dividing a data set into a training set and a testing set according to a certain proportion;
step S12: for the training set, using a brush tool in the 'Adobe Photoshop 2020' software to draw graffiti on each group of RGBD images; specifically, black graffiti is drawn on part of the salient foreground region, white graffiti is drawn on part of the background region, and the region without graffiti is represented by gray;
step S13: subjecting the images in the training set to data enhancement, where the specific operations comprise adding noise and randomly cropping and flipping the images; in addition, the color images and depth maps of each group of RGBD images in the training set and the testing set are normalized to highlight the foreground region.
In a preferred embodiment, the step S2 specifically includes:
step S21: firstly, respectively inputting the color image and the depth map into two VGG16 networks, and then respectively taking the 6 levels of features extracted by the 5 convolution blocks Conv1, Conv2, Conv3, Conv4 and Conv5 and the pooling layer Pool5 as the multi-level color image features and the multi-level depth map features;
step S22: designing the initial saliency prediction branch; at each of the 6 levels, the color image feature and the depth map feature of the k-th layer are concatenated, and the concatenated feature is sent to the cross-modal feature fusion module CFF to fuse the color image features and the depth map features; the cross-modal feature fusion module consists of a 3×3 convolution layer, channel attention, spatial attention and another 3×3 convolution layer connected in series; finally, the fused feature is reduced to 1 dimension by a convolution layer Conv_1×1 with a convolution kernel of 1, which yields the initial saliency feature of the k-th layer, where F_CFF denotes the cross-modal feature fusion module in the initial saliency prediction branch;
step S23: designing the edge detection branch; the edge feature E_k of the k-th layer is obtained by the same procedure as in the initial saliency prediction branch, namely the color image feature and the depth map feature of the k-th layer are concatenated, fused by F_CFF′, the cross-modal feature fusion module of the edge detection branch, and then reduced to 1 dimension by a convolution layer Conv_1×1 with a convolution kernel of 1;
step S24: designing the edge refinement saliency prediction module; at each of the 6 levels, the initial saliency feature of the k-th layer is first concatenated with the edge feature E_k, and the concatenated feature is then reduced to 1 dimension by a convolution layer Conv_1×1 with a convolution kernel of 1, which yields the edge-refined saliency feature S_k of the k-th layer.
In a preferred embodiment, the step S3 specifically includes:
step S31: designing the fusion module, which integrates deep features into shallow features layer by layer; starting from the deepest level, the aggregated feature H_k of the k-th layer is obtained by combining the edge-refined saliency feature S_k of that layer with the upsampled aggregated feature of the deeper layer, and the final saliency prediction result is obtained from the shallowest aggregated feature as
S_final = σ(Conv_3×3(H_1))
wherein H_k represents the aggregated feature of the k-th layer, S_k represents the edge-refined saliency feature of the k-th layer, F_up represents upsampling, Conv_3×3 represents a convolution layer with a convolution kernel of 3, σ represents the Sigmoid activation function, and S_final represents the final saliency prediction result.
In a preferred embodiment, the step S4 is specifically:
step S41: combining the multi-level and multi-task weak supervision RGBD image saliency detection network designed in the step S2 and the fusion module designed in the step S3 to obtain a weak supervision RGBD image saliency detection network based on edge detection assistance;
step S42: the loss function of the weak supervision RGBD image saliency detection network based on edge detection assistance is designed as follows:
The total loss L of the final training is the sum, over the levels k ∈ {1, …, 6}, of the following terms: the partial cross-entropy losses acting on the k-th layer of the initial saliency prediction branch, on the k-th layer of the edge refinement saliency prediction module and on the final saliency prediction result; the smoothness losses acting on the k-th layer of the initial saliency prediction branch, on the k-th layer of the edge refinement saliency prediction module and on the final saliency prediction result; and the cross-entropy loss acting on the k-th layer of the edge detection branch. The predictions entering these losses are obtained through the Sigmoid activation function σ:
S_k′ = σ(S_k)
E_k′ = σ(E_k)
where S_k and S_k′ respectively represent the edge-refined saliency feature of the k-th layer and the edge-refined saliency prediction map of the k-th layer in the edge refinement saliency prediction module, and the initial saliency prediction map of the k-th layer in the initial saliency prediction branch is obtained in the same way from the initial saliency feature of the k-th layer; Y represents the input graffiti annotation map, U represents the graffiti region in the graffiti annotation map Y, (i, j) ∈ U represents a pixel located in the graffiti region, and log represents the logarithm function, so the partial cross-entropy losses are computed only over the graffiti region; S_final represents the final saliency prediction result map; Δ represents the derivative, and ΔS_k′, ΔI, ΔG and ΔS_final respectively represent the k-th layer edge-refined saliency prediction map (and likewise the k-th layer initial saliency prediction map), the color image, the depth map and the final saliency prediction result map after derivation; |·| represents taking the absolute value, e is a constant, α is a fixed parameter, and Ψ is a function defined so as to avoid the result being 0; E_k and E_k′ respectively represent the edge feature of the k-th layer and the edge map of the k-th layer in the edge detection branch, E represents the input edge map, and [i, j] indexes the pixel in the i-th row and j-th column of an image, so that Y[i, j], S_k′[i, j], S_final[i, j], ΔS_k′[i, j], ΔI[i, j], ΔG[i, j], E[i, j] and E_k′[i, j] represent the values of the corresponding maps at that pixel;
step S43: repeating the steps S2 to S4 by taking the batch as a unit until the loss function value calculated in the step S4 converges and tends to be stable, saving network parameters, and completing the training process of the weak supervision RGBD image saliency detection network based on the edge detection assistance to obtain the weak supervision RGBD image saliency detection model based on the edge detection assistance.
Compared with the prior art, the invention has the following beneficial effects: the method makes full use of the complementary advantages of the combination of the color image and the depth map while avoiding the problems introduced by the depth map, and thereby realizes weak supervision RGBD image saliency detection with better performance.
Drawings
Fig. 1 is a flow chart of an implementation of a preferred embodiment of the present invention.
FIG. 2 is an example of a set of RGBD images and their corresponding graffiti labels in a preferred embodiment of the present invention.
Fig. 3 is a diagram showing a network model structure in the preferred embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application; as used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The invention provides a weak supervision RGBD image saliency detection method based on edge detection assistance, which is shown in fig. 1 and comprises the following steps:
step S1: establishing a weak supervision RGBD image saliency detection training set containing a graffiti annotation graph, and carrying out data enhancement;
step S2: designing a multi-level and multi-task weak supervision RGBD image saliency detection network, and obtaining a saliency prediction result of multi-scale edge refinement by using the network;
step S3: designing a fusion module, and fusing the multi-scale edge refined significance prediction result by using the fusion module to obtain a final significance prediction result;
step S4: designing a weak supervision RGBD image saliency detection network based on edge detection assistance, and designing a loss function optimization network parameter to obtain a trained weak supervision RGBD image saliency detection model based on edge detection assistance;
step S5: and inputting the RGBD image to be detected into a trained weak supervision RGBD image saliency detection model based on edge detection assistance, and obtaining a saliency detection result.
Further, the step S1 specifically includes the following steps:
step S11: dividing a data set into a training set and a testing set according to a certain proportion;
step S12: for the training set, using a brush tool in the 'Adobe Photoshop 2020' software to draw graffiti on each group of RGBD images; specifically, black graffiti is drawn on part of the salient foreground region, white graffiti is drawn on part of the background region, and the region without graffiti is represented by gray;
step S13: subjecting the images in the training set to data enhancement, where the specific operations comprise adding noise and randomly cropping and flipping the images; in addition, the color images and depth maps of each group of RGBD images in the training set and the testing set are normalized to highlight the foreground region.
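For illustration only, the augmentation and normalization operations of step S13 might be sketched as follows; the crop size, noise level and normalization statistics below are assumptions and are not specified in the patent:

    import random
    import numpy as np

    def augment(rgb, depth, scribble, crop=320, noise_std=0.02):
        # rgb: HxWx3 float array in [0, 1]; depth and scribble: HxW arrays of the same size.
        h, w, _ = rgb.shape
        # add Gaussian noise to the color image
        rgb = np.clip(rgb + np.random.normal(0.0, noise_std, rgb.shape), 0.0, 1.0)
        # random crop applied identically to the color image, depth map and graffiti map
        top, left = random.randint(0, h - crop), random.randint(0, w - crop)
        rgb = rgb[top:top + crop, left:left + crop]
        depth = depth[top:top + crop, left:left + crop]
        scribble = scribble[top:top + crop, left:left + crop]
        # random horizontal flip
        if random.random() < 0.5:
            rgb, depth, scribble = rgb[:, ::-1], depth[:, ::-1], scribble[:, ::-1]
        return rgb, depth, scribble

    def normalize(rgb, depth, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
        # channel-wise normalization of the color image; depth rescaled to [0, 1]
        rgb = (rgb - np.array(mean)) / np.array(std)
        depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
        return rgb, depth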
Further, the step S2 specifically includes the following steps:
step S21: firstly, respectively inputting the color image and the depth map into two VGG16 networks, and then respectively taking the 6 levels of features extracted by the 5 convolution blocks Conv1, Conv2, Conv3, Conv4 and Conv5 and the pooling layer Pool5 as the multi-level color image features and the multi-level depth map features;
step S22: designing the initial saliency prediction branch; at each of the 6 levels, the color image feature and the depth map feature of the k-th layer are concatenated, and the concatenated feature is sent to the cross-modal feature fusion module CFF to fuse the color image features and the depth map features; the cross-modal feature fusion module consists of a 3×3 convolution layer, channel attention, spatial attention and another 3×3 convolution layer connected in series; finally, the fused feature is reduced to 1 dimension by a convolution layer Conv_1×1 with a convolution kernel of 1, which yields the initial saliency feature of the k-th layer, where F_CFF denotes the cross-modal feature fusion module in the initial saliency prediction branch;
step S23: designing the edge detection branch; the edge feature E_k of the k-th layer is obtained by the same procedure as in the initial saliency prediction branch, namely the color image feature and the depth map feature of the k-th layer are concatenated, fused by F_CFF′, the cross-modal feature fusion module of the edge detection branch, and then reduced to 1 dimension by a convolution layer Conv_1×1 with a convolution kernel of 1;
step S24: designing the edge refinement saliency prediction module; at each of the 6 levels, the initial saliency feature of the k-th layer is first concatenated with the edge feature E_k, and the concatenated feature is then reduced to 1 dimension by a convolution layer Conv_1×1 with a convolution kernel of 1, which yields the edge-refined saliency feature S_k of the k-th layer.
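The text does not fix an implementation; purely as an editorial sketch, the two-stream VGG16 feature extraction of step S21 could look like the following PyTorch code, where the exact cut points of the six levels (conv1_2, conv2_2, conv3_3, conv4_3, conv5_3 and pool5) are an assumption, and one such backbone is instantiated for the color image and another for the depth map (a single-channel depth map would typically be replicated to 3 channels before being fed in):

    import torch
    import torchvision

    class VGG16Backbone(torch.nn.Module):
        # Returns six feature maps, one per level (Conv1..Conv5 blocks plus Pool5).
        def __init__(self):
            super().__init__()
            feats = torchvision.models.vgg16(weights=None).features  # or ImageNet weights
            self.stages = torch.nn.ModuleList([
                feats[0:4],    # level 1: conv1_1-conv1_2
                feats[4:9],    # level 2: pool1 + conv2_1-conv2_2
                feats[9:16],   # level 3: pool2 + conv3_1-conv3_3
                feats[16:23],  # level 4: pool3 + conv4_1-conv4_3
                feats[23:30],  # level 5: pool4 + conv5_1-conv5_3
                feats[30:31],  # level 6: pool5
            ])

        def forward(self, x):
            outs = []
            for stage in self.stages:
                x = stage(x)
                outs.append(x)
            return outs  # [f_1, ..., f_6]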
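Likewise, a minimal sketch of the per-level prediction heads of steps S22 to S24 is given below; the squeeze-and-excitation style channel attention, the CBAM-style spatial attention, the ReLU activations and the 64-channel width are assumptions, since the text only states the sequence 3×3 convolution, channel attention, spatial attention, 3×3 convolution. The edge detection branch of step S23 would reuse the same CFF structure with its own weights:

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        # squeeze-and-excitation style channel attention (one plausible choice)
        def __init__(self, ch, reduction=4):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
                nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

        def forward(self, x):
            return x * self.gate(x)

    class SpatialAttention(nn.Module):
        # CBAM-style spatial attention over pooled channel statistics
        def __init__(self):
            super().__init__()
            self.gate = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

        def forward(self, x):
            stats = torch.cat([x.mean(1, keepdim=True), x.max(1, keepdim=True)[0]], dim=1)
            return x * self.gate(stats)

    class CFF(nn.Module):
        # cross-modal feature fusion: 3x3 conv -> channel attention -> spatial attention -> 3x3 conv,
        # followed by the 1x1 convolution that yields a 1-channel (initial saliency or edge) feature
        def __init__(self, in_ch, mid_ch=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
                ChannelAttention(mid_ch), SpatialAttention(),
                nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True))
            self.head = nn.Conv2d(mid_ch, 1, 1)  # Conv_1x1

        def forward(self, f_rgb, f_depth):
            return self.head(self.body(torch.cat([f_rgb, f_depth], dim=1)))

    class EdgeRefine(nn.Module):
        # step S24: concatenate the initial saliency feature with the edge feature of the same
        # level and reduce the result to 1 channel with a 1x1 convolution, giving S_k
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, 1)

        def forward(self, s_init, e_k):
            return self.conv(torch.cat([s_init, e_k], dim=1))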
Further, the step S3 specifically includes the following steps:
step S31: designing the fusion module, which integrates deep features into shallow features layer by layer; starting from the deepest level, the aggregated feature H_k of the k-th layer is obtained by combining the edge-refined saliency feature S_k of that layer with the upsampled aggregated feature of the deeper layer, and the final saliency prediction result is obtained from the shallowest aggregated feature as
S_final = σ(Conv_3×3(H_1))
wherein H_k represents the aggregated feature of the k-th layer, S_k represents the edge-refined saliency feature of the k-th layer, F_up represents upsampling, Conv_3×3 represents a convolution layer with a convolution kernel of 3, σ represents the Sigmoid activation function, and S_final represents the final saliency prediction result.
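A possible top-down implementation of the fusion module in step S31 is sketched below; since the intermediate aggregation formula is not reproduced in this text, the concatenate-then-3×3-convolution merge and the bilinear upsampling F_up are assumptions:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopDownFusion(nn.Module):
        # integrates deep edge-refined saliency features into shallow ones layer by layer
        def __init__(self, num_levels=6):
            super().__init__()
            self.merge = nn.ModuleList(
                [nn.Conv2d(2, 1, 3, padding=1) for _ in range(num_levels - 1)])
            self.out = nn.Conv2d(1, 1, 3, padding=1)  # Conv_3x3 applied to H_1

        def forward(self, s):
            # s = [S_1, ..., S_6], shallow to deep, each a 1-channel map
            h = s[-1]                                   # H_6
            for k in range(len(s) - 2, -1, -1):
                up = F.interpolate(h, size=s[k].shape[2:], mode='bilinear', align_corners=False)
                h = self.merge[k](torch.cat([s[k], up], dim=1))   # H_k
            return torch.sigmoid(self.out(h))           # S_final = sigma(Conv_3x3(H_1))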
Further, the step S4 specifically includes the following steps:
step S41: combining the multi-level and multi-task weak supervision RGBD image saliency detection network designed in the step S2 and the fusion module designed in the step S3 to obtain a weak supervision RGBD image saliency detection network based on edge detection assistance;
step S42: the loss function of the weak supervision RGBD image saliency detection network based on edge detection assistance is designed as follows:
The total loss L of the final training is the sum, over the levels k ∈ {1, …, 6}, of the following terms: the partial cross-entropy losses acting on the k-th layer of the initial saliency prediction branch, on the k-th layer of the edge refinement saliency prediction module and on the final saliency prediction result; the smoothness losses acting on the k-th layer of the initial saliency prediction branch, on the k-th layer of the edge refinement saliency prediction module and on the final saliency prediction result; and the cross-entropy loss acting on the k-th layer of the edge detection branch. The predictions entering these losses are obtained through the Sigmoid activation function σ:
S_k′ = σ(S_k)
E_k′ = σ(E_k)
where S_k and S_k′ respectively represent the edge-refined saliency feature of the k-th layer and the edge-refined saliency prediction map of the k-th layer in the edge refinement saliency prediction module, and the initial saliency prediction map of the k-th layer in the initial saliency prediction branch is obtained in the same way from the initial saliency feature of the k-th layer; Y represents the input graffiti annotation map, U represents the graffiti region in the graffiti annotation map Y, (i, j) ∈ U represents a pixel located in the graffiti region, and log represents the logarithm function, so the partial cross-entropy losses are computed only over the graffiti region; S_final represents the final saliency prediction result map; Δ represents the derivative, and ΔS_k′, ΔI, ΔG and ΔS_final respectively represent the k-th layer edge-refined saliency prediction map (and likewise the k-th layer initial saliency prediction map), the color image, the depth map and the final saliency prediction result map after derivation; |·| represents taking the absolute value, e is a constant, α is a fixed parameter, and Ψ is a function defined so as to avoid the result being 0; E_k and E_k′ respectively represent the edge feature of the k-th layer and the edge map of the k-th layer in the edge detection branch, E represents the input edge map, and [i, j] indexes the pixel in the i-th row and j-th column of an image, so that Y[i, j], S_k′[i, j], S_final[i, j], ΔS_k′[i, j], ΔI[i, j], ΔG[i, j], E[i, j] and E_k′[i, j] represent the values of the corresponding maps at that pixel;
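The formula images of the individual loss terms are not reproduced in this text. For orientation only, the definitions above are consistent with the following standard forms, in which the partial cross-entropy is evaluated only on the graffiti region U and the exact weighting inside the smoothness loss (as well as the grouping of the final-prediction terms) is an editorial reconstruction rather than a quotation of the patent:

    \[
    \mathcal{L}_{pce}(S',Y) = -\sum_{(i,j)\in U}\Big[\,Y[i,j]\,\log S'[i,j] + \big(1-Y[i,j]\big)\,\log\big(1-S'[i,j]\big)\Big]
    \]
    \[
    \mathcal{L}_{ce}(E_k',E) = -\sum_{i,j}\Big[\,E[i,j]\,\log E_k'[i,j] + \big(1-E[i,j]\big)\,\log\big(1-E_k'[i,j]\big)\Big]
    \]
    \[
    \mathcal{L}_{sm}(S') = \sum_{i,j}\Psi\Big(\big|\Delta S'[i,j]\big|\;e^{-\alpha\,(|\Delta I[i,j]|+|\Delta G[i,j]|)}\Big),
    \qquad \Psi(s)=\sqrt{s^{2}+\varepsilon}
    \]
    \[
    L = \sum_{k=1}^{6}\Big(\mathcal{L}_{pce}^{\,ini,k}+\mathcal{L}_{pce}^{\,S,k}+\mathcal{L}_{sm}^{\,ini,k}+\mathcal{L}_{sm}^{\,S,k}+\mathcal{L}_{ce}^{\,E,k}\Big)+\mathcal{L}_{pce}^{\,final}+\mathcal{L}_{sm}^{\,final}
    \]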
step S43: repeating the steps S2 to S4 by taking the batch as a unit until the loss function value calculated in the step S4 converges and tends to be stable, saving network parameters, and completing the training process of the weak supervision RGBD image saliency detection network based on the edge detection assistance to obtain the weak supervision RGBD image saliency detection model based on the edge detection assistance.
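As a rough illustration of the batch-wise training described in step S43, a training loop might look like the sketch below; the Adam optimizer, the learning rate, the number of epochs, the checkpoint name and the compute_loss helper are all hypothetical and are not taken from the patent:

    import torch

    def train(model, loader, epochs=30, lr=1e-4):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for epoch in range(epochs):
            for rgb, depth, scribble, edge in loader:
                preds = model(rgb, depth)                          # per-level and final predictions
                loss = model.compute_loss(preds, scribble, edge)   # loss of step S42 (assumed helper)
                opt.zero_grad()
                loss.backward()
                opt.step()
        # save the network parameters once the loss has converged and stabilized
        torch.save(model.state_dict(), 'edge_assisted_weakly_supervised_rgbd_sod.pth')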
The above is a preferred embodiment of the present invention, and all changes made according to the technical solution of the present invention belong to the protection scope of the present invention when the generated functional effects do not exceed the scope of the technical solution of the present invention.

Claims (5)

1. The weak supervision RGBD image saliency detection method based on edge detection assistance is characterized by comprising the following steps of:
step S1: establishing a weak supervision RGBD image saliency detection training set containing a graffiti annotation graph, and carrying out data enhancement;
step S2: designing a multi-level and multi-task weak supervision RGBD image saliency detection network, and obtaining a saliency prediction result of multi-scale edge refinement by using the network;
step S3: designing a fusion module, and fusing the multi-scale edge refined significance prediction result by using the fusion module to obtain a final significance prediction result;
step S4: designing a weak supervision RGBD image saliency detection network based on edge detection assistance, and designing a loss function optimization network parameter to obtain a trained weak supervision RGBD image saliency detection model based on edge detection assistance;
step S5: and inputting the RGBD image to be detected into a trained weak supervision RGBD image saliency detection model based on edge detection assistance, and obtaining a saliency detection result.
2. The weak supervision RGBD image saliency detection method based on the edge detection assistance according to claim 1, wherein the step S1 specifically comprises:
step S11: dividing a data set into a training set and a testing set according to a certain proportion;
step S12: for the training set, using a brush tool in the 'Adobe Photoshop 2020' software to draw graffiti on each group of RGBD images; specifically, black graffiti is drawn on part of the salient foreground region, white graffiti is drawn on part of the background region, and the region without graffiti is represented by gray;
step S13: subjecting the images in the training set to data enhancement, where the specific operations comprise adding noise and randomly cropping and flipping the images; in addition, the color images and depth maps of each group of RGBD images in the training set and the testing set are normalized to highlight the foreground region.
3. The weak supervision RGBD image saliency detection method based on the edge detection assistance according to claim 1, wherein the step S2 specifically comprises:
step S21: firstly, respectively inputting the color image and the depth map into two VGG16 networks, and then respectively taking the 6 levels of features extracted by the 5 convolution blocks Conv1, Conv2, Conv3, Conv4 and Conv5 and the pooling layer Pool5 as the multi-level color image features and the multi-level depth map features;
step S22: designing the initial saliency prediction branch; at each of the 6 levels, the color image feature and the depth map feature of the k-th layer are concatenated, and the concatenated feature is sent to the cross-modal feature fusion module CFF to fuse the color image features and the depth map features; the cross-modal feature fusion module consists of a 3×3 convolution layer, channel attention, spatial attention and another 3×3 convolution layer connected in series; finally, the fused feature is reduced to 1 dimension by a convolution layer Conv_1×1 with a convolution kernel of 1, which yields the initial saliency feature of the k-th layer, where F_CFF denotes the cross-modal feature fusion module in the initial saliency prediction branch;
step S23: designing the edge detection branch; the edge feature E_k of the k-th layer is obtained by the same procedure as in the initial saliency prediction branch, namely the color image feature and the depth map feature of the k-th layer are concatenated, fused by F_CFF′, the cross-modal feature fusion module of the edge detection branch, and then reduced to 1 dimension by a convolution layer Conv_1×1 with a convolution kernel of 1;
step S24: designing the edge refinement saliency prediction module; at each of the 6 levels, the initial saliency feature of the k-th layer is first concatenated with the edge feature E_k, and the concatenated feature is then reduced to 1 dimension by a convolution layer Conv_1×1 with a convolution kernel of 1, which yields the edge-refined saliency feature S_k of the k-th layer.
4. The weak supervision RGBD image saliency detection method based on the edge detection assistance according to claim 1, wherein the step S3 specifically comprises:
step S31: designing the fusion module, which integrates deep features into shallow features layer by layer; starting from the deepest level, the aggregated feature H_k of the k-th layer is obtained by combining the edge-refined saliency feature S_k of that layer with the upsampled aggregated feature of the deeper layer, and the final saliency prediction result is obtained from the shallowest aggregated feature as
S_final = σ(Conv_3×3(H_1))
wherein H_k represents the aggregated feature of the k-th layer, S_k represents the edge-refined saliency feature of the k-th layer, F_up represents upsampling, Conv_3×3 represents a convolution layer with a convolution kernel of 3, σ represents the Sigmoid activation function, and S_final represents the final saliency prediction result.
5. The weak supervision RGBD image saliency detection method based on the edge detection assistance according to claim 1, wherein the step S4 specifically comprises:
step S41: combining the multi-level and multi-task weak supervision RGBD image saliency detection network designed in the step S2 and the fusion module designed in the step S3 to obtain a weak supervision RGBD image saliency detection network based on edge detection assistance;
step S42: the loss function of the weak supervision RGBD image saliency detection network based on edge detection assistance is designed as follows:
The total loss L of the final training is the sum, over the levels k ∈ {1, …, 6}, of the following terms: the partial cross-entropy losses acting on the k-th layer of the initial saliency prediction branch, on the k-th layer of the edge refinement saliency prediction module and on the final saliency prediction result; the smoothness losses acting on the k-th layer of the initial saliency prediction branch, on the k-th layer of the edge refinement saliency prediction module and on the final saliency prediction result; and the cross-entropy loss acting on the k-th layer of the edge detection branch. The predictions entering these losses are obtained through the Sigmoid activation function σ:
S_k′ = σ(S_k)
E_k′ = σ(E_k)
where S_k and S_k′ respectively represent the edge-refined saliency feature of the k-th layer and the edge-refined saliency prediction map of the k-th layer in the edge refinement saliency prediction module, and the initial saliency prediction map of the k-th layer in the initial saliency prediction branch is obtained in the same way from the initial saliency feature of the k-th layer; Y represents the input graffiti annotation map, U represents the graffiti region in the graffiti annotation map Y, (i, j) ∈ U represents a pixel located in the graffiti region, and log represents the logarithm function, so the partial cross-entropy losses are computed only over the graffiti region; S_final represents the final saliency prediction result map; Δ represents the derivative, and ΔS_k′, ΔI, ΔG and ΔS_final respectively represent the k-th layer edge-refined saliency prediction map (and likewise the k-th layer initial saliency prediction map), the color image, the depth map and the final saliency prediction result map after derivation; |·| represents taking the absolute value, e is a constant, α is a fixed parameter, and Ψ is a function defined so as to avoid the result being 0; E_k and E_k′ respectively represent the edge feature of the k-th layer and the edge map of the k-th layer in the edge detection branch, E represents the input edge map, and [i, j] indexes the pixel in the i-th row and j-th column of an image, so that Y[i, j], S_k′[i, j], S_final[i, j], ΔS_k′[i, j], ΔI[i, j], ΔG[i, j], E[i, j] and E_k′[i, j] represent the values of the corresponding maps at that pixel;
step S43: repeating the steps S2 to S4 by taking the batch as a unit until the loss function value calculated in the step S4 converges and tends to be stable, saving network parameters, and completing the training process of the weak supervision RGBD image saliency detection network based on the edge detection assistance to obtain the weak supervision RGBD image saliency detection model based on the edge detection assistance.
CN202211575959.9A 2022-12-08 2022-12-08 Weak supervision RGBD image significance detection method based on edge detection assistance Pending CN116524207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211575959.9A CN116524207A (en) 2022-12-08 2022-12-08 Weak supervision RGBD image significance detection method based on edge detection assistance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211575959.9A CN116524207A (en) 2022-12-08 2022-12-08 Weak supervision RGBD image significance detection method based on edge detection assistance

Publications (1)

Publication Number Publication Date
CN116524207A true CN116524207A (en) 2023-08-01

Family

ID=87401781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211575959.9A Pending CN116524207A (en) 2022-12-08 2022-12-08 Weak supervision RGBD image significance detection method based on edge detection assistance

Country Status (1)

Country Link
CN (1) CN116524207A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173394A (en) * 2023-08-07 2023-12-05 山东大学 Weak supervision salient object detection method and system for unmanned aerial vehicle video data
CN117173394B (en) * 2023-08-07 2024-04-02 山东大学 Weak supervision salient object detection method and system for unmanned aerial vehicle video data


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination