CN110929736A - Multi-feature cascade RGB-D salient object detection method - Google Patents

Multi-feature cascade RGB-D salient object detection method

Info

Publication number
CN110929736A
Application number
CN201911099871.2A
Authority
CN (China)
Prior art keywords
depth, layer, RGB, branch, color
Legal status
Granted; currently active
Other languages
Chinese (zh)
Other versions
CN110929736B (en)
Inventors
周武杰, 潘思佳, 林鑫杨, 黄铿达, 雷景生, 何成, 王海江, 薛林林
Current Assignee
Zhejiang University of Science and Technology ZUST
Original Assignee
Zhejiang University of Science and Technology ZUST
Application filed by Zhejiang University of Science and Technology ZUST
Priority to CN201911099871.2A
Publication of CN110929736A
Application granted
Publication of CN110929736B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses a multi-feature cascade RGB-D salient object detection method. RGB images, the depth images corresponding to the RGB images and the corresponding real saliency images are selected to form a training set. A convolutional neural network comprising two input layers, a hidden layer and an output layer is constructed, and the training set is input into the convolutional neural network for training to obtain the saliency prediction image corresponding to each RGB image in the training set. The loss function value between the saliency prediction image corresponding to each RGB image in the training set and the corresponding real saliency image is calculated, and the weight vector and bias term corresponding to the smallest loss function value obtained during training are retained. The RGB image and the depth image to be predicted are then input into the trained convolutional neural network training model to obtain the predicted saliency image. The model of the invention has a novel structure, and the saliency map obtained after model processing has a high similarity to the target map.

Description

Multi-feature cascade RGB-D salient object detection method
Technical Field
The invention relates to a method for detecting targets that are salient to the human eye, and in particular to a multi-feature cascade RGB-D salient object detection method.
Background
Salient object detection is a branch of image processing and is also an area of computer vision. Computer vision, in a broad sense, is the discipline that imparts natural visual capabilities to machines. Natural visual ability refers to the visual ability of the biological visual system. In fact, computer vision essentially addresses the problem of visual perception. The core problem is to study how to organize the input image information, identify objects and scenes, and further explain the image content.
Computer vision has been the subject of increasing interest and intense research over the last several decades, and it is increasingly being used to recognize patterns from images. It already plays a large role in many fields, and with the dramatic achievements of artificial intelligence, computer vision technology is becoming more and more prevalent in different industries; its future appears full of promising and even unimaginable results. The salient object detection discussed here is one of its sub-areas, and it plays an important role.
Saliency detection is a method of predicting where human attention is drawn in an image, and it has attracted extensive research interest in recent years. It plays an important preprocessing role in image classification, image retargeting, object recognition and other problems. Unlike RGB saliency detection, RGB-D saliency detection has been studied relatively little. According to the definition of saliency, saliency detection methods can be classified into top-down methods and bottom-up methods. Top-down saliency detection is a task-dependent detection method that incorporates high-level features to locate salient objects. The bottom-up approach, on the other hand, is task-free and detects salient regions from a biological perspective using low-level features.
Disclosure of Invention
In order to solve the problems in the background art, the invention provides a multi-feature cascade RGB-D salient object detection method; the saliency map obtained after model processing has a high similarity to the target map, and the model has a novel structure.
The technical scheme adopted by the invention is as follows:
step 1_1: selecting Q original RGB images, the depth maps corresponding to the original RGB images and the real saliency images corresponding to the original RGB images to form a training set;
step 1_2: constructing a convolutional neural network: the convolutional neural network comprises two input layers, a hidden layer and an output layer, wherein the two input layers are connected to the input end of the hidden layer, and the output end of the hidden layer is connected to the output layer;
step 1_3: inputting each original RGB image in the training set and the depth image corresponding to it, as the original input images of the two input layers, into the convolutional neural network for training to obtain the prediction saliency image corresponding to each original RGB image in the training set; calculating the loss function value between the prediction saliency image corresponding to each original RGB image in the training set and the corresponding real saliency image, wherein the loss function value is obtained using the BCE (binary cross-entropy) loss function;
step 1_4: repeatedly executing step 1_3 for V times to obtain a convolutional neural network classification training model and Q × V loss function values; then finding the loss function value with the minimum value among the Q × V loss function values; then taking the weight vector and bias term corresponding to this minimum loss function value as the optimal weight vector and optimal bias term, and using them to replace the weight vector and bias term in the trained convolutional neural network training model;
step 1_5: inputting the RGB image to be predicted and the depth image corresponding to it into the trained convolutional neural network training model, and predicting with the optimal weight vector and optimal bias term to obtain the predicted saliency image corresponding to the RGB image to be predicted, thereby realizing salient object detection.
In the two input layers in the step 1_2, the 1 st input layer is an RGB image input layer, and the 2 nd input layer is a depth image input layer; the hidden layer comprises an RGB (red, green and blue) feature extraction module, a depth feature extraction module, a mixed feature convolution layer, a detail information processing module, a global information processing module, an SKNet network model and a post-processing module;
the RGB feature extraction module comprises four color map neural network blocks, four color attention layers, eight color up-sampling layers, four attention convolution layers and four color convolution layers which are sequentially connected; the four color map neural network blocks which are connected in sequence respectively correspond to four modules which are connected in sequence in ResNet50, the output of the first color map neural network block is respectively connected to the first RGB branch and the fifth RGB branch, the output of the second color map neural network block is respectively connected to the second RGB branch and the sixth RGB branch, the output of the third color map neural network block is respectively connected to the third RGB branch and the seventh RGB branch, and the output of the fourth color map neural network block is respectively connected to the fourth RGB branch and the eighth RGB branch;
the depth feature extraction module comprises four depth map neural network blocks, four depth attention layers, eight depth upsampling layers, four attention convolutional layers and four depth convolutional layers which are sequentially connected, wherein the four depth map neural network blocks which are sequentially connected respectively correspond to the four modules which are sequentially connected in ResNet50, the output of the first depth map neural network block is respectively connected to a first depth branch and a fifth depth branch, the output of the second depth map neural network block is respectively connected to a second depth branch and a sixth depth branch, the output of the third depth map neural network block is respectively connected to a third depth branch and a seventh depth branch, and the output of the fourth depth map neural network block is respectively connected to the fourth depth branch and the eighth depth branch;
multiplying the outputs of the first RGB branch and the second RGB branch to serve as one input of the low-level feature convolution layer, and multiplying the outputs of the first depth branch and the second depth branch to serve as the other input of the low-level feature convolution layer; multiplying the outputs of the third RGB branch and the fourth RGB branch to be used as one input of the advanced feature convolution layer, and multiplying the outputs of the third depth branch and the fourth depth branch to be used as the other input of the advanced feature convolution layer;
the outputs of the low-level characteristic convolution layer and the high-level characteristic convolution layer are input into the mixed characteristic convolution layer;
the fusion result of the fifth RGB branch and the sixth RGB branch is multiplied by the fusion result of the fifth depth branch and the sixth depth branch and then input into a detail information processing module; the fusion result of the seventh RGB branch and the eighth RGB branch is multiplied by the fusion result of the seventh depth branch and the eighth depth branch and then input into the global information processing module;
the output of the mixed characteristic convolution layer and the output of the detail information processing module are fused and then serve as one input of the SKNet network model, and the output of the mixed characteristic convolution layer and the output of the global information processing module are fused and then serve as the other input of the SKNet network model;
the post-processing module comprises a first deconvolution layer and a second deconvolution layer which are sequentially connected, the input of the post-processing module is the output of the SKNet network model, and the output of the post-processing module is finally output through the output layer.
The first RGB branch comprises a first color attention layer, a first color up-sampling layer and a first attention convolution layer which are connected in sequence, the second RGB branch comprises a second color attention layer, a second color up-sampling layer and a second attention convolution layer which are connected in sequence, the third RGB branch comprises a third color attention layer, a third color up-sampling layer and a third attention convolution layer which are connected in sequence, and the fourth RGB branch comprises a fourth color attention layer, a fourth color up-sampling layer and a fourth attention convolution layer which are connected in sequence;
the fifth RGB branch comprises a first color convolution layer and a fifth color up-sampling layer which are sequentially connected, the sixth RGB branch comprises a second color convolution layer and a sixth color up-sampling layer which are sequentially connected, the seventh RGB branch comprises a third color convolution layer and a seventh color up-sampling layer which are sequentially connected, and the eighth RGB branch comprises a fourth color convolution layer and an eighth color up-sampling layer which are sequentially connected;
the first depth branch comprises a first depth attention layer, a first depth up-sampling layer and a fifth attention convolution layer which are connected in sequence, the second depth branch comprises a second depth attention layer, a second depth up-sampling layer and a sixth attention convolution layer which are connected in sequence, the third depth branch comprises a third depth attention layer, a third depth up-sampling layer and a seventh attention convolution layer which are connected in sequence, and the fourth depth branch comprises a fourth depth attention layer, a fourth depth up-sampling layer and an eighth attention convolution layer which are connected in sequence;
the fifth depth branch comprises a first depth convolution layer and a fifth depth upsampling layer which are sequentially connected, the sixth depth branch comprises a second depth convolution layer and a sixth depth upsampling layer which are sequentially connected, the seventh depth branch comprises a third depth convolution layer and a seventh depth upsampling layer which are sequentially connected, and the eighth depth branch comprises a fourth depth convolution layer and an eighth depth upsampling layer which are sequentially connected.
The detail information processing module comprises a first network module and a second transition convolution layer which are sequentially connected, and the input of the detail information processing module, after passing through a first transition convolution layer, is fused with the output of the second transition convolution layer and then serves as the output of the detail information processing module;
the global information processing module comprises three processing branches, the three processing branches comprise a global network module and a global convolution layer which are sequentially connected, and the outputs of the three processing branches are fused and then serve as the output of the global information processing module.
Each color attention layer and each depth attention layer adopt a CBAM (Convolutional Block Attention Module), and each color up-sampling layer and each depth up-sampling layer are used for performing up-sampling processing of bilinear interpolation on the input features; each of the attention convolution layers, the color convolution layers, the depth convolution layers, the low-level feature convolution layer, the high-level feature convolution layer and the mixed feature convolution layer comprises one convolution layer; each of the deconvolution layers comprises one deconvolution;
the transition convolution layers in the detail information processing module and the global convolution layers in the global information processing module each comprise one convolution layer, the first network module in the detail information processing module adopts a Dense block of the DenseNet network, and each global network module in the global information processing module adopts an ASPP module (atrous spatial pyramid pooling module).
The input end of the RGB image input layer receives an RGB input image, and the input end of the depth image input layer receives a depth image corresponding to the RGB image; the input of the RGB feature extraction module and the input of the depth feature extraction module are respectively the output of an RGB image input layer and a depth image input layer.
Compared with the prior art, the invention has the advantages that:
1) The invention uses ResNet50 to pre-train the RGB map and the depth map separately (the depth map is converted into a three-channel input), extracts the different results obtained by passing the RGB map and the depth map through the 4 modules of ResNet50, and performs two different operations on the extracted results: the high-level and low-level features are refined in detail through an attention mechanism, and after the backbone network the high-level and low-level features are fused and then passed into the later part of the model.
2) The method extracts feature information from the pre-training and divides the image features into high-level and low-level features. In the left part of the model, the detail features of the image are extracted from the high-level and low-level features, and after these respective operations the high-level features and the low-level features are fused, which gives an excellent effect.
3) Two novel modules are designed on the right side of the model of the invention: the first module combines convolution with a Dense block, fully exploiting the advantages of convolution and DenseNet, so that the detection result of the method is more detailed; the second module uses ASPP to enlarge the receptive field and then pairs it with convolution, which is beneficial for collecting global features, so that the detection result of the method is more comprehensive. Finally, the detail features on the left side of the model are fused overall.
Drawings
Fig. 1 is a block diagram of the overall implementation of the method of the present invention.
Fig. 2a is an original RGB image.
Fig. 2b is the depth image of fig. 2 a.
Fig. 3a is the true saliency detection image of fig. 2 a.
Fig. 3b is the saliency prediction image obtained by the invention from Fig. 2a and Fig. 2b.
FIG. 4a shows the results of the present invention on the precision-recall (PR) curve.
FIG. 4b shows the results of the present invention on the ROC curve.
FIG. 4c shows the results of the present invention on MAE.
Detailed Description
The invention is described in further detail below with reference to the accompanying examples.
The overall implementation block diagram of the multi-feature cascade RGB-D salient object detection method provided by the invention is shown in FIG. 1; the method comprises three processes, namely a training stage, a verification stage and a testing stage.
The specific steps of the training phase process are as follows:
Step 1_1: select Q color real target images, the corresponding depth images and the saliency image corresponding to each color real target image to form a training set; the q-th original RGB image in the training set is denoted {I_q(i,j)}, the corresponding depth image is denoted {D_q(i,j)}, and the real saliency image in the training set corresponding to {I_q(i,j)} is denoted {G_q(i,j)}.
The color real target images are RGB color images and the depth maps are single-channel grayscale maps; Q is a positive integer with Q ≥ 200, for example Q = 1588; q is a positive integer with 1 ≤ q ≤ Q; 1 ≤ i ≤ W and 1 ≤ j ≤ H, where W denotes the width of {I_q(i,j)} and H denotes its height, for example W = 512 and H = 512; I_q(i,j) denotes the pixel value of the pixel whose coordinate position in {I_q(i,j)} is (i,j), and G_q(i,j) denotes the pixel value of the pixel whose coordinate position in {G_q(i,j)} is (i,j). Here, the 1588 images in the training set of the database NJU2000 are directly selected as the color real target images.
Step 1_ 2: constructing a convolutional neural network:
the convolutional neural network comprises an input layer, a hidden layer and an output layer;
the input layers include an RGB image input layer and a depth image input layer. For an RGB image input layer, an input end receives an R channel component, a G channel component and a B channel component of an original input image, and an output end of the input layer outputs the R channel component, the G channel component and the B channel component of the original input image to a hidden layer; wherein the input end of the input layer is required to receive the original input image with width W and height H. For a depth image input layer, an input end receives a depth image corresponding to an original input image, an output end of the input end outputs the original depth image, two channels are superposed by the original depth image to form a three-channel depth image, and three-channel components are sent to a hidden layer; wherein the input end of the input layer is required to receive the original input image with width W and height H.
The hidden layer comprises an RGB (red, green and blue) feature extraction module, a depth feature extraction module, a mixed feature convolution layer, a detail information processing module, a global information processing module, an SKNet network model and a post-processing module;
the RGB feature extraction module comprises four color map neural network blocks, four color attention layers, eight color up-sampling layers, four attention convolution layers and four color convolution layers which are sequentially connected. The four color map neural network blocks which are connected in sequence respectively correspond to the four modules which are connected in sequence in the ResNet 50. The 1 st color image neural network block, the 2 nd color image neural network, the 3 rd color image neural network and the 4 th color image neural network are respectively and sequentially corresponding to ResNet50The 4 modules adopt a pre-training method, and pre-training is carried out on the input image by utilizing the network of ResNet50 carried by the pytorech and the weight of the network. Outputting 256 characteristic graphs after passing through the 1 st color image neural network block, and recording a set consisting of the 256 output characteristic graphs as P1,P1Wherein each feature map has a width of
Figure BDA0002269501490000061
Has a height of
Figure BDA0002269501490000062
Outputting 512 feature maps after passing through the 2 nd color image neural network block, and recording a set of the 512 output feature maps as P2,P2Wherein each feature map has a width of
Figure BDA0002269501490000063
Has a height of
Figure BDA0002269501490000064
The image is output as 1024 characteristic graphs after passing through a 3 rd color image neural network block, and a set formed by the output 1024 characteristic graphs is marked as P3,P3Wherein each feature map has a width of
Figure BDA0002269501490000065
Has a height of
Figure BDA0002269501490000066
2048 feature maps are output after passing through a 4 th color image neural network block, and a set formed by the 2048 output feature maps is marked as P4,P4Each feature map has a width of
Figure BDA0002269501490000067
Has a height of
Figure BDA0002269501490000068
The depth feature extraction module comprises four depth map neural network blocks, four depth attention layers, eight depth upsampling layers, four attention convolution layers and four depth convolution layers which are sequentially connected. The four depth map neural network blocks connected in sequence correspond respectively to the four modules connected in sequence in ResNet50: the 1st, 2nd, 3rd and 4th depth map neural network blocks correspond in order to the 4 modules of ResNet50. A pre-training approach is adopted: the input image is processed with the ResNet50 network provided by PyTorch, using its pretrained weights. The 1st depth map neural network block outputs 256 feature maps, and the set formed by these 256 feature maps is denoted D1; each feature map in D1 has a width of W/4 and a height of H/4. The 2nd depth map neural network block outputs 512 feature maps, denoted D2; each feature map in D2 has a width of W/8 and a height of H/8. The 3rd depth map neural network block outputs 1024 feature maps, denoted D3; each feature map in D3 has a width of W/16 and a height of H/16. The 4th depth map neural network block outputs 2048 feature maps, denoted D4; each feature map in D4 has a width of W/32 and a height of H/32.
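The feature extraction described for both streams can be sketched with the ResNet50 provided by torchvision, as assumed below; the wrapper class and its name are illustrative, and only the four stage outputs P1/D1 to P4/D4 described in the text are returned.

import torch
import torchvision

class ResNet50Backbone(torch.nn.Module):
    # Returns the outputs of the four residual stages of a pretrained ResNet50:
    # 256, 512, 1024 and 2048 feature maps at 1/4, 1/8, 1/16 and 1/32 of the
    # input resolution respectively.
    def __init__(self):
        super().__init__()
        net = torchvision.models.resnet50(pretrained=True)
        self.stem = torch.nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.layer1, self.layer2 = net.layer1, net.layer2
        self.layer3, self.layer4 = net.layer3, net.layer4

    def forward(self, x):
        x = self.stem(x)
        p1 = self.layer1(x)   # P1 (or D1 for the depth stream)
        p2 = self.layer2(p1)  # P2 / D2
        p3 = self.layer3(p2)  # P3 / D3
        p4 = self.layer4(p3)  # P4 / D4
        return p1, p2, p3, p4

One instance of this backbone would serve the RGB stream and a second instance the three-channel depth stream.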
The output of the first color map neural network block is respectively connected to the first RGB branch and the fifth RGB branch, the output of the second color map neural network block is respectively connected to the second RGB branch and the sixth RGB branch, the output of the third color map neural network block is respectively connected to the third RGB branch and the seventh RGB branch, and the output of the fourth color map neural network block is respectively connected to the fourth RGB branch and the eighth RGB branch. The first RGB branch comprises a first color attention layer, a first color up-sampling layer and a first attention convolution layer which are connected in sequence, the second RGB branch comprises a second color attention layer, a second color up-sampling layer and a second attention convolution layer which are connected in sequence, the third RGB branch comprises a third color attention layer, a third color up-sampling layer and a third attention convolution layer which are connected in sequence, and the fourth RGB branch comprises a fourth color attention layer, a fourth color up-sampling layer and a fourth attention convolution layer which are connected in sequence; the fifth RGB branch comprises a first color convolution layer and a fifth color up-sampling layer which are sequentially connected, the sixth RGB branch comprises a second color convolution layer and a sixth color up-sampling layer which are sequentially connected, the seventh RGB branch comprises a third color convolution layer and a seventh color up-sampling layer which are sequentially connected, and the eighth RGB branch comprises a fourth color convolution layer and an eighth color up-sampling layer which are sequentially connected;
the output of the first depth map neural network block is connected to the first depth branch and the fifth depth branch respectively, the output of the second depth map neural network block is connected to the second depth branch and the sixth depth branch respectively, the output of the third depth map neural network block is connected to the third depth branch and the seventh depth branch respectively, and the output of the fourth depth map neural network block is connected to the fourth depth branch and the eighth depth branch respectively. The first depth branch comprises a first depth attention layer, a first depth up-sampling layer and a fifth attention convolution layer which are connected in sequence, the second depth branch comprises a second depth attention layer, a second depth up-sampling layer and a sixth attention convolution layer which are connected in sequence, the third depth branch comprises a third depth attention layer, a third depth up-sampling layer and a seventh attention convolution layer which are connected in sequence, and the fourth depth branch comprises a fourth depth attention layer, a fourth depth up-sampling layer and an eighth attention convolution layer which are connected in sequence; the fifth depth branch comprises a first depth convolution layer and a fifth depth upsampling layer which are sequentially connected, the sixth depth branch comprises a second depth convolution layer and a sixth depth upsampling layer which are sequentially connected, the seventh depth branch comprises a third depth convolution layer and a seventh depth upsampling layer which are sequentially connected, and the eighth depth branch comprises a fourth depth convolution layer and an eighth depth upsampling layer which are sequentially connected.
Each color attention layer or depth attention layer is composed of a CBAM (Convolutional Block Attention Module) module. This operation changes neither the feature map size nor the number of channels: the 1st color attention layer and the 1st depth attention layer therefore still output 256 feature maps, the 2nd ones still output 512 feature maps, the 3rd ones still output 1024 feature maps, and the 4th ones still output 2048 feature maps.
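CBAM is the published Convolutional Block Attention Module; the sketch below is a minimal version of it, with the reduction ratio of 16 and the 7 × 7 spatial kernel taken from the CBAM paper's defaults rather than from this patent.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    # Minimal CBAM-style attention: channel attention followed by spatial
    # attention; neither step changes the feature map size or channel count.
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # channel attention from average- and max-pooled descriptors
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        x = x * torch.sigmoid(avg + mx)
        # spatial attention from channel-wise average and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True)[0]], dim=1)
        return x * torch.sigmoid(self.spatial(s))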
Each attention convolution layer consists of one convolution layer. For the 1st attention convolution layer, the convolution kernel size is 3 × 3, the number of convolution kernels is 256, the zero-padding parameter is 1 and the stride is 1; it outputs 256 feature maps, and the set formed by these 256 feature maps is denoted S1. For the 2nd attention convolution layer, the kernel size is 5 × 5, the number of kernels is 256, the zero-padding parameter is 2 and the stride is 1; its 256 output feature maps form the set S2. For the 3rd attention convolution layer, the kernel size is 3 × 3, 256 kernels, zero-padding 1, stride 1; its 256 output feature maps form the set S3. For the 4th attention convolution layer, the kernel size is 5 × 5, 256 kernels, zero-padding 2, stride 1; its 256 output feature maps form the set S4. For the 5th attention convolution layer, the kernel size is 3 × 3, 256 kernels, zero-padding 1, stride 1; its 256 output feature maps form the set G1. For the 6th attention convolution layer, the kernel size is 5 × 5, 256 kernels, zero-padding 2, stride 1; its 256 output feature maps form the set G2. For the 7th attention convolution layer, the kernel size is 3 × 3, 256 kernels, zero-padding 1, stride 1; its 256 output feature maps form the set G3. For the 8th attention convolution layer, the kernel size is 5 × 5, 256 kernels, zero-padding 2, stride 1; its 256 output feature maps form the set G4.
For the 1st multiplication operation, S1 and S2 are multiplied element-wise, 256 feature maps are output, and the set of these 256 feature maps is denoted S1S2; it serves as one input of the low-level feature convolution layer. For the 2nd multiplication operation, S3 and S4 are multiplied, 256 feature maps are output and their set is denoted S3S4; it serves as one input of the high-level feature convolution layer. For the 3rd multiplication operation, G1 and G2 are multiplied, 256 feature maps are output and their set is denoted G1G2; it serves as the other input of the low-level feature convolution layer. For the 4th multiplication operation, G3 and G4 are multiplied, 256 feature maps are output and their set is denoted G3G4; it serves as the other input of the high-level feature convolution layer.
Each color convolution layer or depth convolution layer is comprised of one convolution. For the 1 st color convolution layer, the convolution kernel size is 3 × 3, the number of convolution kernels is 512, the zero padding parameter is 1, the step size is 1, and the output is 512 feature maps. For the 2 nd color convolution layer, the convolution kernel size is 3 × 3, the number of convolution kernels is 512, the zero padding parameter is 1, the step size is 1, and the output is 512 feature maps. For the 3 rd color convolution layer, the size of convolution kernel is 3 × 3, the number of convolution kernels is 512, the zero padding parameter is 1, the step size is 1, and the output is 512 feature maps. For the 4 th color convolution layer, the size of convolution kernel is 3 × 3, the number of convolution kernels is 512, the zero padding parameter is 1, the step size is 1, and the output is 512 feature maps. For the 1 st depth convolution layer, the size of convolution kernel is 3 × 3, the number of convolution kernels is 512, the zero padding parameter is 1, the step size is 1, and the output is 512 feature maps. For the 2 nd depth convolution layer, the size of convolution kernel is 3 × 3, the number of convolution kernels is 512, the zero padding parameter is 1, the step size is 1, and the output is 512 feature maps. For the 3 rd depth convolution layer, the size of convolution kernel is 3 × 3, the number of convolution kernels is 512, the zero padding parameter is 1, the step size is 1, and the output is 512 feature maps. For the 4 th depth convolution layer, the size of convolution kernels is 3 × 3, the number of convolution kernels is 512, the zero padding parameter is 1, the step size is 1, and 512 feature maps are output.
Each color upsampling layer or depth upsampling layer is used for performing an upsampling process of bilinear interpolation on the input features.
For the 1st to 4th color upsampling layers and the 1st to 4th depth upsampling layers, the width of each output feature map is set to W/4 and the height is set to H/4; these operations do not change the number of feature maps.
For the 5th color upsampling layer, the width of each output feature map is set to W/4 and the height to H/4; this operation does not change the number of feature maps, 512 feature maps are output, and the set formed by these 512 feature maps is denoted U1. For the 6th color upsampling layer, the output width is likewise W/4 and the height H/4; 512 feature maps are output and their set is denoted U2. For the 7th color upsampling layer, the output width is W/4 and the height H/4; 512 feature maps are output and their set is denoted U3. For the 8th color upsampling layer, the output width is W/4 and the height H/4; 512 feature maps are output and their set is denoted U4. For the 5th depth upsampling layer, the output width is W/4 and the height H/4; 512 feature maps are output and their set is denoted F1. For the 6th depth upsampling layer, the output width is W/4 and the height H/4; 512 feature maps are output and their set is denoted F2. For the 7th depth upsampling layer, the output width is W/4 and the height H/4; 512 feature maps are output and their set is denoted F3. For the 8th depth upsampling layer, the output width is W/4 and the height H/4; 512 feature maps are output and their set is denoted F4.
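The upsampling and branch compositions above can be sketched as follows; UpsampleTo is an illustrative helper, and the 56 × 56 target corresponds to the W/4 × H/4 resolution assumed for a 224 × 224 training input.

import torch.nn as nn
import torch.nn.functional as F

class UpsampleTo(nn.Module):
    # Bilinear upsampling to a fixed spatial size; the number of feature maps
    # is unchanged, as for every color/depth upsampling layer in the text.
    def __init__(self, size):
        super().__init__()
        self.size = size

    def forward(self, x):
        return F.interpolate(x, size=self.size, mode='bilinear', align_corners=False)

# Example: the fifth RGB branch (first color convolution layer followed by the
# fifth color upsampling layer) applied to the 256-channel P1 features.
fifth_rgb_branch = nn.Sequential(
    nn.Conv2d(256, 512, kernel_size=3, padding=1),
    UpsampleTo((56, 56)),
)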
The outputs of both the low-level feature convolution layer and the high-level feature convolution layer are input into the mixed feature convolution layer. The 1st high-level feature convolution layer consists of one convolution with a kernel size of 3 × 3, 256 convolution kernels, a zero-padding parameter of 1 and a stride of 1, and outputs 256 feature maps; it fuses the high-level features of the RGB map and the depth map. The 1st low-level feature convolution layer consists of one convolution with a kernel size of 3 × 3, 256 kernels, zero-padding 1 and stride 1, and outputs 256 feature maps; it fuses the low-level features of the RGB map and the depth map. The 1st mixed feature convolution layer consists of one convolution with a kernel size of 3 × 3, 256 kernels, zero-padding 1 and stride 1, and outputs 256 feature maps; the set formed by these 256 feature maps is denoted X1.
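A sketch of these three convolution layers is given below. The patent feeds two 256-channel products or outputs to each layer without stating how they are combined, so concatenation along the channel dimension is assumed here; the class name is illustrative.

import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    # Low-level, high-level and mixed feature convolution layers (each 3x3,
    # 256 kernels, padding 1, stride 1), fed with channel-wise concatenations.
    def __init__(self):
        super().__init__()
        self.low = nn.Conv2d(512, 256, 3, padding=1)
        self.high = nn.Conv2d(512, 256, 3, padding=1)
        self.mixed = nn.Conv2d(512, 256, 3, padding=1)

    def forward(self, s1, s2, s3, s4, g1, g2, g3, g4):
        low = self.low(torch.cat([s1 * s2, g1 * g2], dim=1))    # low-level RGB and depth products
        high = self.high(torch.cat([s3 * s4, g3 * g4], dim=1))  # high-level RGB and depth products
        return self.mixed(torch.cat([low, high], dim=1))        # mixed features X1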
For the 5th multiplication operation, the result of adding U1 and U2 is multiplied element-wise by the result of adding F1 and F2, and 512 feature maps are output. For the 6th multiplication operation, the result of adding U3 and U4 is multiplied element-wise by the result of adding F3 and F4, and 512 feature maps are output. That is, the fusion result of the fifth RGB branch and the sixth RGB branch is multiplied by the fusion result of the fifth depth branch and the sixth depth branch and then input into the detail information processing module, and the fusion result of the seventh RGB branch and the eighth RGB branch is multiplied by the fusion result of the seventh depth branch and the eighth depth branch and then input into the global information processing module.
The detail information processing module comprises a first network module and a second transition convolution layer which are sequentially connected; the input of the detail information processing module also passes through a first transition convolution layer, and the output of the first transition convolution layer is fused with the output of the second transition convolution layer to serve as the output of the detail information processing module. The 1st network module uses the Dense block of the DenseNet network, with the parameters set as follows: the number of layers is 6, the size is 4 and the growth rate is 4, so that 536 feature maps are output. The 1st transition convolution layer consists of one convolution with a kernel size of 3 × 3, 256 convolution kernels, a zero-padding parameter of 1 and a stride of 1; it outputs 256 feature maps, and the set formed by these 256 feature maps is denoted H1. The 2nd transition convolution layer consists of one convolution with a kernel size of 3 × 3, 256 kernels, a zero-padding parameter of 1 and a stride of 1; it outputs 256 feature maps, and the set formed by them is denoted H2.
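The detail information processing module can be sketched as below. The dense block follows the 6-layer, growth-rate-4 setting of the text (512 input maps become 536); fusing H1 and H2 by element-wise addition and the exact layer ordering inside the dense layers are assumptions, and all names are illustrative.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    # DenseNet-style block: each layer produces `growth` new feature maps that
    # are concatenated to its input, so 512 channels with 6 layers and growth
    # rate 4 give 512 + 6 * 4 = 536 output channels.
    def __init__(self, in_channels=512, num_layers=6, growth=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1, bias=False),
            ))
            ch += growth

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)
        return x

class DetailModule(nn.Module):
    # Detail information processing module: the 512-map input goes through the
    # first transition convolution (-> H1) and, in parallel, through the dense
    # block and the second transition convolution (-> H2); H1 and H2 are fused
    # as the module output.
    def __init__(self, channels=512, out_channels=256, num_layers=6, growth=4):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, out_channels, 3, padding=1)
        self.dense = DenseBlock(channels, num_layers, growth)
        self.conv2 = nn.Conv2d(channels + num_layers * growth, out_channels, 3, padding=1)

    def forward(self, x):
        h1 = self.conv1(x)
        h2 = self.conv2(self.dense(x))
        return h1 + h2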
The global information processing module comprises three processing branches; each processing branch comprises a global network module and a global convolution layer which are sequentially connected, and the outputs of the three branches are fused to form the output of the global information processing module. Each global network module is an ASPP (Atrous Spatial Pyramid Pooling) module. The 1st, 2nd and 3rd global network modules each output 512 feature maps. Each global convolution layer consists of one convolution layer. For the 1st global convolution layer, the kernel size is 3 × 3, the number of kernels is 256, the zero-padding parameter is 1 and the stride is 1; it outputs 256 feature maps, and the set formed by them is denoted E1. For the 2nd global convolution layer, the kernel size is 5 × 5, 256 kernels, zero-padding 2, stride 1; its 256 output feature maps form the set E2. For the 3rd global convolution layer, the kernel size is 7 × 7, 256 kernels, zero-padding 3, stride 1; its 256 output feature maps form the set E3.
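A sketch of the global information processing module is given below. The dilation rates of the ASPP branches and the fusion of E1, E2 and E3 by addition are assumptions (the patent only names the ASPP module and the three convolution layers); all names are illustrative.

import torch
import torch.nn as nn

class ASPP(nn.Module):
    # Minimal atrous spatial pyramid pooling: parallel 3x3 convolutions with
    # different dilation rates, concatenated and projected back to out_channels.
    def __init__(self, in_channels=512, out_channels=512, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 3, padding=r, dilation=r, bias=False),
                nn.ReLU(inplace=True),
            ) for r in rates
        ])
        self.project = nn.Conv2d(out_channels * len(rates), out_channels, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class GlobalModule(nn.Module):
    # Three ASPP + convolution branches (kernel sizes 3, 5 and 7) whose
    # 256-channel outputs E1, E2 and E3 are fused into one output.
    def __init__(self, in_channels=512):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(ASPP(in_channels, 512),
                          nn.Conv2d(512, 256, k, padding=k // 2))
            for k in (3, 5, 7)
        ])

    def forward(self, x):
        e1, e2, e3 = [b(x) for b in self.branches]
        return e1 + e2 + e3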
The output of the mixed feature convolution layer and the output of the detail information processing module are fused to form one input of the SKNet network model, and the output of the mixed feature convolution layer and the output of the global information processing module are fused to form the other input of the SKNet network model. The 1st SKNet consists of a Selective Kernel Network; the SKNet network model has two inputs, the first being the sum of H1, H2 and X1 and the second the sum of E1, E2, E3 and X1, and the input parameters are 256 feature maps, each with a width of W/4 and a height of H/4. The output of this operation is still 256 feature maps, and the size of the maps is unchanged.
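The SKNet model referred to above is the published Selective Kernel Network; the sketch below only illustrates a minimal selective-kernel-style fusion of the two fused inputs, with the reduction ratio and the absence of batch normalization being simplifying assumptions.

import torch
import torch.nn as nn

class SKFusion(nn.Module):
    # Channel-wise attention weights decide, per channel, how much of each of
    # the two same-shaped inputs contributes to the fused 256-map output.
    def __init__(self, channels=256, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 32)
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.Conv2d(hidden, channels * 2, 1, bias=False)

    def forward(self, a, b):
        u = a + b                                    # fuse the two inputs
        w = self.attn(self.squeeze(u))               # (N, 2*C, 1, 1)
        w = w.view(w.size(0), 2, -1, 1, 1).softmax(dim=1)
        return a * w[:, 0] + b * w[:, 1]             # selective combination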
The post-processing module comprises a first deconvolution layer and a second deconvolution layer which are sequentially connected; the input of the post-processing module is the output of the SKNet network model, and the output of the post-processing module is finally output through the output layer. The 1st deconvolution layer consists of one deconvolution with a kernel size of 2 × 2, 128 convolution kernels, a zero-padding parameter of 0 and a stride of 2, and each output feature map has a width of W/2 and a height of H/2. The 2nd deconvolution layer consists of one deconvolution with a kernel size of 2 × 2, 1 convolution kernel, a zero-padding parameter of 0 and a stride of 2, and each output feature map has a width of W and a height of H.
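The post-processing module can be sketched with two stride-2 transposed convolutions, which exactly double the spatial size at each step and restore the 256 maps of size W/4 × H/4 to a single W × H map; the final sigmoid that maps the output into [0, 1] is an assumption.

import torch.nn as nn

post_processing = nn.Sequential(
    nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2, padding=0),  # W/4 -> W/2
    nn.ConvTranspose2d(128, 1, kernel_size=2, stride=2, padding=0),    # W/2 -> W
    nn.Sigmoid(),
)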
Step 1_3: the size of each original color real target image in the training set is converted to 224 × 224 to serve as an original RGB input image, and the depth image corresponding to each original color real target image in the training set is converted to 224 × 224 and expanded into a three-channel image to serve as a depth input image; these are input into ResNet50 for pre-training, and after pre-training the corresponding feature maps are input into the model for training. A saliency detection prediction image corresponding to each color real target image in the training set is obtained, and the set of saliency detection prediction maps corresponding to {I_q(i,j)} is denoted {S_q(i,j)}.
Step 1_4: the loss function value between the set formed by the saliency detection prediction image corresponding to each original color real target image in the training set and the set formed by the correspondingly sized real saliency detection image is calculated; the loss function value between {S_q(i,j)} and {G_q(i,j)} is denoted Loss_q and is obtained using the BCE loss function.
Step 1_5: steps 1_3 and 1_4 are repeated V times to obtain the convolutional neural network classification training model, and Q × V loss function values are obtained; the loss function value with the minimum value is then found among the Q × V loss function values, and the weight vector and bias term corresponding to this minimum loss function value are taken as the optimal weight vector and the optimal bias term of the convolutional neural network classification training model, correspondingly denoted W_best and b_best; here V > 1, and in this example V = 100.
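Steps 1_3 to 1_5 can be sketched as the training loop below. The optimizer, learning rate, data loader format and the model taking (rgb, depth) as two inputs are illustrative assumptions; only the BCE loss and the retention of the weights with the smallest loss value follow the text.

import copy
import torch
import torch.nn as nn

def train(model, loader, epochs=100, lr=1e-4, device='cuda'):
    # loader is assumed to yield (rgb, depth, gt) batches with saliency ground
    # truth gt in [0, 1]; the model is assumed to output values in [0, 1].
    model = model.to(device)
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_loss, best_state = float('inf'), None
    for _ in range(epochs):                         # V repetitions
        for rgb, depth, gt in loader:               # Q training samples
            rgb, depth, gt = rgb.to(device), depth.to(device), gt.to(device)
            pred = model(rgb, depth)                # predicted saliency map
            loss = criterion(pred, gt)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if loss.item() < best_loss:             # keep weights of the smallest loss
                best_loss = loss.item()
                best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model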
The specific steps of the test phase process of the embodiment are as follows:
Step 2_1: let {I'(i',j')} denote the color real target image to be saliency-detected, and let {D'(i',j')} denote the corresponding depth image of the real target to be saliency-detected; here 1 ≤ i' ≤ W' and 1 ≤ j' ≤ H', where W' denotes the width of {I'(i',j')} and H' denotes its height, I'(i',j') denotes the pixel value of the pixel whose coordinate position in {I'(i',j')} is (i',j'), and D'(i',j') denotes the pixel value of the pixel whose coordinate position in {D'(i',j')} is (i',j').
Step 2_2: the R, G and B channel components of {I'(i',j')} and the three channel components of the transformed {D'(i',j')} are input into the convolutional neural network classification training model, and prediction is performed using W_best and b_best to obtain the predicted saliency detection image corresponding to {I'(i',j')} and {D'(i',j')}, denoted {S'(i',j')}, where S'(i',j') denotes the pixel value of the pixel whose coordinate position in {S'(i',j')} is (i',j').
To further verify the feasibility and effectiveness of the method of the invention, experiments were conducted on the method of the invention.
The convolutional neural network described above is built using the Python-based deep learning library PyTorch 4.0.1. The test set of the real object image database NJU2000 (397 real object images) is used to analyze the saliency detection effect obtained by predicting real scene images with the method of the invention. Here, 3 common objective parameters of saliency detection methods are used as evaluation indexes to evaluate the detection performance of the predicted saliency detection images, namely the Precision-Recall curve (PR curve), the Receiver Operating Characteristic curve (ROC) and the Mean Absolute Error (MAE).
The method of the invention is utilized to predict each real scene image in the real scene image database NJU2000 test set, and the predicted saliency detection image corresponding to each real scene image is obtained.
FIG. 4a reflects the Precision-Recall curve (PR curve) of the saliency detection effect of the method of the present invention; the closer the resulting curve is to 1, the better.
FIG. 4b reflects the Receiver Operating Characteristic curve (ROC) of the saliency detection effect of the method of the present invention; the closer the resulting curve is to 1, the better.
FIG. 4c reflects the Mean Absolute Error (MAE) of the saliency detection of the method of the present invention; a lower MAE indicates better detection.
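The MAE used above can be computed as in the sketch below; both maps are assumed to be normalised to [0, 1], and the function name is illustrative.

import torch

def mean_absolute_error(pred, gt):
    # Mean absolute error between a predicted saliency map and its ground
    # truth; lower values indicate better detection.
    return torch.mean(torch.abs(pred.float() - gt.float())).item()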
As can be seen from the figures, the saliency detection results obtained for real scene images by the method of the invention are very good, which shows that obtaining the predicted saliency detection images corresponding to real scene images with this method is feasible and effective.

Claims (7)

1. A multi-feature cascade RGB-D salient object detection method is characterized by comprising the following steps:
step 1_ 1: selecting Q original RGB images and depth maps corresponding to the original RGB images, and combining the real saliency images corresponding to the original RGB images to form a training set;
step 1_ 2: constructing a convolutional neural network: the convolutional neural network comprises two input layers, a hidden layer and an output layer, wherein the two input layers are connected to the input end of the hidden layer, and the output end of the hidden layer is connected to the output layer;
step 1_ 3: inputting each original RGB image in the training set and the depth image corresponding to the original RGB image in the training set as original input images of two input layers into a convolutional neural network for training to obtain a prediction saliency image corresponding to each original RGB image in the training set; calculating a loss function value between the prediction saliency image corresponding to each original RGB image in the training set and the corresponding real saliency image, wherein the loss function value is obtained by adopting a BCE (binary cross-entropy) loss function;
step 1_ 4: repeatedly executing the step 1_3 for V times to obtain a convolutional neural network classification training model, and obtaining Q multiplied by V loss function values; then finding out the loss function value with the minimum value from the Q multiplied by V loss function values; then, the weight vector and the bias item corresponding to the loss function value with the minimum value are correspondingly used as the optimal weight vector and the optimal bias item, and the weight vector and the bias item in the trained convolutional neural network training model are replaced;
step 1_ 5: inputting the RGB image to be predicted and the depth image corresponding to the RGB image to be predicted into a trained convolutional neural network training model, and predicting by using the optimal weight vector and the optimal bias term to obtain a predicted saliency image corresponding to the RGB image to be predicted, thereby realizing saliency target detection.
2. The method as claimed in claim 1, wherein the step 1_2 includes two input layers, the 1 st input layer is an RGB image input layer, and the 2 nd input layer is a depth image input layer; the hidden layer comprises an RGB (red, green and blue) feature extraction module, a depth feature extraction module, a mixed feature convolution layer, a detail information processing module, a global information processing module, an SKNet network model and a post-processing module;
the RGB feature extraction module comprises four color map neural network blocks, four color attention layers, eight color up-sampling layers, four attention convolution layers and four color convolution layers which are sequentially connected; the four color map neural network blocks which are connected in sequence respectively correspond to four modules which are connected in sequence in ResNet50, the output of the first color map neural network block is respectively connected to the first RGB branch and the fifth RGB branch, the output of the second color map neural network block is respectively connected to the second RGB branch and the sixth RGB branch, the output of the third color map neural network block is respectively connected to the third RGB branch and the seventh RGB branch, and the output of the fourth color map neural network block is respectively connected to the fourth RGB branch and the eighth RGB branch;
the depth feature extraction module comprises four depth map neural network blocks, four depth attention layers, eight depth up-sampling layers, four attention convolution layers and four depth convolution layers; the four sequentially connected depth map neural network blocks respectively correspond to the four sequentially connected modules in ResNet50; the output of the first depth map neural network block is connected to the first depth branch and to the fifth depth branch, the output of the second depth map neural network block is connected to the second depth branch and to the sixth depth branch, the output of the third depth map neural network block is connected to the third depth branch and to the seventh depth branch, and the output of the fourth depth map neural network block is connected to the fourth depth branch and to the eighth depth branch;
the outputs of the first RGB branch and the second RGB branch are multiplied to serve as one input of the low-level feature convolution layer, and the outputs of the first depth branch and the second depth branch are multiplied to serve as the other input of the low-level feature convolution layer; the outputs of the third RGB branch and the fourth RGB branch are multiplied to serve as one input of the high-level feature convolution layer, and the outputs of the third depth branch and the fourth depth branch are multiplied to serve as the other input of the high-level feature convolution layer;
the outputs of the low-level feature convolution layer and the high-level feature convolution layer are input into the mixed feature convolution layer;
the fusion result of the fifth RGB branch and the sixth RGB branch is multiplied by the fusion result of the fifth depth branch and the sixth depth branch and then input into the detail information processing module; the fusion result of the seventh RGB branch and the eighth RGB branch is multiplied by the fusion result of the seventh depth branch and the eighth depth branch and then input into the global information processing module;
the output of the mixed feature convolution layer and the output of the detail information processing module are fused to serve as one input of the SKNet network model, and the output of the mixed feature convolution layer and the output of the global information processing module are fused to serve as the other input of the SKNet network model;
the post-processing module comprises a first deconvolution layer and a second deconvolution layer which are sequentially connected; the input of the post-processing module is the output of the SKNet network model, and the output of the post-processing module is finally output through the output layer.
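As an illustrative PyTorch sketch of the cross-modal fusion of claim 2 above: paired RGB and depth branch outputs are multiplied element-wise and fed into a two-input convolution layer. The claim does not say how a convolution layer combines its two inputs, so channel concatenation is assumed here, and the channel count and spatial size are placeholder values.

```python
import torch
import torch.nn as nn

class TwoInputConv(nn.Module):
    """A convolution layer with two inputs; concatenation along the channel
    dimension is an assumption, since the claim only lists the two inputs."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(2 * ch, ch, kernel_size=3, padding=1)

    def forward(self, a, b):
        return self.conv(torch.cat([a, b], dim=1))

# Hypothetical branch outputs, all brought to a common resolution by the
# up-sampling layers of claims 3 and 5; 64 channels is an assumed value.
ch, h, w = 64, 88, 88
rgb1, rgb2, rgb3, rgb4 = (torch.randn(1, ch, h, w) for _ in range(4))
dep1, dep2, dep3, dep4 = (torch.randn(1, ch, h, w) for _ in range(4))

low_conv, high_conv, mixed_conv = TwoInputConv(ch), TwoInputConv(ch), TwoInputConv(ch)
low = low_conv(rgb1 * rgb2, dep1 * dep2)     # low-level feature convolution layer
high = high_conv(rgb3 * rgb4, dep3 * dep4)   # high-level feature convolution layer
mixed = mixed_conv(low, high)                # mixed feature convolution layer
print(mixed.shape)                           # torch.Size([1, 64, 88, 88])
```

The two inputs of the SKNet network model are then formed in the same spirit, by fusing the mixed feature with the detail and global features respectively.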
3. The multi-feature cascade RGB-D significance target detection method as claimed in claim 2, wherein
the first RGB branch comprises a first color attention layer, a first color up-sampling layer and a first attention convolution layer which are connected in sequence, the second RGB branch comprises a second color attention layer, a second color up-sampling layer and a second attention convolution layer which are connected in sequence, the third RGB branch comprises a third color attention layer, a third color up-sampling layer and a third attention convolution layer which are connected in sequence, and the fourth RGB branch comprises a fourth color attention layer, a fourth color up-sampling layer and a fourth attention convolution layer which are connected in sequence;
the fifth RGB branch comprises a first color convolution layer and a fifth color up-sampling layer which are sequentially connected, the sixth RGB branch comprises a second color convolution layer and a sixth color up-sampling layer which are sequentially connected, the seventh RGB branch comprises a third color convolution layer and a seventh color up-sampling layer which are sequentially connected, and the eighth RGB branch comprises a fourth color convolution layer and an eighth color up-sampling layer which are sequentially connected;
the first depth branch comprises a first depth attention layer, a first depth up-sampling layer and a fifth attention convolution layer which are connected in sequence, the second depth branch comprises a second depth attention layer, a second depth up-sampling layer and a sixth attention convolution layer which are connected in sequence, the third depth branch comprises a third depth attention layer, a third depth up-sampling layer and a seventh attention convolution layer which are connected in sequence, and the fourth depth branch comprises a fourth depth attention layer, a fourth depth up-sampling layer and an eighth attention convolution layer which are connected in sequence;
the fifth depth branch comprises a first depth convolution layer and a fifth depth up-sampling layer which are sequentially connected, the sixth depth branch comprises a second depth convolution layer and a sixth depth up-sampling layer which are sequentially connected, the seventh depth branch comprises a third depth convolution layer and a seventh depth up-sampling layer which are sequentially connected, and the eighth depth branch comprises a fourth depth convolution layer and an eighth depth up-sampling layer which are sequentially connected.
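All eight RGB branches and eight depth branches of claim 3 follow one of the two layouts sketched below in PyTorch. The attention module is left as a parameter (claim 5 specifies a CBAM module; nn.Identity() is only a placeholder default), and the 64 output channels and the scale factors are assumed values, not taken from the patent.

```python
import torch
import torch.nn as nn

def attention_branch(in_ch, scale, attention=None):
    """Branches 1-4: attention layer -> up-sampling layer -> attention
    convolution layer (claim 3); the attention layer is CBAM per claim 5."""
    return nn.Sequential(
        attention if attention is not None else nn.Identity(),
        nn.Upsample(scale_factor=scale, mode='bilinear', align_corners=False),
        nn.Conv2d(in_ch, 64, kernel_size=3, padding=1))

def conv_branch(in_ch, scale):
    """Branches 5-8: convolution layer -> up-sampling layer (claim 3)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, kernel_size=3, padding=1),
        nn.Upsample(scale_factor=scale, mode='bilinear', align_corners=False))

# e.g. branches fed by the first ResNet50 block (256 channels in standard ResNet50)
first_rgb_branch = attention_branch(256, scale=2)
fifth_rgb_branch = conv_branch(256, scale=2)
print(first_rgb_branch(torch.randn(1, 256, 44, 44)).shape)   # torch.Size([1, 64, 88, 88])
```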
4. The multi-feature cascade RGB-D significance target detection method as claimed in claim 2, wherein the detail information processing module comprises a first network module, a first over-convolution layer and a second over-convolution layer which are sequentially connected, and the input of the detail information processing module is fused with the output of the first over-convolution layer and the output of the second over-convolution layer to serve as the output of the detail information processing module;
the global information processing module comprises three processing branches, each of which comprises a global network module and a global convolution layer which are sequentially connected, and the outputs of the three processing branches are fused to serve as the output of the global information processing module.
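A minimal PyTorch sketch of the three-branch layout of the global information processing module described in claim 4 (with the ASPP global network module named in claim 6). The dilation rates, channel count and the use of element-wise addition for the final fusion are assumptions; the claim only states that the three branch outputs are fused.

```python
import torch
import torch.nn as nn

class TinyASPP(nn.Module):
    """Simplified stand-in for the ASPP global network module: parallel
    dilated convolutions whose outputs are summed (rates are assumed)."""
    def __init__(self, ch, rates=(1, 6, 12)):
        super().__init__()
        self.paths = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates)

    def forward(self, x):
        return sum(p(x) for p in self.paths)

class GlobalInfoModule(nn.Module):
    """Three processing branches, each a global network module followed by a
    global convolution layer; element-wise addition is assumed for the fusion."""
    def __init__(self, ch):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(TinyASPP(ch), nn.Conv2d(ch, ch, 3, padding=1))
            for _ in range(3))

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)

x = torch.randn(1, 64, 44, 44)            # assumed feature size
print(GlobalInfoModule(64)(x).shape)      # torch.Size([1, 64, 44, 44])
```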
5. The multi-feature cascade RGB-D significance target detection method as claimed in claim 3, wherein each of the color attention layers and the depth attention layers uses a CBAM module, and each of the color up-sampling layers and the depth up-sampling layers performs bilinear-interpolation up-sampling of its input features; each of the attention convolution layers, the color convolution layers, the depth convolution layers, the low-level feature convolution layer, the high-level feature convolution layer and the mixed feature convolution layer comprises one convolution layer; and each of the deconvolution layers comprises one deconvolution.
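For reference, a minimal PyTorch sketch of a CBAM-style attention layer (channel attention followed by spatial attention) and of the bilinear up-sampling named in claim 5. The reduction ratio and kernel sizes are the usual CBAM defaults, assumed here rather than taken from the patent.

```python
import torch
import torch.nn as nn

class MiniCBAM(nn.Module):
    """Minimal CBAM-style attention: channel attention then spatial attention."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared MLP for avg/max descriptors
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch, 1))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # channel attention from global average- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # spatial attention from channel-wise average and max maps
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

# a color/depth up-sampling layer: bilinear interpolation of the input features
up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
y = up(MiniCBAM(64)(torch.randn(1, 64, 22, 22)))
print(y.shape)                                    # torch.Size([1, 64, 44, 44])
```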
6. The method as claimed in claim 4, wherein each over-convolution layer in the detail information processing module and each global convolution layer in the global information processing module comprise one convolution layer; the first network module in the detail information processing module uses a Dense block of a DenseNet network, and each global network module in the global information processing module uses an ASPP module.
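Likewise, a minimal PyTorch sketch of a DenseNet-style Dense block, the structure claim 6 names for the first network module; the growth rate and number of layers are assumed values.

```python
import torch
import torch.nn as nn

class MiniDenseBlock(nn.Module):
    """Each layer receives the concatenation of all previous feature maps."""
    def __init__(self, ch, growth=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.BatchNorm2d(ch + i * growth), nn.ReLU(),
                          nn.Conv2d(ch + i * growth, growth, 3, padding=1))
            for i in range(n_layers))

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)        # ch + n_layers * growth channels

block = MiniDenseBlock(64)
print(block(torch.randn(1, 64, 44, 44)).shape)   # torch.Size([1, 192, 44, 44])
```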
7. The method as claimed in claim 2, wherein the input end of the RGB image input layer receives an original RGB input image, and the input end of the depth image input layer receives the depth image corresponding to the RGB image; the inputs of the RGB feature extraction module and the depth feature extraction module are the output of the RGB image input layer and the output of the depth image input layer, respectively.
CN201911099871.2A 2019-11-12 2019-11-12 Multi-feature cascading RGB-D significance target detection method Active CN110929736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911099871.2A CN110929736B (en) 2019-11-12 2019-11-12 Multi-feature cascading RGB-D significance target detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911099871.2A CN110929736B (en) 2019-11-12 2019-11-12 Multi-feature cascading RGB-D significance target detection method

Publications (2)

Publication Number Publication Date
CN110929736A true CN110929736A (en) 2020-03-27
CN110929736B CN110929736B (en) 2023-05-26

Family

ID=69852888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911099871.2A Active CN110929736B (en) 2019-11-12 2019-11-12 Multi-feature cascading RGB-D significance target detection method

Country Status (1)

Country Link
CN (1) CN110929736B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181593A1 (en) * 2016-12-28 2018-06-28 Shutterstock, Inc. Identification of a salient portion of an image
CN109409435A (en) * 2018-11-01 2019-03-01 上海大学 Depth-aware saliency detection method based on convolutional neural networks
CN109903276A (en) * 2019-02-23 2019-06-18 中国民航大学 Convolutional neural network RGB-D saliency detection method based on multilayer fusion
CN110263813A (en) * 2019-05-27 2019-09-20 浙江科技学院 Saliency detection method based on fusion of a residual network and depth information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QINGHUA REN ET AL.: "Multi-scale deep encoder-decoder network for salient object detection", Neurocomputing *
ZHANG SONGLONG ET AL.: "Saliency detection based on cascaded fully convolutional neural networks", Laser & Optoelectronics Progress *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461043B (en) * 2020-04-07 2023-04-18 河北工业大学 Video significance detection method based on deep network
CN111461043A (en) * 2020-04-07 2020-07-28 河北工业大学 Video significance detection method based on deep network
CN111666854A (en) * 2020-05-29 2020-09-15 武汉大学 High-resolution SAR image vehicle target detection method fusing statistical significance
CN111666854B (en) * 2020-05-29 2022-08-30 武汉大学 High-resolution SAR image vehicle target detection method fusing statistical significance
CN111768375B (en) * 2020-06-24 2022-07-26 海南大学 Asymmetric GM multi-mode fusion significance detection method and system based on CWAM
CN111768375A (en) * 2020-06-24 2020-10-13 海南大学 Asymmetric GM multi-mode fusion significance detection method and system based on CWAM
CN111985552A (en) * 2020-08-17 2020-11-24 中国民航大学 Method for detecting diseases of thin strip-shaped structure of airport pavement under complex background
CN112330642A (en) * 2020-11-09 2021-02-05 山东师范大学 Pancreas image segmentation method and system based on double-input full convolution network
CN112330642B (en) * 2020-11-09 2022-11-04 山东师范大学 Pancreas image segmentation method and system based on double-input full convolution network
CN112580694B (en) * 2020-12-01 2024-04-19 中国船舶重工集团公司第七0九研究所 Small sample image target recognition method and system based on joint attention mechanism
CN112580694A (en) * 2020-12-01 2021-03-30 中国船舶重工集团公司第七0九研究所 Small sample image target identification method and system based on joint attention mechanism
CN112507933B (en) * 2020-12-16 2022-09-16 南开大学 Saliency target detection method and system based on centralized information interaction
CN112507933A (en) * 2020-12-16 2021-03-16 南开大学 Saliency target detection method and system based on centralized information interaction
CN112528899B (en) * 2020-12-17 2022-04-12 南开大学 Image salient object detection method and system based on implicit depth information recovery
CN112528899A (en) * 2020-12-17 2021-03-19 南开大学 Image salient object detection method and system based on implicit depth information recovery
CN112651406B (en) * 2020-12-18 2022-08-09 浙江大学 Depth perception and multi-mode automatic fusion RGB-D significance target detection method
CN112651406A (en) * 2020-12-18 2021-04-13 浙江大学 Depth perception and multi-mode automatic fusion RGB-D significance target detection method
CN113516022A (en) * 2021-04-23 2021-10-19 黑龙江机智通智能科技有限公司 Fine-grained classification system for cervical cells
CN113516022B (en) * 2021-04-23 2023-01-10 黑龙江机智通智能科技有限公司 Fine-grained classification system for cervical cells
CN114723951A (en) * 2022-06-08 2022-07-08 成都信息工程大学 Method for RGB-D image segmentation

Also Published As

Publication number Publication date
CN110929736B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN110929736B (en) Multi-feature cascading RGB-D significance target detection method
CN108537742B (en) Remote sensing image panchromatic sharpening method based on generation countermeasure network
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
CN108510532B (en) Optical and SAR image registration method based on deep convolution GAN
CN110782462B (en) Semantic segmentation method based on double-flow feature fusion
CN110728192B (en) High-resolution remote sensing image classification method based on novel characteristic pyramid depth network
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN110175986B (en) Stereo image visual saliency detection method based on convolutional neural network
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
CN110555434A (en) method for detecting visual saliency of three-dimensional image through local contrast and global guidance
CN111563418A (en) Asymmetric multi-mode fusion significance detection method based on attention mechanism
CN112507997A (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN110458178B (en) Multi-mode multi-spliced RGB-D significance target detection method
CN110619638A (en) Multi-mode fusion significance detection method based on convolution block attention module
CN110705566B (en) Multi-mode fusion significance detection method based on spatial pyramid pool
CN114638836B (en) Urban street view segmentation method based on highly effective driving and multi-level feature fusion
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN111445432A (en) Image significance detection method based on information fusion convolutional neural network
CN112132739A (en) 3D reconstruction and human face posture normalization method, device, storage medium and equipment
CN115439694A (en) High-precision point cloud completion method and device based on deep learning
CN113269224A (en) Scene image classification method, system and storage medium
CN112149662A (en) Multi-mode fusion significance detection method based on expansion volume block
CN110570402B (en) Binocular salient object detection method based on boundary perception neural network
CN116797787A (en) Remote sensing image semantic segmentation method based on cross-modal fusion and graph neural network
CN111783862A (en) Three-dimensional significant object detection technology of multi-attention-directed neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant