CN110929736B - Multi-feature cascaded RGB-D salient object detection method - Google Patents
Multi-feature cascaded RGB-D salient object detection method
- Publication number
- CN110929736B (application CN201911099871A / CN201911099871.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a multi-feature cascaded RGB-D salient object detection method. RGB images, their corresponding depth images and ground-truth saliency images are selected to form a training set. A convolutional neural network is constructed, and the training set is fed into it for training, yielding a predicted saliency image for each RGB image in the training set. Loss function values between each predicted saliency image and the corresponding ground-truth saliency image are computed, and training continues to find the weight vector and bias terms corresponding to the minimum loss value. The RGB image to be predicted and its depth image are then input into the trained convolutional neural network model to obtain the predicted segmentation image. The disclosed model has a novel structure, and the saliency maps it produces closely match the target maps.
Description
Technical Field
The invention relates to a method for detecting objects salient to the human eye, and in particular to a multi-feature cascaded RGB-D salient object detection method.
Background
Salient object detection is a branch of image processing and a field of computer vision. Computer vision, broadly speaking, is the discipline of giving machines natural visual capability, meaning the visual ability exhibited by biological vision systems. In essence, computer vision studies the problem of visual perception: how to organize input image information, recognize objects and scenes, and further interpret image content.
Computer vision has attracted growing interest and intensive research in recent decades, and has become increasingly capable of recognizing patterns in images. As the achievements of artificial intelligence and computer vision spread across industries, the field's prospects continue to widen. Salient object detection, the topic addressed here, is one such sub-problem, yet it plays a substantial role.
Saliency detection, which predicts where human visual attention falls in an image, has attracted extensive research interest in recent years. It serves as an important preprocessing step for problems such as image classification, image retargeting and object recognition. Compared with RGB saliency detection, RGB-D saliency detection has been studied far less. According to how saliency is defined, detection methods can be classified into top-down and bottom-up approaches. Top-down saliency detection is task-dependent and incorporates high-level features to locate salient objects. The bottom-up approach, by contrast, is task-free: it uses low-level features to pick out salient regions, motivated by biological vision.
Disclosure of Invention
To address the problems in the background art, the invention provides a multi-feature cascaded RGB-D salient object detection method; the saliency map obtained after model processing closely matches the target map, and the model structure is novel.
The technical scheme adopted by the invention comprises the following steps:
step 1_1: q original RGB images and corresponding depth maps thereof are selected, and a training set is formed by combining the true significance images corresponding to the original RGB images;
step 1_2: constructing a convolutional neural network: the convolutional neural network comprises two input layers, a hidden layer and an output layer, wherein the two input layers are connected to the input end of the hidden layer, and the output end of the hidden layer is connected to the output layer;
step 1_3: each original RGB image and the corresponding depth image in the training set are respectively used as the original input images of the two input layers and are input into a convolutional neural network for training, so that a prediction significance image corresponding to each original RGB image in the training set is obtained; calculating a loss function value between a predicted saliency image corresponding to each original RGB image in the training set and a corresponding real saliency image, wherein the loss function value is obtained by adopting a BCE loss function;
Step 1_4: repeating the step 1_3 for V times to obtain a convolutional neural network classification training model, and obtaining Q multiplied by V loss function values; then find out the smallest value of loss function value from Q X V pieces of loss function values; then, the weight vector and the bias term corresponding to the loss function value with the minimum value are correspondingly used as the optimal weight vector and the optimal bias term, and the weight vector and the bias term in the trained convolutional neural network training model are replaced;
step 1_5: inputting the RGB image to be predicted and the depth image corresponding to the RGB image to be predicted into a trained convolutional neural network training model, and predicting by utilizing an optimal weight vector and an optimal bias term to obtain a predicted saliency image corresponding to the RGB image to be predicted, thereby realizing saliency target detection.
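As a concrete illustration of steps 1_3 and 1_4, the following NumPy sketch computes the per-pixel BCE loss between a predicted saliency map and its ground truth, and selects the iteration with the minimum loss. The function name and the loss values in the list are illustrative, not part of the patent.

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Per-pixel binary cross-entropy between a predicted saliency
    map and the ground-truth map, averaged over all pixels."""
    p = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

# Step 1_4: after the Q*V training evaluations, keep the weights whose
# loss value is the smallest of all recorded values.
losses = [0.83, 0.41, 0.27, 0.35]        # illustrative Q*V loss records
best_iteration = int(np.argmin(losses))  # index of the minimum loss
```

In a real training loop the `losses` list would hold one entry per image per epoch, and the network parameters saved at `best_iteration` become the optimal weight vector and bias terms.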
Two input layers in the step 1_2, wherein the 1 st input layer is an RGB image input layer, and the 2 nd input layer is a depth image input layer; the hidden layer comprises an RGB feature extraction module, a depth feature extraction module, a mixed feature convolution layer, a detail information processing module, a global information processing module, a SKNet network model and a post-processing module;
the RGB feature extraction module comprises four color map neural network blocks, four color attention layers, eight color upsampling layers, four attention convolution layers and four color convolution layers which are connected in sequence; the four sequentially connected color map neural network blocks correspond to the four sequentially connected modules in the ResNet50 respectively, the output of the first color map neural network block is connected to the first RGB branch and the fifth RGB branch respectively, the output of the second color map neural network block is connected to the second RGB branch and the sixth RGB branch respectively, the output of the third color map neural network block is connected to the third RGB branch and the seventh RGB branch respectively, and the output of the fourth color map neural network block is connected to the fourth RGB branch and the eighth RGB branch respectively;
The depth feature extraction module comprises four depth map neural network blocks, four depth attention layers, eight depth up-sampling layers, four attention convolution layers and four depth convolution layers which are sequentially connected, wherein the four depth map neural network blocks are respectively corresponding to the four modules which are sequentially connected in the ResNet50, the output of the first depth map neural network block is respectively connected to the first depth branch and the fifth depth branch, the output of the second depth map neural network block is respectively connected to the second depth branch and the sixth depth branch, the output of the third depth map neural network block is respectively connected to the third depth branch and the seventh depth branch, and the output of the fourth depth map neural network block is respectively connected to the fourth depth branch and the eighth depth branch;
the outputs of the first RGB branch and the second RGB branch are multiplied to be used as one input of the low-level characteristic convolution layer, and the outputs of the first depth branch and the second depth branch are multiplied to be used as the other input of the low-level characteristic convolution layer; the outputs of the third RGB branch and the fourth RGB branch are multiplied to be used as one input of the advanced characteristic convolution layer, and the outputs of the third depth branch and the fourth depth branch are multiplied to be used as the other input of the advanced characteristic convolution layer;
The outputs of the low-level characteristic convolution layer and the high-level characteristic convolution layer are input into the mixed characteristic convolution layer;
the fusion result of the fifth RGB branch and the sixth RGB branch is multiplied by the fusion result of the fifth depth branch and the sixth depth branch and then input into a detail information processing module; the fusion result of the seventh RGB branch and the eighth RGB branch is multiplied by the fusion result of the seventh depth branch and the eighth depth branch and then input into the global information processing module;
the output of the mixed characteristic convolution layer and the output of the detail information processing module are fused and then used as one input of the SKNet network model, and the output of the mixed characteristic convolution layer and the output of the global information processing module are fused and then used as the other input of the SKNet network model;
the post-processing module comprises a first deconvolution layer and a second deconvolution layer which are sequentially connected, wherein the input of the post-processing module is the output of the SKNet network model, and the output of the post-processing module is finally output through the output layer.
The first RGB branch comprises a first color attention layer, a first color upsampling layer and a first attention convolution layer which are sequentially connected, the second RGB branch comprises a second color attention layer, a second color upsampling layer and a second attention convolution layer which are sequentially connected, the third RGB branch comprises a third color attention layer, a third color upsampling layer and a third attention convolution layer which are sequentially connected, and the fourth RGB branch comprises a fourth color attention layer, a fourth color upsampling layer and a fourth attention convolution layer which are sequentially connected;
The fifth RGB branch comprises a first color convolution layer and a fifth color up-sampling layer which are sequentially connected, the sixth RGB branch comprises a second color convolution layer and a sixth color up-sampling layer which are sequentially connected, the seventh RGB branch comprises a third color convolution layer and a seventh color up-sampling layer which are sequentially connected, and the eighth RGB branch comprises a fourth color convolution layer and an eighth color up-sampling layer which are sequentially connected;
the first depth branch comprises a first depth attention layer, a first depth upsampling layer and a fifth attention convolution layer which are sequentially connected, the second depth branch comprises a second depth attention layer, a second depth upsampling layer and a sixth attention convolution layer which are sequentially connected, the third depth branch comprises a third depth attention layer, a third depth upsampling layer and a seventh attention convolution layer which are sequentially connected, and the fourth depth branch comprises a fourth depth attention layer, a fourth depth upsampling layer and an eighth attention convolution layer which are sequentially connected;
the fifth depth branch comprises a first depth convolution layer and a fifth depth upsampling layer which are sequentially connected, the sixth depth branch comprises a second depth convolution layer and a sixth depth upsampling layer which are sequentially connected, the seventh depth branch comprises a third depth convolution layer and a seventh depth upsampling layer which are sequentially connected, and the eighth depth branch comprises a fourth depth convolution layer and an eighth depth upsampling layer which are sequentially connected.
The detail information processing module comprises a first network module and a second transition convolution layer connected in sequence; the input of the detail information processing module passes through a first transition convolution layer, and the output of that layer is fused with the output of the second transition convolution layer to form the output of the detail information processing module;
the global information processing module comprises three processing branches, each comprising a global network module and a global convolution layer connected in sequence; the outputs of the three branches are fused to form the output of the global information processing module.
Each color attention layer and each depth attention layer adopts a CBAM module (Convolutional Block Attention Module); each color upsampling layer and each depth upsampling layer performs bilinear-interpolation upsampling of its input features. Each attention convolution layer, color convolution layer, depth convolution layer, low-level feature convolution layer, high-level feature convolution layer and mixed feature convolution layer comprises one convolution layer; each deconvolution layer comprises one deconvolution;
the transition convolution layers in the detail information processing module and the global convolution layers in the global information processing module each comprise one convolution layer; the first network module in the detail information processing module adopts a Dense block from the DenseNet network, and each global network module in the global information processing module adopts an ASPP module (Atrous Spatial Pyramid Pooling module).
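The bilinear-interpolation upsampling used by the color and depth upsampling layers can be sketched in NumPy as follows. This is a minimal single-channel version; the exact sampling convention (here, half-pixel centers) is an assumption, since the patent does not fix one.

```python
import numpy as np

def bilinear_upsample(img, factor):
    """Upsample a 2-D feature map by an integer factor using
    bilinear interpolation with half-pixel sample centers."""
    h, w = img.shape
    oh, ow = h * factor, w * factor
    # source coordinates of each output pixel center
    ys = (np.arange(oh) + 0.5) / factor - 0.5
    xs = (np.arange(ow) + 0.5) / factor - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0, 1)[:, None]   # vertical blend weights
    wx = np.clip(xs - x0, 0, 1)[None, :]   # horizontal blend weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

up = bilinear_upsample(np.arange(16.0).reshape(4, 4), 2)
```

In the actual network this operation is applied per channel to bring differently sized feature maps to a common resolution before fusion.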
The input end of the RGB image input layer receives an RGB input image, and the input end of the depth image input layer receives a depth image corresponding to the RGB image; the input of the RGB feature extraction module and the depth feature extraction module is the output of an RGB image input layer and a depth image input layer respectively.
Compared with the prior art, the invention has the advantages that:
1) The invention pretrains the RGB image and the depth image separately with ResNet50 (the depth image is expanded to three-channel input), extracts the different results produced by the 4 ResNet50 modules for both images, and performs two distinct operations on them: first, detail refinement of the high- and low-level features through an attention mechanism; second, fusion of the high- and low-level features after the backbone, with the fused features passed on to the later stages of the model.
2) The invention extracts feature information from the pretrained backbone and divides image features into high-level and low-level categories. In the left part of the model, detail features are extracted from both; after separate processing, the high-level and low-level features are fused, a scheme that proves highly effective.
3) Two novel modules are designed on the right side of the model. The first combines convolution with a Dense block, exploiting the strengths of both convolution and DenseNet so that the detection result is finer. The second uses ASPP to enlarge the receptive field before convolution, which benefits the collection of global features and makes the detection result more comprehensive. Finally, these are fused with the detail features from the left side of the model.
Drawings
Fig. 1 is a block diagram of a general implementation of the method of the present invention.
Fig. 2a is an original RGB image.
Fig. 2b is the depth image of Fig. 2a.
Fig. 3a is the ground-truth saliency image of Fig. 2a.
Fig. 3b is the saliency image predicted by the present invention from Figs. 2a and 2b.
FIG. 4a shows the results of the present invention on a Recall evaluation.
Fig. 4b shows the results of the present invention on ROC.
FIG. 4c shows the results of the present invention on MAE.
Detailed Description
The invention is described in further detail below with reference to the drawings and embodiments.
The invention provides a multi-feature cascading RGB-D significance target detection method, the general implementation block diagram of which is shown in figure 1, and the method comprises three processes of a training stage, a verification stage and a testing stage.
The training phase process comprises the following specific steps:
step 1_1: q color real target images and corresponding depth images are selected, a training set is formed by the corresponding saliency images of each color real target image, and the Q-th original object image in the training set is recorded as { I } q (i, j) } the depth image is denoted as { D } q (I, j) } and { I } in the training set q The true saliency image corresponding to (i, j) is recorded asWherein the color real target image is an RGB color image, the depth image is a binary gray scale image, Q is a positive integer, Q is more than or equal to 200, if Q=1588 is taken, Q is a positive integer, Q is more than or equal to 1 and less than or equal to Q, I is more than or equal to 1 and less than or equal to W, j is more than or equal to 1 and less than or equal to H, and W represents { I } q Width of (I, j) }, H represents { I }, and q height of (i, j), e.g. w=512, h=512, i q (I, j) represents { I } q Pixel value of pixel point with coordinate position (i, j) in (i, j), +.>Representation->Pixel values of the pixel points with the middle coordinate positions (i, j); here, the color real target image directly selects the database NJU2000 training set1588 images of (3).
Step 1_2: constructing a convolutional neural network:
the convolutional neural network comprises an input layer, a hidden layer and an output layer;
the input layers include an RGB image input layer and a depth image input layer. For an RGB image input layer, an input end receives an R channel component, a G channel component and a B channel component of an original input image, and an output end of the input layer outputs the R channel component, the G channel component and the B channel component of the original input image to a hidden layer; wherein the width of the original input image received by the input end of the required input layer is W, and the height is H. For a depth image input layer, an input end receives a depth image corresponding to an original input image, an output end of the input end outputs the original depth image, the original depth image is changed into a three-channel depth image through superposition of two channels, and three-channel components are given to a hidden layer; wherein the width of the original input image received by the input end of the required input layer is W, and the height is H.
The hidden layer comprises an RGB feature extraction module, a depth feature extraction module, a mixed feature convolution layer, a detail information processing module, a global information processing module, a SKNet network model and a post-processing module;
the RGB feature extraction module comprises four color map neural network blocks, four color attention layers, eight color upsampling layers, four attention convolution layers and four color convolution layers which are connected in sequence. The four color map neural network blocks connected in sequence correspond to the four modules connected in sequence in the ResNet50 respectively. For the 1 st color image neural network block, the 2 nd color image neural network, the 3 rd color image neural network and the 4 th color image neural network, 4 modules in the ResNet50 are respectively corresponding in sequence, a pretraining method is adopted, and the input image is pretrained by utilizing the network of the ResNet50 of the pytorch and the weight thereof. The output is 256 feature images after passing through the 1 st color image neural network block, and the set formed by the 256 feature images is marked as P 1 ,P 1 The width of each characteristic diagram isHeight is +.>The obtained image is output as 512 feature images after passing through the 2 nd color image neural network block, and the set formed by the output 512 feature images is marked as P 2 ,P 2 The width of each feature map in (a) is +.>Height is +.>The output is 1024 feature images after passing through the 3 rd color image neural network block, and the set formed by the 1024 feature images is marked as P 3 ,P 3 The width of each feature map in (a) is +.>Height is +.>The result is output as 2048 feature images after passing through the 4 th color image neural network block, and the set formed by the 2048 feature images is marked as P 4 ,P 4 The width of each feature map is +.>Height is +.>
The depth feature extraction module comprises four depth-map neural network blocks, four depth attention layers, eight depth upsampling layers, four attention convolution layers and four depth convolution layers. The four sequentially connected depth-map neural network blocks correspond to the four sequentially connected modules of ResNet50: the 1st to 4th depth-map neural network blocks correspond in order to the 4 ResNet50 modules, a pretraining approach being used, initializing with the pytorch ResNet50 network and its weights. After the 1st depth-map neural network block the output is 256 feature maps, whose set is denoted D_1, each of width W/4 and height H/4. After the 2nd block the output is 512 feature maps, denoted D_2, each of width W/8 and height H/8. After the 3rd block the output is 1024 feature maps, denoted D_3, each of width W/16 and height H/16. After the 4th block the output is 2048 feature maps, denoted D_4, each of width W/32 and height H/32.
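A small helper, hypothetical but consistent with the stage sizes listed above, computes the channel count and spatial size of the four ResNet50 stages for a W×H input (the stages have overall strides 4, 8, 16 and 32):

```python
def resnet50_stage_shapes(w, h):
    """Return (channels, width, height) for the four ResNet50 stages
    used as the color/depth neural network blocks."""
    channels = [256, 512, 1024, 2048]
    strides = [4, 8, 16, 32]
    return [(c, w // s, h // s) for c, s in zip(channels, strides)]

# For the W=512, H=512 images of the embodiment:
shapes = resnet50_stage_shapes(512, 512)
```

With a 512×512 input this reproduces the sets P_1/D_1 (256 maps of 128×128) through P_4/D_4 (2048 maps of 16×16).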
The output of the first color map neural network block is connected to the first RGB branch and the fifth RGB branch, respectively, the output of the second color map neural network block is connected to the second RGB branch and the sixth RGB branch, respectively, the output of the third color map neural network block is connected to the third RGB branch and the seventh RGB branch, respectively, and the output of the fourth color map neural network block is connected to the fourth RGB branch and the eighth RGB branch, respectively. The first RGB branch comprises a first color attention layer, a first color upsampling layer and a first attention convolution layer which are sequentially connected, the second RGB branch comprises a second color attention layer, a second color upsampling layer and a second attention convolution layer which are sequentially connected, the third RGB branch comprises a third color attention layer, a third color upsampling layer and a third attention convolution layer which are sequentially connected, and the fourth RGB branch comprises a fourth color attention layer, a fourth color upsampling layer and a fourth attention convolution layer which are sequentially connected; the fifth RGB branch comprises a first color convolution layer and a fifth color up-sampling layer which are sequentially connected, the sixth RGB branch comprises a second color convolution layer and a sixth color up-sampling layer which are sequentially connected, the seventh RGB branch comprises a third color convolution layer and a seventh color up-sampling layer which are sequentially connected, and the eighth RGB branch comprises a fourth color convolution layer and an eighth color up-sampling layer which are sequentially connected;
The output of the first depth map neural network block is connected to the first depth branch and the fifth depth branch respectively, the output of the second depth map neural network block is connected to the second depth branch and the sixth depth branch respectively, the output of the third depth map neural network block is connected to the third depth branch and the seventh depth branch respectively, and the output of the fourth depth map neural network block is connected to the fourth depth branch and the eighth depth branch respectively. The first depth branch comprises a first depth attention layer, a first depth upsampling layer and a fifth attention convolution layer which are sequentially connected, the second depth branch comprises a second depth attention layer, a second depth upsampling layer and a sixth attention convolution layer which are sequentially connected, the third depth branch comprises a third depth attention layer, a third depth upsampling layer and a seventh attention convolution layer which are sequentially connected, and the fourth depth branch comprises a fourth depth attention layer, a fourth depth upsampling layer and an eighth attention convolution layer which are sequentially connected; the fifth depth branch comprises a first depth convolution layer and a fifth depth upsampling layer which are sequentially connected, the sixth depth branch comprises a second depth convolution layer and a sixth depth upsampling layer which are sequentially connected, the seventh depth branch comprises a third depth convolution layer and a seventh depth upsampling layer which are sequentially connected, and the eighth depth branch comprises a fourth depth convolution layer and an eighth depth upsampling layer which are sequentially connected.
Each color attention layer and each depth attention layer is composed of one CBAM (Convolutional Block Attention Module). None of these operations changes the feature-map size or channel count: the 1st color attention layer still outputs 256 feature maps, the 2nd 512, the 3rd 1024 and the 4th 2048; likewise the 1st depth attention layer still outputs 256 feature maps, the 2nd 512, the 3rd 1024 and the 4th 2048.
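For illustration, the channel-attention half of a CBAM module can be sketched as below; the weights are random placeholders rather than trained parameters, and the spatial-attention half is omitted. As stated above, the operation leaves the feature-map size and channel count unchanged.

```python
import numpy as np

def channel_attention(feats, reduction=16):
    """Simplified CBAM channel attention: average- and max-pooled
    channel descriptors pass through a shared two-layer MLP, are
    summed, squashed by a sigmoid and used to rescale each channel.
    Weights are random placeholders, not trained parameters."""
    c = feats.shape[0]
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1   # squeeze
    w2 = rng.standard_normal((c, c // reduction)) * 0.1   # excite
    avg = feats.mean(axis=(1, 2))                # (C,) avg-pooled
    mx = feats.max(axis=(1, 2))                  # (C,) max-pooled
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)   # shared MLP, ReLU
    gate = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # sigmoid
    return feats * gate[:, None, None]           # rescale channels

x = np.random.rand(256, 64, 64)   # e.g. a P_1-like feature set
y = channel_attention(x)          # same shape, channel-reweighted
```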
Each attention convolution layer is composed of one convolution layer. The 1st attention convolution layer has kernel size 3×3, 256 kernels, zero-padding 1 and stride 1, and outputs 256 feature maps, whose set is denoted S_1. The 2nd has kernel size 5×5, 256 kernels, zero-padding 2 and stride 1, and outputs 256 feature maps, denoted S_2. The 3rd has kernel size 3×3, 256 kernels, zero-padding 1 and stride 1, and outputs 256 feature maps, denoted S_3. The 4th has kernel size 5×5, 256 kernels, zero-padding 2 and stride 1, and outputs 256 feature maps, denoted S_4. The 5th has kernel size 3×3, 256 kernels, zero-padding 1 and stride 1, and outputs 256 feature maps, denoted G_1. The 6th has kernel size 5×5, 256 kernels, zero-padding 2 and stride 1, and outputs 256 feature maps, denoted G_2. The 7th has kernel size 3×3, 256 kernels, zero-padding 1 and stride 1, and outputs 256 feature maps, denoted G_3.
The 8th attention convolution layer has kernel size 5×5, 256 kernels, zero-padding 2 and stride 1, and outputs 256 feature maps, denoted G_4.
For the 1st multiplication operation, S1 and S2 are multiplied element-wise and 256 feature maps are output; the set formed by these 256 feature maps is denoted S1S2 and serves as one input of the low-level feature convolution layer. For the 2nd multiplication operation, S3 and S4 are multiplied element-wise and 256 feature maps are output; their set is denoted S3S4 and serves as the other input of the low-level feature convolution layer. For the 3rd multiplication operation, G1 and G2 are multiplied element-wise and 256 feature maps are output; their set is denoted G1G2 and serves as one input of the high-level feature convolution layer. For the 4th multiplication operation, G3 and G4 are multiplied element-wise and 256 feature maps are output; their set is denoted G3G4 and serves as the other input of the high-level feature convolution layer.
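The multiplication operations are plain element-wise products of equally-shaped feature sets. A sketch of the 1st and 3rd operations, with random tensors standing in for S1, S2, G1 and G2 (the 8×8 spatial size here is illustrative; the patent's maps are at W/4 × H/4):

```python
import torch

# Hypothetical stand-ins for the feature sets S1, S2, G1, G2 (256 channels each).
S1, S2 = torch.rand(1, 256, 8, 8), torch.rand(1, 256, 8, 8)
G1, G2 = torch.rand(1, 256, 8, 8), torch.rand(1, 256, 8, 8)

# Element-wise products; shapes and channel counts are preserved.
S1S2 = S1 * S2  # -> one input of the low-level feature convolution layer
G1G2 = G1 * G2  # -> one input of the high-level feature convolution layer
```

Because the product is element-wise, the two operands must have identical shape, which is why the preceding up-sampling layers bring all branches to a common grid.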
Each color convolution layer or depth convolution layer consists of one convolution layer. For each of the four color convolution layers and the four depth convolution layers, the convolution kernel size is 3×3, the number of convolution kernels is 512, the zero-padding parameter is 1 and the stride is 1, and 512 feature maps are output.
Each color up-sampling layer or depth up-sampling layer performs bilinear-interpolation up-sampling of its input features.
For each of the 1st to 4th color up-sampling layers and the 1st to 4th depth up-sampling layers, the width of the output feature map is set to W/4 and the height to H/4, where W and H denote the width and height of the input image; this operation does not change the number of feature maps.
For each of the 5th to 8th color up-sampling layers and the 5th to 8th depth up-sampling layers, the width of the output feature map is set to W/4 and the height to H/4; the operation does not change the number of feature maps, so 512 feature maps are output in each case. The sets of 512 feature maps output by the 5th, 6th, 7th and 8th color up-sampling layers are denoted U1, U2, U3 and U4 respectively, and the sets of 512 feature maps output by the 5th, 6th, 7th and 8th depth up-sampling layers are denoted F1, F2, F3 and F4 respectively.
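Bilinear up-sampling to the common W/4 × H/4 grid can be expressed with `torch.nn.functional.interpolate`. The sizes below assume a 224 × 224 input and the deepest ResNet-50 stage (2048 channels at 7 × 7), so the target grid is 56 × 56:

```python
import torch
import torch.nn.functional as F

# Deepest-stage ResNet-50 feature map for a 224x224 input: 2048 channels at 7x7.
feat = torch.randn(1, 2048, 7, 7)

# Bilinear up-sampling to the common W/4 x H/4 grid (56x56 for a 224 input).
# The channel count is unchanged, as the patent states.
up = F.interpolate(feat, size=(56, 56), mode='bilinear', align_corners=False)
```

The shallower stages (256 × 56 × 56, 512 × 28 × 28, 1024 × 14 × 14) are brought to the same 56 × 56 grid in the same way, which is what allows the later element-wise products.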
The outputs of both the low-level feature convolution layer and the high-level feature convolution layer are input into the mixed feature convolution layer. The 1st high-level feature convolution layer consists of one convolution with kernel size 3×3, 256 kernels, zero-padding parameter 1 and stride 1, and outputs 256 feature maps; it fuses the high-level features of the RGB map and the depth map. The 1st low-level feature convolution layer consists of one convolution with kernel size 3×3, 256 kernels, zero-padding parameter 1 and stride 1, and outputs 256 feature maps; it fuses the low-level features of the RGB map and the depth map. The 1st mixed feature convolution layer consists of one convolution with kernel size 3×3, 256 kernels, zero-padding parameter 1 and stride 1, and outputs 256 feature maps; the set formed by these 256 feature maps is denoted X1.
For the 5th multiplication operation, the sum of U1 and U2 is multiplied element-wise by the sum of F1 and F2, and 512 feature maps are output. For the 6th multiplication operation, the sum of U3 and U4 is multiplied element-wise by the sum of F3 and F4, and 512 feature maps are output. That is, the fusion result of the fifth and sixth RGB branches is multiplied by the fusion result of the fifth and sixth depth branches and then input into the detail information processing module; the fusion result of the seventh and eighth RGB branches is multiplied by the fusion result of the seventh and eighth depth branches and then input into the global information processing module.
The detail information processing module comprises a first network module and a second transition convolution layer connected in sequence; the input of the detail information processing module, after passing through the first transition convolution layer, is fused with the output of the second transition convolution layer to form the output of the module. The 1st network module uses a dense block of the DenseNet network, with parameters set as follows: the number of layers is 6, the bottleneck size is 4 and the growth rate is 4, so 536 feature maps are output (the 512 input maps plus 6 × 4 newly grown maps). The 1st transition convolution layer consists of one convolution with kernel size 3×3, 256 kernels, zero-padding parameter 1 and stride 1, and outputs 256 feature maps, whose set is denoted H1. The 2nd transition convolution layer consists of one convolution with kernel size 3×3, 256 kernels, zero-padding parameter 1 and stride 1, and outputs 256 feature maps, whose set is denoted H2.
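A hedged sketch of the detail information processing module: a small dense block followed by a transition convolution, with the module input passed through its own transition convolution and the two results summed (H1 + H2). The `DenseBlock` here is a hand-rolled stand-in for the DenseNet dense block (6 layers, growth rate 4, so a 512-channel input grows to 536 channels); it is an illustration of the wiring, not the authors' exact code:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Tiny dense block: each layer emits `growth` maps, concatenated to its input."""
    def __init__(self, in_ch, num_layers=6, growth=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(in_ch + i * growth, growth, 3, padding=1)
            for i in range(num_layers))

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, torch.relu(layer(x))], dim=1)
        return x  # in_ch + num_layers * growth channels

class DetailModule(nn.Module):
    """Dense block -> 2nd transition conv (H2); input -> 1st transition conv (H1);
    the module output is the fusion H1 + H2."""
    def __init__(self, in_ch=512, out_ch=256, num_layers=6, growth=4):
        super().__init__()
        self.dense = DenseBlock(in_ch, num_layers, growth)        # 512 -> 536 maps
        self.trans1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)      # H1 path
        self.trans2 = nn.Conv2d(in_ch + num_layers * growth,
                                out_ch, 3, padding=1)             # H2 path

    def forward(self, x):
        return self.trans1(x) + self.trans2(self.dense(x))
```

The dense concatenation keeps shallow detail alongside newly learned maps, which matches the module's role of preserving fine structure from the mid-level branches.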
The global information processing module comprises three processing branches, each consisting of a global network module and a global convolution layer connected in sequence; the outputs of the three branches are fused to form the output of the module. Each global network module adopts an ASPP (Atrous Spatial Pyramid Pooling) module, and each of the three global network modules outputs 512 feature maps. Each global convolution layer consists of one convolution layer. For the 1st global convolution layer, the kernel size is 3×3, the number of kernels is 256, the zero-padding parameter is 1 and the stride is 1; 256 feature maps are output, and their set is denoted E1. For the 2nd global convolution layer, the kernel size is 5×5, the number of kernels is 256, the zero-padding parameter is 2 and the stride is 1; 256 feature maps are output, and their set is denoted E2. For the 3rd global convolution layer, the kernel size is 7×7, the number of kernels is 256, the zero-padding parameter is 3 and the stride is 1; 256 feature maps are output, and their set is denoted E3.
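The global information processing module can be sketched as three parallel branches, each an ASPP-style block followed by a convolution with a 3×3, 5×5 or 7×7 kernel, with the branch outputs summed (E1 + E2 + E3). The `ASPP` below is a minimal dilated-convolution version with assumed rates, not the exact module used in the patent:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Minimal ASPP sketch: parallel dilated 3x3 convs, concatenated then projected.
    Dilation rates are assumed; spatial size is preserved."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates)
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class GlobalModule(nn.Module):
    """Three ASPP + conv branches with 3x3 / 5x5 / 7x7 kernels; outputs summed."""
    def __init__(self, in_ch=512, mid_ch=512, out_ch=256):
        super().__init__()
        kernels = [(3, 1), (5, 2), (7, 3)]  # (kernel size, padding) pairs
        self.branches = nn.ModuleList(
            nn.Sequential(ASPP(in_ch, mid_ch),
                          nn.Conv2d(mid_ch, out_ch, k, padding=p))
            for k, p in kernels)

    def forward(self, x):
        return sum(b(x) for b in self.branches)  # fused E1 + E2 + E3
```

The three kernel sizes give the fused output receptive fields at several scales, which is the module's stated purpose of capturing global context.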
The output of the mixed feature convolution layer fused with the output of the detail information processing module forms one input of the SKNet network model, and the output of the mixed feature convolution layer fused with the output of the global information processing module forms the other input. The 1st SKNet consists of one Selective Kernel Network with two inputs: the first input is the sum H1 + H2 + X1 and the second input is the sum E1 + E2 + E3 + X1, each consisting of 256 feature maps of width W/4 and height H/4. The output is still 256 feature maps, with the map size unchanged.
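A minimal selective-kernel style fusion of the two SKNet inputs: the two feature sets are summed, squeezed to a channel descriptor, and per-branch softmax weights recombine them. This is a sketch of the mechanism with an assumed reduction ratio, not the authors' SKNet configuration:

```python
import torch
import torch.nn as nn

class SKFuse(nn.Module):
    """Selective-kernel style fusion of two same-shaped feature sets:
    sum -> squeeze -> per-branch softmax weights -> weighted recombination."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.squeeze = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True))
        # One weight vector per branch (2 branches).
        self.select = nn.Conv2d(channels // reduction, channels * 2, 1)

    def forward(self, a, b):
        z = self.select(self.squeeze(a + b))              # (N, 2C, 1, 1)
        w = torch.softmax(z.view(z.size(0), 2, -1, 1, 1), dim=1)  # per-channel weights
        return a * w[:, 0] + b * w[:, 1]                  # weights sum to 1 per channel
```

Because the two branch weights softmax to 1 for every channel, identical inputs pass through unchanged; differing inputs are blended channel by channel, letting the network choose between detail-dominated and global-dominated features.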
The post-processing module comprises a first deconvolution layer and a second deconvolution layer connected in sequence; the input of the post-processing module is the output of the SKNet network model, and its output is finally produced through the output layer. The 1st deconvolution layer consists of one deconvolution with kernel size 2×2, 128 kernels, zero-padding parameter 0 and stride 2, and each output feature map has width W/2 and height H/2. The 2nd deconvolution layer consists of one deconvolution with kernel size 2×2, 1 kernel, zero-padding parameter 0 and stride 2, and each output feature map has width W and height H.
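The two deconvolution layers double the spatial size twice, taking the W/4 × H/4 SKNet output back to W × H. A sketch with `nn.ConvTranspose2d`, using the channel counts specified above and a 224 × 224 input size:

```python
import torch
import torch.nn as nn

# Two stride-2, kernel-2 deconvolutions: W/4 -> W/2 -> W (likewise for height).
post = nn.Sequential(
    nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2),  # 56x56 -> 112x112
    nn.ConvTranspose2d(128, 1, kernel_size=2, stride=2),    # 112x112 -> 224x224
)

x = torch.randn(1, 256, 56, 56)  # SKNet output at 224/4 = 56
y = post(x)                      # single-channel 224x224 saliency map
```

With kernel size 2, stride 2 and no padding, the transposed-convolution output size is exactly twice the input size ((56 − 1) × 2 + 2 = 112), so no output cropping is needed.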
Step 1_3: each original color real target image in the training set is resized to 224×224 and used as the original RGB input image, and the depth image corresponding to each original color real target image is resized to 224×224 and converted into a three-channel image to be used as the depth input image. These are input into the ResNet50 for pre-training, and after pre-training the corresponding feature maps are input into the model for training, yielding a saliency detection prediction map corresponding to each color real target image in the training set.
Step 1_4: the loss function value between the saliency detection prediction map corresponding to each original color real target image in the training set and the corresponding real saliency image, processed into an encoded image of corresponding size, is calculated; the loss function value is obtained using the BCE (binary cross-entropy) loss function.
Step 1_5: steps 1_3 and 1_4 are repeated V times to obtain the convolutional neural network classification training model, yielding Q × V loss function values; the smallest of these Q × V loss function values is then found, and the weight vector and bias term corresponding to this minimum loss value are taken as the optimal weight vector and optimal bias term of the convolutional neural network classification training model, denoted W_best and b_best respectively; where V > 1, and V = 100 in this embodiment.
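Steps 1_3 to 1_5 amount to a standard training loop that tracks the parameters attaining the minimum of the Q × V loss values. A hedged sketch follows; the optimizer, learning rate and data-loader interface are assumptions, as the patent does not specify them:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=100, lr=1e-4):
    """Sketch of steps 1_3-1_5: BCE loss per sample over `epochs` (= V) passes,
    keeping the weights that attained the lowest of the Q*V loss values."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCELoss()
    best_loss, best_state = float('inf'), None
    for _ in range(epochs):               # V repetitions
        for rgb, depth, gt in loader:     # Q training pairs per pass
            pred = model(rgb, depth)      # predicted saliency map in (0, 1)
            loss = bce(pred, gt)
            opt.zero_grad()
            loss.backward()
            opt.step()
            if loss.item() < best_loss:   # track min of the Q*V loss values
                best_loss = loss.item()
                best_state = {k: v.detach().clone()
                              for k, v in model.state_dict().items()}
    model.load_state_dict(best_state)     # W_best and b_best
    return model
```

Selecting the single minimum per-sample loss (rather than the best validation epoch) follows the patent's step 1_5 literally; in practice one would usually keep the epoch with the lowest average loss.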
The specific steps of the test phase process of this embodiment are as follows:
Step 2_1: let Ĩ denote the color real target image to be saliency-detected and D̃ the corresponding depth image, where 1 ≤ i′ ≤ W′, 1 ≤ j′ ≤ H′, W′ denotes the width of Ĩ, H′ denotes the height of Ĩ, Ĩ(i′, j′) denotes the pixel value of the pixel at coordinate (i′, j′) in Ĩ, and D̃(i′, j′) denotes the pixel value of the pixel at coordinate (i′, j′) in D̃.
Step 2_2: the R channel component, G channel component and B channel component of Ĩ, together with D̃ converted into a three-channel image, are input into the convolutional neural network classification training model, and prediction is performed using W_best and b_best to obtain the predicted saliency detection image corresponding to Ĩ and D̃, denoted S̃, where S̃(i′, j′) denotes the pixel value of the pixel at coordinate (i′, j′) in S̃.
To further verify the feasibility and effectiveness of the method of the invention, experiments were performed.
The network architecture was built using the Python-based deep learning library PyTorch (version 0.4.1). The NJU2000 test set of the real object image database is used to evaluate the saliency detection performance of the method on real scene images (397 real object images). The detection performance of the predicted saliency detection images is evaluated with 3 commonly used objective metrics for saliency detection methods: the precision-recall curve (Precision Recall Curve), the receiver operating characteristic curve (ROC) and the mean absolute error (Mean Absolute Error, MAE).
The method is used to predict each real scene image in the NJU2000 test set, obtaining the predicted saliency detection image corresponding to each real scene image.
FIG. 4a shows the precision-recall curve (PR Curve) of the saliency detection effect of the method of the present invention; the closer the curve is to 1, the better the result.
FIG. 4b shows the receiver operating characteristic curve (ROC) of the saliency detection effect of the method of the present invention; the closer the curve is to 1, the better the result.
FIG. 4c shows the mean absolute error (MAE) of the saliency detection effect of the method of the present invention; a lower MAE represents a better detection effect.
The figures show that the saliency detection results obtained by the method for real scene images are very good, indicating that obtaining the predicted saliency detection image corresponding to a real scene image with the method is feasible and effective.
Claims (6)
1. A multi-feature cascading RGB-D significance target detection method, characterized by comprising the following steps:
step 1_1: q original RGB images and corresponding depth images thereof are selected, and a training set is formed by combining the true significance images corresponding to the original RGB images;
step 1_2: constructing a convolutional neural network: the convolutional neural network comprises two input layers, a hidden layer and an output layer, wherein the two input layers are connected to the input end of the hidden layer, and the output end of the hidden layer is connected to the output layer;
step 1_3: each original RGB image and the corresponding depth image in the training set are respectively used as the original input images of the two input layers and are input into a convolutional neural network for training, so that a prediction significance image corresponding to each original RGB image in the training set is obtained; calculating a loss function value between a predicted saliency image corresponding to each original RGB image in the training set and a corresponding real saliency image, wherein the loss function value is obtained by adopting a BCE loss function;
Step 1_4: repeating the step 1_3 for V times to obtain a convolutional neural network classification training model, and obtaining Q multiplied by V loss function values; then find out the smallest value of loss function value from Q X V pieces of loss function values; then, the weight vector and the bias term corresponding to the loss function value with the minimum value are correspondingly used as the optimal weight vector and the optimal bias term, and the weight vector and the bias term in the trained convolutional neural network classification training model are replaced;
step 1_5: inputting the RGB image to be predicted and the depth image corresponding to the RGB image to be predicted into a trained convolutional neural network classification training model, and predicting by utilizing an optimal weight vector and an optimal bias term to obtain a predicted saliency image corresponding to the RGB image to be predicted, thereby realizing saliency target detection;
two input layers in the step 1_2, wherein the 1 st input layer is an RGB image input layer, and the 2 nd input layer is a depth image input layer; the hidden layer comprises an RGB feature extraction module, a depth feature extraction module, a mixed feature convolution layer, a detail information processing module, a global information processing module, a SKNet network model and a post-processing module;
the RGB feature extraction module comprises four color map neural network blocks, four color attention layers, eight color upsampling layers, four attention convolution layers and four color convolution layers which are connected in sequence; the four sequentially connected color map neural network blocks correspond to the four sequentially connected modules in the ResNet50 respectively, the output of the first color map neural network block is connected to the first RGB branch and the fifth RGB branch respectively, the output of the second color map neural network block is connected to the second RGB branch and the sixth RGB branch respectively, the output of the third color map neural network block is connected to the third RGB branch and the seventh RGB branch respectively, and the output of the fourth color map neural network block is connected to the fourth RGB branch and the eighth RGB branch respectively;
The depth feature extraction module comprises four depth map neural network blocks, four depth attention layers, eight depth up-sampling layers, four attention convolution layers and four depth convolution layers which are sequentially connected, wherein the four depth map neural network blocks are respectively corresponding to the four modules which are sequentially connected in the ResNet50, the output of the first depth map neural network block is respectively connected to the first depth branch and the fifth depth branch, the output of the second depth map neural network block is respectively connected to the second depth branch and the sixth depth branch, the output of the third depth map neural network block is respectively connected to the third depth branch and the seventh depth branch, and the output of the fourth depth map neural network block is respectively connected to the fourth depth branch and the eighth depth branch;
the outputs of the first RGB branch and the second RGB branch are multiplied to be used as one input of the low-level characteristic convolution layer, and the outputs of the first depth branch and the second depth branch are multiplied to be used as the other input of the low-level characteristic convolution layer; the outputs of the third RGB branch and the fourth RGB branch are multiplied to be used as one input of the advanced characteristic convolution layer, and the outputs of the third depth branch and the fourth depth branch are multiplied to be used as the other input of the advanced characteristic convolution layer;
The outputs of the low-level characteristic convolution layer and the high-level characteristic convolution layer are input into the mixed characteristic convolution layer;
the fusion result of the fifth RGB branch and the sixth RGB branch is multiplied by the fusion result of the fifth depth branch and the sixth depth branch and then input into a detail information processing module; the fusion result of the seventh RGB branch and the eighth RGB branch is multiplied by the fusion result of the seventh depth branch and the eighth depth branch and then input into the global information processing module;
the output of the mixed characteristic convolution layer and the output of the detail information processing module are fused and then used as one input of the SKNet network model, and the output of the mixed characteristic convolution layer and the output of the global information processing module are fused and then used as the other input of the SKNet network model;
the post-processing module comprises a first deconvolution layer and a second deconvolution layer which are sequentially connected, wherein the input of the post-processing module is the output of the SKNet network model, and the output of the post-processing module is finally output through the output layer.
2. A multi-feature cascading RGB-D significance target detection method according to claim 1, characterized in that,
the first RGB branch comprises a first color attention layer, a first color upsampling layer and a first attention convolution layer which are sequentially connected, the second RGB branch comprises a second color attention layer, a second color upsampling layer and a second attention convolution layer which are sequentially connected, the third RGB branch comprises a third color attention layer, a third color upsampling layer and a third attention convolution layer which are sequentially connected, and the fourth RGB branch comprises a fourth color attention layer, a fourth color upsampling layer and a fourth attention convolution layer which are sequentially connected;
The fifth RGB branch comprises a first color convolution layer and a fifth color up-sampling layer which are sequentially connected, the sixth RGB branch comprises a second color convolution layer and a sixth color up-sampling layer which are sequentially connected, the seventh RGB branch comprises a third color convolution layer and a seventh color up-sampling layer which are sequentially connected, and the eighth RGB branch comprises a fourth color convolution layer and an eighth color up-sampling layer which are sequentially connected;
the first depth branch comprises a first depth attention layer, a first depth upsampling layer and a fifth attention convolution layer which are sequentially connected, the second depth branch comprises a second depth attention layer, a second depth upsampling layer and a sixth attention convolution layer which are sequentially connected, the third depth branch comprises a third depth attention layer, a third depth upsampling layer and a seventh attention convolution layer which are sequentially connected, and the fourth depth branch comprises a fourth depth attention layer, a fourth depth upsampling layer and an eighth attention convolution layer which are sequentially connected;
the fifth depth branch comprises a first depth convolution layer and a fifth depth upsampling layer which are sequentially connected, the sixth depth branch comprises a second depth convolution layer and a sixth depth upsampling layer which are sequentially connected, the seventh depth branch comprises a third depth convolution layer and a seventh depth upsampling layer which are sequentially connected, and the eighth depth branch comprises a fourth depth convolution layer and an eighth depth upsampling layer which are sequentially connected.
3. The multi-feature cascading RGB-D saliency target detection method of claim 1, wherein the detail information processing module includes a first network module and a second transition convolution layer connected in sequence, and the input of the detail information processing module, after passing through the first transition convolution layer, is fused with the output of the second transition convolution layer to serve as the output of the detail information processing module;
the global information processing module comprises three processing branches, wherein the three processing branches comprise a global network module and a global convolution layer which are sequentially connected, and the outputs of the three processing branches are fused and then used as the output of the global information processing module.
4. The multi-feature cascading RGB-D saliency target detection method of claim 2, wherein each of the color attention layer and the depth attention layer adopts a CBAM module, and each of the color upsampling layer and the depth upsampling layer is used for upsampling processing of bilinear interpolation of input features; each attention convolution layer, color convolution layer, depth convolution layer, low-level feature convolution layer, high-level feature convolution layer and mixed feature convolution layer comprises a convolution layer; each of said deconvolution layers comprises a deconvolution.
5. A multi-feature cascading RGB-D saliency target detection method according to claim 3, wherein the transition convolution layers in the detail information processing module and the global convolution layers in the global information processing module each comprise a convolution layer; the first network module in the detail information processing module adopts a dense block of the DenseNet network, and each global network module in the global information processing module adopts an ASPP module.
6. The multi-feature cascading RGB-D saliency target detection method of claim 1, wherein the RGB image input layer input receives an RGB input image and the depth image input layer input receives a depth image corresponding to the RGB image; the input of the RGB feature extraction module and the depth feature extraction module is the output of an RGB image input layer and a depth image input layer respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911099871.2A CN110929736B (en) | 2019-11-12 | 2019-11-12 | Multi-feature cascading RGB-D significance target detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110929736A CN110929736A (en) | 2020-03-27 |
CN110929736B true CN110929736B (en) | 2023-05-26 |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111461043B (en) * | 2020-04-07 | 2023-04-18 | 河北工业大学 | Video significance detection method based on deep network |
CN111666854B (en) * | 2020-05-29 | 2022-08-30 | 武汉大学 | High-resolution SAR image vehicle target detection method fusing statistical significance |
CN111768375B (en) * | 2020-06-24 | 2022-07-26 | 海南大学 | Asymmetric GM multi-mode fusion significance detection method and system based on CWAM |
CN111985552B (en) * | 2020-08-17 | 2022-07-29 | 中国民航大学 | Method for detecting diseases of thin strip-shaped structure of airport pavement under complex background |
CN112330642B (en) * | 2020-11-09 | 2022-11-04 | 山东师范大学 | Pancreas image segmentation method and system based on double-input full convolution network |
CN112580694B (en) * | 2020-12-01 | 2024-04-19 | 中国船舶重工集团公司第七0九研究所 | Small sample image target recognition method and system based on joint attention mechanism |
CN112507933B (en) * | 2020-12-16 | 2022-09-16 | 南开大学 | Saliency target detection method and system based on centralized information interaction |
CN112528899B (en) * | 2020-12-17 | 2022-04-12 | 南开大学 | Image salient object detection method and system based on implicit depth information recovery |
CN112651406B (en) * | 2020-12-18 | 2022-08-09 | 浙江大学 | Depth perception and multi-mode automatic fusion RGB-D significance target detection method |
CN113516022B (en) * | 2021-04-23 | 2023-01-10 | 黑龙江机智通智能科技有限公司 | Fine-grained classification system for cervical cells |
CN114723951B (en) * | 2022-06-08 | 2022-11-04 | 成都信息工程大学 | Method for RGB-D image segmentation |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109409435A (en) * | 2018-11-01 | 2019-03-01 | 上海大学 | Depth-aware saliency detection method based on convolutional neural networks
CN109903276A (en) * | 2019-02-23 | 2019-06-18 | 中国民航大学 | Multilayer-fusion convolutional neural network RGB-D saliency detection method
CN110263813A (en) * | 2019-05-27 | 2019-09-20 | 浙江科技学院 | Saliency detection method based on fusion of a residual network and depth information
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10437878B2 (en) * | 2016-12-28 | 2019-10-08 | Shutterstock, Inc. | Identification of a salient portion of an image |
- 2019-11-12 CN CN201911099871.2A patent/CN110929736B/en active Active
Non-Patent Citations (2)
Title |
---|
Multi-scale deep encoder-decoder network for salient object detection; Qinghua Ren et al.; Neurocomputing; 2018-11-17; vol. 316; full text * |
Saliency detection based on cascaded fully convolutional neural networks; Zhang Songlong et al.; Laser & Optoelectronics Progress (激光与光电子学进展); 2018-10-29 (No. 07); full text * |
Also Published As
Publication number | Publication date |
---|---|
CN110929736A (en) | 2020-03-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110929736B (en) | Multi-feature cascading RGB-D significance target detection method | |
CN108510532B (en) | Optical and SAR image registration method based on deep convolution GAN | |
CN108182441B (en) | Parallel multichannel convolutional neural network, construction method and image feature extraction method | |
CN110728192B (en) | High-resolution remote sensing image classification method based on novel characteristic pyramid depth network | |
CN108154194B (en) | Method for extracting high-dimensional features by using tensor-based convolutional network | |
Wu et al. | 3D ShapeNets for 2.5D object recognition and next-best-view prediction | |
CN113221639B (en) | Micro-expression recognition method for representative AU (AU) region extraction based on multi-task learning | |
CN112184752A (en) | Video target tracking method based on pyramid convolution | |
CN110458178B (en) | Multi-mode multi-spliced RGB-D significance target detection method | |
CN112396607A (en) | Streetscape image semantic segmentation method for deformable convolution fusion enhancement | |
JP7135659B2 (en) | SHAPE COMPLEMENTATION DEVICE, SHAPE COMPLEMENTATION LEARNING DEVICE, METHOD, AND PROGRAM | |
Nguyen et al. | Satellite image classification using convolutional learning | |
CN112036260B (en) | Expression recognition method and system for multi-scale sub-block aggregation in natural environment | |
CN110674685B (en) | Human body analysis segmentation model and method based on edge information enhancement | |
CN103646256A (en) | Image characteristic sparse reconstruction based image classification method | |
CN113269224A (en) | Scene image classification method, system and storage medium | |
CN113743521B (en) | Target detection method based on multi-scale context awareness | |
CN114494594A (en) | Astronaut operating equipment state identification method based on deep learning | |
CN111612046B (en) | Feature pyramid graph convolution neural network and application thereof in 3D point cloud classification | |
CN112597956A (en) | Multi-person attitude estimation method based on human body anchor point set and perception enhancement network | |
CN116342961B (en) | Time sequence classification deep learning system based on mixed quantum neural network | |
Jafrasteh et al. | Generative adversarial networks as a novel approach for tectonic fault and fracture extraction in high resolution satellite and airborne optical images | |
EP3588441B1 (en) | Imagification of multivariate data sequences | |
CN113450313B (en) | Image significance visualization method based on regional contrast learning | |
CN105718858A (en) | Pedestrian recognition method based on positive-negative generalized max-pooling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||