CN110263813A - Saliency detection method based on residual network and depth information fusion - Google Patents
Saliency detection method based on residual network and depth information fusion
- Publication number
- CN110263813A (application number CN201910444775.0A)
- Authority
- CN
- China
- Prior art keywords
- layer
- output
- feature maps
- neural network
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/214 — Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/241 — Pattern recognition; analysing; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
Abstract
The invention discloses a saliency detection method based on residual network and depth information fusion. In the training stage a convolutional neural network is constructed whose input layer comprises an RGB image input layer and a depth image input layer, whose hidden layer comprises 5 RGB image neural network blocks, 4 RGB image max-pooling layers, 5 depth image neural network blocks, 4 depth image max-pooling layers, 5 concatenation layers, 5 fusion neural network blocks, and 4 deconvolution layers, and whose output layer comprises 5 sub-output layers. The color real-object images and depth images of the training set are fed into the convolutional neural network for training, yielding saliency detection prediction maps; the trained convolutional neural network model is obtained by computing the loss function values between the prediction maps and the true saliency detection label images. In the test stage the trained model predicts a saliency detection image for a color real-object image to be detected. The advantage of the method is its high saliency detection accuracy.
Description
Technical Field
The present invention relates to a visual saliency detection technique, and in particular to a saliency detection method based on residual network and depth information fusion.
Background Art
Visual saliency helps humans rapidly filter out unimportant information and concentrate attention on meaningful regions, and thus better understand the scene at hand. With the rapid development of computer vision, one would like computers to have the same ability: when understanding and analyzing complex scenes, a computer that processes only the useful information can greatly reduce algorithmic complexity and suppress the interference of clutter. Traditionally, researchers modeled salient-object detection algorithms on various kinds of observed prior knowledge, such as contrast, center priors, edge priors, and semantic priors, to generate saliency maps. In complex scenes, however, these traditional approaches are often inaccurate, because such observations are usually limited to low-level features (e.g., color and contrast) and therefore cannot capture what salient objects essentially have in common.
In recent years convolutional neural networks have been applied throughout computer vision, and many hard vision problems have seen major progress. Unlike traditional approaches, deep convolutional neural networks model large numbers of training samples and automatically learn more essential characteristics end-to-end, thereby avoiding the drawbacks of hand-crafted modeling and feature design. More recently, the practical use of 3D sensors has enriched the available databases: one can acquire not only color images but also their depth information. Depth is an important cue for the human visual system in real 3D scenes, yet it was completely ignored by earlier traditional methods; the key task now is therefore to build models that exploit depth information effectively.
Deep-learning saliency detection methods on RGB-D databases perform pixel-level, end-to-end saliency detection directly: the training images are fed into the model framework for training, the weights and the model are obtained, and predictions are then made on the test set. Current deep-learning saliency detection models for RGB-D data mainly use an encoder-decoder architecture and exploit depth information in one of three ways. The first is to stack the depth information and the color-image information directly into a four-dimensional input, or to add or concatenate them during encoding; this is called early fusion. The second is to add or concatenate the color-image and depth features of the encoder into the corresponding decoding stages via skip connections; this is called late fusion. The third is to predict saliency separately from the color-image information and from the depth information and then fuse the two results. In the first approach, the distributions of color-image and depth information differ considerably, so injecting depth directly during encoding adds a certain amount of noise. In the third approach, if neither the depth-based nor the color-based prediction is accurate, the fused result is also relatively inaccurate. The second approach not only avoids the noise caused by using depth directly in the encoding stage, but also lets the network, as it is optimized, fully learn the complementary relationship between color-image and depth information. Consider an earlier late-fusion scheme, "RGB-D Saliency Detection by Multi-stream Late Fusion Network" (hereinafter MLF): MLF extracts features from and downsamples the color-image and depth information separately, fuses them at the highest level by element-wise multiplication, and outputs a very small saliency prediction map from the fused result. Because MLF contains only downsampling operations, the spatial details of objects are progressively blurred, and because it predicts saliency at the smallest size, much information about the salient objects is lost when the prediction is enlarged back to the original size.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a saliency detection method based on residual network and depth information fusion that exploits depth information and color-image information efficiently and thereby improves saliency detection accuracy.
The technical solution adopted by the present invention to solve the above technical problem is a saliency detection method based on residual network and depth information fusion, characterized by comprising a training stage and a test stage.
The specific steps of the training stage are as follows:
Step 1_1: Select Q original color real-object images, together with the depth image and the true saliency detection label image corresponding to each of them, to form a training set; denote the q-th original color real-object image in the training set, its depth image, and its true saliency detection label image as {Iq(i,j)}, {Dq(i,j)}, and {Gq(i,j)} respectively. Here Q is a positive integer with Q ≥ 200; q is a positive integer with initial value 1 and 1 ≤ q ≤ Q; 1 ≤ i ≤ W and 1 ≤ j ≤ H, where W and H denote the width and height of {Iq(i,j)}, {Dq(i,j)}, and {Gq(i,j)}, both divisible by 2; {Iq(i,j)} is an RGB color image whose pixel value at coordinate (i,j) is Iq(i,j); {Dq(i,j)} is a single-channel depth image whose pixel value at coordinate (i,j) is Dq(i,j); and Gq(i,j) is the pixel value of {Gq(i,j)} at coordinate (i,j).
Step 1_2: Construct a convolutional neural network. The network comprises an input layer, a hidden layer, and an output layer. The input layer comprises an RGB image input layer and a depth image input layer. The hidden layer comprises 5 RGB image neural network blocks, 4 RGB image max-pooling layers, 5 depth image neural network blocks, 4 depth image max-pooling layers, 5 concatenation layers, 5 fusion neural network blocks, and 4 deconvolution layers. The output layer comprises 5 sub-output layers. The 5 RGB image neural network blocks and 4 RGB image max-pooling layers form the encoding structure for the RGB image, and the 5 depth image neural network blocks and 4 depth image max-pooling layers form the encoding structure for the depth image; together these two encoding structures form the encoding part of the convolutional neural network, while the 5 concatenation layers, 5 fusion neural network blocks, and 4 deconvolution layers form its decoding part.
The input end of the RGB image input layer receives the R, G, and B channel components of a training RGB color image, and its output end passes them to the hidden layer; the training RGB color image is required to have width W and height H.
The input end of the depth image input layer receives the training depth image corresponding to the training RGB color image received by the RGB image input layer, and its output end passes the training depth image to the hidden layer; the training depth image has width W and height H.
The input end of the 1st RGB image neural network block receives the R, G, and B channel components output by the RGB image input layer; its output end outputs 32 feature maps of width W and height H, whose set is denoted CP1.
The input end of the 1st RGB image max-pooling layer receives all feature maps in CP1; its output end outputs 32 feature maps of width W/2 and height H/2, whose set is denoted ZC1.
The input end of the 2nd RGB image neural network block receives all feature maps in ZC1; its output end outputs 64 feature maps of width W/2 and height H/2, whose set is denoted CP2.
The input end of the 2nd RGB image max-pooling layer receives all feature maps in CP2; its output end outputs 64 feature maps of width W/4 and height H/4, whose set is denoted ZC2.
The input end of the 3rd RGB image neural network block receives all feature maps in ZC2; its output end outputs 128 feature maps of width W/4 and height H/4, whose set is denoted CP3.
The input end of the 3rd RGB image max-pooling layer receives all feature maps in CP3; its output end outputs 128 feature maps of width W/8 and height H/8, whose set is denoted ZC3.
The input end of the 4th RGB image neural network block receives all feature maps in ZC3; its output end outputs 256 feature maps of width W/8 and height H/8, whose set is denoted CP4.
The input end of the 4th RGB image max-pooling layer receives all feature maps in CP4; its output end outputs 256 feature maps of width W/16 and height H/16, whose set is denoted ZC4.
The input end of the 5th RGB image neural network block receives all feature maps in ZC4; its output end outputs 256 feature maps of width W/16 and height H/16, whose set is denoted CP5.
The input end of the 1st depth image neural network block receives the training depth image output by the depth image input layer; its output end outputs 32 feature maps of width W and height H, whose set is denoted DP1.
The input end of the 1st depth image max-pooling layer receives all feature maps in DP1; its output end outputs 32 feature maps of width W/2 and height H/2, whose set is denoted DC1.
The input end of the 2nd depth image neural network block receives all feature maps in DC1; its output end outputs 64 feature maps of width W/2 and height H/2, whose set is denoted DP2.
The input end of the 2nd depth image max-pooling layer receives all feature maps in DP2; its output end outputs 64 feature maps of width W/4 and height H/4, whose set is denoted DC2.
The input end of the 3rd depth image neural network block receives all feature maps in DC2; its output end outputs 128 feature maps of width W/4 and height H/4, whose set is denoted DP3.
The input end of the 3rd depth image max-pooling layer receives all feature maps in DP3; its output end outputs 128 feature maps of width W/8 and height H/8, whose set is denoted DC3.
The input end of the 4th depth image neural network block receives all feature maps in DC3; its output end outputs 256 feature maps of width W/8 and height H/8, whose set is denoted DP4.
The input end of the 4th depth image max-pooling layer receives all feature maps in DP4; its output end outputs 256 feature maps of width W/16 and height H/16, whose set is denoted DC4.
The input end of the 5th depth image neural network block receives all feature maps in DC4; its output end outputs 256 feature maps of width W/16 and height H/16, whose set is denoted DP5.
The input end of the 1st concatenation layer receives all feature maps in CP5 and all feature maps in DP5 and stacks them; its output end outputs 512 feature maps of width W/16 and height H/16, whose set is denoted Con1.
The input end of the 1st fusion neural network block receives all feature maps in Con1; its output end outputs 256 feature maps of width W/16 and height H/16, whose set is denoted RH1.
The input end of the 1st deconvolution layer receives all feature maps in RH1; its output end outputs 256 feature maps of width W/8 and height H/8, whose set is denoted FJ1.
The input end of the 2nd concatenation layer receives all feature maps in FJ1, CP4, and DP4 and stacks them; its output end outputs 768 feature maps of width W/8 and height H/8, whose set is denoted Con2.
The input end of the 2nd fusion neural network block receives all feature maps in Con2; its output end outputs 256 feature maps of width W/8 and height H/8, whose set is denoted RH2.
The input end of the 2nd deconvolution layer receives all feature maps in RH2; its output end outputs 256 feature maps of width W/4 and height H/4, whose set is denoted FJ2.
The input end of the 3rd concatenation layer receives all feature maps in FJ2, CP3, and DP3 and stacks them; its output end outputs 512 feature maps of width W/4 and height H/4, whose set is denoted Con3.
The input end of the 3rd fusion neural network block receives all feature maps in Con3; its output end outputs 128 feature maps of width W/4 and height H/4, whose set is denoted RH3.
The input end of the 3rd deconvolution layer receives all feature maps in RH3; its output end outputs 128 feature maps of width W/2 and height H/2, whose set is denoted FJ3.
The input end of the 4th concatenation layer receives all feature maps in FJ3, CP2, and DP2 and stacks them; its output end outputs 256 feature maps of width W/2 and height H/2, whose set is denoted Con4.
The input end of the 4th fusion neural network block receives all feature maps in Con4; its output end outputs 64 feature maps of width W/2 and height H/2, whose set is denoted RH4.
The input end of the 4th deconvolution layer receives all feature maps in RH4; its output end outputs 64 feature maps of width W and height H, whose set is denoted FJ4.
The input end of the 5th concatenation layer receives all feature maps in FJ4, CP1, and DP1 and stacks them; its output end outputs 128 feature maps of width W and height H, whose set is denoted Con5.
The input end of the 5th fusion neural network block receives all feature maps in Con5; its output end outputs 32 feature maps of width W and height H, whose set is denoted RH5.
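The channel counts of the five concatenation layers follow from adding the widths of the stacked feature-map sets; a standalone sanity check of this bookkeeping (a sketch, with variable names of our own choosing):

```python
# Channel bookkeeping for the five concatenation layers (names ours):
enc = [32, 64, 128, 256, 256]   # widths of CP1..CP5 (and likewise DP1..DP5)
dec = [256, 256, 128, 64]       # widths of FJ1..FJ4 (the deconvolved decoder features)
# Con1 stacks CP5 and DP5; Con2..Con5 additionally stack the decoder feature FJ1..FJ4
cons = [2 * enc[4]] + [dec[k] + 2 * enc[3 - k] for k in range(4)]
assert cons == [512, 768, 512, 256, 128]
```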
The input end of the 1st sub-output layer receives all feature maps in RH1; its output end outputs 2 feature maps of width W/16 and height H/16, whose set is denoted Out1; one of the two feature maps in Out1 is a saliency detection prediction map.
The input end of the 2nd sub-output layer receives all feature maps in RH2; its output end outputs 2 feature maps of width W/8 and height H/8, whose set is denoted Out2; one of the two feature maps in Out2 is a saliency detection prediction map.
The input end of the 3rd sub-output layer receives all feature maps in RH3; its output end outputs 2 feature maps of width W/4 and height H/4, whose set is denoted Out3; one of the two feature maps in Out3 is a saliency detection prediction map.
The input end of the 4th sub-output layer receives all feature maps in RH4; its output end outputs 2 feature maps of width W/2 and height H/2, whose set is denoted Out4; one of the two feature maps in Out4 is a saliency detection prediction map.
The input end of the 5th sub-output layer receives all feature maps in RH5; its output end outputs 2 feature maps of width W and height H, whose set is denoted Out5; one of the two feature maps in Out5 is a saliency detection prediction map.
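Putting step 1_2 together, the following is a minimal PyTorch sketch of the two-stream encoder-decoder wiring just described. It is an illustration rather than the patented implementation: the neural network blocks and fusion blocks are reduced here to plain conv-BN-ReLU stacks (their residual-block internals, specified further below, are omitted for brevity), and all class and variable names are our own.

```python
import torch
import torch.nn as nn

def block(cin, cout):
    # Simplified stand-in for the "neural network blocks" / "fusion neural network
    # blocks"; the patent's blocks additionally contain a residual block.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class TwoStreamSaliencyNet(nn.Module):         # hypothetical name
    def __init__(self):
        super().__init__()
        enc = [32, 64, 128, 256, 256]          # widths of CP1..CP5 / DP1..DP5
        self.rgb = nn.ModuleList(block(i, o) for i, o in zip([3] + enc[:-1], enc))
        self.dep = nn.ModuleList(block(i, o) for i, o in zip([1] + enc[:-1], enc))
        self.pool = nn.MaxPool2d(2, 2)         # pooling size 2, stride 2
        fuse_in = [512, 768, 512, 256, 128]    # widths of Con1..Con5
        fuse_out = [256, 256, 128, 64, 32]     # widths of RH1..RH5
        self.fuse = nn.ModuleList(block(i, o) for i, o in zip(fuse_in, fuse_out))
        # deconvolution: kernel 2x2, stride 2 -> doubles width and height (FJ1..FJ4)
        self.up = nn.ModuleList(nn.ConvTranspose2d(c, c, 2, stride=2) for c in fuse_out[:-1])
        self.out = nn.ModuleList(nn.Conv2d(c, 2, 1) for c in fuse_out)  # sub-output layers

    def forward(self, rgb, depth):
        cps, dps = [], []
        x, y = rgb, depth
        for i in range(5):                     # the two parallel encoder streams
            x, y = self.rgb[i](x), self.dep[i](y)
            cps.append(x); dps.append(y)
            if i < 4:
                x, y = self.pool(x), self.pool(y)
        outs, f = [], None
        for k in range(5):                     # decoder: concatenate -> fuse -> deconvolve
            skips = ([f] if f is not None else []) + [cps[4 - k], dps[4 - k]]
            r = self.fuse[k](torch.cat(skips, dim=1))   # Con -> RH
            outs.append(self.out[k](r))                 # 2-channel side output
            if k < 4:
                f = self.up[k](r)                       # FJ
        return outs                            # Out1..Out5, coarsest to full resolution

outs = TwoStreamSaliencyNet()(torch.randn(1, 3, 512, 512), torch.randn(1, 1, 512, 512))
print([tuple(o.shape[-2:]) for o in outs])  # [(32, 32), (64, 64), ..., (512, 512)]
```

For a 512 × 512 input the printed sizes confirm the five side outputs at W/16 × H/16 up to W × H.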
Step 1_3: Take each original color real-object image in the training set as the training RGB color image and its corresponding depth image as the training depth image, feed them into the convolutional neural network, and train, obtaining for each original color real-object image the 5 saliency detection prediction maps described above; the 5 prediction maps corresponding to {Iq(i,j)} are referred to below as the prediction-map set of {Iq(i,j)}.
Step 1_4: Scale the true saliency detection label image corresponding to each original color real-object image in the training set to 5 different sizes, obtaining images of width W/16 and height H/16, width W/8 and height H/8, width W/4 and height H/4, width W/2 and height H/2, and width W and height H; the 5 scaled images obtained from the true saliency detection label image corresponding to {Iq(i,j)} are referred to below as the scaled-label set of {Iq(i,j)}.
Step 1_5: For each original color real-object image in the training set, compute the loss function value between its prediction-map set and its scaled-label set; the loss is obtained using categorical cross-entropy.
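Steps 1_4 and 1_5 can be sketched as follows, taking PyTorch's `cross_entropy` over the 2-channel outputs as the categorical cross-entropy; the function name is ours:

```python
import torch.nn.functional as F

def multiscale_loss(outs, label):
    # outs: the 5 side-output logits, shapes (N, 2, W/16, H/16) up to (N, 2, W, H);
    # label: (N, W, H) integer ground truth, 1 = salient, 0 = background.
    total = 0.0
    for out in outs:
        # step 1_4: rescale the true label image to this output's size
        lab = F.interpolate(label[:, None].float(), size=out.shape[-2:], mode="nearest")
        # step 1_5: categorical cross-entropy against the scaled label
        total = total + F.cross_entropy(out, lab[:, 0].long())
    return total
```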
Step 1_6: Repeat steps 1_3 to 1_5 a total of V times to obtain the trained convolutional neural network model, yielding Q×V loss function values in all; find the smallest of these Q×V loss values, and take the weight vector and bias term corresponding to that smallest loss as the optimal weight vector and optimal bias term of the trained model, denoted Wbest and bbest respectively; here V > 1.
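Step 1_6 then amounts to an ordinary training loop that keeps the weights with the smallest observed loss as Wbest and bbest. The sketch below reuses the model and loss sketches above; the Adam optimizer and the learning rate are our assumptions, since the patent only specifies repeating steps 1_3 to 1_5 V times and keeping the best weights.

```python
import copy
import torch

def train(model, loader, V=300, lr=1e-4):
    # model: TwoStreamSaliencyNet from the sketch above; loader yields (rgb, depth, label);
    # multiscale_loss: the loss sketch above. Optimizer and lr are assumptions.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_loss, best_state = float("inf"), None     # tracks Wbest / bbest
    for _ in range(V):                             # repeat steps 1_3 to 1_5 V times
        for rgb, depth, label in loader:
            loss = multiscale_loss(model(rgb, depth), label)
            opt.zero_grad(); loss.backward(); opt.step()
            if loss.item() < best_loss:            # keep the best weights seen so far
                best_loss = loss.item()
                best_state = copy.deepcopy(model.state_dict())
    return best_state
```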
The specific steps of the test stage are as follows:
Step 2_1: Take a color real-object image to be subjected to saliency detection, together with its corresponding depth image; both have width W' and height H', and pixel coordinates in them are denoted (i',j'), with 1 ≤ i' ≤ W' and 1 ≤ j' ≤ H'.
Step 2_2: Feed the R, G, and B channel components of the color real-object image, together with its depth image, into the trained convolutional neural network model, and predict using Wbest and bbest, obtaining 5 predicted saliency detection images of different sizes; take the predicted saliency detection image whose size matches that of the input image as the final predicted saliency detection image.
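In code, the test stage reduces to one forward pass through the trained model; a sketch reusing the model above, where taking a softmax over the two output channels as the saliency probability is our assumption:

```python
import torch

@torch.no_grad()
def predict(model, rgb, depth):
    # model: the trained network with best_state loaded via model.load_state_dict(best_state)
    model.eval()
    outs = model(rgb, depth)                  # 5 predicted maps of different sizes
    full = outs[-1]                           # the one whose size matches the W' x H' input
    return torch.softmax(full, dim=1)[:, 1]   # per-pixel saliency probability, (N, W', H')
```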
In step 1_2, the 1st RGB image neural network block and the 1st depth image neural network block have the same structure, consisting in order of a first convolution layer, a first batch-normalization layer, a first activation layer, a first residual block, a second convolution layer, a second batch-normalization layer, and a second activation layer. The input end of the first convolution layer is the input end of the block; the first batch-normalization layer receives all feature maps output by the first convolution layer; the first activation layer receives all feature maps output by the first batch-normalization layer; the first residual block receives all feature maps output by the first activation layer; the second convolution layer receives all feature maps output by the first residual block; the second batch-normalization layer receives all feature maps output by the second convolution layer; the second activation layer receives all feature maps output by the second batch-normalization layer, and its output end is the output end of the block. The first and second convolution layers both have kernel size 3×3, 32 kernels, and zero-padding 1; the first and second activation layers both use "Relu"; and the first and second batch-normalization layers, the first and second activation layers, and the first residual block each output 32 feature maps.
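Written out in PyTorch, the first block could look as follows; since the patent does not spell out the internals of its residual blocks, a standard width-preserving identity residual block (two 3×3 convolutions with batch normalization plus a skip connection) is assumed here:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Assumed internals: conv-BN-ReLU-conv-BN plus an identity skip, width-preserving."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1), nn.BatchNorm2d(c),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)

# 1st RGB image / depth image neural network block as specified:
# conv(3x3, 32, pad 1) -> BN -> ReLU -> residual block -> conv(3x3, 32, pad 1) -> BN -> ReLU
def first_block(in_ch):          # in_ch = 3 for the RGB stream, 1 for the depth stream
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(inplace=True),
        ResidualBlock(32),
        nn.Conv2d(32, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(inplace=True),
    )

print(first_block(3)(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```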
The 2nd, 3rd, 4th, and 5th RGB image neural network blocks and the correspondingly numbered depth image neural network blocks have the same layer sequence as the 1st block, built respectively from the third through tenth convolution, batch-normalization, and activation layers and the second through fifth residual blocks, connected in the same order (convolution, batch normalization, activation, residual block, convolution, batch normalization, activation), with the input end of the first convolution layer serving as the block input and the output end of the last activation layer serving as the block output. All of these convolution layers have kernel size 3×3 and zero-padding 1, and all of these activation layers use "Relu"; the number of convolution kernels, and likewise the number of feature maps output by each batch-normalization layer, activation layer, and residual block, is 64 in the 2nd block, 128 in the 3rd block, 256 in the 4th block, and 256 in the 5th block.
In step 1_2, the 4 RGB image max-pooling layers and the 4 depth image max-pooling layers are all max-pooling layers with pooling size 2 and stride 2.
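In PyTorch terms each of these pooling layers is simply nn.MaxPool2d(kernel_size=2, stride=2), which halves both spatial dimensions:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(torch.randn(1, 32, 512, 512)).shape)  # torch.Size([1, 32, 256, 256])
```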
In step 1_2, the 5 fusion neural network blocks have the same structure, consisting in order of an eleventh convolution layer, an eleventh batch-normalization layer, an eleventh activation layer, a sixth residual block, a twelfth convolution layer, a twelfth batch-normalization layer, and a twelfth activation layer, connected in the same way as the blocks above: the input end of the eleventh convolution layer is the input end of the fusion block, each layer receives all feature maps output by the preceding layer, and the output end of the twelfth activation layer is the output end of the block. In every fusion block the two convolution layers have kernel size 3×3 and zero-padding 1, and the two activation layers use "Relu"; the number of convolution kernels, and likewise the number of feature maps output by each batch-normalization layer, activation layer, and residual block, is 256 in the 1st and 2nd fusion blocks, 128 in the 3rd, 64 in the 4th, and 32 in the 5th.
In step 1_2, the 1st and 2nd deconvolution layers both have kernel size 2×2, 256 kernels, stride 2, and zero-padding 0; the 3rd deconvolution layer has kernel size 2×2, 128 kernels, stride 2, and zero-padding 0; and the 4th deconvolution layer has kernel size 2×2, 64 kernels, stride 2, and zero-padding 0.
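With kernel 2×2, stride 2, and zero padding, a transposed convolution exactly doubles the width and height, since the output size is (in − 1)·stride − 2·padding + kernel = 2·in; a quick check under these settings:

```python
import torch
import torch.nn as nn

# output size = (in - 1) * 2 - 2 * 0 + 2 = 2 * in
up = nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2, padding=0)
print(up(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 256, 64, 64])
```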
In step 1_2, the 5 sub-output layers have the same structure, each consisting of a thirteenth convolution layer with kernel size 1×1, 2 kernels, and zero-padding 0.
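Each sub-output layer is therefore a single 1×1 convolution mapping the fused feature maps to two channels; for example, applied to RH1's 256 channels:

```python
import torch
import torch.nn as nn

head = nn.Conv2d(256, 2, kernel_size=1, padding=0)  # e.g. the 1st sub-output layer on RH1
print(head(torch.randn(1, 256, 32, 32)).shape)      # torch.Size([1, 2, 32, 32])
```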
Compared with the prior art, the present invention has the following advantages:
1) The convolutional neural network constructed by the method of the present invention performs end-to-end salient-object detection and is easy, convenient, and fast to train. The color real-object images of the training set and their corresponding depth images are fed into the convolutional neural network for training to obtain a trained model; the color real-object image to be detected and its corresponding depth image are then fed into the trained model to predict the corresponding saliency detection image. Because the method combines residual blocks and deconvolution layers when constructing the network, it deepens the trained model while improving its prediction accuracy.
2) The method uses late fusion of the depth information: the depth and color-image information of the encoding part is concatenated with the corresponding decoding stages. This avoids the noise that early fusion introduces in the encoding stage, while the complementary information of color-image and depth information can be fully learned during training; good results are therefore obtained on both the training set and the test set.
3) The invention adopts multi-scale supervision: the deconvolution layers allow the spatial details of objects to be refined during upsampling, prediction maps are output at different sizes, and each is supervised by a label map of the corresponding size. This guides the trained model to build up the saliency detection prediction map step by step, again giving better results on the training and test sets.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the composition of the convolutional neural network constructed by the method of the present invention;
Fig. 2a shows the precision-recall curves obtained by predicting every color real-object image in the NLPR real-object image database test set with the method of the present invention, reflecting its saliency detection performance;
Fig. 2b shows the corresponding mean absolute error;
Fig. 2c shows the corresponding F-measure;
Fig. 3a is the 1st original color real-object image of a scene;
Fig. 3b is the depth image corresponding to Fig. 3a;
Fig. 3c is the predicted saliency detection image obtained by applying the method of the present invention to Fig. 3a;
Fig. 4a is the 2nd original color real-object image of a scene;
Fig. 4b is the depth image corresponding to Fig. 4a;
Fig. 4c is the predicted saliency detection image obtained by applying the method of the present invention to Fig. 4a;
Fig. 5a is the 3rd original color real-object image of a scene;
Fig. 5b is the depth image corresponding to Fig. 5a;
Fig. 5c is the predicted saliency detection image obtained by applying the method of the present invention to Fig. 5a;
Fig. 6a is the 4th original color real-object image of a scene;
Fig. 6b is the depth image corresponding to Fig. 6a;
Fig. 6c is the predicted saliency detection image obtained by applying the method of the present invention to Fig. 6a.
Detailed description of embodiments
The present invention is further described in detail below in conjunction with the accompanying drawings and embodiments.
The saliency detection method based on residual network and depth information fusion proposed by the present invention includes two processes: a training phase and a testing phase.
The specific steps of the training phase are as follows:
Step 1_1: Select Q original color real object images, together with the depth image and the ground-truth saliency detection label image corresponding to each of them, to form a training set. Denote the q-th original color real object image in the training set, its corresponding depth image and its ground-truth saliency detection label image as {I_q(i,j)}, {D_q(i,j)} and {G_q(i,j)} respectively, where Q is a positive integer with Q≥200 (for example Q=367), q is a positive integer with initial value 1, 1≤q≤Q, 1≤i≤W, 1≤j≤H, W denotes the width of {I_q(i,j)}, {D_q(i,j)} and {G_q(i,j)}, H denotes their height, and both W and H are divisible by 2 (for example W=512, H=512); {I_q(i,j)} is an RGB color image and I_q(i,j) denotes the pixel value of the pixel at coordinate (i,j) in {I_q(i,j)}; {D_q(i,j)} is a single-channel depth image and D_q(i,j) denotes the pixel value of the pixel at coordinate (i,j) in {D_q(i,j)}; G_q(i,j) denotes the pixel value of the pixel at coordinate (i,j) in {G_q(i,j)}. Here, the original color real object images are taken directly from the 800 images of the NLPR database training set.
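As a concrete illustration of this image/depth/label pairing, a minimal Python/PyTorch-style loading sketch follows; the file paths and the 0.5 binarization threshold for the label image are assumptions for illustration, not values given by the patent.

```python
from PIL import Image
import torchvision.transforms.functional as TF

def load_pair(rgb_path, depth_path, label_path, size=(512, 512)):
    # Resize so that W = H = 512, both divisible by 2 as required above.
    rgb   = TF.to_tensor(Image.open(rgb_path).convert('RGB').resize(size))  # (3, H, W)
    depth = TF.to_tensor(Image.open(depth_path).convert('L').resize(size))  # (1, H, W)
    label = TF.to_tensor(Image.open(label_path).convert('L').resize(size))
    return rgb, depth, (label > 0.5).long().squeeze(0)                      # label: (H, W)
```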
Step 1_2: Construct the convolutional neural network. As shown in Fig. 1, the convolutional neural network comprises an input layer, a hidden layer and an output layer. The input layer includes an RGB image input layer and a depth image input layer; the hidden layer includes 5 RGB image neural network blocks, 4 RGB image maximum pooling layers (Max pooling, Pool), 5 depth image neural network blocks, 4 depth image maximum pooling layers, 5 concatenation layers, 5 fusion neural network blocks and 4 deconvolution layers; the output layer includes 5 sub-output layers. The 5 RGB image neural network blocks and the 4 RGB image maximum pooling layers constitute the encoding structure for the RGB image, the 5 depth image neural network blocks and the 4 depth image maximum pooling layers constitute the encoding structure for the depth image, and these two encoding structures together constitute the encoding layers of the convolutional neural network; the 5 concatenation layers, the 5 fusion neural network blocks and the 4 deconvolution layers constitute the decoding layers of the convolutional neural network.
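For orientation, a minimal sketch of the constructor implied by this layout is given below in PyTorch (the library named later in this description); ConvBNReLUResBlock is a hypothetical helper whose structure is sketched after the block descriptions further down, and all channel counts are taken from the text.

```python
import torch.nn as nn

class RGBDSaliencyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Per-stage (in, out) channel widths, taken from the layer descriptions below.
        rgb_ch  = [(3, 32), (32, 64), (64, 128), (128, 256), (256, 256)]
        dep_ch  = [(1, 32), (32, 64), (64, 128), (128, 256), (256, 256)]
        fuse_ch = [(512, 256), (768, 256), (512, 128), (256, 64), (128, 32)]
        self.rgb_blocks   = nn.ModuleList(ConvBNReLUResBlock(i, o) for i, o in rgb_ch)
        self.depth_blocks = nn.ModuleList(ConvBNReLUResBlock(i, o) for i, o in dep_ch)
        self.rgb_pools    = nn.ModuleList(nn.MaxPool2d(2, 2) for _ in range(4))
        self.depth_pools  = nn.ModuleList(nn.MaxPool2d(2, 2) for _ in range(4))
        self.fuse_blocks  = nn.ModuleList(ConvBNReLUResBlock(i, o) for i, o in fuse_ch)
        self.deconvs = nn.ModuleList([
            nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2),
            nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2),
            nn.ConvTranspose2d(128, 128, kernel_size=2, stride=2),
            nn.ConvTranspose2d(64,  64,  kernel_size=2, stride=2),
        ])
        # Each sub-output layer is a single 1x1 convolution onto 2 channels.
        self.outs = nn.ModuleList(nn.Conv2d(c, 2, kernel_size=1)
                                  for c in (256, 256, 128, 64, 32))
```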
For the RGB image input layer, its input receives the R channel component, G channel component and B channel component of a training RGB color image, and its output passes these three channel components to the hidden layer; the training RGB color image is required to have width W and height H.
For the depth image input layer, its input receives the training depth image corresponding to the training RGB color image received by the RGB image input layer, and its output passes the training depth image to the hidden layer; the training depth image has width W and height H.
For the 1st RGB image neural network block, its input receives the R, G and B channel components output by the RGB image input layer, and its output outputs 32 feature maps of width W and height H; the set of all output feature maps is denoted CP1.
For the 1st RGB image maximum pooling layer, its input receives all feature maps in CP1, and its output outputs 32 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted ZC1.
For the 2nd RGB image neural network block, its input receives all feature maps in ZC1, and its output outputs 64 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted CP2.
For the 2nd RGB image maximum pooling layer, its input receives all feature maps in CP2, and its output outputs 64 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted ZC2.
For the 3rd RGB image neural network block, its input receives all feature maps in ZC2, and its output outputs 128 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted CP3.
For the 3rd RGB image maximum pooling layer, its input receives all feature maps in CP3, and its output outputs 128 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted ZC3.
For the 4th RGB image neural network block, its input receives all feature maps in ZC3, and its output outputs 256 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted CP4.
For the 4th RGB image maximum pooling layer, its input receives all feature maps in CP4, and its output outputs 256 feature maps of width W/16 and height H/16; the set of all output feature maps is denoted ZC4.
For the 5th RGB image neural network block, its input receives all feature maps in ZC4, and its output outputs 256 feature maps of width W/16 and height H/16; the set of all output feature maps is denoted CP5.
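A minimal sketch of how the five blocks and four pooling layers chain together; the depth stream described next is wired identically, and `net` is an instance of the hypothetical RGBDSaliencyNet sketched above.

```python
def encode(blocks, pools, x):
    # Produces the per-stage sets CP1..CP5 (or DP1..DP5 for the depth stream);
    # each pooling step halves the width and height before the next block.
    feats = []
    for i, block in enumerate(blocks):
        x = block(x)
        feats.append(x)          # kept as a skip connection for the decoder
        if i < len(pools):
            x = pools[i](x)
    return feats

# cps = encode(net.rgb_blocks, net.rgb_pools, rgb)   # CP1..CP5
```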
For the 1st depth image neural network block, its input receives the training depth image output by the depth image input layer, and its output outputs 32 feature maps of width W and height H; the set of all output feature maps is denoted DP1.
For the 1st depth image maximum pooling layer, its input receives all feature maps in DP1, and its output outputs 32 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted DC1.
For the 2nd depth image neural network block, its input receives all feature maps in DC1, and its output outputs 64 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted DP2.
For the 2nd depth image maximum pooling layer, its input receives all feature maps in DP2, and its output outputs 64 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted DC2.
For the 3rd depth image neural network block, its input receives all feature maps in DC2, and its output outputs 128 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted DP3.
For the 3rd depth image maximum pooling layer, its input receives all feature maps in DP3, and its output outputs 128 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted DC3.
For the 4th depth image neural network block, its input receives all feature maps in DC3, and its output outputs 256 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted DP4.
For the 4th depth image maximum pooling layer, its input receives all feature maps in DP4, and its output outputs 256 feature maps of width W/16 and height H/16; the set of all output feature maps is denoted DC4.
For the 5th depth image neural network block, its input receives all feature maps in DC4, and its output outputs 256 feature maps of width W/16 and height H/16; the set of all output feature maps is denoted DP5.
For the 1st concatenation layer, its input receives all feature maps in CP5 and all feature maps in DP5 and stacks them; its output outputs 512 feature maps of width W/16 and height H/16, and the set of all output feature maps is denoted Con1.
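Concatenation here is channel-wise stacking; a one-line PyTorch sketch, assuming cp5 and dp5 are the batched tensors holding CP5 and DP5:

```python
import torch

con1 = torch.cat([cp5, dp5], dim=1)   # (N, 256 + 256 = 512, H/16, W/16)
```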
For the 1st fusion neural network block, its input receives all feature maps in Con1, and its output outputs 256 feature maps of width W/16 and height H/16; the set of all output feature maps is denoted RH1.
For the 1st deconvolution layer, its input receives all feature maps in RH1, and its output outputs 256 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted FJ1.
For the 2nd concatenation layer, its input receives all feature maps in FJ1, CP4 and DP4 and stacks them; its output outputs 768 feature maps of width W/8 and height H/8, and the set of all output feature maps is denoted Con2.
For the 2nd fusion neural network block, its input receives all feature maps in Con2, and its output outputs 256 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted RH2.
For the 2nd deconvolution layer, its input receives all feature maps in RH2, and its output outputs 256 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted FJ2.
For the 3rd concatenation layer, its input receives all feature maps in FJ2, CP3 and DP3 and stacks them; its output outputs 512 feature maps of width W/4 and height H/4, and the set of all output feature maps is denoted Con3.
For the 3rd fusion neural network block, its input receives all feature maps in Con3, and its output outputs 128 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted RH3.
For the 3rd deconvolution layer, its input receives all feature maps in RH3, and its output outputs 128 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted FJ3.
For the 4th concatenation layer, its input receives all feature maps in FJ3, CP2 and DP2 and stacks them; its output outputs 256 feature maps of width W/2 and height H/2, and the set of all output feature maps is denoted Con4.
For the 4th fusion neural network block, its input receives all feature maps in Con4, and its output outputs 64 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted RH4.
For the 4th deconvolution layer, its input receives all feature maps in RH4, and its output outputs 64 feature maps of width W and height H; the set of all output feature maps is denoted FJ4.
For the 5th concatenation layer, its input receives all feature maps in FJ4, CP1 and DP1 and stacks them; its output outputs 128 feature maps of width W and height H, and the set of all output feature maps is denoted Con5.
For the 5th fusion neural network block, its input receives all feature maps in Con5, and its output outputs 32 feature maps of width W and height H; the set of all output feature maps is denoted RH5.
For the 1st sub-output layer, its input receives all feature maps in RH1, and its output outputs 2 feature maps of width W/16 and height H/16; the set of all output feature maps is denoted Out1, and one of the feature maps in Out1 (the 2nd feature map) is the saliency detection prediction map.
For the 2nd sub-output layer, its input receives all feature maps in RH2, and its output outputs 2 feature maps of width W/8 and height H/8; the set of all output feature maps is denoted Out2, and one of the feature maps in Out2 (the 2nd feature map) is the saliency detection prediction map.
For the 3rd sub-output layer, its input receives all feature maps in RH3, and its output outputs 2 feature maps of width W/4 and height H/4; the set of all output feature maps is denoted Out3, and one of the feature maps in Out3 (the 2nd feature map) is the saliency detection prediction map.
For the 4th sub-output layer, its input receives all feature maps in RH4, and its output outputs 2 feature maps of width W/2 and height H/2; the set of all output feature maps is denoted Out4, and one of the feature maps in Out4 (the 2nd feature map) is the saliency detection prediction map.
For the 5th sub-output layer, its input receives all feature maps in RH5, and its output outputs 2 feature maps of width W and height H; the set of all output feature maps is denoted Out5, and one of the feature maps in Out5 (the 2nd feature map) is the saliency detection prediction map.
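Putting the concatenation, fusion, deconvolution and sub-output steps together, a minimal sketch of the decoding path; cps and dps are the lists CP1..CP5 and DP1..DP5 produced by the hypothetical encode helper above.

```python
def decode(net, cps, dps):
    outs = []
    x = torch.cat([cps[4], dps[4]], dim=1)                      # Con1: 512 ch
    for k in range(5):
        x = net.fuse_blocks[k](x)                               # RH_{k+1}
        outs.append(net.outs[k](x))                             # Out_{k+1}: 2 ch
        if k < 4:
            up = net.deconvs[k](x)                              # FJ_{k+1}: double size
            x = torch.cat([up, cps[3 - k], dps[3 - k]], dim=1)  # Con_{k+2}
    return outs    # five predictions, from W/16 x H/16 up to W x H
```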
Step 1_3: Use each original color real object image in the training set as the training RGB color image and its corresponding depth image as the training depth image, input them into the convolutional neural network for training, and obtain the 5 saliency detection prediction maps corresponding to each original color real object image in the training set; the set of the 5 saliency detection prediction maps corresponding to {I_q(i,j)} is denoted S_q.
Step 1_4: Scale the ground-truth saliency detection label image corresponding to each original color real object image in the training set to 5 different sizes, obtaining images of width W/16 and height H/16, width W/8 and height H/8, width W/4 and height H/4, width W/2 and height H/2, and width W and height H; the set of the 5 images obtained by scaling the ground-truth saliency detection label image corresponding to {I_q(i,j)} is denoted Y_q.
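A sketch of this scaling with torch.nn.functional.interpolate; nearest-neighbour resampling is an assumption chosen so the scaled label maps stay binary.

```python
import torch.nn.functional as F

def multi_scale_labels(label, factors=(16, 8, 4, 2, 1)):
    # label: (N, H, W) integer map; returns the 5 scaled copies, small to large.
    return [F.interpolate(label[:, None].float(), scale_factor=1.0 / f,
                          mode='nearest').squeeze(1).long()
            for f in factors]
```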
Step 1_5: Compute the loss function value between the set S_q of 5 saliency detection prediction maps corresponding to each original color real object image in the training set and the set Y_q of the 5 scaled ground-truth saliency detection images of that image; the loss function value between S_q and Y_q is denoted Loss_q and is obtained using categorical cross-entropy (categorical crossentropy).
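With 2-channel predictions, categorical cross-entropy reduces to PyTorch's F.cross_entropy, summed over the five scales; a sketch:

```python
# preds: the five Out maps, each (N, 2, h, w); targets: the matching scaled labels.
loss = sum(F.cross_entropy(p, t) for p, t in zip(preds, targets))
```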
Step 1_6: Repeat steps 1_3 to 1_5 a total of V times to obtain the convolutional neural network training model, yielding Q×V loss function values in total; then find the smallest of these Q×V loss function values, and take the weight vector and bias term corresponding to that smallest loss value as the optimal weight vector and optimal bias term of the training model, denoted W_best and b_best respectively, where V>1 (V=300 in this embodiment).
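A minimal training-loop sketch under these rules; the Adam optimizer and its learning rate are assumptions (the patent fixes only V and the loss), and `train_loader` is a hypothetical iterator over (RGB, depth, label) triples.

```python
import copy
import torch.optim as optim

net = RGBDSaliencyNet()
optimizer = optim.Adam(net.parameters(), lr=1e-4)
best_loss, best_state = float('inf'), None              # tracks W_best, b_best
for epoch in range(300):                                # V = 300
    for rgb, depth, label in train_loader:
        preds = decode(net,
                       encode(net.rgb_blocks, net.rgb_pools, rgb),
                       encode(net.depth_blocks, net.depth_pools, depth))
        loss = sum(F.cross_entropy(p, t)
                   for p, t in zip(preds, multi_scale_labels(label)))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < best_loss:                     # keep the smallest loss
            best_loss = loss.item()
            best_state = copy.deepcopy(net.state_dict())
```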
The specific steps of the testing phase are as follows:
Step 2_1: Let {I'(i',j')} denote the color real object image to be saliency-detected, and denote its corresponding depth image as {D'(i',j')}, where 1≤i'≤W', 1≤j'≤H', W' denotes the width of {I'(i',j')} and {D'(i',j')}, H' denotes their height, I'(i',j') denotes the pixel value of the pixel at coordinate (i',j') in {I'(i',j')}, and D'(i',j') denotes the pixel value of the pixel at coordinate (i',j') in {D'(i',j')}.
Step 2_2: Input the R, G and B channel components of {I'(i',j')} together with {D'(i',j')} into the convolutional neural network training model, and use W_best and b_best for prediction to obtain the 5 predicted saliency detection images of different sizes corresponding to {I'(i',j')}; the predicted saliency detection image whose size matches that of {I'(i',j')} is taken as the final predicted saliency detection image corresponding to {I'(i',j')} and is denoted {S'(i',j')}, where S'(i',j') denotes the pixel value of the pixel at coordinate (i',j') in {S'(i',j')}.
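A minimal inference sketch: the network is restored with the best weights and only the full-resolution side output is kept, matching the size-selection rule above; reading channel 1 after a softmax as the saliency probability is an assumption consistent with the "2nd feature map" convention used earlier.

```python
net.load_state_dict(best_state)
net.eval()
with torch.no_grad():
    preds = decode(net,
                   encode(net.rgb_blocks, net.rgb_pools, rgb_test),
                   encode(net.depth_blocks, net.depth_pools, depth_test))
    saliency = preds[-1].softmax(dim=1)[:, 1]   # (N, H', W'), values in [0, 1]
```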
In this specific embodiment, in step 1_2, the 1st RGB image neural network block and the 1st depth image neural network block have the same structure, consisting in sequence of a first convolution layer (Convolution, Conv), a first batch normalization layer (Batch Normalize, BN), a first activation layer (Activation, Act), a first residual block (Residual Block, RB), a second convolution layer, a second batch normalization layer and a second activation layer. The input of the first convolution layer is the input of the neural network block; the first batch normalization layer receives all feature maps output by the first convolution layer; the first activation layer receives all feature maps output by the first batch normalization layer; the first residual block receives all feature maps output by the first activation layer; the second convolution layer receives all feature maps output by the first residual block; the second batch normalization layer receives all feature maps output by the second convolution layer; the second activation layer receives all feature maps output by the second batch normalization layer, and its output is the output of the neural network block. The kernel size (kernel_size) of the first and second convolution layers is 3×3, the number of kernels (filters) is 32 and the zero-padding parameter (padding) is 1; the activation function of the first and second activation layers is "ReLU"; the first batch normalization layer, the second batch normalization layer, the first activation layer, the second activation layer and the first residual block each output 32 feature maps.
In this specific embodiment, the 2nd RGB image neural network block and the 2nd depth image neural network block have the same structure, consisting in sequence of a third convolution layer, a third batch normalization layer, a third activation layer, a second residual block, a fourth convolution layer, a fourth batch normalization layer and a fourth activation layer, connected in the same way as the corresponding layers of the 1st block. The kernel size of the third and fourth convolution layers is 3×3, the number of kernels is 64 and the zero-padding parameter is 1; the activation function of the third and fourth activation layers is "ReLU"; the third batch normalization layer, the fourth batch normalization layer, the third activation layer, the fourth activation layer and the second residual block each output 64 feature maps.
In this specific embodiment, the 3rd RGB image neural network block and the 3rd depth image neural network block have the same structure, consisting in sequence of a fifth convolution layer, a fifth batch normalization layer, a fifth activation layer, a third residual block, a sixth convolution layer, a sixth batch normalization layer and a sixth activation layer, connected in the same way. The kernel size of the fifth and sixth convolution layers is 3×3, the number of kernels is 128 and the zero-padding parameter is 1; the activation function of the fifth and sixth activation layers is "ReLU"; the fifth batch normalization layer, the sixth batch normalization layer, the fifth activation layer, the sixth activation layer and the third residual block each output 128 feature maps.
In this specific embodiment, the 4th RGB image neural network block and the 4th depth image neural network block have the same structure, consisting in sequence of a seventh convolution layer, a seventh batch normalization layer, a seventh activation layer, a fourth residual block, an eighth convolution layer, an eighth batch normalization layer and an eighth activation layer, connected in the same way. The kernel size of the seventh and eighth convolution layers is 3×3, the number of kernels is 256 and the zero-padding parameter is 1; the activation function of the seventh and eighth activation layers is "ReLU"; the seventh batch normalization layer, the eighth batch normalization layer, the seventh activation layer, the eighth activation layer and the fourth residual block each output 256 feature maps.
In this specific embodiment, the 5th RGB image neural network block and the 5th depth image neural network block have the same structure, consisting in sequence of a ninth convolution layer, a ninth batch normalization layer, a ninth activation layer, a fifth residual block, a tenth convolution layer, a tenth batch normalization layer and a tenth activation layer, connected in the same way. The kernel size of the ninth and tenth convolution layers is 3×3, the number of kernels is 256 and the zero-padding parameter is 1; the activation function of the ninth and tenth activation layers is "ReLU"; the ninth batch normalization layer, the tenth batch normalization layer, the ninth activation layer, the tenth activation layer and the fifth residual block each output 256 feature maps. The five blocks therefore share one layout and differ only in channel width, as sketched below.
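A minimal sketch of this shared layout as one parameterized module; the internal form of the residual block (two 3×3 convolutions with an identity shortcut) is an assumption, since the patent names the block but does not spell out its internals.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)   # identity shortcut

class ConvBNReLUResBlock(nn.Module):
    # Conv -> BN -> ReLU -> ResidualBlock -> Conv -> BN -> ReLU, as described above.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            ResidualBlock(out_ch),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```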
In this specific embodiment, in step 1_2, the 4 RGB image maximum pooling layers and the 4 depth image maximum pooling layers are all max-pooling layers with a pooling size (pool_size) of 2 and a stride of 2.
In this specific embodiment, in step 1_2, the 5 fusion neural network blocks have the same structure, consisting in sequence of an eleventh convolution layer, an eleventh batch normalization layer, an eleventh activation layer, a sixth residual block, a twelfth convolution layer, a twelfth batch normalization layer and a twelfth activation layer, connected in the same way as the layers of the encoder blocks; the input of the eleventh convolution layer is the input of the fusion neural network block, and the output of the twelfth activation layer is the output of the block. In all five fusion neural network blocks the kernel size of the eleventh and twelfth convolution layers is 3×3, the zero-padding parameter is 1 and the activation function of the eleventh and twelfth activation layers is "ReLU". In the 1st and 2nd fusion neural network blocks the number of kernels is 256, and the eleventh batch normalization layer, the twelfth batch normalization layer, the eleventh activation layer, the twelfth activation layer and the sixth residual block each output 256 feature maps; in the 3rd fusion neural network block the number of kernels is 128 and these layers each output 128 feature maps; in the 4th fusion neural network block the number of kernels is 64 and these layers each output 64 feature maps; in the 5th fusion neural network block the number of kernels is 32 and these layers each output 32 feature maps.
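Since the fusion blocks repeat the same Conv-BN-ReLU-RB-Conv-BN-ReLU layout and differ only in channel width, the hypothetical helper sketched above covers them as well; for example:

```python
fusion3 = ConvBNReLUResBlock(512, 128)   # 3rd fusion block: Con3 (512 ch) -> RH3 (128 ch)
```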
In this specific embodiment, in step 1_2, the 1st and 2nd deconvolution layers have a kernel size of 2×2, 256 kernels, a stride of 2 and a zero-padding parameter of 0; the 3rd deconvolution layer has a kernel size of 2×2, 128 kernels, a stride of 2 and a zero-padding parameter of 0; the 4th deconvolution layer has a kernel size of 2×2, 64 kernels, a stride of 2 and a zero-padding parameter of 0.
In this specific embodiment, in step 1_2, the 5 sub-output layers have the same structure, each consisting of a thirteenth convolution layer with a kernel size of 1×1, 2 kernels and a zero-padding parameter of 0.
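In PyTorch terms these two layer types are one call each; a sketch with the parameters just listed:

```python
import torch.nn as nn

deconv1 = nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2, padding=0)  # doubles W and H
sub_out = nn.Conv2d(256, 2, kernel_size=1, padding=0)                       # 1x1 sub-output layer
```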
To further verify the feasibility and effectiveness of the method of the present invention, experiments were carried out.
The architecture of the convolutional neural network proposed by the method of the present invention was built using the Python-based deep learning library PyTorch 0.4.1. The NLPR real object image test set (200 real object images) was used to analyze the saliency detection performance of the method on color real object images. Three objective parameters commonly used to evaluate saliency detection methods serve as evaluation indicators: the precision-recall curve (Precision Recall Curve), the mean absolute error (Mean Absolute Error, MAE) and the F-measure (F-Measure).
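For reference, minimal sketches of two of these indicators; the adaptive binarization threshold (twice the mean saliency) and β² = 0.3 are common conventions in the saliency literature, not values fixed by the patent.

```python
import torch

def mae(pred, gt):
    # Mean absolute error between a [0, 1] saliency map and the binary label.
    return (pred - gt).abs().mean().item()

def f_measure(pred, gt, beta2=0.3):
    thr = min(1.0, 2 * pred.mean().item())   # adaptive threshold, capped at 1
    binary = (pred >= thr).float()
    tp = (binary * gt).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / (gt.sum() + 1e-8)
    return ((1 + beta2) * precision * recall /
            (beta2 * precision + recall + 1e-8)).item()
```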
The method of the present invention was used to predict each color real object image in the NLPR test set, obtaining the predicted saliency detection image corresponding to each image. The precision-recall (PR) curve reflecting the saliency detection performance of the method is shown in Fig. 2a; the mean absolute error (MAE), shown in Fig. 2b, is 0.058; the F-measure, shown in Fig. 2c, is 0.796. As can be seen from Figs. 2a to 2c, the saliency detection results obtained by the method of the present invention on color real object images are good, indicating that using the method to obtain the predicted saliency detection image corresponding to a color real object image is feasible and effective.
Fig. 3a shows the original color real object image of the 1st scene, Fig. 3b the corresponding depth image, and Fig. 3c the predicted saliency detection image obtained by predicting Fig. 3a with the method of the present invention; Fig. 4a shows the original color real object image of the 2nd scene, Fig. 4b the corresponding depth image, and Fig. 4c the predicted saliency detection image obtained by predicting Fig. 4a; Fig. 5a shows the original color real object image of the 3rd scene, Fig. 5b the corresponding depth image, and Fig. 5c the predicted saliency detection image obtained by predicting Fig. 5a; Fig. 6a shows the original color real object image of the 4th scene, Fig. 6b the corresponding depth image, and Fig. 6c the predicted saliency detection image obtained by predicting Fig. 6a. Comparing Fig. 3a with Fig. 3c, Fig. 4a with Fig. 4c, Fig. 5a with Fig. 5c, and Fig. 6a with Fig. 6c, it can be seen that the predicted saliency detection images obtained by the method of the present invention have high detection accuracy.