CN111814895A - A saliency object detection method based on absolute and relative depth-induced networks - Google Patents
A saliency object detection method based on absolute and relative depth-induced networks
- Publication number
- CN111814895A (application number CN202010695446.6A)
- Authority
- CN
- China
- Prior art keywords
- depth
- network
- absolute
- feature
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 30
- 230000006698 induction Effects 0.000 claims abstract description 35
- 230000004927 fusion Effects 0.000 claims abstract description 13
- 238000012549 training Methods 0.000 claims abstract description 6
- 239000013589 supplement Substances 0.000 claims abstract description 4
- 238000000034 method Methods 0.000 claims description 17
- 238000011176 pooling Methods 0.000 claims description 8
- 230000010354 integration Effects 0.000 claims description 6
- 230000008569 process Effects 0.000 claims description 6
- 239000013598 vector Substances 0.000 claims description 6
- 230000000875 corresponding effect Effects 0.000 claims description 4
- 230000003993 interaction Effects 0.000 claims description 4
- 230000002596 correlated effect Effects 0.000 claims description 2
- 230000000306 recurrent effect Effects 0.000 claims description 2
- 230000001939 inductive effect Effects 0.000 claims 1
- 239000000284 extract Substances 0.000 abstract description 7
- 230000006870 function Effects 0.000 description 4
- 238000012545 processing Methods 0.000 description 3
- 230000009286 beneficial effect Effects 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004913 activation Effects 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 230000002776 aggregation Effects 0.000 description 1
- 238000004220 aggregation Methods 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/56—Extraction of image or video features relating to colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The present invention discloses a salient object detection method based on an absolute and relative depth induction network, comprising the following steps: training a depth induction network with a residual network as the backbone; cross-modal feature fusion in an absolute depth induction module to locate objects; and building a spatial geometric model in a relative depth induction module to supplement detail information. The invention not only extracts RGB image features from the residual network, but also exploits depth information to assist the salient object detection task. The absolute depth induction module fuses RGB image features and depth image information across modalities in a coarse-to-fine manner, avoiding the clutter noise caused by the asynchronous nature of the two feature spaces. The relative depth induction module builds a spatial graph convolution model to explore spatial structure and geometric information, strengthening local feature representation and thereby improving detection accuracy and robustness. The method achieves excellent detection results and has broad application prospects.
Description
Technical Field
The invention belongs to the technical field of salient object detection, and specifically relates to a salient object detection method based on absolute and relative depth induction networks.
Background Art
Salient object detection is a fundamental operation in computer image processing that aims to locate and segment the most visually distinctive objects in an image. In recent years it has been widely applied in various fields, such as re-localization, scene classification, visual tracking, and semantic segmentation. By using saliency detection to filter out irrelevant information before subsequent image processing operations, a computer can greatly reduce the processing workload and improve efficiency.
Early salient object detection methods mainly designed hand-crafted features (such as brightness, color, and texture) to detect salient objects in images. In recent years, driven by the development of CNNs, various deep-learning-based models have been proposed. In 2017, Hou et al. proposed a short-connection mechanism between layers and used it to aggregate feature maps from multiple scales. In 2017, Zhang et al. explored multi-level features at each scale and generated saliency maps recursively. In 2019, Feng et al. proposed an attentive feedback module to better explore the structure of salient objects. However, these recently proposed methods struggle in extremely complex situations such as semantically complex backgrounds, low-luminance environments, and transparent objects. To address this problem, we propose to supplement RGB images with depth information, so that the spatial structure and 3D geometric information of the scene can be explored, improving the effectiveness and robustness of the network.
The features extracted by traditional RGB-D salient object detection methods lack global context information and semantic cues. In recent years, the effective integration of depth and RGB features has become a key issue for this task. In 2019, Zhao et al. designed a contrast loss to exploit the contrast prior in depth images; an attention map is then generated by fusing the refined depth and RGB features, and the final saliency map is produced by a fluid pyramid integration strategy that fully exploits multi-scale cross-modal features. In 2019, Pial et al. integrated depth and RGB images hierarchically and refined the final saliency map with a recurrent attention model. However, in current methods the fused depth and RGB feature spaces are asynchronous, which introduces clutter noise into the network.
In summary, existing salient object detection techniques have the following shortcomings. First, most existing methods extract features only from RGB images, and these features are insufficient to distinguish salient objects from cluttered background regions. Second, most existing methods extract depth and RGB features with separate networks and fuse them directly using various strategies; however, the cross-modal feature spaces are inconsistent, and fusing them directly leads to noisy responses in the prediction results. Third, although the absolute depth induction module can accurately locate salient objects, the detailed saliency information of local regions is still not deeply explored, which limits further improvement of model performance.
Summary of the Invention
(1) Technical Problem to Be Solved
In view of the deficiencies of the prior art, the present invention provides a salient object detection method based on absolute and relative depth induction networks, which solves the problems mentioned in the background art.
(2) Technical Solution
To achieve the above object, the present invention provides the following technical solution: a salient object detection method based on absolute and relative depth induction networks, comprising the following steps:
a. Training a depth induction network with a residual network as the backbone: the last pooling layer and the fully connected layer of ResNet-50 are removed, the network input images are uniformly resized to 256×256, and the dataset is normalized; the feature maps generated by the five convolution blocks produce corresponding side output maps in a pyramid fashion, which are then fused top-down in the network.
b. Cross-modal feature fusion in the absolute depth induction module to locate objects: the depth image of the input image is fed into a set of convolutions to obtain a depth feature map of the same size as the Res2_x feature map; the absolute depth induction module is applied repeatedly, integrating the depth feature maps and the RGB feature maps in a recurrent manner to achieve cross-modal feature fusion. This avoids the noise interference caused by simply fusing two asynchronous modal features, strengthens the deep interaction between depth and color features, and allows RGB and depth features to be fused adaptively at every scale.
c. Building a spatial geometric model in the relative depth induction module to supplement detail information: the feature map from the last stage Res5_x of the decoding network is first upsampled and integrated with the feature map obtained by the cross-modal fusion of the absolute depth induction module to generate a new feature map; this feature map, together with the depth map produced by the absolute depth induction module, is fed into the relative depth induction module to explore the spatial structure and detailed saliency information of the image, embedding relative depth information in the network to improve the performance of the saliency model.
Further, to bring all images in the dataset to the uniform input size described in step a, bilinear interpolation is applied.
Further, when generating the side output maps in step a, the output feature maps of the four residual blocks are fed into a 1×1 convolutional layer that reduces their channel dimension; the resulting maps are the side outputs used for the subsequent top-down integration of multi-level feature maps.
Further, in step b the depth feature maps and RGB feature maps are integrated in a recurrent manner. The absolute depth induction module is implemented by a gated recurrent unit (GRU), which is designed to handle sequence problems; we formulate the multi-scale feature integration process as a sequence problem and treat each scale as one time step.
Further, in each time step the depth feature map is first reduced in dimension; the depth and RGB feature maps are then concatenated and transformed by global max pooling to generate a new feature vector, which passes through fully connected layers and related operations, so that RGB and depth features are fused adaptively at each scale.
Further, the relative depth induction module described in step c is used to explore the spatial structure and detailed saliency information of the image; this module uses a graph convolutional network (GCN) to exploit relative depth information.
Further, the proposed graph convolutional network (GCN) projects image pixels into 3D space according to their spatial positions and depth values, compensating for the fact that pixels adjacent in 2D space are not necessarily strongly correlated in the 3D point-cloud space. Information is propagated within local regions according to short-range relative depth relationships, and by exploring spatial structure and geometric information at multiple scales the local feature representation is progressively strengthened. In this way, detailed saliency information can be exploited in the relative depth induction network, which helps to predict the final result accurately.
(3) Beneficial Effects
Compared with the prior art, the present invention provides a salient object detection method based on absolute and relative depth induction networks, with the following beneficial effects:
The invention not only extracts RGB image features from the residual network but also exploits depth information to assist the salient object detection task. Most existing RGB-D models simply extract depth and RGB features and fuse them heuristically. In contrast, the absolute depth induction module fuses RGB image features and depth image information across modalities in a coarse-to-fine manner, avoiding the clutter noise caused by the asynchronous nature of the two feature spaces and thereby locating objects precisely. The relative depth induction module builds a spatial graph convolution model to explore spatial structure and geometric information, strengthening local feature representation and improving detection accuracy and robustness. The method achieves excellent detection results, facilitates integration with other fields, and has broad application prospects.
Brief Description of the Drawings
Fig. 1 is a flowchart of the salient object detection method based on absolute and relative depth induction networks proposed by the present invention.
Detailed Description of Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, the present invention provides a technical solution: a salient object detection method based on absolute and relative depth induction networks, comprising the following steps.
Training the depth induction network with a residual network as the backbone: the last pooling layer and the fully connected layer of ResNet-50 are removed, the network input images are uniformly resized to 256×256, and the dataset is normalized. The feature maps generated by the five convolution blocks produce the corresponding side output maps in a pyramid fashion, and the fusion operation is then performed top-down in the network.
In detail, the last pooling layer and the fully connected layer of ResNet-50 are removed, and the backbone consists of five convolution blocks, Conv1, Res2_x, ..., Res5_x. An RGB image of size W×H is passed through the convolution blocks, generating feature maps at successively smaller resolutions; the shallower layers capture low-level information of the image, such as texture and spatial details, while the deeper feature maps contain high-level semantic information. The feature maps are fused in a pyramid fashion: a 1×1 convolution kernel reduces the channels of each stage to C to obtain the side output maps, and the multi-level feature maps are then integrated in a top-down manner, as in Eq. (1),

F_l = σ(W_l · CAT[S_l, UP(F_{l+1})] + b_l)    (1)

where S_l is the side output at level l, σ(·) is the ReLU activation function, CAT[·,·] is the concatenation operation joining two feature maps along the channel dimension, UP(·) is the upsampling operation with bilinear interpolation, and W_l, b_l are trainable parameters of the network.
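To make the backbone and the top-down integration of Eq. (1) concrete, a minimal PyTorch sketch is given below. It assumes a standard torchvision ResNet-50 split into the five blocks Conv1, Res2_x, ..., Res5_x; the channel number C, the class and attribute names, and the exact layer layout are illustrative assumptions rather than the patent's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class Backbone(nn.Module):
    """ResNet-50 with the last pooling and fully connected layers removed."""
    def __init__(self, c=64):
        super().__init__()
        net = resnet50()
        # Five convolution blocks: Conv1, Res2_x, Res3_x, Res4_x, Res5_x.
        self.blocks = nn.ModuleList([
            nn.Sequential(net.conv1, net.bn1, net.relu),
            nn.Sequential(net.maxpool, net.layer1),
            net.layer2, net.layer3, net.layer4])
        # 1x1 convolutions reduce every stage to C channels (the side outputs).
        self.side = nn.ModuleList(
            [nn.Conv2d(ch, c, 1) for ch in (64, 256, 512, 1024, 2048)])

    def forward(self, x):                           # x: B x 3 x 256 x 256
        sides, out = [], x
        for block, side in zip(self.blocks, self.side):
            out = block(out)
            sides.append(side(out))                 # pyramid of side output maps
        return sides

class TopDownDecoder(nn.Module):
    """Eq. (1): F_l = ReLU(W_l * CAT[S_l, UP(F_{l+1})] + b_l)."""
    def __init__(self, c=64, levels=5):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(2 * c, c, 3, padding=1) for _ in range(levels - 1)])

    def forward(self, sides):
        dec = sides[-1]                             # the deepest side output starts the decoding
        decoded = [dec]
        for l in range(len(sides) - 2, -1, -1):
            up = F.interpolate(dec, size=sides[l].shape[2:],
                               mode='bilinear', align_corners=False)
            dec = F.relu(self.convs[l](torch.cat([sides[l], up], dim=1)))
            decoded.insert(0, dec)
        return decoded                              # multi-level decoded feature maps
```

With 256×256 inputs, the side outputs range from 1/2 down to 1/32 of the input resolution; the decoded maps are the features that the absolute and relative depth induction modules described below refine.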
Cross-modal feature fusion in the absolute depth induction module to locate objects: the input depth image D of size W×H is first fed into a set of convolutional layers to generate a depth feature map f_d of the same size as the Res2_x feature map. The absolute depth induction module (ADIM) is then applied repeatedly, integrating the depth feature maps with the RGB feature maps in a recurrent manner to strengthen the deep interaction between depth and color features: at each layer l, the module takes the RGB feature and the previous depth feature as inputs and outputs the updated depth feature together with the aggregation result of the depth and RGB information in layer l.
According to the above embodiment, preferably, the ADIM is implemented by a gated recurrent unit (GRU), which is designed to handle sequence problems. We formulate the multi-scale feature integration process as a sequence problem and treat each scale as a time step. At each time step, the RGB feature is regarded as the input of the GRU and the depth feature as the hidden state from the previous step. The two feature maps are concatenated and transformed by a global max pooling (GMP) operation to generate a feature vector, on which fully connected layers are applied to produce a reset gate r and an update gate z. The values of the two gates are normalized by a sigmoid function: the gate r controls how strongly the depth and RGB features are integrated, and z controls the update of the depth feature. In this way, RGB and depth features are fused adaptively at each scale, and the interaction between depth and RGB features is strengthened by the network. The generated multi-scale cross-modal feature map is then combined with the feature map in the decoding stage; that is, Eq. (1) is re-expressed accordingly as Eq. (3).
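A minimal sketch of one ADIM time step under the above description is given below: the RGB feature acts as the GRU input, the depth feature acts as the hidden state, and global max pooling plus a fully connected layer produce the reset gate r and update gate z. The specific gating arithmetic, layer sizes, and names are assumptions for illustration, not the patent's exact formulas.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ADIMStep(nn.Module):
    """One GRU-style ADIM step that adaptively fuses RGB and depth features at one scale."""
    def __init__(self, c):
        super().__init__()
        self.gates = nn.Linear(2 * c, 2 * c)        # fully connected layer -> reset gate r, update gate z
        self.fuse = nn.Conv2d(2 * c, c, 3, padding=1)

    def forward(self, f_rgb, f_d):                  # both: B x C x H x W
        # Global max pooling turns the concatenated maps into one feature vector.
        v = F.adaptive_max_pool2d(torch.cat([f_rgb, f_d], dim=1), 1).flatten(1)
        r, z = torch.sigmoid(self.gates(v)).chunk(2, dim=1)     # sigmoid-normalised gates, B x C each
        r, z = r[..., None, None], z[..., None, None]
        # r controls how strongly the depth feature is integrated with the RGB feature.
        agg = F.relu(self.fuse(torch.cat([f_rgb, r * f_d], dim=1)))
        # z controls the update of the depth feature (the hidden state for the next scale).
        f_d_new = z * agg + (1 - z) * f_d
        return f_d_new, agg                         # updated depth feature, aggregation result
```

Applying one such step per scale, with the updated depth feature passed on as the hidden state of the next scale, realises the recurrent multi-scale integration described above.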
Building a spatial geometric model in the relative depth induction module to supplement detail information: the relative depth induction module (RDIM) is used in the decoding stage. The feature map from the last stage of the decoding network is first upsampled and integrated with the ADIM feature map, as described in Eq. (3). The RDIM is then applied to the resulting feature map and the depth image in order to embed relative depth information in the network.
According to the above embodiment, preferably, the RDIM is implemented by a graph convolutional network (GCN). To explore the relative depth relationships between pixels, the feature map generated by the ADIM is first represented as a graph G = (V, E), where V is the node set and E is the edge set. Each node n_i in the graph is treated as a point in a 3D coordinate system with coordinates (x_i, y_i, d_i), where (x_i, y_i) is the spatial position in the feature map and d_i is the corresponding depth value. The node set is denoted V = {n_1, n_2, ..., n_k}, with k the number of nodes. Edges e_{i,j} ∈ E connect each node, in 3D coordinates, to its m neighbouring elements, and the weight w_{i,j} on edge e_{i,j} is computed as the relative depth value to measure the spatial correlation between nodes n_i and n_j,
w_{i,j} = |(x_i, y_i, d_i) − (x_j, y_j, d_j)|    (5)
To describe the semantic relationship between nodes n_i and n_j, an attribute feature a_{i,j} is defined for each edge e_{i,j}. To further account for the global context of the image, global average pooling (GAP) is applied to the feature map to extract high-level semantic information, producing a feature vector f_g.
The spatial GCN consists of a set of stacked graph convolutional layers (GCLs). In each GCL, the attribute feature a_{i,j} of edge e_{i,j} is first updated,
where the two inputs of the update are the feature vectors at positions (x_i, y_i) and (x_j, y_j) of the feature map; the feature of each node is then updated with an MLP,
where N(n_i) is the set of neighbouring nodes of n_i, and w_{i,j} serves as the attention value derived from the relative depth on edge e_{i,j}; in this way the RDIM pays more attention to regions with larger relative distance, and messages are propagated along the edges between neighbouring nodes. The updated features of all nodes are then fed into a global max pooling layer to obtain the updated global feature vector f_g. Finally, the last GCL produces the overall output feature map of the RDIM at scale l. By using the GCN to pass messages between nodes, the feature of each node is updated and refined according to its relationship with all of its neighbouring nodes. In our network, the RDIM is applied at the third and fourth levels of the decoding stage, and the feature maps generated by the RDIM are fed into the next decoding stage.
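A minimal sketch of the RDIM graph construction and one graph convolutional layer is given below. It assumes each pixel of the ADIM feature map becomes a node, the m nearest neighbours in the 3D space (x, y, d) define the edges with Eq. (5) as edge weights, and a small MLP performs the node update; the edge-attribute update and the global pooling branch are simplified away, and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

def build_graph(depth, h, w, m=8):
    """Project each pixel to a 3D point (x, y, d) and connect it to its m nearest neighbours."""
    ys = torch.arange(h).repeat_interleave(w).float()
    xs = torch.arange(w).repeat(h).float()
    pts = torch.stack([xs, ys, depth.reshape(-1)], dim=1)       # K x 3 points, K = h * w
    dist = torch.cdist(pts, pts)                 # distances in (x, y, d) space, cf. Eq. (5)
    w_ij, nbr = dist.topk(m + 1, largest=False)  # nearest neighbours, including the node itself
    return nbr[:, 1:], w_ij[:, 1:]               # drop the self edge

class GraphConvLayer(nn.Module):
    """One GCL: aggregate neighbour features weighted by relative depth, then update with an MLP."""
    def __init__(self, c):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * c, c), nn.ReLU())

    def forward(self, node_feat, nbr, w_ij):     # node_feat: K x C
        attn = torch.softmax(w_ij, dim=1)        # larger relative distance -> larger attention
        msg = (attn.unsqueeze(-1) * node_feat[nbr]).sum(dim=1)   # K x C message from neighbours
        return self.mlp(torch.cat([node_feat, msg], dim=1))      # refined node features
```

Stacking several such layers propagates information through the short-range relative-depth edges, which is the message passing between neighbouring nodes described above.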
The feature map generated by the last decoding stage, which combines absolute and relative depth information, is selected to predict the final saliency map. This feature map is first upsampled with a bilinear interpolation operation to the same size as the input and then fed into a single-channel convolutional layer to obtain the final saliency map S. During training, the final saliency map is supervised by the ground-truth map through a cross-entropy loss function,
where S_{i,j} and the corresponding ground-truth value are the saliency values at position (i, j) of the predicted saliency map and the ground-truth map, respectively.
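The prediction head and its supervision can be sketched as follows: a minimal sketch assuming a single 1×1 convolution for the one-channel output and the standard pixel-wise binary cross-entropy; the class and function names are illustrative.

```python
import torch.nn as nn
import torch.nn.functional as F

class SaliencyHead(nn.Module):
    """Upsample the last decoded feature map and predict a one-channel saliency map."""
    def __init__(self, c):
        super().__init__()
        self.pred = nn.Conv2d(c, 1, 1)

    def forward(self, feat, out_size):           # out_size: (H, W) of the input image
        feat = F.interpolate(feat, size=out_size, mode='bilinear', align_corners=False)
        return self.pred(feat)                   # logits of the final saliency map S

def saliency_loss(logits, gt):
    """Pixel-wise cross-entropy between the predicted saliency map and the ground-truth map."""
    return F.binary_cross_entropy_with_logits(logits, gt)
```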
The invention not only extracts RGB image features from the residual network but also exploits depth information to assist the salient object detection task. Most existing RGB-D models simply extract depth and RGB features and fuse them heuristically. In the present invention, the absolute depth induction module is designed to fuse RGB image features and depth image information across modalities in a coarse-to-fine manner, avoiding the clutter noise caused by the asynchronous nature of the two feature spaces and thereby locating objects precisely. At the same time, the relative depth induction module is designed to build a spatial graph convolution model that explores spatial structure and geometric information, strengthening local feature representation and improving detection accuracy and robustness. The method achieves excellent detection results, facilitates integration with other fields, and has broad application prospects.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprising", "including", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device.
In the description of the present invention, it should be noted that orientation or positional relationships indicated by terms such as "upper", "lower", "inner", "outer", "front end", "rear end", "both ends", "one end", and "the other end" are based on the orientations or positional relationships shown in the accompanying drawings. They are used only for convenience and simplification of the description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present invention.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the present invention. The scope of the present invention is defined by the appended claims and their equivalents.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010695446.6A CN111814895B (en) | 2020-07-17 | 2020-07-17 | Salient object detection method based on absolute and relative depth induced network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010695446.6A CN111814895B (en) | 2020-07-17 | 2020-07-17 | Salient object detection method based on absolute and relative depth induced network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111814895A true CN111814895A (en) | 2020-10-23 |
CN111814895B CN111814895B (en) | 2024-12-03 |
Family
ID=72866457
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010695446.6A Active CN111814895B (en) | 2020-07-17 | 2020-07-17 | Salient object detection method based on absolute and relative depth induced network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111814895B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113076947A (en) * | 2021-03-26 | 2021-07-06 | 东北大学 | RGB-T image significance detection system with cross-guide fusion |
CN113537279A (en) * | 2021-05-18 | 2021-10-22 | 齐鲁工业大学 | A COVID-19 Identification System Based on Class Residual Convolution and LSTM |
CN113963081A (en) * | 2021-10-11 | 2022-01-21 | 华东师范大学 | A method for intelligent synthesis of image charts based on graph convolutional network |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170351941A1 (en) * | 2016-06-03 | 2017-12-07 | Miovision Technologies Incorporated | System and Method for Performing Saliency Detection Using Deep Active Contours |
WO2019136946A1 (en) * | 2018-01-15 | 2019-07-18 | 中山大学 | Deep learning-based weakly supervised salient object detection method and system |
CN110210539A (en) * | 2019-05-22 | 2019-09-06 | 西安电子科技大学 | The RGB-T saliency object detection method of multistage depth characteristic fusion |
CN110399907A (en) * | 2019-07-03 | 2019-11-01 | 杭州深睿博联科技有限公司 | Thoracic cavity illness detection method and device, storage medium based on induction attention |
AU2020100274A4 (en) * | 2020-02-25 | 2020-03-26 | Huang, Shuying DR | A Multi-Scale Feature Fusion Network based on GANs for Haze Removal |
CN111242238A (en) * | 2020-01-21 | 2020-06-05 | 北京交通大学 | Method for acquiring RGB-D image saliency target |
CN111242138A (en) * | 2020-01-11 | 2020-06-05 | 杭州电子科技大学 | RGBD significance detection method based on multi-scale feature fusion |
-
2020
- 2020-07-17 CN CN202010695446.6A patent/CN111814895B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170351941A1 (en) * | 2016-06-03 | 2017-12-07 | Miovision Technologies Incorporated | System and Method for Performing Saliency Detection Using Deep Active Contours |
WO2019136946A1 (en) * | 2018-01-15 | 2019-07-18 | 中山大学 | Deep learning-based weakly supervised salient object detection method and system |
CN110210539A (en) * | 2019-05-22 | 2019-09-06 | 西安电子科技大学 | The RGB-T saliency object detection method of multistage depth characteristic fusion |
CN110399907A (en) * | 2019-07-03 | 2019-11-01 | 杭州深睿博联科技有限公司 | Thoracic cavity illness detection method and device, storage medium based on induction attention |
CN111242138A (en) * | 2020-01-11 | 2020-06-05 | 杭州电子科技大学 | RGBD significance detection method based on multi-scale feature fusion |
CN111242238A (en) * | 2020-01-21 | 2020-06-05 | 北京交通大学 | Method for acquiring RGB-D image saliency target |
AU2020100274A4 (en) * | 2020-02-25 | 2020-03-26 | Huang, Shuying DR | A Multi-Scale Feature Fusion Network based on GANs for Haze Removal |
Non-Patent Citations (2)
Title |
---|
LIU Zhengyi; DUAN Quntao; SHI Song; ZHAO Peng: "Salient object detection in RGB-D images based on multi-modal feature-fusion supervision", Journal of Electronics & Information Technology, no. 04, 15 April 2020 (2020-04-15), pages 206 - 213 *
CHEN Kai; WANG Yongxiong: "Saliency detection combining spatial attention and multi-level feature fusion", Journal of Image and Graphics, no. 06, 16 June 2020 (2020-06-16), pages 66 - 77 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113076947A (en) * | 2021-03-26 | 2021-07-06 | 东北大学 | RGB-T image significance detection system with cross-guide fusion |
CN113076947B (en) * | 2021-03-26 | 2023-09-01 | 东北大学 | A RGB-T image saliency detection system based on cross-guided fusion |
CN113537279A (en) * | 2021-05-18 | 2021-10-22 | 齐鲁工业大学 | A COVID-19 Identification System Based on Class Residual Convolution and LSTM |
CN113963081A (en) * | 2021-10-11 | 2022-01-21 | 华东师范大学 | A method for intelligent synthesis of image charts based on graph convolutional network |
CN113963081B (en) * | 2021-10-11 | 2024-05-17 | 华东师范大学 | Image chart intelligent synthesis method based on graph convolution network |
Also Published As
Publication number | Publication date |
---|---|
CN111814895B (en) | 2024-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110188705B (en) | Remote traffic sign detection and identification method suitable for vehicle-mounted system | |
CN110276316B (en) | A human keypoint detection method based on deep learning | |
TWI821671B (en) | A method and device for positioning text areas | |
CN111488474A (en) | A fine-grained hand-drawn sketch image retrieval method based on enhanced attention | |
CN109800817B (en) | Image classification method based on fusion semantic neural network | |
CN114463736B (en) | A multi-target detection method and device based on multimodal information fusion | |
CN111814895A (en) | A saliency object detection method based on absolute and relative depth-induced networks | |
CN113609896A (en) | Object-level remote sensing change detection method and system based on dual-correlation attention | |
Li et al. | ADR-MVSNet: A cascade network for 3D point cloud reconstruction with pixel occlusion | |
CN116485860A (en) | Monocular depth prediction algorithm based on multi-scale progressive interaction and aggregation cross attention features | |
CN110853057A (en) | Aerial image segmentation method based on global and multi-scale fully convolutional network | |
CN112883934A (en) | Attention mechanism-based SAR image road segmentation method | |
Deng et al. | Fusing geometrical and visual information via superpoints for the semantic segmentation of 3D road scenes | |
CN113628329A (en) | Zero-sample sketch three-dimensional point cloud retrieval method | |
CN117455868A (en) | SAR image change detection method based on significant fusion difference map and deep learning | |
CN114519819A (en) | Remote sensing image target detection method based on global context awareness | |
CN115527159B (en) | Counting system and method based on inter-modal scale attention aggregation features | |
CN114298187B (en) | An Object Detection Method Fused with Improved Attention Mechanism | |
CN114693951A (en) | An RGB-D Saliency Object Detection Method Based on Global Context Information Exploration | |
CN110009625A (en) | Image processing system, method, terminal, and medium based on deep learning | |
CN118674989A (en) | Infrared and visible light target identification method based on image registration | |
CN113313108A (en) | Saliency target detection method based on super-large receptive field characteristic optimization | |
Li et al. | Stereo superpixel segmentation via decoupled dynamic spatial-embedding fusion network | |
CN114693953B (en) | A RGB-D salient object detection method based on cross-modal bidirectional complementary network | |
CN116433904A (en) | Cross-modal RGB-D semantic segmentation method based on shape perception and pixel convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |