WO2018023734A1 - Significance testing method for 3d image - Google Patents

Significance testing method for 3d image

Info

Publication number
WO2018023734A1
WO2018023734A1 PCT/CN2016/093637 CN2016093637W WO2018023734A1 WO 2018023734 A1 WO2018023734 A1 WO 2018023734A1 CN 2016093637 W CN2016093637 W CN 2016093637W WO 2018023734 A1 WO2018023734 A1 WO 2018023734A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
saliency
map
depth
color
Prior art date
Application number
PCT/CN2016/093637
Other languages
French (fr)
Chinese (zh)
Inventor
王旭
张秋丹
江健民
赖志辉
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学 filed Critical 深圳大学
Priority to PCT/CN2016/093637 priority Critical patent/WO2018023734A1/en
Priority to CN201680000652.2A priority patent/CN106462771A/en
Publication of WO2018023734A1 publication Critical patent/WO2018023734A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour

Definitions

  • the invention belongs to the technical field of 3D image processing, and more particularly relates to a method for detecting the saliency of a 3D image.
  • 3D applications are becoming more and more popular in our daily lives. Compared to the traditional 2D visual experience, 3D applications can provide users with a deep perception and immersive viewing experience. However, there are still many open issues in the 3D process that need to be well resolved.
  • the saliency detection of 3D images is a very basic problem. Its main purpose is to find, in a natural scene image, the locations of the regions that attract the human eye. Moreover, it can be applied in various fields, for example to optimize bit allocation in 3D video coding, for spatial pooling in stereoscopic image quality assessment, and for feature extraction in 3D object detection.
  • these traditional saliency detection models cannot accurately predict which regions people attend to when viewing a 3D scene.
  • to improve prediction accuracy, modeling the saliency of a stereoscopic image needs to take its depth information into account.
  • Fang et al. proposed a framework to estimate the saliency of a stereoscopic image using the contrast of features such as color, brightness, texture, and depth.
  • the model still uses the traditional method of manually extracting features to extract the underlying features and depth features when calculating stereo image saliency.
  • Qi et al. proposed a 3D visual saliency detection model, mainly to manually extract the depth features from the generated disparity maps and extract the underlying features from the left and right views.
  • Kim et al. describe a saliency prediction model for stereoscopic video, which combines some discrete underlying features, depth feature distributions, and high-level scene classifications.
  • manually extracted features cannot effectively or accurately capture hierarchical features from raw pixels; manual feature extraction involves many uncertainties and can introduce unpredictable errors.
  • hand-crafting features also requires considerable manpower and specialist knowledge, and the resulting features rarely generalize well. The performance of these models is therefore limited.
  • the object of the present invention is to provide a method for detecting the saliency of a 3D image, which aims to solve the problem that prior-art methods based on manually extracted features cannot effectively extract features from raw pixels and therefore produce large errors.
  • the invention provides a method for detecting the saliency of a 3D image, comprising the following steps:
  • step (1) is specifically:
  • the structure of the convolutional neural network model is five convolutional layers and three fully connected layers; a specific parameter configuration is set for each layer of the network: first, the picture input layer, where the input image size is set to 227*227.
  • taking convolution layer one as an example, the convolution kernel size is 11, there are 96 convolution filters, the convolution stride is 4, and 96 feature maps are output.
  • the ReLUs and max-pooling operations are performed after the convolution layer one.
  • the number of neurons in the fully connected layer 1 and layer 2 is 4096
  • the number of neurons in the fully connected layer 3 is 1000.
  • a saliency map of the depth map and the color map is generated according to a neural network (NN) model; the neural network (NN) model has one output layer and two fully connected hidden layers, its input is a feature vector, and its output is the saliency label of the current region.
  • when the saliency label is 1 the current region is salient, and when the saliency label is 0 the current region is non-salient.
  • the saliency map of the depth image is generated by the formula S_d(x) = Σ_{j=1}^{L} w_d^j · f_d(F_d^{j,i}) for x ∈ R_d^{j,i}, where x is a pixel in the region R_d^{j,i} of the depth image, w_d^j is the weighting factor of the j-th layer of the depth map, L is the total number of layers, S_d(x) is the saliency map of the depth image, j is the layer index of the depth map, R_d^{j,i} is the segmented region with index i in layer j of the depth map, and f_d is a mapping function describing the relationship between the feature vector F_d^{j,i} of the local region R_d^{j,i} and the saliency label of that region.
  • the saliency map of the color image is generated by the analogous formula S_c(x) = Σ_{j=1}^{L} w_c^j · f_c(F_c^{j,i}) for x ∈ R_c^{j,i}, where x is a pixel in the region R_c^{j,i} of the color image, w_c^j is the weighting factor of the j-th layer of the color map, L is the total number of layers, j is the layer index of the color map, R_c^{j,i} is the segmented region with index i in layer j of the color map, and f_c is the corresponding mapping function between the feature vector F_c^{j,i} and the saliency label of the region.
  • the total number L of layers is 15 and the weight w is 0.5.
  • the invention performs multi-scale deep-learning feature extraction on color images and depth images based on a Convolutional Neural Network (CNN) model; the saliency map of the depth image (or color image) is generated by the NN model from the deep feature vector and the saliency label of each region.
  • the NN model is equivalent to the role of the classifier.
  • a linear fusion method is used that combines the depth saliency map and the color saliency map to generate the final saliency map of the 3D image; the method has small error and high precision.
  • FIG. 1 is a schematic diagram of the framework of a method for detecting the significance of a 3D image provided by the present invention
  • FIG. 2 is a flow chart of implementing a saliency detection method for a 3D image according to an embodiment of the present invention
  • FIG. 3 is a diagram showing an example of a comparative simulation of a 3D image saliency detection method and a prior art according to an embodiment of the present invention.
  • the saliency detection method for 3D images provided by the present invention can be applied to fields such as video coding, video compression, image retrieval, image quality assessment, and detection of objects of interest.
  • the way it is applied depends mainly on the field of application.
  • the framework of the visual saliency model based on deep-learning features proposed by the present invention comprises three main steps, namely extraction of deep features, generation of saliency maps, and fusion of saliency maps; the framework is illustrated in FIG. 1.
  • the depth feature vectors of the color image and the depth image are extracted by a convolutional neural network (CNN) model.
  • CNN convolutional neural network
  • the saliency maps of the depth map and the color map are generated by the generated region feature vector and the region's saliency label through a three-layer neural network.
  • the saliency map of the 3D image is generated by a linear fusion of the saliency maps of the color image and the depth image.
  • FIG. 2 is a flowchart of a method for detecting a saliency of a 3D image according to an embodiment of the present invention, which specifically includes:
  • the visual attention mechanism contains a hierarchical selection process from coarse to fine. Therefore, we perform multi-level segmentation of the image before feature extraction. Feature extraction is then performed for each of the segmented regions of each layer.
  • for each layer j, the color map I_c and the depth map I_d are divided into non-overlapping region sets, denoted R_c^{j,m_j} and R_d^{j,n_j} respectively, where m_j and n_j are the region indices of layer j, and the layers are ordered from the coarsest to the finest segmentation.
  • for feature extraction, we use a pre-trained convolutional neural network model to extract the features of the depth map and the color map.
  • the model is trained on the ImageNet dataset and has five convolutional layers and three fully connected layers; it is a neural network model for image classification.
  • the saliency of each local region does not depend only on its own characteristics; it is also influenced by the content of its neighborhood and by its background information (that is, the remainder of the image with the region removed). Therefore, for each segmentation layer j of a depth image, we take each local region of that layer, its adjacent area and its background area, and use the CNN model to extract a feature vector for each of them.
  • because the local regions produced by segmentation are irregular in shape, we use the bounding rectangle of each image region as its border, resize each rectangle to 227x227 pixels and feed it into the CNN model.
  • the final output for each region is a 12288-dimensional feature vector, denoted F_d^{j,i}; for a color image the operation is the same as for the depth image, and the feature vector of its local region is denoted F_c^{j,i}.
  • the output feature vector is only a sparse representation of the current local region; to decide whether the region is salient, a mapping function between the feature vector and the saliency label is needed.
  • the feature vector is the input to the neural network, and its output is the saliency label of the current region: a value of 1 indicates that the region is salient, and a value of 0 indicates that it is not.
  • the NN model is trained separately for the color image and the depth image.
  • the mapping functions between the region feature vectors and saliency labels of the depth image and the color image are denoted f_d and f_c respectively; all pixels in the same region share the same saliency label, which is derived from the ground-truth data.
  • the saliency map of the depth image is generated by (1):
  • the saliency map of the color image is generated by (2):
  • x denotes a pixel in the region R_d^{j,i} of the depth image or R_c^{j,i} of the color image, and w_d^j and w_c^j denote the layer weighting factors of the depth image and the color image, respectively.
  • w is the contribution weight used to balance the depth and color saliency maps.
  • w is adjusted by setting it to a value between 0 and 1 and generating the final saliency map for each setting.
  • the w value is then adjusted by a series of evaluation indicators SIM, EMD and CC to determine the accuracy of the generated 3D image saliency map. This behavior is also known as the method of autoregression.
  • a currently widely used center bias mechanism to enhance the final saliency map.
  • the method provided by the present invention is a 3D visual saliency detection model based on deep learning features.
  • the first advantage of this technique is that, instead of traditional hand-crafted features, a deep convolutional neural network is used to extract the feature information of the color map and the depth map.
  • the benefit is that features extracted by the neural network avoid the inaccuracies introduced by human factors in manual extraction, and manual feature extraction is labor-intensive.
  • the second advantage is that, when computing the 3D image saliency map, depth information is considered together with color information, whereas most traditional saliency models are designed for 2D images; the third advantage is that a neural network model (NN) is used as a regressor:
  • from the extracted image-region features and the regions' saliency labels, the saliency values are estimated to generate the saliency maps of the color map and the depth map.
  • the proposed model is compared with existing models, including Li's multi-scale based model (denoted VSMD),
  • a wavelet-domain-based model (denoted SDLL),
  • Fang's 2D saliency model (denoted SSDF2D),
  • and Fang's 3D saliency model (denoted SSDF3D).
  • the three models VSMD, SDLL and SSDF2D mainly address saliency computation for 2D images,
  • while the SSDF3D model is designed for 3D images.
  • the total number L of layers of the model we proposed during the experiment was 15. w is set to 0.5.
  • the CC, EMD, and SIM scores for our proposed model are 0.5225, 2.1547, and 0.4985, respectively.
  • the CC, EMD and SIM scores for the VSMD model are only 0.3783, 2.8419 and 0.3812.
  • the experimental results show that the performance of the 3D image saliency detection model benefits from the fusion of color saliency maps and depth saliency maps.
  • the present invention proposes a visual saliency detection model of a 3D image based on deep learning features.
  • the CNN model is used to perform multi-scale deep-learning feature extraction on the color images and depth images, respectively.
  • the saliency map of the depth image is generated by the NN model based on the depth feature vector and the saliency label of the region, which is here equivalent to the role of the classifier.
  • we use a linear fusion method that combines a depth saliency map with a color saliency map to generate a saliency map of the final 3D image.
  • a central biasing mechanism to enhance the saliency map.
  • Our proposed model achieves excellent performance on these two publicly available data sets.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a saliency detection method for a 3D image, comprising the steps of: (1) respectively extracting deep feature vectors of a color image and a depth image on the basis of a convolutional neural network; (2) respectively generating saliency maps of the depth image and the color image according to a three-layer neural network and the extracted deep feature vectors of the color image and the depth image; (3) linearly fusing the saliency maps of the color image and the depth image to obtain a saliency map of the 3D image. According to the present invention, deep-learning features of a color image and a depth image are extracted over multi-scale regions on the basis of a CNN model; a saliency map of the depth image (or the color image) is generated using a trained NN model on the basis of the deep feature vector and the saliency label of each region, the NN model acting as a classifier in this case; with the depth saliency map and the color saliency map as input, a final saliency map of the 3D image is generated using a linear fusion method. The present detection method has the advantages of small error and high precision.

Description

A saliency detection method for 3D images
Technical field
The invention belongs to the technical field of 3D image processing, and more particularly relates to a saliency detection method for 3D images.
Background
With the continuous development of the consumer electronics industry, 3D applications are becoming more and more popular in our daily lives. Compared with the traditional 2D visual experience, 3D applications can provide users with depth perception and an immersive viewing experience. However, many open problems in 3D processing still need to be properly solved. In 3D research, saliency detection for 3D images is a very basic problem; its main purpose is to find, in a natural scene image, the locations of the regions that attract the human eye. It can be applied in various fields, for example to optimize bit allocation in 3D video coding, for spatial pooling in stereoscopic image quality assessment, and for feature extraction in 3D object detection.
Most existing visual saliency detection models are concerned with 2D images. These models mainly estimate saliency by manually extracting low-level features (such as brightness, color, contrast and texture) from the color image, and they do not consider depth information. For example, Itti et al. proposed a saliency model for rapid scene analysis that combines image features at multiple scales to estimate saliency. Bruce et al. introduced a saliency method based on information maximization, which applies Shannon's self-information theory to saliency estimation. Goferman et al. designed a context-aware saliency detection model intended to detect image regions that are representative of the scene. Yang et al. proposed a top-down visual saliency model that incorporates conditional random fields and a discriminative dictionary. However, these methods essentially perform saliency detection on 2D images.
Therefore, these traditional saliency detection models cannot accurately predict which regions people attend to when viewing a 3D scene. To improve prediction accuracy, some researchers have proposed that modeling the saliency of a stereoscopic image should take its depth information into account. For example, Fang et al. proposed a framework that estimates the saliency of a stereoscopic image from the contrast of features such as color, brightness, texture and depth; when computing stereoscopic saliency, this model still uses traditional manually extracted low-level and depth features. Qi et al. proposed a 3D visual saliency detection model that manually extracts depth features from generated disparity maps and low-level features from the left and right views. Kim et al. described a saliency prediction model for stereoscopic video that combines a set of discrete low-level features, depth feature distributions and high-level scene classification. For these studies, however, manually extracted features cannot effectively and accurately capture hierarchical features from raw pixels; manual feature extraction involves many uncertainties, can introduce unpredictable errors, requires considerable manpower and specialist knowledge, and the resulting features rarely generalize well. The performance of these models is therefore limited.
Summary of the invention
In view of the defects of the prior art, the object of the present invention is to provide a saliency detection method for 3D images, aiming to solve the problem that prior-art methods based on manually extracted features cannot effectively extract features from raw pixels and therefore produce large errors.
The invention provides a saliency detection method for a 3D image, comprising the following steps:
(1) extracting deep feature vectors of a color image and a depth image;
(2) generating saliency maps of the depth map and the color map according to a three-layer neural network and the extracted deep feature vectors of the color image and the depth image;
(3) linearly fusing the saliency maps of the color image and the depth image to obtain a saliency map of the 3D image.
Further, step (1) is specifically:
(1.1) performing image segmentation on the color image and on the depth image associated with the color image, respectively, to obtain multi-level, non-overlapping image regions;
(1.2) extracting the feature vectors of the segmented color image and depth image with a convolutional neural network model.
Further, the structure of the convolutional neural network model is five convolutional layers and three fully connected layers, and a specific parameter configuration is set for each layer of the network. First is the picture input layer, where the input image size is set to 227*227. Taking convolution layer one as an example, the convolution kernel size is 11, there are 96 convolution filters, the convolution stride is 4, and 96 feature maps are output. ReLU and max-pooling operations are performed after convolution layer one. Finally there are three fully connected layers, which serve as the classifier of the neural network. Fully connected layers one and two each have 4096 neurons, and fully connected layer three has 1000 neurons.
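For illustration, the following is a minimal PyTorch sketch of a network with this layer configuration (227*227 input, an 11x11 first-layer kernel with stride 4 and 96 filters, ReLU and max-pooling after the first convolution, and fully connected layers of 4096, 4096 and 1000 neurons). Only those quantities are given in the text; the filter counts of the remaining convolutional layers follow the standard AlexNet configuration and are assumptions made purely for illustration.

    import torch
    import torch.nn as nn

    class FeatureCNN(nn.Module):
        """Five convolutional layers followed by three fully connected layers.

        Only conv1 (11x11 kernel, 96 filters, stride 4), the 227x227 input and
        the fully connected sizes (4096, 4096, 1000) are specified in the text;
        the remaining convolutional layers are assumed AlexNet-style.
        """
        def __init__(self, num_classes: int = 1000):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 96, kernel_size=11, stride=4),     # conv1: 96 filters, stride 4
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),          # ReLU + max-pooling after conv1
                nn.Conv2d(96, 256, kernel_size=5, padding=2),   # conv2 (assumed)
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),
                nn.Conv2d(256, 384, kernel_size=3, padding=1),  # conv3 (assumed)
                nn.ReLU(inplace=True),
                nn.Conv2d(384, 384, kernel_size=3, padding=1),  # conv4 (assumed)
                nn.ReLU(inplace=True),
                nn.Conv2d(384, 256, kernel_size=3, padding=1),  # conv5 (assumed)
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(256 * 6 * 6, 4096),   # fully connected layer 1: 4096 neurons
                nn.ReLU(inplace=True),
                nn.Linear(4096, 4096),          # fully connected layer 2: 4096 neurons
                nn.ReLU(inplace=True),
                nn.Linear(4096, num_classes),   # fully connected layer 3: 1000 neurons
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x))

    x = torch.randn(1, 3, 227, 227)     # one 227x227 RGB crop, as set for the input layer
    print(FeatureCNN()(x).shape)        # torch.Size([1, 1000])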
Further, in step (2), the saliency maps of the depth map and the color map are generated according to a neural network (NN) model; the neural network (NN) model has one output layer and two fully connected hidden layers, its input is a feature vector, and its output is the saliency label of the current region. A saliency label of 1 indicates that the current region is salient, and a saliency label of 0 indicates that it is not.
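A minimal sketch of such a classifier, assuming the 12288-dimensional region feature vectors described later as input; the hidden-layer width used below is an assumption, since the text fixes only the overall structure and the binary output.

    import torch
    import torch.nn as nn

    class SaliencyNN(nn.Module):
        """Two fully connected hidden layers plus one output layer.

        The hidden width (1024) is an assumption for illustration; the text
        fixes only the overall structure and the binary (0/1) output.
        """
        def __init__(self, in_dim: int = 12288, hidden: int = 1024):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),   # hidden layer 1
                nn.Linear(hidden, hidden), nn.ReLU(inplace=True),   # hidden layer 2
                nn.Linear(hidden, 1),                               # output layer
            )

        def forward(self, feats: torch.Tensor) -> torch.Tensor:
            # Sigmoid maps the output to (0, 1); thresholding at 0.5 yields the
            # binary saliency label (1 = salient, 0 = non-salient).
            return torch.sigmoid(self.net(feats))

    feats = torch.randn(4, 12288)                  # feature vectors of four regions
    labels = (SaliencyNN()(feats) > 0.5).float()   # predicted saliency labels
    print(labels.shape)                            # torch.Size([4, 1])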
Further, the saliency map of the depth image is generated by the formula

    S_d(x) = Σ_{j=1}^{L} w_d^j · f_d(F_d^{j,i}),  x ∈ R_d^{j,i}

where x denotes a pixel in the region R_d^{j,i} of the depth image, w_d^j denotes the weighting factor of the j-th layer of the depth map, L denotes the total number of layers, S_d(x) denotes the saliency map of the depth image, j denotes the layer index of the depth map, R_d^{j,i} denotes the segmented region with index i in layer j of the depth map, and f_d denotes a mapping function that describes the relationship between the feature vector F_d^{j,i} of the local region R_d^{j,i} of the depth map and the saliency label of that region.
Further, the saliency map of the color image is generated by the formula

    S_c(x) = Σ_{j=1}^{L} w_c^j · f_c(F_c^{j,i}),  x ∈ R_c^{j,i}

where x denotes a pixel in the region R_c^{j,i} of the color image, w_c^j denotes the weighting factor of the j-th layer of the color map, L denotes the total number of layers, j denotes the layer index of the color map, R_c^{j,i} denotes the segmented region with index i in layer j of the color map, and f_c denotes a mapping function that describes the relationship between the feature vector F_c^{j,i} of the local region R_c^{j,i} of the color map and the saliency label of that region.
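A minimal sketch of how the per-region mapping-function outputs can be accumulated into pixel-wise saliency maps according to the two formulas above; the uniform layer weights used in the toy example are an assumption.

    import numpy as np

    def layered_saliency(region_maps, region_scores, layer_weights):
        """Accumulate per-region scores into a pixel-wise saliency map.

        region_maps[j]   : integer label image of layer j (region index per pixel)
        region_scores[j] : dict mapping a region index i of layer j to the value
                           f(F^{j,i}) returned by the NN model for that region
        layer_weights[j] : weighting factor w^j of layer j
        """
        saliency = np.zeros(region_maps[0].shape, dtype=np.float64)
        for labels, scores, wj in zip(region_maps, region_scores, layer_weights):
            for i, s in scores.items():
                saliency[labels == i] += wj * s   # every pixel x in R^{j,i} receives w^j * f(F^{j,i})
        return saliency

    # Toy example with L = 2 layers on a 4x4 image and uniform layer weights.
    coarse = np.array([[0, 0, 1, 1]] * 4)
    fine = np.kron(np.arange(4).reshape(2, 2), np.ones((2, 2), dtype=int))
    scores = [{0: 0.0, 1: 1.0}, {0: 0.0, 1: 1.0, 2: 0.0, 3: 1.0}]
    print(layered_saliency([coarse, fine], scores, layer_weights=[0.5, 0.5]))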
Further, the saliency map of the 3D image is S = w·S_c + (1-w)·S_d, where S_d is the saliency map of the depth image, S_c is the saliency map of the color image, and w is the contribution weight of the color saliency map in the final visual saliency map of the 3D image.
Further, the total number of layers L is 15 and the weight w is 0.5.
The invention performs multi-scale deep-learning feature extraction on the color image and the depth image based on a Convolutional Neural Network (CNN) model; the saliency map of the depth image (or the color image) is generated by the NN model from the deep feature vectors and the saliency labels of the regions, the NN model here playing the role of a classifier; and a linear fusion method is used that combines the depth saliency map and the color saliency map to generate the final saliency map of the 3D image. The method has small error and high precision.
Brief description of the drawings
FIG. 1 is a schematic diagram of the framework of the saliency detection method for 3D images provided by the present invention;
FIG. 2 is an implementation flowchart of the saliency detection method for 3D images according to an embodiment of the present invention;
FIG. 3 shows example results of a comparative simulation between the saliency detection method for 3D images according to an embodiment of the present invention and the prior art.
Detailed description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely illustrative of the invention and are not intended to limit it.
As can be seen from the above, the performance of a model that computes a visual saliency map largely depends on how representative its features are, so for 3D visual saliency research it is important to find representative visual features. Existing saliency detection models for 3D images are essentially based on manually extracted features; however, such approaches find it difficult to distinguish a salient region from its neighborhood to a sufficient degree. In addition, because knowledge of 3D visual perception is still limited, it remains unclear how depth information contributes to the final visual saliency map.
The saliency detection method for 3D images provided by the present invention can be applied to fields such as video coding, video compression, image retrieval, image quality assessment, and detection of objects of interest; the way it is applied depends mainly on the field of application.
The framework of the visual saliency model based on deep-learning features proposed by the present invention comprises three main steps, namely extraction of deep features, generation of saliency maps, and fusion of saliency maps; the framework is illustrated in FIG. 1. First, the deep feature vectors of the color image and the depth image are extracted by a convolutional neural network (CNN) model. Then, the saliency maps of the depth map and the color map are generated from the region feature vectors and the regions' saliency labels through a three-layer neural network. Finally, the saliency map of the 3D image is generated by linearly fusing the saliency maps of the color image and the depth image. A high-level sketch of this pipeline is given below.
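Putting the three steps together, the following minimal Python sketch illustrates how the pipeline could be organized; the parameters segment_multilevel, extract_features and predict_region_saliency are placeholders for the components described in the following sections, and the uniform layer weighting is an assumption.

    import numpy as np

    def saliency_3d(color_image, depth_image, segment_multilevel,
                    extract_features, predict_region_saliency,
                    w=0.5, num_levels=15):
        """End-to-end sketch of the three-step framework.

        segment_multilevel, extract_features and predict_region_saliency stand
        in for the multi-level segmentation, the CNN feature extractor and the
        NN classifier described below; uniform layer weights are assumed.
        """
        maps = {}
        for name, img in (("color", color_image), ("depth", depth_image)):
            sal = np.zeros(img.shape[:2])
            for labels in segment_multilevel(img, num_levels):     # step 1a: multi-level segmentation
                for i in np.unique(labels):
                    feat = extract_features(img, labels == i)      # step 1b: deep feature per region
                    sal[labels == i] += predict_region_saliency(feat) / num_levels  # step 2
            maps[name] = sal
        return w * maps["color"] + (1 - w) * maps["depth"]         # step 3: linear fusion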
FIG. 2 shows the flow of the saliency detection method for 3D images according to an embodiment of the present invention, which specifically includes:
(1) Deep-learning feature extraction
Based on theoretical knowledge of the human visual system, the visual attention mechanism involves a coarse-to-fine hierarchical selection process. Therefore, before feature extraction we first segment the image at multiple levels, and then extract features for every segmented region of every level.
A. Multi-level image segmentation
In our work we focus on the depth-map-based 3D image format, in which every color image is associated with a depth map. For each 3D image, we decompose the color map and its associated depth map into multi-level, non-overlapping image regions. For convenience, we assume that the total number of levels is L. For each layer j, the non-overlapping region sets into which the color map I_c and the depth map I_d are divided are denoted R_c^{j,m_j} and R_d^{j,n_j} respectively, where m_j and n_j are the region indices of layer j, and the layers are ordered from the coarsest to the finest segmentation.
B. CNN-based feature extraction
Because depth acquisition technology is still limited, the amount of publicly available benchmark data in the field of 3D saliency detection is not large. It is therefore difficult to rely on these available datasets to train an accurate CNN model from scratch, i.e., a network trained on the color maps, depth maps and saliency maps of 3D images to output 3D saliency maps. For feature extraction we instead use a pre-trained convolutional neural network to extract the features of the depth map and the color map. The model is trained on the ImageNet dataset, has five convolutional layers and three fully connected layers, and is a neural network model for image classification.
As is well known, the saliency of each local region does not depend only on its own characteristics; it is also influenced by the content of its neighborhood and by its background information (that is, the remainder of the image with the region removed). Therefore, for each segmentation layer j of a depth image, we take each local region R_d^{j,i} of that layer, its adjacent area and its background area, and use the CNN model to extract a feature vector for each of them. Because the local regions produced by segmentation are irregular in shape, we use the bounding rectangle of each image region as its border, resize each rectangle to 227x227 pixels and feed it into the CNN model. The final output for each region is a 12288-dimensional feature vector, denoted F_d^{j,i}. For a color image the operation is the same as for the depth image, and the feature vector of its local region R_c^{j,i} is denoted F_c^{j,i} (an illustrative sketch of this step is given further below).
(2) Generation of the saliency maps
The output feature vector is only a sparse representation of the current local region. To determine whether the current region is salient or not, we need a mapping function between the feature vector and the saliency label. We train a neural network (NN) model with one output layer and two fully connected hidden layers; the feature vector is the input of this network, and its output is the saliency label of the current region. A label of 1 indicates that the current region is salient, and 0 indicates that it is not. The NN model is trained separately for the color image and the depth image. The mapping functions between the region feature vectors and saliency labels of the depth image and the color image are denoted f_d and f_c respectively. All pixels in the same region share the same saliency label, which is derived from the ground-truth data. Finally, the saliency map of the depth image is generated by (1):

    S_d(x) = Σ_{j=1}^{L} w_d^j · f_d(F_d^{j,i}),  x ∈ R_d^{j,i}    ......(1)

The saliency map of the color image is generated by (2):

    S_c(x) = Σ_{j=1}^{L} w_c^j · f_c(F_c^{j,i}),  x ∈ R_c^{j,i}    ......(2)

where x denotes a pixel in the region R_d^{j,i} of the depth image or R_c^{j,i} of the color image, and w_d^j and w_c^j denote the layer weighting factors of the depth image and the color image, respectively.
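As a concrete illustration of the feature-extraction step of subsection B, the following sketch crops the bounding rectangle of a region, an enlarged neighborhood and the whole image (used as a simple stand-in for the background context), resizes each crop to 227x227 and concatenates three 4096-dimensional descriptors into one 12288-dimensional vector. The 50% enlargement of the neighborhood and the whole-image background proxy are assumptions, and fake_cnn below merely stands in for the pre-trained CNN descriptor.

    import torch
    import torch.nn.functional as F

    def region_crops(image, region_mask):
        """Return 227x227 crops of a region, its neighborhood and the background.

        image       : float tensor of shape (3, H, W)
        region_mask : bool tensor of shape (H, W), True inside the region
        """
        ys, xs = torch.nonzero(region_mask, as_tuple=True)
        y0, y1 = int(ys.min()), int(ys.max()) + 1
        x0, x1 = int(xs.min()), int(xs.max()) + 1
        h, w = image.shape[1:]

        def crop_resize(t, b, l, r):
            crop = image[:, t:b, l:r].unsqueeze(0)
            return F.interpolate(crop, size=(227, 227), mode="bilinear", align_corners=False)

        dy, dx = (y1 - y0) // 2, (x1 - x0) // 2
        region = crop_resize(y0, y1, x0, x1)                        # bounding rectangle of the region
        neighborhood = crop_resize(max(0, y0 - dy), min(h, y1 + dy),
                                   max(0, x0 - dx), min(w, x1 + dx))  # enlarged box (assumed ~50% per side)
        background = crop_resize(0, h, 0, w)                        # whole image as background proxy
        return region, neighborhood, background

    def region_feature(image, region_mask, cnn_descriptor):
        """Concatenate three 4096-d descriptors into one 12288-d region feature."""
        return torch.cat([cnn_descriptor(c).flatten() for c in region_crops(image, region_mask)])

    # Toy usage: a random network standing in for the pre-trained CNN descriptor.
    fake_cnn = torch.nn.Sequential(torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
                                   torch.nn.Linear(3, 4096))
    img = torch.rand(3, 240, 320)
    mask = torch.zeros(240, 320, dtype=torch.bool)
    mask[60:120, 80:200] = True
    print(region_feature(img, mask, fake_cnn).shape)   # torch.Size([12288])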
(3) Fusion and enhancement of the saliency maps
In order to obtain an accurate visual saliency map of the 3D image, it is necessary to fuse the depth saliency map and the color saliency map. After the saliency map generation step we have obtained the saliency maps of the depth map and the color map, denoted S_d and S_c respectively, and the final saliency map of the 3D image is generated by a linear fusion method, computed as follows:
    S = w·S_c + (1-w)·S_d    ......(3)
Here w is the contribution weight used to balance the depth and color saliency maps. It is adjusted by setting w to values between 0 and 1, generating the final saliency map for each setting, and judging the accuracy of the resulting 3D saliency map with a series of evaluation metrics (SIM, EMD and CC); this procedure is also referred to as an autoregression-style tuning. In addition, to further improve the performance of the model, we adopt a widely used center-bias mechanism to enhance the final saliency map, as sketched below.
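For illustration, a minimal sketch of the linear fusion of formula (3) followed by a simple center-bias enhancement; the Gaussian form and width of the center prior are assumptions, since the text only refers to a widely used center-bias mechanism.

    import numpy as np

    def fuse_and_center_bias(s_color, s_depth, w=0.5, sigma_ratio=0.25):
        """S = w * Sc + (1 - w) * Sd, followed by a Gaussian center prior."""
        s = w * s_color + (1.0 - w) * s_depth                 # formula (3)
        h, wd = s.shape
        ys, xs = np.mgrid[0:h, 0:wd]
        cy, cx = (h - 1) / 2.0, (wd - 1) / 2.0
        sy, sx = sigma_ratio * h, sigma_ratio * wd
        center = np.exp(-(((ys - cy) ** 2) / (2 * sy ** 2) + ((xs - cx) ** 2) / (2 * sx ** 2)))
        s = s * center                                        # center-bias enhancement (Gaussian form assumed)
        return (s - s.min()) / (s.max() - s.min() + 1e-12)    # normalize to [0, 1]

    s_c = np.random.rand(120, 160)   # color saliency map Sc
    s_d = np.random.rand(120, 160)   # depth saliency map Sd
    print(fuse_and_center_bias(s_c, s_d, w=0.5).shape)        # (120, 160)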
The method provided by the present invention is a 3D visual saliency detection model based on deep-learning features. Its first advantage is that, instead of traditional hand-crafted features, a deep convolutional neural network is used to extract the feature information of the color map and the depth map; features extracted by the neural network avoid the inaccuracies introduced by human factors in manual extraction, and manual feature extraction is labor-intensive. The second advantage is that, when computing the 3D saliency map, depth information is considered together with color information, whereas most traditional saliency models are designed for 2D images. The third advantage is that a neural network model (NN) is used as a regressor: from the extracted image-region features and the regions' saliency labels, the saliency values are estimated to generate the saliency maps of the color map and the depth map.
In embodiments of the present invention there are many possible image segmentation methods, such as region growing and pixel clustering, and many convolutional neural network models can be used for feature extraction, such as GoogleNet.
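As one possible illustration of the multi-level segmentation step, the sketch below uses scikit-image's Felzenszwalb graph-based segmentation at decreasing scale values to obtain L = 15 label maps from coarse to fine; the choice of algorithm and scale schedule is an assumption, since the text leaves the segmentation method open.

    import numpy as np
    from skimage.segmentation import felzenszwalb

    def multilevel_segmentation(image, num_levels=15):
        """Segment an image into num_levels label maps, from coarsest to finest.

        Felzenszwalb segmentation is used purely for illustration; larger
        'scale' values give coarser segmentations, so the scales decrease
        over the levels.
        """
        scales = np.geomspace(500.0, 10.0, num_levels)
        return [felzenszwalb(image, scale=float(s), sigma=0.8, min_size=20) for s in scales]

    image = np.random.rand(120, 160, 3)                 # stand-in for a color or depth map
    levels = multilevel_segmentation(image, num_levels=15)
    print(len(levels), levels[0].max() + 1, levels[-1].max() + 1)   # 15 levels; regions at coarsest/finest level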
The proposed model is compared with existing models: Li's multi-scale-based model (denoted VSMD), a wavelet-domain-based model (denoted SDLL), Fang's 2D saliency model (denoted SSDF2D) and Fang's 3D saliency model (denoted SSDF3D) are used as baseline models. The three models VSMD, SDLL and SSDF2D mainly address saliency computation for 2D images, while the SSDF3D model is designed for 3D images. In the experiments, the total number of layers L of our proposed model is 15 and w is set to 0.5.
To validate the performance of the 3D visual saliency detection model, we tested these models on two widely used public datasets, the NUS3D-saliency dataset and the NCTU-3DFixation dataset. Three evaluation criteria were used in the experiments: the Pearson Correlation Coefficient (CC), the Earth Mover's Distance (EMD) and the Similarity score (SIM). A good model should have high CC and SIM scores but a low EMD score. As shown in Table 1, the proposed model achieves better performance than the other 2D saliency models on both the NCTU and NUS datasets. For example, the CC, EMD and SIM scores of our proposed model are 0.5225, 2.1547 and 0.4985 respectively, whereas the CC, EMD and SIM scores of the VSMD model are only 0.3783, 2.8419 and 0.3812. These experimental results show that the performance of the 3D image saliency detection model benefits from fusing the color saliency map and the depth saliency map.
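For reference, minimal implementations of two of the evaluation criteria used here, CC and SIM, computed between a predicted saliency map and a ground-truth fixation density map; EMD is omitted because it requires an optimal-transport solver.

    import numpy as np

    def cc(pred, gt):
        """Pearson linear correlation coefficient between two saliency maps."""
        p = (pred - pred.mean()) / (pred.std() + 1e-12)
        g = (gt - gt.mean()) / (gt.std() + 1e-12)
        return float((p * g).mean())

    def sim(pred, gt):
        """Similarity score: sum of per-pixel minima of the two normalized maps."""
        p = pred / (pred.sum() + 1e-12)
        g = gt / (gt.sum() + 1e-12)
        return float(np.minimum(p, g).sum())

    pred = np.random.rand(120, 160)   # predicted 3D saliency map
    gt = np.random.rand(120, 160)     # ground-truth fixation density map
    print(cc(pred, gt), sim(pred, gt))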
Table 1. Performance of the models on the two datasets under the CC, EMD and SIM criteria.
The results of the performance comparison between the proposed 3D model and the SSDF3D model on the two datasets are also given in Table 1. It can be seen that on the NUS dataset the CC, EMD and SIM scores of our proposed model are better than those of the SSDF3D model. On the NCTU dataset the CC score of our model is slightly lower than that of the SSDF3D model, but its EMD and SIM scores are still better. For further illustration, some detection samples of the model are shown in the figure, from which it can also be seen that our proposed model achieves the best performance.
As shown in FIG. 3, the four columns from left to right are four sample images from the NUS dataset. From the second row to the last row, the results are given in the order SSDF2D model, SDLL model, VSMD model, SSDF3D model, and our proposed model. It is evident from the figure that the results of our model are visually superior to those of the other models: the salient regions of the original images are clearly detected and relatively sharp, which again shows that our proposed model achieves the best performance.
The present invention proposes a visual saliency detection model for 3D images based on deep-learning features. There are three key elements in our method. First, we use the CNN model to perform multi-scale deep-learning feature extraction on the color images and depth images, respectively. Second, the saliency map of the depth image (or color image) is generated by the NN model from the deep feature vectors and the saliency labels of the regions, the NN model here playing the role of a classifier. Finally, we use a linear fusion method that combines the depth saliency map and the color saliency map to generate the final saliency map of the 3D image, and we also adopt a center-bias mechanism to enhance the saliency map. Our proposed model achieves excellent performance on the two publicly available datasets.
It will be readily understood by those skilled in the art that the above is only a preferred embodiment of the present invention and is not intended to limit it; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (8)

  1. A saliency detection method for a 3D image, characterized by comprising the following steps:
    (1) extracting deep feature vectors of a color image and a depth image;
    (2) generating saliency maps of the depth map and the color map according to a three-layer neural network and the extracted deep feature vectors of the color image and the depth image;
    (3) linearly fusing the saliency maps of the color image and the depth image to obtain a saliency map of the 3D image.
  2. The saliency detection method according to claim 1, characterized in that step (1) is specifically:
    (1.1) performing image segmentation on the color image and on the depth image associated with the color image, respectively, to obtain multi-level, non-overlapping image regions;
    (1.2) extracting the feature vectors of the segmented color image and depth image with a convolutional neural network model.
  3. The saliency detection method according to claim 2, characterized in that the structure of the convolutional neural network model is five convolutional layers and three fully connected layers, and a different network parameter configuration is set for each layer of the network.
  4. The saliency detection method according to claim 1, characterized in that in step (2) the saliency maps of the depth map and the color map are generated according to a neural network model;
    wherein the neural network model has one output layer and two fully connected hidden layers, the input of the neural network model is a feature vector, and the output is the saliency label of the current region; when the saliency label is 1 the current region is salient, and when the saliency label is 0 the current region is non-salient.
  5. The saliency detection method according to claim 4, characterized in that the saliency map of the depth image is generated by the formula
    S_d(x) = Σ_{j=1}^{L} w_d^j · f_d(F_d^{j,i}),  x ∈ R_d^{j,i},
    where x denotes a pixel in the region R_d^{j,i} of the depth image, w_d^j denotes the weighting factor of the j-th layer of the depth map, L denotes the total number of layers, S_d(x) denotes the saliency map of the depth image, j denotes the layer index of the depth map, R_d^{j,i} denotes the segmented region with index i in layer j of the depth map, and f_d denotes a mapping function that describes the relationship between the feature vector F_d^{j,i} of the local region R_d^{j,i} of the depth map and the saliency label of that region.
  6. The saliency detection method according to claim 4, characterized in that the saliency map of the color image is generated by the formula
    S_c(x) = Σ_{j=1}^{L} w_c^j · f_c(F_c^{j,i}),  x ∈ R_c^{j,i},
    where x denotes a pixel in the region R_c^{j,i} of the color image, w_c^j denotes the weighting factor of the j-th layer of the color map, L denotes the total number of layers, j denotes the layer index of the color map, R_c^{j,i} denotes the segmented region with index i in layer j of the color map, and f_c denotes a mapping function that describes the relationship between the feature vector F_c^{j,i} of the local region R_c^{j,i} of the color map and the saliency label of that region.
  7. The saliency detection method according to any one of claims 1 to 6, characterized in that the saliency map of the 3D image is S = w·S_c + (1-w)·S_d, where S_d is the saliency map of the depth image, S_c is the saliency map of the color image, and w is the contribution weight of the color saliency map in the final visual saliency map of the 3D image.
  8. The saliency detection method according to claim 7, characterized in that the total number of layers L is 15 and the weight w is 0.5.
PCT/CN2016/093637 2016-08-05 2016-08-05 Significance testing method for 3d image WO2018023734A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2016/093637 WO2018023734A1 (en) 2016-08-05 2016-08-05 Significance testing method for 3d image
CN201680000652.2A CN106462771A (en) 2016-08-05 2016-08-05 3D image significance detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/093637 WO2018023734A1 (en) 2016-08-05 2016-08-05 Significance testing method for 3d image

Publications (1)

Publication Number Publication Date
WO2018023734A1 true WO2018023734A1 (en) 2018-02-08

Family

ID=58215885

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/093637 WO2018023734A1 (en) 2016-08-05 2016-08-05 Significance testing method for 3d image

Country Status (2)

Country Link
CN (1) CN106462771A (en)
WO (1) WO2018023734A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097115A (en) * 2019-04-28 2019-08-06 南开大学 A kind of saliency object detecting method based on attention metastasis
CN111275642A (en) * 2020-01-16 2020-06-12 西安交通大学 Low-illumination image enhancement method based on significant foreground content
WO2020119624A1 (en) * 2018-12-12 2020-06-18 中国科学院深圳先进技术研究院 Class-sensitive edge detection method based on deep learning
CN111583171A (en) * 2020-02-19 2020-08-25 西安工程大学 Insulator defect detection method integrating foreground compact characteristic and multi-environment information
CN111914850A (en) * 2019-05-07 2020-11-10 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN112488122A (en) * 2020-11-25 2021-03-12 南京航空航天大学 Panoramic image visual saliency prediction method based on convolutional neural network
CN112990226A (en) * 2019-12-16 2021-06-18 中国科学院沈阳计算技术研究所有限公司 Salient object detection method based on machine learning
US11729407B2 (en) 2018-10-29 2023-08-15 University Of Washington Saliency-based video compression systems and methods

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106993186B (en) * 2017-04-13 2019-04-30 宁波大学 A kind of stereo-picture conspicuousness detection method
CN107423747B (en) * 2017-04-13 2019-09-20 中国人民解放军国防科学技术大学 A kind of conspicuousness object detection method based on depth convolutional network
CN107403430B (en) * 2017-06-15 2020-08-07 中山大学 RGBD image semantic segmentation method
CN107240107B (en) * 2017-06-30 2019-08-09 福州大学 A kind of first appraisal procedure of conspicuousness detection based on image retrieval
CN109257592B (en) * 2017-07-12 2020-09-01 天津大学 Stereoscopic video quality objective evaluation method based on deep learning
CN107886533B (en) * 2017-10-26 2021-05-04 深圳大学 Method, device and equipment for detecting visual saliency of three-dimensional image and storage medium
CN109960979A (en) * 2017-12-25 2019-07-02 大连楼兰科技股份有限公司 Vehicle checking method based on image layered technology
CN108345892B (en) * 2018-01-03 2022-02-22 深圳大学 Method, device and equipment for detecting significance of stereo image and storage medium
CN108090468B (en) * 2018-01-05 2019-05-03 百度在线网络技术(北京)有限公司 Method and apparatus for detecting face
CN108154147A (en) * 2018-01-15 2018-06-12 中国人民解放军陆军装甲兵学院 The region of interest area detecting method of view-based access control model attention model
CN108460348B (en) * 2018-02-12 2022-04-22 杭州电子科技大学 Road target detection method based on three-dimensional model
CN108846416A (en) * 2018-05-23 2018-11-20 北京市新技术应用研究所 Extraction and processing method and system for specific images
WO2020077604A1 (en) * 2018-10-19 2020-04-23 深圳大学 Image semantic segmentation method, computer device, and storage medium
CN110175986B (en) * 2019-04-23 2021-01-08 浙江科技学院 Stereo image visual saliency detection method based on convolutional neural network
CN110223295B (en) * 2019-06-21 2022-05-03 安徽大学 Significance prediction method and device based on deep neural network color perception
CN111611834A (en) * 2019-12-23 2020-09-01 珠海大横琴科技发展有限公司 Ship identification method and device based on SAR
CN111242138B (en) * 2020-01-11 2022-04-01 杭州电子科技大学 RGBD significance detection method based on multi-scale feature fusion
CN111524090A (en) * 2020-01-13 2020-08-11 镇江优瞳智能科技有限公司 Depth prediction image-based RGB-D significance detection method
CN113436245B (en) * 2021-08-26 2021-12-03 武汉市聚芯微电子有限责任公司 Image processing method, model training method, related device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834933B (en) * 2014-02-10 2019-02-12 华为技术有限公司 Method and device for detecting salient regions of an image
CN104318569B (en) * 2014-10-27 2017-02-22 北京工业大学 Spatial salient region extraction method based on a depth variation model
CN105404888B (en) * 2015-11-16 2019-02-05 浙江大学 Salient object detection method combining color and depth information
CN105701508B (en) * 2016-01-12 2017-12-15 西安交通大学 Global-local optimization model and saliency detection algorithm based on multi-stage convolutional neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7397851B2 (en) * 2001-05-10 2008-07-08 Roman Kendyl A Separate plane compression
CN102158712A (en) * 2011-03-22 2011-08-17 宁波大学 Multi-viewpoint video signal coding method based on vision
CN104103033A (en) * 2014-08-05 2014-10-15 四川九成信息技术有限公司 Image real-time processing method
CN104850836A (en) * 2015-05-15 2015-08-19 浙江大学 Automatic insect image identification method based on a deep convolutional neural network

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11729407B2 (en) 2018-10-29 2023-08-15 University Of Washington Saliency-based video compression systems and methods
WO2020119624A1 (en) * 2018-12-12 2020-06-18 中国科学院深圳先进技术研究院 Class-sensitive edge detection method based on deep learning
CN110097115A (en) * 2019-04-28 2019-08-06 南开大学 Salient object detection method based on an attention transfer mechanism
CN111914850A (en) * 2019-05-07 2020-11-10 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN111914850B (en) * 2019-05-07 2023-09-19 百度在线网络技术(北京)有限公司 Picture feature extraction method, device, server and medium
CN112990226A (en) * 2019-12-16 2021-06-18 中国科学院沈阳计算技术研究所有限公司 Salient object detection method based on machine learning
CN111275642A (en) * 2020-01-16 2020-06-12 西安交通大学 Low-illumination image enhancement method based on significant foreground content
CN111275642B (en) * 2020-01-16 2022-05-20 西安交通大学 Low-illumination image enhancement method based on significant foreground content
CN111583171A (en) * 2020-02-19 2020-08-25 西安工程大学 Insulator defect detection method integrating foreground compact characteristic and multi-environment information
CN111583171B (en) * 2020-02-19 2023-04-07 西安工程大学 Insulator defect detection method integrating foreground compact characteristic and multi-environment information
CN112488122A (en) * 2020-11-25 2021-03-12 南京航空航天大学 Panoramic image visual saliency prediction method based on convolutional neural network
CN112488122B (en) * 2020-11-25 2024-04-16 南京航空航天大学 Panoramic image visual saliency prediction method based on convolutional neural network

Also Published As

Publication number Publication date
CN106462771A (en) 2017-02-22

Similar Documents

Publication Publication Date Title
WO2018023734A1 (en) Significance testing method for 3d image
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN113936339B (en) Fighting identification method and device based on double-channel cross attention mechanism
CN107622104B (en) Character image identification and marking method and system
AU2014368997B2 (en) System and method for identifying faces in unconstrained media
Yin et al. Hot region selection based on selective search and modified fuzzy C-means in remote sensing images
Wu et al. Research on image text recognition based on Canny edge detection algorithm and k-means algorithm
CN101520894B (en) Method for extracting salient objects based on regional saliency
CN106126585B (en) UAV image retrieval method combining quality grading with perceptual hash features
CN113762138B (en) Identification method, device, computer equipment and storage medium for fake face pictures
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN108389189B (en) Three-dimensional image quality evaluation method based on dictionary learning
CN111563408B (en) High-resolution image landslide automatic detection method with multi-level perception characteristics and progressive self-learning
CN106203448B (en) Scene classification method based on nonlinear scale space theory
Zhang et al. Deep learning features inspired saliency detection of 3D images
Xiao et al. Multiresolution-Based Rough Fuzzy Possibilistic C-Means Clustering Method for Land Cover Change Detection
CN102542590B (en) High-resolution SAR (Synthetic Aperture Radar) image marking method based on supervised topic model
Zemin et al. Image classification optimization algorithm based on SVM
CN106570124B (en) Remote sensing image semantic retrieval method and system based on object-level association rules
CN112598043B (en) Collaborative saliency detection method based on weakly supervised learning
Li et al. A review of advances in image inpainting research
Chen et al. Mmml: Multi-manifold metric learning for few-shot remote sensing image scene classification
Wang et al. Surface and underwater human pose recognition based on temporal 3D point cloud deep learning
Zhu et al. [Retracted] Basketball Object Extraction Method Based on Image Segmentation Algorithm
Ning et al. Construction of multi-channel fusion salient object detection network based on gating mechanism and pooling network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16911329

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 31.05.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 16911329

Country of ref document: EP

Kind code of ref document: A1