WO2018119807A1 - Method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network - Google Patents

Method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network

Info

Publication number
WO2018119807A1
WO2018119807A1 (application PCT/CN2016/112811)
Authority
WO
WIPO (PCT)
Prior art keywords
superpixel
depth
neural network
convolutional neural
consistency
Prior art date
Application number
PCT/CN2016/112811
Other languages
English (en)
French (fr)
Inventor
王勋
赵绪然
Original Assignee
浙江工商大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江工商大学 filed Critical 浙江工商大学
Priority to US16/067,819 priority Critical patent/US10540590B2/en
Priority to PCT/CN2016/112811 priority patent/WO2018119807A1/zh
Publication of WO2018119807A1 publication Critical patent/WO2018119807A1/zh

Links

Images

Classifications

    • G06V 10/82 - Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06F 17/13 - Complex mathematical operations for solving equations: differential equations
    • G06F 17/16 - Complex mathematical operations: matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F 18/22 - Pattern recognition, analysing: matching criteria, e.g. proximity measures
    • G06N 3/045 - Neural network architectures: combinations of networks
    • G06N 3/047 - Neural network architectures: probabilistic or stochastic networks
    • G06N 3/084 - Neural network learning methods: backpropagation, e.g. using gradient descent
    • G06T 7/10 - Image analysis: segmentation; edge detection
    • G06T 7/593 - Image analysis: depth or shape recovery from multiple images, from stereo images
    • G06V 10/26 - Image preprocessing: segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/764 - Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • H04N 13/00 - Stereoscopic video systems; multi-view video systems; details thereof
    • G06T 2207/20081 - Indexing scheme for image analysis or image enhancement, special algorithmic details: training; learning
    • G06T 2207/20084 - Indexing scheme for image analysis or image enhancement, special algorithmic details: artificial neural networks [ANN]

Definitions

  • The invention relates to the field of computer vision and stereoscopic video, and in particular to a method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network.
  • The basic principle of stereoscopic video is to superimpose two images with horizontal parallax; the viewer sees the left-eye and right-eye pictures separately through stereo glasses, thereby producing stereoscopic perception.
  • Stereoscopic video can provide people with an immersive three-dimensional look and feel, which is very popular among consumers.
  • As the adoption of 3D video hardware continues to rise, a shortage of 3D video content has followed.
  • Direct shooting with 3D cameras is costly and post-production is difficult, so it is usually used only in big-budget movies. Therefore, 2D/3D conversion technology for film and television works is an effective way to solve the shortage of 3D sources: it can not only greatly expand the range and number of stereoscopic films, but also allow some classic works to return to the screen.
  • The depth map can be generated by manually rotoscoping each frame of the video and assigning depth values, but at a very high cost.
  • There are also semi-automatic depth map generation methods, in which the depth maps of some key frames in the video are drawn manually and the computer extends these depth maps to the adjacent frames through a propagation algorithm.
  • The fully automated depth recovery method saves labor costs to the greatest extent.
  • Some algorithms can recover depth maps using specific rules through depth cues such as motion, focus, occlusion, or shading, but are usually only valid for a particular scene.
  • Methods based on structure from motion can recover the depth of a static scene captured by a moving camera, using the cue that distant objects show small relative displacement between adjacent frames while near objects show large relative displacement, but such methods fail when the subject is moving or the camera is stationary; focus-based depth recovery methods can recover the depth of images with a shallow depth of field, but perform poorly when the depth of field is large.
  • Film and television works usually contain various scenes, so depth recovery methods based on depth cues are difficult to apply universally.
  • A convolutional neural network is a deep neural network especially suited to images. It is built by stacking basic units such as convolutional layers, activation layers, pooling layers and loss layers, and can model the complex function from an image input x to a specific output y. It has come to occupy a dominant position in solving various machine vision problems such as image classification and image segmentation. In the past year or two, some methods have used convolutional neural networks for depth recovery, using a large amount of data to learn the mapping from RGB image input to depth map output. Depth recovery based on convolutional neural networks does not depend on particular assumptions, generalizes well, and has high recovery accuracy, so it has great application potential in the 2D-to-3D conversion of film and television works.
  • However, existing methods optimize single images when training the convolutional neural network and ignore the continuity relationship between frames. If they are used to recover the depth of an image sequence, the depth maps recovered for adjacent frames undergo obvious jumps. Depth map jumps between adjacent frames cause flicker in the synthesized virtual view, which seriously affects the viewing experience. In addition, inter-frame continuity provides important cues for depth recovery, which existing methods simply ignore.
  • The object of the present invention is to provide a method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network, which introduces the temporal continuity of the RGB images and the depth maps into the convolutional neural network.
  • Multi-frame images are jointly optimized during training to generate a depth map that is continuous over the time domain and to improve the accuracy of depth recovery.
  • a method for generating a spatio-temporal consistency depth map sequence based on a convolutional neural network comprising the following steps:
  • Each training sample of the training set is a sequence of consecutive RGB images containing m frames, and its corresponding depth map sequence;
  • a convolutional neural network consisting of a single-superpixel depth regression network containing the parameters W and a spatiotemporal-consistency conditional random field loss layer containing the parameters α.
  • The role of the single-superpixel depth regression network is to regress a depth value for each superpixel without considering the spatiotemporal-consistency constraints; the role of the spatiotemporal-consistency conditional random field loss layer is to use the temporal and spatial similarity matrices established in step 2) to constrain the output of the single-superpixel regression network, and finally to output an estimated depth map that is smooth in both the time domain and the spatial domain.
  • the depth map sequence is restored by forward propagation using the trained neural network.
  • step 2) is specifically:
  • (2.2) Establish the spatial-consistency similarity matrix S^(s) of the n superpixels as follows: S^(s) is an n × n matrix whose entry S^(s)_pq describes the intra-frame similarity between the p-th superpixel and the q-th superpixel.
  • c_p and c_q are the color histogram features of superpixels p and q, respectively, and γ is a manually set parameter that can be set to the median of ||c_p − c_q||^2 over all pairs of adjacent superpixels.
  • (2.3) S^(t) is an n × n matrix whose entry S^(t)_pq describes the inter-frame similarity between the p-th superpixel and the q-th superpixel.
  • The convolutional neural network constructed in step 3) is composed of two parts: a single-superpixel depth regression network and a spatiotemporal-consistency conditional random field loss layer:
  • The single-superpixel depth regression network consists of the first 31 layers of the VGG16 network, one superpixel pooling layer, and three fully connected layers; the superpixel pooling layer average-pools the features within the spatial extent of each superpixel.
  • The input of the network is a continuous RGB image sequence of m frames, and the output is an n-dimensional vector z = [z_1, ..., z_n], in which the p-th element z_p is the depth estimate of the p-th superpixel without considering any constraints.
  • the parameters of the convolutional neural network that need to be learned are denoted by W.
  • The first term Σ_{p∈N}(d_p − z_p)^2 of the energy function measures the gap between the single-superpixel predictions and the ground truth; the second term α^(s) Σ S^(s)_pq (d_p − d_q)^2 is a spatial-consistency constraint, indicating that if superpixels p and q are adjacent within the same frame and have similar colors (S^(s)_pq is large), their depths should be similar; the third term α^(t) Σ S^(t)_pq (d_p − d_q)^2 is a temporal-consistency constraint, indicating that if superpixels p and q correspond to the same object in two adjacent frames (S^(t)_pq = 1), their depths should be similar.
  • The energy function can be written in matrix form as E(d, I) = d^T L d − 2 z^T d + z^T z, with L = I_n + D − M, M = α^(s) S^(s) + α^(t) S^(t), and D a diagonal matrix with D_pp = Σ_q M_pq.
  • S^(s) and S^(t) are the spatial and temporal similarity matrices obtained in steps (2.2) and (2.3).
  • L^(-1) denotes the inverse matrix of L and |L| denotes the determinant of L.
  • the loss function can be defined as the negative logarithm of the conditional probability function:
  • The convolutional neural network training process in step 4) is specifically:
  • Tr(·) denotes the trace of a matrix.
  • The matrices A^(s) and A^(t) are the partial derivatives of the matrix L with respect to α^(s) and α^(t).
  • In step 5), the method for recovering an RGB image sequence of unknown depth is specifically:
  • The matrix L is calculated by the method described in step (3.2); d̂_p denotes the depth value of the p-th superpixel of the RGB image sequence.
  • Compared with depth-cue-based depth recovery methods, the present invention uses a convolutional neural network to learn the function mapping from RGB images to depth maps, rather than relying on specific assumptions about the scene;
  • The present invention adds spatiotemporal-consistency constraints and constructs a spatiotemporal-consistency conditional random field loss layer to jointly optimize multi-frame images.
  • It outputs spatiotemporally consistent depth maps and avoids inter-frame jumps in the depth map.
  • The spatiotemporal-consistency constraint added by the present invention can also improve the accuracy of depth recovery.
  • The present invention was compared with Eigen, David, Christian Puhrsch, and Rob Fergus, "Depth map prediction from a single image using a multi-scale deep network," Advances in Neural Information Processing Systems, 2014, and other existing methods on the public dataset NYU Depth v2 and on the inventors' own dataset LYB 3D-TV. The results show that the proposed method can significantly improve the temporal continuity of the recovered depth maps and improve the accuracy of the depth estimation.
  • Figure 1 is a flow chart of an example of the present invention
  • FIG. 2 is a structural diagram of a convolutional neural network proposed by the present invention.
  • FIG. 3 is a structural diagram of a single superpixel deep regression network
  • FIG. 4 is a schematic diagram of a single superpixel acting on a multi-frame image.
  • the method of the present invention includes the following steps:
  • Each training sample of the training set is a sequence of consecutive RGB images containing m frames, and its corresponding depth map sequence;
  • a convolutional neural network consisting of a single-superpixel depth regression network containing the parameters W and a spatiotemporal-consistency conditional random field loss layer containing the parameters α.
  • The role of the single-superpixel depth regression network is to regress a depth value for each superpixel without considering the spatiotemporal-consistency constraints; the role of the spatiotemporal-consistency conditional random field loss layer is to use the temporal and spatial similarity matrices established in step 2) to constrain the output of the single-superpixel regression network, and finally to output an estimated depth map that is smooth in both the time domain and the spatial domain.
  • the depth map sequence is restored by forward propagation using the trained neural network.
  • step 2) is as follows:
  • (2.2) Establish the spatial-consistency similarity matrix S^(s) of the n superpixels as follows: S^(s) is an n × n matrix whose entry S^(s)_pq describes the intra-frame similarity between the p-th superpixel and the q-th superpixel.
  • c_p and c_q are the color histogram features of superpixels p and q, respectively, and γ is a manually set parameter that can be set to the median of ||c_p − c_q||^2 over all pairs of adjacent superpixels.
  • (2.3) S^(t) is an n × n matrix whose entry S^(t)_pq describes the inter-frame similarity between the p-th superpixel and the q-th superpixel.
  • The specific implementation of step 3) is as follows:
  • The convolutional neural network constructed by this method consists of two parts: a single-superpixel depth regression network and a spatiotemporal-consistency conditional random field loss layer.
  • the overall network structure is shown in Figure 2;
  • (3.2) The single-superpixel depth regression network described in step (3.1) is formed by the first 31 layers of the VGG16 network proposed in Simonyan, Karen, and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556 (2014), two convolutional layers, one superpixel pooling layer, and three fully connected layers; the network structure is shown in FIG. 3. The superpixel pooling layer average-pools the features within the spatial extent of each superpixel, and the other convolution, pooling, and activation layers are conventional layers of a convolutional neural network.
  • For a continuous RGB input of m frames, the network first acts on each frame separately; for example, for the t-th frame image containing n_t superpixels, the network outputs an n_t-dimensional vector z_t representing the depth regression output of each superpixel in that frame without considering any constraints. After that, the outputs of the m frames are spliced into an n-dimensional vector z = [z_1; ...; z_n], which represents the estimated depth regression values of the total of n superpixels in the sequence of images, as shown in FIG. 4.
  • the parameters of the convolutional neural network that need to be learned are denoted by W.
  • The first term Σ_{p∈N}(d_p − z_p)^2 of the energy function measures the gap between the single-superpixel predictions and the ground truth; the second term α^(s) Σ S^(s)_pq (d_p − d_q)^2 is a spatial-consistency constraint, indicating that if superpixels p and q are adjacent within the same frame and have similar colors (S^(s)_pq is large), their depths should be similar; the third term α^(t) Σ S^(t)_pq (d_p − d_q)^2 is a temporal-consistency constraint, indicating that if superpixels p and q correspond to the same object in two adjacent frames (S^(t)_pq = 1), their depths should be similar.
  • The energy function can be written in matrix form as E(d, I) = d^T L d − 2 z^T d + z^T z, with L = I_n + D − M, M = α^(s) S^(s) + α^(t) S^(t), and D a diagonal matrix with D_pp = Σ_q M_pq.
  • S^(s) and S^(t) are the spatial and temporal similarity matrices obtained in steps (2.2) and (2.3).
  • L^(-1) denotes the inverse matrix of L and |L| denotes the determinant of L.
  • the loss function can be defined as the negative logarithm of the conditional probability function:
  • the convolutional neural network training process in step 4) is specifically:
  • Tr(·) is the operation of taking the trace of a matrix; the matrices A^(s) and A^(t) are the partial derivatives of the matrix L with respect to α^(s) and α^(t), which are calculated as A^(s)_pq = δ(p=q) Σ_k S^(s)_pk − S^(s)_pq and A^(t)_pq = δ(p=q) Σ_k S^(t)_pk − S^(t)_pq.
  • In step 5), the method for recovering an RGB image sequence of unknown depth is specifically as follows:
  • The matrix L is calculated by the method described in step (3.3); d̂_p denotes the depth value of the p-th superpixel of the RGB image sequence.
  • The present invention was compared with several other existing methods on the public dataset NYU Depth v2 and on the inventors' own dataset LYB 3D-TV.
  • The NYU Depth v2 dataset consists of 795 training scenes and 654 test scenes, each of which contains 30 consecutive RGB images and their corresponding depth maps.
  • The LYB 3D-TV database is taken from scenes of the TV series 《琅琊榜》; 5124 frames from 60 scenes and their manually annotated depth maps were selected as the training set, and 1278 frames from 20 scenes and their manually annotated depth maps were used as the test set.
  • Multi-scale CNN: Eigen, David, Christian Puhrsch, and Rob Fergus. "Depth map prediction from a single image using a multi-scale deep network." Advances in Neural Information Processing Systems. 2014.

Abstract

A method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network, applicable to 2D-to-3D conversion of film and television works. The method comprises: 1) collecting a training set, where each training sample is a continuous RGB image sequence and its corresponding depth map sequence; 2) performing spatiotemporally consistent superpixel segmentation on each image sequence in the training set, and constructing a spatial similarity matrix and a temporal similarity matrix; 3) constructing a convolutional neural network composed of a single-superpixel depth regression network and a spatiotemporal-consistency conditional random field loss layer; 4) training the convolutional neural network; 5) for an RGB image sequence of unknown depth, recovering the depth map sequence by forward propagation through the trained network. The method avoids the strong dependence of cue-based depth recovery methods on scene assumptions, as well as the inter-frame discontinuity of depth maps generated by existing CNN-based depth recovery methods.

Description

Method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network

Technical Field
The present invention relates to the field of computer vision and stereoscopic video, and in particular to a method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network.
Background Art
The basic principle of stereoscopic video is to superimpose two images with horizontal parallax; the viewer sees the left-eye and right-eye pictures separately through stereo glasses, thereby producing stereoscopic perception. Stereoscopic video provides an immersive three-dimensional viewing experience and is very popular with consumers. However, as the adoption of 3D video hardware continues to rise, a shortage of 3D content has followed. Shooting directly with 3D cameras is expensive and post-production is difficult, so it is usually feasible only for big-budget films. 2D/3D conversion of film and television works is therefore an effective way to alleviate the shortage of 3D sources: it can greatly expand the range and number of stereoscopic titles, and it allows classic works to return to the screen.
Since the left-right disparity in stereoscopic video is directly related to the depth of each pixel, obtaining the depth map of every video frame is the key to 2D/3D conversion. Depth maps can be produced by manually rotoscoping every frame and assigning depth values, but this is very expensive. There are also semi-automatic depth map generation methods, in which the depth maps of some key frames are drawn manually and a propagation algorithm extends them to the adjacent frames. Although such methods save some time, converting film and television works in large volumes still requires heavy manual work.
By contrast, fully automatic depth recovery methods save labor costs to the greatest extent. Some algorithms recover depth maps from depth cues such as motion, focus, occlusion or shading using specific rules, but they are usually valid only for particular scenes. For example, structure-from-motion methods can recover the depth of a static scene captured by a moving camera, using the cue that distant objects show small relative displacement between adjacent frames while near objects show large relative displacement; however, such methods fail when the subject is moving or the camera is stationary. Focus-based depth recovery can recover the depth of images with a shallow depth of field, but performs poorly when the depth of field is large. Film and television works usually contain a wide variety of scenes, so depth recovery methods based on depth cues are difficult to apply universally.
A convolutional neural network is a deep neural network especially suited to images. It is built by stacking basic units such as convolutional layers, activation layers, pooling layers and loss layers, can model the complex function from an image input x to a specific output y, and has come to dominate machine vision problems such as image classification and image segmentation. In the past year or two, several methods have applied convolutional neural networks to depth recovery, using large amounts of data to learn the mapping from RGB image input to depth map output. CNN-based depth recovery does not depend on particular assumptions, generalizes well and achieves high accuracy, so it has great application potential in the 2D-to-3D conversion of film and television works. However, existing methods train the convolutional neural network on single images and ignore the continuity between frames. When they are used to recover the depth of an image sequence, the depth maps recovered for adjacent frames exhibit obvious jumps, and these jumps cause flicker in the synthesized virtual views, severely degrading the viewing experience. Moreover, inter-frame continuity also provides important cues for depth recovery, which existing methods simply discard.
Summary of the Invention
The object of the present invention is to address the deficiencies of the prior art by providing a method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network, which introduces the temporal continuity of RGB images and depth maps into the convolutional neural network and jointly optimizes multiple frames during training, so as to generate depth maps that are continuous in the time domain and to improve the accuracy of depth recovery.
The object of the invention is achieved through the following technical solution: a method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network, comprising the following steps:
1) Collect a training set. Each training sample of the training set is a continuous RGB image sequence of m frames together with its corresponding depth map sequence;
2) Perform spatiotemporally consistent superpixel segmentation on each image sequence in the training set, and construct the spatial similarity matrix S^(s) and the temporal similarity matrix S^(t);
3) Construct a convolutional neural network consisting of a single-superpixel depth regression network with parameters W and a spatiotemporal-consistency conditional random field loss layer with parameters α. The single-superpixel depth regression network regresses a depth value for each superpixel without considering the spatiotemporal-consistency constraints; the spatiotemporal-consistency conditional random field loss layer uses the temporal and spatial similarity matrices built in step 2) to constrain the output of the single-superpixel regression network, and finally outputs an estimated depth map that is smooth in both the time domain and the spatial domain;
4) Train the convolutional neural network constructed in step 3) with the RGB image sequences and depth map sequences of the training set to obtain the network parameters W and α;
5) For an RGB image sequence of unknown depth, recover the depth map sequence by forward propagation through the trained network.
Further, step 2) is specifically as follows:
(2.1) Perform spatiotemporally consistent superpixel segmentation on each continuous RGB image sequence in the training set. Denote the input sequence by I = [I_1, ..., I_m], where I_t is the t-th RGB frame and there are m frames in total. The spatiotemporally consistent superpixel segmentation partitions the m frames into n_1, ..., n_m superpixels respectively, and also produces, for every superpixel in a frame, its correspondence to the superpixel of the same object in the previous frame. The whole image sequence therefore contains n = Σ_{t=1}^{m} n_t superpixels. For each superpixel p, the ground-truth depth at its centroid is denoted d_p, and the ground-truth depth vector of the n superpixels is defined as d = [d_1; ...; d_n].
(2.2) Build the spatial-consistency similarity matrix S^(s) of these n superpixels as follows: S^(s) is an n × n matrix whose entry S^(s)_pq describes the intra-frame similarity between the p-th and the q-th superpixel:
S^(s)_pq = exp(−γ ||c_p − c_q||^2) if p and q are adjacent superpixels within the same frame, and S^(s)_pq = 0 otherwise,
where c_p and c_q are the color histogram features of superpixels p and q respectively, and γ is a manually set parameter, which can be set to the median of ||c_p − c_q||^2 over all pairs of adjacent superpixels.
(2.3) Build the temporal-consistency similarity matrix S^(t) of these n superpixels as follows: S^(t) is an n × n matrix whose entry S^(t)_pq describes the inter-frame similarity between the p-th and the q-th superpixel:
S^(t)_pq = 1 if superpixels p and q belong to adjacent frames and correspond to the same object, and S^(t)_pq = 0 otherwise,
where the correspondence between superpixels of adjacent frames is given by the spatiotemporally consistent superpixel segmentation of step (2.1).
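For illustration only, the following Python sketch shows one possible way to assemble the two similarity matrices defined in steps (2.2) and (2.3); the argument names, the adjacency and correspondence inputs, and the exact Gaussian form of S^(s) are assumptions of this sketch rather than text of the original disclosure.

```python
import numpy as np

def build_similarity_matrices(histograms, frame_of, spatial_pairs, temporal_pairs, gamma):
    """Sketch of steps (2.2)-(2.3): spatial matrix S_s and temporal matrix S_t.

    histograms     : (n, k) array, color histogram c_p of each of the n superpixels
    frame_of       : (n,) array, index of the frame each superpixel belongs to
    spatial_pairs  : iterable of (p, q) superpixel pairs adjacent within a frame
    temporal_pairs : iterable of (p, q) pairs matched to the same object in adjacent frames
    gamma          : bandwidth parameter (the patent ties it to the median of
                     ||c_p - c_q||^2 over adjacent pairs)
    """
    n = histograms.shape[0]
    S_s = np.zeros((n, n))
    S_t = np.zeros((n, n))

    for p, q in spatial_pairs:
        if frame_of[p] == frame_of[q]:                       # intra-frame neighbours only
            # Gaussian similarity of color histograms (one plausible reading of the formula)
            w = np.exp(-gamma * np.sum((histograms[p] - histograms[q]) ** 2))
            S_s[p, q] = S_s[q, p] = w

    for p, q in temporal_pairs:                              # same object, adjacent frames
        S_t[p, q] = S_t[q, p] = 1.0

    return S_s, S_t
```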
Further, the convolutional neural network constructed in step 3) consists of two parts: a single-superpixel depth regression network and a spatiotemporal-consistency conditional random field loss layer:
(3.1) The single-superpixel depth regression network consists of the first 31 layers of the VGG16 network, one superpixel pooling layer, and three fully connected layers. The superpixel pooling layer average-pools the features within the spatial extent of each superpixel. The input of the network is a continuous RGB image sequence of m frames, and the output is an n-dimensional vector z = [z_1, ..., z_n], in which the p-th element z_p is the depth estimate, without considering any constraints, of the p-th superpixel obtained by the spatiotemporally consistent superpixel segmentation of the sequence. The parameters of this convolutional neural network to be learned are denoted W.
(3.2) The inputs of the spatiotemporal-consistency conditional random field loss layer are the output z = [z_1, ..., z_n] of the single-superpixel regression network of step (3.1), the ground-truth superpixel depth vector d = [d_1; ...; d_n] defined in step (2.1), and the spatial-consistency similarity matrix S^(s) and temporal-consistency similarity matrix S^(t) obtained in steps (2.2) and (2.3). Here, the conditional probability function of the spatiotemporal-consistency conditional random field is
P(d | I) = exp(−E(d, I)) / Z(I), with Z(I) = ∫ exp(−E(d, I)) dd,
where the energy function E(d, I) is defined as
E(d, I) = Σ_{p∈N} (d_p − z_p)^2 + α^(s) Σ_{(p,q)} S^(s)_pq (d_p − d_q)^2 + α^(t) Σ_{(p,q)} S^(t)_pq (d_p − d_q)^2.
The first term Σ_{p∈N} (d_p − z_p)^2 of this energy function measures the gap between the single-superpixel predictions and the ground truth; the second term α^(s) Σ S^(s)_pq (d_p − d_q)^2 is the spatial-consistency constraint, stating that if superpixels p and q are adjacent within the same frame and have similar colors (S^(s)_pq is large), their depths should be similar; the third term α^(t) Σ S^(t)_pq (d_p − d_q)^2 is the temporal-consistency constraint, stating that if superpixels p and q correspond to the same object in two adjacent frames (S^(t)_pq = 1), their depths should be similar. In matrix form the energy function can be written as:
E(d, I) = d^T L d − 2 z^T d + z^T z,
where:
L = I_n + D − M,
M = α^(s) S^(s) + α^(t) S^(t),
S^(s) and S^(t) are the spatial and temporal similarity matrices obtained in steps (2.2) and (2.3), α^(s) and α^(t) are the two parameters to be learned, I_n is the n × n identity matrix, and D is a diagonal matrix with D_pp = Σ_q M_pq. Integrating out d gives the closed-form conditional probability
P(d | I) = (|L|^(1/2) / π^(n/2)) exp(−d^T L d + 2 z^T d − z^T L^(-1) z),
where L^(-1) denotes the inverse matrix of L and |L| denotes the determinant of L.
Therefore, the loss function can be defined as the negative logarithm of the conditional probability function:
J = −log P(d | I) = d^T L d − 2 z^T d + z^T L^(-1) z − (1/2) log|L| + (n/2) log π.
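A minimal NumPy sketch of the loss layer's forward computation follows; it uses the negative log-likelihood reconstructed above, so the constant term in π and the exact scaling are assumptions consistent with that reconstruction rather than values quoted verbatim from the original disclosure.

```python
import numpy as np

def crf_loss(z, d, S_s, S_t, alpha_s, alpha_t):
    """Spatiotemporal-consistency CRF loss J = -log P(d | I), per the reconstruction above.

    z : (n,) unconstrained depth regressions, d : (n,) ground-truth superpixel depths
    """
    n = z.shape[0]
    M = alpha_s * S_s + alpha_t * S_t
    D = np.diag(M.sum(axis=1))                 # D_pp = sum_q M_pq
    L = np.eye(n) + D - M                      # L = I_n + D - M
    L_inv = np.linalg.inv(L)
    _, logdet = np.linalg.slogdet(L)           # log determinant of L
    J = (d @ L @ d - 2.0 * z @ d + z @ L_inv @ z
         - 0.5 * logdet + 0.5 * n * np.log(np.pi))
    return J, L, L_inv
```

In practice L and L^(-1) would be cached here, since the backward pass of step 4) reuses both.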
Further, the training process of the convolutional neural network in step 4) is specifically as follows:
(4.1) The network parameters W, α^(s) and α^(t) are optimized by stochastic gradient descent; in each iteration the parameters are updated as follows:
W ← W − lr · ∂J/∂W,
α^(s) ← α^(s) − lr · ∂J/∂α^(s),
α^(t) ← α^(t) − lr · ∂J/∂α^(t),
where lr is the learning rate.
(4.2) The partial derivative of the cost function J with respect to the parameters W in step (4.1) is computed by the following formula:
∂J/∂W = (∂J/∂z) · (∂z/∂W) = 2 (L^(-1) z − d)^T · ∂z/∂W,
where ∂z/∂W is obtained layer by layer by back-propagation through the convolutional neural network.
(4.3) The partial derivatives of the cost function J with respect to the parameters α^(s) and α^(t), ∂J/∂α^(s) and ∂J/∂α^(t), are computed by the following formulas:
∂J/∂α^(s) = d^T A^(s) d − z^T L^(-1) A^(s) L^(-1) z − (1/2) Tr(L^(-1) A^(s)),
∂J/∂α^(t) = d^T A^(t) d − z^T L^(-1) A^(t) L^(-1) z − (1/2) Tr(L^(-1) A^(t)),
where Tr(·) denotes the trace of a matrix, and the matrices A^(s) and A^(t) are the partial derivatives of the matrix L with respect to α^(s) and α^(t), computed by the following formulas:
A^(s)_pq = δ(p=q) Σ_k S^(s)_pk − S^(s)_pq,
A^(t)_pq = δ(p=q) Σ_k S^(t)_pk − S^(t)_pq,
where δ(p=q) takes the value 1 when p = q and 0 otherwise.
Further, in step 5), the method for recovering an RGB image sequence of unknown depth is specifically as follows:
(5.1) Perform spatiotemporally consistent superpixel segmentation on the RGB image sequence according to the method of step 2), and compute the spatial similarity matrix S^(s) and the temporal similarity matrix S^(t);
(5.2) Forward-propagate the RGB image sequence through the trained convolutional neural network to obtain the single-superpixel network output z;
(5.3) The depth output under the spatiotemporal-consistency constraints, d̂, is computed by the following formula:
d̂ = L^(-1) z,
where the matrix L is computed by the method described in step (3.2), and d̂_p denotes the depth value of the p-th superpixel of the RGB image sequence;
(5.4) Assign each d̂_p to the corresponding position of that superpixel in the corresponding frame to obtain the depth maps of the m frames.
The beneficial effects of the present invention are as follows:
First, compared with depth-cue-based depth recovery methods, the present invention uses a convolutional neural network to learn the functional mapping from RGB images to depth maps and does not rely on particular assumptions about the scene;
Second, whereas existing CNN-based depth recovery methods optimize only single frames, the present invention introduces spatiotemporal-consistency constraints and constructs a spatiotemporal-consistency conditional random field loss layer to jointly optimize multiple frames, so it can output spatiotemporally consistent depth maps and avoid inter-frame jumps in the depth maps;
Third, compared with existing CNN-based depth recovery methods, the spatiotemporal-consistency constraints introduced by the present invention can also improve the accuracy of depth recovery.
The present invention was compared on the public dataset NYU Depth v2 and on the inventors' own dataset LYB 3D-TV with existing methods such as Eigen, David, Christian Puhrsch, and Rob Fergus, "Depth map prediction from a single image using a multi-scale deep network," Advances in Neural Information Processing Systems, 2014. The results show that the proposed method can significantly improve the temporal continuity of the recovered depth maps and improve the accuracy of the depth estimation.
Brief Description of the Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is a structural diagram of the convolutional neural network proposed by the present invention;
FIG. 3 is a structural diagram of the single-superpixel depth regression network;
FIG. 4 is a schematic diagram of the single-superpixel network acting on a multi-frame image.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in the embodiment flow chart of FIG. 1, the method of the present invention comprises the following steps:
1) Collect a training set. Each training sample of the training set is a continuous RGB image sequence of m frames together with its corresponding depth map sequence;
2) Perform spatiotemporally consistent superpixel segmentation on each image sequence in the training set using the method proposed in Chang, Jason, et al., "A video representation using temporal superpixels," CVPR 2013, and construct the spatial similarity matrix S^(s) and the temporal similarity matrix S^(t);
3) Construct a convolutional neural network consisting of a single-superpixel depth regression network with parameters W and a spatiotemporal-consistency conditional random field loss layer with parameters α. The single-superpixel depth regression network regresses a depth value for each superpixel without considering the spatiotemporal-consistency constraints; the spatiotemporal-consistency conditional random field loss layer uses the temporal and spatial similarity matrices built in step 2) to constrain the output of the single-superpixel regression network, and finally outputs an estimated depth map that is smooth in both the time domain and the spatial domain;
4) Train the convolutional neural network constructed in step 3) with the RGB image sequences and depth map sequences of the training set to obtain the network parameters W and α;
5) For an RGB image sequence of unknown depth, recover the depth map sequence by forward propagation through the trained network.
The specific implementation of step 2) is as follows:
(2.1) Perform spatiotemporally consistent superpixel segmentation on each continuous RGB image sequence in the training set using the method proposed in Chang, Jason, et al., "A video representation using temporal superpixels," CVPR 2013. Denote the input sequence by I = [I_1, ..., I_m], where I_t is the t-th RGB frame and there are m frames in total. The spatiotemporally consistent superpixel segmentation partitions the m frames into n_1, ..., n_m superpixels respectively, and also produces, for every superpixel in a frame, its correspondence to the superpixel of the same object in the previous frame. The whole image sequence therefore contains n = Σ_{t=1}^{m} n_t superpixels. For each superpixel p, we denote the ground-truth depth at its centroid by d_p, and define the ground-truth depth vector of the n superpixels as d = [d_1; ...; d_n].
(2.2) Build the spatial-consistency similarity matrix S^(s) of these n superpixels as follows: S^(s) is an n × n matrix whose entry S^(s)_pq describes the intra-frame similarity between the p-th and the q-th superpixel:
S^(s)_pq = exp(−γ ||c_p − c_q||^2) if p and q are adjacent superpixels within the same frame, and S^(s)_pq = 0 otherwise,
where c_p and c_q are the color histogram features of superpixels p and q respectively, and γ is a manually set parameter, which can be set to the median of ||c_p − c_q||^2 over all pairs of adjacent superpixels.
(2.3) Build the temporal-consistency similarity matrix S^(t) of these n superpixels as follows: S^(t) is an n × n matrix whose entry S^(t)_pq describes the inter-frame similarity between the p-th and the q-th superpixel:
S^(t)_pq = 1 if superpixels p and q belong to adjacent frames and correspond to the same object, and S^(t)_pq = 0 otherwise,
where the correspondence between superpixels of adjacent frames is given by the spatiotemporally consistent superpixel segmentation of step (2.1).
The specific implementation of step 3) is as follows:
(3.1) The convolutional neural network constructed by this method consists of two parts, a single-superpixel depth regression network and a spatiotemporal-consistency conditional random field loss layer; the overall network structure is shown in FIG. 2.
(3.2) The single-superpixel depth regression network described in step (3.1) consists of the first 31 layers of the VGG16 network proposed in Simonyan, Karen, and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556 (2014), followed by two convolutional layers, one superpixel pooling layer, and three fully connected layers; the network structure is shown in FIG. 3. The superpixel pooling layer average-pools the features within the spatial extent of each superpixel; the other convolutional, pooling and activation layers are conventional layers of a convolutional neural network. For a continuous RGB input of m frames, the network first operates on each frame separately; for example, for the t-th frame containing n_t superpixels, the network outputs an n_t-dimensional vector z_t representing the depth regression output of every superpixel in that frame without considering any constraints. The outputs of the m frames are then concatenated into an n-dimensional vector z = [z_1; ...; z_n] representing the estimated depth regression values of all n superpixels in the image sequence, as shown in FIG. 4. The parameters of this convolutional neural network to be learned are denoted W.
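As an illustration of the superpixel pooling layer described in step (3.2), the following sketch average-pools a dense feature map over each superpixel of one frame; the function name, tensor shapes and label format are assumptions of this sketch, not part of the original disclosure.

```python
import numpy as np

def superpixel_avg_pool(features, labels, n_superpixels):
    """Average the feature vectors of all pixels falling inside each superpixel.

    features      : (H, W, C) feature map produced by the convolutional trunk
    labels        : (H, W) integer map assigning every pixel to a superpixel id in [0, n_t)
    n_superpixels : number of superpixels n_t in this frame
    returns       : (n_superpixels, C) matrix of pooled features
    """
    C = features.shape[2]
    pooled = np.zeros((n_superpixels, C))
    counts = np.zeros(n_superpixels)
    flat_feat = features.reshape(-1, C)
    flat_lab = labels.reshape(-1)
    np.add.at(pooled, flat_lab, flat_feat)   # sum features per superpixel
    np.add.at(counts, flat_lab, 1.0)         # count pixels per superpixel
    return pooled / np.maximum(counts, 1.0)[:, None]
```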
(3.3) The inputs of the spatiotemporal-consistency conditional random field loss layer described in step (3.1) are the output z = [z_1, ..., z_n] of the single-superpixel regression network described in step (3.2), the ground-truth superpixel depth vector d = [d_1; ...; d_n] defined in step (2.1), and the spatial-consistency similarity matrix S^(s) and temporal-consistency similarity matrix S^(t) obtained in steps (2.2) and (2.3). Here, the conditional probability function of the spatiotemporal-consistency conditional random field is
P(d | I) = exp(−E(d, I)) / Z(I), with Z(I) = ∫ exp(−E(d, I)) dd,
where the energy function E(d, I) is defined as
E(d, I) = Σ_{p∈N} (d_p − z_p)^2 + α^(s) Σ_{(p,q)} S^(s)_pq (d_p − d_q)^2 + α^(t) Σ_{(p,q)} S^(t)_pq (d_p − d_q)^2.
The first term Σ_{p∈N} (d_p − z_p)^2 of this energy function measures the gap between the single-superpixel predictions and the ground truth; the second term α^(s) Σ S^(s)_pq (d_p − d_q)^2 is the spatial-consistency constraint, stating that if superpixels p and q are adjacent within the same frame and have similar colors (S^(s)_pq is large), their depths should be similar; the third term α^(t) Σ S^(t)_pq (d_p − d_q)^2 is the temporal-consistency constraint, stating that if superpixels p and q correspond to the same object in two adjacent frames (S^(t)_pq = 1), their depths should be similar. In matrix form the energy function can be written as:
E(d, I) = d^T L d − 2 z^T d + z^T z,
where:
L = I_n + D − M,
M = α^(s) S^(s) + α^(t) S^(t),
S^(s) and S^(t) are the spatial and temporal similarity matrices obtained in steps (2.2) and (2.3), α^(s) and α^(t) are the two parameters to be learned, I_n is the n × n identity matrix, and D is a diagonal matrix with D_pp = Σ_q M_pq. Integrating out d gives the closed-form conditional probability
P(d | I) = (|L|^(1/2) / π^(n/2)) exp(−d^T L d + 2 z^T d − z^T L^(-1) z),
where L^(-1) denotes the inverse matrix of L and |L| denotes the determinant of L.
Therefore, the loss function can be defined as the negative logarithm of the conditional probability function:
J = −log P(d | I) = d^T L d − 2 z^T d + z^T L^(-1) z − (1/2) log|L| + (n/2) log π.
The training process of the convolutional neural network in step 4) is specifically as follows:
(4.1) The network parameters W, α^(s) and α^(t) are optimized by stochastic gradient descent; in each iteration the parameters are updated as follows:
W ← W − lr · ∂J/∂W,
α^(s) ← α^(s) − lr · ∂J/∂α^(s),
α^(t) ← α^(t) − lr · ∂J/∂α^(t),
where lr is the learning rate.
(4.2) The partial derivative of the cost function J with respect to the parameters W in step (4.1) is computed by the following formula:
∂J/∂W = (∂J/∂z) · (∂z/∂W) = 2 (L^(-1) z − d)^T · ∂z/∂W,
where ∂z/∂W is obtained layer by layer by back-propagation through the convolutional neural network.
(4.3) The partial derivatives of the cost function J with respect to the parameters α^(s) and α^(t), ∂J/∂α^(s) and ∂J/∂α^(t), are computed by the following formulas:
∂J/∂α^(s) = d^T A^(s) d − z^T L^(-1) A^(s) L^(-1) z − (1/2) Tr(L^(-1) A^(s)),
∂J/∂α^(t) = d^T A^(t) d − z^T L^(-1) A^(t) L^(-1) z − (1/2) Tr(L^(-1) A^(t)),
where Tr(·) is the operation of taking the trace of a matrix, and the matrices A^(s) and A^(t) are the partial derivatives of the matrix L with respect to α^(s) and α^(t), computed by the following formulas:
A^(s)_pq = δ(p=q) Σ_k S^(s)_pk − S^(s)_pq,
A^(t)_pq = δ(p=q) Σ_k S^(t)_pk − S^(t)_pq,
where δ(p=q) takes the value 1 when p = q and 0 otherwise.
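Continuing the sketch started above, the backward pass of steps (4.2) and (4.3) for the loss-layer parameters could look as follows; the formulas for A^(s), A^(t) and ∂J/∂z mirror the reconstructions given above and are therefore assumptions consistent with the matrix form of the energy, not text quoted verbatim from the original disclosure.

```python
import numpy as np

def crf_loss_gradients(z, d, S_s, S_t, L, L_inv):
    """Gradients of J with respect to alpha_s, alpha_t and z (the latter feeds back-propagation into W)."""
    # A^(s) = dL/d alpha_s : diagonal of the row sums of S_s minus S_s itself (same pattern for A^(t))
    A_s = np.diag(S_s.sum(axis=1)) - S_s
    A_t = np.diag(S_t.sum(axis=1)) - S_t

    def dJ_dalpha(A):
        return (d @ A @ d
                - z @ L_inv @ A @ L_inv @ z
                - 0.5 * np.trace(L_inv @ A))

    grad_alpha_s = dJ_dalpha(A_s)
    grad_alpha_t = dJ_dalpha(A_t)
    grad_z = 2.0 * (L_inv @ z - d)     # chained with dz/dW by ordinary back-propagation
    return grad_alpha_s, grad_alpha_t, grad_z
```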
In step 5), the method for recovering an RGB image sequence of unknown depth is specifically as follows:
(5.1) Perform spatiotemporally consistent superpixel segmentation on the RGB image sequence according to the method of step 2), and compute the spatial similarity matrix S^(s) and the temporal similarity matrix S^(t);
(5.2) Forward-propagate the RGB image sequence through the trained convolutional neural network to obtain the single-superpixel network output z;
(5.3) The depth output under the spatiotemporal-consistency constraints, d̂, is computed by the following formula:
d̂ = L^(-1) z,
where the matrix L is computed by the method described in step (3.3), and d̂_p denotes the depth value of the p-th superpixel of the RGB image sequence;
(5.4) Assign each d̂_p to the corresponding position of that superpixel in the corresponding frame to obtain the depth maps of the m frames.
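Since minimizing the quadratic energy E(d, I) with respect to d gives the closed form d̂ = L^(-1) z, the inference of step (5.3) reduces to a single linear solve per sequence. The sketch below assumes the same inputs as the sketches above and is illustrative only.

```python
import numpy as np

def infer_depths(z, S_s, S_t, alpha_s, alpha_t):
    """Step (5.3): spatiotemporally smoothed superpixel depths for one RGB sequence."""
    n = z.shape[0]
    M = alpha_s * S_s + alpha_t * S_t
    L = np.eye(n) + np.diag(M.sum(axis=1)) - M
    d_hat = np.linalg.solve(L, z)      # equivalent to L^{-1} z, numerically preferable
    return d_hat                       # step (5.4): scatter d_hat[p] back to the pixels of superpixel p
```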
Specific embodiment: the present invention was compared with several other existing methods on the public dataset NYU Depth v2 and on the inventors' own dataset LYB 3D-TV. The NYU Depth v2 dataset consists of 795 training scenes and 654 test scenes, each containing 30 consecutive RGB frames and their corresponding depth maps. The LYB 3D-TV database is taken from scenes of the TV series 《琅琊榜》; we selected 5124 frames from 60 scenes together with their manually annotated depth maps as the training set, and 1278 frames from 20 scenes together with their manually annotated depth maps as the test set. We compared the depth recovery accuracy of the proposed method with the following methods:
1. Depth transfer: Karsch, Kevin, Ce Liu, and Sing Bing Kang. "Depth transfer: Depth extraction from video using non-parametric sampling." IEEE Transactions on Pattern Analysis and Machine Intelligence 36.11 (2014): 2144-2158.
2. Discrete-continuous CRF: Liu, Miaomiao, Mathieu Salzmann, and Xuming He. "Discrete-continuous depth estimation from a single image." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
3. Multi-scale CNN: Eigen, David, Christian Puhrsch, and Rob Fergus. "Depth map prediction from a single image using a multi-scale deep network." Advances in Neural Information Processing Systems. 2014.
4. 2D-DCNF: Liu, Fayao, et al. "Learning depth from single monocular images using deep convolutional neural fields." IEEE Transactions on Pattern Analysis and Machine Intelligence.
The results show that the accuracy of our method improves on the compared methods, and the inter-frame jumps in the recovered depth maps are clearly reduced.
Table 1: Comparison of depth recovery accuracy on the NYU Depth v2 dataset
[table provided as an image in the original document; numerical values not reproduced here]
Table 2: Comparison of depth recovery accuracy on the LYB 3D-TV dataset
[table provided as an image in the original document; numerical values not reproduced here]

Claims (5)

  1. A method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network, characterized by comprising the following steps:
    1) Collecting a training set. Each training sample of the training set is a continuous RGB image sequence of m frames together with its corresponding depth map sequence;
    2) Performing spatiotemporally consistent superpixel segmentation on each image sequence in the training set, and constructing the spatial similarity matrix S^(s) and the temporal similarity matrix S^(t);
    3) Constructing a convolutional neural network consisting of a single-superpixel depth regression network with parameters W and a spatiotemporal-consistency conditional random field loss layer with parameters α;
    4) Training the convolutional neural network constructed in step 3) with the RGB image sequences and depth map sequences of the training set to obtain the network parameters W and α;
    5) For an RGB image sequence of unknown depth, recovering the depth map sequence by forward propagation through the trained network.
  2. The method for generating spatiotemporally consistent depth map sequences according to claim 1, characterized in that step 2) is specifically:
    (2.1) Performing spatiotemporally consistent superpixel segmentation on each continuous RGB image sequence in the training set. Denote the input sequence by I = [I_1, ..., I_m], where I_t is the t-th RGB frame and there are m frames in total. The spatiotemporally consistent superpixel segmentation partitions the m frames into n_1, ..., n_m superpixels respectively, and also produces, for every superpixel in a frame, its correspondence to the superpixel of the same object in the previous frame. The whole image sequence therefore contains n = Σ_{t=1}^{m} n_t superpixels. For each superpixel p, the ground-truth depth at its centroid is denoted d_p, and the ground-truth depth vector of the n superpixels is defined as d = [d_1; ...; d_n].
    (2.2) Building the spatial-consistency similarity matrix S^(s) of these n superpixels as follows: S^(s) is an n × n matrix whose entry S^(s)_pq describes the intra-frame similarity between the p-th and the q-th superpixel:
    S^(s)_pq = exp(−γ ||c_p − c_q||^2) if p and q are adjacent superpixels within the same frame, and S^(s)_pq = 0 otherwise,
    where c_p and c_q are the color histogram features of superpixels p and q respectively, and γ is a manually set parameter, which can be set to the median of ||c_p − c_q||^2 over all pairs of adjacent superpixels.
    (2.3) Building the temporal-consistency similarity matrix S^(t) of these n superpixels as follows: S^(t) is an n × n matrix whose entry S^(t)_pq describes the inter-frame similarity between the p-th and the q-th superpixel:
    S^(t)_pq = 1 if superpixels p and q belong to adjacent frames and correspond to the same object, and S^(t)_pq = 0 otherwise,
    where the correspondence between superpixels of adjacent frames is given by the spatiotemporally consistent superpixel segmentation of step (2.1).
  3. The method for generating spatiotemporally consistent depth map sequences according to claim 2, characterized in that the convolutional neural network constructed in step 3) consists of two parts, a single-superpixel depth regression network and a spatiotemporal-consistency conditional random field loss layer:
    (3.1) The single-superpixel depth regression network consists of the first 31 layers of the VGG16 network, one superpixel pooling layer, and three fully connected layers, where the superpixel pooling layer average-pools the features within the spatial extent of each superpixel. The input of the network is a continuous RGB image sequence of m frames, and the output is an n-dimensional vector z = [z_1, ..., z_n], in which the p-th element z_p is the depth estimate, without considering any constraints, of the p-th superpixel obtained by the spatiotemporally consistent superpixel segmentation of the sequence. The parameters of this convolutional neural network to be learned are denoted W.
    (3.2) The inputs of the spatiotemporal-consistency conditional random field loss layer are the output z = [z_1, ..., z_n] of the single-superpixel regression network of step (3.1), the ground-truth superpixel depth vector d = [d_1; ...; d_n] defined in step (2.1), and the spatial-consistency similarity matrix S^(s) and temporal-consistency similarity matrix S^(t) obtained in steps (2.2) and (2.3). The loss function is defined as:
    J = −log P(d | I) = d^T L d − 2 z^T d + z^T L^(-1) z − (1/2) log|L| + (n/2) log π,
    where L^(-1) denotes the inverse matrix of L, and:
    L = I_n + D − M,
    M = α^(s) S^(s) + α^(t) S^(t),
    where S^(s) and S^(t) are the spatial and temporal similarity matrices obtained in steps (2.2) and (2.3), α^(s) and α^(t) are the two parameters to be learned, I_n is the n × n identity matrix, and D is a diagonal matrix with D_pp = Σ_q M_pq.
  4. The method for generating spatiotemporally consistent depth map sequences according to claim 3, characterized in that the training process of the convolutional neural network in step 4) is specifically:
    (4.1) The network parameters W, α^(s) and α^(t) are optimized by stochastic gradient descent; in each iteration the parameters are updated as follows:
    W ← W − lr · ∂J/∂W,
    α^(s) ← α^(s) − lr · ∂J/∂α^(s),
    α^(t) ← α^(t) − lr · ∂J/∂α^(t),
    where lr is the learning rate.
    (4.2) The partial derivative of the loss function J with respect to the parameters W is computed by the following formula:
    ∂J/∂W = (∂J/∂z) · (∂z/∂W) = 2 (L^(-1) z − d)^T · ∂z/∂W,
    where ∂z/∂W is obtained layer by layer by back-propagation through the convolutional neural network.
    (4.3) The partial derivatives of the loss function J with respect to the parameters α^(s) and α^(t), ∂J/∂α^(s) and ∂J/∂α^(t), are computed by the following formulas:
    ∂J/∂α^(s) = d^T A^(s) d − z^T L^(-1) A^(s) L^(-1) z − (1/2) Tr(L^(-1) A^(s)),
    ∂J/∂α^(t) = d^T A^(t) d − z^T L^(-1) A^(t) L^(-1) z − (1/2) Tr(L^(-1) A^(t)),
    where Tr(·) is the operation of taking the trace of a matrix, and the matrices A^(s) and A^(t) are the partial derivatives of the matrix L with respect to α^(s) and α^(t), computed by the following formulas:
    A^(s)_pq = δ(p=q) Σ_k S^(s)_pk − S^(s)_pq,
    A^(t)_pq = δ(p=q) Σ_k S^(t)_pk − S^(t)_pq,
    where δ(p=q) takes the value 1 when p = q and 0 otherwise.
  5. The method for generating spatiotemporally consistent depth map sequences according to claim 4, characterized in that in step 5) the method for recovering an RGB image sequence of unknown depth is specifically:
    (5.1) Performing spatiotemporally consistent superpixel segmentation on the RGB image sequence, and computing the spatial similarity matrix S^(s) and the temporal similarity matrix S^(t);
    (5.2) Forward-propagating the RGB image sequence through the trained convolutional neural network to obtain the single-superpixel network output z;
    (5.3) The depth output under the spatiotemporal-consistency constraints, d̂, is computed by the following formula:
    d̂ = L^(-1) z,
    where the matrix L is computed by the method described in step (3.2), and d̂_p denotes the estimated depth value of the p-th superpixel of the RGB image sequence;
    (5.4) Assigning each d̂_p to the corresponding position of that superpixel in the corresponding frame to obtain the depth maps of the m frames.
PCT/CN2016/112811 2016-12-29 2016-12-29 Method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network WO2018119807A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/067,819 US10540590B2 (en) 2016-12-29 2016-12-29 Method for generating spatial-temporally consistent depth map sequences based on convolution neural networks
PCT/CN2016/112811 WO2018119807A1 (zh) 2016-12-29 2016-12-29 Method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/112811 WO2018119807A1 (zh) 2016-12-29 2016-12-29 Method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network

Publications (1)

Publication Number Publication Date
WO2018119807A1 true WO2018119807A1 (zh) 2018-07-05

Family

ID=62710231

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/112811 WO2018119807A1 (zh) 2016-12-29 2016-12-29 Method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network

Country Status (2)

Country Link
US (1) US10540590B2 (zh)
WO (1) WO2018119807A1 (zh)


Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3428746B1 (en) * 2017-07-14 2021-06-02 Siemens Aktiengesellschaft A method and apparatus for providing an adaptive self-learning control program for deployment on a target field device
US10839543B2 (en) * 2019-02-26 2020-11-17 Baidu Usa Llc Systems and methods for depth estimation using convolutional spatial propagation networks
CN110059384B (zh) * 2019-04-09 2023-01-06 同济大学 基于生成式对抗网络模拟人群跳跃荷载的方法
US11288534B2 (en) * 2019-05-23 2022-03-29 Samsung Sds Co., Ltd. Apparatus and method for image processing for machine learning
US11763433B2 (en) * 2019-11-14 2023-09-19 Samsung Electronics Co., Ltd. Depth image generation method and device
CN111091597B (zh) * 2019-11-18 2020-11-13 贝壳找房(北京)科技有限公司 确定图像位姿变换的方法、装置及存储介质
KR102262832B1 (ko) * 2019-11-29 2021-06-08 연세대학교 산학협력단 단안 비디오 영상의 깊이 추정 방법 및 장치
CN111539983B (zh) * 2020-04-15 2023-10-20 上海交通大学 基于深度图像的运动物体分割方法及系统
TWI748426B (zh) * 2020-04-27 2021-12-01 國立成功大學 單視角影像深度圖序列生成方法、系統與電腦程式產品
CN111639787B (zh) * 2020-04-28 2024-03-15 北京工商大学 一种基于图卷积网络的时空数据预测方法
US11276249B2 (en) * 2020-05-14 2022-03-15 International Business Machines Corporation Method and system for video action classification by mixing 2D and 3D features
CN111752144B (zh) * 2020-05-18 2023-06-06 首都经济贸易大学 循环涟漪纹波预测方法与智能控制系统
US11531842B2 (en) 2020-05-20 2022-12-20 Toyota Research Institute, Inc. Invertible depth network for image reconstruction and domain transfers
EP3923183A1 (en) * 2020-06-11 2021-12-15 Tata Consultancy Services Limited Method and system for video analysis
KR20220013071A (ko) * 2020-07-24 2022-02-04 에스케이하이닉스 주식회사 깊이 맵 생성 장치
CN112150531B (zh) * 2020-09-29 2022-12-09 西北工业大学 一种鲁棒的自监督学习单帧图像深度估计方法
CN112270651B (zh) * 2020-10-15 2023-12-15 西安工程大学 一种基于多尺度判别生成对抗网络的图像修复方法
US11736748B2 (en) * 2020-12-16 2023-08-22 Tencent America LLC Reference of neural network model for adaptation of 2D video for streaming to heterogeneous client end-points
US20220292116A1 (en) * 2021-03-09 2022-09-15 Disney Enterprises, Inc. Constrained Multi-Label Dataset Partitioning for Automated Machine Learning
US20220388162A1 (en) * 2021-06-08 2022-12-08 Fanuc Corporation Grasp learning using modularized neural networks
US11809521B2 (en) * 2021-06-08 2023-11-07 Fanuc Corporation Network modularization to learn high dimensional robot tasks
CN113705349B (zh) * 2021-07-26 2023-06-06 电子科技大学 一种基于视线估计神经网络的注意力量化分析方法及系统
CN113643212B (zh) * 2021-08-27 2024-04-05 复旦大学 一种基于图神经网络的深度图降噪方法
CN113673483B (zh) * 2021-09-07 2023-07-14 天津大学 一种基于深度神经网络的多视角多目标关联方法
CN114518182B (zh) * 2022-03-02 2024-03-22 华北电力大学(保定) 布里渊散射谱图像中温度和应变信息同时提取方法及系统
CN117173169B (zh) * 2023-11-02 2024-02-06 泰安金冠宏食品科技有限公司 基于图像处理的预制菜分选方法


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201432622A (zh) * 2012-11-07 2014-08-16 Koninkl Philips Nv 產生一關於一影像之深度圖
US10706348B2 (en) * 2016-07-13 2020-07-07 Google Llc Superpixel methods for convolutional neural networks
US10462445B2 (en) * 2016-07-19 2019-10-29 Fotonation Limited Systems and methods for estimating and refining depth maps

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101945299A (zh) * 2010-07-09 2011-01-12 清华大学 Dynamic scene depth recovery method based on an array of capture devices
CN105095862A (zh) * 2015-07-10 2015-11-25 南开大学 Human action recognition method based on deep convolutional conditional random fields
CN105657402A (zh) * 2016-01-18 2016-06-08 深圳市未来媒体技术研究院 Depth map recovery method
CN106157307A (zh) * 2016-06-27 2016-11-23 浙江工商大学 Monocular image depth estimation method based on multi-scale CNN and continuous CRF

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
30 September 2016 (2016-09-30) *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210125029A1 (en) * 2016-07-13 2021-04-29 Google Llc Superpixel methods for convolutional neural networks
CN109523560A (zh) * 2018-11-09 2019-03-26 成都大学 一种基于深度学习的三维图像分割方法
CN111401106B (zh) * 2019-01-02 2023-03-31 中国移动通信有限公司研究院 一种行为识别方法、装置及设备
CN111401106A (zh) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 一种行为识别方法、装置及设备
CN111488879A (zh) * 2019-01-25 2020-08-04 斯特拉德视觉公司 利用双嵌入构成的用于提高分割性能的方法及装置
CN111488879B (zh) * 2019-01-25 2023-10-10 斯特拉德视觉公司 利用双嵌入构成的用于提高分割性能的方法及装置
TWI770432B (zh) * 2019-02-15 2022-07-11 大陸商北京市商湯科技開發有限公司 圖像復原方法、電子設備、儲存介質
CN110113593A (zh) * 2019-06-11 2019-08-09 南开大学 基于卷积神经网络的宽基线多视点视频合成方法
CN110634142A (zh) * 2019-08-20 2019-12-31 长安大学 一种复杂车路图像边界优化方法
CN110634142B (zh) * 2019-08-20 2024-02-02 长安大学 一种复杂车路图像边界优化方法
CN112784859A (zh) * 2019-11-01 2021-05-11 南京原觉信息科技有限公司 一种基于矩阵的图像聚类方法
CN111210415A (zh) * 2020-01-06 2020-05-29 浙江大学 一种帕金森患者面部表情低迷的检测方法
CN111210415B (zh) * 2020-01-06 2022-08-23 浙江大学 一种帕金森患者面部表情低迷的检测方法
CN111325797A (zh) * 2020-03-03 2020-06-23 华东理工大学 一种基于自监督学习的位姿估计方法
CN111325797B (zh) * 2020-03-03 2023-07-25 华东理工大学 一种基于自监督学习的位姿估计方法
CN112164009A (zh) * 2020-09-30 2021-01-01 西安交通大学 基于两层全连接条件随机场模型的深度图结构修复方法
CN112836823A (zh) * 2021-03-02 2021-05-25 东南大学 基于循环重组和分块的卷积神经网络反向传播映射方法
CN112836823B (zh) * 2021-03-02 2024-03-05 东南大学 基于循环重组和分块的卷积神经网络反向传播映射方法
CN113239614A (zh) * 2021-04-22 2021-08-10 西北工业大学 一种大气湍流相位时空预估算法
CN115599598A (zh) * 2022-10-08 2023-01-13 国网江苏省电力有限公司南通供电分公司(Cn) 一种电力负荷传感数据恢复方法和装置
CN115599598B (zh) * 2022-10-08 2023-08-15 国网江苏省电力有限公司南通供电分公司 一种电力负荷传感数据恢复方法和装置

Also Published As

Publication number Publication date
US20190332942A1 (en) 2019-10-31
US10540590B2 (en) 2020-01-21

Similar Documents

Publication Publication Date Title
WO2018119807A1 (zh) Method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network
CN106612427B (zh) Method for generating spatiotemporally consistent depth map sequences based on a convolutional neural network
Zou et al. Df-net: Unsupervised joint learning of depth and flow using cross-task consistency
US10719939B2 (en) Real-time mobile device capture and generation of AR/VR content
US10726560B2 (en) Real-time mobile device capture and generation of art-styled AR/VR content
US8953874B2 (en) Conversion of monoscopic visual content using image-depth database
US9661307B1 (en) Depth map generation using motion cues for conversion of monoscopic visual content to stereoscopic 3D
CN104065946B (zh) Hole filling method based on image sequences
Pan et al. 3D video disparity scaling for preference and prevention of discomfort
Lu et al. A survey on multiview video synthesis and editing
Cho et al. Event-image fusion stereo using cross-modality feature propagation
Wang et al. Example-based video stereolization with foreground segmentation and depth propagation
CN111652922A (zh) Monocular video depth estimation method and system based on binocular vision
Li et al. Graph-based saliency fusion with superpixel-level belief propagation for 3D fixation prediction
Zhang et al. Stereoscopic learning for disparity estimation
Lin et al. NightRain: Nighttime Video Deraining via Adaptive-Rain-Removal and Adaptive-Correction
Patil et al. Review on 2D-to-3D image and video conversion methods
Chen et al. 2D-to-3D conversion system using depth map enhancement
Chen et al. Automatic 2d-to-3d video conversion using 3d densely connected convolutional networks
Hong et al. Object-based error concealment in 3D video
Priya et al. 3d Image Generation from Single 2d Image using Monocular Depth Cues
Zhang et al. 3D video conversion system based on depth information extraction
Zhang et al. Temporal3D: 2D-to-3D Video Conversion Network with Multi-frame Fusion
Bae et al. Efficient and scalable view generation from a single image using fully convolutional networks
Wang et al. A novel depth propagation algorithm with color guided motion estimation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16925551

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16925551

Country of ref document: EP

Kind code of ref document: A1