CN106612427A - Method for generating spatial-temporal consistency depth map sequence based on convolution neural network - Google Patents
Method for generating spatial-temporal consistency depth map sequence based on convolution neural network
- Publication number
- CN106612427A CN106612427A CN201611244732.0A CN201611244732A CN106612427A CN 106612427 A CN106612427 A CN 106612427A CN 201611244732 A CN201611244732 A CN 201611244732A CN 106612427 A CN106612427 A CN 106612427A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/122—Improving the 3D impression of stereoscopic images by modifying image signal contents, e.g. by filtering or adding monoscopic depth cues
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/128—Adjusting depth or disparity
Abstract
The invention discloses a method for generating a spatio-temporally consistent depth map sequence based on a convolutional neural network, applicable to 2D-to-3D conversion of film and television works. The method comprises the following steps: (1) collect a training set in which each training sample consists of a continuous RGB image sequence and its corresponding depth map sequence; (2) perform spatio-temporally consistent superpixel segmentation on each image sequence in the training set, and construct a spatial similarity matrix and a temporal similarity matrix; (3) construct a convolutional neural network composed of a single-superpixel depth regression network and a spatio-temporal consistency conditional random field (CRF) loss layer; (4) train the convolutional neural network; (5) for an RGB image sequence of unknown depth, recover the corresponding depth map sequence by forward propagation through the trained network. The method avoids both the over-reliance on scene assumptions of cue-based depth recovery methods and the inter-frame discontinuity of depth maps produced by existing CNN-based depth recovery methods.
Description
Technical field
The present invention relates to the field of computer vision for stereoscopic video, and in particular to a method for generating a spatio-temporally consistent depth map sequence based on a convolutional neural network.
Background technology
The basic principle of stereoscopic video is to superimpose and play two images with horizontal parallax; through 3D glasses the viewer's left and right eyes each see one of the two pictures, producing a perception of depth. Stereoscopic video offers an immersive three-dimensional viewing experience and is very popular. However, as 3D display hardware becomes more widespread, the shortage of 3D content grows with it. Shooting directly with 3D cameras is expensive and post-production is difficult, so it is generally feasible only for big-budget films. 2D-to-3D conversion of film and television works is therefore an effective way to alleviate the shortage of 3D sources: it can greatly expand the range and quantity of stereoscopic films, and can also bring classic works back to the screen.
Because the horizontal parallax in stereoscopic video is directly related to the depth of each pixel, obtaining a depth map for each video frame is the key step of 2D-to-3D conversion. Depth maps can be produced by manually segmenting each frame and assigning depth values, but this is very expensive. There are also semi-automatic depth map generation methods, in which depth maps of some key frames are drawn manually and then propagated by an algorithm to the other, neighboring frames. Although such methods save some time, converting film and television works at scale still requires a heavy amount of manual work.
By comparison, fully automatic depth recovery methods can save labor costs to the greatest extent. Some algorithms recover depth maps with specific rules from depth cues such as motion, focus, occlusion, or shading, but they are generally effective only for particular scenes. For example, structure-from-motion methods can recover the depth of a static scene shot by a moving camera, using the cue that between adjacent frames distant objects have small relative displacement while nearby objects have large relative displacement; but such methods fail when the subject moves or the camera is static. Focus-based depth recovery methods can recover depth for images with a shallow depth of field, but perform poorly for a large depth of field. Film and television works usually contain a wide variety of scenes, so cue-based depth recovery methods are difficult to apply universally.
Convolutional neural networks (CNNs) are deep neural networks particularly suited to images. They are built by stacking basic units such as convolutional, activation, pooling, and fully connected layers, and can approximate complex functions from an image input x to a specific output y; they have become dominant in machine vision problems such as image classification and image segmentation. In the past year or two, some methods have applied CNNs to depth recovery, learning the mapping from RGB image input to depth map output from large amounts of data. CNN-based depth recovery does not rely on particular scene assumptions, generalizes well, and achieves high accuracy, so it has great potential in 2D-to-3D conversion of film and television works. However, existing methods all optimize over single images when training the network, ignoring the continuity between frames. When applied to recover the depth of an image sequence, the depth maps recovered for adjacent frames exhibit obvious jumps, which cause the synthesized virtual views to flicker and seriously degrade the viewing experience. Moreover, inter-frame continuity itself provides important cues for depth recovery, which existing methods simply ignore.
The content of the invention
The object of the present invention is to address the deficiencies of the prior art by providing a method for generating a spatio-temporally consistent depth map sequence based on a convolutional neural network. The temporal continuity of RGB images and depth maps is introduced into the network, and multiple frames are optimized jointly during training, so as to generate depth maps that are continuous in the time domain and to improve the accuracy of depth recovery.

The object of the present invention is achieved by the following technical solution: a method for generating a spatio-temporally consistent depth map sequence based on a convolutional neural network, comprising the following steps:
1) Collect a training set. Each training sample consists of a continuous RGB image sequence of m frames and its corresponding depth map sequence.

2) Perform spatio-temporally consistent superpixel segmentation on each image sequence in the training set, and build a spatial similarity matrix S(s) and a temporal similarity matrix S(t).

3) Build a convolutional neural network composed of a single-superpixel depth regression network with parameters W and a spatio-temporal consistency conditional random field (CRF) loss layer with parameters α. The single-superpixel depth regression network regresses one depth value for each superpixel without considering any spatio-temporal consistency constraint; the CRF loss layer uses the temporal and spatial similarity matrices built in step 2) to constrain the output of the regression network, finally outputting estimated depth maps that are smooth in both the time and space domains.

4) Train the convolutional neural network with the RGB image sequences and depth map sequences in the training set, obtaining the network parameters W and α.

5) For an RGB image sequence of unknown depth, recover the depth map sequence by forward propagation through the trained network.
Further, step 2) is specifically as follows:

(2.1) Perform spatio-temporally consistent superpixel segmentation on each continuous RGB image sequence in the training set. Denote the input sequence by I = [I1, …, Im], where It is the t-th RGB frame and there are m frames in total. The segmentation divides the m frames into n1, …, nm superpixels respectively, and produces the correspondence between each superpixel in a frame and the superpixel of the same object in the previous frame. The whole image sequence thus contains n = Σt nt superpixels. For each superpixel p, the ground-truth depth value at its centroid is denoted dp, and the ground-truth depth vector of the n superpixels is d = [d1; …; dn].
(2.2) Build the spatial consistency similarity matrix S(s) of these n superpixels as follows: S(s) is an n × n matrix whose entry S(s)pq describes the within-frame similarity of the p-th and q-th superpixels:

S(s)pq = exp(−‖cp − cq‖² / γ) if p and q are adjacent in the same frame, and S(s)pq = 0 otherwise,

where cp and cq are the color histogram features of superpixels p and q, and γ is a manually set parameter, which may be set to the median of ‖cp − cq‖² over all adjacent superpixel pairs.
(2.3) Build the temporal consistency similarity matrix S(t) of these n superpixels as follows: S(t) is an n × n matrix whose entry S(t)pq describes the inter-frame similarity of the p-th and q-th superpixels:

S(t)pq = 1 if p and q are corresponding superpixels of the same object in adjacent frames, and S(t)pq = 0 otherwise,

where the correspondence between superpixels of adjacent frames is given by the spatio-temporally consistent superpixel segmentation in step (2.1).
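Purely as an illustration (not part of the original disclosure), steps (2.2) and (2.3) can be sketched in numpy. The Gaussian-kernel form of S(s) is a reconstruction of the formula lost in extraction, and the toy histograms, adjacency list, and function names are hypothetical:

```python
import numpy as np

def spatial_similarity(hists, adjacency):
    """S(s): Gaussian kernel on color-histogram distance for superpixel
    pairs adjacent within the same frame; 0 for all other pairs."""
    n = len(hists)
    d2 = {(p, q): float(np.sum((hists[p] - hists[q]) ** 2)) for p, q in adjacency}
    gamma = np.median(list(d2.values()))   # median of ||c_p - c_q||^2 over adjacent pairs
    S = np.zeros((n, n))
    for (p, q), v in d2.items():
        S[p, q] = S[q, p] = np.exp(-v / gamma)
    return S

def temporal_similarity(n, correspondences):
    """S(t): 1 for superpixel pairs matched as the same object in adjacent
    frames by the spatio-temporal segmentation; 0 otherwise."""
    S = np.zeros((n, n))
    for p, q in correspondences:
        S[p, q] = S[q, p] = 1.0
    return S

# toy sequence: 2 frames with 2 superpixels each (global ids 0..3)
hists = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
         np.array([1.0, 0.0]), np.array([0.0, 1.0])]
S_s = spatial_similarity(hists, adjacency=[(0, 1), (2, 3)])
S_t = temporal_similarity(4, correspondences=[(0, 2), (1, 3)])
```

Setting γ to the median distance keeps the kernel scale data-driven, so roughly half of the adjacent pairs receive a similarity above exp(−1).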
Further, the convolutional neural network built in step 3) consists of two parts, a single-superpixel depth regression network and a spatio-temporal consistency conditional random field loss layer:

(3.1) The single-superpixel depth regression network consists of the first 31 layers of the VGG16 network, 1 superpixel pooling layer, and 3 fully connected layers. The superpixel pooling layer average-pools the features within the spatial extent of each superpixel. The input of the network is the m continuous RGB frames, and the output is an n-dimensional vector z = [z1, …, zn], where the p-th element zp is the depth estimate, without any consistency constraint, of the p-th superpixel of the spatio-temporally segmented sequence. The parameters to be learned in this network are denoted W.
(3.2) The input of the spatio-temporal consistency conditional random field loss layer is the output z = [z1, …, zn] of the single-superpixel regression network of step (3.1), the ground-truth superpixel depth vector d = [d1; …; dn] defined in step (2.1), and the spatial consistency similarity matrix S(s) and temporal consistency similarity matrix S(t) from steps (2.2) and (2.3). The conditional probability function of the spatio-temporal consistency CRF is:

P(d|I) = exp(−E(d, I)) / Z(I),

where the energy function E(d, I) is defined as:

E(d, I) = Σp (dp − zp)² + (α(s)/2) Σp,q S(s)pq (dp − dq)² + (α(t)/2) Σp,q S(t)pq (dp − dq)².

The first term Σp (dp − zp)² is the gap between the single-superpixel predictions and the ground truth. The second term is the spatial consistency constraint: if superpixels p and q are adjacent in the same frame and similar in color (S(s)pq large), their depths should be similar. The third term is the temporal consistency constraint: if superpixels p and q correspond to the same object in two adjacent frames (S(t)pq = 1), their depths should be similar. The energy function can be written in matrix form as:

E(d, I) = dᵀLd − 2zᵀd + zᵀz

where:

L = 𝕀 + D − M,
M = α(s)S(s) + α(t)S(t),

S(s) and S(t) are the spatial and temporal similarity matrices from steps (2.2) and (2.3), α(s) and α(t) are two parameters to be learned, 𝕀 is the n × n identity matrix, and D is a diagonal matrix with Dpp = Σq Mpq. The partition function is:

Z(I) = ∫ exp(−E(d, I)) dd = π^(n/2) |L|^(−1/2) exp(zᵀL⁻¹z − zᵀz),

where L⁻¹ is the inverse of L and |L| is the determinant of L. Therefore the loss function can be defined as the negative logarithm of the conditional probability function:

J = −log P(d|I) = dᵀLd − 2zᵀd + zᵀL⁻¹z − (1/2) log|L| + (n/2) log π.
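A minimal numpy sketch of the loss layer's core algebra, under the matrix definitions above (function names and the toy matrices are hypothetical; the constant term (n/2) log π is included for completeness):

```python
import numpy as np

def build_L(S_s, S_t, alpha_s, alpha_t):
    """L = I + D - M, with M = alpha_s*S(s) + alpha_t*S(t) and D_pp = sum_q M_pq."""
    M = alpha_s * S_s + alpha_t * S_t
    return np.eye(len(M)) + np.diag(M.sum(axis=1)) - M

def crf_loss(d, z, L):
    """Negative log-likelihood of the Gaussian CRF:
    J = d'Ld - 2z'd + z'L^{-1}z - (1/2)log|L| + (n/2)log(pi)."""
    n = len(d)
    return (d @ L @ d - 2 * z @ d + z @ np.linalg.solve(L, z)
            - 0.5 * np.linalg.slogdet(L)[1] + 0.5 * n * np.log(np.pi))

# toy problem: 2 superpixels, spatially adjacent, no temporal link
S_s = np.array([[0.0, 1.0], [1.0, 0.0]])
S_t = np.zeros((2, 2))
L = build_L(S_s, S_t, alpha_s=0.5, alpha_t=0.5)
J = crf_loss(d=np.array([1.0, 1.0]), z=np.array([1.0, 1.0]), L=L)
```

Because L is the identity plus a graph Laplacian of the nonnegative matrix M, it is symmetric positive definite, so the Gaussian CRF is well defined and L⁻¹z always exists.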
Further, the training process of the convolutional neural network in step 4) is specifically:

(4.1) Optimize the network parameters W, α(s), and α(t) by stochastic gradient descent. In each iteration, the parameters are updated as:

W ← W − lr · ∂J/∂W,  α(s) ← α(s) − lr · ∂J/∂α(s),  α(t) ← α(t) − lr · ∂J/∂α(t),

where lr is the learning rate.

(4.2) The partial derivative of the loss function J with respect to W in step (4.1) is computed by:

∂J/∂W = (∂J/∂z)ᵀ ∂z/∂W,  with ∂J/∂z = 2(L⁻¹z − d),

where ∂z/∂W is computed layer by layer via backpropagation through the convolutional neural network.

(4.3) The partial derivatives ∂J/∂α(s) and ∂J/∂α(t) of the loss function J in step (4.2) are computed by:

∂J/∂α(s) = dᵀA(s)d − zᵀL⁻¹A(s)L⁻¹z − (1/2) Tr(L⁻¹A(s)),
∂J/∂α(t) = dᵀA(t)d − zᵀL⁻¹A(t)L⁻¹z − (1/2) Tr(L⁻¹A(t)),

where Tr(·) denotes the trace of a matrix, and A(s) and A(t) are the partial derivatives of the matrix L with respect to α(s) and α(t), computed by:

A(s)pq = δ(p=q) Σk S(s)pk − S(s)pq,  A(t)pq = δ(p=q) Σk S(t)pk − S(t)pq,

where δ(p=q) equals 1 when p = q and 0 otherwise.
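The analytic gradients of (4.2) and (4.3) can be checked numerically; a numpy sketch on a hypothetical 2-superpixel toy problem (function names are illustrative; the α-independent constant (n/2) log π is dropped since it does not affect the gradients):

```python
import numpy as np

def build_L(S_s, S_t, a_s, a_t):
    M = a_s * S_s + a_t * S_t                      # M = a(s)S(s) + a(t)S(t)
    return np.eye(len(M)) + np.diag(M.sum(1)) - M  # L = I + D - M

def loss(d, z, L):
    # J = d'Ld - 2z'd + z'L^{-1}z - (1/2)log|L|  (constant term dropped)
    return (d @ L @ d - 2 * z @ d + z @ np.linalg.solve(L, z)
            - 0.5 * np.linalg.slogdet(L)[1])

def grad_z(d, z, L):
    # dJ/dz = 2(L^{-1}z - d), fed into backpropagation for dJ/dW
    return 2.0 * (np.linalg.solve(L, z) - d)

def grad_alpha(d, z, L, S):
    # A = dL/da with A_pq = delta(p=q)*sum_k S_pk - S_pq;
    # dJ/da = d'Ad - z'L^{-1}AL^{-1}z - (1/2)Tr(L^{-1}A)
    A = np.diag(S.sum(1)) - S
    u = np.linalg.solve(L, z)
    return d @ A @ d - u @ A @ u - 0.5 * np.trace(np.linalg.solve(L, A))

# finite-difference check of dJ/da(s)
rng = np.random.default_rng(0)
S_s = np.array([[0.0, 1.0], [1.0, 0.0]])
S_t = np.array([[0.0, 1.0], [1.0, 0.0]])
d, z = rng.normal(size=2), rng.normal(size=2)
a_s, a_t, eps = 0.4, 0.3, 1e-6
g = grad_alpha(d, z, build_L(S_s, S_t, a_s, a_t), S_s)
num = (loss(d, z, build_L(S_s, S_t, a_s + eps, a_t))
       - loss(d, z, build_L(S_s, S_t, a_s - eps, a_t))) / (2 * eps)
```

The central difference agrees with the closed-form gradient to high precision, which is a standard sanity check before wiring the loss layer into backpropagation.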
Further, in step 5), the method for recovering the depth of an RGB image sequence of unknown depth is specifically:

(5.1) Perform spatio-temporally consistent superpixel segmentation on the RGB image sequence according to the method of step 2), and compute the spatial similarity matrix S(s) and the temporal similarity matrix S(t).

(5.2) Forward-propagate the RGB image sequence through the trained convolutional neural network to obtain the single-superpixel network output z.

(5.3) The depth output under the spatio-temporal consistency constraint, d̂, is computed by:

d̂ = L⁻¹z,

where the matrix L is computed by the method described in step (3.2); d̂p is the depth value of the p-th superpixel of the RGB image sequence.

(5.4) Assign each d̂p to the corresponding position of its superpixel in the corresponding frame, yielding the depth maps of the m frames.
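Inference steps (5.3) and (5.4) amount to one linear solve followed by painting superpixel depths back into frames; a numpy sketch under stated assumptions (identity L, i.e. no coupling, is used only to keep the toy example checkable):

```python
import numpy as np

def infer_depths(z, L):
    """MAP inference: E(d, I) = d'Ld - 2z'd + z'z is minimized at d_hat = L^{-1}z."""
    return np.linalg.solve(L, z)

def paint_depth_maps(d_hat, label_maps):
    """Write each superpixel depth back to its pixels, frame by frame.
    label_maps: one (H, W) array of global superpixel indices per frame."""
    return [d_hat[labels] for labels in label_maps]

# toy sequence: 2 frames, 2 superpixels per frame; identity L means the
# constrained output equals the raw regression output
L = np.eye(4)
z = np.array([0.3, 0.7, 0.4, 0.6])
d_hat = infer_depths(z, L)
frames = paint_depth_maps(d_hat, [np.array([[0, 1]]), np.array([[2, 3]])])
```

With a non-trivial L, the solve mixes the raw per-superpixel predictions across spatial neighbors and temporal correspondences, which is what smooths the output in both domains.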
The beneficial effects of the present invention are as follows:

First, compared with cue-based depth recovery methods, the present invention uses a convolutional neural network to learn the mapping from RGB images to depth maps, and does not rely on specific assumptions about the scene.

Second, compared with existing CNN-based depth recovery methods, which optimize only single frames, the present invention adds a spatio-temporal consistency constraint and jointly optimizes multiple frames by constructing a spatio-temporal consistency CRF loss layer; it can therefore output spatio-temporally consistent depth maps and avoid inter-frame jumps.

Third, compared with existing CNN-based depth recovery methods, the spatio-temporal consistency constraint added by the present invention also improves the accuracy of depth recovery.
The present invention was compared with other existing methods, such as Eigen, David, Christian Puhrsch, and Rob Fergus, "Depth map prediction from a single image using a multi-scale deep network," Advances in Neural Information Processing Systems, 2014, on the public dataset NYU Depth v2 and on LYB 3D-TV, a dataset built by the inventors. The results show that the proposed method significantly improves the temporal continuity of the recovered depth maps and also improves the accuracy of depth estimation.
Description of the drawings

Fig. 1 is a flow chart of an embodiment of the present invention;

Fig. 2 is a structure diagram of the convolutional neural network proposed by the present invention;

Fig. 3 is a structure diagram of the single-superpixel depth regression network;

Fig. 4 is a schematic diagram of the single-superpixel network acting on multiple frames.
Specific embodiment

The present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.

As shown in the embodiment flow chart of Fig. 1, the method of the present invention comprises the following steps:
1) Collect a training set. Each training sample consists of a continuous RGB image sequence of m frames and its corresponding depth map sequence.

2) Perform spatio-temporally consistent superpixel segmentation on each image sequence in the training set using the method proposed in Chang, Jason, et al., "A video representation using temporal superpixels," CVPR 2013, and build the spatial similarity matrix S(s) and the temporal similarity matrix S(t).

3) Build a convolutional neural network composed of a single-superpixel depth regression network with parameters W and a spatio-temporal consistency conditional random field loss layer with parameters α. The single-superpixel depth regression network regresses one depth value for each superpixel without considering any spatio-temporal consistency constraint; the CRF loss layer uses the temporal and spatial similarity matrices built in step 2) to constrain the output of the regression network, finally outputting estimated depth maps that are smooth in both the time and space domains.

4) Train the convolutional neural network with the RGB image sequences and depth map sequences in the training set, obtaining the network parameters W and α.

5) For an RGB image sequence of unknown depth, recover the depth map sequence by forward propagation through the trained network.
The implementation of step 2) is as follows:

(2.1) Perform spatio-temporally consistent superpixel segmentation on each continuous RGB image sequence in the training set using the method of Chang, Jason, et al., "A video representation using temporal superpixels," CVPR 2013. Denote the input sequence by I = [I1, …, Im], where It is the t-th RGB frame and there are m frames in total. The segmentation divides the m frames into n1, …, nm superpixels respectively, and produces the correspondence between each superpixel in a frame and the superpixel of the same object in the previous frame. The whole image sequence thus contains n = Σt nt superpixels. For each superpixel p, we denote the ground-truth depth value at its centroid by dp, and define the ground-truth depth vector of the n superpixels d = [d1; …; dn].
(2.2) Build the spatial consistency similarity matrix S(s) of these n superpixels as follows: S(s) is an n × n matrix whose entry S(s)pq describes the within-frame similarity of the p-th and q-th superpixels:

S(s)pq = exp(−‖cp − cq‖² / γ) if p and q are adjacent in the same frame, and S(s)pq = 0 otherwise,

where cp and cq are the color histogram features of superpixels p and q, and γ is a manually set parameter, which may be set to the median of ‖cp − cq‖² over all adjacent superpixel pairs.
(2.3) Build the temporal consistency similarity matrix S(t) of these n superpixels as follows: S(t) is an n × n matrix whose entry S(t)pq describes the inter-frame similarity of the p-th and q-th superpixels:

S(t)pq = 1 if p and q are corresponding superpixels of the same object in adjacent frames, and S(t)pq = 0 otherwise,

where the correspondence between superpixels of adjacent frames is given by the spatio-temporally consistent superpixel segmentation in step (2.1).
The implementation of step 3) is as follows:

(3.1) The convolutional neural network built by this method consists of two parts: a single-superpixel depth regression network and a spatio-temporal consistency conditional random field loss layer; the overall network structure is shown in Fig. 2.

(3.2) The single-superpixel depth regression network of step (3.1) consists of the first 31 layers of the VGG16 network proposed in Simonyan, Karen, and Andrew Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556 (2014), two convolutional layers, 1 superpixel pooling layer, and 3 fully connected layers; the network structure is shown in Fig. 3. The superpixel pooling layer average-pools the features within the spatial extent of each superpixel; the other convolutional, pooling, and activation layers are conventional CNN layers. For an input of m continuous RGB frames, the network first acts on each frame separately: for the t-th frame, containing nt superpixels, the network outputs an nt-dimensional vector zt representing the unconstrained depth regression output of each superpixel in that frame. The outputs of the m frames are then spliced into an n-dimensional vector z = [z1; …; zn], with n = Σt nt, representing the estimated depth regression values of all n superpixels in the sequence, as shown in Fig. 4. The parameters to be learned in this network are denoted W.
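The per-frame-then-splice behavior described above can be sketched as follows (hypothetical helper names; the regression network itself is stood in for by fixed toy vectors):

```python
import numpy as np

def splice_outputs(per_frame_outputs):
    """Concatenate per-frame regression outputs z_t (length n_t each) into
    one vector z of length n = sum_t n_t covering the whole sequence."""
    return np.concatenate(per_frame_outputs)

def global_ids(n_per_frame):
    """Map frame-local superpixel indices to global indices via offsets,
    so entries of z line up with rows/columns of S(s), S(t), and L."""
    offsets = np.concatenate([[0], np.cumsum(n_per_frame[:-1])])
    return [off + np.arange(n) for off, n in zip(offsets, n_per_frame)]

# toy stand-in: 3 frames with 2, 3, and 2 superpixels respectively
z_frames = [np.array([0.1, 0.2]), np.array([0.3, 0.4, 0.5]), np.array([0.6, 0.7])]
z = splice_outputs(z_frames)
ids = global_ids([2, 3, 2])
```

Keeping a consistent global indexing is what allows a single n × n CRF to couple superpixels both within and across frames.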
(3.3) The input of the spatio-temporal consistency conditional random field loss layer of step (3.1) is the output z = [z1, …, zn] of the single-superpixel regression network of step (3.2), the ground-truth superpixel depth vector d = [d1; …; dn] defined in step (2.1), and the spatial consistency similarity matrix S(s) and temporal consistency similarity matrix S(t) from steps (2.2) and (2.3). The conditional probability function of the spatio-temporal consistency CRF is:

P(d|I) = exp(−E(d, I)) / Z(I),

where the energy function E(d, I) is defined as:

E(d, I) = Σp (dp − zp)² + (α(s)/2) Σp,q S(s)pq (dp − dq)² + (α(t)/2) Σp,q S(t)pq (dp − dq)².

The first term Σp (dp − zp)² is the gap between the single-superpixel predictions and the ground truth. The second term is the spatial consistency constraint: if superpixels p and q are adjacent in the same frame and similar in color (S(s)pq large), their depths should be similar. The third term is the temporal consistency constraint: if superpixels p and q correspond to the same object in two adjacent frames (S(t)pq = 1), their depths should be similar. The energy function can be written in matrix form as:

E(d, I) = dᵀLd − 2zᵀd + zᵀz

where:

L = 𝕀 + D − M,
M = α(s)S(s) + α(t)S(t),

S(s) and S(t) are the spatial and temporal similarity matrices from steps (2.2) and (2.3), α(s) and α(t) are two parameters to be learned, 𝕀 is the n × n identity matrix, and D is a diagonal matrix with Dpp = Σq Mpq. The partition function is:

Z(I) = ∫ exp(−E(d, I)) dd = π^(n/2) |L|^(−1/2) exp(zᵀL⁻¹z − zᵀz),

where L⁻¹ is the inverse of L and |L| is the determinant of L. Therefore the loss function can be defined as the negative logarithm of the conditional probability function:

J = −log P(d|I) = dᵀLd − 2zᵀd + zᵀL⁻¹z − (1/2) log|L| + (n/2) log π.
The training process of the convolutional neural network in step 4) is specifically:

(4.1) Optimize the network parameters W, α(s), and α(t) by stochastic gradient descent. In each iteration, the parameters are updated as:

W ← W − lr · ∂J/∂W,  α(s) ← α(s) − lr · ∂J/∂α(s),  α(t) ← α(t) − lr · ∂J/∂α(t),

where lr is the learning rate.

(4.2) The partial derivative of the loss function J with respect to W in step (4.1) is computed by:

∂J/∂W = (∂J/∂z)ᵀ ∂z/∂W,  with ∂J/∂z = 2(L⁻¹z − d),

where ∂z/∂W is computed layer by layer via backpropagation through the convolutional neural network.

(4.3) The partial derivatives ∂J/∂α(s) and ∂J/∂α(t) of the loss function J in step (4.2) are computed by:

∂J/∂α(s) = dᵀA(s)d − zᵀL⁻¹A(s)L⁻¹z − (1/2) Tr(L⁻¹A(s)),
∂J/∂α(t) = dᵀA(t)d − zᵀL⁻¹A(t)L⁻¹z − (1/2) Tr(L⁻¹A(t)),

where Tr(·) denotes the trace of a matrix, and A(s) and A(t) are the partial derivatives of the matrix L with respect to α(s) and α(t), computed by:

A(s)pq = δ(p=q) Σk S(s)pk − S(s)pq,  A(t)pq = δ(p=q) Σk S(t)pk − S(t)pq,

where δ(p=q) equals 1 when p = q and 0 otherwise.
In step 5), the method for recovering the depth of an RGB image sequence of unknown depth is specifically:

(5.1) Perform spatio-temporally consistent superpixel segmentation on the RGB image sequence according to the method of step 2), and compute the spatial similarity matrix S(s) and the temporal similarity matrix S(t).

(5.2) Forward-propagate the RGB image sequence through the trained convolutional neural network to obtain the single-superpixel network output z.

(5.3) The depth output under the spatio-temporal consistency constraint, d̂, is computed by:

d̂ = L⁻¹z,

where the matrix L is computed by the method described in step (3.3); d̂p is the depth value of the p-th superpixel of the RGB image sequence.

(5.4) Assign each d̂p to the corresponding position of its superpixel in the corresponding frame, yielding the depth maps of the m frames.
Specific embodiment: the present invention was compared with other existing methods on the public dataset NYU Depth v2 and on LYB 3D-TV, a dataset built by the inventors. The NYU Depth v2 dataset consists of 795 training scenes and 654 test scenes, each containing 30 continuous RGB frames and their corresponding depth maps. The LYB 3D-TV database is taken from scenes of the TV series "Nirvana in Fire" (Langya Bang); we selected 5124 frames from 60 scenes with manually annotated depth maps as the training set, and 1278 frames from 20 scenes with manually annotated depth maps as the test set. We compared the proposed method with the following methods in depth recovery accuracy:
1. Depth transfer: Karsch, Kevin, Ce Liu, and Sing Bing Kang. "Depth transfer: Depth extraction from video using non-parametric sampling." IEEE Transactions on Pattern Analysis and Machine Intelligence 36.11 (2014): 2144-2158.

2. Discrete-continuous CRF: Liu, Miaomiao, Mathieu Salzmann, and Xuming He. "Discrete-continuous depth estimation from a single image." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.

3. Multi-scale CNN: Eigen, David, Christian Puhrsch, and Rob Fergus. "Depth map prediction from a single image using a multi-scale deep network." Advances in Neural Information Processing Systems, 2014.

4. 2D-DCNF: Liu, Fayao, et al. "Learning depth from single monocular images using deep convolutional neural fields." IEEE Transactions on Pattern Analysis and Machine Intelligence.
The results show that the accuracy of our method improves on the compared methods, and the inter-frame jitter of the recovered depth maps is significantly reduced.

Table 1: Depth recovery accuracy comparison on the NYU Depth v2 database

Table 2: Depth recovery accuracy comparison on the LYB 3D-TV database
Claims (5)
1. A method for generating a spatio-temporally consistent depth map sequence based on a convolutional neural network, characterized by comprising the following steps:

1) collecting a training set, each training sample of which consists of a continuous RGB image sequence of m frames and its corresponding depth map sequence;

2) performing spatio-temporally consistent superpixel segmentation on each image sequence in the training set, and building a spatial similarity matrix S(s) and a temporal similarity matrix S(t);

3) building a convolutional neural network composed of a single-superpixel depth regression network with parameters W and a spatio-temporal consistency conditional random field loss layer with parameters α;

4) training the convolutional neural network with the RGB image sequences and depth map sequences in the training set, obtaining the network parameters W and α;

5) for an RGB image sequence of unknown depth, recovering the depth map sequence by forward propagation through the trained network.
2. The method for generating a spatio-temporally consistent depth map sequence according to claim 1, characterized in that step 2) is specifically:

(2.1) performing spatio-temporally consistent superpixel segmentation on each continuous RGB image sequence in the training set; denoting the input sequence by I = [I1, …, Im], where It is the t-th RGB frame and there are m frames in total; the segmentation divides the m frames into n1, …, nm superpixels respectively, and produces the correspondence between each superpixel in a frame and the superpixel of the same object in the previous frame; the whole image sequence contains n = Σt nt superpixels; for each superpixel p, the ground-truth depth value at its centroid is denoted dp, and the ground-truth depth vector of the n superpixels is d = [d1; …; dn];

(2.2) building the spatial consistency similarity matrix S(s) of these n superpixels as follows: S(s) is an n × n matrix whose entry S(s)pq describes the within-frame similarity of the p-th and q-th superpixels:

S(s)pq = exp(−‖cp − cq‖² / γ) if p and q are adjacent in the same frame, and S(s)pq = 0 otherwise,

where cp and cq are the color histogram features of superpixels p and q, and γ is a manually set parameter, which may be set to the median of ‖cp − cq‖² over all adjacent superpixel pairs;

(2.3) building the temporal consistency similarity matrix S(t) of these n superpixels as follows: S(t) is an n × n matrix whose entry S(t)pq describes the inter-frame similarity of the p-th and q-th superpixels:

S(t)pq = 1 if p and q are corresponding superpixels of the same object in adjacent frames, and S(t)pq = 0 otherwise,

where the correspondence between superpixels of adjacent frames is given by the spatio-temporally consistent superpixel segmentation in step (2.1).
3. The method for generating a spatio-temporally consistent depth map sequence according to claim 2, characterized in that the convolutional neural network built in step 3) consists of two parts, a single-superpixel depth regression network and a spatio-temporal consistency conditional random field loss layer:

(3.1) the single-superpixel depth regression network consists of the first 31 layers of the VGG16 network, 1 superpixel pooling layer, and 3 fully connected layers; the superpixel pooling layer average-pools the features within the spatial extent of each superpixel; the input of the network is the m continuous RGB frames, and the output is an n-dimensional vector z = [z1, …, zn], where the p-th element zp is the depth estimate, without any consistency constraint, of the p-th superpixel of the spatio-temporally segmented sequence; the parameters to be learned in this network are denoted W;

(3.2) the input of the spatio-temporal consistency conditional random field loss layer is the output z = [z1, …, zn] of the single-superpixel regression network of step (3.1), the ground-truth superpixel depth vector d = [d1; …; dn] defined in step (2.1), and the spatial consistency similarity matrix S(s) and temporal consistency similarity matrix S(t) from steps (2.2) and (2.3); the loss function is defined as:

J = −log P(d|I) = dᵀLd − 2zᵀd + zᵀL⁻¹z − (1/2) log|L| + (n/2) log π,

where L⁻¹ is the inverse of L, and:

L = 𝕀 + D − M,
M = α(s)S(s) + α(t)S(t),

where S(s) and S(t) are the spatial and temporal similarity matrices from steps (2.2) and (2.3), α(s) and α(t) are two parameters to be learned, 𝕀 is the n × n identity matrix, and D is a diagonal matrix with Dpp = Σq Mpq.
4. The method for generating a space-time consistency depth map sequence according to claim 3, characterized in that the training process of the convolutional neural network in step 4) is specifically:
(4.1) The network parameters W, α(s) and α(t) are optimized by stochastic gradient descent. At each iteration, every parameter θ ∈ {W, α(s), α(t)} is updated as:
θ ← θ − lr·∂J/∂θ
where lr is the learning rate.
(4.2) The partial derivative of the loss function J with respect to the parameters W is computed as:
∂J/∂W = (∂J/∂z)(∂z/∂W), with ∂J/∂z = 2(L⁻¹z − d)ᵀ
where ∂z/∂W is computed layer by layer through the backpropagation of the convolutional neural network.
(4.3) The partial derivatives ∂J/∂α(s) and ∂J/∂α(t) of the loss function J with respect to the parameters α(s) and α(t) are computed as:
∂J/∂α(s) = dᵀA(s)d − zᵀL⁻¹A(s)L⁻¹z − (1/2)Tr(L⁻¹A(s))
∂J/∂α(t) = dᵀA(t)d − zᵀL⁻¹A(t)L⁻¹z − (1/2)Tr(L⁻¹A(t))
where Tr(·) is the operation of taking the trace of a matrix, and the matrices A(s) and A(t) are the partial derivatives of the matrix L with respect to α(s) and α(t), computed as:
A(s)pq = δ(p=q)·ΣkS(s)pk − S(s)pq
A(t)pq = δ(p=q)·ΣkS(t)pk − S(t)pq
where δ(p=q) equals 1 when p = q, and 0 otherwise.
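The trace-based gradients of step (4.3) can be sketched as follows. `dJ_dalpha` assumes the Gaussian-CRF loss J = dᵀLd − 2zᵀd + zᵀL⁻¹z − (1/2)log|L| + const, which is an inference from the trace terms rather than the patent's verbatim formula:

```python
import numpy as np

def dL_dalpha(S):
    """A = dL/dalpha for L = I + D - M:  A_pq = delta(p=q)*sum_k S_pk - S_pq."""
    return np.diag(S.sum(axis=1)) - S

def dJ_dalpha(z, d, L, A):
    """dJ/dalpha for the (assumed) Gaussian-CRF loss
    J = d'Ld - 2z'd + z'L^{-1}z - 0.5*log|L| + const."""
    Linv = np.linalg.inv(L)
    Lz = Linv @ z
    return d @ A @ d - Lz @ A @ Lz - 0.5 * np.trace(Linv @ A)

# Two super-pixels with unit similarity; rows of A sum to zero by construction
S = np.array([[0.0, 1.0], [1.0, 0.0]])
A = dL_dalpha(S)
```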
5. The method for generating a space-time consistency depth map sequence according to claim 4, characterized in that in step 5), the method for recovering the depth of an RGB image sequence of unknown depth is specifically:
(5.1) Perform space-time consistency super-pixel segmentation on the RGB image sequence, and compute the space similarity matrix S(s) and the time similarity matrix S(t);
(5.2) Perform forward propagation on the RGB image sequence with the trained convolutional neural network to obtain the single-super-pixel network output z;
(5.3) The depth output d̂ under the space-time consistency constraint is computed as:
d̂ = L⁻¹z
where the matrix L is computed by the method described in step (3.2), and d̂p represents the depth estimate of the p-th super-pixel of the RGB image sequence;
(5.4) Assigning each d̂p to the corresponding position of its super-pixel in the corresponding frame yields the depth maps of the m image frames.
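Steps (5.1)–(5.4) amount to one linear solve followed by a scatter of super-pixel depths back into pixel positions. A minimal sketch, assuming the matrix construction of step (3.2) and an integer super-pixel label map from the segmentation (function and argument names are illustrative):

```python
import numpy as np

def infer_depth(z, S_s, S_t, alpha_s, alpha_t, labels):
    """MAP inference d_hat = L^{-1} z, then paint each super-pixel's depth
    back into its pixels to obtain the m depth frames.

    `labels` is an m x H x W integer map of super-pixel ids.
    """
    M = alpha_s * S_s + alpha_t * S_t
    L = np.eye(len(z)) + np.diag(M.sum(axis=1)) - M
    d_hat = np.linalg.solve(L, z)       # step (5.3): solve L d = z
    return d_hat[labels]                # step (5.4): m x H x W depth maps

# 2 super-pixels on a single 1x2 frame; zero similarity leaves d_hat = z
depth = infer_depth(np.array([1.0, 2.0]), np.zeros((2, 2)),
                    np.zeros((2, 2)), 0.5, 0.5,
                    np.array([[[0, 1]]]))
```

Solving L d = z with `np.linalg.solve` rather than forming L⁻¹ explicitly is the standard numerically stable choice.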
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611244732.0A CN106612427B (en) | 2016-12-29 | 2016-12-29 | A kind of generation method of the space-time consistency depth map sequence based on convolutional neural networks |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106612427A true CN106612427A (en) | 2017-05-03 |
CN106612427B CN106612427B (en) | 2018-07-06 |
Family
ID=58636373
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611244732.0A Active CN106612427B (en) | 2016-12-29 | 2016-12-29 | A kind of generation method of the space-time consistency depth map sequence based on convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106612427B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107292846A (en) * | 2017-06-27 | 2017-10-24 | 南方医科大学 | The restoration methods of incomplete CT data for projection under a kind of circular orbit |
CN107992848A (en) * | 2017-12-19 | 2018-05-04 | 北京小米移动软件有限公司 | Obtain the method, apparatus and computer-readable recording medium of depth image |
CN108335322A (en) * | 2018-02-01 | 2018-07-27 | 深圳市商汤科技有限公司 | Depth estimation method and device, electronic equipment, program and medium |
CN108389226A (en) * | 2018-02-12 | 2018-08-10 | 北京工业大学 | A kind of unsupervised depth prediction approach based on convolutional neural networks and binocular parallax |
CN108596102A (en) * | 2018-04-26 | 2018-09-28 | 北京航空航天大学青岛研究院 | Indoor scene object segmentation grader building method based on RGB-D |
CN109215067A (en) * | 2017-07-03 | 2019-01-15 | 百度(美国)有限责任公司 | High-resolution 3-D point cloud is generated based on CNN and CRF model |
CN109657839A (en) * | 2018-11-22 | 2019-04-19 | 天津大学 | A kind of wind power forecasting method based on depth convolutional neural networks |
CN110163246A (en) * | 2019-04-08 | 2019-08-23 | 杭州电子科技大学 | The unsupervised depth estimation method of monocular light field image based on convolutional neural networks |
CN110782490A (en) * | 2019-09-24 | 2020-02-11 | 武汉大学 | Video depth map estimation method and device with space-time consistency |
CN111259782A (en) * | 2020-01-14 | 2020-06-09 | 北京大学 | Video behavior identification method based on mixed multi-scale time sequence separable convolution operation |
CN114596637A (en) * | 2022-03-23 | 2022-06-07 | 北京百度网讯科技有限公司 | Image sample data enhancement training method and device and electronic equipment |
US11423615B1 (en) * | 2018-05-29 | 2022-08-23 | HL Acquisition, Inc. | Techniques for producing three-dimensional models from one or more two-dimensional images |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102196292A (en) * | 2011-06-24 | 2011-09-21 | 清华大学 | Human-computer-interaction-based video depth map sequence generation method and system |
US20130177236A1 (en) * | 2012-01-10 | 2013-07-11 | Samsung Electronics Co., Ltd. | Method and apparatus for processing depth image |
CN103955942A (en) * | 2014-05-22 | 2014-07-30 | 哈尔滨工业大学 | SVM-based depth map extraction method of 2D image |
CN105359190A (en) * | 2013-09-05 | 2016-02-24 | 电子湾有限公司 | Estimating depth from a single image |
CN105657402A (en) * | 2016-01-18 | 2016-06-08 | 深圳市未来媒体技术研究院 | Depth map recovery method |
CN105979244A (en) * | 2016-05-31 | 2016-09-28 | 十二维度(北京)科技有限公司 | Method and system used for converting 2D image to 3D image based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN106612427B (en) | 2018-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106612427B (en) | A kind of generation method of the space-time consistency depth map sequence based on convolutional neural networks | |
US10540590B2 (en) | Method for generating spatial-temporally consistent depth map sequences based on convolution neural networks | |
Sun et al. | Pwc-net: Cnns for optical flow using pyramid, warping, and cost volume | |
Zou et al. | Df-net: Unsupervised joint learning of depth and flow using cross-task consistency | |
Zhou et al. | Moving indoor: Unsupervised video depth learning in challenging environments | |
Yue et al. | Image denoising by exploring external and internal correlations | |
CN109360156A (en) | Single image rain removing method based on the image block for generating confrontation network | |
CN102026013A (en) | Stereo video matching method based on affine transformation | |
Peng et al. | LVE-S2D: Low-light video enhancement from static to dynamic | |
Zhang et al. | Multiscale-vr: Multiscale gigapixel 3d panoramic videography for virtual reality | |
Li et al. | Enforcing temporal consistency in video depth estimation | |
CN107018400B (en) | It is a kind of by 2D Video Quality Metrics into the method for 3D videos | |
Meng et al. | Perception inspired deep neural networks for spectral snapshot compressive imaging | |
Cho et al. | Event-image fusion stereo using cross-modality feature propagation | |
Dong et al. | Cycle-CNN for colorization towards real monochrome-color camera systems | |
Guo et al. | Adaptive estimation of depth map for two-dimensional to three-dimensional stereoscopic conversion | |
Yeh et al. | An approach to automatic creation of cinemagraphs | |
Li et al. | Graph-based saliency fusion with superpixel-level belief propagation for 3D fixation prediction | |
Dong et al. | Pyramid convolutional network for colorization in monochrome-color multi-lens camera system | |
WO2022257184A1 (en) | Method for acquiring image generation apparatus, and image generation apparatus | |
CN106028018B (en) | Real scene shooting double vision point 3D method for optimizing video and system towards naked eye 3D display | |
Kim et al. | Light field angular super-resolution using convolutional neural network with residual network | |
Zhou et al. | 1st Place Solution of Egocentric 3D Hand Pose Estimation Challenge 2023 Technical Report: A Concise Pipeline for Egocentric Hand Pose Reconstruction | |
Lee et al. | Efficient Low Light Video Enhancement Based on Improved Retinex Algorithms | |
CN112200756A (en) | Intelligent bullet special effect short video generation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||