CN110111244A - Image conversion, depth map prediction and model training method, device and electronic equipment - Google Patents
- Publication number
- CN110111244A (application CN201910381527.6A)
- Authority
- CN
- China
- Prior art keywords
- prediction
- view
- network model
- right view
- depth map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/08—Projecting images onto non-planar surfaces, e.g. geodetic screens
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Embodiments of the present invention provide an image conversion method, a depth map prediction method, a model training method, corresponding apparatuses, and an electronic device. A two-dimensional (2D) image to be converted into a three-dimensional (3D) image is obtained; the 2D image, serving as a first monocular view for generating the 3D image, is input into a pre-trained depth map prediction network model; the depth map prediction network model is obtained by training an initial depth map prediction network model together with an initial camera parameter prediction network model on multiple different 3D film source samples; a first predicted depth map output by the depth map prediction network model is obtained; the first predicted depth map is processed based on the camera parameters of the 2D image, a preset camera imaging formula and a preset first sampling mode to obtain a second monocular view; and the 3D image is generated based on the first monocular view and the second monocular view. Thus, with embodiments of the present invention, depth prediction from a single 2D image can be realized and the 2D image converted into a 3D image.
Description
Technical field
The present invention relates to the technical field of converting 2D images into 3D images, and in particular to an image conversion method, a depth map prediction method, a model training method, corresponding apparatuses, and an electronic device.
Background art
At present, a 2D image can be converted into a 3D image by predicting a depth map. The training process of the depth map prediction network model is usually as follows: consecutive frames of 2D images from a single video source are input into an initial depth map prediction network model to obtain predicted depth maps; a loss value is computed between each predicted depth map and the corresponding ground-truth depth map; the network parameters of the depth map prediction network model are adjusted according to the loss function; and the trained depth map prediction network model is finally obtained. A 3D image is then generated from the depth map predicted by the trained model. Such a depth map prediction network model is obtained by training a single network model on a large number of consecutive 2D images together with the ground-truth depth map of each 2D image.
In the course of implementing the present invention, the inventors found at least the following problem in the prior art: prior-art depth map prediction requires ground-truth depth maps for supervision before the depth map prediction network model can be trained.
Summary of the invention
Embodiments of the present invention aim to provide an image conversion method, a depth map prediction method, a model training method, corresponding apparatuses, and an electronic device, so that a depth map prediction network model can be trained without supervision from ground-truth depth maps, and a 2D image can thereby be converted into a 3D image. The specific technical solutions are as follows.
In a first aspect, an embodiment of the present invention provides a method for converting a two-dimensional (2D) image into a three-dimensional (3D) image, the method comprising:
obtaining a 2D image to be converted into a 3D image;
inputting the 2D image, as a first monocular view for generating the 3D image, into a pre-trained depth map prediction network model, wherein the depth map prediction network model is obtained by training an initial depth map prediction network model and an initial camera parameter prediction network model on multiple different 3D film source samples, and the first monocular view is a left view or a right view;
obtaining a first predicted depth map output by the depth map prediction network model;
processing the first predicted depth map based on the camera parameters of the 2D image, a preset camera imaging formula and a preset first sampling mode to obtain a second monocular view, the second monocular view being a right view or a left view corresponding to the first monocular view; and
generating the 3D image based on the first monocular view and the second monocular view.
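As a rough sketch of the flow above, with `predict_depth` and `warp_to_right` as hypothetical stand-ins for the trained depth map prediction network model and the imaging-formula/sampling step (neither name comes from the patent), the two monocular views can be packed into a side-by-side 3D frame:

```python
import numpy as np

def convert_2d_to_3d(left_view, predict_depth, warp_to_right):
    """Sketch of the claimed pipeline: the 2D image is taken as the
    first monocular (left) view, a depth map is predicted for it, a
    second monocular (right) view is synthesized by warping, and the
    pair is packed side by side as a simple 3D frame."""
    depth = predict_depth(left_view)              # first predicted depth map
    right_view = warp_to_right(left_view, depth)  # second monocular view
    return np.concatenate([left_view, right_view], axis=1)

# Toy stand-ins: constant depth and a fixed 2-pixel horizontal disparity.
img = np.tile(np.arange(8.0), (4, 1))
sbs = convert_2d_to_3d(
    img,
    predict_depth=lambda im: np.ones_like(im),
    warp_to_right=lambda im, d: np.roll(im, -2, axis=1),
)
```

A real implementation would replace both lambdas with the trained network and the imaging-formula sampling procedure the patent describes.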
Optionally, the training process of the depth map prediction network model comprises:
obtaining multiple different 3D film sources shot by different cameras as training samples, wherein each training sample comprises a left view and a corresponding right view;
inputting the right views of a preset number of training samples into the initial depth map prediction network model to obtain a second predicted right depth map of each training sample output by the initial depth map prediction network model;
inputting the left views and right views of the preset number of training samples into the initial camera parameter prediction network model to obtain predicted camera parameters of each training sample output by the initial camera parameter prediction network model;
processing the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula and a preset second sampling mode, to obtain a second predicted right view of each training sample;
calculating a loss value according to a preset loss function, the true right view of each training sample and its corresponding second predicted right view;
judging, according to the loss value, whether the initial depth map prediction network model and the initial camera parameter prediction network model have both converged to stability;
if they have converged, incrementing the training count by one and judging whether a preset number of training iterations has been reached; if the preset number has not been reached, returning to the step of inputting the right views of the preset number of training samples into the initial depth map prediction network model to obtain the second predicted right depth map of each training sample; if the preset number has been reached, taking the current initial depth map prediction network model as the trained depth map prediction network model;
if they have not converged, incrementing the training count by one, adjusting the network parameters of the initial depth map prediction network model and of the initial camera parameter prediction network model, and returning to the step of inputting the right views of the preset number of training samples into the depth map prediction network model to be trained to obtain the second predicted right depth map of each training sample.
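The looped training procedure above can be illustrated with a deliberately tiny 1-D stand-in: a single scalar "disparity" parameter plays the role of the two networks' outputs, the right view is re-synthesized from the left view by sampling, and a finite-difference gradient step plays the role of back-propagation. The sinusoidal test signal and all names here are illustrative assumptions, not the patent's architecture:

```python
import numpy as np

x = np.arange(64.0)
left = np.sin(2 * np.pi * x / 64)                 # toy 1-D "left view"
true_disp = 3.0
right = np.sin(2 * np.pi * (x + true_disp) / 64)  # toy ground-truth "right view"

def synthesize_right(disp):
    # Fractional shift by linear interpolation: the "sampling mode" step.
    return np.interp((x + disp) % 64, x, left, period=64)

def photometric_l1(disp):
    # Loss between the true right view and the re-synthesized right view.
    return np.abs(synthesize_right(disp) - right).mean()

disp, lr = 0.0, 2.0                    # scalar "network parameter" + step size
train_count, max_iters = 0, 200        # preset number of training iterations
prev_loss = np.inf
while train_count < max_iters:
    train_count += 1
    cur = photometric_l1(disp)
    if abs(prev_loss - cur) < 1e-10:   # "converged to stability" check
        break
    prev_loss = cur
    eps = 1e-4                          # finite difference standing in for backprop
    grad = (photometric_l1(disp + eps) - photometric_l1(disp - eps)) / (2 * eps)
    disp -= lr * grad                   # "adjust the network parameters"
```

After the loop, `disp` sits near the true disparity of 3: the right view alone supervises the parameter, with no ground-truth depth, which is the point of the patent's training scheme.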
Optionally, the predicted camera parameters include predicted camera intrinsics and a predicted rotation-translation parameter.
The preset camera imaging formula is:
Ps ~ K · Tt→s · Dt(Pt) · K⁻¹ · Pt
where ~ denotes a mapping operation; Ps is the coordinate of a binocular image reference point in the left view; Pt is the coordinate of the binocular image reference point in the right view; K is the camera intrinsic matrix and K⁻¹ its inverse; Dt(Pt) is the depth at point Pt; and Tt→s is the rotation-translation matrix. The camera intrinsic matrix contains four parameters (fx, fy, x0, y0), where fx and fy are the camera focal lengths and x0 and y0 are the principal point coordinates.
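A minimal numeric check of the camera imaging formula, assuming a rectified stereo pair (identity rotation, pure horizontal baseline translation) and illustrative parameter values:

```python
import numpy as np

# Illustrative intrinsics: K holds the four parameters (fx, fy, x0, y0).
fx, fy, x0, y0 = 100.0, 100.0, 32.0, 24.0
K = np.array([[fx, 0.0, x0],
              [0.0, fy, y0],
              [0.0, 0.0, 1.0]])

def map_point(pt, depth, K, R, t):
    """Ps ~ K * Tt->s * Dt(Pt) * K^-1 * Pt: back-project pixel Pt with
    its depth, apply the rotation-translation, and re-project with K."""
    p_h = np.array([pt[0], pt[1], 1.0])     # homogeneous pixel coordinate
    X = depth * (np.linalg.inv(K) @ p_h)    # 3-D point in the target frame
    Xs = R @ X + t                          # into the source camera frame
    ps = K @ Xs                             # re-project
    return ps[:2] / ps[2]                   # perspective divide

# Rectified stereo: identity rotation, baseline 0.5 along x, depth 10.
ps = map_point((40.0, 30.0), 10.0, K, np.eye(3), np.array([0.5, 0.0, 0.0]))
# Horizontal disparity is fx * b / Z = 100 * 0.5 / 10 = 5 pixels.
```

With these values the point (40, 30) maps to (45, 30): a purely horizontal shift, as expected for a rectified pair.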
The step of processing the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula and the preset second sampling mode to obtain the second predicted right view of each training sample comprises:
substituting the second predicted right depth map, the predicted camera intrinsics and the predicted rotation-translation parameter into the preset camera imaging formula to obtain, for each point of the right view, its second predicted mapping point in the left view; and
sampling the left view at the second predicted mapping points of the right view to obtain the second predicted right view.
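The sampling step can be sketched with plain bilinear interpolation, one common choice of sampling mode (the patent does not fix a specific interpolation scheme in this passage):

```python
import numpy as np

def bilinear_sample(img, xq, yq):
    """Sample image `img` at fractional coordinates (xq, yq) with
    bilinear interpolation -- reading the left view at the predicted
    mapping points to synthesize the right view."""
    h, w = img.shape
    x0 = np.clip(np.floor(xq).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(yq).astype(int), 0, h - 2)
    dx, dy = xq - x0, yq - y0
    # Weighted blend of the four surrounding pixels.
    return ((1 - dx) * (1 - dy) * img[y0, x0] +
            dx * (1 - dy) * img[y0, x0 + 1] +
            (1 - dx) * dy * img[y0 + 1, x0] +
            dx * dy * img[y0 + 1, x0 + 1])

img = np.array([[0.0, 1.0],
                [2.0, 3.0]])
v = bilinear_sample(img, np.array([0.5]), np.array([0.5]))  # centre of the 2x2 patch
```

At the centre point the four pixels contribute equally, so the sample is their mean, 1.5.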
Optionally, the initial depth map prediction network model is a network based on the Visual Geometry Group (VGG) or U-net network structure.
The initial camera parameter prediction network model comprises 5 convolutional layers followed by a split into two branches (upper and lower), each branch comprising 2 convolutional layers, 1 average pooling layer and 1 fully connected (FC) layer.
The preset loss function includes an SSIM+L1 loss function and a first-order gradient loss function.
A loss value is obtained from the predicted right view and the true right view according to the SSIM+L1 loss function. The SSIM+L1 loss function formula is:
L_ap = (1/N) Σ_{i,j} [ α(1 − SSIM(I_ij, Î_ij))/2 + (1 − α)|I_ij − Î_ij| ]
where L_ap denotes the loss value; N is the number of samples taken each time; I denotes the true right view and Î the predicted right view; the weight α is 0.85; the SSIM term measures the structural similarity between the predicted right view and the true right view; and the absolute-value term is the L1 error between the predicted right view and the true right view.
A loss value is also obtained from the predicted right depth map and the actual right view according to the first-order gradient loss function. The first-order gradient loss function formula is:
L_ds = (1/N) Σ_{i,j} ( |∂x d_ij| e^(−|∂x I_ij|) + |∂y d_ij| e^(−|∂y I_ij|) )
where L_ds denotes the loss value; ∂x d and ∂y d are the first derivatives of the right depth map in the x and y directions; ∂x I and ∂y I are the first derivatives of the right view in the x and y directions; N is the number of samples taken each time; and i, j denote pixel coordinates.
The final, third loss function is L = L_ap + L_ds.
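A compact sketch of the two loss terms, assuming a global (whole-image) SSIM rather than the usual windowed version, and an edge-aware form of the first-order gradient loss (depth gradients down-weighted where the image itself has strong gradients):

```python
import numpy as np

def ssim(a, b, c1=0.01**2, c2=0.03**2):
    """Whole-image SSIM (a simplification; SSIM is usually windowed)."""
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2) /
            ((mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2)))

def ssim_l1_loss(true_r, pred_r, alpha=0.85):
    """SSIM+L1 photometric loss with the stated weight alpha = 0.85."""
    return (alpha * (1 - ssim(true_r, pred_r)) / 2 +
            (1 - alpha) * np.abs(true_r - pred_r).mean())

def gradient_loss(depth, image):
    """First-order gradient loss (assumed edge-aware form): depth
    gradients are penalised, except across image edges."""
    dx_d, dy_d = np.abs(np.diff(depth, axis=1)), np.abs(np.diff(depth, axis=0))
    dx_i, dy_i = np.abs(np.diff(image, axis=1)), np.abs(np.diff(image, axis=0))
    return (dx_d * np.exp(-dx_i)).mean() + (dy_d * np.exp(-dy_i)).mean()

rng = np.random.default_rng(1)
img = rng.random((8, 8))
zero_photo = ssim_l1_loss(img, img)          # identical views -> ~0 loss
flat = gradient_loss(np.ones((8, 8)), img)   # constant depth -> 0 loss
total = zero_photo + flat                    # final third loss L = L_ap + L_ds
```

Identical predicted and true views give (near-)zero photometric loss, and a perfectly flat depth map gives exactly zero gradient loss, which is the sanity behaviour one would expect of both terms.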
Optionally, the first monocular view is a left view, and the first predicted depth map is a first predicted left depth map.
The step of processing the first predicted depth map based on the camera parameters of the 2D image, the preset camera imaging formula and the preset first sampling mode to obtain the second monocular view comprises: processing the first predicted left depth map based on the camera parameters of the 2D image, the preset camera imaging formula and the preset first sampling mode to obtain a first predicted right view.
The step of generating the 3D image based on the first monocular view and the second monocular view comprises: generating the 3D image based on the left view and the first predicted right view.
Optionally, the camera parameters of the 2D image include the camera intrinsics and the rotation-translation parameter of the 2D image.
The step of processing the first predicted left depth map based on the camera parameters of the 2D image, the preset camera imaging formula and the preset first sampling mode to obtain the first predicted right view comprises:
substituting the first predicted left depth map, the camera intrinsics of the 2D image and the rotation-translation parameter into the preset camera imaging formula to obtain, for each point of the right view, its first predicted mapping point in the left view; and
sampling the left view at the first predicted mapping points to obtain the first predicted right view.
In a second aspect, an embodiment of the present invention provides a training method for a depth map prediction network model, the method comprising:
obtaining multiple different 3D film sources shot by different cameras as training samples, wherein each training sample comprises a left view and a corresponding right view;
inputting the right views of a preset number of training samples into the initial depth map prediction network model to obtain a second predicted right depth map of each training sample output by the initial depth map prediction network model;
inputting the left views and right views of the preset number of training samples into the initial camera parameter prediction network model to obtain predicted camera parameters of each training sample output by the initial camera parameter prediction network model;
processing the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula and a preset second sampling mode, to obtain a second predicted right view of each training sample;
calculating a loss value according to a preset loss function, the true right view of each training sample and its corresponding second predicted right view;
judging, according to the loss value, whether the initial depth map prediction network model and the initial camera parameter prediction network model have both converged to stability;
if they have converged, incrementing the training count by one and judging whether a preset number of training iterations has been reached; if the preset number has not been reached, returning to the step of inputting the right views of the preset number of training samples into the initial depth map prediction network model to obtain the second predicted right depth map of each training sample; if the preset number has been reached, taking the current initial depth map prediction network model as the trained depth map prediction network model;
if they have not converged, incrementing the training count by one, adjusting the network parameters of the initial depth map prediction network model and of the initial camera parameter prediction network model, and returning to the step of inputting the right views of the preset number of training samples into the depth map prediction network model to be trained to obtain the second predicted right depth map of each training sample.
Optionally, the predicted camera parameters include predicted camera intrinsics and a predicted rotation-translation parameter.
The preset camera imaging formula is:
Ps ~ K · Tt→s · Dt(Pt) · K⁻¹ · Pt
where ~ denotes a mapping operation; Ps is the coordinate of a binocular image reference point in the left view; Pt is the coordinate of the binocular image reference point in the right view; K is the camera intrinsic matrix and K⁻¹ its inverse; Dt(Pt) is the depth at point Pt; and Tt→s is the rotation-translation matrix. The camera intrinsic matrix contains four parameters (fx, fy, x0, y0), where fx and fy are the camera focal lengths and x0 and y0 are the principal point coordinates.
The step of processing the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula and the preset second sampling mode to obtain the second predicted right view of each training sample comprises:
substituting the second predicted right depth map, the predicted camera intrinsics and the predicted rotation-translation parameter into the preset camera imaging formula to obtain, for each point of the right view, its second predicted mapping point in the left view; and
sampling the left view at the second predicted mapping points of the right view to obtain the second predicted right view.
Optionally, the initial depth map prediction network model is a network based on the Visual Geometry Group (VGG) or U-net network structure.
The initial camera parameter prediction network model comprises 5 convolutional layers followed by a split into two branches (upper and lower), each branch comprising 2 convolutional layers, 1 average pooling layer and 1 fully connected (FC) layer.
The preset loss function includes an SSIM+L1 loss function and a first-order gradient loss function.
A loss value is obtained from the predicted right view and the true right view according to the SSIM+L1 loss function. The SSIM+L1 loss function formula is:
L_ap = (1/N) Σ_{i,j} [ α(1 − SSIM(I_ij, Î_ij))/2 + (1 − α)|I_ij − Î_ij| ]
where L_ap denotes the loss value; N is the number of samples taken each time; I denotes the true right view and Î the predicted right view; the weight α is 0.85; the SSIM term measures the structural similarity between the predicted right view and the true right view; and the absolute-value term is the L1 error between the predicted right view and the true right view.
A loss value is also obtained from the predicted right depth map and the actual right view according to the first-order gradient loss function. The first-order gradient loss function formula is:
L_ds = (1/N) Σ_{i,j} ( |∂x d_ij| e^(−|∂x I_ij|) + |∂y d_ij| e^(−|∂y I_ij|) )
where L_ds denotes the loss value; ∂x d and ∂y d are the first derivatives of the right depth map in the x and y directions; ∂x I and ∂y I are the first derivatives of the right view in the x and y directions; N is the number of samples taken each time; and i, j denote pixel coordinates.
The final, third loss function is L = L_ap + L_ds.
In a third aspect, an embodiment of the present invention provides a depth map prediction method, the method comprising:
obtaining a first monocular view for which a depth map is to be predicted;
inputting the first monocular view into a pre-trained depth map prediction network model, wherein the depth map prediction network model is obtained by training with any one of the training methods described above, and the first monocular view is a left view or a right view; and
obtaining a first predicted depth map output by the depth map prediction network model.
In a fourth aspect, an embodiment of the present invention provides an apparatus for converting a two-dimensional (2D) image into a three-dimensional (3D) image, the apparatus comprising:
a first 2D image obtaining unit, configured to obtain a 2D image to be converted into a 3D image;
a first 2D image input unit, configured to input the 2D image, as a first monocular view for generating the 3D image, into a pre-trained depth map prediction network model, wherein the depth map prediction network model is obtained by training an initial depth map prediction network model and an initial camera parameter prediction network model on multiple different 3D film source samples, and the first monocular view is a left view or a right view;
a first predicted depth map obtaining unit, configured to obtain a first predicted depth map output by the depth map prediction network model;
a second monocular view obtaining unit, configured to process the first predicted depth map based on the camera parameters of the 2D image, a preset camera imaging formula and a preset first sampling mode to obtain a second monocular view, the second monocular view being a right view or a left view corresponding to the first monocular view; and
a 3D image generation unit, configured to generate the 3D image based on the first monocular view and the second monocular view.
Optionally, the apparatus comprises a depth map prediction network model training unit, the training unit comprising:
a training sample obtaining module, configured to obtain multiple different 3D film sources shot by different cameras as training samples, wherein each training sample comprises a left view and a corresponding right view;
a second predicted right depth map obtaining module, configured to input the right views of a preset number of training samples into the initial depth map prediction network model to obtain a second predicted right depth map of each training sample output by the initial depth map prediction network model;
a predicted camera parameter obtaining module, configured to input the left views and right views of the preset number of training samples into the initial camera parameter prediction network model to obtain predicted camera parameters of each training sample output by the initial camera parameter prediction network model;
a second predicted right view obtaining module, configured to process the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula and a preset second sampling mode, to obtain a second predicted right view of each training sample;
a loss value calculation module, configured to calculate a loss value according to a preset loss function, the true right view of each training sample and its corresponding second predicted right view;
a convergence judgment module, configured to judge, according to the loss value, whether the initial depth map prediction network model and the initial camera parameter prediction network model have both converged to stability;
a training count judgment module, configured to, if both models have converged, increment the training count by one and judge whether a preset number of training iterations has been reached; if the preset number has not been reached, trigger the second predicted right depth map obtaining module to input the right views of the preset number of training samples into the initial depth map prediction network model and obtain the second predicted right depth map of each training sample; if the preset number has been reached, take the current initial depth map prediction network model as the trained depth map prediction network model; and
a network parameter adjustment module, configured to, if the models have not converged, increment the training count by one, adjust the network parameters of the initial depth map prediction network model and of the initial camera parameter prediction network model, and trigger the second predicted right depth map obtaining module to input the right views of the preset number of training samples into the depth map prediction network model to be trained and obtain the second predicted right depth map of each training sample.
Optionally, the predicted camera parameters include predicted camera intrinsics and a predicted rotation-translation parameter.
The preset camera imaging formula is:
Ps ~ K · Tt→s · Dt(Pt) · K⁻¹ · Pt
where ~ denotes a mapping operation; Ps is the coordinate of a binocular image reference point in the left view; Pt is the coordinate of the binocular image reference point in the right view; K is the camera intrinsic matrix and K⁻¹ its inverse; Dt(Pt) is the depth at point Pt; and Tt→s is the rotation-translation matrix. The camera intrinsic matrix contains four parameters (fx, fy, x0, y0), where fx and fy are the camera focal lengths and x0 and y0 are the principal point coordinates.
The second predicted right view obtaining module is specifically configured to:
substitute the second predicted right depth map, the predicted camera intrinsics and the predicted rotation-translation parameter into the preset camera imaging formula to obtain, for each point of the right view, its second predicted mapping point in the left view; and
sample the left view at the second predicted mapping points of the right view to obtain the second predicted right view.
Optionally, the initial depth map prediction network model is a network based on the Visual Geometry Group (VGG) or U-net network structure.
The initial camera parameter prediction network model comprises 5 convolutional layers followed by a split into two branches (upper and lower), each branch comprising 2 convolutional layers, 1 average pooling layer and 1 fully connected (FC) layer.
The preset loss function includes an SSIM+L1 loss function and a first-order gradient loss function.
A loss value is obtained from the predicted right view and the true right view according to the SSIM+L1 loss function. The SSIM+L1 loss function formula is:
L_ap = (1/N) Σ_{i,j} [ α(1 − SSIM(I_ij, Î_ij))/2 + (1 − α)|I_ij − Î_ij| ]
where L_ap denotes the loss value; N is the number of samples taken each time; I denotes the true right view and Î the predicted right view; the weight α is 0.85; the SSIM term measures the structural similarity between the predicted right view and the true right view; and the absolute-value term is the L1 error between the predicted right view and the true right view.
A loss value is also obtained from the predicted right depth map and the actual right view according to the first-order gradient loss function. The first-order gradient loss function formula is:
L_ds = (1/N) Σ_{i,j} ( |∂x d_ij| e^(−|∂x I_ij|) + |∂y d_ij| e^(−|∂y I_ij|) )
where L_ds denotes the loss value; ∂x d and ∂y d are the first derivatives of the right depth map in the x and y directions; ∂x I and ∂y I are the first derivatives of the right view in the x and y directions; N is the number of samples taken each time; and i, j denote pixel coordinates.
The final, third loss function is L = L_ap + L_ds.
Optionally, the first monocular view is a left view, and the first predicted depth map is a first predicted left depth map.
The second monocular view obtaining unit comprises a first predicted right view obtaining module, configured to process the first predicted left depth map based on the camera parameters of the 2D image, the preset camera imaging formula and the preset first sampling mode, to obtain a first predicted right view.
The 3D image generation unit is specifically configured to generate the 3D image based on the left view and the first predicted right view.
Optionally, the camera parameters of the 2D image include the camera intrinsics and the rotation-translation parameter of the 2D image.
The first predicted right view obtaining module is specifically configured to:
substitute the first predicted left depth map, the camera intrinsics of the 2D image and the rotation-translation parameter into the preset camera imaging formula to obtain, for each point of the right view, its first predicted mapping point in the left view; and
sample the left view at the first predicted mapping points to obtain the first predicted right view.
5th aspect, the embodiment of the invention provides a kind of training device of depth map prediction network model, described devices
Include:
Training sample obtains module, for obtaining multiple and different 3D film sources of different cameral shooting as training sample;
Wherein, each training sample includes left view and corresponding right view;
The second right depth map of prediction obtains module, for the right view input in preset quantity training sample is initial deep
Degree figure prediction network model obtains the second right depth of prediction of each training sample of initial depth figure prediction network model output
Figure;
a prediction camera parameter obtaining module, configured to input the left views and right views of the preset number of training samples into an initial camera parameter prediction network model, to obtain the predicted camera parameters of each training sample output by the initial camera parameter prediction network model;
a second prediction right view obtaining module, configured to process the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula and a preset second sampling mode, to obtain the second prediction right view of each training sample;
a loss value computing module, configured to calculate a loss value according to a preset loss function, the true right view in each training sample and its corresponding second prediction right view;
a convergence judgment module, configured to judge, according to the loss value, whether the initial depth map prediction network model and the initial camera parameter prediction network model have converged to stability;
a training times judgment module, configured to, if the models have converged to stability, increase the training count by one and judge whether a preset number of training times has been reached; if the preset number of training times has not been reached, trigger the second prediction right depth map obtaining module to input the right views of the preset number of training samples into the initial depth map prediction network model, to obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model; if the preset number of training times has been reached, take the current initial depth map prediction network model as the trained depth map prediction network model;
a network parameter adjusting module, configured to, if the models have not converged to stability, increase the training count by one, adjust the network parameters of the initial depth map prediction network model and the initial camera parameter prediction network model, and trigger the second prediction right depth map obtaining module to input the right views of the preset number of training samples into the depth map prediction network model to be trained, to obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model.
Optionally,
the predicted camera parameters include: predicted camera intrinsic parameters and predicted rotation-translation parameters;
the preset camera imaging formula is:

Ps ~ K · T(t→s) · Dt(Pt) · K⁻¹ · Pt

where ~ denotes the mapping operation; Ps is the coordinate of a binocular-image reference point in the left view, Pt is the coordinate of the binocular-image reference point in the right view, K is the camera intrinsic matrix, K⁻¹ is the inverse of the camera intrinsic matrix, Dt(Pt) is the depth at point Pt, and T(t→s) is the rotation-translation matrix; the camera intrinsic matrix contains 4 parameters (fx, fy, x0, y0), where fx and fy are the camera focal lengths and x0 and y0 are the principal-point coordinates;
the second prediction right view obtaining module is specifically configured to:
substitute the second predicted right depth map, the predicted camera intrinsic parameters and the predicted rotation-translation parameters into the preset camera imaging formula, to obtain second prediction mapping coordinates of the right view in the left view;
sample the left view according to the second prediction mapping coordinates of the right view in the left view, to obtain the second prediction right view.
Optionally,
the initial depth map prediction network model is: a network based on the Visual Geometry Group (VGG) or U-net network structure;
the initial camera parameter prediction network model comprises: 5 convolutional layers followed by a split into two branches, each branch containing 2 convolutional layers, 1 average pooling layer and 1 fully connected (FC) layer;
the preset loss function includes: an SSIM+L1 loss function and a first-order gradient loss function.
A loss value is obtained from the prediction right view and the true right view according to the SSIM+L1 loss function. The SSIM+L1 loss function formula is:

L_{SSIM+L1} = (1/N) Σ [ α · (1 − SSIM(R, R̂)) / 2 + (1 − α) · ‖R − R̂‖₁ ]

where L_{SSIM+L1} denotes the loss value; N denotes that N samples are taken each time; R denotes the true right view and R̂ the prediction right view; the weight α is 0.85; SSIM(R, R̂) denotes the structural similarity of the prediction right view and the true right view; ‖R − R̂‖₁ denotes the L1 absolute-value error of the prediction right view and the true right view.
A loss value is also obtained from the predicted right depth map and the actual right view according to the first-order gradient loss function. The first-order gradient loss function formula is:

L_grad = (1/N) Σ_{i,j} ( |∂x D_{i,j}| · e^{−|∂x R_{i,j}|} + |∂y D_{i,j}| · e^{−|∂y R_{i,j}|} )

where L_grad denotes the loss value; N denotes that N samples are taken each time; ∂x D and ∂y D denote the first derivatives of the right depth map in the x and y directions; ∂x R and ∂y R denote the first derivatives of the right view in the x and y directions; i, j denote pixel coordinates.
The final third loss function is: L = L_{SSIM+L1} + L_grad.
In a sixth aspect, an embodiment of the present invention provides a depth map prediction device, the device comprising:
a first monocular view obtaining unit, configured to obtain a first monocular view whose depth map is to be predicted;
a first monocular view input unit, configured to input the first monocular view into a pre-trained depth map prediction network model, the depth map prediction network model being obtained by training with the above training method; the first monocular view is a left view or a right view;
a first depth map obtaining unit, configured to obtain the first predicted depth map output by the depth map prediction network model.
In a seventh aspect, an embodiment of the present invention provides an electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured, when executing the program stored in the memory, to implement the steps of any of the above methods for converting a two-dimensional 2D image into a three-dimensional 3D image.
In an eighth aspect, an embodiment of the present invention provides an electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured, when executing the program stored in the memory, to implement the steps of the training method of any of the above depth map prediction network models.
In a ninth aspect, an embodiment of the present invention provides an electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured, when executing the program stored in the memory, to implement the steps of the depth map prediction method of any of the above images.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above methods for converting a two-dimensional 2D image into a three-dimensional 3D image; or implements the steps of the training method of any of the above depth map prediction network models; or implements the steps of the depth map prediction method of any of the above images.
An embodiment of the present invention further provides a computer program product containing instructions which, when run on a computer, causes the computer to execute any of the above methods for converting a two-dimensional 2D image into a three-dimensional 3D image; or the training method of any of the above depth map prediction network models; or the depth map prediction method of any of the above images.
The embodiments of the present invention have the following beneficial effects:
With the image conversion, depth map prediction and model training methods, devices and electronic equipment provided by the embodiments of the present invention, a two-dimensional 2D image to be converted into a three-dimensional 3D image can be obtained; the 2D image is input, as a first monocular view for generating the 3D image, into a pre-trained depth map prediction network model; the depth map prediction network model is obtained by training an initial depth map prediction network model and an initial camera parameter prediction network model on multiple different 3D film source samples; the first monocular view is a left view or a right view; the first predicted depth map output by the depth map prediction network model is obtained; the first predicted depth map is processed based on the first predicted depth map, the camera parameters of the 2D image, the preset camera imaging formula and a preset first sampling mode, to obtain a second monocular view; the second monocular view is the right view or left view corresponding to the first monocular view; and a 3D image is generated based on the first monocular view and the second monocular view. It can be seen that, with the embodiments of the present invention, training the initial depth map prediction network model on 3D images requires no supervision from true depth maps; the depth map prediction network model can be trained, realizing the conversion of 2D images into 3D images.
Of course, implementing any product or method of the present invention does not necessarily require achieving all of the above advantages at the same time.
Brief Description of the Drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow chart of a method for converting a two-dimensional 2D image into a three-dimensional 3D image provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the depth prediction network provided by an embodiment of the present invention;
Fig. 3 is a training principle diagram of the depth map prediction network model and the camera parameter prediction network provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the camera parameter prediction network provided by an embodiment of the present invention;
Fig. 5 is a training flow chart of the depth map prediction network model provided by an embodiment of the present invention;
Fig. 6 is a flow chart of a depth map prediction method for an image provided by an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of a device for converting a two-dimensional 2D image into a three-dimensional 3D image provided by an embodiment of the present invention;
Fig. 8 is a structural schematic diagram of a training device for a depth map prediction network model provided by an embodiment of the present invention;
Fig. 9 is a structural schematic diagram of a depth map prediction device for an image provided by an embodiment of the present invention;
Fig. 10 is a structural schematic diagram of an electronic device provided by an embodiment of the present invention;
Fig. 11 is a structural schematic diagram of another electronic device provided by an embodiment of the present invention;
Fig. 12 is a structural schematic diagram of another electronic device provided by an embodiment of the present invention.
Specific embodiment
Following will be combined with the drawings in the embodiments of the present invention, and technical solution in the embodiment of the present invention carries out clear, complete
Site preparation description, it is clear that described embodiments are only a part of the embodiments of the present invention, instead of all the embodiments.It is based on
Embodiment in the present invention, it is obtained by those of ordinary skill in the art without making creative efforts every other
Embodiment shall fall within the protection scope of the present invention.
In order to train a depth map prediction network model without supervision from true depth maps and realize the conversion of 2D images into 3D images, the embodiments of the present invention provide image conversion, depth map prediction and model training methods, devices and electronic equipment.
The image conversion, depth map prediction and model training methods provided by the embodiments of the present invention can be applied to any electronic device that needs to perform image conversion, depth map prediction or model training, for example a computer or a mobile terminal, which is not specifically limited here. For convenience, it is hereinafter referred to as the electronic device.
The method for converting a two-dimensional 2D image into a three-dimensional 3D image provided by an embodiment of the present invention is shown in Fig. 1; the specific processing flow of the method includes:
Step S101: obtaining a two-dimensional 2D image to be converted into a three-dimensional 3D image.
In implementation, the electronic device can obtain the two-dimensional 2D image to be converted into a three-dimensional 3D image.
Step S102: inputting the 2D image, as a first monocular view for generating the 3D image, into a pre-trained depth map prediction network model; the depth map prediction network model is obtained by training an initial depth map prediction network model and an initial camera parameter prediction network model on multiple different 3D film source samples; the first monocular view is a left view or a right view.
In implementation, when using the 2D image as the first monocular view for generating the 3D image, the 2D image can be used as the left view for generating the 3D image.
The depth map prediction network model used in this embodiment can be, as shown in Fig. 2, a network based on the VGG (Visual Geometry Group) or U-net network structure, comprising a coding part and a decoding part; the coding part passes through 14 convolutional layers, and the decoding part passes through 14 convolutional layers.
The coding and decoding parts of the depth map prediction network model of the embodiment of the present invention can be found in the coding-decoding table shown in Table 1.
Table 1
As shown in Table 1, the coding part includes a first cascade down-sampling network, a second cascade down-sampling network, a third cascade down-sampling network, a fourth cascade down-sampling network, a fifth cascade down-sampling network, a sixth cascade down-sampling network and a seventh cascade down-sampling network. Each down-sampling cascade network includes two convolutional layers; of course, the structure of the cascade networks can be adjusted according to actual needs.
In implementation, taking the right view as an example, the coding part performs channel-increasing and size-reducing processing on the right view in the sample through successive pairs of convolutions, and obtains the coded down-sampled image output by the last convolutional layer. As shown in Table 1, a right view of size 256*512*3 is input into the first cascade down-sampling network, where 256 can denote the width of the right view, 512 its height, and 3 its number of channels. The first cascade down-sampling network includes conv1 (the first convolutional layer) and conv2 (the second convolutional layer): conv1 performs dimension-increasing convolution on the 256*512*3 right view to obtain feature map 1 of size 256*512*32, and conv2 performs size-reducing convolution on feature map 1 to obtain feature map 2 of size 128*256*32; feature map 2 is then convolved by conv3 (the third convolutional layer) to obtain feature map 3 of size 128*256*64. And so on, until conv14 (the 14th convolutional layer) finally yields a down-sampled image of size 2*4*512, which is then passed to the decoding part.
The decoding part includes: a first cascade up-sampling network, a second cascade up-sampling network, a third cascade up-sampling network, a fourth cascade up-sampling network, a fifth cascade up-sampling network, a sixth cascade up-sampling network and a seventh cascade up-sampling network. Each up-sampling cascade network includes an up-sampling step and two convolutional layers; of course, the structure of the cascade networks can be adjusted according to actual needs. Each up-sampling cascade network includes a bilinear-interpolation up-sampling step that increases the size, and two convolutional layers; one of the convolutional layers performs dimension-reducing processing, while the other convolution does not.
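The bilinear-interpolation up-sampling step used by each up-sampling cascade network can be sketched as follows. This is a minimal NumPy sketch of size-doubling bilinear interpolation (the patent does not specify the interpolation alignment convention; the align-corners-false convention is assumed here), shown on a small feature map rather than the full 2*4*512 bottleneck.

```python
import numpy as np

def upsample_bilinear_x2(x):
    """Double the spatial size of an (H, W, C) feature map by bilinear
    interpolation, as in each up-sampling cascade network
    (e.g. 2*4*512 -> 4*8*512)."""
    h, w, c = x.shape
    ys = (np.arange(2 * h) + 0.5) / 2 - 0.5          # output rows -> input rows
    xs = (np.arange(2 * w) + 0.5) / 2 - 0.5          # output cols -> input cols
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0, 1)[:, None, None]       # vertical blend weights
    wx = np.clip(xs - x0, 0, 1)[None, :, None]       # horizontal blend weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

feat = np.random.rand(2, 4, 8)    # stand-in for the 2*4*512 bottleneck (8 channels)
up = upsample_bilinear_x2(feat)   # -> shape (4, 8, 8), matching the 4*8*512 step
```

The two convolutional layers of the cascade network would then be applied to `up`; only the interpolation step is shown here.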
The decoding part performs the first up-sampling on the down-sampled image obtained from the coding part: bilinear interpolation increases the size of the 2*4*512 image to obtain up-sampling intermediate image 1 of size 4*8*512; conv1 (the first convolutional layer) convolves up-sampling intermediate image 1 to obtain up-sampling feature map 1 of size 4*8*512, and up-sampling feature map 1 is then convolved by conv2 (the second convolutional layer) to obtain up-sampling feature map 2. It should be noted that these two convolutions do not perform channel-reducing processing; this is a model choice and can be adjusted according to the actual situation.
Feature map 2 then passes through the second bilinear-interpolation up-sampling, which increases the size to obtain up-sampling intermediate image 2 of size 8*16*512; conv3 (the third convolutional layer) convolves up-sampling intermediate image 2 to obtain up-sampling feature map 3 of size 8*16*512, and up-sampling feature map 3 is convolved by conv4 (the fourth convolutional layer) to obtain up-sampling feature map 4. Again, these two convolutions do not perform channel-reducing processing; this is a model choice and can be adjusted according to the actual situation.
Up-sampling feature map 4 then passes through the third bilinear-interpolation up-sampling, which increases the size to obtain up-sampling intermediate image 3 of size 16*32*512; conv5 (the fifth convolutional layer) performs channel-reducing convolution on up-sampling intermediate image 3 to obtain up-sampling feature map 5, and up-sampling feature map 5 is convolved by conv6 (the sixth convolutional layer) to obtain up-sampling feature map 6. And so on. It should be noted that a right depth map is output at Conv8, Conv10, Conv12 and Conv14 respectively, shown as Conv8_out, Conv10_out, Conv12_out and Conv14_out in the table. This is equivalent to one sample producing 4 predicted right depth maps, and the loss values of these 4 predictions are finally averaged.
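The averaging of the loss over the 4 multi-scale predictions (Conv8_out through Conv14_out) can be sketched as follows; the L1 loss used here is only a stand-in so the averaging itself is visible, and the toy arrays are hypothetical.

```python
import numpy as np

def multiscale_loss(predictions, truth, loss_fn):
    """Average the per-scale loss over the 4 predictions produced at
    Conv8, Conv10, Conv12 and Conv14 (each assumed already resized to
    the target resolution)."""
    losses = [loss_fn(p, truth) for p in predictions]
    return sum(losses) / len(losses)

l1 = lambda a, b: np.abs(a - b).mean()   # stand-in per-scale loss

truth = np.zeros((4, 4))
preds = [np.full((4, 4), v) for v in (0.1, 0.2, 0.3, 0.4)]
avg = multiscale_loss(preds, truth, l1)  # (0.1 + 0.2 + 0.3 + 0.4) / 4 = 0.25
```

In the full model each prediction would first be warped into a right view before the loss is taken; only the averaging step is shown.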
It should be noted that seven cascade sampling networks are provided in this optional embodiment of the present invention; in an actual implementation process, more or fewer than seven cascade sampling networks can be set according to the specific requirements of the implementing personnel.
Step S103: obtaining the first predicted depth map output by the depth map prediction network model.
In a specific embodiment, if the first monocular view is used as the left view, the first predicted depth map is the first predicted left depth map.
Step S104: processing the first predicted depth map based on the first predicted depth map, the camera parameters of the 2D image, the preset camera imaging formula and a preset first sampling mode, to obtain a second monocular view; the second monocular view is the right view or left view corresponding to the first monocular view.
In implementation, if the first monocular view is used as the left view, the first predicted left depth map can be processed based on the first predicted left depth map, the camera parameters of the 2D image, the preset camera imaging formula and the preset first sampling mode, to obtain a first prediction right view. The camera parameters of the 2D image can be the camera parameters obtained during model training, or can be preset by the user.
In a specific embodiment, if the obtained two-dimensional 2D image to be converted into a three-dimensional 3D image is used as the left view, the first predicted left depth map, the camera intrinsic parameters of the 2D image and the rotation-translation parameters can be substituted into the preset camera imaging formula to obtain first prediction mapping coordinates of the left view in the right view; the left view is then sampled according to the first prediction mapping coordinates of the left view in the right view, to obtain the first prediction right view.
The preset camera imaging formula can be the camera motion imaging formula:

Ps ~ K · T(t→s) · Dt(Pt) · K⁻¹ · Pt

where ~ denotes the mapping operation; Ps is the coordinate of a binocular-image reference point in the left view, Pt is the coordinate of the binocular-image reference point in the right view, K is the camera intrinsic matrix, K⁻¹ is the inverse of the camera intrinsic matrix, Dt(Pt) is the depth at point Pt, and T(t→s) is the rotation-translation matrix; the camera intrinsic matrix contains 4 parameters (fx, fy, x0, y0), where fx and fy are the camera focal lengths and x0 and y0 are the principal-point coordinates.
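The whole-image version of this substitute-and-sample step can be sketched as follows. The sketch assumes a rectified pair, so that the rotation-translation matrix reduces to a horizontal baseline and the imaging formula reduces to a per-pixel disparity fx·baseline/depth; nearest-neighbour sampling is used for brevity where a real implementation would use bilinear sampling. All numeric values are hypothetical.

```python
import numpy as np

def warp_with_depth(src, depth, fx, baseline):
    """Synthesize the opposite-eye view by sampling `src` (H, W) at the
    mapping coordinates given by the imaging formula, here simplified to
    a horizontal disparity d = fx * baseline / depth per pixel."""
    h, w = src.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    disp = fx * baseline / depth                          # per-pixel disparity
    sample_x = np.clip(np.round(xs + disp).astype(int), 0, w - 1)
    return src[ys, sample_x]                              # sample the source view

left = np.tile(np.arange(8.0), (4, 1))   # toy left view: intensity = column index
depth = np.full((4, 8), 5.0)             # toy predicted depth map
right = warp_with_depth(left, depth, fx=10.0, baseline=1.0)
# disparity = 10 * 1 / 5 = 2 px everywhere, so each row shifts by two columns
```

The sign of the shift depends on which eye is the source; the convention above is an assumption for the example.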
The camera parameters of the 2D image may include: camera intrinsic parameters and rotation-translation parameters of the 2D image. In implementation, if the first monocular view is a right view, the second monocular view is the left view corresponding to the first monocular view, which is not specifically limited here.
Step S105: generating a 3D image based on the first monocular view and the second monocular view.
In implementation, in a specific embodiment, the 3D image can be generated based on the left view and the first prediction right view. The left view can serve as the picture seen by the left eye, and the right view as the picture seen by the right eye; by watching the left view and the right view through existing 3D equipment, the 3D video corresponding to the 2D video data to be converted is obtained. Alternatively, the left view and the right view can be processed in an existing way of obtaining 3D video from a left view and a right view, to obtain the 3D video corresponding to the 2D video data to be converted. This is not specifically limited in the embodiments of the present invention.
It can be seen that, with the embodiments of the present invention, training the initial depth map prediction network model on 3D images requires no supervision from true depth maps; the depth map prediction network model can be trained, realizing the conversion of 2D images into 3D images.
In addition, the second monocular view in this embodiment is obtained by processing the first predicted depth map based on the first predicted depth map, the camera parameters of the 2D image, the preset camera imaging formula and the preset first sampling mode. Since the camera parameters are referenced during prediction, the obtained prediction right view is more realistic.
The training schematic diagram of the depth map prediction network model and the camera parameter prediction network provided by an embodiment of the present invention, as shown in Fig. 3, may include:
301 is the depth map prediction network model, whose network structure is shown in Fig. 2; 302 is the camera parameter prediction network model, whose network structure is shown in Fig. 4. The right views R of a preset number of training samples are input into the initial depth map prediction network model, to obtain the second predicted right depth map Z' of each training sample output by the initial depth map prediction network model. At the same time, the left views and right views of the preset number of training samples input into the depth map prediction network model are first concatenated (concat) into 6-channel images and input into the initial camera parameter prediction network model, to obtain the 10 predicted camera parameters of each training sample output by the initial camera parameter prediction network model, comprising: the camera intrinsic matrix K, the inverse camera intrinsic matrix K⁻¹ and the rotation-translation matrix T. According to the predicted camera parameters and the predicted right depth map Z', the mapping coordinates of the right view in the left view are obtained according to the preset camera imaging formula Ps ~ K · T(t→s) · Dt(Pt) · K⁻¹ · Pt, where ~ denotes the mapping operation; the pixels of the left view are obtained from the left view by sampling (Sampler); according to the coordinates in the right view corresponding to the same pixels in the left view, the prediction right view is obtained. A loss value is calculated from the prediction right view and the true right view R according to the preset loss function, and whether the depth map prediction network model and the camera parameter prediction network model have converged to stability is judged according to the loss value.
The structure of the camera parameter prediction network is shown in Fig. 4, comprising: 5 convolutional layers followed by a split into two branches, each branch containing 2 convolutional layers, 1 average pooling layer and 1 fully connected (FC) layer. First, the image of size 256*512*6 obtained by concatenating the left view and the corresponding right view (where 6 is the number of channels) is input into the down-sampling cascade networks; each pass through a down-sampling cascade network reduces the size and increases the number of channels, yielding one down-sampled image, and each down-sampling cascade network can have one convolutional layer that reduces the size and increases the number of channels after convolution. After 5 down-sampling cascade networks that reduce the size and increase the number of channels, the fifth down-sampled image of size 8*16*512 is obtained. The fifth down-sampled image is then split into two branches. One branch increases the dimension and reduces the size through 1 convolutional layer to obtain an image of 4*8*512; the 4*8*512 image then passes through another convolutional layer that increases the dimension and reduces the size, obtaining an image of 2*4*1024; 1 average pooling layer then yields an image of 1*1*1024, and 1 fully connected layer (FC) yields the first fully connected output of size 1*1*6, whose 6 output parameters can be the rotation-translation parameters. The other branch likewise increases the dimension and reduces the size through 1 convolutional layer to obtain an image of 4*8*512, then passes through another convolutional layer that increases the dimension and reduces the size to obtain an image of 2*4*1024; 1 average pooling layer yields an image of 1*1*1024, and 1 fully connected layer (FC) then yields the second fully connected output of size 1*1*4, whose 4 output parameters can be the camera intrinsic parameters: the camera focal lengths fx and fy and the principal-point coordinates x0 and y0. The activation function of the output layer is the softplus function, or the output is produced without an activation function; the remaining layers use the relu activation function.
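The tensor shapes through the two-branch network just described can be traced as follows. This is only a shape walkthrough, not a trained model; the intermediate channel counts of the 5 down-sampling layers are assumptions, chosen so the trunk ends at the 8*16*512 size stated above, while the branch shapes and the 6-plus-4 parameter heads follow the text.

```python
def camera_net_shapes(h=256, w=512, c=6):
    """Trace tensor shapes through the camera parameter prediction network:
    5 size-halving, channel-growing conv layers, then two branches of
    2 convs + average pooling + one FC head each (6 rotation-translation
    parameters and 4 intrinsic parameters, 10 in total)."""
    shapes = [(h, w, c)]
    channels = [32, 64, 128, 256, 512]   # assumed channel growth (not in the patent)
    for ch in channels:                  # 5 down-sampling conv layers
        h, w = h // 2, w // 2
        shapes.append((h, w, ch))
    trunk = shapes[-1]                   # (8, 16, 512) split point
    # each branch: two convs halving size, ending at 2*4*1024
    branch = [(trunk[0] // 2, trunk[1] // 2, 512),
              (trunk[0] // 4, trunk[1] // 4, 1024)]
    pooled = (1, 1, 1024)                # global average pooling
    heads = {"rotation_translation": 6, "intrinsics": 4}
    return shapes, branch, pooled, heads

shapes, branch, pooled, heads = camera_net_shapes()
# shapes[-1] == (8, 16, 512); branch == [(4, 8, 512), (2, 4, 1024)];
# the two FC heads output 10 predicted camera parameters in total.
```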
The training method of the depth map prediction network model provided by an embodiment of the present invention can be as shown in Fig. 5. Fig. 5 is the training flow chart of the depth map prediction network model in the method for converting a two-dimensional 2D image into a three-dimensional 3D image of an embodiment of the present invention, comprising:
Step S501: obtaining multiple different 3D film sources shot by different cameras as training samples; wherein each training sample includes a left view and a corresponding right view.
Step S502: inputting the right views of a preset number of training samples into the initial depth map prediction network model, to obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model.
In implementation, the right views of 8 training samples can be input into the initial depth map prediction network model to obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model. The initial depth map prediction network model can be a network based on the Visual Geometry Group (VGG) or U-net network structure;
the coding part may include 14 convolutional layers and the decoding part may include 14 convolutional layers; see Table 1 for details. The activation functions can be the softplus function and the elu function, where the softplus function can be used as the output-layer activation function and the remaining layers use the elu activation function.
Step S503: inputting the left views and right views of the preset number of training samples into the initial camera parameter prediction network model, to obtain the predicted camera parameters of each training sample output by the initial camera parameter prediction network model.
In implementation, the predicted camera parameters may include: predicted camera intrinsic parameters and predicted rotation-translation parameters, where the predicted camera intrinsic parameters are output through the softplus activation function, and the predicted rotation-translation parameters can be output without an activation function. The preset camera imaging formula can be: Ps ~ K · T(t→s) · Dt(Pt) · K⁻¹ · Pt, where ~ denotes the mapping operation; Ps is the coordinate of a binocular-image reference point in the left view, Pt is the coordinate of the binocular-image reference point in the right view, K is the camera intrinsic matrix, K⁻¹ is the inverse of the camera intrinsic matrix, Dt(Pt) is the depth at point Pt, and T(t→s) is the rotation-translation matrix; the camera intrinsic matrix contains 4 parameters (fx, fy, x0, y0), where fx and fy are the camera focal lengths and x0 and y0 are the principal-point coordinates.
Step S504: processing the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula and a preset second sampling mode, to obtain the second prediction right view of each training sample.
In a specific embodiment, the second predicted right depth map, the predicted camera intrinsic parameters and the predicted rotation-translation parameters can first be substituted into the preset camera imaging formula, to obtain second prediction mapping coordinates of the right view in the left view; the left view is then sampled according to the second prediction mapping coordinates of the right view in the left view, to obtain the second prediction right view.
Step S505: calculate the loss value according to the preset loss function, the true right view of each training sample and its corresponding second predicted right view.

The preset loss function includes an SSIM+L1 loss function and a first-order gradient loss function.

A loss value is obtained from the predicted right view and the true right view according to the SSIM+L1 loss function. The SSIM+L1 loss function formula is:

L_ap = (1/N) Σ_{i,j} [ α · (1 − SSIM(R_ij, R̂_ij)) / 2 + (1 − α) · |R_ij − R̂_ij| ]

where L_ap denotes the loss value; N is the number of samples taken each time; R denotes the true right view and R̂ the predicted right view; the weight α is 0.85; SSIM(R, R̂) measures the structural similarity of the predicted right view and the true right view; |R − R̂| is the absolute-value (L1) error between the predicted right view and the true right view.

A loss value is also obtained from the predicted right depth map and the actual right view according to the first-order gradient loss function. The first-order gradient loss function formula is:

L_grad = (1/N) Σ_{i,j} ( |∂x d_ij| · e^{−|∂x R_ij|} + |∂y d_ij| · e^{−|∂y R_ij|} )

where L_grad denotes the loss value; N is the number of samples taken each time; ∂x d and ∂y d are the first derivatives of the right depth map in the x and y directions; ∂x R and ∂y R are the first derivatives of the right view in the x and y directions; i and j are pixel coordinates.

The final third loss function is L = L_ap + L_grad.
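A compact NumPy sketch of the combined loss above, under two simplifications that are assumptions of this sketch: SSIM is computed with a single global window rather than the usual local sliding window, and the batch dimension is dropped (N = 1).

```python
import numpy as np

ALPHA = 0.85  # SSIM/L1 weight, as given in the text

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    # Single-window SSIM over the whole image (real implementations
    # average SSIM over local windows).
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def ssim_l1_loss(pred, true):
    # alpha * (1 - SSIM)/2 + (1 - alpha) * L1, as in the formula above.
    return (ALPHA * (1.0 - ssim_global(pred, true)) / 2.0
            + (1.0 - ALPHA) * np.abs(pred - true).mean())

def gradient_loss(depth, view):
    # Edge-aware first-order gradient term: depth gradients are
    # down-weighted where the image itself has strong edges.
    dx_d = np.abs(np.diff(depth, axis=1))
    dy_d = np.abs(np.diff(depth, axis=0))
    dx_v = np.abs(np.diff(view, axis=1))
    dy_v = np.abs(np.diff(view, axis=0))
    return (dx_d * np.exp(-dx_v)).mean() + (dy_d * np.exp(-dy_v)).mean()

def third_loss(pred_view, true_view, pred_depth):
    return ssim_l1_loss(pred_view, true_view) + gradient_loss(pred_depth, true_view)
```

A perfect prediction paired with a spatially constant depth map gives a loss of exactly zero, while any appearance mismatch increases it.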
Step S506: judge, according to the loss value, whether the initial depth map prediction network model and the initial camera parameter prediction network model have converged to stability.

In one implementation, convergence can be judged from the magnitude and the fluctuation of the loss value over successive iterations.

If the judgment result is yes, i.e. the models have converged to stability, execute step S507; if the judgment result is no, i.e. the models have not converged to stability, execute step S509.
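One way to operationalize judging convergence "from the magnitude and fluctuation of the loss value" is to compare recent loss statistics against thresholds. The window size and tolerances below are assumptions for illustration, not values from the text.

```python
def has_converged(losses, window=5, change_tol=1e-3, fluct_tol=1e-3):
    """Return True when the loss has converged to stability: its mean has
    stopped changing between the two most recent windows (magnitude
    check) and its spread within the last window is small (fluctuation
    check)."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    earlier = losses[-2 * window:-window]
    recent = losses[-window:]
    change = abs(sum(earlier) / window - sum(recent) / window)
    fluctuation = max(recent) - min(recent)
    return change < change_tol and fluctuation < fluct_tol
```

A flat loss history passes both checks, while a still-decreasing loss fails the magnitude check and keeps training going.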
Step S507: increase the training-iteration count by one, and judge whether a preset number of training iterations has been reached.

If the judgment result is no, i.e. the preset number of training iterations has not been reached, return to step S502; if the judgment result is yes, i.e. the preset number of training iterations has been reached, execute step S508.
Step S508: determine that the current initial depth map prediction network model is the trained depth map prediction network model. The process ends.
Step S509: adjust the network parameters of the initial depth map prediction network model and the initial camera parameter prediction network model, then return to step S502.
It can be seen that, with the embodiment of the present invention, the initial depth map prediction network model is trained on 3D images, so no true depth maps are needed for supervision, yet a depth map prediction network model can still be trained, realizing the conversion of 2D images into 3D images.

In addition, during the training of the depth map prediction network model, the embodiment of the present invention adds a camera parameter prediction network model to learn the camera parameters, which eliminates the influence of differing camera parameters on depth map prediction during training; the depth maps predicted by the depth map prediction network model are therefore more realistic, the 3D images have richer depth layering, and the stereoscopic effect is stronger.
The depth map prediction method for an image provided in an embodiment of the present invention, as shown in Fig. 6, has the following specific processing flow:

Step S601: obtain the first monocular view whose depth map is to be predicted.

Step S602: input the first monocular view into a pre-trained depth map prediction network model; the depth map prediction network model may be obtained by training with the training method of Fig. 5; the first monocular view is a left view or a right view.

Step S603: obtain the first predicted depth map output by the depth map prediction network model.
It can be seen that, with the embodiment of the present invention, the initial depth map prediction network model is trained on 3D images, so no true depth maps are needed for supervision, yet a depth map prediction network model can still be trained, realizing the conversion of 2D images into 3D images.

In addition, during the training of the depth map prediction network model, the embodiment of the present invention adds a camera parameter prediction network model to learn the camera parameters, which reduces the influence of differing camera parameters on depth map prediction during training; the depth maps predicted by the depth map prediction network model are therefore more realistic, the 3D images have richer depth layering, and the stereoscopic effect is stronger.
An embodiment of the present invention provides a schematic structural diagram of a device for converting a two-dimensional 2D image into a three-dimensional 3D image, as shown in Fig. 7, comprising:

a first 2D image obtaining unit 701, configured to obtain a two-dimensional 2D image to be converted into a three-dimensional 3D image;

a first 2D image input unit 702, configured to input the 2D image, as a first monocular view for generating the 3D image, into a pre-trained depth map prediction network model; the depth map prediction network model is obtained by training an initial depth map prediction network model and an initial camera parameter prediction network model on multiple different 3D film source samples; the first monocular view is a left view or a right view;

a first predicted depth map obtaining unit 703, configured to obtain a first predicted depth map output by the depth map prediction network model;

a second monocular view obtaining unit 704, configured to process the first predicted depth map based on the first predicted depth map, the camera parameters of the 2D image, the preset camera imaging formula and a preset first sampling mode, to obtain a second monocular view; the second monocular view is a right view or a left view corresponding to the first monocular view;

a 3D image generating unit 705, configured to generate the 3D image based on the first monocular view and the second monocular view.
Optionally, the device includes a depth map prediction network model training unit; the depth map prediction network model training unit comprises:

a training sample obtaining module, configured to obtain multiple different 3D film sources shot by different cameras as training samples, wherein each training sample includes a left view and a corresponding right view;

a second predicted right depth map obtaining module, configured to input the right views of a preset number of training samples into the initial depth map prediction network model, and obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model;

a predicted camera parameter obtaining module, configured to input the left views and right views of the preset number of training samples into the initial camera parameter prediction network model, and obtain the predicted camera parameters of each training sample output by the initial camera parameter prediction network model;

a second predicted right view obtaining module, configured to process the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula and the preset second sampling mode, to obtain the second predicted right view of each training sample;

a loss value calculating module, configured to calculate the loss value according to the preset loss function, the true right view of each training sample and its corresponding second predicted right view;

a convergence judging module, configured to judge according to the loss value whether the initial depth map prediction network model and the initial camera parameter prediction network model have converged to stability;

a training-iteration judging module, configured to, if the models have converged to stability, increase the training-iteration count by one and judge whether the preset number of training iterations has been reached; if the preset number has not been reached, trigger the second predicted right depth map obtaining module to input the right views of the preset number of training samples into the initial depth map prediction network model and obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model; if the preset number has been reached, take the current initial depth map prediction network model as the trained depth map prediction network model;

a network parameter adjusting module, configured to, if the models have not converged to stability, increase the training-iteration count by one, adjust the network parameters of the initial depth map prediction network model and the initial camera parameter prediction network model, and trigger the second predicted right depth map obtaining module to input the right views of the preset number of training samples into the depth map prediction network model to be trained and obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model.
Optionally, the predicted camera parameters include predicted camera intrinsics and predicted rotation-translation parameters;

the preset camera imaging formula is: P_s ~ K T_{t→s} D_t(P_t) K^{-1} P_t, where "~" denotes the mapping operation; P_s is the coordinate of a binocular-image reference point in the left view; P_t is the coordinate of the same reference point in the right view; K is the camera intrinsic matrix and K^{-1} its inverse; D_t(P_t) is the depth at the point P_t; and T_{t→s} is the rotation-translation matrix; the camera intrinsic matrix contains four parameters (fx, fy, x0, y0), where fx and fy are the camera focal lengths and x0 and y0 are the principal point coordinates;

the second predicted right view obtaining module is specifically configured to:

substitute the second predicted right depth map, the predicted camera intrinsics and the predicted rotation-translation parameters into the preset camera imaging formula to obtain the second predicted mapping coordinates of the right view in the left view;

sample the left view according to the second predicted mapping coordinates of the right view in the left view, to obtain the second predicted right view.
Optionally, the initial depth map prediction network model is a network based on the Visual Geometry Group (VGG) or U-net network structure;

the initial camera parameter prediction network model comprises 5 convolutional layers followed by a split into two branches, each branch containing 2 convolutional layers, 1 average pooling layer and 1 fully connected (FC) layer;
the preset loss function includes an SSIM+L1 loss function and a first-order gradient loss function.

A loss value is obtained from the predicted right view and the true right view according to the SSIM+L1 loss function. The SSIM+L1 loss function formula is:

L_ap = (1/N) Σ_{i,j} [ α · (1 − SSIM(R_ij, R̂_ij)) / 2 + (1 − α) · |R_ij − R̂_ij| ]

where L_ap denotes the loss value; N is the number of samples taken each time; R denotes the true right view and R̂ the predicted right view; the weight α is 0.85; SSIM(R, R̂) measures the structural similarity of the predicted right view and the true right view; |R − R̂| is the absolute-value (L1) error between the predicted right view and the true right view.

A loss value is also obtained from the predicted right depth map and the actual right view according to the first-order gradient loss function. The first-order gradient loss function formula is:

L_grad = (1/N) Σ_{i,j} ( |∂x d_ij| · e^{−|∂x R_ij|} + |∂y d_ij| · e^{−|∂y R_ij|} )

where L_grad denotes the loss value; N is the number of samples taken each time; ∂x d and ∂y d are the first derivatives of the right depth map in the x and y directions; ∂x R and ∂y R are the first derivatives of the right view in the x and y directions; i and j are pixel coordinates.

The final third loss function is L = L_ap + L_grad.
Optionally, the first monocular view is a left view, and the first predicted depth map is a first predicted left depth map;

the second monocular view obtaining unit comprises a first predicted right view obtaining module;

the first predicted right view obtaining module is configured to process the first predicted left depth map based on the first predicted left depth map, the camera parameters of the 2D image, the preset camera imaging formula and the preset first sampling mode, to obtain a first predicted right view;

the 3D image generating unit is specifically configured to generate the 3D image based on the left view and the first predicted right view.
Optionally, the camera parameters of the 2D image include the camera intrinsics and rotation-translation parameters of the 2D image;

the first predicted right view obtaining module is specifically configured to:

substitute the first predicted left depth map, the camera intrinsics of the 2D image and the rotation-translation parameters into the preset camera imaging formula to obtain the first predicted mapping coordinates of the left view in the right view;

sample the left view according to the first predicted mapping coordinates of the left view in the right view, to obtain the first predicted right view.
It can be seen that, with the embodiment of the present invention, the initial depth map prediction network model is trained on 3D images, so no true depth maps are needed for supervision, yet a depth map prediction network model can still be trained, realizing the conversion of 2D images into 3D images.

In addition, in this embodiment the second monocular view is obtained after processing the first predicted depth map based on the first predicted depth map, the camera parameters of the 2D image, the preset camera imaging formula and the preset first sampling mode. Since the camera parameters are taken into account during prediction, the obtained predicted right view is more realistic.
An embodiment of the present invention provides a schematic structural diagram of a training device for a depth map prediction network model, as shown in Fig. 8, comprising:

a training sample obtaining module 801, configured to obtain multiple different 3D film sources shot by different cameras as training samples, wherein each training sample includes a left view and a corresponding right view;

a second predicted right depth map obtaining module 802, configured to input the right views of a preset number of training samples into the initial depth map prediction network model, and obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model;

a predicted camera parameter obtaining module 803, configured to input the left views and right views of the preset number of training samples into the initial camera parameter prediction network model, and obtain the predicted camera parameters of each training sample output by the initial camera parameter prediction network model;

a second predicted right view obtaining module 804, configured to process the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula and the preset second sampling mode, to obtain the second predicted right view of each training sample;

a loss value calculating module 805, configured to calculate the loss value according to the preset loss function, the true right view of each training sample and its corresponding second predicted right view;

a convergence judging module 806, configured to judge according to the loss value whether the initial depth map prediction network model and the initial camera parameter prediction network model have converged to stability;

a training-iteration judging module 807, configured to, if the models have converged to stability, increase the training-iteration count by one and judge whether the preset number of training iterations has been reached; if the preset number has not been reached, trigger the second predicted right depth map obtaining module to input the right views of the preset number of training samples into the initial depth map prediction network model and obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model; if the preset number has been reached, take the current initial depth map prediction network model as the trained depth map prediction network model;

a network parameter adjusting module 808, configured to, if the models have not converged to stability, increase the training-iteration count by one, adjust the network parameters of the initial depth map prediction network model and the initial camera parameter prediction network model, and trigger the second predicted right depth map obtaining module to input the right views of the preset number of training samples into the depth map prediction network model to be trained and obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model.
Optionally, the predicted camera parameters include predicted camera intrinsics and predicted rotation-translation parameters;

the preset camera imaging formula is: P_s ~ K T_{t→s} D_t(P_t) K^{-1} P_t, where "~" denotes the mapping operation; P_s is the coordinate of a binocular-image reference point in the left view; P_t is the coordinate of the same reference point in the right view; K is the camera intrinsic matrix and K^{-1} its inverse; D_t(P_t) is the depth at the point P_t; and T_{t→s} is the rotation-translation matrix; the camera intrinsic matrix contains four parameters (fx, fy, x0, y0), where fx and fy are the camera focal lengths and x0 and y0 are the principal point coordinates;

the second predicted right view obtaining module is specifically configured to:

substitute the second predicted right depth map, the predicted camera intrinsics and the predicted rotation-translation parameters into the preset camera imaging formula to obtain the second predicted mapping coordinates of the right view in the left view;

sample the left view according to the second predicted mapping coordinates of the right view in the left view, to obtain the second predicted right view.
Optionally, the initial depth map prediction network model is a network based on the Visual Geometry Group (VGG) or U-net network structure;

the initial camera parameter prediction network model comprises 5 convolutional layers followed by a split into two branches, each branch containing 2 convolutional layers, 1 average pooling layer and 1 fully connected (FC) layer;
the preset loss function includes an SSIM+L1 loss function and a first-order gradient loss function.

A loss value is obtained from the predicted right view and the true right view according to the SSIM+L1 loss function. The SSIM+L1 loss function formula is:

L_ap = (1/N) Σ_{i,j} [ α · (1 − SSIM(R_ij, R̂_ij)) / 2 + (1 − α) · |R_ij − R̂_ij| ]

where L_ap denotes the loss value; N is the number of samples taken each time; R denotes the true right view and R̂ the predicted right view; the weight α is 0.85; SSIM(R, R̂) measures the structural similarity of the predicted right view and the true right view; |R − R̂| is the absolute-value (L1) error between the predicted right view and the true right view.

A loss value is also obtained from the predicted right depth map and the actual right view according to the first-order gradient loss function. The first-order gradient loss function formula is:

L_grad = (1/N) Σ_{i,j} ( |∂x d_ij| · e^{−|∂x R_ij|} + |∂y d_ij| · e^{−|∂y R_ij|} )

where L_grad denotes the loss value; N is the number of samples taken each time; ∂x d and ∂y d are the first derivatives of the right depth map in the x and y directions; ∂x R and ∂y R are the first derivatives of the right view in the x and y directions; i and j are pixel coordinates.

The final third loss function is L = L_ap + L_grad.
It can be seen that, with the embodiment of the present invention, the initial depth map prediction network model is trained on 3D images, so no true depth maps are needed for supervision, yet a depth map prediction network model can still be trained, realizing the conversion of 2D images into 3D images.

In addition, during the training of the depth map prediction network model, the embodiment of the present invention adds a camera parameter prediction network model to learn the camera parameters, which reduces the influence of differing camera parameters on depth map prediction during training; the depth maps predicted by the depth map prediction network model are therefore more realistic, the 3D images have richer depth layering, and the stereoscopic effect is stronger.
An embodiment of the present invention provides a schematic structural diagram of a depth map prediction device for an image, as shown in Fig. 9, comprising:

a first monocular view obtaining unit 901, configured to obtain a first monocular view whose depth map is to be predicted;

a first monocular view input unit 902, configured to input the first monocular view into a pre-trained depth map prediction network model; the depth map prediction network model is obtained by training with the device of Fig. 8; the first monocular view is a left view or a right view;

a first depth map obtaining unit 903, configured to obtain a first predicted depth map output by the depth map prediction network model.

It can be seen that, with the embodiment of the present invention, the initial depth map prediction network model is trained on 3D images, so no true depth maps are needed for supervision, yet a depth map prediction network model can still be trained, realizing the conversion of 2D images into 3D images.

In addition, during the training of the depth map prediction network model, the embodiment of the present invention adds a camera parameter prediction network model to learn the camera parameters, which reduces the influence of differing camera parameters on depth map prediction during training; the depth maps predicted by the depth map prediction network model are therefore more realistic, the 3D images have richer depth layering, and the stereoscopic effect is stronger.
An embodiment of the present invention further provides an electronic device, as shown in Fig. 10, comprising a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, wherein the processor 1001, the communication interface 1002 and the memory 1003 communicate with each other through the communication bus 1004;

the memory 1003 is configured to store a computer program;

the processor 1001 is configured to implement the following steps when executing the program stored on the memory 1003:

obtaining a two-dimensional 2D image to be converted into a three-dimensional 3D image;

inputting the 2D image, as a first monocular view for generating the 3D image, into a pre-trained depth map prediction network model; the depth map prediction network model is obtained by training an initial depth map prediction network model and an initial camera parameter prediction network model on multiple different 3D film source samples; the first monocular view is a left view or a right view;

obtaining a first predicted depth map output by the depth map prediction network model;

processing the first predicted depth map based on the first predicted depth map, the camera parameters of the 2D image, the preset camera imaging formula and a preset first sampling mode, to obtain a second monocular view; the second monocular view is a right view or a left view corresponding to the first monocular view;

generating the 3D image based on the first monocular view and the second monocular view.
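The processor steps above compose into a simple pipeline. In this sketch the trained depth network and the imaging-formula sampling step are passed in as callables (`predict_depth`, `synthesize_view`); both names are placeholders for this illustration, not APIs from the text.

```python
import numpy as np

def convert_2d_to_3d(image, predict_depth, synthesize_view, K, T):
    """Treat the 2D image as the first monocular view (here: the left
    view), predict its depth map, synthesize the second monocular view
    from the depth map and camera parameters, and return the stereo
    pair that makes up the 3D image."""
    depth = predict_depth(image)                 # first predicted depth map
    other = synthesize_view(image, depth, K, T)  # second monocular view
    return image, other

# Usage with trivial stand-ins for the trained network and the sampler:
img = np.zeros((4, 4))
left, right = convert_2d_to_3d(
    img,
    predict_depth=lambda im: np.ones_like(im),
    synthesize_view=lambda im, d, K, T: np.roll(im, 1, axis=1),
    K=np.eye(3),
    T=np.eye(4),
)
```

Keeping the network and the sampler behind callables mirrors the device structure above, where prediction and view synthesis are separate units.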
An embodiment of the present invention further provides another electronic device, as shown in Fig. 11, comprising a processor 1101, a communication interface 1102, a memory 1103 and a communication bus 1104, wherein the processor 1101, the communication interface 1102 and the memory 1103 communicate with each other through the communication bus 1104;

the memory 1103 is configured to store a computer program;

the processor 1101 is configured to, when executing the program stored on the memory 1103: obtain multiple different 3D film sources shot by different cameras as training samples, wherein each training sample includes a left view and a corresponding right view;

input the right views of a preset number of training samples into the initial depth map prediction network model, and obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model;

input the left views and right views of the preset number of training samples into the initial camera parameter prediction network model, and obtain the predicted camera parameters of each training sample output by the initial camera parameter prediction network model;

process the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula and the preset second sampling mode, to obtain the second predicted right view of each training sample;

calculate the loss value according to the preset loss function, the true right view of each training sample and its corresponding second predicted right view;

judge according to the loss value whether the initial depth map prediction network model and the initial camera parameter prediction network model have both converged to stability;

if they have converged to stability, increase the training-iteration count by one and judge whether the preset number of training iterations has been reached; if the preset number has not been reached, return to the step of inputting the right views of the preset number of training samples into the initial depth map prediction network model and obtaining the second predicted right depth map of each training sample output by the initial depth map prediction network model; if the preset number has been reached, take the current initial depth map prediction network model as the trained depth map prediction network model;

if they have not converged to stability, increase the training-iteration count by one, adjust the network parameters of the initial depth map prediction network model and the initial camera parameter prediction network model, and return to the step of inputting the right views of the preset number of training samples into the depth map prediction network model to be trained and obtaining the second predicted right depth map of each training sample output by the initial depth map prediction network model.
An embodiment of the present invention further provides another electronic device, as shown in Fig. 12, comprising a processor 1201, a communication interface 1202, a memory 1203 and a communication bus 1204, wherein the processor 1201, the communication interface 1202 and the memory 1203 communicate with each other through the communication bus 1204;

the memory 1203 is configured to store a computer program;

the processor 1201 is configured to, when executing the program stored on the memory 1203: obtain a first monocular view whose depth map is to be predicted;

input the first monocular view into a pre-trained depth map prediction network model; the depth map prediction network model is obtained by training with any of the above training methods; the first monocular view is a left view or a right view;

obtain a first predicted depth map output by the depth map prediction network model.
The communication bus mentioned for the above electronic devices may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is shown in the figures, but this does not mean there is only one bus or only one type of bus.

The communication interface is used for communication between the above electronic devices and other devices.

The memory may include a random access memory (RAM) and may also include a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the above methods for converting a two-dimensional 2D image into a three-dimensional 3D image, or the steps of any of the above training methods for a depth map prediction network model, or the steps of any of the above depth map prediction methods for an image.

In another embodiment provided by the present invention, a computer program product containing instructions is further provided, which, when run on a computer, causes the computer to execute any of the above methods for converting a 2D image into a 3D image, or any of the above training methods for a depth map prediction network model, or any of the above depth map prediction methods for an image.
In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, it may be realized wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless means (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., a Solid State Disk (SSD)), etc.
It should be noted that relational terms herein, such as "first" and "second", are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, the embodiments of the apparatus, the computer-readable storage medium, and the computer program product are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, reference may be made to the description of the method embodiments.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (23)
1. A method for converting a two-dimensional (2D) image into a three-dimensional (3D) image, characterized in that the method comprises:
obtaining a 2D image to be converted into a 3D image;
inputting the 2D image, as a first monocular view for generating the 3D image, into a pre-trained depth map prediction network model, wherein the depth map prediction network model is obtained by training an initial depth map prediction network model and an initial camera parameter prediction network model on a plurality of different 3D film source samples, and the first monocular view is a left view or a right view;
obtaining a first predicted depth map output by the depth map prediction network model;
processing the first predicted depth map based on the first predicted depth map, camera parameters of the 2D image, a preset camera imaging formula, and a preset first sampling mode, to obtain a second monocular view, wherein the second monocular view is a right view or a left view corresponding to the first monocular view; and
generating the 3D image based on the first monocular view and the second monocular view.
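The final step of the method composes the two monocular views into a single 3D frame. The patent does not fix an output format, so the "side-by-side" and "anaglyph" layouts below are purely illustrative assumptions for how such a composition step could be sketched:

```python
import numpy as np

def make_stereo_pair(left, right, mode="side_by_side"):
    """Compose a 3D frame from a left view and a right view (H x W x 3 arrays).

    Both output formats are illustrative choices, not mandated by the patent.
    """
    if mode == "side_by_side":
        # Place the two views next to each other along the width axis.
        return np.concatenate([left, right], axis=1)
    if mode == "anaglyph":
        # Red channel from the left view, green/blue channels from the right view.
        out = right.copy()
        out[..., 0] = left[..., 0]
        return out
    raise ValueError(mode)

left = np.zeros((4, 4, 3))
right = np.ones((4, 4, 3))
sbs = make_stereo_pair(left, right)              # shape (4, 8, 3)
ana = make_stereo_pair(left, right, "anaglyph")  # red plane from the left view
```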
2. The method according to claim 1, characterized in that the training process of the depth map prediction network model comprises:
obtaining a plurality of different 3D film sources shot by different cameras as training samples, wherein each training sample comprises a left view and a corresponding right view;
inputting the right views of a preset number of training samples into the initial depth map prediction network model, to obtain a second predicted right depth map of each training sample output by the initial depth map prediction network model;
inputting the left views and right views of the preset number of training samples into the initial camera parameter prediction network model, to obtain predicted camera parameters of each training sample output by the initial camera parameter prediction network model;
processing the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula, and a preset second sampling mode, to obtain a second predicted right view of each training sample;
calculating a loss value according to a preset loss function, the true right view in each training sample, and its corresponding second predicted right view;
judging, according to the loss value, whether the initial depth map prediction network model and the initial camera parameter prediction network model have converged to stability;
if they have converged to stability, incrementing the number of training iterations by one and judging whether a preset number of training iterations has been reached; if the preset number of training iterations has not been reached, returning to the step of inputting the right views of the preset number of training samples into the initial depth map prediction network model to obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model; and if the preset number of training iterations has been reached, taking the current initial depth map prediction network model as the trained depth map prediction network model; and
if they have not converged to stability, incrementing the number of training iterations by one, adjusting network parameters of the current initial depth map prediction network model and the initial camera parameter prediction network model, and returning to the step of inputting the right views of the preset number of training samples into the initial depth map prediction network model to obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model.
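The alternating convergence test and iteration count of this training procedure can be sketched as plain control flow. The `depth_net`, `cam_net`, `warp`, `loss_fn`, and `step` callables below are caller-supplied stand-ins, not the patent's actual networks:

```python
# Control-flow sketch of the training loop: predict right depth maps and
# camera parameters, synthesize right views by warping, compute a loss, then
# either stop (converged and enough iterations) or adjust parameters and repeat.
def train(samples, max_iters, depth_net, cam_net, warp, loss_fn, step, eps=1e-4):
    prev_loss, iters = float("inf"), 0
    while True:
        depths = [depth_net(s["right"]) for s in samples]           # second predicted right depth maps
        params = [cam_net(s["left"], s["right"]) for s in samples]  # predicted camera parameters
        preds = [warp(d, p, s["left"]) for d, p, s in zip(depths, params, samples)]
        loss = sum(loss_fn(s["right"], r) for s, r in zip(samples, preds)) / len(samples)
        converged = abs(prev_loss - loss) < eps   # "converged to stability"
        prev_loss, iters = loss, iters + 1
        if converged and iters >= max_iters:      # preset iteration count reached
            return depth_net                      # trained depth map prediction model
        if not converged:
            step(loss)                            # adjust network parameters
```

With trivial stubs (`warp` returning the left view and a zero loss), the loop terminates as soon as both the convergence test and the iteration budget are satisfied.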
3. The method according to claim 2, characterized in that:
the predicted camera parameters comprise predicted camera intrinsic parameters and predicted rotation-translation parameters;
the preset camera imaging formula is:
Ps ~ K·Tt→s·Dt(Pt)·K⁻¹·Pt
wherein ~ denotes the mapping operation; Ps is the coordinate of a binocular-image reference point in the left view; Pt is the coordinate of the binocular-image reference point in the right view; K is the camera intrinsic parameter matrix and K⁻¹ is its inverse; Dt(Pt) is the depth at the point Pt; and Tt→s is the rotation-translation matrix; the camera intrinsic parameter matrix comprises the four parameters (fx, fy, x0, y0), wherein fx and fy are the camera focal lengths and x0 and y0 are the principal point coordinates; and
the step of processing the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula, and the preset second sampling mode, to obtain the second predicted right view of each training sample, comprises:
substituting the second predicted right depth map, the predicted camera intrinsic parameters, and the predicted rotation-translation parameters into the preset camera imaging formula, to obtain second predicted mapping points of the right view in the left view; and
sampling the left view according to the second predicted mapping points of the right view in the left view, to obtain the second predicted right view.
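Under this formula, each right-view pixel Pt is back-projected with K⁻¹ and its depth Dt(Pt), rigidly transformed by Tt→s, and re-projected with K to obtain its mapping point in the left view. A NumPy sketch of that mapping (the intrinsics, baseline, and depth value below are made-up illustrative numbers, not from the patent):

```python
import numpy as np

def warp_points(K, T, depth, pts_t):
    """Map homogeneous pixel coordinates pts_t (3 x M) from the right view
    into the left view via Ps ~ K * Tt->s * Dt(Pt) * K^-1 * Pt."""
    K_inv = np.linalg.inv(K)
    # Back-project each right-view pixel to a 3D point in camera space.
    cam_pts = depth * (K_inv @ pts_t)                            # (3, M)
    cam_pts_h = np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])
    # Rigidly transform into the left camera frame and project with K.
    proj = K @ (T @ cam_pts_h)                                   # T is 3x4 [R|t]
    return proj[:2, :] / proj[2:3, :]                            # normalize homogeneous coords

# Illustrative intrinsics (fx, fy, x0, y0) and a pure horizontal translation
# standing in for a stereo baseline; all numbers are assumptions of the sketch.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T = np.hstack([np.eye(3), np.array([[0.1], [0.0], [0.0]])])
pt = np.array([[320.0], [240.0], [1.0]])  # right-view principal point
ps = warp_points(K, T, depth=5.0, pts_t=pt)
```

For a point at depth 5 with baseline 0.1, the induced disparity is fx·b/Z = 500·0.1/5 = 10 pixels, so the principal point (320, 240) maps to (330, 240).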
4. The method according to claim 2, characterized in that:
the initial depth map prediction network model is a network based on the Visual Geometry Group (VGG) or U-net network structure;
the initial camera parameter prediction network model comprises 5 convolutional layers followed by a split into an upper branch and a lower branch, each branch comprising 2 convolutional layers, 1 average pooling layer, and 1 fully connected (FC) layer;
the preset loss function comprises an SSIM+L1 loss function and a first-order gradient loss function;
a loss value is obtained from the predicted right view and the true right view according to the SSIM+L1 loss function, whose formula is:
Lp = (1/N)·Σi,j [ α·(1 − SSIM(Rij, R̂ij))/2 + (1 − α)·|Rij − R̂ij| ]
wherein Lp denotes the loss value; N denotes that N samples are taken each time; R denotes the true right view and R̂ denotes the predicted right view; the weight α is 0.85; SSIM(Rij, R̂ij) denotes the structural similarity between the predicted right view and the true right view; and |Rij − R̂ij| denotes the absolute value error (L1) between the predicted right view and the true right view;
a loss value is obtained from the predicted right depth map and the actual right view according to the first-order gradient loss function, whose formula is:
Lg = (1/N)·Σi,j ( |∂xDij|·e^(−|∂xRij|) + |∂yDij|·e^(−|∂yRij|) )
wherein Lg denotes the loss value; ∂xDij denotes the first derivative of the right depth map in the x direction and ∂yDij denotes its first derivative in the y direction; ∂xRij denotes the first derivative of the right view in the x direction and ∂yRij denotes its first derivative in the y direction; N denotes that N samples are taken each time; and i, j denote pixel coordinates; and
the final third loss function is L = Lp + Lg.
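The two loss terms above can be sketched as follows. Two simplifications are assumptions of this sketch: the SSIM is computed from whole-image statistics rather than the usual sliding window, and the constants c1, c2 are the standard SSIM defaults:

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    # Simplified SSIM over whole-image statistics; the usual formulation
    # uses a sliding window, but this suffices to illustrate the loss term.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2*mx*my + c1) * (2*cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def ssim_l1_loss(r_true, r_pred, alpha=0.85):
    # alpha * (1 - SSIM)/2 + (1 - alpha) * |R - R_hat|, averaged over pixels.
    ssim_term = (1.0 - ssim_global(r_true, r_pred)) / 2.0
    l1_term = np.abs(r_true - r_pred).mean()
    return alpha * ssim_term + (1.0 - alpha) * l1_term

def gradient_loss(depth, view):
    # Edge-aware first-order gradient loss: depth gradients are penalized
    # less wherever the image itself has strong gradients.
    dx_d = np.abs(np.diff(depth, axis=1))
    dy_d = np.abs(np.diff(depth, axis=0))
    dx_r = np.abs(np.diff(view, axis=1))
    dy_r = np.abs(np.diff(view, axis=0))
    return (dx_d * np.exp(-dx_r)).mean() + (dy_d * np.exp(-dy_r)).mean()

rng = np.random.default_rng(0)
img = rng.random((8, 8))
loss_same = ssim_l1_loss(img, img)          # identical views -> zero loss
flat = gradient_loss(np.ones((8, 8)), img)  # constant depth -> zero gradient loss
```

As a sanity check, the SSIM+L1 loss between a view and itself is zero, and a perfectly flat depth map incurs no gradient penalty regardless of the image content.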
5. The method according to claim 1, characterized in that:
the first monocular view is a left view, and the first predicted depth map is a first predicted left depth map;
the step of processing the first predicted depth map based on the first predicted depth map, the camera parameters of the 2D image, the preset camera imaging formula, and the preset first sampling mode, to obtain the second monocular view, comprises:
processing the first predicted left depth map based on the first predicted left depth map, the camera parameters of the 2D image, the preset camera imaging formula, and the preset first sampling mode, to obtain a first predicted right view; and
the step of generating the 3D image based on the first monocular view and the second monocular view comprises:
generating the 3D image based on the left view and the first predicted right view.
6. The method according to claim 5, characterized in that:
the camera parameters of the 2D image comprise camera intrinsic parameters and rotation-translation parameters of the 2D image;
the step of processing the first predicted left depth map based on the first predicted left depth map, the camera parameters of the 2D image, the preset camera imaging formula, and the preset first sampling mode, to obtain the first predicted right view, comprises:
substituting the first predicted left depth map, the camera intrinsic parameters of the 2D image, and the rotation-translation parameters into the preset camera imaging formula, to obtain first predicted mapping points of the left view in the right view; and
sampling the left view according to the first predicted mapping points of the left view in the right view, to obtain the first predicted right view.
7. A training method for a depth map prediction network model, characterized by comprising:
obtaining a plurality of different 3D film sources shot by different cameras as training samples, wherein each training sample comprises a left view and a corresponding right view;
inputting the right views of a preset number of training samples into an initial depth map prediction network model, to obtain a second predicted right depth map of each training sample output by the initial depth map prediction network model;
inputting the left views and right views of the preset number of training samples into an initial camera parameter prediction network model, to obtain predicted camera parameters of each training sample output by the initial camera parameter prediction network model;
processing the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, a preset camera imaging formula, and a preset second sampling mode, to obtain a second predicted right view of each training sample;
calculating a loss value according to a preset loss function, the true right view in each training sample, and its corresponding second predicted right view;
judging, according to the loss value, whether the initial depth map prediction network model and the initial camera parameter prediction network model have converged to stability;
if they have converged to stability, incrementing the number of training iterations by one and judging whether a preset number of training iterations has been reached; if the preset number of training iterations has not been reached, returning to the step of inputting the right views of the preset number of training samples into the initial depth map prediction network model to obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model; and if the preset number of training iterations has been reached, taking the current initial depth map prediction network model as the trained depth map prediction network model; and
if they have not converged to stability, incrementing the number of training iterations by one, adjusting network parameters of the current initial depth map prediction network model and the initial camera parameter prediction network model, and returning to the step of inputting the right views of the preset number of training samples into the initial depth map prediction network model to obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model.
8. The method according to claim 7, characterized in that:
the predicted camera parameters comprise predicted camera intrinsic parameters and predicted rotation-translation parameters;
the preset camera imaging formula is:
Ps ~ K·Tt→s·Dt(Pt)·K⁻¹·Pt
wherein ~ denotes the mapping operation; Ps is the coordinate of a binocular-image reference point in the left view; Pt is the coordinate of the binocular-image reference point in the right view; K is the camera intrinsic parameter matrix and K⁻¹ is its inverse; Dt(Pt) is the depth at the point Pt; and Tt→s is the rotation-translation matrix; the camera intrinsic parameter matrix comprises the four parameters (fx, fy, x0, y0), wherein fx and fy are the camera focal lengths and x0 and y0 are the principal point coordinates; and
the step of processing the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula, and the preset second sampling mode, to obtain the second predicted right view of each training sample, comprises:
substituting the second predicted right depth map, the predicted camera intrinsic parameters, and the predicted rotation-translation parameters into the preset camera imaging formula, to obtain second predicted mapping points of the right view in the left view; and
sampling the left view according to the second predicted mapping points of the right view in the left view, to obtain the second predicted right view.
9. The method according to claim 8, characterized in that:
the initial depth map prediction network model is a network based on the Visual Geometry Group (VGG) or U-net network structure;
the initial camera parameter prediction network model comprises 5 convolutional layers followed by a split into an upper branch and a lower branch, each branch comprising 2 convolutional layers, 1 average pooling layer, and 1 fully connected (FC) layer;
the preset loss function comprises an SSIM+L1 loss function and a first-order gradient loss function;
a loss value is obtained from the predicted right view and the true right view according to the SSIM+L1 loss function, whose formula is:
Lp = (1/N)·Σi,j [ α·(1 − SSIM(Rij, R̂ij))/2 + (1 − α)·|Rij − R̂ij| ]
wherein Lp denotes the loss value; N denotes that N samples are taken each time; R denotes the true right view and R̂ denotes the predicted right view; the weight α is 0.85; SSIM(Rij, R̂ij) denotes the structural similarity between the predicted right view and the true right view; and |Rij − R̂ij| denotes the absolute value error (L1) between the predicted right view and the true right view;
a loss value is obtained from the predicted right depth map and the actual right view according to the first-order gradient loss function, whose formula is:
Lg = (1/N)·Σi,j ( |∂xDij|·e^(−|∂xRij|) + |∂yDij|·e^(−|∂yRij|) )
wherein Lg denotes the loss value; ∂xDij denotes the first derivative of the right depth map in the x direction and ∂yDij denotes its first derivative in the y direction; ∂xRij denotes the first derivative of the right view in the x direction and ∂yRij denotes its first derivative in the y direction; N denotes that N samples are taken each time; and i, j denote pixel coordinates; and
the final third loss function is L = Lp + Lg.
10. A depth map prediction method for an image, characterized by comprising:
obtaining a first monocular view whose depth map is to be predicted;
inputting the first monocular view into a pre-trained depth map prediction network model, wherein the depth map prediction network model is obtained by training according to the training method of any one of claims 7 to 9, and the first monocular view is a left view or a right view; and
obtaining a first predicted depth map output by the depth map prediction network model.
11. An apparatus for converting a two-dimensional (2D) image into a three-dimensional (3D) image, characterized in that the apparatus comprises:
a first 2D image obtaining unit, configured to obtain a 2D image to be converted into a 3D image;
a first 2D image input unit, configured to input the 2D image, as a first monocular view for generating the 3D image, into a pre-trained depth map prediction network model, wherein the depth map prediction network model is obtained by training an initial depth map prediction network model and an initial camera parameter prediction network model on a plurality of different 3D film source samples, and the first monocular view is a left view or a right view;
a first predicted depth map obtaining unit, configured to obtain a first predicted depth map output by the depth map prediction network model;
a second monocular view obtaining unit, configured to process the first predicted depth map based on the first predicted depth map, camera parameters of the 2D image, a preset camera imaging formula, and a preset first sampling mode, to obtain a second monocular view, wherein the second monocular view is a right view or a left view corresponding to the first monocular view; and
a 3D image generation unit, configured to generate the 3D image based on the first monocular view and the second monocular view.
12. The apparatus according to claim 11, characterized in that the apparatus comprises a depth map prediction network model training unit, wherein the depth map prediction network model training unit comprises:
a training sample obtaining module, configured to obtain a plurality of different 3D film sources shot by different cameras as training samples, wherein each training sample comprises a left view and a corresponding right view;
a second predicted right depth map obtaining module, configured to input the right views of a preset number of training samples into the initial depth map prediction network model, to obtain a second predicted right depth map of each training sample output by the initial depth map prediction network model;
a predicted camera parameter obtaining module, configured to input the left views and right views of the preset number of training samples into the initial camera parameter prediction network model, to obtain predicted camera parameters of each training sample output by the initial camera parameter prediction network model;
a second predicted right view obtaining module, configured to process the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, the preset camera imaging formula, and a preset second sampling mode, to obtain a second predicted right view of each training sample;
a loss value calculation module, configured to calculate a loss value according to a preset loss function, the true right view in each training sample, and its corresponding second predicted right view;
a convergence judgment module, configured to judge, according to the loss value, whether the initial depth map prediction network model and the initial camera parameter prediction network model have converged to stability;
a training iteration judgment module, configured to: if the models have converged to stability, increment the number of training iterations by one and judge whether a preset number of training iterations has been reached; if the preset number of training iterations has not been reached, trigger the second predicted right depth map obtaining module to input the right views of the preset number of training samples into the initial depth map prediction network model and obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model; and if the preset number of training iterations has been reached, take the current initial depth map prediction network model as the trained depth map prediction network model; and
a network parameter adjustment module, configured to: if the models have not converged to stability, increment the number of training iterations by one, adjust network parameters of the current initial depth map prediction network model and the initial camera parameter prediction network model, and trigger the second predicted right depth map obtaining module to input the right views of the preset number of training samples into the initial depth map prediction network model and obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model.
13. The apparatus according to claim 12, characterized in that:
the predicted camera parameters comprise predicted camera intrinsic parameters and predicted rotation-translation parameters;
the preset camera imaging formula is:
Ps ~ K·Tt→s·Dt(Pt)·K⁻¹·Pt
wherein ~ denotes the mapping operation; Ps is the coordinate of a binocular-image reference point in the left view; Pt is the coordinate of the binocular-image reference point in the right view; K is the camera intrinsic parameter matrix and K⁻¹ is its inverse; Dt(Pt) is the depth at the point Pt; and Tt→s is the rotation-translation matrix; the camera intrinsic parameter matrix comprises the four parameters (fx, fy, x0, y0), wherein fx and fy are the camera focal lengths and x0 and y0 are the principal point coordinates; and
the second predicted right view obtaining module is specifically configured to:
substitute the second predicted right depth map, the predicted camera intrinsic parameters, and the predicted rotation-translation parameters into the preset camera imaging formula, to obtain second predicted mapping points of the right view in the left view; and
sample the left view according to the second predicted mapping points of the right view in the left view, to obtain the second predicted right view.
14. The apparatus according to claim 12, characterized in that:
the initial depth map prediction network model is a network based on the Visual Geometry Group (VGG) or U-net network structure;
the initial camera parameter prediction network model comprises 5 convolutional layers followed by a split into an upper branch and a lower branch, each branch comprising 2 convolutional layers, 1 average pooling layer, and 1 fully connected (FC) layer;
the preset loss function comprises an SSIM+L1 loss function and a first-order gradient loss function;
a loss value is obtained from the predicted right view and the true right view according to the SSIM+L1 loss function, whose formula is:
Lp = (1/N)·Σi,j [ α·(1 − SSIM(Rij, R̂ij))/2 + (1 − α)·|Rij − R̂ij| ]
wherein Lp denotes the loss value; N denotes that N samples are taken each time; R denotes the true right view and R̂ denotes the predicted right view; the weight α is 0.85; SSIM(Rij, R̂ij) denotes the structural similarity between the predicted right view and the true right view; and |Rij − R̂ij| denotes the absolute value error (L1) between the predicted right view and the true right view;
a loss value is obtained from the predicted right depth map and the actual right view according to the first-order gradient loss function, whose formula is:
Lg = (1/N)·Σi,j ( |∂xDij|·e^(−|∂xRij|) + |∂yDij|·e^(−|∂yRij|) )
wherein Lg denotes the loss value; ∂xDij denotes the first derivative of the right depth map in the x direction and ∂yDij denotes its first derivative in the y direction; ∂xRij denotes the first derivative of the right view in the x direction and ∂yRij denotes its first derivative in the y direction; N denotes that N samples are taken each time; and i, j denote pixel coordinates; and
the final third loss function is L = Lp + Lg.
15. The apparatus according to claim 11, characterized in that:
the first monocular view is a left view, and the first predicted depth map is a first predicted left depth map;
the second monocular view obtaining unit comprises a first predicted right view obtaining module;
the first predicted right view obtaining module is configured to process the first predicted left depth map based on the first predicted left depth map, the camera parameters of the 2D image, the preset camera imaging formula, and the preset first sampling mode, to obtain a first predicted right view; and
the 3D image generation unit is specifically configured to generate the 3D image based on the left view and the first predicted right view.
16. The apparatus according to claim 15, characterized in that:
the camera parameters of the 2D image comprise camera intrinsic parameters and rotation-translation parameters of the 2D image;
the first predicted right view obtaining module is specifically configured to:
substitute the first predicted left depth map, the camera intrinsic parameters of the 2D image, and the rotation-translation parameters into the preset camera imaging formula, to obtain first predicted mapping points of the left view in the right view; and
sample the left view according to the first predicted mapping points of the left view in the right view, to obtain the first predicted right view.
17. A training apparatus for a depth map prediction network model, characterized by comprising:
a training sample obtaining module, configured to obtain a plurality of different 3D film sources shot by different cameras as training samples, wherein each training sample comprises a left view and a corresponding right view;
a second predicted right depth map obtaining module, configured to input the right views of a preset number of training samples into an initial depth map prediction network model, to obtain a second predicted right depth map of each training sample output by the initial depth map prediction network model;
a predicted camera parameter obtaining module, configured to input the left views and right views of the preset number of training samples into an initial camera parameter prediction network model, to obtain predicted camera parameters of each training sample output by the initial camera parameter prediction network model;
a second predicted right view obtaining module, configured to process the second predicted right depth map based on the second predicted right depth map of each training sample, the predicted camera parameters of each training sample, a preset camera imaging formula, and a preset second sampling mode, to obtain a second predicted right view of each training sample;
a loss value calculation module, configured to calculate a loss value according to a preset loss function, the true right view in each training sample, and its corresponding second predicted right view;
a convergence judgment module, configured to judge, according to the loss value, whether the initial depth map prediction network model and the initial camera parameter prediction network model have converged to stability;
a training iteration judgment module, configured to: if the models have converged to stability, increment the number of training iterations by one and judge whether a preset number of training iterations has been reached; if the preset number of training iterations has not been reached, trigger the second predicted right depth map obtaining module to input the right views of the preset number of training samples into the initial depth map prediction network model and obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model; and if the preset number of training iterations has been reached, take the current initial depth map prediction network model as the trained depth map prediction network model; and
a network parameter adjustment module, configured to: if the models have not converged to stability, increment the number of training iterations by one, adjust network parameters of the initial depth map prediction network model and the initial camera parameter prediction network model, and trigger the second predicted right depth map obtaining module to input the right views of the preset number of training samples into the initial depth map prediction network model and obtain the second predicted right depth map of each training sample output by the initial depth map prediction network model.
18. The apparatus according to claim 17, characterized in that:
the predicted camera parameters comprise predicted camera intrinsic parameters and predicted rotation-translation parameters;
the preset camera imaging formula is:
Ps ~ K·Tt→s·Dt(Pt)·K⁻¹·Pt
wherein ~ denotes the mapping operation; Ps is the coordinate of a binocular-image reference point in the left view; Pt is the coordinate of the binocular-image reference point in the right view; K is the camera intrinsic parameter matrix and K⁻¹ is its inverse; Dt(Pt) is the depth at the point Pt; and Tt→s is the rotation-translation matrix; the camera intrinsic parameter matrix comprises the four parameters (fx, fy, x0, y0), wherein fx and fy are the camera focal lengths and x0 and y0 are the principal point coordinates; and
the second predicted right view obtaining module is specifically configured to:
substitute the second predicted right depth map, the predicted camera intrinsic parameters, and the predicted rotation-translation parameters into the preset camera imaging formula, to obtain second predicted mapping points of the right view in the left view; and
sample the left view according to the second predicted mapping points of the right view in the left view, to obtain the second predicted right view.
19. device according to claim 18, which is characterized in that
The initial depth figure predicts network model are as follows: the network of view-based access control model geometry group VGG or U-net network structure;
The initial camera parameter prediction network model includes: to be divided into both direction up and down after 5 layers of convolution, and each direction includes 2
Layer convolution, 1 layer of average pond and 1 layer of full FC layers of connection;
The preset loss function includes: SSIM+L1 loss function and First-order Gradient loss function;
Penalty values are obtained according to the SSIM+L1 loss function according to prediction right view and true right view
The SSIM+L1 loss function formula are as follows:
Wherein,Indicate penalty values;N expression takes N number of sample every time;R indicates right view;α weight is 0.85;It represents true
Right view,Represent prediction right view;Indicate that prediction right view is similar to the structure of true right view
Property;Indicate the absolute value error L1 of prediction right view and true right view;
a loss value is obtained from the predicted right depth map and the actual right view according to the first-order gradient loss function, whose formula is:

$L_{grad}=\frac{1}{N}\sum_{i,j}\left(\left|\partial_x d_{ij}\right|e^{-\left|\partial_x I_{ij}\right|}+\left|\partial_y d_{ij}\right|e^{-\left|\partial_y I_{ij}\right|}\right)$

where $L_{grad}$ denotes the loss value; $\partial_x d$ denotes the first derivative of the right depth map in the x direction; N denotes the number of samples taken each time; $\partial_y d$ denotes the first derivative of the right depth map in the y direction; $\partial_x I$ denotes the first derivative of the right view in the x direction; $\partial_y I$ denotes the first derivative of the right view in the y direction; and i, j denote the pixel coordinates.

The final third loss function is $L_3=L_{SSIM+L1}+L_{grad}$.
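The first-order gradient term can be sketched as an edge-aware smoothness penalty: depth gradients are penalized, but attenuated where the right view itself has strong gradients. The exponential attenuation is a common choice in self-supervised depth estimation and is an assumption here, since the claim names only the depth-map and view derivatives.

```python
import math

def gradient_loss(depth, image):
    """Mean edge-aware first-order gradient penalty over a 2-D grid.

    `depth` and `image` are equal-sized lists of rows; forward
    differences approximate the x- and y-direction first derivatives.
    """
    h, w = len(depth), len(depth[0])
    total, count = 0.0, 0
    for i in range(h):
        for j in range(w):
            if j + 1 < w:  # first derivative in the x direction
                total += abs(depth[i][j + 1] - depth[i][j]) * \
                         math.exp(-abs(image[i][j + 1] - image[i][j]))
                count += 1
            if i + 1 < h:  # first derivative in the y direction
                total += abs(depth[i + 1][j] - depth[i][j]) * \
                         math.exp(-abs(image[i + 1][j] - image[i][j]))
                count += 1
    return total / count
```

A constant depth map incurs zero loss regardless of the image; depth discontinuities are penalized most where the view is smooth, encouraging depth edges to align with image edges.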
20. A depth map prediction device for an image, wherein the device comprises:
a first monocular view obtaining unit, configured to obtain a first monocular view whose depth map is to be predicted;
a first monocular view input unit, configured to input the first monocular view into a pre-trained depth map prediction network model; the depth map prediction network model is obtained by training based on the training method of any one of claims 7 to 9; the first monocular view is a left view or a right view;
a first depth map obtaining unit, configured to obtain a first predicted depth map output by the depth map prediction network model.
21. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor, when executing the program stored in the memory, implements the method steps of any one of claims 1 to 6.
22. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor, when executing the program stored in the memory, implements the method steps of any one of claims 7 to 9.
23. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor, when executing the program stored in the memory, implements the method steps of claim 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910381527.6A CN110111244B (en) | 2019-05-08 | 2019-05-08 | Image conversion, depth map prediction and model training method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110111244A true CN110111244A (en) | 2019-08-09 |
CN110111244B CN110111244B (en) | 2024-01-26 |
Family
ID=67488916
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910381527.6A Active CN110111244B (en) | 2019-05-08 | 2019-05-08 | Image conversion, depth map prediction and model training method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110111244B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106412560A (en) * | 2016-09-28 | 2017-02-15 | 湖南优象科技有限公司 | Three-dimensional image generating method based on depth map |
WO2018046964A1 (en) * | 2016-09-12 | 2018-03-15 | Ucl Business Plc | Predicting depth from image data using a statistical model |
CN109255831A (en) * | 2018-09-21 | 2019-01-22 | 南京大学 | The method that single-view face three-dimensional reconstruction and texture based on multi-task learning generate |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111429501A (en) * | 2020-03-25 | 2020-07-17 | 贝壳技术有限公司 | Depth map prediction model generation method and device and depth map prediction method and device |
CN111445518A (en) * | 2020-03-25 | 2020-07-24 | 贝壳技术有限公司 | Image conversion method and device, depth map prediction method and device |
CN111445518B (en) * | 2020-03-25 | 2023-04-18 | 如你所视(北京)科技有限公司 | Image conversion method and device, depth map prediction method and device |
CN112468828A (en) * | 2020-11-25 | 2021-03-09 | 深圳大学 | Code rate allocation method and device for panoramic video, mobile terminal and storage medium |
CN112468828B (en) * | 2020-11-25 | 2022-06-17 | 深圳大学 | Code rate distribution method and device for panoramic video, mobile terminal and storage medium |
CN112530003A (en) * | 2020-12-11 | 2021-03-19 | 北京奇艺世纪科技有限公司 | Three-dimensional human hand reconstruction method and device and electronic equipment |
CN112530003B (en) * | 2020-12-11 | 2023-10-27 | 北京奇艺世纪科技有限公司 | Three-dimensional human hand reconstruction method and device and electronic equipment |
CN116740158A (en) * | 2023-08-14 | 2023-09-12 | 小米汽车科技有限公司 | Image depth determining method, device and storage medium |
CN116740158B (en) * | 2023-08-14 | 2023-12-05 | 小米汽车科技有限公司 | Image depth determining method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110111244B (en) | 2024-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110111244A (en) | Image conversion, depth map prediction and model training method, device and electronic equipment | |
JP7392227B2 (en) | Feature pyramid warping for video frame interpolation | |
US20180063504A1 (en) | Selective culling of multi-dimensional data sets | |
CN110599395B (en) | Target image generation method, device, server and storage medium | |
JP2019534606A (en) | Method and apparatus for reconstructing a point cloud representing a scene using light field data | |
EP3511909A1 (en) | Image processing method and device for projecting image of virtual reality content | |
US11871127B2 (en) | High-speed video from camera arrays | |
CN113034380A (en) | Video space-time super-resolution method and device based on improved deformable convolution correction | |
Li et al. | PolarGlobe: A web-wide virtual globe system for visualizing multidimensional, time-varying, big climate data | |
CN105493501A (en) | Virtual video camera | |
CN109934764A (en) | Processing method, device, terminal, server and the storage medium of panoramic video file | |
JP2018109958A (en) | Method and apparatus for encoding signal transporting data to reconfigure sparse matrix | |
CN109120869A (en) | Double light image integration methods, integration equipment and unmanned plane | |
CN111667438B (en) | Video reconstruction method, system, device and computer readable storage medium | |
CN109934307A (en) | Disparity map prediction model training method, prediction technique, device and electronic equipment | |
CN113852829A (en) | Method and device for encapsulating and decapsulating point cloud media file and storage medium | |
CN115359173A (en) | Virtual multi-view video generation method and device, electronic equipment and storage medium | |
CN111243085A (en) | Training method and device for image reconstruction network model and electronic equipment | |
CN110324585B (en) | SLAM system implementation method based on high-speed mobile platform | |
CN107707830A (en) | Panoramic video based on one-way communication plays camera system | |
CN112785634A (en) | Computer device and synthetic depth map generation method | |
CN115272667A (en) | Farmland image segmentation model training method and device, electronic equipment and medium | |
JP5086120B2 (en) | Depth information acquisition method, depth information acquisition device, program, and recording medium | |
CN115209185A (en) | Video frame insertion method and device and readable storage medium | |
JP2011210232A (en) | Image conversion device, image generation system, image conversion method, and image generation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||