CN110223382B - Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning - Google Patents


Info

Publication number
CN110223382B
CN110223382B (application CN201910509328.9A)
Authority
CN
China
Prior art keywords
viewpoint
camera
dimensional model
coordinates
free viewpoint
Prior art date
Legal status
Active
Application number
CN201910509328.9A
Other languages
Chinese (zh)
Other versions
CN110223382A (en)
Inventor
杨路
李佑华
杨经纶
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201910509328.9A priority Critical patent/CN110223382B/en
Publication of CN110223382A publication Critical patent/CN110223382A/en
Application granted granted Critical
Publication of CN110223382B publication Critical patent/CN110223382B/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects

Abstract

The invention discloses a single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning, which comprises the following steps: generating training samples; acquiring the high-level semantics of the picture with a feature extraction network; decoupling the image semantics through a decoupling network into a viewpoint-independent three-dimensional model point cloud and camera viewpoint parameters; reconstructing the viewpoint-independent three-dimensional model; estimating the camera viewpoint and generating a free viewpoint; generating the free viewpoint three-dimensional model; and training the deep learning model. The method can simply and efficiently reconstruct a free viewpoint three-dimensional model from a single-frame image, improves the generalization of the model and widens the application range.

Description

Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning
Technical Field
The invention relates to the field of three-dimensional model reconstruction, in particular to a single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning.
Background
The free viewpoint three-dimensional model builds on the ordinary three-dimensional model and preserves a consistent stereoscopic impression as the viewing angle is switched, so it can provide a more realistic and natural visual environment for stereoscopic multimedia. Because of the complexity of three-dimensional models, the traditional way of generating free viewpoint three-dimensional models is costly: workers must manually render the model under different viewpoints to produce additional viewpoint-specific models, which is inefficient and cumbersome. How to generate free viewpoint three-dimensional models simply and efficiently has long been a research hotspot and holds great application potential.
The viewpoint-independent three-dimensional model can be regarded as a special free viewpoint three-dimensional model at the initial viewpoint: the two share the same shape and differ only in viewpoint. A free viewpoint three-dimensional model can therefore be generated by applying a viewpoint transformation to the viewpoint-independent three-dimensional model. Viewpoint-independent three-dimensional models have broad application prospects in pose estimation, object tracking, target detection and related fields. In three-dimensional object pose estimation, a researcher matches features of a pre-established viewpoint-independent three-dimensional model against the two-dimensional contour in the image to estimate the pose; in three-dimensional object tracking and detection, the viewpoint of an object often changes significantly, and the camera must track the three-dimensional motion of the object independently of the viewpoint, so that feature extraction and result matching can be carried out conveniently and efficiently.
At present, deep learning has achieved great success in single-frame image three-dimensional reconstruction. Using the strong feature extraction capability of convolutional neural networks, a researcher can fully extract prior knowledge such as shape semantics and viewpoint semantics from a single-frame image, obtain high-level abstract semantic features with strong generalization ability, map them to geometric parameters with specific meaning, and use these parameters to guide the reconstruction of the three-dimensional model. However, much current work binds the learning of the three-dimensional shape and of the viewpoint together, so the generated three-dimensional model is only valid for a single camera viewpoint, lacks flexibility, cannot change with the viewpoint, and is limited in practical application. How to widen the application range of such generation methods and improve the efficiency of three-dimensional model generation remains an open problem.
Disclosure of Invention
The invention aims to solve the technical problem of providing a method for reconstructing a three-dimensional model of a free viewpoint of a single-frame image based on deep learning, which can simply and efficiently reconstruct the three-dimensional model of the free viewpoint from the single-frame image.
In order to solve the technical problems, the invention adopts the technical scheme that:
a single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning comprises the following steps:
step one: sampling and rendering a CAD model to generate the real-shape point cloud at the initial viewpoint and single-frame images at different viewpoints and distances;
step two: gradually acquiring high-level semantics of the image through deepening of a feature extraction network;
step three: converting the high-level semantics of the image through a decoupling network, and outputting point cloud coordinates of the viewpoint-independent three-dimensional model and camera viewpoint parameters;
step four: correcting the point cloud coordinates of the viewpoint-independent three-dimensional model output by the decoupling network, and performing three-dimensional shape reconstruction by triangular plate fitting to obtain a viewpoint-independent three-dimensional model;
step five: camera viewpoint parameters output by the decoupling network are subjected to homogeneous transformation to obtain camera viewpoints, and transformation is performed on the basis to generate free viewpoints;
step six: multiplying the free viewpoint by the viewpoint-independent three-dimensional model to obtain a free viewpoint three-dimensional model;
step seven: inputting the training sample into a neural network for automatic training, gradually updating network parameters, and optimizing a free viewpoint three-dimensional model to obtain an optimal result.
Further, in the step one, OpenGL is adopted to sample and render the CAD model to generate a training sample.
Further, in step two, ResNet is used as a feature extraction network, that is, feature extraction is performed on the input image of each training sample by using the following formula:
$$F_n^{(i,j)} = \mathrm{ResNet}\left(I_n^{(i,j)}\right), \quad n = 1, \dots, N$$

wherein N is a positive integer; $F_n^{(i,j)}$ represents the semantic information generated from the jth image of the ith CAD training sample in the nth class; ResNet represents the feature extraction network; and $I_n^{(i,j)}$ represents the jth image of the ith CAD training sample in the nth class.
Further, the third step is specifically:
converting the extracted high-level semantics by applying the global average pooling layer and the fully connected layer of the neural network to the convolved feature map, to obtain the viewpoint-independent three-dimensional model point cloud and the camera viewpoint parameters; the camera viewpoint parameters include: the camera pose, represented by Euler angles comprising the pitch angle pitch (γ), the roll angle roll (β) and the yaw angle yaw (α); and the camera coordinates, represented by the coordinates (t_x, t_y, t_z) of the camera in the initial coordinate system.
Further, the fourth step is specifically:
correcting the point cloud coordinates output by the decoupling network, and fitting triangular facets to the densely distributed point cloud to form a viewpoint-independent three-dimensional surface model;
correcting the negative values in the point cloud coordinates required for the viewpoint-independent three-dimensional reconstruction by the following formula:
$$\hat{Y} = \mathrm{ReLU}(Y)$$

wherein Y represents the output of the decoupling network, $\hat{Y}$ represents the corrected final output, and the ReLU function denotes the rectified linear unit, which sets negative values to zero.
Further, the fifth step is specifically:
the camera pose and coordinates are obtained by applying a homogeneous transformation to the camera viewpoint parameters output by the decoupling network; the three Euler angles, namely the pitch angle pitch (γ), the roll angle roll (β) and the yaw angle yaw (α), are used to compute a rotation matrix representing the camera pose; t_x, t_y, t_z represent the distance of the camera from the CAD model in the initial coordinate system, i.e. the camera coordinates; the coordinates of the CAD model in the camera coordinate system are obtained through the homogeneous transformation, as follows:
$$R = R_z(\alpha)\, R_y(\gamma)\, R_x(\beta), \qquad t = (t_x, t_y, t_z)^T$$

$$\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} = T \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}, \qquad T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}$$
wherein x, y, z represent the fixed coordinates of the CAD model; x', y', z' represent the CAD model coordinates in the camera coordinate system; R is the rotation matrix computed from the Euler angles pitch (γ), roll (β) and yaw (α), and represents the camera pose; t_x, t_y, t_z represent the distance of the camera from the CAD model in the initial coordinate system, i.e. the camera coordinates; T is the pose transformation matrix representing the camera viewpoint, comprising pose and coordinates;
generating a free viewpoint on the basis of the estimated camera viewpoint: taking the CAD model as the sphere center, the camera coordinates are changed so that the camera moves on the sphere, the camera pose is adjusted so that the camera points at the CAD model, the corresponding R' and t' are recorded, and the free viewpoint is obtained according to the following formulas:
$$x^2 + y^2 + z^2 = t_x^2 + t_y^2 + t_z^2$$

$$t' = (x, y, z)$$

$$T' = \begin{bmatrix} R' & t'^T \\ 0 & 1 \end{bmatrix}$$
wherein t', i.e. (x, y, z), represents the coordinates of the free viewpoint, R' represents the camera pose at that moment, and T' represents the pose transformation matrix at that moment, representing the pose and coordinates of the free viewpoint.
Further, the sixth step is:
multiplying the learned viewpoint-independent three-dimensional model by the free viewpoint to obtain a three-dimensional model of the free viewpoint by the following formula:
$$\mathrm{Model}_c = T' \cdot \mathrm{Model}_i$$

wherein Model_i represents the viewpoint-independent three-dimensional model and Model_c represents the free viewpoint three-dimensional model.
Further, the seventh step is specifically:
training the deep learning model with a weighted sum of the chamfer distance and the earth mover's distance between the three-dimensional model predicted by the network and the real free viewpoint three-dimensional model, as in the following formulas:
$$loss_{EMD} = \min_{f:\, P \to Q}\ \sum_{x \in P} \| x - f(x) \|_2$$

$$loss_{CD} = \sum_{x \in P} \min_{y \in Q} \| x - y \|_2^2 + \sum_{y \in Q} \min_{x \in P} \| x - y \|_2^2$$

$$Loss = \lambda_1\, loss_{EMD} + \lambda_2\, loss_{CD}$$

wherein loss_EMD and loss_CD respectively represent the earth mover's distance loss and the chamfer distance loss between the three-dimensional model predicted by the network and the real free viewpoint three-dimensional model; λ1, λ2 represent the loss weights; P represents the point cloud predicted by the network; Q represents the real free viewpoint three-dimensional model; ||·||_2 represents the two-norm; f(x) represents a one-to-one (bijective) mapping from P to Q.
Compared with the prior art, the invention has the following beneficial effects: 1) based on a deep learning neural network, the three-dimensional reconstruction task is completed through training and learning, which improves reconstruction efficiency and reduces operational difficulty; 2) the neural network decoupling separates three-dimensional shape reconstruction from viewpoint learning, so that both the viewpoint-independent three-dimensional model and the camera viewpoint estimate are obtained, making the method highly capable; 3) on this basis, the learned camera viewpoint can be converted into a free viewpoint and multiplied with the viewpoint-independent three-dimensional model to obtain the three-dimensional model at the free viewpoint, giving the method a wide application range.
Drawings
FIG. 1 is a flow chart of a method for reconstructing a free viewpoint three-dimensional model of a single frame image based on deep learning;
FIG. 2 is a schematic diagram of training sample generation;
FIG. 3 is a data transmission diagram of a feature extraction network;
FIG. 4 is a graph of image semantic decoupling;
fig. 5 is a schematic diagram of a free viewpoint three-dimensional model reconstruction process.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. According to the method, through a deep learning neural network, the viewpoint-independent three-dimensional model reconstruction in the three-dimensional reconstruction is decoupled from a camera viewpoint estimation task. On one hand, the viewpoint-independent three-dimensional model does not change along with the change of the camera viewpoint, and has remarkable mobility and adaptability; on the other hand, a free viewpoint can be derived from the estimated camera viewpoint and multiplied by the viewpoint-independent three-dimensional model to obtain a free viewpoint three-dimensional model, so that the application range of the method is expanded.
It is worth noting that the neural network decoupling of the present invention does not rely on any forcible constraint; it is guided automatically by the physical computation in which the viewpoint-independent model and the free viewpoint jointly generate the free viewpoint three-dimensional model. This demonstrates that the neural network can be guided and regularized through physical computation, without resorting to strongly supervised, black-box learning, so the decoupling has a degree of interpretability and is persuasive.
The invention can be implemented on Windows and Linux platforms, and the programming language can be freely chosen, for example Python.
As shown in fig. 1, the method for reconstructing a free viewpoint three-dimensional model of a single frame image based on deep learning includes the following steps:
Step one: Generating training samples
OpenGL is used to sample and render the CAD model to generate training samples for neural network training. Of course, the method for generating training samples is not limited to OpenGL; any method that achieves the same technical effect may be used.
A CAD data set is selected. The ModelNet data set provided by Princeton is widely used in computer vision, computer graphics, robotics and cognitive science; its coverage is comprehensive and its model objects are of high quality. ModelNet contains 662 object categories and 127,915 CAD models, including ten categories of orientation-aligned data, and provides several subsets. This embodiment selects the ModelNet40 subset, which contains 40 classes of CAD models.
A single-frame image is generated. Using the camera simulation function of OpenGL, the three-dimensional object is placed at a suitable position in the scene, and the camera is set to different angles and distances so that the object is projected onto the two-dimensional image plane; a fixed 224 x 224 viewport is used to capture the rendered images. 24 images are generated for each CAD model, and the camera angle and distance corresponding to each image are recorded, as shown on the right side of fig. 2.
The real shape of the CAD model is sampled. In this embodiment, OpenGL is used to fix the CAD model at a suitable position, and point cloud sampling is performed on the model to obtain its real shape; 4096 points are collected in total, as shown on the left side of fig. 2.
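The following NumPy sketch illustrates one way to draw 4096 points uniformly from the surface of a triangle mesh, given its vertex and face arrays; the function name and the mesh layout are assumptions made for illustration only and stand in for the OpenGL-based sampling described above.

```python
import numpy as np

def sample_point_cloud(vertices, faces, n_points=4096):
    """Uniformly sample n_points from a triangle mesh (vertices: (V, 3), faces: (F, 3) indices)."""
    tri = vertices[faces]                                   # (F, 3, 3) triangle corners
    # Triangle areas: 0.5 * |(v1 - v0) x (v2 - v0)|; used as sampling weights.
    areas = 0.5 * np.linalg.norm(
        np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    idx = np.random.choice(len(faces), size=n_points, p=areas / areas.sum())
    # Random barycentric coordinates inside each selected triangle.
    u, v = np.random.rand(n_points, 1), np.random.rand(n_points, 1)
    flip = (u + v) > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    t = tri[idx]
    return t[:, 0] + u * (t[:, 1] - t[:, 0]) + v * (t[:, 2] - t[:, 0])
```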
The training set and the test set are divided. For convenience of training, the obtained sample data set is shuffled, and 4/5 of each category is extracted to form the training data set; the remaining 1/5 of each category is stored separately and used to test the model performance on that category.
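A minimal sketch of this per-category 4/5 to 1/5 split, assuming the samples of each category are held in Python lists keyed by class name; the dictionary layout is an assumption for illustration.

```python
import random

def split_dataset(samples_by_class, train_fraction=0.8, seed=0):
    """Shuffle each category and split it into 4/5 training and 1/5 test data."""
    rng = random.Random(seed)
    train, test = {}, {}
    for cls, samples in samples_by_class.items():
        samples = list(samples)
        rng.shuffle(samples)
        cut = int(len(samples) * train_fraction)
        train[cls], test[cls] = samples[:cut], samples[cut:]
    return train, test
```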
Step two: high level semantic extraction of images
Feature extraction is performed on the input image of each training sample to obtain its high-level semantic features. Because each CAD model has 24 input images at different angles and distances, the 24 images of each sample are processed with the same feature extraction model. In this embodiment, ResNet is used as the feature extraction network, but feature extraction networks with other numbers of layers may be used in other embodiments; networks of different depths, such as ResNet50 and ResNet101, can be chosen for three-dimensional models of different complexity.
The input picture of each training sample may be feature extracted by:
$$F_n^{(i,j)} = \mathrm{ResNet}\left(I_n^{(i,j)}\right), \quad n = 1, \dots, N$$

wherein N is a positive integer; in this embodiment N is 40, representing the 40 training data categories of the ModelNet40 subset. $F_n^{(i,j)}$ represents the semantic information generated from the jth image of the ith CAD training sample in the nth class, with j at most 24; ResNet represents the feature extraction network; and $I_n^{(i,j)}$ represents the jth image of the ith CAD training sample in the nth class.
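As an illustration of this step, the following PyTorch/torchvision sketch extracts the convolutional feature map of the 24 renderings of one CAD model; the choice of ResNet50, the random input tensor and the batch layout are illustrative assumptions only.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Keep conv1 through conv5_x and drop the classification head,
# so the network outputs a spatial feature map instead of class scores.
backbone = resnet50()
feature_extractor = nn.Sequential(*list(backbone.children())[:-2])

images = torch.randn(24, 3, 224, 224)       # the 24 renderings of one CAD model
with torch.no_grad():
    features = feature_extractor(images)    # shape: (24, 2048, 7, 7)
print(features.shape)
```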
The core of ResNet consists of shortcut connections and building blocks. The whole feature extraction network is divided into five parts: conv1 is an ordinary convolutional layer, while conv2_x, conv3_x, conv4_x and conv5_x are building blocks composed of convolutional layers and shortcut connections, as shown on the left side of fig. 4.
When a shortcut connection participates in a building block, two cases are distinguished according to whether the number of channels of the input feature map equals the number of channels of the output feature map. The relationship between the input and the output is as follows:
$$F(x) = \begin{cases} \mathrm{conv}(x) + x, & c(x) = c(F(x)) \\ \mathrm{conv}(x) + Wx, & c(x) \neq c(F(x)) \end{cases}$$

wherein x represents the input feature map, F(x) represents the output feature map, conv(·) denotes the stacked convolutions of the building block, c(·) denotes the number of channels of a feature map, and W is a convolution operation that adjusts the number of channels on the shortcut branch.
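A compact PyTorch sketch of such a building block, written only to make the two shortcut cases concrete; the layer sizes and the class name are illustrative assumptions rather than the exact configuration used by the invention.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Building block: output = convolutional branch + (identity or 1x1-projected) shortcut."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        if in_channels == out_channels and stride == 1:
            self.shortcut = nn.Identity()   # channel counts match: identity shortcut
        else:
            # W: 1x1 convolution that adjusts the channel count of the shortcut
            self.shortcut = nn.Conv2d(in_channels, out_channels, 1, stride, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + self.shortcut(x))
```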
Step three: high-level semantic decoupling of images
The extracted high-level semantics are converted by applying the global average pooling layer and the fully connected layer of the neural network to the convolved feature map, yielding the viewpoint-independent three-dimensional model point cloud and the camera viewpoint parameters. The camera viewpoint parameters include: the camera pose, represented by the Euler angles (pitch (γ), roll (β), yaw (α)); and the camera position, represented by the coordinates (t_x, t_y, t_z) of the camera in the initial coordinate system, with the CAD model at the origin.
In global average pooling, the feature map of each channel is averaged over its whole spatial extent, producing a single output element per channel, i.e. a global semantic mean. The relationship between the input and the output of the global average pooling layer is as follows:
GAP(i)=mean(conv(i))
wherein GAP(i) is the output of the global average pooling layer for channel i, conv(i) is the input feature map of channel i, mean() takes the mean over the whole spatial extent of the input data, and i indexes the input and output channels;
and in the full-connection layer, the output of the global average pooling is used as the input of the full-connection layer, and the point cloud coordinates of the free viewpoint three-dimensional model and the total number of the camera viewpoint estimation parameters are matched by setting the number of neurons of the full-connection layer. The input and output of the fully connected layer are related by the following formula:
$$Y_j = \sum_{i} w_{ij} X_i + b_j$$

wherein X represents the input data and i indexes the input channels; Y represents the output data and j indexes the output channels, whose total number equals the number of three-dimensional model point cloud coordinates plus the number of camera viewpoint parameters; w and b are the weights and bias of the fully connected layer.
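A minimal PyTorch sketch of the decoupling head described in this step: global average pooling followed by a fully connected layer whose output is split into 4096 x 3 point coordinates and 6 camera viewpoint parameters. The module name, the input channel count and the random input are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DecouplingHead(nn.Module):
    """Map a (B, C, H, W) feature map to a point cloud and camera viewpoint parameters."""
    def __init__(self, in_channels=2048, n_points=4096, n_view_params=6):
        super().__init__()
        self.n_points = n_points
        self.gap = nn.AdaptiveAvgPool2d(1)                       # global average pooling
        self.fc = nn.Linear(in_channels, n_points * 3 + n_view_params)

    def forward(self, feat):
        x = self.gap(feat).flatten(1)                            # (B, C)
        out = self.fc(x)                                         # (B, 3*N + 6)
        points = out[:, :self.n_points * 3].view(-1, self.n_points, 3)
        view_params = out[:, self.n_points * 3:]                 # (pitch, roll, yaw, tx, ty, tz)
        return points, view_params

head = DecouplingHead()
points, view = head(torch.randn(2, 2048, 7, 7))
print(points.shape, view.shape)                                  # (2, 4096, 3) and (2, 6)
```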
Step four: viewpoint-independent three-dimensional model reconstruction
The point cloud coordinates output by the decoupling network are corrected, and triangular facets are fitted to the densely distributed point cloud to form a continuous, accurate and well-behaved three-dimensional surface model that is independent of the viewpoint.
The negative values in the point cloud coordinates needed for the viewpoint-independent reconstruction are corrected by:
$$\hat{Y} = \mathrm{ReLU}(Y)$$

wherein Y represents the output of the decoupling network, $\hat{Y}$ represents the corrected final output, and the ReLU function denotes the rectified linear unit, which sets negative values to zero.
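As a small illustration of this correction, a NumPy sketch that clamps negative point coordinates to zero; fitting triangular facets to the corrected point cloud would follow as a separate step and is not shown here.

```python
import numpy as np

def correct_points(raw_points):
    """ReLU correction: clamp negative point cloud coordinates to zero."""
    return np.maximum(raw_points, 0.0)

corrected = correct_points(np.random.randn(4096, 3))
assert corrected.min() >= 0.0
```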
Step five: camera viewpoint estimation and free viewpoint generation
The camera pose and coordinates are obtained by applying a homogeneous transformation to the camera viewpoint parameters output by the decoupling network. The three Euler angles, namely the pitch angle pitch (γ), the roll angle roll (β) and the yaw angle yaw (α), are used to compute a rotation matrix representing the camera pose; t_x, t_y, t_z represent the distance of the camera from the CAD model in the initial coordinate system, i.e. the camera coordinates. The coordinates of the CAD model in the camera coordinate system are obtained through the homogeneous transformation, as follows:
$$R = R_z(\alpha)\, R_y(\gamma)\, R_x(\beta), \qquad t = (t_x, t_y, t_z)^T$$

$$\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} = T \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}, \qquad T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}$$
wherein x, y, z represent the fixed position of the CAD model; x', y', z' represent the CAD model coordinates in the camera coordinate system; R is the rotation matrix computed from the Euler angles pitch (γ), roll (β) and yaw (α), and represents the camera pose; t_x, t_y, t_z represent the distance of the camera from the CAD model in the initial coordinate system, i.e. the camera coordinates; T is the pose transformation matrix representing the camera viewpoint (including pose and coordinates).
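A NumPy sketch of this viewpoint transformation; the Euler angle convention (yaw about z, pitch about y, roll about x) and the helper names are assumptions, since the text does not fix a particular convention.

```python
import numpy as np

def rotation_from_euler(pitch, roll, yaw):
    """Rotation matrix R, here assembled as R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (assumed convention)."""
    cx, sx = np.cos(roll), np.sin(roll)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cz, sz = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def pose_matrix(R, t):
    """Homogeneous 4x4 pose transformation T = [[R, t], [0, 1]]."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# Transform the CAD model points (N, 3) into the camera coordinate system.
R = rotation_from_euler(pitch=0.1, roll=0.0, yaw=0.5)
T = pose_matrix(R, np.array([0.0, 0.0, 2.0]))
points = np.random.rand(4096, 3)
points_h = np.hstack([points, np.ones((4096, 1))])     # homogeneous coordinates
points_cam = (T @ points_h.T).T[:, :3]
```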
Based on the estimated camera viewpoint, a free viewpoint is generated: taking the CAD model as the sphere center, the camera position is changed so that the camera moves on the sphere, the camera pose is adjusted so that the camera points at the CAD model, the corresponding R' and t' are recorded, and the free viewpoint is obtained according to the following formulas:
$$x^2 + y^2 + z^2 = t_x^2 + t_y^2 + t_z^2$$

$$t' = (x, y, z)$$

$$T' = \begin{bmatrix} R' & t'^T \\ 0 & 1 \end{bmatrix}$$
wherein t', i.e. (x, y, z), represents the position of the free viewpoint, R' represents the camera pose at that moment, and T' represents the pose transformation matrix at that moment, representing the free viewpoint (including pose and coordinates).
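A NumPy sketch of generating a free viewpoint on the sphere around the model; the look-at construction used to aim the camera at the CAD model is one common choice and is an assumption rather than the formula prescribed by the invention.

```python
import numpy as np

def look_at_rotation(cam_pos, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Rotation R' that orients a camera at cam_pos toward the target (standard look-at)."""
    z = target - cam_pos
    z = z / np.linalg.norm(z)                   # forward axis, toward the CAD model
    x = np.cross(up, z)
    x = x / np.linalg.norm(x)                   # right axis
    y = np.cross(z, x)                          # camera up axis
    return np.stack([x, y, z], axis=0)

def free_viewpoint(t, azimuth, elevation):
    """Place the camera on the sphere of radius |t| centred on the model and build T'."""
    r = np.linalg.norm(t)
    cam_pos = r * np.array([np.cos(elevation) * np.cos(azimuth),
                            np.cos(elevation) * np.sin(azimuth),
                            np.sin(elevation)])
    T_free = np.eye(4)
    T_free[:3, :3] = look_at_rotation(cam_pos)  # R'
    T_free[:3, 3] = cam_pos                     # t'
    return T_free
```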
Step six: free viewpoint three-dimensional model generation
The viewpoint-independent three-dimensional model and the free viewpoint three-dimensional model differ only in viewpoint, i.e. in camera pose and position, while their shape information is the same. The learned viewpoint-independent three-dimensional model is multiplied by the free viewpoint according to the following formula to obtain the three-dimensional model at the free viewpoint:
$$\mathrm{Model}_c = T' \cdot \mathrm{Model}_i$$

wherein Model_i represents the viewpoint-independent three-dimensional model and Model_c represents the free viewpoint three-dimensional model.
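A short NumPy sketch of this multiplication in homogeneous coordinates; the function name is hypothetical, and T_free is assumed to be a 4 x 4 free viewpoint matrix such as the one produced by the previous sketch.

```python
import numpy as np

def apply_viewpoint(model_points, T_free):
    """Model_c = T' * Model_i: transform the viewpoint-independent point cloud to the free viewpoint."""
    n = model_points.shape[0]
    homog = np.hstack([model_points, np.ones((n, 1))])   # (N, 4) homogeneous coordinates
    return (T_free @ homog.T).T[:, :3]                   # free viewpoint point cloud, (N, 3)
```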
Step seven: Deep learning model training
The deep learning model is trained with a weighted sum of the chamfer distance and the earth mover's distance between the three-dimensional model predicted by the network and the real free viewpoint three-dimensional model, as in the following formulas:
$$loss_{EMD} = \min_{f:\, P \to Q}\ \sum_{x \in P} \| x - f(x) \|_2$$

$$loss_{CD} = \sum_{x \in P} \min_{y \in Q} \| x - y \|_2^2 + \sum_{y \in Q} \min_{x \in P} \| x - y \|_2^2$$

$$Loss = \lambda_1\, loss_{EMD} + \lambda_2\, loss_{CD}$$

wherein loss_EMD and loss_CD respectively represent the earth mover's distance loss and the chamfer distance loss between the three-dimensional model predicted by the network and the real free viewpoint three-dimensional model; λ1, λ2 represent the loss weights; P represents the point cloud predicted by the network; Q represents the real free viewpoint three-dimensional model; ||·||_2 represents the two-norm; f(x) represents a one-to-one (bijective) mapping from P to Q.
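A NumPy/SciPy sketch of the two losses; here the earth mover's distance uses an exact optimal one-to-one assignment, which in practice is usually replaced by an approximation for 4096-point clouds, and the loss weights are placeholders.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def chamfer_distance(P, Q):
    """Sum of squared nearest-neighbour distances in both directions."""
    d = cdist(P, Q)                                    # pairwise Euclidean distances
    return (d.min(axis=1) ** 2).sum() + (d.min(axis=0) ** 2).sum()

def earth_mover_distance(P, Q):
    """Exact EMD via an optimal bijection between equally sized point sets."""
    d = cdist(P, Q)
    row, col = linear_sum_assignment(d)                # one-to-one mapping f: P -> Q
    return d[row, col].sum()

def total_loss(P, Q, lam1=1.0, lam2=1.0):
    return lam1 * earth_mover_distance(P, Q) + lam2 * chamfer_distance(P, Q)
```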
Unlike conventional three-dimensional reconstruction methods, the deep-learning-based single-frame image free viewpoint three-dimensional model reconstruction method of the invention automatically recovers the three-dimensional model from a single image without manual operation. More importantly, the invention separates the generation of the viewpoint-independent three-dimensional model from the estimation of the camera viewpoint while also generating the free viewpoint three-dimensional model. The viewpoint-independent three-dimensional model is convenient for pose estimation, tracking, detection and related fields; the free viewpoint three-dimensional model can be used to expand three-dimensional data sets, reducing data cost and improving the efficiency of three-dimensional reconstruction. Overall, compared with traditional work, the invention provides a more flexible three-dimensional model reconstruction method that, while fulfilling the basic reconstruction task, improves the generalization of the model and widens its application range.

Claims (7)

1. A single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning is characterized by comprising the following steps:
step one: sampling and rendering a CAD model to generate the real-shape point cloud at the initial viewpoint and single-frame images at different viewpoints and distances;
step two: gradually acquiring high-level semantics of the image through deepening of a feature extraction network;
step three: converting the high-level semantics of the image through a decoupling network, and outputting point cloud coordinates of the viewpoint-independent three-dimensional model and camera viewpoint parameters;
step four: correcting the point cloud coordinates of the viewpoint-independent three-dimensional model output by the decoupling network, and performing three-dimensional shape reconstruction by triangular plate fitting to obtain a viewpoint-independent three-dimensional model;
step five: camera viewpoint parameters output by the decoupling network are subjected to homogeneous transformation to obtain camera viewpoints, and transformation is performed on the basis to generate free viewpoints;
step six: multiplying the free viewpoint by the viewpoint-independent three-dimensional model to obtain a free viewpoint three-dimensional model;
step seven: inputting the training sample into a neural network for automatic training, gradually updating network parameters, and optimizing a free viewpoint three-dimensional model to obtain an optimal result.
2. The method for reconstructing the free viewpoint three-dimensional model of single frame image based on deep learning of claim 1, wherein in the first step, the CAD model is sampled and rendered by OpenGL to generate training samples.
3. The method for reconstructing the free viewpoint three-dimensional model of the single frame image based on the deep learning of claim 1, wherein in the second step, ResNet is used as the feature extraction network, that is, the feature extraction is performed on the input image of each training sample by using the following formula:
$$F_n^{(i,j)} = \mathrm{ResNet}\left(I_n^{(i,j)}\right), \quad n = 1, \dots, N$$

wherein N is a positive integer; $F_n^{(i,j)}$ represents the semantic information generated from the jth image of the ith CAD training sample in the nth class; ResNet represents the feature extraction network; and $I_n^{(i,j)}$ represents the jth image of the ith CAD training sample in the nth class.
4. The method for reconstructing the free viewpoint three-dimensional model of the single frame image based on the deep learning as claimed in claim 1, wherein the third step is specifically as follows:
converting the extracted high-level semantics by applying the global average pooling layer and the fully connected layer of the neural network to the convolved feature map, to obtain the viewpoint-independent three-dimensional model point cloud and the camera viewpoint parameters; the camera viewpoint parameters include: the camera pose, represented by Euler angles comprising a pitch angle γ, a roll angle β and a yaw angle α; and the camera coordinates, represented by the coordinates (t_x, t_y, t_z) of the camera in the initial coordinate system.
5. The method for reconstructing the free viewpoint three-dimensional model of the single frame image based on the deep learning as claimed in claim 1, wherein the step four is specifically as follows:
correcting the point cloud coordinates output by the decoupling network, and fitting triangular facets to the densely distributed point cloud to form a viewpoint-independent three-dimensional surface model;
correcting the negative values in the point cloud coordinates required for the viewpoint-independent three-dimensional reconstruction by the following formula:
$$\hat{Y} = \mathrm{ReLU}(Y)$$

wherein Y represents the output of the decoupling network, $\hat{Y}$ represents the corrected final output, and the ReLU function denotes the rectified linear unit, which sets negative values to zero.
6. The method for reconstructing the free viewpoint three-dimensional model of the single frame image based on the deep learning as claimed in claim 1, wherein the step five is specifically as follows:
camera pose and coordinates are obtained by applying a homogeneous transformation to the camera viewpoint parameters output by the decoupling network; the three Euler angles, namely a pitch angle γ, a roll angle β and a yaw angle α, are used to compute a rotation matrix representing the camera pose; t_x, t_y, t_z represent the distance of the camera from the CAD model in the initial coordinate system, i.e. the camera coordinates; the coordinates of the CAD model in the camera coordinate system are obtained through the homogeneous transformation, as follows:
$$R = R_z(\alpha)\, R_y(\gamma)\, R_x(\beta), \qquad t = (t_x, t_y, t_z)^T$$

$$\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} = T \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}, \qquad T = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}$$
wherein x, y, z represent the fixed coordinates of the CAD model; x', y', z' represent the CAD model coordinates in the camera coordinate system; R is a rotation matrix computed from the Euler angles comprising the pitch angle γ, the roll angle β and the yaw angle α, and represents the camera pose; t_x, t_y, t_z represent the distance of the camera from the CAD model in the initial coordinate system, i.e. the camera coordinates; T is a pose transformation matrix representing the camera viewpoint, comprising pose and coordinates;
generating a free viewpoint on the basis of the estimated camera viewpoint: taking the CAD model as the sphere center, changing the camera coordinates so that the camera moves on the sphere, adjusting the camera pose so that the camera points at the CAD model, recording R' and t' at that moment, and obtaining the free viewpoint according to the following formulas:
$$x^2 + y^2 + z^2 = t_x^2 + t_y^2 + t_z^2$$

$$t' = (x, y, z)$$

$$T' = \begin{bmatrix} R' & t'^T \\ 0 & 1 \end{bmatrix}$$
wherein t', i.e. (x, y, z), represents the coordinates of the free viewpoint, R' represents the pose of the camera at that time, and T' represents the pose transformation matrix at that time, representing the pose and coordinates of the free viewpoint.
7. The method for reconstructing the free viewpoint three-dimensional model of the single frame image based on the deep learning as claimed in claim 1, wherein the sixth step is:
multiplying the learned viewpoint-independent three-dimensional model by the free viewpoint to obtain a three-dimensional model of the free viewpoint by the following formula:
$$\mathrm{Model}_c = T' \cdot \mathrm{Model}_i$$

wherein Model_i represents the viewpoint-independent three-dimensional model and Model_c represents the free viewpoint three-dimensional model.
CN201910509328.9A 2019-06-13 2019-06-13 Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning Active CN110223382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910509328.9A CN110223382B (en) 2019-06-13 2019-06-13 Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910509328.9A CN110223382B (en) 2019-06-13 2019-06-13 Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning

Publications (2)

Publication Number Publication Date
CN110223382A CN110223382A (en) 2019-09-10
CN110223382B true CN110223382B (en) 2021-02-12

Family

ID=67816761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910509328.9A Active CN110223382B (en) 2019-06-13 2019-06-13 Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning

Country Status (1)

Country Link
CN (1) CN110223382B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11375176B2 (en) * 2019-02-05 2022-06-28 Nvidia Corporation Few-shot viewpoint estimation
CN110675488B (en) * 2019-09-24 2023-02-28 电子科技大学 Method for constructing modeling system of creative three-dimensional voxel model based on deep learning
CN110798673B (en) * 2019-11-13 2021-03-19 南京大学 Free viewpoint video generation and interaction method based on deep convolutional neural network
CN113569761B (en) * 2021-07-30 2023-10-27 广西师范大学 Student viewpoint estimation method based on deep learning
CN115375884B (en) * 2022-08-03 2023-05-30 北京微视威信息科技有限公司 Free viewpoint synthesis model generation method, image drawing method and electronic device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719286A (en) * 2009-12-09 2010-06-02 北京大学 Multiple viewpoints three-dimensional scene reconstructing method fusing single viewpoint scenario analysis and system thereof
CN106778687A (en) * 2017-01-16 2017-05-31 大连理工大学 Method for viewing points detecting based on local evaluation and global optimization

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100583127C (en) * 2008-01-14 2010-01-20 浙江大学 An identification method for movement by human bodies irrelevant with the viewpoint based on stencil matching
US9767598B2 (en) * 2012-05-31 2017-09-19 Microsoft Technology Licensing, Llc Smoothing and robust normal estimation for 3D point clouds
JP5295416B1 (en) * 2012-08-01 2013-09-18 ヤフー株式会社 Image processing apparatus, image processing method, and image processing program
US10389994B2 (en) * 2016-11-28 2019-08-20 Sony Corporation Decoder-centric UV codec for free-viewpoint video streaming
US11665308B2 (en) * 2017-01-31 2023-05-30 Tetavi, Ltd. System and method for rendering free viewpoint video for sport applications

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719286A (en) * 2009-12-09 2010-06-02 北京大学 Multiple viewpoints three-dimensional scene reconstructing method fusing single viewpoint scenario analysis and system thereof
CN106778687A (en) * 2017-01-16 2017-05-31 大连理工大学 Method for viewing points detecting based on local evaluation and global optimization

Also Published As

Publication number Publication date
CN110223382A (en) 2019-09-10

Similar Documents

Publication Publication Date Title
CN110223382B (en) Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning
CN108416840B (en) Three-dimensional scene dense reconstruction method based on monocular camera
CN111462329B (en) Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning
CN108921926B (en) End-to-end three-dimensional face reconstruction method based on single image
CN111968217B (en) SMPL parameter prediction and human body model generation method based on picture
CN110009674B (en) Monocular image depth of field real-time calculation method based on unsupervised depth learning
CN111553949B (en) Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN111899328B (en) Point cloud three-dimensional reconstruction method based on RGB data and generation countermeasure network
CN109815847B (en) Visual SLAM method based on semantic constraint
CN111062326B (en) Self-supervision human body 3D gesture estimation network training method based on geometric driving
CN113822284B (en) RGBD image semantic segmentation method based on boundary attention
CN112767467B (en) Double-image depth estimation method based on self-supervision deep learning
CN110135277B (en) Human behavior recognition method based on convolutional neural network
CN113392584B (en) Visual navigation method based on deep reinforcement learning and direction estimation
CN111797692B (en) Depth image gesture estimation method based on semi-supervised learning
CN113313732A (en) Forward-looking scene depth estimation method based on self-supervision learning
US20220351463A1 (en) Method, computer device and storage medium for real-time urban scene reconstruction
CN110930306A (en) Depth map super-resolution reconstruction network construction method based on non-local perception
CN115147545A (en) Scene three-dimensional intelligent reconstruction system and method based on BIM and deep learning
CN113593043B (en) Point cloud three-dimensional reconstruction method and system based on generation countermeasure network
CN116188550A (en) Self-supervision depth vision odometer based on geometric constraint
CN117132651A (en) Three-dimensional human body posture estimation method integrating color image and depth image
CN112669452A (en) Object positioning method based on convolutional neural network multi-branch structure
CN114897955B (en) Depth completion method based on micro-geometric propagation
CN113592947B (en) Method for realizing visual odometer by semi-direct method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant