CN110223382A - Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning - Google Patents
- Publication number
- CN110223382A CN110223382A CN201910509328.9A CN201910509328A CN110223382A CN 110223382 A CN110223382 A CN 110223382A CN 201910509328 A CN201910509328 A CN 201910509328A CN 110223382 A CN110223382 A CN 110223382A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
Abstract
The invention discloses a single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning, comprising the following steps: generating training samples; extracting high-level image semantics with a feature extraction network; decoupling the image semantics through a decoupling network to output a viewpoint-independent three-dimensional point cloud and camera viewpoint parameters; reconstructing the viewpoint-independent three-dimensional model; estimating the camera viewpoint and generating free viewpoints; generating the free-viewpoint three-dimensional model; and training the deep learning model. The method can simply and efficiently reconstruct a free-viewpoint three-dimensional model from a single image, improving the generality of the model and widening its scope of application.
Description
Technical field
The present invention relates to the field of three-dimensional model reconstruction, and in particular to a single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning.
Background art
A free-viewpoint three-dimensional model extends an ordinary three-dimensional model by preserving the same sense of depth as the viewing angle changes, which provides a more natural and realistic visual environment for three-dimensional multimedia. Because of the complexity of three-dimensional models themselves, conventional methods for generating free-viewpoint models are costly: workers must manually render the model from many different viewpoints, which is inefficient and cumbersome. How to generate free-viewpoint three-dimensional models simply and efficiently has long been a research hotspot, and the problem holds enormous application potential.
A viewpoint-independent three-dimensional model can be regarded as a special free-viewpoint model fixed at the initial viewpoint: the two share the same shape and differ only in viewpoint. A free-viewpoint model can therefore be generated by applying a viewpoint transformation to the viewpoint-independent model. Viewpoint-independent models have broad application prospects in pose estimation, object tracking, target detection, and related fields. In three-dimensional pose estimation, for example, researchers match features of a pre-built viewpoint-independent model against the two-dimensional silhouette in an image to estimate the pose; in three-dimensional object tracking, where significant viewpoint changes often occur as the camera moves around the object, viewpoint-independent tracking makes feature extraction and result matching more efficient.
At present, deep learning has achieved great success in single-image three-dimensional reconstruction. Using the powerful feature-extraction ability of convolutional neural networks, researchers can fully extract prior knowledge such as shape semantics and viewpoint semantics from a single image, obtain highly abstract semantic features with strong generalization ability, map those features to geometric parameters with concrete meaning, and thereby guide three-dimensional reconstruction. However, much existing research binds the learning of model shape and viewpoint together, so the generated model fits only a single camera viewpoint, lacks variability, and cannot flexibly follow viewpoint changes, which limits its practical use. How to widen the applicability of such generation methods and improve the efficiency of model generation remains a vexing problem.
Summary of the invention
The technical problem to be solved by the invention is to provide a single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning that can simply and efficiently reconstruct a free-viewpoint three-dimensional model from a single image.
In order to solve the above technical problems, the technical solution adopted by the present invention is that:
A single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning, comprising the following steps:
Step 1: sample and render CAD models to generate ground-truth shape point clouds at the initial viewpoint and single-frame images at different viewpoints and distances;
Step 2: progressively extract the high-level semantics of the image as the feature extraction network deepens;
Step 3: convert the high-level image semantics through a decoupling network, outputting the point cloud coordinates of the viewpoint-independent three-dimensional model and the camera viewpoint parameters;
Step 4: correct the viewpoint-independent point cloud coordinates output by the decoupling network and fit them with triangular patches to reconstruct the three-dimensional shape, obtaining the viewpoint-independent three-dimensional model;
Step 5: obtain the camera viewpoint from the camera viewpoint parameters output by the decoupling network through a homogeneous transformation, and transform it on this basis to generate free viewpoints;
Step 6: multiply the free viewpoint with the viewpoint-independent three-dimensional model to obtain the free-viewpoint three-dimensional model;
Step 7: feed the training samples into the neural network for automatic training, progressively updating the network parameters to optimize the free-viewpoint three-dimensional model and obtain the optimal result.
Further, in step 1, OpenGL is used to sample and render the CAD models to generate training samples.
Further, in step 2, ResNet is used as the feature extraction network, i.e., feature extraction is performed on the input image of each training sample by the following formula:

f_j^(n,i) = ResNet(I_j^(n,i)),  n = 1, ..., N

where N is a positive integer; f_j^(n,i) denotes the semantic information produced by the j-th image of the i-th CAD training sample in the n-th category; ResNet denotes the feature extraction network; I_j^(n,i) denotes the j-th image of the i-th CAD training sample in the n-th category.
Further, step 3 specifically comprises:
decoupling and converting the extracted high-level semantics of the post-convolution features through the global average pooling and fully connected layers of the neural network to obtain the viewpoint-independent three-dimensional point cloud and the camera viewpoint parameters; the camera viewpoint parameters comprise: the camera pose, expressed by Euler angles, including pitch angle pitch (γ), roll angle roll (β), and yaw angle yaw (α); and the camera coordinates, expressed by the coordinates t_x, t_y, t_z of the camera in the initial coordinate system.
Further, step 4 specifically comprises:
correcting the point cloud coordinates output by the decoupling network, then fitting the densely distributed point cloud piecewise with triangular patches to form the viewpoint-independent three-dimensional surface model;
the negative values in the point cloud coordinates required for viewpoint-independent three-dimensional reconstruction are corrected by the following formula:

Ŷ = ReLU(Y)

where Y denotes the output result, Ŷ denotes the final output, and ReLU denotes the rectified linear unit that keeps only positive values.
Further, step 5 specifically comprises:
obtaining the camera pose and coordinates from the camera viewpoint parameters output by the decoupling network through a homogeneous transformation; the rotation matrix is computed from the three Euler angles, pitch angle pitch (γ), roll angle roll (β), and yaw angle yaw (α), and represents the camera pose; t_x, t_y, t_z represent the distance of the camera from the CAD model in the initial coordinate system, i.e., the camera coordinates; the coordinates of the CAD model in the camera coordinate system are obtained by the homogeneous transformation:

(x', y', z', 1)^T = T · (x, y, z, 1)^T,  T = [ R  t ; 0  1 ],  t = (t_x, t_y, t_z)^T

where x, y, z represent the fixed coordinates of the CAD model; x', y', z' represent the coordinates of the CAD model in the camera coordinate system; R is the rotation matrix computed from the Euler angles pitch (γ), roll (β), and yaw (α), representing the camera pose; t_x, t_y, t_z represent the camera's distance from the CAD model in the initial coordinate system, representing the camera coordinates; T is the pose transformation matrix, representing the camera viewpoint, including pose and coordinates;
on the basis of the estimated camera viewpoint, free viewpoints are generated: with the CAD model as the center of the sphere, the camera coordinates are moved over the sphere's surface and the camera pose is adjusted to face the CAD model; recording the resulting R' and t' yields a free viewpoint:

T' = [ R'  t' ; 0  1 ],  t' = (x, y, z)^T

where t' = (x, y, z) represents the coordinates of the free viewpoint, R' represents the camera pose at that position, and T' is the pose transformation matrix representing the pose and coordinates of the free viewpoint.
Further, step 6 comprises:
multiplying the learned viewpoint-independent three-dimensional model by the free viewpoint according to the following formula to obtain the free-viewpoint three-dimensional model:

Model_c = T' · Model_i

where Model_i denotes the viewpoint-independent three-dimensional model and Model_c denotes the free-viewpoint three-dimensional model.
Further, step 7 specifically comprises:
training the deep learning model with the weighted sum of the chamfer distance and the earth mover's distance between the three-dimensional model predicted by the network and the true free-viewpoint three-dimensional model:

Loss = λ1·loss_EMD + λ2·loss_CD
loss_CD = Σ_{p∈P} min_{q∈Q} ||p - q||_2^2 + Σ_{q∈Q} min_{p∈P} ||p - q||_2^2
loss_EMD = min_{f: P→Q} Σ_{p∈P} ||p - f(p)||_2

where loss_EMD and loss_CD respectively represent the earth mover's distance loss and the chamfer distance loss between the predicted model and the true free-viewpoint model; λ1, λ2 are loss weights; P represents the three-dimensional model predicted by the network; Q represents the true free-viewpoint three-dimensional model; ||·||_2 denotes the two-norm; and f(x) represents a bijective mapping.
Compared with the prior art, the beneficial effects of the present invention are: 1) based on a deep learning neural network, the three-dimensional reconstruction task is completed through training, improving the efficiency of reconstruction and reducing operational difficulty; 2) using a neural network decoupling method, the learning of three-dimensional model shape and of viewpoint are decoupled, so that both viewpoint-independent model reconstruction and camera viewpoint estimation are accomplished; 3) on this basis, the learned camera viewpoint can be transformed into free viewpoints and multiplied with the viewpoint-independent model to obtain the three-dimensional model at any free viewpoint, giving the method a wider range of application.
Brief description of the drawings
Fig. 1 is a flow chart of the single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning;
Fig. 2 is a schematic diagram of training sample generation;
Fig. 3 is a data flow diagram of the feature extraction network;
Fig. 4 is a diagram of image semantic decoupling;
Fig. 5 is a schematic diagram of the free-viewpoint three-dimensional model reconstruction process.
Specific embodiment
The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. Using a deep learning neural network, the invention decouples viewpoint-independent three-dimensional model reconstruction from the camera viewpoint estimation task. On the one hand, the viewpoint-independent model does not change with the camera viewpoint and therefore transfers and adapts well; on the other hand, free viewpoints can be derived from the estimated camera viewpoint and multiplied with the viewpoint-independent model to obtain the free-viewpoint model, expanding the scope of the method.
It is worth noting that the neural network decoupling of the invention does not rely on any coercive measure; instead, the physical computation in which the viewpoint-independent model and the free viewpoint jointly generate the free-viewpoint model guides the decoupling by itself. This effectively demonstrates that a neural network can be guided and regularized by physical computation, without resorting to black-box learning under strong supervision, and thus has a degree of interpretability and stronger persuasiveness.
The present invention can be implemented on both Windows and Linux platforms; the programming language may be, for example, Python.
As shown in Fig. 1, the single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning comprises the following steps:
Step 1: training sample generation
OpenGL is used to sample and render the CAD models to generate training samples for neural network training. Of course, the method of generating training samples is not limited to OpenGL; any method achieving the same technical effect may be used.
CAD data set selection. The ModelNet data set provided by Princeton is used. This data set is widely used in computer vision, computer graphics, robotics, and cognitive science, has comprehensive content, and contains high-quality model objects. ModelNet contains 662 target categories and 127,915 CAD models, with orientation labels for ten classes, and comprises three subsets. This embodiment selects the ModelNet40 subset, which contains CAD models of 40 categories.
Single-frame image generation. Using the camera simulation functions of OpenGL, the three-dimensional object is placed at an appropriate position in the scene and the camera is adjusted to different angles and distances, projecting the object onto a two-dimensional image plane. With a fixed 224×224 viewport, rendered images are obtained; each CAD model yields 24 images, and the camera angle and distance corresponding to each image are recorded, as shown on the right side of Fig. 2.
CAD model ground-truth shape sampling. In this embodiment, OpenGL is used to fix the CAD model in place and sample points on the model surface to obtain its true shape; 4096 points are collected in total, as shown on the left side of Fig. 2.
Training and test set division. To facilitate training, this embodiment shuffles the sample data set; 4/5 of each category is extracted to jointly form the training set, and the remaining 1/5 of each category is stored separately to test the model's performance on each category.
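The ground-truth sampling step above amounts to drawing points uniformly over a triangle mesh surface. A minimal sketch follows, using hypothetical mesh data; the patent itself performs the sampling through OpenGL:

```python
import random

def sample_surface(vertices, triangles, n_points=4096, seed=0):
    """Sample n_points uniformly over the surface of a triangle mesh.

    vertices:  list of (x, y, z) tuples
    triangles: list of (i, j, k) vertex-index triples
    """
    rng = random.Random(seed)

    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def cross(a, b):
        return (a[1]*b[2] - a[2]*b[1],
                a[2]*b[0] - a[0]*b[2],
                a[0]*b[1] - a[1]*b[0])

    def area(tri):
        a, b, c = (vertices[i] for i in tri)
        cx = cross(sub(b, a), sub(c, a))
        return 0.5 * (cx[0]**2 + cx[1]**2 + cx[2]**2) ** 0.5

    # Pick triangles proportionally to their area, then pick a point
    # inside each via uniform barycentric coordinates (square-root trick).
    weights = [area(t) for t in triangles]
    points = []
    for _ in range(n_points):
        tri = rng.choices(triangles, weights=weights)[0]
        a, b, c = (vertices[i] for i in tri)
        r1, r2 = rng.random(), rng.random()
        s = r1 ** 0.5
        u, v, w = 1 - s, s * (1 - r2), s * r2
        points.append(tuple(u*a[k] + v*b[k] + w*c[k] for k in range(3)))
    return points
```

For a unit square split into two triangles, this yields 4096 points lying inside the square.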
Step 2: image high-level semantic extraction
Feature extraction is performed on the input image of each training sample to obtain its high-level semantic features. Since each CAD model has 24 input images taken from different angles and distances, the same feature extraction model is applied to all 24 images of a sample. This embodiment uses ResNet as the feature extraction network; of course, feature extraction networks with other numbers of layers may be selected in other embodiments. For three-dimensional models of different complexity, networks of different depths are used, including ResNet50 and ResNet101.
Feature extraction is performed on the input image of each training sample by the following formula:

f_j^(n,i) = ResNet(I_j^(n,i)),  n = 1, ..., N

where N is a positive integer, taken as 40 in this embodiment to denote the 40 training categories of the ModelNet40 subset; f_j^(n,i) denotes the semantic information produced by the j-th image (j at most 24) of the i-th CAD training sample in the n-th category; ResNet denotes the feature extraction network; I_j^(n,i) denotes the j-th image of the i-th CAD training sample in the n-th category.
The core of ResNet() is the shortcut connection and the building block. Each feature extraction network is divided into 5 parts: conv1 is an ordinary convolutional layer; conv2_x, conv3_x, conv4_x, and conv5_x are building blocks composed of convolutional layers with shortcut connections, as shown on the left side of Fig. 4.
When a shortcut connection participates in a building block, two cases arise depending on whether the input and output feature maps have the same number of channels c. The relationship between input and output is:

y = F(x) + x  (if channel counts match);  y = F(x) + W·x  (otherwise)

where x is the input feature map, F(x) is the output of the residual branch, c is the number of channels of the feature map, and W is a convolution operation used to adjust the number of channels of the feature map.
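The two shortcut cases can be illustrated with a toy numeric sketch. The per-channel values, the residual branch, and the projection below are hypothetical placeholders standing in for the patent's actual convolutions:

```python
def shortcut_add(x, residual_fn, projection=None):
    """Building-block output: residual_fn(x) + identity, with an optional
    projection applied to x when input/output channel counts differ.

    x is a per-channel list of values; residual_fn maps it to the output
    channel layout; projection (if given) maps x to that same layout so
    the element-wise sum is well defined.
    """
    fx = residual_fn(x)
    identity = projection(x) if projection is not None else x
    if len(fx) != len(identity):
        raise ValueError("channel counts must match for the element-wise sum")
    return [a + b for a, b in zip(fx, identity)]

# Case 1: channel counts match, so the plain identity shortcut is used.
same = shortcut_add([1.0, 2.0], lambda x: [10.0, 20.0])

# Case 2: channels differ, so x is first projected from 2 to 3 channels
# (a hypothetical stand-in for the 1x1 convolution W).
proj = lambda x: [x[0], x[1], x[0] + x[1]]
wide = shortcut_add([1.0, 2.0], lambda x: [0.0, 0.0, 0.0], projection=proj)
```

With the identity shortcut the residual output is added element-wise to the input; with the projection the input is reshaped to the output channel count first.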
Step 3: image high-level semantic decoupling
The extracted high-level semantics of the post-convolution features are decoupled and converted through the global average pooling and fully connected layers of the neural network to obtain the viewpoint-independent three-dimensional point cloud and the camera viewpoint parameters. The camera viewpoint parameters comprise: the camera pose, expressed by Euler angles (pitch angle pitch (γ), roll angle roll (β), yaw angle yaw (α)); and the camera position, expressed by the coordinates t_x, t_y, t_z of the camera in the initial coordinate system (with the CAD model at the origin).
In global average pooling, the feature map of each channel is reduced to one global semantic average, i.e., a single-element output. The relationship between the input and output of the global average pooling layer is:

GAP(i) = mean(conv(i))

where GAP(i) is the output of the global average pooling layer, conv(i) is the input feature map, mean(·) takes the average over the whole input map, and i is the channel index.
In the fully connected layer, the output of global average pooling is taken as input, and the number of neurons is set to match the total number of point cloud coordinates of the free-viewpoint three-dimensional model plus camera viewpoint estimation parameters. The relationship between the input and output of the fully connected layer is:

Y_j = Σ_i W_ji · X_i + b_j

where X is the input data and i is the input channel index; Y is the output data and j is the output channel index, i.e., j ranges over the total number of point cloud coordinates and camera viewpoint parameters.
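A minimal sketch of this decoupling head follows, with tiny hypothetical sizes: 2 channels in, and 5 outputs standing in for the real 4096×3 point coordinates plus 6 viewpoint parameters:

```python
def global_average_pool(feature_maps):
    """Reduce each channel's 2-D feature map to its global mean value."""
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]

def fully_connected(x, weights, biases):
    """Y_j = sum_i W[j][i] * X_i + b_j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

# Two 2x2 feature maps -> pooled vector of length 2.
fmaps = [[[1.0, 3.0], [5.0, 7.0]],   # channel 0, global mean 4.0
         [[2.0, 2.0], [2.0, 2.0]]]   # channel 1, global mean 2.0
pooled = global_average_pool(fmaps)

# Hypothetical 5-output head (point coordinates + viewpoint parameters).
W = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 3]]
b = [0.0] * 5
head = fully_connected(pooled, W, b)
```

In the real network the fully connected layer would have 4096×3 + 6 output neurons, matching the point cloud plus the six viewpoint parameters.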
Step 4: viewpoint-independent three-dimensional model reconstruction
The point cloud coordinates output by the decoupling network are corrected, and the densely distributed point cloud is then fitted piecewise with triangular patches to form a continuous, accurate, well-formed viewpoint-independent three-dimensional surface model.
The negative values in the point cloud coordinates required for viewpoint-independent reconstruction are corrected by the following formula:

Ŷ = ReLU(Y)

where Y denotes the output result, Ŷ denotes the final output, and ReLU denotes the rectified linear unit that keeps only positive values.
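The correction is simply an element-wise rectification of the predicted coordinates, which can be sketched as:

```python
def relu_correct(point_cloud):
    """Clamp every predicted coordinate to be non-negative: Y_hat = ReLU(Y)."""
    return [tuple(max(0.0, c) for c in point) for point in point_cloud]

cloud = [(0.5, -0.2, 1.0), (-1.0, 2.0, 0.0)]
corrected = relu_correct(cloud)
```

Every negative coordinate is replaced by zero while positive coordinates pass through unchanged.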
Step 5: camera viewpoint estimation and free-viewpoint generation
The camera pose and coordinates are obtained from the camera viewpoint parameters output by the decoupling network through a homogeneous transformation. The rotation matrix R is computed from the three Euler angles, pitch angle pitch (γ), roll angle roll (β), and yaw angle yaw (α), and represents the camera pose; t_x, t_y, t_z represent the distance of the camera from the CAD model in the initial coordinate system, i.e., the camera coordinates. The coordinates of the CAD model in the camera coordinate system are obtained by the homogeneous transformation:

(x', y', z', 1)^T = T · (x, y, z, 1)^T,  T = [ R  t ; 0  1 ],  t = (t_x, t_y, t_z)^T

where x, y, z represent the fixed position of the CAD model; x', y', z' represent the coordinates of the CAD model in the camera coordinate system; R is the rotation matrix computed from the Euler angles pitch (γ), roll (β), and yaw (α), representing the camera pose; t_x, t_y, t_z represent the camera's distance from the CAD model in the initial coordinate system, representing the camera coordinates; T is the pose transformation matrix, representing the camera viewpoint (including pose and coordinates).
On the basis of the estimated camera viewpoint, free viewpoints are generated. With the CAD model as the center of the sphere, the camera position is moved over the sphere's surface, and the camera pose is adjusted to face the CAD model; recording the resulting R' and t' yields a free viewpoint:

T' = [ R'  t' ; 0  1 ],  t' = (x, y, z)^T

where t' = (x, y, z) represents the position of the free viewpoint, R' represents the camera pose at that position, and T' is the pose transformation matrix representing the free viewpoint (including pose and coordinates).
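A sketch of building the rotation from Euler angles and applying the transform to a model point follows. The Z-Y-X composition order below is one common convention and an assumption, since the patent does not fix the order:

```python
import math

def rotation_from_euler(pitch, roll, yaw):
    """3x3 rotation matrix from Euler angles, composed as
    R = Rz(yaw) . Ry(pitch) . Rx(roll) (assumed convention)."""
    cg, sg = math.cos(pitch), math.sin(pitch)
    cb, sb = math.cos(roll), math.sin(roll)
    ca, sa = math.cos(yaw), math.sin(yaw)
    rx = [[1, 0, 0], [0, cb, -sb], [0, sb, cb]]
    ry = [[cg, 0, sg], [0, 1, 0], [-sg, 0, cg]]
    rz = [[ca, -sa, 0], [sa, ca, 0], [0, 0, 1]]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return matmul(rz, matmul(ry, rx))

def to_camera_frame(point, r, t):
    """Apply the homogeneous transform: p' = R . p + t."""
    return tuple(sum(r[i][k] * point[k] for k in range(3)) + t[i]
                 for i in range(3))
```

With all angles zero the rotation is the identity and the transform reduces to a pure translation by t.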
Step 6: free-viewpoint three-dimensional model generation
The viewpoint-independent model and the free-viewpoint model differ only in viewpoint, i.e., in camera pose and position; their shape information is identical. The free-viewpoint three-dimensional model is obtained by multiplying the learned viewpoint-independent model by the free viewpoint according to the following formula:

Model_c = T' · Model_i

where Model_i denotes the viewpoint-independent three-dimensional model and Model_c denotes the free-viewpoint three-dimensional model.
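The multiplication Model_c = T'·Model_i can be sketched as transforming every point of the viewpoint-independent cloud by a 4×4 homogeneous matrix (the matrix layout is an assumption):

```python
def apply_viewpoint(model_points, transform):
    """Model_c = T' * Model_i: apply a 4x4 homogeneous matrix to each point."""
    out = []
    for x, y, z in model_points:
        p = (x, y, z, 1.0)
        q = [sum(transform[i][k] * p[k] for k in range(4)) for i in range(4)]
        out.append((q[0], q[1], q[2]))
    return out

# A pure translation by (1, 0, 0) as a trivial stand-in for a free viewpoint.
T_free = [[1, 0, 0, 1.0],
          [0, 1, 0, 0.0],
          [0, 0, 1, 0.0],
          [0, 0, 0, 1.0]]
moved = apply_viewpoint([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)], T_free)
```

Because only the viewpoint changes, the shape of the cloud is preserved and every point is moved by the same rigid transform.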
Step 7: deep learning model training
The deep learning model is trained with the weighted sum of the chamfer distance and the earth mover's distance between the three-dimensional model predicted by the network and the true free-viewpoint three-dimensional model:

Loss = λ1·loss_EMD + λ2·loss_CD
loss_CD = Σ_{p∈P} min_{q∈Q} ||p - q||_2^2 + Σ_{q∈Q} min_{p∈P} ||p - q||_2^2
loss_EMD = min_{f: P→Q} Σ_{p∈P} ||p - f(p)||_2

where loss_EMD and loss_CD respectively represent the earth mover's distance loss and the chamfer distance loss between the predicted model and the true free-viewpoint model; λ1, λ2 are the loss weights; P represents the three-dimensional model predicted by the network; Q represents the true free-viewpoint three-dimensional model; ||·||_2 denotes the two-norm; and f(x) represents a bijective mapping.
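A brute-force sketch of the combined loss for tiny point sets follows. Enumerating all bijections gives the exact earth mover's distance but is only feasible for a handful of points; real implementations use approximations:

```python
import itertools

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def chamfer(P, Q):
    """Symmetric chamfer distance: squared nearest-neighbour sums both ways."""
    return (sum(min(dist2(p, q) for q in Q) for p in P) +
            sum(min(dist2(p, q) for p in P) for q in Q))

def emd(P, Q):
    """Exact earth mover's distance via brute-force bijection search."""
    return min(sum(dist2(p, q) ** 0.5 for p, q in zip(P, perm))
               for perm in itertools.permutations(Q))

def total_loss(P, Q, lam1=1.0, lam2=1.0):
    """Loss = lam1 * loss_EMD + lam2 * loss_CD."""
    return lam1 * emd(P, Q) + lam2 * chamfer(P, Q)
```

When prediction and ground truth coincide, both terms vanish and the total loss is zero.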
Unlike previous three-dimensional reconstruction methods, the single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning provided by the invention can automatically recover a three-dimensional model from a single image without manual operation. More distinctively, the invention separates the generation of the viewpoint-independent model from camera viewpoint estimation while simultaneously generating the free-viewpoint model. The viewpoint-independent model facilitates fields such as pose estimation and tracking detection; the free-viewpoint model can be used to enlarge three-dimensional data sets, reducing data cost and improving the efficiency of three-dimensional reconstruction work. In general, compared with conventional practice, the invention provides a more flexible three-dimensional model reconstruction method that, while guaranteeing the basic reconstruction task, improves the generality of the model and widens its scope of application.
Claims (8)
1. A single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning, characterized by comprising the following steps:
Step 1: sampling and rendering CAD models to generate ground-truth shape point clouds at the initial viewpoint and single-frame images at different viewpoints and distances;
Step 2: progressively extracting the high-level semantics of the image as the feature extraction network deepens;
Step 3: converting the high-level image semantics through a decoupling network, outputting the point cloud coordinates of the viewpoint-independent three-dimensional model and the camera viewpoint parameters;
Step 4: correcting the viewpoint-independent point cloud coordinates output by the decoupling network and fitting them with triangular patches to reconstruct the three-dimensional shape, obtaining the viewpoint-independent three-dimensional model;
Step 5: obtaining the camera viewpoint from the camera viewpoint parameters output by the decoupling network through a homogeneous transformation, and transforming it on this basis to generate free viewpoints;
Step 6: multiplying the free viewpoint with the viewpoint-independent three-dimensional model to obtain the free-viewpoint three-dimensional model;
Step 7: feeding the training samples into the neural network for automatic training, progressively updating the network parameters to optimize the free-viewpoint three-dimensional model and obtain the optimal result.
2. The single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning according to claim 1, characterized in that, in step 1, OpenGL is used to sample and render the CAD models to generate training samples.
3. The single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning according to claim 1, characterized in that, in step 2, ResNet is used as the feature extraction network, i.e., feature extraction is performed on the input image of each training sample by the following formula:

f_j^(n,i) = ResNet(I_j^(n,i)),  n = 1, ..., N

where N is a positive integer; f_j^(n,i) denotes the semantic information produced by the j-th image of the i-th CAD training sample in the n-th category; ResNet denotes the feature extraction network; I_j^(n,i) denotes the j-th image of the i-th CAD training sample in the n-th category.
4. The single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning according to claim 1, characterized in that step 3 specifically comprises:
decoupling and converting the extracted high-level semantics of the post-convolution features through the global average pooling and fully connected layers of the neural network to obtain the viewpoint-independent three-dimensional point cloud and the camera viewpoint parameters; the camera viewpoint parameters comprise: the camera pose, expressed by Euler angles, including pitch angle pitch (γ), roll angle roll (β), and yaw angle yaw (α); and the camera coordinates, expressed by the coordinates t_x, t_y, t_z of the camera in the initial coordinate system.
5. The single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning according to claim 1, characterized in that step 4 specifically comprises:
correcting the point cloud coordinates output by the decoupling network, then fitting the densely distributed point cloud piecewise with triangular patches to form the viewpoint-independent three-dimensional surface model;
the negative values in the point cloud coordinates required for viewpoint-independent three-dimensional reconstruction are corrected by the following formula:

Ŷ = ReLU(Y)

where Y denotes the output result, Ŷ denotes the final output, and ReLU denotes the rectified linear unit that keeps only positive values.
6. The single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning according to claim 1, characterized in that step 5 specifically comprises:
obtaining the camera pose and coordinates from the camera viewpoint parameters output by the decoupling network through a homogeneous transformation; the rotation matrix is computed from the three Euler angles, pitch angle pitch (γ), roll angle roll (β), and yaw angle yaw (α), and represents the camera pose; t_x, t_y, t_z represent the distance of the camera from the CAD model in the initial coordinate system, i.e., the camera coordinates; the coordinates of the CAD model in the camera coordinate system are obtained by the homogeneous transformation:

(x', y', z', 1)^T = T · (x, y, z, 1)^T,  T = [ R  t ; 0  1 ],  t = (t_x, t_y, t_z)^T

where x, y, z represent the fixed coordinates of the CAD model; x', y', z' represent the coordinates of the CAD model in the camera coordinate system; R is the rotation matrix computed from the Euler angles pitch (γ), roll (β), and yaw (α), representing the camera pose; t_x, t_y, t_z represent the camera's distance from the CAD model in the initial coordinate system, representing the camera coordinates; T is the pose transformation matrix, representing the camera viewpoint, including pose and coordinates;
on the basis of the estimated camera viewpoint, free viewpoints are generated: with the CAD model as the center of the sphere, the camera coordinates are moved over the sphere's surface and the camera pose is adjusted to face the CAD model; recording the resulting R' and t' yields a free viewpoint:

T' = [ R'  t' ; 0  1 ],  t' = (x, y, z)^T

where t' = (x, y, z) represents the coordinates of the free viewpoint, R' represents the camera pose at that position, and T' is the pose transformation matrix representing the pose and coordinates of the free viewpoint.
7. The single-frame image free-viewpoint three-dimensional model reconstruction method based on deep learning according to claim 1, characterized in that step 6 comprises:
multiplying the learned viewpoint-independent three-dimensional model by the free viewpoint according to the following formula to obtain the free-viewpoint three-dimensional model:

Model_c = T' · Model_i

where Model_i denotes the viewpoint-independent three-dimensional model and Model_c denotes the free-viewpoint three-dimensional model.
8. The deep-learning-based single-frame-image free-viewpoint three-dimensional model reconstruction method according to claim 1, wherein step 7 is specifically:
The deep learning model is trained on the weighted sum of the chamfer distance and the earth mover's distance between the three-dimensional model predicted by the network and the ground-truth free-viewpoint three-dimensional model, as in the following formulas:
Loss = λ_1·loss_EMD + λ_2·loss_CD
loss_EMD = min_{φ: P→Q} Σ_{p∈P} ||p − φ(p)||_2
loss_CD = Σ_{p∈P} min_{q∈Q} ||p − q||_2^2 + Σ_{q∈Q} min_{p∈P} ||p − q||_2^2
where loss_EMD and loss_CD respectively denote the earth mover's distance loss and the chamfer distance loss between the three-dimensional model predicted by the network and the ground-truth free-viewpoint three-dimensional model; λ_1 and λ_2 are the loss weights; P is the three-dimensional model predicted by the network; Q is the ground-truth free-viewpoint three-dimensional model; ||·||_2 denotes the two-norm; and φ(x) denotes a bijective mapping from P to Q.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910509328.9A CN110223382B (en) | 2019-06-13 | 2019-06-13 | Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110223382A true CN110223382A (en) | 2019-09-10 |
CN110223382B CN110223382B (en) | 2021-02-12 |
Family
ID=67816761
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910509328.9A Active CN110223382B (en) | 2019-06-13 | 2019-06-13 | Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110223382B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101216896A (en) * | 2008-01-14 | 2008-07-09 | 浙江大学 | An identification method for movement by human bodies irrelevant with the viewpoint based on stencil matching |
CN101719286A (en) * | 2009-12-09 | 2010-06-02 | 北京大学 | Multiple viewpoints three-dimensional scene reconstructing method fusing single viewpoint scenario analysis and system thereof |
US20130321393A1 (en) * | 2012-05-31 | 2013-12-05 | Microsoft Corporation | Smoothing and robust normal estimation for 3d point clouds |
JP5295416B1 (en) * | 2012-08-01 | 2013-09-18 | ヤフー株式会社 | Image processing apparatus, image processing method, and image processing program |
US20180152688A1 (en) * | 2016-11-28 | 2018-05-31 | Sony Corporation | Decoder-centric uv codec for free-viewpont video streaming |
CN106778687A (en) * | 2017-01-16 | 2017-05-31 | 大连理工大学 | Method for viewing points detecting based on local evaluation and global optimization |
US20180220125A1 (en) * | 2017-01-31 | 2018-08-02 | Tetavi Ltd. | System and method for rendering free viewpoint video for sport applications |
Non-Patent Citations (5)
Title |
---|
JIAGENG FENG et al.: "View-invariant human action recognition via robust locally adaptive multi-view learning", 《FITEE》 *
JIANMEI SU et al.: "Robust spatial–temporal Bayesian view synthesis for video stitching with occlusion handling", 《MACHINE VISION AND APPLICATIONS》 *
AI Ph.D. (blog): "Study notes on 'Image-based large-scale scene 3D modeling: from geometric reconstruction to semantic vectorized reconstruction' by Shen Shuhan, Pattern Recognition Laboratory, Institute of Automation, Chinese Academy of Sciences", 《HTTPS://BLOG.CSDN.NET/QQ_15698613/ARTICLE/DETAILS/89406704》 *
FENG Jiageng et al.: "A survey of view-invariant action recognition", 《Journal of Image and Graphics》 *
DU Junhui: "Research on 3D computational reconstruction methods based on integral imaging light-field information", 《China Master's Theses Full-text Database, Information Science and Technology》 *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113204988A (en) * | 2019-02-05 | 2021-08-03 | 辉达公司 | Small sample viewpoint estimation |
CN110675488A (en) * | 2019-09-24 | 2020-01-10 | 电子科技大学 | Construction method of creative three-dimensional voxel model modeling system based on deep learning |
CN110675488B (en) * | 2019-09-24 | 2023-02-28 | 电子科技大学 | Method for constructing modeling system of creative three-dimensional voxel model based on deep learning |
CN110798673A (en) * | 2019-11-13 | 2020-02-14 | 南京大学 | Free viewpoint video generation and interaction method based on deep convolutional neural network |
CN110798673B (en) * | 2019-11-13 | 2021-03-19 | 南京大学 | Free viewpoint video generation and interaction method based on deep convolutional neural network |
CN113569761A (en) * | 2021-07-30 | 2021-10-29 | 广西师范大学 | Student viewpoint estimation method based on deep learning |
CN113569761B (en) * | 2021-07-30 | 2023-10-27 | 广西师范大学 | Student viewpoint estimation method based on deep learning |
CN115375884A (en) * | 2022-08-03 | 2022-11-22 | 北京微视威信息科技有限公司 | Free viewpoint synthesis model generation method, image rendering method and electronic device |
CN115375884B (en) * | 2022-08-03 | 2023-05-30 | 北京微视威信息科技有限公司 | Free viewpoint synthesis model generation method, image drawing method and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN110223382B (en) | 2021-02-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110223382A (en) | Single-frame image free viewpoint three-dimensional model reconstruction method based on deep learning | |
CN111339903B (en) | Multi-person human body posture estimation method | |
WO2022121645A1 (en) | Method for generating sense of reality of virtual object in teaching scene | |
CN109214282B (en) | Neural-network-based three-dimensional gesture key point detection method and system | |
CN110288695B (en) | Single-frame image three-dimensional model surface reconstruction method based on deep learning | |
CN108921926B (en) | End-to-end three-dimensional face reconstruction method based on single image | |
US6795069B2 (en) | Free-form modeling of objects with variational implicit surfaces | |
CN100407798C (en) | Three-dimensional geometric model building system and method | |
CN110458939A (en) | Indoor scene modeling method based on viewpoint generation | |
Ding et al. | A survey of sketch based modeling systems | |
CN111583408B (en) | Human body three-dimensional modeling system based on hand-drawn sketch | |
CN110633628B (en) | RGB image scene three-dimensional model reconstruction method based on artificial neural network | |
CN110399809A (en) | Multi-feature-fusion face key point detection method and device | |
Jin et al. | 3d reconstruction using deep learning: a survey | |
CN110349247A (en) | Indoor scene CAD three-dimensional reconstruction method based on semantic understanding | |
Clarke et al. | Automatic generation of 3D caricatures based on artistic deformation styles | |
CN113822993A (en) | Digital twinning method and system based on 3D model matching | |
CN113345106A (en) | Three-dimensional point cloud analysis method and system based on multi-scale multi-level converter | |
Afifi et al. | Pixel2Point: 3D object reconstruction from a single image using CNN and initial sphere | |
CN111460193A (en) | Three-dimensional model classification method based on multi-mode information fusion | |
CN113160275A (en) | Automatic target tracking and track calculating method based on multiple videos | |
CN114612938A (en) | Dynamic gesture recognition method based on multi-view three-dimensional skeleton information fusion | |
CN114299339A (en) | Three-dimensional point cloud model classification method and system based on regional correlation modeling | |
CN117274388A (en) | Unsupervised three-dimensional visual positioning method and system based on visual text relation alignment | |
Pohle-Fröhlich et al. | Roof Segmentation based on Deep Neural Networks. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||