CN110288695A

CN110288695A - Deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image

Info

Publication number: CN110288695A
Application number: CN201910509313.2A
Authority: CN
Inventors: 杨路; 杨经纶; 李佑华
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2019-06-13
Filing date: 2019-06-13
Publication date: 2019-09-27
Anticipated expiration: 2039-06-13
Also published as: CN110288695B

Abstract

The present invention provides a deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image, comprising the steps of: sampling and rendering CAD models to generate ground-truth shape point clouds and single-frame images at different viewpoints and distances; extracting features from the images with a convolutional neural network to obtain the high-level semantics of the two-dimensional images; converting the obtained high-level semantics, through a fully connected neural network module, into the control point coordinates and weight parameters required to deform a NURBS surface in the three-dimensional reconstruction stage; updating an initialized NURBS model with the obtained control point coordinates and weight parameters, thereby progressively carrying out three-dimensional reconstruction; and training the deep learning model by feeding in the training samples, so that optimal model parameters are obtained and three-dimensional reconstruction is completed. The invention can reconstruct a three-dimensional model from a single-frame image simply and efficiently, and the reconstructed model has rich detail, a smooth surface, and good overall integrity.

Description

Deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image

Technical field

The present invention relates to the field of computer vision, and in particular to a deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image.

Background art

Three-dimensional reconstruction is the process of building, in a computer, a mathematical model of a three-dimensional object in a real scene, and it is a popular research direction in computer vision. Compared with a two-dimensional image, a three-dimensional model provides global information about an object and presents its properties more comprehensively, so it is widely applied in fields such as computer animation, human-computer interaction, and modern medicine.

With the rapid development of deep learning, three-dimensional reconstruction from a single-frame image has achieved new breakthroughs. Researchers use convolutional neural networks to extract features from a single-frame image and, through a specific feature mapping method, reconstruct a three-dimensional model in a specific representation. The representations differ, however, and their accuracy, robustness, and algorithmic complexity all affect the quality of the generated model. The main representations of three-dimensional models at present are voxels, point clouds, and meshes.

The voxel representation is analogous to the two-dimensional pixels of an image, with each two-dimensional pixel extended to a three-dimensional cubic cell. In three-dimensional space, however, the cost of raising voxel resolution grows cubically, so algorithmic complexity increases rapidly. Given the memory limits of current computers, reconstructing high-resolution voxel models is very difficult.
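The cubic growth described above can be illustrated with a few lines of arithmetic. This is only a sketch: the dense grid layout and the 4 bytes per voxel are assumptions chosen for illustration, not figures from this document.

```python
def voxel_grid_bytes(resolution, bytes_per_voxel=4):
    """Memory for one dense resolution^3 grid (bytes_per_voxel is an assumed value)."""
    return resolution ** 3 * bytes_per_voxel


# Doubling the resolution multiplies the memory of a single grid by eight.
for res in (32, 64, 128, 256, 512):
    mib = voxel_grid_bytes(res) / 2 ** 20
    print(f"{res:>3}^3 voxels: {mib:9.3f} MiB")
```

A network that predicts such grids also holds intermediate activations of comparable size for every sample in a batch, which is why memory becomes the limiting factor well before the resolution matches that of an ordinary image.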

The point cloud representation completes three-dimensional reconstruction by directly regressing the coordinates of points in three-dimensional space; the method is simple and easy to implement. However, because there is no local connection between points, a point cloud represents a continuous three-dimensional model poorly and tends to lose detail, and the reconstructed model usually lacks surface smoothness.

The mesh representation reconstructs a three-dimensional model through the connection of vertices and edges. This representation effectively avoids the problem of excessive degrees of freedom between points and can form stable, richly detailed models. In conventional methods, however, the receptive field of a vertex is narrow, which means a vertex cannot effectively receive the features of other vertices across multiple edges; vertices then easily fall into local optima, and the reconstructed model is not robust as a whole.

The NURBS surface is a common representation in traditional three-dimensional modeling. It has a good convex-hull property and geometric continuity and can generate most basic three-dimensional surfaces. By changing control point coordinates and weights, this representation can adjust the local shape of a surface and thus better accomplish model reconstruction. Traditionally, however, setting the control point weights and coordinates requires extensive initialization calculations and manual fine-tuning, which makes reconstructing models with complex surfaces difficult.

Summary of the invention

The technical problem to be solved by the invention is to provide a deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image that performs the reconstruction simply and efficiently and yields a model with rich detail, a smooth surface, and good overall integrity.

To solve the above technical problem, the invention adopts the following technical solution:

A deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image, comprising the following steps:

Step 1: sampling and rendering CAD models to generate ground-truth shape point clouds and single-frame images at different viewpoints and distances;

Step 2: extracting features from the images with a convolutional neural network to obtain the high-level semantics of the two-dimensional images;

Step 3: converting the obtained high-level semantics, through a fully connected neural network module, into the control point coordinates and weight parameters required to deform the NURBS surface in the three-dimensional reconstruction stage;

Step 4: updating the initialized NURBS model with the obtained control point coordinates and weight parameters, thereby progressively carrying out three-dimensional reconstruction;

Step 5: training the deep learning model: the training samples are fed into the deep learning model for automatic training, optimal model parameters are obtained, and three-dimensional reconstruction is completed.

Further, in step 1, OpenGL is used to sample and render the CAD models to generate the training samples.

Further, in step 2, the same VGG16 model is used to extract high-level semantics from the images of every training sample; that is, features are extracted from the input images of each training sample by the following formula:

$$F_{n,i}^{j} = \mathrm{Extract}_{VGG16}\left(I_{n,i}^{j}\right), \quad n = 1, \dots, N$$

where $N$ is a positive integer; $F_{n,i}^{j}$ denotes the high-level semantics produced by the $j$-th picture of the $i$-th CAD training sample in the $n$-th category; $\mathrm{Extract}_{VGG16}$ denotes the VGG16 feature extraction network; and $I_{n,i}^{j}$ denotes the $j$-th picture of the $i$-th CAD training sample in the $n$-th category.

Further, in step 2, $\mathrm{Extract}_{VGG16}$ comprises an input layer, convolutional layers, pooling layers, and an output layer, and a non-linear correction is applied to the output of each convolutional layer. The relationship between successive convolutional layers, and between a convolutional layer and the input layer, is given by:

$$\mathrm{Conv}_{k}(i,j) = \mathrm{ReLU}\Big(\sum_{m,n} W_{k-1,k}(m,n)\,\mathrm{Conv}_{k-1}(i+m,\,j+n) + b_{k-1,k}\Big)$$

$$\mathrm{Conv}_{k}(i,j) = \mathrm{ReLU}\Big(\sum_{m,n} W_{i,k}(m,n)\,I(i+m,\,j+n) + b_{i,k}\Big)$$

where $\mathrm{Conv}_{k}(i,j)$ denotes the value at row $i$, column $j$ of the $k$-th convolutional layer; $W_{k-1,k}(m,n)$ denotes the value at row $m$, column $n$ of the convolution kernel from the $(k-1)$-th to the $k$-th convolutional layer; $b_{k-1,k}$ denotes the link bias from the $(k-1)$-th hidden layer to the linear units of the $k$-th hidden layer; $I$ denotes the input image of the input layer; $W_{i,k}(m,n)$ denotes the link weight from the input layer to the linear units of the $k$-th hidden layer; and $b_{i,k}$ denotes the link bias from the input layer to the linear units of the $k$-th hidden layer. ReLU applies a non-linear correction to the output of every convolutional layer:

$$\mathrm{ReLU}(x) = \max(0, x)$$

The relationship between the input and output of a pooling layer is given by:

$$\mathrm{Pool}(i,j) = \max_{m,n} I(i+m,\,j+n)$$

where Pool is the output of the pooling layer and $I$ is the input semantic map; the pooling layer takes the maximum over a local region of its input.

Further, step 4 specifically comprises:

NURBS model initialization: control points are initialized as the objects to be operated on, each control point having coordinates $(x_i, y_i, z_i)$ and a weight $w_i$;

NURBS model reconstruction: the initial model is updated with the obtained control point coordinates and weight parameters required to deform the NURBS surface, yielding the target three-dimensional model. Initialization and model update follow the NURBS surface formulation, in which $C(\mu, v)$ denotes the shape function on each interval; $w_i$ denotes the weight of a control point; $C_i$, that is $(x_i, y_i, z_i)$, denotes the coordinates of a control point; $N_{i,d}(\mu)$ denotes the interval (basis) function; $\mu$ and $v$ denote the knots in the two parametric directions of the surface; $n$ denotes the total number of intervals, corresponding to $n+1$ knots; and $d$ denotes the order of the interval function.
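For reference, the textbook tensor-product NURBS surface can be written with the symbols defined above. This is a sketch of the standard definition, not a formula recovered from this document: the double index on $w_{i,j}$ and $C_{i,j}$, the second basis function in $v$, the upper limits $p$ and $q$, the knot values $\mu_i$, and the reading of $d$ as the basis degree are all assumptions.

```latex
C(\mu, v) =
  \frac{\sum_{i=0}^{p} \sum_{j=0}^{q} N_{i,d}(\mu)\, N_{j,d}(v)\, w_{i,j}\, C_{i,j}}
       {\sum_{i=0}^{p} \sum_{j=0}^{q} N_{i,d}(\mu)\, N_{j,d}(v)\, w_{i,j}}

N_{i,0}(\mu) =
  \begin{cases} 1, & \mu_i \le \mu < \mu_{i+1} \\ 0, & \text{otherwise} \end{cases}

N_{i,d}(\mu) =
  \frac{\mu - \mu_i}{\mu_{i+d} - \mu_i}\, N_{i,d-1}(\mu)
  + \frac{\mu_{i+d+1} - \mu}{\mu_{i+d+1} - \mu_{i+1}}\, N_{i+1,d-1}(\mu)
```

Terms whose denominator is zero are taken as zero. Because the basis functions sum to one and each has local support, changing one control point or one weight only reshapes the surface near that control point, which is the local adjustability the background section refers to.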

Further, step 5 specifically comprises:

During forward propagation, the network convolution kernels are applied to the feature maps by dot product, progressively yielding the high-level semantics; the fully connected neural network module regresses from the high-level semantics the parameters needed in the reconstruction stage, completing the update of the NURBS control point coordinates and weights. During back-propagation, the convolution kernels and the fully connected layers update their parameters;

The neural network is trained on the chamfer distance between the generated three-dimensional model and the true three-dimensional model, that is, by the following formula:

$$loss = L_{CD}(P, Q) = \sum_{p \in P} \min_{q \in Q} \lVert p - q \rVert_2^2 + \sum_{q \in Q} \min_{p \in P} \lVert p - q \rVert_2^2$$

where $loss$ denotes the loss function; $L_{CD}$ denotes the chamfer distance between the generated three-dimensional model and the true three-dimensional model; $P$ denotes the generated three-dimensional model; $Q$ denotes the true-shape three-dimensional model; and $\lVert \cdot \rVert_2$ denotes the two-norm.

Compared with the prior art, the beneficial effects of the invention are: 1) because it is based on a deep learning neural network, the three-dimensional reconstruction task is accomplished through training, which improves reconstruction efficiency and reduces operating difficulty; 2) by combining the traditional NURBS modeling method and learning the control point coordinates and weights as parameters, the regression complexity of the neural network is greatly reduced, the network is made smaller, and its running speed is increased; 3) the mathematical constraints of NURBS give the reconstructed model rich detail, a smooth surface, and good overall quality.

Brief description of the drawings

Fig. 1 is a workflow diagram of an embodiment of the invention;

Fig. 2 is a schematic diagram of training sample generation in an embodiment of the invention;

Fig. 3 is a data flow diagram of the image feature extraction network of an embodiment of the invention;

Fig. 4 is a schematic diagram of converting image information into the control point coordinates and weights of the NURBS surface in an embodiment of the invention;

Fig. 5 is a schematic diagram of the initial NURBS surface model progressively deforming into the target three-dimensional model in an embodiment of the invention.

Detailed description of the embodiments

The invention is described in further detail below with reference to the drawings and specific embodiments. The invention uses a deep learning neural network to find the control point coordinates and weight parameters required to deform a NURBS surface. On the one hand, the neural network can use its strong feature extraction and analytical computation capabilities to regress these parameters automatically. On the other hand, the NURBS method does not need to regress every point of the three-dimensional model; only the weights and coordinates of the control points have to be found to achieve three-dimensional reconstruction, which reduces the number of network parameters and the computational complexity.
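To make the reduction in regression targets concrete, the sketch below compares output sizes using figures that appear later in this embodiment (4096 sampled points per model and a 7*9 control point grid). Treating 4096 as the size of a directly regressed point cloud is an assumption made only for this comparison.

```python
NUM_SURFACE_POINTS = 4096   # points sampled per CAD model in this embodiment
CONTROL_GRID = (7, 9)       # control point grid initialized in this embodiment


def point_cloud_outputs(num_points):
    """Values to regress when every point (x, y, z) is predicted directly."""
    return num_points * 3


def nurbs_outputs(grid):
    """Values to regress for a NURBS surface: (x, y, z) plus one weight per control point."""
    rows, cols = grid
    return rows * cols * 4


direct = point_cloud_outputs(NUM_SURFACE_POINTS)
compact = nurbs_outputs(CONTROL_GRID)
print(f"direct regression: {direct} values, NURBS parameters: {compact} values")
```

Under these assumptions the final regression layer produces 252 values instead of 12288, which is where the saving in network parameters comes from.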

As shown in Fig. 1, the deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image comprises the following steps:

Step 1: generating training samples

OpenGL is used to sample and render CAD models, generating the training samples used to train the neural network that reconstructs the three-dimensional model. The generation of training samples is of course not limited to OpenGL; any method that achieves the same technical effect may be used. The details are as follows:

Selecting the CAD dataset. In 2015, researchers at Princeton University, Stanford University, and TTIC released the jointly developed ShapeNet dataset, which aims to establish a richly annotated, large-scale dataset of 3D shapes and make it available to researchers around the world in support of research in computer graphics, computer vision, robotics, and related disciplines. The invention uses a subset of the ShapeNet dataset, a collection of three-dimensional CAD models comprising about 50,000 models in 13 main categories. Starting from the initial coordinate view, OpenGL is used to sample points from each CAD model; 4096 points are collected per model, which gives the true shape information of the CAD model, as shown on the left of Fig. 2.
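The point sampling above is done with OpenGL in this embodiment. As a stand-in that needs no graphics context, the sketch below draws 4096 points from a triangle mesh by area-weighted sampling. This is a swapped-in technique, not the OpenGL procedure of the patent, and `sample_surface` and the toy square mesh are hypothetical names and data used only for illustration.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle from the cross product of two edge vectors."""
    ab = [b[k] - a[k] for k in range(3)]
    ac = [c[k] - a[k] for k in range(3)]
    cross = (ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_surface(vertices, faces, num_points=4096, seed=0):
    """Draw points uniformly over the mesh surface (area-weighted triangles)."""
    rng = random.Random(seed)
    tris = [(vertices[i], vertices[j], vertices[k]) for i, j, k in faces]
    areas = [triangle_area(*t) for t in tris]
    chosen = rng.choices(tris, weights=areas, k=num_points)
    points = []
    for a, b, c in chosen:
        # Square-root trick gives uniform barycentric coordinates in a triangle.
        s = math.sqrt(rng.random())
        r = rng.random()
        u, v, w = 1.0 - s, s * (1.0 - r), s * r
        points.append(tuple(u * a[k] + v * b[k] + w * c[k] for k in range(3)))
    return points


# Toy mesh: a unit square in the z = 0 plane, split into two triangles.
square_vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
square_faces = [(0, 1, 2), (0, 2, 3)]
cloud = sample_surface(square_vertices, square_faces)
```

Weighting triangles by area keeps the point density even across large and small faces, so the resulting cloud describes the shape rather than the tessellation.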

OpenGL is used to render each CAD model from different angles and distances, producing 24 single-frame images, and the camera angle and distance of every picture are recorded. On the OpenGL platform, moving the virtual camera changes its angle and distance relative to the CAD model, and the pictures obtained from the different viewpoints serve as training samples, as shown on the right of Fig. 2.
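The viewpoint bookkeeping described above can be sketched as follows. The document fixes only the number of views (24) and the fact that angle and distance are recorded; the even azimuth spacing, the elevation and distance ranges, and the name `make_viewpoints` are assumptions made for illustration.

```python
import math
import random


def make_viewpoints(num_views=24, seed=0):
    """Return camera poses around an object at the origin, with recorded angle and distance."""
    rng = random.Random(seed)
    views = []
    for k in range(num_views):
        azimuth = 360.0 * k / num_views        # degrees, evenly spaced (assumed)
        elevation = rng.uniform(-20.0, 40.0)   # degrees, assumed range
        distance = rng.uniform(1.5, 3.0)       # model units, assumed range
        az, el = math.radians(azimuth), math.radians(elevation)
        position = (distance * math.cos(el) * math.cos(az),
                    distance * math.cos(el) * math.sin(az),
                    distance * math.sin(el))
        views.append({"azimuth": azimuth, "elevation": elevation,
                      "distance": distance, "position": position})
    return views


views = make_viewpoints()
```

Each record would be stored next to the rendered picture so that the camera angle and distance of every training image can be recovered later.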

The dataset is divided into a training set and a test set, with 4/5 used for training and the remaining 1/5 for testing. To facilitate training, this embodiment shuffles the sample data, and 4/5 of each category is extracted to form the combined training set. The remaining 1/5 of the data is stored separately by category and used to test the model's performance on each category.
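The split described above can be sketched as below. The 4/5 ratio and the per-category test sets come from the text; the function name `split_by_category` and the toy data are hypothetical.

```python
import random


def split_by_category(samples_by_category, train_fraction=0.8, seed=0):
    """Shuffle each category, pool 4/5 for training, keep 1/5 per category for testing."""
    rng = random.Random(seed)
    train, test_by_category = [], {}
    for category, samples in samples_by_category.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])
        test_by_category[category] = shuffled[cut:]
    rng.shuffle(train)  # mix categories in the combined training set
    return train, test_by_category


toy = {"chair": [f"chair_{i}" for i in range(10)],
       "table": [f"table_{i}" for i in range(5)]}
train_set, test_sets = split_by_category(toy)
```

Keeping the test data separated by category makes it possible to report reconstruction quality for each of the 13 categories individually.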

Step 2: image feature extraction

Features are extracted from the input images of each training sample to obtain their semantic features. Because each CAD model has 24 input images that differ in angle and distance, the same VGG16 model is applied to each of the 24 pictures of every sample to extract the high-level semantics. Features can be extracted from the input images of each training sample by the following formula:

$$F_{n,i}^{j} = \mathrm{Extract}_{VGG16}\left(I_{n,i}^{j}\right), \quad n = 1, \dots, N$$

where $N$ is a positive integer, taken as 13 in this embodiment to denote the 13 training data categories; $F_{n,i}^{j}$ denotes the high-level semantics produced by the $j$-th picture of the $i$-th CAD training sample in the $n$-th category, with $j$ at most 24 in this embodiment; $\mathrm{Extract}_{VGG16}$ denotes the VGG16 feature extraction network; and $I_{n,i}^{j}$ denotes the $j$-th picture of the $i$-th CAD training sample in the $n$-th category.

$\mathrm{Extract}_{VGG16}$ comprises an input layer, convolutional layers, pooling layers, and an output layer. There are 15 convolutional layers in this embodiment, as shown in Fig. 3, where A denotes the input layer, B1 the first convolutional layer, B15 the fifteenth convolutional layer, C1 the first pooling layer, C15 the fifteenth pooling layer, and D the output layer. Each unit of the first convolutional layer only needs to perceive a local region of the input image rather than the whole image; the second convolutional layer in turn perceives local regions of the first, and so on, so that in the higher convolutional layers the neurons perceiving different local regions are combined to obtain global information. To speed up convergence of the network model and improve the accuracy of the result, a non-linear correction (ReLU) is applied to the output of each convolutional layer. The relationship between successive convolutional layers, and between a convolutional layer and the input layer, is given by:

$$\mathrm{Conv}_{k}(i,j) = \mathrm{ReLU}\Big(\sum_{m,n} W_{k-1,k}(m,n)\,\mathrm{Conv}_{k-1}(i+m,\,j+n) + b_{k-1,k}\Big)$$

$$\mathrm{Conv}_{k}(i,j) = \mathrm{ReLU}\Big(\sum_{m,n} W_{i,k}(m,n)\,I(i+m,\,j+n) + b_{i,k}\Big)$$

$$\mathrm{ReLU}(x) = \max(0, x)$$

When the input undergoes a small translation, the pooling layer helps keep the semantic information obtained by the convolutional layers approximately unchanged. The relationship between the input and output of a pooling layer is given by:

$$\mathrm{Pool}(i,j) = \max_{m,n} I(i+m,\,j+n)$$
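The convolution, ReLU correction, and max-pooling relations of this step can be sketched in plain Python on a single channel. This is a minimal illustration, not the VGG16 network itself: the 5x5 input, the 2x2 kernel, and the non-overlapping 2x2 pooling window are hypothetical values.

```python
def relu(x):
    """Non-linear correction: negative responses are clipped to zero."""
    return max(0.0, x)


def conv2d_relu(image, kernel, bias=0.0):
    """Valid convolution followed by ReLU: sum over (m, n) of W(m, n) * I(i+m, j+n) + b."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[relu(sum(kernel[m][n] * image[i + m][j + n]
                      for m in range(kh) for n in range(kw)) + bias)
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool(feature, size=2):
    """Maximum over each non-overlapping size x size region (stride = size, assumed)."""
    out_h = len(feature) // size
    out_w = len(feature[0]) // size
    return [[max(feature[i * size + m][j * size + n]
                 for m in range(size) for n in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[1, 2, 0, 1, 3],
         [0, 1, 2, 3, 1],
         [1, 0, 1, 2, 2],
         [2, 1, 0, 1, 0],
         [0, 3, 1, 0, 1]]
kernel = [[1, 0],
          [0, -1]]          # hypothetical 2x2 kernel
feature = conv2d_relu(image, kernel)   # 4x4 feature map
pooled = max_pool(feature)             # 2x2 pooled map
```

Because pooling keeps only the largest response in each region, shifting the input by one pixel often leaves the pooled map unchanged, which is the approximate translation invariance mentioned above.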

Step 3: image feature conversion

A fully connected neural network module regresses the extracted high-level semantics of the picture and outputs the control point coordinates and weight parameters required to deform the NURBS surface in the three-dimensional reconstruction stage. The module comprises $l$ layers, with $l$ = 1, 2, or 3 by default, suited respectively to simple, ordinary, and complex three-dimensional model reconstruction tasks. The high-level semantic features produced by convolution must first be flattened into a column vector before they are connected to the fully connected layers. The flattening is given by:

$$V = \mathrm{Flatten}(\mathrm{Conv})$$

where $V$ is the flattened high-level semantic vector extracted from the input image, Conv is the multi-dimensional semantic matrix output by the convolutional layers, and $\mathrm{Flatten}(\cdot)$ converts a multi-dimensional matrix into a column vector.

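The connection between a fully connected layer and the flattened high-level semantic vector, and between successive fully connected layers, is given by:

$$\mathrm{Para}(j) = \sum_{i} W(i,j)\,V(i) + b(j)$$

where Para denotes the output parameters; $i$ denotes the input channel; $j$ denotes the output channel; $W(i,j)$ denotes the link weight from input channel $i$ to the $j$-th linear unit; $V$ denotes the flattened high-level semantic vector or the output of the previous fully connected layer; and $b(j)$ denotes the link bias of the $j$-th linear unit.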

The dimension of the output parameters matches the control point coordinates and weight parameters required to deform the NURBS surface, as follows:

$$\mathrm{Para}(j) \rightarrow Num_{ct} \times \big((x, y, z) + w\big)$$

where $Num_{ct}$ denotes the number of control points, $(x, y, z)$ the coordinates of a control point, and $w$ the weight of a control point.
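The flattening, single fully connected layer, and parameter mapping of this step can be sketched as follows. The tiny random feature maps, the random layer weights, and the interleaved (x, y, z, w) layout of the output vector are assumptions for illustration; only the count of four values per control point comes from the text.

```python
import random

NUM_CT = 7 * 9   # number of control points used in this embodiment


def flatten(feature_maps):
    """Turn a stack of 2D feature maps into one flat vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def fully_connected(x, weights, biases):
    """Para(j) = sum_i W(i, j) * x(i) + b(j); weights[j] holds the row for output j."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b
            for w_row, b in zip(weights, biases)]


def split_parameters(para, num_ct=NUM_CT):
    """Split the output vector into control point coordinates and weights."""
    assert len(para) == num_ct * 4
    coords = [tuple(para[4 * k: 4 * k + 3]) for k in range(num_ct)]
    ctrl_weights = [para[4 * k + 3] for k in range(num_ct)]
    return coords, ctrl_weights


rng = random.Random(0)
# Hypothetical "high-level semantics": three 2x2 feature maps.
feature_maps = [[[rng.random() for _ in range(2)] for _ in range(2)] for _ in range(3)]
x = flatten(feature_maps)
W = [[rng.uniform(-0.1, 0.1) for _ in x] for _ in range(NUM_CT * 4)]
b = [0.0] * (NUM_CT * 4)
coords, ctrl_weights = split_parameters(fully_connected(x, W, b))
```

NURBS weights are normally kept positive. How that is enforced is not stated in this document, so the sketch applies no constraint to the regressed weights.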

Step 4: three-dimensional model reconstruction

The invention does not generate the three-dimensional model directly. Instead, it uses the control point coordinates and weight parameters obtained by conversion to deform an initial three-dimensional model progressively into the target three-dimensional model. This is a three-dimensional-to-three-dimensional mapping, which gives higher accuracy and integrity.

Step 4 specifically comprises:

NURBS model initialization: in this example, 7*9 control points are initialized as the objects to be operated on, each control point having coordinates (x, y, z) and a weight w, as shown in Fig. 4.

NURBS model reconstruction: the initial model is updated with the obtained control point coordinates and weight parameters required to deform the NURBS surface, progressively yielding the target three-dimensional model. In this embodiment, initialization and model update follow the NURBS surface formulation set out in the summary above.
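Evaluating a NURBS surface from a 7*9 control grid can be sketched with the standard Cox-de Boor recursion. This is the textbook tensor-product form, not code from the patent: the cubic degree, the clamped uniform knot vectors, the flat initial grid, and the function names are assumptions.

```python
def clamped_knots(num_ctrl, degree):
    """Clamped uniform knot vector on [0, 1] with num_ctrl + degree + 1 knots."""
    interior = num_ctrl - degree - 1
    return ([0.0] * (degree + 1)
            + [(k + 1) / (interior + 1) for k in range(interior)]
            + [1.0] * (degree + 1))


def basis(i, d, u, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{i,d}(u)."""
    if d == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that u = 1 is covered.
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + d] > knots[i]:
        left = (u - knots[i]) / (knots[i + d] - knots[i]) * basis(i, d - 1, u, knots)
    if knots[i + d + 1] > knots[i + 1]:
        right = ((knots[i + d + 1] - u) / (knots[i + d + 1] - knots[i + 1])
                 * basis(i + 1, d - 1, u, knots))
    return left + right


def nurbs_point(u, v, ctrl, weights, degree=3):
    """Weighted rational combination of control points at parameters (u, v)."""
    rows, cols = len(ctrl), len(ctrl[0])
    ku, kv = clamped_knots(rows, degree), clamped_knots(cols, degree)
    bu = [basis(i, degree, u, ku) for i in range(rows)]
    bv = [basis(j, degree, v, kv) for j in range(cols)]
    num, den = [0.0, 0.0, 0.0], 0.0
    for i in range(rows):
        for j in range(cols):
            r = bu[i] * bv[j] * weights[i][j]
            den += r
            for k in range(3):
                num[k] += r * ctrl[i][j][k]
    return tuple(c / den for c in num)


# Hypothetical initial model: a flat 7x9 grid with unit weights.
ROWS, COLS = 7, 9
ctrl = [[(i / (ROWS - 1), j / (COLS - 1), 0.0) for j in range(COLS)] for i in range(ROWS)]
weights = [[1.0] * COLS for _ in range(ROWS)]
corner = nurbs_point(0.0, 0.0, ctrl, weights)
far_corner = nurbs_point(1.0, 1.0, ctrl, weights)
centre = nurbs_point(0.5, 0.5, ctrl, weights)
```

Updating the model then amounts to replacing `ctrl` and `weights` with the values regressed by the network and re-evaluating the surface on a grid of (u, v) samples; raising a positive weight pulls the nearby surface toward the corresponding control point.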

Step 5: deep learning model training

The whole training set is fed into the neural network and the entire deep learning model is trained. During forward propagation, the network convolution kernels are applied to the feature maps by dot product, progressively yielding the high-level semantics; the fully connected neural network module regresses from the high-level semantics the parameters needed in the reconstruction stage, completing the update of the NURBS control point coordinates and weights. During back-propagation, the convolution kernels and the fully connected layers update their parameters.

The neural network is trained on the chamfer distance between the generated three-dimensional model and the true three-dimensional model. For each point, the chamfer distance algorithm finds the nearest neighbor in the other set and sums the squared distances. As a function of the point positions in the sets, the distance is continuous and piecewise smooth, and the range search for each point is independent, so it is easily parallelized. The neural network is trained by the following formula:

$$loss = L_{CD}(P, Q) = \sum_{p \in P} \min_{q \in Q} \lVert p - q \rVert_2^2 + \sum_{q \in Q} \min_{p \in P} \lVert p - q \rVert_2^2$$

where $loss$ denotes the loss function; $L_{CD}$ denotes the chamfer distance between the generated three-dimensional model and the true three-dimensional model; $P$ denotes the generated three-dimensional model; $Q$ denotes the true-shape three-dimensional model; and $\lVert \cdot \rVert_2$ denotes the two-norm.
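The chamfer distance described in this step (nearest neighbor in the other set, squared distances summed in both directions) can be sketched as a brute-force computation. The unnormalized sum is an assumption, since some implementations average over the set sizes, and the two toy point sets are hypothetical.

```python
def squared_distance(p, q):
    """Squared two-norm between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(P, Q):
    """Sum of squared nearest-neighbor distances from P to Q and from Q to P."""
    forward = sum(min(squared_distance(p, q) for q in Q) for p in P)
    backward = sum(min(squared_distance(p, q) for p in P) for q in Q)
    return forward + backward


generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # points sampled from the generated surface
target = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]      # points from the true shape
loss = chamfer_distance(generated, target)
```

Each nearest-neighbor search is independent of the others, so the two outer sums can be evaluated in parallel, which matches the remark above about parallelization.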

The deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image provided by the invention differs both from traditional NURBS three-dimensional reconstruction methods and from the common deep learning reconstruction methods based on voxels, point clouds, and meshes. The invention does not require the various parameters of NURBS model reconstruction to be calculated by hand and therefore, unlike traditional three-dimensional modeling, does not need extensive manual operation, which greatly improves reconstruction efficiency. At the same time, the invention does not need to predict the spatial coordinates of every point of the model individually, which addresses the problems that models are complex to compute and that precision is hard to raise; it improves the smoothness of the model, preserves the detail integrity of the reconstructed model, and improves the precision and speed of three-dimensional reconstruction.

Furthermore, because the method is a machine learning method based on statistical learning, it requires a large number of training samples, namely point cloud data labeled with the shape information of the CAD models together with single-frame pictures taken at different angles and distances. By acquiring prior knowledge about model shape and viewpoint from the single-frame pictures, the weight parameters of the deep learning model are obtained, and with them the deep neural network model for NURBS surface reconstruction from a single-frame image. For this reason, the invention uses OpenGL as a sampling and rendering tool so that data can be generated flexibly, which increases the number of training samples and in turn improves the reconstruction ability of the deep neural network model.

Claims

1. A deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image, characterized by comprising the following steps:

Step 1: sampling and rendering CAD models to generate ground-truth shape point clouds and single-frame images at different viewpoints and distances;

Step 2: extracting features from the images with a convolutional neural network to obtain the high-level semantics of the two-dimensional images;

Step 3: converting the obtained high-level semantics, through a fully connected neural network module, into the control point coordinates and weight parameters required to deform the NURBS surface in the three-dimensional reconstruction stage;

Step 4: updating the initialized NURBS model with the obtained control point coordinates and weight parameters, thereby progressively carrying out three-dimensional reconstruction;

Step 5: training the deep learning model: the training samples are fed into the deep learning model for automatic training, optimal model parameters are obtained, and three-dimensional reconstruction is completed.

2. The deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image according to claim 1, characterized in that, in step 1, OpenGL is used to sample and render the CAD models to generate the training samples.

3. The deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image according to claim 2, characterized in that, in step 2, the same VGG16 model is used to extract high-level semantics from the images of every training sample, that is, features are extracted from the input images of each training sample by the following formula:

$$F_{n,i}^{j} = \mathrm{Extract}_{VGG16}\left(I_{n,i}^{j}\right), \quad n = 1, \dots, N$$

where $N$ is a positive integer; $F_{n,i}^{j}$ denotes the high-level semantics produced by the $j$-th picture of the $i$-th CAD training sample in the $n$-th category; $\mathrm{Extract}_{VGG16}$ denotes the VGG16 feature extraction network; and $I_{n,i}^{j}$ denotes the $j$-th picture of the $i$-th CAD training sample in the $n$-th category.

4. The deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image according to claim 3, characterized in that, in step 2, $\mathrm{Extract}_{VGG16}$ comprises an input layer, convolutional layers, pooling layers, and an output layer, and a non-linear correction is applied to the output of each convolutional layer; the relationship between successive convolutional layers, and between a convolutional layer and the input layer, is given by:

$$\mathrm{Conv}_{k}(i,j) = \mathrm{ReLU}\Big(\sum_{m,n} W_{k-1,k}(m,n)\,\mathrm{Conv}_{k-1}(i+m,\,j+n) + b_{k-1,k}\Big)$$

$$\mathrm{Conv}_{k}(i,j) = \mathrm{ReLU}\Big(\sum_{m,n} W_{i,k}(m,n)\,I(i+m,\,j+n) + b_{i,k}\Big)$$

where $\mathrm{Conv}_{k}(i,j)$ denotes the value at row $i$, column $j$ of the $k$-th convolutional layer; $W_{k-1,k}(m,n)$ denotes the value at row $m$, column $n$ of the convolution kernel from the $(k-1)$-th to the $k$-th convolutional layer; $b_{k-1,k}$ denotes the link bias from the $(k-1)$-th hidden layer to the linear units of the $k$-th hidden layer; $I$ denotes the input image of the input layer; $W_{i,k}(m,n)$ denotes the link weight from the input layer to the linear units of the $k$-th hidden layer; and $b_{i,k}$ denotes the link bias from the input layer to the linear units of the $k$-th hidden layer; ReLU applies a non-linear correction to the output of every convolutional layer:

$$\mathrm{ReLU}(x) = \max(0, x)$$

the relationship between the input and output of a pooling layer is given by:

$$\mathrm{Pool}(i,j) = \max_{m,n} I(i+m,\,j+n)$$

where Pool is the output of the pooling layer and $I$ is the input semantic map; the pooling layer takes the maximum over a local region of its input.

5. The deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image according to claim 1, characterized in that step 4 specifically comprises:

NURBS model initialization: control points are initialized as the objects to be operated on, each control point having coordinates (x, y, z) and a weight w;

NURBS model reconstruction: the initial model is updated with the obtained control point coordinates and weight parameters required to deform the NURBS surface, yielding the target three-dimensional model; initialization and model update follow the NURBS surface formulation, in which $C(\mu, v)$ denotes the shape function on each interval; $w_i$ denotes the weight of a control point; $C_i$, that is $(x_i, y_i, z_i)$, denotes the coordinates of a control point; $N_{i,d}(\mu)$ denotes the interval (basis) function; $\mu$ and $v$ denote the knots in the two parametric directions of the surface; $n$ denotes the total number of intervals, corresponding to $n+1$ knots; and $d$ denotes the order of the interval function.

6. The deep-learning-based method for reconstructing a three-dimensional model surface from a single-frame image according to claim 1, characterized in that step 5 specifically comprises:

during forward propagation, the network convolution kernels are applied to the feature maps by dot product, progressively yielding the high-level semantics; the fully connected neural network module regresses from the high-level semantics the parameters needed in the reconstruction stage, completing the update of the NURBS control point coordinates and weights; during back-propagation, the convolution kernels and the fully connected layers update their parameters;

the neural network is trained on the chamfer distance between the generated three-dimensional model and the true three-dimensional model, that is, by the following formula:

$$loss = L_{CD}(P, Q) = \sum_{p \in P} \min_{q \in Q} \lVert p - q \rVert_2^2 + \sum_{q \in Q} \min_{p \in P} \lVert p - q \rVert_2^2$$

where $loss$ denotes the loss function; $L_{CD}$ denotes the chamfer distance between the generated three-dimensional model and the true three-dimensional model; $P$ denotes the generated three-dimensional model; $Q$ denotes the true-shape three-dimensional model; and $\lVert \cdot \rVert_2$ denotes the two-norm.