CN110047144A - A real-time 3D reconstruction method for complete objects based on Kinect v2 - Google Patents
A real-time 3D reconstruction method for complete objects based on Kinect v2
- Publication number: CN110047144A
- Application number: CN201910257175.3A
- Authority: CN (China)
- Prior art keywords: depth, data, point cloud, voxel, global
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/20—Finite element generation, e.g. wire-frame surface description, tessellation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/08—Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
Abstract
The present invention relates to a real-time 3D reconstruction method for complete objects based on Kinect v2, comprising data acquisition → depth completion → point cloud processing → ICP point cloud registration → point cloud fusion → surface reconstruction. The beneficial effects of the invention are that, through the combined action of these steps, the method improves the quality of complete, real-time 3D reconstruction of objects. It replaces conventional 3D reconstruction, which generally acquires depth from multiple pictures or from monocular or binocular cameras, a process that is computationally intensive and can hardly guarantee real-time performance and accuracy at the same time, while professional high-precision 3D scanning equipment is prohibitively expensive, defects that have limited the application and popularization of 3D reconstruction.
Description
Technical field
The present invention relates to the technical field of computer vision, and specifically to a real-time 3D reconstruction method for complete objects based on Kinect 2.0.
Background art
With the progress of science and technology and the diversification of everyday needs, computer vision has, through continuous iteration and updating, helped us extract ever more information from digital images and video. Among its subfields, 3D reconstruction has become a sustained research hotspot in recent years: it lifts image analysis from two-dimensional to three-dimensional space and offers better-optimized solutions from a more stereoscopic viewpoint.
Existing 3D reconstruction generally acquires depth from multiple pictures or from monocular or binocular cameras. This process is computationally intensive and can hardly guarantee real-time performance and accuracy at the same time, while professional high-precision 3D scanning equipment is prohibitively expensive, which restricts the application and popularization of 3D reconstruction.
Summary of the invention
The purpose of the present invention is to provide a real-time 3D reconstruction method for complete objects based on Kinect 2.0, to solve the problems raised in the background section above.
To achieve this, the invention provides the following technical scheme: a real-time 3D reconstruction method for complete objects based on Kinect 2.0, comprising data acquisition → depth completion → point cloud processing → ICP point cloud registration → point cloud fusion → surface reconstruction.
Preferably, the complete-object real-time 3D reconstruction method based on Kinect 2.0 proceeds as follows:
(1) Data acquisition: depth images and color images are obtained from Kinect 2.0; the acquired depth image is pre-processed for noise, the color image is aligned with the denoised depth map, and the result is fed into the depth completion network.
(2) Depth completion: the color image and depth map are fed into the depth completion network designed in the Torch framework; missing parts of the depth map are predicted from the information in the color image and then globally optimized together with the original depth map. The completed depth map is converted into point cloud data through the camera intrinsic matrix.
(3) Point cloud processing: the normal vector of each point is computed from its coordinates, in preparation for corresponding-point matching during the subsequent registration.
(4) ICP point cloud registration: the point cloud predicted from the global data cube is registered against the point cloud of the current frame, yielding the registration transformation matrix of the current point cloud.
(5) Point cloud fusion: using the registration transformation matrix obtained above, the point cloud of the current frame is fused into the global data cube; a new predicted point cloud is then computed from the global data cube by ray casting and used for registering the next frame, while the point cloud surface under the current viewpoint is rendered so that the reconstruction can be observed in real time.
(6) Surface reconstruction: once all acquired frames have been fused, the global point cloud is extracted from the global data cube and the object surface is reconstructed by a surface reconstruction algorithm, forming a complete 3D model.
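The back-projection in step (2), from a completed depth map to point cloud data through the camera intrinsic matrix, can be sketched in a few lines of numpy (a minimal pinhole-model sketch; the intrinsic values fx, fy, cx, cy below are illustrative, not an actual Kinect 2.0 calibration):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (meters) into an N x 3 point cloud using the
    pinhole model: X = (u - cx) * z / fx, Y = (v - cy) * z / fy, Z = z."""
    v, u = np.indices(depth.shape)          # per-pixel row (v) and column (u)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]               # drop invalid (zero-depth) pixels

# one valid pixel at (v=1, u=2) with depth 2 m
depth = np.zeros((4, 4))
depth[1, 2] = 2.0
cloud = depth_to_point_cloud(depth, fx=365.0, fy=365.0, cx=2.0, cy=2.0)
```

The per-point normals required by step (3) would then be estimated from local neighborhoods of this cloud.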
Preferably, the point cloud processing uses the image coordinate system, the camera coordinate system and the world coordinate system.
Preferably, the fusion of the point cloud data with the global data cube in the point cloud fusion step is as follows. At initialization, every voxel in the global cube has D = 1 and W = 0; the initial camera position is set to (1.5, 1.5, -0.3) so that the camera obtains a good view, and the cube center is at (1.5, 1.5, 1.5). After the i-th frame of point cloud data is obtained, fusing it into the cube requires the following steps:
(1) First obtain the coordinates Vg(x, y, z) of each voxel in the global coordinate system, then transform them from the global coordinate system into the camera coordinate system through the ICP registration transformation matrix, obtaining V(x, y, z);
(2) Project the camera coordinates V(x, y, z) of step (1) into the image coordinate system through the camera intrinsic matrix, obtaining the corresponding image coordinates (u, v);
(3) If the depth value D(u, v) of the current frame's depth image at (u, v) is not 0, compare D(u, v) with the z component of the voxel's camera coordinates V(x, y, z): if D(u, v) > z, the voxel is closer to the camera than the measured surface and lies outside the reconstructed surface; if D(u, v) < z, the voxel is farther from the camera and lies inside the reconstructed surface;
(4) Finally, according to the result of (3), update the distance D and the weight W stored in the voxel.
The update formulas used are as follows:
Wi(x, y, z) = min(maxweight, Wi-1(x, y, z) + 1)
sdfi = V.z - Di(u, v)
di(x, y, z) = max(mintruncation, min(maxtruncation, sdfi)) / maxtruncation
Di(x, y, z) = (Wi-1(x, y, z) · Di-1(x, y, z) + di(x, y, z)) / (Wi-1(x, y, z) + 1)
where Wi(x, y, z) is the weight of the voxel in the current frame's global data cube, Wi-1(x, y, z) is its weight in the previous frame's global data cube, maxweight is the weight limit (set to 1 here), Di(x, y, z) is the distance from the voxel to the object surface in the current frame's global data cube, Di-1(x, y, z) is that distance in the previous frame's global data cube, di(x, y, z) is the distance from the voxel to the object surface computed from the current frame's depth data, V.z is the z coordinate of the voxel in the camera coordinate system, Di(u, v) is the depth value of the current frame's depth image at (u, v), and maxtruncation and mintruncation delimit the truncation range.
Preferably, the surface reconstruction is divided into:
(1) partitioning the global point cloud data to obtain a voxel grid;
(2) extracting the iso-surface within each voxel cell by linear interpolation;
(3) triangulating the iso-surface to generate a surface mesh, completing the construction of the 3D model from the point cloud.
Preferably, the design principle and procedure of the depth completion network include the construction of the database, the choice of which features to train on, the network structure, and the global optimization procedure, finally completing the depth information.
Compared with the prior art, the invention has the following beneficial effects. Through the combined action of data acquisition, depth completion, point cloud processing, ICP point cloud registration, point cloud fusion and surface reconstruction, the method improves complete, real-time 3D reconstruction of objects. It replaces conventional 3D reconstruction, which generally acquires depth from multiple pictures or from monocular or binocular cameras, a process that is computationally intensive and can hardly guarantee real-time performance and accuracy at the same time, while professional high-precision 3D scanning equipment is prohibitively expensive, so that the application and popularization of 3D reconstruction have been limited. Furthermore, the surface prediction method obtains normals and occlusion boundaries by training on color images, then recomputes all depths together with the original depth map to obtain the completed depth image. This color-image training approach is not restricted by the measured object or the environment, and the network, being independent of the observed depth of the input, does not need retraining. Meanwhile, Kinect 2.0 obtains depth information with a ToF (time-of-flight) depth sensor, unlike the structured-light Light Coding used by Kinect 1.0: the depth sensor emits modulated near-infrared pulses, and after they are reflected by the object, a CCD sensor converts the time difference or phase difference between emission and reception into the distance from the measured object to the camera, thereby obtaining the depth of each part of the object.
Brief description of the drawings
Fig. 1 is a schematic diagram of a depth map rendered from the Matterport3D dataset for the complete-object real-time 3D reconstruction method based on Kinect 2.0 of the invention;
Fig. 2 is a schematic diagram of the acquisition of two planes by the method;
Fig. 3 is a schematic diagram of the VGG network structure used by the method;
Fig. 4 is a schematic diagram of the activation function used by the method.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The present invention provides the following technical solution: a real-time 3D reconstruction method for complete objects based on Kinect 2.0, comprising data acquisition → depth completion → point cloud processing → ICP point cloud registration → point cloud fusion → surface reconstruction.
Embodiment 1
For the first group we used a blue disk with a frosted surface. The depth map shows the disk border wrapped in fine black holes, and the fused 3D plane has a rough surface, partly due to the influence of structural noise. The second group used a white disk with a smooth, reflective surface; the depth map shows a black hole exactly where indoor light reflects off the disk, causing a missing region in the fused 3D plane. For the third group we used a frosted black disk; dense black speckle appears over the upper region of the corresponding depth map, and the fused 3D model shows poor planar fusion with severe depth loss.
From the above we find that depth acquisition is affected both by the Kinect 2.0 device itself and by the acquisition environment. When acquiring an object's depth, setting a suitable illumination intensity, adjusting the position of the measured object and optimizing the surface material of the object can reduce the amount of noise introduced.
Given the noise characteristics of Kinect 2.0 during acquisition, corresponding pre-processing, i.e. denoising, must be carried out after a depth image is obtained. Since the data structure of a depth map differs from that of a color image, the appropriate filter differs with the demand; we therefore filter the depth image with current mainstream filters. The goal is to filter out noise while preserving fine details.
The general formula of an image filtering algorithm is:
I'(x, y) = (1 / Wp) Σ(i, j)∈Ω W(i, j) · I(i, j)
where I is the noisy image, I' the filtered image, Ω the neighborhood of pixel (x, y), W(i, j) the filter weight at pixel (i, j), and Wp = Σ(i, j)∈Ω W(i, j) the normalization factor.
For experimental evaluation, since depth images obtained from Kinect 2.0 always contain noise, a noise-free ground-truth depth image is hard to obtain. We therefore use subjective assessment as the criterion, comparing the filters by how well holes in the depth image are repaired after denoising and by the time cost of each algorithm.
Embodiment 2
Depth completion network design
Since Kinect 2.0 usually cannot perceive depth for shiny objects, transparent regions and distant surfaces, when the measured object or the environment is too complex, filtering the acquired depth image achieves only limited denoising and cannot repair most of the missing depth regions. With traditional repair algorithms hitting a bottleneck, and considering that the high-resolution color image obtained by Kinect 2.0 is rich in detail, we turn our attention to deep learning, hoping to train, from a large number of samples, a network that can predict and repair depth maps. To this end we introduce a method that builds on existing databases and design a deep network that can perform end-to-end training and evaluation on color and depth images: it predicts differential properties of the color image and combines them with the original depth map acquired by Kinect 2.0 to solve for the depth of all pixels, achieving depth completion.
We provide the network with the color image as input and train it, under supervision, to predict local surface normals and occlusion boundaries. The predictions are then combined with the original depth: after regularizing with the input depth, a global optimization is performed to complete the depth information.
Database construction
There is as yet no large-scale training set pairing RGB-D images with complete depth images. Most depth estimation methods train and evaluate against the pixels captured by commercial RGB-D cameras; a network built from such data can only learn to reproduce the measured depth and cannot predict depth for uncaptured regions with notable features.
To solve this problem, we need to create a dataset pairing RGB-D images with complete depth images from the same viewpoint. The most direct way is to capture images with a consumer-grade, low-cost RGB-D camera and match them with the same scenes captured by an expensive high-precision depth sensor, but this is too time-consuming and costly. By observing and analyzing the mainstream public RGB-D datasets, such as Matterport3D, ScanNet, SceneNN and SUN3D, we found that they capture the various parts of a scene within the same large scene; if the depth maps from these different viewpoints are combined and rendered, a high-quality depth map close to ground truth can be obtained.
We selected the Matterport3D database for combined rendering. Within an indoor large scene, for each room we extracted a triangular mesh of 1 to 6 million triangles by screened Poisson surface reconstruction, then sampled the RGB-D images in the scene and rendered depth from the reconstructed mesh according to each sample's camera pose, thereby obtaining complete depth images.
Fig. 1 shows the construction process of a single complete depth map in the dataset: (e) shows the original color image, the original depth map and the constructed complete depth map, with the camera's acquisition viewpoint marked in red in Fig. 1. The mesh model obtained by Poisson surface reconstruction is created by fusing the RGB-D images acquired from every viewpoint inside the scene; images (a), (b), (c) and (d) were acquired at the correspondingly labeled viewpoints in Fig. 1.
Fig. 1 thus shows a depth map rendering example from the Matterport3D dataset. The resulting depth data has the following characteristics:
1. The completed depth map preserves edge information better and holes are significantly reduced. This is because the surface reconstruction uses the observations of all cameras from different viewpoints: if, while acquiring scene depth, the depth of an object surface or of a distant part of the scene cannot be obtained, it can be recovered from the close-range acquisitions of other cameras at different viewpoints. Although the scene's true depth cannot be reproduced completely, this process fills on average 60% of the missing pixels of the original depth images in the database.
2. The resulting complete depth map offers better resolution for mid- and long-range surfaces. Because the 3D mesh size used by screened Poisson surface reconstruction corresponds to the resolution of a depth camera and fusion usually incurs no resolution loss, all close-range acquisitions can be projected into the view plane, providing higher pixel resolution for surfaces farther from the camera.
3. The resulting complete depth image has less noise than the original. The surface reconstruction algorithm denoises the image by filtering and averaging the noisy depth samples combined from multiple camera views.
By this method we introduce a new dataset comprising 105,000 training images and 12,000 test images, paired with large-scale depth maps computed from 72 real scenes.
Training feature analysis
What kind of features should we train for depth completion? The direct approach would be to design a network that takes the original depth and color information as input and regresses the complete depth. But absolute depth is hard to predict from a monocular image, because it depends on factors such as object size and scene type. We therefore do not solve for depth directly; instead we train the network to predict local surface properties and then use these predictions to solve for depth.
First we need to find which local properties of the color image are suitable for depth prediction. For the depth of missing regions, we need to infer depth planes and edges from color information. We therefore focus on predicting surface normals and occlusion boundaries. Surface normals are an important property of solid surfaces: they depend only on a pixel's local neighborhood and are closely related to the local lighting variations directly observable in the color image, while occlusion boundaries produce pixel-level local patterns and are robust in deep networks. Galliani et al. already used surface normals to recover missing geometry in multi-view reconstruction of objects on a tabletop.
Without the original depth map, solving for all depths purely from surface normals and occlusion boundaries is theoretically infeasible. In some extreme cases, the depth relationship between different parts of an image cannot be inferred from normals alone. For example, in Fig. 2 the camera acquires two planes from a viewpoint such that the depth of the occluded rear plane cannot be deduced from the surface normals of the color image alone: seen from the camera, the visible region of the rear plane is completely surrounded by occlusion boundaries, and relative to other regions of the front plane its depth cannot be determined.
However, in actual scene measurements, the case where a region is completely surrounded by occlusion boundaries and contains no observed original depth almost never occurs. Therefore, by predicting surface normals and occlusion boundaries from the color image on top of the original depth map, the missing depth can be solved and predicted well.
The network architecture and training
With the training dataset in hand and the training features settled, the choice of deep learning network becomes crucial. In this work we select the physics-based-rendering convolutional network proposed by Zhang et al. in 2017, which targets indoor scene understanding and has shown good performance in surface normal prediction, semantic segmentation and object boundary detection. The front-end encoder of this model keeps the convolutional layers conv1-conv5 of VGG-16, on top of which a fully convolutional network with a symmetric encoder and decoder is built; shortcut connections and shared pooling masks are provided for the corresponding max-pooling and unpooling layers, which is extremely important for learning image features.
1. VGG network structure
The VGG convolutional neural network is a model proposed by Oxford University in 2014; thanks to its simplicity and practicality it became one of the most popular convolutional network models of its time. The network performed very well in image classification and object detection tasks: in the ILSVRC 2014 competition VGG achieved 92.3% Top-5 accuracy, demonstrating that increasing network depth can, to a certain extent, improve final performance.
Fig. 3 shows the VGG network structure. VGG-16, one of the VGG variants, contains 16 weight layers, as shown in column D of the figure: 13 convolutional layers (conv), 5 max-pooling downsampling layers and 3 fully connected layers (FC). The whole network uses only 3x3 convolution kernels and 2x2 max pooling throughout, and the 16-layer network contains about 138 million parameters in total.
Compared with AlexNet, one improvement of VGG-16 is replacing AlexNet's larger kernels (11x11, 7x7, 5x5) with several stacked 3x3 kernels. For a given receptive field, stacking small kernels is superior to using a single large kernel, because multiple non-linear layers increase network depth at a smaller cost, improving the learning capacity of the network to a certain extent.
Each convolutional layer in VGG-16 is followed by an activation function, to alleviate the vanishing-gradient problem during back-propagation. The activation function chosen here is ReLU, which is widely used in convolutional neural networks: it does not saturate, thereby avoiding vanishing gradients, is cheap to compute and converges faster. Its expression is σ(x) = max(0, x).
But VGG-16 also has its limitations. First, its memory overhead is large and it consumes considerable computing resources: if, for example, a 15x15 patch is used for each pixel, the required storage is 225 times that of the original image. Second, its computational efficiency is low: because adjacent pixel patches largely overlap, patch-by-patch convolution entails a great deal of repeated computation, and the patch size limits the perceptive region of the image.
Zhang et al. therefore perform normal estimation with a fully convolutional network (FCN) that combines VGG-16's multi-scale feature maps through skip connections. The specific design keeps the VGG-16 structure for conv1-conv5 and replaces the subsequent fully connected layers with convolutional and transposed-convolutional layers, making the decoder symmetric with the encoder. To increase the resolution of the result and alleviate vanishing gradients, corresponding convolutional layers upstream and downstream of the network are skip-connected; and to further compensate for the spatial information lost to max pooling, the network saves the downstream pooling switches and reuses them as unpooling switches in the corresponding upstream layers, realizing end-to-end learning and producing the normal prediction image.
A network designed in this way can accept input images of arbitrary size. A deconvolution (transposed-convolution) layer up-samples the feature map of the last convolutional layer, restoring the result to the same size as the input image, so that a prediction is generated for every pixel while the spatial information of the original input image is preserved; this avoids the repeated storage and repeated convolution brought by the pixel-block approach and is more efficient.
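The upsample-and-fuse idea described above can be sketched as follows. This is a toy illustration only: the shapes, the 2x factor, and the nearest-neighbour upsampling (standing in for the learned deconvolution layer) are our assumptions, not the actual network:

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling, a stand-in for a learned
    transposed-convolution (deconvolution) layer."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def fuse_with_skip(coarse, skip):
    """Upsample the coarse feature map and add the skip-connection
    features from the matching encoder layer."""
    up = upsample2x(coarse)
    assert up.shape == skip.shape
    return up + skip

coarse = np.ones((4, 4))         # e.g. a conv5-level feature map
skip = np.full((8, 8), 0.5)      # e.g. a conv4-level map at 2x resolution
dense = fuse_with_skip(coarse, skip)  # one value per pixel at skip resolution
```

Repeating this upsample-and-fuse step through the decoder restores the prediction to the full input resolution, which is what makes per-pixel output possible for arbitrary input sizes.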
Unlike previous surface-normal estimation, the main goal here is to train a network that predicts the normals only of the pixels in the holes of the original depth map. We define observed pixels as those with depth data from both the raw sensor and the rendered mesh, and unobserved pixels as those with depth data from the rendered mesh but not from the raw sensor. For any given pixel set (observed, unobserved, or both), we train the model by shielding the gradients at all other pixels during back-propagation, so that the loss is taken only over these pixels. We use the negative of the element-wise dot product between the ground truth and the estimate as the loss function:
E = -Σ_p N(p) · N*(p)
where N and N* respectively denote the predicted and ground-truth surface-normal maps.
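A minimal sketch of such a masked normal loss, assuming unit-length normal maps and a boolean mask selecting the pixels whose gradients are kept (function and variable names are ours):

```python
import numpy as np

def masked_normal_loss(n_pred, n_true, mask):
    """Negative mean dot product between unit normal maps.
    n_pred, n_true: (H, W, 3) arrays of unit normals.
    mask: (H, W) boolean array, True only at the pixels that
    contribute to the loss (gradients elsewhere are shielded)."""
    dots = np.sum(n_pred * n_true, axis=-1)  # per-pixel cosine similarity
    return -np.mean(dots[mask])              # minimized when normals align

# Toy check: identical normal maps give the minimum possible loss of -1.
n = np.zeros((2, 2, 3))
n[..., 2] = 1.0                   # all normals pointing along +z
mask = np.ones((2, 2), dtype=bool)
loss = masked_normal_loss(n, n, mask)
```

Because the loss is a dot product of unit vectors, it is bounded in [-1, 1] per pixel, and masking simply restricts the average to the chosen pixel set.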
We take an RGB-D image as input and train the network with the correspondingly rendered depth map as ground truth. A network trained in this way can predict surface normals from color alone, and when depth is recovered back from the normals, the observed depth is used only for regularization; in brief, depth is predicted without depth and optimized with depth. The advantage of this strategy is that the trained network is independent of the observed depth: it does not need to be retrained for a different depth sensor, and the optimization can be generalized by combining multiple kinds of depth observations for regularization.
Global optimization
After the network has produced its predictions from the input color image, we have a surface-normal image N and an occlusion-boundary image B; next, these must be combined with the original depth map to carry out depth completion. We complete the depth by solving a system of equations whose objective function is defined as a weighted sum of squared errors:
E = λ_D·E_D + λ_S·E_S + λ_N·E_N·B
where E_D is the distance between the estimated depth D(p) at pixel p and the original depth value D0(p); E_N represents the consistency between the estimated depth and the predicted surface normal N(p); E_S denotes the variance of the estimated depths of adjacent pixels, encouraging neighboring pixels to take the same depth; and B ∈ [0, 1] down-weights the normal term according to the predicted probability B(p) that the pixel lies on an occlusion boundary.
In the objective function above, the objective is nonlinear, because the result of the dot product of the surface normal and the tangent vector v(p, q) in E_N is normalized. However, we can drop the vector normalization and accept this approximation of the error term. The approximation increases sensitivity to scale errors: the smaller the depth, the shorter the tangents and the smaller the E_N term may be. In the depth-completion setting, the data term E_D counteracts this during global optimization by forcing consistency with the observed original depth, which maintains the correct scale. Since the matrix of the system is sparse and symmetric positive definite, it can be solved efficiently by sparse Cholesky factorization; we implement this step with the cs_cholsol function of the CSparse sparse-matrix library, and the final solution is the global minimum of the approximate objective function. By this method, the estimated surface normals and occlusion boundaries are combined with the original depth information, and the depths of all pixels are solved through global optimization.
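The structure of this optimization can be illustrated with a toy one-dimensional analogue: a data term anchoring the observed depths plus a smoothness term over neighbours yields a sparse symmetric positive-definite system, which (as in the text) a Cholesky factorization solves. The 1-D setting and the weights are illustrative; the real system also contains the normal term E_N with boundary weights B:

```python
import numpy as np

def complete_depth(d0, observed, lam_d=1e3, lam_s=1.0):
    """Minimize lam_d * sum_(observed p) (d_p - d0_p)^2
             + lam_s * sum_(neighbours p,q) (d_p - d_q)^2
    by solving the normal equations A d = b with a Cholesky factorization
    (A is symmetric positive definite when at least one pixel is observed)."""
    n = len(d0)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for p in range(n):                       # data term on observed pixels
        if observed[p]:
            A[p, p] += lam_d
            b[p] += lam_d * d0[p]
    for p in range(n - 1):                   # smoothness term on neighbours
        A[p, p] += lam_s;     A[p + 1, p + 1] += lam_s
        A[p, p + 1] -= lam_s; A[p + 1, p] -= lam_s
    L = np.linalg.cholesky(A)                # SPD, so this succeeds
    y = np.linalg.solve(L, b)                # forward substitution
    return np.linalg.solve(L.T, y)           # back substitution

d0 = np.array([1.0, 0.0, 0.0, 2.0])          # depth holes at indices 1 and 2
observed = np.array([True, False, False, True])
d = complete_depth(d0, observed)             # holes filled by interpolation
```

With the data weight much larger than the smoothness weight, the observed endpoints stay fixed and the holes are filled by nearly linear interpolation, mirroring how E_D keeps the completed depth at the correct scale.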
Experimental analysis
Comparative experiments are carried out here to verify the feasibility and to test the performance of the network design. The FCN is fine-tuned with data rendered from the Matterport3D dataset, and the network is trained with RMSprop. The learning rate during pre-training is set to 1×10^-3 and halved every 300,000 iterations; the fine-tuning learning rate is set to 1×10^-4 and halved every 10,000 iterations. Only the color image is used as input, and the loss is computed over all rendered pixels; in the global optimization we take λ_D = 10^3, λ_N = 1, λ_S = 10^-3.
The experimental results are assessed with the following metrics:
(1) For depth prediction: the median relative error with respect to the rendered depth (Rel), the root-mean-square error (RMSE), and the percentage of pixels whose predicted depth falls within a ratio threshold δ = max(predicted/true, true/predicted), with δ thresholds of 1.05, 1.10, 1.25, 1.25^2, and 1.25^3. These are the metrics commonly used in depth prediction.
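A sketch of these depth metrics, assuming the common max-ratio convention for δ (the function name and example values are ours):

```python
import numpy as np

def depth_metrics(pred, true, thresh=1.05):
    """Rel, RMSE, and the fraction of pixels with ratio error under thresh."""
    rel = np.median(np.abs(pred - true) / true)       # median relative error
    rmse = np.sqrt(np.mean((pred - true) ** 2))       # root-mean-square error
    ratio = np.maximum(pred / true, true / pred)      # symmetric ratio delta
    pct = np.mean(ratio < thresh)                     # fraction under threshold
    return rel, rmse, pct

pred = np.array([1.0, 2.0, 4.0])   # illustrative predicted depths
true = np.array([1.0, 2.0, 5.0])   # illustrative rendered (true) depths
rel, rmse, pct = depth_metrics(pred, true)
```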
(2) For surface-normal prediction: the mean error (Mean) and the median error (Median), in degrees, together with the percentage of pixels whose predicted normal deviates from the true normal by less than thresholds of 11.25, 22.5, and 30 degrees.
The first experiment tests which kind of input best suits the prediction network. We control the input to three cases: color only, raw depth only, and both color and raw depth. Table 1 shows intuitively that color-only input and combined input give similar prediction results; however, when the predicted normals are assessed, the median error with color-only input is lower than the median error with combined input, and its median error (Rel) relative to the rendered depth is also slightly lower than with combined input. This demonstrates that the prediction network performs best when only the color image is used as input.
The second experiment tests which kinds of information, combined in the global optimization, give the best depth completion. The experiments predict, through the network, the absolute depth (D), the surface normals (N), and the depth derivatives in 8 directions (DD), and then complete the depth through the global-optimization equations using different combinations. We set up the following configurations to test performance: with and without the occlusion boundary B, in each case testing the three combinations DD, N+DD, and N.
The results show that solving for depth using the predicted normals N together with the occlusion boundary B performs best: Rel is 0.088 and RMSE, at 0.117, is the lowest among D, DD, and N+DD. The experimental results also demonstrate the contribution of the occlusion boundary to the prediction: comparing the three rows with B = NO against the corresponding three rows with B = YES in Table 2 shows that the results with occlusion boundaries outperform those without, indicating that adding the predicted occlusion boundary reduces the influence of the predicted surface normals near boundaries on the global optimization and helps the optimizer obtain more accurate depth information. Overall, solving depth with surface normals and occlusion boundaries together works best. With an input size of 320*240, the average processing time on an NVIDIA 1080Ti GPU is 0.009 seconds, meeting the requirement for real-time processing.
The working principle of this embodiment: in this complete-object real-time three-dimensional reconstruction method based on Kinect2.0, depth images and color image data are first obtained through the Kinect2.0. The acquired depth images are filtered with a current mainstream filter as noise-removal pre-processing; the color image is then aligned with the denoised depth map, and under the Torch framework both are fed into the designed depth-completion network. The missing parts of the depth map are predicted from the information in the color image and globally optimized together with the original depth map, yielding the completed depth map. Next, the intrinsic matrix relating the three coordinate systems (image coordinate system, camera coordinate system, and world coordinate system) is produced from the camera, and under the action of the intrinsic matrix the depth map is converted into point cloud data. The normal vector of each point in the cloud is computed from its coordinate information, in preparation for point matching in the subsequent registration. The predicted point cloud obtained from the global data cube is registered against the point cloud of the current frame to obtain the registration transformation matrix of the current cloud, and according to this matrix the point cloud data of the current frame are fused with the global data cube. A ray-casting algorithm then computes new prediction data from the global data cube for registering the next frame's point cloud, while the point cloud surface under the current viewpoint is rendered so that the reconstruction can be observed in real time. After all acquired frames of point cloud data have been fused, the global point cloud data are extracted from the global data cube and the object surface is reconstructed with a surface-reconstruction algorithm, forming a complete three-dimensional model.
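The depth-to-point-cloud conversion step mentioned above (via the camera intrinsic matrix) can be sketched as follows, assuming a simple pinhole model; the intrinsics fx, fy, cx, cy below are illustrative values, not actual Kinect v2 calibration:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map to camera-frame 3-D points using the
    pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (us - cx) * z / fx          # image coordinates -> camera coordinates
    y = (vs - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]       # drop pixels with no depth measurement

depth = np.array([[0.0, 1.0],
                  [2.0, 2.0]])     # toy 2x2 depth map; (0,0) is a hole
pts = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

The resulting per-point coordinates are what the subsequent normal computation and ICP registration operate on.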
Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or substitute equivalents for some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (6)
1. A complete-object real-time three-dimensional reconstruction method based on Kinect2.0, characterized by comprising: data acquisition → depth completion → point cloud processing → ICP point cloud registration → point cloud fusion → surface reconstruction.
2. The complete-object real-time three-dimensional reconstruction method based on Kinect2.0 according to claim 1, characterized in that the method proceeds as follows:
(1) Data acquisition: depth images and color image data are obtained through the Kinect2.0; the acquired depth images are denoised as pre-processing, the color image is aligned with the denoised depth map, and the result is fed into the depth-completion network.
(2) Depth completion: the color image and depth map are fed into the designed depth-completion network under the Torch framework; the missing parts of the depth map are predicted from the information in the color image and globally optimized together with the original depth map; after the completed depth map is obtained, it is converted into point cloud data via the camera intrinsic matrix.
(3) Point cloud processing: the normal vector of each point in the cloud is computed from its coordinate information, in preparation for point matching in the subsequent registration.
(4) ICP point cloud registration: the predicted point cloud obtained from the global data cube is registered against the point cloud of the current frame, obtaining the registration transformation matrix of the current cloud.
(5) Point cloud fusion: according to the previously obtained registration transformation matrix, the point cloud data of the current frame are fused with the global data cube; a ray-casting algorithm then computes new prediction data from the global data cube for the registration of the next frame's point cloud, while the point cloud surface under the current viewpoint is rendered so that the reconstruction can be observed in real time.
(6) Surface reconstruction: after all acquired frames of point cloud data have been fused, the global point cloud data are extracted from the global data cube and the object surface is reconstructed with a surface-reconstruction algorithm, forming a complete three-dimensional model.
3. The complete-object real-time three-dimensional reconstruction method based on Kinect2.0 according to claim 2, characterized in that the point cloud processing uses the image coordinate system, the camera coordinate system, and the world coordinate system.
4. The complete-object real-time three-dimensional reconstruction method based on Kinect2.0 according to claim 2, characterized in that the fusion of the point cloud data with the global data cube in said point cloud fusion is specifically: at initialization, all voxels in the global cube have D = 1 and W = 0; the initial position of the camera is set to (1.5, 1.5, -0.3) so that the camera obtains a good field of view, and the center coordinate of the cube is (1.5, 1.5, 1.5). After the i-th frame of point cloud data is obtained, fusing it into the cube requires the following steps:
(1) First obtain the coordinate V_g(x, y, z) of the voxel in the global coordinate system, then obtain the transformation matrix from the ICP registration and transform from the global coordinate system into the camera coordinate system, obtaining V(x, y, z);
(2) Convert the camera coordinate V(x, y, z) obtained in step (1) to the image coordinate system according to the camera intrinsic matrix, obtaining the corresponding image coordinate (u, v);
(3) If the depth value D(u, v) at (u, v) of the i-th frame depth image is not 0, compare D(u, v) with the z component of the voxel camera coordinate V(x, y, z): if D(u, v) > z, the voxel is closer to the camera and lies outside the reconstructed surface; if D(u, v) < z, the voxel is farther from the camera and lies inside the reconstructed surface;
(4) Finally, update the distance D and the weight W in the voxel according to the result of (3).
The update formulas used are as follows:
W_i(x, y, z) = min(max_weight, W_{i-1}(x, y, z) + 1)
sdf_i = V.z - D_i(u, v)
where W_i(x, y, z) is the weight of the voxel in the current frame's global data cube; W_{i-1}(x, y, z) is the weight of the voxel in the previous frame's global data cube; max_weight is the maximum weight, here set to 1; D_i(x, y, z) is the distance from the voxel in the current frame's global data cube to the object surface; D_{i-1}(x, y, z) is the distance from the voxel in the previous frame's global data cube to the object surface; d_i(x, y, z) is the distance from the voxel to the object surface computed from the depth data of the current frame; V.z denotes the z coordinate of the voxel in the camera coordinate system; D_i(u, v) denotes the depth value at (u, v) of the current frame's depth image; and max_truncation and min_truncation are the truncation range.
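Assuming the standard truncated-signed-distance (TSDF) running average that the variables above describe, the per-voxel update can be sketched as follows. Only max_weight = 1 and sdf_i = V.z - D_i(u, v) are taken from the text; the truncation bounds and the normalization are illustrative:

```python
def update_voxel(D_prev, W_prev, sdf_i, max_weight=1.0,
                 min_trunc=-0.03, max_trunc=0.03):
    """Fuse one frame's signed distance sdf_i into a voxel's running
    TSDF value D and weight W (a standard TSDF weighted average)."""
    # Truncate the raw signed distance to [min_trunc, max_trunc],
    # then normalise it to [-1, 1].
    d_i = max(min(sdf_i, max_trunc), min_trunc) / max_trunc
    W_i = min(max_weight, W_prev + 1.0)              # weight update
    D_i = (W_prev * D_prev + d_i) / (W_prev + 1.0)   # weighted running average
    return D_i, W_i

# An untouched voxel (D = 1, W = 0) sees a point 0.015 in front of it:
D1, W1 = update_voxel(1.0, 0.0, 0.015)
# A distance beyond the truncation range is clamped to the bound:
D2, W2 = update_voxel(1.0, 0.0, 0.2)
```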
5. The complete-object real-time three-dimensional reconstruction method based on Kinect2.0 according to claim 2, characterized in that the surface reconstruction is divided into:
(1) Segmenting the global point cloud data to obtain voxel grid data;
(2) Extracting the isosurface within each voxel cell using linear interpolation;
(3) Triangulating the isosurface to generate a surface mesh, completing the construction from point cloud to three-dimensional model.
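Step (2) above places an isosurface vertex by linear interpolation along each voxel edge whose endpoint TSDF values change sign; a minimal sketch of that vertex placement (names are ours):

```python
def interp_zero_crossing(p1, p2, d1, d2):
    """Given two voxel corners p1, p2 (3-D coordinates) whose TSDF values
    d1, d2 have opposite signs, return the point on the edge where the
    distance field crosses zero, i.e. where the surface passes."""
    t = d1 / (d1 - d2)      # fraction of the edge at which D = 0
    return tuple(a + t * (b - a) for a, b in zip(p1, p2))

# Equal and opposite distances place the vertex at the edge midpoint:
v = interp_zero_crossing((0, 0, 0), (1, 0, 0), 0.5, -0.5)
```

Collecting such vertices per voxel cell and triangulating them (step 3) yields the surface mesh.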
6. The complete-object real-time three-dimensional reconstruction method based on Kinect2.0 according to claim 1, characterized in that the design principle and process of the depth-completion network comprise the building of the database, the selection of which features to train on, the network structure, and the global optimization procedure that finally completes the depth information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910257175.3A CN110047144A (en) | 2019-04-01 | 2019-04-01 | A kind of complete object real-time three-dimensional method for reconstructing based on Kinectv2 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110047144A true CN110047144A (en) | 2019-07-23 |
Family
ID=67275776
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910257175.3A Pending CN110047144A (en) | 2019-04-01 | 2019-04-01 | A kind of complete object real-time three-dimensional method for reconstructing based on Kinectv2 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110047144A (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110827202A (en) * | 2019-11-07 | 2020-02-21 | 上海眼控科技股份有限公司 | Target detection method, target detection device, computer equipment and storage medium |
CN110827408A (en) * | 2019-10-31 | 2020-02-21 | 上海师范大学 | Real-time three-dimensional reconstruction method based on depth sensor |
CN110866969A (en) * | 2019-10-18 | 2020-03-06 | 西北工业大学 | Engine blade reconstruction method based on neural network and point cloud registration |
CN111009002A (en) * | 2019-10-16 | 2020-04-14 | 贝壳技术有限公司 | Point cloud registration detection method and device, electronic equipment and storage medium |
CN111133477A (en) * | 2019-12-20 | 2020-05-08 | 驭势科技(南京)有限公司 | Three-dimensional reconstruction method, device, system and storage medium |
CN111145240A (en) * | 2019-11-18 | 2020-05-12 | 西宁市动物疫病预防控制中心(挂西宁市畜牧兽医站牌子) | Living body Simmental cattle body ruler online measurement method based on 3D camera |
CN111260775A (en) * | 2020-01-23 | 2020-06-09 | 清华大学 | Three-dimensional reconstruction method and device based on multi-scale perception of shielding information |
CN111311722A (en) * | 2020-01-23 | 2020-06-19 | 北京市商汤科技开发有限公司 | Information processing method and device, electronic equipment and storage medium |
CN111583391A (en) * | 2020-04-29 | 2020-08-25 | 北京深测科技有限公司 | Object three-dimensional reconstruction method and system |
CN111783820A (en) * | 2020-05-08 | 2020-10-16 | 北京沃东天骏信息技术有限公司 | Image annotation method and device |
CN111968165A (en) * | 2020-08-19 | 2020-11-20 | 北京拙河科技有限公司 | Dynamic human body three-dimensional model completion method, device, equipment and medium |
CN112258651A (en) * | 2020-10-20 | 2021-01-22 | 河北农业大学 | Three-dimensional reconstruction method for milk cow body surface point cloud under shielding condition |
WO2021036135A1 (en) * | 2019-08-30 | 2021-03-04 | 上海商汤临港智能科技有限公司 | Depth image completion method and device, and computer-readable storage medium |
CN112749594A (en) * | 2019-10-31 | 2021-05-04 | 浙江商汤科技开发有限公司 | Information completion method, lane line identification method, intelligent driving method and related products |
CN113012063A (en) * | 2021-03-05 | 2021-06-22 | 北京未感科技有限公司 | Dynamic point cloud repairing method and device and computer equipment |
CN113192201A (en) * | 2021-05-08 | 2021-07-30 | 上海皓桦科技股份有限公司 | Data fitting method, device and medium for point cloud data |
CN113269152A (en) * | 2021-06-25 | 2021-08-17 | 北京邮电大学 | Non-equidistant discrete depth completion method |
CN113362445A (en) * | 2021-05-25 | 2021-09-07 | 上海奥视达智能科技有限公司 | Method and device for reconstructing object based on point cloud data |
CN113362447A (en) * | 2021-05-25 | 2021-09-07 | 天津大学 | Surface normal reconstruction fusion system and reconstruction fusion method |
CN113436242A (en) * | 2021-07-22 | 2021-09-24 | 西安电子科技大学 | Method for acquiring high-precision depth value of static object based on mobile depth camera |
CN114659463A (en) * | 2022-03-14 | 2022-06-24 | 华南农业大学 | Plant phenotype acquisition device and acquisition method thereof |
CN114693862A (en) * | 2020-12-29 | 2022-07-01 | 北京万集科技股份有限公司 | Three-dimensional point cloud data model reconstruction method, target re-identification method and device |
CN115409880A (en) * | 2022-08-31 | 2022-11-29 | 深圳前海瑞集科技有限公司 | Workpiece data registration method and device, electronic equipment and storage medium |
CN115761137A (en) * | 2022-11-24 | 2023-03-07 | 之江实验室 | High-precision curved surface reconstruction method and device based on mutual fusion of normal vector and point cloud data |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106683182A (en) * | 2017-01-12 | 2017-05-17 | 南京大学 | 3D reconstruction method for weighing stereo matching and visual appearance |
CN106803267A (en) * | 2017-01-10 | 2017-06-06 | 西安电子科技大学 | Indoor scene three-dimensional rebuilding method based on Kinect |
KR20170090798A (en) * | 2016-01-29 | 2017-08-08 | 동서대학교산학협력단 | System for monitoring posture correction based on Internet of things using kinect sensor, and method thereof |
CN107292925A (en) * | 2017-06-06 | 2017-10-24 | 哈尔滨工业大学深圳研究生院 | Based on Kinect depth camera measuring methods |
CN107833270A (en) * | 2017-09-28 | 2018-03-23 | 浙江大学 | Real-time object dimensional method for reconstructing based on depth camera |
Non-Patent Citations (4)
Title |
---|
YINDA ZHANG ET AL.: "Deep Depth Completion of a Single RGB-D Image", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition * |
DAN XIFANG: "Real-Time 3D Reconstruction of Indoor Scenes Based on Kinect", China Master's Theses Full-Text Database, Information Science and Technology (monthly) * |
ZHU HUMING ET AL.: "A Survey of Parallelization of Deep Neural Networks", Chinese Journal of Computers * |
LI SHIRUI: "A Real-Time Accurate 3D Reconstruction System Based on Kinect v2", Journal of Software * |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021036135A1 (en) * | 2019-08-30 | 2021-03-04 | 上海商汤临港智能科技有限公司 | Depth image completion method and device, and computer-readable storage medium |
CN111009002A (en) * | 2019-10-16 | 2020-04-14 | 贝壳技术有限公司 | Point cloud registration detection method and device, electronic equipment and storage medium |
CN110866969A (en) * | 2019-10-18 | 2020-03-06 | 西北工业大学 | Engine blade reconstruction method based on neural network and point cloud registration |
CN110866969B (en) * | 2019-10-18 | 2022-06-14 | 西北工业大学 | Engine blade reconstruction method based on neural network and point cloud registration |
CN112749594B (en) * | 2019-10-31 | 2022-04-22 | 浙江商汤科技开发有限公司 | Information completion method, lane line identification method, intelligent driving method and related products |
CN110827408A (en) * | 2019-10-31 | 2020-02-21 | 上海师范大学 | Real-time three-dimensional reconstruction method based on depth sensor |
CN112749594A (en) * | 2019-10-31 | 2021-05-04 | 浙江商汤科技开发有限公司 | Information completion method, lane line identification method, intelligent driving method and related products |
CN110827408B (en) * | 2019-10-31 | 2023-03-28 | 上海师范大学 | Real-time three-dimensional reconstruction method based on depth sensor |
CN110827202A (en) * | 2019-11-07 | 2020-02-21 | 上海眼控科技股份有限公司 | Target detection method, target detection device, computer equipment and storage medium |
CN111145240A (en) * | 2019-11-18 | 2020-05-12 | 西宁市动物疫病预防控制中心(挂西宁市畜牧兽医站牌子) | Living body Simmental cattle body ruler online measurement method based on 3D camera |
CN111133477A (en) * | 2019-12-20 | 2020-05-08 | 驭势科技(南京)有限公司 | Three-dimensional reconstruction method, device, system and storage medium |
CN111311722A (en) * | 2020-01-23 | 2020-06-19 | 北京市商汤科技开发有限公司 | Information processing method and device, electronic equipment and storage medium |
CN111311722B (en) * | 2020-01-23 | 2023-03-21 | 北京市商汤科技开发有限公司 | Information processing method and device, electronic equipment and storage medium |
CN111260775A (en) * | 2020-01-23 | 2020-06-09 | 清华大学 | Three-dimensional reconstruction method and device based on multi-scale perception of shielding information |
CN111583391A (en) * | 2020-04-29 | 2020-08-25 | 北京深测科技有限公司 | Object three-dimensional reconstruction method and system |
CN111783820A (en) * | 2020-05-08 | 2020-10-16 | 北京沃东天骏信息技术有限公司 | Image annotation method and device |
CN111783820B (en) * | 2020-05-08 | 2024-04-16 | 北京沃东天骏信息技术有限公司 | Image labeling method and device |
CN111968165A (en) * | 2020-08-19 | 2020-11-20 | 北京拙河科技有限公司 | Dynamic human body three-dimensional model completion method, device, equipment and medium |
CN111968165B (en) * | 2020-08-19 | 2024-01-23 | 北京拙河科技有限公司 | Dynamic human body three-dimensional model complement method, device, equipment and medium |
CN112258651B (en) * | 2020-10-20 | 2024-03-22 | 河北农业大学 | Three-dimensional reconstruction method for dairy cow body surface point cloud under shielding condition |
CN112258651A (en) * | 2020-10-20 | 2021-01-22 | 河北农业大学 | Three-dimensional reconstruction method for milk cow body surface point cloud under shielding condition |
CN114693862A (en) * | 2020-12-29 | 2022-07-01 | 北京万集科技股份有限公司 | Three-dimensional point cloud data model reconstruction method, target re-identification method and device |
CN113012063B (en) * | 2021-03-05 | 2024-02-27 | 北京未感科技有限公司 | Dynamic point cloud repairing method and device and computer equipment |
CN113012063A (en) * | 2021-03-05 | 2021-06-22 | 北京未感科技有限公司 | Dynamic point cloud repairing method and device and computer equipment |
CN113192201A (en) * | 2021-05-08 | 2021-07-30 | 上海皓桦科技股份有限公司 | Data fitting method, device and medium for point cloud data |
CN113192201B (en) * | 2021-05-08 | 2023-08-01 | 上海皓桦科技股份有限公司 | Data fitting method, device and medium of point cloud data |
CN113362447A (en) * | 2021-05-25 | 2021-09-07 | 天津大学 | Surface normal reconstruction fusion system and reconstruction fusion method |
CN113362445B (en) * | 2021-05-25 | 2023-05-05 | 上海奥视达智能科技有限公司 | Method and device for reconstructing object based on point cloud data |
CN113362445A (en) * | 2021-05-25 | 2021-09-07 | 上海奥视达智能科技有限公司 | Method and device for reconstructing object based on point cloud data |
CN113269152B (en) * | 2021-06-25 | 2022-07-01 | 北京邮电大学 | Non-equidistant discrete depth completion method |
CN113269152A (en) * | 2021-06-25 | 2021-08-17 | 北京邮电大学 | Non-equidistant discrete depth completion method |
CN113436242B (en) * | 2021-07-22 | 2024-03-29 | 西安电子科技大学 | Method for obtaining high-precision depth value of static object based on mobile depth camera |
CN113436242A (en) * | 2021-07-22 | 2021-09-24 | 西安电子科技大学 | Method for acquiring high-precision depth value of static object based on mobile depth camera |
CN114659463B (en) * | 2022-03-14 | 2023-11-28 | 华南农业大学 | Plant phenotype acquisition device and acquisition method thereof |
CN114659463A (en) * | 2022-03-14 | 2022-06-24 | 华南农业大学 | Plant phenotype acquisition device and acquisition method thereof |
CN115409880B (en) * | 2022-08-31 | 2024-03-22 | 深圳前海瑞集科技有限公司 | Workpiece data registration method and device, electronic equipment and storage medium |
CN115409880A (en) * | 2022-08-31 | 2022-11-29 | 深圳前海瑞集科技有限公司 | Workpiece data registration method and device, electronic equipment and storage medium |
CN115761137B (en) * | 2022-11-24 | 2023-12-22 | 之江实验室 | High-precision curved surface reconstruction method and device based on mutual fusion of normal vector and point cloud data |
CN115761137A (en) * | 2022-11-24 | 2023-03-07 | 之江实验室 | High-precision curved surface reconstruction method and device based on mutual fusion of normal vector and point cloud data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110047144A (en) | A kind of complete object real-time three-dimensional method for reconstructing based on Kinectv2 | |
Mildenhall et al. | Nerf: Representing scenes as neural radiance fields for view synthesis | |
CN110458939A (en) | The indoor scene modeling method generated based on visual angle | |
CN104318569B (en) | Space salient region extraction method based on depth variation model | |
CN106803267A (en) | Indoor scene three-dimensional rebuilding method based on Kinect | |
Chen et al. | I2uv-handnet: Image-to-uv prediction network for accurate and high-fidelity 3d hand mesh modeling | |
CN110443842A (en) | Depth map prediction technique based on visual angle fusion | |
CN110223370B (en) | Method for generating complete human texture map from single-view picture | |
CN110148146A (en) | A kind of plant leaf blade dividing method and system using generated data | |
CN105787989B (en) | A kind of measurement material geometric properties reconstructing method based on photometric stereo vision | |
CN110148217A (en) | A kind of real-time three-dimensional method for reconstructing, device and equipment | |
CN108986195A (en) | A kind of single-lens mixed reality implementation method of combining environmental mapping and global illumination rendering | |
CN108764250B (en) | Method for extracting essential image by using convolutional neural network | |
CN115115797B (en) | Large-scene sparse light field semantic driving intelligent reconstruction method, system and device | |
Poullis et al. | Photorealistic large-scale urban city model reconstruction | |
CN114429555A (en) | Image density matching method, system, equipment and storage medium from coarse to fine | |
CN110490807A (en) | Image rebuilding method, device and storage medium | |
Khilar et al. | 3D image reconstruction: Techniques, applications and challenges | |
CN113313828A (en) | Three-dimensional reconstruction method and system based on single-picture intrinsic image decomposition | |
CN115428027A (en) | Neural opaque point cloud | |
CN117315169A (en) | Live-action three-dimensional model reconstruction method and system based on deep learning multi-view dense matching | |
Liu et al. | Creating simplified 3D models with high quality textures | |
Maxim et al. | A survey on the current state of the art on deep learning 3D reconstruction | |
Hu et al. | Point cloud enhancement optimization and high-fidelity texture reconstruction methods for air material via fusion of 3D scanning and neural rendering | |
CN115761116A (en) | Monocular camera-based three-dimensional face reconstruction method under perspective projection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20190723 |