CN106023316A - Kinect-based dynamic sequence capture method - Google Patents


Info

Publication number
CN106023316A
CN106023316A (application CN201610333612.1A)
Authority
CN
China
Prior art keywords
cloud data
kinect
dynamic sequence
depth information
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610333612.1A
Other languages
Chinese (zh)
Inventor
李桂清
林力挺
郑颖龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201610333612.1A priority Critical patent/CN106023316A/en
Publication of CN106023316A publication Critical patent/CN106023316A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Abstract

The invention discloses a Kinect-based dynamic sequence capture method. The method comprises the steps of: first acquiring multi-view depth information of a scene using several depth cameras such as the Kinect; second, denoising and completing the depth information; third, mapping the depth maps into world coordinates through a transformation matrix to obtain point cloud data; fourth, performing further denoising based on point positions once the point cloud data is obtained; fifth, registering the multi-view point cloud data with a rigid registration method; sixth, completing the registered point cloud data; and finally reconstructing a surface from the point cloud data to obtain a three-dimensional human mesh. The method is suitable for capturing human action sequences in a variety of environments, is robust and real-time, and has very good prospects for promotion and application.

Description

A Kinect-based dynamic sequence capture method
Technical field
The present invention relates to the field of computer graphics, and in particular to a Kinect-based dynamic sequence capture method.
Background technology
Three-dimensional geometric modeling is one of the important topics in computer graphics, and three-dimensional human body modeling is currently one of its most active research areas. In recent years, with the popularization of 3D scanning devices, acquiring and modeling three-dimensional data of dynamic human body sequences with convenient tools has drawn wide attention in graphics and related fields. Dynamic human body sequence modeling has important practical value and development prospects in the creative industries, the medical field, 3D printing, and other areas.
However, existing methods for dynamic human body sequence modeling still leave some problems unsolved. For example, there is as yet no good solution for fusing dynamic sequence data scanned at different distances and scales, or for modeling several motion sequences simultaneously.
To address these problems, the present invention proposes to use a contour registration technique to rapidly model the point cloud data collected from multiple views, building the template required for subsequent fitting. For large-scale motion, a rigid registration method is used to fit the data while the motion sequence is built, yielding the motion sequence; for small-scale motion (such as facial expression), model reconstruction is performed with a prior-based method. To model multi-target motion sequences, Kinect skeleton information is used to distinguish the different moving bodies. By realizing these ideas, a human motion sequence acquisition platform is built.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a Kinect-based dynamic sequence capture method that is applicable to capturing human action sequences in a variety of environments, is highly robust, runs in real time, and has good prospects for application and extension.
To achieve the above object, the technical scheme provided by the present invention is a Kinect-based dynamic sequence capture method comprising the following steps:
1) acquiring multi-view depth information;
2) completing the depth information;
3) sampling and mapping to obtain point cloud data;
4) denoising the point cloud data;
5) registering the multi-view point cloud data;
6) completing the point cloud data;
7) reconstructing a surface from the point cloud data to obtain a model mesh.
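As a hedged illustration of how the seven steps fit together, the following is a minimal pipeline skeleton; every function name, signature, and body below is a placeholder of ours standing in for the real algorithm (CNN completion, bilateral filtering, ICP, visual hull, Poisson reconstruction), not code from the patent:

```python
"""Skeleton of the seven-step capture pipeline; all names are illustrative."""

def acquire_depth(cameras):
    """Step 1: one depth frame per Kinect (time-division in practice)."""
    return [cam() for cam in cameras]

def complete_depth(depth):
    """Step 2: denoise and inpaint the raw depth map (CNN in the patent)."""
    return depth

def depth_to_points(depth, pose):
    """Step 3: back-project nonzero depth pixels into 3-D points.
    `pose` would select the per-camera mapping matrix; unused in this stub."""
    return [(x, y, depth[y][x]) for y in range(len(depth))
            for x in range(len(depth[0])) if depth[y][x] > 0]

def denoise_points(points):
    """Step 4: point-level denoising (bilateral filter in the patent)."""
    return points

def register(point_sets):
    """Step 5: rigid (ICP) registration; here simply merges the views."""
    merged = []
    for ps in point_sets:
        merged.extend(ps)
    return merged

def fill_holes(points):
    """Step 6: visual-hull completion of missing regions."""
    return points

def reconstruct_mesh(points):
    """Step 7: Poisson surface reconstruction -> triangle mesh."""
    return {"vertices": points, "faces": []}

def capture_sequence(cameras, poses):
    depths = [complete_depth(d) for d in acquire_depth(cameras)]
    clouds = [denoise_points(depth_to_points(d, p))
              for d, p in zip(depths, poses)]
    merged = fill_holes(register(clouds))
    return reconstruct_mesh(merged)
```

The skeleton only fixes the data flow between steps; each stub is replaced by the concrete algorithm described in the corresponding section below.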
In step 1), the multi-view depth information is acquired by capturing depth information with multiple Kinect depth cameras, using time-division multiplexing to avoid crosstalk between the cameras.
In step 2), the depth information is completed by using a 3-layer convolutional neural network to denoise and complete the depth information of the face region. The network consists of three convolutional layers and uses the Euclidean distance as its loss function; the input is a noisy, low-quality depth image and the output is a high-quality depth image.
In step 3), the point cloud data is obtained by sampling and mapping through a pre-set affine matrix, whose concrete form is determined by the position of each depth camera.
In step 4), the point cloud data is denoised with a bilateral filter.
In step 5), the multi-view point cloud data is registered using the ICP algorithm.
In step 6), the point cloud data is completed by filling its holes with the visual hull algorithm.
In step 7), the model mesh is obtained by reconstructing a surface from the point cloud data, triangulating the point cloud with the Poisson reconstruction algorithm.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The present invention proposes a complete multi-view depth-information reconstruction algorithm for three-dimensional human models. Compared with traditional methods, it applies special treatment to high-detail regions under large-scale motion and insufficient information, and is therefore more robust.
2. Motion sequences are captured in real time with cheap depth cameras such as the Kinect and an ordinary PC, so the whole system is easy to obtain and inexpensive.
Brief description of the drawings
Fig. 1 is the overall flow chart of the method of the invention.
Fig. 2 is a schematic diagram of the convolutional neural network proposed in the present invention for optimizing depth images with severe information loss.
Detailed description of the invention
The invention is further described below in conjunction with a specific embodiment.
In the Kinect-based dynamic sequence capture method of the present invention, the multi-view depth information of the scene is first acquired with several depth cameras such as the Kinect; the depth information is then denoised and completed; the depth maps are mapped into world coordinates through a transformation matrix to obtain point cloud data; after the point cloud data is obtained, further denoising is performed based on point positions; a rigid registration method is then used to register the multi-view point cloud data; the registered point cloud data is completed; and finally a three-dimensional human mesh is obtained by reconstructing a surface from the point cloud data. As shown in Fig. 1, the dynamic sequence capture method described in this embodiment comprises the following steps:
1) Acquisition of multi-view depth information
At the core of the Kinect skeleton tracking pipeline is a CMOS infrared sensor that lets the Kinect perceive the world regardless of ambient lighting conditions. The sensor perceives the environment as a grey-scale spectrum: black represents infinite distance and pure white represents zero distance, with the grey levels in between corresponding to the physical distance from an object to the sensor. It samples every point within its field of view to form a depth image of the surroundings, generating a depth image stream at 30 frames per second that reproduces the environment in 3D in real time. However, for reasons such as device cost, the acquired information contains many holes and much noise, and its resolution is low, so extensive denoising and completion work is needed.
2) Completion of depth information
This step performs preliminary denoising and completion of the depth information acquired by the depth cameras. Because the depth information obtained by a depth camera has low resolution, little detail, and noise, traditional denoising and up-sampling methods cannot achieve good results in regions that require higher detail (such as the face region), and priors must be introduced. After comparison, we chose a convolutional neural network to unify these two tasks (as shown in Fig. 2). The network we use consists mainly of three convolutional layers and uses the Euclidean distance as its loss function. After training, feeding a fixed-resolution depth image into this network yields an image with details filled in and noise reduced.
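To make the three-layer structure concrete, here is a minimal single-channel forward pass in NumPy together with the Euclidean (L2) loss; the 3×3 kernels, single channel, fixed weights, and absence of training code are simplifications of ours, not details given in the patent:

```python
import numpy as np

def conv2d_same(img, kernel):
    """Naive single-channel 'same' convolution (correlation form), zero-padded."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def three_layer_cnn(depth, kernels):
    """Forward pass: conv -> ReLU -> conv -> ReLU -> conv (no activation on last)."""
    x = depth
    for k, kernel in enumerate(kernels):
        x = conv2d_same(x, kernel)
        if k < len(kernels) - 1:
            x = np.maximum(x, 0.0)   # ReLU between layers
    return x

def euclidean_loss(pred, target):
    """The Euclidean distance loss named in the text."""
    return float(np.sum((pred - target) ** 2))
```

In practice the kernels would be learned by minimizing `euclidean_loss` between the network output and a high-quality reference depth image, which is what the training step in the text refers to.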
3) Sampling and mapping to obtain point cloud data
This step converts the depth images into point cloud data. A point cloud, i.e. a set of points each carrying X, Y, Z coordinates (and possibly normal vectors and color information), is a basic data structure for three-dimensional model processing.
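A common way to realize this conversion is pinhole back-projection followed by a rigid camera-to-world transform. The intrinsics `fx, fy, cx, cy` and the 4×4 `extrinsic` matrix below are illustrative assumptions of ours; the patent only states that a pre-set affine matrix determined by each camera's placement is used:

```python
import numpy as np

def depth_to_world(depth, fx, fy, cx, cy, extrinsic):
    """Back-project a depth image into world coordinates.

    depth     : (H, W) array of depth values; zeros are treated as holes.
    fx,fy,cx,cy: pinhole intrinsics (illustrative values, not from the patent).
    extrinsic : 4x4 camera-to-world matrix.
    Returns an (N, 3) array of world-space points for valid pixels.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx            # X = (u - cx) * Z / fx
    y = (v - cy) * z / fy            # Y = (v - cy) * Z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    valid = pts_cam[:, 2] > 0        # drop holes (depth == 0)
    pts_world = (extrinsic @ pts_cam[valid].T).T
    return pts_world[:, :3]
```

With several cameras, running this per view with each camera's own extrinsic matrix places all point clouds in a shared world frame, which is what step 5) then refines.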
4) Denoising of the point cloud data
The denoising and completion in step 2) are aimed mainly at the regions requiring higher detail (the face); the remaining parts need a different strategy. After comparison, we use bilateral filtering for this step.
The bilateral filter is an edge-preserving denoising filter. It achieves this effect because it is composed of two functions: one determines the filter coefficients from geometric (spatial) distance, and the other determines them from the difference in pixel values. In a bilateral filter, the value of an output pixel depends on a weighted combination of the values of the neighboring pixels; in a two-dimensional image it takes the form:
$$g(i,j) = \frac{\sum_{k,l} f(k,l)\, w(i,j,k,l)}{\sum_{k,l} w(i,j,k,l)}$$
where $f(k,l)$ is the neighborhood pixel located at $(k,l)$, $g(i,j)$ is the filtered output pixel located at $(i,j)$, and $w(i,j,k,l)$ is a weight coefficient given by the product of the domain kernel
$$d(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2}\right)$$
and the range kernel
$$r(i,j,k,l) = \exp\left(-\frac{\lVert f(i,j) - f(k,l)\rVert^2}{2\sigma_r^2}\right),$$
so that
$$w(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2} - \frac{\lVert f(i,j) - f(k,l)\rVert^2}{2\sigma_r^2}\right)$$
where $\sigma_d$ and $\sigma_r$ are preset smoothing parameters.
After bilateral filtering, the quality of the point cloud data is significantly improved.
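The weight formula above can be sketched directly in NumPy. This is a plain 2-D image version for illustration; the window radius and parameter values are choices of ours, and the patent applies the same idea to the point data:

```python
import numpy as np

def bilateral_filter(img, sigma_d, sigma_r, radius=2):
    """Bilateral filter following w(i,j,k,l) = spatial Gaussian * range Gaussian."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for i in range(h):
        for j in range(w):
            num = den = 0.0
            for k in range(max(0, i - radius), min(h, i + radius + 1)):
                for l in range(max(0, j - radius), min(w, j + radius + 1)):
                    # product of domain kernel d(...) and range kernel r(...)
                    wt = np.exp(-((i - k) ** 2 + (j - l) ** 2) / (2 * sigma_d ** 2)
                                - (img[i, j] - img[k, l]) ** 2 / (2 * sigma_r ** 2))
                    num += img[k, l] * wt
                    den += wt
            out[i, j] = num / den    # normalized weighted combination g(i,j)
    return out
```

A small `sigma_r` makes the range kernel suppress contributions from pixels across a depth discontinuity, which is exactly the edge-preserving behavior described above.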
5) Registration of the multi-view point cloud data
After the above operations have been applied to each view, a registration step finds corresponding point pairs across the multi-view point clouds, and the views are merged into the surface point cloud of the overall model. We use the classic ICP algorithm for this step.
The ICP (Iterative Closest Point) algorithm is mainly used for the registration of three-dimensional bodies. It can be understood as follows: given two three-dimensional point sets in different coordinate frames, find the spatial transformation between them so that they match. In essence, ICP is a least-squares-based optimal matching algorithm. It repeats the process "determine corresponding points, then compute the optimal rigid transformation" until some convergence criterion is met. The purpose of rigid registration with ICP is to find the rotation R and translation T between the source point set P and the target point set Q that satisfy the optimality condition.
First iteration:
[1] take each point $P_i \in P$ of the source point set;
[2] compute its corresponding point $Q_i \in Q$ in the target point set, such that $\lVert Q_i - P_i \rVert$ is minimal;
[3] compute the rotation matrix $R$ and translation vector $T$ such that
$$\arg\min_{R,T} \sum_{i=1}^{n} \lVert R P_i + T - Q_i \rVert^2;$$
[4] compute the transformed set $P^1$, with $P_i^1 = R P_i + T$ for $P_i \in P$;
[5] compute the distance $d^1$ between the transformed source points and the target points:
$$d^1 = \frac{1}{n} \sum_{i=1}^{n} \lVert P_i^1 - Q_i \rVert^2;$$
[6] if the convergence condition $d^1 < \tau$ is met, or the preset number of iterations is 1, terminate directly; otherwise, proceed to the next iteration.
Iteration k:
[1] take each point $P_i^k \in P^k$ of the current source point set;
[2] compute its corresponding point $Q_i^k \in Q$, such that $\lVert Q_i^k - P_i^k \rVert$ is minimal;
[3] compute the rotation matrix $R_k$ and translation vector $T_k$ such that
$$\arg\min_{R_k,T_k} \sum_{i=1}^{n} \lVert R_k P_i^k + T_k - Q_i^k \rVert^2;$$
[4] compute the transformed set $P^{k+1}$, with $P_i^{k+1} = R_k P_i^k + T_k$ for $P_i^k \in P^k$;
[5] compute the distance $d^{k+1}$ between the transformed source points and the target points:
$$d^{k+1} = \frac{1}{n} \sum_{i=1}^{n} \lVert P_i^{k+1} - Q_i^k \rVert^2;$$
[6] terminate if any of the following conditions is met:
· $d^{k+1} < \tau$;
· $d^k - d^{k+1} < \tau$;
· the iteration count $k$ exceeds the preset maximum number of iterations.
Otherwise, repeat for iteration $k+1$.
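The iteration above can be sketched in NumPy with the standard SVD (Kabsch/Procrustes) solution for step [3] and brute-force nearest neighbours for step [2]. This is a generic textbook ICP with a fixed iteration count, not the patent's exact implementation (which also applies the convergence tests of step [6]):

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Least-squares R, T aligning P onto Q under known correspondences:
    the SVD solution of the argmin in step [3]."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                     # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # reflection guard: force det(R) = +1
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    T = cq - R @ cp
    return R, T

def icp(P, Q, iters=20):
    """Basic ICP: nearest-neighbour matching + Procrustes step, repeated."""
    X = P.copy()
    for _ in range(iters):
        # [2] nearest point in Q for every point of X
        d2 = ((X[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
        matches = Q[d2.argmin(axis=1)]
        # [3]-[4] optimal rigid motion, then apply it
        R, T = best_rigid_transform(X, matches)
        X = X @ R.T + T
    return X
```

The brute-force distance matrix is O(n²); real systems replace it with a k-d tree, but the per-iteration structure is the same as in steps [1]-[5] above.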
6) Completion of the point cloud data
Because of large-scale motion or the sensing devices themselves, some point cloud data may be missing, mainly manifesting as holes in the model surface. For these holes we use the visual hull algorithm for completion. The visual hull algorithm is popular because it robustly obtains a compact shape, and it is used widely in template-free registration. Its principle is to estimate from multiple views the convex hull region in which the object lies, and to take this convex hull as an approximate representation of the object surface. This step is carried out during registration.
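A toy version of the visual-hull principle can be sketched as voxel carving with per-view silhouette masks. The orthographic projection, grid size, and bounding box below are simplifications of ours; the patent projects through the actual camera models:

```python
import numpy as np

def visual_hull(silhouettes, grid_n=32, half=1.0):
    """Orthographic visual-hull carving on a voxel grid.

    `silhouettes` maps an axis index (the viewing direction: 0=x, 1=y, 2=z)
    to a 2-D boolean mask over the remaining two axes, sampled on the same
    [-half, half] range as the grid. A voxel survives only if every view
    sees it inside the silhouette.
    """
    axes = np.linspace(-half, half, grid_n)
    X, Y, Z = np.meshgrid(axes, axes, axes, indexing="ij")
    coords = [X, Y, Z]
    keep = np.ones((grid_n,) * 3, dtype=bool)
    for axis, mask in silhouettes.items():
        uv = [c for i, c in enumerate(coords) if i != axis]
        # map the two in-plane world coordinates to mask pixel indices
        iu = np.clip(((uv[0] + half) / (2 * half)
                      * (mask.shape[0] - 1)).round().astype(int),
                     0, mask.shape[0] - 1)
        iv = np.clip(((uv[1] + half) / (2 * half)
                      * (mask.shape[1] - 1)).round().astype(int),
                     0, mask.shape[1] - 1)
        keep &= mask[iu, iv]          # carve away voxels outside this view
    return keep
```

Each silhouette removes all voxels outside its cone (here, prism) of sight, so the surviving set is the intersection over all views — the convex-hull-like region the text describes.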
7) Obtaining the model mesh by surface reconstruction from the point cloud data
This step performs surface reconstruction on the point cloud data to obtain the mesh data of the model. We use the Poisson reconstruction algorithm, an implicit reconstruction method that produces smooth three-dimensional meshes and copes well with noisy and incomplete data.
After this step, the final human body model is obtained. On the basis of guaranteed quality, the method of the present invention uses efficient algorithms throughout, ensuring real-time operation, and is worth popularizing.
The embodiment described above is only a preferred embodiment of the invention and does not limit its practical scope; any change made according to the shape and principle of the present invention shall fall within the scope of the present invention.

Claims (8)

1. A Kinect-based dynamic sequence capture method, characterized in that it comprises the following steps:
1) acquiring multi-view depth information;
2) completing the depth information;
3) sampling and mapping to obtain point cloud data;
4) denoising the point cloud data;
5) registering the multi-view point cloud data;
6) completing the point cloud data;
7) reconstructing a surface from the point cloud data to obtain a model mesh.
2. The Kinect-based dynamic sequence capture method according to claim 1, characterized in that: in step 1), the multi-view depth information is acquired by capturing depth information with multiple Kinect depth cameras, using time-division multiplexing to avoid crosstalk between the cameras.
3. The Kinect-based dynamic sequence capture method according to claim 1, characterized in that: in step 2), the depth information is completed by using a 3-layer convolutional neural network to denoise and complete the depth information of the face region; the network consists of three convolutional layers and uses the Euclidean distance as its loss function; the input is a noisy, low-quality depth image and the output is a high-quality depth image.
4. The Kinect-based dynamic sequence capture method according to claim 1, characterized in that: in step 3), the point cloud data is obtained by sampling and mapping through a pre-set affine matrix, whose concrete form is determined by the position of each depth camera.
5. The Kinect-based dynamic sequence capture method according to claim 1, characterized in that: in step 4), the point cloud data is denoised with a bilateral filter.
6. The Kinect-based dynamic sequence capture method according to claim 1, characterized in that: in step 5), the multi-view point cloud data is registered using the ICP algorithm.
7. The Kinect-based dynamic sequence capture method according to claim 1, characterized in that: in step 6), the point cloud data is completed by filling its holes with the visual hull algorithm.
8. The Kinect-based dynamic sequence capture method according to claim 1, characterized in that: in step 7), the model mesh is obtained by surface reconstruction from the point cloud data, triangulating the point cloud with the Poisson reconstruction algorithm.
CN201610333612.1A 2016-05-19 2016-05-19 Kinect-based dynamic sequence capture method Pending CN106023316A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610333612.1A CN106023316A (en) 2016-05-19 2016-05-19 Kinect-based dynamic sequence capture method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610333612.1A CN106023316A (en) 2016-05-19 2016-05-19 Kinect-based dynamic sequence capture method

Publications (1)

Publication Number Publication Date
CN106023316A (en) 2016-10-12

Family

ID=57098055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610333612.1A Pending CN106023316A (en) 2016-05-19 2016-05-19 Kinect-based dynamic sequence capture method

Country Status (1)

Country Link
CN (1) CN106023316A (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455984A (en) * 2013-09-02 2013-12-18 清华大学深圳研究生院 Method and device for acquiring Kinect depth image


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
PENG SONG ET AL.: "Volumetric stereo and silhouette fusion for image-based modeling", The Visual Computer *
XIN ZHANG AND RUIYUAN WU: "Fast Depth Image Denoising and Enhancement Using a Deep Convolutional Network", 2016 IEEE International Conference on Acoustics, Speech and Signal Processing *
叶日藏: "Research on the application of 3D reconstruction technology based on the Kinect depth sensor", China Master's Theses Full-text Database, Information Science and Technology *
鲁栋栋 et al.: "A point cloud registration method for joint scanning with two Kinects", Journal of Hangzhou Dianzi University *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107845073A (en) * 2017-10-19 2018-03-27 华中科技大学 A kind of local auto-adaptive three-dimensional point cloud denoising method based on depth map
CN107845073B (en) * 2017-10-19 2020-02-14 华中科技大学 Local self-adaptive three-dimensional point cloud denoising method based on depth map
CN109753151A (en) * 2018-12-19 2019-05-14 武汉西山艺创文化有限公司 Motion capture method and system based on KINCET and facial camera
CN109753151B (en) * 2018-12-19 2022-05-24 武汉西山艺创文化有限公司 Motion capture method and system based on KINCET and facial camera
CN110197464A (en) * 2019-05-24 2019-09-03 清华大学 Depth camera depth map real-time de-noising method and apparatus
CN111951307A (en) * 2020-07-23 2020-11-17 西北大学 Three-dimensional point cloud affine registration method and system based on pseudo Huber loss function
CN111951307B (en) * 2020-07-23 2023-09-19 西北大学 Three-dimensional point cloud affine registration method and system based on pseudo Huber loss function
CN112263052A (en) * 2020-11-13 2021-01-26 宁波三体智能科技有限公司 Method and system for automatically mapping vamp glue spraying path based on visual data
CN112381952A (en) * 2020-11-25 2021-02-19 华南理工大学 Face contour point cloud model reconstruction method and device based on multiple cameras
CN112381952B (en) * 2020-11-25 2024-03-15 华南理工大学 Face contour point cloud model reconstruction method and device based on multiple cameras

Similar Documents

Publication Publication Date Title
CN106023316A (en) Kinect-based dynamic sequence capture method
CN111462329B (en) Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning
CN113706714B (en) New-view synthesis method based on depth images and neural radiance fields
CN103106688B (en) Indoor three-dimensional scene reconstruction method based on a two-layer registration method
CN104318569B (en) Spatial salient region extraction method based on a depth variation model
CN104063702B (en) Three-dimensional gait recognition method based on occlusion recovery and partial similarity matching
CN106097348A (en) Fusion method for three-dimensional laser point clouds and two-dimensional images
CN105787439A (en) Depth-image human joint localization method based on a convolutional neural network
CN108389226A (en) Unsupervised depth prediction method based on convolutional neural networks and binocular disparity
CN104574432B (en) Three-dimensional face reconstruction method and system for automatically captured multi-view face images
CN106780592A (en) Kinect depth reconstruction algorithm based on camera motion and image shading
CN105931264B (en) Infrared small-target detection method at sea
CN105303616B (en) Relief modeling method based on a single photograph
CN107657217A (en) Fusion method for infrared and visible-light video based on moving object detection
CN101916454A (en) Method for reconstructing a high-resolution human face based on mesh deformation and continuous optimization
CN102436671B (en) Virtual viewpoint rendering method based on a non-linear transformation of depth values
CN102938142A (en) Method for filling missing indoor light detection and ranging (LiDAR) data based on Kinect
CN104794737B (en) Depth-information-assisted particle filter tracking method
CN107730519A (en) Method and system for three-dimensional face reconstruction from a two-dimensional face image
CN106485675A (en) Scene flow estimation method based on 3D local rigidity and depth-map-guided anisotropic smoothing
CN107944459A (en) RGB-D object recognition method
CN105096311A (en) GPU-based technology for depth image restoration and fusion of virtual and real scenes
CN1326096C (en) Model-based parameter estimation method for three-dimensional human climbing motion
CN103903256B (en) Depth estimation method based on the relative-height depth cue
CN104463859A (en) Real-time video stitching method based on specified tracking points

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161012

WD01 Invention patent application deemed withdrawn after publication