CN103106688B - Indoor three-dimensional scene reconstruction method based on a two-layer registration method - Google Patents
Legal status: Expired - Fee Related
Abstract
The invention belongs to the intersection of computer vision and intelligent robotics and relates to a method for reconstructing large-scale indoor scenes based on a two-layer registration method. It addresses the problems of existing indoor scene reconstruction methods: expensive equipment, high computational complexity, and poor real-time performance. The method comprises: Kinect calibration; SURF feature point extraction and matching; mapping of feature point pairs to three-dimensional point pairs; two-layer registration of three-dimensional points based on RANSAC and ICP; and scene updating. The invention uses a Kinect to acquire environmental data and proposes a two-layer registration method based on RANSAC and ICP, achieving fast and economical indoor three-dimensional scene reconstruction and effectively improving the real-time performance and reconstruction accuracy of the algorithm. The method is applicable to the service-robot field and to other computer vision fields related to three-dimensional scene reconstruction.
Description
Technical field
The invention belongs to the intersection of computer vision and intelligent robotics and relates to three-dimensional reconstruction of indoor environments, in particular to a method for reconstructing large-scale indoor scenes based on a two-layer registration method.
Background technology
In recent years, with the development of information technology, demand for three-dimensional scene reconstruction has grown steadily, and fast, economical indoor three-dimensional scene reconstruction has become a key technical problem in many fields. In the home-service-robot field, population aging has created a rapidly growing market demand for intelligent home-service robots. At present, most service robots on the market can only provide single, simple services in specific scenarios because they cannot perceive the three-dimensional environment, a problem that seriously constrains the development of the home-service-robot industry.
Three-dimensional scene reconstruction is a research hotspot in computer vision, intelligent robotics, virtual reality, and related fields. Traditional reconstruction methods fall into two classes according to how they acquire three-dimensional data: methods based on laser scanning and methods based on vision. For large-scale indoor scene reconstruction, existing methods still have considerable limitations.
Laser-based methods acquire depth data or range images of the scene with a laser scanner and align frame data with global data through registration of the depth data. This yields only the geometric information of the scene; texture must be obtained with an additional camera and mapped onto the reconstructed geometric model, which requires solving a photo-to-geometry mapping problem. Although laser-based methods can obtain geometric models of relatively high accuracy, texture mapping is difficult, so generating realistic three-dimensional models is hard; moreover, laser equipment is expensive and is generally applied in fields such as digital heritage, topographic survey, and digital museums, making it difficult to popularize in large-scale civil applications.
Vision-based methods reconstruct three-dimensional object models with computer vision techniques: a digital camera serves as the image sensor, and image processing and vision computation are combined to perform non-contact three-dimensional measurement, recovering the object's three-dimensional information with computer programs. Their advantages are that they are not limited by object shape, reconstruct quickly, and can be fully or semi-automatic, making them an important development direction for three-dimensional reconstruction. According to the number of cameras used, they can be divided into monocular, binocular, trinocular, and multi-view methods. Monocular methods use a single camera and derive depth from two-dimensional image cues such as shading, texture, focus, and contours. Their advantage is simple equipment: a model can be reconstructed from one or a few images. However, they usually require idealized conditions that practical situations rarely meet, and reconstruction quality is mediocre. Binocular methods, also called stereo vision, convert binocular disparity into depth. They are mature and can stably produce good reconstructions; unfortunately the computational load remains high, and the success rate drops as the baseline grows. Multi-view methods add cameras to provide extra constraints and thereby avoid the problems of binocular vision. They reconstruct better than binocular methods, but the equipment is more complex, costlier, and harder to control.
In recent years, the development of RGB-D (color plus depth) sensors, such as Microsoft's Kinect, has provided a new option for three-dimensional scene reconstruction. Research on Kinect-based reconstruction has achieved some results for single objects, but research on indoor scene reconstruction is still at an early stage. Richard A. Newcombe et al. used a Kinect to acquire environmental information and the ICP method to realize its three-dimensional reconstruction. Because their method runs on GPU hardware, it places high demands on the GPU configuration and, limited by GPU memory, can only reconstruct a volume of 3 m × 3 m × 3 m, which cannot meet the needs of large-scale indoor scene creation.
Summary of the invention
To overcome the problems of the above three-dimensional reconstruction methods, the invention provides a fast, economical indoor three-dimensional scene reconstruction method based on a two-layer registration method.
The technical solution adopted by the invention is as follows:
A Kinect acquires RGB and depth image information of the environment. SURF feature points are extracted from the RGB images, and feature matches serve as the association data. Combining the random sample consensus (RANSAC) method and the iterative closest point (ICP) method, a two-layer registration method for three-dimensional data is proposed. It consists of two parts. First, RANSAC estimates the rotation-translation transformation matrix between two adjacent frames (frame-to-frame) of three-dimensional data; accumulating these results gives the relative pose change of the Kinect. A threshold is set, and when the change in Kinect pose exceeds a certain size, the current frame of data is added and designated a KeyFrame, completing the initial registration. Second, ICP computes the precise transformation matrix between adjacent KeyFrames (KeyFrame-to-KeyFrame), completing the accurate registration. The KeyFrame data and the transformation matrices between adjacent KeyFrames obtained by the two-layer registration then complete the reconstruction of the three-dimensional environment.
The indoor three-dimensional scene reconstruction method based on the two-layer registration method comprises the following steps:
Step 1: Kinect calibration.
In image measurement and machine vision applications, determining the relationship between the three-dimensional position of a point on an object's surface and its corresponding point in the image requires a geometric model of camera imaging; the parameters of this model are the camera parameters. In most cases these parameters must be obtained through experiment and computation, a process called camera calibration. Calibration is a crucial step: the accuracy and stability of its results directly affect the accuracy of the final results.
The Kinect is a motion-sensing peripheral for the XBOX 360 released by Microsoft that provides depth and color (RGB) image information simultaneously. Depth is acquired actively with an infrared camera; each frame consists of 640 × 480 pixels, the working depth range is 0.5-4.0 m, the vertical field of view is 43°, and the horizontal field of view is 57°, so depth can be obtained for objects within an area of about 6 square meters. The Kinect is also equipped with a 640 × 480 pixel RGB camera. Providing RGB and depth information simultaneously is essential for three-dimensional reconstruction, as it facilitates aligning the depth information with the RGB information.
The calibration parameters of the Kinect sensor comprise three parts: the intrinsic parameters of the infrared camera (depth sensor), the intrinsic parameters of the RGB camera, and the extrinsic parameters between the infrared camera and the RGB camera. The invention calibrates the RGB camera with Zhang Zhengyou's planar calibration method. For the infrared camera intrinsics and the extrinsics between the infrared camera and the RGB camera, the data officially provided by Microsoft are used.
Step 2: feature point extraction and matching.
Feature extraction analyzes the image information to decide whether each point in the image belongs to an image feature, partitioning the image points into subsets that typically correspond to isolated points, continuous curves, or continuous regions. SURF (Speeded-Up Robust Features) is currently among the most popular image feature methods; the extracted features are invariant to scale and rotation and robust to changes in illumination and to affine and perspective transformations. SURF matches or surpasses earlier comparable methods in repeatability, distinctiveness, and robustness, and has a clear advantage in computation speed.
The invention extracts SURF feature points from the RGB image, comprising two parts: feature point detection and feature point description. Feature matching uses a nearest-neighbor algorithm based on Euclidean distance, searching with a K-D tree data structure, and a match is accepted or rejected according to the distance ratio of the two nearest feature points.
Step 3: mapping image match points to three-dimensional coordinates.
According to the Kinect calibration model, the transformation between the image plane and three-dimensional point coordinates is established, determining the projection model from a three-dimensional point to the image plane, expressed by the function:
u=π(p)
where p is a three-dimensional point, u is its image-plane coordinate, and π(p) denotes the mapping from a three-dimensional point to the image plane. Corresponding point pairs in the image plane are obtained by matching image feature points; the projection model then yields the three-dimensional coordinates corresponding to each image feature point, giving the three-dimensional point pairs corresponding to the two frames of data.
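As a minimal sketch of such a projection function under a pinhole model (the focal length of 525 px is an assumed, Kinect-like value, not taken from the text; only the 640 × 480 resolution has appeared so far):

```python
import numpy as np

def project(p, fu, fv, u0, v0):
    """Pinhole projection u = pi(p): map a 3-D point in the camera
    frame to image-plane pixel coordinates."""
    x, y, z = p
    return np.array([fu * x / z + u0, fv * y / z + v0])

# Illustrative values: principal point at the image center (320, 240),
# assumed focal length of 525 px.
u = project((0.1, -0.05, 2.0), 525.0, 525.0, 320.0, 240.0)
```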
Step 4: two-layer registration of three-dimensional points based on RANSAC and ICP.
Registration aligns images of the same area acquired with different imaging means into a common geographic coordinate frame; it involves three kinds of processing: geometric correction, projective transformation, and a common scale. The registration result is expressed as the matrix

T_cw = [R_cw, t_cw]

where the subscript "cw" denotes the transformation from the world coordinate system to the current Kinect coordinate system, R_cw is the rotation matrix from the world frame to the current frame, and t_cw is the translation from the world frame to the current frame. T_cw describes the Kinect's rotation and translation relative to the world coordinate system. A point p_c in the Kinect coordinate system is related to the corresponding world point p_w by:

p_c = T_cw p_w
To address the high complexity and large computational cost of registering three-dimensional point cloud data, the invention builds on the RANSAC and ICP methods to propose a two-layer registration method consisting of two parts: an initial registration and an accurate registration. The initial registration uses RANSAC to obtain the KeyFrames and their relative transformation matrices; the accurate registration uses ICP, which, building on the initial registration, aligns the three-dimensional data points and provides accurate transformation information for updating the three-dimensional scene.
Step 5: scene update.
Each frame of three-dimensional data acquired by the Kinect contains roughly 250,000 points, and adjacent frames are highly redundant. To improve the clarity of the reconstruction, give the generated three-dimensional map a concise description, and reduce the memory burden on the system, the invention updates the three-dimensional scene with KeyFrame data only.
The beneficial effects of the invention are: a Kinect acquires the environmental data and, tailored to the characteristics of the Kinect sensor, a two-layer registration method based on RANSAC and ICP achieves fast, large-scale indoor three-dimensional scene reconstruction, effectively addressing the cost and real-time problems of three-dimensional reconstruction methods and improving reconstruction accuracy.
Brief description of the drawings
Fig. 1 is a block diagram of the Kinect-based indoor three-dimensional scene reconstruction method;
Fig. 2 is a schematic diagram of the Kinect coordinate system;
Fig. 3 is a flow chart of the two-layer registration method based on RANSAC and ICP;
Fig. 4 shows the real environment used to create a three-dimensional scene with the invention: (a) is the real experimental scene, (b) is a two-dimensional geometric sketch of the experimental environment;
Fig. 5 is a schematic diagram of the result of creating a three-dimensional scene with the invention.
Detailed description
The invention is described in further detail with reference to the drawings. As shown in Fig. 1, the invention comprises the following steps:
Step 1: Kinect calibration, performed as follows:
(1) Print a checkerboard template. The invention uses a sheet of A4 paper with a checkerboard spacing of 0.25 cm.
(2) Photograph the checkerboard from multiple angles. When shooting, the checkerboard should fill the screen as much as possible, with every corner of the checkerboard inside the frame; eight template pictures are taken in total.
(3) Detect the feature points in the images, i.e. the black crossing points of the checkerboard.
(4) Obtain the Kinect calibration parameters.
The intrinsic parameter matrix K_IR of the infrared camera is

    K_IR = [ f_uIR   0       u_IR ]
           [ 0       f_vIR   v_IR ]
           [ 0       0       1    ]

where (f_uIR, f_vIR) is the focal length of the infrared camera, with value (5, 5), and (u_IR, v_IR) is the image-plane center of the infrared camera, with value (320, 240).
The intrinsic parameter matrix K_c of the RGB camera is

    K_c = [ f_u   0     u_0 ]
          [ 0     f_v   v_0 ]
          [ 0     0     1   ]

where (f_u, f_v) is the focal length of the RGB camera and (u_0, v_0) is the image-plane center of the RGB camera.
The extrinsic parameters between the infrared camera and the RGB camera are:

    T = [R_IRc, t_IRc]

where R_IRc is the rotation matrix and t_IRc is the translation vector; the parameters officially provided by Microsoft are used directly:

    t_IRc = [0.075 0 0]^T
In the invention, the Kinect coordinate system, shown in Fig. 2, has the positive y-axis pointing up, the positive z-axis pointing forward, and the positive x-axis pointing right. The initial position of the Kinect is set as the origin of the world coordinate system, and the X, Y, Z directions of the world coordinate system coincide with the x, y, z directions of the Kinect at its initial position.
Step 2: feature point extraction and matching, performed as follows:
(1) Compute the integral image. The integral image is the cumulative sum of all pixels of a given grayscale image; for a point X = (x, y) in the image, the integral I(X) is:

    I(X) = Σ_{i≤x} Σ_{j≤y} I(i, j)

where I(i, j) denotes the pixel value at image coordinate (i, j).
With the integral image, the sum of gray values over any rectangular area can be computed with three additions/subtractions, independent of the area of the rectangle. As the following steps show, the convolution templates used in SURF feature extraction are box-shaped, so this greatly improves efficiency.
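A minimal sketch of the integral image and the constant-time box sum it enables (the 4 × 4 test image is illustrative):

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows and columns: entry (x, y) holds the sum
    of all pixels at coordinates <= (x, y)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] recovered from the integral image
    with three additions/subtractions, independent of the box area."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
```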
(2) Compute the approximate Hessian matrix H_approx. For a point X = (x, y) in image I, the Hessian matrix H(X, s) at scale s of point X is defined as:

    H(X, s) = [ L_xx(X, s)   L_xy(X, s) ]
              [ L_xy(X, s)   L_yy(X, s) ]

where L_xx(X, s), L_xy(X, s), L_yy(X, s) denote the convolution of the Gaussian second-order partial derivatives with image I at X. Box filtering approximates the second-order Gaussian filtering in the Hessian matrix. The values obtained by convolving the box filter templates with the image are denoted D_xx, D_yy, D_xy; substituting them for L_xx, L_yy, L_xy gives the approximate Hessian matrix H_approx, whose determinant is:

    det(H_approx) = D_xx D_yy − (w D_xy)²

where w is a weight coefficient, set to 0.9 in the implementation of the invention.
(3) Locate the feature points. SURF feature detection is based on the Hessian matrix, locating feature points at local maxima of the Hessian determinant.
The original image is processed with box filters of different sizes to obtain a scale-image pyramid, and the extrema of the scale images at (X, s) are obtained from H_approx.
Box filters build the scale space; four layers of scale images are selected in each octave, and the construction parameters of the four octaves are listed in Table 1.
Table 1: sizes (unit: s) of the 16 templates in the first four octaves of the scale space.
Extrema are obtained from the H_approx matrix, and non-maximum suppression (retaining the maximum and setting all other values to 0) is performed in each 3 × 3 × 3 local region of the three-dimensional (X, s) scale space. A point whose response is greater than all 26 neighboring values is selected as a feature point. A quadratic fitting function D(X) is then used to locate the feature point precisely.
This yields the position and scale information (X, s) of each feature point.
(4) Determine the feature point orientation. A circular neighborhood is processed with Haar wavelet filters to obtain the x- and y-direction responses of each point in the neighborhood. These responses are weighted with a Gaussian function centered on the feature point (σ = 2s, where s is the scale of the feature point); the direction of the longest resulting vector is taken as the orientation of the feature point.
(5) Construct the feature description vector. A square neighborhood centered on the feature point is defined with side length 20s, its y-axis aligned with the feature point's orientation. The square is divided into 4 × 4 subregions, each processed with Haar wavelet filters (template size 2s × 2s). Let d_x denote the horizontal Haar wavelet response and d_y the vertical one. All d_x, d_y are weighted with a Gaussian centered on the feature point with σ = 3.3s. In each subregion, d_x, d_y, |d_x|, |d_y| are summed, yielding the four-dimensional vector V = (Σd_x, Σd_y, Σ|d_x|, Σ|d_y|). Concatenating the vectors of the 4 × 4 subregions gives a 64-dimensional vector that is invariant to rotation and scale and, after normalization, to illumination. This vector is the descriptor of the feature point.
(6) Feature matching. Using the nearest-neighbor method based on Euclidean distance, a K-D tree is searched in the image to be matched to find the two feature points closest in Euclidean distance to a feature point in the reference image; if the nearest distance is less than the set proportion threshold (0.7) times the second-nearest distance, the pair of match points is accepted.
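A sketch of the distance-ratio test (using a brute-force search instead of the K-D tree named in the text, purely for brevity; the 2-D descriptors are toy data):

```python
import numpy as np

def ratio_match(desc_a, desc_b, ratio=0.7):
    """Nearest-neighbour matching with a distance-ratio test: a pair is
    accepted only when the nearest distance is below `ratio` times the
    second-nearest distance."""
    matches = []
    for i, d in enumerate(desc_a):
        dist = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dist)
        if dist[order[0]] < ratio * dist[order[1]]:
            matches.append((i, int(order[0])))
    return matches

# Toy descriptors: the first query has one clear nearest neighbour; the
# second is ambiguous (two equidistant candidates) and is rejected.
desc_b = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
desc_a = np.array([[0.1, 0.0], [5.0, 0.0]])
matches = ratio_match(desc_a, desc_b)
```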
Step 3: mapping image match points to three-dimensional coordinates.
According to the calibration parameters, a point in the Kinect depth image is mapped to the RGB image as follows:
A point p = (x_d, y_d) in the depth image has coordinates P_3D = (x, y, z) in the Kinect coordinate system:

    P_3D.x = (x_d − u_IR) · depth(x_d, y_d) / f_uIR
    P_3D.y = (y_d − v_IR) · depth(x_d, y_d) / f_vIR
    P_3D.z = depth(x_d, y_d)

where P_3D.x, P_3D.y, P_3D.z are the coordinates x, y, z of P_3D = (x, y, z), and depth(x_d, y_d) denotes the depth value of point p in the depth image.
From the 3D coordinates corresponding to a pixel, the coordinates (x_rgb, y_rgb) in the RGB image are then obtained:

    x_rgb = f_u · P′_3D.x / P′_3D.z + u_0
    y_rgb = f_v · P′_3D.y / P′_3D.z + v_0

where P′_3D^T = R_IRc · P_3D^T + t_IRc.
With this conversion relationship, the matching point pairs obtained in Step 2 are converted into three-dimensional point pairs.
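The depth-pixel back-projection above can be sketched as follows (the principal point (320, 240) is the one given in Step 1; the focal length of 525 px is an assumed, Kinect-like value, since the text's value is not usable directly):

```python
import numpy as np

# Illustrative calibration: principal point (320, 240) from the text,
# assumed focal length of 525 px.
FU, FV, U0, V0 = 525.0, 525.0, 320.0, 240.0

def depth_to_3d(xd, yd, z):
    """Back-project depth pixel (xd, yd) with depth z (metres) into the
    Kinect camera frame using the pinhole model."""
    return np.array([(xd - U0) * z / FU, (yd - V0) * z / FV, z])

# The principal point maps onto the optical axis.
p0 = depth_to_3d(320, 240, 1.5)
```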
Step 4: two-layer registration of three-dimensional points based on RANSAC and ICP, as shown in Fig. 3, comprising the following steps:
(1) Initial registration. The corresponding point pairs obtained by feature matching contain many mismatches. In the initial registration stage, RANSAC removes the mismatched three-dimensional point pairs, iteratively finding the largest inlier set consistent with the transformation model and estimating the transformation matrix T. Accumulating the relative transformation matrices from the previous KeyFrame to each current frame of data gives the transformation of the current Kinect relative to that KeyFrame. From this matrix, the magnitudes of the Kinect's translation and rotation angle are computed and compared with set thresholds to decide whether the current frame is chosen as a KeyFrame. In the embodiment, the translation threshold is set to 0.4 and the angle threshold to 40 degrees.
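A sketch of this KeyFrame decision (the 4 × 4 homogeneous transform layout and the angle extraction via the rotation-matrix trace are implementation assumptions; the thresholds 0.4 and 40° are from the embodiment):

```python
import numpy as np

def is_keyframe(T_accum, trans_thresh=0.4, angle_thresh_deg=40.0):
    """Decide whether the accumulated transform since the last KeyFrame
    exceeds the translation or rotation-angle threshold."""
    R, t = T_accum[:3, :3], T_accum[:3, 3]
    # Axis-angle magnitude recovered from the trace of R.
    angle = np.degrees(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)))
    return bool(np.linalg.norm(t) > trans_thresh or angle > angle_thresh_deg)

T_small = np.eye(4)                        # no motion: not a KeyFrame
T_far = np.eye(4)
T_far[0, 3] = 0.5                          # 0.5 m translation > 0.4 m
```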
The detailed RANSAC procedure is as follows:
1) From the initial N three-dimensional matching point pairs between the reference point set A and the point set B to be registered, randomly select 7 pairs;
2) Using the 7 selected pairs, compute the transformation matrix T_AB between the reference point set and the point set to be registered with the minimal-configuration seven-point method for solving the fundamental matrix;
3) Using the transformation matrix T_AB, transform the remaining N − 7 three-dimensional points of the point set B to be registered (which contains N points) into the reference point cloud coordinate system;
4) Compute the coordinate errors between the transformed point set P′_{N−7} and the reference point set;
5) Among the N matching point pairs, count the feature point pairs whose coordinate error lies within a given threshold; denote this count m;
6) Repeat steps 1)-5) n times (n is set by the user; in this embodiment the number of iterations is set to 50). The set that maximizes m is the largest inlier set; its points are the inliers, and the remaining N − m pairs are mismatches, the outliers. The largest inlier set is used to estimate the least-squares solution of the transformation model, which serves as the transformation matrix T between the current two adjacent frames.
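The loop above can be sketched as follows. Note this sketch fits the rigid transform from a minimal sample of 3 pairs via SVD (the Kabsch method) rather than the 7-pair fundamental-matrix step of the patent, and the tolerance and data are illustrative:

```python
import numpy as np

def rigid_transform(A, B):
    """Least-squares R, t with B ~ R @ A + t (Kabsch/SVD)."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                     # reflection-safe rotation
    return R, cb - R @ ca

def ransac_transform(A, B, iters=50, tol=0.05, seed=0):
    """Sample a minimal set, fit R, t, count inliers, keep the largest
    inlier set, then refit on it by least squares."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(iters):
        idx = rng.choice(len(A), 3, replace=False)
        R, t = rigid_transform(A[idx], B[idx])
        err = np.linalg.norm((A @ R.T + t) - B, axis=1)
        inl = err < tol
        if best is None or inl.sum() > best.sum():
            best = inl
    return rigid_transform(A[best], B[best]) + (best,)

# Synthetic check: rotate 20 points by 30 degrees about z, translate,
# and corrupt one pair, which RANSAC should flag as an outlier.
rng = np.random.default_rng(1)
A = rng.normal(size=(20, 3))
th = np.pi / 6
R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0,         0.0,        1.0]])
t_true = np.array([0.3, -0.1, 0.2])
B = A @ R_true.T + t_true
B[0] += 5.0                                # one gross mismatch
R_est, t_est, inliers = ransac_transform(A, B)
```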
(2) Accurate registration. The initial registration yields the KeyFrames and their relative transformation matrices; the invention then uses ICP to compute the precise transformation matrices between KeyFrames, taking the initial registration result as the prior transformation and solving for the accurate KeyFrame-to-KeyFrame transformation matrix.
In the Kinect depth map, regions with pixel value 0 carry no valid measurement. To distinguish valid from invalid information in the depth map, define the function

    M(X) = 1 if depth(X) > 0, and M(X) = 0 otherwise

where X is an image-plane coordinate.
To obtain the Kinect pose at time k, the following energy function is established according to the transformation between the Kinect coordinate system and the world coordinate system:

    E(T_kw) = Σ_{p_k ∈ Ω} ‖ p_k − T_kw p_w ‖²

where p_w is a point in the world coordinate system, p_k is the corresponding point in the current coordinate system, and Ω is the set of pixels in the image plane at time k with valid depth values, that is:

    Ω = { p_k | u = π(p_k) and M(u) = 1 }
The energy function established above is the mathematical description of three-dimensional ICP. The ICP algorithm obtains the Kinect pose at time k in the world coordinate system by minimizing the energy function. ICP usually assumes a relative pose, repeatedly establishes correspondences between the point clouds, and iterates by optimizing the corresponding-point error. The initial relative pose therefore plays a vital role in the solution process: an inappropriate initial pose traps ICP in a local optimum and prevents a correct result. Moreover, for ICP on unordered point clouds, the space and time complexity grow sharply with the number of points, greatly reducing the algorithm's efficiency. The initial relative pose is thus the prerequisite for establishing correspondences between the point clouds and is crucial in the ICP iteration. In this method, the relative transformation obtained by the initial registration serves as the initial relative pose, from which the optimal estimate of the current KeyFrame is obtained.
Suppose the offset of the Kinect between times k−1 and k is small; denote the Kinect's rotation about the x, y, z axes by (α, β, γ) and its translation along the three directions by (t_x, t_y, t_z). When these two vectors are small enough, a first-order Taylor expansion applies; let x = (α, β, γ, t_x, t_y, t_z).
For a spatial point whose world coordinates are obtained at time k, project the point into the Kinect coordinate system at time k−1; the energy function then becomes

    E(x) = Σ_{p_k ∈ Ω} ‖ p_k − ( p_w^{k−1} + ω × p_w^{k−1} + t ) ‖²,  ω = (α, β, γ),  t = (t_x, t_y, t_z)

    Ω = { p_k | u = π(p_k) and M(u) = 1 }

where p_w and p_k are corresponding points and p_w^{k−1} is the coordinate of p_w in the camera coordinate system at time k−1.
Writing the cross product with [p_w^{k−1}]_×, the antisymmetric matrix formed from p_w^{k−1}, gives the final expression of the energy function:

    E(x) = Σ_{p_k ∈ Ω} ‖ p_k − p_w^{k−1} + [p_w^{k−1}]_× ω − t ‖²

    Ω = { p_k | u = π(p_k) and M(u) = 1 }

The threshold of the energy function is set to 0.05; a Cholesky decomposition yields the six-tuple solution x = (α, β, γ, t_x, t_y, t_z), which is mapped into the special Euclidean (rigid-motion) group SE(3) of the Lie group, and combining it with the Kinect pose at time k−1 gives the current Kinect pose.
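Assuming the correspondences are already fixed, one linearized update of such an energy can be sketched as follows (the synthetic data are illustrative; a full ICP would re-establish correspondences and iterate until the energy threshold is met):

```python
import numpy as np

def skew(p):
    """Antisymmetric (cross-product) matrix formed from p."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def icp_step(P, Q):
    """One linearised point-to-point update: solve the normal equations
    for the small motion x = (alpha, beta, gamma, tx, ty, tz) mapping
    corresponded points P onto Q, via a Cholesky factorisation."""
    A = np.zeros((6, 6))
    b = np.zeros(6)
    for p, q in zip(P, Q):
        J = np.hstack([skew(p), -np.eye(3)])   # Jacobian of the residual
        r = q - p
        A += J.T @ J
        b -= J.T @ r
    L = np.linalg.cholesky(A)                  # A is symmetric positive definite
    return np.linalg.solve(L.T, np.linalg.solve(L, b))

# Synthetic small motion: the solver recovers it exactly, since the data
# are generated by the linearised model itself.
rng = np.random.default_rng(2)
P = rng.normal(size=(30, 3))
w_true = np.array([0.01, -0.02, 0.015])
t_true = np.array([0.05, 0.02, -0.03])
Q = P + np.cross(w_true, P) + t_true
x = icp_step(P, Q)
```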
Step 5: scene update.
The scene update has two cases. In the first update, the position of the Kinect is set as the origin of the world coordinate system and the currently acquired scene data are added. Otherwise, when a new frame of KeyFrame data is added, the relation

    p_w = R_cw^T (p_c − t_cw)

transforms the newly added KeyFrame data into the world coordinate system, completing the update of the scene data.
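This update step inverts the relation p_c = T_cw p_w given earlier; a minimal sketch (the rotation and translation values are illustrative):

```python
import numpy as np

def kinect_to_world(R_cw, t_cw, points_c):
    """Invert p_c = R_cw p_w + t_cw to carry KeyFrame points from the
    current Kinect frame into the world frame before merging them into
    the global map: p_w = R_cw^T (p_c - t_cw)."""
    return (points_c - t_cw) @ R_cw        # right-multiplying applies R_cw^T

# 90-degree rotation about z plus a shift along x, applied and undone.
R_cw = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
t_cw = np.array([1.0, 0.0, 0.0])
p_w = np.array([[2.0, 3.0, 4.0]])
p_c = p_w @ R_cw.T + t_cw
back = kinect_to_world(R_cw, t_cw, p_c)
```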
An experimental embodiment of creating a three-dimensional environment with the method of the invention in a real indoor environment is given below.
The depth camera used in the experiment is a Kinect for XBOX 360; the RGB image resolution is 640 × 480 and the maximum frame rate is 30 fps. The indoor environment is shown in Fig. 4: Fig. 4(a) is the real experimental scene, and Fig. 4(b) is a two-dimensional geometric sketch of the experimental environment. During the experiment the Kinect was hand-held and walked along a fixed route from the start point to the end point, incrementally generating the global map along the way; the created map covers an indoor area of 9 m × 9 m. Fig. 5 is a schematic diagram of the reconstruction result.
The experimental results show that the method of the invention can be used to create large-scale indoor three-dimensional scenes with high accuracy and good real-time performance.
The above is only a preferred embodiment of the invention and is not intended to limit its scope of protection; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention shall all fall within its scope of protection.
Claims (2)
1. An indoor three-dimensional scene reconstruction method based on a two-layer registration method, characterized by comprising the following steps:
Step 1: Kinect calibration, performed as follows:
(1) print a checkerboard template;
(2) photograph the checkerboard from multiple angles;
(3) detect the feature points in the images, i.e. the black crossing points of the checkerboard;
(4) obtain the Kinect calibration parameters:
The intrinsic parameter matrix K_IR of the infrared camera is

    K_IR = [ f_uIR   0       u_IR ]
           [ 0       f_vIR   v_IR ]
           [ 0       0       1    ]

where (f_uIR, f_vIR) is the focal length of the infrared camera, with value (5, 5), and (u_IR, v_IR) is the image-plane center of the infrared camera, with value (320, 240);
The intrinsic parameter matrix K_c of the RGB camera is

    K_c = [ f_u   0     u_0 ]
          [ 0     f_v   v_0 ]
          [ 0     0     1   ]

where (f_u, f_v) is the focal length of the RGB camera and (u_0, v_0) is the image-plane center of the RGB camera;
The extrinsic parameters between the infrared camera and the RGB camera are:

    T = [R_IRc, t_IRc]

where R_IRc is the rotation matrix and t_IRc is the translation vector; the parameters officially provided by Microsoft are used directly:

    t_IRc = [0.075 0 0]^T

The Kinect coordinate system has the positive y-axis pointing up, the positive z-axis pointing forward, and the positive x-axis pointing right; the initial position of the Kinect is set as the origin of the world coordinate system, and the X, Y, Z directions of the world coordinate system coincide with the x, y, z directions of the Kinect at its initial position;
Step 2: extract and match feature points as follows:
(1) obtain the integral image: the integral I(X) of a point X = (x, y) in the image is
I(X) = Σ_{i≤x} Σ_{j≤y} I(i, j)
where I(i, j) is the pixel value at pixel coordinate (i, j) in the image;
In the integral image, the sum of gray values over any rectangular area can be calculated with 3 additions/subtractions, independently of the area of the rectangle;
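The integral-image trick of (1) can be sketched as follows: after one cumulative-sum pass, the gray-value sum of any rectangle costs three additions/subtractions, regardless of its area.

```python
import numpy as np

def integral_image(img):
    """I(x, y) = sum of all pixels (i, j) with i <= x and j <= y."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of img[y0:y1+1, x0:x1+1] from the integral image ii using at
    most 3 additions/subtractions, independent of the rectangle's area."""
    s = ii[y1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0 and y0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
```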
(2) compute the approximate Hessian matrix H_approx: for a point X = (x, y) in image I, the Hessian matrix H(X, s) of X at scale s is defined as
H(X, s) = [L_xx(X, s) L_xy(X, s); L_xy(X, s) L_yy(X, s)]
where L_xx(X, s), L_xy(X, s), L_yy(X, s) denote the convolutions of the Gaussian second-order partial derivatives at X with the image I; box filtering is used to approximate and replace the second-order Gaussian filtering in the Hessian matrix; the values of the box filter templates convolved with the image are denoted D_xx, D_yy, D_xy, and they in turn replace L_xx, L_yy, L_xy to give the approximate Hessian matrix H_approx, whose determinant is
det(H_approx) = D_xx·D_yy − (w·D_xy)^2
where w is a weight coefficient;
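The determinant of the approximate Hessian can be written directly from the formula above. The default weight w = 0.9 is the value commonly used in SURF to compensate for the box-filter approximation; the claim leaves w unspecified.

```python
def det_h_approx(dxx, dyy, dxy, w=0.9):
    """Determinant of the approximate Hessian built from box-filter
    responses Dxx, Dyy, Dxy: det = Dxx*Dyy - (w*Dxy)^2.  The weight w
    (commonly 0.9 in SURF) corrects for approximating the Gaussian
    second derivatives with box filters."""
    return dxx * dyy - (w * dxy) ** 2
```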
(3) locate the feature points: SURF feature detection is based on the Hessian matrix, and feature point positions are located at the local maxima of the Hessian determinant;
The original image is processed with box filters of different sizes to obtain a scale-image pyramid, and the extremum of each scale image at (X, s) is obtained from H_approx;
Box filters are used to build the scale space; 4 layers of scale images are selected in each octave, extrema are obtained with the H_approx matrix, and non-maxima suppression is carried out over each 3 × 3 × 3 local region of the 3-dimensional (X, s) scale space; a point whose response is greater than its 26 neighborhood values is selected as a feature point; the feature point is then accurately located with a quadratic fitting function D(X), which yields the position and scale information (X, s) of the feature point;
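The 3 × 3 × 3 non-maxima suppression of step (3) can be sketched as follows: a candidate is kept only when its response exceeds the detection threshold and all 26 neighbors in the (scale, y, x) stack.

```python
import numpy as np

def local_maxima_3d(resp, threshold=0.0):
    """Return (s, y, x) indices in a (scale, y, x) response stack whose
    value exceeds the threshold and all 26 neighbors of the 3x3x3
    scale-space neighborhood."""
    peaks = []
    S, H, W = resp.shape
    for s in range(1, S - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                v = resp[s, y, x]
                if v <= threshold:
                    continue
                nb = resp[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                # v must be the unique maximum of the 27-cell neighborhood
                if np.count_nonzero(nb >= v) == 1:
                    peaks.append((s, y, x))
    return peaks
```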
(4) determine the dominant direction of the feature point: a circular neighborhood is processed with Haar wavelet filters to obtain the responses in the x and y directions corresponding to each point in the neighborhood; these responses are weighted with a Gaussian function centered at the feature point, with σ = 2s, where s is the scale corresponding to the feature point; the vector of maximum length is then searched for, and its direction is the direction corresponding to the feature point;
(5) construct the feature description vector: a square neighborhood centered at the feature point is determined, with side length 20s and the y-axis of the neighborhood set to the feature point direction; the square region is divided into 4 × 4 subregions, and each subregion is processed with Haar wavelet filters with a template size of 2s × 2s; d_x denotes the Haar wavelet response in the horizontal direction and d_y the Haar wavelet response in the vertical direction, and all d_x, d_y are weighted with a Gaussian function centered at the feature point, with σ = 3.3s; in each subregion, d_x, d_y, |d_x| and |d_y| are summed separately, giving a 4-dimensional vector V = (Σd_x, Σd_y, Σ|d_x|, Σ|d_y|); concatenating the vectors of the 4 × 4 subregions yields a 64-dimensional vector, which is rotation- and scale-invariant and, after normalization, illumination-invariant; this vector is the feature vector describing the feature point;
(6) feature matching: the nearest-neighbor method based on Euclidean distance is adopted; a K-D tree is used to search the image to be matched and find the two feature points with the smallest Euclidean distances to a feature point in the reference image; if the nearest distance is less than a set proportion threshold times the second-nearest distance, the matching pair is accepted;
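The ratio test of step (6) can be sketched as below. The claim performs the nearest-neighbor search with a K-D tree; a brute-force scan is substituted here for brevity, and the 0.7 ratio is an assumed placeholder for the claim's unspecified proportion threshold.

```python
import numpy as np

def ratio_match(desc_ref, desc_query, ratio=0.7):
    """Nearest-neighbour matching on Euclidean distance with a ratio
    test: accept a pair (i, j) only when the nearest distance is below
    `ratio` times the second-nearest distance."""
    matches = []
    for i, d in enumerate(desc_ref):
        dists = np.linalg.norm(desc_query - d, axis=1)
        order = np.argsort(dists)
        if dists[order[0]] < ratio * dists[order[1]]:
            matches.append((i, int(order[0])))
    return matches
```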
Step 3: map the image matching points to three-dimensional coordinates as follows:
For a point p = (x_d, y_d) in the depth image, find its coordinate P_3D = (x, y, z) in the Kinect coordinate system:
P_3D.x = (x_d − u_IR) · depth(x_d, y_d) / f_uIR
P_3D.y = (y_d − v_IR) · depth(x_d, y_d) / f_vIR
P_3D.z = depth(x_d, y_d)
where P_3D.x, P_3D.y and P_3D.z are respectively the coordinates x, y, z of P_3D = (x, y, z), and depth(x_d, y_d) is the depth value of the point p in the depth image;
The coordinate (x_rgb, y_rgb) in the RGB image corresponding to the 3D point of an RGB image pixel is obtained from:
x_rgb = f_u · P'_3D.x / P'_3D.z + u_0
y_rgb = f_v · P'_3D.y / P'_3D.z + v_0
where P'_3D^T = R_IRc · P_3D^T + t_IRc;
According to the above conversion relations, the matching point pairs obtained in step 2 are converted into three-dimensional point pairs;
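Step 3 amounts to a standard pinhole back-projection followed by a rigid IR-to-RGB transform. In the sketch below, the focal length (~580 px, typical for a Kinect IR camera) is an assumed placeholder, since the numeric values in the claim are symbolic here; the identity rotation R_IRc is likewise an assumption.

```python
import numpy as np

# Placeholder intrinsics/extrinsics standing in for the calibration of step 1.
F_U_IR, F_V_IR = 580.0, 580.0     # assumed IR focal lengths (pixels)
U_IR, V_IR = 320.0, 240.0         # IR image-plane center, from the claim
R_IRC = np.eye(3)                 # assumed IR-to-RGB rotation
T_IRC = np.array([0.075, 0.0, 0.0])  # translation from the claim

def depth_to_3d(xd, yd, depth):
    """Back-project a depth pixel (xd, yd) with depth value `depth` into
    the Kinect coordinate system (pinhole model, as in step 3)."""
    z = depth
    x = (xd - U_IR) * z / F_U_IR
    y = (yd - V_IR) * z / F_V_IR
    return np.array([x, y, z])

def to_rgb_frame(p3d):
    """Apply P'_3D = R_IRc * P_3D + t_IRc; projecting P'_3D with the RGB
    intrinsics then yields (x_rgb, y_rgb)."""
    return R_IRC @ p3d + T_IRC
```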
Step 4: perform the double-layer registration of the three-dimensional space points based on RANSAC and ICP as follows:
(1) coarse registration: in the coarse registration stage, RANSAC is applied to remove mismatched three-dimensional point pairs, and the maximal inlier set satisfying the transformation model is found by iteration to estimate the transformation matrix T'; the relative transformation matrices from the KeyFrame to each subsequent frame of data are accumulated to obtain the transformation matrix of the current Kinect relative to the KeyFrame; the translation amount and rotation-angle modulus of the Kinect are calculated from this matrix and compared with set thresholds to judge whether the current data should be chosen as a new KeyFrame;
(2) fine registration: to obtain the pose of the Kinect at time k, the following energy function is set up according to the conversion relation between the Kinect coordinate system and the world coordinate system, where p_w is a point in the world coordinate system, p_k is a point in the current coordinate system, and Ω is the set of pixels in the image plane at time k that have valid depth values, that is:
Ω = {p_k | u = π(p_k) and M(u) = 1}
where M(X) is the function describing the valid and invalid information in the acquired depth map, and X is an image-plane coordinate point;
Suppose the offset of the Kinect between time k−1 and time k is small; when the rotation amounts (α, β, γ) about the x, y, z axes and the translation amounts (t_x, t_y, t_z) in the three directions are small enough, the energy function is expanded with the first-order Taylor formula, letting x = (α, β, γ, t_x, t_y, t_z);
For the world coordinate of a spatial point obtained at time k, the point is projected into the Kinect coordinate system of time k−1 and the energy function is transformed accordingly, over
Ω = {p_k | u = π(p_k) and M(u) = 1}
where p_w and p_k are corresponding points and the projected coordinate is that of p_w in the camera coordinate system at time k−1;
Substituting this relation gives the final expression of the energy function, again over
Ω = {p_k | u = π(p_k) and M(u) = 1}
in which the antisymmetric matrix formed from the rotation components (α, β, γ) appears;
A threshold is set on the energy function, and a Cholesky decomposition is used to obtain the six-element solution x = (α, β, γ, t_x, t_y, t_z); this solution is mapped into the special Euclidean group SE(3) of the Lie group and, combined with the pose of the Kinect at time k−1, yields the current Kinect pose;
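The linear-algebra core of the fine registration above can be sketched as follows: the antisymmetric matrix of the rotation components, a Cholesky solve of the 6 × 6 normal equations for x = (α, β, γ, t_x, t_y, t_z), and the small-angle mapping of x to an SE(3) transform. The normal-equation matrix A and right-hand side b are assumed to come from the linearized energy function, which the claim leaves symbolic.

```python
import numpy as np

def skew(v):
    """Antisymmetric (skew-symmetric) matrix [v]_x with [v]_x a = v x a."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def solve_increment(A, b):
    """Solve the 6x6 normal equations A x = b via Cholesky A = L L^T."""
    L = np.linalg.cholesky(A)
    y = np.linalg.solve(L, b)
    return np.linalg.solve(L.T, y)

def se3_update(x):
    """Map x = (alpha, beta, gamma, tx, ty, tz) to a 4x4 transform under
    the small-angle approximation R ~ I + [omega]_x."""
    T = np.eye(4)
    T[:3, :3] = np.eye(3) + skew(x[:3])
    T[:3, 3] = x[3:]
    return T
```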
Step 5: update the scene as follows:
The scene update is divided into two cases. In the first case, the scene is updated for the first time: the position of the Kinect is set as the origin of the world coordinate system and the currently acquired scene data are added. In the second case, a new frame of KeyFrame data is added: according to the pose transformation formula, the newly added KeyFrame data are transformed into the world coordinate system, completing the update of the scene data.
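The two update cases of step 5 can be sketched as one function: an empty map means the first update (the Kinect pose is the world origin), otherwise new KeyFrame points are transformed by the current 4 × 4 pose and appended.

```python
import numpy as np

def update_scene(global_map, new_points, T_k_w):
    """Transform newly added KeyFrame points (N x 3) into the world
    coordinate system with the 4x4 pose T_k_w and append them to the
    global map; an empty map corresponds to the first update."""
    homog = np.hstack([new_points, np.ones((len(new_points), 1))])
    world = (T_k_w @ homog.T).T[:, :3]
    if global_map.size == 0:
        return world
    return np.vstack([global_map, world])
```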
2. The indoor three-dimensional scene reconstruction method based on the double-layer registration method according to claim 1, characterized in that the method of applying RANSAC described in step 4 to find the transformation matrix T is as follows:
(1) randomly select 7 pairs of data from the initial N pairs of three-dimensional matching points of the reference point set A and the point set B to be registered;
(2) using the minimal-configuration 7-point method for solving the fundamental matrix, calculate the transformation matrix T_AB between the reference point set and the point set data to be registered from the 7 chosen pairs of data;
(3) use the transformation matrix T_AB to transform the remaining N − 7 three-dimensional points of the feature point set of the image to be registered into the reference point-cloud coordinate system;
(4) calculate the coordinate errors between the transformed point set P'_{N−7} and the reference point set;
(5) from the N pairs of matching points, find the number of feature point pairs whose coordinate error is within a certain threshold, denoted m;
(6) repeat (1)–(5) n times; the set for which m reaches its maximum is the maximal inlier set, whose members are the inliers, while the remaining N − m pairs are mismatched points, the outliers; the maximal inlier set is used to estimate the least-squares solution of the transformation model, which serves as the transformation matrix T' of the current two adjacent frames of data.
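The RANSAC loop of claim 2 can be sketched as below. Note one substitution: the claim names a minimal-configuration 7-point fundamental-matrix solver as the per-sample estimator, while this sketch uses a least-squares rigid fit (Kabsch/Umeyama) on each 7-pair sample, since the point pairs are already 3D; thresholds and iteration counts are assumed placeholders.

```python
import numpy as np

def fit_rigid(src, dst):
    """Least-squares rigid transform (Kabsch/Umeyama): dst ~ R src + t."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T          # sign fix keeps det(R) = +1
    return R, cd - R @ cs

def ransac_rigid(src, dst, n_iter=100, sample=7, thresh=0.05, seed=0):
    """RANSAC over N pairs of 3D matching points, following claim 2:
    sample 7 pairs, estimate a transform, count pairs with coordinate
    error below `thresh` (inliers), and refit on the maximal inlier set."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), sample, replace=False)
        R, t = fit_rigid(src[idx], dst[idx])
        err = np.linalg.norm((src @ R.T + t) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # least-squares solution on the maximal inlier set
    return fit_rigid(src[best_inliers], dst[best_inliers])
```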
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310053829.3A CN103106688B (en) | 2013-02-20 | 2013-02-20 | Based on the indoor method for reconstructing three-dimensional scene of double-deck method for registering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310053829.3A CN103106688B (en) | 2013-02-20 | 2013-02-20 | Based on the indoor method for reconstructing three-dimensional scene of double-deck method for registering |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103106688A CN103106688A (en) | 2013-05-15 |
CN103106688B true CN103106688B (en) | 2016-04-27 |
Family
ID=48314513
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310053829.3A Expired - Fee Related CN103106688B (en) | 2013-02-20 | 2013-02-20 | Based on the indoor method for reconstructing three-dimensional scene of double-deck method for registering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103106688B (en) |
Families Citing this family (53)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103268729B (en) * | 2013-05-22 | 2015-08-19 | 北京工业大学 | Based on mobile robot's tandem type map creating method of composite character |
CN103325142B (en) * | 2013-05-29 | 2016-02-17 | 南京大学 | A kind of electronic 3-D model modeling method based on Kinect |
CN103260015B (en) * | 2013-06-03 | 2016-02-24 | 程志全 | Based on the three-dimensional visible supervisory control system of RGB-Depth camera |
CN103413352A (en) * | 2013-07-29 | 2013-11-27 | 西北工业大学 | Scene three-dimensional reconstruction method based on RGBD multi-sensor fusion |
CN103456038A (en) * | 2013-08-19 | 2013-12-18 | 华中科技大学 | Method for rebuilding three-dimensional scene of downhole environment |
CN105335399B (en) * | 2014-07-18 | 2019-03-29 | 联想(北京)有限公司 | A kind of information processing method and electronic equipment |
CN104126989B (en) * | 2014-07-30 | 2016-06-01 | 福州大学 | A kind of based on the foot surfaces 3 D information obtaining method under multiple stage RGB-D pick up camera |
CN104517287A (en) * | 2014-12-10 | 2015-04-15 | 广州赛意信息科技有限公司 | Image matching method and device |
CN104517289B (en) * | 2014-12-12 | 2017-08-08 | 浙江大学 | A kind of indoor scene localization method based on hybrid camera |
US9761015B2 (en) * | 2015-04-28 | 2017-09-12 | Mitsubishi Electric Research Laboratories, Inc. | Method for determining dimensions in an indoor scene from a single depth image |
CN105987693B (en) * | 2015-05-19 | 2019-04-30 | 北京蚁视科技有限公司 | A kind of vision positioning device and three-dimensional mapping system and method based on the device |
CN105222789A (en) * | 2015-10-23 | 2016-01-06 | 哈尔滨工业大学 | A kind of building indoor plane figure method for building up based on laser range sensor |
CN105319991B (en) * | 2015-11-25 | 2018-08-28 | 哈尔滨工业大学 | A kind of robot environment's identification and job control method based on Kinect visual informations |
CN105509748B (en) * | 2015-12-29 | 2019-03-01 | 深圳先进技术研究院 | The air navigation aid and device of robot |
CN105513128A (en) * | 2016-01-13 | 2016-04-20 | 中国空气动力研究与发展中心低速空气动力研究所 | Kinect-based three-dimensional data fusion processing method |
CN107025661B (en) * | 2016-01-29 | 2020-08-04 | 成都理想境界科技有限公司 | Method, server, terminal and system for realizing augmented reality |
CN105913489B (en) * | 2016-04-19 | 2019-04-23 | 东北大学 | A kind of indoor three-dimensional scenic reconstructing method using plane characteristic |
US10380767B2 (en) * | 2016-08-01 | 2019-08-13 | Cognex Corporation | System and method for automatic selection of 3D alignment algorithms in a vision system |
TWI588685B (en) * | 2016-08-31 | 2017-06-21 | 宅妝股份有限公司 | System for building a virtual reality and an augmented reality and method thereof |
CN106384383B (en) * | 2016-09-08 | 2019-08-06 | 哈尔滨工程大学 | A kind of RGB-D and SLAM scene reconstruction method based on FAST and FREAK Feature Correspondence Algorithm |
CN106596557A (en) * | 2016-11-07 | 2017-04-26 | 东南大学 | Three-dimensional scanning mobile type platform carrying Kinect and method thereof |
CN106780297B (en) * | 2016-11-30 | 2019-10-25 | 天津大学 | Image high registration accuracy method under scene and Varying Illumination |
CN106529838A (en) * | 2016-12-16 | 2017-03-22 | 湖南拓视觉信息技术有限公司 | Virtual assembling method and device |
CN106780590B (en) * | 2017-01-03 | 2019-12-24 | 成都通甲优博科技有限责任公司 | Method and system for acquiring depth map |
CN106803267B (en) * | 2017-01-10 | 2020-04-14 | 西安电子科技大学 | Kinect-based indoor scene three-dimensional reconstruction method |
CN106952299B (en) * | 2017-03-14 | 2019-07-16 | 大连理工大学 | A kind of 3 d light fields Implementation Technology suitable for Intelligent mobile equipment |
CN107123138B (en) * | 2017-04-28 | 2019-07-30 | 电子科技大学 | Based on vanilla-R point to the point cloud registration method for rejecting strategy |
CN107274440A (en) * | 2017-06-26 | 2017-10-20 | 赵红林 | A kind of image matching algorithm |
CN107610212B (en) * | 2017-07-25 | 2020-05-12 | 深圳大学 | Scene reconstruction method and device, computer equipment and computer storage medium |
CN107577451B (en) * | 2017-08-03 | 2020-06-12 | 中国科学院自动化研究所 | Multi-Kinect human body skeleton coordinate transformation method, processing equipment and readable storage medium |
CN107748569B (en) * | 2017-09-04 | 2021-02-19 | 中国兵器工业计算机应用技术研究所 | Motion control method and device for unmanned aerial vehicle and unmanned aerial vehicle system |
CN107833270B (en) * | 2017-09-28 | 2020-07-03 | 浙江大学 | Real-time object three-dimensional reconstruction method based on depth camera |
CN109798830B (en) * | 2017-11-17 | 2020-09-08 | 上海勘察设计研究院(集团)有限公司 | Tunnel appendage geometric characteristic measuring method |
CN108055456B (en) * | 2017-12-07 | 2020-09-29 | 中煤航测遥感集团有限公司 | Texture acquisition method and device |
CN107917701A (en) * | 2017-12-28 | 2018-04-17 | 人加智能机器人技术(北京)有限公司 | Measuring method and RGBD camera systems based on active binocular stereo vision |
CN108537805B (en) * | 2018-04-16 | 2021-09-21 | 中北大学 | Target identification method based on feature geometric benefits |
CN108534782B (en) * | 2018-04-16 | 2021-08-17 | 电子科技大学 | Binocular vision system-based landmark map vehicle instant positioning method |
CN108765328B (en) * | 2018-05-18 | 2021-08-27 | 凌美芯(北京)科技有限责任公司 | High-precision multi-feature plane template and distortion optimization and calibration method thereof |
CN108960280B (en) * | 2018-05-21 | 2020-07-24 | 北京中科闻歌科技股份有限公司 | Picture similarity detection method and system |
CN109682385A (en) * | 2018-11-05 | 2019-04-26 | 天津大学 | A method of instant positioning and map structuring based on ORB feature |
CN110012280B (en) * | 2019-03-22 | 2020-12-18 | 盎锐(上海)信息科技有限公司 | TOF module for VSLAM system and VSLAM calculation method |
WO2021042376A1 (en) * | 2019-09-06 | 2021-03-11 | 罗伯特·博世有限公司 | Calibration method and apparatus for industrial robot, three-dimensional environment modeling method and device for industrial robot, computer storage medium, and industrial robot operating platform |
CN110610517A (en) * | 2019-09-18 | 2019-12-24 | 电子科技大学 | Method for detecting heat source center in three-dimensional space |
CN110587579A (en) * | 2019-09-30 | 2019-12-20 | 厦门大学嘉庚学院 | Kinect-based robot teaching programming guiding method |
CN111178138B (en) * | 2019-12-04 | 2021-01-12 | 国电南瑞科技股份有限公司 | Distribution network wire operating point detection method and device based on laser point cloud and binocular vision |
CN111275810B (en) * | 2020-01-17 | 2022-06-24 | 五邑大学 | K nearest neighbor point cloud filtering method and device based on image processing and storage medium |
CN113724365B (en) * | 2020-05-22 | 2023-09-26 | 杭州海康威视数字技术股份有限公司 | Three-dimensional reconstruction method and device |
CN111932670B (en) * | 2020-08-13 | 2021-09-28 | 北京未澜科技有限公司 | Three-dimensional human body self-portrait reconstruction method and system based on single RGBD camera |
CN112001955A (en) * | 2020-08-24 | 2020-11-27 | 深圳市建设综合勘察设计院有限公司 | Point cloud registration method and system based on two-dimensional projection plane matching constraint |
CN112116703A (en) * | 2020-09-08 | 2020-12-22 | 苏州小优智能科技有限公司 | 3D camera and infrared light scanning algorithm for aligning point cloud and color texture |
CN112269851B (en) * | 2020-11-16 | 2024-05-17 | Oppo广东移动通信有限公司 | Map data updating method and device, storage medium and electronic equipment |
CN112541932B (en) * | 2020-11-30 | 2024-03-26 | 西安电子科技大学昆山创新研究院 | Multi-source image registration method based on different focal length transformation parameters of dual-light camera |
CN113470085B (en) * | 2021-05-19 | 2023-02-10 | 西安电子科技大学 | Improved RANSAC-based image registration method |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101976455A (en) * | 2010-10-08 | 2011-02-16 | 东南大学 | Color image three-dimensional reconstruction method based on three-dimensional matching |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6965386B2 (en) * | 2001-12-20 | 2005-11-15 | Siemens Corporate Research, Inc. | Method for three dimensional image reconstruction |
-
2013
- 2013-02-20 CN CN201310053829.3A patent/CN103106688B/en not_active Expired - Fee Related
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101976455A (en) * | 2010-10-08 | 2011-02-16 | 东南大学 | Color image three-dimensional reconstruction method based on three-dimensional matching |
Non-Patent Citations (2)
Title |
---|
RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments; Peter Henry et al.; The 12th International Symposium on Experimental Robotics; 2010-12-31; pp. 1-15 *
Fast Object Reconstruction Based on GPU and Kinect; Liu Xin et al.; Acta Automatica Sinica; 2012-08-31; vol. 38, no. 8, pp. 1288-1297 *
Also Published As
Publication number | Publication date |
---|---|
CN103106688A (en) | 2013-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103106688B (en) | Based on the indoor method for reconstructing three-dimensional scene of double-deck method for registering | |
CN111815757B (en) | Large member three-dimensional reconstruction method based on image sequence | |
CN103247075B (en) | Based on the indoor environment three-dimensional rebuilding method of variation mechanism | |
CN109242954B (en) | Multi-view three-dimensional human body reconstruction method based on template deformation | |
CN107679537B (en) | A kind of texture-free spatial target posture algorithm for estimating based on profile point ORB characteristic matching | |
CN103761737B (en) | Robot motion's method of estimation based on dense optical flow | |
CN104240289B (en) | Three-dimensional digitalization reconstruction method and system based on single camera | |
CN106910242A (en) | The method and system of indoor full scene three-dimensional reconstruction are carried out based on depth camera | |
CN106780592A (en) | Kinect depth reconstruction algorithms based on camera motion and image light and shade | |
CN104539928B (en) | A kind of grating stereo printing image combining method | |
CN106485690A (en) | Cloud data based on a feature and the autoregistration fusion method of optical image | |
CN105096386A (en) | Method for automatically generating geographic maps for large-range complex urban environment | |
CN102750697A (en) | Parameter calibration method and device | |
CN104574432B (en) | Three-dimensional face reconstruction method and three-dimensional face reconstruction system for automatic multi-view-angle face auto-shooting image | |
CN103400409A (en) | 3D (three-dimensional) visualization method for coverage range based on quick estimation of attitude of camera | |
CN109325995B (en) | Low-resolution multi-view hand reconstruction method based on hand parameter model | |
CN103839277A (en) | Mobile augmented reality registration method of outdoor wide-range natural scene | |
CN111462302B (en) | Multi-view human body dynamic three-dimensional reconstruction method and system based on depth coding network | |
CN112991420A (en) | Stereo matching feature extraction and post-processing method for disparity map | |
Stucker et al. | ResDepth: Learned residual stereo reconstruction | |
Alcantarilla et al. | Large-scale dense 3D reconstruction from stereo imagery | |
CN107610219A (en) | The thick densification method of Pixel-level point cloud that geometry clue perceives in a kind of three-dimensional scenic reconstruct | |
CN101661623A (en) | Three-dimensional tracking method of deformable body based on linear programming | |
Yong-guo et al. | The navigation of mobile robot based on stereo vision | |
CN108734148A (en) | A kind of public arena image information collecting unmanned aerial vehicle control system based on cloud computing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20160427 Termination date: 20200220 |