CN111105460A

CN111105460A - RGB-D camera pose estimation method for indoor scene three-dimensional reconstruction

Info

Publication number: CN111105460A
Application number: CN201911361680.9A
Authority: CN
Inventors: 李纯明; 方硕
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2019-12-26
Filing date: 2019-12-26
Publication date: 2020-05-05
Anticipated expiration: 2039-12-26
Also published as: CN111105460B

Abstract

The invention provides an RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scenes, which combines the camera pose and depth map joint optimization of local interframes with a camera pose estimation method for combining RGB-D feature matching, eliminates the influence of single-frame depth noise or cavities on feature matching and camera pose estimation after feature matching by using dense RGB-D alignment of the local interframes, and can also reduce redundant RGB-D information; by combining the feature extraction and matching of the RGB and the depth information, the camera pose estimation error caused by RGB repeated texture and weak texture can be reduced. The invention solves the problems of serious depth loss, repeated texture and structure, weak texture, severe illumination change, severe camera motion and the like caused by distance limitation or infrared interference.

Description

RGB-D camera pose estimation method for indoor scene three-dimensional reconstruction

Technical Field

The invention belongs to the technical field of positioning and tracking, and particularly relates to an RGB-D camera pose estimation method for three-dimensional reconstruction of an indoor scene.

Background

At present, with the rise of a plurality of consumption-level RGB-D camera products, a plurality of teams at home and abroad are dedicated to the research on a more robust, accurate, efficient and large-scale RGB-D camera three-dimensional reconstruction technology. And the camera pose estimation, namely the estimation of the inter-frame relative transformation matrix T (the rotation matrix R and the translational vector T), is the most important link in the three-dimensional reconstruction based on the RGB-D camera.

The current camera pose estimation method based on the RGB-D camera mainly comprises the following steps: feature point methods, direct methods, iterative nearest neighbor algorithms (ICP), RGB-D alignment methods, and the like. The feature point method and the direct method only use RGB information to estimate the position and posture of the camera, and abandon the use of depth information. The feature point method estimates the pose by using a feature point matching method, is suitable for scenes capable of providing rich feature points and can be relocated by using the feature points, but the information used by the feature points is too little and time-consuming in calculation, most information in RGB images is lost, and the feature point method is often invalid in the environment of weak texture and repeated texture; the direct method can obtain a dense or semi-dense map without calculating a feature descriptor, so that the map can be normally used under the condition of feature loss, but the assumption that the gray scale is unchanged is too severe, the requirement that the camera movement speed cannot be too fast, automatic exposure cannot be carried out and the like is unfavorable to the conditions that the illumination change is large and the camera movement is large; the traditional iterative nearest neighbor algorithm (ICP) calculates the optimal rigid body transformation by repeatedly selecting the corresponding relationship point pairs, using only depth information, but not RGB information, then applies the transformation, finds the corresponding relationship point pairs, and calculates a new optimal transformation until the convergence accuracy requirement for correct registration is satisfied. Although the geometrical structural features of the point cloud are fully utilized, the point cloud does not depend on RGB features and luminosity, but is sensitive to the initial value of the pose and needs a better initial value, the method of matching the RGB feature point method as the initial value of the ICP method is also adopted to provide the better initial value of the ICP algorithm, but the dependence on the RGB features is increased, and weak textures cannot be better processed. The current RGB-D three-dimensional reconstruction algorithms Kinectfusion, Elasticfusion and a plurality of variants of the Kinectfusion and Elasticfusion which have wide influence mainly comprise a camera pose tracking link based on an ICP algorithm; the RGB-D alignment method uses both RGB information and depth information to solve for the relative camera pose between two frames by minimizing depth and photometric errors. The framework of the BundleFusion algorithm is an RGB-D alignment based approach. But noise problems with depth cameras often affect the quality of RGB-D alignment.

Therefore, how to solve the problems of serious depth loss, repeated textures and structures, weak textures, severe illumination change, severe camera motion and the like caused by distance limitation or infrared interference is very worth paying attention to the method for accurately and robustly estimating the pose change of the camera and realizing the three-dimensional reconstruction of the indoor scene.

Disclosure of Invention

Aiming at the defects in the prior art, the RGB-D camera pose estimation method for the indoor scene three-dimensional reconstruction solves the problems of serious depth loss, repeated textures and structures, weak textures, severe illumination change, severe camera motion and the like caused by distance limitation or infrared interference.

In order to achieve the above purpose, the invention adopts the technical scheme that:

the scheme provides an indoor scene three-dimensional reconstruction RGB-D camera pose estimation method, which comprises the following steps:

s1, acquiring each RGB-D frame in the RGB-D camera;

s2, aligning the RGB image and the depth image according to each RGB-D frame, preprocessing the depth image, and deleting abnormal depth data to obtain an aligned RGB-D frame;

s3, performing optical flow tracking on the RGB image according to the aligned RGB-D frame, and determining a local alignment and optimization interval of the pose estimation of the RGB-D camera;

s4, performing RGB-D camera pose estimation on the RGB-D frames in the local optimization interval, and converting the RGB-D information in the interval into an RGB-D key frame coordinate system of the interval to obtain optimized RGB-D key frames;

and S5, extracting and matching feature points of the optimized RGB-D key frames by combining the RGB-D information to obtain pose estimation among the RGB-D key frames, and finishing estimation of the RGB-D camera pose of the indoor scene three-dimensional reconstruction.

Further, the abnormal depth data in step S2 includes:

points outside the RGB-D camera effective distance;

3D points with the distance from the closest point in the RGB-D frame point cloud being greater than a preset threshold value, wherein the threshold value is 0.9 times of the maximum point pair distance of the frame point cloud; and

and respectively forming an included angle between a 3D point in the RGB-D frame and a transverse and longitudinal main optical axis to exceed a preset threshold value, wherein the threshold value of the included angle of the main optical axis is 60-70 degrees.

Still further, the step S3 includes the following steps:

s301, extracting an ORB corner point of an aligned RGB image of a first frame RGB-D, and extracting an ORB corner point of an aligned RGB image of a next frame RGB-D;

s302, performing optical flow tracking based on unchanged luminosity according to the extracted ORB angular points, and judging whether the optical flow tracking is successful, if so, entering a step S303, otherwise, entering a step S304;

s303, calculating by using an epipolar geometry method to obtain the relative pose of the adjacent RGB-D interframe cameras, judging whether the L-2 norm of a lie algebra with the changed relative pose is within a preset threshold value, if so, recording the RGB-D frames as frames to be selected of a local optimization interval, returning to the step S302, and otherwise, entering the step S304;

s304, judging whether the current RGB-D frame is the first frame and has no RGB-D frame to be selected, if so, entering the step S305, otherwise, entering the step S306;

s305, recording the RGB-D frame as a group of frames to be selected of a new local optimization interval, judging whether a next RGB-D frame exists, if so, returning to the step S304, otherwise, returning to the step S302;

s306, forming a group of RGB-D camera pose estimation local alignment and optimization intervals by all current RGB-D frames to be selected, entering step S4, recording the RGB-D frames as a group of new frames to be selected of local optimization intervals, judging whether a next RGB-D frame exists, if so, returning to step S304, otherwise, returning to step S302.

Still further, the threshold value in step S303 is 10.

Still further, the step S4 includes the following steps:

s401, according to the RGB-D sequence in the local optimization interval, selecting the second time in the interval

The RGB-D frame of (1) is a key frame of the local optimization interval, wherein n is_iThe number of RGB-D frames representing the ith local optimization interval,

represents rounding down;

s402, according to the key frame, calculating by utilizing the minimized inverse depth error and the photometric error to obtain the camera pose in each local optimization interval;

and S403, transforming the 3D points of the adjacent RGB-D frames in the camera pose in the local optimization interval to the camera coordinate system of the key frame to obtain the optimized RGB-D key frame.

Still further, the camera pose T in each local optimization interval in step S402 satisfies the following expression:

wherein E is_zTo reverse depth error, E_IPhotometric error, α, is the relative weight that balances the inverse depth error with the photometric error, z (X)_j) Represents a key point X_jDepth at i-th frame, Z_j(x_j) Representing a keypoint X on the depth image of the jth frame_jProjection position and key point X of_jCorresponding depth, p_ZFor an inverse depth error robustness function, I_i(x_i) Representing point x on the ith frame_iCorresponding luminosity, p_IIs a photometric error-robustness function, x_i2D feature point location, E, representing the current keyframe_alignThe total error is indicated.

Still further, the step S5 includes the following steps:

s501, extracting key points of the optimized RGB-D key frame in combination with the RGB image and the depth image in combination with the RGB-D information;

s502, combining a two-dimensional image feature descriptor SIFT and a three-dimensional point cloud feature descriptor FPFH according to the key points to generate a joint descriptor;

s503, matching corresponding points between the RGB-D key frames according to the joint descriptors;

s504, filtering the RGB-D key frames by utilizing a PnP algorithm, eliminating wrong matching point pairs, obtaining pose estimation between the RGB-D key frames, and finishing estimation of the pose of the RGB-D camera for three-dimensional reconstruction of the indoor scene.

Still further, the step S504 includes the steps of:

s5041, randomly selecting 8 groups of matching point pairs obtained in the step S503, and calculating by utilizing a PnP algorithm to obtain a rotation matrix R and a translational vector t of the pose of the camera;

s5042, forming a judgment function by using a 3D point reprojection error, an epipolar geometric model and a homography matrix error according to the rotation matrix R and the translational vector t of the camera pose;

s5043, judging whether the random matching pairs are eliminated or not according to the judgment function, if so, entering the step S5044, and if not, returning to the step S5041;

s5044, eliminating all matching point pairs which do not meet the judgment function, calculating the pose estimation between the RGB-D key frames by utilizing a PnP algorithm according to all matching point pairs which meet the judgment function, and finishing the estimation of the RGB-D camera pose of the indoor scene three-dimensional reconstruction.

Still further, the expression of the RGB-D camera pose estimation E (R, t) in step S504 is as follows:

wherein K represents an internal reference matrix, g_iRepresenting the 3D characteristic point of the ith key frame, R representing a rotation matrix, t representing a balance vector, x_iAnd 2D feature points representing the ith key frame.

The invention has the beneficial effects that:

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The following description of the embodiments of the present invention is provided to facilitate the understanding of the present invention by those skilled in the art, but it should be understood that the present invention is not limited to the scope of the embodiments, and it will be apparent to those skilled in the art that various changes may be made without departing from the spirit and scope of the invention as defined and defined in the appended claims, and all matters produced by the invention using the inventive concept are protected.

Examples

In order to solve the problems of serious depth loss, repeated textures and structures, weak textures, severe illumination change, severe camera motion and the like caused by distance limitation or infrared interference, as shown in fig. 1, the invention provides an RGB-D camera pose estimation method for three-dimensional reconstruction of an indoor scene, which comprises the following steps:

s1, acquiring RGB-D information of each frame in the RGB-D camera;

s2, aligning the RGB image and the depth image according to the RGB-D information, preprocessing the depth image, and deleting abnormal depth data;

the abnormal depth data in step S2 includes any one of the following condition points:

the first condition is:

points outside the RGB-D camera effective distance;

the second condition is:

3D points with the distance from the closest point in the RGB-D frame point cloud being greater than a preset threshold value, wherein the threshold value is 0.9 times of the maximum point pair distance of the frame point cloud;

the third condition is:

and respectively enabling included angles between the 3D points in the RGB-D frame and the transverse and longitudinal main optical axes to exceed a preset threshold, wherein the threshold of the included angle of the main optical axis is 60-70 degrees.

S3, performing optical flow tracking on the RGB image according to the aligned RGB-D frame, and determining a local alignment and optimization interval of the pose estimation of the RGB-D camera, wherein the implementation method comprises the following steps:

In this embodiment, for the RGB-D data corresponding to the preprocessed pixel coordinates one to one, optical flow tracking is performed on the RGB image to determine a local RGB-D information alignment and optimization interval, where the optical flow tracking of each frame of RGB image specifically includes extraction of an ORB corner point and optical flow tracking based on a luminosity invariant assumption, which can preliminarily estimate pose change between two frames. Continuous frames with successful optical flow tracking and pose change within a given threshold value form a local optimization interval; if the optical flow tracking fails or the pose change exceeds a given threshold, it is the start of the next local optimization interval.

S4, performing RGB-D camera pose estimation on the RGB-D frames in the local optimization interval, and converting the RGB-D information in the interval into an RGB-D key frame coordinate system of the interval to obtain an optimized RGB-D key frame, wherein the implementation method comprises the following steps:

represents rounding down;

the camera pose T in each local optimization interval satisfies the following expression:

wherein E is_zTo reverse depth error, E_IPhotometric error, α, is the relative weight that balances the inverse depth error with the photometric error, z (X)_j) Represents a key point X_jDepth at i-th frame, Z_j(x_j) Representing a keypoint X on the depth image of the jth frame_jProjection position and key point X of_jCorresponding depth, p_ZFor an inverse depth error robustness function, I_i(x_i) Representing point x on the ith frame_iCorresponding luminosity, p_IIs a photometric error-robustness function, x_i2D feature point location, E, representing the current keyframe_alignRepresents the total error;

and S403, transforming the 3D points of the adjacent RGB-D frames in the optimized camera pose to the camera coordinate system of the key frame to obtain the optimized RGB-D key frame.

In this embodiment, in the step S3, the fine-scale RGB-D information matching alignment of the RGB-D frames in each local optimization interval is realized, and the camera pose in each local optimization interval is solved by minimizing the inverse depth error and the photometric error. Selecting the first in the section

RGB-D is the key frame of the segment, and the adjacent RGB-D frame 3D point after pose optimization is transformed to the key frame camera coordinate system.

S5, extracting and matching feature points of the optimized RGB-D key frames by combining the RGB-D information to obtain pose estimation among the RGB-D key frames, and finishing estimation of the pose of the RGB-D camera for three-dimensional reconstruction of the indoor scene, wherein the implementation method comprises the following steps:

s504, filtering the RGB-D key frames by utilizing a PnP algorithm, eliminating wrong matching point pairs to obtain pose estimation between the RGB-D key frames, and finishing estimation of the pose of the RGB-D camera for three-dimensional reconstruction of an indoor scene, wherein the implementation method comprises the following steps:

The expression of the camera pose estimation E (R, t) in step S504 is as follows:

wherein K represents an internal reference matrix, g_iRepresenting the 3D characteristic point of the ith key frame, R representing a rotation matrix, t representing a translation vector, x_iAnd 2D feature points representing the ith key frame.

In the embodiment, for each optimized RGB-D key frame, key points combining an RGB camera and depth information are extracted, SIFT descriptors and FPFH descriptors are combined to generate a combined descriptor, matching of corresponding points between the RGB-D key frames is performed, an RANSAC algorithm is used for eliminating wrong matching point pairs, and a PnP algorithm is used for estimating the position and posture of the camera, so that registration of point clouds between the RGB-D key frames is achieved, and a three-dimensional point cloud model with a complete indoor scene is obtained.

According to the design, the camera pose and depth map joint optimization between local frames is combined with a camera pose estimation method combining RGB-D feature matching, the influence of single-frame depth noise or cavities on feature matching and camera pose estimation after feature matching is eliminated by utilizing dense RGB-D alignment between local frames, and redundant RGB-D information can be reduced; by combining the feature extraction and matching of the RGB and the depth information, the camera pose estimation error caused by RGB repeated texture and weak texture can be reduced. The problems of serious depth loss, repeated textures and structures, weak textures, severe illumination change, severe camera motion and the like caused by distance limitation or infrared interference are solved.

Claims

1. An indoor scene three-dimensional reconstruction RGB-D camera pose estimation method is characterized by comprising the following steps:

s1, acquiring each RGB-D frame in the RGB-D camera;

2. The RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scenes according to claim 1, wherein the abnormal depth data in the step S2 includes:

points outside the RGB-D camera effective distance;

3. The RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scenes according to claim 1, wherein the step S3 includes the steps of:

4. The RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene according to claim 3, wherein the threshold value in the step S303 is 10.

5. The RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scenes according to claim 1, wherein the step S4 includes the steps of:

represents rounding down;

6. The RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scenes according to claim 5, wherein the camera pose T in each local optimization interval in the step S402 satisfies the following expression:

7. The RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scenes according to claim 1, wherein the step S5 includes the steps of:

8. The RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scenes according to claim 7, wherein the step S504 includes the steps of:

9. The RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene according to claim 7, wherein the expression of the RGB-D camera pose estimation E (R, t) in step S504 is as follows: