CN111105460B - RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene - Google Patents

RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene

Info

Publication number
CN111105460B
Authority
CN
China
Prior art keywords
rgb
frame
camera
pose
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911361680.9A
Other languages
Chinese (zh)
Other versions
CN111105460A (en)
Inventor
李纯明
方硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201911361680.9A
Publication of CN111105460A
Application granted
Publication of CN111105460B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/269Analysis of motion using gradient-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • G06T2207/10021Stereoscopic video; Stereoscopic image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides an RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scenes, which combines joint optimization of the camera pose and the depth map between local frames with camera pose estimation based on joint RGB-D feature matching. Dense RGB-D alignment between local frames removes the influence of single-frame depth noise or holes on the subsequent feature matching and camera pose estimation, and also reduces redundant RGB-D information; feature extraction and matching that use both RGB and depth information reduce the camera pose estimation errors caused by repeated and weak RGB textures. The invention addresses problems such as severe depth loss caused by distance limits or infrared interference, repeated textures and structures, weak textures, strong illumination changes, and rapid camera motion.

Description

RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene
Technical Field
The invention belongs to the technical field of positioning and tracking, and particularly relates to an RGB-D camera pose estimation method for three-dimensional reconstruction of an indoor scene.
Background
At present, with the rise of many consumer-grade RGB-D camera products, numerous research groups in China and abroad are working on more robust, accurate, efficient and large-scale three-dimensional reconstruction techniques for RGB-D cameras. Camera pose estimation, i.e. estimating the relative transformation matrix T (rotation matrix R and translation vector t) between frames, is the most important link in three-dimensional reconstruction based on RGB-D cameras.
Current camera pose estimation methods based on RGB-D cameras mainly include the feature point method, the direct method, the iterative closest point (ICP) algorithm and the RGB-D alignment method. The feature point method and the direct method estimate the camera pose from RGB information only and discard the depth information. The feature point method estimates the pose from feature point matches; it suits scenes that provide rich feature points and supports relocalization from those features, but it exploits too little information and its computation is time-consuming, so most of the information in the RGB images is lost, and pose estimation often fails in weak-texture and repeated-texture environments. The direct method can obtain a dense or semi-dense map without computing feature descriptors and therefore still works when features are missing; however, it relies on the grayscale-invariance assumption and requires that the camera does not move too fast and that automatic exposure is disabled, so large illumination changes and fast camera motion are unfavorable to it. The traditional iterative closest point (ICP) algorithm uses only depth information and no RGB information: it repeatedly selects corresponding point pairs, computes the optimal rigid transformation, applies it, searches for new correspondences and computes a new optimal transformation until the convergence accuracy required for correct registration is met. Although ICP fully exploits the geometric structure of the point cloud and does not depend on RGB features or photometry, it is sensitive to the initial pose and needs a good initial value; methods that use RGB feature point matching to provide a good initial value for the ICP algorithm therefore reintroduce the dependence on RGB features and still cannot handle weak textures well. The camera pose tracking components of the widely influential RGB-D three-dimensional reconstruction algorithms KinectFusion, ElasticFusion and many of their variants are mainly based on the ICP algorithm. The RGB-D alignment method uses RGB information and depth information simultaneously and solves the relative camera pose between two frames by minimizing the depth error and the photometric error; the BundleFusion algorithm is built on RGB-D alignment. However, the noise of depth cameras often degrades the quality of RGB-D alignment.
Therefore, how to cope with problems such as severe depth loss caused by distance limits or infrared interference, repeated textures and structures, weak textures, strong illumination changes and rapid camera motion, so as to estimate the camera pose change accurately and robustly and realize three-dimensional reconstruction of indoor scenes, is a question of great interest.
Disclosure of Invention
Aiming at the above defects in the prior art, the RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scenes provided by the invention solves problems such as severe depth loss caused by distance limits or infrared interference, repeated textures and structures, weak textures, strong illumination changes and rapid camera motion.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the scheme provides an RGB-D camera pose estimation method for three-dimensional reconstruction of an indoor scene, which comprises the following steps:
s1, acquiring each RGB-D frame in an RGB-D camera;
s2, aligning an RGB image with a depth image according to each RGB-D frame, preprocessing the depth image, deleting abnormal depth data, and obtaining aligned RGB-D frames;
s3, performing optical flow tracking on the RGB image according to the aligned RGB-D frame, and determining a local alignment and optimization interval of the RGB-D camera pose estimation;
s4, carrying out RGB-D camera pose estimation on the RGB-D frames in the local optimization interval, and converting the RGB-D information in the interval into an RGB-D key frame coordinate system of the interval to obtain optimized RGB-D key frames;
and S5, extracting and matching feature points of the optimized RGB-D key frames by combining with RGB-D information to obtain pose estimation among the RGB-D key frames, and completing the estimation of the pose of the RGB-D camera for three-dimensional reconstruction of the indoor scene.
Further, the abnormal depth data in step S2 includes:
points outside the RGB-D camera effective distance;
3D points with the distance to the nearest point in the RGB-D frame point cloud being larger than a preset threshold, wherein the threshold is 0.9 times of the maximum point-to-point distance of the frame point cloud; and
3D points in the RGB-D frame whose included angle with the principal optical axis, in either the horizontal or the vertical direction, exceeds a preset threshold, the angle threshold being 60-70 degrees.
Still further, the step S3 includes the steps of:
s301, extracting ORB corner points of the RGB image of the first frame of the aligned RGB-D, and extracting ORB corner points of the RGB image of the next frame of the aligned RGB-D;
s302, performing optical flow tracking based on unchanged luminosity according to the extracted ORB corner points, judging whether the optical flow tracking is successful, if yes, entering a step S303, otherwise, entering a step S304;
s303, calculating the relative camera pose between adjacent RGB-D frames by the epipolar geometry method, and judging whether the L2 norm of the Lie-algebra representation of the relative pose change is within a preset threshold, if so, recording the RGB-D frame as a frame to be selected of the local optimization interval and returning to the step S302, otherwise, entering the step S304;
s304, judging whether the current RGB-D frame is a first frame and no RGB-D frame to be selected exists, if so, entering a step S305, otherwise, entering a step S306;
s305, recording the RGB-D frame as a group of new frames to be selected in a local optimization interval, judging whether the next RGB-D frame exists, if so, returning to the step S304, otherwise, returning to the step S302;
s306, forming a group of local alignment and optimization intervals of RGB-D camera pose estimation by all current RGB-D frames to be selected, entering a step S4, recording the RGB-D frames as a group of new frames to be selected of the local optimization interval, judging whether the next RGB-D frame exists, if so, returning to the step S304, otherwise, returning to the step S302.
Still further, the threshold in step S303 is 10.
Still further, the step S4 includes the steps of:
s401, according to the RGB-D sequence in the local optimization interval, selecting the ⌊n_i/2⌋-th RGB-D frame of the interval as the key frame of the local optimization interval, where n_i denotes the number of RGB-D frames of the i-th local optimization interval and ⌊·⌋ denotes rounding down;
s402, calculating the camera pose in each local optimization interval by using a minimized inverse depth error and a luminosity error according to the key frame;
s403, transforming 3D points of adjacent RGB-D frames in the camera pose in the local optimization interval to the camera coordinate system of the key frame to obtain the optimized RGB-D key frame.
Still further, the camera pose T in each local optimization interval in step S402 satisfies the following expressions:
T = argmin_T E_align = argmin_T ( E_Z + α·E_I )
E_Z = Σ_j ρ_Z( 1/z(X_j) − 1/Z_j(x_j) )
E_I = Σ_j ρ_I( I_i(x_i) − I_j(x_j) )
wherein E_Z is the inverse depth error, E_I is the photometric (luminosity) error, α is the relative weight balancing the inverse depth error and the photometric error, z(X_j) denotes the depth of the key point X_j at the i-th frame, Z_j(x_j) denotes the depth on the depth image of the j-th frame at the projection position x_j of the key point X_j, ρ_Z is the inverse-depth-error robust function, I_i(x_i) denotes the luminosity of the point x_i on the i-th frame, I_j(x_j) denotes the luminosity at the projection position x_j on the j-th frame, ρ_I is the photometric-error robust function, x_i denotes the 2D feature point position of the current key frame, and E_align denotes the total error.
Still further, the step S5 includes the steps of:
s501, extracting key points combining an RGB image and a depth image from the optimized RGB-D key frame by combining RGB-D information;
s502, combining a two-dimensional image feature descriptor SIFT and a three-dimensional point cloud feature descriptor FPFH according to the key points to generate a joint descriptor;
s503, matching corresponding points between the RGB-D key frames according to the joint descriptors;
s504, filtering the RGB-D key frames by using a PnP algorithm, removing wrong matching point pairs, obtaining pose estimation among the RGB-D key frames, and completing the estimation of the pose of the RGB-D camera for three-dimensional reconstruction of the indoor scene.
Still further, the step S504 includes the steps of:
s5041, randomly selecting 8 groups of matching point pairs obtained in the step S503, and calculating a rotation matrix R and a translation vector t of the pose of the camera by using a PnP algorithm;
s5042, forming a judging function by utilizing the 3D point re-projection error, the epipolar geometric model and the homography matrix error according to the rotation matrix R and the translation vector t of the camera pose;
s5043, judging whether to reject the random matching pair according to the judging function, if so, entering a step S5044, otherwise, returning to the step S5041;
s5044, eliminating all matching point pairs which do not meet the judging function, calculating to obtain pose estimation among RGB-D key frames by utilizing a PnP algorithm according to all matching point pairs which meet the judging function, and completing estimation of the pose of the RGB-D camera for three-dimensional reconstruction of the indoor scene.
Still further, the expression of the RGB-D camera pose estimation E (R, t) in step S504 is as follows:
E(R, t) = Σ_i ‖ x_i − (1/s_i)·K·(R·g_i + t) ‖_2^2
wherein K represents the camera internal reference (intrinsic) matrix, g_i represents the 3D feature points of the i-th key frame, R represents the rotation matrix, t represents the translation vector, s_i represents the depth of the transformed point R·g_i + t, and x_i represents the 2D feature points of the i-th key frame.
The invention has the beneficial effects that:
the invention provides an RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scenes, which combines the camera pose and depth map joint optimization between local frames with the camera pose estimation method for feature matching of joint RGB-D, eliminates the influence of single-frame depth noise or cavities on feature matching and camera pose estimation after the feature matching by using dense RGB-D alignment between the local frames, and can reduce redundant RGB-D information; the feature extraction and matching of RGB and depth information can reduce the camera pose estimation error caused by RGB repeated textures and weak textures. The invention solves the problems of serious depth loss, repeated textures and structures, weak textures, intense illumination change, intense camera movement and the like caused by distance limitation or infrared interference.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments; to those skilled in the art, all inventions that make use of the inventive concept fall within the protection scope of the invention as defined by the appended claims.
Examples
In order to solve the problems of serious depth loss, repeated textures and structures, weak textures, severe illumination change, severe camera movement and the like caused by distance limitation or infrared interference, as shown in fig. 1, the invention provides an RGB-D camera pose estimation method for three-dimensional reconstruction of an indoor scene, which comprises the following implementation steps:
s1, acquiring RGB-D information of each frame in an RGB-D camera;
s2, aligning an RGB image with a depth image according to the RGB-D information, preprocessing the depth image, and deleting abnormal depth data;
The abnormal depth data in step S2 includes points meeting any one of the following conditions:
The first condition:
points outside the effective distance range of the RGB-D camera;
The second condition:
3D points whose distance to the nearest point in the point cloud of the RGB-D frame is larger than a preset threshold, the threshold being 0.9 times the maximum point-to-point distance of the frame point cloud;
The third condition:
3D points in the RGB-D frame whose included angle with the principal optical axis, in either the horizontal or the vertical direction, exceeds a preset threshold, the angle threshold being 60-70 degrees.
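As an illustration only (not the patented implementation), the following Python sketch shows one possible form of the depth pre-processing of step S2; the pinhole intrinsics, the distance limits, and the reading of the 0.9 factor as 0.9 times the largest nearest-neighbour distance in the frame are assumptions made for this example.

```python
# Sketch of step S2 depth pre-processing: back-project the depth image and drop
# abnormal points according to the three conditions above. All parameter values
# (intrinsics, z_min/z_max, the 65-degree angle) are illustrative assumptions.
import numpy as np
from scipy.spatial import cKDTree

def preprocess_depth(depth, fx, fy, cx, cy, z_min=0.3, z_max=4.0, max_angle_deg=65.0):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Condition 1: points outside the effective distance range of the camera.
    keep = (pts[:, 2] > z_min) & (pts[:, 2] < z_max)

    # Condition 3: angle to the principal (z) axis, in the horizontal or the
    # vertical direction, exceeds the threshold (60-70 degrees in the patent).
    ang_h = np.degrees(np.arctan2(np.abs(pts[:, 0]), pts[:, 2]))
    ang_v = np.degrees(np.arctan2(np.abs(pts[:, 1]), pts[:, 2]))
    keep &= (ang_h < max_angle_deg) & (ang_v < max_angle_deg)
    pts = pts[keep]

    # Condition 2: isolated points; "0.9 times the maximum point-to-point distance"
    # is read here as 0.9 times the largest nearest-neighbour distance in the frame.
    nn_dist = cKDTree(pts).query(pts, k=2)[0][:, 1]
    return pts[nn_dist <= 0.9 * nn_dist.max()]
```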
S3, carrying out optical flow tracking on the RGB image according to the aligned RGB-D frame, and determining a local alignment and optimization interval of the RGB-D camera pose estimation, wherein the implementation method is as follows:
s301, extracting ORB corner points of the RGB image of the first frame of the aligned RGB-D, and extracting ORB corner points of the RGB image of the next frame of the aligned RGB-D;
s302, performing optical flow tracking based on unchanged luminosity according to the extracted ORB corner points, judging whether the optical flow tracking is successful, if yes, entering a step S303, otherwise, entering a step S304;
s303, calculating the relative camera pose between adjacent RGB-D frames by the epipolar geometry method, and judging whether the L2 norm of the Lie-algebra representation of the relative pose change is within a preset threshold, if so, recording the RGB-D frame as a frame to be selected of the local optimization interval and returning to the step S302, otherwise, entering the step S304;
s304, judging whether the current RGB-D frame is a first frame and no RGB-D frame to be selected exists, if so, entering a step S305, otherwise, entering a step S306;
s305, recording the RGB-D frame as a group of new frames to be selected in a local optimization interval, judging whether the next RGB-D frame exists, if so, returning to the step S304, otherwise, returning to the step S302;
s306, forming a group of local alignment and optimization intervals of RGB-D camera pose estimation by all current RGB-D frames to be selected, entering a step S4, recording the RGB-D frames as a group of new frames to be selected of the local optimization interval, judging whether the next RGB-D frame exists, if so, returning to the step S304, otherwise, returning to the step S302.
In this embodiment, for the preprocessed RGB-D data whose RGB and depth pixel coordinates are in one-to-one correspondence, optical flow tracking is performed on the RGB images to determine the local RGB-D information alignment and optimization intervals. For each RGB frame, the tracking consists of extracting ORB corners and tracking them by optical flow under the brightness-constancy assumption, so that the pose change between two frames can be estimated preliminarily. Consecutive frames for which optical flow tracking succeeds and the pose change stays within a given threshold form one local optimization interval; if optical flow tracking fails or the pose change exceeds the threshold, tracking is restarted in the next local optimization interval.
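For illustration, the following Python sketch (a sketch under stated assumptions, not the patented implementation; it assumes OpenCV is available and the pinhole intrinsic matrix K is known) shows one way to realize the per-frame tracking of step S3: ORB corners tracked by pyramidal Lucas-Kanade optical flow under the brightness-constancy assumption, the relative pose of adjacent frames recovered from epipolar geometry, and a check of the L2 norm of an approximate Lie-algebra vector of the pose change against the threshold (10 in the patent). Note that the translation recovered from the essential matrix is only defined up to scale, so the norm is an approximation.

```python
# Sketch of the per-frame optical-flow tracking and pose-change check of step S3.
import cv2
import numpy as np

def track_and_check(prev_gray, gray, K, pose_change_thresh=10.0):
    """True: frame stays in the current interval; False: pose change too large;
    None: optical flow tracking failed. prev_gray/gray are 8-bit grayscale images."""
    orb = cv2.ORB_create(nfeatures=1000)
    kps = orb.detect(prev_gray, None)
    if len(kps) < 8:
        return None  # too few corners for epipolar geometry
    p0 = cv2.KeyPoint_convert(kps).reshape(-1, 1, 2).astype(np.float32)

    # Pyramidal LK optical flow (brightness-constancy assumption).
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
    good0, good1 = p0[status.ravel() == 1], p1[status.ravel() == 1]
    if len(good0) < 8:
        return None  # tracking failed

    # Relative pose of adjacent frames from epipolar geometry
    # (translation direction only, up to scale).
    E, mask = cv2.findEssentialMat(good0, good1, K, method=cv2.RANSAC, threshold=1.0)
    if E is None or E.shape != (3, 3):
        return None
    _, R, t, _ = cv2.recoverPose(E, good0, good1, K, mask=mask)

    # Approximate Lie-algebra vector of the pose change: rotation vector and translation.
    rvec, _ = cv2.Rodrigues(R)
    xi_norm = float(np.linalg.norm(np.concatenate([rvec.ravel(), t.ravel()])))
    return xi_norm <= pose_change_thresh
```

Consecutive frames for which this check returns True are collected into the same local optimization interval; a False or None result closes the interval and a new one is started.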
S4, carrying out RGB-D camera pose estimation on the RGB-D frame in the local optimization interval, and converting the RGB-D information in the interval into an RGB-D key frame coordinate system of the interval to obtain an optimized RGB-D key frame, wherein the implementation method is as follows:
s401, according to the RGB-D sequence in the local optimization interval, selecting the ⌊n_i/2⌋-th RGB-D frame of the interval as the key frame of the local optimization interval, where n_i denotes the number of RGB-D frames of the i-th local optimization interval and ⌊·⌋ denotes rounding down;
s402, calculating the camera pose in each local optimization interval by using a minimized inverse depth error and a luminosity error according to the key frame;
the camera pose T in each local optimization interval satisfies the following expression:
T = argmin_T E_align = argmin_T ( E_Z + α·E_I )
E_Z = Σ_j ρ_Z( 1/z(X_j) − 1/Z_j(x_j) )
E_I = Σ_j ρ_I( I_i(x_i) − I_j(x_j) )
wherein E_Z is the inverse depth error, E_I is the photometric (luminosity) error, α is the relative weight balancing the inverse depth error and the photometric error, z(X_j) denotes the depth of the key point X_j at the i-th frame, Z_j(x_j) denotes the depth on the depth image of the j-th frame at the projection position x_j of the key point X_j, ρ_Z is the inverse-depth-error robust function, I_i(x_i) denotes the luminosity of the point x_i on the i-th frame, I_j(x_j) denotes the luminosity at the projection position x_j on the j-th frame, ρ_I is the photometric-error robust function, x_i denotes the 2D feature point position of the current key frame, and E_align denotes the total error;
s403, transforming 3D points of adjacent RGB-D frames in the optimized camera pose to a camera coordinate system of the key frame to obtain the optimized RGB-D key frame.
In this embodiment, fine-scale matching and alignment of the RGB-D information of the frames within each local optimization interval determined in step S3 is performed, and the camera pose within each interval is solved by minimizing the inverse depth error and the photometric error. The ⌊n_i/2⌋-th RGB-D frame of the interval is selected as the key frame of the interval, and the 3D points of the adjacent RGB-D frames, after pose optimization, are transformed into the camera coordinate system of the key frame.
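The following Python sketch illustrates, under stated assumptions, the joint cost of step S4 for one neighbouring frame j relative to the key frame: it stacks inverse-depth residuals and photometric residuals weighted by α. The exact residual forms, the Huber robust functions, the bilinear sampling and all parameter values are illustrative assumptions, and the actual minimization over the pose (e.g. with a nonlinear least-squares solver) is left outside the sketch.

```python
# Sketch of the local alignment cost E_align = E_Z + alpha * E_I of step S4.
import numpy as np

def huber(r, delta=1.0):
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r * r, delta * (a - 0.5 * delta))

def bilinear(img, x, y):
    # Bilinear sampling; coordinates are assumed to lie inside the image.
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * img[y0, x0] + wx * (1 - wy) * img[y0, x0 + 1] +
            (1 - wx) * wy * img[y0 + 1, x0] + wx * wy * img[y0 + 1, x0 + 1])

def align_cost(R, t, K, pts_key, x_key, gray_key, gray_j, depth_j, alpha=0.1):
    """Joint cost for key points: pts_key are 3D points in key-frame coordinates,
    x_key their pixel positions in the key frame, (R, t) the candidate pose of
    frame j with respect to the key frame."""
    h, w = depth_j.shape
    p_j = pts_key @ R.T + t                      # key points transformed into frame j
    uvw = p_j @ K.T
    x_j = uvw[:, :2] / uvw[:, 2:3]               # projections into frame j (pixels)

    inb = ((x_j[:, 0] >= 0) & (x_j[:, 0] < w - 1) &
           (x_j[:, 1] >= 0) & (x_j[:, 1] < h - 1) & (p_j[:, 2] > 1e-6))
    x_j, p_j, x_k = x_j[inb], p_j[inb], x_key[inb]

    # E_Z (one reading): inverse of the key point's predicted depth in frame j
    # minus the inverse of the depth measured at its projection position.
    z_meas = bilinear(depth_j, x_j[:, 0], x_j[:, 1])
    ok = z_meas > 1e-6
    r_z = 1.0 / p_j[ok, 2] - 1.0 / z_meas[ok]

    # E_I: photometric residuals between the key frame and frame j.
    r_i = (bilinear(gray_key, x_k[ok, 0], x_k[ok, 1]) -
           bilinear(gray_j, x_j[ok, 0], x_j[ok, 1]))

    return huber(r_z).sum() + alpha * huber(r_i).sum()
```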
S5, extracting and matching feature points of the optimized RGB-D key frames by combining with RGB-D information to obtain pose estimation among the RGB-D key frames, and completing the estimation of the pose of an RGB-D camera for three-dimensional reconstruction of an indoor scene, wherein the implementation method comprises the following steps:
s501, extracting key points combining an RGB image and a depth image from the optimized RGB-D key frame by combining RGB-D information;
s502, combining a two-dimensional image feature descriptor SIFT and a three-dimensional point cloud feature descriptor FPFH according to the key points to generate a joint descriptor;
s503, matching corresponding points between the RGB-D key frames according to the joint descriptors;
s504, filtering the RGB-D key frames by using a PnP algorithm, removing wrong matching point pairs, obtaining pose estimation among the RGB-D key frames, and completing the estimation of the pose of an RGB-D camera for three-dimensional reconstruction of an indoor scene, wherein the implementation method comprises the following steps:
s5041, randomly selecting 8 groups of matching point pairs obtained in the step S503, and calculating a rotation matrix R and a translation vector t of the pose of the camera by using a PnP algorithm;
s5042, forming a judging function by utilizing the 3D point re-projection error, the epipolar geometric model and the homography matrix error according to the rotation matrix R and the translation vector t of the camera pose;
s5043, judging whether to reject the random matching pair according to the judging function, if so, entering a step S5044, otherwise, returning to the step S5041;
s5044, eliminating all matching point pairs which do not meet the judging function, calculating to obtain pose estimation among RGB-D key frames by utilizing a PnP algorithm according to all matching point pairs which meet the judging function, and completing estimation of the pose of the RGB-D camera for three-dimensional reconstruction of the indoor scene.
The expression of the camera pose estimation E (R, t) in the step S504 is as follows:
E(R, t) = Σ_i ‖ x_i − (1/s_i)·K·(R·g_i + t) ‖_2^2
wherein K represents the camera internal reference (intrinsic) matrix, g_i represents the 3D feature points of the i-th key frame, R represents the rotation matrix, t represents the translation vector, s_i represents the depth of the transformed point R·g_i + t, and x_i represents the 2D feature points of the i-th key frame.
In this embodiment, for each optimized RGB-D key frame, key points combining the RGB image and the depth information are extracted, and the SIFT and FPFH descriptors are combined into a joint descriptor to match corresponding points between RGB-D key frames; a RANSAC scheme is used to remove erroneous matching point pairs and the PnP algorithm is used to estimate the camera pose, so that the point clouds of the RGB-D key frames are registered and a relatively complete three-dimensional point cloud model of the indoor scene is obtained.
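As a rough illustration (not the patented procedure), the following Python sketch combines SIFT descriptors computed on the RGB image with FPFH descriptors computed on the back-projected 3D key points into a joint descriptor, matches two key frames with mutual nearest neighbours, and estimates the relative pose with OpenCV's solvePnPRansac; that call stands in for the patent's own eight-point RANSAC loop with its combined reprojection/epipolar/homography decision function. The availability of OpenCV and Open3D and all parameter values are assumptions.

```python
# Sketch of step S5: joint SIFT + FPFH descriptors and RANSAC-PnP pose estimation.
import cv2
import numpy as np
import open3d as o3d

def backproject(kp_xy, depth, K):
    u, v = kp_xy[:, 0], kp_xy[:, 1]
    z = depth[v.astype(int), u.astype(int)]
    x = (u - K[0, 2]) / K[0, 0] * z
    y = (v - K[1, 2]) / K[1, 1] * z
    return np.stack([x, y, z], axis=1), z > 1e-6

def joint_descriptors(gray, depth, K):
    kps, d_sift = cv2.SIFT_create().detectAndCompute(gray, None)
    xy = np.array([kp.pt for kp in kps], dtype=np.float64)
    pts3d, valid = backproject(xy, depth, K)
    xy, pts3d, d_sift = xy[valid], pts3d[valid], d_sift[valid]

    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts3d))
    pcd.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=0.25, max_nn=100))
    d_fpfh = np.asarray(fpfh.data).T                      # N x 33

    l2n = lambda a: a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    joint = np.hstack([l2n(d_sift), l2n(d_fpfh)]).astype(np.float32)
    return xy, pts3d, joint

def keyframe_relative_pose(gray1, depth1, gray2, depth2, K):
    xy1, p3d1, d1 = joint_descriptors(gray1, depth1, K)
    xy2, _, d2 = joint_descriptors(gray2, depth2, K)
    # Mutual nearest-neighbour (cross-checked) matching of the joint descriptors.
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(d1, d2)
    obj = np.float32([p3d1[m.queryIdx] for m in matches])
    img = np.float32([xy2[m.trainIdx] for m in matches])
    # RANSAC-PnP: rejects wrong matches and estimates R, t of key frame 1's points
    # expressed in key frame 2's camera.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img, K.astype(np.float64), None,
                                                 reprojectionError=3.0,
                                                 iterationsCount=200)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec, inliers
```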
With the above design, the invention combines joint optimization of the camera pose and the depth map between local frames with camera pose estimation based on joint RGB-D feature matching. Dense RGB-D alignment between local frames removes the influence of single-frame depth noise or holes on the subsequent feature matching and camera pose estimation, and also reduces redundant RGB-D information; feature extraction and matching that use both RGB and depth information reduce the camera pose estimation errors caused by repeated and weak RGB textures. The problems of severe depth loss caused by distance limits or infrared interference, repeated textures and structures, weak textures, strong illumination changes and rapid camera motion are thereby addressed.

Claims (7)

1. An RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene is characterized by comprising the following steps:
s1, acquiring each RGB-D frame in an RGB-D camera;
s2, aligning an RGB image with a depth image according to each RGB-D frame, preprocessing the depth image, deleting abnormal depth data, and obtaining aligned RGB-D frames;
s3, performing optical flow tracking on the RGB image according to the aligned RGB-D frame, and determining a local alignment and optimization interval of the RGB-D camera pose estimation;
the step S3 includes the steps of:
s301, extracting ORB corner points of the RGB image of the first frame of the aligned RGB-D, and extracting ORB corner points of the RGB image of the next frame of the aligned RGB-D;
s302, performing optical flow tracking based on unchanged luminosity according to the extracted ORB corner points, judging whether the optical flow tracking is successful, if yes, entering a step S303, otherwise, entering a step S304;
s303, calculating the relative camera pose between adjacent RGB-D frames by the epipolar geometry method, and judging whether the L2 norm of the Lie-algebra representation of the relative pose change is within a preset threshold, if so, recording the RGB-D frame as a frame to be selected of the local optimization interval and returning to the step S302, otherwise, entering the step S304;
s304, judging whether the current RGB-D frame is a first frame and no RGB-D frame to be selected exists, if so, entering a step S305, otherwise, entering a step S306;
s305, recording the RGB-D frame as a group of new frames to be selected in a local optimization interval, judging whether the next RGB-D frame exists, if so, returning to the step S304, otherwise, returning to the step S302;
s306, forming a group of local alignment and optimization intervals of RGB-D camera pose estimation by all current RGB-D frames to be selected, entering a step S4, recording the RGB-D frames as a group of new frames to be selected of the local optimization interval, judging whether the next RGB-D frame exists, if so, returning to the step S304, otherwise, returning to the step S302;
s4, performing RGB-D camera pose estimation on the RGB-D frames in the local alignment and optimization interval, and converting the RGB-D information in the interval into an RGB-D key frame coordinate system of the interval to obtain an optimized RGB-D key frame;
s5, extracting and matching feature points of the optimized RGB-D key frames by combining with RGB-D information to obtain pose estimation among the RGB-D key frames, and completing the estimation of the pose of an RGB-D camera for three-dimensional reconstruction of the indoor scene;
the step S5 includes the steps of:
s501, extracting key points combining an RGB image and a depth image from the optimized RGB-D key frame by combining RGB-D information;
s502, combining a two-dimensional image feature descriptor SIFT and a three-dimensional point cloud feature descriptor FPFH according to the key points to generate a joint descriptor;
s503, matching corresponding points between the RGB-D key frames according to the joint descriptors;
s504, filtering the RGB-D key frames by using a PnP algorithm, removing wrong matching point pairs, obtaining pose estimation among the RGB-D key frames, and completing the estimation of the pose of the RGB-D camera for three-dimensional reconstruction of the indoor scene.
2. The method for estimating the pose of an RGB-D camera for three-dimensional reconstruction of an indoor scene according to claim 1, wherein the abnormal depth data in step S2 comprises:
points outside the RGB-D camera effective distance;
3D points with the distance to the nearest point in the RGB-D frame point cloud being larger than a preset threshold, wherein the threshold is 0.9 times of the maximum point-to-point distance of the frame point cloud; and
3D points in the RGB-D frame whose included angle with the principal optical axis, in either the horizontal or the vertical direction, exceeds a preset threshold, the angle threshold being 60-70 degrees.
3. The method for estimating the pose of an RGB-D camera for three-dimensional reconstruction of an indoor scene according to claim 2, wherein the threshold in step S303 is 10.
4. The method for estimating the pose of an RGB-D camera for three-dimensional reconstruction of an indoor scene according to claim 1, wherein said step S4 comprises the steps of:
s401, according to the RGB-D sequence in the local alignment and optimization interval, selecting the ⌊n_i/2⌋-th RGB-D frame of the interval as the key frame of the local optimization interval, where n_i denotes the number of RGB-D frames of the i-th local optimization interval and ⌊·⌋ denotes rounding down;
s402, calculating the camera pose in each local optimization interval by using a minimized inverse depth error and a luminosity error according to the key frame;
s403, transforming 3D points of adjacent RGB-D frames in the camera pose in the local optimization interval to the camera coordinate system of the key frame to obtain the optimized RGB-D key frame.
5. The method for estimating the pose of an RGB-D camera for three-dimensional reconstruction of an indoor scene according to claim 4, wherein the camera pose T in each local optimization interval in step S402 satisfies the following expression:
T = argmin_T E_align = argmin_T ( E_Z + α·E_I )
E_Z = Σ_j ρ_Z( 1/z(X_j) − 1/Z_j(x_j) )
E_I = Σ_j ρ_I( I_i(x_i) − I_j(x_j) )
wherein E_Z is the inverse depth error, E_I is the photometric (luminosity) error, α is the relative weight balancing the inverse depth error and the photometric error, z(X_j) denotes the depth of the key point X_j at the i-th frame, Z_j(x_j) denotes the depth on the depth image of the j-th frame at the projection position x_j of the key point X_j, ρ_Z is the inverse-depth-error robust function, I_i(x_i) denotes the luminosity of the point x_i on the i-th frame, I_j(x_j) denotes the luminosity at the projection position x_j on the j-th frame, ρ_I is the photometric-error robust function, x_i denotes the 2D feature point position of the current key frame, and E_align denotes the total error.
6. The method for estimating the pose of an RGB-D camera for three-dimensional reconstruction of an indoor scene according to claim 5, wherein said step S504 comprises the steps of:
s5041, randomly selecting 8 groups of matching point pairs obtained in the step S503, and calculating a rotation matrix R and a translation vector t of the pose of the camera by using a PnP algorithm;
s5042, forming a judging function by utilizing the 3D point re-projection error, the epipolar geometric model and the homography matrix error according to the rotation matrix R and the translation vector t of the camera pose;
s5043, judging whether to reject the random matching pair according to the judging function, if so, entering a step S5044, otherwise, returning to the step S5041;
s5044, eliminating all matching point pairs which do not meet the judging function, calculating to obtain pose estimation among RGB-D key frames by utilizing a PnP algorithm according to all matching point pairs which meet the judging function, and completing estimation of the pose of the RGB-D camera for three-dimensional reconstruction of the indoor scene.
7. The method for estimating the pose of the RGB-D camera for three-dimensional reconstruction of an indoor scene as set forth in claim 6, wherein the expression of the pose estimation E (R, t) of the RGB-D camera in step S504 is as follows:
E(R, t) = Σ_i ‖ x_i − (1/s_i)·K·(R·g_i + t) ‖_2^2
wherein K represents the camera internal reference (intrinsic) matrix, g_i represents the 3D feature points of the i-th key frame, R represents the rotation matrix, t represents the translation vector, s_i represents the depth of the transformed point R·g_i + t, and x_i represents the 2D feature points of the i-th key frame.
CN201911361680.9A 2019-12-26 2019-12-26 RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene Active CN111105460B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911361680.9A CN111105460B (en) 2019-12-26 2019-12-26 RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911361680.9A CN111105460B (en) 2019-12-26 2019-12-26 RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene

Publications (2)

Publication Number Publication Date
CN111105460A CN111105460A (en) 2020-05-05
CN111105460B (en) 2023-04-25

Family

ID=70425095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911361680.9A Active CN111105460B (en) 2019-12-26 2019-12-26 RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene

Country Status (1)

Country Link
CN (1) CN111105460B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113724365B (en) * 2020-05-22 2023-09-26 杭州海康威视数字技术股份有限公司 Three-dimensional reconstruction method and device
CN111915651B (en) * 2020-07-31 2023-09-12 西安电子科技大学 Visual pose real-time estimation method based on digital image map and feature point tracking
CN113284176B (en) * 2021-06-04 2022-08-16 深圳积木易搭科技技术有限公司 Online matching optimization method combining geometry and texture and three-dimensional scanning system
CN113724369A (en) * 2021-08-01 2021-11-30 国网江苏省电力有限公司徐州供电分公司 Scene-oriented three-dimensional reconstruction viewpoint planning method and system
CN113610001B (en) * 2021-08-09 2024-02-09 西安电子科技大学 Indoor mobile terminal positioning method based on combination of depth camera and IMU

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654492A (en) * 2015-12-30 2016-06-08 哈尔滨工业大学 Robust real-time three-dimensional (3D) reconstruction method based on consumer camera
CN109993113A (en) * 2019-03-29 2019-07-09 东北大学 A kind of position and orientation estimation method based on the fusion of RGB-D and IMU information
CN110223348A (en) * 2019-02-25 2019-09-10 湖南大学 Robot scene adaptive bit orientation estimation method based on RGB-D camera

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7925049B2 (en) * 2006-08-15 2011-04-12 Sri International Stereo-based visual odometry method and system
CN102034267A (en) * 2010-11-30 2011-04-27 中国科学院自动化研究所 Three-dimensional reconstruction method of target based on attention
US9332243B2 (en) * 2012-10-17 2016-05-03 DotProduct LLC Handheld portable optical scanner and method of using
CN105957017B (en) * 2016-06-24 2018-11-06 电子科技大学 A kind of video-splicing method based on self adaptation key frame sampling
KR101865173B1 (en) * 2017-02-03 2018-06-07 (주)플레이솔루션 Method for generating movement of motion simulator using image analysis of virtual reality contents
WO2018182524A1 (en) * 2017-03-29 2018-10-04 Agency For Science, Technology And Research Real time robust localization via visual inertial odometry
CN107025668B (en) * 2017-03-30 2020-08-18 华南理工大学 Design method of visual odometer based on depth camera
CN107292921B (en) * 2017-06-19 2020-02-04 电子科技大学 Rapid three-dimensional reconstruction method based on kinect camera
CN108062776B (en) * 2018-01-03 2019-05-24 百度在线网络技术(北京)有限公司 Camera Attitude Tracking method and apparatus
CN109387204B (en) * 2018-09-26 2020-08-28 东北大学 Mobile robot synchronous positioning and composition method facing indoor dynamic environment
CN109961506B (en) * 2019-03-13 2023-05-02 东南大学 Local scene three-dimensional reconstruction method for fusion improved Census diagram

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654492A (en) * 2015-12-30 2016-06-08 哈尔滨工业大学 Robust real-time three-dimensional (3D) reconstruction method based on consumer camera
CN110223348A (en) * 2019-02-25 2019-09-10 湖南大学 Robot scene adaptive bit orientation estimation method based on RGB-D camera
CN109993113A (en) * 2019-03-29 2019-07-09 东北大学 A kind of position and orientation estimation method based on the fusion of RGB-D and IMU information

Also Published As

Publication number Publication date
CN111105460A (en) 2020-05-05

Similar Documents

Publication Publication Date Title
CN111105460B (en) RGB-D camera pose estimation method for three-dimensional reconstruction of indoor scene
CN108986037B (en) Monocular vision odometer positioning method and positioning system based on semi-direct method
CN107025668B (en) Design method of visual odometer based on depth camera
CN108776989B (en) Low-texture planar scene reconstruction method based on sparse SLAM framework
CN108648161B (en) Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network
CN110009732B (en) GMS feature matching-based three-dimensional reconstruction method for complex large-scale scene
CN106204574B (en) Camera pose self-calibrating method based on objective plane motion feature
CN110689008A (en) Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction
CN108597009B (en) Method for detecting three-dimensional target based on direction angle information
CN107862735B (en) RGBD three-dimensional scene reconstruction method based on structural information
CN111242991B (en) Method for quickly registering visible light and infrared camera
CN112652020B (en) Visual SLAM method based on AdaLAM algorithm
CN115205489A (en) Three-dimensional reconstruction method, system and device in large scene
CN112484746B (en) Monocular vision auxiliary laser radar odometer method based on ground plane
CN112396595A (en) Semantic SLAM method based on point-line characteristics in dynamic environment
CN111797688A (en) Visual SLAM method based on optical flow and semantic segmentation
CN111681275B (en) Double-feature-fused semi-global stereo matching method
CN113744315B (en) Semi-direct vision odometer based on binocular vision
CN111998862B (en) BNN-based dense binocular SLAM method
CN106408596A (en) Edge-based local stereo matching method
CN116468786B (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN104240229A (en) Self-adaptation polarline correcting method based on infrared binocular camera
CN117011660A (en) Dot line feature SLAM method for fusing depth information in low-texture scene
CN114998532B (en) Three-dimensional image visual transmission optimization method based on digital image reconstruction
CN114399547B (en) Monocular SLAM robust initialization method based on multiframe

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant