CN106875437B - RGBD three-dimensional reconstruction-oriented key frame extraction method - Google Patents

RGBD three-dimensional reconstruction-oriented key frame extraction method

Info

Publication number
CN106875437B
CN106875437B (application CN201611222413.XA)
Authority
CN
China
Prior art keywords
frame
depth
image
projection
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611222413.XA
Other languages
Chinese (zh)
Other versions
CN106875437A (en)
Inventor
齐越
韩尹波
王晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201611222413.XA
Publication of CN106875437A
Application granted
Publication of CN106875437B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20 Special algorithmic details
    • G06T2207/20068 Projection on vertical or horizontal image axis
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a key frame extraction method for RGBD three-dimensional reconstruction. First, for an RGBD data stream acquired by a camera, together with the camera poses estimated by a visual odometer, several temporally adjacent data frames are divided into a group. For each group, every depth image is projected onto the first depth image of the group according to the camera poses and camera parameters, and the first RGB image of each group is projected onto each of the remaining frames, with the gray value of each projected RGB image obtained by linear interpolation. The degree of motion blur of each RGB image is then estimated and combined with the weight of the corresponding projected depth image to obtain the weight of the projected RGB image. Finally, the projected depth images and projected RGB images within each group are fused according to these weights to obtain an RGBD key frame. The invention reduces the holes and noise in the data collected by the depth camera, yields clearer depth images and RGB images, and provides a more reliable data source for other tasks in three-dimensional reconstruction, such as global optimization of camera pose and texture extraction.

Description

RGBD three-dimensional reconstruction-oriented key frame extraction method
Technical Field
The invention belongs to the field of computer vision and computer graphics image processing, and in particular relates to a method for extracting key frames from an RGBD data stream. It provides a more reliable data source for research on camera pose estimation and optimization and on texture reconstruction in three-dimensional reconstruction based on RGBD data streams, and is of significance for research on RGBD-based three-dimensional reconstruction technology.
Background
With the popularization of depth sensors and the development of three-dimensional reconstruction techniques, research on three-dimensional model reconstruction based on RGBD data has emerged in recent years. Compared with traditional three-dimensional reconstruction based on RGB images alone, the depth image provides three-dimensional information about the scene, which greatly improves the feasibility and precision of three-dimensional reconstruction. Key frame extraction plays an important role in camera pose estimation, camera relocalization, and texture reconstruction.
The current methods for extracting key frames for three-dimensional reconstruction fall into the following three categories. The first is based on fixed intervals: taking the time stamp or frame number as the unit, data are extracted at regular intervals as key frames; this approach is simple to implement and extremely fast, but the selection of key frames is affected by the scanning rate. The second is based on inter-frame motion detection: the relative camera pose transformation between each data frame and the last key frame is calculated, and its magnitude determines whether the frame is added to the key frame sequence. The third is based on image features, such as the method of Philip et al., which extracts key frames according to the number of corresponding feature points between frames; such methods depend little on camera parameters, but because of their practical runtime cost they are suited to three-dimensional reconstruction with low real-time requirements.
The above methods for extracting key frames generally face the problem of low key frame quality, such as noise and holes in the depth image and motion blur in the RGB image, which to some extent affects camera pose optimization, texture extraction, and related tasks.
Disclosure of Invention
To overcome these defects, the invention aims at a key frame extraction method that combines the characteristics of RGBD data streams with the requirements of three-dimensional reconstruction and, by exploiting locally accurate camera poses, fuses multi-frame data to improve data quality while preserving the characteristics of the original data as much as possible.
To achieve the above object, the present invention provides a key frame extraction method for RGBD three-dimensional reconstruction, which comprises the following steps:
step (1): grouping, according to time stamps, the RGBD data stream collected by a calibrated depth camera and a calibrated color camera, where the depth images and RGB images of several temporally adjacent frames, together with the camera poses estimated by a visual odometer, form one group of data;
step (2): for each group of data, mapping each depth image into three-dimensional space according to the intrinsic parameters of the depth camera, then projecting it onto the group's first depth image according to the camera poses to obtain a projected depth image, updating during projection the pixel values at the adjacent integer coordinates of the projected depth image, and taking, for each pixel of the projected depth image, the depth value closest to the group's first depth image as the actual depth value;
step (3): calculating a weight for each pixel of the projected depth image according to the error of the corresponding projection coordinates, and calculating the final pixel values of the depth key frame as a weighted average over the group's projected depth images;
step (4): mapping the first RGB image of each group into three-dimensional space according to the intrinsic and extrinsic parameters of the color camera and the depth camera and the pixel values of the depth key frame, then projecting it onto the remaining RGB images of the group according to each frame's camera pose, and calculating the gray value of each pixel by linear interpolation to obtain the projected RGB image corresponding to each frame;
step (5): calculating the degree of motion blur of each input RGB image, calculating the weight of each pixel of the corresponding projected RGB image by combining it with the weight of the projected depth image, and calculating the gray values of the key frame as a weighted median over the group's projected RGB images.
In step (1), the number of frames in each group of data is 5.
In step (5), the weight of a projected RGB image pixel is calculated considering both the weight of the corresponding projected depth image pixel and the degree of image blur.
The principle of the invention is as follows. First, the RGBD data stream collected by the camera is divided into groups of several temporally adjacent RGB and depth images, which guarantees the similarity and continuity of the data within a group. Each depth image is mapped into three-dimensional space according to the intrinsic parameters of the depth camera to obtain the corresponding three-dimensional point cloud; the point cloud is projected onto the first depth image of the frame's group according to each frame's camera pose, and the projected depth image and its corresponding weights are calculated. All projected depth images in the group are fused by weighted averaging to obtain a depth key frame. According to the intrinsic and extrinsic parameters of the depth camera and the color camera and the pixel values of the depth key frame, the first RGB image of each group is mapped into three-dimensional space to obtain the corresponding three-dimensional point cloud, which is then projected onto each RGB image in the group according to each frame's camera pose, with the pixel gray values calculated by linear interpolation to obtain the projected RGB images. When calculating the weights of a projected RGB image, both the weights of the corresponding projected depth image and the degree of motion blur of the RGB image are considered: an RGB image with a low degree of motion blur endows its projected RGB image with a higher weight. Finally, all projected RGB images in the group are fused by a weighted-median method to obtain an RGB key frame.
The method is based on a thorough analysis of the requirements on RGBD key frames in three-dimensional reconstruction, and compared with existing key frame extraction techniques for three-dimensional reconstruction it has the following advantages:
(1) Taking into account the low quality of the raw data collected by depth cameras and the high local accuracy of camera pose estimation, fusing multiple depth images effectively reduces the holes and noise of a single depth image.
(2) Taking into account the motion blur present in raw data acquired while the camera is moving, and exploiting the mapping from the pixel plane to three-dimensional space provided by the depth image, fusing multiple RGB images effectively reduces the motion blur a single RGB image may carry and improves the precision of the RGB images.
Drawings
Fig. 1 shows an original depth image and the corresponding projected depth image in the present invention, where Fig. 1(a) is the original depth image and Fig. 1(b) is the corresponding projected depth image;
Fig. 2 shows an original RGB image and the corresponding projected RGB image in the present invention, where Fig. 2(a) is the original RGB image and Fig. 2(b) is the corresponding projected RGB image;
FIG. 3 shows key frames of depth images before and after fusion in the present invention, wherein FIG. 3(a) is the depth image before fusion, and FIG. 3(b) is the depth image after fusion;
FIG. 4 shows key frames of RGB images before and after fusion in the present invention, wherein FIG. 4(a) is the RGB image before fusion, and FIG. 4(b) is the RGB image after fusion;
fig. 5 shows a schematic diagram of key frame extraction for RGBD three-dimensional reconstruction according to the present invention.
Detailed Description
The embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The implementation of the invention is divided into five steps: grouping the RGBD data frames, calculating the projected depth images, fusing the projected depth images, calculating the projected RGB images, and fusing the projected RGB images.
Step one, grouping RGBD data frames
For a given registered RGBD data stream Input_1 ~ Input_n, several frames with adjacent time stamps, consisting of RGB images (denoted C_1 ~ C_k), depth images (denoted D_1 ~ D_k), and the corresponding camera poses (denoted T_1 ~ T_k), are divided into one group.
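As a minimal sketch of this grouping (Python; the per-frame record layout is an assumption for illustration, while the group size k = 5 comes from the described embodiment):

```python
# Sketch of step one: partition a registered RGBD stream into groups of k
# temporally adjacent frames. Each frame record is assumed to carry an RGB
# image C, a depth image D, and a visual-odometry pose T (4x4 matrix).
def group_rgbd_stream(frames, k=5):
    """frames: list of dicts {'C': ..., 'D': ..., 'T': ...} sorted by time stamp."""
    return [frames[i:i + k] for i in range(0, len(frames), k)]
```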
Step two, calculating the projected depth images
The method mainly comprises the following steps:
step (2.1) according to the internal parameter K of the depth cameradWill D1~DkEach pixel point in the three-dimensional space is mapped to the three-dimensional space respectively, and the method specifically comprises the following steps:
p=Kd*(u,v,d)T(1)
wherein p is the mapped three-dimensional point coordinate, KdIs the internal reference matrix (3 x 3) of the depth camera, u and v are the original pixel coordinates, and d is the corresponding depth value under the pixel coordinates.
Step (2.2): the point cloud data of each frame is transformed into the camera coordinate system of the group's first frame through the camera pose matrices:

pr = T_1^{-1} · T_i · p    (2)

where pr is the three-dimensional point coordinate in the first frame's camera coordinate system, T_1 is the camera pose matrix of the group's first frame, and T_i is the camera pose matrix of the i-th frame.
Step (2.3): according to the intrinsic matrix K_d of the depth camera, the three-dimensional points are mapped to the pixel coordinate system to obtain the corresponding pixel coordinates in the projected depth image:

(px, py, dr)^T = K_d^{-1} · pr    (3)

where px and py are the mapped pixel coordinate values and dr is the corresponding depth value.
Step (2.4): since the pixel coordinates px, py obtained in step (2.3) are usually not integers, the values at the 4 pixels with integer coordinates adjacent to (px, py) in the projected depth image are updated. When several three-dimensional points map to the same pixel coordinate, the depth value closest to that of the first frame's original depth image is taken.
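As an illustration of steps (2.1) to (2.4), the following Python/NumPy sketch warps one depth frame into the first frame of its group. It is an assumption-laden sketch, not the patent's implementation: it uses the conventional pinhole back-projection in place of the compact matrix notation of equations (1) to (3), and it treats the poses T as camera-to-world matrices.

```python
import numpy as np

def project_depth_to_first_frame(D_i, T_i, D_1, T_1, K_d):
    """Warp depth frame i into the group's first frame (steps 2.1-2.4).

    D_i, D_1: HxW depth maps (0 marks missing data); T_i, T_1: 4x4 poses,
    assumed camera-to-world; K_d: 3x3 depth camera intrinsic matrix.
    """
    H, W = D_i.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = D_i > 0
    d = D_i[valid].astype(float)
    # Step 2.1: back-project the valid pixels of frame i into its camera space.
    x = (u[valid] - K_d[0, 2]) * d / K_d[0, 0]
    y = (v[valid] - K_d[1, 2]) * d / K_d[1, 1]
    p = np.stack([x, y, d, np.ones_like(d)])          # 4xN homogeneous points
    # Step 2.2: move the points into the first frame's camera coordinates.
    pr = np.linalg.inv(T_1) @ T_i @ p
    # Step 2.3: project into the first frame's pixel plane.
    px = K_d[0, 0] * pr[0] / pr[2] + K_d[0, 2]
    py = K_d[1, 1] * pr[1] / pr[2] + K_d[1, 2]
    dr = pr[2]
    # Step 2.4: splat onto the 4 adjacent integer pixels; on collision keep
    # the depth closest to the first frame's original depth.
    proj = np.zeros_like(D_1, dtype=float)
    for x0, y0, z in zip(px, py, dr):
        for ur in (int(np.floor(x0)), int(np.ceil(x0))):
            for vr in (int(np.floor(y0)), int(np.ceil(y0))):
                if 0 <= ur < W and 0 <= vr < H:
                    ref = D_1[vr, ur]
                    old = proj[vr, ur]
                    if old == 0 or abs(z - ref) < abs(old - ref):
                        proj[vr, ur] = z
    return proj
```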
Step three, fusing the projected depth images
Step (3.1): from the finally obtained depth value dr and the corresponding pixel coordinates px, py, a weight is calculated for each pixel (ur, vr) of the projected depth image; for pixels that are not mapped to, the weight is set to 0:

w_d(ur, vr) = f(px - ur, py - vr)    (4)  [exact formula rendered as an image in the source; it decreases with the error of the projection coordinates]

where w_d(ur, vr) is the weight of the projected depth image pixel.
Step (3.2): for each pixel of the depth key frame, its depth value d_keyframe is defined from all projected depth images and their weights in the group as:

d_keyframe = ( Σ_i w_d^i(ur, vr) · d_i(ur, vr) ) / ( Σ_i w_d^i(ur, vr) )    (5)

where d_keyframe is the pixel value of the group's fused depth key frame, d_i is the depth value at this pixel of the projected depth image corresponding to the i-th frame, and w_d^i(ur, vr) is the corresponding weight.
Step four, calculating the projected RGB images
It mainly comprises the following steps:
step (4.1) according to the color camera internal reference matrix KcEach group of the first frame RGB image C1Each pixel point in (a) and the corresponding depth d in the depth imagecMapping to a three-dimensional space, specifically:
pc(x,y,z)=Kc*(uc,vc,dc)T(6)
wherein p iscTo mapped three-dimensional point coordinates, uc,vcAs pixel coordinates in the RGB image, dcIs the depth value of the pixel coordinate in the corresponding depth image.
Step (4.2): the point cloud data is transformed through the camera pose matrices into the camera coordinate system of each frame in the group:

pr_c = T_i^{-1} · T_1 · p_c    (7)

where pr_c is the three-dimensional point coordinate in the corresponding frame's camera coordinate system.
Step (4.3): according to the intrinsic matrix K_c of the color camera, the three-dimensional points are mapped to the pixel coordinate system to obtain the corresponding pixel coordinates in the projected RGB image:

(px_c, py_c, dr_c)^T = K_c^{-1} · pr_c    (8)

where px_c and py_c are the pixel coordinates and dr_c the depth value of the three-dimensional point in the projected RGB image.
Step (4.4): since the pixel coordinates px_c, py_c obtained in step (4.3) are usually not integers, the gray value at pixel coordinates (u_c, v_c) of the projected RGB image is obtained by linear interpolation from the original RGB image of the corresponding frame at (px_c, py_c).
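The linear interpolation of step (4.4) is standard bilinear sampling; a minimal single-channel sketch follows (applying it per channel recovers RGB; coordinates are assumed in-bounds):

```python
import numpy as np

def bilinear_sample(img, px, py):
    """Step (4.4) sketch: sample img at non-integer coordinates (px, py)
    by bilinear interpolation. img: HxW float array; px, py: 1-D arrays."""
    H, W = img.shape
    x0 = np.clip(np.floor(px).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(py).astype(int), 0, H - 2)
    fx, fy = px - x0, py - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bot = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy
```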
Step five, fusing the projected RGB images
Step (5.1): for each RGB image C_i, its degree of motion blur blu_i is estimated; then, for each pixel of the projected RGB image, a weight is calculated by combining it with the weight of the corresponding projected depth image pixel:

w_c^i(ur, vr) = g(w_d^i(ur, vr), blu_i)    (9)  [exact formula rendered as an image in the source]

where w_c^i is the weight of the pixels of the i-th frame's projected RGB image; projected RGB images whose source image has a low degree of motion blur receive higher weights.
Step (5.2): the gray value of each pixel of the RGB key frame is calculated from the gray values and weights of all projected RGB images in the group as their weighted median:

c_keyframe(ur, vr) = weighted-median_i { c_i(ur, vr) ; w_c^i(ur, vr) }    (10)

where c_keyframe is the pixel value of the group's fused RGB key frame and c_i is the gray value of the i-th frame's projected RGB image.
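A sketch of step five follows. The blur metric and the weight combination of equation (9) are not spelled out in the text (the formula survives only as an image), so the sketch substitutes an inverse variance-of-Laplacian blur score and divides the depth weight by it; both are assumptions that reproduce the stated behaviour of giving sharper frames higher weight.

```python
import numpy as np

def blur_degree(gray):
    """Illustrative blur score (an assumption): inverse variance of a
    4-neighbour Laplacian, so blurrier images score higher."""
    lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0) +
           np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4.0 * gray)
    return 1.0 / (lap.var() + 1e-6)

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def fuse_rgb_keyframe(proj_grays, depth_weights, blurs):
    """Eq. (10) sketch: per-pixel weighted median over the group's projected
    gray images (per-pixel loops kept for clarity, not speed)."""
    H, W = proj_grays[0].shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            vals = [g[y, x] for g in proj_grays]
            ws = [wd[y, x] / b for wd, b in zip(depth_weights, blurs)]
            if sum(ws) > 0:
                out[y, x] = weighted_median(vals, ws)
    return out
```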

Claims (3)

1. A key frame extraction method for RGBD three-dimensional reconstruction, characterized by comprising the following steps:
step (1): grouping, according to time stamps, the RGBD data stream collected by a calibrated depth camera and a calibrated color camera, where the depth images and RGB images of several temporally adjacent frames, together with the camera poses estimated by a visual odometer, form one group of data;
step (2): for each group of data, mapping each depth image into three-dimensional space according to the intrinsic parameters of the depth camera, then projecting it onto the group's first depth image according to the camera poses to obtain a projected depth image, updating during projection the pixel values at the adjacent integer coordinates of the projected depth image, and taking, for each pixel of the projected depth image, the depth value closest to the group's first depth image as the actual depth value;
step (3): calculating a weight for each pixel of the projected depth image according to the error of the corresponding projection coordinates, and calculating the final pixel values of the depth key frame as a weighted average over the group's projected depth images;
step (4): mapping the first RGB image of each group into three-dimensional space according to the intrinsic and extrinsic parameters of the color camera and the depth camera and the pixel values of the depth key frame, then projecting it onto the remaining RGB images of the group according to each frame's camera pose, and calculating the gray value of each pixel by linear interpolation to obtain the projected RGB image corresponding to each frame;
step (5): calculating the degree of motion blur of each input RGB image, calculating the weight of each pixel of the corresponding projected RGB image by combining it with the weight of the projected depth image, and calculating the gray values of the key frame as a weighted median over the group's projected RGB images.
2. The key frame extraction method for RGBD three-dimensional reconstruction according to claim 1, wherein in step (1) the number of frames in each group of data is 5.
3. The key frame extraction method for RGBD three-dimensional reconstruction according to claim 1, wherein in step (5) the weight of a projected RGB image pixel is calculated considering both the weight of the corresponding projected depth image pixel and the degree of image blur.
CN201611222413.XA 2016-12-27 2016-12-27 RGBD three-dimensional reconstruction-oriented key frame extraction method Active CN106875437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611222413.XA CN106875437B (en) 2016-12-27 2016-12-27 RGBD three-dimensional reconstruction-oriented key frame extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611222413.XA CN106875437B (en) 2016-12-27 2016-12-27 RGBD three-dimensional reconstruction-oriented key frame extraction method

Publications (2)

Publication Number Publication Date
CN106875437A CN106875437A (en) 2017-06-20
CN106875437B 2020-03-17

Family

ID=59164925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611222413.XA Active CN106875437B (en) 2016-12-27 2016-12-27 RGBD three-dimensional reconstruction-oriented key frame extraction method

Country Status (1)

Country Link
CN (1) CN106875437B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292965B (en) * 2017-08-03 2020-10-13 北京航空航天大学青岛研究院 Virtual and real shielding processing method based on depth image data stream
CN107862735B (en) * 2017-09-22 2021-03-05 北京航空航天大学青岛研究院 RGBD three-dimensional scene reconstruction method based on structural information
CN108307174A (en) * 2018-01-26 2018-07-20 上海深视信息科技有限公司 A kind of depth image sensor precision improvement method and system
CN109191526B (en) * 2018-09-10 2020-07-07 杭州艾米机器人有限公司 Three-dimensional environment reconstruction method and system based on RGBD camera and optical encoder
CN109544677B (en) * 2018-10-30 2020-12-25 山东大学 Indoor scene main structure reconstruction method and system based on depth image key frame
CN109658449B (en) * 2018-12-03 2020-07-10 华中科技大学 Indoor scene three-dimensional reconstruction method based on RGB-D image
CN110503688B (en) * 2019-08-20 2022-07-22 上海工程技术大学 Pose estimation method for depth camera
CN111127633A (en) * 2019-12-20 2020-05-08 支付宝(杭州)信息技术有限公司 Three-dimensional reconstruction method, apparatus, and computer-readable medium
CN112802183A (en) * 2021-01-20 2021-05-14 深圳市日出印像数字科技有限公司 Method and device for reconstructing three-dimensional virtual scene and electronic equipment
CN113359154A (en) * 2021-05-24 2021-09-07 邓良波 Indoor and outdoor universal high-precision real-time measurement method
CN113938664A (en) * 2021-09-10 2022-01-14 思特威(上海)电子科技股份有限公司 Signal acquisition method of pixel array, image sensor, equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034267A (en) * 2010-11-30 2011-04-27 中国科学院自动化研究所 Three-dimensional reconstruction method of target based on attention
KR20140108828A (en) * 2013-02-28 2014-09-15 한국전자통신연구원 Apparatus and method of camera tracking
CN103247075B (en) * 2013-05-13 2015-08-19 北京工业大学 Based on the indoor environment three-dimensional rebuilding method of variation mechanism
US9779508B2 (en) * 2014-03-26 2017-10-03 Microsoft Technology Licensing, Llc Real-time three-dimensional reconstruction of a scene from a single camera
CN104537709B (en) * 2014-12-15 2017-09-29 西北工业大学 It is a kind of that method is determined based on the real-time three-dimensional reconstruction key frame that pose changes
CN105809681A (en) * 2016-03-04 2016-07-27 清华大学 Single camera based human body RGB-D data restoration and 3D reconstruction method
CN106251399B (en) * 2016-08-30 2019-04-16 广州市绯影信息科技有限公司 A kind of outdoor scene three-dimensional rebuilding method and implementing device based on lsd-slam

Also Published As

Publication number Publication date
CN106875437A (en) 2017-06-20

Similar Documents

Publication Publication Date Title
CN106875437B (en) RGBD three-dimensional reconstruction-oriented key frame extraction method
CN110264416B (en) Sparse point cloud segmentation method and device
CN107833253B (en) RGBD three-dimensional reconstruction texture generation-oriented camera attitude optimization method
CN106780576B (en) RGBD data stream-oriented camera pose estimation method
CN113052835B (en) Medicine box detection method and system based on three-dimensional point cloud and image data fusion
CN110706269B (en) Binocular vision SLAM-based dynamic scene dense modeling method
CN111027415B (en) Vehicle detection method based on polarization image
CN109525786B (en) Video processing method and device, terminal equipment and storage medium
CN113362247A (en) Semantic live-action three-dimensional reconstruction method and system of laser fusion multi-view camera
CN110688905A (en) Three-dimensional object detection and tracking method based on key frame
CN102457724B (en) Image motion detecting system and method
CN109934873B (en) Method, device and equipment for acquiring marked image
CN111524233A (en) Three-dimensional reconstruction method for dynamic target of static scene
CN104079800A (en) Shaking preventing method for video image in video surveillance
CN110544294A (en) dense three-dimensional reconstruction method based on panoramic video
CN112232356A (en) Event camera denoising method based on cluster degree and boundary characteristics
CN112085031A (en) Target detection method and system
KR20140074201A (en) Tracking device
TW201436552A (en) Method and apparatus for increasing frame rate of an image stream using at least one higher frame rate image stream
KR101125061B1 (en) A Method For Transforming 2D Video To 3D Video By Using LDI Method
CN112906675B (en) Method and system for detecting non-supervision human body key points in fixed scene
CN110322479B (en) Dual-core KCF target tracking method based on space-time significance
CN111160262A (en) Portrait segmentation method fusing human body key point detection
CN106446764B (en) Video object detection method based on improved fuzzy color aggregated vector
CN115273080A (en) Lightweight visual semantic odometer method for dynamic scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant