CN106875437B - RGBD three-dimensional reconstruction-oriented key frame extraction method - Google Patents

RGBD three-dimensional reconstruction-oriented key frame extraction method

Info

Publication number
CN106875437B
CN106875437B (application CN201611222413.XA)
Authority
CN
China
Prior art keywords
frame
depth
image
projection
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611222413.XA
Other languages
Chinese (zh)
Other versions
CN106875437A (en)
Inventor
齐越
韩尹波
王晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN201611222413.XA
Publication of CN106875437A
Application granted
Publication of CN106875437B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20 Special algorithmic details
    • G06T2207/20068 Projection on vertical or horizontal image axis
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a key frame extraction method for RGBD three-dimensional reconstruction. First, for an RGBD data stream acquired by a camera, together with the camera poses estimated by a visual odometer, several temporally adjacent data frames are divided into a group. For each group, every depth image is projected onto the first depth image of the group according to the camera poses and camera parameters, and the first RGB image of each group is projected onto each of the remaining frames, with the gray value of each projected RGB image obtained by linear interpolation. The degree of motion blur of each RGB image is then estimated and combined with the weight of the corresponding projected depth image to obtain the weight of the projected RGB image. Finally, the projected depth images and projected RGB images within each group are fused according to these weights to obtain an RGBD key frame. The invention reduces the holes and noise in the data collected by the depth camera, yields clearer depth images and RGB images, and provides a more reliable data source for other tasks in three-dimensional reconstruction, such as global optimization of camera pose and texture extraction.

Description

RGBD three-dimensional reconstruction-oriented key frame extraction method
Technical Field
The invention belongs to the field of computer vision and computer graphics image processing, and in particular relates to a method for extracting key frames from an RGBD data stream. It provides a more reliable data source for research on camera pose estimation and optimization and on texture reconstruction in three-dimensional reconstruction based on RGBD data streams, and is of significance for research on RGBD-based three-dimensional reconstruction technology.
Background
With the popularization of depth sensors and the development of three-dimensional reconstruction techniques, research on three-dimensional model reconstruction based on RGBD data has emerged in recent years. Compared with traditional three-dimensional reconstruction based on RGB images alone, the depth image provides three-dimensional information about the scene, which greatly improves the feasibility and precision of three-dimensional reconstruction. Key frame extraction plays an important role in camera pose estimation, camera relocalization, and texture reconstruction.
The current methods for extracting key frames for three-dimensional reconstruction fall into the following three categories. The first is based on fixed intervals: taking the time stamp or frame number as the unit, data are extracted at regular intervals as key frames; this approach is simple to implement and extremely fast, but the selection of key frames is affected by the scanning rate. The second is based on inter-frame motion detection: the relative camera pose transformation between each data frame and the last key frame is calculated, and its magnitude determines whether the frame is added to the key frame sequence. The third is based on image features, such as the method of Philip et al., which extracts key frames according to the number of corresponding feature points between frames; such methods depend little on camera parameters, but because of their practical runtime cost they are suited to three-dimensional reconstruction with low real-time requirements.
The above methods for extracting key frames generally face the problem of low key frame quality, such as noise and holes in the depth image and motion blur in the RGB image, which to some extent affects camera pose optimization, texture extraction, and related tasks.
Disclosure of Invention
To overcome these defects, the invention aims at a key frame extraction method that combines the characteristics of RGBD data streams with the requirements of three-dimensional reconstruction and, by exploiting locally accurate camera poses, fuses multi-frame data to improve data quality while preserving the characteristics of the original data as much as possible.
To achieve the above object, the present invention provides a key frame extraction method for RGBD three-dimensional reconstruction, which comprises the following steps:
step (1): grouping, according to time stamps, the RGBD data stream collected by a calibrated depth camera and a calibrated color camera, where the depth images and RGB images of several temporally adjacent frames, together with the camera poses estimated by a visual odometer, form one group of data;
step (2): for each group of data, mapping each depth image into three-dimensional space according to the intrinsic parameters of the depth camera, then projecting it onto the group's first depth image according to the camera poses to obtain a projected depth image, updating during projection the pixel values at the adjacent integer coordinates of the projected depth image, and taking, for each pixel of the projected depth image, the depth value closest to the group's first depth image as the actual depth value;
step (3): calculating a weight for each pixel of the projected depth image according to the error of the corresponding projection coordinates, and calculating the final pixel values of the depth key frame as a weighted average over the group's projected depth images;
step (4): mapping the first RGB image of each group into three-dimensional space according to the intrinsic and extrinsic parameters of the color camera and the depth camera and the pixel values of the depth key frame, then projecting it onto the remaining RGB images of the group according to each frame's camera pose, and calculating the gray value of each pixel by linear interpolation to obtain the projected RGB image corresponding to each frame;
step (5): calculating the degree of motion blur of each input RGB image, calculating the weight of each pixel of the corresponding projected RGB image by combining it with the weight of the projected depth image, and calculating the gray values of the key frame as a weighted median over the group's projected RGB images.
In step (1), the number of frames in each group of data is 5.
In step (5), the weight of a projected RGB image pixel is calculated considering both the weight of the corresponding projected depth image pixel and the degree of image blur.
The principle of the invention is as follows. First, the RGBD data stream collected by the camera is divided into groups of several temporally adjacent RGB and depth images, which guarantees the similarity and continuity of the data within a group. Each depth image is mapped into three-dimensional space according to the intrinsic parameters of the depth camera to obtain the corresponding three-dimensional point cloud; the point cloud is projected onto the first depth image of the frame's group according to each frame's camera pose, and the projected depth image and its corresponding weights are calculated. All projected depth images in the group are fused by weighted averaging to obtain a depth key frame. According to the intrinsic and extrinsic parameters of the depth camera and the color camera and the pixel values of the depth key frame, the first RGB image of each group is mapped into three-dimensional space to obtain the corresponding three-dimensional point cloud, which is then projected onto each RGB image in the group according to each frame's camera pose, with the pixel gray values calculated by linear interpolation to obtain the projected RGB images. When calculating the weights of a projected RGB image, both the weights of the corresponding projected depth image and the degree of motion blur of the RGB image are considered: an RGB image with a low degree of motion blur endows its projected RGB image with a higher weight. Finally, all projected RGB images in the group are fused by a weighted-median method to obtain an RGB key frame.
The method is based on a thorough analysis of the requirements on RGBD key frames in three-dimensional reconstruction, and compared with existing key frame extraction techniques for three-dimensional reconstruction it has the following advantages:
(1) Taking into account the low quality of the raw data collected by depth cameras and the high local accuracy of camera pose estimation, fusing multiple depth images effectively reduces the holes and noise of a single depth image.
(2) Taking into account the motion blur present in raw data acquired while the camera is moving, and exploiting the mapping from the pixel plane to three-dimensional space provided by the depth image, fusing multiple RGB images effectively reduces the motion blur a single RGB image may carry and improves the precision of the RGB images.
Drawings
Fig. 1 shows an original depth image and the corresponding projected depth image in the present invention, where Fig. 1(a) is the original depth image and Fig. 1(b) is the corresponding projected depth image;
Fig. 2 shows an original RGB image and the corresponding projected RGB image in the present invention, where Fig. 2(a) is the original RGB image and Fig. 2(b) is the corresponding projected RGB image;
FIG. 3 shows key frames of depth images before and after fusion in the present invention, wherein FIG. 3(a) is the depth image before fusion, and FIG. 3(b) is the depth image after fusion;
FIG. 4 shows key frames of RGB images before and after fusion in the present invention, wherein FIG. 4(a) is the RGB image before fusion, and FIG. 4(b) is the RGB image after fusion;
fig. 5 shows a schematic diagram of key frame extraction for RGBD three-dimensional reconstruction according to the present invention.
Detailed Description
The embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The implementation of the invention is divided into five steps: grouping the RGBD data frames, calculating the projected depth images, fusing the projected depth images, calculating the projected RGB images, and fusing the projected RGB images.
Step one, grouping RGBD data frames
For a given registered RGBD data stream Input_1 ~ Input_n, several frames with adjacent time stamps, consisting of RGB images (denoted C_1 ~ C_k), depth images (denoted D_1 ~ D_k), and the corresponding camera poses (denoted T_1 ~ T_k), are divided into one group.
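As a minimal sketch of this grouping (Python; the per-frame record layout is an assumption for illustration, while the group size k = 5 comes from the described embodiment):

```python
# Sketch of step one: partition a registered RGBD stream into groups of k
# temporally adjacent frames. Each frame record is assumed to carry an RGB
# image C, a depth image D, and a visual-odometry pose T (4x4 matrix).
def group_rgbd_stream(frames, k=5):
    """frames: list of dicts {'C': ..., 'D': ..., 'T': ...} sorted by time stamp."""
    return [frames[i:i + k] for i in range(0, len(frames), k)]
```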
Step two, calculating the projected depth images
The method mainly comprises the following steps:
step (2.1) according to the internal parameter K of the depth cameradWill D1~DkEach pixel point in the three-dimensional space is mapped to the three-dimensional space respectively, and the method specifically comprises the following steps:
p=Kd*(u,v,d)T(1)
wherein p is the mapped three-dimensional point coordinate, KdIs the internal reference matrix (3 x 3) of the depth camera, u and v are the original pixel coordinates, and d is the corresponding depth value under the pixel coordinates.
Step (2.2): the point cloud data of each frame is transformed into the camera coordinate system of the group's first frame through the camera pose matrices:

pr = T_1^{-1} · T_i · p    (2)

where pr is the three-dimensional point coordinate in the first frame's camera coordinate system, T_1 is the camera pose matrix of the group's first frame, and T_i is the camera pose matrix of the i-th frame.
Step (2.3): according to the intrinsic matrix K_d of the depth camera, the three-dimensional points are mapped to the pixel coordinate system to obtain the corresponding pixel coordinates in the projected depth image:

(px, py, dr)^T = K_d^{-1} · pr    (3)

where px and py are the mapped pixel coordinate values and dr is the corresponding depth value.
Step (2.4): since the pixel coordinates px, py obtained in step (2.3) are usually not integers, the values at the 4 pixels with integer coordinates adjacent to (px, py) in the projected depth image are updated. When several three-dimensional points map to the same pixel coordinate, the depth value closest to that of the first frame's original depth image is taken.
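As an illustration of steps (2.1) to (2.4), the following Python/NumPy sketch warps one depth frame into the first frame of its group. It is an assumption-laden sketch, not the patent's implementation: it uses the conventional pinhole back-projection in place of the compact matrix notation of equations (1) to (3), and it treats the poses T as camera-to-world matrices.

```python
import numpy as np

def project_depth_to_first_frame(D_i, T_i, D_1, T_1, K_d):
    """Warp depth frame i into the group's first frame (steps 2.1-2.4).

    D_i, D_1: HxW depth maps (0 marks missing data); T_i, T_1: 4x4 poses,
    assumed camera-to-world; K_d: 3x3 depth camera intrinsic matrix.
    """
    H, W = D_i.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = D_i > 0
    d = D_i[valid].astype(float)
    # Step 2.1: back-project the valid pixels of frame i into its camera space.
    x = (u[valid] - K_d[0, 2]) * d / K_d[0, 0]
    y = (v[valid] - K_d[1, 2]) * d / K_d[1, 1]
    p = np.stack([x, y, d, np.ones_like(d)])          # 4xN homogeneous points
    # Step 2.2: move the points into the first frame's camera coordinates.
    pr = np.linalg.inv(T_1) @ T_i @ p
    # Step 2.3: project into the first frame's pixel plane.
    px = K_d[0, 0] * pr[0] / pr[2] + K_d[0, 2]
    py = K_d[1, 1] * pr[1] / pr[2] + K_d[1, 2]
    dr = pr[2]
    # Step 2.4: splat onto the 4 adjacent integer pixels; on collision keep
    # the depth closest to the first frame's original depth.
    proj = np.zeros_like(D_1, dtype=float)
    for x0, y0, z in zip(px, py, dr):
        for ur in (int(np.floor(x0)), int(np.ceil(x0))):
            for vr in (int(np.floor(y0)), int(np.ceil(y0))):
                if 0 <= ur < W and 0 <= vr < H:
                    ref = D_1[vr, ur]
                    old = proj[vr, ur]
                    if old == 0 or abs(z - ref) < abs(old - ref):
                        proj[vr, ur] = z
    return proj
```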
Step three, fusing the projected depth images
Step (3.1): from the finally obtained depth value dr and the corresponding pixel coordinates px, py, a weight is calculated for each pixel (ur, vr) of the projected depth image; for pixels that are not mapped to, the weight is set to 0:

w_d(ur, vr) = f(px - ur, py - vr)    (4)  [exact formula rendered as an image in the source; it decreases with the error of the projection coordinates]

where w_d(ur, vr) is the weight of the projected depth image pixel.
Step (3.2): for each pixel of the depth key frame, its depth value d_keyframe is defined from all projected depth images and their weights in the group as:

d_keyframe = ( Σ_i w_d^i(ur, vr) · d_i(ur, vr) ) / ( Σ_i w_d^i(ur, vr) )    (5)

where d_keyframe is the pixel value of the group's fused depth key frame, d_i is the depth value at this pixel of the projected depth image corresponding to the i-th frame, and w_d^i(ur, vr) is the corresponding weight.
Step four, calculating the projected RGB images
It mainly comprises the following steps:
step (4.1) according to the color camera internal reference matrix KcEach group of the first frame RGB image C1Each pixel point in (a) and the corresponding depth d in the depth imagecMapping to a three-dimensional space, specifically:
pc(x,y,z)=Kc*(uc,vc,dc)T(6)
wherein p iscTo mapped three-dimensional point coordinates, uc,vcAs pixel coordinates in the RGB image, dcIs the depth value of the pixel coordinate in the corresponding depth image.
Step (4.2): the point cloud data is transformed through the camera pose matrices into the camera coordinate system of each frame in the group:

pr_c = T_i^{-1} · T_1 · p_c    (7)

where pr_c is the three-dimensional point coordinate in the corresponding frame's camera coordinate system.
Step (4.3): according to the intrinsic matrix K_c of the color camera, the three-dimensional points are mapped to the pixel coordinate system to obtain the corresponding pixel coordinates in the projected RGB image:

(px_c, py_c, dr_c)^T = K_c^{-1} · pr_c    (8)

where px_c and py_c are the pixel coordinates and dr_c the depth value of the three-dimensional point in the projected RGB image.
Step (4.4): since the pixel coordinates px_c, py_c obtained in step (4.3) are usually not integers, the gray value at pixel coordinates (u_c, v_c) of the projected RGB image is obtained by linear interpolation from the original RGB image of the corresponding frame at (px_c, py_c).
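The linear interpolation of step (4.4) is standard bilinear sampling; a minimal single-channel sketch follows (applying it per channel recovers RGB; coordinates are assumed in-bounds):

```python
import numpy as np

def bilinear_sample(img, px, py):
    """Step (4.4) sketch: sample img at non-integer coordinates (px, py)
    by bilinear interpolation. img: HxW float array; px, py: 1-D arrays."""
    H, W = img.shape
    x0 = np.clip(np.floor(px).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(py).astype(int), 0, H - 2)
    fx, fy = px - x0, py - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bot = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy
```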
Step five, fusing the projected RGB images
Step (5.1): for each RGB image C_i, its degree of motion blur blu_i is estimated; then, for each pixel of the projected RGB image, a weight is calculated by combining it with the weight of the corresponding projected depth image pixel:

w_c^i(ur, vr) = g(w_d^i(ur, vr), blu_i)    (9)  [exact formula rendered as an image in the source]

where w_c^i is the weight of the pixels of the i-th frame's projected RGB image; projected RGB images whose source image has a low degree of motion blur receive higher weights.
Step (5.2): the gray value of each pixel of the RGB key frame is calculated from the gray values and weights of all projected RGB images in the group as their weighted median:

c_keyframe(ur, vr) = weighted-median_i { c_i(ur, vr) ; w_c^i(ur, vr) }    (10)

where c_keyframe is the pixel value of the group's fused RGB key frame and c_i is the gray value of the i-th frame's projected RGB image.
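A sketch of step five follows. The blur metric and the weight combination of equation (9) are not spelled out in the text (the formula survives only as an image), so the sketch substitutes an inverse variance-of-Laplacian blur score and divides the depth weight by it; both are assumptions that reproduce the stated behaviour of giving sharper frames higher weight.

```python
import numpy as np

def blur_degree(gray):
    """Illustrative blur score (an assumption): inverse variance of a
    4-neighbour Laplacian, so blurrier images score higher."""
    lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0) +
           np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4.0 * gray)
    return 1.0 / (lap.var() + 1e-6)

def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def fuse_rgb_keyframe(proj_grays, depth_weights, blurs):
    """Eq. (10) sketch: per-pixel weighted median over the group's projected
    gray images (per-pixel loops kept for clarity, not speed)."""
    H, W = proj_grays[0].shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            vals = [g[y, x] for g in proj_grays]
            ws = [wd[y, x] / b for wd, b in zip(depth_weights, blurs)]
            if sum(ws) > 0:
                out[y, x] = weighted_median(vals, ws)
    return out
```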

Claims (3)

1. A key frame extraction method for RGBD three-dimensional reconstruction, characterized by comprising the following steps:
step (1): grouping, according to time stamps, the RGBD data stream collected by a calibrated depth camera and a calibrated color camera, where the depth images and RGB images of several temporally adjacent frames, together with the camera poses estimated by a visual odometer, form one group of data;
step (2): for each group of data, mapping each depth image into three-dimensional space according to the intrinsic parameters of the depth camera, then projecting it onto the group's first depth image according to the camera poses to obtain a projected depth image, updating during projection the pixel values at the adjacent integer coordinates of the projected depth image, and taking, for each pixel of the projected depth image, the depth value closest to the group's first depth image as the actual depth value;
step (3): calculating a weight for each pixel of the projected depth image according to the error of the corresponding projection coordinates, and calculating the final pixel values of the depth key frame as a weighted average over the group's projected depth images;
step (4): mapping the first RGB image of each group into three-dimensional space according to the intrinsic and extrinsic parameters of the color camera and the depth camera and the pixel values of the depth key frame, then projecting it onto the remaining RGB images of the group according to each frame's camera pose, and calculating the gray value of each pixel by linear interpolation to obtain the projected RGB image corresponding to each frame;
step (5): calculating the degree of motion blur of each input RGB image, calculating the weight of each pixel of the corresponding projected RGB image by combining it with the weight of the projected depth image, and calculating the gray values of the key frame as a weighted median over the group's projected RGB images.
2. The key frame extraction method for RGBD three-dimensional reconstruction according to claim 1, wherein in step (1) the number of frames in each group of data is 5.
3. The key frame extraction method for RGBD three-dimensional reconstruction according to claim 1, wherein in step (5) the weight of a projected RGB image pixel is calculated considering both the weight of the corresponding projected depth image pixel and the degree of image blur.
CN201611222413.XA 2016-12-27 2016-12-27 RGBD three-dimensional reconstruction-oriented key frame extraction method Active CN106875437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611222413.XA CN106875437B (en) 2016-12-27 2016-12-27 RGBD three-dimensional reconstruction-oriented key frame extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611222413.XA CN106875437B (en) 2016-12-27 2016-12-27 RGBD three-dimensional reconstruction-oriented key frame extraction method

Publications (2)

Publication Number Publication Date
CN106875437A CN106875437A (en) 2017-06-20
CN106875437B 2020-03-17

Family

ID=59164925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611222413.XA Active CN106875437B (en) 2016-12-27 2016-12-27 RGBD three-dimensional reconstruction-oriented key frame extraction method

Country Status (1)

Country Link
CN (1) CN106875437B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292965B (en) * 2017-08-03 2020-10-13 北京航空航天大学青岛研究院 Virtual and real shielding processing method based on depth image data stream
CN107862735B (en) * 2017-09-22 2021-03-05 北京航空航天大学青岛研究院 RGBD three-dimensional scene reconstruction method based on structural information
CN108307174A (en) * 2018-01-26 2018-07-20 上海深视信息科技有限公司 A kind of depth image sensor precision improvement method and system
CN109191526B (en) * 2018-09-10 2020-07-07 杭州艾米机器人有限公司 Three-dimensional environment reconstruction method and system based on RGBD camera and optical encoder
CN109544677B (en) * 2018-10-30 2020-12-25 山东大学 Indoor scene main structure reconstruction method and system based on depth image key frame
CN109658449B (en) * 2018-12-03 2020-07-10 华中科技大学 Indoor scene three-dimensional reconstruction method based on RGB-D image
CN110503688B (en) * 2019-08-20 2022-07-22 上海工程技术大学 Pose estimation method for depth camera
CN111127633A (en) * 2019-12-20 2020-05-08 支付宝(杭州)信息技术有限公司 Three-dimensional reconstruction method, apparatus, and computer-readable medium
CN112802183A (en) * 2021-01-20 2021-05-14 深圳市日出印像数字科技有限公司 Method and device for reconstructing three-dimensional virtual scene and electronic equipment
CN113359154A (en) * 2021-05-24 2021-09-07 邓良波 Indoor and outdoor universal high-precision real-time measurement method
CN113938664A (en) * 2021-09-10 2022-01-14 思特威(上海)电子科技股份有限公司 Signal acquisition method of pixel array, image sensor, equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034267A (en) * 2010-11-30 2011-04-27 中国科学院自动化研究所 Three-dimensional reconstruction method of target based on attention
KR20140108828A (en) * 2013-02-28 2014-09-15 한국전자통신연구원 Apparatus and method of camera tracking
CN103247075B (en) * 2013-05-13 2015-08-19 北京工业大学 Based on the indoor environment three-dimensional rebuilding method of variation mechanism
US9779508B2 (en) * 2014-03-26 2017-10-03 Microsoft Technology Licensing, Llc Real-time three-dimensional reconstruction of a scene from a single camera
CN104537709B (en) * 2014-12-15 2017-09-29 西北工业大学 It is a kind of that method is determined based on the real-time three-dimensional reconstruction key frame that pose changes
CN105809681A (en) * 2016-03-04 2016-07-27 清华大学 Single camera based human body RGB-D data restoration and 3D reconstruction method
CN106251399B (en) * 2016-08-30 2019-04-16 广州市绯影信息科技有限公司 A kind of outdoor scene three-dimensional rebuilding method and implementing device based on lsd-slam

Also Published As

Publication number Publication date
CN106875437A (en) 2017-06-20

Similar Documents

Publication Publication Date Title
CN106875437B (en) RGBD three-dimensional reconstruction-oriented key frame extraction method
CN110264416B (en) Sparse point cloud segmentation method and device
CN107833253B (en) RGBD three-dimensional reconstruction texture generation-oriented camera attitude optimization method
CN106780576B (en) RGBD data stream-oriented camera pose estimation method
CN113052835B (en) Medicine box detection method and system based on three-dimensional point cloud and image data fusion
CN110706269B (en) Binocular vision SLAM-based dynamic scene dense modeling method
CN111027415B (en) Vehicle detection method based on polarization image
CN109525786B (en) Video processing method and device, terminal equipment and storage medium
CN113362247A (en) Semantic live-action three-dimensional reconstruction method and system of laser fusion multi-view camera
CN110688905A (en) Three-dimensional object detection and tracking method based on key frame
CN102457724B (en) Image motion detecting system and method
CN109934873B (en) Method, device and equipment for acquiring marked image
CN111524233A (en) Three-dimensional reconstruction method for dynamic target of static scene
CN104079800A (en) Shaking preventing method for video image in video surveillance
CN110544294A (en) dense three-dimensional reconstruction method based on panoramic video
CN112232356A (en) Event camera denoising method based on cluster degree and boundary characteristics
CN112085031A (en) Target detection method and system
KR20140074201A (en) Tracking device
TW201436552A (en) Method and apparatus for increasing frame rate of an image stream using at least one higher frame rate image stream
KR101125061B1 (en) A Method For Transforming 2D Video To 3D Video By Using LDI Method
CN112906675B (en) Method and system for detecting non-supervision human body key points in fixed scene
CN110322479B (en) Dual-core KCF target tracking method based on space-time significance
CN111160262A (en) Portrait segmentation method fusing human body key point detection
CN106446764B (en) Video object detection method based on improved fuzzy color aggregated vector
CN115273080A (en) Lightweight visual semantic odometer method for dynamic scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant