CN106910242B - Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera

Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera

Info

Publication number
CN106910242B
CN106910242B (application CN201710051366.5A)
Authority
CN
China
Prior art keywords
depth image
depth
frame
fusion
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710051366.5A
Other languages
Chinese (zh)
Other versions
CN106910242A (en)
Inventor
李建伟
高伟
吴毅红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201710051366.5A priority Critical patent/CN106910242B/en
Publication of CN106910242A publication Critical patent/CN106910242A/en
Application granted granted Critical
Publication of CN106910242B publication Critical patent/CN106910242B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
          • G06T5/00 Image enhancement or restoration
            • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
            • G06T5/70 Denoising; Smoothing
          • G06T2207/00 Indexing scheme for image analysis or image enhancement
            • G06T2207/20 Special algorithmic details
              • G06T2207/20024 Filtering details
                • G06T2207/20028 Bilateral filtering
              • G06T2207/20212 Image combination
                • G06T2207/20221 Image fusion; Image merging
            • G06T2207/30 Subject of image; Context of image processing
              • G06T2207/30244 Camera pose

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and a system for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera. The method comprises the steps of obtaining a depth image and carrying out adaptive bilateral filtering; carrying out visual odometry estimation by utilizing the filtered depth image, automatically segmenting the image sequence based on visual content, carrying out closed-loop detection between segments, and carrying out global optimization; and performing weighted volume data fusion according to the optimized camera trajectory information so as to reconstruct an indoor complete scene three-dimensional model. The embodiment of the invention achieves edge-preserving denoising of the depth map through the adaptive bilateral filtering algorithm, effectively reduces the accumulated error in visual odometry estimation and improves registration precision through the automatic segmentation algorithm based on visual content, and effectively preserves the geometric details of object surfaces by adopting the weighted volume data fusion algorithm. The technical problem of how to improve three-dimensional reconstruction precision in indoor scenes is thereby solved, and a complete, accurate and refined indoor scene model can be obtained.

Description

Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method and a system for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera.
Background
High-precision three-dimensional reconstruction of an indoor scene is one of challenging research subjects in computer vision, and relates to theories and technologies in multiple fields of computer vision, computer graphics, pattern recognition, optimization and the like. There are many ways to realize three-dimensional reconstruction, and the traditional method adopts ranging sensors such as laser and radar or structured light technology to acquire the structural information of the scene or object surface for three-dimensional reconstruction, but most of the instruments are expensive and not easy to carry, so the application occasions are limited. With the development of computer vision technology, researchers have begun to research three-dimensional reconstruction using purely visual methods, and a great deal of useful research work has emerged.
With the release of the Microsoft Kinect, a consumer-grade depth camera, people can directly and conveniently reconstruct indoor scenes in three dimensions using depth data. The KinectFusion algorithm proposed by Newcombe et al. obtains depth information for each point in the image using the Kinect, aligns the coordinates of the three-dimensional points in the current frame's camera coordinate system with the coordinates in a global model through the Iterative Closest Point (ICP) algorithm to estimate the pose of the current frame's camera, and iteratively performs volume data fusion with a Truncated Signed Distance Function (TSDF) to obtain a dense three-dimensional model. Although the depth acquired by the Kinect is not affected by illumination conditions or texture richness, the depth range is only 0.5-4 m, and the position and size of the grid model are fixed, so the method is only suitable for local and static indoor scenes.
Three-dimensional reconstruction of indoor scenes based on a consumer-grade depth camera generally faces the following problems: (1) the depth images acquired by a consumer-grade depth camera are low in resolution and high in noise, so details on object surfaces are difficult to preserve, and the limited depth range means the data cannot be used directly for three-dimensional reconstruction of a complete scene; (2) accumulated errors produced by camera pose estimation can lead to erroneous and distorted three-dimensional models; (3) a consumer-grade depth camera is generally hand-held during shooting, the motion of the camera is arbitrary, and the quality of the captured data varies, which affects the reconstruction result.
To perform a complete three-dimensional reconstruction of an indoor scene, Whelan et al. proposed the Kintinuous algorithm, a further extension of KinectFusion. The algorithm solves the GPU-memory consumption of the grid model when reconstructing large scenes by cyclically reusing memory with a Shifting TSDF Volume, searches for matching key frames through DBoW for closed-loop detection, and finally optimizes the pose graph and the model, thereby obtaining a large-scene three-dimensional model. Choi et al. proposed the Elastic Fragments idea: the RGBD data stream is segmented every 50 frames, visual odometry estimation is performed on each segment separately, the geometric descriptor FPFH is extracted from the point cloud data between every two segments to search for matches for closed-loop detection, line processes constraints are introduced to optimize the detection result and remove wrong closed loops, and finally volume data fusion is performed using the optimized odometry information. Reconstruction of an indoor complete scene is realized through segmented processing and closed-loop detection, but the preservation of local geometric details of objects is not considered, and the fixed segmentation scheme is not robust when reconstructing real indoor scenes. Zeng et al. proposed the 3DMatch descriptor: the RGBD data stream is subjected to fixed segmentation and reconstructed to obtain local models, key points are extracted from each segmented 3D model as the input of a 3D convolutional network (ConvNet), the feature vectors learned by the network are fed to a metric network, and matching results are output through similarity comparison. Because deep networks have clear advantages in feature learning, geometric registration with 3DMatch can improve reconstruction accuracy compared with other descriptors. However, the method needs to perform local three-dimensional reconstruction first, carry out geometric registration with a deep learning network, and then output a global three-dimensional model, and network training requires a large amount of data, so the efficiency of the whole reconstruction pipeline is low.
In terms of improving three-dimensional reconstruction precision, Angela et al. proposed the VSBR algorithm, whose main idea is to hierarchically optimize the TSDF data using the Shape from Shading (SFS) technique before fusion, so as to address the loss of object surface details caused by over-smoothing during TSDF data fusion, thereby obtaining a relatively fine three-dimensional structure model. However, the method is only effective for single-object reconstruction under an ideal light source; for indoor scenes with large lighting variation, the accuracy improvement is not obvious.
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
In order to solve the above problems in the prior art, that is, to solve the technical problem of how to improve the three-dimensional reconstruction accuracy in an indoor scene, a method and a system for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera are provided.
In order to achieve the above object, on one hand, the following technical solutions are provided:
a method for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera may include:
acquiring a depth image;
performing adaptive bilateral filtering on the depth image;
carrying out block fusion and registration processing based on visual contents on the filtered depth image;
and according to the processing result, performing weighted volume data fusion so as to reconstruct an indoor complete scene three-dimensional model.
Preferably, the performing adaptive bilateral filtering on the depth image specifically includes:
adaptive bilateral filtering is performed according to:

Ẑ(u) = (1/W) · Σ_{u_k ∈ N(u)} w_s(‖u − u_k‖) · w_c(|Z(u) − Z(u_k)|) · Z(u_k)

wherein u and u_k respectively represent any pixel on the depth image and a pixel in its neighborhood N(u); Z(u) and Z(u_k) respectively represent the depth values corresponding to u and u_k; Ẑ(u) represents the corresponding depth value after filtering; W represents the normalization factor over the neighborhood N(u); w_s and w_c represent the Gaussian kernel functions for filtering in the spatial domain and the value domain, respectively.
Preferably, the Gaussian kernel functions for filtering in the spatial domain and the value domain are determined according to the following equations:

w_s(‖u − u_k‖) = exp(−‖u − u_k‖² / (2δ_s²)),  w_c(|Z(u) − Z(u_k)|) = exp(−(Z(u) − Z(u_k))² / (2δ_c²))

wherein δ_s and δ_c are the variances of the spatial-domain and value-domain Gaussian kernel functions, respectively;
wherein δ_s and δ_c are determined according to the following formula:
Figure BDA0001217870140000034
wherein f represents the focal length of the depth camera, and K_s and K_c represent constants.
Preferably, the process of performing the visual content-based block fusion and registration on the filtered depth image specifically includes: and segmenting the depth image sequence based on visual content, performing block fusion on each segment, performing closed-loop detection between the segments, and performing global optimization on the result of the closed-loop detection.
Preferably, the segmenting the depth image sequence based on the visual content, performing block fusion on each segment, performing closed-loop detection between the segments, and performing global optimization on the result of the closed-loop detection specifically includes:
segmenting a depth image sequence based on an automatic segmentation method for visual content detection, dividing similar depth image contents into segments, performing block fusion on each segment, determining a transformation relation between the depth images, and performing closed-loop detection between the segments according to the transformation relation so as to realize global optimization.
Preferably, the automatic segmentation method based on visual content detection is configured to segment a depth image sequence, divide similar depth image contents into one segment, perform block fusion on each segment, determine a transformation relationship between the depth images, and perform closed-loop detection between segments according to the transformation relationship, so as to implement global optimization, and specifically includes:
estimating visual odometry using the Kintinuous framework to obtain the camera pose information for each frame of the depth image;
according to the camera pose information, back projecting the point cloud data corresponding to each frame of depth image to an initial coordinate system, comparing the similarity of the depth image obtained after projection with the depth image of the initial frame, and initializing a camera pose and segmenting when the similarity is lower than a similarity threshold value;
extracting the FPFH geometric descriptor from each segment's point cloud data, performing coarse registration between every two segments, and performing fine registration using the GICP algorithm to obtain the matching relationship between the segments;
and constructing a graph by using the pose information of each segment and the matching relations between the segments, and performing graph optimization using the G2O framework to obtain optimized camera trajectory information, thereby realizing the global optimization.
Preferably, the back projecting the point cloud data corresponding to each frame of depth image to an initial coordinate system according to the camera pose information, comparing the similarity between the depth image obtained after projection and the depth image of the initial frame, and initializing a camera pose and segmenting when the similarity is lower than a similarity threshold, specifically comprising:
step 1: calculating the similarity between each frame of depth image and a first frame of depth image;
step 2: judging whether the similarity is lower than a similarity threshold value;
step 3: if yes, segmenting the depth image sequence;
step 4: taking the depth image of the next frame as the starting-frame depth image of the next segment, and repeatedly executing step 1 and step 2 until all frames of the depth image have been processed.
Preferably, the step 1 specifically includes:
according to the projection relation and the depth values of any frame of the depth image, calculating the first spatial three-dimensional point corresponding to each pixel on the depth image using the following formula:

p = π⁻¹(u_p, Z(u_p))

wherein u_p is any pixel on the depth image; Z(u_p) and p respectively represent the depth value corresponding to u_p and the first spatial three-dimensional point; π represents the projection relation;

and rotating and translating the first spatial three-dimensional point into the world coordinate system according to the following formula to obtain the second spatial three-dimensional point:

q = T_i · p

wherein T_i represents the rotation-translation matrix from the spatial three-dimensional points corresponding to the depth map of the i-th frame to the world coordinate system; p represents the first spatial three-dimensional point, and q represents the second spatial three-dimensional point; i is a positive integer;

and back-projecting the second spatial three-dimensional point onto the two-dimensional image plane according to the following formula to obtain the projected depth image:

u_q = [f_x · x_q / z_q + c_x, f_y · y_q / z_q + c_y]ᵀ

wherein u_q is the pixel on the projected depth image corresponding to q; f_x, f_y, c_x and c_y represent the intrinsic parameters of the depth camera; x_q, y_q, z_q represent the coordinates of q; T denotes the transpose of a matrix;

and respectively counting the number of valid pixels on the starting-frame depth image and on the projected depth image of any frame, and taking the ratio of the two as the similarity.
Preferably, performing weighted volume data fusion according to the processing result so as to reconstruct the indoor complete scene three-dimensional model specifically includes: according to the processing result, fusing each frame of the depth image using a truncated signed distance function grid model, and representing the three-dimensional space with a voxel grid, so as to obtain the indoor complete scene three-dimensional model.
Preferably, fusing each frame of the depth image using a truncated signed distance function grid model according to the processing result, and representing the three-dimensional space with a voxel grid so as to obtain the indoor complete scene three-dimensional model, specifically includes:
performing weighted fusion of the truncated signed distance function data using the volumetric method framework, based on the noise characteristics and the region of interest;
and extracting a Mesh model using the Marching Cubes algorithm, thereby obtaining the indoor complete scene three-dimensional model.
Preferably, the truncated signed distance function is determined according to:

f_i(v) = [K⁻¹ z_i(u) [uᵀ, 1]ᵀ]_z − [v_i]_z

wherein f_i(v) represents the truncated signed distance function, i.e. the distance from the grid cell to the object model surface, whose sign indicates whether the cell lies on the occluded side or the visible side of the surface, the zero-crossings being points on the surface; K represents the intrinsic parameter matrix of the camera; u represents a pixel; z_i(u) represents the depth value corresponding to the pixel u; v_i represents the voxel.
Preferably, the weighted data fusion is performed according to the following formulas:

F(v) = Σ_i w_i(v) · f_i(v) / Σ_i w_i(v),  W(v) = Σ_i w_i(v),  i = 1, …, n

wherein v represents a voxel; f_i(v) and w_i(v) respectively represent the truncated signed distance function corresponding to the voxel v and its weight function; n is a positive integer; F(v) represents the fused truncated signed distance function value corresponding to the voxel v; W(v) represents the weight of the fused truncated signed distance function value corresponding to the voxel v;
wherein the weight function may be determined according to the following equation:
Figure BDA0001217870140000062
wherein d_i represents the radius of the region of interest; δ_s is the noise variance in the depth data; W is a constant.
In order to achieve the above object, in another aspect, there is also provided a system for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera, the system including:
the acquisition module is used for acquiring a depth image;
the filtering module is used for carrying out self-adaptive bilateral filtering on the depth image;
the block fusion and registration module is used for carrying out block fusion and registration processing based on visual content on the filtered depth image;
and the volume data fusion module is used for carrying out weighted volume data fusion according to the processing result so as to reconstruct an indoor complete scene three-dimensional model.
Preferably, the filtering module is specifically configured to:
adaptive bilateral filtering is performed according to:

Ẑ(u) = (1/W) · Σ_{u_k ∈ N(u)} w_s(‖u − u_k‖) · w_c(|Z(u) − Z(u_k)|) · Z(u_k)

wherein u and u_k respectively represent any pixel on the depth image and a pixel in its neighborhood N(u); Z(u) and Z(u_k) respectively represent the depth values corresponding to u and u_k; Ẑ(u) represents the corresponding depth value after filtering; W represents the normalization factor over the neighborhood N(u); w_s and w_c represent the Gaussian kernel functions for filtering in the spatial domain and the value domain, respectively.
Preferably, the block fusion and registration module may be specifically configured to: and segmenting the depth image sequence based on visual content, performing block fusion on each segment, performing closed-loop detection between the segments, and performing global optimization on the result of the closed-loop detection.
Preferably, the block fusion and registration module may be further specifically configured to:
segmenting a depth image sequence based on an automatic segmentation method for visual content detection, dividing similar depth image contents into segments, performing block fusion on each segment, determining a transformation relation between the depth images, and performing closed-loop detection between the segments according to the transformation relation so as to realize global optimization.
Preferably, the block fusion and registration module specifically includes:
the camera pose information acquisition unit is used for estimating visual odometry using the Kintinuous framework to obtain the camera pose information for each frame of the depth image;
the segmentation unit is used for back projecting the point cloud data corresponding to each frame of depth image to an initial coordinate system according to the camera pose information, comparing the similarity of the depth image obtained after projection with the depth image of the initial frame, and initializing a camera pose for segmentation when the similarity is lower than a similarity threshold value;
the registration unit is used for extracting the FPFH geometric descriptor from each segment's point cloud data, performing coarse registration between every two segments, and performing fine registration using the GICP algorithm to obtain the matching relationship between the segments;
and the optimization unit is used for constructing a graph by using the pose information of each segment and the matching relations between the segments, and performing graph optimization using the G2O framework to obtain optimized camera trajectory information, so as to realize the global optimization.
Preferably, the segmentation unit specifically includes:
the calculating unit is used for calculating the similarity between each frame of depth image and the first frame of depth image;
the judging unit is used for judging whether the similarity is lower than a similarity threshold value;
a segmentation subunit, configured to segment the depth image sequence when the similarity is lower than a similarity threshold;
and the processing unit is used for taking the next frame depth image as the starting frame depth image of the next segmentation, and repeatedly executing the calculating unit and the judging unit until all the frame depth images are processed.
Preferably, the volume data fusion module is specifically configured to: according to the processing result, fuse each frame of the depth image using a truncated signed distance function grid model, and represent the three-dimensional space with a voxel grid, so as to obtain the indoor complete scene three-dimensional model.
Preferably, the volume data fusion module specifically includes:
the weighted fusion unit is used for performing weighted fusion of the truncated signed distance function data using the volumetric method framework, based on the noise characteristics and the region of interest;
and the extraction unit is used for extracting the Mesh model by adopting a Marching cubes algorithm so as to obtain the indoor complete scene three-dimensional model.
The embodiment of the invention provides a method and a system for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera. The method comprises the steps of obtaining a depth image; carrying out adaptive bilateral filtering on the depth image; carrying out visual content-based block fusion and registration processing on the filtered depth image; and performing weighted volume data fusion according to the processing result so as to reconstruct an indoor complete scene three-dimensional model. By performing visual content-based block fusion and registration on the depth images, the embodiment of the invention can effectively reduce the accumulated error in visual odometry estimation and improve registration precision, and by adopting the weighted volume data fusion algorithm it can effectively preserve the geometric details of object surfaces. The technical problem of how to improve three-dimensional reconstruction precision in indoor scenes is thereby solved, and a complete, accurate and refined indoor scene model can be obtained.
Drawings
Fig. 1 is a schematic flow chart of a method for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera according to an embodiment of the present invention;
FIG. 2a is a color image corresponding to a depth image according to an embodiment of the present invention;
FIG. 2b is a schematic diagram of a point cloud derived from a depth image according to an embodiment of the invention;
FIG. 2c is a schematic diagram of a point cloud obtained by bilateral filtering of a depth image according to an embodiment of the present invention;
FIG. 2d is a schematic diagram of a point cloud obtained by performing adaptive bilateral filtering on a depth image according to an embodiment of the present invention;
FIG. 3 is a flow chart of visual content segmentation based fusion and registration according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a weighted volumetric data fusion process according to an embodiment of the present invention;
FIG. 5a is a schematic diagram of a three-dimensional reconstruction result using a non-weighted volumetric data fusion algorithm;
FIG. 5b is a schematic partial detail view of the three-dimensional model of FIG. 5 a;
fig. 5c is a schematic diagram of a three-dimensional reconstruction result obtained by the weighted volume data fusion algorithm according to the embodiment of the present invention;
FIG. 5d is a schematic partial detail view of the three-dimensional model of FIG. 5 c;
fig. 6 is a schematic diagram illustrating an effect of performing three-dimensional reconstruction on a 3D Scene Data set by using the method according to the embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating the effect of three-dimensional reconstruction performed on an Augmented ICL-NUIM Dataset by using the method proposed by the embodiment of the present invention according to the embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating an effect of three-dimensional reconstruction using indoor scene data collected by Microsoft Kinect for Windows according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a system for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
The embodiment of the invention provides a method for carrying out indoor complete scene three-dimensional reconstruction based on a consumption-level depth camera. As shown in fig. 1, the method includes:
s100: a depth image is acquired.
Specifically, the step may include: acquiring depth images with a consumer-grade depth camera based on the structured-light principle.
A consumer-grade depth camera based on the structured-light principle (Microsoft Kinect for Windows and similar devices, hereinafter referred to as the depth camera) obtains the depth data of a depth image by emitting structured light and receiving the reflected information.
In practical applications, a handheld consumer-grade depth camera Microsoft Kinect for Windows can be used to collect real indoor scene data.
The depth data may be calculated according to the following equation:

Z = f · b / d

wherein Z represents the depth; f represents the focal length of the consumer-grade depth camera; b represents the baseline; and d represents the disparity.
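As an illustration of this triangulation relation, the following minimal NumPy sketch converts a disparity map to depth; the focal length, baseline, and the treatment of zero (invalid) disparities are assumed placeholder choices rather than values taken from the patent:

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Structured-light triangulation: Z = f * b / d, with d = 0 treated as invalid."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > 0                       # zero disparity means no measurement
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth

# Example with assumed Kinect-like parameters (f ~ 580 px, baseline ~ 7.5 cm).
depth_map = disparity_to_depth(np.random.uniform(10, 60, (480, 640)), 580.0, 0.075)
```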
S110: and carrying out self-adaptive bilateral filtering on the depth image.
The step carries out self-adaptive bilateral filtering on the acquired depth image by using the noise characteristics of the consumer-grade depth camera based on the structured light principle.
The adaptive bilateral filtering algorithm is to perform filtering in both the spatial domain and the value domain of the depth image.
In practical application, parameters of the adaptive bilateral filtering algorithm can be set according to the noise characteristics and internal parameters of the depth camera, so that noise can be effectively removed and edge information is kept.
Taking the partial derivative of the depth Z with respect to the disparity d, the following relationship holds:

∂Z/∂d = −f · b / d² = −Z² / (f · b)

The noise in the depth data is mainly generated during quantization, and it can be seen from the above equation that the variance of the depth noise is proportional to the square of the depth value, i.e. the larger the depth value, the larger the noise. In order to effectively remove noise from the depth image, the embodiment of the invention defines the filtering algorithm based on this noise characteristic.
Specifically, the adaptive bilateral filtering may be performed according to the following formula:

Ẑ(u) = (1/W) · Σ_{u_k ∈ N(u)} w_s(‖u − u_k‖) · w_c(|Z(u) − Z(u_k)|) · Z(u_k)

wherein u and u_k respectively represent any pixel on the depth image and a pixel in its neighborhood N(u); Z(u) and Z(u_k) respectively represent the depth values corresponding to u and u_k; Ẑ(u) represents the corresponding depth value after filtering; W represents the normalization factor over the neighborhood N(u); w_s and w_c represent the Gaussian kernel functions for filtering in the spatial domain and the value domain, respectively.
In the above embodiment, w_s and w_c can be determined according to the following equations:

w_s(‖u − u_k‖) = exp(−‖u − u_k‖² / (2δ_s²)),  w_c(|Z(u) − Z(u_k)|) = exp(−(Z(u) − Z(u_k))² / (2δ_c²))

wherein δ_s and δ_c are the variances of the spatial-domain and value-domain Gaussian kernel functions, respectively.
The values of δ_s and δ_c are related to the magnitude of the depth value and are not fixed.
Specifically, in the above-described embodiment, δ_s and δ_c can be determined according to the following equation:
where f denotes the focal length of the depth camera, and K_s and K_c denote constants whose specific values are related to the parameters of the depth camera.
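Since the exact expressions for δ_s and δ_c appear only as an image in the original document, the following NumPy sketch treats the depth-dependent deviations as user-supplied functions; the example deviations at the bottom (growing linearly for the spatial term and quadratically for the value term) are illustrative assumptions consistent with the noise model above, not the patent's own parameter settings:

```python
import numpy as np

def adaptive_bilateral_filter(Z, sigma_s_fn, sigma_c_fn, radius=3):
    """Edge-preserving denoising of a depth map Z.

    sigma_s_fn / sigma_c_fn map a depth value to the spatial / value-domain
    Gaussian deviations, so both kernels adapt to the depth-dependent noise
    level; pixels with Z <= 0 are treated as invalid and left unchanged.
    """
    H, W = Z.shape
    out = Z.copy()
    for y in range(H):
        for x in range(W):
            z = Z[y, x]
            if z <= 0:
                continue
            ss, sc = sigma_s_fn(z), sigma_c_fn(z)
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            patch = Z[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            ws = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2.0 * ss ** 2))
            wc = np.exp(-((patch - z) ** 2) / (2.0 * sc ** 2))
            w = ws * wc * (patch > 0)          # ignore invalid neighbours
            out[y, x] = (w * patch).sum() / w.sum()
    return out

# Usage with assumed depth-dependent deviations (placeholders, not the patent's
# Ks/Kc formula): spatial deviation grows linearly, value-domain quadratically.
Z = np.random.uniform(0.5, 4.0, (120, 160)).astype(np.float32)
Z_filtered = adaptive_bilateral_filter(Z, lambda z: 1.0 + 0.5 * z, lambda z: 0.01 * z * z)
```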
Fig. 2a-d schematically show a comparison of the effect of different filtering algorithms. Wherein fig. 2a shows a color image corresponding to the depth image. Fig. 2b shows a point cloud obtained from a depth image. Fig. 2c shows the point cloud resulting from bilateral filtering of the depth image. Fig. 2d shows the point cloud resulting from adaptive bilateral filtering of the depth image.
The embodiment of the invention achieves edge-preserving denoising of the depth map by adopting the adaptive bilateral filtering method.
S120: and carrying out visual content-based block fusion and registration processing on the depth image.
This step segments the depth image sequence based on visual content, performs block fusion on each segment, performs closed-loop detection between segments, and performs global optimization on the closed-loop detection results. The depth image sequence here is the depth image data stream.
Preferably, this step may include: segmenting the depth image sequence with an automatic segmentation method based on visual content, dividing depth images with similar content into one segment, performing block fusion on each segment, determining the transformation relations between the depth images, performing closed-loop detection between the segments according to these transformation relations, and thereby realizing global optimization.
Further, the step may include:
s121: and (3) estimating a visual odometer by adopting a Kintinuous framework to obtain the camera pose information under each frame of depth image.
S122: and according to the camera pose information, back projecting the point cloud data corresponding to each frame of depth image to an initial coordinate system, comparing the similarity of the depth image obtained after projection with the depth image of the initial frame, and initializing the camera pose for segmentation when the similarity is lower than a similarity threshold value.
S123: extracting PFFH geometric descriptors in each segmented point cloud data, performing coarse registration between each two segments, and performing fine registration by adopting a GICP algorithm to obtain a matching relationship between the segments.
In the step, closed loop detection is carried out between the sections.
S124: and constructing a graph by using the pose information of each segment and the matching relation between the segments, and performing graph optimization by using a G2O frame to obtain optimized camera track information, thereby realizing global optimization.
In this step, the SLAC (Simultaneous Localization and Calibration) scheme is applied during optimization to correct non-rigid distortion, and line processes constraints are introduced to remove erroneous closed-loop matches.
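To make the graph construction in this step concrete, here is a minimal sketch of the pose-graph bookkeeping it implies; the class names and the switchable line-process weight on loop-closure edges are illustrative assumptions, and the actual non-linear optimization (G2O with SLAC) would be delegated to an external solver rather than reproduced here:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class PoseNode:
    seg_id: int
    pose: np.ndarray              # 4x4 rigid transform of the segment in world coordinates

@dataclass
class PoseEdge:
    src: int
    dst: int
    relative: np.ndarray          # 4x4 relative transform from inter-segment registration
    line_process: float = 1.0     # switchable weight; driven towards 0 for wrong closures

@dataclass
class PoseGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_segments(self, segment_poses):
        """Create one node per segment and chain consecutive segments by odometry."""
        for i, T in enumerate(segment_poses):
            self.nodes.append(PoseNode(i, T))
            if i > 0:
                rel = np.linalg.inv(segment_poses[i - 1]) @ T
                self.edges.append(PoseEdge(i - 1, i, rel))

    def add_loop_closure(self, i, j, T_ij):
        """Edge from FPFH coarse + GICP fine registration between segments i and j."""
        self.edges.append(PoseEdge(i, j, T_ij))
```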
The step S122 may further specifically include:
s1221: and calculating the similarity of each frame of depth image and the first frame of depth image.
S1222: judging whether the similarity is lower than a similarity threshold, if so, executing a step S1223; otherwise, step S1224 is performed.
S1223: the sequence of depth images is segmented.
This step segments the depth image sequence based on visual content. It can thus effectively alleviate the accumulated error produced by visual odometry estimation and fuse similar content together, thereby improving registration accuracy.
S1224: the depth image sequence is not segmented.
S1225: the next frame depth image is taken as the starting frame depth image of the next segment, and step S1221 and step S1222 are repeatedly performed until all frame depth images are processed.
In the above embodiment, the step of calculating the similarity between each frame of depth image and the first frame of depth image may specifically include:
s12211: and calculating a first space three-dimensional point corresponding to each pixel on the depth image according to the projection relation and the depth value of any frame of depth image by using the following formula:
p = π⁻¹(u_p, Z(u_p))

wherein u_p is any pixel on the depth image; Z(u_p) and p respectively represent the depth value corresponding to u_p and the first spatial three-dimensional point; π represents the projection relation, i.e. the 2D-3D projective transformation by which the point cloud data corresponding to each frame of the depth image is back-projected to the initial coordinate system.
S12212: rotating and translating the first spatial three-dimensional point into the world coordinate system according to the following formula to obtain the second spatial three-dimensional point:

q = T_i · p

wherein T_i represents the rotation-translation matrix from the spatial three-dimensional points corresponding to the depth map of the i-th frame to the world coordinate system, which can be obtained from the visual odometry estimation; i is a positive integer; p represents the first spatial three-dimensional point and q represents the second spatial three-dimensional point, with coordinates respectively:

p = (x_p, y_p, z_p),  q = (x_q, y_q, z_q).

S12213: back-projecting the second spatial three-dimensional point onto the two-dimensional image plane according to the following formula to obtain the projected depth image:

u_q = [f_x · x_q / z_q + c_x, f_y · y_q / z_q + c_y]ᵀ

wherein u_q is the pixel on the projected depth image corresponding to q; f_x, f_y, c_x and c_y represent the intrinsic parameters of the depth camera; x_q, y_q, z_q represent the coordinates of q; T denotes the transpose of a matrix.
S12214: respectively counting the number of valid pixels on the starting-frame depth image and on the projected depth image of any frame, and taking the ratio of the two as the similarity.
For example, the similarity is calculated according to the following formula:

ρ = n_i / n_0

wherein n_0 and n_i respectively represent the number of valid pixels on the starting-frame depth image and on the projected depth image of any frame; ρ represents the similarity.
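The following NumPy sketch strings steps S12211-S12214 and the segmentation decision together; the similarity threshold, the helper names, and the convention that poses are camera-to-world matrices are illustrative assumptions:

```python
import numpy as np

def project_to_start_frame(depth_i, T_i, T_start, fx, fy, cx, cy):
    """Back-project frame i (pi^-1), move its points into the starting frame's camera,
    and re-render them as a depth image on that frame's 2D image plane."""
    H, W = depth_i.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth_i
    valid = z > 0
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z, np.ones_like(z)], axis=-1)
    T = np.linalg.inv(T_start) @ T_i              # frame-i camera -> starting-frame camera
    q = pts[valid] @ T.T
    q = q[q[:, 2] > 1e-6]                         # keep only points in front of the camera
    proj = np.zeros((H, W), dtype=np.float32)
    uq = np.round(fx * q[:, 0] / q[:, 2] + cx).astype(int)
    vq = np.round(fy * q[:, 1] / q[:, 2] + cy).astype(int)
    inside = (uq >= 0) & (uq < W) & (vq >= 0) & (vq < H)
    proj[vq[inside], uq[inside]] = q[inside, 2]
    return proj

def similarity(depth_start, depth_i_projected):
    """rho = n_i / n_0: valid pixels after projection over valid pixels in the start frame."""
    n0 = np.count_nonzero(depth_start > 0)
    ni = np.count_nonzero(depth_i_projected > 0)
    return ni / max(n0, 1)

def segment_sequence(depths, poses, intrinsics, threshold=0.6):
    """Close the current segment whenever the overlap with its start frame drops too low."""
    fx, fy, cx, cy = intrinsics
    segments, start = [], 0
    for i in range(1, len(depths)):
        proj = project_to_start_frame(depths[i], poses[i], poses[start], fx, fy, cx, cy)
        if similarity(depths[start], proj) < threshold:
            segments.append((start, i))
            start = i                              # the next frame starts a new segment
    segments.append((start, len(depths)))
    return segments
```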
Fig. 3 exemplarily shows a flow diagram of fusion, registration based on visual content segmentation.
By adopting the automatic segmentation algorithm based on visual content, the embodiment of the invention can effectively reduce the accumulated error in visual odometry estimation and improve the registration precision.
S130: and according to the processing result, performing weighted volume data fusion so as to reconstruct an indoor complete scene three-dimensional model.
Specifically, the step may include: and according to the block fusion and registration processing result based on the visual content, fusing the depth image of each frame by using a Truncated Symbolic Distance Function (TSDF) grid model, and representing a three-dimensional space by using a voxel grid, thereby obtaining the indoor complete scene three-dimensional model.
The step may further comprise:
s131: and performing weighted fusion on truncated symbol distance function data by using a Volumetric method framework based on the noise characteristic and the interest region.
S132: and (4) extracting the Mesh model by adopting a Marching cubes algorithm.
In practical application, according to the visual odometry estimation result, each frame of the depth image is fused using the TSDF grid model, and the three-dimensional space is represented by a voxel grid with a resolution of m, i.e. the three-dimensional space is divided into m blocks, and each grid cell v stores two values: the truncated signed distance function f_i(v) and its weight w_i(v).
The truncated signed distance function may be determined according to:

f_i(v) = [K⁻¹ z_i(u) [uᵀ, 1]ᵀ]_z − [v_i]_z

wherein f_i(v) represents the truncated signed distance function, i.e. the distance from the grid cell to the object model surface; its sign indicates whether the cell lies on the occluded side or the visible side of the surface, and the zero-crossings are points on the surface; K represents the intrinsic parameter matrix of the camera; u represents a pixel; z_i(u) represents the depth value corresponding to the pixel u; v_i represents the voxel. The camera here may be a depth camera or a depth camcorder.
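A minimal sketch of evaluating f_i(v) for a single voxel against one depth frame follows; the truncation band and the world-to-camera pose convention are illustrative assumptions:

```python
import numpy as np

def tsdf_at_voxel(voxel_world, depth, K, T_cw, trunc=0.03):
    """f_i(v) = [K^-1 z_i(u) [u^T,1]^T]_z - [v_i]_z: measured depth at the voxel's
    projection minus the voxel's depth in camera coordinates, clamped to +/- trunc."""
    v_cam = (T_cw @ np.append(voxel_world, 1.0))[:3]   # voxel in camera coordinates
    if v_cam[2] <= 0:
        return None                                    # behind the camera
    u = K @ (v_cam / v_cam[2])                         # perspective projection
    px, py = int(round(u[0])), int(round(u[1]))
    H, W = depth.shape
    if not (0 <= px < W and 0 <= py < H) or depth[py, px] <= 0:
        return None                                    # outside the image or invalid depth
    sdf = depth[py, px] - v_cam[2]
    return float(np.clip(sdf, -trunc, trunc))
```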
The weighted data fusion may be performed according to the following formulas:

F(v) = Σ_i w_i(v) · f_i(v) / Σ_i w_i(v),  W(v) = Σ_i w_i(v),  i = 1, …, n

wherein f_i(v) and w_i(v) respectively represent the Truncated Signed Distance Function (TSDF) corresponding to the voxel v and its weight function; n is a positive integer; F(v) represents the fused truncated signed distance function value corresponding to the voxel v; W(v) represents the weight of the fused truncated signed distance function value corresponding to the voxel v.
In the above embodiment, the weight function may be determined according to the noise characteristics of the depth data and the region of interest, and its value is not fixed. In order to preserve the geometric details of object surfaces, the weights of low-noise regions and of the region of interest are set large, while the weights of high-noise regions or regions outside the region of interest are set small.
Specifically, the weight function may be determined according to the following equation:
Figure BDA0001217870140000142
wherein d isiThe radius of the interest area is represented, and the smaller the radius is, the more interest is represented, and the weight is larger; deltasThe noise variance in the depth data is consistent with the variance of a kernel function in a spatial domain of the self-adaptive bilateral filtering algorithm in value; w is a constant, which may preferably take the value 1 or 0.
Fig. 4 exemplarily shows a weighted volumetric data fusion process diagram.
By adopting the weighted volume data fusion algorithm, the embodiment of the invention can effectively preserve the geometric details of object surfaces and obtain a complete, accurate and refined indoor scene model, with good robustness and extensibility.
FIG. 5a schematically shows the result of a three-dimensional reconstruction using a non-weighted volumetric data fusion algorithm; FIG. 5b schematically shows a partial detail of the three-dimensional model of FIG. 5 a; FIG. 5c is a schematic diagram illustrating a three-dimensional reconstruction result obtained by using a weighted volume data fusion algorithm proposed by an embodiment of the present invention; fig. 5d schematically shows a partial detail of the three-dimensional model in fig. 5 c.
Fig. 6 is a schematic diagram illustrating an effect of three-dimensional reconstruction on a 3D Scene Data set by using the method proposed by the embodiment of the present invention; FIG. 7 is a schematic diagram illustrating the effect of three-dimensional reconstruction using the method proposed by the embodiment of the present invention on an Augmented ICL-NUIM Dataset; fig. 8 exemplarily shows an effect diagram of three-dimensional reconstruction using indoor scene data acquired by Microsoft Kinect for Windows.
It should be noted that although the embodiments of the present invention have been described herein in the above order, those skilled in the art will appreciate that the present invention may be practiced in other than the order described, and that such simple variations are intended to be within the scope of the present invention.
Based on the same technical concept as the method embodiment, an embodiment of the present invention further provides a system for performing indoor complete scene three-dimensional reconstruction based on a consumer-level depth camera, as shown in fig. 9, where the system 90 includes: an acquisition module 92, a filtering module 94, a block fusion and registration module 96, and a volume data fusion module 98. The obtaining module 92 is configured to obtain a depth image. The filtering module 94 is configured to perform adaptive bilateral filtering on the depth image. The block fusion and registration module 96 is configured to perform a visual content-based block fusion and registration process on the filtered depth image. And the volume data fusion module 98 is used for performing weighted volume data fusion according to the processing result so as to reconstruct an indoor complete scene three-dimensional model.
By adopting the technical scheme, the embodiment of the invention can effectively reduce the accumulated error in the estimation of the visual odometer, improve the registration precision, effectively keep the geometric details of the surface of the object and obtain a complete, accurate and refined indoor scene model.
In some embodiments, the filtering module is specifically configured to perform adaptive bilateral filtering according to:

Ẑ(u) = (1/W) · Σ_{u_k ∈ N(u)} w_s(‖u − u_k‖) · w_c(|Z(u) − Z(u_k)|) · Z(u_k)

wherein u and u_k respectively represent any pixel on the depth image and a pixel in its neighborhood N(u); Z(u) and Z(u_k) respectively represent the depth values corresponding to u and u_k; Ẑ(u) represents the corresponding depth value after filtering; W represents the normalization factor over the neighborhood N(u); w_s and w_c represent the Gaussian kernel functions for filtering in the spatial domain and the value domain, respectively.
In some embodiments, the block fusion and registration module may be specifically configured to: segment the depth image sequence based on visual content, perform block fusion on each segment, perform closed-loop detection between the segments, and perform global optimization on the results of the closed-loop detection.
In other embodiments, the block fusion and registration module may be further specifically configured to: segment the depth image sequence with an automatic segmentation method based on visual content detection, divide depth images with similar content into one segment, perform block fusion on each segment, determine the transformation relations between the depth images, and perform closed-loop detection between segments according to these transformation relations, thereby realizing global optimization.
In some preferred embodiments, the block fusion and registration module may specifically include: a camera pose information acquisition unit, a segmentation unit, a registration unit and an optimization unit. The camera pose information acquisition unit is used for estimating visual odometry using the Kintinuous framework to obtain the camera pose information for each frame of the depth image. The segmentation unit is used for back-projecting the point cloud data corresponding to each frame of the depth image to the initial coordinate system according to the camera pose information, comparing the similarity of the projected depth image with the starting-frame depth image, and, when the similarity is lower than the similarity threshold, initializing the camera pose and starting a new segment. The registration unit is used for extracting the FPFH geometric descriptor from each segment's point cloud data, performing coarse registration between every two segments, and performing fine registration with the GICP algorithm to obtain the matching relationships between segments. The optimization unit is used for constructing a graph by using the pose information of each segment and the matching relations between segments, and performing graph optimization with the G2O framework to obtain optimized camera trajectory information, thereby realizing global optimization.
The segmentation unit may specifically include: the device comprises a calculating unit, a judging unit, a segmenting subunit and a processing unit. The calculating unit is used for calculating the similarity between each frame of depth image and the first frame of depth image. The judging unit is used for judging whether the similarity is lower than a similarity threshold value. The segmentation subunit is configured to segment the depth image sequence when the similarity is lower than a similarity threshold. The processing unit is used for taking the depth image of the next frame as the depth image of the starting frame of the next segment, and repeatedly executing the calculating unit and the judging unit until all the depth images of the frames are processed.
In some embodiments, the volume data fusion module may be specifically configured to fuse the depth images of the frames by using a truncated symbolic distance function mesh model according to the processing result, and use a voxel mesh to represent a three-dimensional space, so as to obtain a three-dimensional model of an indoor complete scene.
In some embodiments, the volume data fusion module may specifically include a weighted fusion unit and an extraction unit. The weighted fusion unit is used for performing weighted fusion on truncated symbol distance function data by using a Volumetric method frame based on noise characteristics and interest areas. The extraction unit is used for extracting the Mesh model by adopting a Marching cubes algorithm so as to obtain an indoor complete scene three-dimensional model.
The invention is explained in more detail below with reference to a preferred embodiment.
The system for performing indoor complete scene three-dimensional reconstruction based on the consumer-grade depth camera comprises an acquisition module, a filtering module, a block fusion and registration module and a volume data fusion module. Wherein:
the acquisition module is used for acquiring depth images of indoor scenes by using the depth camera.
The filtering module is used for carrying out self-adaptive bilateral filtering processing on the acquired depth image.
The acquisition module here is an equivalent replacement of the acquisition module described above. In practical applications, a handheld consumer-grade depth camera (Microsoft Kinect for Windows) can be used to collect real indoor scene data. The acquired depth image is then subjected to adaptive bilateral filtering, with the parameters of the adaptive bilateral filtering method set automatically according to the noise characteristics and intrinsic parameters of the depth camera, so that the embodiment of the invention can effectively remove noise while preserving edge information.
The block fusion and registration module is used for automatically segmenting the data stream based on visual content, carrying out block fusion on each segment, carrying out closed-loop detection between segments and carrying out global optimization on the result of the closed-loop detection.
The block fusion and registration module performs automatic block fusion and registration based on visual contents.
In a more preferred embodiment, the block fusion and registration module specifically includes: a pose information acquisition module, a segmentation module, a coarse registration module, a fine registration module and an optimization module. The pose information acquisition module is used for estimating visual odometry using the Kintinuous framework to obtain the camera pose information for each frame of the depth image. The segmentation module is used for back-projecting the point cloud data corresponding to each frame of the depth image to the initial coordinate system according to the camera pose information, comparing the similarity of the projected depth image with the starting-frame depth image, and, if the similarity is lower than the similarity threshold, initializing the camera pose and starting a new segment. The coarse registration module is used for extracting the FPFH geometric descriptor from each segment's point cloud data and performing coarse registration between every two segments; the fine registration module is used for performing fine registration with the GICP algorithm to acquire the matching relations between segments. The optimization module is used for constructing a graph from the pose information of each segment and the matching relations between segments, and optimizing the graph with the G2O framework.
Preferably, the optimization module is further configured to apply the SLAC (Simultaneous Localization and Calibration) scheme to correct non-rigid distortion and to remove erroneous closed-loop matches using line processes constraints.
The block fusion and registration module processes the RGBD data stream in segments based on visual content, which effectively alleviates the accumulated error produced by visual odometry estimation and fuses similar content together, thereby improving registration precision.
And the volume data fusion module is used for performing weighted volume data fusion according to the optimized camera track information to obtain a three-dimensional model of the scene.
The volume data fusion module defines a weight function of a truncated symbol distance function according to the noise characteristics of the depth camera and the region of interest to realize the retention of the geometric details of the object surface.
Experiments on the system for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera show that the high-precision three-dimensional reconstruction method based on a consumer-grade depth camera can obtain a complete, accurate and refined indoor scene model, and that the system has good robustness and extensibility.
The system embodiment for performing indoor complete scene three-dimensional reconstruction based on the consumer-grade depth camera can be used for executing the method embodiment for performing indoor complete scene three-dimensional reconstruction based on the consumer-grade depth camera, and the technical principle, the solved technical problems and the generated technical effects are similar and can be mutually referred; for convenience and brevity of description, the same parts as those described in the respective embodiments are omitted.
It should be noted that, for the system and the method for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera provided in the foregoing embodiments, the division into the above functional modules, units or steps is only an example; in practical application, the above functions may be allocated to different functional modules, units or steps as needed, i.e. the modules, units or steps in the embodiments of the present invention may be decomposed or recombined. For example, the acquisition module and the filtering module may be combined into a data preprocessing module.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (15)

1. A method for performing indoor complete scene three-dimensional reconstruction based on a consumer-grade depth camera, the method comprising:
acquiring a depth image;
performing adaptive bilateral filtering on the depth image;
carrying out block fusion and registration processing based on visual contents on the filtered depth image;
according to the processing result, performing weighted volume data fusion so as to reconstruct an indoor complete scene three-dimensional model;
the performing weighted volume data fusion according to the processing result, so as to reconstruct an indoor complete-scene three-dimensional model, specifically comprises: according to the processing result, fusing each frame of the depth image by using a truncated signed distance function grid model, and representing the three-dimensional space by a voxel grid, so as to obtain the indoor complete-scene three-dimensional model;
the truncated signed distance function is determined according to:
f_i(v) = [ K^{-1} z_i(u) [u^T, 1]^T ]_z − [ v_i ]_z
wherein f_i(v) represents the truncated signed distance function, i.e. the distance from the voxel grid to the surface of the object model, its sign indicating whether the grid lies on the occluded side or the visible side of the surface, with zero-crossings corresponding to points on the surface; K represents the intrinsic parameter matrix of the camera; u represents a pixel; z_i(u) represents the depth value corresponding to the pixel u; and v_i represents the voxel.
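As a concrete reading of the formula in claim 1, the following minimal sketch evaluates the truncated signed distance for one voxel. It assumes the voxel has already been transformed into the camera frame, that K is a 3x3 pinhole intrinsic matrix, and that depth values are metric; the function name and the truncation threshold are illustrative, not taken from the patent.

```python
from typing import Optional
import numpy as np

def tsdf_value(voxel_cam: np.ndarray, depth: np.ndarray, K: np.ndarray,
               trunc: float = 0.03) -> Optional[float]:
    """Projective truncated signed distance for a voxel given in camera coordinates.

    Projects the voxel into the depth image, reads the measured depth z_i(u),
    and returns the truncated difference between measured depth and voxel depth:
    positive on the visible side of the surface, negative on the occluded side.
    """
    x, y, z = voxel_cam
    if z <= 0:
        return None                      # voxel lies behind the camera
    u = K @ np.array([x, y, z])
    px, py = int(round(u[0] / u[2])), int(round(u[1] / u[2]))
    h, w = depth.shape
    if not (0 <= px < w and 0 <= py < h):
        return None                      # projects outside the image
    z_measured = float(depth[py, px])
    if z_measured <= 0:
        return None                      # invalid depth measurement
    # [K^-1 z_i(u) [u, 1]^T]_z equals z_i(u); [v_i]_z is the voxel's depth z.
    return float(np.clip(z_measured - z, -trunc, trunc))
```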
2. The method according to claim 1, wherein the adaptive bilateral filtering of the depth image specifically comprises:
adaptive bilateral filtering is performed according to:
Z̃(u) = (1 / W) Σ_{u_k ∈ N(u)} w_s(‖u − u_k‖) · w_c(|Z(u) − Z(u_k)|) · Z(u_k)
wherein u and u_k respectively represent a pixel on the depth image and a pixel in its neighbourhood; Z(u) and Z(u_k) represent the depth values corresponding to u and u_k; Z̃(u) represents the filtered depth value; W is the normalization factor over the neighbourhood N(u); and w_s and w_c represent the Gaussian kernels for filtering in the spatial domain and the value domain, respectively.
3. The method of claim 2, wherein the Gaussian kernel functions for spatial-domain and value-domain filtering are determined according to the following equations:
w_s(x) = exp(−x² / (2δ_s²)),  w_c(x) = exp(−x² / (2δ_c²))
wherein δ_s and δ_c are the variances of the spatial-domain and value-domain Gaussian kernel functions, respectively;
wherein δ_s and δ_c are determined according to a formula (given as an equation image in the original publication) in which f represents the focal length of the depth camera, and K_s and K_c represent constants.
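Claims 2 and 3 together define the adaptive bilateral filter. The sketch below is a minimal implementation; because the adaptation formula for δ_s and δ_c is given only by reference, the per-pixel kernel widths are passed in as callables, and the example rules shown in the final comment are assumptions rather than the patent's formula.

```python
import numpy as np

def adaptive_bilateral_filter(Z, sigma_s_of_z, sigma_c_of_z, radius=2):
    """Adaptive bilateral filtering of a depth map.

    Z             : depth image (0 marks invalid pixels).
    sigma_s_of_z  : callable returning the spatial-kernel width for a depth value.
    sigma_c_of_z  : callable returning the range-kernel width for a depth value.
    """
    h, w = Z.shape
    out = np.zeros_like(Z, dtype=float)
    for y in range(h):
        for x in range(w):
            z = Z[y, x]
            if z <= 0:
                continue
            ss, sc = sigma_s_of_z(z), sigma_c_of_z(z)
            acc, weight = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue
                    zk = Z[yy, xx]
                    if zk <= 0:
                        continue
                    ws = np.exp(-(dx * dx + dy * dy) / (2.0 * ss * ss))  # spatial kernel
                    wc = np.exp(-((z - zk) ** 2) / (2.0 * sc * sc))      # range kernel
                    acc += ws * wc * zk
                    weight += ws * wc
            out[y, x] = acc / weight if weight > 0 else z
    return out

# Example with hypothetical adaptation rules (the depth dependence and the
# constants are illustrative assumptions, not the patent's formula):
# filtered = adaptive_bilateral_filter(Z, lambda z: 2.0 + z, lambda z: 0.01 * z * z)
```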
4. The method according to claim 1, wherein the visual-content-based block fusion and registration of the filtered depth image specifically comprises: segmenting the depth image sequence based on visual content, performing block fusion on each segment, performing closed-loop detection between the segments, and performing global optimization on the result of the closed-loop detection.
5. The method according to claim 4, wherein segmenting the depth image sequence based on visual content, performing block fusion on each segment, performing closed-loop detection between the segments, and performing global optimization on the result of the closed-loop detection specifically comprises:
segmenting the depth image sequence with an automatic segmentation method based on visual content detection, grouping frames with similar depth image content into the same segment, performing block fusion on each segment, determining the transformation relationships between the depth images, and performing closed-loop detection between the segments according to the transformation relationships, so as to realize the global optimization.
6. The method according to claim 5, wherein the automatic segmentation method based on visual content detection is configured to segment a depth image sequence, segment similar depth image contents into one segment, perform block fusion on each segment, determine a transformation relationship between the depth images, and perform closed-loop detection between segments according to the transformation relationship, so as to implement global optimization, and specifically includes:
estimating visual odometry using the Kintinuous framework to obtain the camera pose for each frame of the depth image sequence;
according to the camera pose information, back-projecting the point cloud data corresponding to each frame of the depth image into the initial coordinate system, comparing the similarity between the projected depth image and the depth image of the starting frame, and, when the similarity is lower than a similarity threshold, re-initializing the camera pose and starting a new segment;
extracting PFFH geometric descriptors from the point cloud data of each segment, performing coarse registration between every two segments, and performing fine registration with the GICP algorithm to obtain the matching relationships between segments;
and constructing a graph from the pose information of each segment and the matching relationships between segments, and performing graph optimization with the G2O framework to obtain the optimized camera trajectory, thereby realizing the global optimization.
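Claim 6 assembles per-segment poses (odometry) and inter-segment matches (loop closures) into a graph that is then optimized globally. The sketch below only shows how such a graph could be represented; the container and helper names are assumptions, and the actual optimization would be delegated to a solver such as G2O.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class PoseGraph:
    """Nodes are per-segment camera poses; edges are relative-pose constraints."""
    poses: list = field(default_factory=list)   # 4x4 world-from-segment transforms
    edges: list = field(default_factory=list)   # (i, j, T_ij, is_loop_closure)

    def add_segment(self, T_world_seg: np.ndarray) -> int:
        self.poses.append(T_world_seg)
        return len(self.poses) - 1

    def add_odometry_edge(self, i: int, j: int) -> None:
        # Relative pose implied by the current estimates of consecutive segments.
        T_ij = np.linalg.inv(self.poses[i]) @ self.poses[j]
        self.edges.append((i, j, T_ij, False))

    def add_loop_closure(self, i: int, j: int, T_ij: np.ndarray) -> None:
        # T_ij would come from descriptor-based coarse registration refined by GICP.
        self.edges.append((i, j, T_ij, True))

# The assembled nodes and edges would then be handed to a graph optimizer
# (e.g. G2O) to obtain the optimized camera trajectory.
```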
7. The method according to claim 6, wherein back-projecting the point cloud data corresponding to each frame of the depth image into the initial coordinate system according to the camera pose information, comparing the similarity between the projected depth image and the depth image of the starting frame, and re-initializing the camera pose and starting a new segment when the similarity is lower than the similarity threshold, specifically comprises:
Step 1: calculating the similarity between each frame of the depth image and the starting frame of the depth image;
Step 2: judging whether the similarity is lower than the similarity threshold;
Step 3: if so, segmenting the depth image sequence at the current frame;
Step 4: taking the next frame of the depth image as the starting frame of the next segment, and repeating Step 1 and Step 2 until all frames of the depth image have been processed.
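The four steps above amount to a simple segmentation loop. A minimal sketch follows, assuming a `frame_similarity` function of the kind detailed in claim 8 and an illustrative threshold value:

```python
def segment_sequence(depth_frames, poses, frame_similarity, threshold=0.65):
    """Split a depth sequence into segments of visually similar content.

    depth_frames     : list of depth images.
    poses            : list of camera poses, one per frame.
    frame_similarity : callable(start_frame, start_pose, frame, pose) -> float in [0, 1].
    threshold        : similarity below which a new segment is started (illustrative value).
    """
    segments, current, start = [], [0], 0
    for i in range(1, len(depth_frames)):
        s = frame_similarity(depth_frames[start], poses[start],
                             depth_frames[i], poses[i])
        if s < threshold:
            segments.append(current)      # close the current segment
            start, current = i, [i]       # the next frame starts the next segment
        else:
            current.append(i)
    segments.append(current)
    return segments
```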
8. The method according to claim 7, wherein the step 1 specifically comprises:
according to the projection relation and the depth value of any frame of depth image, calculating a first space three-dimensional point corresponding to each pixel on the depth image by using the following formula:
p = π^{-1}(u_p, Z(u_p))
wherein u_p is any pixel on the depth image; Z(u_p) and p respectively represent the depth value corresponding to u_p and the first spatial three-dimensional point; and π represents the projection relation;
applying the following rotation-translation transform to the first spatial three-dimensional point to bring it into the world coordinate system, thereby obtaining a second spatial three-dimensional point:
q = T_i p
wherein T_i represents the rotation-translation matrix that maps the spatial three-dimensional points corresponding to the i-th frame depth image into the world coordinate system; p represents the first spatial three-dimensional point and q represents the second spatial three-dimensional point; and i is a positive integer;
and back-projecting the second spatial three-dimensional point onto the two-dimensional image plane according to the following formula to obtain the projected depth image:
u_q = [ f_x · x_q / z_q + c_x,  f_y · y_q / z_q + c_y ]^T
wherein u_q is the pixel on the projected depth image corresponding to q; f_x, f_y, c_x, and c_y represent the intrinsic parameters of the depth camera; x_q, y_q, z_q represent the coordinates of q; and T denotes the transpose of a matrix;
and respectively counting the number of valid pixels on the depth image of the starting frame and on the projected depth image of the given frame, and taking the ratio of the two as the similarity.
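A minimal sketch of this similarity computation, assuming the depth maps are NumPy arrays, that T_start_from_i maps frame i's camera coordinates into the starting frame's coordinate system, and that the ratio takes the valid pixels of the projected image over those of the starting frame (the claim does not fix which count is the numerator):

```python
import numpy as np

def projected_similarity(depth_i, depth_start, T_start_from_i, K):
    """Similarity between frame i and the segment's starting frame.

    Back-projects frame i's depth into 3D, maps the points into the starting
    frame's coordinate system, re-projects them with the intrinsics K, and
    returns the ratio of valid projected pixels to valid pixels in the
    starting frame's depth image.
    """
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    h, w = depth_start.shape
    ys, xs = np.nonzero(depth_i > 0)
    z = depth_i[ys, xs]
    # pi^{-1}: pixel + depth -> 3D point in the camera frame of frame i.
    pts = np.stack([(xs - cx) * z / fx, (ys - cy) * z / fy, z], axis=1)
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=1)
    q = (T_start_from_i @ pts_h.T).T[:, :3]   # points in the starting frame
    q = q[q[:, 2] > 0]                        # keep points in front of the camera
    u = fx * q[:, 0] / q[:, 2] + cx
    v = fy * q[:, 1] / q[:, 2] + cy
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    valid_projected = int(np.count_nonzero(inside))
    valid_start = int(np.count_nonzero(depth_start > 0))
    return valid_projected / valid_start if valid_start else 0.0
```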
9. The method according to claim 1, wherein fusing each frame of the depth image by using a truncated signed distance function grid model according to the processing result, and representing the three-dimensional space by a voxel grid, thereby obtaining the indoor complete-scene three-dimensional model, specifically comprises:
performing weighted fusion of the truncated signed distance function data within the Volumetric method framework, based on the noise characteristics and the region of interest;
and extracting a Mesh model with the Marching cubes algorithm, thereby obtaining the indoor complete-scene three-dimensional model.
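For the Mesh extraction step, one readily available route is the marching cubes implementation in scikit-image, applied to the fused TSDF volume. This is a sketch under the assumption that `tsdf` is a dense 3D array and that `voxel_size` and `origin` describe the voxel grid; it is not necessarily the implementation used in the patent.

```python
import numpy as np
from skimage import measure

def extract_mesh(tsdf: np.ndarray, voxel_size: float, origin: np.ndarray):
    """Extract the zero level set of a fused TSDF volume as a triangle mesh."""
    verts, faces, normals, _ = measure.marching_cubes(tsdf, level=0.0)
    verts = verts * voxel_size + origin   # voxel indices -> world coordinates
    return verts, faces, normals
```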
10. The method of claim 9, wherein the weighted data fusion is performed according to the following equations:
F(v) = ( Σ_{i=1}^{n} w_i(v) · f_i(v) ) / ( Σ_{i=1}^{n} w_i(v) ),  W(v) = Σ_{i=1}^{n} w_i(v)
wherein v represents a voxel; f_i(v) and w_i(v) respectively represent the truncated signed distance function and its weight function for the voxel v in the i-th frame; n is a positive integer; F(v) represents the fused truncated signed distance value of the voxel v; and W(v) represents the weight of the fused truncated signed distance value of the voxel v;
wherein the weight function may be determined according to the following equation:
[weight function formula given as an equation image in the original publication]
wherein d_i represents the radius of the region of interest; δ_s is the noise variance of the depth data; and w is a constant.
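The fusion in claim 10 is usually applied as a running weighted average so that frames can be integrated one at a time. A minimal sketch follows; the region-of-interest weight uses a Gaussian-style form built from the symbols d_i, δ_s and w listed in the claim, but its exact expression is an assumption since the claim gives the formula only as an equation image.

```python
import numpy as np

def fuse_frame(F, W, f_i, w_i):
    """One fusion step: F <- (W*F + w_i*f_i) / (W + w_i), W <- W + w_i.

    F, W     : running TSDF values and weights per voxel (same-shape arrays).
    f_i, w_i : TSDF values and weights from the current frame; NaN in f_i marks
               voxels this frame does not observe, and w_i is assumed positive
               wherever the voxel is observed.
    """
    observed = ~np.isnan(f_i)
    new_W = W[observed] + w_i[observed]
    F[observed] = (W[observed] * F[observed] + w_i[observed] * f_i[observed]) / new_W
    W[observed] = new_W
    return F, W

def roi_weight(d_i, sigma, w=1.0):
    """Hypothetical region-of-interest weight: largest near the region of interest,
    damped where the depth noise is large. Illustrative only."""
    return w * np.exp(-(d_i ** 2) / (2.0 * sigma ** 2))
```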
11. A system for three-dimensional reconstruction of a complete scene indoors based on a consumer-grade depth camera, the system comprising:
the acquisition module is used for acquiring a depth image;
the filtering module is used for carrying out self-adaptive bilateral filtering on the depth image;
the block fusion and registration module is used for carrying out block fusion and registration processing based on visual content on the filtered depth image;
the volume data fusion module is used for carrying out weighted volume data fusion according to the processing result so as to reconstruct an indoor complete scene three-dimensional model;
the block fusion and registration module is specifically configured to: segmenting the depth image sequence based on visual content, performing block fusion on each segment, performing closed-loop detection between the segments, and performing global optimization on the result of the closed-loop detection;
the block fusion and registration module is further specifically configured to:
segmenting a depth image sequence based on a visual content detection automatic segmentation method, dividing similar depth image contents into segments, performing block fusion on each segment, determining a transformation relation between the depth images, and performing closed-loop detection between the segments according to the transformation relation so as to realize global optimization;
the block fusion and registration module specifically comprises:
the camera pose information acquisition unit is used for estimating visual odometry with the Kintinuous framework to obtain the camera pose for each frame of the depth image sequence;
the segmentation unit is used for back-projecting the point cloud data corresponding to each frame of the depth image into the initial coordinate system according to the camera pose information, comparing the similarity between the projected depth image and the depth image of the starting frame, and re-initializing the camera pose and starting a new segment when the similarity is lower than the similarity threshold;
the registration unit is used for extracting PFFH geometric descriptors from the point cloud data of each segment, performing coarse registration between every two segments, and performing fine registration with the GICP algorithm to obtain the matching relationships between segments;
and the optimization unit is used for constructing a graph from the pose information of each segment and the matching relationships between segments, and performing graph optimization with the G2O framework to obtain the optimized camera trajectory, so as to realize the global optimization.
12. The system of claim 11, wherein the filtering module is specifically configured to:
adaptive bilateral filtering is performed according to:
Z̃(u) = (1 / W) Σ_{u_k ∈ N(u)} w_s(‖u − u_k‖) · w_c(|Z(u) − Z(u_k)|) · Z(u_k)
wherein u and u_k respectively represent a pixel on the depth image and a pixel in its neighbourhood; Z(u) and Z(u_k) represent the depth values corresponding to u and u_k; Z̃(u) represents the filtered depth value; W is the normalization factor over the neighbourhood N(u); and w_s and w_c represent the Gaussian kernels for filtering in the spatial domain and the value domain, respectively.
13. The system according to claim 11, wherein the segmentation unit comprises in particular:
the calculating unit is used for calculating the similarity between each frame of depth image and the first frame of depth image;
the judging unit is used for judging whether the similarity is lower than a similarity threshold value;
a segmentation subunit, configured to segment the depth image sequence when the similarity is lower than a similarity threshold;
and the processing unit is used for taking the next frame depth image as the starting frame depth image of the next segmentation, and repeatedly executing the calculating unit and the judging unit until all the frame depth images are processed.
14. The system of claim 11, wherein the volume data fusion module is specifically configured to: according to the processing result, fuse each frame of the depth image by using a truncated signed distance function grid model, and represent the three-dimensional space by a voxel grid, so as to obtain the indoor complete-scene three-dimensional model.
15. The system according to claim 14, wherein the volume data fusion module specifically comprises:
the weighted fusion unit is used for performing weighted fusion of the truncated signed distance function data within the Volumetric method framework, based on the noise characteristics and the region of interest;
and the extraction unit is used for extracting the Mesh model by adopting a Marching cubes algorithm so as to obtain the indoor complete scene three-dimensional model.
CN201710051366.5A 2017-01-23 2017-01-23 Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera Active CN106910242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710051366.5A CN106910242B (en) 2017-01-23 2017-01-23 Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710051366.5A CN106910242B (en) 2017-01-23 2017-01-23 Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera

Publications (2)

Publication Number Publication Date
CN106910242A CN106910242A (en) 2017-06-30
CN106910242B true CN106910242B (en) 2020-02-28

Family

ID=59207090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710051366.5A Active CN106910242B (en) 2017-01-23 2017-01-23 Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera

Country Status (1)

Country Link
CN (1) CN106910242B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107067470B (en) * 2017-04-05 2019-09-06 东北大学 Portable three-dimensional reconstruction of temperature field system based on thermal infrared imager and depth camera
CN107464278B (en) * 2017-09-01 2020-01-24 叠境数字科技(上海)有限公司 Full-view sphere light field rendering method
CN109492656B (en) * 2017-09-11 2022-04-29 阿波罗智能技术(北京)有限公司 Method and apparatus for outputting information
CN107833270B (en) * 2017-09-28 2020-07-03 浙江大学 Real-time object three-dimensional reconstruction method based on depth camera
CN109819173B (en) * 2017-11-22 2021-12-03 浙江舜宇智能光学技术有限公司 Depth fusion method based on TOF imaging system and TOF camera
CN108053476B (en) * 2017-11-22 2021-06-04 上海大学 Human body parameter measuring system and method based on segmented three-dimensional reconstruction
CN108133496B (en) * 2017-12-22 2021-11-26 北京工业大学 Dense map creation method based on g2o and random fern algorithm
CN108227707B (en) * 2017-12-25 2021-11-26 清华大学苏州汽车研究院(吴江) Automatic driving method based on laser radar and end-to-end deep learning method
CN108537876B (en) * 2018-03-05 2020-10-16 清华-伯克利深圳学院筹备办公室 Three-dimensional reconstruction method, device, equipment and storage medium
CN108550181B (en) * 2018-03-12 2020-07-31 中国科学院自动化研究所 Method, system and equipment for online tracking and dense reconstruction on mobile equipment
CN108564616B (en) * 2018-03-15 2020-09-01 中国科学院自动化研究所 Fast robust RGB-D indoor three-dimensional scene reconstruction method
CN108961176B (en) * 2018-06-14 2021-08-03 中国科学院半导体研究所 Self-adaptive bilateral reference restoration method for range-gated three-dimensional imaging
CN109472820B (en) * 2018-10-19 2021-03-16 清华大学 Monocular RGB-D camera real-time face reconstruction method and device
CN109737974B (en) * 2018-12-14 2020-11-27 中国科学院深圳先进技术研究院 3D navigation semantic map updating method, device and equipment
CN110007754B (en) * 2019-03-06 2020-08-28 清华大学 Real-time reconstruction method and device for hand-object interaction process
CN110148217A (en) * 2019-05-24 2019-08-20 北京华捷艾米科技有限公司 A kind of real-time three-dimensional method for reconstructing, device and equipment
CN112598778B (en) * 2020-08-28 2023-11-14 国网陕西省电力公司西咸新区供电公司 VR three-dimensional reconstruction method based on improved texture mapping algorithm
CN112053435A (en) * 2020-10-12 2020-12-08 武汉艾格美康复器材有限公司 Self-adaptive real-time human body three-dimensional reconstruction method
CN113436338A (en) * 2021-07-14 2021-09-24 中德(珠海)人工智能研究院有限公司 Three-dimensional reconstruction method and device for fire scene, server and readable storage medium
CN113902846B (en) * 2021-10-11 2024-04-12 岱悟智能科技(上海)有限公司 Indoor three-dimensional modeling method based on monocular depth camera and mileage sensor
CN113989451B (en) * 2021-10-28 2024-04-09 北京百度网讯科技有限公司 High-precision map construction method and device and electronic equipment
CN115358156B (en) * 2022-10-19 2023-03-24 南京耀宇视芯科技有限公司 Adaptive indoor scene modeling and optimization analysis system
CN116563118A (en) * 2023-07-12 2023-08-08 浙江华诺康科技有限公司 Endoscopic image stitching method and device and computer equipment


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751697A (en) * 2010-01-21 2010-06-23 西北工业大学 Three-dimensional scene reconstruction method based on statistical model
CN103559737A (en) * 2013-11-12 2014-02-05 中国科学院自动化研究所 Object panorama modeling method
CN103927717A (en) * 2014-03-28 2014-07-16 上海交通大学 Depth image recovery method based on improved bilateral filters
CN105913489A (en) * 2016-04-19 2016-08-31 东北大学 Indoor three-dimensional scene reconstruction method employing plane characteristics
CN106056664A (en) * 2016-05-23 2016-10-26 武汉盈力科技有限公司 Real-time three-dimensional scene reconstruction system and method based on inertia and depth vision

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KinectFusion: Real-time 3D Reconstruction and Interaction Using a Moving Depth Camera; Shahram Izadi et al.; UIST '11; 2011-10-19; pp. 1-10 *
Kintinuous: Spatially Extended KinectFusion; Thomas Whelan et al.; Computer Science and Artificial Intelligence Laboratory; 2012-07-19; pp. 1-10 *
Research on Real-time 3D Reconstruction and Filtering Algorithms Based on Kinect Depth Information; Chen Xiaoming, Jiang Letian, Ying Rendong; Application Research of Computers; 2013-04-30; pp. 1216-1218 *

Also Published As

Publication number Publication date
CN106910242A (en) 2017-06-30

Similar Documents

Publication Publication Date Title
CN106910242B (en) Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera
US11727587B2 (en) Method and system for scene image modification
CN111815757B (en) Large member three-dimensional reconstruction method based on image sequence
Huang et al. 3Dlite: towards commodity 3D scanning for content creation.
Liu et al. Continuous depth estimation for multi-view stereo
US9426444B2 (en) Depth measurement quality enhancement
CN113178009B (en) Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair
US11348267B2 (en) Method and apparatus for generating a three-dimensional model
WO2018133119A1 (en) Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera
CN109961506A (en) A kind of fusion improves the local scene three-dimensional reconstruction method of Census figure
Sibbing et al. Sift-realistic rendering
EP2064675A1 (en) Method for determining a depth map from images, device for determining a depth map
Kordelas et al. State-of-the-art algorithms for complete 3d model reconstruction
Xu et al. Survey of 3D modeling using depth cameras
CN115393519A (en) Three-dimensional reconstruction method based on infrared and visible light fusion image
Blanchet et al. Fattening free block matching
Nouduri et al. Deep realistic novel view generation for city-scale aerial images
Jisen A study on target recognition algorithm based on 3D point cloud and feature fusion
Labatut et al. Hierarchical shape-based surface reconstruction for dense multi-view stereo
Novacheva Building roof reconstruction from LiDAR data and aerial images through plane extraction and colour edge detection
Kim et al. Automatic registration of LiDAR and optical imagery using depth map stereo
Lyra et al. Development of an efficient 3D reconstruction solution from permissive open-source code
Murayama et al. Depth Image Noise Reduction and Super-Resolution by Pixel-Wise Multi-Frame Fusion
WO2011080669A1 (en) System and method for reconstruction of range images from multiple two-dimensional images using a range based variational method
Xiang et al. A method of scene flow estimation with bilateral filter and adaptive TV (Total Variation) penalty function

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Gao Wei

Inventor after: Li Jianwei

Inventor after: Wu Yihong

Inventor before: Li Jianwei

Inventor before: Gao Wei

Inventor before: Wu Yihong

CB03 Change of inventor or designer information