WO2018133119A1 - Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera - Google Patents

Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera

Info

Publication number
WO2018133119A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth image
depth
frame
segments
fusion
Prior art date
Application number
PCT/CN2017/072257
Other languages
French (fr)
Chinese (zh)
Inventor
李建伟 (Li Jianwei)
高伟 (Gao Wei)
吴毅红 (Wu Yihong)
Original Assignee
中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences) filed Critical 中国科学院自动化研究所 (Institute of Automation, Chinese Academy of Sciences)
Priority to PCT/CN2017/072257 priority Critical patent/WO2018133119A1/en
Publication of WO2018133119A1 publication Critical patent/WO2018133119A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three-dimensional [3D] modelling, e.g. data description of 3D objects

Definitions

  • the present invention relates to the field of computer vision technology, and in particular, to a method and system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera.
  • High-precision 3D reconstruction of indoor scenes is one of the challenging research topics in computer vision, involving theories and techniques in computer vision, computer graphics, pattern recognition, optimization and many other fields.
  • the traditional method is to use laser or radar ranging sensors or structured light technology to acquire the structural information of the scene or the surface of the object for 3D reconstruction.
  • these instruments are mostly expensive and difficult to carry, so their applications are limited.
  • researchers began to study the use of pure vision methods for 3D reconstruction, which has produced a lot of useful research work.
  • the KinectFusion algorithm proposed by Newcombe et al. uses Kinect to obtain the depth of each point in the image and estimates the pose of the current frame camera by aligning the coordinates of the 3D points in the current camera coordinate system with their coordinates in the global model using the Iterative Closest Point (ICP) algorithm; volume data fusion is then performed iteratively through a Truncated Signed Distance Function (TSDF) to obtain a dense three-dimensional model.
  • TSDF: Truncated Signed Distance Function
  • Whelan et al. proposed the Kintinuous algorithm, which is a further extension of KinectFusion.
  • the algorithm uses a Shifting TSDF Volume to recycle GPU memory, which solves the problem of mesh-model memory consumption during large-scene reconstruction. It also uses DBoW to find matching key frames for closed-loop detection. Finally, the pose graph and the model are optimized to obtain a large-scale 3D model.
  • Choi et al. proposed the Elastic Fragments idea: the RGB-D data stream is first split into segments of 50 frames each, visual odometry is estimated separately for each segment, and the FPFH geometric descriptor is extracted from the point cloud data of every pair of segments to find matches.
  • a method and system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera are provided.
  • a method for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera may include:
  • weighted volume data fusion is performed to reconstruct a three-dimensional model of the complete scene in the room.
  • the performing adaptive bilateral filtering on the depth image specifically includes:
  • Adaptive bilateral filtering is performed according to the following formula:
  • Z̃(u) = (1/W) · Σ_{u_k ∈ Ω} w_s(u, u_k) · w_c(Z(u), Z(u_k)) · Z(u_k)
  • where u and u_k respectively denote any pixel on the depth image and a pixel in its neighborhood; Z(u) and Z(u_k) denote the depth values at u and u_k; Z̃(u) denotes the filtered depth value; W is the normalization factor over the neighborhood Ω; w_s and w_c denote the Gaussian kernel functions for filtering in the spatial domain and the range domain, respectively.
  • the Gaussian kernel functions for spatial-domain and range-domain filtering are determined according to the following formulas:
  • w_s(u, u_k) = exp(−‖u − u_k‖² / (2·δ_s²)),  w_c(Z(u), Z(u_k)) = exp(−(Z(u) − Z(u_k))² / (2·δ_c²))
  • δ_s and δ_c are the variances of the spatial-domain and range-domain Gaussian kernel functions, respectively; they are set adaptively from the depth value and the camera parameters;
  • f represents the focal length of the depth camera;
  • K_s and K_c represent constants.
  • the performing of visual-content-based block fusion and registration processing on the filtered depth image specifically includes: segmenting the depth image sequence based on visual content, performing block fusion on each segment, performing closed-loop detection between the segments, and globally optimizing the result of the closed-loop detection.
  • the segmenting of the depth image sequence based on visual content, the block fusion of each segment, the closed-loop detection between segments, and the global optimization of the closed-loop detection result specifically include:
  • the depth image sequence is segmented by an automatic segmentation method based on visual-content detection, similar depth image content is grouped into one segment, and each segment is block-fused to determine the transformation relationships between the depth images; closed-loop detection is then performed between segments according to the transformation relationships to achieve global optimization.
  • the automatic segmentation method based on visual-content detection segments the depth image sequence, groups similar depth image content into one segment, and block-fuses each segment to determine the transformation relationships between the depth images;
  • a graph is constructed and optimized using the G2O framework to obtain the optimized camera trajectory information, thereby achieving the global optimization.
  • Step 1 calculating a similarity between the depth image of each frame and the depth image of the first frame
  • Step 2 determining whether the similarity is lower than a similarity threshold
  • Step 3 If yes, segment the depth image sequence
  • Step 4 The next frame depth image is taken as the starting frame depth image of the next segment, and steps 1 and 2 are repeatedly performed until all frame depth images are processed.
  • the step 1 specifically includes:
  • the first spatial three-dimensional point corresponding to each pixel on the depth image is calculated from the projection relationship and the depth value of any frame's depth image: p = π⁻¹(u_p, Z(u_p));
  • u_p is any pixel on the depth image;
  • Z(u_p) and p respectively denote the depth value at u_p and the first spatial three-dimensional point;
  • the first spatial three-dimensional point is transformed into the world coordinate system by rotation and translation according to the following formula to obtain the second spatial three-dimensional point: q = T_i · p
  • T_i denotes the rotation-translation matrix from the spatial 3D points of the i-th frame depth image to the world coordinate system;
  • p denotes the first spatial three-dimensional point and q denotes the second spatial three-dimensional point;
  • i takes a positive integer value;
  • the second spatial three-dimensional point is back-projected onto the two-dimensional image plane according to the following formula to obtain the projected depth image: u_q = π(q) = (f_x·x_q/z_q + c_x, f_y·y_q/z_q + c_y)ᵀ
  • u_q is the pixel on the projected depth image corresponding to q;
  • f_x, f_y, c_x and c_y denote the intrinsic parameters of the depth camera;
  • x_q, y_q, z_q denote the coordinates of q;
  • T denotes matrix transposition;
  • the numbers of valid pixels on the starting-frame depth image and on the projected depth image of any frame are counted separately, and their ratio is taken as the similarity.
  • the performing of weighted volume data fusion according to the processing result to reconstruct the three-dimensional model of the complete indoor scene specifically includes: fusing the depth image of each frame using the truncated signed distance function grid model according to the processing result, and representing the three-dimensional space with a voxel grid, thereby obtaining the three-dimensional model of the complete indoor scene.
  • fusing the depth image of each frame using the truncated signed distance function grid model and representing the three-dimensional space with a voxel grid to obtain the three-dimensional model of the complete indoor scene specifically includes:
  • performing weighted fusion of the truncated signed distance function data using the Volumetric method framework, based on the noise characteristics and the region-of-interest model;
  • performing Mesh model extraction using the Marching cubes algorithm to obtain the three-dimensional model of the complete indoor scene.
  • the truncated signed distance function is determined according to the following formula: f_i(v) = [K⁻¹ · z_i(u) · [uᵀ, 1]ᵀ]_z − [v_i]_z
  • f_i(v) represents the truncated signed distance function, that is, the distance from the grid cell to the surface of the object model; its sign indicates whether the cell lies on the occluded side or on the visible side of the surface, and the zero crossing lies on the surface itself;
  • K represents the intrinsic parameter matrix of the camera;
  • u represents a pixel;
  • z_i(u) represents the depth value corresponding to the pixel u;
  • v_i represents a voxel.
  • the weighted data fusion is performed according to the following formulas: F(v) = Σ_{i=1..n} w_i(v)·f_i(v) / Σ_{i=1..n} w_i(v),  W(v) = Σ_{i=1..n} w_i(v)
  • v represents a voxel;
  • f_i(v) and w_i(v) respectively represent the truncated signed distance function corresponding to the voxel v and its weight function;
  • n takes a positive integer value;
  • F(v) represents the fused truncated signed distance function value corresponding to the voxel v;
  • W(v) represents the weight of the fused truncated signed distance function value corresponding to the voxel v;
  • the weight function can be determined from the noise characteristics of the depth data and the region of interest: regions with low noise or of high interest receive larger weights, while noisy regions or regions of no interest receive smaller weights.
  • a system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera comprising:
  • a filtering module configured to perform adaptive bilateral filtering on the depth image
  • a block fusion and registration module for performing visual content-based block fusion and registration processing on the filtered depth image
  • the volume data fusion module is configured to perform weighted volume data fusion according to the processing result, thereby reconstructing a three-dimensional model of the complete scene in the room.
  • the filtering module is specifically configured to:
  • Adaptive bilateral filtering is performed according to the formula Z̃(u) = (1/W) · Σ_{u_k ∈ Ω} w_s(u, u_k) · w_c(Z(u), Z(u_k)) · Z(u_k),
  • where u and u_k respectively denote any pixel on the depth image and a pixel in its neighborhood; Z(u) and Z(u_k) denote the depth values at u and u_k; Z̃(u) denotes the filtered depth value; W is the normalization factor over the neighborhood Ω; w_s and w_c denote the Gaussian kernel functions for filtering in the spatial domain and the range domain, respectively.
  • the block fusion and registration module is specifically configured to: segment the depth image sequence based on visual content, perform block fusion on each segment, perform closed-loop detection between the segments, and globally optimize the results of the closed-loop detection.
  • the block fusion and registration module is further specifically configured to:
  • segment the depth image sequence by the automatic segmentation method based on visual-content detection, group similar depth image content into one segment, and block-fuse each segment to determine the transformation relationships between the depth images; closed-loop detection is then performed between segments according to the transformation relationships to achieve global optimization.
  • the block fusion and registration module specifically includes:
  • the camera pose information acquisition unit is configured to perform visual odometry estimation using the Kintinuous framework to obtain the camera pose information for each frame's depth image;
  • a segmentation unit is configured to back-project the point cloud data corresponding to each frame's depth image into the initial coordinate system according to the camera pose information, compare the similarity between the projected depth image and the depth image of the initial frame, and, when the similarity is lower than the similarity threshold, initialize the camera pose and start a new segment;
  • a registration unit is configured to extract the FPFH geometric descriptor from the point cloud data of each segment, perform coarse registration between every two segments, and perform fine registration using the GICP algorithm to obtain the matching relationships between segments;
  • an optimization unit is configured to use the pose information of each segment and the matching relationships between segments to construct a graph and perform graph optimization using the G2O framework, obtaining the optimized camera trajectory information and thereby achieving the global optimization.
  • the segmentation unit specifically includes:
  • a calculating unit configured to calculate a similarity between the depth image of each frame and the depth image of the first frame
  • a determining unit configured to determine whether the similarity is lower than a similarity threshold
  • a segmentation subunit configured to segment the depth image sequence when the similarity is lower than a similarity threshold
  • a processing unit configured to use the next frame depth image as the starting frame depth image of the next segment, and repeatedly execute the calculating unit and the determining unit until all the frame depth images are processed.
  • the volume data fusion module is specifically configured to: according to the processing result, fuse the depth image of each frame using the truncated signed distance function grid model and represent the three-dimensional space with a voxel grid, thereby obtaining the three-dimensional model of the complete indoor scene.
  • the volume data fusion module specifically includes:
  • a weighted fusion unit configured to perform weighted fusion of the truncated signed distance function data using the Volumetric method framework, based on the noise characteristics and the region of interest;
  • An extracting unit is configured to perform Mesh model extraction by using a Marching cubes algorithm to obtain a three-dimensional model of the indoor complete scene.
  • Embodiments of the present invention provide a method and system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera.
  • the method includes: acquiring a depth image; performing adaptive bilateral filtering on the depth image; performing visual-content-based block fusion and registration processing on the filtered depth image; and performing weighted volume data fusion according to the processing result, thereby reconstructing a three-dimensional model of the complete indoor scene.
  • by performing visual-content-based block fusion and registration on the depth images, the embodiment of the invention can effectively reduce the cumulative error in visual odometry estimation and improve the registration precision; it also adopts a weighted volume data fusion algorithm that effectively preserves the geometric details of object surfaces. This solves the technical problem of how to improve the accuracy of three-dimensional reconstruction in indoor scenes, so that a complete, accurate and refined indoor scene model can be obtained.
  • FIG. 1 is a flow chart showing a method for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera according to an embodiment of the present invention
  • FIG. 2a is a color image corresponding to a depth image according to an embodiment of the present invention;
  • FIG. 2b is a schematic diagram of a point cloud obtained from a depth image according to an embodiment of the present invention;
  • FIG. 2c is a schematic diagram of a point cloud obtained by bilateral filtering of a depth image according to an embodiment of the present invention;
  • FIG. 2d is a schematic diagram of a point cloud obtained by adaptive bilateral filtering of a depth image according to an embodiment of the present invention;
  • FIG. 3 is a schematic flow chart of segmentation fusion and registration based on visual content according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram of a weighted volume data fusion process according to an embodiment of the present invention.
  • FIG. 5a is a schematic diagram of a three-dimensional reconstruction result using an unweighted volume data fusion algorithm
  • Figure 5b is a partial detail view of the three-dimensional model of Figure 5a;
  • FIG. 5c is a schematic diagram of a three-dimensional reconstruction result obtained by a weighted volume data fusion algorithm according to an embodiment of the present invention.
  • Figure 5d is a partial detail view of the three-dimensional model of Figure 5c;
  • FIG. 6 is a schematic diagram of an effect of performing three-dimensional reconstruction on a 3D Scene Data data set using the method proposed by the embodiment of the present invention
  • FIG. 7 is a schematic diagram showing the effect of performing three-dimensional reconstruction on the Augmented ICL-NUIM Dataset data set using the method proposed by the embodiment of the present invention.
  • FIG. 8 is a schematic diagram showing an effect of three-dimensional reconstruction of indoor scene data collected by Microsoft Kinect for Windows according to an embodiment of the present invention
  • FIG. 9 is a schematic structural diagram of a system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera according to an embodiment of the present invention.
  • Embodiments of the present invention provide a method for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera. As shown in Figure 1, the method includes:
  • this step may include: acquiring depth images with a consumer-grade depth camera based on the structured-light principle.
  • the consumer-level depth camera (Microsoft Kinect for Windows and Xtion, referred to as the depth camera) based on the structured light principle acquires the depth data of the depth image by transmitting the structured light and receiving the reflection information.
  • real indoor scene data can be acquired using the handheld consumer depth camera Microsoft Kinect for Windows.
  • the depth data can be calculated according to the following formula: Z = f·B / D (a minimal numerical sketch is given below), where
  • f represents the focal length of the consumer-grade depth camera;
  • B represents the baseline;
  • D represents the disparity (parallax).
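The following is a minimal numerical sketch of this triangulation relation; the focal length, baseline and disparity values are illustrative assumptions rather than calibration data of any particular camera, and the last lines illustrate why the depth noise grows with the square of the depth value, as noted in the filtering step below.

```python
import numpy as np

# Illustrative values only (not the calibration of any real device).
f = 580.0                               # focal length in pixels
B = 0.075                               # baseline in meters
D = np.array([87.0, 43.5, 21.75])       # disparities in pixels

# Depth from structured-light triangulation: Z = f * B / D
Z = f * B / D
print(Z)                                # -> [0.5 1.  2. ] meters

# A fixed disparity error dD maps to a depth error of roughly
# dZ ~ (Z**2 / (f * B)) * dD, i.e. the depth noise grows with Z^2,
# which is the noise characteristic exploited by the adaptive filter below.
dD = 1.0
dZ = Z ** 2 / (f * B) * dD
print(dZ)
```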
  • S110 Perform adaptive bilateral filtering on the depth image.
  • the acquired depth image is adaptively bilaterally filtered, exploiting the noise characteristics of the structured-light-based consumer-grade depth camera.
  • the adaptive bilateral filtering algorithm filters in both the spatial domain and the range domain of the depth image.
  • the parameters of the adaptive bilateral filtering algorithm can be set according to the noise characteristics of the depth camera and its intrinsic parameters, which effectively removes noise while preserving edge information.
  • the noise of the depth data is mainly generated in the quantization process. It can be seen from the above equation that the variance of the depth noise is proportional to the square of the depth value, that is, the larger the depth value, the larger the noise.
  • embodiments of the present invention define a filtering algorithm based on this noise characteristic.
  • the above adaptive bilateral filtering can be performed according to the following formula:
  • Z̃(u) = (1/W) · Σ_{u_k ∈ Ω} w_s(u, u_k) · w_c(Z(u), Z(u_k)) · Z(u_k)
  • u and u_k respectively denote any pixel on the depth image and a pixel in its neighborhood; Z(u) and Z(u_k) respectively denote the depth values at u and u_k; Z̃(u) denotes the filtered depth value; W is the normalization factor over the neighborhood Ω; w_s and w_c denote the Gaussian kernel functions filtered in the spatial domain and the range domain, respectively.
  • w_s and w_c can be determined according to the following formulas:
  • w_s(u, u_k) = exp(−‖u − u_k‖² / (2·δ_s²)),  w_c(Z(u), Z(u_k)) = exp(−(Z(u) − Z(u_k))² / (2·δ_c²))
  • δ_s and δ_c are the variances of the spatial-domain and range-domain Gaussian kernel functions, respectively.
  • δ_s and δ_c are related to the magnitude of the depth value and are not fixed.
  • δ_s and δ_c can be determined from the depth value Z(u), the focal length f of the depth camera, and the constants K_s and K_c, whose specific values are related to the parameters of the depth camera (a minimal sketch of the filter follows below).
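The following is a minimal NumPy sketch of the adaptive bilateral filter described above. Only the dependence of δ_s and δ_c on the depth value, the focal length f and the constants K_s and K_c is stated in the text, so the concrete forms used here (delta_s = K_s * f / z, interpreted as a fixed metric support projected into pixels, and delta_c = K_c * z**2, following the Z² noise growth) are assumptions for illustration, as are the parameter values.

```python
import numpy as np

def adaptive_bilateral_filter(Z, f, K_s=0.01, K_c=0.0025, radius=3):
    """Edge-preserving denoising of a depth image Z (in meters).
    Invalid pixels are encoded as 0 and are ignored."""
    H, W = Z.shape
    out = np.zeros_like(Z)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial_dist2 = xs ** 2 + ys ** 2                 # ||u - u_k||^2 per neighbor
    Zp = np.pad(Z, radius, mode='edge')
    for y in range(H):
        for x in range(W):
            z = Z[y, x]
            if z <= 0:                                # skip invalid depth
                continue
            # Depth-adaptive kernel widths (assumed forms, see lead-in).
            delta_s = K_s * f / z                     # spatial sigma in pixels
            delta_c = K_c * z * z                     # range sigma in meters
            patch = Zp[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            w_s = np.exp(-spatial_dist2 / (2.0 * delta_s ** 2))
            w_c = np.exp(-(patch - z) ** 2 / (2.0 * delta_c ** 2))
            w = w_s * w_c * (patch > 0)               # ignore invalid neighbors
            W_norm = w.sum()                          # normalization factor W
            if W_norm > 0:
                # Z~(u) = (1/W) * sum w_s * w_c * Z(u_k)
                out[y, x] = (w * patch).sum() / W_norm
    return out

# Example: smooth a noisy synthetic depth map while keeping the step edge.
rng = np.random.default_rng(0)
Z = np.full((64, 64), 1.0)
Z[:, 32:] = 2.0
Z += rng.normal(0.0, 0.005, Z.shape) * Z ** 2         # noise grows with Z^2
Z_filtered = adaptive_bilateral_filter(Z, f=580.0)
```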
  • FIG. 2a shows a color image corresponding to the depth image.
  • Figure 2b shows a point cloud derived from a depth image.
  • Figure 2c shows a point cloud resulting from bilateral filtering of the depth image.
  • Figure 2d shows a point cloud obtained by adaptive bilateral filtering of depth images.
  • the embodiment of the present invention can implement edge preservation and denoising of the depth map by adopting an adaptive bilateral filtering method.
  • S120 Perform visual content-based block fusion and registration processing on the depth image.
  • the depth image sequence is segmented based on the visual content, and each segment is block-fused, and closed-loop detection is performed between segments, and the result of the closed-loop detection is globally optimized.
  • the depth image sequence is a depth image data stream.
  • the step may include: segmenting the depth image sequence by the automatic segmentation method based on visual content, grouping similar depth image content into one segment, performing block fusion on each segment to determine the transformation relationships between the depth images, and performing closed-loop detection between segments according to the transformation relationships to achieve global optimization.
  • this step may include:
  • S122 Back-project the point cloud data corresponding to each frame's depth image into the initial coordinate system according to the camera pose information, compare the similarity between the projected depth image and the depth image of the initial frame, and, when the similarity is lower than the similarity threshold, initialize the camera pose and start a new segment.
  • This step performs closed-loop detection between segments.
  • S124 Using the pose information of each segment and the matching relationship between segments and segments, constructing a graph and performing graph optimization using a G2O framework to obtain optimized camera trajectory information, thereby achieving global optimization.
  • This step applies the Simultaneous Localization and Calibration (SLAC) mode to correct non-rigid distortion and introduces line-process constraints to remove incorrect closed-loop matches.
  • SLAC: Simultaneous Localization and Calibration
  • the foregoing step S122 may further include:
  • S1221 Calculate the similarity between the depth image of each frame and the depth image of the first frame.
  • This step segments the depth image sequence based on the visual content. In this way, the cumulative error caused by visual odometry estimation can be effectively reduced, and similar content can be fused together, thereby improving the registration accuracy.
  • The next frame's depth image is taken as the starting-frame depth image of the next segment, and steps S1221 and S1222 are repeated until all frame depth images are processed.
  • the step of calculating the similarity between the depth image of each frame and the depth image of the first frame may specifically include:
  • S12211 Calculate the first spatial three-dimensional point corresponding to each pixel on the depth image from the projection relationship and the depth value of any frame's depth image: p = π⁻¹(u_p, Z(u_p))
  • u_p is any pixel on the depth image;
  • Z(u_p) and p respectively denote the depth value at u_p and the first spatial three-dimensional point;
  • π denotes the projection relationship, that is, the 2D-3D projection transformation by which the point cloud data corresponding to each depth image is back-projected into the initial coordinate system.
  • Transform the first spatial three-dimensional point into the world coordinate system by rotation and translation: q = T_i · p, where T_i denotes the rotation-translation matrix from the spatial 3D points of the i-th frame depth image to the world coordinate system, which can be estimated by visual odometry; i takes a positive integer value; p denotes the first spatial three-dimensional point and q the second spatial three-dimensional point.
  • Back-project the second spatial three-dimensional point onto the two-dimensional image plane to obtain the projected depth image: u_q = π(q) = (f_x·x_q/z_q + c_x, f_y·y_q/z_q + c_y)ᵀ, where u_q is the pixel on the projected depth image corresponding to q; f_x, f_y, c_x and c_y denote the intrinsic parameters of the depth camera; x_q, y_q, z_q denote the coordinates of q; T denotes matrix transposition.
  • S12214 Count the numbers of valid pixels on the starting-frame depth image and on the projected depth image of any frame, and take their ratio as the similarity: η = n_i / n_0.
  • n_0 and n_i respectively denote the numbers of valid pixels on the starting-frame depth image and on the projected depth image of any frame; η denotes the similarity (a minimal sketch of this similarity test follows below).
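The following is a minimal NumPy sketch of the similarity test and of the segmentation loop of steps 1-4: every frame is back-projected, moved into the coordinate frame of the current segment's starting frame using the pose estimated by the visual odometry, re-projected onto the image plane, and the ratio η = n_i / n_0 triggers a new segment when it drops below the threshold. The intrinsics, the threshold value and the use of 4x4 world-pose matrices are illustrative assumptions.

```python
import numpy as np

def backproject(Z, fx, fy, cx, cy):
    """p = pi^{-1}(u_p, Z(u_p)): lift every valid depth pixel to a 3D point."""
    H, W = Z.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    valid = Z > 0
    z = Z[valid]
    x = (us[valid] - cx) / fx * z
    y = (vs[valid] - cy) / fy * z
    return np.stack([x, y, z], axis=1)               # N x 3 points

def projected_valid_count(points, T, fx, fy, cx, cy, shape):
    """q = T * p, u_q = pi(q): count points that land inside the image plane
    of the starting frame (the valid-pixel count n_i)."""
    R, t = T[:3, :3], T[:3, 3]
    q = points @ R.T + t
    z = q[:, 2]
    front = z > 1e-6
    u = fx * q[front, 0] / z[front] + cx
    v = fy * q[front, 1] / z[front] + cy
    H, W = shape
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    return int(inside.sum())

def segment_sequence(depths, poses, fx, fy, cx, cy, threshold=0.6):
    """Split the depth sequence whenever the overlap with the current segment's
    starting frame drops below the similarity threshold (steps 1-4 above).
    `poses` are assumed to be 4x4 camera-to-world matrices from the odometry."""
    segments, start = [], 0
    T0_inv = np.linalg.inv(poses[0])
    n0 = int((depths[0] > 0).sum())
    for i in range(1, len(depths)):
        pts = backproject(depths[i], fx, fy, cx, cy)
        T_rel = T0_inv @ poses[i]                    # frame i expressed in the start frame
        n_i = projected_valid_count(pts, T_rel, fx, fy, cx, cy, depths[i].shape)
        eta = n_i / max(n0, 1)                       # similarity eta = n_i / n_0
        if eta < threshold:                          # content changed too much:
            segments.append((start, i - 1))          # close the current segment
            start = i                                # frame i starts the next one
            T0_inv = np.linalg.inv(poses[i])
            n0 = int((depths[i] > 0).sum())
    segments.append((start, len(depths) - 1))
    return segments
```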
  • FIG. 3 exemplarily shows a flow diagram of segmentation fusion and registration based on visual content.
  • the embodiment of the invention adopts an automatic segmentation algorithm based on visual content, which can effectively reduce the cumulative error in visual odometry estimation and improve the registration accuracy.
  • the step may include: fusing the depth image of each frame using the truncated signed distance function (TSDF) grid model, according to the result of the visual-content-based block fusion and registration processing, and representing the three-dimensional space with a voxel grid to obtain the three-dimensional model of the complete indoor scene.
  • TSDF: truncated signed distance function
  • This step may further include:
  • the TSDF grid model can be used to fuse the depth images of each frame, representing the three-dimensional space with a voxel grid of resolution m, that is, each dimension of the three-dimensional space is divided into m cells.
  • Each voxel v stores two values: the truncated signed distance function f_i(v) and its weight w_i(v).
  • the truncated signed distance function can be determined according to the following formula: f_i(v) = [K⁻¹ · z_i(u) · [uᵀ, 1]ᵀ]_z − [v_i]_z
  • f_i(v) represents the truncated signed distance function, that is, the distance from the grid cell to the surface of the object model; its sign indicates whether the cell lies on the occluded side or on the visible side of the surface, and the zero crossing lies on the surface itself;
  • K represents the intrinsic parameter matrix of the camera;
  • u represents the pixel;
  • z_i(u) represents the depth value corresponding to the pixel u;
  • v_i represents the voxel.
  • the camera here refers to the depth camera.
  • data weighted fusion can be performed according to the following formulas: F(v) = Σ_{i=1..n} w_i(v)·f_i(v) / Σ_{i=1..n} w_i(v),  W(v) = Σ_{i=1..n} w_i(v)
  • f_i(v) and w_i(v) respectively represent the truncated signed distance function (TSDF) corresponding to the voxel v and its weight function;
  • n takes a positive integer value;
  • F(v) represents the fused truncated signed distance function value corresponding to the voxel v;
  • W(v) represents the weight of the fused truncated signed distance function value corresponding to the voxel v.
  • the weight function may be determined according to the noise characteristics of the depth data and the region of interest, and its value is not fixed. In order to preserve the geometric details of object surfaces, the weights of low-noise areas and areas of interest are set large, while the weights of high-noise areas or areas of no interest are set small.
  • the weight function depends on the following quantities:
  • d_i represents the radius of the region of interest; the smaller the radius, the higher the interest and the greater the weight;
  • δ_s is the noise variance of the depth data, and its value is consistent with the variance of the spatial-domain kernel function of the adaptive bilateral filtering algorithm;
  • w is a constant, which may preferably take the value 1 or 0 (a minimal sketch of this weighted fusion follows below).
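The following is a minimal NumPy sketch of the weighted volume data fusion: each depth frame contributes f_i(v) to every voxel it sees, and F(v) and W(v) are updated as the running weighted average given above. The text only describes the weight qualitatively (large for low-noise areas and areas of interest, small otherwise), so the concrete weight used here, a product of a depth-noise term and a Gaussian falloff with the distance d_i from an assumed region-of-interest center, is an illustrative assumption, as are all parameter values.

```python
import numpy as np

def integrate_frame(F, Wt, depth, K, T_wc, voxel_origin, voxel_size,
                    trunc=0.03, roi_center=None, roi_sigma=0.5,
                    noise_coeff=0.0025, w=1.0):
    """Fuse one depth frame into the TSDF volume F with weight volume Wt:
    F(v) <- (W(v)*F(v) + w_i(v)*f_i(v)) / (W(v) + w_i(v)), W(v) <- W(v) + w_i(v)."""
    # Voxel centers v in world coordinates (flat ordering matches F.reshape(-1)).
    idx = np.indices(F.shape).reshape(3, -1).T
    v_world = voxel_origin + (idx + 0.5) * voxel_size
    # Move voxels into the camera frame of this depth image and project them.
    T_cw = np.linalg.inv(T_wc)
    v_cam = v_world @ T_cw[:3, :3].T + T_cw[:3, 3]
    z_v = v_cam[:, 2]
    safe_z = np.where(z_v > 0, z_v, 1.0)
    u = np.round(K[0, 0] * v_cam[:, 0] / safe_z + K[0, 2]).astype(int)
    v_px = np.round(K[1, 1] * v_cam[:, 1] / safe_z + K[1, 2]).astype(int)
    H, W_img = depth.shape
    ok = (z_v > 1e-6) & (u >= 0) & (u < W_img) & (v_px >= 0) & (v_px < H)
    z_d = np.zeros_like(z_v)
    z_d[ok] = depth[v_px[ok], u[ok]]
    ok &= z_d > 0
    # f_i(v): signed distance along the viewing ray, truncated to [-trunc, trunc];
    # voxels far behind the observed surface are not updated.
    sdf = z_d - z_v
    ok &= sdf > -trunc
    f_i = np.clip(sdf, -trunc, trunc)
    # Assumed weight: down-weight noisy (far) measurements and voxels far
    # from the region of interest (distance d_i).
    w_noise = 1.0 / (1.0 + noise_coeff * z_d ** 2)
    if roi_center is not None:
        d_i = np.linalg.norm(v_world - roi_center, axis=1)
        w_roi = np.exp(-d_i ** 2 / (2.0 * roi_sigma ** 2))
    else:
        w_roi = 1.0
    w_i = np.where(ok, w * w_noise * w_roi, 0.0)
    # Running weighted average F(v), W(v).
    F_flat, W_flat = F.reshape(-1), Wt.reshape(-1)
    new_W = W_flat + w_i
    upd = new_W > 0
    F_flat[upd] = (W_flat[upd] * F_flat[upd] + w_i[upd] * f_i[upd]) / new_W[upd]
    W_flat[:] = new_W
    return F, Wt
```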
  • FIG. 4 exemplarily shows a schematic diagram of a weighted volume data fusion process.
  • the weighted volume data fusion algorithm in the embodiment of the invention can effectively maintain the geometric details of the surface of the object, and can obtain a complete, accurate and refined indoor scene model, which has good robustness and expandability.
  • Figure 5a exemplarily shows a three-dimensional reconstruction result using an unweighted volume data fusion algorithm
  • Figure 5b exemplarily shows a partial detail of the three-dimensional model in Figure 5a
  • Figure 5c exemplarily shows the three-dimensional reconstruction result obtained by the weighted volume data fusion algorithm proposed by the embodiment of the invention;
  • Figure 5d exemplarily shows the local details of the three-dimensional model in Figure 5c.
  • FIG. 6 exemplarily shows an effect of performing three-dimensional reconstruction on the 3D Scene Data data set using the method proposed by the embodiment of the present invention
  • FIG. 7 exemplarily shows the effect of performing three-dimensional reconstruction on the Augmented ICL-NUIM Dataset using the method proposed by the embodiment of the present invention;
  • FIG. 8 exemplarily shows the effect of three-dimensional reconstruction of the indoor scene data collected by Microsoft Kinect for Windows.
  • the embodiment of the present invention further provides a system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera.
  • the system 90 includes: an obtaining module 92, a filtering module 94, a block fusion and registration module 96, and a volume data fusion module 98.
  • the obtaining module 92 is configured to acquire a depth image.
  • the filtering module 94 is configured to perform adaptive bilateral filtering on the depth image.
  • the block fusion and registration module 96 is configured to perform visual content based block fusion and registration processing on the filtered depth image.
  • the volume data fusion module 98 is configured to perform weighted volume data fusion according to the processing result, thereby reconstructing a three-dimensional model of the indoor complete scene.
  • the embodiment of the invention can effectively reduce the cumulative error in visual odometry estimation, improve the registration precision, and effectively preserve the geometric details of object surfaces, so that a complete, accurate and refined indoor scene model can be obtained.
  • the filtering module is specifically configured to: perform adaptive bilateral filtering according to the formula Z̃(u) = (1/W) · Σ_{u_k ∈ Ω} w_s(u, u_k) · w_c(Z(u), Z(u_k)) · Z(u_k),
  • where u and u_k respectively denote any pixel on the depth image and a pixel in its neighborhood; Z(u) and Z(u_k) respectively denote the depth values at u and u_k; Z̃(u) denotes the filtered depth value; W is the normalization factor over the neighborhood Ω; w_s and w_c denote the Gaussian kernel functions filtered in the spatial domain and the range domain, respectively.
  • the block fusion and registration module may be specifically configured to: segment the depth image sequence based on visual content, perform block fusion on each segment, perform closed-loop detection between segments, and globally optimize the results of the closed-loop detection.
  • the block fusion and registration module is further specifically configured to: segment the depth image sequence by the automatic segmentation method based on visual-content detection, group similar depth image content into one segment, block-fuse each segment to determine the transformation relationships between the depth images, and perform closed-loop detection between segments according to the transformation relationships to achieve global optimization.
  • the block fusion and registration module may specifically include: a camera pose information acquisition unit, a segmentation unit, a registration unit, and an optimization unit.
  • the camera pose information acquisition unit is configured to perform visual odometry estimation using the Kintinuous framework to obtain the camera pose information for each frame's depth image.
  • the segmentation unit is configured to back-project the point cloud data corresponding to each frame's depth image into the initial coordinate system according to the camera pose information, compare the similarity between the projected depth image and the depth image of the initial frame, and, when the similarity is lower than the similarity threshold, initialize the camera pose and start a new segment.
  • the registration unit is used to extract the FPFH geometric descriptor from the point cloud data of each segment, perform coarse registration between every two segments, and perform fine registration using the GICP algorithm to obtain the matching relationships between segments (a rough sketch of this coarse-to-fine registration follows below).
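The following is a rough sketch of the coarse-to-fine registration between two fused segments using Open3D; its availability and the exact function signatures, which have changed between Open3D releases, are assumptions here, and the voxel size, radii and thresholds are illustrative. The coarse step uses RANSAC over FPFH correspondences, followed by generalized ICP refinement as described above.

```python
import open3d as o3d

def register_segments(src, dst, voxel=0.05):
    """Estimate the transformation aligning segment point cloud `src` to `dst`:
    FPFH-based coarse registration followed by GICP refinement."""
    reg = o3d.pipelines.registration
    src_d = src.voxel_down_sample(voxel)
    dst_d = dst.voxel_down_sample(voxel)
    for pc in (src_d, dst_d):
        pc.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=2 * voxel, max_nn=30))
    feat = lambda pc: reg.compute_fpfh_feature(
        pc, o3d.geometry.KDTreeSearchParamHybrid(radius=5 * voxel, max_nn=100))
    # Coarse registration from FPFH feature correspondences.
    coarse = reg.registration_ransac_based_on_feature_matching(
        src_d, dst_d, feat(src_d), feat(dst_d), True, 1.5 * voxel,
        reg.TransformationEstimationPointToPoint(False), 3, [],
        reg.RANSACConvergenceCriteria(100000, 0.999))
    # Fine registration with generalized ICP, initialized by the coarse result.
    fine = reg.registration_generalized_icp(
        src_d, dst_d, voxel, coarse.transformation)
    return fine.transformation
```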
  • the optimization unit is used to construct a graph from the pose information of each segment and the matching relationships between segments, and to perform graph optimization using the G2O framework to obtain the optimized camera trajectory information, thereby achieving global optimization.
  • the segmentation unit may specifically include: a calculation unit, a determination unit, a segmentation subunit, and a processing unit.
  • the calculation unit is configured to calculate the similarity between the depth image of each frame and the depth image of the first frame.
  • the judging unit is configured to judge whether the similarity is lower than the similarity threshold.
  • the segmentation subunit is configured to segment the depth image sequence when the similarity is below the similarity threshold.
  • the processing unit is configured to use the next frame depth image as the start frame depth image of the next segment, and repeatedly execute the calculation unit and the determination unit until all frame depth images are processed.
  • the volume data fusion module can be specifically used to fuse the depth image of each frame using the truncated signed distance function grid model according to the processing result, and to represent the three-dimensional space with a voxel grid, thereby obtaining the three-dimensional model of the complete indoor scene.
  • the volume data fusion module may specifically include a weighted fusion unit and an extraction unit.
  • the weighted fusion unit is configured to perform weighted fusion of the truncated symbol distance function data based on the noise feature and the region of interest using the Volumetric method framework.
  • the extraction unit is used to extract the Mesh model using the Marching cubes algorithm, thereby obtaining the three-dimensional model of the complete indoor scene (a minimal sketch follows below).
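The following is a minimal sketch of the Mesh extraction step using the marching cubes implementation from scikit-image (assumed available); a synthetic truncated signed distance volume stands in for the fused volume F, and the zero level set is extracted as the surface.

```python
import numpy as np
from skimage import measure

# Synthetic stand-in for the fused TSDF volume F: signed distance to a sphere
# of radius 0.2 m, truncated to +/- 0.03 m, sampled on a 64^3 voxel grid.
res, voxel_size = 64, 0.01
coords = (np.indices((res, res, res)).reshape(3, -1).T + 0.5) * voxel_size
F = np.clip(np.linalg.norm(coords - 0.32, axis=1) - 0.2, -0.03, 0.03)
F = F.reshape(res, res, res)

# Marching cubes on the zero level set of the truncated signed distance function.
verts, faces, normals, values = measure.marching_cubes(F, level=0.0)
verts_world = verts * voxel_size        # voxel indices back to metric coordinates
print(len(verts), "vertices,", len(faces), "triangles")
```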
  • the system for performing three-dimensional reconstruction of complete indoor scenes based on the consumer-level depth camera includes an acquisition module, a filtering module, a block fusion and registration module, and a volume data fusion module, wherein:
  • the acquisition module is used for depth image acquisition of indoor scenes using a depth camera.
  • the filtering module is configured to perform adaptive bilateral filtering processing on the acquired depth image.
  • the acquisition module here is equivalent to the obtaining module described above.
  • real indoor scene data can be acquired using the handheld consumer depth camera Microsoft Kinect for Windows.
  • adaptive bilateral filtering is performed on the acquired depth image, and the parameters of the adaptive bilateral filtering method are set automatically according to the noise characteristics of the depth camera and its intrinsic parameters; therefore, the embodiment of the present invention can effectively remove noise while preserving edge information.
  • the block fusion and registration module is used to automatically segment the data stream based on the visual content, each segment performs block fusion, and the closed-loop detection is performed between segments, and the result of the closed-loop detection is globally optimized.
  • the block fusion and registration module performs automatic block fusion and registration based on visual content.
  • the block fusion and registration module specifically includes: a pose information acquisition module, a segmentation module, a coarse registration module, a fine registration module, and an optimization module.
  • the pose information acquisition module is configured to perform visual odometry estimation using the Kintinuous framework to obtain the camera pose information for each frame's depth image.
  • the segmentation module is configured to back-project the point cloud data corresponding to each frame's depth image into the initial coordinate system according to the camera pose information, and to compare the similarity between the projected depth image and the depth image of the initial frame; if the similarity is lower than the similarity threshold, the camera pose is initialized and a new segment is started.
  • the coarse registration module is used to extract the FPFH geometric descriptor from the point cloud data of each segment and to perform coarse registration between every two segments;
  • the fine registration module is used for fine registration using the GICP algorithm to obtain the matching relationship between segments.
  • the optimization module is used to construct the map and use the G2O framework for graph optimization by using the pose information of each segment and the matching relationship between segments.
  • the optimization module is further used to apply the SLAC (Simultaneous Localization and Calibration) mode to correct non-rigid distortion, and to use line-process constraints to remove incorrect closed-loop matches.
  • SLAC: Simultaneous Localization and Calibration
  • the above-mentioned block fusion and registration module segments the RGB-D data stream based on the visual content, which can effectively solve the cumulative error problem caused by visual odometry estimation and can fuse similar content together, thereby improving the registration precision.
  • the volume data fusion module is configured to perform weighted volume data fusion according to the optimized camera track information to obtain a three-dimensional model of the scene.
  • the volume data fusion module defines the weight function of the truncated signed distance function according to the noise characteristics of the depth camera and the region of interest, so as to preserve the geometric details of object surfaces.
  • the system embodiment for performing three-dimensional reconstruction of complete indoor scenes based on a consumer-level depth camera may be used to implement the corresponding method embodiment; its technical principle, the technical problems it solves, and the technical effects it produces are similar, and the embodiments may be referred to each other. For convenience and brevity of description, the parts that the embodiments have in common are not repeated.
  • in the above system and method for performing three-dimensional reconstruction of complete indoor scenes based on a consumer-level depth camera, the division into the above functional modules, units or steps is only illustrative.
  • the acquisition module described above may also serve as the obtaining module.
  • in practice, the functions may be distributed among different functional modules, units or steps as required; that is, the modules, units or steps in the embodiment of the present invention may be decomposed or combined. For example, the obtaining module and the filtering module may be combined into a data preprocessing module.

Abstract

Provided are a method and system for three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera. The method comprises: acquiring a depth image and performing adaptive bilateral filtering; performing visual odometer estimation using the filtered depth image, automatically segmenting an image sequence on the basis of visual content, performing closed-loop detection between segments, and performing global optimization; and performing weighted volume data fusion according to optimized camera trajectory information, so as to reconstruct a three-dimensional model of a complete indoor scene. The method and system realize edge preservation and denoising of a depth image by means of an adaptive bilateral filtering algorithm. An automatic segmentation algorithm based on visual content can effectively reduce a cumulative error in the process of visual odometer estimation and improve the registration accuracy. The use of the weighted volume data fusion algorithm can effectively preserve the geometric details of the surface of an object. Thus, the technical problem of how to improve the accuracy of three-dimensional reconstruction in an indoor scene is solved, such that a complete, accurate and refined indoor scene model can be obtained.

Description

Method and system for three-dimensional reconstruction of a complete indoor scene based on a depth camera
Technical Field
The present invention relates to the field of computer vision technology, and in particular, to a method and system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera.
Background Art
High-precision 3D reconstruction of indoor scenes is one of the challenging research topics in computer vision, involving theories and techniques from computer vision, computer graphics, pattern recognition, optimization and many other fields. There are many ways to realize 3D reconstruction. The traditional approach is to use laser or radar ranging sensors or structured-light technology to acquire the structural information of the scene or object surface. However, these instruments are mostly expensive and difficult to carry, so their applications are limited. With the development of computer vision technology, researchers began to study purely vision-based methods for 3D reconstruction, which has produced a great deal of useful research work.
After the introduction of the consumer-grade depth camera Microsoft Kinect, people can directly use depth data to conveniently perform three-dimensional reconstruction of indoor scenes. The KinectFusion algorithm proposed by Newcombe et al. uses Kinect to obtain the depth of each point in the image and estimates the pose of the current frame camera by aligning the coordinates of 3D points in the current camera coordinate system with their coordinates in the global model using the Iterative Closest Point (ICP) algorithm; volume data fusion is then performed iteratively through a Truncated Signed Distance Function (TSDF) to obtain a dense three-dimensional model. Although the depth acquired by Kinect is not affected by lighting conditions or texture richness, its depth range is only 0.5-4 m, and the position and size of the mesh model are fixed, so it is only suitable for local, static indoor scenes.
Three-dimensional reconstruction of indoor scenes with a consumer-grade depth camera generally suffers from the following problems: (1) the depth images acquired by a consumer-grade depth camera have low resolution and high noise, which makes it difficult to preserve object surface details, and the limited depth range prevents direct use for complete-scene 3D reconstruction; (2) the cumulative error of camera pose estimation leads to erroneous, distorted 3D models; (3) consumer-grade depth cameras are generally handheld, the camera motion is rather arbitrary, and the quality of the acquired data varies, which affects the reconstruction result.
In order to perform complete 3D reconstruction of indoor scenes, Whelan et al. proposed the Kintinuous algorithm, a further extension of KinectFusion. The algorithm uses a Shifting TSDF Volume to recycle GPU memory, which solves the problem of mesh-model memory consumption during large-scene reconstruction; it uses DBoW to find matching key frames for closed-loop detection, and finally optimizes the pose graph and the model to obtain a large-scale 3D model. Choi et al. proposed the Elastic Fragments idea: the RGB-D data stream is first split into segments of 50 frames each, visual odometry is estimated separately for each segment, the FPFH geometric descriptor is extracted from the point cloud data of every pair of segments to find matches for closed-loop detection, line-process constraints are then introduced to optimize the detection results and remove incorrect loop closures, and finally the optimized odometry information is used for volume data fusion. Complete indoor scene reconstruction is thus achieved through segmentation and closed-loop detection, but the local geometric details of objects are not preserved, and this fixed-length segmentation is not robust when reconstructing real indoor scenes. Zeng et al. proposed the 3D Match descriptor: the RGB-D data stream is first split into fixed segments and reconstructed into local models, key points are extracted from the 3D model of each segment as the input of a 3D convolutional network (ConvNet), the feature vectors learned by this network are fed into a metric network, and matching results are output by similarity comparison. Since deep networks have a very clear advantage in feature learning, using 3D Match for geometric registration can improve reconstruction accuracy compared with other descriptors. However, this method requires local 3D reconstruction first, uses a deep learning network for geometric registration, and then outputs the global 3D model; network training also requires a large amount of data, so the whole reconstruction pipeline is inefficient.
To improve the accuracy of 3D reconstruction, Angela et al. proposed the VSBR algorithm, whose main idea is to use Shape from Shading (SFS) to hierarchically optimize the TSDF data before fusion, in order to solve the problem that over-smoothing during TSDF data fusion causes the loss of object surface details, thereby obtaining a more refined three-dimensional structural model. However, this method is only effective for reconstructing single objects under ideal lighting; for indoor scenes the accuracy improvement is not obvious because the lighting varies greatly.
In view of this, the present invention is proposed.
Summary of the Invention
In order to solve the above problems in the prior art, namely the technical problem of how to improve the accuracy of three-dimensional reconstruction in indoor scenes, a method and system for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera are provided.
In order to achieve the above object, in one aspect, the following technical solution is provided:
A method for performing three-dimensional reconstruction of a complete indoor scene based on a consumer-level depth camera, which may include:
acquiring a depth image;
performing adaptive bilateral filtering on the depth image;
performing visual-content-based block fusion and registration processing on the filtered depth image;
performing weighted volume data fusion according to the processing result, thereby reconstructing a three-dimensional model of the complete indoor scene.
Preferably, the performing of adaptive bilateral filtering on the depth image specifically includes:
performing adaptive bilateral filtering according to the following formula:
Z̃(u) = (1/W) · Σ_{u_k ∈ Ω} w_s(u, u_k) · w_c(Z(u), Z(u_k)) · Z(u_k)
where u and u_k respectively denote any pixel on the depth image and a pixel in its neighborhood; Z(u) and Z(u_k) respectively denote the depth values at u and u_k; Z̃(u) denotes the filtered depth value; W is the normalization factor over the neighborhood Ω; w_s and w_c denote the Gaussian kernel functions for filtering in the spatial domain and the range domain, respectively.
Preferably, the Gaussian kernel functions for spatial-domain and range-domain filtering are determined according to the following formulas:
w_s(u, u_k) = exp(−‖u − u_k‖² / (2·δ_s²)),  w_c(Z(u), Z(u_k)) = exp(−(Z(u) − Z(u_k))² / (2·δ_c²))
where δ_s and δ_c are the variances of the spatial-domain and range-domain Gaussian kernel functions, respectively;
δ_s and δ_c are determined adaptively from the depth value and the camera parameters, where f denotes the focal length of the depth camera and K_s and K_c denote constants.
Preferably, the performing of visual-content-based block fusion and registration processing on the filtered depth image specifically includes: segmenting the depth image sequence based on visual content, performing block fusion on each segment, performing closed-loop detection between the segments, and globally optimizing the result of the closed-loop detection.
Preferably, the segmenting of the depth image sequence based on visual content, the block fusion of each segment, the closed-loop detection between segments, and the global optimization of the closed-loop detection result specifically include:
segmenting the depth image sequence by an automatic segmentation method based on visual-content detection, grouping similar depth image content into one segment, block-fusing each segment to determine the transformation relationships between the depth images, and performing closed-loop detection between segments according to the transformation relationships to achieve global optimization.
Preferably, the segmenting of the depth image sequence by the automatic segmentation method based on visual-content detection, the grouping of similar depth image content into one segment, the block fusion of each segment to determine the transformation relationships between the depth images, and the closed-loop detection between segments according to the transformation relationships to achieve global optimization specifically include:
performing visual odometry estimation using the Kintinuous framework to obtain the camera pose information for each frame's depth image;
back-projecting the point cloud data corresponding to each frame's depth image into the initial coordinate system according to the camera pose information, comparing the similarity between the projected depth image and the depth image of the initial frame, and, when the similarity is lower than the similarity threshold, initializing the camera pose and starting a new segment;
extracting the FPFH geometric descriptor from the point cloud data of each segment, performing coarse registration between every two segments, and performing fine registration using the GICP algorithm to obtain the matching relationships between segments;
constructing a graph from the pose information of each segment and the matching relationships between segments, and performing graph optimization using the G2O framework to obtain the optimized camera trajectory information, thereby achieving the global optimization.
Preferably, the back-projecting of the point cloud data corresponding to each frame's depth image into the initial coordinate system according to the camera pose information, the similarity comparison between the projected depth image and the depth image of the initial frame, and the initialization of the camera pose and segmentation when the similarity is lower than the similarity threshold specifically include:
Step 1: calculating the similarity between each frame's depth image and the depth image of the first frame;
Step 2: determining whether the similarity is lower than the similarity threshold;
Step 3: if so, segmenting the depth image sequence;
Step 4: taking the next frame's depth image as the starting-frame depth image of the next segment, and repeating steps 1 and 2 until all frame depth images are processed.
Preferably, step 1 specifically includes:
calculating, from the projection relationship and the depth value of any frame's depth image, the first spatial three-dimensional point corresponding to each pixel on the depth image using the following formula:
p = π⁻¹(u_p, Z(u_p))
where u_p is any pixel on the depth image; Z(u_p) and p respectively denote the depth value at u_p and the first spatial three-dimensional point; π denotes the projection relationship;
transforming the first spatial three-dimensional point into the world coordinate system by rotation and translation according to the following formula to obtain the second spatial three-dimensional point:
q = T_i · p
where T_i denotes the rotation-translation matrix from the spatial 3D points of the i-th frame depth image to the world coordinate system; p denotes the first spatial three-dimensional point and q denotes the second spatial three-dimensional point; i takes a positive integer value;
back-projecting the second spatial three-dimensional point onto the two-dimensional image plane according to the following formula to obtain the projected depth image:
u_q = π(q) = (f_x·x_q/z_q + c_x, f_y·y_q/z_q + c_y)ᵀ
where u_q is the pixel on the projected depth image corresponding to q; f_x, f_y, c_x and c_y denote the intrinsic parameters of the depth camera; x_q, y_q, z_q denote the coordinates of q; T denotes matrix transposition;
counting the numbers of valid pixels on the starting-frame depth image and on the projected depth image of any frame, and taking their ratio as the similarity.
Preferably, performing weighted volumetric data fusion according to the processing result, thereby reconstructing the 3D model of the complete indoor scene, specifically includes: fusing the depth image of each frame with a truncated signed distance function (TSDF) grid model according to the processing result, and representing the 3D space with a voxel grid, thereby obtaining the 3D model of the complete indoor scene.

Preferably, fusing the depth image of each frame with the truncated signed distance function grid model according to the processing result and representing the 3D space with a voxel grid, thereby obtaining the 3D model of the complete indoor scene, specifically includes:

performing weighted fusion of the truncated signed distance function data within the Volumetric Method framework, based on the noise characteristics and a region-of-interest model;

extracting a mesh model with the Marching Cubes algorithm, thereby obtaining the 3D model of the complete indoor scene.

Preferably, the truncated signed distance function is determined according to the following formula:

f_i(v) = [K^(-1) · z_i(u) · [u^T, 1]^T]_z − [v_i]_z

where f_i(v) denotes the truncated signed distance function, i.e., the distance from the voxel to the surface of the object model; its sign indicates whether the voxel lies on the occluded side of the surface or on the visible side, and the zero crossing corresponds to a point on the surface; K denotes the intrinsic parameter matrix of the camera; u denotes a pixel; z_i(u) denotes the depth value corresponding to pixel u; and v_i denotes the voxel.

Preferably, the weighted data fusion is performed according to the following formulas:

F(v) = ( Σ_{i=1..n} w_i(v) · f_i(v) ) / ( Σ_{i=1..n} w_i(v) )

W(v) = Σ_{i=1..n} w_i(v)

where v denotes a voxel; f_i(v) and w_i(v) denote the truncated signed distance function of voxel v and its weight function, respectively; n is a positive integer; F(v) denotes the fused truncated signed distance function value of voxel v; and W(v) denotes the weight of the fused truncated signed distance function value of voxel v;

where the weight function may be determined according to the following formula:

Figure PCTCN2017072257-appb-000008

where d_i denotes the radius of the region of interest; δ_s is the noise variance of the depth data; and w is a constant.
In order to achieve the above object, in another aspect, a system for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera is further provided. The system includes:

an acquisition module, configured to acquire depth images;

a filtering module, configured to perform adaptive bilateral filtering on the depth images;

a block-wise fusion and registration module, configured to perform visual-content-based block-wise fusion and registration on the filtered depth images;

a volumetric data fusion module, configured to perform weighted volumetric data fusion according to the processing result, thereby reconstructing the 3D model of the complete indoor scene.
Preferably, the filtering module is specifically configured to:

perform adaptive bilateral filtering according to the following formula:

Z~(u) = (1 / W) · Σ_{u_k ∈ Ω} w_s(‖u − u_k‖) · w_c(|Z(u) − Z(u_k)|) · Z(u_k)

where u and u_k denote any pixel of the depth image and a pixel in its neighborhood, respectively; Z(u) and Z(u_k) denote the depth values corresponding to u and u_k; Z~(u) denotes the corresponding filtered depth value; W denotes the normalization factor over the neighborhood Ω; and w_s and w_c denote the Gaussian kernel functions for filtering in the spatial domain and the range domain, respectively.
Preferably, the block-wise fusion and registration module may be specifically configured to: segment the depth image sequence based on visual content, perform block-wise fusion on each segment, perform loop closure detection between the segments, and globally optimize the loop closure detection results.

Preferably, the block-wise fusion and registration module may be further specifically configured to:

segment the depth image sequence with the automatic segmentation method based on visual-content detection, group similar depth image content into one segment, perform block-wise fusion on each segment, determine the transformation relationships between the depth images, and perform loop closure detection between segments according to the transformation relationships to achieve global optimization.

Preferably, the block-wise fusion and registration module specifically includes:

a camera pose information acquisition unit, configured to perform visual odometry estimation with the Kintinuous framework to obtain the camera pose for each depth image frame;

a segmentation unit, configured to back-project the point cloud data corresponding to each depth image frame into the initial coordinate system according to the camera pose, compare the similarity between the projected depth image and the depth image of the starting frame, and, when the similarity falls below the similarity threshold, re-initialize the camera pose and start a new segment;

a registration unit, configured to extract the FPFH geometric descriptor from the point cloud data of each segment, perform coarse registration between every two segments, and perform fine registration with the GICP algorithm to obtain the matching relationship between segments;

an optimization unit, configured to construct a graph using the pose information of each segment and the inter-segment matching relationships, and to perform graph optimization with the g2o framework to obtain the optimized camera trajectory, thereby achieving the global optimization.
Preferably, the segmentation unit specifically includes:

a computing unit, configured to compute the similarity between each depth image frame and the first depth image frame;

a judging unit, configured to determine whether the similarity is below the similarity threshold;

a segmentation subunit, configured to segment the depth image sequence when the similarity is below the similarity threshold;

a processing unit, configured to take the next depth image frame as the starting frame of the next segment, and to repeatedly invoke the computing unit and the judging unit until all depth image frames have been processed.
Preferably, the volumetric data fusion module is specifically configured to: fuse the depth image of each frame with a truncated signed distance function grid model according to the processing result, and represent the 3D space with a voxel grid, thereby obtaining the 3D model of the complete indoor scene.

Preferably, the volumetric data fusion module specifically includes:

a weighted fusion unit, configured to perform weighted fusion of the truncated signed distance function data within the Volumetric Method framework, based on the noise characteristics and the region of interest;

an extraction unit, configured to extract a mesh model with the Marching Cubes algorithm, thereby obtaining the 3D model of the complete indoor scene.
Embodiments of the present invention provide a method and system for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera. The method includes: acquiring a depth image; performing adaptive bilateral filtering on the depth image; performing visual-content-based block-wise fusion and registration on the filtered depth image; and performing weighted volumetric data fusion according to the processing result, thereby reconstructing a 3D model of the complete indoor scene. By performing visual-content-based block-wise fusion and registration on the depth images, the embodiments of the present invention effectively reduce the accumulated error in visual odometry estimation and improve registration accuracy; the weighted volumetric data fusion algorithm effectively preserves the geometric details of object surfaces. The technical problem of improving the accuracy of 3D reconstruction of indoor scenes is thereby solved, so that a complete, accurate and refined indoor scene model can be obtained.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flowchart of a method for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera according to an embodiment of the present invention;

FIG. 2a is a color image corresponding to a depth image according to an embodiment of the present invention;

FIG. 2b is a schematic diagram of a point cloud obtained from the depth image according to an embodiment of the present invention;

FIG. 2c is a schematic diagram of a point cloud obtained by bilateral filtering of the depth image according to an embodiment of the present invention;

FIG. 2d is a schematic diagram of a point cloud obtained by adaptive bilateral filtering of the depth image according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of visual-content-based segment fusion and registration according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a weighted volumetric data fusion process according to an embodiment of the present invention;

FIG. 5a is a schematic diagram of a 3D reconstruction result obtained with an unweighted volumetric data fusion algorithm;

FIG. 5b is a schematic diagram of local details of the 3D model in FIG. 5a;

FIG. 5c is a schematic diagram of a 3D reconstruction result obtained with the weighted volumetric data fusion algorithm proposed in an embodiment of the present invention;

FIG. 5d is a schematic diagram of local details of the 3D model in FIG. 5c;

FIG. 6 is a schematic diagram of 3D reconstruction results obtained with the method proposed in an embodiment of the present invention on the 3D Scene Data dataset;

FIG. 7 is a schematic diagram of 3D reconstruction results obtained with the method proposed in an embodiment of the present invention on the Augmented ICL-NUIM Dataset;

FIG. 8 is a schematic diagram of 3D reconstruction results obtained on indoor scene data captured with a Microsoft Kinect for Windows according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a system for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera according to an embodiment of the present invention.
DETAILED DESCRIPTION

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the present invention and are not intended to limit the protection scope of the present invention.

An embodiment of the present invention provides a method for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera. As shown in FIG. 1, the method includes:

S100: acquiring a depth image.

Specifically, this step may include: acquiring the depth image with a consumer-grade depth camera based on the structured-light principle.

A consumer-grade depth camera based on the structured-light principle (e.g., Microsoft Kinect for Windows or Xtion, hereinafter referred to as the depth camera) acquires the depth data of the depth image by emitting structured light and receiving the reflected information.

In practical applications, real indoor scene data can be captured with the handheld consumer-grade depth camera Microsoft Kinect for Windows.
The depth data can be computed according to the following formula:

Z = f · B / D

where f denotes the focal length of the consumer-grade depth camera, B denotes the baseline, and D denotes the disparity.
S110: performing adaptive bilateral filtering on the depth image.

In this step, adaptive bilateral filtering is applied to the acquired depth image based on the noise characteristics of the consumer-grade depth camera built on the structured-light principle.

The adaptive bilateral filtering algorithm filters the depth image in both the spatial domain and the range domain.

In practical applications, the parameters of the adaptive bilateral filtering algorithm can be set according to the noise characteristics of the depth camera and its intrinsic parameters, so that noise is effectively removed while edge information is preserved.

Taking the partial derivative of the depth Z with respect to the disparity D gives the following relationship:

∂Z/∂D = −f·B / D² = −Z² / (f·B)

The noise of the depth data is mainly produced by the quantization process. It can be seen from the above formula that the variance of the depth noise is proportional to the square of the depth value; that is, the larger the depth value, the larger the noise. In order to effectively remove the noise in the depth image, the embodiment of the present invention defines the filtering algorithm based on this noise characteristic.
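As an illustration of the two relations above, the following minimal Python sketch converts a disparity map to depth and propagates the disparity quantization noise through Z = f·B/D; the focal length, baseline and disparity values are hypothetical placeholders, not parameters taken from this filing.

```python
import numpy as np

def disparity_to_depth(disparity, f=580.0, B=0.075):
    """Convert a disparity map to depth via Z = f * B / D (f in pixels, B in meters)."""
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = f * B / disparity[valid]
    return depth

def depth_noise_std(depth, f=580.0, B=0.075, sigma_d=0.5):
    """Disparity quantization noise propagated through Z = f*B/D:
    sigma_Z ~ (Z^2 / (f*B)) * sigma_D, i.e. noise grows with the square of the depth."""
    return (depth ** 2) / (f * B) * sigma_d

# Example with synthetic disparities (hypothetical values).
disparity = np.array([[30.0, 15.0], [10.0, 5.0]])
Z = disparity_to_depth(disparity)
print(Z)                   # smaller disparity -> larger depth
print(depth_noise_std(Z))  # noise level grows quadratically with depth
```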
Specifically, the above adaptive bilateral filtering can be performed according to the following formula:

Z~(u) = (1 / W) · Σ_{u_k ∈ Ω} w_s(‖u − u_k‖) · w_c(|Z(u) − Z(u_k)|) · Z(u_k)

where u and u_k denote any pixel of the depth image and a pixel in its neighborhood, respectively; Z(u) and Z(u_k) denote the depth values corresponding to u and u_k; Z~(u) denotes the corresponding filtered depth value; W denotes the normalization factor over the neighborhood Ω; and w_s and w_c denote the Gaussian kernel functions for filtering in the spatial domain and the range domain, respectively.
In the above embodiment, w_s and w_c can be determined according to the following formulas:

w_s(x) = exp(−x² / (2·δ_s²)),  w_c(x) = exp(−x² / (2·δ_c²))

where δ_s and δ_c are the variances of the Gaussian kernel functions in the spatial domain and the range domain, respectively.

δ_s and δ_c are related to the magnitude of the depth value, and their values are not fixed.

Specifically, in the above embodiment, δ_s and δ_c can be determined according to the following formula:

Figure PCTCN2017072257-appb-000018

where f denotes the focal length of the depth camera, and K_s and K_c denote constants whose specific values are related to the parameters of the depth camera.
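The following sketch illustrates the idea of depth-adaptive bilateral filtering. It is only a minimal illustration, assuming a fixed spatial kernel width and a range kernel width that follows the depth-dependent noise level σ_Z(Z) = Z²/(f·B)·σ_D; the exact rule for δ_s and δ_c in this filing is given only by the formula image referenced above, so the parameter values and the adaptation rule here are assumptions.

```python
import numpy as np

def adaptive_bilateral_filter(Z, f=580.0, B=0.075, sigma_d=0.5,
                              sigma_s=2.0, radius=2):
    """Edge-preserving depth smoothing whose range kernel width adapts to the
    depth-dependent noise level (assumed rule, not the filing's exact formula)."""
    H, W = Z.shape
    out = Z.copy()
    for y in range(H):
        for x in range(W):
            z0 = Z[y, x]
            if z0 <= 0:                                   # invalid depth stays invalid
                continue
            sigma_c = (z0 ** 2) / (f * B) * sigma_d       # adaptive range kernel width
            acc = norm = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < H and 0 <= xx < W and Z[yy, xx] > 0:
                        ws = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                        wc = np.exp(-((Z[yy, xx] - z0) ** 2) / (2.0 * sigma_c ** 2))
                        acc += ws * wc * Z[yy, xx]
                        norm += ws * wc
            out[y, x] = acc / norm                        # center pixel guarantees norm > 0
    return out
```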
FIGS. 2a-2d exemplarily show a comparison of the effects of different filtering algorithms. FIG. 2a shows the color image corresponding to a depth image. FIG. 2b shows the point cloud obtained from the depth image. FIG. 2c shows the point cloud obtained by bilateral filtering of the depth image. FIG. 2d shows the point cloud obtained by adaptive bilateral filtering of the depth image.

By adopting the adaptive bilateral filtering method, the embodiment of the present invention achieves edge-preserving denoising of the depth map.
S120: performing visual-content-based block-wise fusion and registration on the depth image.

In this step, the depth image sequence is segmented based on visual content, each segment is fused block-wise, loop closure detection is performed between segments, and the loop closure detection results are globally optimized. The depth image sequence is the depth image data stream.

Preferably, this step may include: determining the transformation relationships between the depth images, segmenting the depth image sequence with the visual-content-based automatic segmentation method, grouping similar depth image content into one segment, performing block-wise fusion on each segment, and performing loop closure detection between segments according to the transformation relationships to achieve global optimization.

Further, this step may include:

S121: performing visual odometry estimation with the Kintinuous framework to obtain the camera pose for each depth image frame.

S122: back-projecting the point cloud data corresponding to each depth image frame into the initial coordinate system according to the camera pose, comparing the similarity between the projected depth image and the depth image of the starting frame, and, when the similarity falls below the similarity threshold, re-initializing the camera pose and starting a new segment.

S123: extracting the FPFH geometric descriptor from the point cloud data of each segment, performing coarse registration between every two segments, and performing fine registration with the GICP algorithm to obtain the matching relationship between segments.

This step performs loop closure detection between segments.

S124: constructing a graph using the pose information of each segment and the inter-segment matching relationships, and performing graph optimization with the g2o framework to obtain the optimized camera trajectory, thereby achieving global optimization.

During optimization, this step applies the SLAC (Simultaneous Localization and Calibration) mode to correct non-rigid distortion, and introduces line processes constraints to remove incorrect loop closure matches.
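The inter-segment matching of S123 uses FPFH features for coarse alignment and GICP for refinement, and S124 then optimizes the resulting pose graph with g2o; those components are not reproduced here. The sketch below is only a simplified stand-in for the refinement step: a point-to-point ICP with an SVD-based pose solve between two small segment point clouds. The FPFH coarse alignment, the GICP cost, and the graph optimization with SLAC and line processes are deliberately omitted.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rigid transform (Kabsch/SVD) aligning src to dst, both of shape (N, 3)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # fix a possible reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp(src, dst, iters=30):
    """Point-to-point ICP: a simplified stand-in for the GICP fine registration."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # brute-force nearest neighbours (adequate for small downsampled segment clouds)
        d = np.linalg.norm(cur[:, None, :] - dst[None, :, :], axis=2)
        matches = dst[d.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matches)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```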
The above step S122 may further specifically include:

S1221: computing the similarity between each depth image frame and the first depth image frame.

S1222: determining whether the similarity is below the similarity threshold; if so, performing step S1223; otherwise, performing step S1224.

S1223: segmenting the depth image sequence.

This step segments the depth image sequence based on visual content. This both effectively alleviates the accumulated error produced by visual odometry estimation and fuses similar content together, thereby improving registration accuracy.

S1224: not segmenting the depth image sequence.

S1225: taking the next depth image frame as the starting frame of the next segment, and repeating steps S1221 and S1222 until all depth image frames have been processed.
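The following sketch outlines the segmentation loop of steps S1221-S1225. It assumes a similarity function such as the one detailed in steps S12211-S12214 below; the threshold value is an illustrative assumption, not a value specified in the filing.

```python
def segment_sequence(depth_frames, similarity, threshold=0.75):
    """Split a depth image sequence into segments whenever the similarity to the
    current segment's starting frame drops below the threshold (S1221-S1225)."""
    segments, start = [], 0
    for i in range(1, len(depth_frames)):
        rho = similarity(depth_frames[start], depth_frames[i])   # S1221
        if rho < threshold:                                      # S1222 / S1223
            segments.append((start, i - 1))
            start = i                                            # S1225: new starting frame
    segments.append((start, len(depth_frames) - 1))
    return segments
```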
In the above embodiment, the step of computing the similarity between each depth image frame and the first depth image frame may specifically include:

S12211: computing the first 3D spatial point corresponding to each pixel of the depth image from the projection relationship and the depth value of any frame, using the following formula:

p = π^(-1)(u_p, Z(u_p))

where u_p is any pixel of the depth image; Z(u_p) and p denote the depth value corresponding to u_p and the first 3D spatial point, respectively; and π denotes the projection relationship, i.e., the 2D-3D projection transformation used to back-project the point cloud data corresponding to each depth image frame into the initial coordinate system.

S12212: transforming the first 3D spatial point by rotation and translation into the world coordinate system according to the following formula to obtain a second 3D spatial point:

q = T_i · p

where T_i denotes the rotation-translation matrix that maps the 3D points of the i-th depth frame to the world coordinate system, which can be estimated by visual odometry; i is a positive integer; p denotes the first 3D spatial point and q denotes the second 3D spatial point, with coordinates:

p = (x_p, y_p, z_p),  q = (x_q, y_q, z_q).

S12213: back-projecting the second 3D spatial point onto the 2D image plane according to the following formula to obtain the projected depth image:

u_q = [ f_x·x_q/z_q + c_x , f_y·y_q/z_q + c_y ]^T

where u_q is the pixel of the projected depth image corresponding to q; f_x, f_y, c_x and c_y denote the intrinsic parameters of the depth camera; x_q, y_q and z_q denote the coordinates of q; and T denotes the matrix transpose.

S12214: counting the number of valid pixels in the starting-frame depth image and in the projected depth image of any frame, respectively, and taking their ratio as the similarity.

For example, the similarity is computed according to the following formula:

ρ = n_i / n_0

where n_0 and n_i denote the number of valid pixels in the starting-frame depth image and in the projected depth image of any frame, respectively; and ρ denotes the similarity.
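A compact sketch of the similarity measure of S12211-S12214 follows: the valid pixels of frame i are lifted to 3D with the intrinsics, transformed by the odometry pose T_i, projected back into the view of the starting frame, and the ratio of valid projected pixels to valid pixels of the starting frame is returned. The intrinsic values are placeholders, and counting a projected pixel as "valid" when it lands inside the image with positive depth is an interpretation rather than the filing's exact criterion.

```python
import numpy as np

def view_similarity(Z0, Zi, T_i, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """rho = n_i / n_0: valid pixels of frame i re-projected into the initial view,
    divided by the number of valid pixels of the starting frame (S12211-S12214)."""
    H, W = Zi.shape
    v, u = np.nonzero(Zi > 0)                        # valid pixels of frame i
    z = Zi[v, u]
    # back-project to 3D camera points: p = pi^{-1}(u, Z(u))
    p = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
    # transform into the initial (world) frame: q = T_i p
    q = p @ T_i[:3, :3].T + T_i[:3, 3]
    # project back onto the image plane of the initial frame
    u_q = fx * q[:, 0] / q[:, 2] + cx
    v_q = fy * q[:, 1] / q[:, 2] + cy
    inside = (q[:, 2] > 0) & (u_q >= 0) & (u_q < W) & (v_q >= 0) & (v_q < H)
    n_i = int(inside.sum())
    n_0 = int((Z0 > 0).sum())
    return n_i / n_0 if n_0 > 0 else 0.0
```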
FIG. 3 exemplarily shows a flowchart of visual-content-based segment fusion and registration.

The embodiment of the present invention adopts the visual-content-based automatic segmentation algorithm, which effectively reduces the accumulated error in visual odometry estimation and improves registration accuracy.
S130: performing weighted volumetric data fusion according to the processing result, thereby reconstructing the 3D model of the complete indoor scene.

Specifically, this step may include: fusing the depth image of each frame with a truncated signed distance function (TSDF) grid model according to the results of the visual-content-based block-wise fusion and registration, and representing the 3D space with a voxel grid, thereby obtaining the 3D model of the complete indoor scene.

This step may further include:

S131: performing weighted fusion of the TSDF data within the Volumetric Method framework, based on the noise characteristics and the region of interest.

S132: extracting a mesh model with the Marching Cubes algorithm.

In practical applications, the depth images of the frames can be fused with the TSDF grid model according to the visual odometry estimation results. The 3D space is represented with a voxel grid of resolution m, i.e., the 3D space is divided into m blocks, and each voxel v stores two values: the truncated signed distance function f_i(v) and its weight w_i(v).

The truncated signed distance function can be determined according to the following formula:

f_i(v) = [K^(-1) · z_i(u) · [u^T, 1]^T]_z − [v_i]_z

where f_i(v) denotes the truncated signed distance function, i.e., the distance from the voxel to the surface of the object model; its sign indicates whether the voxel lies on the occluded side of the surface or on the visible side, and the zero crossing corresponds to a point on the surface; K denotes the intrinsic parameter matrix of the camera; u denotes a pixel; z_i(u) denotes the depth value corresponding to pixel u; and v_i denotes the voxel. The camera may be a depth camera or a depth video camera.
The weighted data fusion can be performed according to the following formulas:

F(v) = ( Σ_{i=1..n} w_i(v) · f_i(v) ) / ( Σ_{i=1..n} w_i(v) )

W(v) = Σ_{i=1..n} w_i(v)

where f_i(v) and w_i(v) denote the truncated signed distance function (TSDF) of voxel v and its weight function, respectively; n is a positive integer; F(v) denotes the fused TSDF value of voxel v; and W(v) denotes the weight of the fused TSDF value of voxel v.

In the above embodiment, the weight function can be determined according to the noise characteristics of the depth data and the region of interest, and its value is not fixed. In order to preserve the geometric details of object surfaces, the weights of low-noise regions and regions of interest are set large, while the weights of high-noise regions or regions of no interest are set small.

Specifically, the weight function can be determined according to the following formula:

Figure PCTCN2017072257-appb-000022

where d_i denotes the radius of the region of interest (the smaller the radius, the higher the interest and the larger the weight); δ_s is the noise variance of the depth data, consistent with the variance of the spatial-domain kernel function of the adaptive bilateral filtering algorithm; and w is a constant, which preferably takes the value 1 or 0.
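A minimal sketch of the weighted TSDF update for a block of voxels is given below. The observation weight used here is only an assumed, illustrative form that down-weights distant (noisier) measurements and favors voxels near a region of interest; the filing's actual weight formula appears only as the image referenced above. Mesh extraction with Marching Cubes (S132) would subsequently run on the fused volume F.

```python
import numpy as np

def fuse_tsdf(F, W, f_new, w_new, trunc=0.05):
    """Running weighted average of TSDF values per voxel:
    F <- (W*F + w_new*f_new) / (W + w_new),  W <- W + w_new."""
    f_new = np.clip(f_new, -trunc, trunc)              # truncation of the signed distance
    W_out = W + w_new
    F_out = (W * F + w_new * f_new) / np.maximum(W_out, 1e-12)
    return F_out, W_out

def observation_weight(depth, roi_dist=0.0, w=1.0):
    """Illustrative weight (assumed form, not the filing's formula): larger for near,
    low-noise measurements and for voxels close to a region of interest."""
    return w / (1.0 + depth ** 2) * np.exp(-roi_dist)

# Example: fuse two observations of a small voxel block.
F = np.zeros((2, 2, 2)); W = np.zeros((2, 2, 2))
F, W = fuse_tsdf(F, W, np.full((2, 2, 2), 0.04), observation_weight(1.5))
F, W = fuse_tsdf(F, W, np.full((2, 2, 2), -0.01), observation_weight(0.8, roi_dist=0.1))
print(F, W)
```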
FIG. 4 exemplarily shows a schematic diagram of the weighted volumetric data fusion process.

The weighted volumetric data fusion algorithm adopted in the embodiment of the present invention effectively preserves the geometric details of object surfaces, yields a complete, accurate and refined indoor scene model, and has good robustness and extensibility.

FIG. 5a exemplarily shows the 3D reconstruction result obtained with an unweighted volumetric data fusion algorithm; FIG. 5b exemplarily shows local details of the 3D model in FIG. 5a; FIG. 5c exemplarily shows the 3D reconstruction result obtained with the weighted volumetric data fusion algorithm proposed in the embodiment of the present invention; FIG. 5d exemplarily shows local details of the 3D model in FIG. 5c.

FIG. 6 exemplarily shows 3D reconstruction results obtained with the method proposed in the embodiment of the present invention on the 3D Scene Data dataset; FIG. 7 exemplarily shows 3D reconstruction results on the Augmented ICL-NUIM Dataset; FIG. 8 exemplarily shows 3D reconstruction results on indoor scene data captured with a Microsoft Kinect for Windows.
It should be noted that although the embodiments of the present invention are described herein in the above order, those skilled in the art will understand that the present invention may also be implemented in an order different from the one described here, and such simple variations shall also fall within the protection scope of the present invention.
Based on the same technical concept as the method embodiments, an embodiment of the present invention further provides a system for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera. As shown in FIG. 9, the system 90 includes: an acquisition module 92, a filtering module 94, a block-wise fusion and registration module 96 and a volumetric data fusion module 98. The acquisition module 92 is configured to acquire depth images. The filtering module 94 is configured to perform adaptive bilateral filtering on the depth images. The block-wise fusion and registration module 96 is configured to perform visual-content-based block-wise fusion and registration on the filtered depth images. The volumetric data fusion module 98 is configured to perform weighted volumetric data fusion according to the processing result, thereby reconstructing the 3D model of the complete indoor scene.

By adopting the above technical solution, the embodiment of the present invention effectively reduces the accumulated error in visual odometry estimation, improves registration accuracy, effectively preserves the geometric details of object surfaces, and yields a complete, accurate and refined indoor scene model.
In some embodiments, the filtering module is specifically configured to perform adaptive bilateral filtering according to the following formula:

Z~(u) = (1 / W) · Σ_{u_k ∈ Ω} w_s(‖u − u_k‖) · w_c(|Z(u) − Z(u_k)|) · Z(u_k)

where u and u_k denote any pixel of the depth image and a pixel in its neighborhood, respectively; Z(u) and Z(u_k) denote the depth values corresponding to u and u_k; Z~(u) denotes the corresponding filtered depth value; W denotes the normalization factor over the neighborhood Ω; and w_s and w_c denote the Gaussian kernel functions for filtering in the spatial domain and the range domain, respectively.
In some embodiments, the block-wise fusion and registration module may be specifically configured to: segment the depth image sequence based on visual content, perform block-wise fusion on each segment, perform loop closure detection between segments, and globally optimize the loop closure detection results.

In other embodiments, the block-wise fusion and registration module may be further specifically configured to: segment the depth image sequence with the automatic segmentation method based on visual-content detection, group similar depth image content into one segment, perform block-wise fusion on each segment, determine the transformation relationships between the depth images, perform loop closure detection between segments according to the transformation relationships, and achieve global optimization.

In some preferred embodiments, the block-wise fusion and registration module may specifically include: a camera pose information acquisition unit, a segmentation unit, a registration unit and an optimization unit. The camera pose information acquisition unit is configured to perform visual odometry estimation with the Kintinuous framework to obtain the camera pose for each depth image frame. The segmentation unit is configured to back-project the point cloud data corresponding to each depth image frame into the initial coordinate system according to the camera pose, compare the similarity between the projected depth image and the depth image of the starting frame, and, when the similarity falls below the similarity threshold, re-initialize the camera pose and start a new segment. The registration unit is configured to extract the FPFH geometric descriptor from the point cloud data of each segment, perform coarse registration between every two segments, and perform fine registration with the GICP algorithm to obtain the matching relationship between segments. The optimization unit is configured to construct a graph using the pose information of each segment and the inter-segment matching relationships, and to perform graph optimization with the g2o framework to obtain the optimized camera trajectory, thereby achieving global optimization.

The above segmentation unit may specifically include: a computing unit, a judging unit, a segmentation subunit and a processing unit. The computing unit is configured to compute the similarity between each depth image frame and the first depth image frame. The judging unit is configured to determine whether the similarity is below the similarity threshold. The segmentation subunit is configured to segment the depth image sequence when the similarity is below the similarity threshold. The processing unit is configured to take the next depth image frame as the starting frame of the next segment, and to repeatedly invoke the computing unit and the judging unit until all depth image frames have been processed.

In some embodiments, the volumetric data fusion module may be specifically configured to fuse the depth image of each frame with the truncated signed distance function grid model according to the processing result, and to represent the 3D space with a voxel grid, thereby obtaining the 3D model of the complete indoor scene.

In some embodiments, the volumetric data fusion module may specifically include a weighted fusion unit and an extraction unit. The weighted fusion unit is configured to perform weighted fusion of the truncated signed distance function data within the Volumetric Method framework, based on the noise characteristics and the region of interest. The extraction unit is configured to extract a mesh model with the Marching Cubes algorithm, thereby obtaining the 3D model of the complete indoor scene.
The present invention is described in detail below with a preferred embodiment.

The system for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera includes a capture module, a filtering module, a block-wise fusion and registration module and a volumetric data fusion module, wherein:

the capture module is configured to capture depth images of the indoor scene with the depth camera;

the filtering module is configured to perform adaptive bilateral filtering on the acquired depth images.

The capture module is an equivalent replacement of the above acquisition module. In practical applications, real indoor scene data can be captured with the handheld consumer-grade depth camera Microsoft Kinect for Windows. Adaptive bilateral filtering is then applied to the captured depth images, and the parameters of the adaptive bilateral filtering method are set automatically according to the noise characteristics of the depth camera and its intrinsic parameters. Therefore, the embodiment of the present invention can effectively remove noise while preserving edge information.

The block-wise fusion and registration module is configured to automatically segment the data stream based on visual content, perform block-wise fusion on each segment, perform loop closure detection between segments, and globally optimize the loop closure detection results.

The block-wise fusion and registration module performs visual-content-based automatic block-wise fusion and registration.

In a more preferred embodiment, the block-wise fusion and registration module specifically includes: a pose information acquisition module, a segmentation module, a coarse registration module, a fine registration module and an optimization module. The pose information acquisition module is configured to perform visual odometry estimation with the Kintinuous framework to obtain the camera pose for each depth image frame. The segmentation module is configured to back-project the point cloud data corresponding to each depth image frame into the initial coordinate system according to the camera pose, compare the similarity between the projected depth image and the depth image of the starting frame, and, if the similarity is below the similarity threshold, re-initialize the camera pose and start a new segment. The coarse registration module is configured to extract the FPFH geometric descriptor from the point cloud data of each segment and to perform coarse registration between every two segments; the fine registration module is configured to perform fine registration with the GICP algorithm to obtain the matching relationship between segments. The optimization module is configured to construct a graph using the pose information of each segment and the inter-segment matching relationships, and to perform graph optimization with the g2o framework.

Preferably, the above optimization module is further configured to apply the SLAC (Simultaneous Localization and Calibration) mode to correct non-rigid distortion, and to remove incorrect loop closure matches using line processes constraints.

The above block-wise fusion and registration module segments the RGBD data stream based on visual content, which both effectively alleviates the accumulated error produced by visual odometry estimation and fuses similar content together, thereby improving registration accuracy.

The volumetric data fusion module is configured to perform weighted volumetric data fusion according to the optimized camera trajectory to obtain the 3D model of the scene.

The volumetric data fusion module defines the weight function of the truncated signed distance function according to the noise characteristics of the depth camera and the region of interest, so as to preserve the geometric details of object surfaces.

Experiments on the system for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera show that the high-precision 3D reconstruction method based on a consumer-grade depth camera yields a complete, accurate and refined indoor scene model, and that the system has good robustness and extensibility.

The above system embodiment for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera can be used to execute the corresponding method embodiment. Their technical principles, the technical problems solved and the technical effects produced are similar, and reference can be made between them; for convenience and brevity of description, the parts that are identical across the embodiments are not repeated.

It should be noted that the system and method for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera provided by the above embodiments are illustrated only with the above division of functional modules, units or steps. For example, the aforementioned acquisition module may also serve as a capture module. In practical applications, the above functions may be assigned to different functional modules, units or steps as needed; that is, the modules, units or steps in the embodiments of the present invention may be further decomposed or combined. For example, the capture module and the filtering module may be combined into a data preprocessing module.

Heretofore, the technical solutions of the present invention have been described with reference to the preferred embodiments shown in the accompanying drawings. However, those skilled in the art will readily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions shall fall within the protection scope of the present invention.

Claims (20)

1. A method for 3D reconstruction of a complete indoor scene based on a consumer-grade depth camera, wherein the method comprises:

acquiring a depth image;

performing adaptive bilateral filtering on the depth image;

performing visual-content-based block-wise fusion and registration on the filtered depth image;

performing weighted volumetric data fusion according to the processing result, thereby reconstructing a 3D model of the complete indoor scene.
2. The method according to claim 1, wherein performing adaptive bilateral filtering on the depth image specifically comprises:

performing adaptive bilateral filtering according to the following formula:

Z~(u) = (1 / W) · Σ_{u_k ∈ Ω} w_s(‖u − u_k‖) · w_c(|Z(u) − Z(u_k)|) · Z(u_k)

where u and u_k denote any pixel of the depth image and a pixel in its neighborhood, respectively; Z(u) and Z(u_k) denote the depth values corresponding to u and u_k; Z~(u) denotes the corresponding filtered depth value; W denotes the normalization factor over the neighborhood Ω; and w_s and w_c denote the Gaussian kernel functions for filtering in the spatial domain and the range domain, respectively.
3. The method according to claim 2, wherein the Gaussian kernel functions for filtering in the spatial domain and the range domain are determined according to the following formulas:

w_s(x) = exp(−x² / (2·δ_s²)),  w_c(x) = exp(−x² / (2·δ_c²))

where δ_s and δ_c are the variances of the Gaussian kernel functions in the spatial domain and the range domain, respectively;

wherein δ_s and δ_c are determined according to the following formula:

Figure PCTCN2017072257-appb-100005

where f denotes the focal length of the depth camera, and K_s and K_c denote constants.
4. The method according to claim 1, wherein performing visual-content-based block-wise fusion and registration on the filtered depth image specifically comprises: segmenting the depth image sequence based on visual content, performing block-wise fusion on each segment, performing loop closure detection between the segments, and globally optimizing the loop closure detection results.

5. The method according to claim 4, wherein segmenting the depth image sequence based on visual content, performing block-wise fusion on each segment, performing loop closure detection between the segments, and globally optimizing the loop closure detection results specifically comprises:

segmenting the depth image sequence with an automatic segmentation method based on visual-content detection, grouping similar depth image content into one segment, performing block-wise fusion on each segment, determining the transformation relationships between the depth images, and performing loop closure detection between segments according to the transformation relationships to achieve global optimization.
6. The method according to claim 5, wherein segmenting the depth image sequence with the automatic segmentation method based on visual-content detection, grouping similar depth image content into one segment, performing block-wise fusion on each segment, determining the transformation relationships between the depth images, and performing loop closure detection between segments according to the transformation relationships to achieve global optimization specifically comprises:

performing visual odometry estimation with the Kintinuous framework to obtain the camera pose for each depth image frame;

back-projecting the point cloud data corresponding to each depth image frame into the initial coordinate system according to the camera pose, comparing the similarity between the projected depth image and the depth image of the starting frame, and, when the similarity is below the similarity threshold, re-initializing the camera pose and starting a new segment;

extracting the FPFH geometric descriptor from the point cloud data of each segment, performing coarse registration between every two segments, and performing fine registration with the GICP algorithm to obtain the matching relationship between segments;

constructing a graph using the pose information of each segment and the inter-segment matching relationships, and performing graph optimization with the g2o framework to obtain the optimized camera trajectory, thereby achieving the global optimization.
7. The method according to claim 6, wherein back-projecting the point cloud data corresponding to each frame of the depth image into the initial coordinate system according to the camera pose information, comparing the similarity between the depth image obtained by the projection and the depth image of the initial frame, and, when the similarity is lower than the similarity threshold, initializing the camera pose and starting a new segment specifically comprises:
Step 1: calculating the similarity between each frame of the depth image and the first frame of the depth image;
Step 2: determining whether the similarity is lower than the similarity threshold;
Step 3: if so, segmenting the depth image sequence;
Step 4: taking the depth image of the next frame as the starting frame depth image of the next segment, and repeating Step 1 and Step 2 until all frames of depth images have been processed.
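The four steps of claim 7 amount to a simple loop over the sequence; a minimal sketch follows, assuming a `compute_similarity` callable (defined as in claim 8) and an illustrative threshold value, neither of which is fixed by the claim.

```python
SIM_THRESHOLD = 0.7  # illustrative value; the claim only requires some threshold

def segment_depth_sequence(depth_frames, poses, compute_similarity):
    """Split a depth-image sequence into segments of similar visual content."""
    segments = []
    current = [0]          # frame indices of the current segment
    ref_idx = 0            # starting frame of the current segment
    for i in range(1, len(depth_frames)):
        sim = compute_similarity(depth_frames[ref_idx], depth_frames[i], poses[i])
        if sim < SIM_THRESHOLD:
            segments.append(current)   # Step 3: close the current segment
            ref_idx = i                # Step 4: next frame starts the next segment
            current = [i]              # (camera pose is re-initialized here)
        else:
            current.append(i)
    segments.append(current)
    return segments
```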
8. The method according to claim 7, wherein Step 1 specifically comprises:
calculating, according to the projection relationship and the depth value of any frame of the depth image, the first spatial three-dimensional point corresponding to each pixel of the depth image with the following formula:
p = π^{-1}(u_p, Z(u_p))
where u_p is any pixel of the depth image; Z(u_p) and p respectively denote the depth value corresponding to u_p and the first spatial three-dimensional point; and π denotes the projection relationship;
transforming the first spatial three-dimensional point into the world coordinate system by rotation and translation according to the following formula to obtain the second spatial three-dimensional point:
q = T_i p
where T_i denotes the rotation-translation matrix from the spatial three-dimensional points of the i-th frame depth image to the world coordinate system; p denotes the first spatial three-dimensional point; q denotes the second spatial three-dimensional point; and i is a positive integer;
back-projecting the second spatial three-dimensional point onto the two-dimensional image plane according to the following formula to obtain the projected depth image:
u_q = π(q) = (f_x·x_q/z_q + c_x, f_y·y_q/z_q + c_y)^T
where u_q is the pixel on the projected depth image corresponding to q; f_x, f_y, c_x and c_y denote the intrinsic parameters of the depth camera; x_q, y_q and z_q denote the coordinates of q; and T denotes matrix transposition;
counting the valid pixels in the starting frame depth image and in the projected depth image of any frame, respectively, and taking the ratio of the two counts as the similarity.
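The similarity measure of claim 8 can be sketched with NumPy as follows, assuming a pinhole camera whose intrinsics f_x, f_y, c_x, c_y are known, that the first frame defines the initial (world) coordinate system, and that a pixel is "valid" when its depth is positive; these are illustrative assumptions, not part of the claim.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """p = pi^{-1}(u_p, Z(u_p)): lift every pixel with valid depth to a 3D point."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    x = (us - cx) * depth / fx
    y = (vs - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[depth.reshape(-1) > 0]

def project(points, fx, fy, cx, cy, h, w):
    """u_q = (fx*x/z + cx, fy*y/z + cy)^T: render 3D points back into a depth image."""
    depth = np.zeros((h, w))
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = z > 0
    u = np.round(fx * x[keep] / z[keep] + cx).astype(int)
    v = np.round(fy * y[keep] / z[keep] + cy).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth[v[inside], u[inside]] = z[keep][inside]   # last write wins (sketch only)
    return depth

def similarity(first_depth, frame_depth, T_i, intrinsics):
    """Ratio of valid pixel counts after reprojecting frame i into the first frame."""
    fx, fy, cx, cy = intrinsics
    h, w = first_depth.shape
    p = backproject(frame_depth, fx, fy, cx, cy)
    q = (T_i[:3, :3] @ p.T).T + T_i[:3, 3]          # q = T_i p (world = first frame)
    reproj = project(q, fx, fy, cx, cy, h, w)
    return np.count_nonzero(reproj) / max(np.count_nonzero(first_depth), 1)
```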
9. The method according to claim 1, wherein performing weighted volume data fusion according to the processing result to reconstruct the three-dimensional model of the complete indoor scene specifically comprises: fusing the depth images of the frames with a truncated signed distance function (TSDF) grid model according to the processing result, and representing the three-dimensional space with a voxel grid, thereby obtaining the three-dimensional model of the complete indoor scene.
10. The method according to claim 9, wherein fusing the depth images of the frames with the truncated signed distance function grid model according to the processing result, and representing the three-dimensional space with a voxel grid to obtain the three-dimensional model of the complete indoor scene specifically comprises:
performing weighted fusion of the truncated signed distance function data within the Volumetric method framework, based on the noise characteristics and the region of interest;
extracting a Mesh model with the Marching cubes algorithm, thereby obtaining the three-dimensional model of the complete indoor scene.
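Assuming the fused TSDF values are stored in a dense NumPy volume, the Mesh extraction step can be illustrated with scikit-image's marching-cubes implementation (an illustrative choice; the claim only names the Marching cubes algorithm):

```python
import numpy as np
from skimage import measure

def extract_mesh(tsdf_volume, voxel_size, origin):
    """Extract the zero level set of a fused TSDF volume as a triangle mesh."""
    verts, faces, normals, _ = measure.marching_cubes(tsdf_volume, level=0.0)
    verts = verts * voxel_size + np.asarray(origin)   # voxel indices -> world coordinates
    return verts, faces, normals
```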
11. The method according to claim 9 or 10, wherein the truncated signed distance function is determined according to the following formula:
f_i(v) = [K^{-1} z_i(u) [u^T, 1]^T]_z − [v_i]_z
where f_i(v) denotes the truncated signed distance function, i.e. the distance from the grid cell to the surface of the object model, whose sign indicates whether the cell lies on the occluded side or the visible side of the surface, the zero crossing being a point on the surface; K denotes the intrinsic parameter matrix of the camera; u denotes a pixel; z_i(u) denotes the depth value corresponding to the pixel u; and v_i denotes a voxel.
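Reading the formula of claim 11 per voxel, a minimal evaluation routine might look as follows; the explicit truncation bound and the nearest-pixel lookup are assumptions added for illustration.

```python
import numpy as np

def tsdf_value(voxel_center, depth, K, T_world_to_cam, trunc=0.03):
    """f_i(v) = [K^{-1} z_i(u) [u^T,1]^T]_z - [v_i]_z, clipped to [-trunc, trunc]."""
    # Voxel centre expressed in the camera frame of view i
    v_cam = T_world_to_cam[:3, :3] @ voxel_center + T_world_to_cam[:3, 3]
    if v_cam[2] <= 0:
        return None                                   # behind the camera
    # Project the voxel centre to find the pixel u that observes it
    u_h = K @ (v_cam / v_cam[2])
    col, row = int(round(u_h[0])), int(round(u_h[1]))
    if not (0 <= row < depth.shape[0] and 0 <= col < depth.shape[1]):
        return None
    z = depth[row, col]
    if z <= 0:
        return None                                   # no depth measurement at u
    # Signed distance along the viewing ray: measured surface depth minus voxel depth
    return float(np.clip(z - v_cam[2], -trunc, trunc))
```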
12. The method according to claim 10, wherein the weighted data fusion is performed according to the following formulas:
F(v) = Σ_{i=1}^{n} w_i(v)·f_i(v) / Σ_{i=1}^{n} w_i(v),    W(v) = Σ_{i=1}^{n} w_i(v)
where v denotes a voxel; f_i(v) and w_i(v) respectively denote the truncated signed distance function corresponding to the voxel v and its weight function; n is a positive integer; F(v) denotes the truncated signed distance function value corresponding to the voxel v after fusion; and W(v) denotes the weight of the truncated signed distance function value corresponding to the voxel v after fusion;
where the weight function can be determined according to the following formula:
[Formula image PCTCN2017072257-appb-100008: definition of the weight function w_i(v)]
where d_i denotes the radius of the region of interest; δ_s is the noise variance in the depth data; and w is a constant.
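The fusion formulas of claim 12 are usually applied incrementally, frame by frame, as a running weighted average over the voxel grid; the sketch below assumes the per-frame weights w_i(v) have already been computed from the noise variance and region-of-interest radius mentioned above.

```python
import numpy as np

def fuse_frame(F, W, f_i, w_i):
    """Incremental form of F(v) = sum(w_i f_i) / sum(w_i) and W(v) = sum(w_i).

    F, W     accumulated TSDF values and weights over the voxel grid
    f_i, w_i TSDF values and weights contributed by the current frame
             (NaN in f_i marks voxels the frame does not observe)
    """
    seen = ~np.isnan(f_i)
    F_new, W_new = F.copy(), W.copy()
    W_new[seen] = W[seen] + w_i[seen]
    F_new[seen] = (F[seen] * W[seen] + f_i[seen] * w_i[seen]) / W_new[seen]
    return F_new, W_new
```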
13. A system for three-dimensional reconstruction of a complete indoor scene based on a consumer-grade depth camera, wherein the system comprises:
an acquisition module, configured to acquire depth images;
a filtering module, configured to perform adaptive bilateral filtering on the depth images;
a block fusion and registration module, configured to perform visual-content-based block fusion and registration processing on the filtered depth images; and
a volume data fusion module, configured to perform weighted volume data fusion according to the processing result, thereby reconstructing a three-dimensional model of the complete indoor scene.
14. The system according to claim 13, wherein the filtering module is specifically configured to:
perform adaptive bilateral filtering according to the following formula:
Ẑ(u) = (1/W) Σ_{u_k ∈ Ω} w_s(‖u − u_k‖) · w_c(|Z(u) − Z(u_k)|) · Z(u_k)
where u and u_k respectively denote any pixel of the depth image and a pixel in its neighborhood; Z(u) and Z(u_k) respectively denote the depth values corresponding to u and u_k; Ẑ(u) denotes the corresponding depth value after filtering; W denotes the normalization factor over the neighborhood Ω; and w_s and w_c respectively denote the Gaussian kernel functions for filtering in the spatial domain and in the range domain.
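A direct (unoptimized) sketch of the bilateral filter recited in claim 14 is given below, with Gaussian kernels w_s and w_c of standard deviations sigma_s and sigma_c; in the adaptive variant these parameters would be derived from the depth-dependent noise model, which is not reproduced here, so the fixed values are assumptions.

```python
import numpy as np

def bilateral_filter_depth(Z, sigma_s=3.0, sigma_c=0.03, radius=4):
    """Edge-preserving smoothing of a depth map Z (invalid pixels have depth <= 0)."""
    h, w = Z.shape
    Z_hat = np.zeros_like(Z, dtype=np.float64)
    for r in range(h):
        for c in range(w):
            if Z[r, c] <= 0:
                continue                                  # skip invalid pixels
            r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
            c0, c1 = max(c - radius, 0), min(c + radius + 1, w)
            patch = Z[r0:r1, c0:c1].astype(np.float64)
            ys, xs = np.mgrid[r0:r1, c0:c1]
            w_s = np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2 * sigma_s ** 2))
            w_c = np.exp(-((patch - Z[r, c]) ** 2) / (2 * sigma_c ** 2))
            weights = w_s * w_c * (patch > 0)             # valid neighbours only
            norm = weights.sum()
            if norm > 0:
                Z_hat[r, c] = (weights * patch).sum() / norm
    return Z_hat
```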
15. The system according to claim 13, wherein the block fusion and registration module is specifically configured to: segment the depth image sequence based on visual content, perform block-wise fusion on each segment, perform closed-loop detection between the segments, and globally optimize the results of the closed-loop detection.
16. The system according to claim 15, wherein the block fusion and registration module is further specifically configured to:
segment the depth image sequence with the automatic segmentation method based on visual content detection, group similar depth image content into one segment, perform block-wise fusion on each segment, determine the transformation relationship between the depth images, and perform closed-loop detection between segments according to the transformation relationship, so as to achieve global optimization.
17. The system according to claim 16, wherein the block fusion and registration module specifically comprises:
a camera pose information acquisition unit, configured to perform visual odometry estimation with the Kintinuous framework to obtain the camera pose information for each frame of the depth image;
a segmentation unit, configured to back-project the point cloud data corresponding to each frame of the depth image into the initial coordinate system according to the camera pose information, compare the similarity between the depth image obtained by the projection and the depth image of the initial frame, and, when the similarity is lower than the similarity threshold, initialize the camera pose and start a new segment;
a registration unit, configured to extract the FPFH geometric descriptor from the point cloud data of each segment, perform coarse registration between every two segments, and perform fine registration with the GICP algorithm to obtain the matching relationships between segments; and
an optimization unit, configured to construct a graph from the pose information of each segment and the matching relationships between segments, and perform graph optimization with the G2O framework to obtain optimized camera trajectory information, thereby achieving the global optimization.
18. The system according to claim 17, wherein the segmentation unit specifically comprises:
a calculation unit, configured to calculate the similarity between each frame of the depth image and the first frame of the depth image;
a determination unit, configured to determine whether the similarity is lower than the similarity threshold;
a segmentation subunit, configured to segment the depth image sequence when the similarity is lower than the similarity threshold; and
a processing unit, configured to take the depth image of the next frame as the starting frame depth image of the next segment, and to repeatedly invoke the calculation unit and the determination unit until all frames of depth images have been processed.
19. The system according to claim 13, wherein the volume data fusion module is specifically configured to: fuse the depth images of the frames with the truncated signed distance function grid model according to the processing result, and represent the three-dimensional space with a voxel grid, thereby obtaining the three-dimensional model of the complete indoor scene.
20. The system according to claim 19, wherein the volume data fusion module specifically comprises:
a weighted fusion unit, configured to perform weighted fusion of the truncated signed distance function data within the Volumetric method framework, based on the noise characteristics and the region of interest; and
an extraction unit, configured to extract a Mesh model with the Marching cubes algorithm, thereby obtaining the three-dimensional model of the complete indoor scene.
PCT/CN2017/072257 2017-01-23 2017-01-23 Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera WO2018133119A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/072257 WO2018133119A1 (en) 2017-01-23 2017-01-23 Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/072257 WO2018133119A1 (en) 2017-01-23 2017-01-23 Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera

Publications (1)

Publication Number Publication Date
WO2018133119A1 true WO2018133119A1 (en) 2018-07-26

Family

ID=62907634

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/072257 WO2018133119A1 (en) 2017-01-23 2017-01-23 Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera

Country Status (1)

Country Link
WO (1) WO2018133119A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101860667A (en) * 2010-05-06 2010-10-13 中国科学院西安光学精密机械研究所 Method of quickly eliminating composite noise in images
CN102800127A (en) * 2012-07-18 2012-11-28 清华大学 Light stream optimization based three-dimensional reconstruction method and device
CN104599314A (en) * 2014-06-12 2015-05-06 深圳奥比中光科技有限公司 Three-dimensional model reconstruction method and system
CN104732492A (en) * 2015-03-09 2015-06-24 北京工业大学 Depth image denoising method
CN106169179A (en) * 2016-06-30 2016-11-30 北京大学 Image denoising method and image noise reduction apparatus

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109166171A (en) * 2018-08-09 2019-01-08 西北工业大学 Motion recovery structure three-dimensional reconstruction method based on global and incremental estimation
CN109166171B (en) * 2018-08-09 2022-05-13 西北工业大学 Motion recovery structure three-dimensional reconstruction method based on global and incremental estimation
CN110375765A (en) * 2019-06-28 2019-10-25 上海交通大学 Visual odometry method, system and storage medium based on direct method
CN110375765B (en) * 2019-06-28 2021-04-13 上海交通大学 Visual odometer method, system and storage medium based on direct method
CN110807789A (en) * 2019-08-23 2020-02-18 腾讯科技(深圳)有限公司 Image processing method, model, device, electronic equipment and readable storage medium
CN111260709A (en) * 2020-01-15 2020-06-09 浙江大学 Ground-assisted visual odometer method for dynamic environment
CN111260709B (en) * 2020-01-15 2022-04-19 浙江大学 Ground-assisted visual odometer method for dynamic environment
CN111524075A (en) * 2020-03-26 2020-08-11 北京迈格威科技有限公司 Depth image filtering method, image synthesis method, device, equipment and medium
CN111524075B (en) * 2020-03-26 2023-08-22 北京迈格威科技有限公司 Depth image filtering method, image synthesizing method, device, equipment and medium
CN115346002A (en) * 2022-10-14 2022-11-15 佛山科学技术学院 Virtual scene construction method and rehabilitation training application thereof
CN115346002B (en) * 2022-10-14 2023-01-17 佛山科学技术学院 Virtual scene construction method and rehabilitation training application thereof

Similar Documents

Publication Publication Date Title
CN106910242B (en) Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera
CN111815757B (en) Large member three-dimensional reconstruction method based on image sequence
US11727587B2 (en) Method and system for scene image modification
CN109872397B (en) Three-dimensional reconstruction method of airplane parts based on multi-view stereo vision
Zach et al. A globally optimal algorithm for robust tv-l 1 range image integration
WO2018133119A1 (en) Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera
CN109242873B (en) Method for carrying out 360-degree real-time three-dimensional reconstruction on object based on consumption-level color depth camera
Li et al. Detail-preserving and content-aware variational multi-view stereo reconstruction
Hiep et al. Towards high-resolution large-scale multi-view stereo
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
Sibbing et al. Sift-realistic rendering
Hane et al. Class specific 3d object shape priors using surface normals
CN109961506A (en) A kind of fusion improves the local scene three-dimensional reconstruction method of Census figure
CN113178009B (en) Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair
Park et al. Translation-symmetry-based perceptual grouping with applications to urban scenes
Xu et al. Survey of 3D modeling using depth cameras
CN113674400A (en) Spectrum three-dimensional reconstruction method and system based on repositioning technology and storage medium
Nouduri et al. Deep realistic novel view generation for city-scale aerial images
CN112927251A (en) Morphology-based scene dense depth map acquisition method, system and device
Kim et al. Multi-view object extraction with fractional boundaries
Park et al. A tensor voting approach for multi-view 3D scene flow estimation and refinement
Cushen et al. Markerless real-time garment retexturing from monocular 3d reconstruction
Li et al. Multi-view stereo via depth map fusion: A coordinate decent optimization method
Nguyen et al. High resolution 3d content creation using unconstrained and uncalibrated cameras
Nicosevici et al. Efficient 3D scene modeling and mosaicing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17892520

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17892520

Country of ref document: EP

Kind code of ref document: A1