WO2018129794A1 - Real-time large-scale scene 3D scanning and modeling method and system - Google Patents

Real-time large-scale scene 3D scanning and modeling method and system

Info

Publication number
WO2018129794A1
WO2018129794A1 (application PCT/CN2017/075025)
Authority
WO
WIPO (PCT)
Prior art keywords
optimization
frame
matching
frames
camera
Prior art date
Application number
PCT/CN2017/075025
Other languages
English (en)
French (fr)
Inventor
黄经纬
Original Assignee
上海云拟科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海云拟科技有限公司 filed Critical 上海云拟科技有限公司
Publication of WO2018129794A1 publication Critical patent/WO2018129794A1/zh

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality

Definitions

  • the present invention relates to the field of three-dimensional modeling, and in particular, to a method and system for modeling a three-dimensional scan of a large-scale scene.
  • RGB-D cameras have led to tremendous changes in 3D scanning.
  • 3D printing, virtual and augmented reality, games and robotics all have a strong demand for large-scale real-time 3D scanning: by scanning in real time, robots or ordinary users can immediately obtain a synthesized 3D model, which can be applied to robot navigation, to bringing the physical world into a virtual environment, or to providing users with real-time information feedback.
  • Real-time 3D model updating: while maintaining stable camera tracking, the system needs to aggregate the 3D data of every frame into a complete 3D model and visualize it in real time. Updating the model at the same time as the camera positions are updated is a huge challenge.
  • Real-time performance: speed is an indispensable element of many virtual/augmented reality and robotic applications.
  • the Chinese invention patent application with the application number 201410461249.2 discloses a scene recovery method and apparatus based on low-quality RGB-D data.
  • the contour retrieval method is used to help restore the small object. This greatly improves the accuracy of recovering the 3D model from the low-quality RGB-D image sequence, and automatically recovers the semantically correct and visually realistic virtual 3D scene model without manual intervention.
  • the present invention has been made in order to provide a method and system for realizing large-scale scene three-dimensional scanning modeling that overcomes the above problems or at least partially solves the above problems.
  • the main core innovations of the present invention are the new global camera online optimization, from sparse to dense joint optimization, and online model correction optimization.
  • the present invention uses CPU-GPU memory management and hash-table storage, so that the model accuracy of the present invention can be kept at the 0.5 cm level even when modeling large-scale spaces.
  • the present invention provides a method for modeling a large-scale scene three-dimensional scan, the method comprising the following steps:
  • the RGB-D video stream is captured in real time by the depth camera, as in the method described above.
  • the method for performing global camera pose optimization on the video stream according to the real-time large-scale scene three-dimensional scanning modeling method described above comprises the following sub-steps:
  • Feature matching search The feature matching of the pairwise input frame is established by SIFT, and the detected SIFT feature points will be matched with all previous frames, and the wrong matching is filtered out;
  • the process of the above sub-step (1) is as follows:
  • intra-block camera pose optimization: the camera poses are optimized based on all verified feature matching points of the pairwise frames inside the block; the optimization is obtained by minimizing an energy based on sparse feature points and dense color and geometric information;
  • the process of the above sub-step (3) is as follows:
  • the method for performing three-dimensional scanning modeling of the globally optimized large-scale scene according to the global camera prediction position includes the following sub-steps:
  • fusion and de-fusion: for each voxel, its signed distance and weight to the surface of the nearest object are recorded; for a new frame, the signed distance and weight are updated by weighted averaging, thereby completing the fusion, or the above fusion is reversed, completing the de-fusion;
  • a real-time large-scale scene three-dimensional scanning modeling system comprising the following modules:
  • a video stream obtaining module configured to acquire an RGB-D video stream
  • a global camera pose optimization module configured to perform global camera pose optimization on the video stream to obtain a global camera predicted position
  • a three-dimensional modeling module configured to perform globally optimized three-dimensional scanning modeling according to the global camera predicted position.
  • the RGB-D video stream is captured in real time by the depth camera.
  • the global camera posture optimization module comprises the following units:
  • a feature matching search unit configured to establish feature matching of the pairwise input frames by using SIFT, match the detected SIFT feature points with all previous frames, and filter out the wrong matches;
  • a hierarchical optimization model establishing unit configured to divide the video stream into small blocks of consecutive frames; in the bottom-level optimization, only the frames inside each small block are optimized, while in the upper-level optimization the corresponding points of the key frames are used to associate all the blocks, and the blocks are aligned with each other as wholes.
  • a pose alignment and energy minimization unit for parameterizing a matrix into a six-dimensional vector, wherein three unknowns are from rotation and three unknowns are from translation; the alignment energy is obtained as a linear combination of a sparse energy and a dense energy, and the weight of the dense energy is gradually increased, thereby obtaining a coarse-to-fine global optimization.
  • the feature matching search unit includes the following subunits:
  • a matching filter sub-unit for filtering erroneous feature points based on color and geometric consistency: for each pair of RGB-D frames, the potential feature matches are successively combined and the rigid transformation matrix of the two frames is calculated by minimizing the matching error; it is then checked whether the ratio of the maximum and minimum eigenvalues of the covariance matrix of the corresponding points is too large; if the maximum matching error exceeds 0.02 m, or the ratio is greater than 100, such a match is deleted; [0052] a surface area screening sub-unit, configured to detect whether the maximum planar area covered by each frame's matching points is sufficiently large: for a pair of RGB-D frames, the projection of the corresponding points onto the principal plane of each frame is calculated, and if the projected 2D bounding rectangle is large enough, the covered surface area is considered sufficiently large;
  • a dense verification sub-unit configured to transform, for the acquired rigid transformation matrix, the previous frame into the space of the subsequent frame, and perform a pixel-by-pixel color and depth comparison; if the pixels whose color and depth are sufficiently close do not exceed a certain proportion, the transformation is incorrect.
  • the hierarchical optimization model establishing unit includes the following sub-units:
  • an intra-block camera optimization sub-unit for optimizing the camera poses based on all verified feature matching points of the pairwise frames inside the block, the optimization being obtained by minimizing an energy based on sparse feature points and dense color and geometric information;
  • a key frame acquisition subunit of the block configured to calculate a feature point set related to the key frame: merging feature points of the intra-block frame according to the optimized intra-block camera pose, and calculating a consistent three-dimensional feature point;
  • a global block optimization sub-unit is configured to obtain a global position of a key frame by means of cumulative transformation, and use the position as an initialization, and optimize a key frame posture by using an energy minimization optimization model, thereby obtaining a global optimization of the block.
  • the posture alignment and energy minimization unit comprises the following subunits:
  • a sparse feature point matching sub-unit configured to calculate, for a set of frames, the error of every matching pair under the matrix transformations, and use the sum of squares as the energy of the sparse matching;
  • a dense matching sub-unit for a linear combination of a luminance error and a geometric error: for each pair of corresponding frames, the image is transformed from one frame to the other and the pixel-by-pixel luminance error is calculated; for the geometric error, the depth map is transformed from one frame to the other and the dot product of each pixel's corresponding 3D vector with the surface normal is calculated.
  • the three-dimensional modeling module comprises the following units:
  • a scene expression unit configured to complete by gradually merging the input RGB-D data into the TSDF
  • a scene memory management unit configured to store the foregoing TSDF through a hash table: each spatial point is encoded by multiply-accumulation with large primes followed by a modulo operation, and collisions are resolved through a linked list of length 7;
  • a fusion and de-fusion unit for recording, for each voxel, the signed distance and weight to the nearest object surface; for a new frame, the signed distance and weight are updated by weighted averaging to complete the fusion, or the above fusion is reversed to complete the de-fusion;
  • an update unit for managing the reconstruction, wherein the frames are sorted in descending order by the gap between the old and new poses, a pose comprising the Euler angle vector of the rotation matrix and the translation vector; a linear combination of their Euclidean distances is used as the gap between the new and old poses, and for each newly accepted input frame, the several frames with the largest gaps are updated to complete the optimization of the 3D model.
  • the beneficial effects of the present invention are:
  • the present invention implements real-time, end-to-end modeling.
  • the core of the present invention is a stable camera prediction method that optimizes each video frame by a layered model by combining the scanned entire RGB-D video.
  • the present invention does not rely on local camera tracking, but instead maintains a globally optimal camera position.
  • the invention proposes and develops a parallel optimization modeling system combining sparse feature points, dense geometry and color information, which can predict the globally optimal camera positions in real time and supports camera relocalization and real-time updating of a globally consistent optimal 3D model.
  • applied to large-scale indoor scanning, the system enables high-quality 3D models to be obtained.
  • FIG. 1 is a flow chart of a method for modeling a three-dimensional scan of a large-scale scene in the present invention.
  • FIG. 2 is a schematic structural diagram of a three-dimensional scanning modeling system for a real-time large-scale scene according to the present invention.
  • the core of the present technology is a global camera position optimization algorithm, which is suitable for a large-scale three-dimensional reconstruction system.
  • for each frame, the present invention performs camera pose optimization and updates the reconstructed model based on the new camera predictions.
  • the present invention does not strictly rely on temporal continuity, thereby allowing any form of camera path, i.e., instant relocalization of frames, and allowing frequent re-visiting of areas that have already been scanned.
  • FIG. 1 is a flow chart of a method for modeling a three-dimensional scan of a large-scale scene in the present invention.
  • the modeling method includes the following steps:
  • the RGB-D video stream can be captured by an ordinary depth camera; it is a real-time RGB-D video stream captured by a commercial depth camera, such as Structure Sensor, Kinect, or PrimeSense.
  • the video stream usually has a resolution of 640x480 and a frame rate of 30fps.
  • the present invention assumes that the color information and depth information for each frame are perfectly aligned.
  • S2 Perform global camera pose optimization on the video stream to obtain a global camera predicted position.
  • the method of global pose optimization is the basis of online, globally consistent three-dimensional reconstruction.
  • the goal of the present invention is to find a 3D matching point between frames and find an optimized camera position matrix such that the 3D matching points are best aligned under the matrix.
  • step S2 is further implemented by the following steps S21 to S23:
  • S21 Feature matching search.
  • the present invention uses camera pose prediction from sparse to dense: Since sparse features naturally provide loop detection and relocation, the present invention uses sparse feature matching to obtain coarser global alignment.
  • coarse alignment is optimized and refined through dense image and geometric consistency.
  • the present invention uses the Scale-Invariant Feature Transform (SIFT) to establish pairwise feature matching between input frames.
  • the present invention first looks for sparse feature matching between frames.
  • the present invention uses rapid feature extraction, feature matching, and matching screening steps.
  • for each new frame, the present invention detects SIFT (Scale-Invariant Feature Transform) feature points and matches them with all previous frames.
  • the SIFT feature points are used because they can contain almost all changes in the area captured by the handheld scan, such as pan, rotate, and zoom.
  • Potential pairwise feature matching will remove the wrong matches by filtering and get a correct set of pairwise feature matching results. This match will be used as a global camera optimization.
  • the feature search of the present invention is all done in the GPU, avoiding data transfer between the CPU and the GPU.
  • computing the SIFT feature points and descriptors of the present invention typically requires 4-5 ms/frame, and matching two frames takes approximately 0.05 ms. Therefore, under the hierarchical optimization model, the present invention can still achieve real-time matching when scanning more than 20,000 frames.
  • step S21 is further implemented by the following steps S211 to S213:
  • the present invention filters erroneous feature points based on color and geometric consistency. For each pair of RGB-D frames, the present invention combines the potential feature matches one by one and calculates the rigid transformation matrix of the two frames by minimizing the matching error. The present invention then calculates whether the ratio of the largest and smallest eigenvalues of the covariance matrix of the corresponding points is too large, to check whether such a transformation is ambiguous. If the maximum matching error exceeds 0.02 m, or the ratio is greater than 100, then such a match is deleted, until all conditions are met. If too few matching points remain, the pair of frames cannot yield a correct transformation matrix, and the present invention ignores their correspondence.
  • S212 surface area screening.
  • the present invention detects whether the maximum planar area covered by each frame's matching points is sufficiently large. If the coverage area is small, the calculated matrix is unstable. For a pair of RGB-D frames, the present invention calculates the projection of the corresponding points onto the principal plane of each frame. If the projected 2D bounding rectangle is sufficiently large (>0.032 square meters), the present invention considers the covered surface area to be sufficiently large.
  • the present invention transforms the previous frame into the space of the next frame, and performs a pixel-by-pixel color and depth comparison. If the pixels whose color and depth are sufficiently close do not exceed a certain proportion, the relative transformation is incorrect. [0085] If all of the above tests pass, the matching points of the pair of RGB-D frames are added to a correct matching set and used later for the global camera pose optimization. The present invention requires that the number of matching points for each pair of RGB-D frames is not less than 5, thereby ensuring a trusted transformation.
  • the present invention takes a hierarchical optimization strategy.
  • the input video sequence is divided into small blocks including consecutive frames.
  • the present invention optimizes only the frames inside the tile.
  • the present invention associates all the blocks with the corresponding points of the key frames, and aligns the blocks as a whole.
  • the present invention performs a hierarchical, local to global pose optimization using the selected frame matching points.
  • each n adjacent frames form a small block, and the small block is internally optimized.
  • the small blocks will be associated by matching points and optimized overall.
  • the present invention is abstracted as an energy minimization problem in which sparse feature points, dense images, and geometric information are considered.
  • the present invention solves this highly nonlinear optimization problem by a fast parallelized graphics card algorithm.
  • step S22 is further implemented by the following steps S221 to S223:
  • S221 block internal camera optimization.
  • the block internal alignment optimization is based on a block with 11 consecutive frames, each adjacent block sharing a frame at the beginning and the end.
  • the goal of local camera optimization is to obtain the optimal camera pose for internal alignment of the block.
  • the present invention optimizes the camera pose based on all verified feature matching points of the block-by-block internal frame. Optimization is based on sparse feature points, and dense color geometry information, minimizing energy. Since each block has only a small number of frames, the camera pose does not change much in the block. Therefore, the present invention initializes the camera parameters of each frame to an identity matrix. In order to ensure that the camera pose after optimization convergence is sufficiently accurate, the present invention employs dense verification to filter frames that are not accurate enough.
  • the present invention needs to calculate a set of feature points associated with the key frame.
  • the present invention merges the feature points of the intra-block frames and computes a consistent set of three-dimensional feature points; the same global 3D point may have instances in multiple video frames.
  • the present invention transforms the feature points of all frames into the key frame's space based on the relative transformation matrices, and aggregates the feature points into one set. Feature points closer than 0.03 m are merged into one feature point.
  • once the feature points of a key frame are obtained, the feature information (feature points, descriptors, and matching relations) of the remaining frames can be released.
  • the goal of the gesture alignment is to find an optimal camera rigid transformation matrix such that the feature points of the frame are best aligned (minimum error) under the transformation matrix.
  • the present invention parameterizes a matrix into a six-dimensional vector, where three unknowns are from rotation and three unknowns are from translation.
  • the aligned energy is obtained from a linear combination of sparse energy and dense energy.
  • the weight of the dense energy is gradually increased, resulting in a global optimization from coarse to fine.
  • the present invention fixes the matrix of the first frame and optimizes the matrix of the remaining frames.
  • step S23 is further implemented by the following steps S231 to S233:
  • S231 Sparse feature point matching.
  • the present invention calculates the error of any matching pair under the matrix transformation and uses the sum of squares as the energy of the sparse matching.
  • the present invention also incorporates dense image and geometric constraints to achieve fine alignment, now considering the depth and color values of the frames. Since the calculation of dense correspondences is much more expensive than that of sparse correspondences, the present invention optimizes only over closely related frames: the camera viewing angles differ by less than 60 degrees, and there is an overlapping region between them. The dense optimization takes into account dense image and geometric alignment information, a linear combination of a luminance error and a geometric error. For each pair of corresponding frames, the present invention transforms the image from one frame to the other, calculates the pixel-by-pixel luminance error, and defines it as the sum of squares. For the geometric error, the present invention transforms the depth map from one frame to the other, calculates the dot product of each pixel's corresponding 3D vector with the surface normal, and defines it as the sum of squares.
  • the present invention is based on continuously changing camera poses, continuously changing and optimizing the global 3D reconstruction model.
  • the key here is to allow symmetric online re-integration of previous RGB-D frames.
  • the present invention undoes the effect of an RGB-D frame on the three-dimensional model under its old pose and replaces it with the new pose. Therefore, the volumetric model is continuously updated and optimized whenever the camera poses are globally re-optimized (e.g., when a loop is detected).
  • the key to online globally consistent 3D reconstruction is the ability to update the model based on the latest optimized camera position.
  • the present invention monitors the constantly changing camera pose of each frame, thereby updating the impact of each frame on the three-dimensional model by means of fusion and de-fusion. Based on such a strategy, the accumulated camera drift error and the errors in areas with few distinctive features can be eliminated by dynamic reconstruction after a more optimized camera pose is calculated.
  • step S3 is further implemented by the following steps S31 to S32:
  • the geometric representation of the scene is accomplished by progressively fusing the input RGB-D data into an implicit truncated signed distance function (TSDF) [Curless 1996].
  • the TSDF consists of the voxels of a spatial grid.
  • RGB-D frames are allowed to be fused to TSDF, allowing RGB-D frames to be fused from TSDF.
  • the present invention ensures the symmetry of fusion and de-fusion, thereby ensuring that the fusion plus de-fusion performed at the old camera position after the camera position is updated has no additional effect on the TSDF.
  • the present invention stores the TSDF in a hash table, enabling very efficient memory compression.
  • the present invention subdivides an infinitely uniform spatial grid into voxel blocks. Each voxel block is a small uniform voxel grid with dimensions of 8*8*8.
  • the present invention stores these voxel blocks by hashing. Each spatial point (x, y, z) is encoded by multiply-accumulation with large primes followed by a modulo operation.
  • the present invention solves the collision by a linked list of length 7. When the linked list is full, the present invention accumulates the code and adds the voxel block at the next location.
  • the present invention maintains a ball with a radius of 5 m centered at the camera position. The voxel blocks inside the ball are kept in graphics-card memory; the blocks leaving the ball are transferred from the graphics card to main memory, and the blocks entering the ball are transferred from main memory to the graphics card.
  • the present invention can ensure that the algorithm of the present invention can actually maintain and store data of a large-scale scene.
  • for each voxel, the present invention records its signed distance and weight to the nearest object surface. Therefore, for a new frame, the present invention can update the signed distance and weight by means of weighted averaging, thereby achieving the effect of fusion. Likewise, the present invention can reverse this operation to obtain the effect of de-fusion.
  • the present invention can de-fuse the old pose and fuse the new pose into the TSDF to achieve the effect of updating the three-dimensional model.
  • Each input frame stores its depth and color data, and has both old and new poses.
  • the old poses are updated after being merged, and the new poses are updated after each global optimization.
  • when an input frame arrives, the present invention needs to fuse it into the TSDF as soon as possible, thereby giving the user immediate 3D model feedback. Since the global optimization is block-based, the optimized camera pose of the new frame cannot be directly calculated.
  • the present invention acquires the initialized current frame pose by the previous frame-optimized pose, and the frame-to-frame relative transformation matrix calculated by the feature points.
  • the present invention sorts the frames in descending order by the gap between the old and new poses.
  • the pose consists of two three-dimensional vectors (the Euler angle vector of the rotation matrix and the translation vector).
  • the present invention uses a linear combination of their Euclidean distances as the gap between the old and the new pose. For each newly accepted input frame, the present invention updates the 10 frames with the largest gaps to optimize the three-dimensional model. Therefore, the present invention obtains a three-dimensional reconstruction model that is corrected and optimized in real time.
  • FIG. 2 is a schematic structural diagram of a three-dimensional scanning modeling system 100 for a real-time large-scale scene according to the present invention.
  • the modeling system 100 includes the following modules: [0121]
  • the video stream obtaining module 110 is configured to acquire an RGB-D video stream.
  • the RGB-D video stream can be captured by an ordinary depth camera; it is a real-time RGB-D video stream shot with a commercial depth camera such as Structure Sensor, Kinect, or PrimeSense.
  • the video stream usually has a resolution of 640x480 and a frame rate of 30fps.
  • the present invention assumes that the color information and depth information for each frame are perfectly aligned.
  • the global camera pose optimization module 120 is configured to perform global camera pose optimization on the video stream to obtain a global camera predicted position.
  • the method of global pose optimization is the basis of online, globally consistent three-dimensional reconstruction.
  • the goal of the present invention is to find a 3D matching point between frames and find an optimized camera position matrix such that the 3D matching points are best aligned under the matrix.
  • the global camera pose optimization module 120 further includes the following units:
  • Feature matching search unit 121: In order to achieve consistent global point cloud alignment, the present invention uses camera pose prediction from sparse to dense. Since sparse features naturally provide loop detection and relocalization, the present invention uses sparse feature matching to obtain a coarser global alignment; the coarse alignment is then optimized and refined through dense image and geometric consistency. The present invention uses the Scale-Invariant Feature Transform (SIFT) to establish pairwise feature matching between input frames.
  • the present invention first looks for sparse feature matching between frames.
  • the present invention uses rapid feature extraction, feature matching, and matching screening steps.
  • SIFT Scale-Invariant Feature Transform
  • the SIFT feature points are used because they can contain almost all changes in the area captured by the handheld scan, such as pan, rotate, and zoom.
  • Potential pairwise feature matching will remove the wrong matches by filtering and get a correct set of pairwise feature matching results. This match will be used as a global camera optimization.
  • the feature search of the present invention is all done in the GPU, avoiding data transfer between the CPU and the GPU.
  • the feature matching search unit 121 further includes the following subunits:
  • Matching filter sub-unit 1211: In order to minimize erroneous matching, the present invention filters erroneous feature points based on color and geometric consistency. For each pair of RGB-D frames, the present invention combines the potential feature matches one by one and calculates the rigid transformation matrix of the two frames by minimizing the matching error. The present invention then calculates whether the ratio of the largest and smallest eigenvalues of the covariance matrix of the corresponding points is too large, to check whether such a transformation is ambiguous. If the maximum matching error exceeds 0.02 m, or the ratio is greater than 100, then such a match is deleted, until all conditions are met. If too few matching points remain, the pair of frames cannot yield a correct transformation matrix, and the present invention ignores their correspondence.
  • Surface area screening sub-unit 1212: the present invention detects whether the maximum planar area covered by each frame's matching points is sufficiently large. If the coverage area is small, the calculated matrix is unstable. For a pair of RGB-D frames, the present invention calculates the projection of the corresponding points onto the principal plane of each frame; if the projected 2D bounding rectangle is sufficiently large (>0.032 square meters), the present invention considers the covered surface area to be sufficiently large.
  • Dense verification sub-unit 1213: For the obtained relative transformation matrix, the present invention transforms the previous frame into the space of the next frame, and performs a pixel-by-pixel color and depth comparison. If the pixels whose color and depth are sufficiently close do not exceed a certain proportion, the relative transformation is incorrect.
  • the matching points of the pair of RGB-D frames are added to a correct matching set and used as a global camera pose optimization in the future.
  • the present invention requires that the number of matching points for each pair of RGB-D frames is not less than 5, thereby ensuring a trusted conversion.
  • the present invention takes a hierarchical optimization strategy.
  • the input video sequence is divided into small blocks including consecutive frames.
  • the present invention optimizes only the frames inside the tile.
  • the present invention associates all the blocks with the corresponding points of the key frames, and aligns the blocks as a whole.
  • the present invention performs a hierarchical, local to global pose optimization using the selected frame matching points.
  • in the first-level optimization, every n adjacent frames form a small block, and each small block is optimized internally as a whole.
  • in the second-level optimization, all the small blocks are associated by matching points and optimized as a whole.
  • the present invention is abstracted as an energy minimization problem in which sparse feature points, dense images, and geometric information are considered. The present invention solves this highly nonlinear optimization problem by a fast parallelized graphics card algorithm.
  • the above-described hierarchical optimization model unit 122 further includes the following subunits:
  • Block internal camera optimization sub-unit 1221 The block internal alignment optimization is based on a block with 11 consecutive frames, each adjacent block sharing a frame at the beginning and the end.
  • the goal of local camera optimization is to obtain the optimal camera pose for internal alignment of the block.
  • the present invention optimizes the camera pose based on all verified feature matching points of the block-by-block frame. Optimization is based on sparse feature points, and dense color geometry information, minimizing energy. Since each block has only a small number of frames, the camera pose does not change much in the block. Therefore, the present invention initializes the camera parameters of each frame to an identity matrix. In order to ensure that the camera pose after optimization convergence is sufficiently accurate, the present invention employs dense verification to filter frames that are not accurate enough.
  • the key frame acquisition sub-unit 1222 of the block: once the interior of a block is optimized, the present invention defines the first frame of the block as the key frame of the block.
  • the present invention needs to calculate a set of feature points associated with the key frame.
  • the present invention combines feature points of intra-block frames and computes a consistent three-dimensional feature point. These feature points may exist in multiple instances of multiple video frames of the same global spatial 3D point.
  • the present invention transforms the feature points of all frames into the key frame's space based on the relative transformation matrices, and aggregates the feature points into a set. Feature points closer than 0.03 m are merged into one feature point.
  • the feature information feature points, description operators, and matching relationships
  • Global block optimization sub-unit 1223 The search and filtering of sparse feature matching will be applied to the internal frames of the block and the key frames between the blocks, except that the matching in the key frames uses the set of feature points gathered by all the feature points in the block. If a key frame does not find any match with the previous frame, the present invention treats it as an unverified frame and converts it to a verified frame after finding a correspondence with the subsequent frame.
  • the invention can obtain the relative matrix transformation of the key blocks of adjacent blocks by the optimization inside the block. By accumulating the transformation, the present invention is able to obtain the global position of the key frame. The present invention takes this position as an initialization and optimizes the pose of the key frame using the energy minimization optimization model to obtain global optimization of the block.
  • Posture Alignment and Energy Minimization Unit 123 For a three-dimensional corresponding point where a set of frames has been calculated, the goal of the gesture alignment is to find an optimal camera rigid transformation matrix such that the feature points of the frame are best aligned (minimum error) under the transformation matrix.
  • the present invention parameterizes a matrix into a six-dimensional vector, where three unknowns are from rotation and three unknowns are from translation.
  • the aligned energy is obtained from a linear combination of sparse energy and dense energy.
  • the weight of the dense energy is gradually increased, resulting in a global optimization from coarse to fine.
  • the present invention fixes the matrix of the first frame and optimizes the matrix of the remaining frames.
  • gesture alignment and energy minimization 123 further includes the following subunits:
  • the sparse feature point matching sub-unit 1231: for a set of frames, the present invention calculates the error of any matching pair under the matrix transformation and uses the sum of squares as the energy of the sparse matching.
  • Dense matching sub-unit 1232: The present invention incorporates dense image and geometric constraints to achieve fine alignment, now considering the depth and color values of the frames. Since the calculation of dense correspondences is much more expensive than that of sparse correspondences, the present invention optimizes only over closely related frames: the camera viewing angles differ by less than 60 degrees, and there is an overlapping region between them. The dense optimization takes into account dense image and geometric alignment information, a linear combination of a luminance error and a geometric error. For each pair of corresponding frames, the present invention transforms the image from one frame to the other, calculates the pixel-by-pixel luminance error, and defines it as the sum of squares. For the geometric error, the present invention transforms the depth map from one frame to the other, calculates the dot product of each pixel's corresponding 3D vector with the surface normal, and defines it as the sum of squares.
  • the three-dimensional modeling module 130 is configured to perform global optimized three-dimensional scanning modeling according to the global camera prediction position.
  • the present invention is based on continuously changing camera poses, continuously changing and optimizing the global 3D reconstruction model.
  • the key here is to allow symmetric online re-integration of previous RGB-D frames.
  • the present invention undoes the effect of an RGB-D frame on the three-dimensional model under its old pose and replaces it with the new pose. Therefore, the volumetric model is continuously updated and optimized whenever the camera poses are globally re-optimized (e.g., when a loop is detected).
  • the key to online globally consistent 3D reconstruction is the ability to update the model based on the most recently optimized camera positions.
  • the present invention monitors the constantly changing camera poses for each frame, thereby updating the impact of each frame on the three-dimensional model by means of fusion and de-fusion. Based on such a strategy, accumulated camera drift errors and errors in areas of insignificant features can be eliminated by dynamic reconstruction after more optimized camera poses are calculated.
  • the above-described three-dimensional modeling module 130 further includes the following units:
  • the geometric representation of the scene is accomplished by progressively fusing the input RGB-D data into an implicit truncated signed distance function (TSDF) [Curless 1996].
  • the TSDF consists of the voxels of a spatial grid.
  • RGB-D frames are allowed to be fused to TSDF, allowing RGB-D frames to be fused from TSDF.
  • the present invention ensures the symmetry of fusion and de-fusion, thereby ensuring that the fusion plus de-fusion performed at the old camera position after the camera position is updated has no additional effect on the TSDF.
  • the present invention stores the TSDF in a hash table, enabling very efficient memory compression.
  • the present invention subdivides an infinitely uniform spatial grid into voxel blocks. Each voxel block is a small uniform voxel grid with dimensions of 8*8*8.
  • the present invention stores these voxel blocks by hash.
  • For each spatial point (x, y, z), the present invention encodes it by means of a large prime multiplication accumulation plus modulo.
  • the present invention solves the collision by a linked list of length 7. When the linked list is full, the present invention accumulates the code and adds the voxel block at the next location.
  • the present invention maintains a ball with a radius of 5 m centered at the camera position. The voxel blocks inside the ball are kept in graphics-card memory; the blocks leaving the ball are transferred from the graphics card to main memory, and the blocks entering the ball are transferred from main memory to the graphics card.
  • the present invention can ensure that the algorithm of the present invention can actually maintain and store data of a large-scale scene.
  • Fusion and de-fusion unit 133:
  • for each voxel, the present invention records its signed distance and weight to the nearest object surface. Therefore, for a new frame, the present invention can update the signed distance and weight by means of weighted averaging, thereby achieving the effect of fusion. Likewise, the present invention can reverse this operation to obtain the effect of de-fusion.
  • the present invention can de-fuse the old pose and fuse the new pose into the TSDF to achieve the effect of updating the 3D model.
  • Each input frame stores its depth and color data, and has both old and new poses.
  • the old poses are updated after being merged, and the new poses are updated after each global optimization.
  • when an input frame arrives, the present invention needs to fuse it into the TSDF as soon as possible, thereby giving the user immediate 3D model feedback. Since the global optimization is block-based, the optimized camera pose of the new frame cannot be directly calculated.
  • the present invention acquires the initialized current frame pose by the previous frame-optimized pose, and the frame-to-frame relative transformation matrix calculated by the feature points.
  • the present invention sorts the frames in descending order by the gap between the old and new poses.
  • the pose consists of two three-dimensional vectors (the Euler angle vector of the rotation matrix and the translation vector).
  • the present invention uses a linear combination of their Euclidean distances as the gap between the old and the new pose. For each newly accepted input frame, the present invention updates the 10 frames with the largest gaps to optimize the three-dimensional model. Therefore, the present invention obtains a three-dimensional reconstruction model that is corrected and optimized in real time.
  • the system of the present invention is an integrated system capable of solving the problems existing in the prior art and providing real-time, end-to-end modeling capability.
  • At the heart of the present invention is a stable camera position prediction method that optimizes the camera by combining all captured RGB-D video frames with a hierarchical local to global optimization method. Since the present invention contemplates all video frames, the present invention eliminates the need for explicit loop detection.
  • current real-time camera tracking typically relies on frame-to-frame or frame-to-model matching techniques, which suffer from large camera tracking errors or failures; these are well avoided by the method of the present invention.
  • the system of the present invention can immediately obtain these globally optimal camera predictions by global matching of these discontinuous frames. This technology ensures a stable scanning experience, allowing ordinary users to successfully perform large-scale scanning.
  • the key to the system of the present invention is a parallelized, sparse-to-dense global camera prediction system: sparse RGB features are applied to the coarse global camera prediction to ensure that the predicted camera positions are accurate enough for the subsequent dense optimization model to converge. Therefore, the present invention maintains a globally optimal camera structure while ensuring the accuracy of local modeling.
  • the model update of the present invention supports model correction caused by camera correction, thereby ensuring consistency of the scanned space after re-visiting. In this regard, compared with traditional methods, the invention offers a great improvement in speed, is superior to many offline methods in model accuracy and stability, and is convenient for ordinary users.
  • a novel, practical, globally consistent camera model optimization system that takes into account all previously captured RGB-D video frames, abandons camera tracking based on the assumption of temporal continuity, and achieves real-time performance through hierarchical local-to-global optimization.
  • modules in the devices in the embodiments can be adaptively changed and placed in one or more devices different from the embodiment.
  • the modules or units or components of the embodiments may be combined into one module or unit or component, and further, they may be divided into a plurality of sub-modules or sub-units or sub-components.
  • all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, unless otherwise stated.
  • Various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • Those skilled in the art will appreciate that some or all of the functionality of some or all of the components of the virtual machine creation apparatus in accordance with embodiments of the present invention may be implemented in practice using a microprocessor or digital signal processor (DSP).
  • DSP digital signal processor
  • the invention may also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing some or all of the methods described herein.
  • Such a program implementing the present invention may be stored on a computer readable medium or may be in the form of one or more signals. Such signals may be downloaded from the Internet website, or provided on a carrier signal, or in any other form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Disclosed are a real-time large-scale scene 3D scanning and modeling method and system. The method comprises the following steps: acquiring an RGB-D video stream; performing global camera pose optimization on the video stream to obtain globally predicted camera positions; and performing globally optimized 3D scanning and modeling according to the predicted camera positions. The method achieves real-time, end-to-end modeling. It therefore does not rely on local camera tracking, but instead maintains globally optimal camera localization at all times. A parallel optimization modeling system combining sparse feature points, dense geometry and color information is proposed and developed; it predicts the globally optimal camera positions in real time, and supports camera relocalization and real-time updating of a globally consistent optimal 3D model. Applied to large-scale indoor scanning, the system can obtain high-quality 3D models.

Description

Technical Field
[0001] The present invention relates to the field of three-dimensional modeling, and in particular to a method and system for real-time large-scale scene 3D scanning and modeling.
Background Art
[0002] Real-time, high-quality large-scale scene 3D scanning is an important application in the fields of virtual/augmented reality and robotics. The popularization of RGB-D cameras has brought enormous change to 3D scanning. 3D printing, virtual and augmented reality, games and robotics all have a strong demand for large-scale real-time 3D scanning: by scanning in real time, robots or ordinary users can immediately obtain a synthesized 3D model, which can be applied to robot navigation, to bringing the physical world into a virtual environment, or to providing users with real-time information feedback.
[0003] However, although 3D reconstruction systems have been developed for a long time, there is still no mature solution that lets ordinary users model conveniently. Many challenges remain to be solved:
[0004] 1. High-quality surface modeling: graphics applications usually require noise-free, textured scenes. The 3D information should be represented as continuous surfaces rather than point clouds.
[0005] 2. Scalability: mixed-reality and robot navigation systems need to capture an entire room or several large 3D spaces.
[0006] 3. Global model consistency: to reach large-scale scenes, the errors and drift of the camera positions, as well as the deformation of the 3D model, must be corrected. Real-time correction is the key to keeping the global model consistent and is a huge challenge.
[0007] 4. Stable camera tracking: besides the error accumulated at every frame, camera tracking may fail in regions with few distinctive features. Recovering the camera position requires camera relocalization. Existing methods usually require the frame to be recovered to be extremely similar to previous frames, which limits both accurate camera localization and recovery after tracking failure.
[0008] 5. Real-time 3D model updating: while maintaining stable camera tracking, the system needs to aggregate the 3D data of every frame into a complete 3D model and visualize it in real time. Updating the model at the same time as the camera positions are updated is a huge challenge.
[0009] 6. Real-time performance: speed is an indispensable element of many virtual/augmented reality and robotic applications, and is a huge challenge.
[0010] Attempts to solve the above problems exist in the prior art. For example, Chinese invention patent application No. 201410461249.2 discloses a scene recovery method and apparatus based on low-quality RGB-D data. It first recovers the main object models in the scene according to semantic relations and a point-cloud classifier, then accurately extracts the contours of small objects from the corresponding color images, and uses a contour retrieval method to help recover the small objects. This greatly improves the accuracy of recovering 3D models from low-quality RGB-D image sequences, and automatically recovers a semantically correct, visually realistic virtual 3D scene model without manual intervention.
[0011] However, up to now, drift errors in camera position prediction still introduce significant errors during modeling and remain a major challenge in this field. To solve this problem, existing methods usually obtain a globally correct model through several hours of offline computation. More recent online modeling methods typically suffer from the following problems: (1) they still require several minutes of correction and therefore cannot run in real time; (2) frame-to-frame or frame-to-model camera tracking is very fragile and often leads to irrecoverable errors; (3) they only support unstructured point-cloud representations, which severely affects the quality and usability of the scan.
Technical Problem
[0012] In view of the above problems, the present invention is proposed in order to provide a method and system for real-time large-scale scene 3D scanning and modeling that overcome, or at least partially solve, the above problems.
[0013] The main core innovations of the present invention are a new global online camera optimization, sparse-to-dense joint optimization, and online model correction optimization. At the same time, through CPU-GPU memory management and hash-table storage, the model accuracy of the present invention can be kept at the 0.5 cm level even when modeling large-scale spaces.
Solution to Problem
Technical Solution
[0014] The present invention solves the above technical problems through the following technical solutions.
[0015] Specifically, according to one aspect of the present invention, a real-time large-scale scene 3D scanning and modeling method is provided, the method comprising the following steps:
[0016] acquiring an RGB-D video stream; [0017] performing global camera pose optimization on the video stream to obtain globally predicted camera positions;
[0018] performing globally optimized 3D scanning and modeling according to the predicted camera positions.
[0019] Preferably, in the real-time large-scale scene 3D scanning and modeling method described above, the RGB-D video stream is captured in real time by a depth camera.
[0020] Preferably, in the method described above, the step of performing global camera pose optimization on the video stream to obtain globally predicted camera positions comprises the following sub-steps:
[0021] (1) feature matching search: pairwise feature matches between input frames are established by SIFT, the detected SIFT feature points are matched against all previous frames, and wrong matches are filtered out;
[0022] (2) building a hierarchical optimization model: the video stream is divided into small blocks of consecutive frames; in the bottom-level optimization only the frames inside each block are optimized, while in the upper-level optimization the corresponding points of the key frames are used to associate all blocks, and the blocks are aligned with each other as wholes;
[0023] (3) pose alignment and energy minimization: each matrix is parameterized as a six-dimensional vector, in which three unknowns come from rotation and three from translation; the alignment energy is a linear combination of a sparse energy and a dense energy, and the weight of the dense energy is gradually increased, yielding a coarse-to-fine global optimization.
[0024] More preferably, in the method described above, sub-step (1) proceeds as follows:
[0025] (i) match screening, filtering wrong feature points based on color and geometric consistency: for each pair of RGB-D frames, the potential feature matches are combined one by one and the rigid transformation matrix between the two frames is computed by minimizing the matching error; it is then checked whether the ratio of the largest to the smallest eigenvalue of the covariance matrix of the corresponding points is too large; if the maximum matching error exceeds 0.02 m, or the ratio is greater than 100, the match is deleted;
[0026] (ii) surface-area screening, checking whether the largest planar area covered by the matching points of each frame is large enough: for a pair of RGB-D frames, the projections of the corresponding points onto the principal plane of each frame are computed; if the projected 2D bounding rectangle is large enough, the covered surface area is considered large enough;
[0027] (iii) dense verification: using the obtained rigid transformation matrix, the previous frame is transformed into the space of the following frame and a pixel-by-pixel color and depth comparison is performed; if the pixels whose color and depth are close enough do not exceed a certain proportion, the transformation is incorrect.
[0028] More preferably, in the method described above, sub-step (2) proceeds as follows:
[0029] (i) intra-block camera pose optimization: the camera poses are optimized based on all verified feature matches of the pairwise frames inside the block; the optimization is obtained by minimizing an energy based on sparse feature points and dense color and geometric information;
[0030] (ii) obtaining the key frame of the block and computing the set of feature points associated with that key frame: according to the optimized intra-block camera poses, the feature points of the frames inside the block are merged and a consistent set of 3D feature points is computed;
[0031] (iii) global block optimization: the global positions of the key frames are obtained by accumulating the transformations; these positions are used as initialization and the key-frame poses are optimized with the energy-minimization model, yielding the global optimization of the blocks.
[0032] More preferably, in the method described above, sub-step (3) proceeds as follows:
[0033] (i) sparse feature point matching: for a set of frames, the error of every matching pair under the matrix transformations is computed, and its sum of squares is taken as the sparse matching energy;
[0034] (ii) dense matching, a linear combination of a luminance error and a geometric error: for each pair of corresponding frames, the image is transformed from one frame to the other and the pixel-by-pixel luminance error is computed; for the geometric error, the depth map is transformed from one frame to the other and the dot product of each pixel's corresponding 3D vector with the surface normal is computed.
[0035] Preferably, in the method described above, the step of performing globally optimized 3D scanning and modeling according to the predicted camera positions comprises the following sub-steps:
[0036] (1) scene representation, accomplished by gradually fusing the input RGB-D data into a TSDF;
[0037] (2) scene memory management, storing the TSDF in a hash table: every spatial point is encoded by multiply-accumulating with large primes followed by a modulo operation, and collisions are resolved with a linked list of length 7;
[0038] (3) fusion and de-fusion: for every voxel, its signed distance and weight to the nearest object surface are recorded; for a new frame, the signed distance and weight are updated by weighted averaging, which completes the fusion, or the fusion is reversed, which completes the de-fusion;
[0039] (4) managing the update of the reconstruction: the frames are sorted in descending order of the gap between their old and new poses, a pose consisting of the Euler-angle vector of the rotation matrix and the translation vector; a linear combination of their Euclidean distances is used as the gap between the old and new poses, and for every newly accepted input frame, the several frames with the largest gaps are updated, completing the optimization of the 3D model.
[0040]
[0041] According to another aspect of the present invention, a real-time large-scale scene 3D scanning and modeling system is also provided, the system comprising the following modules:
[0042] a video stream acquisition module configured to acquire an RGB-D video stream;
[0043] a global camera pose optimization module configured to perform global camera pose optimization on the video stream and obtain globally predicted camera positions;
[0044] a 3D modeling module configured to perform globally optimized 3D scanning and modeling according to the predicted camera positions.
[0045] Preferably, in the real-time large-scale scene 3D scanning and modeling system described above, the RGB-D video stream is captured in real time by a depth camera.
[0046] Preferably, in the system described above, the global camera pose optimization module comprises the following units:
[0047] a feature matching search unit configured to establish pairwise feature matches between input frames by SIFT, match the detected SIFT feature points against all previous frames, and filter out wrong matches;
[0048] a hierarchical optimization model building unit configured to divide the video stream into small blocks of consecutive frames; in the bottom-level optimization only the frames inside each block are optimized, while in the upper-level optimization the corresponding points of the key frames are used to associate all blocks and the blocks are aligned with each other as wholes;
[0049] a pose alignment and energy minimization unit configured to parameterize each matrix as a six-dimensional vector, in which three unknowns come from rotation and three from translation; the alignment energy is a linear combination of a sparse energy and a dense energy, and the weight of the dense energy is gradually increased, yielding a coarse-to-fine global optimization.
[0050] More preferably, in the system described above, the feature matching search unit comprises the following sub-units:
[0051] a match screening sub-unit configured to filter wrong feature points based on color and geometric consistency: for each pair of RGB-D frames, the potential feature matches are combined one by one and the rigid transformation matrix between the two frames is computed by minimizing the matching error; it is then checked whether the ratio of the largest to the smallest eigenvalue of the covariance matrix of the corresponding points is too large; if the maximum matching error exceeds 0.02 m, or the ratio is greater than 100, the match is deleted; [0052] a surface-area screening sub-unit configured to check whether the largest planar area covered by the matching points of each frame is large enough: for a pair of RGB-D frames, the projections of the corresponding points onto the principal plane of each frame are computed; if the projected 2D bounding rectangle is large enough, the covered surface area is considered large enough;
[0053] a dense verification sub-unit configured to transform, using the obtained rigid transformation matrix, the previous frame into the space of the following frame and perform a pixel-by-pixel color and depth comparison; if the pixels whose color and depth are close enough do not exceed a certain proportion, the transformation is incorrect.
[0054] More preferably, in the system described above, the hierarchical optimization model building unit comprises the following sub-units:
[0055] an intra-block camera optimization sub-unit configured to optimize the camera poses based on all verified feature matches of the pairwise frames inside the block, the optimization being obtained by minimizing an energy based on sparse feature points and dense color and geometric information;
[0056] a block key-frame acquisition sub-unit configured to compute the set of feature points associated with the key frame: according to the optimized intra-block camera poses, the feature points of the frames inside the block are merged and a consistent set of 3D feature points is computed;
[0057] a global block optimization sub-unit configured to obtain the global positions of the key frames by accumulating the transformations, use these positions as initialization, and optimize the key-frame poses with the energy-minimization model, yielding the global optimization of the blocks.
[0058] More preferably, in the system described above, the pose alignment and energy minimization unit comprises the following sub-units:
[0059] a sparse feature point matching sub-unit configured to compute, for a set of frames, the error of every matching pair under the matrix transformations and take its sum of squares as the sparse matching energy;
[0060] a dense matching sub-unit for a linear combination of a luminance error and a geometric error: for each pair of corresponding frames, the image is transformed from one frame to the other and the pixel-by-pixel luminance error is computed; for the geometric error, the depth map is transformed from one frame to the other and the dot product of each pixel's corresponding 3D vector with the surface normal is computed.
[0061] Preferably, in the system described above, the 3D modeling module comprises the following units:
[0062] a scene representation unit configured to gradually fuse the input RGB-D data into a TSDF;
[0063] a scene memory management unit configured to store the TSDF in a hash table: every spatial point is encoded by multiply-accumulating with large primes followed by a modulo operation, and collisions are resolved with a linked list of length 7;
[0064] a fusion and de-fusion unit configured to record, for every voxel, its signed distance and weight to the nearest object surface; for a new frame, the signed distance and weight are updated by weighted averaging, which completes the fusion, or the fusion is reversed, which completes the de-fusion;
[0065] a reconstruction update management unit configured to sort the frames in descending order of the gap between their old and new poses, a pose consisting of the Euler-angle vector of the rotation matrix and the translation vector; a linear combination of their Euclidean distances is used as the gap between the old and new poses, and for every newly accepted input frame, the several frames with the largest gaps are updated, completing the optimization of the 3D model.
Advantageous Effects of the Invention
Advantageous Effects
[0066] The advantageous effects of the present invention are as follows: the present invention achieves real-time, end-to-end modeling. Its core is a stable camera prediction method that optimizes every video frame through a hierarchical model by combining the entire scanned RGB-D video. The present invention therefore does not rely on local camera tracking, but maintains globally optimal camera localization at all times. The present invention proposes and develops a parallel optimization modeling system that combines sparse feature points, dense geometry and color information; it predicts the globally optimal camera positions in real time and supports camera relocalization and real-time updating of a globally consistent optimal 3D model. Applied to large-scale indoor scanning, the system of the present invention can obtain high-quality 3D models.
Brief Description of the Drawings
Description of the Drawings
[0067] Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. The same reference symbols denote the same components throughout the drawings. In the drawings:
[0068] FIG. 1 is a flow chart of the real-time large-scale scene 3D scanning and modeling method of the present invention.
[0069] FIG. 2 is a schematic structural diagram of the real-time large-scale scene 3D scanning and modeling system of the present invention.
Best Mode for Carrying Out the Invention
Best Mode of the Invention
[0070] The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented according to the content of the specification, and in order to make the above and other objects, features and advantages of the present invention more apparent, specific embodiments of the present invention are set forth below.
Embodiments of the Invention
[0071] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope will be fully conveyed to those skilled in the art.
[0072] The core of the technology of the present invention is a global camera position optimization algorithm, which is suitable for large-scale 3D reconstruction systems. For every frame, the present invention performs camera pose optimization and updates the reconstructed model according to the new camera predictions. The present invention does not strictly rely on temporal continuity, which allows arbitrary camera paths, instant relocalization, and frequent re-visiting of regions that have already been scanned.
[0073] FIG. 1 is a flow chart of the real-time large-scale scene 3D scanning and modeling method of the present invention.
[0074] As shown in FIG. 1, the modeling method comprises the following steps.
[0075] S1: acquire an RGB-D video stream. The RGB-D video stream can be captured with an ordinary depth camera; it is a real-time RGB-D video stream shot with a commercial depth camera such as Structure Sensor, Kinect or PrimeSense. The video stream typically has a resolution of 640x480 and a frame rate of 30 fps. The present invention assumes that the color information and the depth information of every frame are perfectly aligned.
[0076] S2: perform global camera pose optimization on the video stream to obtain globally predicted camera positions.
[0077] The global pose optimization is the basis of online, globally consistent 3D reconstruction. The goal of the present invention is to find 3D matching points between frames and an optimized camera position matrix for each frame, such that the 3D matching points are best aligned under those matrices.
[0078] To this end, step S2 is further implemented by the following steps S21 to S23.
[0079] S21: feature matching search. To obtain consistent global point-cloud alignment, the present invention uses sparse-to-dense camera pose prediction: since sparse features naturally provide loop detection and relocalization, sparse feature matching is used to obtain a coarser global alignment; this coarse alignment is then optimized and refined with dense image and geometric consistency. The present invention uses the Scale-Invariant Feature Transform (SIFT) to establish pairwise feature matches between input frames. The detected SIFT feature points are matched against all previous frames, and wrong matches are carefully filtered out so as to avoid false loop closures.
[0080] In the system of the present invention, sparse feature matches between frames are searched first, using fast feature extraction, feature matching, and match screening steps. For every new frame, SIFT (Scale-Invariant Feature Transform) feature points are detected and matched against all previous frames. SIFT feature points are used because they cover almost all of the variations occurring in hand-held scanning, such as translation, rotation, and zoom. The potential pairwise feature matches are screened to remove wrong matches, yielding a set of correct pairwise feature matches that will be used for the global camera optimization. The feature search of the present invention is done entirely on the GPU, avoiding data transfer between the CPU and the GPU. Computing the SIFT feature points and descriptors typically takes 4-5 ms per frame, and matching two frames takes about 0.05 ms. Therefore, under the hierarchical optimization model, the present invention still obtains real-time matching even when the scan exceeds 20,000 frames.
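A minimal Python sketch of this per-frame matching loop, using OpenCV's SIFT and brute-force matcher as a stand-in for the GPU implementation described above; the Lowe ratio test and all names here are illustrative assumptions not taken from the disclosure, and the geometric screening of steps S211 to S213 is applied afterwards:

    import cv2

    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    def detect_features(gray_image):
        """Detect SIFT keypoints and descriptors for one frame's color image."""
        return sift.detectAndCompute(gray_image, None)

    def match_against_previous(desc_new, previous_descriptors, ratio=0.8):
        """Match a new frame's descriptors against every previous frame.
        previous_descriptors: dict frame_id -> descriptor array."""
        candidates = {}
        for frame_id, desc_old in previous_descriptors.items():
            if desc_old is None or len(desc_old) < 2:
                continue
            knn = matcher.knnMatch(desc_new, desc_old, k=2)
            good = [p[0] for p in knn
                    if len(p) == 2 and p[0].distance < ratio * p[1].distance]
            if good:
                candidates[frame_id] = good   # still subject to the S211-S213 screening
        return candidates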
[0081] To this end, step S21 is further implemented by the following steps S211 to S213.
[0082] S211: match screening. To minimize wrong matches, the present invention filters wrong feature points based on color and geometric consistency. For each pair of RGB-D frames, the potential feature matches are combined one by one and the rigid transformation matrix between the two frames is computed by minimizing the matching error. The present invention then checks whether the ratio of the largest to the smallest eigenvalue of the covariance matrix of the corresponding points is too large, in order to test whether the transformation is ambiguous. If the maximum matching error exceeds 0.02 m, or the ratio is greater than 100, the match is deleted, until all conditions are satisfied. If too few matching points remain, the pair of frames cannot yield a correct transformation matrix, and the present invention ignores their correspondence.
[0083] S212: surface-area screening. Next, the present invention checks whether the largest planar area covered by the matching points of each frame is large enough. If the covered area is small, the computed matrix is unstable. For a pair of RGB-D frames, the present invention computes the projections of the corresponding points onto the principal plane of each frame; if the projected 2D bounding rectangle is large enough (>0.032 square meters), the covered surface area is considered large enough.
[0084] S213: dense verification. Using the obtained relative transformation matrix, the present invention transforms the previous frame into the space of the following frame and performs a pixel-by-pixel color and depth comparison; if the pixels whose color and depth are close enough do not exceed a certain proportion, the relative transformation is incorrect. [0085] If all the above tests pass, the matching points of the pair of RGB-D frames are added to a set of correct matches and later used for the global camera pose optimization. The present invention requires at least 5 matching points for each pair of RGB-D frames, which ensures a trustworthy transformation.
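The screening of steps S211 and S212 can be sketched as follows (Python with NumPy, not part of the original disclosure). The thresholds of 0.02 m, 100 and 0.032 square meters come from the description above; the least-squares rigid fit and all function and variable names are illustrative assumptions:

    import numpy as np

    def fit_rigid(p, q):
        """Least-squares rigid transform (R, t) mapping points p -> q (Kabsch)."""
        cp, cq = p.mean(0), q.mean(0)
        H = (p - cp).T @ (q - cq)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        return R, cq - R @ cp

    def screen_match(p, q, max_err=0.02, max_ratio=100.0, min_area=0.032, min_pts=5):
        """Return (R, t) if the candidate correspondences pass the screening, else None."""
        if len(p) < min_pts:
            return None
        R, t = fit_rigid(p, q)
        err = np.linalg.norm((p @ R.T + t) - q, axis=1)
        if err.max() > max_err:                            # 0.02 m residual threshold
            return None
        w, _ = np.linalg.eigh(np.cov((p - p.mean(0)).T))   # covariance eigenvalues
        if w.max() / max(w.min(), 1e-12) > max_ratio:      # ambiguous (degenerate) geometry
            return None
        # surface-area check: project correspondences onto their principal plane
        _, _, Vt = np.linalg.svd(p - p.mean(0))
        uv = (p - p.mean(0)) @ Vt[:2].T                    # 2D coordinates in the plane
        extent = uv.max(0) - uv.min(0)
        if extent[0] * extent[1] < min_area:               # bounding rectangle > 0.032 m^2
            return None
        return R, t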
[0086]
[0087] S22: build the hierarchical optimization model.
[0088] To keep the computation real-time over tens of thousands of RGB-D video frames, the present invention adopts a hierarchical optimization strategy. The input video sequence is divided into small blocks of consecutive frames. In the bottom-level optimization, only the frames inside each block are optimized. In the upper-level optimization, the corresponding points of the key frames are used to associate all blocks, and the blocks are aligned with each other as wholes.
[0089] To guarantee real-time global pose alignment, the present invention performs a hierarchical, local-to-global pose optimization using the screened frame matches. In the first-level optimization, every n adjacent frames form a small block, and each block is optimized internally as a whole. In the second-level optimization, all blocks are associated through the matching points and optimized globally.
[0090] Both levels of optimization are abstracted as an energy minimization problem that takes into account the sparse feature points, the dense images, and the geometric information. The present invention solves this highly non-linear optimization problem with a fast, parallelized GPU algorithm.
[0091] To this end, step S22 is further implemented by the following steps S221 to S223.
[0092] S221: intra-block camera optimization. The intra-block alignment optimization is based on blocks of 11 consecutive frames, with adjacent blocks sharing one frame at their head and tail. The goal of the local camera optimization is to obtain the camera poses that best align the block internally. Here, the present invention optimizes the camera poses based on all verified feature matches of the pairwise frames inside the block. The optimization minimizes an energy based on sparse feature points and dense color and geometric information. Since each block contains only a few frames, the camera pose does not change much within a block; the camera parameters of every frame are therefore initialized to the identity matrix. To ensure that the camera poses are accurate enough after the optimization converges, dense verification is used to filter out frames that are not accurate enough.
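An illustrative sketch of how the frame indices can be grouped into such blocks, assuming blocks of 11 consecutive frames with one shared boundary frame as described in [0092]; the function name is an assumption:

    def split_into_blocks(num_frames, block_size=11):
        """Split frame indices into blocks of `block_size` consecutive frames.
        Adjacent blocks share one frame: the last frame of block i is also the
        first frame of block i+1 (and, per [0093], that block's key frame)."""
        step = block_size - 1
        blocks, start = [], 0
        while start < num_frames - 1:
            end = min(start + step, num_frames - 1)
            blocks.append(list(range(start, end + 1)))
            start = end
        return blocks

    # e.g. 31 frames -> [0..10], [10..20], [20..30]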
[0093] S222: obtain the key frame of the block. Once the interior of a block has been optimized, the present invention defines the first frame of the block as the key frame of the block. The present invention also needs to compute the set of feature points associated with that key frame. According to the optimized intra-block camera poses, the feature points of the frames inside the block are merged and a consistent set of 3D feature points is computed; the same global 3D point may have instances in several video frames. The feature points of all frames are transformed into the key frame's space using the relative transformation matrices and aggregated into one set; feature points closer than 0.03 m are merged into a single feature point. Once the feature points of a key frame have been obtained, the feature information (feature points, descriptors and matching relations) of the remaining frames can be released.
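A minimal sketch of this key-frame feature aggregation, assuming 4x4 camera-to-world matrices from the intra-block optimization; the naive quadratic merge and all names are illustrative:

    import numpy as np

    def aggregate_keyframe_features(block_points, block_poses, keyframe_pose, merge_radius=0.03):
        """Collect the 3D feature points of all frames in a block into the key frame's
        coordinate system and merge points closer than `merge_radius` (0.03 m).
        block_points: list of (N_i, 3) arrays of feature points in each frame's space
        block_poses:  list of 4x4 camera-to-world matrices from the intra-block optimization
        keyframe_pose: 4x4 camera-to-world matrix of the block's key frame"""
        to_key = np.linalg.inv(keyframe_pose)
        merged = []
        for pts, pose in zip(block_points, block_poses):
            rel = to_key @ pose                        # frame space -> key-frame space
            pts_h = np.c_[pts, np.ones(len(pts))]      # homogeneous coordinates
            pts_k = (pts_h @ rel.T)[:, :3]
            for p in pts_k:
                for m in merged:                       # naive O(n^2) merge, fine for a block
                    if np.linalg.norm(p - m["sum"] / m["n"]) < merge_radius:
                        m["sum"] += p; m["n"] += 1
                        break
                else:
                    merged.append({"sum": p.copy(), "n": 1})
        return np.array([m["sum"] / m["n"] for m in merged])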
[0094] S223: global block optimization. The search and filtering of sparse feature matches is applied both to the frames inside a block and to the key frames between blocks; the only difference is that matching between key frames uses the set of feature points aggregated from all feature points within the block. If a key frame finds no match with any previous frame, the present invention treats it as an unverified frame and converts it into a verified frame once a correspondence with a later frame is found. Through the intra-block optimization, the present invention obtains the relative matrix transformations between the key frames of adjacent blocks. By accumulating these transformations, the global positions of the key frames are obtained. The present invention uses these positions as initialization and optimizes the key-frame poses with the energy-minimization model, yielding the global optimization of the blocks.
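The initialization of the key-frame poses by accumulating the relative transformations can be sketched as follows (4x4 matrices assumed, names illustrative):

    import numpy as np

    def init_keyframe_poses(relative_transforms):
        """Chain the relative transforms between adjacent key frames to obtain an
        initial global pose for every key frame (the first key frame is fixed).
        relative_transforms[i]: 4x4 matrix mapping key frame i+1 into key frame i."""
        poses = [np.eye(4)]                  # key frame 0 defines the world frame
        for rel in relative_transforms:
            poses.append(poses[-1] @ rel)    # accumulate: world <- kf_i <- kf_{i+1}
        return poses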
[0095]
[0096] S23: pose alignment and energy minimization.
[0097] Given the 3D corresponding points already computed for a set of frames, the goal of pose alignment is to find the optimal rigid camera transformation matrices such that the feature points of the frames are best aligned (with minimum error) under those matrices. The present invention parameterizes each matrix as a six-dimensional vector, in which three unknowns come from rotation and three from translation. The alignment energy is a linear combination of a sparse energy and a dense energy; the weight of the dense energy is gradually increased, yielding a coarse-to-fine global optimization. Here, the matrix of the first frame is fixed and the matrices of the remaining frames are optimized.
[0098] To this end, step S23 is further implemented by the following steps S231 to S233.
[0099] S231: sparse feature point matching. For a set of frames, the present invention computes the error of every matching pair under the matrix transformations and takes its sum of squares as the sparse matching energy.
[0100] S232: dense matching. The present invention also adds dense image and geometric constraints to obtain a fine alignment, now taking into account the depth and color values of the frames. Since computing dense correspondences is much more expensive than sparse correspondences, the optimization is performed only on closely related frames: pairs whose camera viewing directions differ by less than 60 degrees and which have overlapping regions. The dense optimization considers dense image and geometric alignment information, i.e. a linear combination of a luminance error and a geometric error. For each pair of corresponding frames, the present invention transforms the image from one frame to the other, computes the pixel-by-pixel luminance error, and defines it as the sum of squares. For the geometric error, the depth map is transformed from one frame to the other, and the dot product of each pixel's corresponding 3D vector with the surface normal is computed and defined as the sum of squares.
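A simplified sketch of the 6-DoF parameterization and the combined energy of [0097] to [0100]; only the sparse term is written out, the dense luminance/geometric term is passed in as a callable, and the names and the Euler-angle convention are assumptions (in the actual optimization the first frame's matrix is held fixed):

    import numpy as np

    def pose_matrix(x):
        """6-DoF parameterization: x = (alpha, beta, gamma, tx, ty, tz)."""
        a, b, c = x[:3]
        Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
        Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
        Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
        T = np.eye(4); T[:3, :3] = Rz @ Ry @ Rx; T[:3, 3] = x[3:]
        return T

    def sparse_energy(poses, matches):
        """Sum of squared distances of all verified matches under the current poses.
        matches: list of (i, j, p_i, p_j) with 3D points in the frames' local spaces."""
        e = 0.0
        for i, j, p_i, p_j in matches:
            Ti, Tj = pose_matrix(poses[i]), pose_matrix(poses[j])
            e += np.sum((Ti[:3, :3] @ p_i + Ti[:3, 3]
                         - (Tj[:3, :3] @ p_j + Tj[:3, 3])) ** 2)
        return e

    def total_energy(poses, matches, dense_terms, w_dense):
        """E = E_sparse + w_dense * E_dense; w_dense is increased over the iterations
        so the optimization proceeds from coarse (sparse) to fine (dense)."""
        return sparse_energy(poses, matches) + w_dense * dense_terms(poses)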
[0101]
[0102] S3、 根据所述全局相机预测位置, 进行全局优化的三维扫描建模。
[0103] 稠密的场景三维重建是通过压缩的体元融合 (volumetric representation and
fusion)来表达, 并可以实吋应用在大规模的场景上。 本发明基于持续改变的相机 姿势, 持续的改变并优化全局的三维重建模型。 这里的关键在于允许对称的在 线重组之前的 RGB-D帧。 为了保证被优化的帧影响到三维模型, 本发明会撤销 R GB-D帧在旧姿势对三维模型的作用, 并取而代之以新的姿势。 因此, 体元模型( volumetric model)将在吋刻进行全局优化的相机姿势前提下 (如检测到环路) , 能够持续的被更新优化。
[0104] 在线全局一致的三维重建的关键在于能够实吋的基于最新优化的相机位置来更 新模型。 因此, 本发明会监测每帧持续变化的相机姿势, 从而通过融合和去融 合的方式更新每帧对三维模型的影响。 基于这样的策略, 累积的相机漂移误差 和在特征不明显区域的误差可以在更优化的相机姿势计算出来后通过动态重建 被消除。
[0105] To this end, step S3 above is further implemented through the following steps S31 to S34:
[0106] S31. Scene representation
[0107] The scene geometry is represented by progressively fusing the input RGB-D data into an implicit truncated signed distance function (TSDF) [Curless 1996]. The TSDF is made up of the individual voxels of a spatial grid.
[0108] Further, while RGB-D frames are allowed to be integrated into the TSDF, they are also allowed to be de-integrated from it. Integration and de-integration are kept symmetric, so that after a camera position update the integration followed by de-integration at the old camera position leaves no residual effect on the TSDF.
[0109] S32. Scene memory management
[0110] For a large scene, a conventional TSDF typically consumes a large amount of memory. The invention stores the TSDF in a hash table, which enables very effective memory compression. An unbounded uniform spatial grid is subdivided into voxel blocks; each voxel block is a small uniform voxel grid of dimension 8x8x8. These voxel blocks are stored via hashing: each spatial point (x, y, z) is encoded by multiplying its coordinates with large primes, accumulating the products, and taking the modulus. Collisions are resolved with a linked list of length 7; when the list is full, the code is incremented and the voxel block is inserted at the next position. Because GPU memory remains limited, during real-time reconstruction a sphere of radius 5 m centered at the camera is maintained: voxels inside the sphere are kept on the GPU, voxels leaving the sphere are moved from the GPU to host memory, and voxels entering the sphere are moved from host memory back to the GPU. In this way the algorithm can maintain and store the data of large-scale scenes in real time.
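The block layout and hash code just described can be sketched as follows; the particular prime values and the table size are illustrative choices, since the text only specifies large primes, accumulation, a modulus, and chained buckets of length 7.

    #include <cstdint>
    #include <array>

    struct Voxel { float sdf = 0.0f; float weight = 0.0f; uint8_t rgb[3] = {0, 0, 0}; };

    struct VoxelBlock {
        std::array<Voxel, 8 * 8 * 8> voxels;   // 8x8x8 voxel grid per block
        int bx = 0, by = 0, bz = 0;            // integer block coordinates
    };

    constexpr uint64_t kTableSize = 1 << 20;   // assumed table size
    constexpr int kBucketLength = 7;           // chained list of length 7 per bucket

    inline uint64_t hashBlock(int bx, int by, int bz) {
        // Multiply coordinates by large primes, accumulate, take the modulus.
        const uint64_t p1 = 73856093ull, p2 = 19349669ull, p3 = 83492791ull;
        uint64_t code = (uint64_t)(int64_t)bx * p1
                      + (uint64_t)(int64_t)by * p2
                      + (uint64_t)(int64_t)bz * p3;
        return code % kTableSize;
    }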
[0111] S33. Integration and de-integration
[0112] For every voxel, its signed distance to the nearest object surface and a weight are stored. For a new frame, the signed distance and weight can therefore be updated by weighted averaging, which achieves integration. Likewise, this operator can be inverted to achieve de-integration.
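A minimal sketch of the weighted-average update and its exact inverse; integrating an observation and then de-integrating the same observation restores the voxel, which is the symmetry required above.

    struct TsdfVoxel { float sdf = 0.0f; float weight = 0.0f; };

    // Integrate one observation (sdfObs, wObs) into a voxel by weighted averaging.
    inline void integrate(TsdfVoxel& v, float sdfObs, float wObs) {
        float newWeight = v.weight + wObs;
        v.sdf = (v.sdf * v.weight + sdfObs * wObs) / newWeight;
        v.weight = newWeight;
    }

    // De-integration is the inverse operator of the update above.
    inline void deintegrate(TsdfVoxel& v, float sdfObs, float wObs) {
        float newWeight = v.weight - wObs;
        if (newWeight <= 0.0f) { v.sdf = 0.0f; v.weight = 0.0f; return; }
        v.sdf = (v.sdf * v.weight - sdfObs * wObs) / newWeight;
        v.weight = newWeight;
    }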
[0113] Consequently, for a frame whose pose has been updated, the old pose can be de-integrated and the new pose integrated into the TSDF, which updates the 3D model.
[0114] S34. Managing reconstruction updates
[0115] Every input frame stores its depth and color data and carries two poses, an old one and a new one. The old pose is updated whenever the frame was last integrated, and the new pose is updated after every global optimization.
[0116] When an input frame arrives, it needs to be integrated into the TSDF as quickly as possible so that the user receives immediate 3D model feedback. Because the global optimization operates on chunks, the optimized camera pose of the new frame cannot be computed directly; the initial pose of the current frame is therefore obtained from the most recently optimized pose of the previous frames and the frame-to-frame relative transformation computed from feature points.
[0117] To ensure that reconstruction updates are fed back to the user as efficiently as possible, the frames are sorted in descending order of the difference between their new and old poses. A pose is represented by two three-dimensional vectors (the Euler-angle vector of the rotation matrix and the translation vector), and the pose difference is a linear combination of their Euclidean distances. For every newly accepted input frame, the 10 frames with the largest pose difference are re-integrated into the 3D model. A 3D reconstruction that is corrected and refined in real time is thereby obtained.
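A sketch of this re-integration schedule is given below; the rotation/translation mixing weight is an assumption, since the text only states that a linear combination of the Euclidean distances is used.

    #include <Eigen/Dense>
    #include <vector>
    #include <algorithm>

    struct FramePoses {
        int id;
        Eigen::Vector3d oldEuler, oldTrans;   // pose used at integration time
        Eigen::Vector3d newEuler, newTrans;   // pose after the latest optimization
    };

    inline double poseDifference(const FramePoses& f, double rotWeight = 1.0) {
        return rotWeight * (f.newEuler - f.oldEuler).norm()
             + (f.newTrans - f.oldTrans).norm();
    }

    // Return the K frames whose integrated pose is most out of date.
    std::vector<int> framesToReintegrate(std::vector<FramePoses>& frames, int K = 10) {
        std::sort(frames.begin(), frames.end(),
                  [](const FramePoses& a, const FramePoses& b) {
                      return poseDifference(a) > poseDifference(b);   // descending
                  });
        std::vector<int> ids;
        for (int i = 0; i < (int)frames.size() && i < K; ++i)
            ids.push_back(frames[i].id);
        return ids;
    }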
[0118]
[0119] Fig. 2 is a schematic structural diagram of the real-time large-scale scene 3D scanning and modeling system 100 of the present invention.
[0120] As shown in Fig. 2, the modeling system 100 comprises the following modules:
[0121] A video stream acquisition module 110, configured to acquire an RGB-D video stream. The RGB-D video stream is a real-time stream that can be captured with an ordinary commercial depth camera, such as Structure Sensor, Kinect, or PrimeSense. The video stream typically has a resolution of 640x480 and a frame rate of 30 fps. The invention assumes that the color and depth information of each frame is perfectly aligned.
[0122] A global camera pose optimization module 120, configured to perform global camera pose optimization on the video stream and obtain the predicted global camera positions.
[0123] The global pose optimization method is the foundation of online, globally consistent 3D reconstruction. The goal is to find 3D matching points between frames and to find optimized camera position matrices under which the 3D matching points are best aligned.
[0124] To this end, the global camera pose optimization module 120 further comprises the following units:
[0125] A feature-match search unit 121. To obtain globally consistent point-cloud alignment, sparse-to-dense camera pose estimation is used: since sparse features naturally provide loop-closure detection and relocalization, sparse feature matches are used to obtain a relatively coarse global alignment, which is then refined using dense photometric and geometric consistency. The Scale-Invariant Feature Transform (SIFT) is used to establish pairwise feature matches between input frames. Detected SIFT feature points are matched against all previous frames, and incorrect matches are carefully filtered out to avoid false loop closures.
[0126] In the system of the present invention, sparse feature matches between frames are found first, using fast feature extraction, feature matching, and match filtering steps. For every new frame, SIFT (Scale-Invariant Feature Transform) feature points are detected and matched against all previous frames. SIFT features are used because they capture almost all of the variation encountered in hand-held scanning, such as translation, rotation, and scaling. Potential pairwise feature matches are filtered to remove incorrect matches, yielding a set of correct pairwise correspondences that is then used for global camera optimization. All feature search is performed on the GPU, avoiding CPU-GPU data transfer. Computing SIFT feature points and descriptors typically takes 4-5 ms per frame, and matching two frames takes about 0.05 ms. Under the hierarchical optimization model, real-time matching is therefore maintained even when scanning more than 20,000 frames.
[0127] To this end, the feature-match search unit 121 further comprises the following sub-units:
[0128] A match-filtering sub-unit 1211. To minimize incorrect matches, erroneous feature points are filtered based on color and geometric consistency. For each pair of RGB-D frames (i, j), the potential feature matches are combined incrementally and the rigid transformation between the two frames is computed by minimizing the matching error. The ratio of the largest to the smallest eigenvalue of the covariance matrix of the corresponding points is then checked to detect whether the transformation is ambiguous. If the largest matching error exceeds 0.02 m, or the ratio is greater than 100, the offending match is removed, until all conditions are satisfied. If too few matching points remain, a correct transformation cannot be obtained for this frame pair, and its correspondences are ignored.
[0129] A surface-area filtering sub-unit 1212. Next, it is checked whether the largest planar area covered by the matched points in each frame is sufficiently large; if the covered area is small, the computed transformation is unstable. For a pair of RGB-D frames (i, j), the corresponding points are projected onto the principal plane of each frame, and if the 2D bounding rectangle of the projection is large enough (> 0.032 m²), the covered surface area is considered sufficient.
[0130] A dense-verification sub-unit 1213. Given the estimated relative transformation, the previous frame is transformed into the space of the subsequent frame and a per-pixel comparison of color and depth is performed; if the fraction of pixels whose color and depth are sufficiently close does not exceed a certain proportion, the relative transformation is considered incorrect.
[0131] If all of the above checks pass, the matched points of this RGB-D frame pair are added to a set of verified matches and later used for global camera pose optimization. At least 5 matched points are required per RGB-D frame pair to ensure that the resulting transformation is trustworthy.
[0132]
[0133] A hierarchical optimization model building unit 122:
[0134] To keep the computation real-time over tens of thousands of RGB-D video frames, a hierarchical optimization strategy is adopted. The input video sequence is divided into chunks of consecutive frames. At the lowest level of the optimization, only the frames inside each chunk are optimized. At the upper level, all chunks are linked through the corresponding points of their keyframes and aligned against one another as whole units.
[0135] To guarantee real-time global pose alignment, a hierarchical, local-to-global pose optimization is performed using the filtered frame correspondences. In the first level, every n adjacent frames form a chunk, and each chunk is optimized internally as a whole. In the second level, all chunks are linked through their matched points and optimized jointly.
[0136] Both levels of optimization are cast as an energy minimization problem that takes into account sparse feature points as well as dense photometric and geometric information. This highly non-linear optimization problem is solved with a fast, parallelized GPU algorithm.
[0137] To this end, the hierarchical optimization model building unit 122 further comprises the following sub-units:
[0138] An intra-chunk camera optimization sub-unit 1221. Intra-chunk alignment operates on chunks of 11 consecutive frames, with adjacent chunks sharing one frame at their boundary. The goal of local camera optimization is to obtain the camera poses that best align the frames within the chunk. Here, the camera poses are optimized from all verified pairwise feature matches inside the chunk, by minimizing an energy based on sparse feature points together with dense color and geometric information. Since each chunk contains only a small number of frames, the camera pose changes little within a chunk, so the camera parameters of every frame are initialized to the identity matrix. To ensure that the camera poses are sufficiently accurate after the optimization converges, dense verification is applied to filter out frames that are not accurate enough.
[0139] A chunk keyframe acquisition sub-unit 1222. Once the interior of a chunk has been optimized, the first frame of the chunk is defined as its keyframe, and the set of feature points associated with that keyframe is computed. Using the optimized intra-chunk camera poses, the feature points of the frames in the chunk are merged into a consistent set of 3D feature points. The same global 3D point may have multiple instances across multiple video frames; the feature points of all frames are transformed into the keyframe space using the relative transformations and gathered into one set, and feature points closer than 0.03 m are merged into a single point. Once the keyframe's feature points have been obtained, the feature information of the remaining frames (feature points, descriptors, and match relations) can be released.
[0140] A global chunk optimization sub-unit 1223. The search and filtering of sparse feature matches is applied both to frames inside a chunk and to keyframes across chunks; the only difference is that matching between keyframes uses the aggregated feature-point set of the whole chunk. If a keyframe finds no match with any previous frame, it is treated as an unverified frame, and it becomes verified once a correspondence with a later frame is found. The intra-chunk optimization yields the relative transformations between the keyframes of adjacent chunks; by accumulating these transformations, the global position of each keyframe is obtained. This position is used as the initialization, and the energy-minimization model is then used to optimize the keyframe poses, yielding the global optimum over the chunks.
[0141]
[0142] A pose alignment and energy minimization unit 123:
[0143] Given the 3D correspondences already computed for a set of frames, the goal of pose alignment is to find the optimal rigid camera transformations under which the feature points of the frames are best aligned (with minimal error). Each transformation is parameterized as a six-dimensional vector, with three unknowns for rotation and three for translation. The alignment energy is a linear combination of a sparse energy and a dense energy, and the weight of the dense energy is gradually increased, yielding a coarse-to-fine global optimization. The matrix of the first frame is held fixed and the matrices of the remaining frames are optimized.
[0144] To this end, the pose alignment and energy minimization unit 123 further comprises the following sub-units:
[0145] A sparse feature-point matching sub-unit 1231. For a set of frames, the error of every matched pair under the current transformations is computed, and its sum of squares is used as the sparse matching energy.
[0146] A dense matching sub-unit 1232. Dense photometric and geometric constraints are additionally introduced to obtain a fine alignment; here both the depth values and the color values of the frames are considered. Because dense correspondences are much more expensive to compute than sparse ones, the optimization is performed only on strongly related frame pairs: those whose camera viewing directions differ by less than 60 degrees and which share an overlapping region. The dense optimization takes into account dense photometric and geometric alignment information, i.e. a linear combination of an intensity error and a geometric error. For each corresponding frame pair, the image is warped from one frame into the other and the per-pixel intensity error is computed; this term is defined as its sum of squares. For the geometric error, the depth map is warped from one frame into the other, the dot product between the per-pixel corresponding 3D offset vector and the surface normal is computed, and this term is defined as its sum of squares.
[0147]
[0148] A 3D modeling module 130, configured to perform globally optimized 3D scan modeling based on the predicted global camera positions.
[0149] The dense 3D scene reconstruction is expressed through compressed volumetric representation and fusion, and can be applied to large-scale scenes in real time. Based on the continuously changing camera poses, the global 3D reconstruction model is continuously modified and refined. The key is to allow symmetric on-the-fly re-integration of previously captured RGB-D frames: to make an optimized frame affect the 3D model, the contribution of the RGB-D frame at its old pose is undone and replaced by its contribution at the new pose. The volumetric model can therefore be continuously updated and refined whenever the camera poses are globally re-optimized (for example, when a loop closure is detected).
[0150] The key to online, globally consistent 3D reconstruction is the ability to update the model in real time based on the most recently optimized camera positions. The continuously changing camera pose of every frame is therefore monitored, and each frame's contribution to the 3D model is updated through integration and de-integration. With this strategy, accumulated camera drift and errors in feature-poor regions can be eliminated by dynamic re-integration once better-optimized camera poses have been computed.
[0151] To this end, the 3D modeling module 130 further comprises the following units:
[0152] A scene representation unit 131:
[0153] The scene geometry is represented by progressively fusing the input RGB-D data into an implicit truncated signed distance function (TSDF) [Curless 1996]. The TSDF is made up of the individual voxels of a spatial grid.
[0154] Further, while RGB-D frames are allowed to be integrated into the TSDF, they are also allowed to be de-integrated from it. Integration and de-integration are kept symmetric, so that after a camera position update the integration followed by de-integration at the old camera position leaves no residual effect on the TSDF.
[0155] A scene memory management unit 132:
[0156] For a large scene, a conventional TSDF typically consumes a large amount of memory. The TSDF is stored in a hash table, which enables very effective memory compression. An unbounded uniform spatial grid is subdivided into voxel blocks; each voxel block is a small uniform voxel grid of dimension 8x8x8. These voxel blocks are stored via hashing: each spatial point (x, y, z) is encoded by multiplying its coordinates with large primes, accumulating the products, and taking the modulus. Collisions are resolved with a linked list of length 7; when the list is full, the code is incremented and the voxel block is inserted at the next position. Because GPU memory remains limited, during real-time reconstruction a sphere of radius 5 m centered at the camera is maintained: voxels inside the sphere are kept on the GPU, voxels leaving the sphere are moved from the GPU to host memory, and voxels entering the sphere are moved from host memory back to the GPU. In this way the algorithm can maintain and store the data of large-scale scenes in real time.
[0157] An integration and de-integration unit 133:
[0158] For every voxel, its signed distance to the nearest object surface and a weight are stored. For a new frame, the signed distance and weight can therefore be updated by weighted averaging, which achieves integration. Likewise, this operator can be inverted to achieve de-integration.
[0159] Consequently, for a frame whose pose has been updated, the old pose can be de-integrated and the new pose integrated into the TSDF, which updates the 3D model.
[0160] A reconstruction update management unit 134:
[0161] Every input frame stores its depth and color data and carries two poses, an old one and a new one. The old pose is updated whenever the frame was last integrated, and the new pose is updated after every global optimization.
[0162] When an input frame arrives, it needs to be integrated into the TSDF as quickly as possible so that the user receives immediate 3D model feedback. Because the global optimization operates on chunks, the optimized camera pose of the new frame cannot be computed directly; the initial pose of the current frame is therefore obtained from the most recently optimized pose of the previous frames and the frame-to-frame relative transformation computed from feature points.
[0163] To ensure that reconstruction updates are fed back to the user as efficiently as possible, the frames are sorted in descending order of the difference between their new and old poses. A pose is represented by two three-dimensional vectors (the Euler-angle vector of the rotation matrix and the translation vector), and the pose difference is a linear combination of their Euclidean distances. For every newly accepted input frame, the 10 frames with the largest pose difference are re-integrated into the 3D model. A 3D reconstruction that is corrected and refined in real time is thereby obtained.
[0164]
[0165] The system of the present invention is a comprehensive system that simultaneously addresses all the problems present in the prior art and provides end-to-end real-time modeling capability. Its core is a stable camera position estimation method that optimizes the cameras using all captured RGB-D video frames through a hierarchical, local-to-global optimization. Because all video frames are taken into account simultaneously, explicit loop-closure detection is no longer needed. Current real-time camera tracking usually relies on frame-to-frame or frame-to-model matching techniques, which suffer from large tracking failures or drift; these are well avoided in the method of the invention. Moreover, even if camera estimation fails, or the video is re-captured from a completely different viewpoint, the system can immediately bring these discontinuous frames into a globally optimal camera estimate through global matching. This technique guarantees a stable scanning experience and allows ordinary users to perform large-scale scans successfully.
[0166] The key of the system is the parallelized sparse-to-dense global camera estimation: sparse RGB features are used for coarse global camera estimation, which makes the estimated camera positions accurate enough for the subsequent dense optimization to converge. The invention therefore maintains a globally optimal camera structure in real time while preserving local modeling accuracy. In addition, the model update supports model correction induced by camera correction, which guarantees consistency when already-scanned space is revisited. Compared with traditional methods, the invention thus offers a very large speed-up, exceeds many offline methods in model accuracy and robustness, and is convenient for ordinary users.
[0167] In summary, the innovations of the present invention are:
[0168] 1. A novel, real-time, globally consistent camera model optimization system that takes into account all RGB-D video frames ever captured, abandons the drawbacks of camera tracking based on the temporal-continuity assumption, and meets real-time requirements through a hierarchical separation of local and global optimization.
[0169] 2. A sparse-to-dense model matching method that guarantees a consistent global structure together with accurate and detailed local surface detail.
[0170] 3. An RGB-D re-integration method that, when a camera position is corrected, updates in real time the influence of that camera's data on the global 3D model.
[0171] 4. Large-scale geometry and texture reconstruction.
[0172]
[0173] It should be noted that:
[0174] The algorithms and displays provided herein are not inherently related to any particular computer, virtual apparatus, or other device. Various general-purpose apparatuses may also be used with the teachings herein, and the structure required to construct such apparatuses is apparent from the above description. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented using various programming languages, and the above description of a specific language is made to disclose the best mode of the invention.
[0175] Numerous specific details are set forth in the description provided herein. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
[0176] Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the invention.
[0177] Those skilled in the art will understand that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and they may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. [0178] Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
[0179] Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features included in other embodiments rather than others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
[0180] The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may in practice be used to realize some or all of the functions of some or all of the components of the virtual-machine creation apparatus according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals; such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
[0181] It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Claims

Claims
[Claim 1] A real-time large-scale scene 3D scanning and modeling method, characterized in that the method comprises the following steps: acquiring an RGB-D video stream; performing global camera pose optimization on the video stream to obtain predicted global camera positions; and performing globally optimized 3D scan modeling based on the predicted global camera positions.
[Claim 2] The real-time large-scale scene 3D scanning and modeling method according to claim 1, characterized in that the step of performing global camera pose optimization on the video stream to obtain the predicted global camera positions comprises the following sub-steps:
(1) feature match search: establishing pairwise feature matches between input frames by SIFT, matching detected SIFT feature points against all previous frames, and filtering out incorrect matches;
(2) building a hierarchical optimization model: dividing the video stream into chunks of consecutive frames, wherein in the lowest-level optimization only the frames inside each chunk are optimized, while in the upper-level optimization all chunks are linked through the corresponding points of their keyframes and aligned against one another as whole units;
(3) pose alignment and energy minimization: parameterizing each transformation matrix as a six-dimensional vector, with three unknowns for rotation and three for translation, wherein the alignment energy is obtained as a linear combination of a sparse energy and a dense energy, and the weight of the dense energy is gradually increased, yielding a coarse-to-fine global optimization.
[Claim 3] The real-time large-scale scene 3D scanning and modeling method according to claim 2, characterized in that sub-step (1) proceeds as follows:
(i) match filtering, filtering erroneous feature points based on color and geometric consistency: for each pair of RGB-D frames, the potential feature matches are combined incrementally and the rigid transformation between the two frames is computed by minimizing the matching error; it is then checked whether the ratio of the largest to the smallest eigenvalue of the covariance matrix of the corresponding points is too large, and if the largest matching error exceeds 0.02 m or the ratio is greater than 100, the match is removed;
(ii) surface-area filtering, checking whether the largest planar area covered by the matched points in each frame is sufficiently large: for a pair of RGB-D frames, the projection of the corresponding points onto the principal plane of each frame is computed, and if the 2D bounding rectangle of the projection is large enough, the covered surface area is considered sufficient;
(iii) dense verification: given the obtained rigid transformation, the previous frame is transformed into the space of the subsequent frame and a per-pixel comparison of color and depth is performed; if the fraction of pixels whose color and depth are sufficiently close does not exceed a certain proportion, the transformation is incorrect.
[Claim 4] The real-time large-scale scene 3D scanning and modeling method according to claim 2, characterized in that sub-step (2) proceeds as follows:
(i) intra-chunk camera pose optimization, optimizing the camera poses from all verified pairwise feature matches inside the chunk, the optimization being obtained by minimizing an energy based on sparse feature points together with dense color and geometric information;
(ii) obtaining a chunk keyframe and computing the set of feature points associated with that keyframe: based on the optimized intra-chunk camera poses, the feature points of the frames in the chunk are merged and a consistent set of 3D feature points is computed;
(iii) global chunk optimization, obtaining the global position of each keyframe by accumulating transformations, using this position as the initialization, and optimizing the keyframe poses with the energy-minimization model, thereby obtaining the global optimum over the chunks.
[Claim 5] The real-time large-scale scene 3D scanning and modeling method according to claim 2, characterized in that sub-step (3) proceeds as follows:
(i) sparse feature-point matching: for a set of frames, the error of every matched pair under the transformations is computed and its sum of squares is used as the sparse matching energy;
(ii) dense matching, comprising a linear combination of an intensity error and a geometric error: for each corresponding frame pair, the image is warped from one frame into the other and the per-pixel intensity error is computed; for the geometric error, the depth map is warped from one frame into the other and the dot product between the per-pixel corresponding 3D vector and the surface normal is computed.
[Claim 6] A real-time large-scale scene 3D scanning and modeling system, characterized in that the system comprises the following modules:
a video stream acquisition module, configured to acquire an RGB-D video stream;
a global camera pose optimization module, configured to perform global camera pose optimization on the video stream to obtain predicted global camera positions; and
a 3D modeling module, configured to perform globally optimized 3D scan modeling based on the predicted global camera positions.
[Claim 7] The real-time large-scale scene 3D scanning and modeling system according to claim 6, characterized in that the global camera pose optimization module comprises the following units:
a feature-match search unit, configured to establish pairwise feature matches between input frames by SIFT, match detected SIFT feature points against all previous frames, and filter out incorrect matches; a hierarchical optimization model building unit, configured to divide the video stream into chunks of consecutive frames, wherein in the lowest-level optimization only the frames inside each chunk are optimized, while in the upper-level optimization all chunks are linked through the corresponding points of their keyframes and aligned against one another as whole units; and
a pose alignment and energy minimization unit, configured to parameterize each transformation matrix as a six-dimensional vector, with three unknowns for rotation and three for translation, wherein the alignment energy is obtained as a linear combination of a sparse energy and a dense energy, and the weight of the dense energy is gradually increased, yielding a coarse-to-fine global optimization.
[Claim 8] The real-time large-scale scene 3D scanning and modeling system according to claim 6, characterized in that the feature-match search unit comprises the following sub-units:
a match-filtering sub-unit, configured to filter erroneous feature points based on color and geometric consistency: for each pair of RGB-D frames, the potential feature matches are combined incrementally and the rigid transformation between the two frames is computed by minimizing the matching error; it is then checked whether the ratio of the largest to the smallest eigenvalue of the covariance matrix of the corresponding points is too large, and if the largest matching error exceeds 0.02 m or the ratio is greater than 100, the match is removed;
a surface-area filtering sub-unit, configured to check whether the largest planar area covered by the matched points in each frame is sufficiently large: for a pair of RGB-D frames, the projection of the corresponding points onto the principal plane of each frame is computed, and if the 2D bounding rectangle of the projection is large enough, the covered surface area is considered sufficient; and a dense-verification sub-unit, configured to, for the obtained rigid transformation, transform the previous frame into the space of the subsequent frame and perform a per-pixel comparison of color and depth, wherein if the fraction of pixels whose color and depth are sufficiently close does not exceed a certain proportion, the transformation is incorrect.
[Claim 9] The real-time large-scale scene 3D scanning and modeling system according to claim 6, characterized in that the hierarchical optimization model building unit comprises the following sub-units: an intra-chunk camera optimization sub-unit, configured to optimize camera poses from all verified pairwise feature matches inside the chunk, the optimization being obtained by minimizing an energy based on sparse feature points together with dense color and geometric information;
a chunk keyframe acquisition sub-unit, configured to compute the set of feature points associated with the keyframe: based on the optimized intra-chunk camera poses, the feature points of the frames in the chunk are merged and a consistent set of 3D feature points is computed; and
a global chunk optimization sub-unit, configured to obtain the global position of each keyframe by accumulating transformations, use this position as the initialization, and optimize the keyframe poses with the energy-minimization model, thereby obtaining the global optimum over the chunks.
[Claim 10] The real-time large-scale scene 3D scanning and modeling system according to claim 6, characterized in that the pose alignment and energy minimization unit comprises the following sub-units: a sparse feature-point matching sub-unit, configured to, for a set of frames, compute the error of every matched pair under the transformations and use its sum of squares as the sparse matching energy; and a dense matching sub-unit, configured to use a linear combination of an intensity error and a geometric error: for each corresponding frame pair, the image is warped from one frame into the other and the per-pixel intensity error is computed; for the geometric error, the depth map is warped from one frame into the other and the dot product between the per-pixel corresponding 3D vector and the surface normal is computed.
PCT/CN2017/075025 2017-01-12 2017-02-27 一种实时大规模场景三维扫描建模方法及系统 WO2018129794A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710021754.9 2017-01-12
CN201710021754.9A CN106856012B (zh) 2017-01-12 2017-01-12 一种实时大规模场景三维扫描建模方法及系统

Publications (1)

Publication Number Publication Date
WO2018129794A1 true WO2018129794A1 (zh) 2018-07-19

Family

ID=59126094

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/075025 WO2018129794A1 (zh) 2017-01-12 2017-02-27 一种实时大规模场景三维扫描建模方法及系统

Country Status (2)

Country Link
CN (1) CN106856012B (zh)
WO (1) WO2018129794A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11615570B2 (en) 2019-04-30 2023-03-28 Tencent Technology (Shenzhen) Company Limited Virtual object display method and apparatus, electronic device, and storage medium
CN116758157A (zh) * 2023-06-14 2023-09-15 深圳市华赛睿飞智能科技有限公司 一种无人机室内三维空间测绘方法、系统及存储介质

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107845134B (zh) * 2017-11-10 2020-12-29 浙江大学 一种基于彩色深度相机的单个物体的三维重建方法
CN108416840B (zh) * 2018-03-14 2020-02-18 大连理工大学 一种基于单目相机的三维场景稠密重建方法
CN111476882B (zh) * 2020-03-26 2023-09-08 哈尔滨工业大学 一种面向浏览器的机器人虚拟图形建模方法
CN111915741A (zh) * 2020-08-13 2020-11-10 广东申义实业投资有限公司 一种基于三维重建的vr生成器
CN112257605B (zh) * 2020-10-23 2021-07-23 中国科学院自动化研究所 基于自标注训练样本的三维目标检测方法、系统及装置
CN112991515B (zh) * 2021-02-26 2022-08-19 山东英信计算机技术有限公司 一种三维重建方法、装置及相关设备
CN114327334A (zh) * 2021-12-27 2022-04-12 苏州金羲智慧科技有限公司 基于光线分析的环境信息传递系统及其传递方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9286682B1 (en) * 2014-11-21 2016-03-15 Adobe Systems Incorporated Aligning multi-view scans
CN105701820A (zh) * 2016-01-14 2016-06-22 上海大学 一种基于匹配区域的点云配准方法
CN105809681A (zh) * 2016-03-04 2016-07-27 清华大学 基于单相机的人体rgb-d数据恢复与三维重建方法
CN105989604A (zh) * 2016-02-18 2016-10-05 合肥工业大学 一种基于kinect的目标物体三维彩色点云生成方法
CN106204718A (zh) * 2016-06-28 2016-12-07 华南理工大学 一种基于单个Kinect的简易高效三维人体重建方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101850027B1 (ko) * 2011-12-08 2018-04-24 한국전자통신연구원 실시간 3차원 실 환경 복원장치 및 그 방법
CN104851094A (zh) * 2015-05-14 2015-08-19 西安电子科技大学 一种基于rgb-d的slam算法的改进方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9286682B1 (en) * 2014-11-21 2016-03-15 Adobe Systems Incorporated Aligning multi-view scans
CN105701820A (zh) * 2016-01-14 2016-06-22 上海大学 一种基于匹配区域的点云配准方法
CN105989604A (zh) * 2016-02-18 2016-10-05 合肥工业大学 一种基于kinect的目标物体三维彩色点云生成方法
CN105809681A (zh) * 2016-03-04 2016-07-27 清华大学 基于单相机的人体rgb-d数据恢复与三维重建方法
CN106204718A (zh) * 2016-06-28 2016-12-07 华南理工大学 一种基于单个Kinect的简易高效三维人体重建方法

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11615570B2 (en) 2019-04-30 2023-03-28 Tencent Technology (Shenzhen) Company Limited Virtual object display method and apparatus, electronic device, and storage medium
CN116758157A (zh) * 2023-06-14 2023-09-15 深圳市华赛睿飞智能科技有限公司 一种无人机室内三维空间测绘方法、系统及存储介质
CN116758157B (zh) * 2023-06-14 2024-01-30 深圳市华赛睿飞智能科技有限公司 一种无人机室内三维空间测绘方法、系统及存储介质

Also Published As

Publication number Publication date
CN106856012B (zh) 2018-06-22
CN106856012A (zh) 2017-06-16

Similar Documents

Publication Publication Date Title
WO2018129794A1 (zh) 一种实时大规模场景三维扫描建模方法及系统
CN111968129B (zh) 具有语义感知的即时定位与地图构建系统及方法
CN109166149B (zh) 一种融合双目相机与imu的定位与三维线框结构重建方法与系统
Clipp et al. Parallel, real-time visual SLAM
US10789765B2 (en) Three-dimensional reconstruction method
CN112347861B (zh) 一种基于运动特征约束的人体姿态估计方法
EP3602494A1 (en) Robust mesh tracking and fusion by using part-based key frames and priori model
KR20180026400A (ko) 3-차원 공간 모델링
Jin et al. MoADNet: Mobile asymmetric dual-stream networks for real-time and lightweight RGB-D salient object detection
CN115205489A (zh) 一种大场景下的三维重建方法、系统及装置
CN111127524A (zh) 一种轨迹跟踪与三维重建方法、系统及装置
CN110110694B (zh) 一种基于目标检测的视觉slam闭环检测方法
Chen et al. Key issues in modeling of complex 3D structures from video sequences
Ren et al. Lidar-aid inertial poser: Large-scale human motion capture by sparse inertial and lidar sensors
CN114119739A (zh) 一种基于双目视觉的手部关键点空间坐标获取方法
Takacs et al. 3D mobile augmented reality in urban scenes
CN112085849A (zh) 基于航拍视频流的实时迭代三维建模方法、系统及可读介质
CN111860651A (zh) 一种基于单目视觉的移动机器人半稠密地图构建方法
CN116188825A (zh) 一种基于并行注意力机制的高效特征匹配方法
CN110378995B (zh) 一种利用投射特征进行三维空间建模的方法
CN115393519A (zh) 一种基于红外可见光融合图像的三维重构方法
Zhu et al. PairCon-SLAM: Distributed, online, and real-time RGBD-SLAM in large scenarios
WO2022032996A1 (zh) 基于非同步视频的运动捕捉方法
Qing et al. Dort: Modeling dynamic objects in recurrent for multi-camera 3d object detection and tracking
CN111829522B (zh) 即时定位与地图构建方法、计算机设备以及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17891316

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17891316

Country of ref document: EP

Kind code of ref document: A1