WO2019170164A1 - Depth camera-based three-dimensional reconstruction method, apparatus, device, and storage medium - Google Patents

Depth camera-based three-dimensional reconstruction method, apparatus, device, and storage medium

Info

Publication number
WO2019170164A1
WO2019170164A1 (application PCT/CN2019/084820)
Authority
WO
WIPO (PCT)
Prior art keywords
voxel
frame
image
current
target scene
Prior art date
Application number
PCT/CN2019/084820
Other languages
English (en)
French (fr)
Inventor
方璐
韩磊
苏卓
戴琼海
Original Assignee
清华-伯克利深圳学院筹备办公室
Application filed by 清华-伯克利深圳学院筹备办公室 filed Critical 清华-伯克利深圳学院筹备办公室
Priority to US 16/977,899 (published as US20210110599A1)
Publication of WO2019170164A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/04 Indexing scheme for image data processing or generation, in general, involving 3D image data
    • G06T 2200/08 Indexing scheme for image data processing or generation, in general, involving all processing steps from image acquisition to 3D model generation
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20021 Dividing image into blocks, subimages or windows

Definitions

  • the embodiments of the present application relate to the field of image processing technologies, for example, to a depth camera-based three-dimensional reconstruction method, apparatus, device, and storage medium.
  • 3D reconstruction builds a mathematical model of three-dimensional objects in the real world through specific devices and algorithms. It is of great significance for virtual reality, augmented reality, robot perception, human-computer interaction, and robot path planning.
  • in the related art, the target scene is captured by a depth camera (for example, an RGB-D camera) to obtain at least two frames of images, and each frame of image is processed on a graphics processing unit (GPU) to obtain the relative camera pose of the depth camera at the time each frame of image was captured.
  • the three-dimensional reconstruction methods in the related art therefore require a large amount of computation and depend heavily on a GPU dedicated to image processing.
  • such a GPU is not portable, so these methods are difficult to apply to mobile robots, portable devices, and wearable devices (such as the augmented reality display device Microsoft HoloLens).
  • the embodiments of the present application provide a depth camera-based three-dimensional reconstruction method, apparatus, device, and storage medium, which avoid a large amount of computation when performing three-dimensional reconstruction of a target scene and allow three-dimensional reconstruction to run on portable devices, so that three-dimensional reconstruction can be applied more widely.
  • the embodiments of the present application provide a depth camera-based three-dimensional reconstruction method. The method includes: acquiring at least two frames of images obtained by a depth camera capturing a target scene; determining, according to the at least two frames of images, the relative camera pose when the depth camera captures the target scene; for each frame of image, determining at least one feature voxel from the frame image by using at least two levels of nested screening, wherein each level of screening adopts a voxel partitioning rule corresponding to that level; performing a fusion calculation on the at least one feature voxel of each frame image according to the relative camera pose of the frame image to obtain a grid voxel model of the target scene; and generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.
  • the embodiments of the present application further provide a depth camera-based three-dimensional reconstruction apparatus. The apparatus includes: an image acquisition module, configured to acquire at least two frames of images obtained by the depth camera capturing the target scene; a pose determination module, configured to determine, according to the at least two frames of images, the relative camera pose when the depth camera captures the target scene; a voxel determination module, configured to determine, for each frame of image, at least one feature voxel from the frame image by using at least two levels of nested screening, wherein each level of screening adopts a voxel partitioning rule corresponding to that level; a model generation module, configured to perform a fusion calculation on the at least one feature voxel of each frame image according to the relative camera pose of the frame image to obtain a grid voxel model of the target scene; and a three-dimensional reconstruction module, configured to generate an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.
  • the embodiments of the present application further provide an electronic device, including: one or more processors; a storage device configured to store one or more programs; and at least one depth camera configured to perform image acquisition on a target scene. When the one or more programs are executed by the one or more processors, the one or more processors implement the depth camera-based three-dimensional reconstruction method described in any embodiment of the present application.
  • the embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the depth camera-based three-dimensional reconstruction method according to any embodiment of the present application is implemented.
  • FIG. 1 is a flowchart of a depth camera-based three-dimensional reconstruction method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a cube of a two-level nested screening method provided by an embodiment of the present application
  • FIG. 3 is a flowchart of a method for determining a relative camera pose when a depth camera collects a target scene according to an embodiment of the present application
  • FIG. 4 is a flowchart of a method for determining at least one feature voxel from an image provided by an embodiment of the present application
  • FIG. 5 is a schematic plan view showing determining at least one characteristic voxel according to an embodiment of the present application
  • FIG. 6 is a flowchart of a depth camera-based three-dimensional reconstruction method according to another embodiment of the present application.
  • FIG. 7 is a structural block diagram of a depth camera-based three-dimensional reconstruction apparatus according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 1 is a flowchart of a depth camera-based three-dimensional reconstruction method according to an embodiment of the present disclosure.
  • this embodiment is applicable to three-dimensional reconstruction of a target scene based on a depth camera, and the method may be performed by a depth camera-based three-dimensional reconstruction apparatus or by an electronic device, where the apparatus may be implemented in hardware and/or software.
  • the depth camera-based three-dimensional reconstruction method of FIG. 1 is schematically illustrated below with reference to the cube schematic of the two-level nested screening mode of FIG. 2 .
  • the method includes steps S101 to S105.
  • step S101 at least two frames of images obtained by the depth camera acquiring the target scene are acquired.
  • the difference between a depth camera and a traditional camera is that a depth camera can simultaneously capture the image information of a scene and its corresponding depth information.
  • one design principle is to emit a reference beam toward the target scene to be measured and calculate the time difference or phase difference of the returned light, from which the distance (depth) to the scene is obtained.
  • the target scene refers to a scene to be reconstructed in three dimensions. For example, when the self-driving car is driving on the road, the target scene is the driving environment scene of the automobile, and the driving environment image of the automobile is collected in real time through the depth camera.
  • At least two frames of images acquired by the depth camera are acquired for processing, and the more frames are acquired, the more accurate the reconstructed target scene model is.
  • images captured by the depth camera can be obtained by wired means such as serial port or network cable, and can be obtained by wireless means such as Bluetooth or wireless broadband.
  • step S102 based on at least two frames of images, a relative camera pose when the depth camera captures the target scene is determined.
  • the pose of the camera refers to the position and posture of the camera.
  • the position represents the translation of the camera (for example, the translation of the camera along the three directions X, Y, and Z), and the posture represents the rotation of the camera (for example, the rotation angles of the camera about the three directions X, Y, and Z).
  • the position and posture of the depth camera differ from frame to frame and can be expressed by the relative pose of the depth camera.
  • the depth camera may change its position and posture automatically along a certain trajectory, or it may be rotated and moved manually while shooting; therefore, it is necessary to determine the relative camera pose at which each frame of image was acquired, so that the frame image can be accurately reconstructed to the corresponding position in the target scene.
  • the camera pose can be obtained directly by installing a sensor that measures translation distance and rotation angle on the depth camera; alternatively, since the relative pose changes little between two adjacent frames captured by the depth camera, the acquired images can be processed to determine the relative camera pose at the time each frame was captured, which is more accurate.
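  • as a small illustration of the pose representation described above (a minimal sketch only: the Eigen types, the rotation composition order, and the helper names makePose and relativePose are assumptions for illustration, not part of the application), a camera pose can be assembled into a 4 × 4 rigid transform, and a relative pose can be computed between two frames:
```cpp
#include <Eigen/Dense>

// Build a 4x4 rigid transform T = [R | t; 0 1] from three rotation angles
// about X, Y, Z and a translation (tx, ty, tz).
// The ZYX composition order is an illustrative assumption.
Eigen::Matrix4d makePose(double rot_x, double rot_y, double rot_z,
                         double tx, double ty, double tz) {
  const Eigen::Matrix3d R =
      (Eigen::AngleAxisd(rot_z, Eigen::Vector3d::UnitZ()) *
       Eigen::AngleAxisd(rot_y, Eigen::Vector3d::UnitY()) *
       Eigen::AngleAxisd(rot_x, Eigen::Vector3d::UnitX())).toRotationMatrix();
  Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
  T.topLeftCorner<3, 3>() = R;                              // posture (rotation)
  T.topRightCorner<3, 1>() = Eigen::Vector3d(tx, ty, tz);   // position (translation)
  return T;
}

// Relative pose of frame i with respect to frame j: T_rel = T_j^{-1} * T_i.
Eigen::Matrix4d relativePose(const Eigen::Matrix4d& T_i, const Eigen::Matrix4d& T_j) {
  return T_j.inverse() * T_i;
}
```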
  • step S103 for each frame of image, at least one feature voxel is determined from each frame image by using at least two levels of nested screening mode, wherein each level of screening adopts a voxel blocking rule corresponding to each level.
  • the target scene to be reconstructed is divided into grid-like voxel blocks (FIG. 2 shows part of the grid-like voxel blocks of the reconstructed target scene), and, correspondingly, each frame of image can also be divided into voxels at the corresponding positions. When the target scene is reconstructed in three dimensions, the image acquired by the depth camera contains both feature voxels and non-feature voxels; for example, when the scene of a vehicle's driving environment is to be reconstructed, pedestrians, vehicles, and the like in the image are feature voxels, while the blue sky and white clouds in the distance are non-feature voxels. Therefore, the voxels in each frame of the acquired image are screened to find the feature voxels used in the three-dimensional reconstruction of the target scene.
  • a feature voxel may consist of one voxel block or of a preset number of voxel blocks.
  • if the voxels of each frame are screened one by one, the amount of computation is large; therefore, at least two levels of nested screening, each based on a voxel partitioning rule, may be adopted to determine at least one feature voxel in the image.
  • the voxel partitioning rule may be configured as follows: at least two levels of voxel units are set, the screening object of each level is divided into at least two index blocks according to the voxel unit corresponding to that level, and the screening of that level is performed on the index blocks.
  • the two-level nested screening method is introduced below as an example in conjunction with FIG. 2, where the two levels of voxel units corresponding to the two-level nested screening are, for example, a 20 mm voxel unit and a 5 mm voxel unit:
  • the image is first divided into first index blocks according to the 20 mm voxel unit; a first index block (cube 20) that is judged to contain feature voxels is selected as a feature block.
  • each feature block (cube 20) can then be divided into 4 × 4 × 4 second index blocks according to the 5 mm voxel unit (the cube 21 in FIG. 2 is a second index block divided by the 5 mm voxel unit), and the judgment of whether feature voxels are contained is performed on the second index blocks.
  • when more than two levels are nested, each subsequent level of screening takes the feature blocks selected by the previous level as the objects to be divided, divides them into a plurality of index blocks according to the voxel unit of the current level, and judges whether they contain feature voxels, until the nested screening of the last-level voxel unit is completed.
  • for example, in a three-level screening, all the second index blocks (cube 21) containing feature voxels would be taken as the objects to be divided in the third-level screening, divided into a plurality of index blocks according to the third-level voxel unit, and then judged as to whether they contain feature voxels.
  • step S104 at least one characteristic voxel of each frame image is subjected to fusion calculation according to the relative camera pose of each frame image, to obtain a grid voxel model of the target scene.
  • the relative camera pose at which the depth camera acquired each frame image is combined with the determined at least one feature voxel in a fusion calculation to obtain the grid voxel model of the target scene.
  • each voxel in the grid voxel model stores a distance to the surface of the target scene and weight information indicating the uncertainty of the observation.
  • the grid voxel model in this embodiment may be a TSDF (truncated signed distance function) model. As shown in FIG. 2, assuming that the cube 21 is a feature voxel obtained by the multi-level nested screening, a fusion calculation is performed for each feature voxel in each frame of image according to the formula
  • tsdf_avg = (tsdf_{i-1} · w_{i-1} + tsdf_i · w_i) / (w_{i-1} + w_i)
  • to obtain the TSDF model of the target scene, where tsdf_avg is the fusion result of the current feature voxel; tsdf_{i-1} is the distance from the previous feature voxel to the surface of the target scene; w_{i-1} is the weight information of the previous feature voxel; tsdf_i is the distance from the current feature voxel to the surface of the target scene; and w_i is the weight information of the current feature voxel.
  • the selected feature voxels may each contain a preset number of voxel blocks (for example, a feature voxel may be composed of 8 × 8 × 8 voxel blocks). In the fusion calculation, the voxel blocks within each feature voxel can be fused in groups of a certain size; for example, the 8 × 8 × 8 voxel blocks in a feature voxel may be fused by taking every 2 × 2 × 2 voxel blocks as one fusion object (that is, one voxel).
  • the feature voxels selected in step S103 may be simultaneously calculated in parallel to improve the fusion rate of the grid voxel model of the target scene.
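  • the weighted TSDF fusion described above can be sketched as follows (a minimal illustration only; the Voxel struct, the weight cap, and the function name fuseObservation are assumptions not specified by the application):
```cpp
#include <algorithm>

// One voxel of the grid voxel model: truncated signed distance plus weight.
struct Voxel {
  float tsdf = 0.0f;   // fused distance to the target scene surface
  float weight = 0.0f; // uncertainty / confidence of the observations so far
};

// Fuse a new observation (tsdf_i, w_i) into a voxel using the weighted
// average tsdf_avg = (tsdf_{i-1}*w_{i-1} + tsdf_i*w_i) / (w_{i-1} + w_i).
void fuseObservation(Voxel& v, float tsdf_i, float w_i, float max_weight = 100.0f) {
  const float w_sum = v.weight + w_i;
  if (w_sum <= 0.0f) return;                     // nothing to fuse
  v.tsdf = (v.tsdf * v.weight + tsdf_i * w_i) / w_sum;
  v.weight = std::min(w_sum, max_weight);        // weight cap is an assumed detail
}
```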
  • step S105 an isosurface of the grid voxel model is generated, and a three-dimensional reconstruction model of the target scene is obtained.
  • the grid voxel model of the target scene obtained in step S104 is a model of the distances from the feature voxels to the surface of the target scene; in order to obtain the three-dimensional reconstruction model of the target scene, an isosurface of the grid voxel model needs to be generated.
  • the Marching Cubes algorithm can be used to generate the isosurface (that is, to generate triangular patches representing the surface of the model), together with trilinear interpolation for color extraction and addition, and normal vector extraction, to obtain the three-dimensional reconstruction model of the target scene.
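  • as a brief illustration of how Marching Cubes places surface vertices on the isosurface (a sketch under the usual zero-crossing convention, not a full implementation of the algorithm referred to above), the vertex on a voxel edge whose two endpoints have TSDF values of opposite sign is found by linear interpolation:
```cpp
#include <Eigen/Dense>

// Given the two corner positions of a voxel edge and their TSDF values
// (one negative, one positive), place the surface vertex where the
// signed distance crosses zero.
Eigen::Vector3f interpolateZeroCrossing(const Eigen::Vector3f& p1, float tsdf1,
                                        const Eigen::Vector3f& p2, float tsdf2) {
  const float denom = tsdf2 - tsdf1;
  const float t = (denom != 0.0f) ? (0.0f - tsdf1) / denom : 0.5f;
  return p1 + t * (p2 - p1);  // vertex of the triangular patch on this edge
}
```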
  • in an embodiment, generating the isosurface of the grid voxel model may include: in response to determining that the current frame image obtained by capturing the target scene is a key frame, generating the isosurface of the voxel blocks corresponding to the current key frame, and adding color to the isosurface to obtain the three-dimensional reconstruction model of the target scene.
  • the key frames are set after judging the similarity of feature points between frames of images acquired by the depth camera; for example, a key frame may be set for a group of consecutive frames of high similarity.
  • when the isosurface is generated, only the key frames are processed, and the isosurface of the voxel blocks corresponding to each key frame image is generated.
  • if the model has no color information, it is not easy to recognize the multiple objects in the image. For example, when the reconstructed target scene is the driving environment of a vehicle, the pedestrians, vehicles, and road in the isosurface model blend together, and it is impossible to distinguish which part is a pedestrian and which part is a vehicle. Therefore, color is added to the generated isosurface according to the color information of each frame of image, so that the multiple objects in the three-dimensional reconstruction model of the target scene can be clearly identified.
  • the three-dimensional reconstruction process is a real-time dynamic process: as each frame of image is acquired, the relative camera pose is determined in real time, the feature voxels of the corresponding image are determined, and the grid voxel model and its isosurface are generated.
  • this embodiment provides a depth camera-based three-dimensional reconstruction method: the target scene images acquired by the depth camera are obtained, the relative camera pose of the depth camera when acquiring each target scene image is determined, at least two levels of nested screening are used to determine the feature voxels of each frame of image, a fusion calculation is performed to obtain the grid voxel model of the target scene, and the isosurface of the grid voxel model is generated to obtain the three-dimensional reconstruction model of the target scene.
  • in the fusion calculation phase, at least two levels of nested screening are used to determine the feature voxels of each frame of image, without traversing individual voxels, which reduces the amount of calculation and greatly improves the fusion speed while ensuring reconstruction accuracy, thereby improving the efficiency of three-dimensional reconstruction.
  • a large amount of computation in the three-dimensional reconstruction of the target scene is thus avoided, and three-dimensional reconstruction can be applied to portable devices, so that three-dimensional reconstruction can be applied more widely.
  • FIG. 3 is a flowchart of a method for determining a relative camera pose when a depth camera collects a target scene according to an embodiment of the present application. As shown in FIG. 3, the method includes steps S301 to S305.
  • step S301 feature extraction is performed on each frame of the image to obtain at least one feature point of each frame of the image.
  • the feature extraction of the image is to find some pixel points (ie, feature points) with landmark features in the frame image.
  • a feature point may be a corner point, a texture, or a pixel point at an edge in a frame of image.
  • feature extraction for each frame of image may use the Oriented FAST and Rotated BRIEF (ORB) algorithm to find at least one feature point in the frame image.
  • step S302 the feature points between the adjacent two frames of images are matched, and the feature point correspondence between the adjacent two frames of images is obtained.
  • a fast search method may be used to compare the Hamming distance between feature points between adjacent two frames of images to obtain a feature point correspondence relationship between adjacent two frames of images.
  • the Hamming distance H between two feature points X1 and X2 is obtained by XORing the two feature descriptors and counting the number of 1 bits in the result; the Hamming distances between feature points of two adjacent frames of images are used to establish the feature point correspondences.
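  • a minimal sketch of this descriptor comparison (the 256-bit descriptor length, the brute-force matching loop, and the distance cutoff are illustrative assumptions):
```cpp
#include <array>
#include <bitset>
#include <cstdint>
#include <utility>
#include <vector>

// A 256-bit binary descriptor (e.g. ORB) stored as four 64-bit words.
using Descriptor = std::array<uint64_t, 4>;

// Hamming distance: XOR the two descriptors and count the 1 bits.
int hammingDistance(const Descriptor& a, const Descriptor& b) {
  int dist = 0;
  for (size_t k = 0; k < a.size(); ++k)
    dist += static_cast<int>(std::bitset<64>(a[k] ^ b[k]).count());
  return dist;
}

// Brute-force matching between two frames: for each descriptor of frame i,
// find the closest descriptor of frame j (returns index pairs).
std::vector<std::pair<int, int>> matchFeatures(const std::vector<Descriptor>& frame_i,
                                               const std::vector<Descriptor>& frame_j,
                                               int max_dist = 64) {
  std::vector<std::pair<int, int>> matches;
  for (int a = 0; a < static_cast<int>(frame_i.size()); ++a) {
    int best = -1, best_dist = max_dist + 1;
    for (int b = 0; b < static_cast<int>(frame_j.size()); ++b) {
      const int d = hammingDistance(frame_i[a], frame_j[b]);
      if (d < best_dist) { best_dist = d; best = b; }
    }
    if (best >= 0) matches.emplace_back(a, best);
  }
  return matches;
}
```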
  • step S303: the anomalous correspondences in the feature point correspondences are removed, and the relative camera pose is determined by calculating J(ξ)^T J(ξ) from a linear component containing the second-order statistics of the remaining feature points and a nonlinear component containing the relative camera pose.
  • the iterative calculation can be performed using the Gauss-Newton method; for example, the pose that minimizes the reprojection error can be calculated, where:
  • r(ξ) denotes the vector containing all reprojection errors
  • J(ξ) is the Jacobian matrix of r(ξ)
  • ξ denotes the Lie algebra representation of the relative camera pose
  • Δξ denotes the increment of ξ at each iteration
  • R_i represents the rotation matrix of the camera when the i-th frame image is acquired
  • R_j represents the rotation matrix of the camera when the j-th frame image is acquired
  • C_{i,j} represents the set of feature point correspondences between the i-th frame image and the j-th frame image, and |C_{i,j}| represents the number of correspondences in the set
  • [·]_× represents the skew-symmetric matrix used to express the vector (cross) product
  • in the expression of the nonlinear term, r_il^T and r_jl represent the nonlinear components: r_il^T is the l-th row of the rotation matrix R_i, r_jl is the transpose of the l-th row of the rotation matrix R_j, and l = 0, 1, 2 (this embodiment counts from 0, following the programming convention, so l = 0 denotes the first row of the matrix, and so on); the remaining factor, built from the second-order statistics of the feature point correspondences, is the linear component.
  • some of the feature point correspondences between adjacent frames obtained in step S302 are abnormal correspondences: each frame of image inevitably contains feature points that are not present in the other frame, and if such points take part in the matching operation of step S302, abnormal correspondences arise.
  • determining the camera pose is a nonlinear least squares problem between two frames of images, whose cost function sums, over the feature point correspondences in C_{i,j}, the squared distances in Euclidean space between the corresponding feature points transformed by the camera poses T_i and T_j, where:
  • E represents the reprojection error of the i-th frame image relative to the j-th frame image (in the present embodiment, the previous frame image) in Euclidean space
  • T_i represents the pose of the camera when the i-th frame image is acquired (according to the foregoing explanation of the camera pose, it actually refers to the pose change of the i-th frame image relative to the previous frame image)
  • T_j represents the pose of the camera when the j-th frame image is acquired
  • N represents the total number of frames acquired by the camera.
  • when calculating J(ξ)^T J(ξ), the linear part, which contains the second-order statistics of the remaining feature point correspondences and is fixed between the two frames of images, is computed once as a whole matrix W; it does not need to be recomputed according to the number of feature points, which reduces the complexity of the camera pose determination algorithm and enhances the real-time performance of camera pose estimation.
  • the derivation process of equation (1) is described below, and the principle by which the complexity of the algorithm is reduced is analyzed in combination with the derivation.
  • the camera pose when the camera captures the i-th frame image in Euclidean space is T_i = [R_i | t_i]; in fact, T_i refers to the pose transformation matrix of the i-th frame image relative to the j-th frame image (in this embodiment, the previous frame image), and it comprises a rotation matrix R_i and a translation vector t_i.
  • the rigid transformation T_i in Euclidean space is represented by the Lie algebra ξ_i on the SE3 space, that is, ξ_i also represents the camera pose when the camera acquires the i-th frame image, and T(ξ_i) maps the Lie algebra ξ_i to T_i in Euclidean space.
  • r_il^T represents the l-th row of the rotation matrix R_i; t_il represents the l-th element of the translation vector t_i, l = 0, 1, 2; I_{3×3} represents a 3 × 3 identity matrix.
  • the four non-zero 6 × 6 sub-matrices of J(ξ)^T J(ξ) are derived below; taking one of them as an example, the other three non-zero sub-matrices are calculated similarly and will not be described again.
  • the Jacobian matrix of each correspondence is determined by geometric terms related to ξ_i and ξ_j and by a structure term; the Jacobian matrices of the different correspondences share the same geometric term but have different structure terms.
  • whereas the algorithm in the related art has a complexity that depends on the number of feature point correspondences in C_{i,j}, the present embodiment performs the calculation with a fixed complexity: the four non-zero sub-matrices can be calculated with complexity O(1) instead of complexity O(|C_{i,j}|).
  • likewise, the sparse matrices J^T J and J^T r required in the iterative step of the nonlinear Gauss-Newton optimization, Δξ = -(J(ξ)^T J(ξ))^{-1} J(ξ)^T r(ξ), can be calculated efficiently with complexity O(M) instead of the original computational complexity O(N_coor), where N_coor represents the total number of feature point correspondences over all frame pairs and M represents the number of frame pairs.
  • N_coor is approximately 300 in sparse matching and approximately 10,000 in dense matching, much larger than the number M of frame pairs.
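  • a minimal sketch of the Gauss-Newton update named above, Δξ = -(J^T J)^{-1} J^T r, assuming the normal-equation terms J^T J and J^T r have already been accumulated (the Eigen LDLT solver, the additive pose update, and the callback computeNormalEquations are illustrative assumptions; a full implementation would update the pose on the SE3 manifold):
```cpp
#include <Eigen/Dense>

// One Gauss-Newton step on a 6-DoF pose parameterized by a Lie-algebra
// vector xi. JtJ and Jtr are the accumulated normal-equation terms.
Eigen::Matrix<double, 6, 1> gaussNewtonStep(const Eigen::Matrix<double, 6, 6>& JtJ,
                                            const Eigen::Matrix<double, 6, 1>& Jtr) {
  // delta_xi = -(J^T J)^{-1} J^T r; LDLT is used since JtJ is symmetric.
  return -JtJ.ldlt().solve(Jtr);
}

// Iterate until the increment is small. 'computeNormalEquations' is a
// hypothetical callback that rebuilds JtJ / Jtr from the current pose
// estimate and the fixed second-order statistics of the correspondences.
template <typename BuildFn>
Eigen::Matrix<double, 6, 1> optimizePose(Eigen::Matrix<double, 6, 1> xi,
                                         BuildFn computeNormalEquations,
                                         int max_iters = 10, double eps = 1e-8) {
  for (int it = 0; it < max_iters; ++it) {
    Eigen::Matrix<double, 6, 6> JtJ;
    Eigen::Matrix<double, 6, 1> Jtr;
    computeNormalEquations(xi, JtJ, Jtr);
    const Eigen::Matrix<double, 6, 1> delta = gaussNewtonStep(JtJ, Jtr);
    xi += delta;                       // additive update (a simplification)
    if (delta.squaredNorm() < eps) break;
  }
  return xi;
}
```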
  • step S304: it is determined whether the current frame image obtained by capturing the target scene is a key frame; if the current frame image is a key frame, step S305 is performed; if the current frame image is not a key frame, the next frame image is awaited and step S304 is re-executed.
  • determining whether the current frame image obtained by capturing the target scene is a key frame may be performed as follows: a matching operation is performed between the current frame image and the previous key frame image to obtain a conversion relationship matrix between the two frames of images; if the conversion relationship matrix is greater than or equal to a preset conversion threshold, the current frame image is determined to be the current key frame.
  • the conversion relationship matrix between the two frames of images may be a matrix composed of the feature point correspondences between the two frames of images.
  • the first frame image obtained by capturing the target scene may be set as the first key frame, and the preset conversion threshold is set in advance according to the motion conditions under which the depth camera acquires images; for example, if the camera pose changes greatly between adjacent frames, the preset conversion threshold is set larger.
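  • a compact sketch of this key-frame decision (note: the application compares a conversion relationship matrix built from feature point correspondences against a preset threshold; the pose-change magnitude used below is only an illustrative stand-in for that comparison):
```cpp
#include <Eigen/Dense>

// Decide whether the current frame becomes the new key frame by comparing
// its change relative to the previous key frame against a preset threshold.
struct KeyframeSelector {
  double conversion_threshold;          // preset according to camera motion
  Eigen::Matrix4d last_keyframe_pose = Eigen::Matrix4d::Identity();

  bool isKeyframe(const Eigen::Matrix4d& current_pose) {
    // Relative transform between the current frame and the previous key frame.
    const Eigen::Matrix4d rel = last_keyframe_pose.inverse() * current_pose;
    // Illustrative magnitude: rotation deviation from identity plus translation.
    const double rot_change = (rel.topLeftCorner<3, 3>() -
                               Eigen::Matrix3d::Identity()).norm();
    const double trans_change = rel.topRightCorner<3, 1>().norm();
    if (rot_change + trans_change >= conversion_threshold) {
      last_keyframe_pose = current_pose;   // current frame becomes the key frame
      return true;
    }
    return false;
  }
};
```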
  • step S305 loopback detection is performed according to the current key frame and the historical key frame; in response to determining that the loopback is successful, a globally consistent optimization update is performed on the determined relative camera pose according to the current key frame.
  • the globally consistent optimization update means that, during reconstruction, as the camera moves, the reconstruction algorithm continuously extends the three-dimensional reconstruction model of the target scene; when the depth camera moves to a place it has already visited, or to a viewpoint that has a large overlap with a historical viewpoint, the extended three-dimensional reconstruction model is kept consistent with the previously generated model, or is optimized and updated into a new model, instead of becoming misaligned and aliased.
  • loopback detection determines, based on the current observation of the depth camera, whether the camera has moved to a place it has already visited or to a viewpoint that has a large overlap with a historical viewpoint, and performs optimization to reduce the accumulated error.
  • in response to determining that the loopback is successful, a globally consistent optimization update is performed on the generated model according to the current key frame and the historical key frames, to reduce the error of the three-dimensional reconstruction model; in response to determining that the loopback detection is unsuccessful, the next key frame is awaited and loopback detection is performed on the next key frame.
  • the loopback detection of the current key frame and the historical key frame may be performed by matching the current key frame with the feature point of the historical key frame. If the matching degree is high, the loopback is successful.
  • a globally consistent optimization update of the relative camera pose is then performed: according to the correspondences between the current key frame and the one or more historical key frames with a high matching degree, the poses are optimized by minimizing a cost function that sums the conversion errors between the current key frame and all historical key frames with a high matching degree, where the variables T_i ∈ SE3, i ∈ [1, N-1], are the poses over all such frame pairs (each pair consisting of one matching historical key frame and the current key frame); N is the number of historical key frames with a high matching degree with the current key frame; and E_{i,j} represents the conversion error between the i-th frame and the j-th frame, the conversion error being the reprojection error.
  • during the optimization, the relative pose between each non-key frame and its corresponding key frame needs to be kept unchanged; the optimization update algorithm may use the bundle adjustment (BA) algorithm in the related art, or may use the method in step S303, which is not described in detail again.
  • this embodiment provides a method for determining the relative camera pose when the depth camera captures the target scene: at least one feature point is extracted from each frame of image, the feature points between adjacent frames are matched to obtain the feature point correspondences between adjacent frames, the abnormal correspondences among them are removed, the relative camera pose is calculated from a linear component containing the correspondences of the remaining feature points and a nonlinear component containing the relative camera pose, and the key frames are determined; if the currently acquired image is a key frame and loopback detection succeeds, a globally consistent optimization update of the determined relative camera poses is performed according to the current key frame and the historical key frames. While global consistency is ensured, the amount of computation in three-dimensional reconstruction is reduced, and three-dimensional reconstruction can be applied to portable devices, so that three-dimensional reconstruction can be applied more widely.
  • This embodiment is based on the foregoing embodiment, and the method for determining at least one feature voxel from each frame image by using at least two levels of nested screening mode for each frame of image in S103 is explained.
  • the method for determining at least one characteristic voxel from the image of FIG. 4 is schematically illustrated below with reference to the planar schematic diagram of determining at least one characteristic voxel of FIG. 5, the method comprising steps S401 to S406.
  • step S401 for each frame of image, the image is used as a current level screening object, and the current level voxel unit is determined.
  • the voxel unit represents the accuracy of the constructed 3D reconstruction model, which is set in advance according to the accuracy of the 3D reconstruction model of the target scene reconstructed. For example, it may be 5 mm, 10 mm or the like. Since the embodiment determines at least one feature voxel from each frame image by using at least two levels of nested screening, at least two voxel units are set, wherein the minimum voxel unit is the precision required to reconstruct the model. Firstly, the collected image is used as the current screening object, and the characteristic voxel is filtered. The current voxel unit is the largest voxel unit in the preset multi-level voxel unit.
  • in FIG. 5, it is assumed that real-time three-dimensional reconstruction of a CPU-based model with a 100 Hz frame rate and 5 mm voxel-level accuracy is implemented, and that two-level nested screening of feature voxels is performed with a 20 mm voxel unit and a 5 mm voxel unit, respectively. At this time, the collected image is taken as the current-level screening object, and the current-level voxel unit is the 20 mm voxel unit.
  • step S402 the current level screening object is divided into voxel blocks according to the current level voxel unit, and at least one current index block is determined according to the voxel block; wherein the current index block includes a preset number of voxel blocks.
  • At least one index block may be determined according to a preset number of voxel blocks divided by the current voxel unit, and the characteristic voxel is filtered according to the index block.
  • the method improves the screening rate compared to the screening of voxel blocks directly divided by the current level of voxel units. It should be noted that the feature voxel size at this time is not the size of one voxel block, but a preset number of voxel block sizes.
  • for example, the current index block is composed of a preset number of 8 × 8 × 8 voxel blocks: the acquired image is first divided, according to the 20 mm voxel unit, into a plurality of voxel blocks with a side length of 20 mm, and these voxel blocks are then grouped, 8 × 8 × 8 at a time, into at least one index block with a side length of 160 mm corresponding to the 20 mm voxel unit. Mapped onto the plane schematic diagram of FIG. 5, the entire image is divided, according to the 8 × 8 boxes, into 6 index blocks with a side length of 160 mm corresponding to the 20 mm voxel unit.
  • step S403 at least one feature block is selected in all current index blocks, and the distance from the at least one feature block to the target scene surface is smaller than the current level voxel unit corresponding distance threshold.
  • the distances from all the current index blocks determined in step S402 to the surface of the target scene are calculated, and an index block whose distance is smaller than the distance threshold corresponding to the current-level voxel unit is selected as a feature block; the distance threshold corresponding to an upper-level voxel unit is greater than the distance threshold corresponding to the next-level voxel unit.
  • selecting at least one feature block among all current index blocks, such that its distance to the surface of the target scene is smaller than the distance threshold corresponding to the current-level voxel unit, may be performed as follows: for each current index block, the index block is accessed according to its hash value, and the distances from all the vertices of the current index block to the surface of the target scene are calculated according to the relative camera pose and the image depth values obtained by the depth camera when acquiring each frame of image; a current index block whose vertex-to-surface distances are smaller than the distance threshold corresponding to the current-level voxel unit is selected as a feature block.
  • for example, a hash value may be set for each current index block, and each index block is accessed through its hash value; each index block has a plurality of vertices.
  • when the distances from the vertices of an index block to the surface of the target scene are smaller than the distance threshold corresponding to the current-level voxel unit, the index block is set as a feature block; if they are greater than or equal to the distance threshold corresponding to the current-level voxel unit, the index block is removed.
  • an average value of the distance from all the vertices of the index block to the target scene surface may be calculated. If the average value is smaller than the distance threshold corresponding to the current voxel unit, the index block is set as the feature block.
  • in FIG. 5, the hatched square with a side length of 160 mm is an index block of the 20 mm voxel unit to be removed, that is, an index block whose distance to the surface of the target scene is greater than the distance threshold corresponding to the 20 mm voxel unit.
  • step S404: it is determined whether the feature block satisfies the division condition of the minimum-level voxel unit; if the feature block satisfies the division condition of the minimum-level voxel unit, step S405 is performed; if the feature block does not satisfy the division condition of the minimum-level voxel unit, step S406 is performed.
  • determining whether the feature block satisfies the division condition of the minimum-level voxel unit means determining whether the feature block selected in step S403 was obtained after division by the preset minimum-level voxel unit.
  • for example, if the feature block selected in step S403 is a feature block with a side length of 160 mm divided by the 20 mm voxel unit while the minimum-level voxel unit is the 5 mm voxel unit, the selected feature block does not satisfy the division condition of the minimum-level 5 mm voxel unit, and step S406 is performed to carry out the screening of the next-level 5 mm voxel unit; if the feature block selected in step S403 is a feature block with a side length of 40 mm divided by the 5 mm voxel unit, the feature block satisfies the division condition of the minimum-level 5 mm voxel unit, and step S405 is performed to take it as a feature voxel.
  • step S405 the feature block is taken as a feature voxel.
  • step S406 all the feature blocks determined by the current level screening object are replaced with new current level screening objects, and the next level voxel unit is selected to be replaced with the new current level voxel unit, and the process returns to step S402.
  • when the feature blocks selected in step S403 do not satisfy the division condition of the minimum-level voxel unit, all the feature blocks selected in step S403 are taken as the new current-level screening object, the next-level voxel unit is selected as the new current-level voxel unit, and the process returns to step S402 to perform the screening of feature blocks again.
  • continuing the example, the feature blocks selected in step S403 are feature blocks with a side length of 160 mm divided by the 20 mm voxel unit, while the index block side length of the minimum-level 5 mm voxel unit is 40 mm. Therefore, all the feature blocks with a side length of 160 mm divided by the 20 mm voxel unit are taken as the new current-level screening object, the next-level 5 mm voxel unit is selected as the current-level voxel unit, and the process returns to step S402.
  • all the feature blocks with a side length of 160 mm filtered out in step S403 are divided, according to the 5 mm voxel unit, into a plurality of voxel blocks with a side length of 5 mm, and these voxel blocks are grouped, 8 × 8 × 8 at a time, into at least one index block with a side length of 40 mm corresponding to the 5 mm voxel unit. Mapped onto the plane schematic diagram, the image is divided, according to the 8 × 8 boxes, into 32 index blocks with a side length of 40 mm corresponding to the 5 mm voxel unit, and then step S403 and step S404 are performed again.
  • the resulting feature blocks with a side length of 40 mm (the blank squares with a side length of 40 mm in the figure) are feature blocks selected at the minimum-level 5 mm voxel unit, that is, these feature blocks are the selected feature voxels, while the dotted squares with a side length of 40 mm are index blocks divided by the 5 mm voxel unit that are to be removed.
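  • the nested screening of steps S401 to S406 can be outlined roughly as follows (an illustrative sketch only: the IndexBlock type, the DistanceFn callback, and the numeric thresholds are assumptions, and a real implementation would access index blocks through the hash table described above):
```cpp
#include <cmath>
#include <functional>
#include <vector>

// An index block: a cube of voxel blocks described by its minimum corner
// (in millimetres) and its side length in millimetres.
struct IndexBlock {
  float origin[3];
  float side_mm;
};

// DistanceFn returns the (signed) distance from a 3D point to the target
// scene surface, computed from the depth image and the relative camera pose.
using DistanceFn = std::function<float(const float[3])>;

// Keep the index blocks whose eight vertices all lie closer to the surface
// than the distance threshold of the current voxel-unit level.
std::vector<IndexBlock> selectFeatureBlocks(const std::vector<IndexBlock>& blocks,
                                            float distance_threshold,
                                            const DistanceFn& dist) {
  std::vector<IndexBlock> kept;
  for (const IndexBlock& b : blocks) {
    bool near_surface = true;
    for (int c = 0; c < 8 && near_surface; ++c) {
      const float v[3] = {b.origin[0] + ((c & 1) ? b.side_mm : 0.0f),
                          b.origin[1] + ((c & 2) ? b.side_mm : 0.0f),
                          b.origin[2] + ((c & 4) ? b.side_mm : 0.0f)};
      near_surface = std::fabs(dist(v)) < distance_threshold;
    }
    if (near_surface) kept.push_back(b);  // feature block at this level
  }
  return kept;
}

// Split one index block into children with the given (smaller) side length.
std::vector<IndexBlock> subdivide(const IndexBlock& b, float child_side_mm) {
  std::vector<IndexBlock> children;
  const int n = static_cast<int>(b.side_mm / child_side_mm);  // e.g. 160/40 = 4
  for (int x = 0; x < n; ++x)
    for (int y = 0; y < n; ++y)
      for (int z = 0; z < n; ++z)
        children.push_back({{b.origin[0] + x * child_side_mm,
                             b.origin[1] + y * child_side_mm,
                             b.origin[2] + z * child_side_mm},
                            child_side_mm});
  return children;
}

// Two-level nesting: screen the 160 mm index blocks (20 mm voxel unit) first,
// then re-divide each surviving block into 40 mm index blocks (5 mm unit).
// The two thresholds are placeholders; the coarse one is simply chosen larger.
std::vector<IndexBlock> nestedScreening(const std::vector<IndexBlock>& coarse_blocks,
                                        const DistanceFn& dist) {
  std::vector<IndexBlock> feature_voxels;
  for (const IndexBlock& fb : selectFeatureBlocks(coarse_blocks, 160.0f, dist))
    for (const IndexBlock& fv : selectFeatureBlocks(subdivide(fb, 40.0f), 40.0f, dist))
      feature_voxels.push_back(fv);       // feature voxels at the finest level
  return feature_voxels;
}
```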
  • this embodiment provides a method for determining at least one feature voxel from an image, in which at least one feature voxel is determined from each frame of image by using at least two levels of nested screening.
  • a large amount of computation in the three-dimensional reconstruction of the target scene is thus avoided, and three-dimensional reconstruction can be applied to portable devices, so that three-dimensional reconstruction can be applied more widely.
  • This embodiment provides an example embodiment of three-dimensional reconstruction based on a depth camera based on the above embodiment. As shown in FIG. 6, the method includes steps S601 to S6011.
  • step S601 at least two frames of images obtained by the depth camera acquiring the target scene are acquired.
  • step S602 based on at least two frames of images, a relative camera pose when the depth camera captures the target scene is determined.
  • step S603: it is determined whether the current frame image obtained by capturing the target scene is a key frame; if the current frame image is a key frame, the key frame is stored and step S604 is performed; if the current frame image is not a key frame, the next frame image is awaited and step S603 is re-executed.
  • for each frame of image captured by the camera, it can be judged whether the frame is a key frame, and each determined key frame is stored so that the isosurface can be generated from it and so that it can be used as a historical key frame in subsequent loopback optimization. It should be noted that the first frame captured by the camera is a key frame by default.
  • step S604: loopback detection is performed according to the current key frame and the historical key frames; in response to determining that the loopback is successful, step S608 is performed (to carry out the optimization update of the grid voxel model and of the isosurface) and step S6011 is performed (to carry out the optimization update of the relative camera pose).
  • step S605 for each frame of image, at least one feature voxel is determined from each frame image by using at least two levels of nested screening mode, wherein each level of screening adopts a voxel blocking rule corresponding to each level of screening.
  • step S606 at least one feature voxel of each frame image is subjected to fusion calculation according to the relative camera pose of each frame image, and a grid voxel model of the target scene is obtained.
  • step S607 an isosurface of the grid voxel model is generated to obtain a three-dimensional reconstruction model of the target scene.
  • step S608: a first preset number of matching key frames that match the current key frame are selected from the historical key frames, and a second preset number of non-key frames are respectively obtained from the non-key frames corresponding to each selected matching key frame.
  • to select the first preset number of matching key frames that match the current key frame from the historical key frames, the current key frame may be matched against the historical key frames; for example, the Hamming distances between the feature points of the current key frame and those of the historical key frames may be calculated to complete the matching.
  • the first preset number of historical key frames with a high matching degree with the current key frame are then selected; for example, 10 historical key frames with a high matching degree with the current key frame are selected.
  • each key frame has non-key frames corresponding to it, and for each selected historical key frame with a high matching degree, a second preset number of non-key frames are also selected from the corresponding non-key frames; for example, no more than 11 non-key frames may be selected evenly and dispersedly from all the non-key frames corresponding to the historical key frame, to improve the efficiency of the optimization update and to make the selected frames more representative.
  • the first preset number and the second preset number may be set in advance according to the need when updating the three-dimensional reconstruction model.
  • step S609 the grid voxel model of the three-dimensional reconstruction model is optimized and updated according to the correspondence between the current key frame and each matching key frame and the acquired non-key frame.
  • the optimization and updating of the grid voxel model of the 3D reconstruction model is divided into the updating of the feature voxels and the updating of the grid voxel model of the target scene.
  • when the feature voxels are updated, considering that the fields of view overlap when the depth camera captures two adjacent frames, the feature voxels selected from two adjacent frames are almost identical, and performing the optimization update of the feature voxels for every frame of image would take a long time. Therefore, when updating the feature voxels, step S605 is re-executed only for the matching historical key frames to complete the optimization update of the feature voxels.
  • since the grid voxel model of the target scene generated in step S606 is generated by processing every frame of image, when the grid voxel model of the target scene is updated, the historical key frames with a high matching degree and their corresponding non-key frames are optimized and updated: that is, when each key frame arrives, for the first preset number of historical key frames with a high matching degree with the current key frame selected in step S608 and the second preset number of non-key frames corresponding to each such historical key frame, the corresponding fusion data is removed, and step S606 is performed again to carry out the fusion calculation, completing the optimization update of the grid voxel model of the target scene.
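  • one possible way to "remove the corresponding fusion data" before re-fusing, sketched below, is to invert the weighted average used in the TSDF fusion; this de-integration step is an illustrative assumption, since the application only states that the old fusion data is removed and the fusion calculation of step S606 is performed again:
```cpp
// De-integrate a previously fused observation (tsdf_old, w_old) from a voxel,
// inverting the weighted average tsdf = (sum of w_k * tsdf_k) / (sum of w_k).
struct Voxel {
  float tsdf = 0.0f;
  float weight = 0.0f;
};

void removeObservation(Voxel& v, float tsdf_old, float w_old) {
  const float w_new = v.weight - w_old;
  if (w_new <= 0.0f) {            // nothing left: reset the voxel
    v.tsdf = 0.0f;
    v.weight = 0.0f;
    return;
  }
  v.tsdf = (v.tsdf * v.weight - tsdf_old * w_old) / w_new;
  v.weight = w_new;
}
// After removal, the frame is fused again with its updated camera pose,
// using the same weighted-average fusion as in step S606.
```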
  • a voxel block can be used as a fusion object for fusion calculation.
  • a preset number of voxel blocks may be used as a fusion object for fusion calculation, for example, a voxel having a size of 2 ⁇ 2 ⁇ 2 voxel blocks.
  • step S610 the isosurface of the three-dimensional reconstruction model is optimized and updated according to the correspondence between the current key frame and each matching key frame.
  • step S607 may be re-executed only for the historical key frames, selected in step S608, that have a high matching degree with the current key frame, so as to update the isosurfaces of the matching key frames.
  • the optimization update of the isosurface of the three-dimensional reconstruction model may be performed as follows: for each matching key frame, at least one voxel block is selected from the plurality of voxel blocks corresponding to the current key frame, such that the distance from the at least one voxel block to the surface of the target scene is less than or equal to the update threshold of the corresponding voxel in that matching key frame; and the isosurface of each matching key frame is optimized and updated according to the selected at least one voxel block.
  • the update threshold may be, for each voxel in the key frame used to generate the isosurface, the distance from the voxel blocks in that voxel to the surface of the target scene recorded when the isosurface of the grid voxel model was generated in step S607.
  • in an implementation, the distances from all voxel blocks of the current key frame to the surface of the target scene may first be calculated; then, for each matching key frame, the voxel correspondence between the two frames of images is determined according to the correspondence between the current key frame and the matching key frame; for each current voxel in the current key frame, the corresponding voxel in the matching key frame is found according to the voxel correspondence and the corresponding update threshold is determined, and then at least one voxel block is selected from the plurality of voxel blocks of the current voxel such that the distance from the at least one voxel block to the surface of the target scene is less than or equal to the update threshold.
  • the above selection operation is performed on each voxel in the current key frame one by one to complete the voxel block filtering, and the isosurface is optimized and updated according to the selected voxel blocks; the process of obtaining the isosurface is similar to step S607 and is not described again.
  • a voxel block whose distance is greater than the update threshold is a voxel block to be ignored, and no operation is performed on it. As a result, some voxel blocks are filtered, which can increase the calculation speed.
  • during this processing, the hash table may be used: a plurality of adjacent voxel blocks are looked up in the hash table through their hash values and then processed.
  • step S6011: a globally consistent optimization update is performed on the determined relative camera poses based on the current key frame; the updated relative camera poses are then used when updating the corresponding grid voxel model.
  • the target scene image acquisition in step S601, the determination of the relative camera pose in step S602, and the key frame determination in step S603 may be performed in real time, that is, the pose calculation and key frame judgment are carried out while the images are being acquired; the generation of the three-dimensional reconstruction model of the target scene in steps S605 to S607 and the updating of the generated three-dimensional reconstruction model in steps S608 to S610 are also performed simultaneously, that is, the optimization update of the completed part of the model is carried out during the process of generating the three-dimensional reconstruction model.
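  • a rough sketch of running these stages concurrently (the thread layout, the stage loop, and the timing are illustrative assumptions; the application only states that acquisition and pose estimation, model generation, and model updating proceed simultaneously):
```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Illustrative stage loop; a real implementation would read frames and key
// frames from queues shared between the threads.
void runStage(std::atomic<bool>& running) {
  while (running.load()) {
    // ... process the next frame / key frame for this stage ...
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
}

int main() {
  std::atomic<bool> running{true};
  std::thread tracking([&] { runStage(running); });    // steps S601-S603
  std::thread fusion([&] { runStage(running); });      // steps S605-S607
  std::thread refinement([&] { runStage(running); });  // steps S608-S6011
  std::this_thread::sleep_for(std::chrono::seconds(1)); // stand-in for a full run
  running = false;
  tracking.join();
  fusion.join();
  refinement.join();
  return 0;
}
```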
  • this embodiment provides a depth camera-based three-dimensional reconstruction method: the target scene images acquired by the depth camera are obtained, the relative camera pose of the depth camera when acquiring each target scene image is determined, at least two levels of nested screening are used to determine the feature voxels of each frame of image, a fusion calculation is performed to obtain the grid voxel model of the target scene, and the isosurface of the grid voxel model is generated to obtain the three-dimensional reconstruction model of the target scene; in addition, the model is optimized and updated according to the current key frame, a plurality of matching key frames, and a plurality of non-key frames corresponding to the matching key frames, so as to ensure the global consistency of the model.
  • a large amount of computation in the three-dimensional reconstruction of the target scene is thus avoided, and three-dimensional reconstruction can be applied to portable devices, so that three-dimensional reconstruction can be applied more widely.
  • FIG. 7 is a structural block diagram of a depth camera-based three-dimensional reconstruction apparatus according to an embodiment of the present disclosure.
  • the apparatus may perform a depth camera-based three-dimensional reconstruction method according to any embodiment of the present application, and have a function module corresponding to the execution method.
  • the device can be implemented on a CPU basis.
  • the apparatus includes an image acquisition module 701, a pose determination module 702, a voxel determination module 703, a model generation module 704, and a three-dimensional reconstruction module 705.
  • the image obtaining module 701 is configured to acquire at least two frames of images obtained by the depth camera acquiring the target scene.
  • the pose determining module 702 is configured to determine a relative camera pose when the depth camera captures the target scene according to the at least two frames of images.
  • the voxel determining module 703 is configured to determine, according to each frame image, at least one feature voxel from each frame image by using at least two levels of nested screening manner, wherein each level of screening adopts a voxel blocking rule corresponding to each level of screening. .
  • the model generation module 704 is configured to perform fusion calculation on at least one characteristic voxel of each frame image according to a relative camera pose of each frame image to obtain a grid voxel model of the target scene.
  • the three-dimensional reconstruction module 705 is configured to generate an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.
  • the three-dimensional reconstruction module 705 is configured to generate an isosurface of the voxel block corresponding to the current key frame in response to determining that the current frame image obtained by acquiring the target scene is a key frame, and add a color to the isosurface. A three-dimensional reconstruction model of the target scene is obtained.
  • this embodiment provides a depth camera-based three-dimensional reconstruction apparatus, which acquires the target scene images captured by the depth camera, determines the relative camera pose when the depth camera captures each target scene image, determines the feature voxels of each frame of image by using at least two levels of nested screening, performs a fusion calculation to obtain the grid voxel model of the target scene, and generates the isosurface of the grid voxel model to obtain the three-dimensional reconstruction model of the target scene.
  • a large amount of computation in the three-dimensional reconstruction of the target scene is thus avoided, and three-dimensional reconstruction can be applied to portable devices, so that three-dimensional reconstruction can be applied more widely.
  • the pose determination module 702 includes a feature point extraction unit, a matching operation unit, and a pose determination unit.
  • the feature point extracting unit is configured to perform feature extraction on each frame of the image to obtain at least one feature point of each frame of the image.
  • the matching operation unit is configured to perform matching operations on feature points between adjacent two frames of images to obtain a feature point correspondence relationship between adjacent two frames of images.
  • the pose determination unit is configured to remove the abnormal correspondences from the feature point correspondences, and to calculate J(ξ)^T J(ξ) from a linear component containing the second-order statistics of the remaining feature points and a nonlinear component containing the relative camera pose, so as to determine the relative camera pose, where:
  • r( ⁇ ) denotes a vector containing all reprojection errors
  • J( ⁇ ) is a Jacobian matrix of r( ⁇ )
  • denotes a Lie algebra relative to the camera pose
  • denotes r( ⁇ ) at each iteration Incremental value
  • R i represents the rotation matrix of the camera when the image of the ith frame is acquired
  • R j represents the rotation matrix of the camera when the image of the jth frame is acquired
  • C i,j represents a set of correspondence points of the feature points of the i-th frame image and the j-th frame image
  • -1 represents the i-th frame image
  • [] ⁇ represents the vector product
  • represents the norm of C i,j
  • the nonlinear term The expression is:
  • Represents a linear component; r il T and r jl represent nonlinear components, r il T is the lth row in the rotation matrix R i , r jl is the transpose of the lth row in the rotation matrix R j , l 0,1 ,2.
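As a toy illustration only of the Gauss-Newton update δ = -(JᵀJ)⁻¹Jᵀr, the sketch below aligns matched 3D points under a small-motion model; it does not implement the patent's reprojection-error cost, its SE3 parameterization, or the W-based acceleration, and all function and variable names are assumptions.

```python
import numpy as np

def skew(v):
    """[v]_x : the cross-product (skew-symmetric) matrix that appears in the Jacobian."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def gauss_newton_pose(p_i, p_j, iters=5):
    """Toy Gauss-Newton solve for a small relative motion xi = [w, t] (6 dof).

    p_i, p_j : (N, 3) arrays of matched 3D feature points from frames i and j.
    Residual per match (small-angle model): r_k = p_j[k] + w x p_j[k] + t - p_i[k].
    Each step computes delta = -(J^T J)^{-1} J^T r and updates xi.
    """
    xi = np.zeros(6)                                  # [w (rotation), t (translation)]
    n = p_j.shape[0]
    J = np.zeros((3 * n, 6))
    for k in range(n):                                # Jacobian blocks: [-[p_j[k]]_x | I]
        J[3 * k:3 * k + 3, :3] = -skew(p_j[k])
        J[3 * k:3 * k + 3, 3:] = np.eye(3)
    for _ in range(iters):
        w, t = xi[:3], xi[3:]
        r = p_j + np.cross(w, p_j) + t - p_i          # (N, 3) residuals
        delta = -np.linalg.solve(J.T @ J, J.T @ r.reshape(-1))
        xi = xi + delta
    return xi

# toy usage: recover a known small motion from noiseless correspondences
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(30, 3))
true_w, true_t = np.array([0.02, -0.01, 0.03]), np.array([0.10, 0.05, -0.02])
moved = pts + np.cross(true_w, pts) + true_t
print(np.round(gauss_newton_pose(moved, pts), 4))     # ~ [0.02 -0.01 0.03 0.10 0.05 -0.02]
```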
  • the apparatus further includes a key frame determination module, a loop closure detection module, and a pose update module.
  • the key frame determination module is configured to perform a matching operation between the current frame image obtained by acquiring the target scene and the previous key frame image to obtain a transformation relation matrix between the two frame images, and to determine the current frame image as the current key frame if the transformation relation matrix is greater than or equal to a preset transformation threshold.
  • the loop closure detection module is configured to perform loop closure detection according to the current key frame and the historical key frames, in response to determining that the current frame image obtained by acquiring the target scene is a key frame.
  • the pose update module is configured to, in response to determining that the loop closure succeeds, perform a globally consistent optimization update on the determined relative camera poses according to the current key frame. A minimal key-frame gating sketch is given below.
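A minimal sketch of one way such a key-frame decision could be made from ORB-style binary descriptors is shown below; the Hamming-distance ratio test is a stand-in assumption, not the patent's transformation-relation-matrix criterion, and all thresholds are arbitrary.

```python
import numpy as np

def hamming(d1, d2):
    """Hamming distance between two binary descriptors stored as uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

def is_new_keyframe(curr_desc, last_key_desc, match_thresh=40, min_unmatched_ratio=0.4):
    """Declare a key frame when too few descriptors of the current frame still match
    the previous key frame; this ratio test merely stands in for the patent's
    transformation-relation-matrix criterion."""
    unmatched = 0
    for d in curr_desc:
        best = min(hamming(d, k) for k in last_key_desc)
        if best > match_thresh:
            unmatched += 1
    return unmatched / len(curr_desc) >= min_unmatched_ratio

# toy usage with random 256-bit descriptors (real ones would come from an ORB extractor)
rng = np.random.default_rng(1)
key = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)
similar = key.copy(); similar[:, 0] ^= 1                         # nearly identical frame
different = rng.integers(0, 256, size=(50, 32), dtype=np.uint8)  # unrelated frame
print(is_new_keyframe(similar, key), is_new_keyframe(different, key))   # False True
```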
  • the voxel determining module 703 includes an initial determining unit, an index block determining unit, a feature block selecting unit, a feature voxel determining unit, and a loop unit.
  • the initial determination unit is configured to, for each frame of the image, use the image as the current level screening object and determine the current level voxel unit.
  • the index block determining unit is configured to divide the current level screening object into voxel blocks according to a current level voxel unit, and determine at least one current index block according to the voxel block; wherein the current index block includes a preset number of voxel blocks.
  • the feature block selection unit is configured to select, among all current index blocks, at least one feature block whose distance to the surface of the target scene is smaller than the distance threshold corresponding to the current-level voxel unit.
  • the feature voxel determining unit is configured to use the feature block as a feature voxel if the feature block satisfies the division condition of the minimum voxel unit.
  • the loop unit is configured to, if the feature blocks do not satisfy the division condition of the minimum-level voxel unit, take all the feature blocks determined for the current-level screening object as the new current-level screening object, select the next-level voxel unit as the new current-level voxel unit, and return to the voxel block division operation for the current-level screening object; the voxel unit is reduced level by level down to the minimum-level voxel unit.
  • the feature block selection unit is configured to: for each current index block, access the index block according to the hash value of the current index block; calculate, according to the relative camera pose at which each frame of image was acquired and the image depth values obtained by the depth camera, the distances from all vertices of the current index block to the surface of the target scene; and select, as feature blocks, the current index blocks whose vertex distances to the target scene surface are all smaller than the distance threshold corresponding to the current-level voxel unit. A coarse-to-fine screening sketch is given below.
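The sketch below illustrates the general coarse-to-fine idea on a synthetic signed-distance field; the sphere scene, the two thresholds, and the use of a single voxel block per index block are simplifying assumptions (a real implementation would also key the index blocks in a hash table for constant-time access).

```python
import numpy as np
from itertools import product

def sphere_sdf(p, center=(0.5, 0.5, 0.5), radius=0.3):
    """Signed distance to a synthetic scene surface (a sphere) standing in for real depth data."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(center)) - radius)

def screen(origin, size, unit, blocks_per_side, thresh, sdf):
    """One screening level: split the cube [origin, origin+size)^3 into index blocks of
    (blocks_per_side * unit) per side and keep those whose corner distances to the
    surface are all below the level's threshold."""
    step = blocks_per_side * unit
    n = int(round(size / step))
    kept = []
    for i, j, k in product(range(n), repeat=3):
        corner0 = np.asarray(origin, dtype=float) + np.array([i, j, k]) * step
        corners = [corner0 + np.array(c) * step for c in product((0, 1), repeat=3)]
        if all(abs(sdf(c)) < thresh for c in corners):
            kept.append((tuple(corner0), step))        # a real system would hash this key
    return kept

# level 1: coarse 0.2-unit blocks over the unit cube; level 2: refine survivors at 0.05
coarse = screen((0, 0, 0), 1.0, unit=0.2, blocks_per_side=1, thresh=0.40, sdf=sphere_sdf)
fine = []
for origin, size in coarse:
    fine += screen(origin, size, unit=0.05, blocks_per_side=1, thresh=0.10, sdf=sphere_sdf)
print(len(coarse), "coarse blocks kept,", len(fine), "fine feature voxels kept")
```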
  • the apparatus further includes a matching frame determining module, a model updating module, and an isosurface updating module.
  • the matching frame determination module is configured to: in response to determining that the current frame image obtained by acquiring the target scene is a key frame, select, from the historical key frames, a first preset number of matching key frames that match the current key frame, and obtain a second preset number of non-key frames from the non-key frames corresponding to each selected matching key frame.
  • the model updating module is configured to optimize and update the grid voxel model of the three-dimensional reconstruction model according to the correspondence between the current key frame and each matching key frame and the acquired non-key frame.
  • the isosurface update module is configured to optimize and update the isosurface of the three-dimensional reconstruction model according to the correspondence between the current key frame and each matching key frame.
  • the isosurface update module is configured to: for each matching key frame, select, among the plurality of voxel blocks corresponding to the current key frame, at least one voxel block whose distance to the surface of the target scene is less than or equal to the update threshold of the corresponding voxel cell in the matching key frame; and optimize and update the isosurface of the matching key frame according to the selected at least one voxel block.
  • while generating the isosurface of the voxel blocks corresponding to the current key frame image, the three-dimensional reconstruction module 705 is further configured to, for each voxel cell in the key frame used for generating the isosurface, take the maximum of the distances from all voxel blocks in that cell to the surface of the target scene, and set that maximum as the update threshold of the cell. A small thresholding sketch is given below.
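The following sketch shows, under toy assumptions (a planar surface and hand-picked voxel-block positions), how such per-cell maxima could later gate which voxel blocks are re-meshed; none of the names or numbers come from the patent.

```python
def cell_update_thresholds(cell_blocks, sdf):
    """For each voxel cell (a group of voxel blocks), store the largest distance of its
    blocks to the scene surface; that maximum is the cell's update threshold."""
    return {cell: max(abs(sdf(b)) for b in blocks) for cell, blocks in cell_blocks.items()}

def blocks_to_refresh(cell_blocks, thresholds, sdf):
    """When a matching key frame is revisited, only keep voxel blocks whose (new) distance
    to the surface is <= the stored update threshold of their cell; the rest are skipped."""
    keep = []
    for cell, blocks in cell_blocks.items():
        keep += [b for b in blocks if abs(sdf(b)) <= thresholds[cell]]
    return keep

# toy usage: two cells of voxel-block centres near a planar surface
old_sdf = lambda p: p[2]            # original surface: the plane z = 0
new_sdf = lambda p: p[2] - 0.05     # after new observations the surface sits at z = 0.05
cells = {"cell_a": [(0, 0, 0.01), (0, 0, 0.03)],
         "cell_b": [(0, 0, 0.30), (0, 0, 0.50)]}
thresholds = cell_update_thresholds(cells, old_sdf)   # {'cell_a': 0.03, 'cell_b': 0.5}
print(blocks_to_refresh(cells, thresholds, new_sdf))  # (0, 0, 0.01) now exceeds cell_a's threshold
```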
  • FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • the electronic device includes a storage device 80, one or more processors 81, and at least one depth camera 82.
  • the storage device 80, the processor 81, and the depth camera 82 may be connected by a bus or in other ways; in FIG. 8, a bus connection is taken as an example.
  • the storage device 80, as a computer readable storage medium, is configured to store software programs, computer executable programs, and modules, such as the modules corresponding to the depth camera-based three-dimensional reconstruction apparatus in the embodiments of the present application (for example, the image acquisition module 701 in the depth camera-based three-dimensional reconstruction apparatus).
  • the processor 81 performs the various functional applications and data processing of the electronic device by running the software programs, instructions, and modules stored in the storage device 80, thereby implementing the above-described depth camera-based three-dimensional reconstruction method.
  • processor 81 may be a central processing unit or a high performance graphics processor.
  • the storage device 80 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application required for at least one function, and the data storage area may store data created according to the use of the terminal, and the like.
  • storage device 80 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
  • storage device 80 can include storage devices remotely disposed relative to processor 81, which can be connected to the device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the depth camera 82 can be configured to perform image acquisition of the target scene under the control of the processor 81.
  • the depth camera can be embedded in the electronic device.
  • the electronic device can be a portable mobile electronic device.
  • the electronic device can be a smart terminal (a mobile phone or a tablet) or a three-dimensional visual interaction device (such as Virtual Reality (VR) glasses or a wearable helmet), and can capture images while being moved, rotated, and so on.
  • An electronic device provided by this embodiment may be configured to perform the depth camera-based three-dimensional reconstruction method provided by any of the foregoing embodiments, and has the corresponding functions.
  • the embodiment of the present application further provides a computer readable storage medium on which a computer program is stored; when executed by a processor, the program implements the depth camera-based three-dimensional reconstruction method of the above embodiments.
  • the computer storage medium of the embodiments of the present application may employ any combination of one or more computer readable media.
  • the computer readable medium can be a computer readable signal medium or a computer readable storage medium.
  • the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • a computer readable storage medium can be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
  • a computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), and the like, or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages, or a combination thereof, including object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or electronic device.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • the depth camera-based three-dimensional reconstruction scheme provided by the embodiments of the present application adopts a coarse-to-fine nested screening strategy and the idea of sparse sampling to select feature voxels in the fusion computing stage, which greatly improves the fusion speed while ensuring reconstruction accuracy; generating the isosurface at the key frame rate improves the generation speed of the isosurface; the efficiency of three-dimensional reconstruction is improved as a result.
  • in addition, the optimization update stage can effectively guarantee the global consistency of the three-dimensional reconstruction.
  • the modules or operations of the embodiments of the present application may be implemented by general-purpose computing devices, which may be centralized on a single computing device or distributed over a network formed by multiple computing devices.
  • they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may be separately fabricated into integrated circuit modules, or a plurality of the modules or operations among them may be fabricated into a single integrated circuit module.
  • the application is not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application disclose a depth camera-based three-dimensional reconstruction method, apparatus, device, and storage medium, wherein the method includes: acquiring at least two frames of images obtained by a depth camera capturing a target scene; determining, according to the at least two frames of images, the relative camera pose of the depth camera when capturing the target scene; for each frame of image, determining at least one feature voxel from that frame by using at least two levels of nested screening, wherein each level of screening adopts the voxel blocking rule corresponding to that level; performing a fusion calculation on the at least one feature voxel of each frame of image according to the relative camera pose of that frame, to obtain a grid voxel model of the target scene; and generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.

Description

基于深度相机的三维重建方法、装置、设备及存储介质
本申请要求在2018年03月05日提交中国专利局、申请号为201810179264.6的中国专利申请的优先权,该申请的全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及图像处理技术领域,例如涉及一种基于深度相机的三维重建方法、装置、设备及存储介质。
背景技术
三维重建是通过特定的装置及算法对现实世界中的三维物体的数学模型进行重新构建,对于虚拟现实、增强现实、机器人感知、人机交互及机器人路径规划等具有极其重要的意义。
目前的三维重建方法中,为保证重建结果的质量、一致性及实时性,通常需要由高性能的图形处理器(Graphics Processing Unit,GPU)和深度相机(RGB-D相机)来完成。首先利用深度相机对目标场景进行拍摄,获得至少两帧图像;利用GPU对每帧图像进行求解,以获取拍摄每帧图像时深度相机的相对相机位姿;依据每帧图像对应的相对相机位姿,遍历该帧图像中的所有体素,以确定满足一定条件的体素作为候选体素;进而依据每帧图像中的候选体素来构建该帧图像的截断符号距离函数(Truncated Signed Distance Function,TSDF)模型;最后在TSDF模型的基础上,对每帧图像生成等值面,从而能完成对目标场景的实时重建。
但是相关技术中的三维重建方法运算量较大,对专用于图像处理的GPU依赖性很强。而GPU无法便携化,难以应用于移动机器人、便携化设备及可穿戴设备(如增强现实头显设备Microsoft HoloLens)等。
发明内容
以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。
本申请实施例提供一种基于深度相机的三维重建方法、装置、设备及存储介质,避免了对目标场景进行三维重建时运算量大的情况,实现了将三维重建应用于便携化的设备中,使得三维重建的应用更加广泛。
第一方面,本申请实施例提供了一种基于深度相机的三维重建方法,该方法包括:获取深度相机对目标场景进行采集得到的至少两帧图像;根据所述至少两帧图像,确定所述深度相机对目标场景进行采集时的相对相机位姿;针对每帧图像,采用至少两级嵌套筛选方式从每帧图像中确定至少一个特征体素,其中,每级筛选采用与每级筛选对应的体素分块规则;依据每帧图像的相对相机位姿对每帧图像的至少一个特征体素进行融合计算,得到目标场景的栅格体素模型;生成所述栅格体素模型的等值面,得到所述目标场景的三维重建模型。
第二方面,本申请实施例还提供了一种基于深度相机的三维重建装置,该装置包括:图像获取模块,设置为获取深度相机对目标场景进行采集得到的至少两帧图像;位姿确定模块,设置为根据所述至少两帧图像,确定所述深度相机对目标场景进行采集时的相对相机位姿;体素确定模块,设置为针对每帧图像,采用至少两级嵌套筛选方式从每帧图像中确定至少一 个特征体素,其中,每级筛选采用与每级筛选对应的体素分块规则;模型生成模块,设置为依据每帧图像的相对相机位姿对每帧图像的至少一个特征体素进行融合计算,得到目标场景的栅格体素模型;三维重建模块,设置为生成所述栅格体素模型的等值面,得到所述目标场景的三维重建模型。
第三方面,本申请实施例还提供了一种电子设备,包括:一个或多个处理器;存储装置,设置为存储一个或多个程序;至少一个深度相机,设置为对目标场景进行图像采集;当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如本申请任意实施例所述的基于深度相机的三维重建方法。
第四方面,本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如本申请任意实施例所述的基于深度相机的三维重建方法。
在阅读并理解了附图和详细描述后,可以明白其他方面。
附图说明
为了更加清楚地说明本申请示例性实施例的技术方案,下面对描述实施例中所需要用到的附图做一简单介绍。所介绍的附图只是本申请所要描述的一部分实施例的附图,而不是全部的附图,对于本领域普通技术人员,在不付出创造性劳动的前提下,还可以根据这些附图得到其他的附图。
图1是本申请实施例提供的一种基于深度相机的三维重建方法的流程图;
图2是本申请实施例提供的两级嵌套筛选方式的立方体示意图;
图3是本申请实施例提供的确定深度相机对目标场景进行采集时的相对相机位姿的方法流程图;
图4是本申请实施例提供的从图像中确定至少一个特征体素的方法流程图;
图5是本申请实施例提供的确定至少一个特征体素的平面示意图;
图6是本申请另一实施例提供的一种基于深度相机的三维重建方法的流程图;
图7是本申请实施例提供的一种基于深度相机的三维重建装置的结构框图;
图8是本申请实施例提供的一种电子设备的结构示意图。
具体实施方式
下面结合附图和实施例对本申请作详细说明。可以理解的是,此处所描述的具体实施例仅仅用于解释本申请,而非对本申请的限定。另外还需要说明的是,为了便于描述,附图中仅示出了与本申请相关的部分而非全部结构。
图1为本申请实施例提供的一种基于深度相机的三维重建方法的流程图,本实施例可适用于基于深度相机对目标场景进行三维重建的情况,该方法可以由基于深度相机的三维重建装置或电子设备来执行,该装置可采用硬件和/或软件的方式实现,下面结合图2的两级嵌套筛选方式的立方体示意图对图1的基于深度相机的三维重建方法进行示意说明,该方法包括步骤S101至步骤S105。
在步骤S101中,获取深度相机对目标场景进行采集得到的至少两帧图像。
其中,深度相机与传统相机不同之处在于该相机可同时拍摄景物的图像信息及其对应的 深度信息,其设计原理是针对待测目标场景发射一参考光束,由计算回光的时间差或相位差,来换算被拍摄景物的距离,以产生深度信息,此外,再结合传统的相机拍摄,以获取图像信息。而目标场景是指待进行三维重建的场景,例如,自动驾驶的汽车在公路上行驶时,目标场景为该汽车的行驶环境场景,通过深度相机实时采集该汽车的行驶环境图像。在一实施例中,为了能够准确的对目标场景进行三维重建,要获取深度相机采集到的至少两帧图像进行处理,且获取的帧数越多,重建的目标场景模型就越准确。获取深度相机采集的图像的方法有很多,例如,可以是通过串口、网线等有线的方式进行获取,可以通过蓝牙、无线宽带等无线的方式进行获取。
在步骤S102中,根据至少两帧图像,确定深度相机对目标场景进行采集时的相对相机位姿。
其中,相机的位姿是指相机的位置和姿态,在一实施例中,位置代表相机的平移距离(如相机在X、Y、Z三个方向的平移变换),姿态代表相机的旋转角度(如相机在X、Y、Z三个方向上的角度变换α、β、γ)。
由于深度相机的视场角是固定的,拍摄的角度也是固定的,因此为了准确进行目标场景的三维重建,要改变深度相机的位姿,从不同的位置和角度进行拍摄,才能够精准的重建目标场景。因此,拍摄每帧图像时深度相机的相对位置和姿态都是不一样的,可以通过深度相机的相对位姿来表示,例如,深度相机可以按照一定的轨迹自动进行位置和姿态的变换,也可以是人工转动、移动深度相机进行拍摄。所以,要对采集每帧图像时的相对相机位姿进行确定,准确的将该帧图像重建到目标场景对应的位置。
在一实施例中,确定深度相机位姿的方法有很多,例如,可以通过在深度相机上安装测量平移距离和旋转角度的传感器,直接获取相机的位姿。由于深度相机在采集相邻两帧图像时相对位姿变化不大,为了更准确的获取相对相机位姿,可以通过对采集的图像进行处理,从而确定该相机采集该帧图像时的相对位姿。
在步骤S103中,针对每帧图像,采用至少两级嵌套筛选方式从每帧图像中确定至少一个特征体素,其中,每级筛选采用与每级对应的体素分块规则。
其中,本申请实施例在进行目标场景的三维重建时是将重建的目标场景分成一个个栅格状的体素块(图2为重建的目标场景的部分栅格状体素块),将其对应到每帧图像的相应位置可将每帧图像也分为一个个平面的体素格。由于深度相机采集到的图像中包含了对目标场景进行三维重建时的特征体素和非特征体素,例如,要进行汽车行驶环境场景重建时,图像中的行人,车辆等为特征体素,而远处的蓝天白云为非特征体素。因此,要对采集的每帧图像中的体素进行筛选,找到目标场景三维重建时的特征体素。特征体素可以由一个体素块构成,也可以是预设个数的体素块构成。
如果对每帧图像中的体素格一个个的进行是否是特征体素的判断,运算量较大,在一实施例中,可以通过体素分块规则采用至少两级嵌套筛选的方式从图像中确定至少一个特征体素。在一实施例中,体素分块规则可以是设置至少两级体素单位,将每级筛选对象按照该级对应的体素单位划分为至少两个该级体素单位对应的索引块,逐级进行索引块的筛选。
示例性的,结合图2以两级嵌套筛选的方式为例进行介绍,假设两级嵌套筛选对应的两级体素单位分别为20mm和5mm的体素单位,例如:
(1)将一帧图像对应的目标场景栅格体素按20mm体素单位划分为多个第一索引块(图 2中的立方体20即为20mm体素单位划分后的一个第一索引块)。
(2)对划分后的所有第一索引块进行一级筛选,判断其中是否包含特征体素,基于第一索引块(立方体20)中不包含特征体素的判断结果,将其移除,基于第一索引块(立方体20)中包含特征体素的判断结果,将其选为特征块。
(3)假设图2中的立方体20中包含特征体素,则对选出的特征块(立方体20)再按照5mm体素单位进行划分,每个特征块(立方体20)可以划分为4×4×4个第二索引块(图2中的立方体21即为5mm体素单位划分后的一个第二索引块)。
(4)对划分后的所有第二索引块(立方体21)进行二级筛选,判断其中是否包含特征体素,基于第二索引块(立方体21)中不包含特征体素的判断结果,将其移除,基于第二索引块(立方体21)中包含特征体素的判断结果,将其选为特征体素。
在为多级嵌套筛选的情况下,除第一次将整帧图像划分为多个索引块进行筛选,剩余的几级嵌套筛选均将上一次嵌套筛选出的包含特征体素的特征块所为下一级筛选时待划分的对象,按下一级体素单位划分为多个索引块,进行是否包含特征体素的判断,直到完成最后一级体素单位的嵌套筛选为止。例如,在进行三级嵌套筛选的情况下,执行完上述二级筛选操作后,由于还没有进行第三级体素单位的筛选,因此需要再将上述二级嵌套筛选第(4)步得出的包含特征体素的所有第二索引块(立方体21)作为第三级筛选时待划分的对象,按照第三级体素单位划分为多个索引块,再进行是否包含特征体素的判断。
在步骤S104中,依据每帧图像的相对相机位姿对每帧图像的至少一个特征体素进行融合计算,得到目标场景的栅格体素模型。
其中,步骤S103中确定出图像对应的至少一个特征体素之后,要得到目标场景的栅格体素模型,就要结合深度相机采集该帧图像时的相对相机位姿,对确定的至少一个特征体素进行融合计算得到目标场景的栅格体素模型。该栅格体素模型中的每一个体素中都存储有距离目标场景表面的距离以及表示观测不确定度的权值信息。
在一实施例中,本实施例中的栅格体素模型可以是TSDF模型,如图2所示,假设立方块21为多级嵌套筛选出的特征体素,按照公式
Figure PCTCN2019084820-appb-000001
对每帧图像中的每个特征体素进行融合计算,从而得到目标场景的TSDF模型。其中,tsdf avg为当前特征体素的融合结果,tsdf i-1为前一特征体素到目标场景表面的距离,w i-1为前一特征体素的权值信息,tsdf i为当前特征体素到目标场景表面的距离,w i为当前特征体素的权值信息。
在一实施例中,在步骤S103筛选特征体素时,为了提高筛选速率,筛选出的特征体素中可能包括预设个数的体素单位对应的体素块(如一个特征体素可以是由8×8×8个体素块构成的),此时在进行融合计算时可将每个特征体素中的体素块按照一定的个数进行融合计算,例如,可以是对特征体素中的8×8×8个体素块按照2×2×2个体素块作为一个融合对象(即一个体元)进行融合计算。
在一实施例中,可以并行同时对步骤S103中选出的特征体素进行融合计算,提高目标场景的栅格体素模型的融合速率。
在步骤S105中,生成栅格体素模型的等值面,得到目标场景的三维重建模型。
其中,步骤S104中得到的目标场景的栅格体素模型是特征体素到目标场景表面的距离模型,要得到目标场景的三维重建模型,还需要在栅格体素模型的基础上,生成等值面。例如, 可以利用移动立方体(Marching Cubes)算法,进行等值面生成(即生成表示模型表面的三角面片)、三线性插值进行颜色提取与添加以及法向量提取,进而得到目标场景的三维重建模型。
深度相机在进行目标场景的图像采集时,相邻两帧图像中大部分的场景是重合的,为了提高三维重建模型的生成速率,在一实施例中,生成栅格体素模型的等值面可以包括:响应于确定采集目标场景得到的当前帧图像为关键帧,生成当前关键帧对应的体素块的等值面,并对等值面添加颜色,得到目标场景的三维重建模型。
其中,关键帧是对深度相机采集到的两帧图像之间的特征点相似度进行判断处理后设置的。例如可以为连续的相似度高的几帧图像设置一个关键帧,在进行等值面生成时,只对关键帧进行处理,生成每个关键帧图像对应体素块的等值面,此时得到的模型没有颜色信息,不易识别出图像中多个对象。例如重建的目标场景为汽车行驶环境的场景,此时生成等值面的模型中行人、车辆、公路是一体的,无法区分哪部分是行人,哪部分是车辆,因此还要根据每帧图像中的颜色信息,为生成的等值面添加颜色,进而能够清楚的识别目标场景的三维重建模型中多个对象。
需要说明的是,三维重建过程是一个实时动态的过程,随着相机对图像的采集,实时确定采集每帧图像时的相对相机位姿,并针对相应图像进行特征体素的确定、栅格体素模型及其等值面的生成。
本实施例提供了一种基于深度相机的三维重建方法,通过获取深度相机采集的目标场景图像,确定深度相机在采集目标场景图像时的相对相机位姿,采用至少两级嵌套筛选方式确定每帧图像的特征体素,并进行融合计算得到目标场景的栅格体素模型,生成栅格体素模型的等值面,得到目标场景的三维重建模型。在融合计算阶段,采用至少两级嵌套筛选方式确定每帧图像的特征体素,无需逐个体素进行遍历,减少计算量,在保证重建精度的同时,极大地提升了融合速度,进而可以提升三维重建的效率。避免了对目标场景进行三维重建时运算量大的情况,实现了将三维重建应用于便携化的设备中,使得三维重建的应用更加广泛。
本实施例在上述实施例的基础上,对步骤S102中根据至少两帧图像确定深度相机对目标场景进行采集时的相对相机位姿进行了细化。图3为本申请实施例提供的确定深度相机对目标场景进行采集时的相对相机位姿的方法流程图,如图3所示,该方法包括步骤S301至步骤S305。
在步骤S301中,对每帧图像进行特征提取,得到每帧图像的至少一个特征点。
其中,对图像进行特征提取是为了找到该帧图像中一些具有标志性特征的像素点(即特征点)。例如,特征点可以是一帧图像中的角点、纹理、边缘处的像素点。对每帧图像进行特征提取可以采用快速特征点提取和描述(Oriented FAST and Rotated BRIEF,ORB)算法,找到该帧图像中的至少一个特征点。
在步骤S302中,对相邻两帧图像间的特征点进行匹配运算,得到相邻两帧图像间的特征点对应关系。
在对目标场景进行图像采集时,相邻两帧图像的大部分内容是一样的,因此两帧图像对应的特征点之间也存在着对应关系。在一实施例中,可以采用快速搜索方式(稀疏匹配算法)比较相邻两帧图像间的特征点之间的汉明距离,得到相邻两帧图像间的特征点对应关系。
在一实施例中,以相邻两帧图像间的一个特征点为例,假设两帧图像中表示同一个纹理特征的特征点X1,X2分别位于两帧图像的不同位置,以H(X1,X2)表示两个特征点X1, X2之间的汉明距离,对两特征点进行异或运算,并统计结果为1的个数,作为相邻两帧图像间的一个特征点的汉明距离(即特征点对应关系)。
在步骤S303中,移除特征点对应关系中的异常对应关系,通过包含剩余特征点二阶统计量的线性成分以及包含相对相机位姿的非线性成分,计算J(ξ) T J(ξ)中的非线性项
Figure PCTCN2019084820-appb-000002
对δ=-(J(ξ) TJ(ξ)) -1J(ξ) Tr(ξ)进行多次迭代计算,求解重投影误差小于预设误差阈值时的相对相机位姿。例如可使用高斯牛顿法进行迭代计算。例如可以计算重投影误差最小化时的位姿。
其中,r(ξ)表示包含所有重投影误差的向量,J(ξ)为r(ξ)的雅克比矩阵,ξ表示相对相机位姿的李代数,δ表示每次迭代时r(ξ)的增量值;R i表示采集第i帧图像时相机的旋转矩阵;R j表示采集第j帧图像时相机的旋转矩阵;
Figure PCTCN2019084820-appb-000003
表示第i帧图像上的第k个特征点;
Figure PCTCN2019084820-appb-000004
表示第j帧图像上的第k个特征点;C i,j表示第i帧图像与第j帧图像的特征点对应关系的集合;||C i,j||-1表示第i帧图像与第j帧图像的特征点对应关系的数量;[] ×表示向量积;||C i,j||表示取C i,j的范数。
在一实施例中,非线性项
Figure PCTCN2019084820-appb-000005
的表达式为:
Figure PCTCN2019084820-appb-000006
其中,
Figure PCTCN2019084820-appb-000007
表示线性成分;r il T和r jl表示非线性成分,r il T是旋转矩阵R i中的第l行,r jl是旋转矩阵R j中的第l行的转置,l=0,1,2(本实施例基于编程思想从0开始计数,即表示通常所说的矩阵第1行,依此类推)。
在一实施例中,步骤S302中得到的相邻两帧图像间的特征点对应关系中有一部分是异常对应关系。例如,相邻的两帧图像中,每帧图像中一定存在另一帧图像所没有的特征点,将它们进行步骤S302的匹配运算,就会出现异常的对应关系。在一实施例中,可以使用随机抽样一致(Random Sample Consensus,RANSAC)算法对异常对应关系进行移除处理,得到的剩余特征点对应关系可以表示为
Figure PCTCN2019084820-appb-000008
其中,
Figure PCTCN2019084820-appb-000009
表示第i帧图像与第j帧图像间第k个特征点之间的对应关系;j=i-1。
在相对相机位姿确定时,必然会产生一定的误差,因此确定相机位姿就是求解以下式为代价函数的两帧图像之间的非线性最小二乘问题:
Figure PCTCN2019084820-appb-000010
其中,E表示欧氏空间中第i帧图像相比于第j帧图像(本实施例中指上一帧图像)的重投影误差;T i表示相机采集第i帧图像时的位姿(根据前述对相机位姿的解释可知,实际是指采集第i帧图像相对于上一帧图像的位姿变化),T j表示相机采集第j帧图像时的位姿;N表示相机采集到的总帧数;
Figure PCTCN2019084820-appb-000011
表示第i帧图像上的第k个特征点
Figure PCTCN2019084820-appb-000012
的齐次坐标,
Figure PCTCN2019084820-appb-000013
表示第j帧图像上的第k个特征点
Figure PCTCN2019084820-appb-000014
的齐次坐标。需要说明的是,当i和k取值相同时,
Figure PCTCN2019084820-appb-000015
Figure PCTCN2019084820-appb-000016
表示同一个点,区别在于
Figure PCTCN2019084820-appb-000017
是本地坐标,
Figure PCTCN2019084820-appb-000018
是齐次坐标。
在一实施例中,在进行相对相机位姿确定时,为了加快运算速率,并不是对上式的代价函数进行直接计算,而是通过包含剩余特征点二阶统计量对应关系的线性成分以及包含相对相机位姿的非线性成分计算J(ξ) TJ(ξ)中的非线性项
Figure PCTCN2019084820-appb-000019
对δ=-(J(ξ) TJ(ξ))- 1J(ξ) Tr(ξ)进行多次迭代计算,求解重投影误差小于预设误差阈值时的相对相机位姿;由非线性项
Figure PCTCN2019084820-appb-000020
的表达式可知,在进行非线性项
Figure PCTCN2019084820-appb-000021
计算时,将两帧图像间固定的线性部分
Figure PCTCN2019084820-appb-000022
看成一个整体W来进行计算,不需要按照特征点对应关系的数量进行计算,降低了相对相机位姿确定算法的复杂度,增强了相对相机位姿计算的实时性。
下面对式(1)的推导过程进行说明,并结合推导过程分析降低算法复杂度的原理。
欧氏空间中相机采集第i帧图像时的相机位姿T i=[R i/t i],实际上T i是指相机采集第i帧图像时相对于采集第j帧图像(本实施例中指上一帧图像)时的位姿变换矩阵,包括旋转矩阵R i和平移矩阵t i。将欧氏空间中的刚性变换T i用SE3空间上的李代数ξ i来表示,即ξ i也表示相机采集第i帧图像时的相机位姿,T(ξ i)将李代数ξ i映射为欧氏空间中的T i
对于每个特征点对应关系
Figure PCTCN2019084820-appb-000023
其重投影误差为:
Figure PCTCN2019084820-appb-000024
式(1)中欧氏空间的重投影误差可表示为E(ξ)=||r(ξ)||,r(ξ)表示包含所有重投影误差的向量,即:
Figure PCTCN2019084820-appb-000025
Figure PCTCN2019084820-appb-000026
可以表示为(为表示简便,以下省去ξ i):
Figure PCTCN2019084820-appb-000027
其中,
Figure PCTCN2019084820-appb-000028
表示旋转矩阵R i中的第l行;t il表示平移向量t i中的第l个元素,l=0,1,2。
Figure PCTCN2019084820-appb-000029
其中,
Figure PCTCN2019084820-appb-000030
表示第i帧图像与第j帧图像间特征点对应关系相应的雅克比矩阵;m表示第m个特征点对应关系。
Figure PCTCN2019084820-appb-000031
Figure PCTCN2019084820-appb-000032
是一个6×6方阵,
Figure PCTCN2019084820-appb-000033
表示矩阵
Figure PCTCN2019084820-appb-000034
的转置,
Figure PCTCN2019084820-appb-000035
表达式如下:
Figure PCTCN2019084820-appb-000036
其中,I 3×3表示3×3的单位矩阵。根据式(6)和式(7),
Figure PCTCN2019084820-appb-000037
中四个非零的6×6子矩阵为:
Figure PCTCN2019084820-appb-000038
下面以
Figure PCTCN2019084820-appb-000039
为例进行说明,其他三个非零子矩阵也类似计算,不再赘述。
Figure PCTCN2019084820-appb-000040
其中,结合式(5)可以得到:
Figure PCTCN2019084820-appb-000041
Figure PCTCN2019084820-appb-000042
表示为W,结合式(5),则可将式(10)中的非线性项
Figure PCTCN2019084820-appb-000043
简化为式(1),该非线性项中的结构项
Figure PCTCN2019084820-appb-000044
被线性为W。虽然对结构项
Figure PCTCN2019084820-appb-000045
而言,
Figure PCTCN2019084820-appb-000046
是非线性的,但经过上述分析,
Figure PCTCN2019084820-appb-000047
中的所有非零元素与C i,j中结构项的二阶统计量成线性关系,结构项的二阶统计量为
Figure PCTCN2019084820-appb-000048
Figure PCTCN2019084820-appb-000049
也就是说,稀疏矩阵
Figure PCTCN2019084820-appb-000050
对C i,j中结构项的二阶统计量是元素线性的。
需要说明的是,每个对应关系
Figure PCTCN2019084820-appb-000051
的雅克比矩阵均由几何项ξ i,ξ j和结构项
Figure PCTCN2019084820-appb-000052
决定。对于同一帧对C i,j中的所有对应关系,其对应的雅可比矩阵共享相同的几何项,但具有不同的结构项。对于一个帧对C i,j,计算
Figure PCTCN2019084820-appb-000053
时,相关技术中的算法依赖于C i,j中特征点对应关系 的数量,而本实施例可以固定的复杂度高效计算
Figure PCTCN2019084820-appb-000054
只需计算结构项的二阶统计量W,而不需要每个对应关系都将相关的结构项去参与计算,即
Figure PCTCN2019084820-appb-000055
中四个非零子矩阵可以用复杂度O(1)代替复杂度O(||C i,j||)来计算。
因此,在δ=-(J(ξ) TJ(ξ))- 1J(ξ) Tr(ξ)的非线性高斯牛顿最优化的迭代步骤中需要的稀疏矩阵J TJ和J Tr可以复杂度O(M)高效计算,代替原来的计算复杂度O(N coor),N coor表示所有帧对的全部特征点对应关系的总数,M表示帧对的个数。一般的,O(N coor)在稀疏匹配中大约为300,而在稠密匹配中大约为10000,远大于帧对个数M。
经过上述推导,在相机位姿计算过程中,对于每个帧对,计算W,然后计算式(1)、(10、(9)、(8)和(6),求取
Figure PCTCN2019084820-appb-000056
进而可以通过迭代计算,求取r(ξ)最小时的ξ。
在步骤S304中,判断采集目标场景得到的当前帧图像是否为关键帧,基于当前帧图像是关键帧的判断结果,执行步骤S305,基于当前帧图像不是关键帧的判断结果,等待下一帧图像重新执行步骤S304。
其中,判断采集目标场景得到的当前帧图像是否为关键帧可以是:对采集目标场景得到的当前帧图像与上一关键帧图像进行匹配运算,得到两帧图像之间的转换关系矩阵;在转换关系矩阵大于或等于预设转换阈值的情况下,确定当前帧图像为当前关键帧。
在一实施例中,与S302中确定相邻两帧图像间特征点对应关系的方法类似,可以对当前帧图像与上一关键帧进行匹配运算,得到两帧图像之间的特征点对应关系矩阵,当该矩阵大于或等于预设转换阈值,则确定当前图像为当前关键帧。其中,两帧图像之间的转换关系矩阵可以是由两帧图像之间的特征点对应关系组成的矩阵。
需要说明的是,可以将采集目标场景得到的第一帧图像设置为第一个关键帧,预设转换阈值是根据深度相机采集图像时的运动情况提前设定的,例如,若相机拍摄相邻两帧图像时位姿变化较大,则预设转换阈值就设置大一些。
在步骤S305中,根据当前关键帧和历史关键帧进行回环检测;响应于确定回环成功,根据当前关键帧对已确定的相对相机位姿进行全局一致的优化更新。
其中,全局一致的优化更新是指在重建过程中,随着相机的运动,重建算法不断扩展目标场景的三维重建模型,而当深度相机运动到曾经到达的地方或与历史视角具有较大重叠时,扩展的三维重建模型和已生成的模型一致或一同优化更新为新的模型,而非产生交错、混叠等现象。回环检测则是依据深度相机当前观测判断该相机是否运动到曾经达到的地方或与历史视角具有较大重叠的地方,并以此优化减小累积误差。
为了提高优化速率,若前关键帧与历史关键帧回环检测成功(即深度相机运动到了曾经达到的地方或与历史视角具有较大重叠的地方),则通过当前关键帧与历史关键帧对已生成的模型进行全局一致的优化更新,减小三维重建模型的误差;响应于确定回环检测不成功,等待下一关键帧的出现,对下一关键帧进行回环检测。在一实施例中,将当前关键帧与历史关键帧进行回环检测可以是将当前关键帧与历史关键帧的特征点进行匹配运算,若匹配度高,则说明回环成功。
在一实施例中,进行相对相机位姿的全局一致的优化更新,即依据当前关键帧和匹配度高的一个或多个历史关键帧之间的对应关系,求解以
Figure PCTCN2019084820-appb-000057
为代价函数的当前关键帧与所有匹配度高的历史关键帧间的最小化转换误差问题。其中,E(T 1,T 2,···,T N-1|T i∈SE3,i∈[1,N-1])表示所有帧对(任意一个历史匹配关键帧与当前关键帧即为一个帧对)的转换误差;N为与当前关键帧匹配度高的历史关键帧的个数;E i,j表示第i帧与第j帧之间的转换误差,转换误差即为重投影误差。
在一实施例中,在进行相对相机位姿更新优化的过程中,需要保持非关键帧和其对应的关键帧的相对位姿不变,优化更新算法使用相关技术中的BA算法,也可以使用步骤S303中的方法,具体不再赘述。
本实施例提供的确定深度相机对目标场景进行采集时的相对相机位姿的方法,提取每帧图像的至少一个特征点,并对相邻两帧图像间的特征点进行匹配运算,得到相邻两帧图像间特征点对应关系,移出其中的异常对应关系,通过包含剩余特征点对应关系的线性成分以及包含相对相机位姿的非线性成分计算相对相机位姿,并进行关键帧的判断,若当前采集到的图像为关键帧且回环检测成功,则根据当前关键帧和历史关键帧对已确定的相对相机位姿进行全局一致的优化更新。在保证全局一致的同时,减少了三维重建时的运算量,实现了将三维重建应用于便携化的设备中,使得三维重建的应用更加广泛。
本实施例在上述实施例的基础上,对S103中针对每帧图像,采用至少两级嵌套筛选方式从每帧图像中确定至少一个特征体素进行了解释说明。下面结合图5的确定至少一个特征体素的平面示意图,对图4的从图像中确定至少一个特征体素的方法进行示意说明,该方法包括步骤S401至步骤S406。
在步骤S401中,针对每帧图像,将该图像作为当前级筛选对象,并确定当前级体素单位。
其中,体素单位代表了构建的三维重建模型的精度,是根据要求重建的目标场景三维重建模型的精度提前设定的。例如,可以是5mm、10mm等。由于本实施例是采用至少两级嵌套筛选方式从每帧图像中确定至少一个特征体素,因此,会设置至少两级体素单位,其中最小级体素单位即为要求重建模型的精度。首先要将采集到的图像作为当前筛选对象,进行特征体素的筛选,此时的当前体素单位是预设的多级体素单位中最大级的体素单位。
示例性的,如图5所示,假设要实现基于CPU的100Hz帧率、5mm体素级精度模型的实时三维重建,且分别以20mm的体素单位和5mm的体素单位进行两级嵌套筛选特征体素。此时要以采集到的图像作为当前筛选对象,且当前级体素单位为20mm的体素单位。
在步骤S402中,将当前级筛选对象按照当前级体素单位划分为体素块,根据体素块确定至少一个当前索引块;其中,当前索引块包含预设个数的体素块。
其中,为了提高筛选速率,在对当前级筛选对象进行筛选时,可以根据当前体素单位划分的体素块按预设个数确定至少一个索引块,按照索引块进行特征体素的筛选,该方法与直接按照当前级体素单位划分的体素块进行筛选相比,提高了筛选的速率。需要说明的是此时的特征体素大小并不是一个体素块的大小,而是预设个数的体素块大小。
示例性的,如图5所示,假设当前索引块是由预设个数的8×8×8个体素块组成,将采集到的图像按照20mm的体素单位划分为多个边长为20mm的体素块,再将划分后的多个边长为20mm的体素块按照8×8×8的个数分成至少一个20mm体素单位对应的边长为160mm的索引块,映射到平面示意图中则是按照8×8的方框将整幅图像分为20mm体素单位对应的6 个边长为160mm的索引块。
在步骤S403中,在所有当前索引块中选取至少一个特征块,至少一个特征块到目标场景表面的距离小于当前级体素单位对应距离阈值。
其中,计算S402中确定的所有当前索引块到目标场景表面的距离,距离越小,说明该索引块距离目标场景表面的距离越近,每级体素单位都预先设定一个距离阈值,当索引块到目标场景表面的距离小于当前级体素单位对应的距离阈值时,则将该索引块选为特征块。其中上一级体素单位对应的距离阈值大于下一级体素单位对应的距离阈值。
在一实施例中,在所有当前索引块中选取至少一个特征块,至少一个特征块到目标场景表面的距离小于当前级体素单位对应距离阈值,可以是:针对每个当前索引块,按照当前索引块的哈希值访问索引块,依据采集每帧图像时的相对相机位姿及深度相机获取的图像深度值,分别计算当前索引块全部顶点到目标场景表面的距离;选取全部顶点到所述目标场景表面的距离均小于当前级体素单位对应距离阈值的当前索引块作为特征块。
在一实施例中,可以为每个当前索引块设置一个哈希值,通过哈希值来访问每个索引块,每个索引块具有多个顶点。按照公式sdf=||ξ-S||-D(u,v)计算位于当前索引块每个顶点的体素块到目标场景表面的距离,其中,sdf表示体素块(索引块的每个顶点处的体素块)到目标场景表面的距离;ξ表示采集该帧图像时的相对相机位姿;S表示该体素块在重建空间的栅格体素体素模型中的坐标;D(u,v)表示该体素块在深度相机获取图像中对应的深度值。当该索引块全部顶点到目标场景表面距离均小于当前级体素单位对应的距离阈值时,将该索引块设置为特征块;若大于或等于当前级体素单位对应的距离时,则将该索引块移除。在一实施例中,也可以计算该索引块全部顶点到目标场景表面距离的平均值,若平均值小于当前体素单位对应的距离阈值时,将该索引块设置为特征块。示例性的,如图5所示,图中边长为160mm的斜线方格为20mm体素单位划分的待移除索引块,即该部分索引块到目标场景表面距离大于20mm体素单位对应的距离阈值。
在步骤S404中,判断特征块是否满足最小级体素单位的划分条件,基于特征块满足最小级体素单位的划分条件的判断结果,执行步骤S405,基于特征块不满足最小级体素单位的划分条件的判断结果,执行步骤S406。
其中,判断特征块是否满足最小级体素单位的划分条件,即判断步骤S403中选出的特征块是否是预设的最小级体素单位划分后选取的特征块。示例性的,如图5所示,若步骤S403中选取的特征块是20mm体素单位划分的边长为160mm的特征块,而最小级体素单位为5mm的体素单位,则说明步骤S403中选取的特征块不满足最小级5mm体素单位的划分条件,执行步骤S406,进行下一级5mm体素单位的筛选;若步骤S403中选取的特征块是5mm体素单位划分的边长为40mm的特征块,则说明步骤S403中选取的特征块满足最小级5mm体素单位的划分条件,执行步骤S405将该特征块作为特征体素。
在步骤S405中,将该特征块作为特征体素。
在步骤S406中,将当前级筛选对象确定的所有特征块替换为新的当前级筛选对象,并选择下一级体素单位替换为新的当前级体素单位,返回执行步骤S402。
其中,当步骤S403中选取特征块不满足最小级体素单位的划分条件时,则将步骤S403中选出的所有特征块作为新的当前级筛选对象,选择下一级体素单位作为当前级体素单位,返回执行步骤S402,再次进行特征块的筛选。
示例性的,如图5所示,若判断出步骤S403选取的特征块是20mm体素单位划分的边长为160mm的特征块,并不是最小级5mm体素单位划分的边长为40mm的特征块,此时将20mm体素单位划分的边长为160mm的所有特征块作为当前级筛选对象,选择下一级5mm体素单位作为当前级体素单位,返回执行步骤S402,将步骤S403筛选出的边长为160mm的所有特征块按照5mm的体素单位划分为多个边长为5mm的体素块,再将划分后的多个边长为5mm的体素块按照8×8×8的个数分成至少一个5mm体素单位对应的边长为40mm的索引块,映射到平面示意图中则是按照8×8的方框将整幅图像分为5mm体素单位对应的32个边长为40mm的索引块,然后再执行步骤S403和步骤S404,此时,得到的边长为40mm的特征块(如图中边长为40mm对应的空白方格)为最小级5mm体素单位划分后选取的特征块,即该特征块为选定的特征体素,而图5中边长为40mm的点状方格为5mm体素单位划分的待移除索引块。
本实施例提供的从图像中确定至少一个特征体素的方法,通过针对每帧图像,采用至少两级嵌套筛选方式从每帧图像中确定至少一个特征体素。避免了对目标场景进行三维重建时运算量大的情况,实现了将三维重建应用于便携化的设备中,使得三维重建的应用更加广泛。
本实施例在上述实施例的基础上,提供了一种基于深度相机的三维重建的示例实施例,如图6所示,该方法包括步骤S601至步骤S6011。
在步骤S601中,获取深度相机对目标场景进行采集得到的至少两帧图像。
在步骤S602中,根据至少两帧图像,确定深度相机对目标场景进行采集时的相对相机位姿。
在步骤S603中,判断采集目标场景得到的当前帧图像是否为关键帧,基于当前帧图像是关键帧的判断结果,存储该关键帧并执行步骤S604,基于当前帧图像不是关键帧的判断结果,等待下一帧图像重新执行步骤S603。
其中,对于相机采集的每帧图像,都可以判断该帧图像是否为关键帧,并存储判断出的关键帧,以按照关键帧率生成等值面以及作为历史关键帧在后续回环优化中使用。需要说明的是,相机采集的第一帧默认作为关键帧。
在步骤S604中,根据当前关键帧和历史关键帧进行回环检测,响应于确定回环成功,执行步骤S608(以进行栅格体素模型和等值面的优化更新)和步骤S6011(以进行相对相机位姿的优化更新)。
在步骤S605中,针对每帧图像,采用至少两级嵌套筛选方式从每帧图像中确定至少一个特征体素,其中,每级筛选采用与每级筛选对应的体素分块规则。
在步骤S606中,依据每帧图像的相对相机位姿对每帧图像的至少一个特征体素进行融合计算,得到目标场景的栅格体素模型。
在步骤S607中,生成栅格体素模型的等值面,得到目标场景的三维重建模型。
在步骤S608中,在历史关键帧中选取与当前关键帧匹配的第一预设个数的匹配关键帧,并分别在选取的每个匹配关键帧对应的非关键帧中获取第二预设个数的非关键帧。
其中,为了实现模型重建的全局一致性,在采集的当前帧图像为关键帧的情况下,要在历史关键帧中选取与当前关键帧匹配的第一预设个数的匹配关键帧,在一实施例中,可以对当前关键帧与历史关键帧进行匹配运算,例如,可以计算当前关键帧与历史关键帧之间特征点间的汉明距离来完成当前关键帧与历史关键帧间的匹配。选取与当前关键帧匹配度高的第 一预设个数的历史关键帧,例如,选择与当前关键帧匹配度高的10个历史关键帧。每个关键帧都有与其对应的非关键帧,对每一个选出的匹配度高的历史关键帧,还要在其对应的非关键帧中选出第二预设个数的非关键帧,在一实施例中,可以在该历史关键帧对应的所有非关键帧中平均、分散地选取最多不超过11个的非关键帧,以提高优化更新效率的同时使优化帧选取更具有代表性。第一预设个数和第二预设个数可以是根据更新三维重建模型时的需要提前设定的。
在步骤S609中,根据当前关键帧与每个匹配关键帧的对应关系以及获取的非关键帧对三维重建模型的栅格体素模型进行优化更新。
其中,对三维重建模型的栅格体素模型进行优化更新分为对特征体素的更新以及对目标场景的栅格体素模型的更新。
在一实施例中,在进行特征体素的更新时,考虑到深度相机采集相邻两帧图像时的视角重叠过大,导致相邻两帧图像选取的特征体素几乎一致,且对每帧图像都进行一次特征体素的优化更新耗时较长,因此在更新特征体素时只对匹配的历史关键帧重新执行步骤S605完成特征体素的优化更新。
由于步骤S606生成目标场景的栅格体素模型是对每一帧图像进行处理后生成的,因此在进行目标场景的栅格体素模型的更新时,对匹配度高的历史关键帧及其对应的非关键帧都要进行优化更新,即在每一个关键帧到来之时,对步骤S608中选取的与当前关键帧匹配度高的第一预设个数的历史关键帧及每个历史关键帧对应的第二预设个数的非关键帧,去除对应融合数据,重新执行步骤S606进行融合计算,完成对目标场景的栅格体素模型的优化更新。
其中,无论是初始得到目标场景的栅格体素模型时的融合计算,还是栅格体素模型优化更新阶段的融合计算,可以将一个体素块作为一个融合对象进行融合计算。为了提高融合效率,也可将预设个数的体素块作为一个融合对象进行融合计算,例如大小为2×2×2个体素块的体元。
在步骤S610中,根据当前关键帧与每个匹配关键帧的对应关系对三维重建模型的等值面进行优化更新。
由于步骤S607仅对关键帧生成栅格体素模型的等值面,因此在进行等值面更新时,可以是只对步骤S608中选取的与当前关键帧匹配度高的历史关键重新执行步骤S607进行匹配关键帧的等值面的更新。
为了加快模型更新优化速度,对三维重建模型的等值面进行优化更新可以是:针对每个匹配关键帧,在所述当前关键帧对应的多个体素块中,选取至少一个体素块,所述至少一个体素块到目标场景表面的距离小于或等于所述匹配关键帧中对应体元的更新阈值;依据选取的至少一个体素块对每个匹配关键帧的等值面进行优化更新。
其中,更新阈值可以是在步骤S607生成栅格体素模型的等值面的同时,针对生成等值面所使用的关键帧中的每个体元,选取该体元中体素块到目标场景表面的距离的最大值,将所述最大值设置为该体元的更新阈值。也就是说,生成等值面所使用的关键帧中每个体元都设置有对应的更新阈值。
在一实施例中,可以计算当前关键帧的所有体素块到目标场景表面的距离,然后针对每个匹配关键帧,根据当前关键帧与该匹配关键帧的对应关系,确定该两帧图像的体元对应关系。按照体元对应关系在该匹配关键帧中找到与当前关键帧中当前体元对应的体元,以确定 对应的更新阈值,然后在当前体元的多个体素块中选取至少一个体素块,所述至少一个体素块到目标场景表面的距离小于或等于该更新阈值。由此逐个对当前关键帧中每个体元执行如上选取操作,完成了体素块的过滤,根据选取的体素块进行等值面优化更新,得到等值面的过程与步骤S607类似,不再赘述。而距离大于更新阈值的体素块为需忽略的体素块,不对其进行任何操作。由此过滤了部分体素块,能提高计算速度。
在一实施例中,为了避免访问一个体素块就要在哈希表中搜索一次哈希值,可以在访问体素块时一并在哈希表中搜索相邻的多个体素块的哈希值进行处理。
在步骤S6011中,根据当前关键帧对已确定的相对相机位姿进行全局一致的优化更新。更新相对相机位姿,以便于更新对应的栅格体素模型时使用。
为了保证三维重建的实时性,可以在步骤S601进行目标场景图像采集的同时,实时对每帧图像进步骤行S602相对相机位姿的确定以及步骤S603关键帧的判断,即一边采集图像一边进行位姿的计算及关键帧判断。且步骤S605到步骤S607生成目标场景的三维重建模型的过程与步骤S608到步骤S610对生成的三维重建模型进行更新的过程也是同时进行的,即在生成三维重建模型的过程中完成对已建部分模型的优化更新。
本实施例提供了一种基于深度相机的三维重建方法,通过获取深度相机采集的目标场景图像,确定深度相机在采集目标场景图像时的相对相机位姿,采用至少两级嵌套筛选方式确定每帧图像的特征体素,并进行融合计算得到目标场景的栅格体素模型,生成栅格体素模型的等值面,得到目标场景的三维重建模型,并根据当前关键帧、多个匹配关键帧以及多个匹配关键帧的非关键帧对目标场景的三维重建模型进行优化更新,保证模型的全局一致性。避免了对目标场景进行三维重建时运算量大的情况,实现了将三维重建应用于便携化的设备中,使得三维重建的应用更加广泛。
图7为本申请实施例提供的一种基于深度相机的三维重建装置的结构框图,该装置可执行本申请任意实施例所提供的基于深度相机的三维重建方法,具备执行方法相应的功能模块。该装置可以基于CPU实现。如图7所示,该装置包括图像获取模块701,位姿确定模块702,体素确定模块703,模型生成模块704以及三维重建模块705。
图像获取模块701,设置为获取深度相机对目标场景进行采集得到的至少两帧图像。
位姿确定模块702,设置为根据至少两帧图像,确定深度相机对目标场景进行采集时的相对相机位姿。
体素确定模块703,设置为针对每帧图像,采用至少两级嵌套筛选方式从每帧图像中确定至少一个特征体素,其中,每级筛选采用与每级筛选对应的体素分块规则。
模型生成模块704,设置为依据每帧图像的相对相机位姿对每帧图像的至少一个特征体素进行融合计算,得到目标场景的栅格体素模型。
三维重建模块705,设置为生成栅格体素模型的等值面,得到目标场景的三维重建模型。
在一实施例中,三维重建模块705,设置为响应于确定采集目标场景得到的当前帧图像为关键帧,生成当前关键帧对应的体素块的等值面,并对等值面添加颜色,得到目标场景的三维重建模型。
本实施例提供了一种基于深度相机的三维重建装置,通过获取深度相机采集的目标场景图像,确定深度相机在采集目标场景图像时的相机位姿,采用至少两级嵌套筛选方式确定每帧图像的特征体素,并进行融合计算得到目标场景的栅格体素模型,生成栅格体素模型的等 值面,得到目标场景的三维重建模型。避免了对目标场景进行三维重建时运算量大的情况,实现了将三维重建应用于便携化的设备中,使得三维重建的应用更加广泛。
在一实施例中,上述位姿确定模块702包括特征点提取单元,匹配运算单元和位姿确定单元。
特征点提取单元,设置为对每帧图像进行特征提取,得到每帧图像的至少一个特征点。
匹配运算单元,设置为对相邻两帧图像间的特征点进行匹配运算,得到相邻两帧图像间的特征点对应关系。
位姿确定单元,设置为移除特征点对应关系中的异常对应关系,通过包含剩余特征点二阶统计量的线性成分以及包含相对相机位姿的非线性成分计算J(ξ) TJ(ξ)中的非线性项
Figure PCTCN2019084820-appb-000058
对δ=-(J(ξ) TJ(ξ)) -1J(ξ) Tr(ξ)进行多次迭代计算,求解重投影误差小于预设误差阈值时的相对相机位姿。
其中,r(ξ)表示包含所有重投影误差的向量,J(ξ)为r(ξ)的雅克比矩阵,ξ表示相对相机位姿的李代数,δ表示每次迭代时r(ξ)的增量值;R i表示采集第i帧图像时相机的旋转矩阵;R j表示采集第j帧图像时相机的旋转矩阵;
Figure PCTCN2019084820-appb-000059
表示第i帧图像上的第k个特征点;
Figure PCTCN2019084820-appb-000060
表示第j帧图像上的第k个特征点;C i,j表示第i帧图像与第j帧图像的特征点对应关系的集合;||C i,j||-1表示第i帧图像与第j帧图像的特征点对应关系的数量;[] ×表示向量积;||C i,j||表示取C i,j的范数。
在一实施例中,非线性项
Figure PCTCN2019084820-appb-000061
的表达式为:
Figure PCTCN2019084820-appb-000062
其中,
Figure PCTCN2019084820-appb-000063
表示线性成分;r il T和r jl表示非线性成分,r il T是旋转矩阵R i中的第l行,r jl是旋转矩阵R j中的第l行的转置,l=0,1,2。
在一实施例中,上述装置还包括关键帧确定模块,回环检测模块以及位姿更新模块。
关键帧确定模块,设置为对采集目标场景得到的当前帧图像与上一关键帧图像进行匹配运算,得到两帧图像之间的转换关系矩阵;若转换关系矩阵大于或等于预设转换阈值,则确定当前帧图像为当前关键帧。
回环检测模块,设置为响应于确定采集目标场景得到的当前帧图像为关键帧,根据当前关键帧和历史关键帧进行回环检测。
位姿更新模块,设置为响应于确定回环成功,根据当前关键帧对已确定的相对相机位姿进行全局一致的优化更新。
在一实施例中,上述体素确定模块703包括初始确定单元,索引块确定单元,特征块选取单元,特征体素确定单元以及循环单元。
初始确定单元,设置为针对每帧图像,将图像作为当前级筛选对象,并确定当前级体素单位。
索引块确定单元,设置为将当前级筛选对象按照当前级体素单位划分为体素块,根据体素块确定至少一个当前索引块;其中,当前索引块包含预设个数的体素块。
特征块选取单元,设置为在所有当前索引块中选取至少一个特征块,所述至少一个特征块到目标场景表面的距离小于当前级体素单位对应距离阈值的。
特征体素确定单元,设置为如果特征块满足最小级体素单位的划分条件,则将特征块作为特征体素。
循环单元,设置为如果特征块不满足最小级体素单位的划分条件,则将当前级筛选对象确定的所有特征块替换为新的当前级筛选对象,并选择下一级体素单位替换为新的当前级体素单位,返回执行针对当前级筛选对象的体素块划分操作;其中,体素单位逐级减小至最小级体素单位。
在一实施例中,上述特征块选取单元,设置为:针对每个当前索引块,按照当前索引块的哈希值访问索引块,依据采集每帧图像时的相对相机位姿及深度相机获取的图像深度值,分别计算当前索引块全部顶点到目标场景表面的距离;选取全部顶点到所述目标场景表面的距离均小于当前级体素单位对应距离阈值的当前索引块作为特征块。
在一实施例中,上述装置还包括匹配帧确定模块,模型更新模块以及等值面更新模块。
匹配帧确定模块,设置为响应于确定采集目标场景得到的当前帧图像为关键帧,在历史关键帧中选取与当前关键帧匹配的第一预设个数的匹配关键帧,并分别在选取的每个匹配关键帧对应的非关键帧中获取第二预设个数的非关键帧。
模型更新模块,设置为根据当前关键帧与每个匹配关键帧的对应关系以及获取的非关键帧对三维重建模型的栅格体素模型进行优化更新。
等值面更新模块,设置为根据当前关键帧与每个匹配关键帧的对应关系对三维重建模型的等值面进行优化更新。
在一实施例中,等值面更新模块,设置为针对每个匹配关键帧,在当前关键帧对应的多个体素块中选取至少一个体素块,至少一个体素块到目标场景表面的距离小于或等于匹配关键帧中对应体元的更新阈值;依据选取的至少一个体素块对匹配关键帧的等值面进行优化更新。
其中,三维重建模块705在生成当前关键帧图像对应的体素块的等值面的同时,还设置为针对生成等值面所使用的关键帧中的每个体元,选取该体元中所有体素块到所述目标场景表面的距离的最大值,将所述最大值设置为该体元的更新阈值。
图8为本申请实施例提供的一种电子设备的结构示意图,如图8所示,该电子设备包括存储装置80、一个或多个处理器81和至少一个深度相机82;电子设备的存储装置80、处理器81和深度相机82可以通过总线或其他方式连接,图8中以通过总线连接为例。
存储装置80作为一种计算机可读存储介质,可设置为存储软件程序、计算机可执行程序以及模块,如本申请实施例中的基于深度相机的三维重建装置对应的模块(例如,设置为基于深度相机的三维重建装置中的图像获取模块701)。处理器81通过处理存储在存储装置80中的软件程序、指令以及模块,从而执行电子设备设备的多种功能应用以及数据处理,即实现上述的基于深度相机的三维重建方法。在一实施例中,处理器81可以为中央处理器或高性能的图形处理器。
存储装置80可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、 至少一个功能所需的应用程序;存储数据区可存储根据终端的使用所创建的数据等。此外,存储装置80可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。在一些实例中,存储装置80可包括相对于处理器81远程设置的存储装置,这些远程存储装置可以通过网络连接至设备。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
深度相机82可设置为在处理器81的控制下对目标场景进行图像采集。该深度相机可嵌入式安装在电子设备中,在一实施例中,该电子设备可以是便携式移动电子设备,例如,该电子设备可以是智能终端(手机、平板电脑)或三维视觉交互设备(虚拟现实(Virtual Reality,VR)眼镜、可戴式头盔),可以进行移动、旋转等操作下的图像拍摄。
本实施例提供的一种电子设备可设置为执行上述任意实施例提供的基于深度相机的三维重建方法,具备相应的功能。
本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时可实现上述实施例的基于深度相机的三维重建方法。
本申请实施例的计算机存储介质,可以采用一个或多个计算机可读的介质的任意组合。计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质。计算机可读存储介质例如可以是但不限于:电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机存取存储器(Random Access Memory,RAM)、只读存储器(Read Only Memory,ROM)、可擦式可编程只读存储器(Erasable Programmable Read Only Memory,EPROM)或闪存、光纤、便携式紧凑磁盘只读存储器(Compact Disc Read-Only Memory,CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本文件中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。
计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。
计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:无线、电线、光缆、射频(Radio Frequency,RF)等等,或者上述的任意合适的组合。
可以以一种或多种程序设计语言或其组合来编写用于执行本申请操作的计算机程序代码,所述程序设计语言包括面向对象的程序设计语言,诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或电子设备上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络,包括局域网(Local Area Network,LAN)或广域网(Wide Area Network,WAN),连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
综上所述,本申请实施例提供的基于深度相机的三维重建方案,在融合计算阶段,采用 由粗到细的嵌套筛选策略及稀疏采样的思想进行特征体素的选取,在保证重建精度的同时,极大地提升了融合速度;以关键帧率进行等值面的生成,能够提升等值面的生成速度;提升了三维重建效率。另外,通过优化更新阶段能够有效保证三维重建的全局一致性。
上述实施例序号仅仅为了描述,不代表实施例的优劣。
本领域普通技术人员应该明白,上述的本申请实施例的各模块或各操作可以用通用的计算装置来实现,它们可以集中在单个计算装置上,或者分布在多个计算装置所组成的网络上,例如,他们可以用计算机装置可执行的程序代码来实现,从而可以将它们存储在存储装置中由计算装置来执行,或者将它们分别制作成集成电路模块,或者将它们中的多个模块或操作制作成单个集成电路模块来实现。这样,本申请不限制于任何特定的硬件和软件的结合。
本说明书中的实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,实施例之间的相同或相似的部分互相参见即可。

Claims (15)

  1. 一种基于深度相机的三维重建方法,包括:
    获取深度相机对目标场景进行采集得到的至少两帧图像;
    根据所述至少两帧图像,确定所述深度相机对目标场景进行采集时的相对相机位姿;
    针对每帧图像,采用至少两级嵌套筛选方式从每帧图像中确定至少一个特征体素,其中,每级筛选采用与每级筛选对应的体素分块规则;
    依据每帧图像的相对相机位姿对每帧图像的至少一个特征体素进行融合计算,得到目标场景的栅格体素模型;
    生成所述栅格体素模型的等值面,得到所述目标场景的三维重建模型。
  2. 根据权利要求1所述的方法,其中,根据所述至少两帧图像,确定所述深度相机对目标场景进行采集时的相对相机位姿,包括:
    对每帧图像进行特征提取,得到每帧图像的至少一个特征点;
    对相邻两帧图像间的特征点进行匹配运算,得到所述相邻两帧图像间的特征点对应关系;
    移除所述特征点对应关系中的异常对应关系,通过包含剩余特征点二阶统计量的线性成分以及包含相对相机位姿的非线性成分,计算J(ξ) TJ(ξ)中的非线性项
    Figure PCTCN2019084820-appb-100001
    对δ=-(J(ξ) TJ(ξ))- 1J(ξ) Tr(ξ)进行多次迭代计算,求解重投影误差小于预设误差阈值时的相对相机位姿;
    其中,r(ξ)表示包含所有重投影误差的向量,J(ξ)为r(ξ)的雅克比矩阵,ξ表示相对相机位姿的李代数,δ表示每次迭代时r(ξ)的增量值;R i表示采集第i帧图像时相机的旋转矩阵;R j表示采集第j帧图像时相机的旋转矩阵;
    Figure PCTCN2019084820-appb-100002
    表示第i帧图像上的第k个特征点;
    Figure PCTCN2019084820-appb-100003
    表示第j帧图像上的第k个特征点;C i,j表示第i帧图像与第j帧图像的特征点对应关系的集合;||C i,j||-1表示第i帧图像与第j帧图像的特征点对应关系的数量;[] ×表示向量积;||C i,j||表示取C i,j的范数。
  3. 根据权利要求2所述的方法,其中,所述非线性项
    Figure PCTCN2019084820-appb-100004
    的表达式为:
    Figure PCTCN2019084820-appb-100005
    其中,
    Figure PCTCN2019084820-appb-100006
    表示线性成分;r il T和r jl表示非线性成分,r il T是旋转矩阵R i中的第l行,r jl是旋转矩阵R j中的第l行的转置,l=0,1,2。
  4. 根据权利要求1或2所述的方法,根据所述至少两帧图像,确定所述深度相机对目标场景进行采集时的相对相机位姿之后,还包括:
    响应于确定采集所述目标场景得到的当前帧图像为当前关键帧,根据当前关键帧和历史关键帧进行回环检测;
    响应于确定回环成功,根据所述当前关键帧对已确定的相对相机位姿进行全局一致的优化更新。
  5. 根据权利要求4所述的方法,在根据当前关键帧和历史关键帧进行回环检测之前,还包括:
    对采集所述目标场景得到的当前帧图像与上一关键帧图像进行匹配运算,得到两帧图像之间的转换关系矩阵;
    响应于确定所述转换关系矩阵大于或等于预设转换阈值,确定所述当前帧图像为所述当前关键帧。
  6. 根据权利要求1所述的方法,其中,针对每帧图像,采用至少两级嵌套筛选方式从每帧图像中确定至少一个特征体素,包括:
    针对每帧图像,将每帧图像作为当前级筛选对象,并确定当前级体素单位;
    将所述当前级筛选对象按照当前级体素单位划分为体素块,根据体素块确定至少一个当前索引块;其中,所述当前索引块包含预设个数的体素块;
    在所有当前索引块中选取至少一个特征块,所述至少一个特征块到目标场景表面的距离小于所述当前级体素单位对应距离阈值;
    在所述特征块满足最小级体素单位的划分条件的情况下,将所述特征块作为特征体素;
    在所述特征块不满足最小级体素单位的划分条件的情况下,将当前级筛选对象确定的所有特征块替换为新的当前级筛选对象,并选择下一级体素单位替换为新的当前级体素单位,返回执行针对当前级筛选对象的体素块划分操作;
    其中,体素单位逐级减小至最小级体素单位。
  7. 根据权利要求6所述的方法,其中,在所有当前索引块中选取至少一个特征块,所述至少一个特征块到目标场景表面的距离小于所述当前级体素单位对应距离阈值,包括:
    针对每个当前索引块,所述每个当前索引块具有多个顶点,按照当前索引块的哈希值访问索引块,依据采集每帧图像时的相对相机位姿及深度相机获取的图像深度值,分别计算所述当前索引块的每个顶点到所述目标场景表面的距离;
    选取每个顶点到所述目标场景表面的距离均小于所述当前级体素单位对应距离阈值的当前索引块,作为特征块。
  8. 根据权利要求1所述的方法,其中,生成所述栅格体素模型的等值面,得到所述目标场景的三维重建模型,包括:
    响应于确定采集所述目标场景得到的当前帧图像为关键帧,生成当前关键帧对应的体素块的等值面,并对所述等值面添加颜色,得到所述目标场景的三维重建模型。
  9. 根据权利要求1所述的方法,在生成所述栅格体素模型的等值面,得到所述目标场景的三维重建模型之后,还包括:
    响应于确定采集所述目标场景得到的当前帧图像为当前关键帧,在历史关键帧中选取与当前关键帧匹配的第一预设个数的匹配关键帧,并分别在选取的每个匹配关键帧对应的非关键帧中获取第二预设个数的非关键帧;
    根据所述当前关键帧与所述每个匹配关键帧的对应关系,以及获取的非关键帧,对所述三维重建模型的栅格体素模型进行优化更新;
    根据所述当前关键帧与所述每个匹配关键帧的对应关系,对所述三维重建模型的等值面进行优化更新。
  10. 根据权利要求9所述的方法,其中,根据所述当前关键帧与所述每个匹配关键帧的对应关系,对所述三维重建模型的等值面进行优化更新,包括:
    针对每个匹配关键帧,在所述当前关键帧对应的多个体素块中,选取至少一个体素块,所述至少一个体素块到所述目标场景表面的距离小于或等于所述匹配关键帧中对应体元的更新阈值;
    依据选取的所述至少一个体素块对所述匹配关键帧的等值面进行优化更新。
  11. 根据权利要求10所述的方法,其中,生成所述栅格体素模型的等值面,包括:
    针对生成等值面所使用的关键帧中的每个体元,选取该体元中所有体素块到所述目标场景表面的距离的最大值,将所述最大值设置为该体元的更新阈值。
  12. 一种基于深度相机的三维重建装置,包括:
    图像获取模块,设置为获取深度相机对目标场景进行采集得到的至少两帧图像;
    位姿确定模块,设置为根据所述至少两帧图像,确定所述深度相机对目标场景进行采集时的相对相机位姿;
    体素确定模块,设置为针对每帧图像,采用至少两级嵌套筛选方式从每帧图像中确定至少一个特征体素,其中,每级筛选采用与每级筛选对应的体素分块规则;
    模型生成模块,设置为依据每帧图像的相对相机位姿对每帧图像的至少一个特征体素进行融合计算,得到目标场景的栅格体素模型;
    三维重建模块,设置为生成所述栅格体素模型的等值面,得到所述目标场景的三维重建模型。
  13. 一种电子设备,包括:
    至少一个处理器;
    存储装置,设置为存储至少一个程序;
    至少一个深度相机,设置为对目标场景进行图像采集;
    当所述至少一个程序被所述至少一个处理器执行,使得所述至少一个处理器实现如权利要求1-11中任一所述的基于深度相机的三维重建方法。
  14. 根据权利要求13所述的设备,所述至少一个处理器为中央处理器;所述电子设备为便携式移动电子设备。
  15. 一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如权利要求1-11中任一所述的基于深度相机的三维重建方法。
PCT/CN2019/084820 2018-03-05 2019-04-28 基于深度相机的三维重建方法、装置、设备及存储介质 WO2019170164A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/977,899 US20210110599A1 (en) 2018-03-05 2019-04-28 Depth camera-based three-dimensional reconstruction method and apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810179264.6A CN108537876B (zh) 2018-03-05 2018-03-05 三维重建方法、装置、设备及存储介质
CN201810179264.6 2018-03-05

Publications (1)

Publication Number Publication Date
WO2019170164A1 true WO2019170164A1 (zh) 2019-09-12

Family

ID=63486699

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/084820 WO2019170164A1 (zh) 2018-03-05 2019-04-28 基于深度相机的三维重建方法、装置、设备及存储介质

Country Status (3)

Country Link
US (1) US20210110599A1 (zh)
CN (1) CN108537876B (zh)
WO (1) WO2019170164A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627061A (zh) * 2020-06-03 2020-09-04 贝壳技术有限公司 位姿检测方法、装置以及电子设备、存储介质
CN112446951A (zh) * 2020-11-06 2021-03-05 杭州易现先进科技有限公司 三维重建方法、装置、电子设备及计算机存储介质
CN113284176A (zh) * 2021-06-04 2021-08-20 深圳积木易搭科技技术有限公司 一种结合几何和纹理的在线匹配优化方法和三维扫描系统
CN113409444A (zh) * 2021-05-21 2021-09-17 北京达佳互联信息技术有限公司 三维重建方法、装置、电子设备及存储介质
CN113470180A (zh) * 2021-05-25 2021-10-01 杭州思看科技有限公司 三维网格重建方法、装置、电子装置和存储介质
CN113689512A (zh) * 2021-08-23 2021-11-23 北京搜狗科技发展有限公司 一种要素点编码的方法及相关装置
CN117272758A (zh) * 2023-11-20 2023-12-22 埃洛克航空科技(北京)有限公司 基于三角格网的深度估计方法、装置、计算机设备和介质
WO2024113309A1 (zh) * 2022-12-01 2024-06-06 北京原创力科技有限公司 一种基于体素压缩的视频渲染方法及系统

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019058487A1 (ja) * 2017-09-21 2019-03-28 オリンパス株式会社 3次元復元画像処理装置、3次元復元画像処理方法及び3次元復元画像処理プログラムを記憶したコンピュータ読み取り可能な記憶媒体
CN108537876B (zh) * 2018-03-05 2020-10-16 清华-伯克利深圳学院筹备办公室 三维重建方法、装置、设备及存储介质
CN109377551B (zh) * 2018-10-16 2023-06-27 北京旷视科技有限公司 一种三维人脸重建方法、装置及其存储介质
WO2020113417A1 (zh) * 2018-12-04 2020-06-11 深圳市大疆创新科技有限公司 目标场景三维重建方法、系统及无人机
CN109840940B (zh) * 2019-02-11 2023-06-27 清华-伯克利深圳学院筹备办公室 动态三维重建方法、装置、设备、介质和系统
CN109993802B (zh) * 2019-04-03 2020-12-25 浙江工业大学 一种城市环境中的混合相机标定方法
CN110064200B (zh) 2019-04-25 2022-02-22 腾讯科技(深圳)有限公司 基于虚拟环境的物体构建方法、装置及可读存储介质
CN110349253B (zh) * 2019-07-01 2023-12-01 达闼机器人股份有限公司 一种场景的三维重建方法、终端和可读存储介质
CN112308904B (zh) * 2019-07-29 2024-07-02 北京初速度科技有限公司 一种基于视觉的建图方法、装置及车载终端
CN112446227A (zh) * 2019-08-12 2021-03-05 阿里巴巴集团控股有限公司 物体检测方法、装置及设备
CN112136311B (zh) * 2019-10-22 2022-07-12 深圳市大疆创新科技有限公司 一种图像处理方法、设备、成像系统及存储介质
CN112991427A (zh) * 2019-12-02 2021-06-18 顺丰科技有限公司 物体体积测量方法、装置、计算机设备和存储介质
CN111242847B (zh) * 2020-01-10 2021-03-30 上海西井信息科技有限公司 基于闸道的图像拼接方法、系统、设备及存储介质
CN111310654B (zh) * 2020-02-13 2023-09-08 北京百度网讯科技有限公司 一种地图要素的定位方法、装置、电子设备及存储介质
CN111325741B (zh) * 2020-03-02 2024-02-02 上海媒智科技有限公司 基于深度图像信息处理的物品数量估算方法、系统及设备
CN111598927B (zh) * 2020-05-18 2023-08-01 京东方科技集团股份有限公司 一种定位重建方法和装置
CN112115980A (zh) * 2020-08-25 2020-12-22 西北工业大学 基于光流跟踪和点线特征匹配的双目视觉里程计设计方法
CN112419482B (zh) * 2020-11-23 2023-12-01 太原理工大学 深度点云融合的矿井液压支架群位姿三维重建方法
CN112435206B (zh) * 2020-11-24 2023-11-21 北京交通大学 利用深度相机对物体进行三维信息重建的方法
CN112767538B (zh) * 2021-01-11 2024-06-07 浙江商汤科技开发有限公司 三维重建及相关交互、测量方法和相关装置、设备
CN112750201B (zh) * 2021-01-15 2024-03-29 浙江商汤科技开发有限公司 三维重建方法及相关装置、设备
CN113129348B (zh) * 2021-03-31 2022-09-30 中国地质大学(武汉) 一种基于单目视觉的道路场景中车辆目标的三维重建方法
CN113706373A (zh) * 2021-08-25 2021-11-26 深圳市慧鲤科技有限公司 模型重建方法及相关装置、电子设备和存储介质
CN113450457B (zh) * 2021-08-31 2021-12-14 腾讯科技(深圳)有限公司 道路重建方法、装置、计算机设备和存储介质
US11830140B2 (en) * 2021-09-29 2023-11-28 Verizon Patent And Licensing Inc. Methods and systems for 3D modeling of an object by merging voxelized representations of the object
CN114061488B (zh) * 2021-11-15 2024-05-14 华中科技大学鄂州工业技术研究院 一种物体测量方法、系统以及计算机可读存储介质
CN114241168A (zh) * 2021-12-01 2022-03-25 歌尔光学科技有限公司 显示方法、显示设备及计算机可读存储介质
CN114393575B (zh) * 2021-12-17 2024-04-02 重庆特斯联智慧科技股份有限公司 基于用户姿势高效能识别的机器人控制方法和系统
CN114255285B (zh) * 2021-12-23 2023-07-18 奥格科技股份有限公司 视频与城市信息模型三维场景融合方法、系统及存储介质
US12073512B2 (en) * 2022-09-21 2024-08-27 Streem, Llc Key frame selection using a voxel grid
CN116704152B (zh) * 2022-12-09 2024-04-19 荣耀终端有限公司 图像处理方法和电子设备
CN116258817B (zh) * 2023-02-16 2024-01-30 浙江大学 一种基于多视图三维重建的自动驾驶数字孪生场景构建方法和系统
CN117115333B (zh) * 2023-02-27 2024-09-06 荣耀终端有限公司 一种结合imu数据的三维重建方法
CN116363327B (zh) * 2023-05-29 2023-08-22 北京道仪数慧科技有限公司 体素地图生成方法及系统
CN116437063A (zh) * 2023-06-15 2023-07-14 广州科伊斯数字技术有限公司 一种三维图像显示系统及方法
CN117437288B (zh) * 2023-12-19 2024-05-03 先临三维科技股份有限公司 摄影测量方法、装置、设备及存储介质
CN117496074B (zh) * 2023-12-29 2024-03-22 中国人民解放军国防科技大学 一种适应相机快速移动的高效三维场景重建方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157367A (zh) * 2015-03-23 2016-11-23 联想(北京)有限公司 三维场景重建方法和设备
CN107194984A (zh) * 2016-03-14 2017-09-22 武汉小狮科技有限公司 移动端实时高精度三维建模方法
CN107358629A (zh) * 2017-07-07 2017-11-17 北京大学深圳研究生院 一种基于目标识别的室内建图与定位方法
US20180018805A1 (en) * 2016-07-13 2018-01-18 Intel Corporation Three dimensional scene reconstruction based on contextual analysis
CN108537876A (zh) * 2018-03-05 2018-09-14 清华-伯克利深圳学院筹备办公室 基于深度相机的三维重建方法、装置、设备及存储介质

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8711206B2 (en) * 2011-01-31 2014-04-29 Microsoft Corporation Mobile camera localization using depth maps
CN105184784B (zh) * 2015-08-28 2018-01-16 西交利物浦大学 基于运动信息的单目相机获取深度信息的方法
US9892552B2 (en) * 2015-12-15 2018-02-13 Samsung Electronics Co., Ltd. Method and apparatus for creating 3-dimensional model using volumetric closest point approach
US10319141B2 (en) * 2016-06-21 2019-06-11 Apple Inc. Method and system for vision based 3D reconstruction and object tracking
CN106504320B (zh) * 2016-11-02 2019-12-17 华东师范大学 一种基于gpu及面向深度图像的实时三维重构方法
CN106803267B (zh) * 2017-01-10 2020-04-14 西安电子科技大学 基于Kinect的室内场景三维重建方法
CN106887037B (zh) * 2017-01-23 2019-12-17 杭州蓝芯科技有限公司 一种基于gpu和深度相机的室内三维重建方法
CN106910242B (zh) * 2017-01-23 2020-02-28 中国科学院自动化研究所 基于深度相机进行室内完整场景三维重建的方法及系统

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157367A (zh) * 2015-03-23 2016-11-23 联想(北京)有限公司 三维场景重建方法和设备
CN107194984A (zh) * 2016-03-14 2017-09-22 武汉小狮科技有限公司 移动端实时高精度三维建模方法
US20180018805A1 (en) * 2016-07-13 2018-01-18 Intel Corporation Three dimensional scene reconstruction based on contextual analysis
CN107358629A (zh) * 2017-07-07 2017-11-17 北京大学深圳研究生院 一种基于目标识别的室内建图与定位方法
CN108537876A (zh) * 2018-03-05 2018-09-14 清华-伯克利深圳学院筹备办公室 基于深度相机的三维重建方法、装置、设备及存储介质

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627061A (zh) * 2020-06-03 2020-09-04 贝壳技术有限公司 位姿检测方法、装置以及电子设备、存储介质
CN112446951A (zh) * 2020-11-06 2021-03-05 杭州易现先进科技有限公司 三维重建方法、装置、电子设备及计算机存储介质
CN112446951B (zh) * 2020-11-06 2024-03-26 杭州易现先进科技有限公司 三维重建方法、装置、电子设备及计算机存储介质
CN113409444A (zh) * 2021-05-21 2021-09-17 北京达佳互联信息技术有限公司 三维重建方法、装置、电子设备及存储介质
CN113409444B (zh) * 2021-05-21 2023-07-11 北京达佳互联信息技术有限公司 三维重建方法、装置、电子设备及存储介质
CN113470180B (zh) * 2021-05-25 2022-11-29 思看科技(杭州)股份有限公司 三维网格重建方法、装置、电子装置和存储介质
CN113470180A (zh) * 2021-05-25 2021-10-01 杭州思看科技有限公司 三维网格重建方法、装置、电子装置和存储介质
CN113284176B (zh) * 2021-06-04 2022-08-16 深圳积木易搭科技技术有限公司 一种结合几何和纹理的在线匹配优化方法和三维扫描系统
CN113284176A (zh) * 2021-06-04 2021-08-20 深圳积木易搭科技技术有限公司 一种结合几何和纹理的在线匹配优化方法和三维扫描系统
CN113689512A (zh) * 2021-08-23 2021-11-23 北京搜狗科技发展有限公司 一种要素点编码的方法及相关装置
WO2024113309A1 (zh) * 2022-12-01 2024-06-06 北京原创力科技有限公司 一种基于体素压缩的视频渲染方法及系统
CN117272758A (zh) * 2023-11-20 2023-12-22 埃洛克航空科技(北京)有限公司 基于三角格网的深度估计方法、装置、计算机设备和介质
CN117272758B (zh) * 2023-11-20 2024-03-15 埃洛克航空科技(北京)有限公司 基于三角格网的深度估计方法、装置、计算机设备和介质

Also Published As

Publication number Publication date
CN108537876A (zh) 2018-09-14
US20210110599A1 (en) 2021-04-15
CN108537876B (zh) 2020-10-16

Similar Documents

Publication Publication Date Title
WO2019170164A1 (zh) 基于深度相机的三维重建方法、装置、设备及存储介质
WO2020001168A1 (zh) 三维重建方法、装置、设备和存储介质
Li et al. DeepI2P: Image-to-point cloud registration via deep classification
CN108447097B (zh) 深度相机标定方法、装置、电子设备及存储介质
JP6811296B2 (ja) コレクターの相対パラメータのキャリブレーション方法、装置、機器及び媒体
EP3570253B1 (en) Method and device for reconstructing three-dimensional point cloud
WO2021004416A1 (zh) 一种基于视觉信标建立信标地图的方法、装置
EP3818741A1 (en) Method, apparatus and computer program for performing three dimensional radio model construction
CN109521879B (zh) 交互式投影控制方法、装置、存储介质及电子设备
CN110637461B (zh) 计算机视觉系统中的致密光学流处理
GB2580691A (en) Depth estimation
CN110111388A (zh) 三维物体位姿参数估计方法及视觉设备
US20230047211A1 (en) Method and system for automatic characterization of a three-dimensional (3d) point cloud
CN112784873A (zh) 一种语义地图的构建方法及设备
CN1136738C (zh) 一种微型实时立体视觉机
CN116129037B (zh) 视触觉传感器及其三维重建方法、系统、设备及存储介质
CN112183506A (zh) 一种人体姿态生成方法及其系统
CN113793370B (zh) 三维点云配准方法、装置、电子设备及可读介质
CN111829522B (zh) 即时定位与地图构建方法、计算机设备以及装置
CN111860651A (zh) 一种基于单目视觉的移动机器人半稠密地图构建方法
CN111325828A (zh) 一种基于三目相机的三维人脸采集方法及装置
CN110706332B (zh) 一种基于噪声点云的场景重建方法
CN112085842B (zh) 深度值确定方法及装置、电子设备和存储介质
CN116105721B (zh) 地图构建的回环优化方法、装置、设备及存储介质
CN116843754A (zh) 一种基于多特征融合的视觉定位方法及系统

Legal Events

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19765081; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19765081; Country of ref document: EP; Kind code of ref document: A1)