CN108537876B - Three-dimensional reconstruction method, device, equipment and storage medium - Google Patents


Info

Publication number
CN108537876B
CN108537876B
Authority
CN
China
Prior art keywords: voxel, current, frame, image, target scene
Prior art date
Legal status
Active
Application number
CN201810179264.6A
Other languages
Chinese (zh)
Other versions
CN108537876A (en)
Inventor
方璐 (Lu Fang)
韩磊 (Lei Han)
苏卓 (Zhuo Su)
戴琼海 (Qionghai Dai)
Current Assignee
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Tsinghua-Berkeley Shenzhen Institute Preparation Office
Priority date
Filing date
Publication date
Application filed by Tsinghua-Berkeley Shenzhen Institute Preparation Office
Priority to CN201810179264.6A
Publication of CN108537876A
Priority to PCT/CN2019/084820
Priority to US16/977,899
Application granted
Publication of CN108537876B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/08 Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20021 Dividing image into blocks, subimages or windows

Abstract

The embodiment of the invention discloses a three-dimensional reconstruction method, device, equipment and storage medium based on a depth camera, wherein the method comprises the following steps: acquiring at least two frames of images collected by a depth camera that collects a target scene; determining the relative camera pose at the time of collection according to the at least two frames of images; determining at least one characteristic voxel from each frame of image by adopting at least two levels of nested screening, wherein each level of screening adopts a corresponding voxel blocking rule; performing fusion calculation on the at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene; and generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene. The embodiment of the invention solves the problem of the large amount of computation required for three-dimensional reconstruction of a target scene and enables three-dimensional reconstruction to be applied to portable equipment, so that three-dimensional reconstruction is more widely applicable.

Description

Three-dimensional reconstruction method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to a three-dimensional reconstruction method, a three-dimensional reconstruction device, three-dimensional reconstruction equipment and a storage medium.
Background
The three-dimensional reconstruction is to reconstruct a mathematical model of a three-dimensional object in the real world through a specific device and algorithm, and has extremely important significance for virtual reality, augmented reality, robot perception, human-computer interaction, robot path planning and the like.
In current three-dimensional reconstruction methods, in order to ensure the quality, consistency and real-time performance of the reconstruction result, a high-performance Graphics Processing Unit (GPU) and a depth camera (RGB-D camera) are generally required to complete the reconstruction. Firstly, a depth camera shoots the target scene to obtain at least two frames of images; each frame of image is then solved by the GPU to acquire the relative camera pose of the depth camera when the frame was shot; all voxels in each frame of image are traversed according to the relative camera pose corresponding to the frame to determine the voxels meeting certain conditions as candidate voxels; further, a Truncated Signed Distance Function (TSDF) model of each frame of image is constructed from the candidate voxels in the frame; and finally, an isosurface is generated for each frame of image on the basis of the TSDF model, thereby completing the real-time reconstruction of the target scene.
However, the existing three-dimensional reconstruction method involves a large amount of computation and depends strongly on a GPU dedicated to image processing. Such a GPU is not portable, so the method is difficult to apply to mobile robots, portable devices, wearable devices (such as the augmented-reality head-mounted display Microsoft HoloLens), and the like.
Disclosure of Invention
The embodiment of the invention provides a three-dimensional reconstruction method, device, equipment and storage medium, which solve the problem of the large amount of computation required for three-dimensional reconstruction of a target scene and enable three-dimensional reconstruction to be applied to portable equipment, so that three-dimensional reconstruction is more widely applied.
In a first aspect, an embodiment of the present invention provides a three-dimensional reconstruction method, where the method includes:
acquiring at least two frames of images acquired by a depth camera for acquiring a target scene;
determining the relative camera pose during acquisition according to the at least two frames of images;
determining at least one characteristic voxel from each frame of image by adopting at least two-stage nested screening modes, wherein each stage of screening adopts a corresponding voxel blocking rule;
performing fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene;
and generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.
In a second aspect, an embodiment of the present invention further provides a three-dimensional reconstruction apparatus, where the apparatus includes:
the image acquisition module is used for acquiring at least two frames of images acquired by the depth camera for acquiring a target scene;
the pose determining module is used for determining the relative camera pose during acquisition according to the at least two frames of images;
the voxel determining module is used for determining at least one characteristic voxel from each frame of image by adopting at least two levels of nested screening modes, wherein each level of screening adopts a corresponding voxel blocking rule;
the model generation module is used for carrying out fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene;
and the three-dimensional reconstruction module is used for generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
storage means for storing one or more programs;
at least one depth camera for image acquisition of a target scene;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the three-dimensional reconstruction method as described in any embodiment of the invention.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the three-dimensional reconstruction method according to any embodiment of the present invention.
The method comprises the steps of obtaining a target scene image collected by a depth camera, determining the relative camera pose of the depth camera when the depth camera collects the target scene image, determining the characteristic voxels of each frame of image by adopting at least two stages of nested screening modes, carrying out fusion calculation to obtain a grid voxel model of the target scene, generating an isosurface of the grid voxel model, and obtaining a three-dimensional reconstruction model of the target scene. The problem of large computation amount when the target scene is subjected to three-dimensional reconstruction is solved, and the three-dimensional reconstruction is applied to portable equipment, so that the three-dimensional reconstruction is more widely applied.
Drawings
In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, a brief description is given below of the drawings used in describing the embodiments. It should be clear that the described figures are only views of some of the embodiments of the invention to be described, not all, and that for a person skilled in the art, other figures can be derived from these figures without inventive effort.
Fig. 1 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present invention;
FIG. 2 is a schematic cube diagram of a two-level nested screening method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for determining a relative camera pose during acquisition according to a second embodiment of the present invention;
FIG. 4 is a flowchart of a method for determining at least one characteristic voxel from an image according to a third embodiment of the present invention;
FIG. 5 is a schematic plan view of determining at least one characteristic voxel according to a third embodiment of the present invention;
fig. 6 is a flowchart of a three-dimensional reconstruction method according to a fourth embodiment of the present invention;
fig. 7 is a block diagram of a three-dimensional reconstruction apparatus according to a fifth embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present invention, where this embodiment is applicable to a case where a depth camera is used to perform three-dimensional reconstruction on a target scene, and the method may be executed by a three-dimensional reconstruction device or an electronic device, where the device may be implemented in a hardware and/or software manner, and the three-dimensional reconstruction method in fig. 1 is schematically described below with reference to a cube diagram of a two-stage nested screening manner in fig. 2, where the method includes:
s101, acquiring at least two frames of images acquired by a depth camera for acquiring a target scene.
A depth camera differs from a traditional camera in that it can capture the image of a scene and the corresponding depth information of the scene at the same time. Its design principle is to emit a reference beam toward the target scene to be measured and to convert the time difference or phase difference of the returned light into the distance of the photographed scene, thereby generating the depth information; the image information is obtained in addition by shooting with a conventional camera. The target scene is the scene to be three-dimensionally reconstructed; for example, when an autonomously driven automobile runs on a highway, the target scene is the driving-environment scene of the automobile, and images of the driving environment are acquired in real time by the depth camera. Specifically, in order to accurately perform three-dimensional reconstruction of the target scene, at least two frames of images collected by the depth camera are acquired and processed; the more frames are acquired, the more accurate the reconstructed model of the target scene. There are many ways to obtain the images acquired by the depth camera; for example, they may be obtained in a wired manner through a serial port, a network cable or the like, or in a wireless manner through Bluetooth, wireless broadband or the like.
And S102, determining the relative camera pose during acquisition according to at least two frames of images.
The pose of the camera refers to the position and attitude of the camera; specifically, the position represents the translation of the camera (for example, translation along the X, Y and Z directions), and the attitude represents the rotation of the camera (for example, the rotation angles α, β, γ about the X, Y and Z directions).
Because the field angle of the depth camera is fixed and the shooting angle is also fixed, in order to accurately reconstruct the target scene, the pose of the depth camera needs to be changed, and the target scene can be accurately reconstructed by shooting from different positions and angles. Therefore, the relative position and posture of the depth camera are different when each frame of image is shot, and can be represented by the relative posture of the depth camera, for example, the depth camera can automatically change the position and posture according to a certain track, or the depth camera can be manually rotated and moved to shoot. Therefore, the relative camera pose when each frame of image is acquired is determined, and the frame of image is accurately reconstructed to the position corresponding to the target scene.
Specifically, there are many methods for determining the pose of the depth camera, and for example, the pose of the camera can be directly acquired by installing sensors for measuring the translation distance and the rotation angle on the depth camera. Because the relative pose of the depth camera is not changed greatly when the two adjacent frames of images are collected, in order to acquire the relative camera pose more accurately, the relative pose of the camera when the camera collects the frame of image can be determined by processing the collected images.
S103, aiming at each frame of image, at least one characteristic voxel is determined from the image by adopting at least two levels of nested screening modes, wherein each level of screening adopts a corresponding voxel blocking rule.
When the three-dimensional reconstruction of the target scene is performed, the embodiment of the present invention divides the reconstructed target scene into grid-shaped voxel blocks (fig. 2 is a partial grid-shaped voxel block of the reconstructed target scene), and corresponding to the corresponding position of each frame of image, each frame of image can be divided into planar voxel grids. The image acquired by the depth camera includes characteristic voxels and non-characteristic voxels when the target scene is reconstructed in three dimensions, for example, when the scene is reconstructed in a driving environment of an automobile, pedestrians, vehicles and the like in the image are characteristic voxels, and blue sky white clouds at a far distance are non-characteristic voxels. Therefore, voxels in each acquired frame of image are screened to find characteristic voxels when the target scene is reconstructed in three dimensions. The characteristic voxel may be composed of one voxel block, or may be composed of a preset number of voxel blocks.
If every voxel cell in each frame of image were judged one by one as to whether it is a characteristic voxel, the amount of computation would be large. Preferably, at least one characteristic voxel can therefore be determined from the image by at least two levels of nested screening based on a voxel blocking rule. Specifically, the voxel blocking rule may be to set at least two levels of voxel units; at each level, the object being screened is divided into at least two index blocks according to the voxel unit corresponding to that level, and the index blocks are screened level by level.
Exemplarily, a two-level nested filtering manner is described as an example in conjunction with fig. 2, and it is assumed that two levels of voxel units corresponding to the two-level nested filtering are voxel units of 20mm and 5mm, specifically:
(1) a grid voxel of a target scene corresponding to one frame of image is divided into a plurality of first index blocks according to a voxel unit of 20mm (a cube 20 in fig. 2 is a first index block divided by a voxel unit of 20 mm).
(2) All the divided first index blocks are screened at the first level to judge whether they contain characteristic voxels; a first index block (cube 20) that contains no characteristic voxel is removed, and one that does is selected as a feature block.
(3) Assuming that the cube 20 in fig. 2 includes a feature voxel, the selected feature block (cube 20) is further divided according to a 5mm voxel unit, and each feature block (cube 20) may be divided into 4 × 4 × 4 second index blocks (the cube 21 in fig. 2 is a second index block divided by a 5mm voxel unit).
(4) All the divided second index blocks (cubes 21) are screened at the second level to judge whether they contain characteristic voxels; a second index block (cube 21) that contains no characteristic voxel is removed, and one that does is selected as a characteristic voxel.
If more than two levels of nested screening are performed, then except for the first level, in which the whole frame image is divided into a plurality of index blocks for screening, each subsequent level divides the feature blocks containing characteristic voxels selected at the previous level into a plurality of index blocks according to the voxel unit of that level and judges whether they contain characteristic voxels, until the nested screening with the last-level voxel unit is completed. For example, in three-level nested screening, after the two-level screening operations above have been performed, the screening with the third-level voxel unit has not yet been done, so all the second index blocks (cubes 21) containing characteristic voxels obtained in step (4) of the two-level nested screening are taken as the objects to be divided in the third-level screening, divided into a plurality of index blocks according to the third-level voxel unit, and judged as to whether they contain characteristic voxels.
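For illustration, the following Python sketch mirrors the nested screening described above: a coarse pass over 20 mm index blocks followed by a fine pass over 5 mm blocks, keeping only blocks for which a caller-supplied contains_feature predicate reports possible surface content. The function names, the generic list of levels and the predicate interface are assumptions made for this sketch and are not the patent's implementation.

```python
import numpy as np

def nested_screening(volume_origin, volume_size, levels, contains_feature):
    """Multi-level nested screening of voxel blocks, coarse to fine.

    volume_origin    : (3,) lower corner of the reconstruction volume, in metres
    volume_size      : edge length of the (cubic) volume, in metres
    levels           : block edge lengths from coarse to fine, e.g. [0.020, 0.005]
    contains_feature : callable(origin, size) -> bool, True if the block may
                       contain part of the target scene surface
    Returns the origins of the finest-level blocks kept as characteristic voxels.
    """
    volume_origin = np.asarray(volume_origin, dtype=float)

    # First level: divide the whole frame's volume into coarse index blocks.
    n = int(np.ceil(volume_size / levels[0]))
    candidates = []
    for idx in np.ndindex(n, n, n):
        origin = volume_origin + np.array(idx) * levels[0]
        if contains_feature(origin, levels[0]):
            candidates.append(origin)

    # Each further level only subdivides the blocks that survived the previous level.
    for coarse, fine in zip(levels[:-1], levels[1:]):
        sub = int(round(coarse / fine))              # e.g. 20 mm / 5 mm -> 4 per axis
        next_candidates = []
        for origin in candidates:
            for idx in np.ndindex(sub, sub, sub):
                child = origin + np.array(idx) * fine
                if contains_feature(child, fine):
                    next_candidates.append(child)
        candidates = next_candidates
    return candidates
```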
And S104, performing fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene.
After at least one characteristic voxel corresponding to the image is determined in S103, in order to obtain a grid voxel model of the target scene, the determined at least one characteristic voxel is subjected to fusion calculation to obtain the grid voxel model of the target scene in combination with a relative camera pose when the depth camera acquires the frame of image. Each voxel in the grid voxel model stores the distance from the surface of the target scene and weight information representing the observation uncertainty.
Optionally, the grid voxel model in this embodiment may be a TSDF model. Specifically, as shown in fig. 2, assuming that the cube 21 is a characteristic voxel selected by the multi-level nested screening, each characteristic voxel in each frame of image is fused according to the formula

tsdf_avg = (w_{i-1} · tsdf_{i-1} + w_i · tsdf_i) / (w_{i-1} + w_i)

to obtain the TSDF model of the target scene, where tsdf_avg is the fusion result of the current characteristic voxel, tsdf_{i-1} is the distance from the previous characteristic voxel to the surface of the target scene, w_{i-1} is the weight information of the previous characteristic voxel, tsdf_i is the distance from the current characteristic voxel to the surface of the target scene, and w_i is the weight information of the current characteristic voxel.
Optionally, when the characteristic voxels are screened in S103, in order to improve the screening rate, each screened characteristic voxel may consist of a preset number of voxel blocks of the corresponding voxel unit (for example, a characteristic voxel may be formed by 8 × 8 × 8 voxel blocks). In that case, during the fusion calculation, the voxel blocks within each characteristic voxel may be fused in groups of a certain size; for example, the 8 × 8 × 8 voxel blocks in a characteristic voxel may be fused taking 2 × 2 × 2 voxel blocks as one fusion object (i.e., one voxel).
Optionally, the feature voxels selected in S103 may be subjected to fusion computation in parallel, so as to improve the fusion rate of the grid voxel model of the target scene.
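A minimal sketch of the weighted running average described by the formula above, written for a whole block of voxels at once; the weight cap and the numerical floor on the denominator are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def fuse_tsdf(tsdf_prev, w_prev, tsdf_new, w_new, max_weight=64.0):
    """Fuse a new observation into stored TSDF values with
    tsdf_avg = (w_prev * tsdf_prev + w_new * tsdf_new) / (w_prev + w_new)."""
    w_sum = w_prev + w_new
    tsdf_avg = (w_prev * tsdf_prev + w_new * tsdf_new) / np.maximum(w_sum, 1e-6)
    return tsdf_avg, np.minimum(w_sum, max_weight)

# Example: fuse one observation into an 8 x 8 x 8 block of voxel blocks,
# matching the grouping of voxel blocks into one characteristic voxel.
block_tsdf = np.ones((8, 8, 8), dtype=np.float32)    # initialised far from the surface
block_w = np.zeros((8, 8, 8), dtype=np.float32)
obs_tsdf = np.random.uniform(-1.0, 1.0, (8, 8, 8)).astype(np.float32)
block_tsdf, block_w = fuse_tsdf(block_tsdf, block_w, obs_tsdf, w_new=1.0)
```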
And S105, generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.
The grid voxel model of the target scene obtained in S104 is a model of distances from the characteristic voxels to the surface of the target scene, and an isosurface needs to be generated on the basis of the grid voxel model to obtain the three-dimensional reconstruction model of the target scene. For example, isosurface generation (i.e., generating the triangular patches representing the model surface), tri-linear interpolation, color extraction and addition, and normal vector extraction may be performed using the Marching Cubes algorithm, thereby obtaining the three-dimensional reconstruction model of the target scene.
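To illustrate this step, the sketch below extracts the zero level set of a TSDF grid with the marching-cubes routine from scikit-image; it is only an illustrative stand-in for the Marching Cubes processing described above, and the grid size, the 5 mm voxel spacing and the level value 0.0 (the TSDF zero crossing) are assumptions of the sketch.

```python
import numpy as np
from skimage import measure

# tsdf: a dense grid of truncated signed distances, e.g. produced by the fusion step.
tsdf = np.random.uniform(-1.0, 1.0, size=(64, 64, 64)).astype(np.float32)

# Extract the zero level set (the reconstructed surface) as a triangle mesh.
verts, faces, normals, values = measure.marching_cubes(
    tsdf, level=0.0, spacing=(0.005, 0.005, 0.005))
print(verts.shape, faces.shape)   # mesh vertices in metres and triangle indices
```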
When the depth camera acquires images of the target scene, most of the scene in two adjacent frames overlaps. In order to increase the generation rate of the three-dimensional reconstruction model, optionally, generating the isosurface of the grid voxel model may include: if the current frame image obtained by collecting the target scene is a key frame, generating the isosurface of each voxel block corresponding to the current key frame, and adding colors to the isosurface to obtain the three-dimensional reconstruction model of the target scene.
A key frame is determined by evaluating the feature-point similarity between frames of images collected by the depth camera; specifically, one key frame may be set for several consecutive frames with high similarity. When generating the isosurface, only the key frames are processed, and the isosurface of the voxel blocks corresponding to each key frame image is generated. The model obtained in this way carries no color information, which makes it difficult to identify the individual objects in the image; for example, if the reconstructed target scene is the driving environment of an automobile, the pedestrians, vehicles and roads in the generated isosurface model are merged together, and one cannot distinguish which part is a pedestrian and which part is a vehicle. Therefore, colors are added to the generated isosurface according to the color information in each frame of image, so that each object in the three-dimensional reconstruction model of the target scene can be clearly identified.
It should be noted that the three-dimensional reconstruction process is a real-time dynamic process, and with the acquisition of images by a camera, the relative camera pose at the time of acquiring each frame of image is determined in real time, and the determination of the characteristic voxels and the generation of the grid voxel model and the isosurface thereof are performed for the corresponding images.
The embodiment provides a three-dimensional reconstruction method, which includes the steps of obtaining a target scene image collected by a depth camera, determining a relative camera pose of the depth camera when the depth camera collects the target scene image, determining a characteristic voxel of each frame of image by adopting at least two stages of nested screening modes, performing fusion calculation to obtain a grid voxel model of a target scene, generating an isosurface of the grid voxel model, and obtaining a three-dimensional reconstruction model of the target scene. In the fusion calculation stage, the characteristic voxels of each frame of image are determined by adopting at least two stages of nested screening modes, the voxels do not need to be traversed one by one, the calculated amount is reduced, the reconstruction precision is ensured, the fusion speed is greatly increased, and the three-dimensional reconstruction efficiency can be further improved. The problem of large computation amount when the target scene is subjected to three-dimensional reconstruction is solved, and the three-dimensional reconstruction is applied to portable equipment, so that the three-dimensional reconstruction is more widely applied.
Example two
On the basis of the above embodiments, the present embodiment further optimizes the determination of the relative camera pose at the time of acquisition according to at least two frames of images in S102. Fig. 3 is a flowchart of a method for determining a relative camera pose during acquisition according to a second embodiment of the present invention, and as shown in fig. 3, the method includes:
s301, extracting the features of each frame of image to obtain at least one feature point of each frame of image.
Feature extraction is performed on the image to find pixel points with landmark characteristics in the frame image (i.e., feature points); for example, they may be corners, textures and edges in the image. The feature extraction for each frame of image may use the Oriented FAST and Rotated BRIEF (ORB) algorithm to find at least one feature point in the frame.
And S302, performing matching operation on each feature point between two adjacent frames of images to obtain the corresponding relationship of the feature points between the two adjacent frames of images.
When the target scene is subjected to image acquisition, most contents of two adjacent frames of images are the same, so that a certain corresponding relationship also exists between the corresponding feature points of the two frames of images. Optionally, a fast search method (sparse matching algorithm) may be used to compare hamming distances between feature points of two adjacent frames of images to obtain a feature point corresponding relationship between two adjacent frames of images.
Specifically, taking one feature point correspondence between two adjacent frames of images as an example, assume that the feature points X1 and X2 representing the same texture feature are located at different positions in the two frames, and let H(X1, X2) denote the Hamming distance between X1 and X2: an exclusive-or operation is performed on the two feature descriptors, and the number of 1 bits in the result is taken as the Hamming distance (i.e., the feature point correspondence measure) of that feature point between the two adjacent frames.
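The XOR-and-count computation can be sketched as follows for ORB-style 256-bit binary descriptors stored as 32-byte rows; the brute-force nearest-neighbour loop and the distance threshold are assumptions of the sketch and do not reproduce the fast search method mentioned above.

```python
import numpy as np

def hamming(d1, d2):
    """Hamming distance between two binary descriptors (uint8 arrays):
    XOR the descriptors and count the 1 bits in the result."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

def match_descriptors(desc_a, desc_b, max_dist=64):
    """Brute-force nearest-neighbour matching of binary descriptors.
    desc_a, desc_b: (n, 32) uint8 arrays; returns (index_a, index_b) pairs."""
    matches = []
    for ia, da in enumerate(desc_a):
        dists = [hamming(da, db) for db in desc_b]
        ib = int(np.argmin(dists))
        if dists[ib] <= max_dist:        # threshold is an illustrative assumption
            matches.append((ia, ib))
    return matches
```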
S303, removing the abnormal correspondences from the feature point correspondences, calculating the nonlinear term in J(ξ)^T J(ξ) through a linear component containing the second-order statistics of the remaining feature point correspondences and a nonlinear component containing the relative camera pose, performing multiple iterative calculations of Δξ = −(J(ξ)^T J(ξ))^(-1) J(ξ)^T r(ξ), and solving for the relative camera pose at which the reprojection error is smaller than the preset error threshold, i.e. calculating the pose that minimizes the reprojection error.
Here r(ξ) represents the vector containing all reprojection errors; J(ξ) is the Jacobian matrix of r(ξ); ξ represents the Lie algebra of the relative camera pose, and Δξ represents its increment at each iteration; R_i represents the rotation matrix of the camera when the i-th frame image is acquired; R_j represents the rotation matrix of the camera when the j-th frame image is acquired; p_i^k represents the k-th feature point on the i-th frame image; p_j^k represents the k-th feature point on the j-th frame image; C_{i,j} represents the set of feature point correspondences between the i-th frame image and the j-th frame image; ||C_{i,j}|| represents the number of feature point correspondences between the i-th frame image and the j-th frame image, i.e. the norm of C_{i,j}; and [·]_× represents the skew-symmetric (vector cross-product) matrix of a vector.
Further, the expression of the nonlinear term is given as formula (1) (rendered as an image in the original publication). It consists of a linear component and nonlinear components: the linear component (denoted W below) is the second-order statistic of the structure terms, while r_il^T and r_jl are the nonlinear components, where r_il^T is the l-th row of the rotation matrix R_i and r_jl is the transpose of the l-th row of the rotation matrix R_j, l = 0, 1, 2 (this embodiment follows the programming convention of counting from 0, so l = 0 denotes the so-called 1st row of the matrix, and so on).
Specifically, some of the feature point correspondences between two adjacent frames of images obtained in S302 are abnormal correspondences; for example, each of two adjacent frames necessarily contains feature points that do not appear in the other frame, and performing the matching operation of S302 on them produces abnormal correspondences. Optionally, a Random Sample Consensus (RANSAC) algorithm may be used to remove the abnormal correspondences, and the remaining feature point correspondences may be represented as the set C_{i,j} = {(p_i^k, p_j^k)}, where (p_i^k, p_j^k) represents the correspondence of the k-th feature point between the i-th frame image and the j-th frame image, and j = i − 1.
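A sketch of RANSAC-style removal of abnormal correspondences for matched 3D points recovered from the depth images: a rigid transform is fitted to random minimal sets and the correspondences consistent with the best transform are kept. The Kabsch fitting step, the iteration count and the inlier threshold are assumptions of this sketch rather than details taken from the patent.

```python
import numpy as np

def fit_rigid(P, Q):
    """Least-squares rotation and translation (Kabsch) mapping points P onto Q, both (n, 3)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_inliers(P, Q, iters=200, thresh=0.02, seed=0):
    """Keep the correspondences consistent with the best rigid transform found by RANSAC.
    P, Q: (n, 3) matched 3D points from two frames; thresh is in metres (assumed value)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)   # minimal set for a rigid transform
        R, t = fit_rigid(P[idx], Q[idx])
        err = np.linalg.norm(P @ R.T + t - Q, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return best
```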
When the relative camera pose is determined, certain errors are inevitably produced, so determining the camera pose amounts to solving a nonlinear least-squares problem between two frames of images with the following cost function:

E = Σ_i Σ_{(p_i^k, p_j^k) ∈ C_{i,j}} || T_i p̃_i^k − T_j p̃_j^k ||^2

where E represents the reprojection error, in Euclidean space, of the i-th frame image compared with the j-th frame image (the previous frame image in this embodiment); T_i represents the pose of the camera when the i-th frame image is acquired (as explained above, the camera pose actually refers to the change of the acquired i-th frame image relative to the previous frame image), and T_j represents the pose of the camera when the j-th frame image is acquired; N represents the total number of frames collected by the camera; p̃_i^k represents the homogeneous coordinates of the k-th feature point p_i^k on the i-th frame image, and p̃_j^k represents the homogeneous coordinates of the k-th feature point p_j^k on the j-th frame image. It should be noted that, for the same values of i and k, p_i^k and p̃_i^k represent the same point; the difference is that p_i^k is the local coordinate and p̃_i^k is the homogeneous coordinate.
Specifically, in order to increase the operation rate when determining the relative camera pose, the cost function above is not evaluated directly. Instead, the nonlinear term in J(ξ)^T J(ξ) is calculated from a linear component containing the second-order statistics of the remaining feature point correspondences and a nonlinear component containing the relative camera pose, and the iteration Δξ = −(J(ξ)^T J(ξ))^(-1) J(ξ)^T r(ξ) is performed repeatedly to solve for the relative camera pose at which the reprojection error is smaller than the preset error threshold. By using the expression of the nonlinear term in formula (1), the linear part that is fixed between the two frame images is treated as a whole W during the calculation of the nonlinear term and does not need to be recomputed for each feature point correspondence, thereby reducing the complexity of the relative camera pose determination algorithm and enhancing the real-time performance of the relative camera pose calculation.
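For reference, the sketch below shows the plain Gauss-Newton iteration Δξ = −(J^T J)^(-1) J^T r on 3D point correspondences, without the second-order-statistics acceleration described above; the point-to-point residual, the left-perturbation Jacobian and the fixed iteration count are assumptions of the sketch.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def skew(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def gauss_newton_relative_pose(P, Q, iters=10):
    """Refine the rigid transform (R, t) aligning points P onto Q by iterating
    delta = -(J^T J)^(-1) J^T r with residual r = R p + t - q per correspondence."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        JTJ, JTr = np.zeros((6, 6)), np.zeros(6)
        for p, q in zip(P, Q):
            rp = R @ p
            r = rp + t - q                   # 3-vector residual
            J = np.zeros((3, 6))
            J[:, :3] = -skew(rp)             # derivative w.r.t. the rotation increment
            J[:, 3:] = np.eye(3)             # derivative w.r.t. the translation increment
            JTJ += J.T @ J
            JTr += J.T @ r
        delta = -np.linalg.solve(JTJ, JTr)   # [delta_rotation, delta_translation]
        R = Rotation.from_rotvec(delta[:3]).as_matrix() @ R
        t = t + delta[3:]
    return R, t
```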
The derivation of formula (1) is described below, and the derivation is then used to analyse how the complexity of the algorithm is reduced.
In Euclidean space, the camera pose when the camera acquires the i-th frame image is T_i = [R_i | t_i]; T_i actually refers to the pose transformation matrix between the i-th frame image and the j-th frame image (the previous frame image in this embodiment), comprising the rotation matrix R_i and the translation vector t_i. The rigid transformation T_i in Euclidean space is represented by the Lie algebra ξ_i of the SE(3) space; ξ_i likewise represents the camera pose when the camera acquires the i-th frame image, and T(ξ_i) maps the Lie algebra ξ_i to T_i in Euclidean space.
For each feature point correspondence (p_i^k, p_j^k), the reprojection error is

r_{i,j}^k(ξ) = T(ξ_i) p̃_i^k − T(ξ_j) p̃_j^k.

The reprojection error in Euclidean space in the cost function above can thus be expressed as E(ξ) = ||r(ξ)||^2, where r(ξ) represents the vector formed by stacking all the reprojection errors r_{i,j}^k(ξ).
r_{i,j}^k can be expressed component-wise as (for simplicity of presentation, ξ_i is omitted below)

(r_{i,j}^k)_l = r_il^T p_i^k + t_il − r_jl^T p_j^k − t_jl,  l = 0, 1, 2,    (5)

where r_il^T represents the l-th row of the rotation matrix R_i, and t_il represents the l-th element of the translation vector t_i.
The Jacobian J(ξ) of r(ξ) is assembled from the Jacobian blocks of all the correspondences (formula (6) in the original, rendered there as an image), where J_{i,j}^m represents the Jacobian matrix corresponding to the m-th feature point correspondence between the i-th frame image and the j-th frame image.
The product J(ξ)^T J(ξ) is accumulated from the per-correspondence terms (J_{i,j}^m)^T J_{i,j}^m, where (J_{i,j}^m)^T represents the transpose of the matrix J_{i,j}^m; the expression of J_{i,j}^m is given by formula (7) in the original (rendered there as an image), in which I_{3×3} represents the 3 × 3 identity matrix. According to formulas (6) and (7), the contribution of a frame pair to J(ξ)^T J(ξ) has four non-zero 6 × 6 sub-matrices, given by formula (8). The following takes one of them, formula (9), as an example; the other three non-zero sub-matrices are calculated similarly and are not described again.
Combining formula (9) with formula (5) yields formula (10). Denoting the fixed sum over the structure terms as W and again using formula (5), the nonlinear term in formula (10) can be simplified into formula (1); that is, the structure terms inside the nonlinear term are linearized as W. Although the expression is nonlinear with respect to the structure terms p_i^k and p_j^k, the above analysis shows that all of its non-zero elements depend linearly on the second-order statistics of the structure terms in C_{i,j}; in other words, the sparse matrix J(ξ)^T J(ξ) is element-wise linear in the second-order statistics of the structure terms in C_{i,j}.
It should be noted that the Jacobian matrix of each correspondence (p_i^k, p_j^k) is determined by the geometric terms ξ_i, ξ_j and the structure terms p_i^k, p_j^k. For all the correspondences in the same frame pair C_{i,j}, the corresponding Jacobian matrices share the same geometric terms but have different structure terms. When computing the contribution of one frame pair C_{i,j} to J(ξ)^T J(ξ), existing algorithms depend on the number of feature point correspondences in C_{i,j}, whereas this embodiment performs the computation efficiently with fixed complexity: only the second-order statistic W of the structure terms needs to be calculated, and the structure terms of the individual correspondences do not need to enter the calculation separately, i.e. the four non-zero sub-matrices can be computed with complexity O(1) instead of O(||C_{i,j}||).
Thus, the sparse matrices J^T J and J^T r required in the iterative step Δξ = −(J(ξ)^T J(ξ))^(-1) J(ξ)^T r(ξ) of the nonlinear Gauss-Newton optimization can be computed efficiently with complexity O(M) instead of the original complexity O(N_coor), where N_coor represents the number of feature point correspondences and M represents the number of frame pairs. In general, N_coor is about 300 for sparse matching and about 10000 for dense matching, which is much larger than the number of frame pairs M.
Through the above derivation, in the camera pose calculation process W is calculated for each frame pair, and then formulas (1), (10), (9), (8) and (6) are evaluated to obtain J(ξ)^T J(ξ); further, the ξ that minimizes r(ξ) can be found by iterative calculation.
S304, judging whether the current frame image obtained by collecting the target scene is a key frame, if so, executing S305, otherwise, waiting for the next frame image to execute S304 again.
The step of judging whether the current frame image obtained by collecting the target scene is a key frame may be: matching operation is carried out on a current frame image obtained by collecting a target scene and a previous key frame image to obtain a conversion relation matrix between the two frame images; and if the conversion relation matrix is greater than or equal to the preset conversion threshold value, determining the current frame image as the current key frame.
Specifically, similar to the method for determining the feature point correspondence between two adjacent frames of images in S302, matching operation may be performed on the current frame of image and the previous key frame to obtain a feature point correspondence matrix between the two frames of images, and when the matrix is greater than or equal to a preset conversion threshold, the current image is determined to be the current key frame. The conversion relation matrix between the two frames of images can be a matrix formed by corresponding relations of all characteristic points between the two frames of images.
It should be noted that the first frame image obtained by acquiring the target scene may be set as the first key frame, and the preset conversion threshold is set in advance according to the motion condition of the depth camera when acquiring the image, for example, if the pose change is large when the camera shoots two adjacent frames of images, the preset conversion threshold is set to be larger.
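One possible concrete form of this test is sketched below, with the magnitudes of the relative translation and rotation standing in for the comparison of the conversion relation matrix against the preset conversion threshold; the 10 cm and 15 degree thresholds are illustrative assumptions.

```python
import numpy as np

def is_keyframe(T_cur, T_last_key, trans_thresh=0.10, rot_thresh_deg=15.0):
    """Decide whether the current frame becomes a new key frame by how far the
    camera has moved since the previous key frame.
    T_cur, T_last_key: 4 x 4 camera-to-world pose matrices."""
    T_rel = np.linalg.inv(T_last_key) @ T_cur
    trans = np.linalg.norm(T_rel[:3, 3])
    cos_angle = np.clip((np.trace(T_rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_angle))
    return trans >= trans_thresh or angle >= rot_thresh_deg
```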
S305, loop detection is carried out according to the current key frame and the historical key frame; and if the loop is successful, performing globally consistent optimization updating on the determined relative camera pose according to the current key frame.
The globally consistent optimization updating refers to that in the process of reconstruction, a reconstruction algorithm continuously expands a three-dimensional reconstruction model of a target scene along with the movement of a camera, and when a depth camera moves to a place where the depth camera arrives once or has large overlap with a historical view angle, the expanded three-dimensional reconstruction model and a generated model are consistent or optimized and updated into a new model together, instead of phenomena of staggering, aliasing and the like. The loop detection is to determine whether the camera has moved to a place that has been reached or a place that has a large overlap with the historical viewing angle according to the current observation of the depth camera, and to optimize and reduce the accumulated error.
In order to improve the optimization rate, if the loop detection of the previous key frame and the historical key frame is successful (namely, the depth camera moves to a place which is reached once or a place which has larger overlap with the historical view angle), the generated model is optimized and updated in a global consistent manner through the current key frame and the historical key frame, and the error of the three-dimensional reconstruction model is reduced; and if the loop detection is unsuccessful, waiting for the occurrence of the next key frame, and performing loop detection on the next key frame. Specifically, the loop detection of the current key frame and the historical key frame may be performed by performing matching operation on feature points of the current key frame and the historical key frame, and if the matching degree is high, the loop detection is successful.
Optionally, the globally consistent optimization update of the relative camera poses solves the correspondences between the current key frame and one or more historical key frames with high matching degree; this is a problem of minimizing, as the cost function, the conversion error between the current key frame and all the historical key frames with high matching degree. Here E(T_1, T_2, ···, T_{N−1} | T_i ∈ SE(3), i ∈ [1, N−1]) represents the conversion error of all frame pairs (any historical matching key frame together with the current key frame forms a frame pair); N is the number of historical key frames with high matching degree to the current key frame; and E_{i,j} represents the conversion error between the i-th frame and the j-th frame, the conversion error being the reprojection error.
Specifically, in the process of performing the relative camera pose updating optimization, the relative poses of the non-key frames and the key frames corresponding to the non-key frames need to be kept unchanged, and the specific optimization updating algorithm uses the existing BA algorithm, or uses the method in S303, which is not described in detail.
The method for determining the relative camera pose during acquisition provided by this embodiment extracts at least one feature point from each frame of image, performs a matching operation on the feature points between two adjacent frames of images to obtain the feature point correspondences between them, removes the abnormal correspondences, calculates the relative camera pose through a linear component containing the remaining feature point correspondences and a nonlinear component containing the relative camera pose, and determines the key frames; if the currently acquired image is a key frame and loop detection succeeds, a globally consistent optimization update of the determined relative camera poses is performed according to the current key frame and the historical key frames. This guarantees global consistency while reducing the amount of computation in three-dimensional reconstruction, enables three-dimensional reconstruction to be applied to portable equipment, and makes its applications broader.
EXAMPLE III
Based on the above embodiments, the present embodiment further explains that, in S103, at least one characteristic voxel is determined from each image by using at least two levels of nested filtering manners. The method of fig. 4 for determining at least one characteristic voxel from an image is schematically described below in conjunction with the schematic plan view of fig. 5 for determining at least one characteristic voxel, the method comprising:
s401, regarding each frame of image, taking the image as a current-level screening object, and determining a current-level voxel unit.
The voxel unit represents the accuracy of the constructed three-dimensional reconstruction model, and is set in advance according to the accuracy of the three-dimensional reconstruction model of the target scene to be reconstructed, and may be, for example, 5mm or 10 mm. Since the embodiment determines at least one characteristic voxel from the image by at least two levels of nested screening, at least two levels of voxel units are set, where the minimum level of voxel unit is the accuracy required to reconstruct the model. Firstly, the acquired image is used as a current screening object to screen the characteristic voxels, and the current voxel unit is the largest-level voxel unit in the preset multi-level voxel units.
Illustratively, as shown in fig. 5, it is assumed that real-time three-dimensional reconstruction of a CPU-based 100Hz frame rate, 5mm voxel-level precision model is to be achieved, and two-level nested screening of feature voxels is performed in 20mm voxel units and 5mm voxel units, respectively. In this case, the acquired image is used as the current screening object, and the current level voxel unit is a voxel unit of 20 mm.
S402, dividing the current-level screening object into voxel blocks according to the current-level voxel unit, and determining at least one current index block according to the voxel blocks; wherein, the current index block comprises a preset number of voxel blocks.
In order to improve the screening rate, when screening the current-level screening object, at least one index block can be determined according to the preset number of voxel blocks divided according to the current voxel unit, and characteristic voxels are screened according to the index blocks. Note that the characteristic voxel size in this case is not the size of one voxel block, but the size of a predetermined number of voxel blocks.
For example, as shown in fig. 5, assume that each current index block consists of a preset number of 8 × 8 × 8 voxel blocks. The acquired image is divided into a plurality of voxel blocks with a 20 mm side length according to the 20 mm voxel unit, and the divided voxel blocks are grouped, 8 × 8 × 8 at a time, into at least one index block with a 160 mm side length corresponding to the 20 mm voxel unit; mapped onto the planar schematic diagram, the whole image is divided into 6 index blocks with a 160 mm side length corresponding to the 20 mm voxel unit, each covering an 8 × 8 arrangement of voxel cells.
And S403, selecting at least one feature block with the distance to the surface of the target scene smaller than the corresponding distance threshold of the current-level voxel unit from all the current index blocks.
The distances between all current index blocks determined in S402 and the surface of the target scene are calculated, and the smaller the distance is, the closer the distance between the index block and the surface of the target scene is, a distance threshold is preset for each level of voxel unit, and when the distance between the index block and the surface of the target scene is smaller than the distance threshold corresponding to the current level of voxel unit, the index block is selected as a feature block. Wherein the distance threshold corresponding to the upper-level voxel unit is larger than the distance threshold corresponding to the lower-level voxel unit.
Optionally, at least one feature block, of which the distance to the surface of the target scene is smaller than the corresponding distance threshold of the current-level voxel unit, is selected from all current index blocks, and the feature block may be: aiming at each current index block, accessing the index block according to the hash value of the current index block, and respectively calculating the distance from each vertex of the current index block to the surface of a target scene according to the relative camera pose when each frame of image is collected and the image depth value obtained by a depth camera; and selecting the current index blocks with the vertex distances smaller than the corresponding distance threshold of the current-level voxel unit as the feature blocks.
Specifically, a hash value may be set for each current index block, and each index block is accessed through its hash value. The distance from each vertex voxel block of the current index block to the surface of the target scene is calculated according to the formula sdf = ||ξ · s|| − D(u, v), where sdf represents the distance from the voxel block (a vertex voxel block of the index block) to the surface of the target scene; ξ represents the relative camera pose when the frame image is acquired; s represents the coordinates of the voxel block in the grid voxel model of the reconstruction space; and D(u, v) represents the depth value corresponding to the voxel block in the image acquired by the depth camera. When the distance from each vertex of an index block to the surface of the target scene is smaller than the distance threshold corresponding to the current-level voxel unit, the index block is set as a feature block; if the distance is larger than or equal to the distance threshold corresponding to the current-level voxel unit, the index block is removed. Optionally, the average of the distances from the vertices of the index block to the surface of the target scene may also be calculated, and if the average is smaller than the distance threshold corresponding to the current-level voxel unit, the index block is set as a feature block. Illustratively, as shown in fig. 5, the diagonally hatched squares with a side length of 160 mm in the figure are index blocks to be removed in the screening with the 20 mm voxel unit, i.e., their distance to the surface of the target scene is greater than the distance threshold corresponding to the 20 mm voxel unit.
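A sketch of this hash-indexed screening: index blocks are kept in a dictionary keyed by their integer grid coordinates, the eight vertices of each block are projected into the depth image, and sdf = ||camera-space point|| − D(u, v) is thresholded. The pinhole projection, the all-vertices criterion and the parameter names are assumptions of the sketch.

```python
import numpy as np

def block_key(origin, size):
    """Integer grid coordinates of a block, usable as a dictionary (hash) key."""
    return tuple(np.floor(np.asarray(origin) / size).astype(int))

def screen_index_blocks(block_origins, size, T_wc, K, depth, dist_thresh):
    """Keep index blocks whose eight corner voxels all lie close to the observed surface:
    sdf = ||camera-space point|| - D(u, v), kept when |sdf| < dist_thresh."""
    T_cw = np.linalg.inv(T_wc)                               # world -> camera
    corners = np.array(list(np.ndindex(2, 2, 2))) * size     # offsets of the 8 block vertices
    kept = {}
    for origin in block_origins:
        pts_w = np.asarray(origin) + corners
        pts_c = pts_w @ T_cw[:3, :3].T + T_cw[:3, 3]         # camera coordinates
        if np.any(pts_c[:, 2] <= 0):
            continue
        u = np.round(K[0, 0] * pts_c[:, 0] / pts_c[:, 2] + K[0, 2]).astype(int)
        v = np.round(K[1, 1] * pts_c[:, 1] / pts_c[:, 2] + K[1, 2]).astype(int)
        inside = (u >= 0) & (u < depth.shape[1]) & (v >= 0) & (v < depth.shape[0])
        if not inside.all():
            continue
        sdf = np.linalg.norm(pts_c, axis=1) - depth[v, u]
        if np.all(np.abs(sdf) < dist_thresh):
            kept[block_key(origin, size)] = origin
    return kept
```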
And S404, judging whether the feature block meets the division condition of the minimum level voxel unit, if so, executing S405, and if not, executing S406.
Whether the feature block meets the division condition of the minimum-level voxel unit is judged, that is, whether the feature block selected in S403 is the feature block selected after the preset minimum-level voxel unit is divided is judged. For example, as shown in fig. 5, if the feature block selected in S403 is a feature block with a side length of 160mm divided by 20mm voxel units, and the minimum-level voxel unit is a voxel unit with a length of 5mm, it indicates that the feature block selected in S403 does not satisfy the division condition of the minimum-level 5mm voxel unit, and S406 is executed to perform the next-level 5mm voxel unit screening; if the feature block selected in S403 is a feature block divided by 5mm voxel units and having a side length of 40mm, it indicates that the feature block selected in S403 satisfies the minimum grade division condition of 5mm voxel units, and S405 is executed to use the feature block as a feature voxel.
S405, the feature block is used as a feature voxel.
S406: take all the feature blocks determined from the current-level screening object as the new current-level screening object, take the next-level voxel unit as the new current-level voxel unit, and return to S402.
When the selected feature block in S403 does not satisfy the partition condition of the minimum-level voxel unit, all the feature blocks selected in S403 are used as new current-level screening objects, the next-level voxel unit is selected as the current-level voxel unit, the process returns to S402, and the feature blocks are screened again.
For example, as shown in fig. 5, if the feature blocks selected in S403 are feature blocks with a side length of 160 mm divided by the 20 mm voxel unit rather than feature blocks with a side length of 40 mm divided by the minimum-level 5 mm voxel unit, then all the feature blocks with a side length of 160 mm are taken as the current-level screening object, the next-level 5 mm voxel unit is selected as the current-level voxel unit, and the process returns to S402. All the feature blocks with a side length of 160 mm screened in S403 are divided into a plurality of voxel blocks with a 5 mm side length according to the 5 mm voxel unit, and these are grouped, 8 × 8 × 8 at a time, into at least one index block with a 40 mm side length corresponding to the 5 mm voxel unit; mapped onto the planar schematic diagram, the whole image is divided into 32 index blocks with a 40 mm side length corresponding to the 5 mm voxel unit, each covering an 8 × 8 arrangement of voxel cells. S403 and S404 are then executed; the feature blocks with a side length of 40 mm obtained at this point (for example, the blank squares with a 40 mm side length in the figure) are the feature blocks selected after division by the minimum-level 5 mm voxel unit, i.e., the selected characteristic voxels, and the dotted squares with a side length of 40 mm in fig. 5 are the index blocks to be removed in the screening with the 5 mm voxel unit.
The method for determining at least one characteristic voxel from an image provided by the embodiment determines at least one characteristic voxel from the image by adopting at least two levels of nested screening modes for each frame of image. The problem of large computation amount when the target scene is subjected to three-dimensional reconstruction is solved, and the three-dimensional reconstruction is applied to portable equipment, so that the three-dimensional reconstruction is more widely applied.
Example four
The present embodiment provides a preferred embodiment of three-dimensional reconstruction based on the above embodiments, as shown in fig. 6, the method includes:
S601, acquiring at least two frames of images acquired by the depth camera for acquiring the target scene.
And S602, determining the relative camera pose during acquisition according to at least two frames of images.
S603, judging whether the current frame image obtained by collecting the target scene is a key frame, if so, storing the key frame and executing S604, and if not, waiting for the next frame image to execute S603 again.
For each frame of image collected by the camera, it can be judged whether the frame is a key frame, and each judged key frame is stored, so that the isosurface can be generated at the key frame rate and the stored key frames can serve as historical key frames in subsequent loop optimization. It should be noted that the first frame acquired by the camera is used as a key frame by default.
And S604, loop detection is carried out according to the current key frame and the historical key frame, and if loop detection is successful, S608 (for optimizing and updating the grid voxel model and the isosurface) and S6011 (for optimizing and updating the relative camera pose) are executed.
S605, aiming at each frame of image, at least one characteristic voxel is determined from the image by adopting at least two levels of nested screening modes, wherein each level of screening adopts a corresponding voxel blocking rule.
And S606, performing fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene.
And S607, generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.
S608, a first preset number of key frames matched with the current key frame are selected from the historical key frames, and a second preset number of non-key frames are respectively obtained from the non-key frames corresponding to the selected matched key frames.
In order to achieve global consistency of the reconstructed model, if the current frame image is a key frame, a first preset number of key frames matched with the current key frame are selected from the historical key frames. Specifically, a matching operation may be performed between the current key frame and the historical key frames; for example, the Hamming distance between feature points of the current key frame and a historical key frame may be calculated to complete the matching. A first preset number of historical key frames with a high matching degree to the current key frame are then selected, for example, the 10 best-matching historical key frames. Each key frame has non-key frames corresponding to it, and for each selected historical key frame, a second preset number of non-key frames are also selected from its corresponding non-key frames; optionally, at most 11 non-key frames may be selected, spread evenly over all non-key frames corresponding to that historical key frame, so that the frames chosen for optimization remain representative while the efficiency of the optimization update is improved. The first preset number and the second preset number may be set in advance according to the needs of updating the three-dimensional reconstruction model.
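A minimal sketch of this selection step is given below, assuming binary (ORB-style) descriptors so that the Hamming distance applies; the frame fields (descriptors, non_key_frames) and the example counts (10 matching key frames, at most 11 non-key frames each) are assumptions drawn from the example values above.

    def hamming_distance(desc_a, desc_b):
        # desc_a, desc_b: equal-length bytes objects (binary descriptors).
        return bin(int.from_bytes(desc_a, "big") ^ int.from_bytes(desc_b, "big")).count("1")

    def select_matching_frames(current_kf, history_kfs, first_n=10, second_n=11):
        # Rank historical key frames by total descriptor distance to the current
        # key frame (descriptors compared index to index for brevity; a real
        # matcher would use nearest-neighbour search) and keep the best first_n.
        scored = sorted(
            history_kfs,
            key=lambda kf: sum(hamming_distance(a, b)
                               for a, b in zip(current_kf.descriptors, kf.descriptors))
        )
        matches = scored[:first_n]
        # From the non-key frames attached to each match, pick at most second_n,
        # spread evenly so the selected frames stay representative.
        selected_non_kfs = []
        for kf in matches:
            step = max(1, len(kf.non_key_frames) // second_n)
            selected_non_kfs.append(kf.non_key_frames[::step][:second_n])
        return matches, selected_non_kfs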
And S609, optimizing and updating the grid voxel model of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matched key frame and the acquired non-key frame.
And performing optimization updating on the grid voxel model of the three-dimensional reconstruction model, namely updating the characteristic voxels and updating the grid voxel model of the target scene.
Optionally, when updating the feature voxels, consider that the view-angle overlap between two adjacent frames collected by the depth camera is very large, so the feature voxels selected from two adjacent frames are almost identical, and re-optimizing the feature voxels once per frame would take a long time. Therefore, when updating the feature voxels, S605 is performed again only on each matched historical key frame to complete the optimization update of the feature voxels.
Because the grid voxel model of the target scene generated in S606 is produced by processing every frame of image, the update of the grid voxel model is performed over the well-matched historical key frames and their corresponding non-key frames; that is, whenever a key frame arrives, the corresponding fused data is removed and S606 is executed again to redo the fusion calculation, thereby completing the optimization update of the grid voxel model of the target scene.
Whether the fusion calculation is performed when the grid voxel model of the target scene is first obtained or during the optimization updating stage of the grid voxel model, a single voxel block can serve as the fusion object. To improve fusion efficiency, a predetermined number of voxel blocks, for example a group of 2 × 2 × 2 voxel blocks, may instead be used as one fusion object for the fusion calculation.
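As a hedged sketch of such a fusion pass, the following code keeps a running weighted average of signed distances (a common fusion update, assumed here rather than stated by the text) and treats one 2 × 2 × 2 group of voxel blocks as a single fusion object; the function name and array layout are illustrative only.

    import numpy as np

    def fuse_group(group_sdf, group_weight, new_sdf, new_weight=1.0):
        # group_sdf / group_weight: values stored for one 2 x 2 x 2 group of
        # voxel blocks (8 entries); new_sdf: values computed from the current frame.
        fused = (group_sdf * group_weight + new_sdf * new_weight) / (group_weight + new_weight)
        return fused, group_weight + new_weight

    # Example: fuse one 2 x 2 x 2 group (8 voxel blocks) from a new frame.
    stored = np.zeros(8)
    weight = np.ones(8)
    incoming = np.full(8, 0.02)       # assumed sdf values in metres
    stored, weight = fuse_group(stored, weight, incoming)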
S610, optimizing and updating the isosurface of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matched key frame.
Since the isosurface of the grid voxel model is generated only for key frames in S607, when updating the isosurface, S607 may be performed again only for the historical key frames selected in S608 that have a high matching degree with the current key frame, so as to update the isosurfaces of those matching key frames.
In order to accelerate the model updating optimization speed, the optimization updating of the iso-surface of the three-dimensional reconstruction model may be: for each matching key frame, selecting at least one voxel block of which the distance to the surface of a target scene is less than or equal to an update threshold of a corresponding voxel in the matching key frame from all voxel blocks corresponding to the current key frame; and performing optimization updating on the isosurface of each matched key frame according to the selected at least one voxel block.
The update threshold may be set while the isosurface of the grid voxel model is generated in S607: for each voxel in the key frame used to generate the isosurface, the maximum of the distances from the voxel blocks within that voxel to the surface of the target scene is selected and set as the update threshold of that voxel. That is, every voxel in the key frame used to generate the isosurface is assigned a corresponding update threshold.
Specifically, the distance from each voxel block of the current key frame to the surface of the target scene may be calculated first. Then, for each matching key frame, the voxel correspondence between the two frames of images is determined according to the correspondence between the current key frame and the matching key frame. Using this voxel correspondence, the voxel in the matching key frame corresponding to the current voxel of the current key frame is found, which determines the applicable update threshold, and voxel blocks of the current voxel whose distance to the surface of the target scene is less than or equal to that update threshold are selected. This selection is performed voxel by voxel over the current key frame, which completes the filtering of the voxel blocks; the isosurface is then optimized and updated from the selected voxel blocks, and since the process of obtaining the isosurface is similar to S607 it is not repeated here. Voxel blocks whose distance is greater than the update threshold are the voxel blocks to be ignored, and no operation is performed on them. Filtering out part of the voxel blocks in this way improves the calculation speed.
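The per-voxel filtering just described might be sketched as follows; the correspondence lookup, the stored update_threshold field and the sdf_to_surface helper are assumptions introduced for illustration.

    def blocks_to_update(current_kf_voxels, matching_kf, correspondence, sdf_to_surface):
        # For every voxel of the current key frame, look up its counterpart in
        # the matching key frame, read that voxel's update threshold, and keep
        # only voxel blocks whose distance to the target scene surface does not
        # exceed it; all other blocks are ignored.
        selected = []
        for voxel in current_kf_voxels:
            matched_voxel = correspondence(voxel, matching_kf)
            threshold = matched_voxel.update_threshold
            selected.extend(
                block for block in voxel.blocks
                if abs(sdf_to_surface(block)) <= threshold
            )
        return selected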
Optionally, to avoid performing one hash-table lookup for every single voxel block that is accessed, the hash values of several adjacent voxel blocks may be looked up in the hash table at the same time when a voxel block is accessed.
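A small sketch of such a batched lookup, assuming a Python dict stands in for the hash table and integer block coordinates serve as keys:

    def lookup_block_and_neighbours(hash_table, coord):
        # Query the block at `coord` and its six face neighbours in one pass,
        # instead of issuing a separate lookup per access.
        x, y, z = coord
        keys = [(x, y, z), (x + 1, y, z), (x - 1, y, z),
                (x, y + 1, z), (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)]
        return {k: hash_table.get(k) for k in keys}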
S6011, performing globally consistent optimization updating on the determined relative camera pose according to the current key frame. The updated relative camera pose is then used when updating the corresponding grid voxel model.
In order to ensure the real-time performance of three-dimensional reconstruction, while acquiring the target scene image in S601, the pose of the camera in S602 and the keyframe in S603 may be determined in real time for each frame of image, that is, the pose is calculated and the keyframe is determined while acquiring the image. And the process of generating the three-dimensional reconstruction model of the target scene in S605 to S607 and the process of updating the generated three-dimensional reconstruction model in S608 to S610 are also performed simultaneously, that is, the optimized updating of the built partial model is completed in the process of generating the three-dimensional reconstruction model.
The embodiment provides a three-dimensional reconstruction method, which includes the steps of obtaining a target scene image collected by a depth camera, determining a relative camera pose of the depth camera when the depth camera collects the target scene image, determining a characteristic voxel of each frame image by adopting at least two stages of nested screening modes, performing fusion calculation to obtain a grid voxel model of the target scene, generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene, and performing optimization updating on the three-dimensional reconstruction model of the target scene according to a current key frame, each matching key frame and a non-key frame of each matching key frame to ensure the global consistency of the model. The problem of large computation amount when the target scene is subjected to three-dimensional reconstruction is solved, and the three-dimensional reconstruction is applied to portable equipment, so that the three-dimensional reconstruction is more widely applied.
EXAMPLE five
Fig. 7 is a block diagram of a three-dimensional reconstruction apparatus according to a fifth embodiment of the present invention, which is capable of executing a three-dimensional reconstruction method according to any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the execution method. The apparatus may be implemented on a CPU basis. As shown in fig. 7, the apparatus includes:
an image obtaining module 701, configured to obtain at least two frames of images acquired by a depth camera for a target scene;
a pose determination module 702 for determining a relative camera pose at the time of acquisition from the at least two frames of images;
a voxel determination module 703, configured to determine, for each frame of image, at least one characteristic voxel from the image in at least two levels of nested screening manners, where each level of screening employs a corresponding voxel blocking rule;
the model generation module 704 is configured to perform fusion calculation on at least one feature voxel of each frame of image according to the relative camera pose of each frame of image, so as to obtain a grid voxel model of the target scene;
and a three-dimensional reconstruction module 705, configured to generate an isosurface of the grid voxel model, to obtain a three-dimensional reconstruction model of the target scene.
Optionally, the three-dimensional reconstruction module 705 is specifically configured to, if a current frame image obtained by collecting the target scene is a key frame, generate an isosurface of each voxel block corresponding to the current key frame, and add a color to the isosurface to obtain a three-dimensional reconstruction model of the target scene.
The embodiment provides a three-dimensional reconstruction device, which is characterized in that a target scene image acquired by a depth camera is acquired, the camera pose of the depth camera when the target scene image is acquired is determined, at least two stages of nested screening modes are adopted to determine the characteristic voxels of each frame of image, fusion calculation is performed to obtain a grid voxel model of a target scene, an isosurface of the grid voxel model is generated, and a three-dimensional reconstruction model of the target scene is obtained. The problem of large computation amount when the target scene is subjected to three-dimensional reconstruction is solved, and the three-dimensional reconstruction is applied to portable equipment, so that the three-dimensional reconstruction is more widely applied.
Further, the pose determination module 702 includes:
the characteristic point extraction unit is used for extracting the characteristics of each frame of image to obtain at least one characteristic point of each frame of image;
the matching operation unit is used for performing matching operation on each feature point between two adjacent frames of images to obtain the corresponding relationship of the feature points between the two adjacent frames of images;
The pose determining unit is used for removing abnormal correspondences from the feature point correspondences, calculating the nonlinear term in J(ξ)^T J(ξ) from linear components comprising the second-order statistic of the remaining feature points and nonlinear components comprising the relative camera pose, and performing repeated iterative computation on

    Δξ = -(J(ξ)^T J(ξ))^(-1) J(ξ)^T r(ξ)

to solve the relative camera pose when the reprojection error is smaller than a preset error threshold;

where r(ξ) represents the vector containing all reprojection errors; J(ξ) is the Jacobian matrix of r(ξ); ξ represents the Lie algebra of the relative camera pose, and Δξ represents its increment at each iteration; R_i represents the rotation matrix of the camera when the i-th frame image is acquired; R_j represents the rotation matrix of the camera when the j-th frame image is acquired; p_i^k and p_j^k (shown as formula images in the original) represent the k-th feature point on the i-th frame image and the j-th frame image, respectively; C_{i,j} represents the set of feature point correspondences between the i-th frame image and the j-th frame image; |C_{i,j}| represents the number of such correspondences; [·]_× represents the vector (cross) product operator; and ‖·‖ denotes taking the norm.

In particular, the expression of the nonlinear term in J(ξ)^T J(ξ) (shown as a formula image in the original) combines a linear component, namely the second-order statistic of the feature points, with the nonlinear components r_il^T and r_jl, where r_il^T is the l-th row of the rotation matrix R_i, r_jl is the transpose of the l-th row of the rotation matrix R_j, and l = 0, 1, 2.
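As an illustrative sketch of the iterative solution only (the factorisation of the nonlinear term into a precomputed second-order statistic is omitted), a plain Gauss-Newton loop over the update Δξ = -(J^T J)^(-1) J^T r could look as follows; residual_and_jacobian is an assumed callback that returns r(ξ) and J(ξ) for the current pose, and the additive update stands in for a proper Lie-algebra retraction.

    import numpy as np

    def solve_relative_pose(xi0, residual_and_jacobian, error_threshold=1e-6, max_iters=50):
        # xi0: initial 6-vector (Lie algebra) of the relative camera pose.
        xi = np.asarray(xi0, dtype=float)
        for _ in range(max_iters):
            r, J = residual_and_jacobian(xi)         # r(xi): all reprojection errors
            if np.linalg.norm(r) < error_threshold:  # stop once the error is small enough
                return xi
            # Gauss-Newton increment: delta_xi = -(J^T J)^-1 J^T r
            delta_xi = -np.linalg.solve(J.T @ J, J.T @ r)
            xi = xi + delta_xi                       # additive update; a Lie-algebra
                                                     # retraction is assumed elsewhere
        return xi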
Further, the above apparatus further comprises:
the key frame determining module is used for performing matching operation on a current frame image and a previous key frame image obtained by acquiring a target scene to obtain a conversion relation matrix between the two frames of images; and if the conversion relation matrix is greater than or equal to the preset conversion threshold value, determining the current frame image as the current key frame.
The loop detection module is used for performing loop detection according to the current key frame and the historical key frame if the current frame image obtained by collecting the target scene is the key frame;
and the pose updating module is used for performing globally consistent optimization updating on the determined relative camera pose according to the current key frame if loop closure is successful.
Further, the voxel determination module 703 includes:
the initial determining unit is used for taking each frame of image as a current-level screening object and determining a current-level voxel unit;
the index block determining unit is used for dividing the current-level screening object into voxel blocks according to the current-level voxel unit and determining at least one current index block according to the voxel blocks; the current index block comprises a preset number of voxel blocks;
the characteristic block selecting unit is used for selecting at least one characteristic block, of which the distance to the surface of the target scene is smaller than the corresponding distance threshold of the current-level voxel unit, from all current index blocks;
a characteristic voxel determining unit, configured to take the characteristic block as a characteristic voxel if the characteristic block meets a partition condition of a minimum-level voxel unit;
the circulation unit is used for replacing all the feature blocks determined by the current-level screening object with a new current-level screening object, selecting the next-level voxel unit to be replaced with the new current-level voxel unit and returning to execute the voxel block division operation aiming at the current-level screening object if the feature blocks do not meet the division condition of the minimum-level voxel unit; wherein the voxel units are reduced step by step to the minimum level voxel unit.
Optionally, the feature block selecting unit is specifically configured to:
aiming at each current index block, accessing the index block according to the hash value of the current index block, and respectively calculating the distance from each vertex of the current index block to the surface of a target scene according to the relative camera pose when each frame of image is collected and the image depth value obtained by a depth camera; and selecting the current index blocks with the vertex distances smaller than the corresponding distance threshold of the current-level voxel unit as the feature blocks.
Further, the above apparatus further comprises:
the matching frame determining module is used for selecting a first preset number of key frames matched with the current key frame from the historical key frames and respectively acquiring a second preset number of non-key frames from the non-key frames corresponding to the selected matching key frames if the current frame image acquired by acquiring the target scene is the key frame;
the model updating module is used for optimizing and updating the grid voxel model of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matched key frame and the acquired non-key frame;
and the isosurface updating module is used for optimizing and updating the isosurface of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matched key frame.
Optionally, the iso-surface updating module is specifically configured to select, for each matching keyframe, at least one voxel block, of which a distance to the surface of the target scene is less than or equal to an update threshold of a corresponding voxel in the matching keyframe, from among the voxel blocks corresponding to the current keyframe; and performing optimization updating on the isosurface of each matched key frame according to the selected at least one voxel block.
The three-dimensional reconstruction module 705 is further configured to, while generating an iso-surface of each voxel block corresponding to the current keyframe image, select a maximum value of a distance from each voxel block in the voxel to the target scene surface for each voxel in the keyframe used for generating the iso-surface, and set the maximum value as an update threshold of the voxel.
EXAMPLE six
Fig. 8 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention, as shown in fig. 8, the electronic device includes a storage device 80, one or more processors 81, and at least one depth camera 82; the storage 80, processor 81 and depth camera 82 of the electronic device may be connected by a bus or other means, as exemplified by the bus connection in fig. 8.
The storage device 80, which is a computer-readable storage medium, can be used for storing software programs, computer-executable programs, and modules, such as the modules corresponding to the three-dimensional reconstruction device in the embodiments of the present invention (for example, the image acquisition module 701 used in the three-dimensional reconstruction device). The processor 81 implements the three-dimensional reconstruction method described above by processing software programs, instructions, and modules stored in the storage device 80 to execute various functional applications and data processing of the electronic device. Alternatively, the processor 81 may be a central processing unit or a high-performance graphics processor.
The storage device 80 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the storage device 80 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the storage device 80 may further include a storage device remotely located from the processor 81, which may be connected to the electronic device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The depth camera 82 may be used for image acquisition of the target scene under the control of the processor 81. The depth camera can be embedded in an electronic device, and optionally, the electronic device can be a portable mobile electronic device, for example, the electronic device can be a smart terminal (a mobile phone, a tablet computer) or a three-dimensional visual interaction device (VR glasses, a wearable helmet), and can perform image shooting under operations of moving, rotating and the like.
The electronic device provided by the embodiment can be used for executing the three-dimensional reconstruction method provided by any embodiment, and has corresponding functions and beneficial effects.
EXAMPLE seven
The seventh embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, can implement the three-dimensional reconstruction method of the foregoing embodiments.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
In summary, in the three-dimensional reconstruction scheme provided by the embodiment of the present invention, in the fusion calculation stage, the selection of the characteristic voxels is performed by using a coarse-to-fine nested screening strategy and a sparse sampling concept, so that the reconstruction accuracy is ensured, and the fusion speed is greatly increased; the isosurface is generated at the key frame rate, so that the generation speed of the isosurface can be increased; and the three-dimensional reconstruction efficiency is improved. In addition, the global consistency of the three-dimensional reconstruction can be effectively ensured by optimizing the updating stage.
The above example numbers are for description only and do not represent the merits of the examples.
It will be appreciated by those of ordinary skill in the art that the modules or operations of the embodiments of the invention described above may be implemented using a general purpose computing device, which may be centralized on a single computing device or distributed across a network of computing devices, and that they may alternatively be implemented using program code executable by a computing device, such that the program code is stored in a memory device and executed by a computing device, and separately fabricated into integrated circuit modules, or fabricated into a single integrated circuit module from a plurality of modules or operations thereof. Thus, the present invention is not limited to any specific combination of hardware and software.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. A method of three-dimensional reconstruction, comprising:
acquiring at least two frames of images acquired by a depth camera for acquiring a target scene;
determining the relative camera pose during acquisition according to the at least two frames of images;
determining at least one characteristic voxel from each frame of image by adopting at least two-stage nested screening modes, wherein each stage of screening adopts a corresponding voxel blocking rule;
performing fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene;
generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene;
aiming at each frame of image, determining at least one characteristic voxel from the image by adopting at least two levels of nested screening modes, wherein the method comprises the following steps:
regarding each frame of image, taking the image as a current-level screening object, and determining a current-level voxel unit;
dividing the current-level screening object into voxel blocks according to the current-level voxel units, and determining at least one current index block according to the voxel blocks; the current index block comprises a preset number of voxel blocks;
selecting at least one feature block with the distance to the surface of the target scene smaller than the corresponding distance threshold of the current-level voxel unit from all current index blocks;
if the characteristic block meets the division condition of the minimum-level voxel unit, taking the characteristic block as a characteristic voxel;
if the feature block does not meet the partition condition of the minimum-level voxel unit, replacing all feature blocks determined by the current-level screening object with a new current-level screening object, selecting the next-level voxel unit to be replaced with the new current-level voxel unit, and returning to execute the voxel block partition operation aiming at the current-level screening object; wherein the voxel units are reduced step by step to the minimum level voxel unit.
2. The method of claim 1, wherein determining a relative camera pose at the time of acquisition from the at least two frames of images comprises:
extracting the features of each frame of image to obtain at least one feature point of each frame of image;
matching each feature point between two adjacent frames of images to obtain the corresponding relationship of the feature points between the two adjacent frames of images;
removing abnormal correspondences in the feature point correspondences, calculating the nonlinear term in J(ξ)^T J(ξ) by a linear component comprising a second-order statistic of the remaining feature points and a nonlinear component comprising the relative camera pose, and performing repeated iterative computation on

    Δξ = -(J(ξ)^T J(ξ))^(-1) J(ξ)^T r(ξ)

to solve the relative camera pose when the reprojection error is smaller than a preset error threshold;

where r(ξ) represents a vector containing all reprojection errors; J(ξ) is the Jacobian matrix of r(ξ); ξ represents the Lie algebra of the relative camera pose, and Δξ represents its increment at each iteration; R_i represents a rotation matrix of the camera when the i-th frame image is acquired; R_j represents a rotation matrix of the camera when the j-th frame image is acquired; p_i^k and p_j^k (shown as formula images in the original) represent the k-th feature point on the i-th frame image and the j-th frame image, respectively; C_{i,j} represents the set of feature point correspondences between the i-th frame image and the j-th frame image; |C_{i,j}| represents the number of the feature point correspondences between the i-th frame image and the j-th frame image; [·]_× represents the vector (cross) product operator; and ‖·‖ means taking the norm.
3. The method of claim 2, wherein the expression of the nonlinear term in J(ξ)^T J(ξ) (shown as a formula image in the original) combines a linear component, namely the second-order statistic of the feature points, with the nonlinear components r_il^T and r_jl, wherein r_il^T is the l-th row of the rotation matrix R_i, r_jl is the transpose of the l-th row of the rotation matrix R_j, and l = 0, 1, 2.
4. The method of claim 1 or 2, after determining the relative camera pose at the time of acquisition from the at least two frames of images, further comprising:
if the current frame image acquired by collecting the target scene is a key frame, loop detection is carried out according to the current key frame and a historical key frame;
and if loop closure is successful, performing globally consistent optimization updating on the determined relative camera pose according to the current key frame.
5. The method of claim 4, further comprising, prior to performing loop back detection based on the current key frame and the historical key frame:
matching operation is carried out on the current frame image acquired by collecting the target scene and the previous key frame image, and a conversion relation matrix between the two frame images is acquired;
and if the conversion relation matrix is larger than or equal to a preset conversion threshold value, determining the current frame image as the current key frame.
6. The method according to claim 1, wherein selecting at least one feature block from all current index blocks having a distance to a target scene surface less than the current level voxel unit corresponding distance threshold comprises:
aiming at each current index block, accessing the index block according to the hash value of the current index block, and respectively calculating the distance from each vertex of the current index block to the surface of the target scene according to the relative camera pose when each frame of image is collected and the image depth value obtained by a depth camera;
and selecting the current index block with the vertex distance smaller than the corresponding distance threshold of the current stage voxel unit as a feature block.
7. The method of claim 1, wherein generating an iso-surface of the grid voxel model resulting in a three-dimensional reconstructed model of the target scene comprises:
and if the current frame image acquired by collecting the target scene is a key frame, generating an isosurface of each voxel block corresponding to the current key frame, and adding colors to the isosurface to obtain a three-dimensional reconstruction model of the target scene.
8. The method of claim 1, after generating an iso-surface of the grid voxel model to obtain a three-dimensional reconstructed model of the target scene, further comprising:
if the current frame image acquired by collecting the target scene is a key frame, selecting a first preset number of key frames matched with the current key frame from the historical key frames, and respectively acquiring a second preset number of non-key frames from the non-key frames corresponding to the selected matched key frames;
optimizing and updating the grid voxel model of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matching key frame and the acquired non-key frame;
and optimizing and updating the isosurface of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matched key frame.
9. The method of claim 8, wherein the optimizing the updating of the iso-surface of the three-dimensional reconstructed model according to the correspondence between the current keyframe and each matching keyframe comprises:
for each matching key frame, selecting at least one voxel block of which the distance to the surface of the target scene is less than or equal to an update threshold of a corresponding voxel in the matching key frame from all voxel blocks corresponding to the current key frame;
and performing optimization updating on the isosurface of the matched key frame according to the selected at least one voxel block.
10. The method of claim 9, wherein generating an iso-surface of the grid voxel model comprises:
and selecting the maximum value of the distance from each voxel block in the voxel to the surface of the target scene for each voxel in the keyframe used for generating the isosurface, and setting the maximum value as the updating threshold value of the voxel.
11. A three-dimensional reconstruction apparatus, comprising:
the image acquisition module is used for acquiring at least two frames of images acquired by the depth camera for acquiring a target scene;
the pose determining module is used for determining the relative camera pose during acquisition according to the at least two frames of images;
the voxel determining module is used for determining at least one characteristic voxel from each frame of image by adopting at least two levels of nested screening modes, wherein each level of screening adopts a corresponding voxel blocking rule;
the model generation module is used for carrying out fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene;
the three-dimensional reconstruction module is used for generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene;
wherein the voxel determination module comprises:
the initial determining unit is used for taking each frame of image as a current-level screening object and determining a current-level voxel unit;
the index block determining unit is used for dividing the current-level screening object into voxel blocks according to the current-level voxel unit and determining at least one current index block according to the voxel blocks; the current index block comprises a preset number of voxel blocks;
the characteristic block selecting unit is used for selecting at least one characteristic block, of which the distance to the surface of the target scene is smaller than the corresponding distance threshold of the current-level voxel unit, from all current index blocks;
a characteristic voxel determining unit, configured to take the characteristic block as a characteristic voxel if the characteristic block meets a partition condition of a minimum-level voxel unit;
the circulation unit is used for replacing all the feature blocks determined by the current-level screening object with a new current-level screening object, selecting the next-level voxel unit to be replaced with the new current-level voxel unit and returning to execute the voxel block division operation aiming at the current-level screening object if the feature blocks do not meet the division condition of the minimum-level voxel unit; wherein the voxel units are reduced step by step to the minimum level voxel unit.
12. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
at least one depth camera for image acquisition of a target scene;
when executed by the one or more processors, cause the one or more processors to implement the three-dimensional reconstruction method of any one of claims 1-10.
13. The device of claim 12, wherein the one or more processors are central processors; the electronic device is a portable mobile electronic device.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the three-dimensional reconstruction method according to any one of claims 1 to 10.
CN201810179264.6A 2018-03-05 2018-03-05 Three-dimensional reconstruction method, device, equipment and storage medium Active CN108537876B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201810179264.6A CN108537876B (en) 2018-03-05 2018-03-05 Three-dimensional reconstruction method, device, equipment and storage medium
PCT/CN2019/084820 WO2019170164A1 (en) 2018-03-05 2019-04-28 Depth camera-based three-dimensional reconstruction method and apparatus, device, and storage medium
US16/977,899 US20210110599A1 (en) 2018-03-05 2019-04-28 Depth camera-based three-dimensional reconstruction method and apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810179264.6A CN108537876B (en) 2018-03-05 2018-03-05 Three-dimensional reconstruction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108537876A CN108537876A (en) 2018-09-14
CN108537876B true CN108537876B (en) 2020-10-16

Family

ID=63486699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810179264.6A Active CN108537876B (en) 2018-03-05 2018-03-05 Three-dimensional reconstruction method, device, equipment and storage medium

Country Status (3)

Country Link
US (1) US20210110599A1 (en)
CN (1) CN108537876B (en)
WO (1) WO2019170164A1 (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019058487A1 (en) * 2017-09-21 2019-03-28 オリンパス株式会社 Three-dimensional reconstructed image processing device, three-dimensional reconstructed image processing method, and computer-readable storage medium having three-dimensional reconstructed image processing program stored thereon
CN108537876B (en) * 2018-03-05 2020-10-16 清华-伯克利深圳学院筹备办公室 Three-dimensional reconstruction method, device, equipment and storage medium
CN109377551B (en) * 2018-10-16 2023-06-27 北京旷视科技有限公司 Three-dimensional face reconstruction method and device and storage medium thereof
WO2020113417A1 (en) * 2018-12-04 2020-06-11 深圳市大疆创新科技有限公司 Three-dimensional reconstruction method and system for target scene, and unmanned aerial vehicle
CN109840940B (en) * 2019-02-11 2023-06-27 清华-伯克利深圳学院筹备办公室 Dynamic three-dimensional reconstruction method, device, equipment, medium and system
CN109993802B (en) * 2019-04-03 2020-12-25 浙江工业大学 Hybrid camera calibration method in urban environment
CN110064200B (en) * 2019-04-25 2022-02-22 腾讯科技(深圳)有限公司 Object construction method and device based on virtual environment and readable storage medium
CN110349253B (en) * 2019-07-01 2023-12-01 达闼机器人股份有限公司 Three-dimensional reconstruction method of scene, terminal and readable storage medium
CN112308904A (en) * 2019-07-29 2021-02-02 北京初速度科技有限公司 Vision-based drawing construction method and device and vehicle-mounted terminal
WO2021077279A1 (en) * 2019-10-22 2021-04-29 深圳市大疆创新科技有限公司 Image processing method and device, and imaging system and storage medium
CN112991427A (en) * 2019-12-02 2021-06-18 顺丰科技有限公司 Object volume measuring method, device, computer equipment and storage medium
CN111242847B (en) * 2020-01-10 2021-03-30 上海西井信息科技有限公司 Gateway-based image splicing method, system, equipment and storage medium
CN111310654B (en) * 2020-02-13 2023-09-08 北京百度网讯科技有限公司 Map element positioning method and device, electronic equipment and storage medium
CN111325741B (en) * 2020-03-02 2024-02-02 上海媒智科技有限公司 Item quantity estimation method, system and equipment based on depth image information processing
CN111598927B (en) * 2020-05-18 2023-08-01 京东方科技集团股份有限公司 Positioning reconstruction method and device
CN111627061B (en) * 2020-06-03 2023-07-11 如你所视(北京)科技有限公司 Pose detection method and device, electronic equipment and storage medium
CN112115980A (en) * 2020-08-25 2020-12-22 西北工业大学 Binocular vision odometer design method based on optical flow tracking and point line feature matching
CN112446951B (en) * 2020-11-06 2024-03-26 杭州易现先进科技有限公司 Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and computer storage medium
CN112419482B (en) * 2020-11-23 2023-12-01 太原理工大学 Three-dimensional reconstruction method for group pose of mine hydraulic support with depth point cloud fusion
CN112435206B (en) * 2020-11-24 2023-11-21 北京交通大学 Method for reconstructing three-dimensional information of object by using depth camera
CN112767538A (en) * 2021-01-11 2021-05-07 浙江商汤科技开发有限公司 Three-dimensional reconstruction and related interaction and measurement method, and related device and equipment
CN112750201B (en) * 2021-01-15 2024-03-29 浙江商汤科技开发有限公司 Three-dimensional reconstruction method, related device and equipment
CN113129348B (en) * 2021-03-31 2022-09-30 中国地质大学(武汉) Monocular vision-based three-dimensional reconstruction method for vehicle target in road scene
CN113409444B (en) * 2021-05-21 2023-07-11 北京达佳互联信息技术有限公司 Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and storage medium
CN113470180B (en) * 2021-05-25 2022-11-29 思看科技(杭州)股份有限公司 Three-dimensional mesh reconstruction method, device, electronic device and storage medium
CN113284176B (en) * 2021-06-04 2022-08-16 深圳积木易搭科技技术有限公司 Online matching optimization method combining geometry and texture and three-dimensional scanning system
CN113450457B (en) * 2021-08-31 2021-12-14 腾讯科技(深圳)有限公司 Road reconstruction method, apparatus, computer device and storage medium
US11830140B2 (en) * 2021-09-29 2023-11-28 Verizon Patent And Licensing Inc. Methods and systems for 3D modeling of an object by merging voxelized representations of the object
CN114241168A (en) * 2021-12-01 2022-03-25 歌尔光学科技有限公司 Display method, display device, and computer-readable storage medium
CN114393575B (en) * 2021-12-17 2024-04-02 重庆特斯联智慧科技股份有限公司 Robot control method and system based on high-efficiency recognition of user gestures
CN114255285B (en) * 2021-12-23 2023-07-18 奥格科技股份有限公司 Video and urban information model three-dimensional scene fusion method, system and storage medium
CN116704152B (en) * 2022-12-09 2024-04-19 荣耀终端有限公司 Image processing method and electronic device
CN116258817B (en) * 2023-02-16 2024-01-30 浙江大学 Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction
CN116363327B (en) * 2023-05-29 2023-08-22 北京道仪数慧科技有限公司 Voxel map generation method and system
CN116437063A (en) * 2023-06-15 2023-07-14 广州科伊斯数字技术有限公司 Three-dimensional image display system and method
CN117272758B (en) * 2023-11-20 2024-03-15 埃洛克航空科技(北京)有限公司 Depth estimation method, device, computer equipment and medium based on triangular grid
CN117496074B (en) * 2023-12-29 2024-03-22 中国人民解放军国防科技大学 Efficient three-dimensional scene reconstruction method suitable for rapid movement of camera


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157367B (en) * 2015-03-23 2019-03-08 联想(北京)有限公司 Method for reconstructing three-dimensional scene and equipment
US9892552B2 (en) * 2015-12-15 2018-02-13 Samsung Electronics Co., Ltd. Method and apparatus for creating 3-dimensional model using volumetric closest point approach
CN107194984A (en) * 2016-03-14 2017-09-22 武汉小狮科技有限公司 Mobile terminal real-time high-precision three-dimensional modeling method
US10319141B2 (en) * 2016-06-21 2019-06-11 Apple Inc. Method and system for vision based 3D reconstruction and object tracking
US10573018B2 (en) * 2016-07-13 2020-02-25 Intel Corporation Three dimensional scene reconstruction based on contextual analysis
CN107358629B (en) * 2017-07-07 2020-11-10 北京大学深圳研究生院 Indoor mapping and positioning method based on target identification
CN108537876B (en) * 2018-03-05 2020-10-16 清华-伯克利深圳学院筹备办公室 Three-dimensional reconstruction method, device, equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609942A (en) * 2011-01-31 2012-07-25 微软公司 Mobile camera localization using depth maps
CN105184784A (en) * 2015-08-28 2015-12-23 西交利物浦大学 Motion information-based method for monocular camera to acquire depth information
CN106504320A (en) * 2016-11-02 2017-03-15 华东师范大学 A kind of based on GPU and the real-time three-dimensional reconstructing method towards depth image
CN106803267A (en) * 2017-01-10 2017-06-06 西安电子科技大学 Indoor scene three-dimensional rebuilding method based on Kinect
CN106887037A (en) * 2017-01-23 2017-06-23 杭州蓝芯科技有限公司 A kind of indoor three-dimensional rebuilding method based on GPU and depth camera
CN106910242A (en) * 2017-01-23 2017-06-30 中国科学院自动化研究所 The method and system of indoor full scene three-dimensional reconstruction are carried out based on depth camera

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research on SLAM Method Based on ORB Key-Frame Loop Closure Detection Algorithm; Yu Jie; China Master's Theses Full-text Database, Information Science and Technology; 2017-05-15; pp. 49-73 *
Indoor Scene Reconstruction Based on an RGB-D Depth Camera; Mei Feng et al.; Journal of Image and Graphics; 2015-10-31; Vol. 20, No. 10; pp. 1366-1373 *
Mei Feng et al. Indoor Scene Reconstruction Based on an RGB-D Depth Camera. Journal of Image and Graphics. 2015, Vol. 20, No. 10 *

Also Published As

Publication number Publication date
US20210110599A1 (en) 2021-04-15
CN108537876A (en) 2018-09-14
WO2019170164A1 (en) 2019-09-12

Similar Documents

Publication Publication Date Title
CN108537876B (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN108898630B (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN109087359B (en) Pose determination method, pose determination apparatus, medium, and computing device
US10269148B2 (en) Real-time image undistortion for incremental 3D reconstruction
CN101996420B (en) Information processing device, information processing method and program
CN109658445A (en) Network training method, increment build drawing method, localization method, device and equipment
CN108898676B (en) Method and system for detecting collision and shielding between virtual and real objects
US20180315232A1 (en) Real-time incremental 3d reconstruction of sensor data
CN108701374B (en) Method and apparatus for three-dimensional point cloud reconstruction
CN110135455A (en) Image matching method, device and computer readable storage medium
US9747668B2 (en) Reconstruction of articulated objects from a moving camera
CN114332415B (en) Three-dimensional reconstruction method and device of power transmission line corridor based on multi-view technology
CN107329962B (en) Image retrieval database generation method, and method and device for enhancing reality
CN110097584B (en) Image registration method combining target detection and semantic segmentation
CN110853075A (en) Visual tracking positioning method based on dense point cloud and synthetic view
EP3326156B1 (en) Consistent tessellation via topology-aware surface tracking
CN112784873A (en) Semantic map construction method and equipment
CN112651944B (en) 3C component high-precision six-dimensional pose estimation method and system based on CAD model
CN108961385B (en) SLAM composition method and device
CN111192364A (en) Low-cost mobile multi-robot vision simultaneous positioning and map creating method
CN112200157A (en) Human body 3D posture recognition method and system for reducing image background interference
CN111402412A (en) Data acquisition method and device, equipment and storage medium
CN112183506A (en) Human body posture generation method and system
CN113223078A (en) Matching method and device of mark points, computer equipment and storage medium
CN114202632A (en) Grid linear structure recovery method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20221118

Address after: 518000 2nd floor, building a, Tsinghua campus, Shenzhen University Town, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen International Graduate School of Tsinghua University

Address before: 518055 Nanshan Zhiyuan 1001, Xue Yuan Avenue, Nanshan District, Shenzhen, Guangdong.

Patentee before: TSINGHUA-BERKELEY SHENZHEN INSTITUTE