CN108537876B - Three-dimensional reconstruction method, device, equipment and storage medium - Google Patents


Info

Publication number
CN108537876B
CN108537876B
Authority
CN
China
Prior art keywords: voxel, current, frame, image, target scene
Prior art date
Legal status
Active
Application number
CN201810179264.6A
Other languages
Chinese (zh)
Other versions
CN108537876A (en)
Inventor
方璐 (Lu Fang)
韩磊 (Lei Han)
苏卓 (Zhuo Su)
戴琼海 (Qionghai Dai)
Current Assignee
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Tsinghua-Berkeley Shenzhen Institute Preparation Office
Priority date
Filing date
Publication date
Application filed by Tsinghua-Berkeley Shenzhen Institute Preparation Office
Priority to CN201810179264.6A
Publication of CN108537876A
Priority to PCT/CN2019/084820
Priority to US16/977,899
Application granted
Publication of CN108537876B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G06T 7/55 Depth or shape recovery from multiple images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/04 Indexing scheme for image data processing or generation, in general involving 3D image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2200/00 Indexing scheme for image data processing or generation, in general
    • G06T 2200/08 Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20021 Dividing image into blocks, subimages or windows

Abstract

The embodiment of the invention discloses a three-dimensional reconstruction method, device, equipment and storage medium based on a depth camera, wherein the method comprises the following steps: acquiring at least two frames of images collected by a depth camera that collects a target scene; determining the relative camera pose at the time of collection according to the at least two frames of images; determining at least one characteristic voxel from each frame of image by adopting at least two levels of nested screening, wherein each level of screening adopts a corresponding voxel blocking rule; performing fusion calculation on the at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene; and generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene. The embodiment of the invention solves the problem of the large amount of computation required for three-dimensional reconstruction of a target scene and enables three-dimensional reconstruction to be applied to portable equipment, so that three-dimensional reconstruction is more widely applicable.

Description

Three-dimensional reconstruction method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to a three-dimensional reconstruction method, a three-dimensional reconstruction device, three-dimensional reconstruction equipment and a storage medium.
Background
The three-dimensional reconstruction is to reconstruct a mathematical model of a three-dimensional object in the real world through a specific device and algorithm, and has extremely important significance for virtual reality, augmented reality, robot perception, human-computer interaction, robot path planning and the like.
In current three-dimensional reconstruction methods, in order to ensure the quality, consistency and real-time performance of the reconstruction result, a high-performance Graphics Processing Unit (GPU) and a depth camera (RGB-D camera) are generally required to complete the reconstruction. Firstly, a depth camera shoots the target scene to obtain at least two frames of images; each frame of image is then solved by the GPU to acquire the relative camera pose of the depth camera when the frame was shot; all voxels in each frame of image are traversed according to the relative camera pose corresponding to the frame to determine the voxels meeting certain conditions as candidate voxels; further, a Truncated Signed Distance Function (TSDF) model of each frame of image is constructed from the candidate voxels in the frame; and finally, an isosurface is generated for each frame of image on the basis of the TSDF model, thereby completing the real-time reconstruction of the target scene.
However, the existing three-dimensional reconstruction method involves a large amount of computation and depends strongly on a GPU dedicated to image processing. Such a GPU is not portable, so the method is difficult to apply to mobile robots, portable devices, wearable devices (such as the augmented-reality head-mounted display Microsoft HoloLens), and the like.
Disclosure of Invention
The embodiment of the invention provides a three-dimensional reconstruction method, device, equipment and storage medium, which solve the problem of the large amount of computation required for three-dimensional reconstruction of a target scene and enable three-dimensional reconstruction to be applied to portable equipment, so that three-dimensional reconstruction is more widely applied.
In a first aspect, an embodiment of the present invention provides a three-dimensional reconstruction method, where the method includes:
acquiring at least two frames of images acquired by a depth camera for acquiring a target scene;
determining the relative camera pose during acquisition according to the at least two frames of images;
determining at least one characteristic voxel from each frame of image by adopting at least two-stage nested screening modes, wherein each stage of screening adopts a corresponding voxel blocking rule;
performing fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene;
and generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.
In a second aspect, an embodiment of the present invention further provides a three-dimensional reconstruction apparatus, where the apparatus includes:
the image acquisition module is used for acquiring at least two frames of images acquired by the depth camera for acquiring a target scene;
the pose determining module is used for determining the relative camera pose during acquisition according to the at least two frames of images;
the voxel determining module is used for determining at least one characteristic voxel from each frame of image by adopting at least two levels of nested screening modes, wherein each level of screening adopts a corresponding voxel blocking rule;
the model generation module is used for carrying out fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene;
and the three-dimensional reconstruction module is used for generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
storage means for storing one or more programs;
at least one depth camera for image acquisition of a target scene;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the three-dimensional reconstruction method as described in any embodiment of the invention.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the three-dimensional reconstruction method according to any embodiment of the present invention.
The method comprises the steps of obtaining a target scene image collected by a depth camera, determining the relative camera pose of the depth camera when the depth camera collects the target scene image, determining the characteristic voxels of each frame of image by adopting at least two stages of nested screening modes, carrying out fusion calculation to obtain a grid voxel model of the target scene, generating an isosurface of the grid voxel model, and obtaining a three-dimensional reconstruction model of the target scene. The problem of large computation amount when the target scene is subjected to three-dimensional reconstruction is solved, and the three-dimensional reconstruction is applied to portable equipment, so that the three-dimensional reconstruction is more widely applied.
Drawings
In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, a brief description is given below of the drawings used in describing the embodiments. It should be clear that the described figures are only views of some of the embodiments of the invention to be described, not all, and that for a person skilled in the art, other figures can be derived from these figures without inventive effort.
Fig. 1 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present invention;
FIG. 2 is a schematic cube diagram of a two-level nested screening method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for determining a relative camera pose during acquisition according to a second embodiment of the present invention;
FIG. 4 is a flowchart of a method for determining at least one characteristic voxel from an image according to a third embodiment of the present invention;
FIG. 5 is a schematic plan view of determining at least one characteristic voxel according to a third embodiment of the present invention;
fig. 6 is a flowchart of a three-dimensional reconstruction method according to a fourth embodiment of the present invention;
fig. 7 is a block diagram of a three-dimensional reconstruction apparatus according to a fifth embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a three-dimensional reconstruction method according to an embodiment of the present invention, where this embodiment is applicable to a case where a depth camera is used to perform three-dimensional reconstruction on a target scene, and the method may be executed by a three-dimensional reconstruction device or an electronic device, where the device may be implemented in a hardware and/or software manner, and the three-dimensional reconstruction method in fig. 1 is schematically described below with reference to a cube diagram of a two-stage nested screening manner in fig. 2, where the method includes:
s101, acquiring at least two frames of images acquired by a depth camera for acquiring a target scene.
A depth camera differs from a traditional camera in that it can capture the image of a scene and the corresponding depth information of the scene at the same time. Its design principle is to emit a reference beam toward the target scene to be measured and to convert the time difference or phase difference of the returned light into the distance of the photographed scene, thereby generating the depth information; the image information is obtained in addition by shooting with a conventional camera. The target scene is the scene to be three-dimensionally reconstructed; for example, when an autonomously driven automobile runs on a highway, the target scene is the driving-environment scene of the automobile, and images of the driving environment are acquired in real time by the depth camera. Specifically, in order to accurately perform three-dimensional reconstruction of the target scene, at least two frames of images collected by the depth camera are acquired and processed; the more frames are acquired, the more accurate the reconstructed model of the target scene. There are many ways to obtain the images acquired by the depth camera; for example, they may be obtained in a wired manner through a serial port, a network cable or the like, or in a wireless manner through Bluetooth, wireless broadband or the like.
And S102, determining the relative camera pose during acquisition according to at least two frames of images.
The pose of the camera refers to the position and attitude of the camera; specifically, the position represents the translation of the camera (for example, translation along the X, Y and Z directions), and the attitude represents the rotation of the camera (for example, the rotation angles α, β, γ about the X, Y and Z directions).
Because the field angle of the depth camera is fixed and the shooting angle is also fixed, in order to accurately reconstruct the target scene, the pose of the depth camera needs to be changed, and the target scene can be accurately reconstructed by shooting from different positions and angles. Therefore, the relative position and posture of the depth camera are different when each frame of image is shot, and can be represented by the relative posture of the depth camera, for example, the depth camera can automatically change the position and posture according to a certain track, or the depth camera can be manually rotated and moved to shoot. Therefore, the relative camera pose when each frame of image is acquired is determined, and the frame of image is accurately reconstructed to the position corresponding to the target scene.
Specifically, there are many methods for determining the pose of the depth camera, and for example, the pose of the camera can be directly acquired by installing sensors for measuring the translation distance and the rotation angle on the depth camera. Because the relative pose of the depth camera is not changed greatly when the two adjacent frames of images are collected, in order to acquire the relative camera pose more accurately, the relative pose of the camera when the camera collects the frame of image can be determined by processing the collected images.
S103, aiming at each frame of image, at least one characteristic voxel is determined from the image by adopting at least two levels of nested screening modes, wherein each level of screening adopts a corresponding voxel blocking rule.
When the three-dimensional reconstruction of the target scene is performed, the embodiment of the present invention divides the reconstructed target scene into grid-shaped voxel blocks (fig. 2 is a partial grid-shaped voxel block of the reconstructed target scene), and corresponding to the corresponding position of each frame of image, each frame of image can be divided into planar voxel grids. The image acquired by the depth camera includes characteristic voxels and non-characteristic voxels when the target scene is reconstructed in three dimensions, for example, when the scene is reconstructed in a driving environment of an automobile, pedestrians, vehicles and the like in the image are characteristic voxels, and blue sky white clouds at a far distance are non-characteristic voxels. Therefore, voxels in each acquired frame of image are screened to find characteristic voxels when the target scene is reconstructed in three dimensions. The characteristic voxel may be composed of one voxel block, or may be composed of a preset number of voxel blocks.
If every voxel cell in each frame of image were judged one by one as to whether it is a characteristic voxel, the amount of computation would be large. Preferably, at least one characteristic voxel can therefore be determined from the image by at least two levels of nested screening based on a voxel blocking rule. Specifically, the voxel blocking rule may be to set at least two levels of voxel units; at each level, the object being screened is divided into at least two index blocks according to the voxel unit corresponding to that level, and the index blocks are screened level by level.
Exemplarily, a two-level nested filtering manner is described as an example in conjunction with fig. 2, and it is assumed that two levels of voxel units corresponding to the two-level nested filtering are voxel units of 20mm and 5mm, specifically:
(1) a grid voxel of a target scene corresponding to one frame of image is divided into a plurality of first index blocks according to a voxel unit of 20mm (a cube 20 in fig. 2 is a first index block divided by a voxel unit of 20 mm).
(2) All the divided first index blocks are screened at the first level to judge whether they contain characteristic voxels; a first index block (cube 20) that contains no characteristic voxel is removed, and one that does is selected as a feature block.
(3) Assuming that the cube 20 in fig. 2 includes a feature voxel, the selected feature block (cube 20) is further divided according to a 5mm voxel unit, and each feature block (cube 20) may be divided into 4 × 4 × 4 second index blocks (the cube 21 in fig. 2 is a second index block divided by a 5mm voxel unit).
(4) All the divided second index blocks (cubes 21) are screened at the second level to judge whether they contain characteristic voxels; a second index block (cube 21) that contains no characteristic voxel is removed, and one that does is selected as a characteristic voxel.
If more than two levels of nested screening are performed, then except for the first level, in which the whole frame image is divided into a plurality of index blocks for screening, each subsequent level divides the feature blocks containing characteristic voxels selected at the previous level into a plurality of index blocks according to the voxel unit of that level and judges whether they contain characteristic voxels, until the nested screening with the last-level voxel unit is completed. For example, in three-level nested screening, after the two-level screening operations above have been performed, the screening with the third-level voxel unit has not yet been done, so all the second index blocks (cubes 21) containing characteristic voxels obtained in step (4) of the two-level nested screening are taken as the objects to be divided in the third-level screening, divided into a plurality of index blocks according to the third-level voxel unit, and judged as to whether they contain characteristic voxels.
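For illustration, the following Python sketch mirrors the nested screening described above: a coarse pass over 20 mm index blocks followed by a fine pass over 5 mm blocks, keeping only blocks for which a caller-supplied contains_feature predicate reports possible surface content. The function names, the generic list of levels and the predicate interface are assumptions made for this sketch and are not the patent's implementation.

```python
import numpy as np

def nested_screening(volume_origin, volume_size, levels, contains_feature):
    """Multi-level nested screening of voxel blocks, coarse to fine.

    volume_origin    : (3,) lower corner of the reconstruction volume, in metres
    volume_size      : edge length of the (cubic) volume, in metres
    levels           : block edge lengths from coarse to fine, e.g. [0.020, 0.005]
    contains_feature : callable(origin, size) -> bool, True if the block may
                       contain part of the target scene surface
    Returns the origins of the finest-level blocks kept as characteristic voxels.
    """
    volume_origin = np.asarray(volume_origin, dtype=float)

    # First level: divide the whole frame's volume into coarse index blocks.
    n = int(np.ceil(volume_size / levels[0]))
    candidates = []
    for idx in np.ndindex(n, n, n):
        origin = volume_origin + np.array(idx) * levels[0]
        if contains_feature(origin, levels[0]):
            candidates.append(origin)

    # Each further level only subdivides the blocks that survived the previous level.
    for coarse, fine in zip(levels[:-1], levels[1:]):
        sub = int(round(coarse / fine))              # e.g. 20 mm / 5 mm -> 4 per axis
        next_candidates = []
        for origin in candidates:
            for idx in np.ndindex(sub, sub, sub):
                child = origin + np.array(idx) * fine
                if contains_feature(child, fine):
                    next_candidates.append(child)
        candidates = next_candidates
    return candidates
```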
And S104, performing fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene.
After at least one characteristic voxel corresponding to the image is determined in S103, in order to obtain a grid voxel model of the target scene, the determined at least one characteristic voxel is subjected to fusion calculation to obtain the grid voxel model of the target scene in combination with a relative camera pose when the depth camera acquires the frame of image. Each voxel in the grid voxel model stores the distance from the surface of the target scene and weight information representing the observation uncertainty.
Optionally, the grid voxel model in this embodiment may be a TSDF model. Specifically, as shown in fig. 2, assuming that the cube 21 is a characteristic voxel selected by the multi-level nested screening, each characteristic voxel in each frame of image is fused according to the formula

tsdf_avg = (w_{i-1} · tsdf_{i-1} + w_i · tsdf_i) / (w_{i-1} + w_i)

to obtain the TSDF model of the target scene, where tsdf_avg is the fusion result of the current characteristic voxel, tsdf_{i-1} is the distance from the previous characteristic voxel to the surface of the target scene, w_{i-1} is the weight information of the previous characteristic voxel, tsdf_i is the distance from the current characteristic voxel to the surface of the target scene, and w_i is the weight information of the current characteristic voxel.
Optionally, when the characteristic voxels are screened in S103, in order to improve the screening rate, each screened characteristic voxel may consist of a preset number of voxel blocks of the corresponding voxel unit (for example, a characteristic voxel may be formed by 8 × 8 × 8 voxel blocks). In that case, during the fusion calculation, the voxel blocks within each characteristic voxel may be fused in groups of a certain size; for example, the 8 × 8 × 8 voxel blocks in a characteristic voxel may be fused taking 2 × 2 × 2 voxel blocks as one fusion object (i.e., one voxel).
Optionally, the feature voxels selected in S103 may be subjected to fusion computation in parallel, so as to improve the fusion rate of the grid voxel model of the target scene.
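A minimal sketch of the weighted running average described by the formula above, written for a whole block of voxels at once; the weight cap and the numerical floor on the denominator are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def fuse_tsdf(tsdf_prev, w_prev, tsdf_new, w_new, max_weight=64.0):
    """Fuse a new observation into stored TSDF values with
    tsdf_avg = (w_prev * tsdf_prev + w_new * tsdf_new) / (w_prev + w_new)."""
    w_sum = w_prev + w_new
    tsdf_avg = (w_prev * tsdf_prev + w_new * tsdf_new) / np.maximum(w_sum, 1e-6)
    return tsdf_avg, np.minimum(w_sum, max_weight)

# Example: fuse one observation into an 8 x 8 x 8 block of voxel blocks,
# matching the grouping of voxel blocks into one characteristic voxel.
block_tsdf = np.ones((8, 8, 8), dtype=np.float32)    # initialised far from the surface
block_w = np.zeros((8, 8, 8), dtype=np.float32)
obs_tsdf = np.random.uniform(-1.0, 1.0, (8, 8, 8)).astype(np.float32)
block_tsdf, block_w = fuse_tsdf(block_tsdf, block_w, obs_tsdf, w_new=1.0)
```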
And S105, generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.
The grid voxel model of the target scene obtained in S104 is a model of distances from the characteristic voxels to the surface of the target scene, and an isosurface needs to be generated on the basis of the grid voxel model to obtain the three-dimensional reconstruction model of the target scene. For example, isosurface generation (i.e., generating the triangular patches representing the model surface), tri-linear interpolation, color extraction and addition, and normal vector extraction may be performed using the Marching Cubes algorithm, thereby obtaining the three-dimensional reconstruction model of the target scene.
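To illustrate this step, the sketch below extracts the zero level set of a TSDF grid with the marching-cubes routine from scikit-image; it is only an illustrative stand-in for the Marching Cubes processing described above, and the grid size, the 5 mm voxel spacing and the level value 0.0 (the TSDF zero crossing) are assumptions of the sketch.

```python
import numpy as np
from skimage import measure

# tsdf: a dense grid of truncated signed distances, e.g. produced by the fusion step.
tsdf = np.random.uniform(-1.0, 1.0, size=(64, 64, 64)).astype(np.float32)

# Extract the zero level set (the reconstructed surface) as a triangle mesh.
verts, faces, normals, values = measure.marching_cubes(
    tsdf, level=0.0, spacing=(0.005, 0.005, 0.005))
print(verts.shape, faces.shape)   # mesh vertices in metres and triangle indices
```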
When the depth camera acquires images of the target scene, most of the scene in two adjacent frames overlaps. In order to increase the generation rate of the three-dimensional reconstruction model, optionally, generating the isosurface of the grid voxel model may include: if the current frame image obtained by collecting the target scene is a key frame, generating the isosurface of each voxel block corresponding to the current key frame, and adding colors to the isosurface to obtain the three-dimensional reconstruction model of the target scene.
A key frame is determined by evaluating the feature-point similarity between frames of images collected by the depth camera; specifically, one key frame may be set for several consecutive frames with high similarity. When generating the isosurface, only the key frames are processed, and the isosurface of the voxel blocks corresponding to each key frame image is generated. The model obtained in this way carries no color information, which makes it difficult to identify the individual objects in the image; for example, if the reconstructed target scene is the driving environment of an automobile, the pedestrians, vehicles and roads in the generated isosurface model are merged together, and one cannot distinguish which part is a pedestrian and which part is a vehicle. Therefore, colors are added to the generated isosurface according to the color information in each frame of image, so that each object in the three-dimensional reconstruction model of the target scene can be clearly identified.
It should be noted that the three-dimensional reconstruction process is a real-time dynamic process, and with the acquisition of images by a camera, the relative camera pose at the time of acquiring each frame of image is determined in real time, and the determination of the characteristic voxels and the generation of the grid voxel model and the isosurface thereof are performed for the corresponding images.
The embodiment provides a three-dimensional reconstruction method, which includes the steps of obtaining a target scene image collected by a depth camera, determining a relative camera pose of the depth camera when the depth camera collects the target scene image, determining a characteristic voxel of each frame of image by adopting at least two stages of nested screening modes, performing fusion calculation to obtain a grid voxel model of a target scene, generating an isosurface of the grid voxel model, and obtaining a three-dimensional reconstruction model of the target scene. In the fusion calculation stage, the characteristic voxels of each frame of image are determined by adopting at least two stages of nested screening modes, the voxels do not need to be traversed one by one, the calculated amount is reduced, the reconstruction precision is ensured, the fusion speed is greatly increased, and the three-dimensional reconstruction efficiency can be further improved. The problem of large computation amount when the target scene is subjected to three-dimensional reconstruction is solved, and the three-dimensional reconstruction is applied to portable equipment, so that the three-dimensional reconstruction is more widely applied.
Example two
On the basis of the above embodiments, the present embodiment further optimizes the determination of the relative camera pose at the time of acquisition according to at least two frames of images in S102. Fig. 3 is a flowchart of a method for determining a relative camera pose during acquisition according to a second embodiment of the present invention, and as shown in fig. 3, the method includes:
s301, extracting the features of each frame of image to obtain at least one feature point of each frame of image.
Feature extraction is performed on the image to find pixel points with landmark characteristics in the frame image (i.e., feature points); for example, they may be corners, textures and edges in the image. The feature extraction for each frame of image may use the Oriented FAST and Rotated BRIEF (ORB) algorithm to find at least one feature point in the frame.
And S302, performing matching operation on each feature point between two adjacent frames of images to obtain the corresponding relationship of the feature points between the two adjacent frames of images.
When the target scene is subjected to image acquisition, most contents of two adjacent frames of images are the same, so that a certain corresponding relationship also exists between the corresponding feature points of the two frames of images. Optionally, a fast search method (sparse matching algorithm) may be used to compare hamming distances between feature points of two adjacent frames of images to obtain a feature point corresponding relationship between two adjacent frames of images.
Specifically, taking one feature point correspondence between two adjacent frames of images as an example, assume that the feature points X1 and X2 representing the same texture feature are located at different positions in the two frames, and let H(X1, X2) denote the Hamming distance between X1 and X2: an exclusive-or operation is performed on the two feature descriptors, and the number of 1 bits in the result is taken as the Hamming distance (i.e., the feature point correspondence measure) of that feature point between the two adjacent frames.
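The XOR-and-count computation can be sketched as follows for ORB-style 256-bit binary descriptors stored as 32-byte rows; the brute-force nearest-neighbour loop and the distance threshold are assumptions of the sketch and do not reproduce the fast search method mentioned above.

```python
import numpy as np

def hamming(d1, d2):
    """Hamming distance between two binary descriptors (uint8 arrays):
    XOR the descriptors and count the 1 bits in the result."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

def match_descriptors(desc_a, desc_b, max_dist=64):
    """Brute-force nearest-neighbour matching of binary descriptors.
    desc_a, desc_b: (n, 32) uint8 arrays; returns (index_a, index_b) pairs."""
    matches = []
    for ia, da in enumerate(desc_a):
        dists = [hamming(da, db) for db in desc_b]
        ib = int(np.argmin(dists))
        if dists[ib] <= max_dist:        # threshold is an illustrative assumption
            matches.append((ia, ib))
    return matches
```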
S303, removing the abnormal correspondences from the feature point correspondences, calculating the nonlinear term in J(ξ)^T J(ξ) through a linear component containing the second-order statistics of the remaining feature point correspondences and a nonlinear component containing the relative camera pose, performing multiple iterative calculations of Δξ = −(J(ξ)^T J(ξ))^(-1) J(ξ)^T r(ξ), and solving for the relative camera pose at which the reprojection error is smaller than the preset error threshold, i.e. calculating the pose that minimizes the reprojection error.
Here r(ξ) represents the vector containing all reprojection errors; J(ξ) is the Jacobian matrix of r(ξ); ξ represents the Lie algebra of the relative camera pose, and Δξ represents its increment at each iteration; R_i represents the rotation matrix of the camera when the i-th frame image is acquired; R_j represents the rotation matrix of the camera when the j-th frame image is acquired; p_i^k represents the k-th feature point on the i-th frame image; p_j^k represents the k-th feature point on the j-th frame image; C_{i,j} represents the set of feature point correspondences between the i-th frame image and the j-th frame image; ||C_{i,j}|| represents the number of feature point correspondences between the i-th frame image and the j-th frame image, i.e. the norm of C_{i,j}; and [·]_× represents the skew-symmetric (vector cross-product) matrix of a vector.
Further, the expression of the nonlinear term is given as formula (1) (rendered as an image in the original publication). It consists of a linear component and nonlinear components: the linear component (denoted W below) is the second-order statistic of the structure terms, while r_il^T and r_jl are the nonlinear components, where r_il^T is the l-th row of the rotation matrix R_i and r_jl is the transpose of the l-th row of the rotation matrix R_j, l = 0, 1, 2 (this embodiment follows the programming convention of counting from 0, so l = 0 denotes the so-called 1st row of the matrix, and so on).
Specifically, some of the feature point correspondences between two adjacent frames of images obtained in S302 are abnormal correspondences; for example, each of two adjacent frames necessarily contains feature points that do not appear in the other frame, and performing the matching operation of S302 on them produces abnormal correspondences. Optionally, a Random Sample Consensus (RANSAC) algorithm may be used to remove the abnormal correspondences, and the remaining feature point correspondences may be represented as the set C_{i,j} = {(p_i^k, p_j^k)}, where (p_i^k, p_j^k) represents the correspondence of the k-th feature point between the i-th frame image and the j-th frame image, and j = i − 1.
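A sketch of RANSAC-style removal of abnormal correspondences for matched 3D points recovered from the depth images: a rigid transform is fitted to random minimal sets and the correspondences consistent with the best transform are kept. The Kabsch fitting step, the iteration count and the inlier threshold are assumptions of this sketch rather than details taken from the patent.

```python
import numpy as np

def fit_rigid(P, Q):
    """Least-squares rotation and translation (Kabsch) mapping points P onto Q, both (n, 3)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_inliers(P, Q, iters=200, thresh=0.02, seed=0):
    """Keep the correspondences consistent with the best rigid transform found by RANSAC.
    P, Q: (n, 3) matched 3D points from two frames; thresh is in metres (assumed value)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(P), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(P), size=3, replace=False)   # minimal set for a rigid transform
        R, t = fit_rigid(P[idx], Q[idx])
        err = np.linalg.norm(P @ R.T + t - Q, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return best
```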
When the relative camera pose is determined, certain errors are inevitably produced, so determining the camera pose amounts to solving a nonlinear least-squares problem between two frames of images with the following cost function:

E = Σ_i Σ_{(p_i^k, p_j^k) ∈ C_{i,j}} || T_i p̃_i^k − T_j p̃_j^k ||^2

where E represents the reprojection error, in Euclidean space, of the i-th frame image compared with the j-th frame image (the previous frame image in this embodiment); T_i represents the pose of the camera when the i-th frame image is acquired (as explained above, the camera pose actually refers to the change of the acquired i-th frame image relative to the previous frame image), and T_j represents the pose of the camera when the j-th frame image is acquired; N represents the total number of frames collected by the camera; p̃_i^k represents the homogeneous coordinates of the k-th feature point p_i^k on the i-th frame image, and p̃_j^k represents the homogeneous coordinates of the k-th feature point p_j^k on the j-th frame image. It should be noted that, for the same values of i and k, p_i^k and p̃_i^k represent the same point; the difference is that p_i^k is the local coordinate and p̃_i^k is the homogeneous coordinate.
Specifically, in order to increase the operation rate when determining the relative camera pose, the cost function above is not evaluated directly. Instead, the nonlinear term in J(ξ)^T J(ξ) is calculated from a linear component containing the second-order statistics of the remaining feature point correspondences and a nonlinear component containing the relative camera pose, and the iteration Δξ = −(J(ξ)^T J(ξ))^(-1) J(ξ)^T r(ξ) is performed repeatedly to solve for the relative camera pose at which the reprojection error is smaller than the preset error threshold. By using the expression of the nonlinear term in formula (1), the linear part that is fixed between the two frame images is treated as a whole W during the calculation of the nonlinear term and does not need to be recomputed for each feature point correspondence, thereby reducing the complexity of the relative camera pose determination algorithm and enhancing the real-time performance of the relative camera pose calculation.
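For reference, the sketch below shows the plain Gauss-Newton iteration Δξ = −(J^T J)^(-1) J^T r on 3D point correspondences, without the second-order-statistics acceleration described above; the point-to-point residual, the left-perturbation Jacobian and the fixed iteration count are assumptions of the sketch.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def skew(v):
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def gauss_newton_relative_pose(P, Q, iters=10):
    """Refine the rigid transform (R, t) aligning points P onto Q by iterating
    delta = -(J^T J)^(-1) J^T r with residual r = R p + t - q per correspondence."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        JTJ, JTr = np.zeros((6, 6)), np.zeros(6)
        for p, q in zip(P, Q):
            rp = R @ p
            r = rp + t - q                   # 3-vector residual
            J = np.zeros((3, 6))
            J[:, :3] = -skew(rp)             # derivative w.r.t. the rotation increment
            J[:, 3:] = np.eye(3)             # derivative w.r.t. the translation increment
            JTJ += J.T @ J
            JTr += J.T @ r
        delta = -np.linalg.solve(JTJ, JTr)   # [delta_rotation, delta_translation]
        R = Rotation.from_rotvec(delta[:3]).as_matrix() @ R
        t = t + delta[3:]
    return R, t
```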
The derivation of formula (1) is described below, and the derivation is then used to analyse how the complexity of the algorithm is reduced.
In Euclidean space, the camera pose when the camera acquires the i-th frame image is T_i = [R_i | t_i]; T_i actually refers to the pose transformation matrix between the i-th frame image and the j-th frame image (the previous frame image in this embodiment), comprising the rotation matrix R_i and the translation vector t_i. The rigid transformation T_i in Euclidean space is represented by the Lie algebra ξ_i of the SE(3) space; ξ_i likewise represents the camera pose when the camera acquires the i-th frame image, and T(ξ_i) maps the Lie algebra ξ_i to T_i in Euclidean space.
For each feature point correspondence (p_i^k, p_j^k), the reprojection error is

r_{i,j}^k(ξ) = T(ξ_i) p̃_i^k − T(ξ_j) p̃_j^k.

The reprojection error in Euclidean space in the cost function above can thus be expressed as E(ξ) = ||r(ξ)||^2, where r(ξ) represents the vector formed by stacking all the reprojection errors r_{i,j}^k(ξ).
r_{i,j}^k can be expressed component-wise as (for simplicity of presentation, ξ_i is omitted below)

(r_{i,j}^k)_l = r_il^T p_i^k + t_il − r_jl^T p_j^k − t_jl,  l = 0, 1, 2,    (5)

where r_il^T represents the l-th row of the rotation matrix R_i, and t_il represents the l-th element of the translation vector t_i.
The Jacobian J(ξ) of r(ξ) is assembled from the Jacobian blocks of all the correspondences (formula (6) in the original, rendered there as an image), where J_{i,j}^m represents the Jacobian matrix corresponding to the m-th feature point correspondence between the i-th frame image and the j-th frame image.
The product J(ξ)^T J(ξ) is accumulated from the per-correspondence terms (J_{i,j}^m)^T J_{i,j}^m, where (J_{i,j}^m)^T represents the transpose of the matrix J_{i,j}^m; the expression of J_{i,j}^m is given by formula (7) in the original (rendered there as an image), in which I_{3×3} represents the 3 × 3 identity matrix. According to formulas (6) and (7), the contribution of a frame pair to J(ξ)^T J(ξ) has four non-zero 6 × 6 sub-matrices, given by formula (8). The following takes one of them, formula (9), as an example; the other three non-zero sub-matrices are calculated similarly and are not described again.
Combining formula (9) with formula (5) yields formula (10). Denoting the fixed sum over the structure terms as W and again using formula (5), the nonlinear term in formula (10) can be simplified into formula (1); that is, the structure terms inside the nonlinear term are linearized as W. Although the expression is nonlinear with respect to the structure terms p_i^k and p_j^k, the above analysis shows that all of its non-zero elements depend linearly on the second-order statistics of the structure terms in C_{i,j}; in other words, the sparse matrix J(ξ)^T J(ξ) is element-wise linear in the second-order statistics of the structure terms in C_{i,j}.
It should be noted that the Jacobian matrix of each correspondence (p_i^k, p_j^k) is determined by the geometric terms ξ_i, ξ_j and the structure terms p_i^k, p_j^k. For all the correspondences in the same frame pair C_{i,j}, the corresponding Jacobian matrices share the same geometric terms but have different structure terms. When computing the contribution of one frame pair C_{i,j} to J(ξ)^T J(ξ), existing algorithms depend on the number of feature point correspondences in C_{i,j}, whereas this embodiment performs the computation efficiently with fixed complexity: only the second-order statistic W of the structure terms needs to be calculated, and the structure terms of the individual correspondences do not need to enter the calculation separately, i.e. the four non-zero sub-matrices can be computed with complexity O(1) instead of O(||C_{i,j}||).
Thus, the sparse matrices J^T J and J^T r required in the iterative step Δξ = −(J(ξ)^T J(ξ))^(-1) J(ξ)^T r(ξ) of the nonlinear Gauss-Newton optimization can be computed efficiently with complexity O(M) instead of the original complexity O(N_coor), where N_coor represents the number of feature point correspondences and M represents the number of frame pairs. In general, N_coor is about 300 for sparse matching and about 10000 for dense matching, which is much larger than the number of frame pairs M.
Through the above derivation, in the camera pose calculation process W is calculated for each frame pair, and then formulas (1), (10), (9), (8) and (6) are evaluated to obtain J(ξ)^T J(ξ); further, the ξ that minimizes r(ξ) can be found by iterative calculation.
S304, judging whether the current frame image obtained by collecting the target scene is a key frame, if so, executing S305, otherwise, waiting for the next frame image to execute S304 again.
The step of judging whether the current frame image obtained by collecting the target scene is a key frame may be: matching operation is carried out on a current frame image obtained by collecting a target scene and a previous key frame image to obtain a conversion relation matrix between the two frame images; and if the conversion relation matrix is greater than or equal to the preset conversion threshold value, determining the current frame image as the current key frame.
Specifically, similar to the method for determining the feature point correspondence between two adjacent frames of images in S302, matching operation may be performed on the current frame of image and the previous key frame to obtain a feature point correspondence matrix between the two frames of images, and when the matrix is greater than or equal to a preset conversion threshold, the current image is determined to be the current key frame. The conversion relation matrix between the two frames of images can be a matrix formed by corresponding relations of all characteristic points between the two frames of images.
It should be noted that the first frame image obtained by acquiring the target scene may be set as the first key frame, and the preset conversion threshold is set in advance according to the motion condition of the depth camera when acquiring the image, for example, if the pose change is large when the camera shoots two adjacent frames of images, the preset conversion threshold is set to be larger.
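One possible concrete form of this test is sketched below, with the magnitudes of the relative translation and rotation standing in for the comparison of the conversion relation matrix against the preset conversion threshold; the 10 cm and 15 degree thresholds are illustrative assumptions.

```python
import numpy as np

def is_keyframe(T_cur, T_last_key, trans_thresh=0.10, rot_thresh_deg=15.0):
    """Decide whether the current frame becomes a new key frame by how far the
    camera has moved since the previous key frame.
    T_cur, T_last_key: 4 x 4 camera-to-world pose matrices."""
    T_rel = np.linalg.inv(T_last_key) @ T_cur
    trans = np.linalg.norm(T_rel[:3, 3])
    cos_angle = np.clip((np.trace(T_rel[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_angle))
    return trans >= trans_thresh or angle >= rot_thresh_deg
```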
S305, loop detection is carried out according to the current key frame and the historical key frame; and if the loop is successful, performing globally consistent optimization updating on the determined relative camera pose according to the current key frame.
The globally consistent optimization updating refers to that in the process of reconstruction, a reconstruction algorithm continuously expands a three-dimensional reconstruction model of a target scene along with the movement of a camera, and when a depth camera moves to a place where the depth camera arrives once or has large overlap with a historical view angle, the expanded three-dimensional reconstruction model and a generated model are consistent or optimized and updated into a new model together, instead of phenomena of staggering, aliasing and the like. The loop detection is to determine whether the camera has moved to a place that has been reached or a place that has a large overlap with the historical viewing angle according to the current observation of the depth camera, and to optimize and reduce the accumulated error.
In order to improve the optimization rate, if the loop detection of the previous key frame and the historical key frame is successful (namely, the depth camera moves to a place which is reached once or a place which has larger overlap with the historical view angle), the generated model is optimized and updated in a global consistent manner through the current key frame and the historical key frame, and the error of the three-dimensional reconstruction model is reduced; and if the loop detection is unsuccessful, waiting for the occurrence of the next key frame, and performing loop detection on the next key frame. Specifically, the loop detection of the current key frame and the historical key frame may be performed by performing matching operation on feature points of the current key frame and the historical key frame, and if the matching degree is high, the loop detection is successful.
Optionally, the globally consistent optimization update of the relative camera poses solves the correspondences between the current key frame and one or more historical key frames with high matching degree; this is a problem of minimizing, as the cost function, the conversion error between the current key frame and all the historical key frames with high matching degree. Here E(T_1, T_2, ···, T_{N−1} | T_i ∈ SE(3), i ∈ [1, N−1]) represents the conversion error of all frame pairs (any historical matching key frame together with the current key frame forms a frame pair); N is the number of historical key frames with high matching degree to the current key frame; and E_{i,j} represents the conversion error between the i-th frame and the j-th frame, the conversion error being the reprojection error.
Specifically, in the process of performing the relative camera pose updating optimization, the relative poses of the non-key frames and the key frames corresponding to the non-key frames need to be kept unchanged, and the specific optimization updating algorithm uses the existing BA algorithm, or uses the method in S303, which is not described in detail.
The method for determining the relative camera pose during acquisition provided by this embodiment extracts at least one feature point from each frame of image, performs a matching operation on the feature points between two adjacent frames of images to obtain the feature point correspondences between them, removes the abnormal correspondences, calculates the relative camera pose through a linear component containing the remaining feature point correspondences and a nonlinear component containing the relative camera pose, and determines the key frames; if the currently acquired image is a key frame and loop detection succeeds, a globally consistent optimization update of the determined relative camera poses is performed according to the current key frame and the historical key frames. This guarantees global consistency while reducing the amount of computation in three-dimensional reconstruction, enables three-dimensional reconstruction to be applied to portable equipment, and makes its applications broader.
EXAMPLE III
Based on the above embodiments, the present embodiment further explains that, in S103, at least one characteristic voxel is determined from each image by using at least two levels of nested filtering manners. The method of fig. 4 for determining at least one characteristic voxel from an image is schematically described below in conjunction with the schematic plan view of fig. 5 for determining at least one characteristic voxel, the method comprising:
s401, regarding each frame of image, taking the image as a current-level screening object, and determining a current-level voxel unit.
The voxel unit represents the accuracy of the constructed three-dimensional reconstruction model, and is set in advance according to the accuracy of the three-dimensional reconstruction model of the target scene to be reconstructed, and may be, for example, 5mm or 10 mm. Since the embodiment determines at least one characteristic voxel from the image by at least two levels of nested screening, at least two levels of voxel units are set, where the minimum level of voxel unit is the accuracy required to reconstruct the model. Firstly, the acquired image is used as a current screening object to screen the characteristic voxels, and the current voxel unit is the largest-level voxel unit in the preset multi-level voxel units.
Illustratively, as shown in fig. 5, it is assumed that real-time three-dimensional reconstruction of a CPU-based 100Hz frame rate, 5mm voxel-level precision model is to be achieved, and two-level nested screening of feature voxels is performed in 20mm voxel units and 5mm voxel units, respectively. In this case, the acquired image is used as the current screening object, and the current level voxel unit is a voxel unit of 20 mm.
S402, dividing the current-level screening object into voxel blocks according to the current-level voxel unit, and determining at least one current index block according to the voxel blocks; wherein, the current index block comprises a preset number of voxel blocks.
In order to improve the screening rate, when screening the current-level screening object, at least one index block can be determined according to the preset number of voxel blocks divided according to the current voxel unit, and characteristic voxels are screened according to the index blocks. Note that the characteristic voxel size in this case is not the size of one voxel block, but the size of a predetermined number of voxel blocks.
For example, as shown in fig. 5, assume that each current index block consists of a preset number of 8 × 8 × 8 voxel blocks. The acquired image is divided into a plurality of voxel blocks with a 20 mm side length according to the 20 mm voxel unit, and the divided voxel blocks are grouped, 8 × 8 × 8 at a time, into at least one index block with a 160 mm side length corresponding to the 20 mm voxel unit; mapped onto the planar schematic diagram, the whole image is divided into 6 index blocks with a 160 mm side length corresponding to the 20 mm voxel unit, each covering an 8 × 8 arrangement of voxel cells.
And S403, selecting at least one feature block with the distance to the surface of the target scene smaller than the corresponding distance threshold of the current-level voxel unit from all the current index blocks.
The distances between all current index blocks determined in S402 and the surface of the target scene are calculated, and the smaller the distance is, the closer the distance between the index block and the surface of the target scene is, a distance threshold is preset for each level of voxel unit, and when the distance between the index block and the surface of the target scene is smaller than the distance threshold corresponding to the current level of voxel unit, the index block is selected as a feature block. Wherein the distance threshold corresponding to the upper-level voxel unit is larger than the distance threshold corresponding to the lower-level voxel unit.
Optionally, at least one feature block, of which the distance to the surface of the target scene is smaller than the corresponding distance threshold of the current-level voxel unit, is selected from all current index blocks, and the feature block may be: aiming at each current index block, accessing the index block according to the hash value of the current index block, and respectively calculating the distance from each vertex of the current index block to the surface of a target scene according to the relative camera pose when each frame of image is collected and the image depth value obtained by a depth camera; and selecting the current index blocks with the vertex distances smaller than the corresponding distance threshold of the current-level voxel unit as the feature blocks.
Specifically, a hash value may be set for each current index block, and each index block is accessed through its hash value. The distance from each vertex voxel block of the current index block to the surface of the target scene is calculated according to the formula sdf = ||ξ · s|| − D(u, v), where sdf represents the distance from the voxel block (a vertex voxel block of the index block) to the surface of the target scene; ξ represents the relative camera pose when the frame image is acquired; s represents the coordinates of the voxel block in the grid voxel model of the reconstruction space; and D(u, v) represents the depth value corresponding to the voxel block in the image acquired by the depth camera. When the distance from each vertex of an index block to the surface of the target scene is smaller than the distance threshold corresponding to the current-level voxel unit, the index block is set as a feature block; if the distance is larger than or equal to the distance threshold corresponding to the current-level voxel unit, the index block is removed. Optionally, the average of the distances from the vertices of the index block to the surface of the target scene may also be calculated, and if the average is smaller than the distance threshold corresponding to the current-level voxel unit, the index block is set as a feature block. Illustratively, as shown in fig. 5, the diagonally hatched squares with a side length of 160 mm in the figure are index blocks to be removed in the screening with the 20 mm voxel unit, i.e., their distance to the surface of the target scene is greater than the distance threshold corresponding to the 20 mm voxel unit.
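A sketch of this hash-indexed screening: index blocks are kept in a dictionary keyed by their integer grid coordinates, the eight vertices of each block are projected into the depth image, and sdf = ||camera-space point|| − D(u, v) is thresholded. The pinhole projection, the all-vertices criterion and the parameter names are assumptions of the sketch.

```python
import numpy as np

def block_key(origin, size):
    """Integer grid coordinates of a block, usable as a dictionary (hash) key."""
    return tuple(np.floor(np.asarray(origin) / size).astype(int))

def screen_index_blocks(block_origins, size, T_wc, K, depth, dist_thresh):
    """Keep index blocks whose eight corner voxels all lie close to the observed surface:
    sdf = ||camera-space point|| - D(u, v), kept when |sdf| < dist_thresh."""
    T_cw = np.linalg.inv(T_wc)                               # world -> camera
    corners = np.array(list(np.ndindex(2, 2, 2))) * size     # offsets of the 8 block vertices
    kept = {}
    for origin in block_origins:
        pts_w = np.asarray(origin) + corners
        pts_c = pts_w @ T_cw[:3, :3].T + T_cw[:3, 3]         # camera coordinates
        if np.any(pts_c[:, 2] <= 0):
            continue
        u = np.round(K[0, 0] * pts_c[:, 0] / pts_c[:, 2] + K[0, 2]).astype(int)
        v = np.round(K[1, 1] * pts_c[:, 1] / pts_c[:, 2] + K[1, 2]).astype(int)
        inside = (u >= 0) & (u < depth.shape[1]) & (v >= 0) & (v < depth.shape[0])
        if not inside.all():
            continue
        sdf = np.linalg.norm(pts_c, axis=1) - depth[v, u]
        if np.all(np.abs(sdf) < dist_thresh):
            kept[block_key(origin, size)] = origin
    return kept
```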
And S404, judging whether the feature block meets the division condition of the minimum level voxel unit, if so, executing S405, and if not, executing S406.
Whether the feature block meets the division condition of the minimum-level voxel unit is judged, that is, whether the feature block selected in S403 is the feature block selected after the preset minimum-level voxel unit is divided is judged. For example, as shown in fig. 5, if the feature block selected in S403 is a feature block with a side length of 160mm divided by 20mm voxel units, and the minimum-level voxel unit is a voxel unit with a length of 5mm, it indicates that the feature block selected in S403 does not satisfy the division condition of the minimum-level 5mm voxel unit, and S406 is executed to perform the next-level 5mm voxel unit screening; if the feature block selected in S403 is a feature block divided by 5mm voxel units and having a side length of 40mm, it indicates that the feature block selected in S403 satisfies the minimum grade division condition of 5mm voxel units, and S405 is executed to use the feature block as a feature voxel.
S405, the feature block is used as a feature voxel.
S406: take all the feature blocks determined from the current-level screening object as the new current-level screening object, take the next-level voxel unit as the new current-level voxel unit, and return to S402.
When the selected feature block in S403 does not satisfy the partition condition of the minimum-level voxel unit, all the feature blocks selected in S403 are used as new current-level screening objects, the next-level voxel unit is selected as the current-level voxel unit, the process returns to S402, and the feature blocks are screened again.
For example, as shown in fig. 5, if the feature blocks selected in S403 are feature blocks with a side length of 160 mm divided by the 20 mm voxel unit rather than feature blocks with a side length of 40 mm divided by the minimum-level 5 mm voxel unit, then all the feature blocks with a side length of 160 mm are taken as the current-level screening object, the next-level 5 mm voxel unit is selected as the current-level voxel unit, and the process returns to S402. All the feature blocks with a side length of 160 mm screened in S403 are divided into a plurality of voxel blocks with a 5 mm side length according to the 5 mm voxel unit, and these are grouped, 8 × 8 × 8 at a time, into at least one index block with a 40 mm side length corresponding to the 5 mm voxel unit; mapped onto the planar schematic diagram, the whole image is divided into 32 index blocks with a 40 mm side length corresponding to the 5 mm voxel unit, each covering an 8 × 8 arrangement of voxel cells. S403 and S404 are then executed; the feature blocks with a side length of 40 mm obtained at this point (for example, the blank squares with a 40 mm side length in the figure) are the feature blocks selected after division by the minimum-level 5 mm voxel unit, i.e., the selected characteristic voxels, and the dotted squares with a side length of 40 mm in fig. 5 are the index blocks to be removed in the screening with the 5 mm voxel unit.
The method for determining at least one characteristic voxel from an image provided by the embodiment determines at least one characteristic voxel from the image by adopting at least two levels of nested screening modes for each frame of image. The problem of large computation amount when the target scene is subjected to three-dimensional reconstruction is solved, and the three-dimensional reconstruction is applied to portable equipment, so that the three-dimensional reconstruction is more widely applied.
Example four
The present embodiment provides a preferred embodiment of three-dimensional reconstruction based on the above embodiments, as shown in fig. 6, the method includes:
S601, acquiring at least two frames of images acquired by the depth camera for acquiring the target scene.
And S602, determining the relative camera pose during acquisition according to at least two frames of images.
S603, judging whether the current frame image obtained by collecting the target scene is a key frame, if so, storing the key frame and executing S604, and if not, waiting for the next frame image to execute S603 again.
For each frame of image collected by the camera, it can be judged whether the frame is a key frame, and each judged key frame is stored, so that the isosurface can be generated at the key frame rate and the stored key frames can serve as historical key frames in subsequent loop optimization. It should be noted that the first frame acquired by the camera is used as a key frame by default.
And S604, loop detection is carried out according to the current key frame and the historical key frame, and if loop detection is successful, S608 (for optimizing and updating the grid voxel model and the isosurface) and S6011 (for optimizing and updating the relative camera pose) are executed.
S605, aiming at each frame of image, at least one characteristic voxel is determined from the image by adopting at least two levels of nested screening modes, wherein each level of screening adopts a corresponding voxel blocking rule.
And S606, performing fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene.
And S607, generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene.
S608, a first preset number of key frames matched with the current key frame are selected from the historical key frames, and a second preset number of non-key frames are respectively obtained from the non-key frames corresponding to the selected matched key frames.
In order to achieve global consistency of the reconstructed model, if the current frame image is a key frame, a first preset number of key frames matched with the current key frame are selected from the historical key frames. Specifically, a matching operation may be performed between the current key frame and the historical key frames; for example, the Hamming distance between feature points of the current key frame and a historical key frame may be calculated to complete the matching. A first preset number of historical key frames with a high matching degree to the current key frame are then selected, for example, the 10 best-matching historical key frames. Each key frame has non-key frames corresponding to it, and for each selected historical key frame, a second preset number of non-key frames are also selected from its corresponding non-key frames; optionally, at most 11 non-key frames may be selected, spread evenly over all non-key frames corresponding to that historical key frame, so that the frames chosen for optimization remain representative while the efficiency of the optimization update is improved. The first preset number and the second preset number may be set in advance according to the needs of updating the three-dimensional reconstruction model.
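A minimal sketch of this selection step is given below, assuming binary (ORB-style) descriptors so that the Hamming distance applies; the frame fields (descriptors, non_key_frames) and the example counts (10 matching key frames, at most 11 non-key frames each) are assumptions drawn from the example values above.

    def hamming_distance(desc_a, desc_b):
        # desc_a, desc_b: equal-length bytes objects (binary descriptors).
        return bin(int.from_bytes(desc_a, "big") ^ int.from_bytes(desc_b, "big")).count("1")

    def select_matching_frames(current_kf, history_kfs, first_n=10, second_n=11):
        # Rank historical key frames by total descriptor distance to the current
        # key frame (descriptors compared index to index for brevity; a real
        # matcher would use nearest-neighbour search) and keep the best first_n.
        scored = sorted(
            history_kfs,
            key=lambda kf: sum(hamming_distance(a, b)
                               for a, b in zip(current_kf.descriptors, kf.descriptors))
        )
        matches = scored[:first_n]
        # From the non-key frames attached to each match, pick at most second_n,
        # spread evenly so the selected frames stay representative.
        selected_non_kfs = []
        for kf in matches:
            step = max(1, len(kf.non_key_frames) // second_n)
            selected_non_kfs.append(kf.non_key_frames[::step][:second_n])
        return matches, selected_non_kfs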
And S609, optimizing and updating the grid voxel model of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matched key frame and the acquired non-key frame.
And performing optimization updating on the grid voxel model of the three-dimensional reconstruction model, namely updating the characteristic voxels and updating the grid voxel model of the target scene.
Optionally, when updating the feature voxels, consider that the view-angle overlap between two adjacent frames collected by the depth camera is very large, so the feature voxels selected from two adjacent frames are almost identical, and re-optimizing the feature voxels once per frame would take a long time. Therefore, when updating the feature voxels, S605 is performed again only on each matched historical key frame to complete the optimization update of the feature voxels.
Because the grid voxel model of the target scene generated in S606 is produced by processing every frame of image, the update of the grid voxel model is performed over the well-matched historical key frames and their corresponding non-key frames; that is, whenever a key frame arrives, the corresponding fused data is removed and S606 is executed again to redo the fusion calculation, thereby completing the optimization update of the grid voxel model of the target scene.
Whether the fusion calculation is performed when the grid voxel model of the target scene is first obtained or during the optimization updating stage of the grid voxel model, a single voxel block can serve as the fusion object. To improve fusion efficiency, a predetermined number of voxel blocks, for example a group of 2 × 2 × 2 voxel blocks, may instead be used as one fusion object for the fusion calculation.
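As a hedged sketch of such a fusion pass, the following code keeps a running weighted average of signed distances (a common fusion update, assumed here rather than stated by the text) and treats one 2 × 2 × 2 group of voxel blocks as a single fusion object; the function name and array layout are illustrative only.

    import numpy as np

    def fuse_group(group_sdf, group_weight, new_sdf, new_weight=1.0):
        # group_sdf / group_weight: values stored for one 2 x 2 x 2 group of
        # voxel blocks (8 entries); new_sdf: values computed from the current frame.
        fused = (group_sdf * group_weight + new_sdf * new_weight) / (group_weight + new_weight)
        return fused, group_weight + new_weight

    # Example: fuse one 2 x 2 x 2 group (8 voxel blocks) from a new frame.
    stored = np.zeros(8)
    weight = np.ones(8)
    incoming = np.full(8, 0.02)       # assumed sdf values in metres
    stored, weight = fuse_group(stored, weight, incoming)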
S610, optimizing and updating the isosurface of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matched key frame.
Since the isosurface of the grid voxel model is generated only for key frames in S607, when updating the isosurface, S607 may be performed again only for the historical key frames selected in S608 that have a high matching degree with the current key frame, so as to update the isosurfaces of those matching key frames.
In order to accelerate the model updating optimization speed, the optimization updating of the iso-surface of the three-dimensional reconstruction model may be: for each matching key frame, selecting at least one voxel block of which the distance to the surface of a target scene is less than or equal to an update threshold of a corresponding voxel in the matching key frame from all voxel blocks corresponding to the current key frame; and performing optimization updating on the isosurface of each matched key frame according to the selected at least one voxel block.
The update threshold may be set while the isosurface of the grid voxel model is generated in S607: for each voxel in the key frame used to generate the isosurface, the maximum of the distances from the voxel blocks within that voxel to the surface of the target scene is selected and set as the update threshold of that voxel. That is, every voxel in the key frame used to generate the isosurface is assigned a corresponding update threshold.
Specifically, the distance from each voxel block of the current key frame to the surface of the target scene may be calculated first. Then, for each matching key frame, the voxel correspondence between the two frames of images is determined according to the correspondence between the current key frame and the matching key frame. Using this voxel correspondence, the voxel in the matching key frame corresponding to the current voxel of the current key frame is found, which determines the applicable update threshold, and voxel blocks of the current voxel whose distance to the surface of the target scene is less than or equal to that update threshold are selected. This selection is performed voxel by voxel over the current key frame, which completes the filtering of the voxel blocks; the isosurface is then optimized and updated from the selected voxel blocks, and since the process of obtaining the isosurface is similar to S607 it is not repeated here. Voxel blocks whose distance is greater than the update threshold are the voxel blocks to be ignored, and no operation is performed on them. Filtering out part of the voxel blocks in this way improves the calculation speed.
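The per-voxel filtering just described might be sketched as follows; the correspondence lookup, the stored update_threshold field and the sdf_to_surface helper are assumptions introduced for illustration.

    def blocks_to_update(current_kf_voxels, matching_kf, correspondence, sdf_to_surface):
        # For every voxel of the current key frame, look up its counterpart in
        # the matching key frame, read that voxel's update threshold, and keep
        # only voxel blocks whose distance to the target scene surface does not
        # exceed it; all other blocks are ignored.
        selected = []
        for voxel in current_kf_voxels:
            matched_voxel = correspondence(voxel, matching_kf)
            threshold = matched_voxel.update_threshold
            selected.extend(
                block for block in voxel.blocks
                if abs(sdf_to_surface(block)) <= threshold
            )
        return selected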
Optionally, to avoid performing one hash-table lookup for every single voxel block that is accessed, the hash values of several adjacent voxel blocks may be looked up in the hash table at the same time when a voxel block is accessed.
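A small sketch of such a batched lookup, assuming a Python dict stands in for the hash table and integer block coordinates serve as keys:

    def lookup_block_and_neighbours(hash_table, coord):
        # Query the block at `coord` and its six face neighbours in one pass,
        # instead of issuing a separate lookup per access.
        x, y, z = coord
        keys = [(x, y, z), (x + 1, y, z), (x - 1, y, z),
                (x, y + 1, z), (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)]
        return {k: hash_table.get(k) for k in keys}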
S6011, performing globally consistent optimization updating on the determined relative camera pose according to the current key frame. The updated relative camera pose is then used when updating the corresponding grid voxel model.
In order to ensure the real-time performance of three-dimensional reconstruction, while acquiring the target scene image in S601, the pose of the camera in S602 and the keyframe in S603 may be determined in real time for each frame of image, that is, the pose is calculated and the keyframe is determined while acquiring the image. And the process of generating the three-dimensional reconstruction model of the target scene in S605 to S607 and the process of updating the generated three-dimensional reconstruction model in S608 to S610 are also performed simultaneously, that is, the optimized updating of the built partial model is completed in the process of generating the three-dimensional reconstruction model.
The embodiment provides a three-dimensional reconstruction method, which includes the steps of obtaining a target scene image collected by a depth camera, determining a relative camera pose of the depth camera when the depth camera collects the target scene image, determining a characteristic voxel of each frame image by adopting at least two stages of nested screening modes, performing fusion calculation to obtain a grid voxel model of the target scene, generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene, and performing optimization updating on the three-dimensional reconstruction model of the target scene according to a current key frame, each matching key frame and a non-key frame of each matching key frame to ensure the global consistency of the model. The problem of large computation amount when the target scene is subjected to three-dimensional reconstruction is solved, and the three-dimensional reconstruction is applied to portable equipment, so that the three-dimensional reconstruction is more widely applied.
EXAMPLE five
Fig. 7 is a block diagram of a three-dimensional reconstruction apparatus according to a fifth embodiment of the present invention, which is capable of executing a three-dimensional reconstruction method according to any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the execution method. The apparatus may be implemented on a CPU basis. As shown in fig. 7, the apparatus includes:
an image obtaining module 701, configured to obtain at least two frames of images acquired by a depth camera for a target scene;
a pose determination module 702 for determining a relative camera pose at the time of acquisition from the at least two frames of images;
a voxel determination module 703, configured to determine, for each frame of image, at least one characteristic voxel from the image in at least two levels of nested screening manners, where each level of screening employs a corresponding voxel blocking rule;
the model generation module 704 is configured to perform fusion calculation on at least one feature voxel of each frame of image according to the relative camera pose of each frame of image, so as to obtain a grid voxel model of the target scene;
and a three-dimensional reconstruction module 705, configured to generate an isosurface of the grid voxel model, to obtain a three-dimensional reconstruction model of the target scene.
Optionally, the three-dimensional reconstruction module 705 is specifically configured to, if a current frame image obtained by collecting the target scene is a key frame, generate an isosurface of each voxel block corresponding to the current key frame, and add a color to the isosurface to obtain a three-dimensional reconstruction model of the target scene.
The embodiment provides a three-dimensional reconstruction device, which is characterized in that a target scene image acquired by a depth camera is acquired, the camera pose of the depth camera when the target scene image is acquired is determined, at least two stages of nested screening modes are adopted to determine the characteristic voxels of each frame of image, fusion calculation is performed to obtain a grid voxel model of a target scene, an isosurface of the grid voxel model is generated, and a three-dimensional reconstruction model of the target scene is obtained. The problem of large computation amount when the target scene is subjected to three-dimensional reconstruction is solved, and the three-dimensional reconstruction is applied to portable equipment, so that the three-dimensional reconstruction is more widely applied.
Further, the pose determination module 702 includes:
the characteristic point extraction unit is used for extracting the characteristics of each frame of image to obtain at least one characteristic point of each frame of image;
the matching operation unit is used for performing matching operation on each feature point between two adjacent frames of images to obtain the corresponding relationship of the feature points between the two adjacent frames of images;
The pose determining unit is used for removing abnormal correspondences from the feature point correspondences, calculating the nonlinear term in J(ξ)^T J(ξ) from linear components comprising the second-order statistic of the remaining feature points and nonlinear components comprising the relative camera pose, and performing repeated iterative computation on

    Δξ = -(J(ξ)^T J(ξ))^(-1) J(ξ)^T r(ξ)

to solve the relative camera pose when the reprojection error is smaller than a preset error threshold;

where r(ξ) represents the vector containing all reprojection errors; J(ξ) is the Jacobian matrix of r(ξ); ξ represents the Lie algebra of the relative camera pose, and Δξ represents its increment at each iteration; R_i represents the rotation matrix of the camera when the i-th frame image is acquired; R_j represents the rotation matrix of the camera when the j-th frame image is acquired; p_i^k and p_j^k (shown as formula images in the original) represent the k-th feature point on the i-th frame image and the j-th frame image, respectively; C_{i,j} represents the set of feature point correspondences between the i-th frame image and the j-th frame image; |C_{i,j}| represents the number of such correspondences; [·]_× represents the vector (cross) product operator; and ‖·‖ denotes taking the norm.

In particular, the expression of the nonlinear term in J(ξ)^T J(ξ) (shown as a formula image in the original) combines a linear component, namely the second-order statistic of the feature points, with the nonlinear components r_il^T and r_jl, where r_il^T is the l-th row of the rotation matrix R_i, r_jl is the transpose of the l-th row of the rotation matrix R_j, and l = 0, 1, 2.
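As an illustrative sketch of the iterative solution only (the factorisation of the nonlinear term into a precomputed second-order statistic is omitted), a plain Gauss-Newton loop over the update Δξ = -(J^T J)^(-1) J^T r could look as follows; residual_and_jacobian is an assumed callback that returns r(ξ) and J(ξ) for the current pose, and the additive update stands in for a proper Lie-algebra retraction.

    import numpy as np

    def solve_relative_pose(xi0, residual_and_jacobian, error_threshold=1e-6, max_iters=50):
        # xi0: initial 6-vector (Lie algebra) of the relative camera pose.
        xi = np.asarray(xi0, dtype=float)
        for _ in range(max_iters):
            r, J = residual_and_jacobian(xi)         # r(xi): all reprojection errors
            if np.linalg.norm(r) < error_threshold:  # stop once the error is small enough
                return xi
            # Gauss-Newton increment: delta_xi = -(J^T J)^-1 J^T r
            delta_xi = -np.linalg.solve(J.T @ J, J.T @ r)
            xi = xi + delta_xi                       # additive update; a Lie-algebra
                                                     # retraction is assumed elsewhere
        return xi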
Further, the above apparatus further comprises:
the key frame determining module is used for performing matching operation on a current frame image and a previous key frame image obtained by acquiring a target scene to obtain a conversion relation matrix between the two frames of images; and if the conversion relation matrix is greater than or equal to the preset conversion threshold value, determining the current frame image as the current key frame.
The loop detection module is used for performing loop detection according to the current key frame and the historical key frame if the current frame image obtained by collecting the target scene is the key frame;
and the pose updating module is used for performing globally consistent optimization updating on the determined relative camera pose according to the current key frame if loop closure is successful.
Further, the voxel determination module 703 includes:
the initial determining unit is used for taking each frame of image as a current-level screening object and determining a current-level voxel unit;
the index block determining unit is used for dividing the current-level screening object into voxel blocks according to the current-level voxel unit and determining at least one current index block according to the voxel blocks; the current index block comprises a preset number of voxel blocks;
the characteristic block selecting unit is used for selecting at least one characteristic block, of which the distance to the surface of the target scene is smaller than the corresponding distance threshold of the current-level voxel unit, from all current index blocks;
a characteristic voxel determining unit, configured to take the characteristic block as a characteristic voxel if the characteristic block meets a partition condition of a minimum-level voxel unit;
the circulation unit is used for replacing all the feature blocks determined by the current-level screening object with a new current-level screening object, selecting the next-level voxel unit to be replaced with the new current-level voxel unit and returning to execute the voxel block division operation aiming at the current-level screening object if the feature blocks do not meet the division condition of the minimum-level voxel unit; wherein the voxel units are reduced step by step to the minimum level voxel unit.
Optionally, the feature block selecting unit is specifically configured to:
aiming at each current index block, accessing the index block according to the hash value of the current index block, and respectively calculating the distance from each vertex of the current index block to the surface of a target scene according to the relative camera pose when each frame of image is collected and the image depth value obtained by a depth camera; and selecting the current index blocks with the vertex distances smaller than the corresponding distance threshold of the current-level voxel unit as the feature blocks.
Further, the above apparatus further comprises:
the matching frame determining module is used for selecting a first preset number of key frames matched with the current key frame from the historical key frames and respectively acquiring a second preset number of non-key frames from the non-key frames corresponding to the selected matching key frames if the current frame image acquired by acquiring the target scene is the key frame;
the model updating module is used for optimizing and updating the grid voxel model of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matched key frame and the acquired non-key frame;
and the isosurface updating module is used for optimizing and updating the isosurface of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matched key frame.
Optionally, the iso-surface updating module is specifically configured to select, for each matching keyframe, at least one voxel block, of which a distance to the surface of the target scene is less than or equal to an update threshold of a corresponding voxel in the matching keyframe, from among the voxel blocks corresponding to the current keyframe; and performing optimization updating on the isosurface of each matched key frame according to the selected at least one voxel block.
The three-dimensional reconstruction module 705 is further configured to, while generating an iso-surface of each voxel block corresponding to the current keyframe image, select a maximum value of a distance from each voxel block in the voxel to the target scene surface for each voxel in the keyframe used for generating the iso-surface, and set the maximum value as an update threshold of the voxel.
EXAMPLE six
Fig. 8 is a schematic structural diagram of an electronic device according to a sixth embodiment of the present invention, as shown in fig. 8, the electronic device includes a storage device 80, one or more processors 81, and at least one depth camera 82; the storage 80, processor 81 and depth camera 82 of the electronic device may be connected by a bus or other means, as exemplified by the bus connection in fig. 8.
The storage device 80, which is a computer-readable storage medium, can be used for storing software programs, computer-executable programs, and modules, such as the modules corresponding to the three-dimensional reconstruction device in the embodiments of the present invention (for example, the image acquisition module 701 used in the three-dimensional reconstruction device). The processor 81 implements the three-dimensional reconstruction method described above by processing software programs, instructions, and modules stored in the storage device 80 to execute various functional applications and data processing of the electronic device. Alternatively, the processor 81 may be a central processing unit or a high-performance graphics processor.
The storage device 80 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the storage device 80 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the storage device 80 may further include a storage device remotely located from the processor 81, which may be connected to the electronic device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The depth camera 82 may be used for image acquisition of the target scene under the control of the processor 81. The depth camera can be embedded in an electronic device, and optionally, the electronic device can be a portable mobile electronic device, for example, the electronic device can be a smart terminal (a mobile phone, a tablet computer) or a three-dimensional visual interaction device (VR glasses, a wearable helmet), and can perform image shooting under operations of moving, rotating and the like.
The electronic device provided by the embodiment can be used for executing the three-dimensional reconstruction method provided by any embodiment, and has corresponding functions and beneficial effects.
EXAMPLE seven
The seventh embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, can implement the three-dimensional reconstruction method of the foregoing embodiments.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
In summary, in the three-dimensional reconstruction scheme provided by the embodiment of the present invention, in the fusion calculation stage, the selection of the characteristic voxels is performed by using a coarse-to-fine nested screening strategy and a sparse sampling concept, so that the reconstruction accuracy is ensured, and the fusion speed is greatly increased; the isosurface is generated at the key frame rate, so that the generation speed of the isosurface can be increased; and the three-dimensional reconstruction efficiency is improved. In addition, the global consistency of the three-dimensional reconstruction can be effectively ensured by optimizing the updating stage.
The above example numbers are for description only and do not represent the merits of the examples.
It will be appreciated by those of ordinary skill in the art that the modules or operations of the embodiments of the invention described above may be implemented using a general purpose computing device, which may be centralized on a single computing device or distributed across a network of computing devices, and that they may alternatively be implemented using program code executable by a computing device, such that the program code is stored in a memory device and executed by a computing device, and separately fabricated into integrated circuit modules, or fabricated into a single integrated circuit module from a plurality of modules or operations thereof. Thus, the present invention is not limited to any specific combination of hardware and software.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. A method of three-dimensional reconstruction, comprising:
acquiring at least two frames of images acquired by a depth camera for acquiring a target scene;
determining the relative camera pose during acquisition according to the at least two frames of images;
determining at least one characteristic voxel from each frame of image by adopting at least two-stage nested screening modes, wherein each stage of screening adopts a corresponding voxel blocking rule;
performing fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene;
generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene;
aiming at each frame of image, determining at least one characteristic voxel from the image by adopting at least two levels of nested screening modes, wherein the method comprises the following steps:
regarding each frame of image, taking the image as a current-level screening object, and determining a current-level voxel unit;
dividing the current-level screening object into voxel blocks according to the current-level voxel units, and determining at least one current index block according to the voxel blocks; the current index block comprises a preset number of voxel blocks;
selecting at least one feature block with the distance to the surface of the target scene smaller than the corresponding distance threshold of the current-level voxel unit from all current index blocks;
if the characteristic block meets the division condition of the minimum-level voxel unit, taking the characteristic block as a characteristic voxel;
if the feature block does not meet the partition condition of the minimum-level voxel unit, replacing all feature blocks determined by the current-level screening object with a new current-level screening object, selecting the next-level voxel unit to be replaced with the new current-level voxel unit, and returning to execute the voxel block partition operation aiming at the current-level screening object; wherein the voxel units are reduced step by step to the minimum level voxel unit.
2. The method of claim 1, wherein determining a relative camera pose at the time of acquisition from the at least two frames of images comprises:
extracting the features of each frame of image to obtain at least one feature point of each frame of image;
matching each feature point between two adjacent frames of images to obtain the corresponding relationship of the feature points between the two adjacent frames of images;
removing abnormal correspondences in the feature point correspondences, calculating the nonlinear term in J(ξ)^T J(ξ) by a linear component comprising a second-order statistic of the remaining feature points and a nonlinear component comprising the relative camera pose, and performing repeated iterative computation on

    Δξ = -(J(ξ)^T J(ξ))^(-1) J(ξ)^T r(ξ)

to solve the relative camera pose when the reprojection error is smaller than a preset error threshold;

where r(ξ) represents a vector containing all reprojection errors; J(ξ) is the Jacobian matrix of r(ξ); ξ represents the Lie algebra of the relative camera pose, and Δξ represents its increment at each iteration; R_i represents a rotation matrix of the camera when the i-th frame image is acquired; R_j represents a rotation matrix of the camera when the j-th frame image is acquired; p_i^k and p_j^k (shown as formula images in the original) represent the k-th feature point on the i-th frame image and the j-th frame image, respectively; C_{i,j} represents the set of feature point correspondences between the i-th frame image and the j-th frame image; |C_{i,j}| represents the number of the feature point correspondences between the i-th frame image and the j-th frame image; [·]_× represents the vector (cross) product operator; and ‖·‖ means taking the norm.
3. The method of claim 2, wherein the expression of the nonlinear term in J(ξ)^T J(ξ) (shown as a formula image in the original) combines a linear component, namely the second-order statistic of the feature points, with the nonlinear components r_il^T and r_jl, wherein r_il^T is the l-th row of the rotation matrix R_i, r_jl is the transpose of the l-th row of the rotation matrix R_j, and l = 0, 1, 2.
4. The method of claim 1 or 2, after determining the relative camera pose at the time of acquisition from the at least two frames of images, further comprising:
if the current frame image acquired by collecting the target scene is a key frame, loop detection is carried out according to the current key frame and a historical key frame;
and if loop closure is successful, performing globally consistent optimization updating on the determined relative camera pose according to the current key frame.
5. The method of claim 4, further comprising, prior to performing loop back detection based on the current key frame and the historical key frame:
matching operation is carried out on the current frame image acquired by collecting the target scene and the previous key frame image, and a conversion relation matrix between the two frame images is acquired;
and if the conversion relation matrix is larger than or equal to a preset conversion threshold value, determining the current frame image as the current key frame.
6. The method according to claim 1, wherein selecting at least one feature block from all current index blocks having a distance to a target scene surface less than the current level voxel unit corresponding distance threshold comprises:
aiming at each current index block, accessing the index block according to the hash value of the current index block, and respectively calculating the distance from each vertex of the current index block to the surface of the target scene according to the relative camera pose when each frame of image is collected and the image depth value obtained by a depth camera;
and selecting the current index block with the vertex distance smaller than the corresponding distance threshold of the current stage voxel unit as a feature block.
7. The method of claim 1, wherein generating an iso-surface of the grid voxel model resulting in a three-dimensional reconstructed model of the target scene comprises:
and if the current frame image acquired by collecting the target scene is a key frame, generating an isosurface of each voxel block corresponding to the current key frame, and adding colors to the isosurface to obtain a three-dimensional reconstruction model of the target scene.
8. The method of claim 1, after generating an iso-surface of the grid voxel model to obtain a three-dimensional reconstructed model of the target scene, further comprising:
if the current frame image acquired by collecting the target scene is a key frame, selecting a first preset number of key frames matched with the current key frame from the historical key frames, and respectively acquiring a second preset number of non-key frames from the non-key frames corresponding to the selected matched key frames;
optimizing and updating the grid voxel model of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matching key frame and the acquired non-key frame;
and optimizing and updating the isosurface of the three-dimensional reconstruction model according to the corresponding relation between the current key frame and each matched key frame.
9. The method of claim 8, wherein the optimizing the updating of the iso-surface of the three-dimensional reconstructed model according to the correspondence between the current keyframe and each matching keyframe comprises:
for each matching key frame, selecting at least one voxel block of which the distance to the surface of the target scene is less than or equal to an update threshold of a corresponding voxel in the matching key frame from all voxel blocks corresponding to the current key frame;
and performing optimization updating on the isosurface of the matched key frame according to the selected at least one voxel block.
10. The method of claim 9, wherein generating an iso-surface of the grid voxel model comprises:
and selecting the maximum value of the distance from each voxel block in the voxel to the surface of the target scene for each voxel in the keyframe used for generating the isosurface, and setting the maximum value as the updating threshold value of the voxel.
11. A three-dimensional reconstruction apparatus, comprising:
the image acquisition module is used for acquiring at least two frames of images acquired by the depth camera for acquiring a target scene;
the pose determining module is used for determining the relative camera pose during acquisition according to the at least two frames of images;
the voxel determining module is used for determining at least one characteristic voxel from each frame of image by adopting at least two levels of nested screening modes, wherein each level of screening adopts a corresponding voxel blocking rule;
the model generation module is used for carrying out fusion calculation on at least one characteristic voxel of each frame of image according to the relative camera pose of each frame of image to obtain a grid voxel model of the target scene;
the three-dimensional reconstruction module is used for generating an isosurface of the grid voxel model to obtain a three-dimensional reconstruction model of the target scene;
wherein the voxel determination module comprises:
the initial determining unit is used for taking each frame of image as a current-level screening object and determining a current-level voxel unit;
the index block determining unit is used for dividing the current-level screening object into voxel blocks according to the current-level voxel unit and determining at least one current index block according to the voxel blocks; the current index block comprises a preset number of voxel blocks;
the characteristic block selecting unit is used for selecting at least one characteristic block, of which the distance to the surface of the target scene is smaller than the corresponding distance threshold of the current-level voxel unit, from all current index blocks;
a characteristic voxel determining unit, configured to take the characteristic block as a characteristic voxel if the characteristic block meets a partition condition of a minimum-level voxel unit;
the circulation unit is used for replacing all the feature blocks determined by the current-level screening object with a new current-level screening object, selecting the next-level voxel unit to be replaced with the new current-level voxel unit and returning to execute the voxel block division operation aiming at the current-level screening object if the feature blocks do not meet the division condition of the minimum-level voxel unit; wherein the voxel units are reduced step by step to the minimum level voxel unit.
12. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
at least one depth camera for image acquisition of a target scene;
when executed by the one or more processors, cause the one or more processors to implement the three-dimensional reconstruction method of any one of claims 1-10.
13. The device of claim 12, wherein the one or more processors are central processors; the electronic device is a portable mobile electronic device.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the three-dimensional reconstruction method according to any one of claims 1 to 10.
CN201810179264.6A 2018-03-05 2018-03-05 Three-dimensional reconstruction method, device, equipment and storage medium Active CN108537876B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201810179264.6A CN108537876B (en) 2018-03-05 2018-03-05 Three-dimensional reconstruction method, device, equipment and storage medium
PCT/CN2019/084820 WO2019170164A1 (en) 2018-03-05 2019-04-28 Depth camera-based three-dimensional reconstruction method and apparatus, device, and storage medium
US16/977,899 US20210110599A1 (en) 2018-03-05 2019-04-28 Depth camera-based three-dimensional reconstruction method and apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810179264.6A CN108537876B (en) 2018-03-05 2018-03-05 Three-dimensional reconstruction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108537876A CN108537876A (en) 2018-09-14
CN108537876B true CN108537876B (en) 2020-10-16

Family

ID=63486699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810179264.6A Active CN108537876B (en) 2018-03-05 2018-03-05 Three-dimensional reconstruction method, device, equipment and storage medium

Country Status (3)

Country Link
US (1) US20210110599A1 (en)
CN (1) CN108537876B (en)
WO (1) WO2019170164A1 (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019058487A1 (en) * 2017-09-21 2019-03-28 オリンパス株式会社 Three-dimensional reconstructed image processing device, three-dimensional reconstructed image processing method, and computer-readable storage medium having three-dimensional reconstructed image processing program stored thereon
CN108537876B (en) * 2018-03-05 2020-10-16 清华-伯克利深圳学院筹备办公室 Three-dimensional reconstruction method, device, equipment and storage medium
CN109377551B (en) * 2018-10-16 2023-06-27 北京旷视科技有限公司 Three-dimensional face reconstruction method and device and storage medium thereof
WO2020113417A1 (en) * 2018-12-04 2020-06-11 深圳市大疆创新科技有限公司 Three-dimensional reconstruction method and system for target scene, and unmanned aerial vehicle
CN109840940B (en) * 2019-02-11 2023-06-27 清华-伯克利深圳学院筹备办公室 Dynamic three-dimensional reconstruction method, device, equipment, medium and system
CN109993802B (en) * 2019-04-03 2020-12-25 浙江工业大学 Hybrid camera calibration method in urban environment
CN110064200B (en) * 2019-04-25 2022-02-22 腾讯科技(深圳)有限公司 Object construction method and device based on virtual environment and readable storage medium
CN110349253B (en) * 2019-07-01 2023-12-01 达闼机器人股份有限公司 Three-dimensional reconstruction method of scene, terminal and readable storage medium
CN112308904A (en) * 2019-07-29 2021-02-02 北京初速度科技有限公司 Vision-based drawing construction method and device and vehicle-mounted terminal
WO2021077279A1 (en) * 2019-10-22 2021-04-29 深圳市大疆创新科技有限公司 Image processing method and device, and imaging system and storage medium
CN112991427A (en) * 2019-12-02 2021-06-18 顺丰科技有限公司 Object volume measuring method, device, computer equipment and storage medium
CN111242847B (en) * 2020-01-10 2021-03-30 上海西井信息科技有限公司 Gateway-based image splicing method, system, equipment and storage medium
CN111310654B (en) * 2020-02-13 2023-09-08 北京百度网讯科技有限公司 Map element positioning method and device, electronic equipment and storage medium
CN111325741B (en) * 2020-03-02 2024-02-02 上海媒智科技有限公司 Item quantity estimation method, system and equipment based on depth image information processing
CN111598927B (en) * 2020-05-18 2023-08-01 京东方科技集团股份有限公司 Positioning reconstruction method and device
CN111627061B (en) * 2020-06-03 2023-07-11 如你所视(北京)科技有限公司 Pose detection method and device, electronic equipment and storage medium
CN112115980A (en) * 2020-08-25 2020-12-22 西北工业大学 Binocular vision odometer design method based on optical flow tracking and point line feature matching
CN112446951B (en) * 2020-11-06 2024-03-26 杭州易现先进科技有限公司 Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and computer storage medium
CN112419482B (en) * 2020-11-23 2023-12-01 太原理工大学 Three-dimensional reconstruction method for group pose of mine hydraulic support with depth point cloud fusion
CN112435206B (en) * 2020-11-24 2023-11-21 北京交通大学 Method for reconstructing three-dimensional information of object by using depth camera
CN112767538A (en) * 2021-01-11 2021-05-07 浙江商汤科技开发有限公司 Three-dimensional reconstruction and related interaction and measurement method, and related device and equipment
CN112750201B (en) * 2021-01-15 2024-03-29 浙江商汤科技开发有限公司 Three-dimensional reconstruction method, related device and equipment
CN113129348B (en) * 2021-03-31 2022-09-30 中国地质大学(武汉) Monocular vision-based three-dimensional reconstruction method for vehicle target in road scene
CN113409444B (en) * 2021-05-21 2023-07-11 北京达佳互联信息技术有限公司 Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and storage medium
CN113470180B (en) * 2021-05-25 2022-11-29 思看科技(杭州)股份有限公司 Three-dimensional mesh reconstruction method, device, electronic device and storage medium
CN113284176B (en) * 2021-06-04 2022-08-16 深圳积木易搭科技技术有限公司 Online matching optimization method combining geometry and texture and three-dimensional scanning system
CN113450457B (en) * 2021-08-31 2021-12-14 腾讯科技(深圳)有限公司 Road reconstruction method, apparatus, computer device and storage medium
US11830140B2 (en) * 2021-09-29 2023-11-28 Verizon Patent And Licensing Inc. Methods and systems for 3D modeling of an object by merging voxelized representations of the object
CN114241168A (en) * 2021-12-01 2022-03-25 歌尔光学科技有限公司 Display method, display device, and computer-readable storage medium
CN114393575B (en) * 2021-12-17 2024-04-02 重庆特斯联智慧科技股份有限公司 Robot control method and system based on high-efficiency recognition of user gestures
CN114255285B (en) * 2021-12-23 2023-07-18 奥格科技股份有限公司 Video and urban information model three-dimensional scene fusion method, system and storage medium
CN116704152B (en) * 2022-12-09 2024-04-19 荣耀终端有限公司 Image processing method and electronic device
CN116258817B (en) * 2023-02-16 2024-01-30 浙江大学 Automatic driving digital twin scene construction method and system based on multi-view three-dimensional reconstruction
CN116363327B (en) * 2023-05-29 2023-08-22 北京道仪数慧科技有限公司 Voxel map generation method and system
CN116437063A (en) * 2023-06-15 2023-07-14 广州科伊斯数字技术有限公司 Three-dimensional image display system and method
CN117272758B (en) * 2023-11-20 2024-03-15 埃洛克航空科技(北京)有限公司 Depth estimation method, device, computer equipment and medium based on triangular grid
CN117496074B (en) * 2023-12-29 2024-03-22 中国人民解放军国防科技大学 Efficient three-dimensional scene reconstruction method suitable for rapid movement of camera


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106157367B (en) * 2015-03-23 2019-03-08 联想(北京)有限公司 Method for reconstructing three-dimensional scene and equipment
US9892552B2 (en) * 2015-12-15 2018-02-13 Samsung Electronics Co., Ltd. Method and apparatus for creating 3-dimensional model using volumetric closest point approach
CN107194984A (en) * 2016-03-14 2017-09-22 武汉小狮科技有限公司 Mobile terminal real-time high-precision three-dimensional modeling method
US10319141B2 (en) * 2016-06-21 2019-06-11 Apple Inc. Method and system for vision based 3D reconstruction and object tracking
US10573018B2 (en) * 2016-07-13 2020-02-25 Intel Corporation Three dimensional scene reconstruction based on contextual analysis
CN107358629B (en) * 2017-07-07 2020-11-10 北京大学深圳研究生院 Indoor mapping and positioning method based on target identification
CN108537876B (en) * 2018-03-05 2020-10-16 清华-伯克利深圳学院筹备办公室 Three-dimensional reconstruction method, device, equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609942A (en) * 2011-01-31 2012-07-25 微软公司 Mobile camera localization using depth maps
CN105184784A (en) * 2015-08-28 2015-12-23 西交利物浦大学 Motion information-based method for monocular camera to acquire depth information
CN106504320A (en) * 2016-11-02 2017-03-15 华东师范大学 A kind of based on GPU and the real-time three-dimensional reconstructing method towards depth image
CN106803267A (en) * 2017-01-10 2017-06-06 西安电子科技大学 Indoor scene three-dimensional rebuilding method based on Kinect
CN106887037A (en) * 2017-01-23 2017-06-23 杭州蓝芯科技有限公司 A kind of indoor three-dimensional rebuilding method based on GPU and depth camera
CN106910242A (en) * 2017-01-23 2017-06-30 中国科学院自动化研究所 The method and system of indoor full scene three-dimensional reconstruction are carried out based on depth camera

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research on SLAM Method Based on ORB Key-Frame Loop Closure Detection Algorithm; Yu Jie; China Master's Theses Full-text Database, Information Science and Technology; 2017-05-15; pp. 49-73 *
Indoor Scene Reconstruction Based on an RGB-D Depth Camera; Mei Feng et al.; Journal of Image and Graphics; 2015-10-31; Vol. 20, No. 10; pp. 1366-1373 *
Mei Feng et al. Indoor Scene Reconstruction Based on an RGB-D Depth Camera. Journal of Image and Graphics. 2015, Vol. 20, No. 10 *

Also Published As

Publication number Publication date
US20210110599A1 (en) 2021-04-15
CN108537876A (en) 2018-09-14
WO2019170164A1 (en) 2019-09-12

Similar Documents

Publication Publication Date Title
CN108537876B (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN108898630B (en) Three-dimensional reconstruction method, device, equipment and storage medium
CN109087359B (en) Pose determination method, pose determination apparatus, medium, and computing device
US10269148B2 (en) Real-time image undistortion for incremental 3D reconstruction
CN101996420B (en) Information processing device, information processing method and program
CN109658445A (en) Network training method, increment build drawing method, localization method, device and equipment
CN108898676B (en) Method and system for detecting collision and shielding between virtual and real objects
US20180315232A1 (en) Real-time incremental 3d reconstruction of sensor data
CN108701374B (en) Method and apparatus for three-dimensional point cloud reconstruction
CN110135455A (en) Image matching method, device and computer readable storage medium
US9747668B2 (en) Reconstruction of articulated objects from a moving camera
CN114332415B (en) Three-dimensional reconstruction method and device of power transmission line corridor based on multi-view technology
CN107329962B (en) Image retrieval database generation method, and method and device for enhancing reality
CN110097584B (en) Image registration method combining target detection and semantic segmentation
CN110853075A (en) Visual tracking positioning method based on dense point cloud and synthetic view
EP3326156B1 (en) Consistent tessellation via topology-aware surface tracking
CN112784873A (en) Semantic map construction method and equipment
CN112651944B (en) 3C component high-precision six-dimensional pose estimation method and system based on CAD model
CN108961385B (en) SLAM composition method and device
CN111192364A (en) Low-cost mobile multi-robot vision simultaneous positioning and map creating method
CN112200157A (en) Human body 3D posture recognition method and system for reducing image background interference
CN111402412A (en) Data acquisition method and device, equipment and storage medium
CN112183506A (en) Human body posture generation method and system
CN113223078A (en) Matching method and device of mark points, computer equipment and storage medium
CN114202632A (en) Grid linear structure recovery method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20221118

Address after: 518000 2nd floor, building a, Tsinghua campus, Shenzhen University Town, Xili street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen International Graduate School of Tsinghua University

Address before: 518055 Nanshan Zhiyuan 1001, Xue Yuan Avenue, Nanshan District, Shenzhen, Guangdong.

Patentee before: TSINGHUA-BERKELEY SHENZHEN INSTITUTE