CN113178009A - Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair - Google Patents

Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair

Info

Publication number
CN113178009A
Authority
CN
China
Prior art keywords
point cloud
reconstruction
dimensional
dense
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110420925.1A
Other languages
Chinese (zh)
Other versions
CN113178009B (en)
Inventor
芮挺
杨成松
解文彬
王东
刘恂
郑南
赵杰
殷宏
曾拥华
胡睿哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Army Engineering University of PLA filed Critical Army Engineering University of PLA
Priority to CN202110420925.1A priority Critical patent/CN113178009B/en
Publication of CN113178009A publication Critical patent/CN113178009A/en
Application granted granted Critical
Publication of CN113178009B publication Critical patent/CN113178009B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10 Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/04 Texture mapping
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 Matching configurations of points or features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/04 Architectural design, interior design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/61 Scene description
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An indoor three-dimensional reconstruction method using point cloud segmentation and mesh repair belongs to the technical field of three-dimensional scene reconstruction in computer vision. The method comprises the following steps: acquiring images; extracting image feature points with the scale-invariant feature transform (SIFT) algorithm and reconstructing a sparse point cloud with a structure-from-motion (SFM) algorithm; exporting the point cloud at reflective and locally reflective positions and densifying it with a deep learning method; densely reconstructing the original sparse point cloud with a multi-view stereo method; and meshing the dense point cloud to complete the reconstruction of the three-dimensional scene. The invention introduces point cloud segmentation into the conventional three-dimensional reconstruction pipeline, handles regions such as reflective and highly reflective areas, where traditional multi-view stereo struggles to acquire point clouds, with a deep learning method, and applies a mesh repair technique to the reconstructed mesh model. The reconstruction thus becomes better suited to indoor scenes, and the accuracy of indoor three-dimensional modeling is markedly improved.

Description

Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair
Technical Field
The invention belongs to the technical field of three-dimensional scene reconstruction in computer vision and discloses an indoor three-dimensional reconstruction method using point cloud segmentation and mesh repair.
Background
Computer vision is the science of making machines "see": cameras and computers are used in place of human eyes to identify, track and measure targets, and the resulting images are further processed so that they are better suited to human observation or to transmission to instruments for inspection. Three-dimensional reconstruction refers to establishing a mathematical model of a three-dimensional object that is suitable for computer representation and processing; it is the basis for handling, operating on and analyzing the properties of three-dimensional objects in a computer environment, and a key technology for building virtual realities that express the objective world in a computer. The main idea of Structure from Motion (SFM) is to first acquire two-dimensional images of a scene with a camera, obtain the correspondences between the images by matching them, and then analyze and compute, using the key techniques of three-dimensional reconstruction, the intrinsic and extrinsic camera parameters and the spatial three-dimensional data of the scene. The method does not depend on specific assumptions and has a wide range of application. SFM-based reconstruction can be carried out in a modular fashion, i.e. the whole reconstruction process can be divided into several parts, each of which can be implemented independently; this makes the pipeline easy to control and improve, and robust to interfering factors such as the environment. Multi-view stereo (MVS) is a generalization of stereo vision: images of a scene are observed and acquired from multiple viewpoints, matching and depth estimation are completed on that basis, and the scene can be densely reconstructed from multiple images. Common scene representations in dense reconstruction methods include voxels, level sets, polygonal meshes and depth maps. MVS follows the same principle as classical stereo vision algorithms, but is designed to handle images with larger viewpoint changes, such as a set of images taken around an object, and can handle very large image sets, even millions of images. The emergence of neural networks has opened up improvements and new directions for stereo reconstruction. In principle, a deep learning method carries global semantic information, so the model can learn highlight or reflection cues and perform the 2D-to-3D matching better. There are also methods that learn stereo matching directly from two views instead of hand-crafting similarity criteria; they show satisfactory results and are gradually outperforming conventional methods.
However, most existing three-dimensional reconstruction techniques model an object by photographing a full circle around its outside; few methods are suited to photographing a circle around a room from the inside, and applying these techniques directly to indoor reconstruction gives poor results. In addition, although conventional MVS methods achieve high accuracy in ideal scenes, they share a common limitation: they are easily disturbed by low texture, reflection and local reflection, which makes matching difficult and leaves the reconstructed 3D information incomplete, so there is considerable room for improvement in reconstruction completeness. Deep learning methods can reconstruct highlighted or reflective surfaces better, but their accuracy and generalization are often unsatisfactory.
Finally, most existing three-dimensional reconstruction techniques only go as far as dense reconstruction: once the viewpoint is moved close, the model degenerates into scattered points and can no longer be examined. This shortcoming is particularly pronounced indoors, where many small, closely spaced objects force the viewpoint to be brought close for observation; once it is, the details of the objects can no longer be seen and the effective accuracy of the reconstruction is very poor. Moreover, because the meshed model of an indoor reconstruction contains many holes, the regions that originally contained holes become distorted after texture mapping. Three-dimensional reconstruction of indoor scenes therefore still has a number of problems to be solved.
Disclosure of Invention
In order to overcome the above shortcomings of the prior art, the invention provides an indoor three-dimensional scene reconstruction method using point cloud segmentation and mesh repair. Because of the symmetric design of indoor scenes, scenes at different physical positions often share similar feature points, and matching features directly scrambles the relative position relationships between them; in the invention, feature points are therefore matched only between images whose camera poses are adjacent. Highlighted or reflective surfaces, which the traditional MVS method reconstructs poorly, are densely reconstructed with a deep learning method, while the other positions in the scene are reconstructed with the MVS method, since the accuracy and generalization of deep learning methods are not yet good enough. Finally, reconstruction accuracy is improved by meshing the dense point cloud, repairing its holes and applying texture maps.
The technical scheme of the invention is as follows: an indoor three-dimensional reconstruction method by utilizing point cloud segmentation and grid repair is based on an indoor scene image to carry out three-dimensional reconstruction and comprises the following steps:
1) acquiring an image, namely acquiring an image covering the whole scene to be modeled;
2) extracting feature points of the images by using a Scale-invariant feature transform (SIFT) algorithm, estimating the camera pose of each image, and performing feature matching only between the images of adjacent poses;
3) performing sparse point cloud reconstruction by using the structure-from-motion algorithm SFM;
4) exporting the point clouds at reflective and locally reflective positions, such as glass windows, and carrying out dense reconstruction on the exported point clouds by using a deep learning method;
5) carrying out dense reconstruction on the original sparse point cloud by using a multi-view stereo (MVS) method, and replacing the corresponding positions in the resulting dense point cloud with the dense point cloud obtained in step 4);
6) carrying out grid processing on the dense point cloud, repairing the holes in the grid model by using a grid repairing method, and adding a texture map to complete the reconstruction of the three-dimensional scene.
As a further improvement, in step 1) of the present invention, during image acquisition a view of the whole room is taken, as completely as possible, from one position in the room; that position is then used as a starting point for one full circle around the room, during which the scene facing each wall surface is photographed.
As a further improvement, step 2) of the present invention includes the following steps:
2.1) utilizing an SIFT algorithm to extract features, and calculating through Gaussian filters with different sizes to obtain position information (x, y) of feature points;
2.2) solving by using an iterative closest point algorithm to obtain the position and posture information of the camera;
2.3) matching the feature points between the pictures with adjacent poses, calculating the distance between the matched feature points, and, when the distance between a pair of matched feature points is greater than a set threshold, treating them as an erroneous match and filtering them out.
as a further improvement, the step 3) of the invention utilizes an SFM algorithm to reconstruct the sparse point cloud.
As a further improvement, the step 4) comprises the following steps:
4.1) after the sparse reconstruction is completed, converting the obtained result into txt form, giving 3 types of information in total: camera.txt containing the camera intrinsics of all images, images.txt containing the poses and key points of all images, and points.txt containing all reconstructed 3D points; the information corresponding to the three-dimensional coordinate range of the glass window is then selected and exported.
4.2) carrying out densification on the point cloud file exported in the step 4.1) by using a deep learning method to obtain a dense point cloud model.
As a further improvement, step 5) of the invention performs dense reconstruction on the original sparse point cloud by using the multi-view stereo method MVS, and replaces the corresponding position of the dense point cloud by using the dense point cloud obtained in step 4).
as a further improvement, step 6) of the present invention includes the following steps:
6.1) obtaining a mesh model of a three-dimensional scene by adopting a mesh segmentation algorithm based on Delaunay tetrahedron subdivision;
6.2) repairing the holes in the grids in the step 6.1);
6.3) generating a texture atlas, realizing the grid texture of the model, and obtaining the three-dimensional model subjected to texture mapping.
The invention has the following beneficial effects:
The invention provides a data-set acquisition scheme suited to indoor modeling. The problem that scenes at different physical positions share similar feature points, caused by the symmetric design of indoor scenes, is solved by matching feature points only between pictures with adjacent poses, and a dense reconstruction model of good accuracy is obtained by combining the traditional MVS method with a deep learning method. A mesh repair technique then addresses the frequent holes of indoor scenes, yielding a complete indoor three-dimensional scene model of good accuracy.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2(a) shows the reconstruction of the glass window before merging.
FIG. 2(b) shows the reconstruction of the glass window after merging.
FIG. 3(a) is a schematic diagram of mesh patching when one triangle is added.
FIG. 3(b) is a schematic diagram of mesh patching when two triangles are added.
FIG. 4(a) shows the mesh reconstruction before repair.
FIG. 4(b) shows the mesh reconstruction after repair.
Fig. 5 is a schematic diagram of the final indoor reconstruction effect.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, an indoor three-dimensional scene reconstruction method based on structure from motion and multi-view stereo includes the following steps: 1) acquiring data for the scene to be modeled, collecting picture information that covers the whole scene; 2) extracting feature points of the images with the SIFT algorithm, estimating the camera pose of each image, and performing feature matching only between images with adjacent poses; 3) performing sparse point cloud reconstruction with the structure-from-motion algorithm SFM; 4) exporting the point clouds at reflective and locally reflective positions such as glass windows, and performing dense reconstruction on the exported point clouds with a deep learning method; 5) performing dense reconstruction on the original sparse point cloud with a multi-view stereo (MVS) method, and replacing the corresponding positions of that dense point cloud with the dense point cloud obtained in step 4); 6) meshing the dense point cloud, repairing the holes in the mesh model with a mesh repair method, and adding a texture map to complete the reconstruction of the three-dimensional scene.
The invention provides an indoor three-dimensional reconstruction method by utilizing point cloud segmentation and grid repair, which comprises the following steps:
1) Image acquisition: acquire images covering the whole scene to be modeled. The whole room is photographed, as completely as possible, from a corner of the room; that corner is then used as the starting point for one full circle around the room, during which the scene facing each wall surface is photographed.
2) Feature points are extracted from the images with the scale-invariant feature transform algorithm, the camera pose of each image is estimated, and feature matching is performed only between images with adjacent poses. This comprises the following steps:
2.1) Features are extracted with the SIFT algorithm. The position information (x, y) of the feature points is obtained from difference-of-Gaussian (DoG) filters of different sizes, and descriptor information is provided: a 4 × 4 grid of histograms is built around each feature point, each histogram containing 8 gradient directions, giving a 4 × 4 × 8 = 128-dimensional feature vector. In addition, the SIFT algorithm computes size (scale) and orientation information.
Specifically, the DoG response is curve-fitted with a three-dimensional quadratic function; the fitted function is differentiated and the derivative set to zero, which yields the extreme point (x, y, σ), where x, y are the position information and σ is the size (scale) information obtained at the same time. After the position and scale of the feature points have been computed, a reference direction must be assigned to each feature point from the local features of the image. For every detected feature point, the gradients and directions of the pixels within a neighborhood window of the Gaussian-smoothed image L(x, y, σ) are collected and then counted with a histogram. The gradient histogram divides the 0-360 degree direction range into 36 bins of 10 degrees each, and the maximum of the histogram is taken as the main direction of the key point.
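By way of illustration only, the feature extraction of step 2.1) may be sketched as follows using OpenCV's SIFT implementation; the library choice and the image file name are assumptions, since the description does not prescribe a particular implementation:

# Sketch of step 2.1): SIFT keypoints with position, scale and dominant
# orientation, plus 128-dimensional descriptors.
import cv2

def extract_sift_features(image_path, max_features=8000):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create(nfeatures=max_features)
    # Each keypoint carries (x, y), size (scale) and angle (the main direction
    # taken from the 36-bin gradient histogram described above).
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors

kps, descs = extract_sift_features("room_0001.jpg")   # placeholder file name
print(len(kps), descs.shape)                          # N keypoints, (N, 128)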
2.2) solving by using an iterative closest point algorithm to obtain the position and posture information of the camera;
the Iterative Closest Point (ICP) algorithm is essentially an optimal registration method based on the least squares method. The algorithm repeatedly selects the corresponding relation point pairs and calculates the optimal rigid body transformation until the convergence precision requirement of correct registration is met. Two three-dimensional point sets X1 and X2 are provided, and the registration steps of the ICP method for these two point sets are as follows: first, calculating corresponding near points of each point in the point set X2 in the point set X1;
secondly, calculating the rigid body transformation which enables the corresponding point to have the minimum average distance variation, and solving a translation parameter and a rotation parameter;
thirdly, obtaining a new transformation point set by using the translation and rotation parameters obtained in the previous step for X2;
and fourthly, stopping iterative computation if the average distance between the new transformation point set and the reference point set is smaller than a given threshold value, otherwise, taking the new transformation point set as a new X2 point set to continue to participate in iteration until the requirement of the objective function is met.
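By way of illustration only, the ICP registration of step 2.2) may be sketched with Open3D's built-in point-to-point ICP; the library, the 0.05 correspondence distance and the iteration limit are assumptions, not part of the described method:

# Sketch of the ICP loop: align point set X2 to the reference set X1 and
# return the 4x4 rigid-body transform (rotation + translation).
import numpy as np
import open3d as o3d

def register_icp(points_x2, points_x1, max_corr_dist=0.05):
    src = o3d.geometry.PointCloud()
    src.points = o3d.utility.Vector3dVector(np.asarray(points_x2, dtype=float))
    tgt = o3d.geometry.PointCloud()
    tgt.points = o3d.utility.Vector3dVector(np.asarray(points_x1, dtype=float))
    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_corr_dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint(),
        o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=100))
    return result.transformation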
2.3) For each picture, the z pictures with the most similar poses are selected, generally with z >= 4; feature matching is then performed between image pairs that have not yet been marked as matched, using a nearest-neighbor matching algorithm. The algorithm matches feature points by means of the nearest-neighbor and second-nearest-neighbor distances of the feature vectors, using the ratio of the two distances as the matching constraint: for a feature vector f_i in image I, the two feature vectors f_j1 and f_j2 closest to it are found among the feature vectors of another image J, and the distance ratio between them is defined as d = ||f_i - f_j1|| / ||f_i - f_j2||. This ratio is compared with a preset threshold T: if d < T, the two feature points are successfully matched; otherwise no match exists. T is usually set to 0.6 or above. After the two images have been matched, they are marked as matched.
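By way of illustration only, the adjacency-restricted matching of step 2.3) may be sketched as follows; the brute-force matcher and the z = 4 default are assumptions consistent with the description above:

# Sketch of step 2.3): match features only between images whose camera poses
# are adjacent, accepting a match only if the nearest/second-nearest distance
# ratio d is below the threshold T.
import numpy as np
import cv2

def nearest_pose_neighbors(camera_centers, z=4):
    # Indices of the z cameras closest to each camera center.
    c = np.asarray(camera_centers, dtype=float)
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :z]

def ratio_test_match(desc_i, desc_j, T=0.6):
    bf = cv2.BFMatcher(cv2.NORM_L2)
    pairs = bf.knnMatch(desc_i, desc_j, k=2)
    # d = ||f_i - f_j1|| / ||f_i - f_j2|| < T  <=>  m.distance < T * n.distance
    return [m for m, n in pairs if m.distance < T * n.distance]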
3) Sparse point cloud reconstruction is performed using the Structure from Motion (SFM) algorithm. This comprises the following steps:
3.1) Once camera pose estimation and feature point extraction and matching are complete, the SFM algorithm adopted by the invention recovers the three-dimensional point coordinates corresponding to the matched feature points with a triangulation algorithm. The algorithm solves for the optimal three-dimensional coordinates by minimizing the sum of squared errors between the observed image points and the projections of the three-dimensional points onto the images.
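By way of illustration only, the triangulation of step 3.1) may be sketched as a linear (DLT) solve from two camera projection matrices; the subsequent nonlinear refinement that minimizes the reprojection error, as described above, is omitted:

# Sketch: recover the 3D point observed at pixel uv1 in view 1 (projection
# matrix P1, 3x4) and at pixel uv2 in view 2 (projection matrix P2).
import numpy as np

def triangulate_point(P1, P2, uv1, uv2):
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)          # least-squares null vector of A
    X = vt[-1]
    return X[:3] / X[3]                  # homogeneous -> Euclidean coordinates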
3.2) After the sparse reconstruction of the SFM algorithm adopted by the invention is completed, three files are obtained: camera.txt containing the camera intrinsics of all images, images.txt containing the poses and key points of all images, and points.txt containing all reconstructed 3D points. A range of three-dimensional points is marked according to the positions, such as glass, that are reconstructed poorly in the sparse point cloud model, and the points that are not within this range are deleted from the points file.
4) The point clouds at reflective and locally reflective positions such as glass windows are exported, and the exported point clouds are densely reconstructed with a deep learning method. This comprises the following steps:
4.1) After the sparse reconstruction is completed, the obtained result is converted into txt form, giving 3 types of information in total: camera.txt containing the camera intrinsics of all images, images.txt containing the poses and key points of all images, and points.txt containing all reconstructed 3D points; the information corresponding to the three-dimensional coordinate range of the glass window is then selected and exported.
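By way of illustration only, the export of step 4.1) may be sketched as a simple filter over points.txt that keeps the 3D points falling inside the marked coordinate range of the glass window; the "ID X Y Z ..." line layout and the bounding-box values are assumptions:

# Sketch of step 4.1): keep only the reconstructed 3D points inside the
# glass-window bounding box and write them to a separate file for step 4.2).
def export_window_points(points_txt, out_txt, bbox_min, bbox_max):
    kept = []
    with open(points_txt) as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue
            x, y, z = map(float, line.split()[1:4])    # assumed "ID X Y Z ..." layout
            if all(lo <= v <= hi for v, lo, hi in zip((x, y, z), bbox_min, bbox_max)):
                kept.append(line)
    with open(out_txt, "w") as f:
        f.writelines(kept)
    return len(kept)

# Placeholder coordinate range of the glass window determined in step 3.2).
export_window_points("points.txt", "window_points.txt",
                     bbox_min=(-1.0, 0.5, -0.2), bbox_max=(1.5, 2.5, 0.2))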
4.2) The point cloud file exported in step 4.1) is densified with a deep learning method to obtain a dense point cloud model. Deep learning methods reconstruct scenes with highlight or reflection characteristics, such as glass windows, well. This step is implemented with the MVSNet model, which reads the exported point cloud data and processes it directly, producing a well-formed dense point cloud of the glass-window scene.
5) The original sparse point cloud is densely reconstructed with a multi-view stereo method, and the corresponding positions of the resulting dense point cloud are replaced with the dense point cloud obtained in step 4). This comprises the following steps:
5.1) The sparse point cloud model reconstructed in step 3) is densely reconstructed with the traditional MVS method. Here the open-source openMVS model is used directly; it automatically generates a dense point cloud from the sparse reconstruction information of the point cloud (camera poses, feature point positions, etc.).
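By way of illustration only, step 5.1) can be driven from a script once the sparse reconstruction has been exported to the openMVS scene format; the file name "scene.mvs" and the assumption that the openMVS binaries are on the PATH are placeholders:

# Sketch of step 5.1): densify the sparse scene with openMVS.
import subprocess

subprocess.run(["DensifyPointCloud", "scene.mvs"], check=True)
# The densified scene/point cloud produced here is the input to step 5.2).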
5.2) The three-dimensional coordinate range determined in step 4.1) is deleted from the dense point cloud obtained in step 5.1); the dense point cloud from step 4.2) and the remaining dense point cloud from step 5.1) are then merged with the Point Cloud Library (PCL), the deleted region being filled by the dense point cloud constructed in step 4.2), yielding a complete dense point cloud.
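By way of illustration only, the deletion and merge of step 5.2) may be sketched with Open3D standing in for the PCL-based merge described above; the file names and bounding-box values are placeholders:

# Sketch of step 5.2): cut the glass-window region out of the MVS dense cloud
# and fill the gap with the deep-learning dense cloud from step 4.2).
import numpy as np
import open3d as o3d

mvs_cloud = o3d.io.read_point_cloud("scene_dense.ply")      # from step 5.1)
window_cloud = o3d.io.read_point_cloud("window_dense.ply")  # from step 4.2)

# Coordinate range of the glass window determined in step 4.1) (placeholders).
bbox = o3d.geometry.AxisAlignedBoundingBox((-1.0, 0.5, -0.2), (1.5, 2.5, 0.2))

inside = bbox.get_point_indices_within_bounding_box(mvs_cloud.points)
outside = np.setdiff1d(np.arange(len(mvs_cloud.points)), inside)
mvs_trimmed = mvs_cloud.select_by_index(outside.tolist())   # MVS points outside the window

merged = mvs_trimmed + window_cloud                         # fill the deleted region
o3d.io.write_point_cloud("scene_dense_merged.ply", merged)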
The glass-window part of the generated dense point cloud model is shown in FIG. 2, where FIG. 2(a) and FIG. 2(b) are the dense point cloud built in step 5.1) and the merged dense point cloud of step 5.2), respectively. The reconstruction of the window in FIG. 2(a) is very poor because of the high transparency and reflectivity of the glass, whereas the glass window in the dense point cloud of FIG. 2(b), reconstructed with the deep learning method, is very good and its shape is essentially restored. However, many holes remain in the scene and, because the scene is still represented as a point cloud, the model degenerates into scattered points when the viewpoint is dragged close, so details cannot be observed.
6) The dense point cloud is meshed, the holes in the mesh model are repaired with a mesh repair method, and a texture map is added to complete the reconstruction of the three-dimensional scene. This comprises the following steps:
6.1) A mesh segmentation algorithm based on Delaunay tetrahedralization is adopted; weighting, graph cut and mesh reconstruction are then applied to the Delaunay tetrahedra to obtain the mesh model of the three-dimensional scene.
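By way of illustration only, the first stage of step 6.1), the Delaunay tetrahedralization itself, may be sketched with SciPy; the subsequent visibility weighting, graph cut and surface extraction described above are not shown:

# Sketch: Delaunay tetrahedralization of the merged dense point cloud.
import numpy as np
import open3d as o3d
from scipy.spatial import Delaunay

cloud = o3d.io.read_point_cloud("scene_dense_merged.ply")   # from step 5.2)
tets = Delaunay(np.asarray(cloud.points))                   # 3D Delaunay tetrahedra
print(tets.simplices.shape)                                 # (num_tetrahedra, 4) vertex indices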
The effect of the generated mesh model is shown in FIG. 4(a). At this point the biggest problem of the dense point cloud model, namely that it is not a smooth and complete model, has been solved; however, the model has lost all the details of the scene to be modeled and only its outer contour can be observed, and, because of the complexity of the indoor environment, some large holes remain even after meshing.
6.2) The mesh of step 6.1) is repaired. Because texture mapping is performed afterwards and the repaired mesh will be covered by the maps, and because reconstruction speed also has to be considered, an overly fine mesh repair algorithm is unnecessary; the minimum-angle method is therefore used for the repair (a code sketch follows the steps below), with the following steps:
6.2.1) obtaining the information of the hole boundary point and calculating the average value I of the boundary edge length.
6.2.2) calculating the included angle of two adjacent edges of each boundary point.
6.2.3) finding the boundary point with the minimum included angle, calculating the distance s between its two adjacent boundary points, and judging whether s < 2 × I holds: if it holds, one triangle is added as shown in FIG. 3(a); if not, two triangles are added as shown in FIG. 3(b).
6.2.4) updating the boundary point information.
6.2.5) judging whether the hole is repaired completely, if not, turning to 6.2.2), otherwise, ending.
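By way of illustration only, the loop of steps 6.2.1)-6.2.5) may be sketched as follows for a single hole whose boundary is given as an ordered loop of 3D vertices; placing the new vertex at the midpoint of the two neighbouring boundary points in the two-triangle case is an assumption, since the description only states that two triangles are added:

# Sketch of the minimum-angle hole filling. Returns the full vertex array
# (original boundary vertices plus any inserted ones) and the added triangles
# as index triples.
import numpy as np

def fill_hole(boundary):
    verts = [np.asarray(p, dtype=float) for p in boundary]
    loop = list(range(len(verts)))                        # current boundary (vertex indices)
    I = np.mean([np.linalg.norm(verts[loop[k]] - verts[loop[k - 1]])
                 for k in range(len(loop))])              # 6.2.1) mean boundary edge length
    tris = []
    while len(loop) > 3:
        def angle(k):                                     # 6.2.2) angle between the two boundary edges
            p = verts[loop[(k - 1) % len(loop)]]
            v = verts[loop[k]]
            n = verts[loop[(k + 1) % len(loop)]]
            a, b = p - v, n - v
            c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
            return np.arccos(np.clip(c, -1.0, 1.0))
        k = min(range(len(loop)), key=angle)              # 6.2.3) smallest included angle
        p_i = loop[(k - 1) % len(loop)]
        v_i = loop[k]
        n_i = loop[(k + 1) % len(loop)]
        s = np.linalg.norm(verts[p_i] - verts[n_i])
        if s < 2 * I:                                     # one triangle (FIG. 3(a))
            tris.append((p_i, v_i, n_i))
            loop.pop(k)
        else:                                             # two triangles (FIG. 3(b))
            m_i = len(verts)
            verts.append(0.5 * (verts[p_i] + verts[n_i])) # assumed position of the new vertex
            tris.extend([(p_i, v_i, m_i), (v_i, n_i, m_i)])
            loop[k] = m_i                                 # 6.2.4) update the boundary
    tris.append(tuple(loop))                              # 6.2.5) hole closed
    return np.array(verts), tris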
The mesh model after repair is shown in FIG. 4(b); compared with FIG. 4(a), all of the large holes in the model have been repaired.
6.3) The view of each acquired image is initialized, the optimal view is assigned to each face of the model, and a texture map is generated, realizing the mesh texture of the model and yielding the texture-mapped three-dimensional model.
The final effect of indoor scene reconstruction with the method is shown in FIG. 5: a smooth, complete, hole-free three-dimensional reconstruction model that fully displays the details of the scene is obtained with very high accuracy.

Claims (7)

1. An indoor three-dimensional reconstruction method by utilizing point cloud segmentation and grid repair is characterized in that three-dimensional reconstruction is carried out based on an indoor scene image, and the method comprises the following steps:
1) acquiring an image, namely acquiring an image covering the whole scene to be modeled;
2) extracting feature points of the images by using a scale invariant feature transformation algorithm, estimating the camera pose of each image, and only performing feature matching between the images of adjacent poses;
3) performing sparse point cloud reconstruction by using a motion recovery structure algorithm;
4) exporting point clouds at the positions of light reflection and local reflection in the glass window, and performing dense reconstruction on the exported point clouds by using a deep learning method;
5) carrying out dense reconstruction on the original sparse point cloud by using a multi-view stereoscopic vision method, and replacing the corresponding position of the dense point cloud by using the dense point cloud obtained in the step 4);
6) and carrying out grid processing on the dense point cloud, repairing the holes in the grid model by using a grid repairing method, and adding a texture map to complete the reconstruction of the three-dimensional scene.
2. The method as claimed in claim 1, wherein the image acquisition comprises taking a complete picture of the whole room from a position in the room, and taking the opposite scenes of all wall surfaces around the room from the position as a starting point.
3. The method of claim 1, wherein the step 2) comprises the following steps:
2.1) extracting features by using a scale invariant feature transformation algorithm, and calculating position information (x, y) of feature points by using Gaussian filters with different sizes;
2.2) solving by using an iterative closest point algorithm to obtain the position and posture information of the camera;
and 2.3) matching the feature points between the pictures adjacent to the pose, calculating the distance between the matched feature points, and when the distance between the matched feature points is greater than a set threshold value, considering the matched feature points as a group of error matches and filtering the group of matched feature points.
4. The method of claim 1, wherein the sparse point cloud is reconstructed by using a motion recovery structure algorithm in step 3).
5. The method of claim 1, wherein the step 4) comprises the following steps:
4.1) after the sparse reconstruction is completed, converting the obtained result into txt form, giving 3 types of information in total: camera.txt containing the camera intrinsics of all images, images.txt containing the poses and key points of all images, and points.txt containing all reconstructed 3D points; and selecting and exporting the corresponding information according to the three-dimensional coordinate range of the glass window;
4.2) carrying out densification on the point cloud file exported in the step 4.1) by using a deep learning method to obtain a dense point cloud model.
6. The method of claim 1, wherein the step 5) comprises the following steps: carrying out dense reconstruction on the original sparse point cloud by using a multi-view stereoscopic vision method, and replacing the corresponding position of the dense point cloud by using the dense point cloud obtained in the step 4).
7. The method of claim 1, wherein the step 6) comprises the following steps:
6.1) obtaining a mesh model of a three-dimensional scene by adopting a mesh segmentation algorithm based on Delaunay tetrahedron subdivision;
6.2) repairing the holes in the 6.1) grids;
6.3) generating a texture atlas, realizing the grid texture of the model, and obtaining the three-dimensional model subjected to texture mapping.
CN202110420925.1A 2021-04-19 2021-04-19 Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair Active CN113178009B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110420925.1A CN113178009B (en) 2021-04-19 2021-04-19 Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110420925.1A CN113178009B (en) 2021-04-19 2021-04-19 Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair

Publications (2)

Publication Number Publication Date
CN113178009A true CN113178009A (en) 2021-07-27
CN113178009B CN113178009B (en) 2023-08-25

Family

ID=76923742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110420925.1A Active CN113178009B (en) 2021-04-19 2021-04-19 Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair

Country Status (1)

Country Link
CN (1) CN113178009B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113345063A (en) * 2021-08-05 2021-09-03 南京万生华态科技有限公司 PBR three-dimensional reconstruction method, system and computer storage medium based on deep learning
CN113674400A (en) * 2021-08-18 2021-11-19 公安部物证鉴定中心 Spectrum three-dimensional reconstruction method and system based on repositioning technology and storage medium
CN113822936A (en) * 2021-09-29 2021-12-21 北京市商汤科技开发有限公司 Data processing method and device, computer equipment and storage medium
CN113888633A (en) * 2021-09-24 2022-01-04 北京清杉科技有限公司 Three-dimensional reconstruction method and device based on real-time positioning and mapping algorithm
CN113920274A (en) * 2021-09-30 2022-01-11 广州极飞科技股份有限公司 Scene point cloud processing method and device, unmanned aerial vehicle, remote measuring terminal and storage medium
CN114419278A (en) * 2022-01-19 2022-04-29 厦门大学 Indoor three-dimensional color grid model generation method and system
CN114972625A (en) * 2022-03-22 2022-08-30 广东工业大学 Hyperspectral point cloud generation method based on RGB spectrum super-resolution technology
CN116863085A (en) * 2023-09-04 2023-10-10 北京数慧时空信息技术有限公司 Three-dimensional reconstruction system, three-dimensional reconstruction method, electronic equipment and storage medium
CN117274512A (en) * 2023-11-23 2023-12-22 岭南现代农业科学与技术广东省实验室河源分中心 Plant multi-view image processing method and system
CN117291930A (en) * 2023-08-25 2023-12-26 中建三局第三建设工程有限责任公司 Three-dimensional reconstruction method and system based on target object segmentation in picture sequence

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015188684A1 (en) * 2014-06-12 2015-12-17 深圳奥比中光科技有限公司 Three-dimensional model reconstruction method and system
CN111462329A (en) * 2020-03-24 2020-07-28 南京航空航天大学 Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning
US20200273190A1 (en) * 2018-03-14 2020-08-27 Dalian University Of Technology Method for 3d scene dense reconstruction based on monocular visual slam
CN112085845A (en) * 2020-09-11 2020-12-15 中国人民解放军军事科学院国防科技创新研究院 Outdoor scene rapid three-dimensional reconstruction device based on unmanned aerial vehicle image
CN112132897A (en) * 2020-09-17 2020-12-25 中国人民解放军陆军工程大学 Visual SLAM method based on deep learning semantic segmentation
CN112132972A (en) * 2020-09-29 2020-12-25 凌美芯(北京)科技有限责任公司 Three-dimensional reconstruction method and system for fusing laser and image data
CN112288875A (en) * 2020-10-30 2021-01-29 中国有色金属长沙勘察设计研究院有限公司 Rapid three-dimensional reconstruction method for unmanned aerial vehicle mine inspection scene

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015188684A1 (en) * 2014-06-12 2015-12-17 深圳奥比中光科技有限公司 Three-dimensional model reconstruction method and system
US20200273190A1 (en) * 2018-03-14 2020-08-27 Dalian University Of Technology Method for 3d scene dense reconstruction based on monocular visual slam
CN111462329A (en) * 2020-03-24 2020-07-28 南京航空航天大学 Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning
CN112085845A (en) * 2020-09-11 2020-12-15 中国人民解放军军事科学院国防科技创新研究院 Outdoor scene rapid three-dimensional reconstruction device based on unmanned aerial vehicle image
CN112132897A (en) * 2020-09-17 2020-12-25 中国人民解放军陆军工程大学 Visual SLAM method based on deep learning semantic segmentation
CN112132972A (en) * 2020-09-29 2020-12-25 凌美芯(北京)科技有限责任公司 Three-dimensional reconstruction method and system for fusing laser and image data
CN112288875A (en) * 2020-10-30 2021-01-29 中国有色金属长沙勘察设计研究院有限公司 Rapid three-dimensional reconstruction method for unmanned aerial vehicle mine inspection scene

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吴铮铮; 寇展: "Three-dimensional scene reconstruction based on monocular multi-view images", 光学与光电技术 (Optics & Optoelectronic Technology), no. 05 *
张宏鑫; 方雨桐; 利明: "Robust indoor 3D layout reconstruction method combining a visual-inertial module", 计算机辅助设计与图形学学报 (Journal of Computer-Aided Design & Computer Graphics), no. 02 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113345063B (en) * 2021-08-05 2021-10-29 南京万生华态科技有限公司 PBR three-dimensional reconstruction method, system and computer storage medium based on deep learning
CN113345063A (en) * 2021-08-05 2021-09-03 南京万生华态科技有限公司 PBR three-dimensional reconstruction method, system and computer storage medium based on deep learning
CN113674400A (en) * 2021-08-18 2021-11-19 公安部物证鉴定中心 Spectrum three-dimensional reconstruction method and system based on repositioning technology and storage medium
CN113888633B (en) * 2021-09-24 2024-05-31 北京清杉科技有限公司 Three-dimensional reconstruction method and device based on real-time positioning and mapping algorithm
CN113888633A (en) * 2021-09-24 2022-01-04 北京清杉科技有限公司 Three-dimensional reconstruction method and device based on real-time positioning and mapping algorithm
CN113822936A (en) * 2021-09-29 2021-12-21 北京市商汤科技开发有限公司 Data processing method and device, computer equipment and storage medium
CN113920274A (en) * 2021-09-30 2022-01-11 广州极飞科技股份有限公司 Scene point cloud processing method and device, unmanned aerial vehicle, remote measuring terminal and storage medium
CN114419278A (en) * 2022-01-19 2022-04-29 厦门大学 Indoor three-dimensional color grid model generation method and system
CN114972625A (en) * 2022-03-22 2022-08-30 广东工业大学 Hyperspectral point cloud generation method based on RGB spectrum super-resolution technology
CN117291930A (en) * 2023-08-25 2023-12-26 中建三局第三建设工程有限责任公司 Three-dimensional reconstruction method and system based on target object segmentation in picture sequence
CN116863085A (en) * 2023-09-04 2023-10-10 北京数慧时空信息技术有限公司 Three-dimensional reconstruction system, three-dimensional reconstruction method, electronic equipment and storage medium
CN116863085B (en) * 2023-09-04 2024-01-09 北京数慧时空信息技术有限公司 Three-dimensional reconstruction system, three-dimensional reconstruction method, electronic equipment and storage medium
CN117274512A (en) * 2023-11-23 2023-12-22 岭南现代农业科学与技术广东省实验室河源分中心 Plant multi-view image processing method and system
CN117274512B (en) * 2023-11-23 2024-04-26 岭南现代农业科学与技术广东省实验室河源分中心 Plant multi-view image processing method and system

Also Published As

Publication number Publication date
CN113178009B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
CN113178009B (en) Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair
CN111063021B (en) Method and device for establishing three-dimensional reconstruction model of space moving target
CN106910242B (en) Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera
EP3695384B1 (en) Point cloud meshing method, apparatus, device and computer storage media
Dall'Asta et al. A comparison of semiglobal and local dense matching algorithms for surface reconstruction
CN111882668B (en) Multi-view three-dimensional object reconstruction method and system
WO2013094441A1 (en) Method for estimating pose of object
Sinko et al. 3D registration of the point cloud data using ICP algorithm in medical image analysis
CN107492107B (en) Object identification and reconstruction method based on plane and space information fusion
CN112132876B (en) Initial pose estimation method in 2D-3D image registration
WO2018133119A1 (en) Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera
CN114782628A (en) Indoor real-time three-dimensional reconstruction method based on depth camera
Pacheco et al. Reconstruction of high resolution 3D objects from incomplete images and 3D information
Yoon et al. Targetless multiple camera-LiDAR extrinsic calibration using object pose estimation
CN113160381B (en) Multi-view animal three-dimensional geometry and texture automatic reconstruction method and device
JP2021026759A (en) System and method for performing 3d imaging of objects
Ward et al. A model-based approach to recovering the structure of a plant from images
Zhong et al. Triple screening point cloud registration method based on image and geometric features
CN116681839A (en) Live three-dimensional target reconstruction and singulation method based on improved NeRF
Jisen A study on target recognition algorithm based on 3D point cloud and feature fusion
Tylecek et al. Depth map fusion with camera position refinement
Nguyen et al. High resolution 3d content creation using unconstrained and uncalibrated cameras
Puig et al. Monocular 3d tracking of deformable surfaces
Lysenkov et al. Pose refinement of transparent rigid objects with a stereo camera
Nair et al. 3DMRRC: 3D Mesh Reconstruction using Ray Casting for stationary subjects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant