CN113178009B - Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair - Google Patents


Info

Publication number
CN113178009B
CN113178009B (application CN202110420925.1A)
Authority
CN
China
Prior art keywords
point cloud
reconstruction
grid
dimensional
indoor
Prior art date
Legal status
Active
Application number
CN202110420925.1A
Other languages
Chinese (zh)
Other versions
CN113178009A (en)
Inventor
芮挺
杨成松
解文彬
王东
刘恂
郑南
赵杰
殷宏
曾拥华
胡睿哲
Current Assignee
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date
Filing date
Publication date
Application filed by Army Engineering University of PLA
Priority to CN202110420925.1A
Publication of CN113178009A
Application granted
Publication of CN113178009B
Legal status: Active


Classifications

    • G06T 17/10 Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • G06T 15/04 Texture mapping
    • G06T 17/20 Finite element generation, e.g. wire-frame surface description, tesselation
    • G06T 7/10 Segmentation; Edge detection
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V 10/757 Matching configurations of points or features
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2210/04 Architectural design, interior design
    • G06T 2210/61 Scene description
    • Y02T 10/40 Engine management systems

Abstract

An indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair belongs to the technical field of three-dimensional scene reconstruction within computer vision. The method comprises: acquiring images, extracting feature points of the images with the scale-invariant feature transform algorithm, and performing sparse point cloud reconstruction with an SFM algorithm; exporting the point clouds at reflective and partially reflective positions, densely reconstructing the original sparse point cloud with a multi-view stereo method, and performing grid processing on the dense point cloud to complete the reconstruction of the three-dimensional scene. The invention provides an indoor three-dimensional reconstruction method that introduces point cloud segmentation into the traditional three-dimensional reconstruction pipeline, handles the point cloud acquisition problems that traditional multi-view stereo methods do not solve easily, such as reflective and highly reflective areas, with a deep learning method, and introduces a grid repair technique into the reconstructed grid model, making the method better suited to reconstructing indoor scenes and remarkably improving the accuracy of indoor three-dimensional modeling.

Description

Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair
Technical Field
The invention belongs to the technical field of three-dimensional scene reconstruction in the field of computer vision, and discloses an indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair.
Background
Computer vision is the science of how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to recognize, track and measure targets, and further processes the images so that they are better suited to human observation or to transmission to instruments for detection. Three-dimensional reconstruction refers to establishing a mathematical model of a three-dimensional object that is suitable for representation and processing by a computer; it is the basis for processing, operating on and analyzing the properties of three-dimensional objects in a computer environment, and a key technology for building virtual realities that express the objective world in a computer. The main idea of structure from motion (Structure from Motion, SFM) is to first acquire two-dimensional images of a scene with a camera, obtain the correspondences between images by matching them, and then apply the key techniques of three-dimensional reconstruction for further analysis and computation, yielding the intrinsic and extrinsic parameters of the camera and the spatial three-dimensional data of the scene. The method does not depend on specific assumptions and has a wide range of application. SFM-based reconstruction can be carried out module by module, i.e. the whole reconstruction process can be divided into several parts; since each part can be realized independently, the pipeline is convenient to control and improve, and is robust to interference factors such as the environment. Multi-view stereo (MVS) is a generalization of stereo vision: it observes and acquires images of a scene from multiple viewpoints and completes matching and depth estimation, so that dense three-dimensional reconstruction of the scene can be carried out from multiple images. Common scene representations in dense three-dimensional reconstruction include voxels, level sets, polygonal grids and depth maps. MVS shares its principle with classical stereo algorithms, but it aims to handle images with larger viewpoint changes, such as a set of images taken around a target, and even very large collections of millions of images. The emergence of neural networks has opened a new direction for improving stereo reconstruction. In theory, deep learning methods carry global semantic information, so a model can learn highlight or reflection cues and perform 2D-to-3D matching better. There are also methods that use learned matching in place of hand-crafted similarity criteria for deep-learning-based stereo matching; these show satisfactory results and are gradually surpassing traditional methods.
However, most existing three-dimensional reconstruction techniques are designed for modeling by shooting around the outside of an object; few methods model a room by shooting around its interior, and applying these techniques directly to indoor reconstruction gives poor results. In addition, although traditional MVS methods achieve high accuracy in ideal scenes, they share a common limitation: they are susceptible to interference from low-texture, reflective and partially reflective regions, which makes matching difficult, leads to incomplete reconstructed 3D information, and leaves much room for improvement in reconstruction completeness. Deep learning methods, on the other hand, reconstruct highlights and reflections better, but are often unsatisfactory in terms of accuracy and generalization.
Finally, most existing three-dimensional reconstruction techniques stop at dense reconstruction; once the viewpoint is zoomed in, the model degenerates into scattered points and can no longer be observed continuously. This shortcoming is particularly obvious in indoor reconstruction: indoor environments often contain many small objects placed close to one another, so the viewpoint must be zoomed in to observe them, and as soon as it is, the object details can no longer be seen clearly and the three-dimensional accuracy of the reconstruction becomes extremely poor. Moreover, because grid models reconstructed indoors contain many holes, the positions that originally had holes appear distorted after texture mapping. Therefore, three-dimensional reconstruction of indoor scenes still has various problems to be solved.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides an indoor three-dimensional scene reconstruction method utilizing point cloud segmentation and grid repair. Because of symmetric design, indoor scenes often contain similar feature points at different physical positions, and matching features directly would confuse the relative positional relationships between scenes; therefore, in the invention, feature points are matched only between images whose camera poses are adjacent. The invention hands the highlighted or reflective regions, which the traditional MVS method reconstructs poorly, to a deep learning method for dense reconstruction, while the other positions in the scene are reconstructed with the MVS method, because the accuracy and generalization of the deep learning method are not good enough. Finally, the reconstruction accuracy is improved by performing grid processing on the dense point cloud, repairing its holes, and applying texture mapping.
The technical scheme of the invention is as follows: an indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid patching performs three-dimensional reconstruction based on an indoor scene image, and comprises the following steps:
1) Collecting images of the whole scene to be modeled;
2) Extracting feature points of the images by using a Scale-invariant feature transform (SIFT) algorithm, estimating the camera pose of each image, and performing feature matching only between images of adjacent poses;
3) Performing sparse point cloud reconstruction with the structure from motion algorithm SFM;
4) Exporting the point clouds at reflective and partially reflective positions such as glass windows, and densely reconstructing the exported point clouds with a deep learning method;
5) Performing MVS dense reconstruction on the original sparse point cloud with a multi-view stereo method, and replacing the corresponding positions in this dense point cloud with the dense point cloud obtained in step 4);
6) Performing grid processing on the dense point cloud, repairing the holes in the grid model with a grid repair method, and adding texture mapping to complete the reconstruction of the three-dimensional scene.
In step 1) of the present invention, image acquisition is performed by first capturing as complete a view of the whole room as possible from one position in the room, and then, taking that position as the starting point, walking around the room once and capturing front-facing views of all wall surfaces.
As a further improvement, the step 2) of the present invention includes the following steps:
2.1 Performing feature extraction with the SIFT algorithm, and computing the position information (x, y) of the feature points through difference-of-Gaussian filters at different scales;
2.2 Solving with the iterative closest point algorithm to obtain the position and pose information of the camera;
2.3 Matching the feature points between pictures whose poses are adjacent, computing the distance between matched feature points, and, when this distance is larger than a set threshold, regarding the pair as an erroneous match and filtering it out.
As a further improvement, step 3) of the present invention uses the SFM algorithm to reconstruct the sparse point cloud.
As a further improvement, the step 4) of the present invention includes the following steps:
4.1 After the sparse reconstruction is completed, the obtained result is converted into txt form, giving 3 kinds of information in total: camera.txt containing the intrinsic parameters of all image cameras, images.txt containing all image poses and key points, and points.txt containing all reconstructed 3D points; the corresponding information is then selected and exported based on the three-dimensional coordinate range of the glass window.
4.2 Densifying the point cloud file exported in step 4.1) with a deep learning method to obtain a dense point cloud model.
As a further improvement, step 5) of the invention performs MVS dense reconstruction on the original sparse point cloud with a multi-view stereo method, and replaces the corresponding positions in this dense point cloud with the dense point cloud obtained in step 4).
as a further improvement, the step 6) of the present invention includes the following steps:
6.1 A grid generation algorithm based on Delaunay tetrahedralization is adopted to obtain a grid model of the three-dimensional scene;
6.2 Repairing the holes in the grid in the step 6.1);
6.3 Generating a texture map, realizing the grid texture of the model, and obtaining the texture-mapped three-dimensional model.
The beneficial effects of the invention are as follows:
the invention provides a data set acquisition mode suitable for indoor modeling, which solves the problem that scenes with similar characteristic points at different physical positions are caused by symmetrical design of indoor scenes by only matching the characteristic points of adjacent pose pictures, and obtains a dense reconstruction model with good precision by combining the traditional MVS method and the deep learning method. And then, the problem of frequent holes of the indoor scene is solved by using a grid repair technology, and the complete and well-accurate indoor three-dimensional scene model is obtained.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 (a) is a graph of the reconstruction effect of a glass window before merging.
Fig. 2 (b) is a graph of the reconstruction effect of the glass window after merging.
Fig. 3 (a) is a diagram of mesh repair when one triangle is added.
Fig. 3 (b) is a diagram of mesh repair when two triangles are added.
Fig. 4 (a) is a graph showing the effect of mesh reconstruction before repair.
Fig. 4 (b) is a graph of the effect of mesh reconstruction after repair.
Fig. 5 is a schematic view of the indoor final reconstruction effect.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, an indoor three-dimensional scene reconstruction method based on structure from motion and multi-view stereo includes the following steps: 1) Acquiring data of the scene to be modeled, collecting picture information covering the whole scene; 2) Extracting feature points of the images with the SIFT algorithm, estimating the camera pose of each image, and performing feature matching only between images with adjacent poses; 3) Performing sparse point cloud reconstruction with the structure from motion algorithm SFM; 4) Exporting the point clouds at reflective and partially reflective positions such as glass windows, and densely reconstructing the exported point clouds with a deep learning method; 5) Performing MVS dense reconstruction on the original sparse point cloud with a multi-view stereo method, and replacing the corresponding positions in this dense point cloud with the dense point cloud obtained in step 4); 6) Performing grid processing on the dense point cloud, repairing the holes in the grid model with a grid repair method, and adding texture mapping to complete the reconstruction of the three-dimensional scene.
The invention provides an indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair, which comprises the following steps:
1) Image acquisition: acquiring images covering the whole scene to be modeled. First, as complete a picture of the whole room as possible is taken from one corner position in the room; then, taking that corner as the starting point, the room is circled once and front-facing views of all the wall surfaces are taken.
2) Extracting feature points of the images with the scale-invariant feature transform algorithm, estimating the camera pose of each image, and performing feature matching only between images with adjacent poses. This specifically comprises the following steps:
2.1 Feature extraction is performed with the SIFT algorithm. The position information (x, y) of the feature points is computed with difference-of-Gaussian (Difference of Gaussian, DoG) filters at different scales, and descriptor information is produced at the same time: in a 4×4 grid of histograms around each feature point, each histogram containing 8 gradient directions, a 4×4×8 = 128-dimensional feature vector is obtained. In addition, the SIFT algorithm computes size (scale) and direction (orientation) information.
Specifically, the DoG response is fitted with a three-dimensional quadratic function; setting the derivative of the fitted function to zero yields the extreme point (x, y, σ), where (x, y) is the position information and σ is the scale information computed jointly with it. After the position and scale of the feature points have been computed, a reference direction needs to be assigned to each feature point using the local features of the image. For every detected feature point, the gradient magnitude and orientation of the pixels in a neighborhood window around the feature point in its image (the window size determined by the feature point's scale) are collected, and a gradient histogram is built from them. The histogram divides the 0-360 degree orientation range into 36 bins of 10 degrees each, and the direction of the maximum bin is taken as the dominant orientation of the key point.
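For illustration only, a minimal sketch of this feature-extraction step using OpenCV's SIFT implementation follows; the use of OpenCV, the image file name and the printed fields are assumptions for the example and not part of the invention.

```python
# Illustrative sketch: SIFT keypoint detection (DoG extrema with position, scale,
# dominant orientation) and 128-D descriptor extraction with OpenCV.
import cv2

def extract_sift_features(image_path):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    assert img is not None, "could not read image"
    sift = cv2.SIFT_create()                       # DoG detector, 4x4x8 = 128-D descriptors
    keypoints, descriptors = sift.detectAndCompute(img, None)
    # Each keypoint carries position (pt), scale (size) and dominant orientation (angle).
    for kp in keypoints[:3]:
        print(kp.pt, kp.size, kp.angle)
    return keypoints, descriptors

keypoints, descriptors = extract_sift_features("room_view_001.jpg")  # example file name
```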
2.2 Solving with the iterative closest point algorithm to obtain the position and pose information of the camera;
the iterative closest point (Iterative Closest Point, ICP) algorithm is essentially an optimal registration method based on the least squares method. The algorithm repeatedly selects corresponding relation point pairs, and calculates the optimal rigid body transformation until the convergence accuracy requirement of correct registration is met. Two three-dimensional point sets X1 and X2 are provided, and the registration steps of the ICP method are carried out on the two point sets as follows: the first step, calculating the corresponding near point of each point in the point set X2 in the point set X1;
in the second step, the rigid transformation that minimizes the average distance between the corresponding point pairs is computed, giving the translation and rotation parameters;
in the third step, the translation and rotation parameters obtained in the previous step are applied to X2 to obtain a new transformed point set;
in the fourth step, if the average distance between the new transformed point set and the reference point set is smaller than a given threshold, the iterative computation stops; otherwise the new point set continues to participate in the iteration as the new X2 until the requirement of the objective function is met.
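The sketch below illustrates the four steps above with NumPy and SciPy; the convergence tolerance, iteration cap and the SVD-based rigid-transform estimate are illustrative choices, not the specific implementation of the invention.

```python
# Minimal ICP sketch: align point set X2 to X1 by repeated nearest-neighbor
# matching and least-squares rigid transforms. Thresholds are examples only.
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp(X1, X2, max_iter=50, tol=1e-4):
    tree = cKDTree(X1)
    X2_cur = X2.copy()
    prev_err = np.inf
    for _ in range(max_iter):
        dist, idx = tree.query(X2_cur)                 # step 1: closest points in X1
        R, t = best_rigid_transform(X2_cur, X1[idx])   # step 2: optimal rigid transform
        X2_cur = X2_cur @ R.T + t                      # step 3: transform the point set
        err = dist.mean()
        if abs(prev_err - err) < tol or err < tol:     # step 4: convergence check
            break
        prev_err = err
    return X2_cur
```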
2.3 For any picture, the z pictures with the most similar poses are selected (in general z > 4), and feature matching is performed with the images not yet marked as matched using a nearest-neighbor matching algorithm. The algorithm matches feature points using the nearest-neighbor and second-nearest-neighbor distances of the feature vectors, taking their ratio as the matching constraint: for a feature vector f in image I, the two feature vectors f1 and f2 closest to it among the feature vectors of the other image J are found, the distance ratio is defined as d = ||f - f1|| / ||f - f2||, and d is compared with a preset threshold T; if d < T the two feature points are successfully matched, otherwise no match exists. Usually T ≥ 0.6 is taken. After the matching of two images is completed, the two images are marked as matched.
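A minimal sketch of this nearest/next-nearest ratio test follows, using OpenCV's brute-force matcher on the SIFT descriptors of two images; the default threshold value is an illustrative assumption.

```python
# Sketch of the ratio test between the SIFT descriptors of two images.
import cv2

def match_ratio_test(desc_i, desc_j, T=0.6):          # T is an example value
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(desc_i, desc_j, k=2)       # two closest descriptors in image J
    good = []
    for pair in knn:
        if len(pair) < 2:
            continue
        m, n = pair
        d = m.distance / n.distance                   # nearest / next-nearest ratio
        if d < T:                                     # d < T  =>  accept the match
            good.append(m)
    return good
```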
3) Sparse point cloud reconstruction is performed using the structure from motion algorithm (Structure from Motion, SFM). This specifically comprises the following steps:
3.1 After the preceding stages of the SFM algorithm adopted by the invention are finished (camera pose estimation, feature point extraction and matching), the three-dimensional point coordinates corresponding to the matched feature points can be recovered with a triangulation algorithm. The algorithm solves for the optimal three-dimensional coordinates by minimizing the sum of squared errors between the observed image points and the projections of the three-dimensional point onto the images.
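As an illustration of the triangulation step, the sketch below performs linear (DLT) triangulation of one correspondence seen in two views; in practice a nonlinear refinement minimizing the squared reprojection error would follow. The variable names and the two-view restriction are assumptions made for the example.

```python
# Linear (DLT) triangulation sketch for one correspondence in two views.
# P1, P2: 3x4 camera projection matrices; x1, x2: pixel coordinates (u, v).
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)      # least-squares solution of A X = 0
    X = Vt[-1]
    return X[:3] / X[3]              # homogeneous -> Euclidean 3D point
```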
3.2 The SFM algorithm adopted by the invention produces 3 files after the sparse reconstruction is finished, namely camera.bin, images.bin and points.bin. To put the result in a form usable by the deep learning method, the bin files are converted into txt text files: camera.txt containing the intrinsic parameters of all image cameras, images.txt containing all image poses and key points, and points.txt containing all reconstructed 3D points. The range of three-dimensional points is marked according to the positions, such as glass, that are poorly reconstructed in the sparse point cloud model, and the points outside that range are deleted from the points file.
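For illustration only, the sketch below keeps only the 3D points of the points file that fall inside a marked coordinate range (e.g. the glass-window region). The assumed line layout (an ID followed by X Y Z) and the bounding-box values are illustrative; the actual file format produced by the SFM tool may differ.

```python
# Sketch: filter points.txt down to the 3D points inside a marked box.
def filter_points_file(src, dst, box_min, box_max):
    kept = []
    with open(src) as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue                      # skip comments and blank lines
            fields = line.split()
            x, y, z = map(float, fields[1:4]) # assumed layout: ID X Y Z ...
            if all(lo <= v <= hi for v, lo, hi in zip((x, y, z), box_min, box_max)):
                kept.append(line)
    with open(dst, "w") as f:
        f.writelines(kept)

# Example box values (hypothetical coordinates of a glass-window region).
filter_points_file("points.txt", "points_window.txt",
                   box_min=(-1.0, 0.5, -0.2), box_max=(1.5, 2.5, 0.2))
```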
4) Exporting the point clouds at reflective and partially reflective positions such as glass windows, and densely reconstructing the exported point clouds with a deep learning method. This specifically comprises the following steps:
4.1 After the sparse reconstruction is completed, the obtained result is converted into txt form, giving 3 kinds of information in total: camera.txt containing the intrinsic parameters of all image cameras, images.txt containing all image poses and key points, and points.txt containing all reconstructed 3D points; the corresponding information is selected and exported according to the three-dimensional coordinate range of the glass window;
4.2 Densifying the point cloud file exported in step 4.1) with a deep learning method to obtain a dense point cloud model. The deep learning method reconstructs scenes with highlight or reflective characteristics, such as glass windows, very well. This step is realized with the MVSNet model, which can read the point cloud and process it directly to obtain a dense point cloud in which the glass window scene is handled well.
5) The original sparse point cloud is densely reconstructed with a multi-view stereo method, and the corresponding positions in this dense point cloud are replaced with the dense point cloud obtained in step 4). This specifically comprises the following steps:
5.1 Performing dense reconstruction on the sparse point cloud model reconstructed in step 3) with a traditional MVS method; here the open source OpenMVS pipeline is used directly, which can automatically generate a dense point cloud from the sparse reconstruction information of the point cloud (camera poses, feature point positions, etc.).
5.2 Deleting the three-dimensional coordinate range determined in 4.1) from the dense point cloud obtained in 5.1), merging the remaining dense point cloud of 5.1) with that of 4.2) using the Point Cloud Library (PCL), and filling the deleted positions with the dense point cloud constructed in 4.2) to obtain a complete dense point cloud.
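For illustration only, the following sketch reproduces the delete-and-merge logic of step 5.2) with plain NumPy arrays standing in for the PCL workflow; the array names and the bounding-box values are assumptions.

```python
# Sketch of step 5.2): remove the glass-window box from the MVS cloud, then
# append the deep-learning cloud of that region.
import numpy as np

def merge_clouds(mvs_xyz, dl_xyz, box_min, box_max):
    box_min, box_max = np.asarray(box_min), np.asarray(box_max)
    inside = np.all((mvs_xyz >= box_min) & (mvs_xyz <= box_max), axis=1)
    kept = mvs_xyz[~inside]                  # delete the poorly reconstructed region
    return np.vstack([kept, dl_xyz])         # fill it with the deep-learning points

# mvs_xyz: (N, 3) dense MVS points from 5.1); dl_xyz: (M, 3) dense points from 4.2)
# complete = merge_clouds(mvs_xyz, dl_xyz, (-1.0, 0.5, -0.2), (1.5, 2.5, 0.2))
```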
The effect of the generated dense point cloud model around the glass window is shown in fig. 2, where fig. 2 (a) and fig. 2 (b) are the dense point cloud built in 5.1) and the merged dense point cloud of 5.2), respectively. The window in fig. 2 (a) is reconstructed very poorly because of the high transparency and reflectivity of glass, while in the merged dense point cloud of fig. 2 (b), reconstructed with the deep learning method, the glass window comes out very well and its shape is essentially restored. However, at this point there are still many holes in the scene, and because the scene is represented as a point cloud, the model degenerates into scattered points once the viewpoint is zoomed in, and details cannot be observed.
6) Performing grid processing on the dense point cloud, repairing the holes in the grid model with a grid repair method, and adding texture mapping to complete the reconstruction of the three-dimensional scene. This specifically comprises the following steps:
6.1 A grid generation algorithm based on Delaunay tetrahedralization is adopted: the Delaunay tetrahedra are weighted, a graph cut is applied, and the surface grid is reconstructed, yielding a grid model of the three-dimensional scene.
The effect of the generated grid model is shown in fig. 4 (a). At this point the biggest problem of the dense point cloud model has been solved, i.e. the model is now smooth and continuous, but it has lost all the appearance details of the scene to be modeled and only its outer contour can be observed; moreover, because of the complexity of indoor environments, some large holes remain even after gridding.
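The weighted Delaunay tetrahedralization with graph-cut surface extraction used in 6.1) is fairly involved; purely as a rough stand-in to illustrate the point-cloud-to-grid step, the sketch below meshes a dense point cloud with Open3D's Poisson surface reconstruction. This substitute algorithm, the file names and the parameter values are assumptions and are not the patented pipeline.

```python
# Rough stand-in for step 6.1): turn the dense point cloud into a surface grid.
# Poisson reconstruction is used here only for illustration; it is NOT the
# weighted Delaunay + graph-cut method described in the patent.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("dense_complete.ply")          # assumed file name
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=9)
# Trim low-support triangles so sparse regions do not produce phantom surfaces.
dens = np.asarray(densities)
mesh.remove_vertices_by_mask(dens < np.quantile(dens, 0.02))
o3d.io.write_triangle_mesh("scene_mesh.ply", mesh)
```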
6.2 Repairing the grid from 6.1). Since texture mapping is performed afterwards and the repaired grid will be covered by the map, an overly refined grid repair algorithm is not needed; in consideration of reconstruction speed, the repair is performed with a minimum-angle method, with the following steps (a code sketch follows the steps):
6.2.1 Obtaining hole boundary point information, and calculating an average value I of the length of the boundary edge.
6.2.2 Calculating the included angle between two adjacent edges of each boundary point.
6.2.3 Finding the boundary point with the smallest included angle, computing the distance s between its two adjacent boundary points, and judging whether s < 2I holds: if so, one triangle is added as shown in fig. 3 (a); if not, two triangles are added as shown in fig. 3 (b).
6.2.4 Updating the boundary point information.
6.2.5 Judging whether the hole is repaired completely, if not, turning to 6.2.2), otherwise, ending.
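A compact sketch of the minimum-angle filling loop above follows. The boundary loop is assumed to be given as an ordered list of vertex indices; placing the new vertex at the midpoint of the two neighbors in the two-triangle case, and the safety cap on iterations, are illustrative choices, not the exact rule of the invention.

```python
# Sketch of the minimum-angle hole filling of step 6.2. `verts` is an (N, 3)
# vertex array, `loop` an ordered list of boundary vertex indices.
import numpy as np

def angle_at(V, prev_i, i, next_i):
    a, b = V[prev_i] - V[i], V[next_i] - V[i]
    cosang = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return np.arccos(np.clip(cosang, -1.0, 1.0))

def fill_hole(verts, loop):
    verts = verts.tolist()
    new_faces = []
    edges = [np.linalg.norm(np.subtract(verts[loop[k]], verts[loop[(k + 1) % len(loop)]]))
             for k in range(len(loop))]
    I = float(np.mean(edges))                              # 6.2.1 average boundary edge length
    max_steps = 10 * len(loop) + 100                       # safety guard for this sketch
    while len(loop) > 3 and max_steps > 0:
        max_steps -= 1
        V = np.asarray(verts)
        angles = [angle_at(V, loop[k - 1], loop[k], loop[(k + 1) % len(loop)])
                  for k in range(len(loop))]               # 6.2.2 angle at each boundary point
        k = int(np.argmin(angles))                         # 6.2.3 smallest included angle
        p, c, n = loop[k - 1], loop[k], loop[(k + 1) % len(loop)]
        s = np.linalg.norm(V[p] - V[n])
        if s < 2 * I:
            new_faces.append((p, c, n))                    # add one triangle
            loop.pop(k)
        else:
            verts.append(((V[p] + V[n]) / 2).tolist())     # assumed: new vertex at midpoint
            m = len(verts) - 1
            new_faces.extend([(p, c, m), (c, n, m)])       # add two triangles
            loop[k] = m                                    # 6.2.4 update the boundary: m replaces c
    new_faces.append(tuple(loop))                          # 6.2.5 close the last triangle
    return np.asarray(verts), new_faces
```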
As shown in fig. 4 (b), compared with fig. 4 (a), all of the large holes in the model have been filled in the repaired grid model.
6.3 Initializing the views from the acquired images, designating the optimal view for each face of the model, generating a texture map, and realizing the grid texture of the model, giving the texture-mapped three-dimensional model.
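A simple sketch of one way to pick an optimal view per face follows: choose, for each face, the camera whose viewing direction is most directly opposed to the face normal. Occlusion tests and the actual texture-atlas generation are omitted, and the camera representation (only camera centers) is an assumption made for the example.

```python
# Sketch of best-view selection for texturing: for each grid face, pick the
# camera that looks at the triangle most head-on (no visibility checks here).
import numpy as np

def best_view_per_face(face_centers, face_normals, cam_centers):
    best = np.empty(len(face_centers), dtype=int)
    for f, (c, n) in enumerate(zip(face_centers, face_normals)):
        view_dirs = cam_centers - c
        view_dirs /= np.linalg.norm(view_dirs, axis=1, keepdims=True)
        # larger dot(normal, direction-to-camera) means the camera faces the triangle more directly
        best[f] = int(np.argmax(view_dirs @ n))
    return best
```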
The final effect of the indoor scene reconstruction is shown in fig. 5: a smooth, complete, hole-free three-dimensional reconstruction model with full detail and very high accuracy is obtained.

Claims (5)

1. An indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid patching is characterized by carrying out three-dimensional reconstruction based on an indoor scene image and comprises the following steps:
1) Collecting images of the whole scene to be modeled;
2) Extracting feature points of images by using a scale-invariant feature transformation algorithm, estimating the pose of a camera of each image, and performing feature matching only between images of adjacent poses;
3) Performing sparse point cloud reconstruction by using a structure from motion algorithm;
4) Exporting point clouds at reflective and partially reflective positions such as glass windows, and performing dense reconstruction on the exported point clouds by using a deep learning method;
5) Performing dense reconstruction on the original sparse point cloud by using a multi-view stereo method, and replacing the corresponding positions in this dense point cloud with the dense point cloud obtained in step 4);
6) Performing grid processing on the dense point cloud, repairing holes in the grid model by using a grid repairing method, and adding texture mapping to complete the reconstruction of the three-dimensional scene.
2. The indoor three-dimensional reconstruction method using point cloud segmentation and grid patching according to claim 1, wherein in the image acquisition, a view of the whole room as complete as possible is first taken from one position in the room, and then, taking that position as the starting point, the room is circled once and views of all the wall surfaces are taken.
3. The indoor three-dimensional reconstruction method using point cloud segmentation and grid patching according to claim 1, wherein the step 2) comprises the steps of:
2.1 Extracting features by using a scale-invariant feature transform algorithm, and computing the position information (x, y) of the feature points through difference-of-Gaussian filters at different scales;
2.2 Solving with the iterative closest point algorithm to obtain the position and pose information of the camera;
2.3 Matching the feature points between pictures whose poses are adjacent, computing the distance between matched feature points, and, when this distance is larger than a set threshold, regarding the pair as an erroneous match and filtering it out.
4. The indoor three-dimensional reconstruction method using point cloud segmentation and grid patching according to claim 1, wherein the step 4) comprises the steps of:
4.1 After the sparse reconstruction is completed, the obtained result is converted into txt form, giving 3 kinds of information in total: camera.txt containing the intrinsic parameters of all image cameras, images.txt containing all image poses and key points, and points.txt containing all reconstructed 3D points; the corresponding information is selected and exported according to the three-dimensional coordinate range of the glass window;
4.2 Densifying the point cloud file exported in step 4.1) by using a deep learning method to obtain a dense point cloud model.
5. The indoor three-dimensional reconstruction method using point cloud segmentation and grid patching according to claim 1, wherein the step 6) comprises the steps of:
6.1 A grid generation algorithm based on Delaunay tetrahedralization is adopted to obtain a grid model of the three-dimensional scene;
6.2 Repairing the holes in the grid of 6.1);
6.3 Generating a texture map, realizing the grid texture of the model, and obtaining the texture-mapped three-dimensional model.
CN202110420925.1A 2021-04-19 2021-04-19 Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair Active CN113178009B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110420925.1A CN113178009B (en) 2021-04-19 2021-04-19 Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110420925.1A CN113178009B (en) 2021-04-19 2021-04-19 Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair

Publications (2)

Publication Number Publication Date
CN113178009A CN113178009A (en) 2021-07-27
CN113178009B true CN113178009B (en) 2023-08-25

Family

ID=76923742

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110420925.1A Active CN113178009B (en) 2021-04-19 2021-04-19 Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair

Country Status (1)

Country Link
CN (1) CN113178009B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113345063B (en) * 2021-08-05 2021-10-29 南京万生华态科技有限公司 PBR three-dimensional reconstruction method, system and computer storage medium based on deep learning
CN113822936A (en) * 2021-09-29 2021-12-21 北京市商汤科技开发有限公司 Data processing method and device, computer equipment and storage medium
CN113920274B (en) * 2021-09-30 2023-02-14 广州极飞科技股份有限公司 Scene point cloud processing method and device, unmanned aerial vehicle, remote measuring terminal and storage medium
CN114972625A (en) * 2022-03-22 2022-08-30 广东工业大学 Hyperspectral point cloud generation method based on RGB spectrum super-resolution technology
CN116863085B (en) * 2023-09-04 2024-01-09 北京数慧时空信息技术有限公司 Three-dimensional reconstruction system, three-dimensional reconstruction method, electronic equipment and storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416840B (en) * 2018-03-14 2020-02-18 大连理工大学 Three-dimensional scene dense reconstruction method based on monocular camera

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015188684A1 (en) * 2014-06-12 2015-12-17 深圳奥比中光科技有限公司 Three-dimensional model reconstruction method and system
CN111462329A (en) * 2020-03-24 2020-07-28 南京航空航天大学 Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning
CN112085845A (en) * 2020-09-11 2020-12-15 中国人民解放军军事科学院国防科技创新研究院 Outdoor scene rapid three-dimensional reconstruction device based on unmanned aerial vehicle image
CN112132897A (en) * 2020-09-17 2020-12-25 中国人民解放军陆军工程大学 Visual SLAM method based on deep learning semantic segmentation
CN112132972A (en) * 2020-09-29 2020-12-25 凌美芯(北京)科技有限责任公司 Three-dimensional reconstruction method and system for fusing laser and image data
CN112288875A (en) * 2020-10-30 2021-01-29 中国有色金属长沙勘察设计研究院有限公司 Rapid three-dimensional reconstruction method for unmanned aerial vehicle mine inspection scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Three-dimensional scene reconstruction based on monocular multi-view images (基于单目多视角影像的场景三维重建); 吴铮铮 (Wu Zhengzheng), 寇展 (Kou Zhan); Optics & Optoelectronic Technology (光学与光电技术), (05); full text *

Also Published As

Publication number Publication date
CN113178009A (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN113178009B (en) Indoor three-dimensional reconstruction method utilizing point cloud segmentation and grid repair
CN109872397B (en) Three-dimensional reconstruction method of airplane parts based on multi-view stereo vision
CN111063021B (en) Method and device for establishing three-dimensional reconstruction model of space moving target
CN106910242B (en) Method and system for carrying out indoor complete scene three-dimensional reconstruction based on depth camera
Sun et al. Aerial 3D building detection and modeling from airborne LiDAR point clouds
Dall'Asta et al. A comparison of semiglobal and local dense matching algorithms for surface reconstruction
Cross et al. Surface reconstruction from multiple views using apparent contours and surface texture
WO2018133119A1 (en) Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera
CN111882668A (en) Multi-view three-dimensional object reconstruction method and system
Pacheco et al. Reconstruction of high resolution 3D objects from incomplete images and 3D information
JP7119023B2 (en) Systems and methods for performing 3D imaging of objects
CN112132876B (en) Initial pose estimation method in 2D-3D image registration
EP3906530B1 (en) Method for 3d reconstruction of an object
Ward et al. A model-based approach to recovering the structure of a plant from images
Labatut et al. Hierarchical shape-based surface reconstruction for dense multi-view stereo
Tylecek et al. Depth map fusion with camera position refinement
Nguyen et al. A robust hybrid image-based modeling system
Nguyen et al. High resolution 3d content creation using unconstrained and uncalibrated cameras
Wong et al. 3D object model reconstruction from image sequence based on photometric consistency in volume space
Lyra et al. Development of an efficient 3D reconstruction solution from permissive open-source code
Ham et al. Occlusions are fleeting-texture is forever: Moving past brightness constancy
He Research on outdoor garden scene reconstruction based on PMVS Algorithm
Nair et al. 3DMRRC: 3D Mesh Reconstruction using Ray Casting for stationary subjects
Jančošek Large Scale Surface Reconstruction based on Point Visibility
Agrawal et al. Image based and Point Cloud based Methods for 3D View Reconstruction in Real-time Environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant