CN116704123A - Three-dimensional reconstruction method combined with image subject extraction technology


Info

Publication number
CN116704123A
Authority
CN
China
Prior art keywords
image
segmentation
dimensional reconstruction
label
subject
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310604414.4A
Other languages
Chinese (zh)
Inventor
吴磊
叶许超
尹志城
程秀超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Dinglian Technology Co ltd
Original Assignee
Hebei Dinglian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Dinglian Technology Co., Ltd.
Priority to CN202310604414.4A (priority date 2023-05-26)
Publication of CN116704123A (2023-09-05)
Legal status: Pending


Classifications

    • G06T 17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/194: Segmentation; edge detection involving foreground-background segmentation
    • G06T 7/33: Determination of transform parameters for the alignment of images (image registration) using feature-based methods
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters (camera calibration)
    • G06V 10/40: Extraction of image or video features
    • G06V 10/757: Matching configurations of points or features
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2207/10028: Range image; depth image; 3D point clouds
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20132: Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a three-dimensional reconstruction method combined with an image subject extraction technology, comprising the following steps: sequentially placing the foreground object in at least two different spatial poses and collecting an image sequence for each pose; performing interactive image segmentation on part of the images, where clicking the foreground generates a positive label and clicking the background generates a negative label, and producing a segmentation result; combining image feature extraction with an image feature matching algorithm, and propagating the positive and negative labels through matched feature points to the image pair with the highest matching degree in turn, thereby completing segmentation of all images; taking the subject-extracted images and the image matching results as input, executing an SfM pipeline to solve the camera parameters, and triangulating a sparse point cloud from the solved parameters; and registering the sparse point clouds obtained under the different spatial poses, transforming the data into a common coordinate system, and executing the subsequent reconstruction steps on the aligned data to complete the three-dimensional reconstruction.

Description

Three-dimensional reconstruction method combined with image subject extraction technology
Technical Field
The invention relates to the technical field of three-dimensional reconstruction, and in particular to a three-dimensional reconstruction method combined with an image subject extraction technology.
Background
Three-dimensional reconstruction is a computer technique that recovers the three-dimensional information (shape, etc.) of an object from its two-dimensional projections. With the rapid development of computer software and hardware, large-scale, high-precision three-dimensional scenes can be drawn in real time with ever less difficulty. Computationally intensive three-dimensional reconstruction techniques have advanced as well, and classical reconstruction algorithms have been applied successfully in reverse engineering, video entertainment, industrial design, cultural relic protection, and urban informatization efforts such as the "digital earth" and "smart city". Three-dimensional modeling approaches fall into three types, each tied to its own data source: geometry-based modeling in three-dimensional software, modeling from distance measurements, and image-based modeling. Image-based model reconstruction offers convenient data acquisition and inexpensive equipment, and has quickly become one of the main methods of three-dimensional modeling. Generating three-dimensional models from two-dimensional images via Structure from Motion (SfM) is a key step in image-based reconstruction, which has made image-based three-dimensional modeling a popular research direction in recent years.
However, the three-dimensional reconstruction process is easily disturbed by the background in the acquired images, and the reconstructed model requires manual intervention to delete the background. The drawbacks of the manual approach are as follows: during image acquisition, the surrounding environment is captured along with the subject, which interferes with feature matching and other stages of the reconstruction and reduces accuracy; and a model that contains the environment must be trimmed by hand to remove the background scene, for which no good method currently exists. Preprocessing the images before reconstruction, removing the background and extracting the subject, reduces both the background interference and the manual effort.
For foreground extraction, Mask R-CNN is a classical instance segmentation deep learning framework that can separate the different instances in a picture on the basis of learned priors. Because it is prior-based, however, it handles unseen objects or object combinations poorly. Three-dimensional reconstruction tasks involve many unusual objects, so Mask R-CNN cannot perform foreground extraction effectively there.
Saliency-detection-based techniques such as SaliencyFilters often fail to distinguish foreground from background correctly, and attempts to combine SaliencyFilters with superpixels have not given ideal results.
In addition, when the image sequence of an object is acquired in a single pose, the bottom of the object cannot be captured, so the generated model must be repaired manually.
The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The invention aims to provide a three-dimensional reconstruction method combined with an image subject extraction technology, so as to solve the above technical problems in the prior art.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the invention provides a three-dimensional reconstruction method combined with an image main body extraction technology, which comprises the following steps:
s1, sequentially placing foreground objects into at least two different spatial postures, and respectively collecting image sequences under each spatial posture;
s2, performing interactive image segmentation on part of the image, clicking a front Jing Shengcheng positive label, clicking a background to generate a negative label, and generating a segmentation result according to the positive label and the negative label;
s3, combining image feature extraction and an image feature matching algorithm, sequentially transmitting the positive label and the negative label to an image pair with highest matching degree through image feature matching points, and thus completing image segmentation;
s4, taking the image extracted by the main body and an image matching result as input, executing an SfM process, completing camera parameter calculation, and triangulating a sparse point cloud according to the calculated camera parameters;
and S5, registering the sparse point clouds which finish camera parameter calculation under different spatial postures, uniformly transforming the data to the same coordinate system, realizing coordinate alignment, and executing a subsequent reconstruction step on the data transformed to the uniform coordinate system to finish three-dimensional reconstruction.
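For orientation, the following is a minimal Python sketch of the S1 to S5 pipeline. Every helper function here (load_image, interactive_segment, match_all_features, propagate_labels, solve_cameras, triangulate, register_to, merge, dense_reconstruct) is a hypothetical placeholder for a component described in this disclosure, not an existing API.

```python
# Hypothetical outline of steps S1-S5; all helpers are placeholders.

def reconstruct(image_sequences, user_clicks):
    """image_sequences: one list of image paths per spatial pose (S1)."""
    clouds, cameras = [], []
    for seq in image_sequences:
        images = [load_image(p) for p in seq]
        # S2: interactive segmentation of a few seed images from user clicks.
        masks = {i: interactive_segment(images[i], user_clicks[i])
                 for i in user_clicks}
        # S3: propagate positive/negative labels through feature matches
        # until every matchable image has a mask.
        matches = match_all_features(images)
        masks = propagate_labels(images, masks, matches)
        # S4: SfM on the subject-only images, then triangulation.
        cams = solve_cameras(images, masks, matches)
        clouds.append(triangulate(cams, matches))
        cameras.append(cams)
    # S5: register every later cloud into the first cloud's coordinate frame.
    aligned = [clouds[0]] + [register_to(clouds[0], c) for c in clouds[1:]]
    return dense_reconstruct(merge(aligned), cameras)
```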
Further, the interactive image segmentation in step S2 comprises: after the user clicks on a single image, the network model generates a coarse mask from the clicked regions; the region the user is concerned with is then extracted by an image morphology algorithm and a local prediction is performed there, updating the mask and refining the segmentation; finally, the mask obtained in the previous step and the original image are taken as input to a conditional random field, which combines color consistency and feature similarity to construct an energy function and refine the segmentation boundary, yielding the segmentation result.
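As a concrete illustration of the morphology-plus-local-prediction step, the sketch below crops the user's region of interest out of the coarse mask and re-predicts only there. The `model.predict` method and the `shift_clicks` helper are assumptions standing in for whichever segmentation network is used; the morphology itself is plain OpenCV.

```python
import cv2
import numpy as np

def local_refine(image, coarse_mask, model, clicks, pad=20):
    """Extract the clicked region of interest with morphology, re-predict
    locally, and paste the refined mask back. `coarse_mask` is HxW uint8;
    `model` and `shift_clicks` are hypothetical (sketch only)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    closed = cv2.morphologyEx(coarse_mask, cv2.MORPH_CLOSE, kernel)
    region = cv2.dilate(closed, kernel, iterations=3)  # add context margin
    ys, xs = np.nonzero(region)
    y0, y1 = max(ys.min() - pad, 0), min(ys.max() + pad, image.shape[0])
    x0, x1 = max(xs.min() - pad, 0), min(xs.max() + pad, image.shape[1])
    # Local prediction on the cropped window only; clicks are re-expressed
    # in window coordinates (shift_clicks is a hypothetical helper).
    local = model.predict(image[y0:y1, x0:x1], shift_clicks(clicks, x0, y0))
    refined = coarse_mask.copy()
    refined[y0:y1, x0:x1] = local  # updated mask, ready for CRF refinement
    return refined
```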
Further, step S3 comprises: searching for the new image with the highest matching degree to an already-segmented image; the feature points matched to the foreground part become positive labels of the new image and the feature points matched to the background part become negative labels, after which the new image is segmented automatically.
By adopting the technical scheme, the invention has the following beneficial effects:
the invention adopts a method based on image technology, collects image sequences of at least two different spatial attitudes of an object, extracts the main body of the image, removes the background, reduces the interference of surrounding environment, improves the precision of three-dimensional reconstruction, and performs pose registration on the intermediate results of the two groups of image sequences in the reconstruction process and transforms the intermediate results into the same coordinate system. The reconstructed three-dimensional model does not need manual intervention either so as to remove the environmental part and generate complete model results.
According to the invention, the user does not need to segment every image interactively: interactive segmentation is performed on only part of the images, after which image feature extraction is combined with an image feature matching algorithm and the positive and negative labels are propagated through matched feature points to the image pair with the highest matching degree, in turn. This process continues until every image that can be matched has been segmented, leaving the foreground extraction and image matching results.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments or the prior art are briefly described below. The drawings in the following description show some embodiments of the present invention; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of a three-dimensional reconstruction method combined with an image subject extraction technology according to an embodiment of the present invention;
Fig. 2 is a flowchart of interactive image segmentation according to an embodiment of the present invention.
Description of the embodiments
The following description of the embodiments of the present invention is made clearly and fully with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art from the embodiments of the invention without inventive effort fall within the scope of the invention.
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
Referring to Fig. 1 and Fig. 2, this embodiment provides a specific implementation of a three-dimensional reconstruction method combined with an image subject extraction technology. Specifically, an object A to be reconstructed is placed in a first pose, and image data is acquired according to the shooting requirements of image-based three-dimensional reconstruction, yielding an image sequence L1; the bottom of object A cannot be captured because it is occluded.
The object A to be reconstructed is then placed in a second pose so that the part that could not be captured due to occlusion is fully exposed, and image data is collected to obtain an image sequence L2.
The two image sequences are stored under two file paths, and three-dimensional reconstruction is performed.
Referring to Fig. 1, during reconstruction the image matching process combined with interactive segmentation, the camera parameter solving process, and the sparse point cloud generation process are performed in turn on the two groups of data. Registration is then performed on the point clouds: taking the coordinate space of one group of data as the reference, the other group is coordinate-transformed so that the two groups share a unified coordinate space. After coordinate alignment, dense point cloud reconstruction, meshing, and texture mapping are carried out.
The image matching process combined with interactive image segmentation proceeds as follows. First, interactive image segmentation is performed on a subset of the images in which the subject appears clearly. Foreground and background points are marked by clicking, and a deep learning method segments the image according to the weight mask generated from these points. The segmentation network for this step can be chosen as required, e.g. a fully convolutional network or a semantic segmentation network combining self-attention and Transformers, to produce a coarse segmentation result.
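The patent does not fix how the clicked points are turned into the weight mask fed to the network; a common encoding in interactive segmentation, shown below as an assumption, is a pair of truncated distance-transform maps (one per label polarity) stacked onto the RGB input.

```python
import cv2
import numpy as np

def click_maps(shape, pos_clicks, neg_clicks, truncate=255.0):
    """Encode positive/negative clicks as truncated distance maps to be
    concatenated with the image as extra network input channels."""
    maps = []
    for clicks in (pos_clicks, neg_clicks):
        canvas = np.ones(shape[:2], np.uint8)
        for x, y in clicks:
            canvas[y, x] = 0                     # one seed pixel per click
        dist = cv2.distanceTransform(canvas, cv2.DIST_L2, 5)
        maps.append(np.minimum(dist, truncate))  # cap far-away distances
    return np.stack(maps, axis=-1).astype(np.float32)

# guidance = click_maps(img.shape, pos_clicks=[(120, 80)], neg_clicks=[(5, 5)])
# net_input = np.concatenate([img.astype(np.float32), guidance], axis=-1)
```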
The result generated in the previous step is taken as a mask; a conditional random field combining color consistency and feature similarity is used to construct an energy function and refine the segmentation boundary, yielding a refined segmentation result.
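A sketch of this refinement using the pydensecrf package (one possible implementation, not the one mandated by the patent): the pairwise bilateral term carries the color-consistency constraint, and the unary term comes from the network's soft mask.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, prob_fg, iters=5):
    """Refine a soft foreground probability map with a fully connected CRF.
    `image` is HxWx3 uint8, `prob_fg` is HxW float in [0, 1]."""
    h, w = prob_fg.shape
    p = np.clip(prob_fg, 1e-5, 1.0 - 1e-5)       # avoid log(0) in the unary
    probs = np.stack([1.0 - p, p]).astype(np.float32)
    d = dcrf.DenseCRF2D(w, h, 2)                 # 2 labels: bg, fg
    d.setUnaryEnergy(unary_from_softmax(probs))
    d.addPairwiseGaussian(sxy=3, compat=3)       # spatial smoothness
    d.addPairwiseBilateral(sxy=60, srgb=13,      # color-consistency term
                           rgbim=np.ascontiguousarray(image), compat=10)
    q = np.array(d.inference(iters)).reshape(2, h, w)
    return (q[1] > q[0]).astype(np.uint8)        # refined binary mask
```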
Feature extraction and matching are performed on all images; conventional algorithms such as ORB, SURF, and SIFT, and deep-learning-based algorithms such as SuperPoint and SuperGlue, are commonly used. Segmenting the subset of images yields the corresponding masks: feature points falling in the foreground region carry positive labels, and feature points falling in the background region carry negative labels. The remaining images are then traversed, each time segmenting the image with the highest matching degree to an already-segmented image: among its matched feature point pairs, the points matching positive labels and the points matching negative labels are taken as input to generate an initial mask, from which the image is segmented. When no new matching image remains, the image segmentation process is complete.
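The label hand-off can be illustrated with OpenCV's SIFT and a ratio-test matcher, sketched below. The returned point lists play the role of the clicked positive and negative labels when the new image is segmented automatically.

```python
import cv2

def transfer_labels(img_done, mask_done, img_new, ratio=0.75):
    """Carry positive/negative labels from a segmented image to the
    best-matching new image through SIFT feature matches (sketch)."""
    g1 = cv2.cvtColor(img_done, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(img_new, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(g1, None)
    kp2, des2 = sift.detectAndCompute(g2, None)
    pos, neg = [], []
    for m, n in cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2):
        if m.distance < ratio * n.distance:       # Lowe's ratio test
            x1, y1 = map(int, kp1[m.queryIdx].pt)
            target = kp2[m.trainIdx].pt           # where the label lands
            # Foreground match -> positive label; background -> negative.
            (pos if mask_done[y1, x1] > 0 else neg).append(target)
    return pos, neg    # seed labels for segmenting img_new automatically
```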
The images that have undergone image matching and foreground extraction are taken as input for camera parameter solving, and a sparse point cloud is generated from the solved camera parameters.
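Full SfM over many views is normally delegated to a package such as COLMAP; as an illustration of the underlying computation, a minimal two-view version with OpenCV is sketched below. It assumes a known intrinsic matrix K and foreground-only correspondences pts1/pts2 (Nx2 float arrays) from the matching step.

```python
import cv2
import numpy as np

def two_view_sparse_cloud(pts1, pts2, K):
    """Recover relative camera pose from masked correspondences and
    triangulate a sparse point cloud (two-view stand-in for full SfM)."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # reference camera
    P2 = K @ np.hstack([R, t])                         # second camera
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T                    # Nx3 sparse points
```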
Once both acquired image sequences have produced sparse point clouds, the two point clouds are registered, solving for 7 degrees of freedom (translation, rotation, and scale) with the CPD (Coherent Point Drift) algorithm. Taking the coordinate space of the data computed from image sequence L1 as the reference, the data computed from image sequence L2 is transformed into that coordinate space, achieving coordinate alignment; the two groups of data, which together cover the complete object to be reconstructed, are then merged for subsequent processing. Dense reconstruction, meshing, and texture mapping are performed with multi-view algorithms such as OpenMVS, realizing the three-dimensional reconstruction.
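CPD estimates this similarity transform without known point correspondences (an off-the-shelf implementation exists in the probreg package, for example). To illustrate the 7 degrees of freedom being solved, the sketch below uses the closed-form Umeyama solution instead, which assumes corresponding point pairs are already available; the final comment lines show the L2-sequence cloud being mapped into the L1-sequence coordinate space.

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Closed-form 7-DoF (scale, rotation, translation) fit aligning Nx3
    src points to dst points (Umeyama 1991); assumes correspondences."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    cs, cd = src - mu_s, dst - mu_d
    cov = cd.T @ cs / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # no reflections
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / cs.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t                       # x_dst ~ s * R @ x_src + t

# Map the cloud from sequence L2 into the coordinate space of sequence L1:
# s, R, t = umeyama_similarity(corr_pts_l2, corr_pts_l1)
# cloud_l2_aligned = s * (R @ cloud_l2.T).T + t
```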
The image-based three-dimensional reconstruction is thus combined with an interactive image segmentation technique; data is acquired for the object to be reconstructed in two different poses, and the two groups of data are coordinate-aligned during reconstruction, completing the final reconstruction and producing a complete result.
Of course, the object A may also be placed in three or more spatial poses, yielding three or more image sequences, each of which is processed in the same way.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (3)

1. A three-dimensional reconstruction method combined with an image subject extraction technology, characterized by comprising the following steps:
s1, sequentially placing foreground objects into at least two different spatial postures, and respectively collecting image sequences under each spatial posture;
s2, performing interactive image segmentation on part of the image, clicking a front Jing Shengcheng positive label, clicking a background to generate a negative label, and generating a segmentation result according to the positive label and the negative label;
s3, combining image feature extraction and an image feature matching algorithm, sequentially transmitting the positive label and the negative label to an image pair with highest matching degree through image feature matching points, and thus completing image segmentation;
s4, taking the image extracted by the main body and an image matching result as input, executing an SfM process, completing camera parameter calculation, and triangulating a sparse point cloud according to the calculated camera parameters;
and S5, registering the sparse point clouds which finish camera parameter calculation under different spatial postures, uniformly transforming the data to the same coordinate system, realizing coordinate alignment, and executing a subsequent reconstruction step on the data transformed to the uniform coordinate system to finish three-dimensional reconstruction.
2. The three-dimensional reconstruction method according to claim 1, wherein the interactive image segmentation in step S2 comprises: after the user clicks on a single image, the network model generates a coarse mask from the clicked regions; the region the user is concerned with is extracted by an image morphology algorithm and a local prediction is performed there, updating the mask and refining the segmentation; the mask obtained in the previous step and the original image are then taken as input to a conditional random field, which combines color consistency and feature similarity to construct an energy function and refine the segmentation boundary, yielding the segmentation result.
3. The three-dimensional reconstruction method according to claim 1, wherein step S3 comprises: searching for the new image with the highest matching degree to an already-segmented image, wherein the feature points matched to the foreground part become positive labels of the new image and the feature points matched to the background part become negative labels, after which the new image is segmented automatically.
CN202310604414.4A (priority date 2023-05-26, filed 2023-05-26): Three-dimensional reconstruction method combined with image subject extraction technology. Status: Pending. Publication: CN116704123A (en).

Priority Applications (1)

Application Number: CN202310604414.4A · Priority Date: 2023-05-26 · Filing Date: 2023-05-26 · Title: Three-dimensional reconstruction method combined with image subject extraction technology

Applications Claiming Priority (1)

Application Number: CN202310604414.4A · Priority Date: 2023-05-26 · Filing Date: 2023-05-26 · Title: Three-dimensional reconstruction method combined with image subject extraction technology

Publications (1)

Publication Number: CN116704123A · Publication Date: 2023-09-05

Family

ID=87834999

Family Applications (1)

Application Number: CN202310604414.4A · Title: Three-dimensional reconstruction method combined with image subject extraction technology · Priority/Filing Date: 2023-05-26

Country Status (1)

Country Link
CN (1) CN116704123A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
CN117710500A * · Priority date: 2023-12-08 · Publication date: 2024-03-15 · Assignee: 广东创意热店互联网科技有限公司 · Title: E-commerce image generation method based on diffusion model


Similar Documents

CN110458939B (en) Indoor scene modeling method based on visual angle generation
Zhang et al. Deep dense multi-scale network for snow removal using semantic and depth priors
CN106803267B (en) Kinect-based indoor scene three-dimensional reconstruction method
CN110135455B (en) Image matching method, device and computer readable storage medium
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
CN111063021B (en) Method and device for establishing three-dimensional reconstruction model of space moving target
Meuleman et al. Progressively optimized local radiance fields for robust view synthesis
CN107481279B (en) Monocular video depth map calculation method
CN111524233A (en) Three-dimensional reconstruction method for dynamic target of static scene
WO2018133119A1 (en) Method and system for three-dimensional reconstruction of complete indoor scene based on depth camera
CN115496864B (en) Model construction method, model reconstruction device, electronic equipment and storage medium
Kang et al. Competitive learning of facial fitting and synthesis using uv energy
CN116704123A (en) Three-dimensional reconstruction method combined with image subject extraction technology
CN116433843A (en) Three-dimensional model reconstruction method and device based on binocular vision reconstruction route
Rara et al. Model-based 3D shape recovery from single images of unknown pose and illumination using a small number of feature points
CN116310098A (en) Multi-view three-dimensional reconstruction method based on attention mechanism and variable convolution depth network
Huang et al. ES-Net: An efficient stereo matching network
Zhu et al. Occlusion-free scene recovery via neural radiance fields
CN111161348A (en) Monocular camera-based object pose estimation method, device and equipment
CN115115847B (en) Three-dimensional sparse reconstruction method and device and electronic device
CN116878524A (en) Dynamic SLAM dense map construction method based on pyramid L-K optical flow and multi-view geometric constraint
CN113610969B (en) Three-dimensional human body model generation method and device, electronic equipment and storage medium
CN111461141B (en) Equipment pose calculating method and device
CN113111741A (en) Assembly state identification method based on three-dimensional feature points
CN111508063A (en) Three-dimensional reconstruction method and system based on image

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination