CN113177999B - Visual three-dimensional reconstruction method, system, electronic device and storage medium - Google Patents

Visual three-dimensional reconstruction method, system, electronic device and storage medium

Info

Publication number
CN113177999B
Authority
CN
China
Prior art keywords
model
sfm
coordinate system
models
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110321318.XA
Other languages
Chinese (zh)
Other versions
CN113177999A (en)
Inventor
王成
丛林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yixian Advanced Technology Co ltd
Original Assignee
Hangzhou Yixian Advanced Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yixian Advanced Technology Co ltd
Priority to CN202110321318.XA
Publication of CN113177999A
Application granted
Publication of CN113177999B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/005 Tree description, e.g. octree, quadtree
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods involving models

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to a visual three-dimensional reconstruction method, system, electronic device and storage medium. The method comprises: acquiring a topological relation graph of image clusters, and determining a root node among the image clusters through a graph centrality algorithm; determining the reconstruction order of the image clusters through a breadth-first search algorithm starting from the root node, and performing locally optimized reconstruction on the image clusters to obtain a plurality of SFM models; calculating the plurality of SFM models through a PnP random consistency similarity transformation algorithm based on a camera group to obtain the connection relations in the topological relation graph, outputting a maximum spanning tree, performing similarity transformation along the maximum spanning tree, and unifying the coordinate systems of non-root nodes to the root node coordinate system; and optimizing the maximum spanning tree through a pose graph optimization algorithm of similarity transformation to obtain the optimized poses of the plurality of SFM models in the unified coordinate system, and performing local BA optimization on the plurality of SFM models to obtain a target SFM model. The calculation speed and accuracy are improved without requiring global BA optimization.

Description

Visual three-dimensional reconstruction method, system, electronic device and storage medium
Technical Field
The present application relates to the field of computers, and in particular, to a method, system, electronic device, and storage medium for visual three-dimensional reconstruction.
Background
Three-dimensional reconstruction refers to the process of recovering three-dimensional information from images of a single view or of multiple views. The main approach is to calibrate the camera, i.e. to calculate the relationship between the camera's image coordinate system and the world coordinate system, and then reconstruct the three-dimensional information using the information contained in a plurality of two-dimensional images; Structure From Motion (SFM) is a common technique for such three-dimensional reconstruction. A conventional SFM pipeline is generally divided into the following steps: 1. extracting image feature points; 2. matching pairs of images and performing geometric verification; 3. tracking feature points across multiple images; 4. triangulating points over two or more images; 5. performing global Bundle Adjustment (BA) optimization over all the images. However, when the SFM technique is used for visual three-dimensional reconstruction of very large scenes, such as city-scale aerial three-dimensional reconstruction, global BA optimization is very slow, and the data volume is too large for the computer memory to hold at once.
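For illustration only, the following minimal two-view sketch (in Python, using OpenCV and NumPy) walks through steps 1 to 4 of the conventional pipeline described above; the image paths and the intrinsic matrix K are placeholder assumptions, and the expensive global BA of step 5, which this application seeks to avoid, is omitted.
import cv2
import numpy as np

# Assumed pinhole intrinsics and placeholder image paths (not from the patent).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
img1 = cv2.imread("view_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_b.jpg", cv2.IMREAD_GRAYSCALE)

# Step 1: extract image feature points (SIFT keypoints and descriptors).
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Step 2: match the two images and verify geometrically with the essential matrix.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good_matches = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.8 * pair[1].distance:  # Lowe ratio test
        good_matches.append(pair[0])
pts1 = np.float32([kp1[m.queryIdx].pt for m in good_matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good_matches])
E, inlier_mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)

# Steps 3-4: recover the relative pose and triangulate the verified tracks.
_, R, t, pose_mask = cv2.recoverPose(E, pts1, pts2, K, mask=inlier_mask)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
good = pose_mask.ravel() > 0
pts4d = cv2.triangulatePoints(P1, P2, pts1[good].T, pts2[good].T)
pts3d = (pts4d[:3] / pts4d[3]).T  # N x 3 sparse point cloud
print(f"{len(pts3d)} points triangulated from {int(good.sum())} verified matches")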
In the related art, few studies address the problems that global BA optimization is slow and that the data cannot be loaded into computer memory at once. In 2015, Robert Toldo et al. reconstructed SFM models with a combination of hierarchical clustering and global BA optimization, which allows feature point tracks across multiple images to be screened; however, the hierarchical clustering algorithm has high complexity, so the computation is slow. In 2017, S. Zhu et al. defined matching weights according to the number of matched feature points between images, performed partitioning using normalization, and set threshold constraints on the ratio of intersection to union for the overlapping regions to guarantee sufficient constraints for stitching between every pair of subgraphs; when optimizing the stitching, the images are optimized with global BA after motion averaging. The drawbacks are that the normalization algorithm is slow and requires solving a matrix, which easily overflows computer memory when the matrix is too large, and that global BA optimization becomes very slow when the numbers of images and three-dimensional points are too large. In 2019, Zhang Yong et al. computed connection scores from the matching between images, organized the matching graph as an adjacency matrix, and performed image matching with a matrix bandwidth reduction algorithm. However, the adjacency matrix representation of the matching graph is too large and easily overflows computer memory, and an adjacency matrix can neither take weights into account nor fuse pose priors and other information, so its extensibility is poor; moreover, the algorithm uses only pose information for stitching, so the model accuracy is poor.
At present, no effective solution has been proposed in the related art for the problems of slow optimization and low model accuracy that arise when the SFM technique is used to perform three-dimensional reconstruction of a real scene.
Disclosure of Invention
The embodiment of the application provides a method, a system, an electronic device and a storage medium for visual three-dimensional reconstruction, and at least solves the problems of low optimization speed and low model precision when an SFM technology is adopted to carry out three-dimensional reconstruction on a real scene in the related technology.
In a first aspect, an embodiment of the present application provides a method for visual three-dimensional reconstruction, where the method includes:
acquiring a topological relation graph of an image cluster, and determining a root node in the image cluster through a graph centrality algorithm, wherein the topological relation graph comprises a connection relation between nodes and the nodes in a physical space;
determining the reconstruction sequence of the image cluster by using a breadth-first search algorithm on the basis of the root node, and performing local optimization reconstruction on the image cluster to obtain a plurality of SFM models;
calculating the multiple SFM models through a PnP random consistency similarity transformation algorithm based on a camera group to obtain a connection relation in the topological relation graph, outputting to obtain a maximum spanning tree, performing similarity transformation on the maximum spanning tree, and unifying a coordinate system of a non-root node to a coordinate system of the root node;
optimizing the maximum spanning tree through a pose graph optimization algorithm of similarity transformation to obtain a plurality of SFM model poses of the optimized unified coordinate system, and performing local BA optimization on the SFM models to obtain a target SFM model.
In some of these embodiments, locally optimized reconstruction of the image cluster includes:
and under the condition of adding a new image cluster, fixing the pose of the existing image cluster, and fixing the 3D point which is not observed together by the existing image cluster and the new image cluster.
In some of these embodiments, performing local BA optimization on the plurality of SFM models comprises:
performing classified BA optimization on the 3D points and poses in the plurality of SFM models;
when the back projection error of the 3D point is smaller than a preset threshold value in a BA algorithm, the 3D point is fixed-pt-3D and does not participate in BA optimization, otherwise, the 3D point participates in BA optimization;
when the percentage of fixed-pt-3d in the SFM models is larger than a preset threshold, the pose of the SFM models is fixed-position and does not participate in BA optimization, otherwise, the pose of the SFM models participates in BA optimization.
In some of these embodiments, the SFM model includes a 3D point cloud, a camera cluster model, an active pixel point, and a correspondence between the 3D point and the active pixel point, wherein the camera cluster model includes a plurality of camera models and a relative pose relationship between the camera models.
In some embodiments, the calculating the plurality of SFM models through a PnP stochastic consistency similarity transformation algorithm based on a camera cluster to obtain the connection relationship in the topological relation diagram includes:
registering a first camera group model of N images in a first model in a first coordinate system to a second model to obtain a second camera group model of M images which are successfully registered in a second coordinate system, wherein the first model and the second model are any two SFM models in the multiple SFM models, the first model is the SFM model in the first coordinate system, the second model is the SFM model in the second coordinate system, N and M are natural numbers, and N is more than or equal to M;
under the condition that M is larger than 3, based on a RANSAC algorithm, obtaining relative pose similar transformation between the first model and the second model according to a first position of the M images in the first coordinate system and a second position of the M images in the second coordinate system;
and calculating to obtain edges in the topological relation graph according to the relative pose similarity transformation between any two SFM models in the plurality of SFM models, wherein the weight of the edges in the topological relation graph is the number of interior point images corresponding to the relative pose similarity transformation.
In some embodiments, the obtaining, according to a first position of the M images in the first coordinate system and a second position of the M images in the second coordinate system, a relative pose similarity transformation between the first model and the second model includes:
calculating a similarity transformation according to the first positions of the M images in the first coordinate system and the second positions of the M images in the second coordinate system, and transforming the 3D point cloud observed by the M images in the first model using the similarity transformation, so as to obtain the 3D point cloud of the first model in the second coordinate system;
projectively transforming the 3D point cloud into a second camera group model of the M images in a second coordinate system respectively for verification to obtain an interior point image of which the verification result accords with a preset pixel error threshold;
and under the condition that the number of the interior point images is the same, selecting the similar transformation with the minimum average pixel error as the relative pose similar transformation, otherwise, selecting the similar transformation with the maximum number of the interior point images as the relative pose similar transformation.
In some embodiments, the optimizing the maximum spanning tree by a pose graph optimization algorithm of similarity transformation to obtain a plurality of SFM model poses of a unified coordinate system includes:
and performing similar transformation on the plurality of SFM models on other nodes by taking the coordinate system of the root node as a reference coordinate system, unifying the coordinate systems of the plurality of SFM models, and obtaining the poses of the plurality of SFM models.
In a second aspect, an embodiment of the present application provides a system for visual three-dimensional reconstruction, the system including:
the acquisition module is used for acquiring a topological relation graph of an image cluster and determining a root node in the image cluster through a graph centrality algorithm, wherein the topological relation graph comprises nodes and the connection relation of the nodes on a physical space;
a calculation module for determining the reconstruction sequence of the image cluster by a breadth-first search algorithm based on the root node, and performing local optimization reconstruction on the image cluster to obtain a plurality of SFM models,
calculating the plurality of SFM models through a PnP random consistency similarity transformation algorithm based on a camera group to obtain a connection relation in the topological relation graph, outputting to obtain a maximum spanning tree, performing similarity transformation on the maximum spanning tree, and unifying a coordinate system of a non-root node to the root node coordinate system;
and the optimization module is used for optimizing the maximum spanning tree through a pose graph optimization algorithm of similarity transformation to obtain a plurality of SFM model poses of the optimized unified coordinate system, and carrying out local BA optimization on the plurality of SFM models to obtain a target SFM model.
In a third aspect, an embodiment of the present application provides an electronic apparatus, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the method of visual three-dimensional reconstruction described in any one of the above.
In a fourth aspect, the present application provides a storage medium having a computer program stored therein, wherein the computer program is configured to execute the method of visual three-dimensional reconstruction described in any one of the above when the computer program runs.
Compared with the related art, the method for visual three-dimensional reconstruction provided by the embodiments of the present application acquires the topological relation graph of the image clusters and determines a root node among the image clusters through a graph centrality algorithm, wherein the topological relation graph comprises the nodes and the connection relations of the nodes in physical space; determines the reconstruction order of the image clusters through a breadth-first search algorithm based on the root node, and performs locally optimized reconstruction on the image clusters to obtain a plurality of SFM models; calculates the plurality of SFM models through a PnP random consistency similarity transformation algorithm based on a camera group to obtain the connection relations in the topological relation graph, outputs a maximum spanning tree, performs similarity transformation on the maximum spanning tree, and unifies the coordinate systems of non-root nodes to the root node coordinate system; and optimizes the maximum spanning tree through a pose graph optimization algorithm of similarity transformation to obtain the optimized poses of the plurality of SFM models in the unified coordinate system, and performs local BA optimization on the plurality of SFM models to obtain a target SFM model, so that the calculation speed and accuracy are improved without requiring global BA optimization.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a method of visual three-dimensional reconstruction according to an embodiment of the present application;
FIG. 2 is a schematic diagram of image cluster topology according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a breadth first search tree according to an embodiment of the present application;
FIG. 4 is a schematic diagram of locally optimized reconstruction according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an SFM model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a PnP random consistency similarity transformation based on a camera group according to an embodiment of the present application;
FIG. 7 is a schematic diagram of single image registration according to an embodiment of the present application;
FIG. 8 is a schematic diagram of pose graph optimization according to an embodiment of the present application;
FIG. 9 is a block diagram of a visual three-dimensional reconstruction system according to an embodiment of the present application;
fig. 10 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that such a development effort might be complex and tedious, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure, without departing from the scope of this disclosure.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by one of ordinary skill in the art that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar referents used in this application do not denote a limitation of quantity and may indicate either the singular or the plural. The terms "including," "comprising," "having," and any variations thereof used in this application are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, product, or device. The terms "connected," "coupled," and the like used in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "a plurality" used herein means two or more. "And/or" describes an association relationship of associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like used herein merely distinguish similar objects and do not denote a particular ordering of the objects.
The present embodiment provides a method for visual three-dimensional reconstruction, and fig. 1 is a flowchart of a visual three-dimensional reconstruction method according to an embodiment of the present application, and as shown in fig. 1, the flowchart includes the following steps:
step S101, acquiring a topological relation graph of an image cluster, and determining a root node in the image cluster through a graph centrality algorithm, wherein the topological relation graph comprises the connection relation of the nodes and the nodes in a physical space;
FIG. 2 is a schematic diagram of a topological relation graph of image clusters according to an embodiment of the present application. As shown in FIG. 2, each node represents an image cluster, i.e., a plurality of images; for example, Route-X is an image cluster named X. A connection between nodes in physical space indicates that the image clusters at those nodes are adjacent in physical space, so that a matching relationship can be established between their images; for example, if Route-X is connected with Route-A, the two image clusters are adjacent in physical space. In addition, the present application assumes that the matching relationships between all images are known, i.e., the topological relationships between the images have already been defined when the images are manually captured. The number of image clusters and the edges between the image clusters in FIG. 2 are only schematic and do not limit the present application.
Step S102, determining a reconstruction order of the image clusters through a breadth-first search algorithm based on the root node, and performing locally optimized reconstruction on the image clusters to obtain a plurality of SFM models. Optionally, the reconstruction order of the image clusters may also be obtained by other graph traversal algorithms, which is not limited here. FIG. 3 is a schematic diagram of a breadth-first search tree according to an embodiment of the present application; as shown in FIG. 3, the breadth-first search tree is a subset of the topological relation graph. The edges between the image clusters in FIG. 3 are only schematic and do not limit the present application.
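For illustration, the root-node selection and reconstruction-order logic of steps S101 and S102 can be sketched with the networkx library as follows; the node names and edges are illustrative assumptions, and closeness centrality is only one possible choice of graph centrality algorithm.
import networkx as nx

# Illustrative topological relation graph of image clusters (cf. FIG. 2).
topo = nx.Graph()
topo.add_edges_from([("Route-X", "Route-A"), ("Route-X", "Route-B"),
                     ("Route-A", "Route-C"), ("Route-B", "Route-D"),
                     ("Route-X", "Route-E")])

# Root node: the most central image cluster under the chosen centrality measure.
centrality = nx.closeness_centrality(topo)
root = max(centrality, key=centrality.get)

# Reconstruction order: breadth-first traversal from the root; the resulting
# breadth-first search tree is a subset of the topological relation graph (cf. FIG. 3).
order = [root] + [v for _, v in nx.bfs_edges(topo, root)]
print("root:", root)
print("reconstruction order:", order)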
Preferably, in this embodiment, based on the root node obtained in step S101, the reconstruction order of the image clusters is determined through a breadth-first search algorithm, and locally optimized reconstruction is performed on the image clusters. Each time a new image cluster is added, the poses of the existing image clusters are fixed, and the 3D points that are not observed in common by the existing image clusters and the new image cluster are fixed. FIG. 4 is a schematic diagram of locally optimized reconstruction according to an embodiment of the present application. As shown in FIG. 4, when the new white image cluster is added, the poses of the existing image clusters in dotted rectangle 1 and the 3D points that have no common observation with the newly added white image cluster are fixed; similarly, when the new gray image cluster is subsequently added, the poses and 3D points in dotted rectangle 2 are fixed, and a plurality of SFM models are thus obtained. Through this locally optimized reconstruction, the variables that need to be optimized are reduced to the pose of the new image cluster and the 3D points it observes, which greatly reduces the variables to be optimized and improves the optimization speed.
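A minimal sketch of the variable-fixing rule described above is given below; the data layout (sets of image identifiers and a map from 3D points to the images that observe them) is an assumption for illustration, and the reading that a 3D point stays free whenever the new cluster observes it is one reasonable interpretation of the rule rather than the patent's own code.
def split_variables(existing_images, new_images, observations):
    """observations: dict mapping a 3D point id to the set of image ids observing it."""
    existing, new = set(existing_images), set(new_images)
    fixed_poses, free_poses = existing, new  # existing cluster poses are frozen
    free_points, fixed_points = set(), set()
    for point_id, observers in observations.items():
        if observers & new:
            free_points.add(point_id)   # observed by the new cluster: optimize it
        else:
            fixed_points.add(point_id)  # no common observation with the new cluster: keep fixed
    return free_poses, fixed_poses, free_points, fixed_points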
It should be noted that each SFM model is reconstructed from an image collection, i.e., an image cluster, according to a conventional SFM algorithm. FIG. 5 is a schematic diagram of an SFM model according to an embodiment of the present application. As shown in FIG. 5, the SFM model includes a three-dimensional point cloud, a camera group model, effective pixel points, and the correspondence between the three-dimensional points and the effective pixel points, where a camera group refers to a plurality of cameras bound to the same rigid body, and the camera group model includes a plurality of camera models and the relative pose relationships between the camera models. It should be noted that the rigid body is only a concept; it means that the relative pose relationship between the cameras is a rigid-body transformation that does not change with time.
Generally, the camera model includes camera intrinsic parameters, such as the intrinsic parameters fx, fy, cx, and cy of the simplest pinhole camera. Preferably, the camera model in this embodiment includes not only camera internal parameters, but also camera external parameters, that is, the pose of the camera in the world coordinate system, and the pixel feature points extracted from the camera image and the feature descriptors of the pixel feature points.
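As a small worked example of the camera model mentioned above, the following sketch projects a 3D world point to a pixel using assumed intrinsics (fx, fy, cx, cy) and extrinsics (R, t); all numeric values are illustrative.
import numpy as np

fx, fy, cx, cy = 1000.0, 1000.0, 640.0, 360.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])       # intrinsic parameters of a pinhole camera

R = np.eye(3)                         # extrinsics: world-to-camera rotation
t = np.array([0.0, 0.0, 0.0])         # extrinsics: world-to-camera translation

X_world = np.array([0.5, -0.2, 4.0])  # a 3D point in front of the camera
X_cam = R @ X_world + t               # world coordinates -> camera coordinates
u, v, w = K @ X_cam                   # camera coordinates -> homogeneous pixel
print((u / w, v / w))                 # (765.0, 310.0): x/z*fx + cx and y/z*fy + cy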
Step S103, calculating a plurality of SFM models through a PnP random consistency similarity transformation algorithm based on a camera group to obtain a connection relation in a topological relation graph, outputting to obtain a maximum spanning tree, performing similarity transformation on the maximum spanning tree, and unifying a coordinate system of a non-root node to a root node coordinate system, wherein the maximum spanning tree is a subset of the topological relation graph;
preferably, in this embodiment, the coordinate system where the root node in the maximum spanning tree is located is a reference coordinate system, the multiple SFM models on other nodes other than the root node are subjected to similar transformation, the coordinate systems of the multiple SFM models are unified, and the poses of the multiple SFM models are obtained. In the image imaging, the pose itself is a rigid body transformation and is 6 degrees of freedom, and the similarity transformation (similarity transformation or sim 3) is one more scale on the basis of the rigid body transformation, so that the similarity transformation is 7 degrees of freedom.
Preferably, the connection relations in the topological relation graph are calculated by the following camera-group-based PnP random consistency similarity transformation algorithm (N-Cameras-PnP-RANSAC-sim3); it should be noted that the connection relations here are the edges. FIG. 6 is a schematic diagram of PnP random consistency similarity transformation based on a camera group according to an embodiment of the present application. PnP (Perspective-n-Point) refers to solving the 6-degree-of-freedom pose, in a coordinate system X, of the camera model corresponding to an image, given n three-dimensional points in the known coordinate system X and their projections (pixel points) on the image. The method specifically includes the following steps:
s1: and registering the first camera group model of the N images in the first model in the first coordinate system to a second model to obtain a second camera group model of the M images which are successfully registered in the second coordinate system, wherein the first model and the second model are any two SFM models in a plurality of SFM models, the first model is the SFM model in the first coordinate system, the second model is the SFM model in the second coordinate system, N and M are natural numbers, and N is more than or equal to M. For convenience of description, assume that a first coordinate system is a coordinate system X, a second coordinate system is a coordinate system Y, the first Model is Model _ in _ X, the second Model is Model _ in _ Y, a first camera cluster Model of N images in the Model _ in _ X in the coordinate system X is denoted as N-Cameras _ in _ X, the N-Cameras _ in _ X is registered into the Model _ in _ Y, the number of successfully registered images is M, and a second camera cluster Model of M images in the coordinate system Y is obtained and denoted as M-Cameras _ in _ Y.
Fig. 7 is a schematic diagram of single-image registration according to an embodiment of the present application, as shown in fig. 7. Image registration here refers to the following: given an SFM Model in a known coordinate system X (Model_in_X for short) and the feature points and feature descriptors extracted from an image A, the correspondences between three-dimensional points and pixel points are obtained by matching the feature descriptors against Model_in_X, and the pose of image A in the coordinate system X is then solved with a Random Sample Consensus (RANSAC) algorithm.
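For illustration, a hedged sketch of this single-image registration step with OpenCV's PnP-RANSAC solver is given below; the 2D-3D correspondence arrays and the intrinsic matrix are placeholders standing in for the result of descriptor matching against Model_in_X, and the threshold values are assumptions.
import cv2
import numpy as np

# Placeholder correspondences: 3D points of Model_in_X matched to pixels of image A.
object_points = np.random.rand(50, 3).astype(np.float64)
image_points = (np.random.rand(50, 2) * 500).astype(np.float64)
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, distCoeffs=None,
    reprojectionError=4.0, iterationsCount=200)
if ok:
    R, _ = cv2.Rodrigues(rvec)            # rvec/tvec map coordinate system X into the camera frame
    R_cam, t_cam = R.T, -R.T @ tvec       # pose of image A in coordinate system X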
S2: and under the condition that M is larger than 3, based on the RANSAC algorithm, obtaining the relative pose similar transformation between the first model and the second model according to the first position of the M images in the first coordinate system and the second position of the M images in the second coordinate system.
The RANSAC algorithm estimates the parameters of a mathematical model from a set of sample data that contains outliers, thereby obtaining valid sample data. Preferably, using the RANSAC algorithm, a similarity transformation is calculated according to the first positions of the M images in the first coordinate system and the second positions of the M images in the second coordinate system, and the 3D point cloud observed by the M images in the first model is transformed with this similarity transformation to obtain the 3D point cloud of the first model in the second coordinate system. The 3D point cloud is then projected into the second camera group model of the M images in the second coordinate system for verification, yielding the interior point images whose verification results satisfy a preset pixel error threshold. When the numbers of interior point images are the same, the similarity transformation with the smallest average pixel error is selected as the relative pose similarity transformation; otherwise, the similarity transformation with the largest number of interior point images is selected as the relative pose similarity transformation.
Schematically, a sim3 transformation is computed from the poses M-Poses_in_X of the M images in coordinate system X and the poses M-Poses_in_Y of the M images in coordinate system Y; Model_in_X is then transformed by this sim3 to obtain Model-X_in_Y, the representation of the model of coordinate system X in coordinate system Y; the three-dimensional point cloud Points3d-X_in_Y of that representation is then projected into the camera group model M-Cameras_in_Y of the M images in coordinate system Y for verification, and the sim3 transformation that satisfies the pixel error threshold with the largest number of interior point images is selected as the output (namely, the relative pose similarity transformation between Model_in_X and Model_in_Y); if the numbers of interior point images are the same, the sim3 transformation with the smallest average pixel error is selected as the output.
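A simplified sketch of the core of this step is given below: a closed-form similarity estimation (Umeyama alignment) from corresponding camera positions, wrapped in a minimal RANSAC loop. The patent verifies each hypothesis by reprojecting the transformed 3D point cloud into the camera group model and counting interior point images against a pixel error threshold; in this sketch a simpler position-error inlier test stands in for that reprojection check, so it is an approximation under stated assumptions rather than the patent's own procedure.
import numpy as np

def umeyama_sim3(src, dst):
    """Closed-form s, R, t such that dst is approximately s * R @ src + t (src, dst are M x 3)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                     # avoid a reflection
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / ((src_c ** 2).sum() / len(src))
    t = mu_d - s * R @ mu_s
    return s, R, t

def ransac_sim3(pos_x, pos_y, iters=100, thresh=0.05, seed=0):
    rng = np.random.default_rng(seed)
    best = (0, np.inf, None)               # (inlier count, mean error, (s, R, t))
    for _ in range(iters):
        idx = rng.choice(len(pos_x), size=3, replace=False)   # minimal sample for sim3
        s, R, t = umeyama_sim3(pos_x[idx], pos_y[idx])
        err = np.linalg.norm(s * pos_x @ R.T + t - pos_y, axis=1)
        inliers = err < thresh
        count = int(inliers.sum())
        mean_err = err[inliers].mean() if count else np.inf
        # keep the hypothesis with the most inliers; break ties by smaller mean error
        if count > best[0] or (count == best[0] and mean_err < best[1]):
            best = (count, mean_err, (s, R, t))
    return best

# Toy demo: pos_y is an exact sim3 image of pos_x, so all positions are inliers.
pos_x = np.random.default_rng(1).normal(size=(10, 3))
pos_y = 2.0 * pos_x + np.array([1.0, 2.0, 3.0])
print(ransac_sim3(pos_x, pos_y)[0])        # -> 10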
S3: and performing similarity transformation calculation according to the relative pose between any two SFM models in the plurality of SFM models to obtain the edges of the topological relation graph, wherein the weight of the edges in the topological relation graph is the number of interior point images corresponding to the similarity transformation of the relative pose.
Through the steps S1 to S3, the accuracy can be improved as much as possible under the condition of ensuring the model merging speed.
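For illustration, building the weighted topological relation graph from the pairwise results of step S3 and extracting the maximum spanning tree can be sketched with networkx as follows; the edge data are illustrative assumptions.
import networkx as nx

G = nx.Graph()
# (model u, model v, number of interior point images supporting their relative sim3)
G.add_weighted_edges_from([("Route-X", "Route-A", 42), ("Route-X", "Route-B", 17),
                           ("Route-A", "Route-B", 8), ("Route-B", "Route-D", 25)])

mst = nx.maximum_spanning_tree(G, weight="weight")
print(sorted(mst.edges(data="weight")))
# Non-root models are then brought into the root coordinate system by composing the
# relative pose similarity transformations along the maximum spanning tree edges.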
Step S104, optimizing the maximum spanning tree through a pose graph optimization (sim3-pose-graph) algorithm of similarity transformation to obtain the optimized poses of the plurality of SFM models in the unified coordinate system, and performing local BA optimization on the plurality of SFM models to obtain a target SFM model.
Fig. 8 is a schematic diagram of pose graph optimization according to an embodiment of the present application. As shown in fig. 8, in this embodiment, on the basis of the maximum spanning tree, the information in the topological relation graph other than the edges contained in the maximum spanning tree, that is, the relative pose similarity transformation information between SFM models indicated by the dashed boxes and dotted lines, is additionally used to correct errors; the maximum spanning tree is thereby optimized and model poses with higher accuracy can be obtained. The preset optimization function is shown in equation (1):
min_{Pose_1, ..., Pose_N}  Σ_{(x,y) ∈ edges}  e_xy^T · Ω_xy · e_xy,  with  e_xy = log( Pose_measured^{x→y} · Pose^{y→x} )    (1)
wherein N represents the number of SFM models in the topological relation graph; Pose_1 ... Pose_N are the 7-degree-of-freedom (sim3) poses of the N SFM models in the root node coordinate system and are the variables to be solved; the sum runs over the M edges, each edge contributing one constraint. Pose_measured^{x→y}, one of the M edges, is the measured relative pose similarity transformation between Model_in_x and Model_in_y, and Pose^{y→x} is the relative pose similarity transformation between Model_in_y and Model_in_x implied by the current estimates during optimization, where Model_in_x denotes the SFM model in coordinate system x, Model_in_y denotes the SFM model in coordinate system y, and the relative pose similarity transformation has 7 degrees of freedom. Ω_xy is the information matrix of the relative pose similarity transformation and can generally be set to the identity matrix.
The pose accuracy of the model can be further improved through calculation of a pose graph optimization algorithm of similarity transformation.
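A minimal sketch of such a sim3 pose graph optimization, consistent with equation (1) under an identity information matrix, is given below; each model pose is parameterized as a rotation vector, a translation and a log scale, and a generic least-squares solver is used as an illustrative substitute for the patent's own optimizer. The convention assumed here is that each pose maps its model's coordinates into the root coordinate system.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def compose(a, b):
    """sim3 composition a after b, each as (s, R, t) acting by p -> s * R @ p + t."""
    return a[0] * b[0], a[1] @ b[1], a[0] * a[1] @ b[2] + a[2]

def inverse(a):
    s, R, t = a
    return 1.0 / s, R.T, -(R.T @ t) / s

def unpack(x, i):
    """Pose of model i (rotation vector, translation, log scale) from the flat vector."""
    p = x[7 * i: 7 * i + 7]
    return np.exp(p[6]), Rotation.from_rotvec(p[:3]).as_matrix(), p[3:6]

def residuals(x, edges):
    res = []
    for i, j, meas in edges:                                  # meas: measured sim3 from model i to model j
        pred = compose(inverse(unpack(x, j)), unpack(x, i))   # model i -> root -> model j
        s_e, R_e, t_e = compose(inverse(meas), pred)          # identity when consistent
        res.append(np.concatenate([Rotation.from_matrix(R_e).as_rotvec(),
                                   t_e, [np.log(s_e)]]))
    s0, R0, t0 = unpack(x, 0)                                 # gauge fixing: pin the root model to identity
    res.append(1e3 * np.concatenate([Rotation.from_matrix(R0).as_rotvec(), t0, [np.log(s0)]]))
    return np.concatenate(res)

# Toy example: two models and one measured relative sim3 from the RANSAC-sim3 step.
meas = (1.1, Rotation.from_euler("z", 5, degrees=True).as_matrix(), np.array([0.1, 0.0, 0.0]))
sol = least_squares(residuals, np.zeros(14), args=([(0, 1, meas)],))
print(sol.x[7:].round(3))   # optimized rotation vector, translation and log scale of model 1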
Finally, local BA optimization is performed on the plurality of SFM models obtained above, that is, classified BA optimization is performed on the 3D points and poses in the plurality of SFM models. The specific classification is as follows:
(1) Classifying the 3D points: when the back projection error of a 3D point in the BA algorithm is smaller than a preset threshold, the point is fixed-pt-3d and does not participate in BA optimization; otherwise, the point is opt-pt-3d and needs to participate in BA optimization;
(2) Classifying the poses: when the percentage of fixed-pt-3d among the observable 3D points of an SFM model is greater than a preset threshold, for example 80%, the pose of that SFM model is fixed-position and does not participate in BA optimization; otherwise it is opt-position and needs to participate in BA optimization.
BA optimization is then performed on the opt-pt-3d points and opt-position poses obtained by the above classification to obtain the final target SFM model. By classifying the variables (3D points and poses) and performing local BA optimization, the amount to be optimized is greatly reduced compared with global BA optimization, which effectively improves the calculation speed and the model accuracy.
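A short sketch of this classification rule is given below; the 80% pose ratio echoes the example above, while the 2-pixel threshold and the data layout are assumptions for illustration.
def classify_for_local_ba(reproj_err, model_points, pt_thresh_px=2.0, pose_ratio=0.8):
    """reproj_err: dict 3D point id -> back projection error in pixels.
    model_points: dict model id -> set of 3D point ids observable in that model."""
    fixed_pt = {p for p, e in reproj_err.items() if e < pt_thresh_px}   # fixed-pt-3d
    opt_pt = set(reproj_err) - fixed_pt                                 # opt-pt-3d
    fixed_pose, opt_pose = set(), set()
    for model_id, pts in model_points.items():
        ratio = len(pts & fixed_pt) / max(len(pts), 1)
        # mostly fixed points -> the model pose is fixed-position; otherwise opt-position
        (fixed_pose if ratio > pose_ratio else opt_pose).add(model_id)
    return fixed_pt, opt_pt, fixed_pose, opt_pose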
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The present embodiment further provides a system for visual three-dimensional reconstruction, which is used to implement the foregoing embodiments and preferred embodiments, and the description of the system that has been already made is omitted. As used hereinafter, the terms "module," "unit," "subunit," and the like may implement a combination of software and/or hardware for a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware or a combination of software and hardware is also possible and contemplated.
Fig. 9 is a block diagram of a visual three-dimensional reconstruction system according to an embodiment of the present application, and as shown in fig. 9, the system includes an acquisition module 91, a calculation module 92, and an optimization module 93:
the acquiring module 91 is configured to acquire a topological relation graph of the image cluster, and determine a root node in the image cluster through a graph centrality algorithm, where the topological relation graph includes connection relationships between nodes and nodes in a physical space; the calculation module 92 is configured to determine a reconstruction sequence of an image cluster through a breadth-first search algorithm based on a root node, perform local optimization reconstruction on the image cluster to obtain a plurality of SFM models, then calculate the plurality of SFM models through a PnP random consistency similarity transformation algorithm based on a camera cluster to obtain a connection relationship in a topological relation graph, output a maximum spanning tree, perform similarity transformation on the maximum spanning tree, and unify a coordinate system of a non-root node to a root node coordinate system; and the judging module 33 is configured to optimize the maximum spanning tree through a pose graph optimization algorithm of similarity transformation to obtain multiple SFM model poses of the optimized unified coordinate system, and perform local BA optimization on the multiple SFM models to obtain a target SFM model.
Through the system, the acquisition module 91 acquires the topological relation graph of the image cluster, and determines a root node in the image cluster through a graph centrality algorithm; the calculation module 92 greatly reduces the variables to be optimized through local optimization reconstruction, improves the optimization speed, and further improves the accuracy of the model pose by using the calculation of the pose graph optimization algorithm of the similarity transformation; the optimization module 93 performs classified local BA optimization on variables, namely 3D points and poses, so that the amount to be optimized is greatly reduced, and the calculation speed and model accuracy are effectively improved, compared with global BA variable optimization. The whole system solves the problems of low optimization speed and low model precision when the SFM technology is adopted to carry out three-dimensional reconstruction on a real scene, and improves the calculation speed and precision under the condition of not needing global BA optimization.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
In addition, in combination with the method for visual three-dimensional reconstruction in the foregoing embodiments, the embodiments of the present application may provide a storage medium to implement. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements any of the above-described embodiments of the method of visual three-dimensional reconstruction.
In one embodiment, an electronic device is provided, which may be a server; fig. 10 is a schematic diagram of the internal structure of such an electronic device according to an embodiment of the present application. As shown in fig. 10, the electronic device includes a processor, a memory, and a network interface connected by a system bus. The processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The database of the electronic device is used for storing data. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program, when executed by the processor, implements a method of visual three-dimensional reconstruction.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the present solution and does not constitute a limitation on the electronic devices to which the present solution applies, and that a particular electronic device may include more or less components than those shown, or combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that the technical features of the above-described embodiments can be combined arbitrarily; for the sake of brevity, not all possible combinations of these technical features are described, but such combinations should be considered within the scope of this specification as long as they do not contradict one another.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is specific and detailed, but not to be understood as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these are all within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (8)

1. A method of visual three-dimensional reconstruction, the method comprising:
acquiring a topological relation graph of an image cluster, and determining a root node in the image cluster through a graph centrality algorithm, wherein the topological relation graph comprises a connection relation between nodes and the nodes on a physical space;
determining a reconstruction sequence of the image cluster by using a breadth-first search algorithm on the basis of the root node, and performing local optimization reconstruction on the image cluster to obtain a plurality of SFM models, wherein the SFM models comprise 3D point clouds, camera cluster models, effective pixel points and corresponding relations between the 3D points and the effective pixel points, and the camera cluster models comprise a plurality of camera models and relative pose relations between the camera models;
calculating the multiple SFM models through a PnP random consistency similarity transformation algorithm based on a camera group to obtain a connection relation in the topological relation diagram, wherein the method specifically comprises the following steps: registering a first camera group model of N images in a first model in a first coordinate system to a second model to obtain a second camera group model of M images which are successfully registered in a second coordinate system, wherein the first model and the second model are any two SFM models in the multiple SFM models, the first model is an SFM model in the first coordinate system, the second model is an SFM model in the second coordinate system, N and M are natural numbers, and N is more than or equal to M; under the condition that M is larger than 3, based on a RANSAC algorithm, obtaining relative pose similar transformation between the first model and the second model according to a first position of the M images in the first coordinate system and a second position of the M images in the second coordinate system; calculating to obtain edges in the topological relation graph according to relative pose similarity transformation between any two SFM models in the plurality of SFM models, wherein the weight of the edges in the topological relation graph is the number of interior point images corresponding to the relative pose similarity transformation; thereby outputting a maximum spanning tree, performing similarity transformation on the maximum spanning tree, and unifying a coordinate system of a non-root node to the coordinate system of the root node;
optimizing the maximum spanning tree through a pose graph optimization algorithm of similarity transformation to obtain a plurality of SFM model poses of the optimized unified coordinate system, and performing local BA optimization on the plurality of SFM models to obtain a target SFM model.
2. The method of claim 1, wherein locally optimizing reconstruction of the image cluster comprises:
and under the condition of adding a new image cluster, fixing the pose of the existing image cluster, and fixing the 3D point which is not observed together by the existing image cluster and the new image cluster.
3. The method of claim 1, wherein performing the local BA optimization on the plurality of SFM models comprises:
performing classified BA optimization on the 3D points and the poses in the plurality of SFM models;
when the back projection error of the 3D point is smaller than a preset threshold value in the BA algorithm, the 3D point is fixed-pt-3D and does not participate in BA optimization, otherwise, the 3D point participates in BA optimization;
when the percentage of fixed-pt-3d in the SFM models is larger than a preset threshold, the pose of the SFM models is fixed-position and does not participate in BA optimization, otherwise, the pose of the SFM models participates in BA optimization.
4. The method according to claim 1, wherein the obtaining the relative pose similarity transformation between the first model and the second model according to the first position of the M images in the first coordinate system and the second position of the M images in the second coordinate system comprises:
calculating similarity transformation according to a first position of the M images in the first coordinate system and a second position of the M images in the second coordinate system, and transforming 3D point clouds observed by the M images in the first model by using the similarity transformation respectively to obtain the 3D point clouds of the first model in the second coordinate system;
projectively transforming the 3D point cloud into a second camera group model of the M images in a second coordinate system respectively for verification to obtain an interior point image of which the verification result accords with a preset pixel error threshold;
and under the condition that the number of the interior point images is the same, selecting the similar transformation with the minimum average pixel error as the relative pose similar transformation, otherwise, selecting the similar transformation with the maximum number of the interior point images as the relative pose similar transformation.
5. The method of claim 1, wherein the optimizing the maximum spanning tree by a pose graph optimization algorithm of similarity transformation to obtain a plurality of SFM model poses of a unified coordinate system comprises:
and performing similar transformation on a plurality of SFM models on other nodes by using the coordinate system of the root node as a reference coordinate system, unifying the coordinate systems of the plurality of SFM models, and obtaining the poses of the plurality of SFM models.
6. A system for visual three-dimensional reconstruction, the system comprising:
the acquisition module is used for acquiring a topological relation graph of an image cluster and determining a root node in the image cluster through a graph centrality algorithm, wherein the topological relation graph comprises nodes and the connection relation of the nodes on a physical space;
a calculation module, configured to determine a reconstruction sequence of the image cluster through a breadth-first search algorithm based on the root node, and perform local optimization reconstruction on the image cluster to obtain multiple SFM models, where the SFM models include a 3D point cloud, a camera group model, an effective pixel point, and a corresponding relationship between the 3D point and the effective pixel point, where the camera group model includes multiple camera models and a relative pose relationship between the camera models,
calculating the plurality of SFM models through a PnP random consistency similarity transformation algorithm based on a camera group to obtain a connection relation in the topological relation graph, wherein the method specifically comprises the following steps: registering a first camera group model of N images in a first model in a first coordinate system to a second model to obtain a second camera group model of M images which are successfully registered in a second coordinate system, wherein the first model and the second model are any two SFM models in the multiple SFM models, the first model is the SFM model in the first coordinate system, the second model is the SFM model in the second coordinate system, N and M are natural numbers, and N is more than or equal to M; under the condition that M is larger than 3, based on a RANSAC algorithm, obtaining relative pose similar transformation between the first model and the second model according to a first position of the M images in the first coordinate system and a second position of the M images in the second coordinate system; calculating to obtain edges in the topological relation graph according to relative pose similar transformation between any two SFM models in the plurality of SFM models, wherein the weight of the edges in the topological relation graph is the number of interior point images corresponding to the relative pose similar transformation; thereby outputting and obtaining a maximum spanning tree, carrying out similarity transformation on the maximum spanning tree, and unifying a coordinate system of a non-root node to the root node coordinate system;
and the optimization module is used for optimizing the maximum spanning tree through a pose graph optimization algorithm of similarity transformation to obtain a plurality of SFM model poses of the optimized unified coordinate system, and carrying out local BA optimization on the plurality of SFM models to obtain a target SFM model.
7. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the method of visual three-dimensional reconstruction according to any one of claims 1 to 5.
8. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of visual three-dimensional reconstruction according to any one of claims 1 to 5 when executed.
CN202110321318.XA 2021-03-25 2021-03-25 Visual three-dimensional reconstruction method, system, electronic device and storage medium Active CN113177999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110321318.XA CN113177999B (en) 2021-03-25 2021-03-25 Visual three-dimensional reconstruction method, system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110321318.XA CN113177999B (en) 2021-03-25 2021-03-25 Visual three-dimensional reconstruction method, system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN113177999A CN113177999A (en) 2021-07-27
CN113177999B true CN113177999B (en) 2022-12-16

Family

ID=76922269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110321318.XA Active CN113177999B (en) 2021-03-25 2021-03-25 Visual three-dimensional reconstruction method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113177999B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569747A (en) * 2021-07-29 2021-10-29 北京金玖银玖数字科技有限公司 Method and system for identifying consistency of circulated articles
CN114637873B (en) * 2022-03-30 2022-12-23 徐州大工电子科技有限公司 Intelligent door and window recommendation method and system based on image similarity

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7184071B2 (en) * 2002-08-23 2007-02-27 University Of Maryland Method of three-dimensional object reconstruction from a video sequence using a generic model
US9466143B1 (en) * 2013-05-03 2016-10-11 Exelis, Inc. Geoaccurate three-dimensional reconstruction via image-based geometry
CN103985154A (en) * 2014-04-25 2014-08-13 北京大学 Three-dimensional model reestablishment method based on global linear method
US10373339B2 (en) * 2015-12-14 2019-08-06 The Government Of The United States Of America, As Represented By The Secretary Of The Navy Hyperspectral scene analysis via structure from motion
US10217225B2 (en) * 2016-06-01 2019-02-26 International Business Machines Corporation Distributed processing for producing three-dimensional reconstructions
KR20180067908A (en) * 2016-12-13 2018-06-21 한국전자통신연구원 Apparatus for restoring 3d-model and method for using the same
US10198858B2 (en) * 2017-03-27 2019-02-05 3Dflow Srl Method for 3D modelling based on structure from motion processing of sparse 2D images
US10515431B2 (en) * 2017-12-12 2019-12-24 Intel Corporation Global optimal path determination utilizing parallel processing
CN108564617B (en) * 2018-03-22 2021-01-29 影石创新科技股份有限公司 Three-dimensional reconstruction method and device for multi-view camera, VR camera and panoramic camera
CN110889901B (en) * 2019-11-19 2023-08-08 北京航空航天大学青岛研究院 Large-scene sparse point cloud BA optimization method based on distributed system

Also Published As

Publication number Publication date
CN113177999A (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN109974693B (en) Unmanned aerial vehicle positioning method and device, computer equipment and storage medium
CN113177999B (en) Visual three-dimensional reconstruction method, system, electronic device and storage medium
CN111598993B (en) Three-dimensional data reconstruction method and device based on multi-view imaging technology
CN110910493B (en) Three-dimensional reconstruction method and device and electronic equipment
CN109614935A (en) Car damage identification method and device, storage medium and electronic equipment
CN111833447A (en) Three-dimensional map construction method, three-dimensional map construction device and terminal equipment
CN113112518B (en) Feature extractor generation method and device based on spliced image and computer equipment
CN113744408B (en) Grid generation method, device and storage medium
CN112017215B (en) Image processing method, device, computer readable storage medium and computer equipment
CN112184768A (en) SFM reconstruction method and device based on laser radar and computer equipment
CN110837861A (en) Image matching method, device, equipment and storage medium
CN113178000B (en) Three-dimensional reconstruction method and device, electronic equipment and computer storage medium
CN112270748A (en) Three-dimensional reconstruction method and device based on image
CN112215940B (en) Construction system and construction method of scene model
CN117274514A (en) Remote sensing image generation method and device based on ground-air visual angle geometric transformation
CN111739102B (en) Method and device for calibrating internal and external parameters of electronic equipment and computer equipment
CN115620250A (en) Road surface element reconstruction method, device, electronic device and storage medium
EP4281936A1 (en) Systems and methods for roof area and slope estimation using a point set
CN112446952A (en) Three-dimensional point cloud normal vector generation method and device, electronic equipment and storage medium
CN109978986B (en) Three-dimensional model reconstruction method and device, storage medium and terminal equipment
CN112257686A (en) Training method and device for human body posture recognition model and storage medium
CN114494612A (en) Method, device and equipment for constructing point cloud map
CN116152635B (en) Unmanned aerial vehicle combined aerial photographing information sharing method based on blockchain
CN113963221B (en) Image clustering method and device, computer equipment and readable storage medium
CN115982399B (en) Image searching method, mobile device, electronic device and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant