CN113436230B

CN113436230B - Incremental translational averaging method, system and equipment

Info

Publication number: CN113436230B
Application number: CN202110992939.0A
Authority: CN
Inventors: 高翔; 李梦晗; 马孝冬; 解则晓
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2021-08-27
Filing date: 2021-08-27
Publication date: 2021-11-19
Anticipated expiration: 2041-08-27
Also published as: CN113436230A

Abstract

The invention belongs to the technical field of structure from motion in three-dimensional reconstruction, and particularly relates to an incremental translational averaging method, system and device, aiming to solve the problems of high complexity, low accuracy and poor robustness of existing translational averaging methods. The method includes: constructing an epipolar geometry graph; constructing a camera quadruple set and selecting an initial camera quadruple as the initial seed view based on local optimization; constructing a third vertex set; forming camera triplets and determining the incremental order of the vertices by a next optimal view selection strategy based on a weighted support set; performing weighted local optimization or weighted global optimization on the currently estimated absolute positions; and performing re-translation averaging on the vertices of all estimated absolute positions after the global optimization. The invention reduces the complexity of translational averaging and improves the accuracy and robustness of absolute position estimation.

Description

Incremental translational averaging method, system and equipment

Technical Field

The invention belongs to the technical field of structure from motion in three-dimensional reconstruction, and particularly relates to an incremental translational averaging method, system and device.

Background

Structure from motion is a key step in image-based three-dimensional reconstruction of large-scale scenes and has developed rapidly in recent years. Its input is the image feature matches, and its output is the absolute camera poses and the scene structure. According to how the camera poses are initialized, structure from motion can be roughly divided into incremental and global methods. The incremental method initializes the camera poses and the scene structure through iterative camera pose estimation and scene structure expansion; to cope with the inevitable feature matching outliers, it also introduces the random sample consensus (RANSAC) algorithm and the bundle adjustment technique in the iterative process. Unlike the incremental method, global structure from motion mainly uses motion averaging to initialize the camera poses, which generally comprises two steps: rotation averaging and translation averaging. Compared with global structure from motion, the incremental method calls RANSAC-based model estimation and bundle-adjustment-based parameter optimization more frequently, so its results are more accurate and robust.

Translational averaging refers to estimating the absolute camera positions given relative translation measurements. The relative translation measurements are typically obtained by estimating and decomposing the essential matrix. Translational averaging is more difficult than rotation averaging for three reasons: 1) the essential matrix contains only the direction of the relative translation, so the relative translation obtained by decomposing it has a scale ambiguity; 2) the accuracy of the relative translation recovered from the essential matrix is more susceptible to false feature matches than that of the relative rotation; 3) translational averaging can uniquely estimate only cameras in the same parallel rigid component, whereas rotation averaging only requires that all cameras be in the same connected component. Although the translational averaging problem has been widely studied, it is still far from solved and, compared with rotation averaging, remains an active research topic.

Existing translational averaging methods mainly focus on three aspects: 1) designing a suitable cost function and optimization scheme; 2) studying filtering or refinement strategies for the epipolar geometry graph; 3) introducing auxiliary information such as feature tracks, camera triplets or rank constraints. Although these methods have achieved good results, they are complex and inefficient because they rely heavily on complicated objective functions and optimizations, elaborate initialization, or additional information; moreover, accuracy and robustness remain key challenges. Inspired by incremental structure from motion, the invention provides an incremental translational averaging method.

Disclosure of Invention

In order to solve the above problems in the prior art, namely the high complexity, low accuracy and poor robustness of existing translational averaging methods, a first aspect of the present invention provides an incremental translational averaging method applied to solving the absolute camera positions in global structure from motion, the method comprising:

step S100, acquiring multiple images, performing feature matching between every pair of images, constructing an epipolar geometry graph according to the epipolar geometric relationships, and then calculating the relative rotation and relative translation between the matched image pairs; wherein the vertex set of the graph represents the set of cameras that captured the scene images, and the edge set represents the set of epipolar geometry edges between pairs of cameras that captured different images and contains the motion information between those cameras;

step S200, selecting, in the epipolar geometry graph, the camera quadruples formed by a set number of edges with the largest numbers of feature matches, constructing a quadruple set from them, and calculating the absolute position of each camera of each camera quadruple in the quadruple set in a local coordinate system; calculating the selection cost of each camera quadruple in the quadruple set from the absolute positions of its cameras, and taking the view corresponding to the camera quadruple with the maximum selection cost as the initial seed view;

step S300, constructing a set of vertices with estimated absolute positions based on the vertices corresponding to the initial seed view, and taking it as a first vertex set; taking the set of vertices in the epipolar geometry graph whose absolute positions have not been estimated as a second vertex set; selecting, from the second vertex set, a set number of vertices with the largest numbers of edges connected to the vertices in the first vertex set, and constructing a third vertex set from them;

step S400, forming a camera triple by each vertex in the third vertex set and the vertex in the first vertex set, and calculating the absolute position of each vertex in the third vertex set by a linear trifocal tensor solution; calculating the selection cost of each vertex according to the obtained absolute position, and taking the view corresponding to the vertex with the maximum selection cost as the next optimal view;

step S500, fixing the estimated absolute position of the camera in the first vertex set, and only performing weighted local optimization on the absolute position of the vertex corresponding to the newly estimated next optimal view; after the weighted local optimization is completed, judging the growth ratio of the number of vertexes of the current estimated absolute position, and if the ratio is greater than a set threshold, performing weighted global optimization on all the current estimated absolute positions;

the weighted local optimization is: for the absolute position of the vertex corresponding to the selected next optimal view, calculating the relative position error of each edge in a first edge set as a first error, by combining the absolute positions in the first vertex set with the relative position measured between the two vertices joined by that edge; if the first error is less than the set error threshold, taking the corresponding edge as an inlier edge; and performing a norm-based weighted local optimization of the absolute position of the vertex corresponding to the selected next optimal view over the inlier edges; the first edge set is the set of epipolar geometry edges between the first vertex set and the vertex corresponding to the selected next optimal view;

the weighted global optimization is: calculating the relative position error of each epipolar geometry edge as a second error, by combining the absolute positions of all vertices with the relative positions measured between pairs of vertices; if the second error is less than the set error threshold, taking the corresponding edge as an inlier edge; and performing a norm-based weighted global optimization of the absolute positions of all vertices with estimated absolute positions over the inlier edges;

after the weighted global optimization is completed, further performing re-translation averaging on all vertices with estimated absolute positions by the weighted global optimization method;

and step S600, after the estimation of all the absolute positions is finished, outputting the absolute positions obtained by performing weighted global optimization and retranslation averaging on all the estimable vertexes as the final estimation result of the absolute position of each camera.
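As an illustration of the graph built in step S100, the following Python sketch stores cameras as vertices and pairwise epipolar geometries as edges. The class and field names (`EpipolarGraph`, `EpipolarEdge`, `num_matches`) are hypothetical and not taken from the invention; the relative translation is stored as a unit vector because only its direction can be recovered.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class EpipolarEdge:
    """Hypothetical container for the motion information carried by one edge."""
    R_ij: np.ndarray      # relative rotation (3x3)
    t_ij: np.ndarray      # relative translation direction (unit 3-vector)
    num_matches: int      # number of feature matches supporting the edge


@dataclass
class EpipolarGraph:
    vertices: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)   # (i, j) -> EpipolarEdge

    def add_edge(self, i, j, R_ij, t_ij, num_matches):
        t = np.asarray(t_ij, dtype=float)
        self.vertices.update((i, j))
        # Normalize the translation: its length is not observable.
        self.edges[(i, j)] = EpipolarEdge(
            np.asarray(R_ij, dtype=float), t / np.linalg.norm(t), num_matches)


graph = EpipolarGraph()
graph.add_edge(0, 1, np.eye(3), [2.0, 0.0, 0.0], num_matches=350)
graph.add_edge(1, 2, np.eye(3), [0.0, 3.0, 0.0], num_matches=120)
```

Keeping the match count on each edge is a design choice of this sketch; it makes the later ranking of edges by number of feature matches a simple sort.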

In some preferred embodiments, the absolute position of each camera of each camera quadruple in the quadruple set, in the local coordinate system, is calculated by an optimization in which: the unknowns are the optimized absolute positions of the cameras in the local coordinate system; the error measure is the squared chordal distance; the measurement on each edge is the relative translation transformed into the global coordinate system; the edge is any edge of the camera quadruple; the quadruple is any camera quadruple of the quadruple set; and the optimization starts from the initial absolute positions of the cameras.

In some preferred embodiments, the selection cost of each camera quadruple in the quadruple set is calculated from the weights corresponding to the epipolar geometry edges in the camera quadruple.

In some preferred embodiments, the selection cost of each vertex in next optimal view selection is calculated from the obtained absolute position using the following quantities: the set of epipolar geometry edges between the vertex in the third vertex set and the vertices in the first vertex set, and any edge of that set; the relative position between the two vertices joined by each such edge; the current absolute position estimates of the vertices in the first vertex set; and the weight corresponding to each epipolar geometry edge.

In some preferred embodiments, the inlier edges in the weighted local optimization are obtained as follows: from the set of edges connecting the first vertex set to the vertex corresponding to the selected next optimal view, every edge whose relative position error is smaller than the set error threshold is placed in the inlier edge set. The relative position error of an edge is computed from the relative position between the two cameras joined by the edge, the current absolute position estimate of the vertex in the first vertex set, and the initialized absolute position of the vertex corresponding to the selected next optimal view.

In some preferred embodiments, the norm-based weighted local optimization of the absolute position of the vertex corresponding to the selected next optimal view is performed over the inlier edge set: for any inlier edge, the optimization uses the relative position between the two cameras joined by that edge, and its result is the absolute position after weighted local optimization.

In some preferred embodiments, the inlier edges in the weighted global optimization are obtained as follows: from the set of edges between all vertices with estimated absolute positions, every edge whose relative position error is smaller than the set error threshold is placed in the inlier edge set used for weighted global optimization. The relative position error of an edge is computed from the relative position between the two cameras joined by the edge and the current absolute position estimates of the vertices.

In some preferred embodiments, the norm-based weighted global optimization of the absolute positions of all vertices with estimated absolute positions is performed over the inlier edge set: for any inlier edge, the optimization uses the relative position between the two cameras joined by that edge, and its result is the set of absolute positions after weighted global optimization.

In a second aspect of the present invention, an incremental translational averaging system is provided, the system including: an epipolar geometry graph construction module, an initial seed view selection module, a set construction module, a next optimal view selection module, an optimization module and an absolute position estimation output module;

the epipolar geometry graph construction module is configured to acquire multiple images, perform feature matching between every pair of images, and construct an epipolar geometry graph according to the epipolar geometric relationships;

the initial seed view selection module is configured to select, in the epipolar geometry graph, the camera quadruples formed by a set number of edges with the largest numbers of feature matches, and to select the initial seed view from them as in step S200;

the set construction module is configured to construct a set of vertices with estimated absolute positions based on the vertices corresponding to the initial seed view, as a first vertex set, and to take the set of vertices in the epipolar geometry graph whose absolute positions have not been estimated as a second vertex set; and to select, from the second vertex set, a set number of vertices with the largest numbers of edges connected to the vertices in the first vertex set, and construct a third vertex set from them;

the next optimal view selecting module is configured to combine each vertex in the third vertex set and a vertex in the first vertex set into a camera triple, and calculate an absolute position of each vertex in the third vertex set by a linear trifocal tensor solution; calculating the selection cost of each vertex according to the obtained absolute position, and taking the view corresponding to the vertex with the maximum selection cost as the next optimal view;

the optimization module is configured to fix the estimated absolute positions of the cameras in the first vertex set, and perform weighted local optimization only on the absolute position of the vertex corresponding to the newly estimated next optimal view; after the weighted local optimization is completed, to judge the growth ratio of the number of vertices with currently estimated absolute positions, and if the ratio is greater than a set threshold, to perform weighted global optimization on all currently estimated absolute positions;

and the absolute position estimation output module is configured to output the absolute positions obtained by performing weighted global optimization and retranslation averaging on all the estimable vertexes after finishing estimation of all the absolute positions, and the absolute positions are used as final estimation results of the absolute positions of all the cameras.

In a third aspect of the invention, an electronic device is proposed, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the processor, and the instructions are executed by the processor to implement the incremental translational averaging method as recited in the claims.

The invention has the beneficial effects that:

the invention reduces the complexity of the translational averaging method and improves the accuracy and robustness of absolute position estimation.

1) The method selects and constructs the seed view using an initial quadruple selection strategy based on local optimization; determines the incremental order of the vertices using a next optimal view selection strategy based on a weighted support set; performs weighted local or global optimization after the next optimal view has been selected and initialized; and performs one re-translation averaging step after each weighted global optimization. Together these measures improve the accuracy and robustness of absolute position estimation.

2) Due to the effectiveness of the incremental parameter estimation method, the translational averaging method provided by the invention is less dependent on the common robust operation in other methods, and a simpler and more efficient way is provided for the implementation of translational averaging.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.

FIG. 1 is a schematic flowchart of an incremental translational averaging method according to an embodiment of the present invention;

FIG. 2 is a block diagram of an incremental translational averaging system according to an embodiment of the present invention;

FIG. 3 is a detailed flowchart of an incremental translational averaging method according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

The invention discloses an incremental translational averaging method, applied to solving the absolute camera positions in global structure from motion, which comprises steps S100 to S600 as set out above.

For a clearer explanation of the incremental translational averaging method of the present invention, the following will discuss the steps in one embodiment of the method of the present invention with reference to fig. 1 and 3.

In the present embodiment, the epipolar geometry graph used for translational averaging consists of a vertex set and an edge set. The input to the invention is the relative translation between each matched image pair, transformed into the global coordinate system by means of the known absolute rotations: for any edge of the edge set, the relative translation of that edge is converted into its representation in the global coordinate system using the absolute rotation, in the global coordinate system, of a vertex of that edge. The output of the invention is the optimized absolute position of each camera, one for every vertex of the vertex set.

FIG. 1 is a flow chart of the method of the present invention, which mainly includes the following parts: 1) selecting and constructing a seed view by an initial quadruple selection strategy based on local optimization; 2) determining the incremental order of the vertices by a next optimal view selection strategy based on a weighted support set; 3) performing weighted local or global optimization after the next optimal view has been selected and initialized; 4) performing one re-translation averaging step after each weighted global optimization, in order to make the absolute position estimates more accurate and robust. The steps are described in detail below.
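The conversion of a relative translation into the global coordinate system described above can be sketched in Python as follows. The sketch assumes a common convention that is not confirmed by this text: the absolute rotation `R_j` maps world coordinates to the coordinates of camera j, and the relative translation is proportional to `R_j (c_i - c_j)`, so that `R_j^T t_ij` points from camera j to camera i in the global frame. The function name is hypothetical.

```python
import numpy as np


def to_global_direction(R_j, t_ij):
    """Rotate a relative translation into the global frame (assumed convention)."""
    d = np.asarray(R_j, dtype=float).T @ np.asarray(t_ij, dtype=float)
    return d / np.linalg.norm(d)   # keep only the direction: the scale is unknown


# Synthetic check: camera j is rotated 90 degrees about the z axis.
R_j = np.array([[0.0, -1.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0]])
c_i = np.array([1.0, 2.0, 0.5])
c_j = np.array([-1.0, 0.0, 0.5])
t_ij = 0.3 * R_j @ (c_i - c_j)     # arbitrary scale, as after essential matrix decomposition
direction = to_global_direction(R_j, t_ij)
```

Under a different rotation or sign convention the transpose or the sign would change, so the convention should be checked against the pose definition actually in use.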

The selection of the initial seed view is a key step both in incremental structure from motion and in the method of the present invention. The most intuitive approach is to choose the camera pair or camera triplet with the smallest rotation cycle deviation as the initial seed view according to:

（1）

（2）

where the deviation is measured by the angular distance between rotations, and the triplets are taken from the set of camera triplets formed by all epipolar geometry edges.
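The rotation cycle deviation of a camera triplet mentioned above can be computed as in the following Python sketch. It assumes the convention `R_ij = R_j R_i^T`, under which the three relative rotations of a consistent triplet compose to the identity; the function names are hypothetical and the sketch is not a transcription of formulas (1) and (2).

```python
import numpy as np


def rot_z(angle):
    """Rotation about the z axis, used only to build a synthetic example."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rotation_angle_deg(R):
    """Angular distance between rotation R and the identity, in degrees."""
    cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos))


def triplet_cycle_deviation_deg(R_ij, R_jk, R_ki):
    """Deviation of the composed cycle i -> j -> k -> i from the identity."""
    return rotation_angle_deg(R_ki @ R_jk @ R_ij)


R_i, R_j, R_k = rot_z(0.1), rot_z(0.5), rot_z(-0.3)
R_ij, R_jk, R_ki = R_j @ R_i.T, R_k @ R_j.T, R_i @ R_k.T
consistent = triplet_cycle_deviation_deg(R_ij, R_jk, R_ki)
perturbed = triplet_cycle_deviation_deg(R_ij, R_jk, rot_z(0.05) @ R_ki)
```

A consistent triplet gives a deviation close to zero, while the perturbed one returns the size of the injected error, which is why this quantity is a natural ranking criterion for rotations.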

However, for the same edge, an accurate relative rotation does not guarantee an accurate relative translation. Therefore, the choice of the initial seed view in translational averaging should depend on the cycle deviation of the positions themselves rather than on that of the rotations. In addition, because the magnitude of the relative translation is lost, the camera triplet is the minimal configuration for computing camera positions, so an additional camera is required in order to evaluate the quality of the initial seed view selection and recovery. Finally, for the sake of robustness, the present embodiment selects and constructs the initial seed view using a camera quadruple instead of a camera pair or a camera triplet, as described below:

Before the selection, it should be noted that the epipolar geometry graph usually contains a large number of quadruples, especially when the number of vertices is large. To balance the effectiveness and efficiency of the quadruple selection process, the invention only considers the camera quadruples formed by the edges with the largest numbers of feature matches (the number of such edges is a natural number, preferably set to 100 in the invention), and records them as the selected quadruple set, each element of which is a camera quadruple.
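The candidate generation described above can be sketched in Python. The sketch assumes that a camera quadruple is a group of four cameras that are pairwise connected by the retained edges, which the text does not state explicitly; `edge_matches` and `candidate_quadruples` are hypothetical names.

```python
from itertools import combinations


def candidate_quadruples(edge_matches, top_k=100):
    """edge_matches maps an edge (i, j) to its number of feature matches."""
    # Keep only the top_k edges with the most feature matches.
    top = sorted(edge_matches, key=edge_matches.get, reverse=True)[:top_k]
    adj = {}
    for a, b in top:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    quads = set()
    for a, b in top:
        # Two further cameras adjacent to both a and b, and to each other,
        # complete a fully connected group of four.
        common = adj[a] & adj[b]
        for c, d in combinations(sorted(common), 2):
            if d in adj[c]:
                quads.add(tuple(sorted((a, b, c, d))))
    return sorted(quads)


edge_matches = {(0, 1): 90, (0, 2): 80, (0, 3): 70, (1, 2): 60,
                (1, 3): 50, (2, 3): 40, (3, 4): 100, (0, 4): 10}
```

Restricting the search to the retained edges keeps the enumeration cheap even when the full graph contains many quadruples.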

For each quadruple in the quadruple set, the absolute position of each camera of the quadruple in the local coordinate system can be obtained by:

（3）

where the unknowns are the optimized absolute positions of the cameras in the local coordinate system, the error measure is the squared chordal distance, the edge index denotes any edge of the camera quadruple, the quadruple is any camera quadruple of the quadruple set, and the optimization starts from the initial absolute positions of the cameras.

To initialize the set of absolute positions for this optimization, the positions of two of the cameras are first assigned initial values, and the positions of the other two cameras are then initialized by applying the linear trifocal tensor solver, with the known absolute rotations, to the two corresponding camera triplets. After the above optimization, a selection cost is calculated for each quadruple in the quadruple set by:

（4）

where the formula gives the selection cost of the camera quadruple from the weights corresponding to the epipolar geometry edges in the quadruple.
Finally, the selected initial quadruple (or initial seed view) can be obtained by:

（5）

where the result gives the indices of the four vertex cameras of the initial quadruple, together with their four corresponding absolute positions.


Next optimal view selection is another key step in incremental structure from motion and also requires careful consideration. A simple approach is to select the camera with the largest number of edges connected to cameras whose absolute positions have already been estimated, and to use the corresponding view as the next optimal view. However, different edges in the edge set have different relative translation measurement errors, so these edges should not be treated equally in the selection process. Therefore, to improve the robustness of translational averaging, the invention designs a next optimal view selection strategy based on a weighted support set, described as follows:

In this embodiment, the set of vertices whose absolute positions have currently been estimated is taken as the first vertex set, and the set of vertices whose absolute positions have not yet been estimated is taken as the second vertex set; that is, the second vertex set consists of the vertices of the graph that are not in the first vertex set. The purpose of next optimal view selection is to select, from the second vertex set, a vertex that makes the incremental absolute position calculation more robust. To improve the efficiency of the selection, only the vertices of the second vertex set with the largest numbers of edges connected to the vertices of the first vertex set are considered (their number is a natural number, preferably set to 10 in the invention); this set of vertices is recorded as the third vertex set.
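The construction of the third vertex set can be illustrated by the following Python sketch, which ranks the vertices without an estimated position by the number of edges linking them to vertices that already have one. Function and argument names are hypothetical, and breaking ties by vertex index is an assumption of the sketch.

```python
def candidate_vertices(edges, estimated, top_k=10):
    """Return up to top_k unestimated vertices with the most edges to estimated ones."""
    counts = {}
    for a, b in edges:
        # Count only edges joining one estimated and one unestimated vertex.
        if a in estimated and b not in estimated:
            counts[b] = counts.get(b, 0) + 1
        elif b in estimated and a not in estimated:
            counts[a] = counts.get(a, 0) + 1
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    return ranked[:top_k]


edges = [(0, 4), (1, 4), (2, 4), (0, 5), (1, 5), (3, 6), (6, 7)]
estimated = {0, 1, 2, 3}
third_set = candidate_vertices(edges, estimated, top_k=2)
```

Vertex 7 never appears as a candidate in this example because its only edge leads to another unestimated vertex.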

In the present embodiment, each vertex in the third vertex set can form multiple camera triplets with the vertices in the first vertex set. Camera triplets are used here to eliminate the scale ambiguity in absolute position estimation: each triplet yields one estimate of the absolute camera position of the candidate vertex. Given the estimated absolute positions of the two first-set vertices of a triplet and the three measured relative positions of the triplet, this estimate can be calculated by the linear trifocal tensor solver. Ideally, all absolute positions computed for the same vertex would be equal, but in practice they differ because of absolute position estimation errors and relative position measurement errors. Therefore, to select the next optimal view, the selection cost of each absolute position in the set is calculated according to the following formula:

（6）

where the edge is any edge of the set of epipolar geometry edges between the vertex in the third vertex set and the vertices in the first vertex set, the measurement on that edge is the relative position between the two vertices it joins, the positions of the vertices in the first vertex set are their current absolute position estimates, and each epipolar geometry edge has a corresponding weight.

Subsequently, a representative absolute position is selected from the set of computed absolute positions using:

（7）

where the quantity defined is the number of representative absolute positions in the set. Finally, the selected next optimal view can be obtained by:

（8）

where the result is the index of the vertex corresponding to the selected next optimal view, whose absolute position is initialized to the corresponding selected absolute position. Since the next optimal view selection strategy proposed by the invention is based on a support set weighted by the recomputed position deviations, it handles relative translation outliers more robustly.
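To illustrate how one absolute position can be obtained from a camera triplet without scale ambiguity, the following Python sketch intersects, in the least-squares sense, the two rays that leave the two already estimated cameras along the measured global directions towards the new camera. This is a simple stand-in chosen for illustration, not the linear trifocal tensor solver named in the text; the function name and the optional weights are assumptions.

```python
import numpy as np


def intersect_rays(origins, directions, weights=None):
    """Least-squares point closest to a set of rays (origin + s * direction)."""
    if weights is None:
        weights = np.ones(len(origins))
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d, w in zip(origins, directions, weights):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector onto the plane orthogonal to the ray
        A += w * P
        b += w * (P @ np.asarray(o, dtype=float))
    return np.linalg.solve(A, b)         # requires at least two non-parallel rays


c_true = np.array([1.0, 1.0, 0.5])                      # position of the new camera
origins = [np.array([0.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0])]
directions = [c_true - o for o in origins]              # noise-free direction measurements
c_est = intersect_rays(origins, directions)
```

Because the two known positions fix the baseline, the intersection point has a definite scale even though each direction measurement alone does not.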


In this embodiment, once the next optimal view has been selected, the absolute position of the selected vertex has been initialized. To further improve the accuracy of absolute position estimation, the invention performs local or global optimization on the currently estimated absolute positions. Local optimization optimizes only the most recently estimated absolute position while keeping the other absolute positions fixed; global optimization optimizes all estimated absolute positions in the first vertex set simultaneously. For efficiency, local and global optimization are alternated, and global optimization is performed only when the number of currently estimated absolute positions has grown by a certain ratio (preferably set to 50% in the invention). As with the selection of the initial quadruple and of the next optimal view, the local and global optimization also use weighting. In addition, to deal with drift in the incremental estimation scheme, the invention performs re-translation averaging on the local epipolar geometry graph after each weighted global optimization. The weighted local and global optimization and the re-translation averaging are described below.
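The alternation between local and global optimization can be sketched as follows. The sketch assumes that the growth ratio is measured relative to the number of estimated positions at the previous global optimization, which the text does not spell out; the function name is hypothetical and the default of 0.5 mirrors the preferred 50%.

```python
def global_optimization_steps(num_seed, num_total, growth_ratio=0.5):
    """Return the vertex counts at which global optimization would be triggered."""
    triggers = []
    last_global = num_seed            # the seed quadruple is the first baseline
    for num_estimated in range(num_seed + 1, num_total + 1):
        # ...weighted local optimization of the newly added vertex goes here...
        growth = (num_estimated - last_global) / last_global
        if growth > growth_ratio:
            # ...weighted global optimization and re-translation averaging go here...
            triggers.append(num_estimated)
            last_global = num_estimated
    return triggers


steps = global_optimization_steps(num_seed=4, num_total=20)
```

With a relative trigger of this kind the global optimizations become progressively sparser as the reconstruction grows, which is what keeps the overall cost manageable.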

For the weighted local optimization, following the selection and initialization of the next optimal view, the inlier edge set (the set of inner-value edges) is first obtained by the following formula:

（9）

where

denotes, for the first vertex set

and the vertex corresponding to the selected next optimal view

the set of edges connecting them;

is, in the set

any one edge;

denotes, within

the set of inlier edges;

denotes, for edge

the relative position between the two cameras it connects;

denotes, for vertex

the current estimate of its absolute position;

denotes, for

its initialized absolute position; and

denotes the error threshold (that is, the threshold on the angle between two positions), which in the experiments of the invention is set to

.
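The inlier test described for formula (9) can be illustrated with a minimal Python sketch. Because the formula itself is not reproduced in this text, the sketch assumes that an edge counts as an inlier when the angle between its measured relative position and the direction from the estimated neighbouring position to the initialized position of the new vertex is below the threshold. The names `angle_between` and `select_inlier_edges`, the direction convention and the 10-degree threshold in the usage note are hypothetical and chosen only for illustration.

```python
import math


def angle_between(u, v):
    """Angle in degrees between two 3D vectors.

    Returns 180.0 when either vector has zero length, so that a degenerate
    pair is always treated as an outlier (an assumption of this sketch).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 180.0
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def select_inlier_edges(edges, positions, new_position, threshold_deg):
    """Keep the edges whose measured direction agrees with the current estimates.

    edges: list of (i, t) pairs, where i is an already estimated vertex and t is
           the measured relative position from i to the new vertex
           (ASSUMED convention, not taken from the source).
    positions: dict mapping an estimated vertex to its absolute position.
    new_position: initialized absolute position of the new vertex.
    """
    inliers = []
    for i, t in edges:
        predicted = [b - a for a, b in zip(positions[i], new_position)]
        if angle_between(t, predicted) < threshold_deg:
            inliers.append((i, t))
    return inliers
```

For example, `select_inlier_edges(edges, positions, new_position, 10.0)` returns only the edges whose angular error is below the illustrative 10-degree threshold; the actual threshold used in the experiments is not recoverable from this text.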

Then, for vertex

its absolute position

is refined by weighted local optimization, using the following formula:

（10）

where

denotes, for the absolute position

the result of the weighted local optimization;

denotes, in the set

any one edge; and

denotes, for edge

the relative position between the two cameras it connects.

For the weighted global optimization, as in the weighted local optimization, it is first necessary to extract, from the set of edges among all currently estimated absolute positions

the inlier edge set

by the following formula:

（11）

where

denotes, in the set

any one edge;

denotes, for edge

the relative position between the two cameras it connects; and

denotes, for vertex

the current estimate of its absolute position.

Then, on the set

weighted global optimization is performed, using the following formula:

（12）

where

denotes, for the set of absolute positions

the result of the weighted global optimization;

is, in the set

any one edge; and

denotes, for edge

the relative position between the two cameras it connects.

For the re-translational averaging that follows the weighted global optimization, the absolute position set obtained by the weighted global optimization

is used, together with the formulas of the weighted global optimization, to solve the inlier edge set again and to optimize the currently estimated absolute positions once more, further improving the accuracy and robustness of the method.
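The alternation between local and global optimization described above can be sketched as a simple schedule. This is only an illustration under stated assumptions: the function name `run_incremental_schedule` and the callbacks `local_opt`, `global_opt` and `retranslate` are hypothetical placeholders, and the growth ratio is assumed to be measured relative to the number of estimated positions at the previous global optimization.

```python
def run_incremental_schedule(initial_count, total_count, growth_ratio=0.5,
                             local_opt=None, global_opt=None, retranslate=None):
    """Return the vertex counts at which global optimization was triggered.

    initial_count: number of cameras in the seed view (already estimated).
    total_count:   total number of cameras to be estimated.
    growth_ratio:  required relative growth since the last global optimization
                   (50% is the preferred value stated in the text).
    """
    estimated = initial_count
    last_global = initial_count
    triggers = []
    while estimated < total_count:
        # One more vertex is selected as the next optimal view and initialized.
        estimated += 1
        # Only the newest position is refined; all others stay fixed.
        if local_opt is not None:
            local_opt(estimated)
        # Global optimization runs only once the growth exceeds the ratio.
        if estimated - last_global > growth_ratio * last_global:
            if global_opt is not None:
                global_opt(estimated)
            # Re-translational averaging follows every global optimization.
            if retranslate is not None:
                retranslate(estimated)
            last_global = estimated
            triggers.append(estimated)
    return triggers
```

With a four-camera seed and 20 cameras in total, the sketch triggers global optimization at 7, 11 and 17 estimated cameras. The final global optimization and re-translational averaging performed once all positions are estimated is left out of the sketch.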

And acquiring the three-dimensional coordinates of the point cloud under the global coordinate system according to the optimized absolute position of the camera to obtain a sparse reconstruction result. On the basis, a final three-dimensional model can be generated through the steps of dense reconstruction, point cloud modeling and the like.

In addition, to verify the effect of the present invention, we performed test experiments on the 1DSfM dataset, whose information is listed in Table 1. ALM (Alamo), ELS (Ellis Island), MDR (Madrid Metropolis), MND (Montreal Notre Dame), NYC (NYC Library), PDP (Piazza del Popolo), PIC (Piccadilly), ROF (Roman Forum), TOL (Tower of London), USQ (Union Square), VNC (Vienna Cathedral) and YKM (Yorkminster) denote the individual scenes, collectively referred to as the 1DSfM dataset; see K. Wilson and N. Snavely, "Robust global translations with 1DSfM," in European Conference on Computer Vision (ECCV), pages 61–75, 2014. In the experiments, the camera positions calibrated by Bundler are used as the ground truth of the absolute camera positions, and the median error of the absolute position estimates is used as the evaluation metric.
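The evaluation metric can be illustrated with a short Python sketch. It assumes that the estimated and ground-truth positions are already expressed in the same coordinate frame and scale (the alignment step is not described in this text), and the function name `median_position_error` is hypothetical.

```python
import math
import statistics


def median_position_error(estimated, ground_truth):
    """Median Euclidean distance between estimated and reference camera positions.

    estimated, ground_truth: dicts mapping a camera identifier to an (x, y, z)
    position. Only cameras present in both dicts are evaluated.
    """
    errors = [
        math.dist(estimated[cam], ground_truth[cam])
        for cam in estimated
        if cam in ground_truth
    ]
    if not errors:
        raise ValueError("no camera is shared by the two position sets")
    return statistics.median(errors)
```

The median is used rather than the mean because it is less sensitive to a few badly estimated cameras.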

TABLE 1

To verify the effectiveness of the key techniques proposed by the invention, several ablation experiments were carried out: without initial quadruple selection based on local optimization (ablation one), without next optimal view selection based on a weighted support set (ablation two), without weighting (ablation three), without re-translational averaging (ablation four), without weighted global optimization (ablation five), and without weighted local optimization (ablation six). The six conditions are briefly described as follows:

1) without initial quadruple selection based on local optimization, the initial seed view is chosen as the camera triplet with the smallest rotation cycle deviation;

2) without next optimal view selection based on the weighted support set, the next optimal view is chosen as the camera having the largest number of edges connected to the cameras whose absolute positions are currently estimated;

3) without weighting, all relative translation measurements are treated equally;

4) without re-translational averaging, no re-translational averaging is performed after each weighted global optimization;

5) without weighted global optimization, neither weighted global optimization nor re-translational averaging is performed during the incremental absolute position computation;

6) without weighted local optimization, no optimization is performed during the incremental absolute position computation, and each absolute position is kept at the initial value assigned after the next optimal view is selected.

The results of the ablation experiments are shown in table 2, from which it can be seen that: for most test data, the translational averaging estimation errors in all ablation experiments are increased, which shows that the key technologies proposed in the invention are effective in improving the accuracy and robustness of the method.

TABLE 2

In comparative experiments, we compared the method of the present invention with five other methods, corresponding documents of which are:

[1] Z. Cui and P. Tan. Global structure-from-motion by similarity averaging. In IEEE International Conference on Computer Vision (ICCV), pages 864–872, 2015.

[2] C. Sweeney, T. Sattler, T. Höllerer, M. Turk, and M. Pollefeys. Optimizing the viewing graph for structure-from-motion. In IEEE International Conference on Computer Vision (ICCV), pages 801–809, 2015.

[3] T. Goldstein, P. Hand, C. Lee, V. Voroninski, and S. Soatto. ShapeFit and ShapeKick for robust, scalable structure from motion. In European Conference on Computer Vision (ECCV), pages 289–304, 2016.

[4] B. Zhuang, L. Cheong, and G. H. Lee. Baseline desensitizing in translation averaging. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4539–4547, 2018.

[5] Y. Kasten, A. Geifman, M. Galun, and R. Basri. Algebraic characterization of essential matrices and their averaging in multiview settings. In IEEE/CVF International Conference on Computer Vision (ICCV), pages 5894–5902, 2019.

The results of the comparative experiments are shown in Table 3, from which it can be seen that, among all the compared methods, the translational averaging method proposed by the invention achieves the best overall performance in terms of accuracy and robustness. In terms of accuracy, only the first and fourth comparison methods yield results close to those of the invention, but these methods require additional information or initialization operations, such as feature tracks, local bundle adjustment, or more accurate initial values, and their solving process is more complicated.

TABLE 3

An incremental translational averaging system according to a second embodiment of the present invention, as shown in fig. 2, comprises: an epipolar geometry graph construction module 100, an initial seed view selection module 200, a set construction module 300, a next optimal view selection module 400, an optimization module 500 and an absolute position estimation output module 600;

the epipolar geometry graph construction module 100 is configured to acquire a plurality of images, perform feature matching between every pair of images, and construct the epipolar geometry graph according to the epipolar geometry relationship

the set of epipolar geometry edges between pairs of cameras that capture different images, each edge containing the motion information between the cameras;

the initial seed view selection module 200 is configured to select the top with the largest number of feature matches in the epipolar geometry

the set construction module 300 is configured to construct, based on the vertex corresponding to the initial seed view, the set of vertices with estimated absolute positions as a first vertex set, and to take the set of vertices in the epipolar geometry graph whose absolute positions have not been estimated as a second vertex set; and to select, in the second vertex set, the top

vertices having the largest number of edges connecting them to the vertices of the first vertex set, from which a third vertex set is constructed;

the next optimal view selecting module 400 is configured to combine each vertex in the third vertex set and a vertex in the first vertex set into a camera triplet, and calculate an absolute position of each vertex in the third vertex set by a linear trifocal tensor solution; calculating the selection cost of each vertex according to the obtained absolute position, and taking the view corresponding to the vertex with the maximum selection cost as the next optimal view;

the optimization module 500 is configured to fix the estimated absolute position of the camera in the first vertex set, and perform weighted local optimization only on the absolute position of the vertex corresponding to the most recently estimated next optimal view; after the weighted local optimization is completed, judging the growth ratio of the number of vertexes of the current estimated absolute position, and if the ratio is greater than a set threshold, performing weighted global optimization on all the current estimated absolute positions;

the absolute position estimation output module 600 is configured to output the absolute positions obtained by performing weighted global optimization and retranslation averaging on all the estimable vertices after completing estimation of all the absolute positions, as the final estimation result of the absolute positions of each camera.
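The candidate selection performed by the set construction module 300 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_candidate_set`, the parameter `k` (standing in for the number of candidates whose symbol is lost in this text) and the tie-breaking by vertex identifier are assumptions.

```python
from collections import Counter


def build_candidate_set(edges, estimated, k):
    """Pick the k un-estimated vertices most strongly connected to the estimated ones.

    edges:     iterable of (u, v) vertex pairs of the epipolar geometry graph.
    estimated: set of vertices whose absolute positions are already estimated
               (the first vertex set); every other vertex belongs to the
               second vertex set.
    k:         number of candidates to keep (hypothetical parameter name).
    """
    counts = Counter()
    for u, v in edges:
        # Count only edges that link an estimated vertex to an un-estimated one.
        if u in estimated and v not in estimated:
            counts[v] += 1
        elif v in estimated and u not in estimated:
            counts[u] += 1
    # Sort by descending edge count; ties are broken by vertex id (assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [vertex for vertex, _ in ranked[:k]]
```

Vertices with no edge to the first vertex set never enter the candidate list, since their positions could not be initialized from the estimated cameras.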

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

It should be noted that, the incremental translational averaging system provided in the foregoing embodiment is only illustrated by dividing the functional modules, and in practical applications, the above function allocation may be completed by different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.

An electronic device according to a third embodiment of the present invention includes at least one processor; and a memory communicatively coupled to at least one of the processors; wherein the memory stores instructions executable by the processor for execution by the processor to implement the incremental translational averaging method as recited in the claims.

A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the incremental translational averaging method recited in the claims above.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the above-described apparatuses and computer-readable storage media may refer to the corresponding processes in the foregoing method examples, and are not described herein again.

Referring now to FIG. 4, there is illustrated a block diagram of a computer system suitable for use as a server in implementing embodiments of the method, system, and apparatus of the present application. The server shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 4, the computer system includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for system operation are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An Input/Output (I/O) interface 405 is also connected to the bus 404.

The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is installed into the storage section 408 as necessary.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 401. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.

The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. An incremental translational averaging method, applied to the global solution of absolute camera positions in structure from motion, characterized in that the method comprises the following steps:

step S300, constructing, based on the vertex corresponding to the initial seed view, the set of vertices with estimated absolute positions as a first vertex set, and taking the set of vertices in the epipolar geometry graph whose absolute positions have not been estimated as a second vertex set; selecting, in the second vertex set, the top

vertices having the largest number of edges connecting them to the vertices of the first vertex set, from which a third vertex set is constructed;

the weighted global optimization is as follows: combining the absolute positions of all the vertices with the relative positions measured between pairs of vertices, calculating the relative position error corresponding to each epipolar geometry edge as a second error; if the second error is less than the set error threshold, the corresponding edge is taken as an inlier edge, and then, based on the inlier edges

2. The incremental translational averaging method according to claim 1, wherein the absolute position of each camera in the camera quadruples of the quadruple set in the local coordinate system is calculated by:

wherein

respectively denote, in the local coordinate system, for the cameras

their optimized absolute positions;

denotes the squared chordal distance;

denotes, in the set

any one edge;

denotes any one camera quadruple in the quadruple set;

and

denote, for the cameras

their initial absolute positions.

3. The incremental translational averaging method of claim 2, wherein the selection cost of each camera quadruple in the quadruple set is calculated by:

wherein

denotes the selection cost of the camera quadruple,

4. The incremental translational averaging method according to claim 1, wherein the selection cost of each vertex is calculated from the obtained absolute position by:

wherein

is, in the set

one of the edges;

is a vertex in the third vertex set;

denotes, for edge

the relative position between the two vertices it connects;

denotes, for the vertex of the first vertex set

the current estimate of its absolute position; and

denotes the weight corresponding to each epipolar geometry edge.

5. The incremental translational averaging method according to claim 2, wherein the inlier edges in the weighted local optimization are obtained by:

wherein

denotes, for the first vertex set

and the vertex corresponding to the selected next optimal view

the set of epipolar geometry edges between them;

is, in the set

any one edge;

denotes, within

the set of inlier edges;

denotes, for edge

the relative position between the two cameras it connects;

denotes, for vertex

the current estimate of its absolute position;

denotes, for

its initialized absolute position;

denotes the relative position error corresponding to each edge; and

denotes the set error threshold.

6. The incremental translational averaging method according to claim 5, based on

wherein

denotes, for the absolute position

the result of the weighted local optimization;

denotes, in the set

any one edge; and

denotes, for edge

the relative position between the two cameras it connects.

7. The incremental translational averaging method according to claim 6, wherein the inlier edges in the weighted global optimization are obtained by:

wherein

denotes, in the set

any one edge;

denotes, for edge

the relative position between the two cameras it connects; and

denotes, for vertex

the current estimate of its absolute position.

8. The incremental translational averaging method according to claim 7, based on

wherein

denotes, for the set of absolute positions

the result of the weighted global optimization;

is, in the set

any one edge; and

denotes, for edge

the relative position between the two cameras it connects.

9. An incremental translational averaging system, the system comprising: an epipolar geometry graph construction module, an initial seed view selection module, a set construction module, a next optimal view selection module, an optimization module and an absolute position estimation output module;

the set construction module is configured to take the vertices in the epipolar geometry graph with estimated and with un-estimated absolute positions as a first vertex set and a second vertex set, respectively; and to select, in the second vertex set, the top

vertices having the largest number of edges connecting them to the vertices of the first vertex set, from which a third vertex set is constructed;

the weighted local optimization is: for the absolute position of the vertex corresponding to the selected next optimal view, combining the absolute positions in the first vertex set with the relative positions measured between the two vertices connected by each edge in the first edge set, calculating the relative position error corresponding to each edge as a first error; if the first error is less than the set error threshold, the corresponding edge is taken as an inlier edge, and then, based on the inlier edges

10. An electronic device, comprising:

at least one processor; and a memory communicatively coupled to at least one of the processors; wherein the memory stores instructions executable by the processor for execution by the processor to implement the incremental translational averaging method of any one of claims 1-8.