CN113436264B

CN113436264B - Pose calculation method and system based on monocular and monocular hybrid positioning

Info

Publication number: CN113436264B
Application number: CN202110977859.8A
Authority: CN
Inventors: 李骥; 赵信宇; 邢志伟; 魏伟; 龙建睿; 魏金生; 肖崇泳
Original assignee: Shenzhen Dadao Zhichuang Technology Co ltd
Current assignee: Guangdong Dadao Zhichuang Technology Co.,Ltd.
Priority date: 2021-08-25
Filing date: 2021-08-25
Publication date: 2021-11-19
Anticipated expiration: 2041-08-25
Also published as: CN113436264A

Abstract

The application relates to a pose calculation method and a pose calculation system based on monocular and monocular hybrid positioning. The pose estimation is carried out by utilizing the basic monocular feature set and the basic monocular matching set, the rotation amount of one monocular camera of the multi-view cameras, namely initial rotation information, can be quickly and accurately estimated, then the initial rotation information is used as the initial value of the rotation amount, the multi-view pose calculation can be carried out by combining the mixed feature set and the mixed matching set, the translation amount of the multi-view camera is obtained, and the relative pose of the multi-view camera is calculated. Because the rotation amount is calculated in advance in the monocular pose calculation process, and the reference of the rotation amount is directly provided for the monocular pose calculation, the translation amount can be calculated more quickly by the monocular pose calculation, the calculation efficiency is greatly improved, and the pose positioning is completed more quickly.

Description

Pose calculation method and system based on monocular and monocular hybrid positioning

Technical Field

The application relates to the field of vision SLAM technology, in particular to a pose calculation method and system based on monocular and monocular hybrid positioning.

Background

The SLAM technology is an instant positioning and map construction technology, is mainly applied to the field of positioning and navigation, can calculate the current pose of a robot carrying a camera by analyzing images acquired by the camera, and constructs a map of the surrounding environment to complete positioning and navigation, and is a key technology for solving the problem of planning the movement route of an autonomous mobile robot at present.

In the robot pose calculation in the prior art, a system can extract feature points from an image acquired by a camera, track and match the feature points with existing visual features in a map template to obtain matched feature Point pairs, then perform triangularization calculation on the feature Point pairs, calculate the difference between the pose of the camera when the map template is shot and the pose of the camera when the image is shot by using a PnP (Passive-n-Point) method, thereby calculating the current pose of a robot carrying the camera, and by continuously taking the image acquired in the last pose positioning calculation as the map template for the next pose positioning calculation, the robot can continuously perform pose positioning calculation in the moving process, thereby realizing autonomous movement.

Since it is difficult to calculate depth information in PnP calculation for an image acquired by the same monocular camera alone, a general camera may employ a multi-view camera such as a binocular camera. However, the pose positioning calculation of the existing SLAM system has the problem of low pose calculation speed.

Disclosure of Invention

The method for calculating the pose based on monocular and monocular hybrid positioning has the advantages of being high in pose calculation speed and strong in stability.

The above object of the present invention is achieved by the following technical solutions:

the pose calculation method based on monocular and monocular hybrid positioning comprises the following steps:

monocular feature matching, namely determining a basic monocular feature set and a basic monocular matching set based on a monocular image and a map image corresponding to the monocular image; the monocular image is obtained based on one of the eyes of the multi-eye camera, and the basic monocular feature set comprises a plurality of basic monocular tracking points extracted from the same monocular image; the base monocular matching set includes base monocular matching points that match each of the base monocular tracking points in the map image;

monocular rotation calculation, namely determining initial rotation information based on the basic monocular feature set and the basic monocular matching set; wherein the initial rotation information is used to reflect an amount of rotation;

a mixed feature matching step of determining a mixed feature set and a mixed matching set based on at least two monocular images and map images corresponding to the respective monocular images; each monocular image is acquired based on the multi-view camera, and the mixed feature set comprises a plurality of mixed tracking points extracted from one or more monocular images; the set of blended matches includes pairs of blended match points that match each of the blended tracking points in each of the map images;

calculating a multi-view pose, and determining final pose information based on the initial rotation information, the mixed feature set and the mixed matching set; and the final pose information is used for reflecting the rotation amount and the translation amount of the multi-view camera.

By adopting the technical scheme, the multi-view camera consists of at least two monocular cameras, and each monocular camera moves synchronously under the state that the robot moves normally. The pose estimation is carried out by utilizing the basic monocular feature set and the basic monocular matching set, the rotation amount of one monocular camera of the multi-view cameras, namely initial rotation information, can be quickly and accurately estimated, then the initial rotation information is used as the initial value of the rotation amount, the multi-view pose calculation can be carried out by combining the mixed feature set and the mixed matching set, the translation amount of the multi-view camera is obtained, and the relative pose of the multi-view camera is calculated. Because the rotation amount is calculated in advance in the monocular pose calculation process, and the reference of the rotation amount is directly provided for the monocular pose calculation, the translation amount can be calculated more quickly by the monocular pose calculation, the calculation efficiency is greatly improved, and the pose positioning is completed more quickly.

Optionally, in a specific method of the monocular rotation calculating step, the method includes:

determining a monocular point pair set based on the basic monocular feature set and the basic monocular matching set; wherein the monocular point pair set contains a plurality of matching feature point pairs; the matching feature point pair consists of the basic monocular tracking point and the basic monocular matching point matched with the basic monocular tracking point;

determining a monocular movement simulation set; wherein, the monocular motion simulation set comprises a plurality of monocular motion simulation parameters for simulating different motion quantities;

performing monocular projection error estimation on each matched characteristic point pair based on a monocular motion simulation set to construct a monocular pose error model;

the monocular projection error can reflect an error between the position of the feature point in the monocular image, which is consistent with the basic monocular matching point, and the position of the basic monocular tracking point after position transformation based on the rotation amount; the monocular projection error model can reflect the size of the monocular projection error corresponding to different monocular motion simulation parameters;

performing model optimization based on the monocular pose error model to determine a monocular final error model;

and determining initial rotation information based on the monocular rotation simulation parameters corresponding to the monocular final error model.

By adopting the technical scheme, various different rotation amounts can be simulated through each monocular motion simulation parameter, and if the rotation amount corresponding to the monocular motion simulation parameter is completely consistent with the actual rotation amount of the monocular camera, the basic monocular tracking point can move reversely to the position overlapped with the basic monocular matching point based on the rotation amount. The position of the basic monocular tracking point is transformed based on different monocular motion simulation parameters, namely, the basic monocular tracking point is reversely moved based on different rotation amounts, so that a monocular projection error between the position of the monocular tracking point after being reversely moved and the position of the basic monocular matching point can be calculated, if the monocular projection error is larger, the difference between the rotation amount corresponding to the monocular motion simulation parameters and the actual rotation amount of the monocular camera is larger, and if the monocular projection error is smaller, the closer the rotation amount corresponding to the monocular motion simulation parameters and the actual rotation amount of the monocular camera is. The monocular pose error model can reflect monocular projection errors simulated by different monocular motion simulation parameters, and the monocular pose error model with smaller monocular projection errors can be obtained by optimizing the monocular pose error model, so that the monocular motion simulation parameters which are closer to the actual rotation amount of the monocular camera are determined, and initial rotation information is obtained.

Optionally, the monocular motion simulation parameters include monocular rotation simulation parameters and monocular translation simulation parameters, where the monocular rotation simulation parameters are used to simulate a rotation amount of the monocular camera, and the monocular translation simulation parameters are used to simulate a translation amount of the monocular camera.

By adopting the technical scheme, the movement of the monocular camera not only has rotation amount, but also has translation amount, and the translation amount is also used as the simulated movement amount to participate in calculation by utilizing monocular translation simulation parameters, so that a monocular projection error model can be obtained more accurately.

Optionally, the mixed feature set includes a mixed monocular feature subset and a mixed monocular feature subset;

and extracting each mixed tracking point in the mixed monocular feature subset from the same monocular image, and extracting each mixed tracking point in the mixed monocular feature subset from the overlapping part of each monocular image.

By adopting the technical scheme, the mixed monocular feature subset is introduced to calculate the pose, instead of only using the mixed monocular feature subset, the pose calculation can reduce the problem of visual field reduction caused by the monocular camera, and the pose calculation stability is improved.

Optionally, in a specific method for multi-purpose pose calculation, the method includes:

determining a multi-view motion simulation set; wherein, the multi-view motion simulation set comprises a plurality of multi-view translation simulation parameters for simulating different translation amounts;

performing coordinate transfer on the mixed matching set based on the initial rotation information and the multi-view motion simulation set, and determining a reverse-thrust matching set; the backstepping matching set comprises a plurality of backstepping matching point pairs which are in one-to-one correspondence with the mixed matching point pairs, and the backstepping matching point pairs can reflect points which are projected on the monocular image after the mixed matching point pairs are subjected to position adjustment based on rotation amount and translation amount;

respectively carrying out multi-view projection error estimation on the mixed monocular feature subset and the mixed multi-view feature subset and a reverse-thrust matching set to construct a multi-view pose error model;

performing model optimization based on the multi-view pose error model to determine a multi-view final error model;

and determining final pose information based on the rotation amount and the translation amount corresponding to the multi-view final error model.

By adopting the technical scheme, the mixed matching point pairs successfully matched in the map image are reversely moved based on the rotation amount and the translation amount, so that the reverse-thrust matching point pairs can be obtained, and if the reverse-thrust matching point pairs corresponding to the mixed matching point pairs are overlapped with the mixed tracking points corresponding to the mixed matching point pairs, the rotation amount and the translation amount corresponding to the reverse-thrust matching point pairs are the actual rotation amount and the translation amount of the multi-view camera. By continuously selecting different multi-view translation simulation parameters, a multi-view pose error model representing a projection error can be obtained, and a multi-view pose error model with a smaller error overall can be obtained through the multi-view pose error model, so that the closer the rotation amount and the translation amount to the actual motion of the monocular camera are determined, and the final pose information is determined.

Optionally, the specific method for performing model optimization based on the multi-view pose error model and determining the multi-view final error model includes:

performing iterative optimization by using a Gauss-Newton iteration method based on the translation amount corresponding to the multi-eye pose error model to determine a first optimization error model;

and (4) performing iterative optimization by using a Gauss-Newton iteration method based on the corresponding rotation amount of the multi-eye pose error model, and determining a multi-eye final error model.

By adopting the technical scheme, the initial value of the rotation amount in the multi-view pose error model is determined, so that the translation amount in the multi-view pose error model can be optimized, and after more accurate translation amount is obtained, the rotation amount in the multi-view pose error model is optimized, which is equivalent to fine adjustment of the rotation amount, so that the rotation amount and the translation amount are equally optimized, and the pose calculation accuracy is improved.

Optionally, the specific method for performing multi-view projection error estimation and constructing a multi-view pose error model based on the mixed monocular feature subset and the mixed multi-view feature subset and the back-projection matching set respectively includes:

respectively carrying out multi-view projection error estimation on the mixed monocular feature subset and the mixed multi-view feature subset and a reverse-thrust matching set to construct a basic error model;

and optimizing the basic error model based on the robust kernel function to determine a multi-objective pose error model.

By adopting the technical scheme, the robust kernel function can reduce the influence of larger errors in the basic error model on pose calculation, and obtain a more stable calculation result.

Optionally, in the specific method for monocular feature matching, the method includes:

monocular feature extraction, namely determining an initial monocular feature set and an initial monocular matching set based on a monocular image and a map image corresponding to the monocular image; the initial monocular feature set comprises a plurality of initial monocular feature points, and the initial monocular matching set comprises a plurality of initial monocular matching points;

screening monocular characteristics, performing filtering optimization based on the initial monocular characteristic set and the initial monocular matching set, and determining a basic monocular characteristic set and a basic monocular matching set;

in the specific method for screening the monocular characteristics, the method comprises the following steps:

based on a matching distance threshold, searching and removing feature misleading points in the initial monocular feature set, searching and removing mismatching points in the initial monocular matching set, and determining a basic monocular feature set and a basic monocular matching set;

when the distance between the initial monocular feature point and the corresponding initial monocular matching point is greater than a matching distance threshold, determining the initial monocular feature point as a feature misleading point, and determining the initial monocular matching point corresponding to the feature misleading point as the mismatching point;

and/or the presence of a gas in the gas,

determining virtual feature points in the monocular image based on the matching distance threshold and the initial monocular matching points; the virtual feature point is a point with the highest matching degree with the initial monocular matching point in the distance range corresponding to the matching distance threshold;

determining a reverse-thrust virtual point in the map image based on the matching distance threshold and the virtual feature point; the reverse-deducing virtual point is the point with the highest matching degree with the virtual feature point in the distance range corresponding to the matching distance threshold;

based on each reverse-deducing virtual point, searching and removing feature misleading points in the initial monocular feature set, searching and removing mismatching points in the initial monocular matching set, and determining a basic monocular feature set and a basic monocular matching set;

when the reverse-thrust virtual point corresponding to the initial monocular matching point deviates from the initial monocular feature point corresponding to the initial monocular matching point, determining the initial monocular matching point as the error matching point, and determining the initial monocular feature point as a feature error guide point;

and/or the presence of a gas in the gas,

determining alternative feature points based on the initial monocular matching points;

based on a difference threshold value, initial monocular feature points and alternative feature points, searching and removing feature misleading points in an initial monocular feature set, searching and removing mismatching points in an initial monocular matching set, and determining a basic monocular feature set and a basic monocular matching set;

when the difference value between the initial monocular feature point and the alternative feature point is smaller than a difference threshold value, the initial monocular feature point is determined as a feature misleading point, and the initial monocular matching point corresponding to the initial monocular feature point is determined as the mismatching point.

By adopting the technical scheme, the rotation range of the monocular camera in a short time is limited, so that the distance between the initial monocular characteristic point and the initial monocular matching point is limited, and when the distance is greater than the matching distance threshold, the two pixel points are wrongly matched and need to be eliminated.

The monocular camera has a limited rotation range in a short time, so that the maximum range of position transfer of the initial monocular matching point can be preset, the virtual feature point closest to the initial monocular matching point is searched in the range, if the virtual feature point is consistent with the initial monocular feature point, the initial monocular feature point is correctly matched with the initial monocular matching point, otherwise, the two pixel points are wrongly matched and need to be eliminated.

Because the initial monocular feature points and the initial monocular matching points have a one-to-one correspondence relationship, if an alternative feature point with higher similarity to the initial monocular feature points exists, the alternative feature point may have a matching relationship with the initial monocular matching points, and the risk of matching errors between the initial monocular feature points and the initial monocular matching points is higher and needs to be eliminated.

The second purpose of the application is to provide a pose calculation system based on monocular and monocular hybrid positioning, and the pose calculation system has the characteristics of high pose calculation speed and high stability.

The second objective of the present invention is achieved by the following technical solutions:

position appearance computational system based on many mesh of monocular are mixed and are fixed a position includes:

the monocular feature matching module is used for determining a basic monocular feature set and a basic monocular matching set based on the monocular image and the map image corresponding to the monocular image; the monocular image is obtained based on one of the eyes of the multi-eye camera, and the basic monocular feature set comprises a plurality of basic monocular tracking points extracted from the same monocular image; the base monocular matching set includes base monocular matching points that match each of the base monocular tracking points in the map image;

the monocular rotation calculating module is used for determining initial rotation information based on the basic monocular feature set and the basic monocular matching set; wherein the initial rotation information is used to reflect an amount of rotation;

a mixed feature matching module for determining a mixed feature set and a mixed matching set based on at least two monocular images and a map image corresponding to each monocular image; each monocular image is acquired based on the multi-view camera, and the mixed feature set comprises a plurality of mixed tracking points extracted from one or more monocular images; the set of blended matches includes pairs of blended match points that match each of the blended tracking points in each of the map images;

the multi-view pose calculation module is used for determining final pose information based on the initial rotation information, the mixed feature set and the mixed matching set; and the final pose information is used for reflecting the rotation amount and the translation amount of the multi-view camera.

The third purpose of the application is to provide an intelligent terminal, which has the characteristics of high pose calculation speed and high stability.

The third object of the invention is achieved by the following technical scheme:

an intelligent terminal comprises a memory and a processor, wherein the memory is stored with a computer program which can be loaded by the processor and executes the pose calculation method based on monocular and monocular hybrid positioning.

Drawings

Fig. 1 is a schematic flowchart of a pose calculation method according to a first embodiment of the present application.

Fig. 2 is a sub-flow diagram of a monocular feature matching step in the pose calculation method according to the first embodiment of the present application.

Fig. 3 is a sub-flow diagram of a monocular rotation calculating step in the pose calculating method according to the first embodiment of the present application.

Fig. 4 is a sub-flow diagram of a multi-purpose pose calculation step in the pose calculation method according to the first embodiment of the present application.

FIG. 5 is a schematic diagram of determining a blended feature set based on a monocular image.

Fig. 6 is a schematic flowchart of a pose calculation method according to a second embodiment of the present application.

Fig. 7 is a sub-flow diagram of a monocular feature matching step in the pose calculation method according to the second embodiment of the present application.

Fig. 8 is a module schematic diagram of a pose calculation system according to a third embodiment of the present application.

Fig. 9 is a schematic diagram of an intelligent terminal according to a fourth embodiment of the present application.

In the figure, 1, a monocular feature matching module; 2. a monocular rotation calculation module; 3. a mixed feature matching module; 4. and a multi-eye pose calculation module.

Detailed Description

In a visual SLAM system based on feature points, after a real-time picture acquired by a camera is converted into a set of feature points, matching and positioning calculation with existing visual features in a map are required, and relative motion of the camera is analyzed in a visual positioning mode. The core principle of visual positioning is to judge the relative motion of the camera by the same physical position seen at different times or the same physical position seen in different directions at the same time.

Assuming that the physical position seen by the camera is (X, Y, Z), the physical position is transformed to (X) for the camera after making rigid body motion in three-dimensional space^，,Y^，,Z^，) The coordinate transformation formula is shown in formula (2),

（2）

wherein t is translation (three-degree-of-freedom position), R is rotation matrix (three-degree-of-freedom rotation), and t and R are in a combined posture. In engineering practice, the rotation matrix is not directly solved, and R is expressed and solved by Euler angles, quaternions, rotation axes, angles and the like.

Since the physical position is represented in the form of physical coordinates, and the position of the visual feature in the image is represented in the form of pixel coordinates, coordinate conversion between the physical coordinates and the pixel coordinates is required through a projection model.

In the pinhole imaging camera, the projection model is shown in formula (3),

（3）

wherein c is_uAnd c_vFor the pixel position corresponding to the optical center, f is the relative focal length (dimensionless), these three parameters can be obtained by off-line calibration, and X, Y, Z are the physical coordinates corresponding to the feature point, X indicates facing to the right, Y indicates facing to the right and downwards, and Z indicates facing to the right and front, respectively.

In the two left and right images obtained by the binocular camera, if one physical point (X, Y, Z) is observed and matched by the two monocular cameras at the same time, the physical coordinates can be calculated by using the formula (4) and the formula (5) using the triangulation method:

（4）

（5）

wherein (u)₀,v₀) Projection of a physical point in the left image, (u)₁,v₁) The projection of the physical point in the right image is shown, and B is the base length of the binocular camera.

The SLAM pose positioning method in the prior art has the following technical defects:

1. in the pose positioning calculation only adopting the monocular camera, the PNP calculation of the monocular camera is difficult to reflect the depth information, so the pose positioning calculation can only calculate the relative rotation and the relative pose lacking one-dimensional dimension, and the pose calculation is not comprehensive;

2. when pose positioning calculation is carried out based on the matching feature point pairs, positioning accuracy is reduced due to the problem of matching errors;

3. when a multi-camera such as a binocular camera is used for pose positioning calculation, the problem of low positioning calculation speed exists;

4. when a multi-view camera such as a binocular camera is used for pose positioning calculation, physical features in a common visual area of the binocular camera need to be extracted, the problem of reduction of a shooting visual field exists, and particularly under the conditions that the visual field is blocked, the long-time operation deviates from preset calibration parameters, the scene changes and the like, the pose positioning calculation stability is poor.

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In addition, the reference numerals of the steps in this embodiment are only for convenience of description, and do not represent the limitation of the execution sequence of the steps, and in actual application, the execution sequence of the steps may be adjusted or performed simultaneously as needed, and these adjustments or substitutions all belong to the protection scope of the present invention.

Embodiments of the present application are described in further detail below with reference to figures 1-9 of the specification.

The first embodiment is as follows:

the embodiment of the application provides a pose calculation method based on monocular and monocular hybrid positioning, and the main flow of the method is described as follows.

Referring to fig. 1, S01, monocular feature matching, a base monocular feature set and a base monocular matching set are determined based on a monocular image and a map image corresponding to the monocular image.

The multi-view camera can be composed of a plurality of monocular cameras, overlapped areas are arranged among visual field areas of the monocular cameras, and when the robot moves, the monocular cameras forming the multi-view camera can move synchronously along with the robot.

A monocular image refers to an image acquired by a monocular camera; the map image refers to an image acquired at a time node on a monocular camera that acquires the monocular image. Therefore, the map image can be understood as an image acquired by the monocular camera before moving, the monocular image can be understood as an image acquired by the monocular camera after moving, the map image and the monocular image should have the same characteristic point, and the rotation amount and the translation amount of the monocular camera, namely whether the monocular camera rotates, translates or both the rotation amount and the translation amount can be calculated by comparing the position change between the characteristic point of the map image and the corresponding characteristic point of the monocular image, so that the pose of the monocular camera can be calculated.

The basic monocular feature set comprises a plurality of basic monocular tracking points, and each basic monocular tracking point is a feature point extracted from the same monocular image. The basic monocular matching set comprises a plurality of basic monocular matching points, each basic monocular matching point is a feature point in the same map image, and each basic monocular matching point and each basic monocular tracking point have a one-to-one matching relationship.

Specifically, the monocular camera for acquiring the monocular image should be the same as the monocular camera for acquiring the map image, and if the monocular camera for acquiring the monocular image is the left eye camera in the binocular cameras, the monocular camera for acquiring the map image is also the left eye camera in the binocular cameras, if the binocular camera is adopted in this embodiment.

In step S01, the method includes:

referring to fig. 2, S011, monocular feature extraction, an initial monocular feature set and an initial monocular matching set are determined based on a monocular image and a map image corresponding to the monocular image.

The initial monocular feature set comprises a plurality of initial monocular feature points, and each initial monocular feature point is a pixel point extracted from the same monocular image. The basic monocular feature set is a subset of the initial monocular feature set, and the basic monocular feature set can be obtained through screening optimization of the initial monocular feature set.

The initial monocular matching set comprises a plurality of initial monocular matching points, each initial monocular matching point is a pixel point extracted from the same monocular image, and each initial monocular matching point and each initial monocular feature point have one-to-one matching relationship. The basic monocular matching set is a subset of the initial monocular matching set, and the basic monocular matching set can be obtained through screening and optimizing the initial monocular matching set.

The initial monocular matching point and the initial monocular feature point are feature point pairs obtained through a feature matching algorithm, and the initial monocular matching set is recorded as follows:

，

wherein (u)_i,v_i) The pixel position coordinates of the initial monocular matching point are represented, s is the intensity value, d is the descriptor, and the descriptor is a matrix with a fixed size in the embodiment.

Note that the initial monocular feature set is:

；

wherein (u)_j,v_j) The pixel position coordinates of the initial monocular matching point are represented, s is the intensity value, d is the descriptor, and the descriptor is a matrix with a fixed size in the embodiment.

The feature matching algorithm is to calculate a set

So that the difference value

Minimizing or satisfying a predetermined minimum threshold value, from determining initial monocular matching points and initial monocular matching points which match each otherStarting the monocular feature points.

And S012, screening monocular characteristics, performing filtering optimization based on the initial monocular characteristic set and the initial monocular matching set, and determining a basic monocular characteristic set and a basic monocular matching set.

In engineering practice, errors may occur in the feature matching algorithm, so that the initial monocular matching points and the corresponding initial monocular feature points are not actually matched with each other, and if the pose calculation is continuously performed on the part of the pixel points with the matching errors, the calculation accuracy and stability are affected, so that each initial monocular matching point and each initial monocular feature point need to be screened, and the pixel points with the matching errors are filtered.

In step S012, the method includes:

s0121, based on the matching distance threshold, searching and removing the characteristic misleading points in the initial monocular characteristic set, and searching and removing the mismatching points in the initial monocular matching set.

Since the monocular camera has a limited rotation range in a short time, if the initial monocular matching point and the initial monocular feature point indicate the same actual feature, the deviation between the pixel coordinate of the initial monocular matching point and the pixel coordinate of the initial monocular feature point has a maximum movement range, and the matching distance threshold is a parameter value for indicating the maximum movement range. In this embodiment, the matching distance threshold is preset in the system.

In this step, the feature misleading point refers to an initial monocular feature point that is mismatched, and a pixel coordinate difference between the feature misleading point and the corresponding initial monocular matching point is greater than or equal to a matching distance threshold. The mismatching point refers to an initial monocular matching point of the mismatching, and the mismatching point corresponds to the feature misleading point.

In step S0121, when the maximum range of the matching distance threshold between the initial monocular matching point and the initial monocular feature point is limited, the distance matching verification requirement is satisfied between the initial monocular matching point and the initial monocular feature point, and the initial monocular matching point and the initial monocular feature point which do not satisfy the verification requirement are removed, thereby reducing the probability of matching errors.

S0122, determining virtual feature points in the monocular image based on the matching distance threshold and the initial monocular matching points.

And each virtual feature point corresponds to each initial monocular matching point one to one. The virtual feature point refers to a point with the highest matching degree with the initial monocular matching point in a range area which takes the coordinate of the initial monocular matching point as the center of a circle and the matching distance threshold as the radius in the monocular image. Wherein, the specific condition that the matching degree is the highest is that the descriptor difference value is the smallest.

If the initial monocular matching point and the initial monocular feature point are correctly matched, the virtual feature point is overlapped with the initial monocular feature point in the monocular image.

S0123, determining a reverse virtual point in the map image based on the matching distance threshold and the virtual feature point.

Each reverse-thrust virtual point corresponds to each virtual feature point one to one, and each reverse-thrust virtual point also corresponds to the initial monocular matching point corresponding to each virtual feature point. The reverse-thrust virtual point refers to a point with the highest matching degree with the virtual feature point in a range area which takes the coordinate of the virtual feature point as the center of a circle and the matching distance threshold value as the radius in the map image. Wherein, the specific condition that the matching degree is the highest is that the descriptor difference value is the smallest.

If the initial monocular matching point is correctly matched with the initial monocular feature point and the virtual feature point is superposed with the initial monocular feature point, the reverse-deducing that the virtual point is superposed with the initial monocular matching point in the map image.

S0124, based on the reverse-deducing virtual points, searching and rejecting the error matching points in the initial monocular matching set, and searching and rejecting the feature error guide points in the initial monocular feature set.

If the reverse-deducing virtual point is overlapped with the initial monocular matching point corresponding to the reverse-deducing virtual point, the initial monocular matching point is correctly matched with the corresponding initial monocular feature point; otherwise, the initial monocular matching point and the corresponding initial monocular feature point are indicated as error matching.

In the step, the characteristic error guide points refer to initial monocular characteristic points which are in error matching, and each characteristic error guide point is removed from the initial monocular characteristic set; the error matching points refer to initial monocular matching points which are in error matching, the error matching points correspond to the feature misleading points, and each error matching point is removed from the initial monocular matching set.

In steps S0122 to S0124, when the initial monocular matching point receives matching with the back-stepping virtual point, the initial monocular matching point may complete bidirectional matching verification from the map image to the monocular image and then from the monocular image to the map image, and the initial monocular matching point that does not satisfy the bidirectional matching verification may be eliminated, reducing the probability of matching error.

S0125, determining alternative characteristic points based on the initial monocular matching points.

The alternative feature points refer to the pixel points with the smallest descriptor difference value between the pixel points extracted from the monocular image and except the initial monocular feature points and the initial monocular matching points, namely, the points which are extracted from the monocular image and have the highest matching degree with the initial monocular matching points except the initial monocular feature points.

In this embodiment, each initial monocular matching point corresponds to two candidate feature points, which are a first candidate feature point and a second candidate feature point, respectively, where a matching degree between the first candidate feature point and the initial monocular matching point is higher than a matching degree between the second candidate feature point and the initial monocular matching point.

S0126, based on the difference threshold, the initial monocular feature points and the alternative feature points, searching and removing feature misleading points in the initial monocular feature set, searching and removing mismatching points in the initial monocular matching set, and determining a basic monocular feature set and a basic monocular matching set.

The initial monocular feature point and the initial monocular matching point are in one-to-one correspondence, if the initial monocular matching point simultaneously has a plurality of pixel points with high matching degrees, the matching between the initial monocular feature point and the initial monocular matching point is fuzzy, and the difference threshold is used for reflecting the fuzzy degree.

When the difference value between the descriptor of the initial monocular feature point and the descriptor of the alternative feature point is smaller than or equal to the difference threshold value, it is indicated that the initial monocular feature point and the alternative feature point are fuzzy, the risk of matching errors between the initial monocular feature point and the initial monocular matching point is high, at this time, the corresponding initial monocular feature point is determined as a feature error guide point, and the corresponding initial monocular matching point is determined as an error matching point.

In this embodiment, the difference between the descriptors between the initial monocular feature point and the first candidate feature point and the difference between the descriptors between the initial monocular feature point and the second candidate feature point should satisfy the condition of being equal to or less than the difference threshold at the same time.

When the difference value between the descriptor of the initial monocular feature point and the descriptor of the alternative feature point is larger than the difference threshold value, the difference between the initial monocular feature point and the alternative feature point is larger, and the only pixel point of the initial monocular feature point, which can be matched with the initial monocular matching point, can be determined, at the moment, the initial monocular feature point has uniqueness, and the initial monocular feature point and the initial monocular matching point are regarded as successful matching.

In steps S0125 to S0126, when the difference between the initial monocular feature point and the alternative feature point is large, the initial monocular feature point satisfies the requirement of uniqueness verification, and the initial monocular feature point and the initial monocular matching point that do not satisfy the requirement of uniqueness verification are eliminated, thereby reducing the probability of matching errors.

By using the steps S0121 to S0126, each feature error guide point can be removed and filtered from the initial monocular feature set, the filtered initial monocular feature set is a basic monocular feature set, each error matching point is removed and filtered from the initial monocular matching set, and the filtered initial monocular matching set is a basic monocular matching set.

Because the binocular camera can obtain two groups of monocular images and two groups of map images through the two monocular cameras, based on the monocular image of the left purpose and the map image of the left purpose, an initial monocular feature set and an initial monocular matching set based on the left purpose can be extracted and recorded as an initial left-eye feature set and an initial left-eye matching set; and extracting an initial monocular feature set and an initial monocular matching set based on the right eye, and recording the initial monocular feature set and the initial monocular matching set as an initial right eye feature set and an initial right eye matching set.

In this embodiment, the initial left eye feature set and the initial left eye matching set are screened in steps S0121 to S0126, so as to obtain a basic monocular feature set and a basic monocular matching set based on the left eye, and these sets are recorded as the basic left eye feature set and the basic left eye matching set.

Meanwhile, the initial right eye feature set and the initial right eye matching set are also screened by using the steps S0121 to S0126, so that a right eye-based basic monocular feature set and a basic monocular matching set are obtained and recorded as a basic right eye feature set and a basic right eye matching set.

And S02, performing monocular rotation calculation, and determining initial rotation information based on the basic monocular feature set and the basic monocular matching set.

The map image and the monocular image are two images obtained by the monocular camera at different moments, the basic monocular matching point and the basic monocular tracking point are equivalent to a characteristic point pair which can identify the same spatial position in the two images, and the movement amount of the monocular camera can be calculated by analyzing the position transformation of the characteristic point pair through geometric perspective. The motion amount of the monocular camera comprises a representative rotation amount and a translation amount, wherein the rotation amount can indicate a rotation angle and a rotation direction, the translation amount can indicate a translation distance and a translation direction, and the combination of the two can reflect the finished motion of the monocular camera. The initial rotation information refers to information that can reflect the amount of rotation of the monocular camera.

In this embodiment, only one group of the basic monocular feature set and the basic monocular matching set participates in the calculation of the monocular rotation calculation step in the combination formed by the basic left-eye feature set and the basic left-eye matching set, and the combination formed by the basic right-eye feature set and the basic right-eye matching set, which is specifically set by the system.

In step S02, the method includes:

referring to fig. 3, S021, a monocular point pair set is determined based on the basic monocular feature set and the basic monocular matching set.

In the basic monocular feature set and the basic monocular matching set, any one basic monocular tracking point and the basic monocular matching point matched with the basic monocular tracking point can form a matching feature point pair, and the matching feature point pairs form a monocular point pair set.

S022, performing outlier elimination on the monocular point pair set by utilizing a random sampling consistency algorithm.

The monocular point pair set may include matching feature point pairs with matching errors, the matching feature point pairs with matching errors may cause a large calculation error in subsequent pose calculation, and the outlier rejection refers to rejecting the matching feature point pairs serving as outliers from the monocular point pair set, so as to eliminate adverse effects of the matching feature point pairs with matching errors on pose calculation.

In this embodiment, it is preferable that the random sampling consistency algorithm is a method for executing outlier rejection, and the implementation method of the random sampling consistency algorithm is as follows: randomly selecting a minimum solvable matching characteristic point pair from the monocular point pair set, and calculating a hypothetical pose result as a hypothetical pose; then, error calculation is carried out, the assumed pose is substituted into all the matching feature point pairs, and a plurality of calculation errors corresponding to the assumed pose are calculated; and then carrying out error statistics, comparing the obtained calculation errors with preset error thresholds one by one, and when any one calculation error is smaller than the error threshold, regarding the matching feature point pair corresponding to the calculation error as supporting the assumed pose, so that one assumed pose can obtain a plurality of supported matching feature point pairs.

And repeating the process of randomly selecting a minimum solvable matching feature point pair and calculating a postulated pose result as a postulated pose for n times to obtain n postulated poses, wherein the postulated pose supported by the most obtained matching feature point pair is the closest actual postulated pose, and all matching feature point pairs which do not support the postulated pose become outliers and are removed from the monocular point pair set.

S023, determining a monocular movement simulation set.

The monocular motion simulation set comprises a plurality of monocular motion simulation parameters with different numerical values, and each monocular motion simulation parameter is used for simulating different amounts of exercise. In this embodiment, each monocular movement simulation parameter is divided into a monocular rotation simulation parameter and a monocular translation simulation parameter, wherein the monocular rotation simulation parameter is used for simulating the rotation amount of the monocular camera, and the monocular translation simulation parameter is used for simulating the translation amount of the monocular camera.

S024, performing monocular projection error estimation on each matched feature point pair based on a monocular motion simulation parameter, and constructing a monocular pose error model.

Different monocular motion simulation parameters simulate different motion amounts of the monocular camera, and the monocular camera can be deduced to perform reverse motion through the different motion amounts. Theoretically, if the amount of motion simulated by the monocular motion simulation parameter is completely consistent with the actual amount of motion of the monocular camera, after the monocular camera performs reverse motion, the position where the monocular camera acquires the monocular image should be consistent with the position where the monocular camera acquires the map image, and therefore, the visual feature of the monocular image can be overlapped with the visual feature of the map image.

Monocular projection error estimation refers to the error between the position of the basic monocular tracking point after position transformation is carried out on the basic monocular tracking point based on the reverse motion derivation of the monocular camera in the matching feature point pair and the position of the basic monocular matching point; if the motion amount simulated by the monocular motion simulation parameters is closer to the actual motion amount of the monocular camera, the monocular projection error is smaller as a whole, otherwise, the monocular projection error is maximum as a whole, and the monocular pose error model is used for reflecting the whole size of the monocular projection error.

In the specific process of constructing the monocular pose error model, firstly, the matched feature point pair { u, v, u^,,v^,Coordinate conversion is carried out through a formula (6), and normalized coordinates { x, y, x ] of the matched feature point pairs are obtained^,,y^,}，

（6）

Wherein, c is_u、c_vThe pixel position corresponding to the optical center, f is the relative focal length (dimensionless);

then, based on the normalized coordinates { x, y, x of the matching feature point pairs^,,y^,The monocular pose error model about the monocular projection error can be constructed through the formula (7),

（7）

wherein [ t ] is]_XIn the form of a cross product matrix of the amount of translation, R is a rotation matrix representing the amount of rotation.

Specifically, f (t, R) can represent the size of the monocular projection error, not the monocular projection error itself, and when the monocular motion simulation parameters correctly simulate the actual motion of the monocular camera, f (t, R) = 0.

And S025, performing model optimization based on the monocular pose error model, and determining a monocular final error model.

At least 6 matching characteristic point pairs are substituted into the monocular pose error model, and the rotation amount and the translation amount in the monocular pose error model can be solved. Different monocular projection errors can be obtained by continuously substituting the matched characteristic point pairs and continuously selecting different monocular rotation simulation parameters and monocular translation simulation parameters. Considering that the pixel coordinates of the matched feature point pairs possibly have errors, a large number of matched feature point pairs need to be put into the monocular pose error model for calculation, so that the calculation accuracy is improved. At this time, the monocular pose error model becomes an optimization problem of a quadratic error function, so that model optimization needs to be performed on the monocular pose error model.

In this embodiment, the specific method for model optimization is as follows: and (4) carrying out iterative optimization on the monocular pose error model by adopting a Gauss-Newton iteration method, and taking the function model after the iteration is finished as a monocular final error model.

And S026, determining initial rotation information based on monocular rotation simulation parameters corresponding to the monocular final error model.

The monocular final error model represents a function model when the monocular projection error is small as a whole, and at the moment, the rotation amount corresponding to the monocular final error model is close to the actual rotation amount of the monocular camera, so that the initial rotation information can be determined according to the rotation amount simulated by the monocular rotation simulation parameters.

The initial rotation information can reflect the rotation amount of the monocular camera, and the initial rotation information can reflect the rotation amount of the whole binocular camera because the monocular cameras constituting the binocular camera move synchronously.

And S03, performing mixed feature matching, and determining a mixed feature set and a mixed matching set based on the at least two monocular images and the map images corresponding to the monocular images.

The mixed matching set comprises a plurality of mixed matching point pairs, each mixed matching point pair is extracted from an overlapping area in each map image, namely, the physical characteristics represented by the mixed matching point pairs are shot by two monocular cameras (namely binocular cameras) at the same time.

The mixed feature set comprises a plurality of mixed tracking points, and each mixed tracking point has a matching relation with each mixed matching point pair. Specifically, the mixed feature set includes a mixed monocular feature subset and a mixed monocular feature subset, where the mixed monocular feature subset includes mixed monocular tracking points extracted from a single monocular image, and the mixed monocular feature subset includes mixed monocular tracking point pairs extracted from an overlapping region (i.e., a common-view region of an actual scene) of a plurality of monocular images.

Specifically, taking a binocular camera as an example, the pixel coordinate of the mixed matching point pair is { u }₀ ^,,v₀ ^,,u₁ ^,,v₁ ^,And f, wherein the pixel coordinate of the mixed matching point pair projected on the map image on the left side is (u)₀ ^,,v₀ ^,) The pixel coordinate of the map image projected on the right side of the mixed matching point pair is (u)₁ ^,,v₁ ^,）；

The pixel coordinate of the mixed multi-eye tracking point pair is { u }₀,v₀,u₁,v₁Wherein the pixel coordinate of the mixed multi-view tracking point pair projected on the left monocular image is (u)₀,v₀) Recording as a mixed left eye tracking point; the pixel coordinate of the mixed multi-view tracking point pair projected on the right monocular image is (u)₁,v₁) And is marked as a mixed right eye tracking point.

Referring to fig. 5, the blended monocular feature subset may be divided into a blended left eye feature subset and a blended right eye feature subset, where the blended left eye feature subset contains all blended left eye tracking points (u) extracted from the left eye monocular image₀,v₀) The mixed left eye feature subset comprises pixels which do not exist in the mixed multi-eye feature subset; the blended right feature subset contains all blended right tracking points (u) extracted from the right monocular image₁,v₁) The mixed right eye feature subset also includes pixels that do not exist in the mixed multi-eye feature subset.

It can be seen that, in this embodiment, the pixel point pairs that can be calculated with the mixed matching point pair may not be complete, and when the features captured by both the binocular cameras can be matched with the mixed matching point pair, then { u } may be used₀ ^,,v₀ ^,,u₁ ^,,v₁ ^,}; when only features captured by the left eye camera can be matched to a pair of hybrid matching points, then a hybrid left eye tracking point (u) can be used₀,v₀) (ii) a When only features captured by the right-eye camera can be matched to the hybrid matching point pair, then the hybrid right-eye tracking point (u) can be used₁,v₁）。

In the present embodiment, in order to reduce the matching error rate between the mixed matching point pair and the mixed tracking point, the mixed matching point pair should be extracted from the intersection between the base left matching set and the base right matching set; the mixed multi-eye tracking point pair is extracted from the intersection between the basic left eye feature set and the basic right eye feature set; the mixed left eye tracking point is extracted from the basic left eye feature set; the mixed right eye tracking point is extracted from the basic right eye feature set; so that the mixed matching point pairs and each mixed tracking point are filtered through the steps S0121 to S0126, and the pixel points with wrong matching are eliminated.

Further, the mixed matching set may include a mixed matching point pair with a matching error, and the part of the erroneous mixed matching point pair is an outlier, which may cause a large calculation error in the subsequent pose calculation. In order to avoid adverse effects caused by the part of outliers, the mixed monocular feature subsets and the mixed monocular feature subsets need to remove the pixel points with wrong matching through a random sampling consistency algorithm, and the corresponding mixed matching point pairs also need to be removed from the mixed matching set.

Referring to fig. 3, S04, multi-view pose calculation, final pose information is determined based on the initial rotation information, the blended feature set, and the blended match set.

And the final pose information is used for reflecting the rotation amount and the translation amount of the multi-view camera. The mixed feature set and the mixed matching set can estimate the rotation amount and the translation amount of the multi-view camera through coordinate position analysis, and determine the pose of the multi-view camera. The initial rotation information in the monocular pose calculation is used as the initial value of the rotation amount in the monocular pose calculation for calculation, and the calculation efficiency of the monocular pose calculation can be improved.

Because the mixed feature set not only comprises the mixed multi-eye tracking point pairs, but also comprises the mixed left eye tracking points and the mixed right eye tracking points, in addition to the shared area of the left eye monocular image and the right eye monocular image, part of visual features in the left eye monocular image or the right eye monocular image can also participate in calculation, the problem of visual field reduction of the binocular camera in multi-eye pose calculation is solved, and the pose calculation stability is improved.

In step S04, the method includes:

referring to fig. 4, S041, a multi-view motion simulation set is determined.

The multi-view motion simulation set comprises a plurality of multi-view translation simulation parameters and a plurality of multi-view rotation simulation parameters, wherein the multi-view translation simulation parameters are used for simulating different translation amounts, and the multi-view rotation simulation parameters are used for simulating different rotation amounts.

And S042, performing coordinate transfer on the mixed matching set based on the initial rotation information and the multi-view motion simulation set, and determining a reverse-thrust matching set.

Different multi-view motion simulation parameters simulate different motion amounts of the multi-view camera, and the multi-view camera can be deduced to move reversely through the different motion amounts. And each mixed matching point pair carries out position transfer based on the reverse motion of the multi-view camera, so that each reverse-thrust matching point pair can be obtained, and each reverse-thrust matching point pair forms a reverse-thrust matching set.

In this embodiment, since there is already initial rotation information as the initial value of the rotation amount, in step S042, the multi-view translation simulation parameters are used to simulate different translation amounts, and the derivation of the reverse motion is performed for the multi-view camera.

Specifically, the pixel coordinates { u ] of the mixed matching point pair are expressed by formula (8)₀ ^,,v₀ ^,,u₁ ^,,v₁ ^,Coordinate conversion is carried out, the actual coordinates (X, Y, Z) of the mixed matching point pairs can be obtained,

（8）

wherein c is_u、c_vThe pixel coordinates corresponding to the optical centers are respectively, f is a relative focal length (dimensionless), and B is the base length of the binocular camera;

the actual coordinates (X, Y, Z), the initial rotation information and the multi-view translation simulation parameters are expressed by a formula (9), so that the actual coordinates (X) of the reverse-deducing matching point pairs can be obtained^，,Y^，,Z^，）；

（9）

Wherein t is translation amount, and R is a rotation matrix obtained by converting initial rotation information;

but by extrapolating the actual coordinates (X) of the matched pairs back^，,Y^，,Z^，) Substituting into formula (10), the pixel coordinates { u ] of the corresponding point pair can be obtained₀ ^,,,v₀ ^,,,u₁ ^,,,v₁ ^,,}；

（10）

Wherein (u)₀ ^,,,v₀ ^,,) The projection coordinates of the matching point pairs in the left eye monocular image are reversely deduced and are recorded as left eye monocular restoring points (u)₀ ^,,,v₀ ^,,)；(u₁ ^,,,v₁ ^,,) The projection coordinates of the matching point pairs in the right monocular image are reversely deduced and are marked as the right monocular image deduction point (u)₁ ^,,,v₁ ^,,) (ii) a And B is the base length of the binocular camera.

And S043, respectively carrying out multi-view projection error estimation on the mixed monocular feature subset and the mixed multi-view feature subset and a reverse-thrust matching set, and constructing a multi-view pose error model.

The multi-view projection error estimation refers to a position error between the mixed monocular tracking point and the backstepping matching point pair or a position error between the mixed monocular tracking point pair and the backstepping matching point pair. If the motion amount simulated by the multi-view motion simulation parameter is closer to the actual motion amount of the multi-view camera, the integral multi-view projection error is smaller, otherwise, the integral multi-view projection error is maximum, and the multi-view pose error model is used for reflecting the integral size of the multi-view projection error.

In this embodiment, the specific calculation method of the multi-view projection error is as follows: and accumulating the square of the difference between each left eye single-pushing point and the corresponding mixed left eye tracking point, the square of the difference between each right eye single-pushing point and the corresponding mixed right eye tracking point and the square of the difference between each reverse-pushing matching point pair and the corresponding mixed multi-eye tracking point pair to construct a multi-eye pose error model.

In step S043, the method includes:

and S0431, respectively carrying out multi-view projection error estimation on the mixed monocular feature subset and the mixed multi-view feature subset and a reverse-thrust matching set, and constructing a basic error model.

Based on each left eye single-projection point, each right eye single-projection point, each mixed multi-eye tracking point pair, each left eye single-projection point, each right eye single-projection point and each reverse-projection matching point pair, a basic error model reflecting the overall size of the multi-eye projection error can be constructed.

And S0432, optimizing the basic error model based on the robust kernel function, and determining the multi-objective pose error model.

Different multi-view projection errors can be obtained by continuously substituting each pixel point in the function model and continuously selecting different multi-view translation simulation parameters. In the process of analysis and calculation, a large number of pixel points need to be placed in a basic error model for calculation, so that the basic error model becomes an optimization problem of a quadratic error function. However, if the pixel points with matching errors are substituted into the function model, the calculation result will be seriously affected.

In order to reduce adverse effects caused by matching wrong pixel points, in this embodiment, a robust kernel function is used to perform model optimization on a basic error model, and an error function in a quadratic optimization problem is changed from a quadratic function to the robust kernel function, so as to obtain a multi-objective pose error model.

A specific method of modeling using a robust kernel function is shown in equation (11),

（11）

wherein, the error e refers to the difference between the matched pixels, such as the difference between the left eye single-point pushing and the corresponding mixed left eye tracking point, the difference between the right eye single-point pushing and the corresponding mixed right eye tracking point, and the difference between the reverse-pushing matching point pair and the corresponding mixed multi-eye tracking point pair; the threshold δ is an error threshold preset by the system. When the error e is larger than the threshold value delta, matching errors may exist in pixel points corresponding to the error e, and in the multi-view pose error model, the linearity presented by the error e is slowed down, so that the influence of the matching errors on multi-view pose calculation is weakened, and the calculation stability is higher.

And S044, performing model optimization based on the multi-view pose error model, and determining a multi-view final error model.

The multi-view final error model refers to a function model obtained after iterative optimization is carried out on the multi-view pose error model, and the multi-view projection error is integrally smaller in the multi-view final error model, so that the translation amount and the rotation amount corresponding to the multi-view final error model are closer to the actual translation amount and rotation amount of the multi-view camera.

In step S044, the method includes:

and S0441, performing iterative optimization by using a Gauss-Newton iteration method based on the translation amount corresponding to the multi-eye pose error model, and determining a first optimization error model.

Because the rotation amount in the multi-view pose error model has a relatively accurate initial amount, a first optimization error model can be obtained by using a Gaussian Newton iteration method based on the translation amount iterative optimization in the multi-view pose error model, and the translation amount corresponding to the first optimization error model is closer to the actual translation amount of the multi-view camera.

And S0442, performing iterative optimization by using a Gauss-Newton iteration method based on the corresponding rotation amount of the multi-eye pose error model, and determining a multi-eye final error model.

Because the translation amount in the multi-view pose error model has a relatively accurate initial amount, the rotation amount in the multi-view pose error model can be iteratively optimized by using a Gaussian Newton iteration method, and equivalently, the rotation amount is finely adjusted, so that a more accurate rotation amount is obtained, and a multi-view final error model is further determined.

And S045, determining final pose information based on the rotation amount and the translation amount corresponding to the multi-view final error model.

And performing joint optimization on the rotation amount and the translation amount in the multi-view final error model through multiple iterations of a Levenberg-Marquardt iteration method of Gauss Newton and a variant thereof to obtain the rotation amount and the translation amount capable of reflecting the actual motion of the multi-view camera, thereby determining the final pose information. Finally, the pose information can reflect the relative rotation and the relative translation of the multi-view camera, so that the pose of the multi-view camera can be realized.

The implementation principle of the first embodiment of the application is as follows: and performing pose estimation by using the basic monocular feature set and the basic monocular matching set, and quickly and accurately estimating the rotation amount of one monocular camera of the multi-view cameras, namely initial rotation information. In the subsequent calculation process of multi-objective pose positioning, the initial rotation information can be used as the initial value of the rotation amount, a function model of an error function is constructed by combining a mixed feature set and a mixed matching set, the translation amount in the function model is optimized firstly, and then the rotation amount in the function model is optimized continuously, and the translation amount and the rotation amount are sequentially optimized in a combined mode, so that the calculation efficiency and the numerical stability can be greatly improved, and the pose positioning can be completed more quickly and stably.

In the calculation process of the monocular pose, the translation amount and the rotation amount of the monocular camera can be simulated, the monocular projection error is calculated, and a function model of an error function is constructed, but in the solving process, the establishment of an equation is not influenced by the proportional scaling of the translation amount of the function model, so the solved translation amount only has directivity but does not have a determined scale, the scale of the translation amount can be calculated by using the calibrated binocular camera, in the embodiment, the rotation amount of the monocular camera is directly used as the basic initial amount of the subsequent multi-view pose calculation, the effect of the translation amount of the monocular camera in the subsequent multi-view pose calculation is ignored, and the multi-view calculation is more stable.

Example two:

the difference between the present embodiment and the first embodiment is that, in step S02 of the pose calculation method based on monocular and monocular hybrid localization, the monocular movement simulation set includes two cases, namely, the case where the monocular translation simulation parameter is not equal to 0 and the case where the monocular translation simulation parameter is equal to 0.

Referring to fig. 6, step S02 of the present embodiment includes:

and S021, determining a monocular point pair set based on the basic monocular feature set and the basic monocular matching set.

S023, determining a monocular movement simulation set.

In step S024, the method includes:

s0241, performing monocular projection error estimation on each matched characteristic point pair based on a monocular rotational motion simulation parameter and a monocular translation simulation parameter, and constructing a first monocular pose error model.

And the translation amount simulated by the monocular translation simulation parameter is not equal to 0. First, the matching feature point pair { u, v, u is needed^,,v^,Coordinate conversion is carried out through a formula (6), and normalized coordinates { x, y, x ] of the matched feature point pairs are obtained^,,y^,}，

（6）

then, based on the normalized coordinates { x, y, x of the matching feature point pairs^,,y^,A first monocular pose error model for the monocular projection error can be constructed by equation (7),

（7）

wherein [ t ] is]_XIn the form of a cross product matrix of the amount of translation, R is a rotation matrix representing the amount of rotation。

S0242, performing monocular projection error estimation on each matched feature point pair based on the monocular rotational motion simulation parameters, and constructing a second monocular pose error model.

Since the monocular translation simulation parameter does not participate in the calculation, the translation amount simulated by the monocular translation simulation parameter is equal to 0. First, the matching feature point pair { u, v, u is needed^,,v^,Coordinate conversion is carried out through a formula (6), and normalized coordinates { x, y, x ] of the matched feature point pairs are obtained^,,y^,}，

（6）

then, based on the normalized coordinates { x, y, x of the matching feature point pairs^,,y^,And constructing a second monocular pose error model about the monocular projection error through homogeneous transformation.

In step S025, the method includes:

s0251, performing model optimization based on the first monocular pose error model, and determining a first monocular final error model.

And continuously substituting a large number of matching characteristic point pairs into the first monocular pose error model, continuously selecting different monocular rotation simulation parameters and monocular translation simulation parameters, and performing iterative optimization on the first monocular pose error model by a Gauss-Newton iteration method to obtain a first monocular final error model.

S0252, performing model optimization based on the second monocular pose error model, and determining a second monocular final error model.

And continuously substituting a large number of matching characteristic point pairs into the second monocular pose error model, continuously selecting different monocular rotation simulation parameters, and performing iterative optimization on the second monocular pose error model by a Gauss-Newton iteration method to obtain a second monocular final error model.

In step S026, the method includes:

s0261, determining a first projection error based on a monocular rotation simulation parameter and a monocular translation simulation parameter corresponding to the first monocular final error model.

The first projection error is a sum of squares of differences between each base monocular tracking point and each base monocular matching point after each base monocular tracking point performs coordinate position transfer based on the rotation amount and the translation amount simulated in the first monocular final error model.

And S0262, determining a second projection error based on the monocular rotation simulation parameters corresponding to the second monocular final error model.

The second projection error is a sum of squares of differences between each base monocular tracking point and each base monocular matching point after each base monocular tracking point performs coordinate position transfer based on the rotation amount simulated in the second monocular final error model.

And S0263, determining initial rotation information based on the first projection error and the second projection error.

Comparing the first projection error with the second projection error, and if the first projection error is smaller than or equal to the second projection error, determining initial rotation information based on a rotation amount corresponding to the first monocular final error model; if the first projection error is larger than the second projection error, determining initial rotation information based on a rotation amount corresponding to the second monocular final error model;

the implementation principle of the second embodiment of the present application is as follows: in a function equation for constructing the first monocular pose error model, when the translation amount is 0, the solution is difficult, so that the situations that the monocular translation simulation parameter is not equal to 0 and the monocular translation simulation parameter is equal to 0 need to be distinguished for calculation, and the minimum monocular projection error under the two situations is respectively solved, namely the first projection error and the second projection error are obtained. By comparing the first projection error with the second projection error, the smaller value of the first projection error and the second projection error can be determined to be closer to the actual projection error, and therefore the initial rotation information can be determined through the rotation amount corresponding to the smaller value. The method for distinguishing whether the translation amount is 0 or not and separately calculating can reflect the actual pose more accurately.

Example three:

referring to fig. 8, in an embodiment, a pose calculation system based on monocular and monocular hybrid positioning is provided, which corresponds to the pose calculation method based on monocular and monocular hybrid positioning in the first embodiment one to one, and includes a monocular feature matching module 1, a monocular rotation calculation module 2, a hybrid feature matching module 3, and a monocular pose calculation module 4. The functional modules are explained in detail as follows:

the monocular feature matching module 1 is used for determining a basic monocular feature set and a basic monocular matching set based on the monocular image and the map image corresponding to the monocular image, and sending monocular feature matching information to the monocular rotation calculating module 2; the monocular image is obtained based on one of the eyes of the multi-eye camera, and the basic monocular feature set comprises a plurality of basic monocular tracking points extracted from the same monocular image; the basic monocular matching set comprises basic monocular matching points which are matched with the basic monocular tracking points in the map image;

the monocular rotation calculating module 2 is used for determining initial rotation information based on the basic monocular feature set and the basic monocular matching set and sending monocular rotation calculating information to the mixed feature matching module 3; wherein the initial rotation information is used to reflect the amount of rotation;

the mixed feature matching module 3 is used for determining a mixed feature set and a mixed matching set based on at least two monocular images and the map images corresponding to the monocular images, and sending mixed feature matching information to the multi-view pose calculation module 4; each monocular image is acquired based on a multi-view camera, and the mixed feature set comprises a plurality of mixed tracking points extracted from one or more monocular images; the mixed matching set comprises mixed matching point pairs matched with the mixed tracking points in the map images;

the multi-view pose calculation module 4 is used for determining final pose information based on the initial rotation information, the mixed feature set and the mixed matching set; and the final pose information is used for reflecting the rotation amount and the translation amount of the multi-view camera.

Example four:

referring to fig. 9, in one embodiment, an intelligent terminal is provided and includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the memory stores training data, algorithm formulas, filtering mechanisms, and the like in a training model. The processor is used for providing calculation and control capability, and the processor realizes the following steps when executing the computer program:

and S01, performing monocular feature matching, and determining a basic monocular feature set and a basic monocular matching set based on the monocular image and the map image corresponding to the monocular image.

In step S01, the method includes:

and S011, extracting monocular characteristics, and determining an initial monocular characteristic set and an initial monocular matching set based on the monocular image and the map image corresponding to the monocular image.

In step S012, the method includes:

S0123, determining a reverse-deducing matching point pair in the map image based on the matching distance threshold and the virtual feature point.

S0124, based on the reverse-deducing matching point pair, searching and rejecting the error matching point in the initial monocular matching set, and searching and rejecting the characteristic error guide point in the initial monocular characteristic set.

In step S02, the method includes:

S023, determining a monocular movement simulation set.

S024, performing monocular projection error estimation on each matched characteristic point pair based on the monocular rotational motion simulation parameters and the monocular translation simulation parameters, and constructing a monocular pose error model.

And S04, calculating the multi-view pose, and determining the final pose information based on the initial rotation information, the mixed feature set and the mixed matching set.

In step S04, the method includes:

and S041, determining a multi-view motion simulation set.

In step S043, the method includes:

In step S044, the method includes:

Example five:

in one embodiment, a computer-readable storage medium is provided, which stores a computer program that can be loaded by a processor and executes the above pose calculation method based on monocular-multi-ocular hybrid localization, and when executed by the processor, the computer program implements the following steps:

In step S01, the method includes:

In step S012, the method includes:

In step S02, the method includes:

S023, determining a monocular movement simulation set.

In step S04, the method includes:

and S041, determining a multi-view motion simulation set.

In step S043, the method includes:

In step S044, the method includes:

The computer-readable storage medium includes, for example: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The embodiments are preferred embodiments of the present application, and the scope of the present application is not limited by the embodiments, so: all equivalent variations made according to the methods and principles of the present application should be covered by the protection scope of the present application.

Claims

1. A pose calculation method based on monocular and monocular hybrid positioning is characterized by comprising the following steps:

2. The pose calculation method based on monocular and monocular hybrid localization as claimed in claim 1, wherein in the specific method of the monocular rotation calculation step, the pose calculation method comprises:

3. The pose calculation method based on monocular and monocular hybrid localization as claimed in claim 2, wherein: the monocular movement simulation parameters comprise monocular rotation simulation parameters and monocular translation simulation parameters, wherein the monocular rotation simulation parameters are used for simulating the rotation amount of the monocular camera, and the monocular translation simulation parameters are used for simulating the translation amount of the monocular camera.

4. The pose calculation method based on monocular and monocular hybrid localization as claimed in claim 1, wherein: the mixed feature set comprises a mixed monocular feature subset and a mixed monocular feature subset;

5. The pose calculation method based on monocular and monocular hybrid positioning according to claim 4, wherein in the specific method of multi-view pose calculation, the method comprises:

performing coordinate transfer on the mixed matching set based on the initial rotation information and the multi-view motion simulation set, and determining a reverse-thrust matching set; the backstepping matching set comprises a plurality of backstepping virtual points which are in one-to-one correspondence with the mixed matching point pairs, and the backstepping virtual points can reflect points which are projected on the monocular image after the mixed matching point pairs are subjected to position adjustment based on rotation amount and translation amount;

6. The pose calculation method based on monocular and monocular hybrid positioning according to claim 5, wherein the specific method for performing model optimization based on the monocular and monocular pose error model and determining the monocular final error model comprises:

7. The pose calculation method based on monocular and monocular hybrid positioning according to claim 5, wherein the specific method for constructing the monocular pose error model based on the monocular hybrid feature subset and the back-projection matching set for performing the monocular projection error estimation comprises the following steps:

8. The pose calculation method based on monocular and monocular hybrid localization as claimed in claim 1, wherein in the specific method of monocular feature matching, the method comprises:

and/or the presence of a gas in the gas,

9. Position appearance computational system based on monocular multiocular hybrid localization, its characterized in that includes:

the monocular feature matching module (1) is used for determining a basic monocular feature set and a basic monocular matching set based on the monocular image and a map image corresponding to the monocular image; the monocular image is obtained based on one of the eyes of the multi-eye camera, and the basic monocular feature set comprises a plurality of basic monocular tracking points extracted from the same monocular image; the base monocular matching set includes base monocular matching points that match each of the base monocular tracking points in the map image;

the monocular rotation calculating module (2) is used for determining initial rotation information based on the basic monocular feature set and the basic monocular matching set; wherein the initial rotation information is used to reflect an amount of rotation;

a mixed feature matching module (3) for determining a mixed feature set and a mixed matching set based on at least two monocular images and a map image corresponding to each monocular image; each monocular image is acquired based on the multi-view camera, and the mixed feature set comprises a plurality of mixed tracking points extracted from one or more monocular images; the set of blended matches includes pairs of blended match points that match each of the blended tracking points in each of the map images;

a multi-view pose calculation module (4) for determining final pose information based on the initial rotation information, the mixed feature set and the mixed matching set; and the final pose information is used for reflecting the rotation amount and the translation amount of the multi-view camera.

10. Intelligent terminal, characterized in that it comprises a memory and a processor, said memory having stored thereon a computer program that can be loaded by the processor and that executes the method according to any one of claims 1 to 8.