CN115451996A - Homography visual odometry method for indoor environments - Google Patents


Info

Publication number
CN115451996A
Authority
CN
China
Prior art keywords: map, points, point, map point, current frame
Legal status
Granted
Application number
CN202211044239.XA
Other languages
Chinese (zh)
Other versions
CN115451996B (en)
Inventor
Tian Lianfang (田联房)
Liu Hailin (刘海林)
Du Qiliang (杜启亮)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Application filed by South China University of Technology (SCUT)
Priority to CN202211044239.XA
Publication of CN115451996A
Application granted
Publication of CN115451996B
Legal status: Active


Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 22/00: Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers
    • G01C 21/00: Navigation; Navigational instruments not provided for in groups G01C 1/00 - G01C 19/00
    • G01C 21/20: Instruments for performing navigational calculations
    • G01C 21/206: Instruments for performing navigational calculations specially adapted for indoor navigation
    • G01C 21/38: Electronic maps specially adapted for navigation; Updating thereof
    • G01C 21/3804: Creation or updating of map data

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a homography-based visual odometry method for indoor environments, comprising the following steps: 1) extract ORB feature points from the current frame and match them against the reference frame; 2) find combinations of coplanar map points using a weighted random sampling strategy; 3) compute the homography transformation matrix between the poses of the current frame and the reference frame from the plane containing the most map points, then decompose it into a rotation pose matrix and a translation pose vector; 4) optimize the rotation pose matrix and the translation pose vector by minimizing a weighted reprojection error function; 5) build a local map, and optimize the rotation pose matrix, the translation pose vector, and the local map by local bundle adjustment. By exploiting the coplanarity constraints among map points, the method improves the accuracy and robustness of visual odometry in indoor environments.

Description

Homography-based visual odometry method for indoor environments
Technical Field
The invention relates to the technical field of visual simultaneous localization and mapping (SLAM), and in particular to a homography-based visual odometry method for indoor environments.
Background
Visual SLAM technology is widely applied in unmanned autonomous systems such as mobile robots, driverless vehicles, intelligent drones, and AR/VR systems. Visual odometry is a core component of visual SLAM; its main function is to estimate the system's own pose in real time while the unmanned autonomous system operates. Conventional visual odometry methods mainly comprise the PnP algorithm based on 2D-3D matching and the ICP algorithm based on 3D-3D matching. Because it requires less information, the PnP algorithm can be used with monocular, binocular, and RGB-D cameras, whereas the ICP algorithm needs the three-dimensional point cloud of the current frame and is usually used with RGB-D cameras. Mainstream visual odometry methods take little account of the spatial constraint relationships within the raw data. Exploring the spatial geometric relationships of the raw data and using them as prior constraint information can improve the precision of existing visual odometry methods, and thereby improve the robustness and efficiency of visual SLAM in unmanned autonomous systems.
In addition, man-made indoor environments are a common workplace for unmanned autonomous systems, and such environments typically contain a large amount of planar structure; fully exploiting the coplanarity constraints among feature points can therefore improve the accuracy and robustness of indoor visual odometry.
Based on the above discussion, the invention provides a homography-based visual odometry method for indoor environments, which has high practical application value.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a homography-based visual odometry method for indoor environments; the accuracy of the visual odometry is improved by jointly considering the coplanarity constraint priors and the number of times each map point has been observed.
To realize this purpose, the technical scheme provided by the invention is as follows. A homography-based visual odometry method for indoor environments comprises the following steps:
1) Read the data, including a reference frame, a current frame, and a local map; extract ORB feature points from the current frame, filter the extracted feature points with a quadtree method, and finally complete feature matching between the current frame and the reference frame;
2) Finding a coplanar map point combination by using a weighted random sampling strategy;
3) Compute the homography transformation matrix H_m between the poses of the current frame and the reference frame from the plane containing the most map points, then decompose it to obtain a rotation pose matrix R and a translation pose vector t;
4) Optimizing a rotation pose matrix R and a translation pose vector t by minimizing a weighted reprojection error function;
5) Construct a local map, and optimize the rotation pose matrix R, the translation pose vector t, and the local map by local bundle adjustment.
Further, in step 1), the current frame is the image whose pose is to be estimated, read from the image sequence, and the reference frame is the previous image in the sequence. Step 1) comprises the following steps:
1.1) Repeatedly downsample the original image of the current frame with scaling factor b to obtain a d-level image pyramid, then extract a number of FAST corners proportional to the resolution of each pyramid level, for a total of N extracted feature points;
1.2) Filter the feature points extracted in step 1.1) with a quadtree method, then compute BRIEF descriptors for all remaining feature points;
1.3) For each feature point in the current frame, search the neighborhood of the same position in the reference frame for the feature point whose descriptor has the smallest Hamming distance, as the best match, and for the feature point with the second-smallest Hamming distance, as the second-best match. If the Hamming distance of the best match is greater than T_b, or the difference between the second-best and the best matching distance is less than T_d, discard the best match; otherwise retain it as a matched pair, where T_b, T_d are set thresholds. All matched pairs are denoted q_cr = {(q_c1, q_r1), (q_c2, q_r2), ..., (q_ck, q_rk), ..., (q_cn, q_rn)}, where k = 1, 2, ..., n and q_ck, q_rk denote a current-frame feature point and the corresponding reference-frame feature point. The matched pairs with initialized map points are

{(q_c1, q_r1), (q_c2, q_r2), ..., (q_cm, q_rm)}

where i = 1, 2, ..., m and q_ci, q_ri denote the current-frame feature point and the corresponding reference-frame feature point of an initialized map point; the corresponding map points are

{P_1, P_2, ..., P_m}

where P_i denotes a map point, with observation counts

{N_1, N_2, ..., N_m}

where N_i is the number of times the corresponding map point has been observed; and the matched pairs without initialized map points are

{(q_c(m+1), q_r(m+1)), ..., (q_cn, q_rn)}

where l = 1, 2, ..., n-m and q_c(m+l), q_r(m+l) denote the current-frame feature point and the corresponding reference-frame feature point of an uninitialized matched pair.
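As an illustration, the best/second-best descriptor test of step 1.3) can be sketched as follows. This is a minimal sketch, not the patent's implementation: descriptors are modeled as Python integers compared by Hamming distance, and the thresholds T_b = 30, T_d = 20 follow the concrete values given later in the detailed description.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_feature(desc_c: int, candidates: list, T_b: int = 30, T_d: int = 20):
    """Return the index of the best reference-frame candidate, or None.

    Implements the patent's test: reject if the best Hamming distance
    exceeds T_b, or if the second-best distance is within T_d of the best
    (the match is then considered ambiguous).
    """
    dists = sorted((hamming(desc_c, d), i) for i, d in enumerate(candidates))
    best_d, best_i = dists[0]
    if best_d > T_b:
        return None                      # best match not similar enough
    if len(dists) > 1 and dists[1][0] - best_d < T_d:
        return None                      # ambiguous: second-best too close
    return best_i

# Tiny example: the first candidate matches exactly (distance 0) and the
# second candidate differs in 20 bits, so the match is kept.
print(match_feature(0b1111, [0b1111, (1 << 24) - 1]))  # prints 0
```

The descriptor-as-integer model stands in for the 256-bit BRIEF descriptors of step 1.2); a real implementation would compare byte arrays, but the accept/reject logic is the same.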
Further, in step 2), combinations of coplanar map points are found by the weighted random sampling strategy, comprising the following steps:
2.1) According to the observation counts, assign the ith map point a weight W_i (the defining formulas appear only as images in the original publication), where W_i denotes the weight of map point P_i, b is the scaling factor of the image pyramid, s_u is the pyramid level of the feature point in the uth observation of map point P_i, and N_i is the number of observations of the ith map point.

Then set the sampling probability of each map point:

l_i = W_i / W, with W = W_1 + W_2 + ... + W_m

where l_i is the probability of the ith map point and W is the normalizing intermediate variable;
2.2) Randomly select 3 map points P_v1, P_v2, P_v3 according to the result of step 2.1), taking the probability l_i as the probability that map point P_i is selected; if the three points are collinear, discard them and reselect;
2.3) Randomly select a map point P_v4, again with probability l_i of selecting map point P_i; if P_v1, P_v2, or P_v3 is selected, discard it and reselect. Then compute the distance h_4 from P_v4 to the plane formed by P_v1, P_v2, P_v3. If h_4 is less than T_h, compute the distance h_1 from P_v1 to the plane formed by P_v2, P_v3, P_v4, the distance h_2 from P_v2 to the plane formed by P_v1, P_v3, P_v4, and the distance h_3 from P_v3 to the plane formed by P_v1, P_v2, P_v4. If h_1, h_2, h_3 are all less than T_h, the map point combination passes the coplanarity pre-screening and is put into the set S_p, where T_h is a set threshold. Repeat this step until a specified number of coplanar map point combinations has been pre-screened or all map points have been traversed; if a repeated map point combination appears in the process, discard it and reselect;
2.4) Repeat steps 2.2) to 2.3) until the pre-screened coplanar map point combinations reach the maximum required number, or all map point combinations have been traversed; if an identical map point combination appears in the process, abandon that cycle and reselect;
2.5) Further examine the coplanar map point combinations pre-screened in step 2.4) using space vector relations. For each coplanar combination P_v1, P_v2, P_v3, P_v4 in S_p, solve the linear equations P_v1 = x_1 P_v2 + y_1 P_v3 + z_1 P_v4, P_v2 = x_2 P_v1 + y_2 P_v3 + z_2 P_v4, P_v3 = x_3 P_v1 + y_3 P_v2 + z_3 P_v4, P_v4 = x_4 P_v1 + y_4 P_v2 + z_4 P_v3, then compute the errors e_1 = |1-x_1-y_1-z_1|, e_2 = |1-x_2-y_2-z_2|, e_3 = |1-x_3-y_3-z_3|, e_4 = |1-x_4-y_4-z_4|. If e_1, e_2, e_3, e_4 are all less than T_e, put the coplanar combination into the final coplanar map point set S_a, where x_g, y_g, z_g, g = 1, 2, 3, 4, are the unknowns of the equation systems and T_e is a set threshold.
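The linear-combination coplanarity test of step 2.5) can be sketched numerically. The idea: if four points are exactly coplanar, expressing one as a linear combination of the other three yields coefficients summing to 1, so e = |1 - x - y - z| vanishes. This is a minimal pure-Python sketch; the helper names are illustrative, and the non-coplanar example values are chosen for demonstration only.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists (row-major)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(cols, y):
    """Solve the 3x3 system [c1 c2 c3] x = y by Cramer's rule."""
    M = [[cols[j][i] for j in range(3)] for i in range(3)]  # columns -> rows
    d = det3(M)
    xs = []
    for k in range(3):
        Mk = [row[:] for row in M]
        for i in range(3):
            Mk[i][k] = y[i]           # replace the kth column with y
        xs.append(det3(Mk) / d)
    return xs

def coplanarity_error(P1, P2, P3, P4):
    """Solve P1 = x*P2 + y*P3 + z*P4 and return e = |1 - x - y - z|."""
    x, y, z = solve3([P2, P3, P4], P1)
    return abs(1.0 - x - y - z)

# Four points on the plane z = 1 give e = 0 (up to rounding); lifting the
# fourth point off the plane makes e exceed the threshold T_e.
print(coplanarity_error((1, 2, 1), (0, 0, 1), (1, 0, 1), (0, 1, 1)))
print(coplanarity_error((1, 2, 1), (0, 0, 1), (1, 0, 1), (0, 1, 2)))
```

In practice all four equation systems of step 2.5) would be solved this way, and the combination accepted only if every error falls below T_e.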
Further, in step 3), the homography transformation matrix corresponding to the plane containing the most map points is computed as follows:
3.1) Initialize the threshold T_a = 0, the homography transformation matrix H_m, and the inlier matched-pair set q_cr;
3.2) Read a coplanar point combination P_v1, P_v2, P_v3, P_v4 from S_a together with its matched pairs (q_c1, q_r1), (q_c2, q_r2), (q_c3, q_r3), (q_c4, q_r4); compute the homography transformation matrix H by the four-point method, then check the homography constraint deviation of every other matched pair,

e_hk = || (q_ck)^∧ H q_rk ||

If e_hk < T_h, mark the matched pair (q_ck, q_rk) as an inlier, otherwise as an outlier, where q_ck, q_rk here take the homogeneous coordinates of the kth feature point of the current frame and the reference frame, T_h is a set threshold, and the symbol ∧ denotes the antisymmetric (skew-symmetric) transformation. If the inlier count T_c > T_a, set T_a = T_c and H_m = H, then empty the set q_cr and put the matched pairs currently marked as inliers into q_cr;
3.3) Repeat step 3.2) until all coplanar map point combinations in S_a have been read, yielding the homography matrix H_m and the inlier matched-pair set q_cr. Then decompose the homography transformation matrix H_m with the decomposeHomographyMat function of the OpenCV library to obtain the rotation pose matrix R and the translation pose vector t.
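The four-point estimate and the inlier deviation of step 3.2) can be sketched with the standard direct linear transform (DLT). This is an illustrative stand-in under stated assumptions: the patent relies on OpenCV (whose decomposeHomographyMat would then split H_m into R and t), while this sketch uses NumPy, and the test homography H_true is invented for demonstration.

```python
import numpy as np

def homography_4pt(q_r, q_c):
    """Estimate H with q_c ~ H q_r from four point pairs (DLT)."""
    A = []
    for (xr, yr), (xc, yc) in zip(q_r, q_c):
        A.append([-xr, -yr, -1, 0, 0, 0, xc * xr, xc * yr, xc])
        A.append([0, 0, 0, -xr, -yr, -1, yc * xr, yc * yr, yc])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)            # null vector of the 8x9 system
    return H / H[2, 2]

def skew(v):
    """Antisymmetric (cross-product) matrix of a 3-vector, the ^ operator."""
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def deviation(H, qc, qr):
    """e_hk = || (q_ck)^ H q_rk || with homogeneous pixel coordinates."""
    qc_h = np.array([qc[0], qc[1], 1.0])
    qr_h = np.array([qr[0], qr[1], 1.0])
    return np.linalg.norm(skew(qc_h) @ H @ qr_h)

# Points related by a known homography give (near-)zero deviation.
H_true = np.array([[1.0, 0.1, 5.0], [0.0, 1.2, -3.0], [0.001, 0.0, 1.0]])
def apply(H, p):
    v = H @ np.array([p[0], p[1], 1.0])
    return (v[0] / v[2], v[1] / v[2])
q_r = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
q_c = [apply(H_true, p) for p in q_r]
H = homography_4pt(q_r, q_c)
print(deviation(H, q_c[0], q_r[0]))     # near zero for a true inlier
```

The deviation is zero exactly when q_ck and H q_rk are parallel as homogeneous vectors, which matches the cross-product form of the constraint in step 3.2).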
Further, in step 4), the rotation pose matrix R and the translation pose vector t are optimized by minimizing a weighted reprojection error function, whose specific expression is:

(R, t) = argmin over (R, t) of the sum over i = 1, ..., N_m of W_i · || q_ci - π( K (R P_i + t) ) ||^2

The above formula is solved iteratively by the Gauss-Newton method. In the formula, argmin(·) returns the parameters that minimize the function, K is the camera intrinsic matrix, the function π(·) denotes the de-homogenization operation, W_i is the weight of map point P_i, P_i is the ith map point, q_ci, q_ri are the current-frame feature point of an initialized map point and the corresponding reference-frame feature point, and N_m is the total number of map points.
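The weighted reprojection error of step 4) can be evaluated as follows. This is a sketch of the cost function only; a full Gauss-Newton solver would additionally iterate over a local parameterization of R and t, which is omitted here, and the camera intrinsics and points are invented example values.

```python
import numpy as np

def weighted_reproj_error(R, t, K, pts3d, obs2d, weights):
    """E = sum_i W_i * || q_ci - pi(K (R P_i + t)) ||^2.

    pi() is the de-homogenization (x/z, y/z) after projecting with K.
    """
    E = 0.0
    for P, q, W in zip(pts3d, obs2d, weights):
        p = K @ (R @ P + t)           # project into the current camera
        proj = p[:2] / p[2]           # de-homogenize
        E += W * float(np.sum((q - proj) ** 2))
    return E

# Example intrinsics and two map points observed at their exact projections.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 0.0])
pts3d = [np.array([0.1, -0.2, 2.0]), np.array([-0.3, 0.1, 3.0])]
obs2d = [(K @ P / P[2])[:2] for P in pts3d]
weights = [2.0, 1.0]                  # W_i from the map point weights
print(weighted_reproj_error(R, t, K, pts3d, obs2d, weights))  # exact pose -> 0.0
```

A Gauss-Newton step would linearize this cost around the current (R, t) and solve the resulting normal equations, repeating until convergence.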
Further, in step 5), the local map is constructed and optimized by local bundle adjustment, comprising the following steps:
5.1) Using the rotation pose matrix R and the translation pose vector t optimized in step 4), initialize map points for the matched pairs without map points

{(q_c(m+1), q_r(m+1)), ..., (q_cn, q_rn)}

by triangulation, obtaining map points

{P_(m+1), P_(m+2), ..., P_n}

where q_c(m+1), ..., q_cn are current-frame feature points, q_r(m+1), ..., q_rn are reference-frame feature points, and P_(m+1), ..., P_n are the newly initialized map points;
5.2) Set the weights W_(m+l) of the newly initialized map points (the defining formulas appear only as images in the original publication), where l = 1, 2, ..., n-m, W_(m+l) denotes the weight of map point P_(m+l), b is the scaling factor of the image pyramid, and s_c(m+l), s_r(m+l) denote the pyramid levels of the feature points observing map point P_(m+l) in the current frame and the reference frame, respectively;
5.3) Optimize the rotation pose matrix R, the translation pose vector t, and all map points

{P_1, P_2, ..., P_n}

by minimizing the weighted reprojection error function:

(R, t, P_1, ..., P_n) = argmin of the sum over k = 1, ..., n of W_k · ( || q_ck - π( K (R P_k + t) ) ||^2 + || q_rk - π( K P_k ) ||^2 )

The above formula is solved iteratively by the Gauss-Newton method, where argmin(·) returns the parameters that minimize the function, K is the camera intrinsic matrix, the function π(·) denotes the de-homogenization operation, W_k is the weight of map point P_k, P_k is the kth map point with k = 1, 2, ..., n, q_ck, q_rk are the current-frame feature point and the corresponding reference-frame feature point, and n is the total number of map points.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention makes full use of the abundant planar information in indoor environments, provides effective prior constraint information for the visual odometry, and improves the precision and robustness of the visual odometry method in indoor environments.
2. The method jointly considers the number of observations and the variance of the map points to qualitatively estimate their accuracy, which improves the reliability of the discovered coplanarity constraints; by assigning different weight coefficients within the nonlinear optimization error function, it also improves the reliability of the optimized camera pose and local map.
3. The invention performs well in indoor environments and is suitable for monocular, binocular, and RGB-D cameras.
Drawings
FIG. 1 is a block diagram of the homography visual odometry process in the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in FIG. 1, this embodiment discloses a homography-based visual odometry method for indoor environments, comprising the following steps:
1) First read the data, including the reference frame, the current frame, and the local map; then extract ORB feature points from the current frame, filter the extracted feature points with a quadtree method, and finally complete feature matching between the current frame and the reference frame. The specific steps are as follows:
1.1) Downsample the original image of the current frame 7 times in succession with a scaling factor of 1.2 to construct an 8-level image pyramid, then extract a number of FAST corners proportional to the resolution of each level, for a total of 1000 feature points;
1.2) Filter the feature points extracted in step 1.1) with a quadtree method, then compute BRIEF descriptors for all remaining feature points;
1.3) For each feature point in the current frame, take the same position in the reference frame as the center O and search the region of radius r = 20 pixels for the feature point whose descriptor has the smallest Hamming distance, as the best match, and the one with the second-smallest Hamming distance, as the second-best match. If the Hamming distance of the best match is greater than 30, or the difference between the second-best and the best matching distance is less than 20, discard the best match; otherwise retain it as a matched pair. All matched pairs are denoted q_cr = {(q_c1, q_r1), (q_c2, q_r2), ..., (q_ck, q_rk), ..., (q_cn, q_rn)}, where k = 1, 2, ..., n and q_ck, q_rk denote a current-frame feature point and the corresponding reference-frame feature point. The matched pairs with initialized map points are

{(q_c1, q_r1), (q_c2, q_r2), ..., (q_cm, q_rm)}

where i = 1, 2, ..., m and q_ci, q_ri denote the current-frame feature point and the corresponding reference-frame feature point of an initialized map point; the corresponding map points are

{P_1, P_2, ..., P_m}

where P_i denotes a map point, with observation counts

{N_1, N_2, ..., N_m}

where N_i is the number of times the corresponding map point has been observed; and the matched pairs without initialized map points are

{(q_c(m+1), q_r(m+1)), ..., (q_cn, q_rn)}

where l = 1, 2, ..., n-m and q_c(m+l), q_r(m+l) denote the current-frame feature point and the corresponding reference-frame feature point of an uninitialized matched pair.
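The pyramid parameters of step 1.1) (scale factor 1.2, 8 levels, 1000 features) imply a per-level feature budget. The patent does not state the allocation rule, so the proportional-to-area scheme below is an assumption, modeled on the geometric distribution commonly used by ORB extractors:

```python
def pyramid_allocation(n_features: int = 1000, n_levels: int = 8, b: float = 1.2):
    """Distribute n_features over pyramid levels proportionally to image area.

    Level i is downsampled by b**i per axis, so its area shrinks by b**(2*i);
    coarser levels therefore receive fewer corners.
    """
    areas = [b ** (-2 * i) for i in range(n_levels)]
    total = sum(areas)
    counts = [round(n_features * a / total) for a in areas]
    counts[-1] += n_features - sum(counts)  # absorb rounding in the coarsest level
    return counts

counts = pyramid_allocation()
print(counts, sum(counts))
```

With these parameters the finest level receives the largest share and the counts sum exactly to the 1000-feature budget of step 1.1).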
2) Search for combinations of coplanar map points using the weighted random sampling strategy. The specific steps are as follows:
2.1) According to the observation counts, assign the ith map point a weight W_i (the defining formulas appear only as images in the original publication), where W_i denotes the weight of map point P_i, b is the scaling factor of the image pyramid, s_u is the pyramid level of the feature point in the uth observation of map point P_i, and N_i is the number of observations of the ith map point.

Then set the sampling probability of each map point:

l_i = W_i / W, with W = W_1 + W_2 + ... + W_m

where l_i is the probability of the ith map point and W is the normalizing intermediate variable;
2.2) Randomly select 3 map points P_v1, P_v2, P_v3 according to the result of step 2.1), taking the probability l_i as the probability that map point P_i is selected; if the three points are collinear, discard them and reselect;
2.3) Randomly select a map point P_v4, again with probability l_i; if P_v1, P_v2, or P_v3 is selected, discard it and reselect. Then compute the distance h_4 from P_v4 to the plane formed by P_v1, P_v2, P_v3. If h_4 is less than T_h (preferably 0.02 m), compute the distance h_1 from P_v1 to the plane formed by P_v2, P_v3, P_v4, the distance h_2 from P_v2 to the plane formed by P_v1, P_v3, P_v4, and the distance h_3 from P_v3 to the plane formed by P_v1, P_v2, P_v4. If h_1, h_2, h_3 are all less than T_h, the combination passes the coplanarity pre-screening and is put into the set S_p. Repeat until the specified number of coplanar combinations has been pre-screened or all map points have been traversed; if a repeated combination appears in the process, discard it and reselect;
2.4) Repeat step 2.3) until 10 groups of coplanar map points have been pre-screened or all map points have been traversed; if repeated map points appear in the process, reselect;
2.5) Repeat steps 2.2) to 2.4) until 50 groups of coplanar map points have been pre-screened or all map point combinations have been traversed; if a combination already in S_p appears in the process, reselect;
2.6) For each coplanar map point combination P_v1, P_v2, P_v3, P_v4 in S_p in turn, solve the linear equations P_v1 = x_1 P_v2 + y_1 P_v3 + z_1 P_v4, P_v2 = x_2 P_v1 + y_2 P_v3 + z_2 P_v4, P_v3 = x_3 P_v1 + y_3 P_v2 + z_3 P_v4, P_v4 = x_4 P_v1 + y_4 P_v2 + z_4 P_v3. If the errors e_1 = |1-x_1-y_1-z_1|, e_2 = |1-x_2-y_2-z_2|, e_3 = |1-x_3-y_3-z_3|, e_4 = |1-x_4-y_4-z_4| are all less than 0.05, put the coplanar combination into the final coplanar map point set S_a, where x_g, y_g, z_g, g = 1, 2, 3, 4, are the unknowns of the equation systems.
3) Find the plane containing the most map points, compute the homography transformation matrix H_m between the poses of the current frame and the reference frame, and finally solve for the rotation pose matrix R and the translation pose vector t. The specific steps are as follows:
3.1) Initialize the threshold T_a = 0, the homography transformation matrix H_m = I_3, and the inlier matched-pair set q_cr = ∅, where I_3 denotes the 3rd-order identity matrix and ∅ denotes the empty set;
3.2) Read a coplanar map point combination P_v1, P_v2, P_v3, P_v4 from the set S_a together with its matched pairs (q_c1, q_r1), (q_c2, q_r2), (q_c3, q_r3), (q_c4, q_r4); compute the homography transformation matrix H by the four-point method, then compute the homography constraint deviation of every other matched pair,

e_hk = || (q_ck)^∧ H q_rk ||

If e_hk < 4, mark the matched pair (q_ck, q_rk) as an inlier, otherwise as an outlier, where q_ck, q_rk here take the homogeneous coordinates of the kth feature point of the current frame and the reference frame. If the inlier count T_c > T_a, set T_a = T_c and H_m = H, then empty the set q_cr and put the matched pairs currently marked as inliers into it;
3.3) Repeat step 3.2) until all coplanar map point combinations in S_a have been processed, yielding the homography matrix H_m and the inlier set q_cr; decompose the homography transformation matrix H_m with the decomposeHomographyMat function of the OpenCV library to obtain the rotation pose matrix R and the translation pose vector t.
4) Optimize the rotation pose matrix R and the translation pose vector t by minimizing the weighted reprojection error function:

(R, t) = argmin over (R, t) of the sum over i = 1, ..., N_m of W_i · || q_ci - π( K (R P_i + t) ) ||^2

The above formula is solved iteratively by the Gauss-Newton method, where K is the camera intrinsic matrix and the function π(·) denotes the de-homogenization operation.
5) Construct the local map, and optimize the rotation pose matrix R, the translation pose vector t, and the local map by local bundle adjustment. The specific steps are as follows:
5.1) Using the rotation pose matrix R and the translation pose vector t optimized in step 4), initialize map points for the matched pairs without map points

{(q_c(m+1), q_r(m+1)), ..., (q_cn, q_rn)}

by triangulation, obtaining map points

{P_(m+1), P_(m+2), ..., P_n}

where q_c(m+1), ..., q_cn are current-frame feature points, q_r(m+1), ..., q_rn are reference-frame feature points, and P_(m+1), ..., P_n are the newly initialized map points;
5.2) Set the weights W_(m+l) of the newly initialized map points (the defining formulas appear only as images in the original publication), where l = 1, 2, ..., n-m, W_(m+l) denotes the weight of map point P_(m+l), b is the scaling factor of the image pyramid, and s_c(m+l), s_r(m+l) denote the pyramid levels of the feature points observing map point P_(m+l) in the current frame and the reference frame, respectively;
5.3) Optimize the rotation pose matrix R, the translation pose vector t, and all map points

{P_1, P_2, ..., P_n}

by minimizing the weighted reprojection error function:

(R, t, P_1, ..., P_n) = argmin of the sum over k = 1, ..., n of W_k · ( || q_ck - π( K (R P_k + t) ) ||^2 + || q_rk - π( K P_k ) ||^2 )

The above formula is solved iteratively by the Gauss-Newton method. In the formula, argmin(·) returns the parameters that minimize the function, K is the camera intrinsic matrix, the function π(·) denotes the de-homogenization operation, W_k is the weight of map point P_k, P_k is the kth map point with k = 1, 2, ..., n, q_ck, q_rk are the current-frame feature point and the corresponding reference-frame feature point, and n is the total number of map points.
In conclusion, the invention makes full use of the planar constraint relationships in indoor environments and improves the precision and robustness of visual odometry there. The reprojection error function is weighted according to the number of observations and the estimated covariance of the map points, which improves the reliability of the optimized camera pose and local map. The invention is applicable to common visible-light cameras and achieves good performance in indoor environments.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims (6)

1. A homography-based visual odometry method for indoor environments, characterized by comprising the following steps:
1) reading the data, including a reference frame, a current frame and a local map, then extracting ORB feature points from the current frame, filtering the extracted feature points with a quadtree method, and finally completing feature matching between the current frame and the reference frame;
2) finding combinations of coplanar map points using a weighted random sampling strategy;
3) computing the homography transformation matrix H_m between the poses of the current frame and the reference frame from the plane containing the most map points, then decomposing it to obtain a rotation pose matrix R and a translation pose vector t;
4) optimizing the rotation pose matrix R and the translation pose vector t by minimizing a weighted reprojection error function;
5) constructing a local map, and optimizing the rotation pose matrix R, the translation pose vector t and the local map by local bundle adjustment.
2. The method for homography visual odometry towards an indoor environment according to claim 1, wherein in step 1), the current frame refers to a current image of a pose to be estimated read from an image sequence, the reference frame refers to a previous frame image read from the image sequence, and the step 1) comprises the following steps:
1.1 Adopting a scaling coefficient b to continuously perform downsampling on an original image of a current frame to obtain a d-layer image pyramid, and then extracting a corresponding number of FAST angular points according to the resolution of each layer of image, wherein the total number of extracted feature points is N;
1.2 Filtering the feature points extracted in the step 1.1) by a quadtree method, and then calculating BRIEF descriptors of all the remaining feature points;
1.3) for each feature point of the current frame, searching the neighbourhood of the same position in the reference frame for the feature point whose descriptor has the minimum Hamming distance as the best match, and for the one with the second-smallest distance as the second-best match; if the Hamming distance of the best match is greater than T_b, or the difference between the best and second-best distances is less than T_d, the match is discarded, otherwise the best match is retained as a matching pair, where T_b and T_d are set thresholds; all matching pairs are denoted

q_cr = {(q_c1, q_r1), (q_c2, q_r2), ..., (q_ck, q_rk), ..., (q_cn, q_rn)}

where k = 1, 2, ..., n and q_ck, q_rk denote a current-frame feature point and its corresponding reference-frame feature point; the matching pairs of already initialized map points are

{(q_c1, q_r1), (q_c2, q_r2), ..., (q_ci, q_ri), ..., (q_cm, q_rm)}

where i = 1, 2, ..., m and q_ci, q_ri denote a current-frame feature point of an already initialized map point and its corresponding reference-frame feature point; the corresponding map points are

{P_1, P_2, ..., P_i, ..., P_m}

where P_i denotes the i-th map point; the corresponding numbers of observations of the map points are

{N_1, N_2, ..., N_i, ..., N_m}

where N_i denotes the number of times the corresponding map point has been observed; and the matching pairs of uninitialized map points are

{(q_c(m+1), q_r(m+1)), ..., (q_c(m+l), q_r(m+l)), ..., (q_cn, q_rn)}

where l = 1, 2, ..., n−m and q_c(m+l), q_r(m+l) denote a current-frame feature point of an uninitialized map point and its corresponding reference-frame feature point.
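The two rejection tests of step 1.3) (an absolute threshold T_b on the best Hamming distance, and a minimum gap T_d to the second-best distance) can be sketched as below. For brevity the sketch searches all reference descriptors rather than only a neighbourhood window, and the descriptors are plain integers standing in for 256-bit BRIEF strings:

```python
def hamming(a, b):
    """Hamming distance between two descriptors packed as integers."""
    return bin(a ^ b).count("1")

def match_features(desc_cur, desc_ref, t_b, t_d):
    """Best/second-best matching with the rejection tests of step 1.3).

    A current-frame descriptor keeps its best reference match only if the
    best distance is at most t_b and the second-best distance exceeds it
    by at least t_d; otherwise the point goes unmatched.
    """
    pairs = []
    for i, dc in enumerate(desc_cur):
        dists = sorted((hamming(dc, dr), j) for j, dr in enumerate(desc_ref))
        if len(dists) < 2:
            continue  # a second-best distance is needed for the gap test
        (best_d, best_j), (second_d, _) = dists[0], dists[1]
        if best_d <= t_b and second_d - best_d >= t_d:
            pairs.append((i, best_j))
    return pairs
```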
3. The homography visual odometry method for an indoor environment according to claim 2, wherein in step 2) coplanar map point combinations are searched for by a weighted random sampling strategy, comprising the following steps:
2.1) assigning the i-th map point a weight according to its number of observations:

[weight formulas rendered as images FDA0003821862980000025 and FDA0003821862980000026 in the original filing]

where W_i denotes the weight of map point P_i, b is the scaling coefficient of the image pyramid, s_u is the pyramid layer of the feature point of the u-th observation of map point P_i, N_i is the number of observations of the i-th map point, and the remaining symbol (image FDA0003821862980000027 in the original filing) is an intermediate variable;

then setting the map point probabilities

l_i = W_i / W, W = Σ_{i=1}^{m} W_i

where l_i is the selection probability of the i-th map point and W is an intermediate variable;
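Given the weights W_i, the sampling of steps 2.2)–2.3) draws map points with probability l_i = W_i / ΣW and redraws on duplicates. A minimal sketch (the collinearity test of step 2.2) is omitted; it would need at least k non-zero weights to terminate):

```python
import random

def sample_distinct(weights, k, seed=0):
    """Draw k distinct map-point indices, each draw made with probability
    proportional to its weight (l_i = W_i / sum(W)), redrawing duplicates
    as in steps 2.2)-2.3) of the claim.
    """
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < k:
        # random.choices samples with replacement; duplicates are rejected
        (i,) = rng.choices(range(len(weights)), weights=weights, k=1)
        if i not in chosen:
            chosen.append(i)
    return chosen
```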
2.2) randomly selecting 3 map points P_v1, P_v2, P_v3 according to the result of step 2.1), the probability l_i being the probability that map point P_i is selected; if the three points are collinear, discarding them and reselecting;
2.3) randomly selecting a map point P_v4, again with probability l_i, discarding and reselecting if it coincides with P_v1, P_v2 or P_v3; then computing the distance h_4 from P_v4 to the plane formed by P_v1, P_v2, P_v3; if h_4 is less than T_h, further computing the distance h_1 from P_v1 to the plane formed by P_v2, P_v3, P_v4, the distance h_2 from P_v2 to the plane formed by P_v1, P_v3, P_v4, and the distance h_3 from P_v3 to the plane formed by P_v1, P_v2, P_v4; if h_1, h_2, h_3 are all less than T_h, the map point combination passes the coplanarity pre-screening and is put into the set S_p, where T_h is a set threshold;
2.4) repeating steps 2.2) to 2.3) until the pre-screened coplanar map point combinations reach the specified maximum number, or all map point combinations have been traversed; if a map point combination recurs during this process, discarding it and reselecting;
2.5) further checking the coplanar map point combinations pre-screened in step 2.4) with space vector relations: for one coplanar combination P_v1, P_v2, P_v3, P_v4 in S_p, solving the linear equations P_v1 = x_1 P_v2 + y_1 P_v3 + z_1 P_v4, P_v2 = x_2 P_v1 + y_2 P_v3 + z_2 P_v4, P_v3 = x_3 P_v1 + y_3 P_v2 + z_3 P_v4, P_v4 = x_4 P_v1 + y_4 P_v2 + z_4 P_v3, then computing the errors e_1 = |1 − x_1 − y_1 − z_1|, e_2 = |1 − x_2 − y_2 − z_2|, e_3 = |1 − x_3 − y_3 − z_3|, e_4 = |1 − x_4 − y_4 − z_4|; if e_1, e_2, e_3, e_4 are all less than T_e, putting the combination into the final coplanar map point set S_a, where x_g, y_g, z_g (g = 1, 2, 3, 4) are the unknowns of the equation systems and T_e is a set threshold.
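The space-vector check of step 2.5) rests on the fact that a point lies in the plane of three others exactly when it is an affine combination of them, i.e. the coefficients of the linear solution sum to 1. A numpy sketch of one of the four error terms:

```python
import numpy as np

def coplanarity_error(p1, p2, p3, p4):
    """Solve p1 = x*p2 + y*p3 + z*p4 and return |1 - x - y - z| (step 2.5).

    If the four points are coplanar, p1 is an affine combination of the
    other three, so the coefficients sum to one and the error vanishes.
    """
    A = np.column_stack([p2, p3, p4])            # 3x3 system in (x, y, z)
    x, y, z = np.linalg.solve(A, np.asarray(p1, dtype=float))
    return abs(1.0 - x - y - z)
```

In the claim this error is computed for all four cyclic roles of the points and each must fall below T_e.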
4. The homography visual odometry method for an indoor environment according to claim 3, wherein in step 3) the homography transformation matrix corresponding to the plane containing the most map points is calculated by the following steps:
3.1) initializing the inlier-count threshold T_a = 0, the homography transformation matrix H_m, and the inlier matching pair set q'_cr;
3.2) reading one coplanar map point combination P_v1, P_v2, P_v3, P_v4 from S_a together with its corresponding matching pairs (q_c1, q_r1), (q_c2, q_r2), (q_c3, q_r3), (q_c4, q_r4), computing the homography transformation matrix H by the four-point method, and checking the homography constraint deviation of all other matching pairs

e_hk = || q̂_ck^∧ H q̂_rk ||

if e_hk < T_h, marking the matching pair (q_ck, q_rk) as an inlier, otherwise as an outlier, where q̂_ck, q̂_rk are the homogeneous coordinates of the k-th feature points of the current frame and the reference frame, T_h is a set threshold, and the symbol ∧ denotes the antisymmetric (skew-symmetric) transformation; if the number of inliers T_c > T_a, setting T_a = T_c and H_m = H, then emptying the set q'_cr and putting the matching pairs currently marked as inliers into q'_cr;
3.3) repeating step 3.2) until all coplanar map point combinations in S_a have been read, obtaining the homography transformation matrix H_m and the inlier matching pair set q'_cr; then decomposing H_m with the decomposeHomographyMat function of the OpenCV library to obtain the rotation pose matrix R and the translation pose vector t.
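A minimal numpy sketch of step 3.2): the "four-point method" realized as a direct linear transform, and the homography constraint deviation e_hk realized as the cross product of the observed and the transferred homogeneous points (for decomposing H_m into R and t, OpenCV's cv2.decomposeHomographyMat can be used as the claim states; it is not reproduced here):

```python
import numpy as np

def homography_4pt(src, dst):
    """Direct linear transform from exactly four correspondences
    (the 'four-point method' of step 3.2); src, dst are lists of (x, y)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)          # null vector of the 8x9 system
    return h / h[2, 2]

def homography_deviation(h, q_ref, q_cur):
    """e_hk = ||q_cur^ x (H q_ref)||: cross product of the observed and the
    transferred homogeneous points, zero when the match fits H exactly."""
    qr = np.array([q_ref[0], q_ref[1], 1.0])
    qc = np.array([q_cur[0], q_cur[1], 1.0])
    return float(np.linalg.norm(np.cross(qc, h @ qr)))
```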
5. The homography visual odometry method for an indoor environment according to claim 4, wherein in step 4) the rotation pose matrix R and the translation pose vector t are optimized by minimizing the weighted reprojection error function

{R, t} = argmin_{R,t} Σ_{i=1}^{N_m} W_i || q_ci − π(K(R P_i + t)) ||²

which is solved iteratively by the Gauss-Newton method; in the formula, argmin(·) denotes the argument minimizing the function, K is the camera intrinsic matrix, π(·) denotes the homogenization (projection) operation, W_i is the weight of map point P_i, P_i denotes the i-th map point, q_ci denotes the current-frame feature point of an initialized map point, and N_m denotes the total number of map points.
6. The homography visual odometry method for an indoor environment according to claim 5, wherein in step 5) the local map construction and local bundle adjustment optimization comprise the following steps:
5.1) using the rotation pose matrix R and the translation pose vector t optimized in step 4), initializing map points from the matching pairs of uninitialized map points

{(q_c(m+1), q_r(m+1)), (q_c(m+2), q_r(m+2)), ..., (q_cn, q_rn)}

by triangulation, obtaining the map points

{P_(m+1), P_(m+2), ..., P_n}

where q_c(m+1), q_c(m+2), ..., q_cn denote current-frame feature points, q_r(m+1), q_r(m+2), ..., q_rn denote reference-frame feature points, and P_(m+1), P_(m+2), ..., P_n denote the newly initialized map points;
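Step 5.1)'s triangulation, with the reference frame taken as the world frame, can be sketched with the standard linear (DLT) method. The claim does not fix the triangulation variant, so the DLT form below is an assumption:

```python
import numpy as np

def project(K, R, t, P):
    """Pinhole projection pi(K(R P + t)) onto pixel coordinates."""
    p = R @ P + t
    return np.array([K[0, 0] * p[0] / p[2] + K[0, 2],
                     K[1, 1] * p[1] / p[2] + K[1, 2]])

def triangulate(K, R, t, q_ref, q_cur):
    """Linear (DLT) triangulation for step 5.1): the reference frame is the
    world frame (camera K[I|0]), the current frame is the camera K[R|t]."""
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P1 = K @ np.hstack([R, np.asarray(t, dtype=float).reshape(3, 1)])
    # each view contributes two rows of the homogeneous system A X = 0
    A = np.array([q_ref[0] * P0[2] - P0[0],
                  q_ref[1] * P0[2] - P0[1],
                  q_cur[0] * P1[2] - P1[0],
                  q_cur[1] * P1[2] - P1[1]])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                        # null vector = homogeneous 3D point
    return X[:3] / X[3]
```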
5.2) setting the weight of each newly initialized map point:

[weight formulas rendered as images FDA0003821862980000053 and FDA0003821862980000054 in the original filing]

where m+l (l = 1, 2, ..., n−m) is the index of a newly initialized map point, W_(m+l) denotes the weight of map point P_(m+l), b is the scaling coefficient of the image pyramid, s_c(m+l) and s_r(m+l) denote the pyramid layers of the feature points observing map point P_(m+l) in the current frame and the reference frame respectively, and the remaining symbol (image FDA0003821862980000055 in the original filing) is an intermediate variable;
5.3) optimizing the rotation pose matrix R, the translation pose vector t and all map points P_1, P_2, ..., P_n by minimizing the weighted reprojection error function

{R, t, P_1, ..., P_n} = argmin Σ_{k=1}^{n} W_k ( ||q_ck − π(K(R P_k + t))||² + ||q_rk − π(K P_k)||² )

which is solved iteratively by the Gauss-Newton method, the reference frame being taken as the world frame; here argmin(·) denotes the argument minimizing the function, K is the camera intrinsic matrix, π(·) denotes the homogenization (projection) operation, W_k is the weight of map point P_k, P_k denotes the k-th map point (k = 1, 2, ..., n), q_ck and q_rk denote a current-frame feature point and its corresponding reference-frame feature point, and n denotes the total number of map points.
CN202211044239.XA 2022-08-30 2022-08-30 Homography visual odometer method facing indoor environment Active CN115451996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211044239.XA CN115451996B (en) 2022-08-30 2022-08-30 Homography visual odometer method facing indoor environment

Publications (2)

Publication Number Publication Date
CN115451996A true CN115451996A (en) 2022-12-09
CN115451996B CN115451996B (en) 2024-03-29

Family

ID=84301286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211044239.XA Active CN115451996B (en) 2022-08-30 2022-08-30 Homography visual odometer method facing indoor environment

Country Status (1)

Country Link
CN (1) CN115451996B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108519102A (en) * 2018-03-26 2018-09-11 东南大学 A kind of binocular vision speedometer calculation method based on reprojection
CN110108258A (en) * 2019-04-09 2019-08-09 南京航空航天大学 A kind of monocular vision odometer localization method
CN113108771A (en) * 2021-03-05 2021-07-13 华南理工大学 Movement pose estimation method based on closed-loop direct sparse visual odometer
CN113345018A (en) * 2021-05-31 2021-09-03 湖南大学 Laser monocular vision fusion positioning mapping method in dynamic scene
WO2022002039A1 (en) * 2020-06-30 2022-01-06 杭州海康机器人技术有限公司 Visual positioning method and device based on visual map

Also Published As

Publication number Publication date
CN115451996B (en) 2024-03-29

Similar Documents

Publication Publication Date Title
CN110569704B (en) Multi-strategy self-adaptive lane line detection method based on stereoscopic vision
Ishikawa et al. Lidar and camera calibration using motions estimated by sensor fusion odometry
CN110223348B (en) Robot scene self-adaptive pose estimation method based on RGB-D camera
US9989969B2 (en) Visual localization within LIDAR maps
CN108537848B (en) Two-stage pose optimization estimation method for indoor scene reconstruction
CN110807809B (en) Light-weight monocular vision positioning method based on point-line characteristics and depth filter
CN103106688B (en) Based on the indoor method for reconstructing three-dimensional scene of double-deck method for registering
CN108921895B (en) Sensor relative pose estimation method
CN107590827A (en) A kind of indoor mobile robot vision SLAM methods based on Kinect
CN104835158B (en) Based on the three-dimensional point cloud acquisition methods of Gray code structured light and epipolar-line constraint
CN111080709B (en) Multispectral stereo camera self-calibration algorithm based on track feature registration
WO2021098083A1 (en) Multispectral camera dynamic stereo calibration algorithm based on salient feature
CN107917710B (en) Indoor real-time positioning and three-dimensional map construction method based on single line laser
CN113658337B (en) Multi-mode odometer method based on rut lines
CN112929626B (en) Three-dimensional information extraction method based on smartphone image
CN115421158B (en) Self-supervision learning solid-state laser radar three-dimensional semantic mapping method and device
CN111536970B (en) Infrared inertial integrated navigation method for low-visibility large-scale scene
Nagy et al. Online targetless end-to-end camera-LiDAR self-calibration
CN112183247A (en) Laser point cloud data classification method based on multispectral image
Stucker et al. ResDepth: Learned residual stereo reconstruction
CN115471748A (en) Monocular vision SLAM method oriented to dynamic environment
CN113887624A (en) Improved feature stereo matching method based on binocular vision
CN111415305A (en) Method for recovering three-dimensional scene, computer-readable storage medium and unmanned aerial vehicle
CN110807799B (en) Line feature visual odometer method combined with depth map inference
CN112017259B (en) Indoor positioning and image building method based on depth camera and thermal imager

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant