CN114821113A - Monocular vision inertia SLAM method and system based on adaptive robust kernel - Google Patents


Info

Publication number
CN114821113A
Authority
CN
China
Prior art keywords
kernel, matrix, adaptive robust, monocular
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111376206.0A
Other languages
Chinese (zh)
Inventor
伍雪冬
明德琦
Current Assignee
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology
Priority to CN202111376206.0A
Publication of CN114821113A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00: Navigation; Navigational instruments not provided for in groups G01C1/00-G01C19/00
    • G01C21/10: Navigation by using measurements of speed or acceleration
    • G01C21/12: executed aboard the object being navigated; Dead reckoning
    • G01C21/16: by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165: combined with non-inertial navigation instruments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformation in the plane of the image
    • G06T3/40: Scaling the whole image or part thereof
    • G06T3/4007: Interpolation-based scaling, e.g. bilinear interpolation

Abstract

The invention discloses a monocular visual-inertial SLAM method and system based on an adaptive robust kernel, belonging to the field of visual robotics. The invention improves the monocular initialization, back-end optimization, and loop detection parts. During monocular initialization, initial feature matching is performed between the current frame and a reference frame, the homography-matrix and fundamental-matrix motion models are computed in parallel, the homography matrix or the fundamental matrix is selected through the geometric robust information content, and model decomposition and bundle adjustment (BA) optimization are carried out. In the local mapping process, a cost function minimizing the reprojection error is constructed, and the adaptive robust kernel is applied to BA optimization. In the loop detection thread, the adaptive robust kernel is used to reduce the accumulated error during loop correction. Compared with the prior art, the method improves the monocular initialization process, reduces the influence of outliers by adopting the adaptive robust kernel, enhances the optimization effect, and improves the robustness of the system.

Description

Monocular vision inertia SLAM method and system based on adaptive robust kernel
Technical Field
The invention belongs to the field of visual robotics and relates to a monocular visual-inertial SLAM method and system based on an adaptive robust kernel, and in particular to the optimization of the monocular initialization, back-end optimization, and loop detection parts of monocular visual-inertial SLAM.
Background
With the rapid development of automation and integration technology, communication and electronic technology, and high-precision sensor technology, robotics has become an irreplaceable part of social production and daily life, and will influence the degree of global automation ever more widely in the future. The robot industry is recognized as one of the most important and promising high-tech industries in the world, yet many open problems remain in robotics and artificial intelligence. Many researchers have therefore focused on enabling robots to perceive and explore unknown environments and to accurately localize themselves.
Meanwhile, Simultaneous Localization and Mapping (SLAM) is a major research hotspot in the field of machine perception and navigation and has attracted deep attention from researchers. At present, SLAM technology is applied in robot navigation, autonomous driving, virtual reality, augmented reality, three-dimensional reconstruction, and other fields. SLAM is one of the core technologies of intelligent mobile robots: an environment map can be generated in GPS-denied scenes such as indoor, underwater, and tunnel environments, and the robot can use the map for autonomous localization and navigation. In the field of autonomous driving, SLAM can localize lane lines with centimeter-level accuracy, thereby assisting GPS positioning. In augmented reality, SLAM keeps virtual objects geometrically consistent with the real scene so that the superimposed virtual objects appear realistic, and its mapping function can be used to obtain three-dimensional models of objects and scenes.
In the early stage of SLAM development, research mainly centered on lidar SLAM. However, high-precision lidar is expensive, while low-precision lidar has large errors that greatly affect the accuracy of the SLAM system, so it was not widely adopted. Compared with a lidar sensor, a camera has the advantages of low cost, small size, and low weight, and images contain abundant information. With the improvement of hardware performance and the development of computer vision, visual SLAM (VSLAM) based on vision sensors emerged. Because vision sensors are inexpensive and capture rich texture information, they have attracted wide attention from researchers. Although VSLAM achieves good results in general scenes, it performs poorly under fast motion, geometric image changes caused by viewpoint changes, severe illumination changes, or even the absence of light. To improve the robustness of SLAM systems, researchers began to move toward multi-sensor fusion SLAM, the most representative of which is visual-inertial SLAM. Visual-inertial SLAM introduces an inertial measurement unit on the basis of visual SLAM, which can overcome the shortcomings of a purely visual navigation system, improve the speed of feature point matching, and enhance the robustness of the algorithm.
Nowadays, although visual SLAM research has succeeded in many directions, many problems still require optimization; for example, visual SLAM struggles to operate stably over long periods in dynamic scenes, texture-less regions, and complex environments. Existing monocular vision research mainly has the following shortcomings. (1) In monocular vision, the translation vector must be normalized, which causes scale uncertainty; monocular initialization cannot obtain depth information from a single frame, and when only pure rotation occurs during initialization, the system cannot initialize at all. (2) In the local mapping thread, although the overall error caused by mismatches and similar factors is reduced, some larger errors still exert an unpredictable influence on the result, and the accuracy cannot be improved in a breakthrough manner. (3) At the visual SLAM front end, both pose estimation and mapping rely on the relationship between adjacent frames; algorithms depending on such local constraints inevitably accumulate optimization errors frame by frame, finally producing a large error drift. Research on highly robust and accurate visual SLAM systems is not only valuable for scientific research in visual robotics, computer vision, and related fields, but also has a wide range of theoretical and practical applications, so research on visual SLAM technology is of great significance.
Disclosure of Invention
Purpose of the invention: aiming at the defects of the prior art, the invention provides a monocular visual-inertial SLAM method and system based on an adaptive robust kernel, which improve the monocular initialization process, reduce the influence of outliers by adopting the adaptive robust kernel, enhance the optimization effect, and improve the robustness of the system.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme.
A monocular visual-inertial SLAM method based on an adaptive robust kernel mainly optimizes monocular initialization, local mapping, and loop detection, and specifically comprises:
performing initial feature matching between a current frame and a reference frame during monocular initialization, computing the homography-matrix and fundamental-matrix motion models in parallel, selecting the homography matrix or the fundamental matrix according to the geometric robust information content, and performing model decomposition and bundle adjustment (BA) optimization;
constructing a cost function minimizing the reprojection error in the local mapping process, and applying the adaptive robust kernel to bundle adjustment (BA) optimization;
and using the adaptive robust kernel in the loop detection thread to reduce the accumulated error during loop correction.
Preferably, the model selection through the geometric robust information content is expressed as:

G = \sum_{k=1}^{n} L\left(\frac{e_k^2}{\theta^2}\right) + \beta n \ln \lambda + m \ln(\lambda n)

L(x) = \min\left(x, \; 2(\lambda - \beta)\right)

wherein G is the geometric robust information content, k is the key point index, L is the robust function used to limit the size of the reprojection error, e_k is the reprojection error of key point k, β is the dimension of the model (β = 2 for the homography matrix, β = 3 for the fundamental matrix), n is the number of map points tracked between the two key frames, m is the number of degrees of freedom of the model (m = 7 for the fundamental matrix, m = 8 for the homography matrix), θ² is the covariance of the key point localization error, and λ is the dimensionality of the data (λ = 4 for the two-dimensional correspondences between the two frames).
Preferably, the kernel function ρ of the adaptive robust kernel is expressed as:

\rho(\mu, \omega, c) = \frac{|\omega - 2|}{\omega}\left[\left(\frac{(\mu/c)^2}{|\omega - 2|} + 1\right)^{\omega/2} - 1\right]

where μ is the residual, ω and c are parameters, and the constant c > 0 is used to constrain the size of the quadratic loss region around μ = 0.
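The behavior of this kernel can be sketched numerically. Below is a minimal Python sketch (the function name is illustrative, not from the patent) under the assumption that the kernel takes the general adaptive form above, whose ω = 2, ω = 0, and ω → −∞ limits give the quadratic, Cauchy, and Welsch losses respectively:

```python
import numpy as np

def adaptive_kernel(mu, omega, c):
    """General adaptive robust kernel rho(mu; omega, c).

    omega controls the kernel shape: omega = 2 gives the quadratic (L2)
    loss, omega = 0 the Cauchy loss, and omega -> -inf the Welsch loss.
    c > 0 sets the size of the quadratic loss region around mu = 0.
    """
    x2 = (mu / c) ** 2
    if omega == 2.0:
        return 0.5 * x2
    if omega == 0.0:
        return np.log(0.5 * x2 + 1.0)
    if np.isneginf(omega):
        return 1.0 - np.exp(-0.5 * x2)
    a = abs(omega - 2.0)
    return (a / omega) * ((x2 / a + 1.0) ** (omega / 2.0) - 1.0)
```

For a fixed residual, decreasing ω decreases the penalty, which is the mechanism by which outliers are down-weighted.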
Preferably, in the local mapping thread, the cost function minimizing the reprojection error is expressed as:

\{R, t\} = \arg\min_{R, t} \sum_{\alpha \in \chi} \rho\left(\left\| x_\alpha - \hat{x}_\alpha \right\|^2\right), \qquad \hat{x}_\alpha = \pi(R X_\alpha + t)

wherein R is the rotation matrix, t is the translation vector, χ is the set of all matching feature points, ρ is the robust kernel function, \hat{x}_\alpha is the projection coordinate, X_\alpha is the coordinate of the point in the world coordinate system, and \|\cdot\|^2 denotes the squared two-norm of the reprojection error.
Preferably, in the loop detection thread, the adaptive robust kernel function is applied to the Sim(3) solution, expressed as:

S_{12}^* = \arg\min_{S_{12}} \sum_{i=1}^{N} \left[ \rho\left(\left\| p_{1,i} - \pi_1(S_{12} P_{2,j}) \right\|^2\right) + \rho\left(\left\| p_{2,j} - \pi_2\left(S_{12}^{-1} P_{1,i}\right) \right\|^2\right) \right]

wherein S_{12} is the similarity transformation matrix, p_{1,i} is key point i in key frame 1, p_{2,j} is the key point j in key frame 2 matched with i, P_{1,i} is the 3D coordinate of key point i in the world coordinate system, P_{2,j} is the 3D coordinate of key point j in the world coordinate system, N is the number of matched key points, and π_i (i = 1, 2) denotes the projection function of key frame i.
Based on the same inventive concept, the invention provides a monocular visual-inertial SLAM system based on an adaptive robust kernel, comprising:
a monocular initialization module for performing initial feature matching between the current frame and a reference frame, computing the homography-matrix and fundamental-matrix motion models in parallel, selecting the homography matrix H or the fundamental matrix F according to the geometric robust information content, and performing model decomposition and bundle adjustment (BA) optimization;
a local mapping module for constructing a cost function minimizing the reprojection error and applying the adaptive robust kernel to BA optimization;
and a loop detection module for using the adaptive robust kernel to reduce the accumulated error during loop correction.
Based on the same inventive concept, the present invention also provides a computer system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the monocular visual inertia SLAM method based on the adaptive robust kernel.
Beneficial effects: the invention provides a monocular visual-inertial SLAM method based on an adaptive robust kernel, with improvements to the monocular initialization, back-end optimization, and loop detection parts, and has the following main advantages:
(1) For the scale uncertainty of monocular vision and the pure-rotation problem of a monocular camera, the motion estimation method used during initialization of the monocular system is improved. The problem of initialization failure with little or no translation can be solved, an initial point cloud map is constructed, and an initial value is provided for subsequent computation.
(2) In SLAM, the minimized reprojection error serves as the cost function for many state estimations, and an outlier with a large deviation can heavily influence the estimation result. The adaptive robust kernel function is adopted to directly optimize the model parameters; by down-weighting outliers, the influence of erroneous data on the overall estimate is reduced and the system accuracy can be improved.
(3) The adaptive robust kernel is used in the loop detection thread to reduce the accumulated error generated during SLAM tracking.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
As shown in fig. 1, in the monocular visual-inertial SLAM method based on an adaptive robust kernel according to an embodiment of the present invention, feature points are extracted from the image frames for feature matching to find an initial data association; the images are then undistorted (only radial distortion is considered); the homography matrix model and the fundamental matrix model are computed in parallel, and a motion model is selected according to the geometric robust information content; outlier rejection and decomposition are then performed on the selected motion model to obtain an initial motion estimate; finally, global BA optimizes the initial reconstruction. The embodiment of the invention mainly involves: (1) analyzing the monocular initialization process and designing a new model selection mechanism; (2) constructing a cost function minimizing the reprojection error and applying the adaptive robust kernel to BA optimization; (3) using the adaptive robust kernel to reduce the accumulated error during loop correction. The method comprises the following steps:
1. The monocular initialization process of the adaptive-robust-kernel-based monocular visual-inertial SLAM comprises the following steps:
(1) Perform initial feature matching, i.e., match features between the current frame F_c and the reference frame F_r:

x_c \leftrightarrow x_r

ORB (Oriented FAST and Rotated BRIEF) features are extracted from the current frame F_c (only at the finest scale), and the matching feature points x_r are searched for in the reference frame F_r. If not enough matches are found, the reference frame is reset. After the ORB features of the first two image frames are extracted, feature point matching is carried out; if fewer than 100 matches are found, matching fails. The matched image frames are then undistorted.
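The matching step can be illustrated with a brute-force Hamming matcher over binary descriptors such as ORB's. This is a minimal numpy sketch (function name and threshold are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

def hamming_match(desc_c, desc_r, max_dist=64):
    """Brute-force matching of binary descriptors by Hamming distance.

    desc_c, desc_r: (N, 32) uint8 arrays (256-bit descriptors, as ORB uses).
    Returns a list of (i_c, i_r) index pairs passing the distance threshold.
    """
    # Pairwise Hamming distances via XOR followed by a bit population count.
    xor = desc_c[:, None, :] ^ desc_r[None, :, :]   # (Nc, Nr, 32)
    dist = np.unpackbits(xor, axis=2).sum(axis=2)   # (Nc, Nr) bit counts
    matches = []
    for i in range(dist.shape[0]):
        j = int(np.argmin(dist[i]))                 # nearest neighbor in F_r
        if dist[i, j] <= max_dist:
            matches.append((i, j))
    return matches
```

A practical system would additionally apply a ratio test and cross-check, and reset the reference frame when too few pairs survive.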
(2) Compute the homography matrix model and the fundamental matrix model in parallel, and select the homography matrix H or the fundamental matrix F according to the geometric robust information content.

The homography matrix H_cr and the fundamental matrix F_cr are computed in parallel threads:

x_c = H_{cr} x_r   (1)

x_c^T F_{cr} x_r = 0   (2)
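For illustration, the homography of formula (1) can be estimated from matched points with a direct linear transform (DLT). The sketch below is a minimal, unnormalized DLT assuming noise-free matches; a practical system would add Hartley normalization and RANSAC:

```python
import numpy as np

def homography_dlt(x_r, x_c):
    """Estimate H_cr with x_c ~ H_cr x_r from >= 4 point pairs (DLT).

    x_r, x_c: (N, 2) arrays of matched pixel coordinates.
    """
    rows = []
    for (xr, yr), (xc, yc) in zip(x_r, x_c):
        # Each correspondence contributes two rows of the constraint A h = 0.
        rows.append([-xr, -yr, -1, 0, 0, 0, xc * xr, xc * yr, xc])
        rows.append([0, 0, 0, -xr, -yr, -1, yc * xr, yc * yr, yc])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)       # null-space vector of A
    return H / H[2, 2]             # fix the projective scale
```

The fundamental matrix of formula (2) is estimated analogously with the (normalized) eight-point algorithm.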
The model selection method adopting the geometric robust information content is shown in formulas (3) and (4):

G = \sum_{k=1}^{n} L\left(\frac{e_k^2}{\theta^2}\right) + \beta n \ln \lambda + m \ln(\lambda n)   (3)

L(x) = \min\left(x, \; 2(\lambda - \beta)\right)   (4)

wherein G is the geometric robust information content, k is the key point index, L is the robust function used to limit the size of the reprojection error, e_k is the reprojection error of key point k, β is the dimension of the model (β = 2 for the homography matrix, β = 3 for the fundamental matrix), n is the number of map points tracked between the two key frames, m is the number of degrees of freedom of the model (m = 7 for the fundamental matrix, m = 8 for the homography matrix), θ² is the covariance of the key point localization error, and λ is the dimensionality of the data (λ = 4 for the two-dimensional correspondences between the two frames). The geometric robust information content of the homography matrix and of the fundamental matrix are computed separately, so that when there is little or no translation between frames the homography is chosen as the better representation of the motion, which enhances the monocular initialization process.
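Under this reading of formula (3), the selection step can be sketched as follows. This is a hedged reconstruction: the constants follow Torr's GRIC criterion and may differ from the authors' exact implementation; the model with the lower score is preferred:

```python
import numpy as np

def gric(e2, theta2, n, beta, m, lam=4.0):
    """GRIC-style model score; lower is better.

    e2: squared reprojection errors, theta2: keypoint error covariance,
    n: number of tracked points, beta: model manifold dimension
    (2 for H, 3 for F), m: model degrees of freedom (8 for H, 7 for F),
    lam: data dimension (4 for 2D-2D correspondences).
    """
    r = np.minimum(e2 / theta2, 2.0 * (lam - beta))   # robust, capped residuals
    return float(r.sum() + beta * n * np.log(lam) + m * np.log(lam * n))

def select_model(e2_h, e2_f, theta2, n):
    """Choose between homography (H) and fundamental matrix (F)."""
    g_h = gric(e2_h, theta2, n, beta=2.0, m=8.0)
    g_f = gric(e2_f, theta2, n, beta=3.0, m=7.0)
    return ("H" if g_h <= g_f else "F"), g_h, g_f
```

With little or no translation the homography fits with smaller residuals and a smaller penalty, so "H" is selected, as the text describes.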
(3) Decompose the motion model to obtain the initial motion estimate.

For the homography matrix H, a numerical method is adopted for decomposition to obtain R and t.

For the fundamental matrix F, it is converted into the essential matrix E using the calibration matrix K:

E = K^T F K   (5)

The rotation matrix and translation vector of the camera motion are recovered from the essential matrix through singular value decomposition:

E = U \Sigma V^T   (6)

where U and V are orthogonal matrices and Σ is the singular value matrix. Four motion hypotheses for R and t are obtained, and the optimal one is selected through triangulation. If no sufficient solution exists, initialization fails and the reference frame is reset.
(4) Bundle adjustment.

Global BA is performed to optimize the initial reconstruction.
2. The monocular vision inertia SLAM local mapping process based on the adaptive robust kernel comprises the following main processes:
(1) Process the key frames.
In this process, key frames are selected from the image frame sequence, the BoW vectors of the key frame feature points are computed, connections between key frames are established, and the key frames are inserted into the map.
(2) Remove redundant map points.
All map points of the current key frame are traversed, and map points that do not satisfy the conditions are eliminated.
(3) Add new map points.
During the camera movement, some map points can be recovered from adjacent key frames through triangulation.
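The triangulation used to recover map points from two key frames can be sketched with a linear DLT (an illustrative minimal version, not the patent's implementation):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2: (3, 4) projection matrices; x1, x2: (2,) pixel coordinates.
    Returns the 3D point in the world frame.
    """
    # Each view contributes two rows of the homogeneous system A X = 0.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null-space vector, homogeneous coordinates
    return X[:3] / X[3]
```
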
(4) If the current key frame is the last frame in the key frame queue, then local BA is performed.
Further, the cost function minimizing the reprojection error is derived.

The three-dimensional positions of the map points X_{w,j} \in \mathbb{R}^3 and the key frame poses T_{iw} \in SE(3) are optimized with respect to the reprojection error of the matched key points x_{i,j} \in \mathbb{R}^2, where SE(3) is the special Euclidean group and w denotes the world coordinate system. The error term for observing map point j in key frame i is:

e_{i,j} = x_{i,j} - \pi_i(T_{iw}, X_{w,j})   (7)

where \pi_i denotes the projection function:

\pi_i\left([x \; y \; z]^T\right) = \left[ f_{i,u}\frac{x}{z} + c_{i,u}, \;\; f_{i,v}\frac{y}{z} + c_{i,v} \right]^T   (8)

[x_{i,j} \; y_{i,j} \; z_{i,j}]^T = R_{iw} X_{w,j} + t_{iw}   (9)

where R_{iw} \in SO(3) and t_{iw} \in \mathbb{R}^3 are the rotation and translation parts of T_{iw}, SO(3) is the special orthogonal group, (f_{i,u}, f_{i,v}) and (c_{i,u}, c_{i,v}) are the focal length and principal point of the camera of frame i, and the coordinates u and v refer to the undistorted image.
The cost function minimizing the reprojection error is:

C = \sum_{i,j} \rho\left(\left\| e_{i,j} \right\|^2\right)   (10)

where ρ is the Huber kernel function, \hat{x}_{i,j} = \pi_i(T_{iw}, X_{w,j}) is the projection coordinate of the map point X_{w,j} in the world coordinate system, and the sum runs over the set of all matched feature points. In global BA, all points and key frames are optimized, except that the first key frame remains fixed as the origin. In local BA, all points contained in the local area are optimized while a subset of the key frames is fixed. In pose optimization, or motion-only BA, all points are fixed and only the camera pose is optimized.
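A minimal numpy sketch of this robustified reprojection cost, using the Huber kernel named in the text (the pose convention and function names are assumptions for illustration):

```python
import numpy as np

def huber(s, delta=1.0):
    """Huber kernel applied to a squared error s = ||e||^2."""
    return np.where(s <= delta ** 2, s, 2.0 * delta * np.sqrt(s) - delta ** 2)

def reprojection_cost(T, K, X_w, x_obs, delta=1.0):
    """Sum of Huber-robustified squared reprojection errors.

    T: (R, t) camera pose (world -> camera), K: (3, 3) intrinsics,
    X_w: (N, 3) map points, x_obs: (N, 2) observed keypoints.
    """
    R, t = T
    Xc = X_w @ R.T + t                    # camera-frame points, as in eq. (9)
    uv = Xc @ K.T
    uv = uv[:, :2] / uv[:, 2:3]           # pinhole projection, as in eq. (8)
    e2 = ((x_obs - uv) ** 2).sum(axis=1)  # squared reprojection errors
    return float(huber(e2, delta).sum())
```

In the patent the Huber kernel is later replaced by the adaptive robust kernel; only the choice of ρ changes.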
The adaptive robust kernel is applied to BA optimization.

In BA optimization, the sum of squared two-norms of the error terms is taken as the objective function, so a single large error term can distort the otherwise correct values. The robust kernel function replaces the original two-norm measure of the error with a function that grows more slowly while remaining smooth, making the overall optimization result more stable. The adaptive robust kernel is:

\rho(\mu, \omega, c) = \frac{|\omega - 2|}{\omega}\left[\left(\frac{(\mu/c)^2}{|\omega - 2|} + 1\right)^{\omega/2} - 1\right]   (11)

By analyzing the value of ω, the final robust kernel function is:

\rho(\mu, \omega, c) =
\begin{cases}
\frac{1}{2}(\mu/c)^2, & \omega = 2 \\
\ln\left(\frac{1}{2}(\mu/c)^2 + 1\right), & \omega = 0 \\
1 - \exp\left(-\frac{1}{2}(\mu/c)^2\right), & \omega = -\infty \\
\frac{|\omega - 2|}{\omega}\left[\left(\frac{(\mu/c)^2}{|\omega - 2|} + 1\right)^{\omega/2} - 1\right], & \text{otherwise}
\end{cases}   (12)

where μ is the residual, ω is used to control the shape of the robust kernel, and c > 0 is a scale constant used to constrain the size of the quadratic loss region around μ = 0. Adjusting ω interpolates between a family of robust kernels. To determine the optimal kernel shape automatically through the parameter ω, ω is treated as an additional unknown parameter while minimizing the error:

\{R, t, \omega\} = \arg\min_{R, t, \omega} \sum_{\alpha \in \chi} \rho\left(\left\| x_\alpha - \hat{x}_\alpha \right\|^2, \, \omega, \, c\right)

By choosing ω, all residuals are weighted toward smaller values and the reprojection error is minimized without adversely affecting the model parameters R and t.
(5) Redundant key frames are removed to reduce the computational load of the system, and the selected key frames are added to the loop detection thread for loop correction.
3. In the loop detection thread of the adaptive-robust-kernel-based monocular visual-inertial SLAM, the adaptive robust kernel is applied to the similarity transformation Sim(3) calculation of loop correction:
(1) Detect whether the key frame sequence produces a loop.
(2) Compute a similarity transformation matrix for the detected loop candidate key frames.
given keyframe 1 and keyframe 2, and a set of N matching keypoints in keyframes 1,2 and the 3D patch points associated with the keypoints, for each pair of matching points the following transformation relationship is found:
p 2,j =sRp 1,j +t#(13)
where s is the scale factor, R is the rotation matrix, and t is the translation vector. In fact, the data inevitably has noise and errors, defining an error e 1 ,e 2 The appropriate scale factor combination is found, and rotation and translation minimize its error for all data. Optimizing relative Sim (3) transformsMatrix array
Figure BDA0003364025790000081
To minimize the reprojection error in both images:
e 1 =p 1,i1 (S 12 ,P 2,j )#(14)
Figure BDA0003364025790000082
an adaptive robust kernel function is applied to Sim (3) and the cost function to minimize:
Figure BDA0003364025790000083
the scale factor, the rotation matrix, the translation vector and the parameter omega which satisfy the above relations are obtained through the above formula, so that the whole closed loop is optimized, and the error of the closed loop is minimized.
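As a non-robust baseline for the s, R, t of formula (13), the closed-form Umeyama alignment recovers the similarity from matched 3D points; the robust Sim(3) optimization described above can be viewed as a kernel-weighted refinement of such an estimate. A minimal numpy sketch (illustrative, not the patent's solver):

```python
import numpy as np

def umeyama_sim3(P1, P2):
    """Closed-form similarity with P2 = s * R @ p1 + t (Umeyama alignment).

    P1, P2: (N, 3) arrays of matched 3D points. Returns (s, R, t).
    """
    mu1, mu2 = P1.mean(axis=0), P2.mean(axis=0)
    Q1, Q2 = P1 - mu1, P2 - mu2
    # SVD of the cross-covariance between the centered point sets.
    U, D, Vt = np.linalg.svd(Q2.T @ Q1 / len(P1))
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                      # reflection guard
    R = U @ S @ Vt
    sigma1 = (Q1 ** 2).sum() / len(P1)      # variance of the first set
    s = (D * S.diagonal()).sum() / sigma1
    t = mu2 - s * R @ mu1
    return s, R, t
```
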
(3) Loop correction. A connected group of the current key frame is constructed, and the poses of the key frames in the group are corrected according to the computed similarity transformation matrix; all map points of the adjacent key frames are recomputed according to the updated camera poses (similarity transformation), and then map point fusion and global BA optimization are performed.
The standard deviation (Std, in meters) of the invention compared with ORB-SLAM3 on the KITTI dataset is shown in the following table; the comparison of experimental results shows that the positioning accuracy of the invention is better than that of ORB-SLAM3.
[Table: Std (m) on the KITTI sequences for the present invention and ORB-SLAM3; rendered as images in the original publication.]
Based on the same inventive concept, an embodiment of the present invention provides a monocular visual-inertial SLAM system based on an adaptive robust kernel, comprising: a monocular initialization module for performing initial feature matching between the current frame and a reference frame, computing the homography-matrix and fundamental-matrix motion models in parallel, selecting the homography matrix H or the fundamental matrix F according to the geometric robust information content, and performing model decomposition and bundle adjustment (BA) optimization; a local mapping module for constructing a cost function minimizing the reprojection error and applying the adaptive robust kernel to BA optimization; and a loop detection module for using the adaptive robust kernel to reduce the accumulated error during loop correction. For the specific implementation of each module, reference is made to the above method embodiments, which are not repeated here.
Based on the same inventive concept, the present invention also provides a computer system, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when loaded into the processor, implements the aforementioned monocular vision inertia SLAM method based on an adaptive robust kernel.

Claims (10)

1. A monocular visual-inertial SLAM method based on an adaptive robust kernel, characterized by comprising the following steps:
performing initial feature matching between a current frame and a reference frame during monocular initialization, computing the homography-matrix and fundamental-matrix motion models in parallel, selecting the homography matrix or the fundamental matrix according to the geometric robust information content, and performing model decomposition and bundle adjustment (BA) optimization;
constructing a cost function minimizing the reprojection error in the local mapping process, and applying the adaptive robust kernel to BA optimization;
and using the adaptive robust kernel in the loop detection thread to reduce the accumulated error during loop correction.
2. The adaptive robust kernel-based monocular visual-inertial SLAM method of claim 1, characterized in that the model selection through the geometric robust information content is expressed as:

G = \sum_{k=1}^{n} L\left(\frac{e_k^2}{\theta^2}\right) + \beta n \ln \lambda + m \ln(\lambda n)

L(x) = \min\left(x, \; 2(\lambda - \beta)\right)

wherein G is the geometric robust information content, k is the key point index, L is the robust function used to limit the size of the reprojection error, e_k is the reprojection error of key point k, β is the dimension of the model (β = 2 for the homography matrix, β = 3 for the fundamental matrix), n is the number of map points tracked between the two key frames, m is the number of degrees of freedom of the model (m = 7 for the fundamental matrix, m = 8 for the homography matrix), θ² is the covariance of the key point localization error, and λ is the dimensionality of the data (λ = 4 for the two-dimensional correspondences between the two frames).
3. The adaptive robust kernel-based monocular visual-inertial SLAM method of claim 1, characterized in that the kernel function ρ of the adaptive robust kernel is expressed as:

\rho(\mu, \omega, c) = \frac{|\omega - 2|}{\omega}\left[\left(\frac{(\mu/c)^2}{|\omega - 2|} + 1\right)^{\omega/2} - 1\right]

where μ is the residual, ω and c are parameters, and the constant c > 0 is used to constrain the size of the quadratic loss region around μ = 0.
4. The adaptive robust kernel-based monocular visual-inertial SLAM method of claim 3, characterized in that in the local mapping thread, the cost function minimizing the reprojection error is expressed as:

\{R, t\} = \arg\min_{R, t} \sum_{\alpha \in \chi} \rho\left(\left\| x_\alpha - \hat{x}_\alpha \right\|^2\right), \qquad \hat{x}_\alpha = \pi(R X_\alpha + t)

wherein R is the rotation matrix, t is the translation vector, χ is the set of all matching feature points, ρ is the robust kernel function, \hat{x}_\alpha is the projection coordinate, X_\alpha is the coordinate of the point in the world coordinate system, and \|\cdot\|^2 denotes the squared two-norm of the reprojection error.
5. The adaptive robust kernel-based monocular visual-inertial SLAM method of claim 3, characterized in that in the loop detection thread, the adaptive robust kernel function is applied to the Sim(3) solution, expressed as:

S_{12}^* = \arg\min_{S_{12}} \sum_{i=1}^{N} \left[ \rho\left(\left\| p_{1,i} - \pi_1(S_{12} P_{2,j}) \right\|^2\right) + \rho\left(\left\| p_{2,j} - \pi_2\left(S_{12}^{-1} P_{1,i}\right) \right\|^2\right) \right]

wherein S_{12} is the similarity transformation matrix, p_{1,i} is key point i in key frame 1, p_{2,j} is the key point j in key frame 2 matched with i, P_{1,i} is the 3D coordinate of key point i in the world coordinate system, P_{2,j} is the 3D coordinate of key point j in the world coordinate system, N is the number of matched key points, and π_i (i = 1, 2) denotes the projection function of key frame i.
6. A monocular visual-inertial SLAM system based on an adaptive robust kernel, characterized by comprising:
a monocular initialization module for performing initial feature matching between the current frame and a reference frame, computing the homography-matrix and fundamental-matrix motion models in parallel, selecting the homography matrix H or the fundamental matrix F according to the geometric robust information content, and performing model decomposition and bundle adjustment (BA) optimization;
a local mapping module for constructing a cost function minimizing the reprojection error and applying the adaptive robust kernel to BA optimization;
and a loop detection module for using the adaptive robust kernel to reduce the accumulated error during loop correction.
7. The monocular visual inertial SLAM system of claim 6, wherein the model selection via geometric robust information content in the monocular initialization module is expressed as:

G = \sum_{k=1}^{n} L\!\left(e_k^2\right) + \beta n \log\lambda + d \log(\lambda n)

L\!\left(e^2\right) = \min\!\left(\frac{e^2}{\theta^2},\; 2(\lambda - \beta)\right)

wherein G is the geometric robust information content, k is the keypoint index, L is a robust function that limits the influence of the reprojection error, e is the reprojection error, β is the dimension of the model, with β = 2 for the homography matrix and β = 3 for the fundamental matrix, n is the number of map points tracked between the two keyframes, d is the number of degrees of freedom of the model, with d = 7 for the fundamental matrix and d = 8 for the homography matrix, θ² is the covariance of the keypoint localization error, and λ is the dimensionality of the data, with λ = 4 for two-dimensional correspondences between the two frames.
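The model selection of claim 7 can be sketched numerically, assuming Torr's GRIC form (the symbol roles in the claim are consistent with it); the log-weighted penalty terms and the residual cap 2(λ − β) are assumptions, and all function names are illustrative:

```python
import math

def gric(errors_sq, beta, dof, n, theta_sq, lam=4.0):
    """Torr-style GRIC score (assumed form): lower is better.
    errors_sq: squared reprojection errors; beta: model dimension
    (2 homography, 3 fundamental); dof: model d.o.f. (8 H, 7 F)."""
    cap = 2.0 * (lam - beta)  # limit on each residual's contribution
    data_term = sum(min(e / theta_sq, cap) for e in errors_sq)
    return data_term + beta * n * math.log(lam) + dof * math.log(lam * n)

def select_model(err_h_sq, err_f_sq, n, theta_sq):
    # Compare the homography (beta=2, dof=8) against the
    # fundamental matrix (beta=3, dof=7); pick the lower score.
    g_h = gric(err_h_sq, 2, 8, n, theta_sq)
    g_f = gric(err_f_sq, 3, 7, n, theta_sq)
    return "homography" if g_h < g_f else "fundamental"
```

Capping each residual keeps a few gross outliers from flipping the model choice, so the selection reflects which motion model fits the bulk of the correspondences.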
8. The adaptive robust kernel-based monocular visual inertial SLAM system of claim 6, wherein the kernel function ρ of the adaptive robust kernel is expressed as:

\rho(\mu, \omega, c) = \frac{|\omega - 2|}{\omega}\left[\left(\frac{(\mu/c)^2}{|\omega - 2|} + 1\right)^{\omega/2} - 1\right]

where μ is the residual, ω is the adaptive shape parameter, and the constant c > 0 is used to constrain the size of the quadratic loss region around μ = 0.
9. The adaptive robust kernel-based monocular visual inertial SLAM system of claim 6, wherein in the local mapping module, the cost function that minimizes the reprojection error is expressed as:

\{R, t\} = \arg\min_{R,\,t} \sum_{\alpha \in \mathcal{X}} \rho\left(\left\| x_\alpha - \pi(R X_\alpha + t) \right\|_2^2\right)

wherein R is the rotation matrix, t is the translation vector, \mathcal{X} is the set of all matched feature points, ρ is the robust kernel function, x_α is the projected coordinate of feature point α, X_α is the coordinate of the corresponding map point in the world coordinate system, π is the camera projection function, and \|x_\alpha - \pi(R X_\alpha + t)\|_2^2 is the squared two-norm of the reprojection error;

in the loop detection module, the adaptive robust kernel function is applied to the Sim(3) solution, expressed as:

S_{12} = \arg\min_{S_{12}} \sum_{i=1}^{N} \left[ \rho\left(\left\| p_{1,i} - \pi_1\!\left(S_{12} P_{2,j}\right) \right\|_2^2\right) + \rho\left(\left\| p_{2,j} - \pi_2\!\left(S_{12}^{-1} P_{1,i}\right) \right\|_2^2\right) \right]

wherein S₁₂ is the Sim(3) transformation matrix, p₁,ᵢ is keypoint i in keyframe 1, p₂,ⱼ is the keypoint j in keyframe 2 that matches i, P₁,ᵢ is the 3D coordinate of keypoint i in the world coordinate system, P₂,ⱼ is the 3D coordinate of the matched keypoint j in the world coordinate system, N is the number of matched keypoints, and πᵢ (i = 1, 2) denotes the projection function of the corresponding keyframe.
10. A computer system comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the computer program, when loaded into the processor, implements the adaptive robust kernel-based monocular visual inertial SLAM method according to any one of claims 1-5.
CN202111376206.0A 2021-11-19 2021-11-19 Monocular vision inertia SLAM method and system based on adaptive robust kernel Pending CN114821113A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111376206.0A CN114821113A (en) 2021-11-19 2021-11-19 Monocular vision inertia SLAM method and system based on adaptive robust kernel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111376206.0A CN114821113A (en) 2021-11-19 2021-11-19 Monocular vision inertia SLAM method and system based on adaptive robust kernel

Publications (1)

Publication Number Publication Date
CN114821113A true CN114821113A (en) 2022-07-29

Family

ID=82525402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111376206.0A Pending CN114821113A (en) 2021-11-19 2021-11-19 Monocular vision inertia SLAM method and system based on adaptive robust kernel

Country Status (1)

Country Link
CN (1) CN114821113A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117314735A (en) * 2023-09-26 2023-12-29 长光辰英(杭州)科学仪器有限公司 Global optimization coordinate mapping conversion method based on minimized reprojection error
CN117314735B (en) * 2023-09-26 2024-04-05 长光辰英(杭州)科学仪器有限公司 Global optimization coordinate mapping conversion method based on minimized reprojection error

Similar Documents

Publication Publication Date Title
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN110070615B (en) Multi-camera cooperation-based panoramic vision SLAM method
CN111561923B (en) SLAM (simultaneous localization and mapping) mapping method and system based on multi-sensor fusion
CN107301654B (en) Multi-sensor high-precision instant positioning and mapping method
CN112985416B (en) Robust positioning and mapping method and system based on laser and visual information fusion
CN112634451B (en) Outdoor large-scene three-dimensional mapping method integrating multiple sensors
CN104537709B (en) It is a kind of that method is determined based on the real-time three-dimensional reconstruction key frame that pose changes
Huang Review on LiDAR-based SLAM techniques
CN111667535B (en) Six-degree-of-freedom pose estimation method for occlusion scene
CN112747750B (en) Positioning method based on monocular vision odometer and IMU fusion
CN111882602B (en) Visual odometer implementation method based on ORB feature points and GMS matching filter
CN112419497A (en) Monocular vision-based SLAM method combining feature method and direct method
CN114596382A (en) Binocular vision SLAM method and system based on panoramic camera
CN113506342B (en) SLAM omni-directional loop correction method based on multi-camera panoramic vision
CN114821113A (en) Monocular vision inertia SLAM method and system based on adaptive robust kernel
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN116592897B (en) Improved ORB-SLAM2 positioning method based on pose uncertainty
Zhang et al. Lidar odometry and mapping based on two-stage feature extraction
CN117253003A (en) Indoor RGB-D SLAM method integrating direct method and point-plane characteristic method
CN116363205A (en) Space target pose resolving method based on deep learning and computer program product
JPH07146121A (en) Recognition method and device for three dimensional position and attitude based on vision
CN115143958A (en) Multi-sensor fusion SLAM method based on GPU acceleration
CN113888603A (en) Loop detection and visual SLAM method based on optical flow tracking and feature matching
CN112907669A (en) Camera pose measuring method and device based on coplanar feature points
Huang et al. Methods on Visual Positioning Based on Basketball Shooting Direction Standardisation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination