CN113720323B - Monocular vision inertial navigation SLAM method and device based on point-line feature fusion - Google Patents

Monocular vision inertial navigation SLAM method and device based on point-line feature fusion Download PDF

Info

Publication number
CN113720323B
CN113720323B (application CN202110873290.0A)
Authority
CN
China
Prior art keywords
camera
point
imu
representing
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110873290.0A
Other languages
Chinese (zh)
Other versions
CN113720323A (en)
Inventor
樊渊
潘高峰
宋程
潘天红
李腾
董翔
程松松
方笑晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202110873290.0A priority Critical patent/CN113720323B/en
Publication of CN113720323A publication Critical patent/CN113720323A/en
Application granted granted Critical
Publication of CN113720323B publication Critical patent/CN113720323B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38 Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3804 Creation or updating of map data
    • G01C21/3833 Creation or updating of map data characterised by the source of data
    • G01C21/3841 Data obtained from two or more sources, e.g. probe vehicles
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a monocular vision inertial navigation SLAM method and device based on point-line feature fusion. The method comprises the following steps: taking images acquired by a monocular camera and environmental information acquired by an IMU as input information; obtaining a plurality of image frames during the camera's movement and taking the frames in which the point-line features are successfully matched as key frames; removing key frames whose number of feature points is lower than a first preset value, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing local maps whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps; repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment. The invention achieves higher accuracy and smaller trajectory error.

Description

Monocular vision inertial navigation SLAM method and device based on point-line feature fusion
Technical Field
The invention relates to the field of map construction, in particular to a monocular vision inertial navigation SLAM method and device based on point-line feature fusion.
Background
With the gradual development of SLAM (simultaneous localization and mapping) technology, vision-based SLAM algorithms have become a hotspot of current research. The point-feature-based monocular visual SLAM method uses a monocular camera as the sensor to detect and extract corner points of the surrounding environment. The extracted feature points are used to estimate the camera pose and build a 3D map, giving good performance and high computing speed. Relatively mature algorithms include PTAM, ORB-SLAM2, and the like. However, point-feature-based monocular visual SLAM requires sufficiently textured surroundings and does not perform well in low-texture scenes. Line-feature-based visual SLAM algorithms were proposed to address the lack or failure of point feature extraction in low-texture scenes. Such algorithms use line feature extraction and matching in the surrounding environment to localize the robot and map the environment. Line features can describe the edge information of the environment, enrich the geometric information, and describe low-texture environments well. These algorithms do not rely on the point features of the environment at all, but entirely on the extraction and matching of its line features. Mature techniques include LSD, LBD, and the like. However, visual SLAM based only on line features places heavy demands on computing resources, slows down computation, and loses its advantage in richly textured environments. Deep-learning-based dynamic visual SLAM algorithms use a neural network to identify dynamic objects, remove them, and then match and map the environment outside the dynamic objects. The main technique is to use a neural network (such as a CNN) to recognize the dynamic objects from which feature points were extracted, reject those features, and compute only with the remaining static objects; adding deep learning, however, increases computational complexity, and discriminating dynamic objects is relatively complicated. Given that the above methods are ill-suited to low-texture scenes or are computationally complex, the prior art has produced visual-inertial SLAM methods that fuse point and line features, improving positioning accuracy through point-line fusion and suiting low-texture scenes.
Chinese patent application publication No. CN109579840A discloses a tightly coupled binocular visual-inertial SLAM method with point-line feature fusion, comprising the steps of: determining the transformation relation between the camera coordinate system and the inertial sensor coordinate system; establishing a point-line feature and IMU tracking thread, solving the initial three-dimensional coordinates of the point-line features, predicting the positions of the feature points and lines using the IMU after IMU initialization, establishing correct feature data association, and solving the pose transformation between consecutive frames by combining the IMU and the point-line feature reprojection error terms; establishing a point-line feature and IMU local bundle adjustment thread, and optimizing the point-line three-dimensional coordinates, key frame poses and IMU state quantities within a local key frame window; and establishing a point-line feature loop detection thread, computing the bag-of-words model score with point-line feature weights to detect loops, and optimizing the global state quantities. That patent application can ensure stability and high precision when feature points are few and the camera moves rapidly, and is suitable for low-texture scenes; however, it weights the points and lines in the bag-of-words model, so errors caused by insufficient point-line extraction in the early stage accumulate and cause larger trajectory errors in the later stage.
Disclosure of Invention
The technical problem to be solved by the invention is that the trajectory error of prior-art visual-inertial SLAM methods based on point-line fusion is large, so that the finally generated map has low precision.
The invention solves the above technical problem by the following technical means: a monocular vision inertial navigation SLAM method based on point-line feature fusion, comprising the following steps:
s1: taking an image acquired by a monocular camera and environmental information acquired by an IMU as input information;
s2: performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information; performing triangulation while extracting the point-line features to obtain the camera pose in real time; the IMU, which is arranged at the center of gravity of the monocular camera, also obtains the camera pose in real time; fusing the camera poses obtained in these two ways to obtain the camera movement process, wherein a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames;
s3: removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps;
s4: initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map;
s5: forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
According to the invention, the prior information input by the monocular camera and the IMU is used to determine the camera movement process, BA optimization is performed after data with larger errors are removed, the camera trajectory is then repeatedly subjected to loop detection and loop correction, and the trajectories are averaged to obtain the final trajectory, i.e. the final local map; this process has higher precision and smaller trajectory error.
Further, before the step S1, the method further includes an IMU initialization, where the IMU initialization process is:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image;
taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
Further, the step S2 includes: extracting point features using the FAST corner extraction algorithm, extracting line features using the line segment detector (LSD) algorithm, and weighting the point and line features, with line features weighted higher than point features in a low-texture scene.
Further, the point-line feature matching process in step S2 is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed;
the line features in adjacent frames of images obtained by the monocular camera are denoted as l_c and l'_c respectively; the reprojection error is obtained from the distances d(p'_s, l_c) and d(p'_e, l_c) of the two endpoints of l'_c to the projected line l_c, where d(p, l_c) = (l_1 p_x + l_2 p_y + l_3) / √(l_1² + l_2²); when the reprojection error between the current line feature of the current frame and a certain line feature in the adjacent frame is minimal, the line feature is successfully matched, where l_1 and l_2 respectively represent the x-direction and y-direction coefficients of the line l_c.
Further, the step S2 of fusing the pose of the camera obtained in the two ways to obtain the camera moving process includes: and acquiring the pose of the camera according to the triangulation measurement to form a first moving track of the camera, acquiring the pose of the camera according to the IMU to form a second moving track of the camera, removing poses of which pose deviations exceed a third preset value at the same moment after the first moving track and the second moving track are overlapped according to a time axis, and then averaging the first moving track and the second moving track to obtain a camera moving process.
Further, repeating the camera trajectory to perform loop detection and loop correction in step S4 includes: in the process of repeating the camera trajectory, loop detection mainly searches the key frame database for a key frame similar to the currently detected key frame; if one exists, it is taken as a closed-loop candidate frame and used to relocalize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results obtained for the same local map after multiple detections and corrections to obtain the final local map.
The invention also provides a monocular vision inertial navigation SLAM device based on point-line feature fusion, which comprises:
the information acquisition module is used for taking the image acquired by the monocular camera and the environmental information acquired by the IMU as input information;
the key frame acquisition module is used for performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information, and performing triangulation while extracting the point-line features to obtain the camera pose in real time, wherein the IMU is arranged at the center of gravity of the monocular camera and also obtains the camera pose in real time; the camera poses obtained in these two ways are fused to obtain the camera movement process, a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames;
the BA optimization module is used for removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps;
the detection and correction module is used for initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map;
and the global optimization module is used for forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
Further, the information acquisition module further includes an IMU initialization, where the IMU initialization process is:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image;
taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
Further, the key frame acquisition module is further configured to: extract point features using the FAST corner extraction algorithm, extract line features using the line segment detector (LSD) algorithm, and weight the point and line features, with line features weighted higher than point features in a low-texture scene.
Further, the point-line feature matching process in the key frame acquisition module is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed;
the line features in adjacent frames of images obtained by the monocular camera are denoted as l_c and l'_c respectively; the reprojection error is obtained from the distances d(p'_s, l_c) and d(p'_e, l_c) of the two endpoints of l'_c to the projected line l_c, where d(p, l_c) = (l_1 p_x + l_2 p_y + l_3) / √(l_1² + l_2²); when the reprojection error between the current line feature of the current frame and a certain line feature in the adjacent frame is minimal, the line feature is successfully matched, where l_1 and l_2 respectively represent the x-direction and y-direction coefficients of the line l_c.
Further, the camera moving process obtained by fusing the pose of the camera obtained in the two modes in the key frame obtaining module includes: and acquiring the pose of the camera according to the triangulation measurement to form a first moving track of the camera, acquiring the pose of the camera according to the IMU to form a second moving track of the camera, removing poses of which pose deviations exceed a third preset value at the same moment after the first moving track and the second moving track are overlapped according to a time axis, and then averaging the first moving track and the second moving track to obtain a camera moving process.
Further, repeating the camera trajectory to perform loop detection and loop correction in the detection and correction module includes: in the process of repeating the camera trajectory, loop detection mainly searches the key frame database for a key frame similar to the currently detected key frame; if one exists, it is taken as a closed-loop candidate frame and used to relocalize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results obtained for the same local map after multiple detections and corrections to obtain the final local map.
The invention has the advantages that: the prior information input by the monocular camera and the IMU is used to determine the camera movement process, BA optimization is performed after data with larger errors are removed, the camera trajectory is then repeatedly subjected to loop detection and loop correction, and the trajectories are averaged to obtain the final trajectory, i.e. the final local map; this process has higher precision and smaller trajectory error.
Drawings
FIG. 1 is a flow chart of a monocular vision inertial navigation SLAM method based on point-line feature fusion disclosed in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a relationship between projection lines and matching line segments in a monocular vision inertial navigation SLAM method based on point-line feature fusion according to an embodiment of the present invention;
fig. 3 is a schematic diagram of distances from the end points of the matching line segments to the projection line in the monocular vision inertial navigation SLAM method based on the point-line feature fusion according to the embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described in the following in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1, the monocular vision inertial navigation SLAM method based on the point-line feature fusion comprises the following steps:
s1: taking the image acquired by the monocular camera and the environmental information acquired by the IMU as input information; the camera and the IMU run in parallel, extracting image information and environment measurement information respectively, and the environment information of the camera and the IMU is then initialized in a tightly coupled manner. Tightly coupled means that the information collected by the two sensors is integrated together as the input information, rather than each being used alone as input information. It should be noted that, before step S1, the method further includes an IMU initialization, and the IMU initialization process is as follows:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image.
Taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
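As a rough illustration of the cost structure above, the following Python sketch evaluates the negative-log-posterior once the prior residual and the per-frame pre-integration residuals with their covariances are available. It only shows how the terms are combined; the residuals themselves would come from the IMU pre-integration, and all function and variable names here are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def mahalanobis_sq(r, cov):
    """Squared Mahalanobis norm ||r||^2_cov = r^T cov^-1 r."""
    return float(r @ np.linalg.solve(cov, r))

def imu_init_cost(r_prior, cov_prior, preint_residuals, preint_covs):
    """Negative-log MAP objective used for IMU initialization:
    ||r_p||^2_{Sigma_p} + sum_i ||r_{I_{i-1,i}}||^2_{Sigma_{I_{i-1,i}}}.
    r_prior          : prior residual on (s, R_wg, b)
    preint_residuals : list of pre-integration residuals r_{I_{i-1,i}}
    preint_covs      : matching list of covariance matrices
    """
    cost = mahalanobis_sq(r_prior, cov_prior)
    for r, cov in zip(preint_residuals, preint_covs):
        cost += mahalanobis_sq(r, cov)
    return cost
```

Minimizing this cost over the state vector y_k (for example with a Levenberg-Marquardt solver) yields the initial scale, gravity direction, biases and velocities.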
S2: performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information; performing triangulation while extracting the point-line features to obtain the camera pose in real time; the IMU, which is arranged at the center of gravity of the monocular camera, also obtains the camera pose in real time; fusing the camera poses obtained in these two ways to obtain the camera movement process, wherein a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames;
in this embodiment, the FAST corner extraction algorithm is used to extract the point features, and the line segment detector LSD algorithm is used to extract the line features, where the point features are weighted and the weight of the line features is higher than that of the point features in the low texture scene. The FAST corner extraction algorithm belongs to the prior art, and the general process is as follows:
(1) Selecting a pixel p in the image, assuming its brightness is I p
(2) Setting a threshold T (e.g., I p 10%) of (c).
(3) Taking the pixel p as the center, 16 pixel points on a circle with the radius of 3 are selected.
(4) Assuming that the brightness of N consecutive points on the selected circle is greater than I p +T or less than I p T, then pixel p may be considered a point feature.
(5) The above four parts are cycled, and the same operation is performed for each pixel.
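As a minimal sketch of this extraction step, the code below uses OpenCV's built-in FAST and LSD detectors; the threshold value and file name are illustrative assumptions, and LSD availability depends on the OpenCV build.

```python
import cv2

# Minimal point/line feature extraction sketch (parameter values are illustrative).
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input frame

fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
point_features = fast.detect(img, None)                # FAST corner point features

lsd = cv2.createLineSegmentDetector()                  # LSD line features (build-dependent)
line_features, _, _, _ = lsd.detect(img)

print(len(point_features), "point features,",
      0 if line_features is None else len(line_features), "line features")
```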
When extracting point and line features, the basic premise is that the method operates under low texture, where point features are not easy to extract, so the point and line features are weighted. Since the invention targets low-texture scenes, more line features than point features are extracted; therefore, in the weighting process, line features are primary and point features secondary, and the effective point-line features extracted from the environment are weighted in this way. When point-line features are being extracted and no point features are obtained in consecutive frames, the map can still be initialized from the line features, so the orderly operation of the system is not affected. The invention thus divides the proportion of point and line features according to the number of line features, avoiding complex weight-distribution computation and ensuring the real-time performance of the system. The threshold on the number of line features is set to N.
The point and line feature weights w_p and w_l are assigned as follows:
N_l > N: w_p = 0.2, w_l = 0.8
where N_l denotes the number of successfully matched line features.
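A minimal sketch of this weight split follows. The description only states the case N_l > N, so the fallback branch for point-rich scenes is an assumed complement, not taken from the patent.

```python
def assign_pointline_weights(n_line_matches, n_threshold):
    """Point/line weight split: when the number of successful line matches N_l
    exceeds the threshold N, line features dominate (w_p = 0.2, w_l = 0.8)."""
    if n_line_matches > n_threshold:
        return 0.2, 0.8   # (w_p, w_l) as stated in the description
    return 0.8, 0.2       # assumed symmetric fallback for point-rich scenes
```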
The point-line feature matching process is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed.
before the line feature matching is performed, the feature lines in the space should be expressed mathematically, so that the computer description is facilitated. Let the homogeneous coordinates of two endpoints of the space straight line L be P 1 =[x 1 ,y 1 ,z 1 ,w 1 ] T And P 2 =[x 2 ,y 2 ,z 2 ,w 2 ] T The pllck matrix is defined asIts non-homogeneous coordinate is +>And->The linear plgram coordinates are:
where L is a 6 x 1 vector, comprising n and v, n being the moment vector of the straight line, v being the direction vector of the straight line, n being perpendicular to the plane determined by the origin and the straight line. w (w) 1 And w 2 Representing the homogeneous factor.
The conversion relationship between the plgram matrix T and the coordinates L is as follows:
wherein n is An antisymmetric matrix of moment vectors n representing straight lines.
Set up a linear transformation matrix H cw It is composed of a rotation matrixAnd translation vector->The composition is formed. L (L) c Is a straight line L in space w Converted from the spatial coordinate system to the coordinates of the camera coordinate system. Handle L c Expressed in terms of the plck coordinates:
wherein,representing an antisymmetric matrix of translation vectors.
Straight line L is set c Projection onto image plane as l c Its plck coordinates are expressed as:
wherein K is l Projection matrix representing straight line, f x 、f y 、c x 、c y Representing the focal length and image center inside the parameters inside the cameras (each camera has a fixed focal length and image center), n c Represents the normal vector in the expression of the pulgram, l 1 、l 2 、l 3 Represents straight line l c Coefficients of l c =l 1 x+l 2 y+l 3 z。
As shown in fig. 2 and 3, line features in adjacent frames of images obtained by the monocular camera are denoted as l, respectively c And l' c By the formulaAcquiring a reprojection error, and when the reprojection error of the current line characteristic of the current frame of the image and a certain line characteristic in the adjacent frame is minimum, successfully matching the line characteristic, wherein l is as follows 1 、l 2 Respectively represent straight lines l c X-direction coefficients and y-direction coefficients, +.>Representing l' c Respectively to l c Is a distance of (3).
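The sketch below walks through the standard Plücker-based pipeline described above (line transform, projection by K_l, and endpoint-to-line distances). It is an illustrative reconstruction under those standard conventions, not the patent's code; all function and variable names are assumptions.

```python
import numpy as np

def skew(v):
    """Antisymmetric (skew-symmetric) matrix [v]_x of a 3-vector."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def plucker_from_endpoints(p1, p2):
    """Plücker coordinates (n, v): moment n = p1 x p2, direction v = p2 - p1 (w1 = w2 = 1)."""
    return np.cross(p1, p2), p2 - p1

def transform_line(n_w, v_w, R_cw, t_cw):
    """World-to-camera line transform: n_c = R n_w + [t]_x R v_w, v_c = R v_w."""
    return R_cw @ n_w + skew(t_cw) @ (R_cw @ v_w), R_cw @ v_w

def project_line(n_c, fx, fy, cx, cy):
    """Project the camera-frame line onto the image plane: l = K_l n_c = (l1, l2, l3)."""
    K_l = np.array([[fy, 0, 0],
                    [0, fx, 0],
                    [-fy * cx, -fx * cy, fx * fy]])
    return K_l @ n_c

def line_reprojection_error(l, p_s, p_e):
    """Distances of the matched segment's endpoints (homogeneous 2D points, last coord 1)
    to the projected line l."""
    denom = np.hypot(l[0], l[1])
    return np.array([p_s @ l, p_e @ l]) / denom
```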
The camera movement process obtained in step S2 by fusing the camera poses obtained in the two ways includes: obtaining the camera pose from triangulation to form a first camera movement trajectory, obtaining the camera pose from the IMU to form a second camera movement trajectory, aligning the first and second movement trajectories along the time axis, removing poses whose deviation at the same instant exceeds a third preset value, and then averaging the first and second movement trajectories to obtain the camera movement process.
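A rough sketch of this fusion step follows, under the simplifying assumptions (not stated in the patent) that the two trajectories are already sampled at the same instants and that a pose is reduced to a 3D position; max_dev stands for the third preset value.

```python
import numpy as np

def fuse_trajectories(traj_cam, traj_imu, max_dev):
    """Reject instants where the camera and IMU trajectories disagree by more than
    max_dev, then average the remaining poses into one fused trajectory."""
    traj_cam, traj_imu = np.asarray(traj_cam), np.asarray(traj_imu)
    dev = np.linalg.norm(traj_cam - traj_imu, axis=1)   # per-instant pose deviation
    keep = dev <= max_dev                               # discard large-error poses
    return (traj_cam[keep] + traj_imu[keep]) / 2.0      # averaged camera movement
```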
S3: removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps; the BA optimization process is prior art, and the blog post "SLAM Fourteen Lectures — Back-end 1", uploaded by JInterest to the CSDN blog on 14 January 2020, discloses the specific BA optimization process as follows:
the world coordinates are converted to camera coordinates using the rotation matrix R and translation matrix t of the camera.
P′=Rp+t=[X′,Y′,Z′]
Will P And (3) casting the coordinate to a normalization plane to obtain a normalization coordinate:
taking the distortion condition of the normalized coordinates into consideration to obtain the original pixel coordinates before de-distortion
Calculating pixel coordinates according to the internal reference model:
the observation data of the camera in the moving process is recorded as:
z=h(x,y)
x refers to the pose of the camera, namely the corresponding transformation matrix T, y refers to the three-dimensional point p, and the observation error is as follows:
e=z-h(T,p)
taking into account observations at other moments. Let z be ij To be in pose T i Site observation road sign p j The resulting data, then the overall cost function is:
this least squares solution corresponds to the simultaneous adjustment of pose and landmark, the so-called BA.
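As an illustration of the cost function above, the sketch below evaluates the total reprojection error over a set of poses and landmarks (distortion omitted for brevity). How the cost is minimized over T_i and p_j (e.g. Gauss-Newton or Levenberg-Marquardt) is not shown; names and data layout are assumptions.

```python
import numpy as np

def project(T, p, fx, fy, cx, cy):
    """Pinhole projection h(T, p): world point p -> pixel (u, v), distortion ignored."""
    R, t = T[:3, :3], T[:3, 3]
    Pc = R @ p + t                                   # camera-frame point P'
    return np.array([fx * Pc[0] / Pc[2] + cx,
                     fy * Pc[1] / Pc[2] + cy])

def ba_cost(poses, landmarks, observations, fx, fy, cx, cy):
    """1/2 * sum_ij || z_ij - h(T_i, p_j) ||^2 over observations given as (i, j, z_ij)."""
    cost = 0.0
    for i, j, z in observations:
        e = z - project(poses[i], landmarks[j], fx, fy, cx, cy)
        cost += 0.5 * float(e @ e)
    return cost
```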
S4: initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map; repeating the camera trajectory for loop detection and loop correction includes: in the process of repeating the camera trajectory, loop detection mainly searches the key frame database for a key frame similar to the currently detected key frame; if one exists, it is taken as a closed-loop candidate frame and used to relocalize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results obtained for the same local map after multiple detections and corrections to obtain the final local map.
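A minimal sketch of the loop-detection query against the key frame database follows. The use of a global descriptor and cosine similarity is an assumption made for illustration (a bag-of-words score would be typical); the patent only states that a similar key frame is searched for and used as a closed-loop candidate.

```python
import numpy as np

def detect_loop(current_desc, keyframe_db, sim_threshold):
    """Compare the current key frame's global descriptor (assumed L2-normalised)
    against every entry in the key frame database and return the best candidate
    key frame id if it is similar enough, otherwise None."""
    best_id, best_sim = None, -1.0
    for kf_id, desc in keyframe_db.items():
        sim = float(np.dot(current_desc, desc))   # cosine similarity for unit vectors
        if sim > best_sim:
            best_id, best_sim = kf_id, sim
    return best_id if best_sim >= sim_threshold else None
```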
S5: forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
According to the above technical scheme, the prior information input by the monocular camera and the IMU is used to determine the camera movement process, BA optimization is performed after data with larger errors are removed, the camera trajectory is then repeatedly subjected to loop detection and loop correction, and the trajectories are averaged to obtain the final trajectory, i.e. the final local map; this process has higher precision and smaller trajectory error.
Example 2
Based on embodiment 1 of the present invention, embodiment 2 of the present invention further provides a monocular vision inertial navigation SLAM device based on dotted line feature fusion, the device comprising:
the information acquisition module is used for taking the image acquired by the monocular camera and the environmental information acquired by the IMU as input information;
the key frame acquisition module is used for performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information, and performing triangulation while extracting the point-line features to obtain the camera pose in real time, wherein the IMU is arranged at the center of gravity of the monocular camera and also obtains the camera pose in real time; the camera poses obtained in these two ways are fused to obtain the camera movement process, a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames;
the BA optimization module is used for removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps;
the detection and correction module is used for initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map;
and the global optimization module is used for forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
Specifically, the information acquisition module further comprises an IMU initialization, and the IMU initialization process is as follows:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image;
taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
Specifically, the key frame acquisition module is further configured to: extract point features using the FAST corner extraction algorithm, extract line features using the line segment detector (LSD) algorithm, and weight the point and line features, with line features weighted higher than point features in a low-texture scene.
Specifically, the point-line feature matching process in the key frame acquisition module is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed;
the line features in adjacent frames of images obtained by the monocular camera are denoted as l_c and l'_c respectively; the reprojection error is obtained from the distances d(p'_s, l_c) and d(p'_e, l_c) of the two endpoints of l'_c to the projected line l_c, where d(p, l_c) = (l_1 p_x + l_2 p_y + l_3) / √(l_1² + l_2²); when the reprojection error between the current line feature of the current frame and a certain line feature in the adjacent frame is minimal, the line feature is successfully matched, where l_1 and l_2 respectively represent the x-direction and y-direction coefficients of the line l_c.
Specifically, the camera moving process obtained by fusing the pose of the camera obtained in the two modes in the key frame obtaining module includes: and acquiring the pose of the camera according to the triangulation measurement to form a first moving track of the camera, acquiring the pose of the camera according to the IMU to form a second moving track of the camera, removing poses of which pose deviations exceed a third preset value at the same moment after the first moving track and the second moving track are overlapped according to a time axis, and then averaging the first moving track and the second moving track to obtain a camera moving process.
Specifically, repeating the camera trajectory to perform loop detection and loop correction in the detection and correction module includes: in the process of repeating the camera trajectory, loop detection mainly searches the key frame database for a key frame similar to the currently detected key frame; if one exists, it is taken as a closed-loop candidate frame and used to relocalize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results obtained for the same local map after multiple detections and corrections to obtain the final local map.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A monocular vision inertial navigation SLAM method based on point-line feature fusion, characterized by comprising the following steps:
s1: taking an image acquired by a monocular camera and environmental information acquired by an IMU as input information;
s2: performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information; performing triangulation while extracting the point-line features to obtain the camera pose in real time; the IMU, which is arranged at the center of gravity of the monocular camera, also obtains the camera pose in real time; fusing the camera poses obtained in these two ways to obtain the camera movement process, wherein a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames; the point-line feature matching process in step S2 is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed;
the line features in adjacent frames of images obtained by the monocular camera are denoted as l_c and l'_c respectively; the reprojection error is obtained from the distances d(p'_s, l_c) and d(p'_e, l_c) of the two endpoints of l'_c to the projected line l_c, where d(p, l_c) = (l_1 p_x + l_2 p_y + l_3) / √(l_1² + l_2²); when the reprojection error between the current line feature of the current frame and a certain line feature in the adjacent frame is minimal, the line feature is successfully matched, where l_1 and l_2 respectively represent the x-direction and y-direction coefficients of the line l_c;
the camera movement process obtained in step S2 by fusing the camera poses obtained in the two ways includes: obtaining the camera pose from triangulation to form a first camera movement trajectory, obtaining the camera pose from the IMU to form a second camera movement trajectory, aligning the first and second movement trajectories along the time axis, removing poses whose deviation at the same instant exceeds a third preset value, and then averaging the first and second movement trajectories to obtain the camera movement process;
s3: removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps;
s4: initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map;
s5: forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
2. The monocular vision inertial navigation SLAM method based on point-line feature fusion according to claim 1, wherein the step S1 is preceded by an IMU initialization, and the IMU initialization process is as follows:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image;
taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
3. The monocular vision inertial navigation SLAM method based on point-line feature fusion according to claim 1, wherein the step S2 includes: extracting point features using the FAST corner extraction algorithm, extracting line features using the line segment detector (LSD) algorithm, and weighting the point and line features, with line features weighted higher than point features in a low-texture scene.
4. The monocular vision inertial navigation SLAM method based on point-line feature fusion according to claim 1, wherein repeating the camera trajectory to perform loop detection and loop correction in step S4 includes: in the process of repeating the camera trajectory, loop detection mainly searches the key frame database for a key frame similar to the currently detected key frame; if one exists, it is taken as a closed-loop candidate frame and used to relocalize the camera trajectory to obtain a new local map; loop correction averages, during each loop detection, the results obtained for the same local map after multiple detections and corrections to obtain the final local map.
5. A monocular vision inertial navigation SLAM device based on point-line feature fusion, characterized in that the device comprises:
the information acquisition module is used for taking the image acquired by the monocular camera and the environmental information acquired by the IMU as input information;
the key frame acquisition module is used for performing point-line feature extraction, point-line feature weighting and point-line feature matching based on the input information, and performing triangulation while extracting the point-line features to obtain the camera pose in real time, wherein the IMU is arranged at the center of gravity of the monocular camera and also obtains the camera pose in real time; the camera poses obtained in these two ways are fused to obtain the camera movement process, a plurality of image frames are obtained during the camera movement, and the image frames in which the point-line features are successfully matched are taken as key frames; the point-line feature matching process in the key frame acquisition module is as follows:
the point features in adjacent frames of images obtained by the monocular camera are denoted as K_i and Q_j respectively, i, j = 1, 2, 3, ...; K_1 is matched with the closest point feature among the Q_j, K_2 is matched with the closest point feature among the Q_j, and so on until point feature matching is completed;
the line features in adjacent frames of images obtained by the monocular camera are denoted as l_c and l'_c respectively; the reprojection error is obtained from the distances d(p'_s, l_c) and d(p'_e, l_c) of the two endpoints of l'_c to the projected line l_c, where d(p, l_c) = (l_1 p_x + l_2 p_y + l_3) / √(l_1² + l_2²); when the reprojection error between the current line feature of the current frame and a certain line feature in the adjacent frame is minimal, the line feature is successfully matched, where l_1 and l_2 respectively represent the x-direction and y-direction coefficients of the line l_c;
the camera movement process obtained in the key frame acquisition module by fusing the camera poses obtained in the two ways includes: obtaining the camera pose from triangulation to form a first camera movement trajectory, obtaining the camera pose from the IMU to form a second camera movement trajectory, aligning the first and second movement trajectories along the time axis, removing poses whose deviation at the same instant exceeds a third preset value, and then averaging the first and second movement trajectories to obtain the camera movement process;
the BA optimization module is used for removing key frames whose number of feature points is lower than a first preset value from all key frames, forming a plurality of local maps from the camera motion trajectory composed of the remaining key frames, removing from all local maps those whose number of point-line features is lower than a second preset value, and performing BA optimization on the remaining local maps;
the detection and correction module is used for initializing the system, repeating the camera trajectory to perform loop detection and loop correction to obtain new local maps, and averaging the results for the same local map after loop detection and loop correction are finished to obtain the final local map;
and the global optimization module is used for forming a global map from all the final local maps, performing global BA optimization on the global map, and updating the global map through loop detection and loop correction to construct a complete map of the environment.
6. The monocular vision inertial navigation SLAM device based on point-line feature fusion according to claim 5, wherein the information acquisition module further includes an IMU initialization, and the IMU initialization process is as follows:
the parameters acquired by the IMU are collected into a state vector y_k = {s, R_wg, b, v_0:k}, where s represents a scale factor; R_wg represents the gravity direction, so that the gravity vector in the world coordinate system is g = R_wg g_I, where g_I = (0, 0, G)^T and G is the magnitude of gravity; b = (b^a, b^g) represents the biases of the accelerometer and gyroscope, which are assumed constant during the initialization phase; v_0:k represents the unscaled body velocities from the first frame image to the last frame image;
the posterior distribution maximized in IMU initialization is p(y_k | I_0:k) ∝ p(I_0:k | y_k) p(y_k), where I_0:k = {I_0,1, ..., I_{k-1,k}} denotes the IMU measurements (pre-integrations) between the first frame image and the last frame image, p(y_k | I_0:k) is the posterior distribution to be maximized, ∝ denotes proportionality, p(I_0:k | y_k) is the likelihood, and p(y_k) is the prior distribution;
the maximum a posteriori estimate of the IMU is expressed as
y_k* = argmax_{y_k} p(y_k | I_0:k) = argmax_{y_k} p(y_k) ∏_{i=1}^{k} p(I_{i-1,i} | s, g_dir, b, v_{i-1}, v_i)
where I_{i-1,i} represents the measurement information between frame i-1 and frame i, g_dir represents the gravity direction, v_{i-1} represents the unscaled body velocity of the (i-1)-th frame image, and v_i represents the unscaled body velocity of the i-th frame image;
taking the negative logarithm of the maximum a posteriori estimate of the IMU and assuming that the IMU pre-integration terms and the prior distribution obey Gaussian distributions, the final maximum a posteriori estimate of the IMU is expressed as
y_k* = argmin_{y_k} ( ||r_p||²_{Σ_p} + Σ_{i=1}^{k} ||r_{I_{i-1,i}}||²_{Σ_{I_{i-1,i}}} )
where r_p represents the prior residual term and r_{I_{i-1,i}} represents the pre-integration residual of two adjacent frames.
7. The monocular vision inertial navigation SLAM device based on point-line feature fusion according to claim 5, wherein the key frame acquisition module is further configured to: extract point features using the FAST corner extraction algorithm, extract line features using the line segment detector (LSD) algorithm, and weight the point and line features, with line features weighted higher than point features in a low-texture scene.
CN202110873290.0A 2021-07-30 2021-07-30 Monocular vision inertial navigation SLAM method and device based on point-line feature fusion Active CN113720323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110873290.0A CN113720323B (en) 2021-07-30 2021-07-30 Monocular vision inertial navigation SLAM method and device based on point-line feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110873290.0A CN113720323B (en) 2021-07-30 2021-07-30 Monocular vision inertial navigation SLAM method and device based on point-line feature fusion

Publications (2)

Publication Number Publication Date
CN113720323A CN113720323A (en) 2021-11-30
CN113720323B true CN113720323B (en) 2024-01-23

Family

ID=78674538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110873290.0A Active CN113720323B (en) 2021-07-30 2021-07-30 Monocular vision inertial navigation SLAM method and device based on point-line feature fusion

Country Status (1)

Country Link
CN (1) CN113720323B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665540A (en) * 2018-03-16 2018-10-16 浙江工业大学 Robot localization based on binocular vision feature and IMU information and map structuring system
CN110782494A (en) * 2019-10-16 2020-02-11 北京工业大学 Visual SLAM method based on point-line fusion
WO2020155616A1 (en) * 2019-01-29 2020-08-06 浙江省北大信息技术高等研究院 Digital retina-based photographing device positioning method
CN112419497A (en) * 2020-11-13 2021-02-26 天津大学 Monocular vision-based SLAM method combining feature method and direct method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665540A (en) * 2018-03-16 2018-10-16 浙江工业大学 Robot localization based on binocular vision feature and IMU information and map structuring system
WO2020155616A1 (en) * 2019-01-29 2020-08-06 浙江省北大信息技术高等研究院 Digital retina-based photographing device positioning method
CN110782494A (en) * 2019-10-16 2020-02-11 北京工业大学 Visual SLAM method based on point-line fusion
CN112419497A (en) * 2020-11-13 2021-02-26 天津大学 Monocular vision-based SLAM method combining feature method and direct method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Vision-IMU-based simultaneous localization and mapping algorithm for robots; Yao Erliang; Zhang Hexin; Zhang Guoliang; Xu Hui; Zhao Xin; Chinese Journal of Scientific Instrument (Issue 04); full text *

Also Published As

Publication number Publication date
CN113720323A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN109307508B (en) Panoramic inertial navigation SLAM method based on multiple key frames
CN110223348B (en) Robot scene self-adaptive pose estimation method based on RGB-D camera
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
CN107392964B (en) The indoor SLAM method combined based on indoor characteristic point and structure lines
CN111739063A (en) Electric power inspection robot positioning method based on multi-sensor fusion
CN107240129A (en) Object and indoor small scene based on RGB D camera datas recover and modeling method
CN112734841B (en) Method for realizing positioning by using wheel type odometer-IMU and monocular camera
CN106056664A (en) Real-time three-dimensional scene reconstruction system and method based on inertia and depth vision
Yin et al. Dynam-SLAM: An accurate, robust stereo visual-inertial SLAM method in dynamic environments
CN111998862A (en) Dense binocular SLAM method based on BNN
CN107563323A (en) A kind of video human face characteristic point positioning method
CN111623773B (en) Target positioning method and device based on fisheye vision and inertial measurement
CN114529576A (en) RGBD and IMU hybrid tracking registration method based on sliding window optimization
CN116468786B (en) Semantic SLAM method based on point-line combination and oriented to dynamic environment
CN116449384A (en) Radar inertial tight coupling positioning mapping method based on solid-state laser radar
CN111882602A (en) Visual odometer implementation method based on ORB feature points and GMS matching filter
CN112179373A (en) Measuring method of visual odometer and visual odometer
CN112767546A (en) Binocular image-based visual map generation method for mobile robot
CN116772844A (en) Navigation method of visual inertial indoor robot based on dynamic environment
CN109544632B (en) Semantic SLAM object association method based on hierarchical topic model
CN112731503B (en) Pose estimation method and system based on front end tight coupling
CN112907633A (en) Dynamic characteristic point identification method and application thereof
CN116147618B (en) Real-time state sensing method and system suitable for dynamic environment
CN113720323B (en) Monocular vision inertial navigation SLAM method and device based on point-line feature fusion
CN112767481B (en) High-precision positioning and mapping method based on visual edge features

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant